
# Models & Providers

ARES routes LLM requests across multiple providers through a single API. You do not call providers directly — ARES selects the appropriate model based on the agent configuration and handles credentials, rate limits, and failover transparently.

## Available models

| Tier | Provider | Model | Best for |
|------|----------|-------|----------|
| fast | Groq | llama-3.1-8b-instant | Quick responses, classification, simple Q&A |
| balanced | Groq | llama-3.3-70b-versatile | General-purpose tasks, GPT-4 class quality |
| powerful | Anthropic | claude-sonnet-4-6 | Complex reasoning, long-form analysis, nuanced tasks |
| deepseek | NVIDIA | deepseek-v3.2 | Code generation, technical documentation, structured output |
| local | Ollama | ministral-3:3b | Development, testing, offline use |

## How model selection works

You do not specify a model directly in your API calls. Instead, you specify an agent_type, and each agent is configured with a model tier.

```bash
# This request is routed to whichever model the "product" agent is configured to use
curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{"message": "Compare these two options", "agent_type": "product"}'
```

The mapping between agents and models is configured by your tenant administrator. A typical setup might look like:

| Agent | Model tier | Rationale |
|-------|------------|-----------|
| classifier | fast | Needs speed, not depth |
| product | balanced | General-purpose, good quality |
| analyst | powerful | Complex reasoning required |
| code-review | deepseek | Specialized for code tasks |

This design means you can upgrade an agent’s underlying model without changing any client code.
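For example, if your deployment maps agents to tiers in configuration, such an upgrade is a one-line change. The `[[agents]]` schema below is a hypothetical sketch: the actual agent-mapping format is set by your tenant administrator and may differ.

```toml
# Hypothetical agent mapping (illustrative schema, not confirmed by this document)
[[agents]]
name = "product"
tier = "balanced"   # change to "powerful" to upgrade; no client changes needed
```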

## Provider architecture

ARES uses a named-provider system. Each provider is configured with its API endpoint, credentials, and rate limits. Models reference their provider by name.

```text
┌─────────────┐
│  Your App   │
│  agent_type │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌──────────┐
│    ARES     │────▶│   Groq   │  fast, balanced
│   Router    │     └──────────┘
│             │     ┌──────────┐
│             │────▶│Anthropic │  powerful
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  NVIDIA  │  deepseek
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  Ollama  │  local
└─────────────┘     └──────────┘
```

## Provider details

Groq — High-throughput inference on custom LPUs. Extremely fast response times. Hosts open-source models (Llama, Mixtral). Free tier available with rate limits.

Anthropic — Claude models. Best-in-class for complex reasoning, instruction following, and safety. Requires a paid API key.

NVIDIA (DeepSeek) — NVIDIA-hosted DeepSeek models via the NVIDIA AI API. Strong at code generation and structured technical output.

Ollama — Self-hosted, local inference. No external API calls. Useful for development, air-gapped environments, or when you need to keep data on-premises.

## Rate limits

Rate limits are enforced per provider and per tenant. The following are default limits for the Groq free tier:

| Model tier | Requests per day | Tokens per minute |
|------------|------------------|-------------------|
| fast (llama-3.1-8b) | 14,400 | 20,000 |
| balanced (llama-3.3-70b) | 6,000 | 6,000 |

Anthropic and NVIDIA rate limits depend on your API plan with those providers. ARES surfaces rate limit errors transparently:

```json
{
  "error": "Rate limit exceeded for provider 'groq'",
  "code": "RATE_LIMIT_EXCEEDED",
  "retry_after": 60
}
```

Tenant-level rate limits and quotas are configured separately by your administrator and enforced by ARES regardless of provider limits.

## Adding your own providers

If you are self-hosting ARES, you can add providers in your ares.toml configuration:

```toml
[[providers]]
name = "my-openai"
kind = "openai"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"

[[models]]
name = "gpt-4o"
provider = "my-openai"
model_id = "gpt-4o"
tier = "powerful"
```

Any provider that exposes an OpenAI-compatible API (vLLM, Together AI, Fireworks, etc.) can be added using the openai provider kind.
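For instance, a self-hosted vLLM server could be wired in the same way. The names, port, and model below are placeholders for your own deployment, not values this document prescribes:

```toml
[[providers]]
name = "my-vllm"
kind = "openai"
api_base = "http://localhost:8000/v1"   # vLLM's OpenAI-compatible endpoint (default port)
api_key_env = "VLLM_API_KEY"

[[models]]
name = "local-llama-70b"
provider = "my-vllm"
model_id = "meta-llama/Llama-3.3-70B-Instruct"
tier = "balanced"
```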

## Choosing the right tier

| If you need… | Use tier |
|--------------|----------|
| Fastest possible response | fast |
| Good quality at reasonable speed | balanced |
| Maximum reasoning capability | powerful |
| Code generation or technical tasks | deepseek |
| Offline or local development | local |

When in doubt, start with balanced. It provides the best trade-off between quality, speed, and cost for most use cases.
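On the client side, tier selection reduces to choosing an agent_type per request. The sketch below mirrors the example agent table above; the task names and agent mapping are illustrative, since your tenant's configured agents may differ.

```python
import json
import urllib.request

API_URL = "https://api.ares.dirmacs.com/v1/chat"

# Illustrative task -> agent mapping, mirroring the example agent table above.
AGENT_FOR_TASK = {
    "classify": "classifier",   # fast tier
    "general": "product",       # balanced tier
    "analyze": "analyst",       # powerful tier
    "code": "code-review",      # deepseek tier
}

def build_chat_request(message: str, task: str, api_key: str) -> urllib.request.Request:
    """Build a POST /v1/chat request, defaulting to the balanced 'product' agent."""
    body = json.dumps({
        "message": message,
        "agent_type": AGENT_FOR_TASK.get(task, "product"),
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage: urllib.request.urlopen(build_chat_request("Review this diff", "code", "ares_xxx"))
```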