Rate Limits and Quotas

ARES enforces two independent layers of rate limiting to protect the platform and ensure fair resource allocation across tenants.


Layer 1: IP-Based Rate Limiting

Every incoming request is subject to per-IP rate limiting via tower_governor. This layer protects against abuse, brute-force attacks, and accidental request floods regardless of authentication status.

IP-based limits apply to all routes, including unauthenticated endpoints like /health. The specific thresholds are configured server-side and are intentionally generous for normal usage patterns.

If you hit the IP rate limit, you will receive a 429 Too Many Requests response. Back off and retry after a short delay.


Layer 2: Tenant Quotas

Authenticated requests to /v1/* are additionally subject to tenant-level quotas based on the tenant’s tier. Monthly quotas reset at the beginning of each calendar month; the daily rate limit resets at the start of each UTC day.

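| Tier       | Monthly Requests | Monthly Tokens | Daily Rate Limit |
|------------|------------------|----------------|------------------|
| Free       | 1,000            | 100,000        | 100/day          |
| Dev        | 10,000           | 1,000,000      | 1,000/day        |
| Pro        | 100,000          | 10,000,000     | 10,000/day       |
| Enterprise | Unlimited        | Unlimited      | Unlimited        |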
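
The daily and monthly caps interact: a tenant that uses its full daily allowance every day reaches the monthly request quota well before the month ends. The short sketch below works through that arithmetic using the figures from the table above. The `TIERS` dictionary and the function name are illustrative only and are not part of the ARES API; Enterprise is omitted because it has no caps.

```python
# Tier limits copied from the table: (monthly requests, monthly tokens, daily requests).
# This dictionary is a local illustration, not something ARES exposes.
TIERS = {
    "Free": (1_000, 100_000, 100),
    "Dev": (10_000, 1_000_000, 1_000),
    "Pro": (100_000, 10_000_000, 10_000),
}

def days_until_monthly_cap(tier: str) -> float:
    """Days of sending at the full daily limit before the monthly request quota is spent."""
    monthly_requests, _monthly_tokens, daily_requests = TIERS[tier]
    return monthly_requests / daily_requests

for name in TIERS:
    print(name, days_until_monthly_cap(name))
```

For every capped tier the result is 10 days, so the daily limit bounds bursts while the monthly quota bounds sustained usage.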

What Counts as a Request

Each API call to a metered endpoint counts as one request:

  • POST /v1/agents/{name}/run — 1 request
  • POST /v1/chat — 1 request
  • POST /v1/chat/stream — 1 request
  • GET /v1/agents — 1 request

Read-only endpoints like GET /v1/usage and GET /v1/api-keys are also metered and count toward the request total.

What Counts as Tokens

Token usage is tracked per request as the sum of the prompt (input) tokens and completion (output) tokens reported by the LLM provider.
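
As a small worked example of that accounting, the sketch below sums prompt and completion tokens per request and across requests. The function name and the sample counts are hypothetical; real counts come from the LLM provider.

```python
def billable_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """Tokens counted against the quota for a single request."""
    return prompt_tokens + completion_tokens

# Hypothetical (prompt, completion) counts for two requests.
sample_requests = [(1200, 350), (800, 1500)]

total = sum(billable_tokens(p, c) for p, c in sample_requests)
print(total)  # 3850
```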


Response Headers

When you make a request to a metered endpoint, ARES includes rate limit information in the response headers:

| Header                   | Description                                    |
|--------------------------|------------------------------------------------|
| X-RateLimit-Limit        | Maximum requests allowed in the current period |
| X-RateLimit-Remaining    | Requests remaining in the current period       |
| X-RateLimit-Reset        | UTC timestamp when the current period resets   |
| X-Quota-Tokens-Remaining | Tokens remaining in the current monthly period |

Example headers:

X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 7482
X-RateLimit-Reset: 2026-04-01T00:00:00Z
X-Quota-Tokens-Remaining: 8241037
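
A client can turn these headers into typed values before acting on them. The following is a minimal sketch, assuming the headers arrive in a plain dictionary with exactly the names shown above and that all four are present; the function name is hypothetical, and how ARES reports these headers for unlimited tiers is not covered here.

```python
from datetime import datetime

def parse_rate_limit_headers(headers: dict[str, str]) -> dict:
    """Convert the rate limit headers into integers and a timezone-aware datetime."""
    # fromisoformat() on older Python versions does not accept a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    reset = datetime.fromisoformat(headers["X-RateLimit-Reset"].replace("Z", "+00:00"))
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset": reset,
        "tokens_remaining": int(headers["X-Quota-Tokens-Remaining"]),
    }

# The example headers from this page.
example = {
    "X-RateLimit-Limit": "10000",
    "X-RateLimit-Remaining": "7482",
    "X-RateLimit-Reset": "2026-04-01T00:00:00Z",
    "X-Quota-Tokens-Remaining": "8241037",
}
info = parse_rate_limit_headers(example)
print(info["limit"] - info["remaining"])  # requests used so far: 2518
```

HTTP header names are case-insensitive, so a real client should look them up through its HTTP library's header object rather than a plain dictionary.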

Exceeding Limits

When you exceed a limit in either layer, ARES returns a 429 response with a JSON error body:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "Rate limit exceeded. Daily request limit reached for your tier."
}

The error message indicates which limit was hit:

| Error Message                             | Cause               | Resolution                                         |
|-------------------------------------------|---------------------|----------------------------------------------------|
| Rate limit exceeded                       | IP-based rate limit | Wait and retry. Reduce request frequency.          |
| Daily request limit reached for your tier | Tenant daily cap    | Wait until the next UTC day, or upgrade your tier. |
| Monthly request quota exceeded            | Tenant monthly cap  | Wait until the next billing period, or upgrade.    |
| Monthly token quota exceeded              | Tenant token cap    | Wait until the next billing period, or upgrade.    |
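
Because the cause determines whether a quick retry can help, a client may want to classify the 429 body before deciding what to do. The sketch below matches on the message fragments listed in the table. It assumes the server text contains those fragments verbatim, which is brittle if the wording changes; the cause labels and function name are hypothetical. The tenant-specific fragments are checked first because the example response above begins with "Rate limit exceeded." even for a daily cap.

```python
import json

# (message fragment, local label) pairs; most specific fragments come first.
CAUSES = [
    ("Daily request limit reached", "tenant_daily"),
    ("Monthly request quota exceeded", "tenant_monthly_requests"),
    ("Monthly token quota exceeded", "tenant_monthly_tokens"),
    ("Rate limit exceeded", "ip_rate_limit"),
]

def classify_429(body: str) -> str:
    """Map a 429 JSON body to a cause label, or "unknown" if nothing matches."""
    message = json.loads(body).get("error", "")
    for fragment, cause in CAUSES:
        if fragment in message:
            return cause
    return "unknown"

body = '{"error": "Rate limit exceeded. Daily request limit reached for your tier."}'
print(classify_429(body))  # tenant_daily
```

Only the `ip_rate_limit` case is worth retrying after a short delay; the tenant caps clear at the next UTC day or billing period.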

Checking Your Usage

You can proactively monitor your consumption to avoid hitting limits:

curl https://api.ares.dirmacs.com/v1/usage \
  -H "Authorization: Bearer ares_xxx"

Response:

{
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "total_api_calls": 5290,
  "quota_runs": 100000,
  "quota_tokens": 10000000,
  "daily_usage": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 }
  ]
}

Compare total_runs against quota_runs and total_tokens against quota_tokens to see how much headroom you have.
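
That comparison can be sketched in a few lines. The function below takes the parsed usage response and reports what is left; it assumes both quota fields are positive integers, as in the sample response, and does not cover how an unlimited tier is represented. The function name and output keys are hypothetical.

```python
def headroom(usage: dict) -> dict:
    """Remaining runs and tokens, plus the percentage of each quota already used."""
    return {
        "runs_remaining": usage["quota_runs"] - usage["total_runs"],
        "tokens_remaining": usage["quota_tokens"] - usage["total_tokens"],
        "runs_used_pct": round(100 * usage["total_runs"] / usage["quota_runs"], 1),
        "tokens_used_pct": round(100 * usage["total_tokens"] / usage["quota_tokens"], 1),
    }

# Relevant fields from the sample response above.
sample = {
    "total_runs": 4821,
    "total_tokens": 2847193,
    "quota_runs": 100000,
    "quota_tokens": 10000000,
}
result = headroom(sample)
print(result["runs_remaining"], result["tokens_remaining"])  # 95179 7152807
```

In the sample, tokens are consumed proportionally faster than runs (about 28.5% versus 4.8%), so the token quota would be the first to run out.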


Best Practices

  1. Monitor usage proactively. Poll GET /v1/usage periodically rather than waiting for 429 errors.

  2. Implement exponential backoff. When you receive a 429, wait before retrying. A simple strategy: wait 1s, then 2s, then 4s, up to a maximum of 30s.

  3. Cache where possible. Agent listings and model metadata change infrequently. Cache these responses to reduce unnecessary API calls.

  4. Use streaming for chat. Streaming costs no extra request quota: POST /v1/chat/stream counts as a single request regardless of response length, the same as the non-streaming variant.

  5. Request a tier upgrade early. If you anticipate hitting your quota before month-end, contact your platform administrator to upgrade your tier. Tier changes take effect immediately.
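
The backoff strategy from item 2 (1s, 2s, 4s, capped at 30s) can be sketched as follows. This is an illustration under stated assumptions, not an official client: `send` is a hypothetical callable that performs one request and returns its HTTP status code, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from itertools import islice
from typing import Callable, Iterator

def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield delays that double each time, never exceeding the cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429 or attempts run out."""
    delays = backoff_delays()
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        # Do not sleep after the final attempt; just report the last status.
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return status

print(list(islice(backoff_delays(), 7)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Backoff of this kind suits the IP-based limit. When the 429 comes from a daily or monthly tenant cap, retrying within seconds will not succeed, so it is better to stop and wait for the reset or upgrade the tier.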