Introduction

ARES is a multi-provider LLM platform that gives you a single, unified API to route requests across Groq, Anthropic, NVIDIA (DeepSeek), and Ollama. It handles tool calling, retrieval-augmented generation (RAG), multi-step workflows, streaming, usage metering, and multi-tenant isolation out of the box, so you can focus on building your AI application instead of stitching together provider SDKs.

Key capabilities

Multi-provider LLM routing — Send requests to Groq, Anthropic, NVIDIA, or Ollama through one API. Switch models without changing your integration.
Tool calling — Define tools your agents can invoke. ARES manages the tool-call loop, execution, and response assembly.
Retrieval-augmented generation (RAG) — Ground LLM responses in your own data with built-in retrieval pipelines.
Workflows — Chain multiple agents and processing steps into deterministic, multi-step workflows.
Multi-tenant enterprise support — Tenant isolation, per-tenant agent configuration, API key scoping, and usage tracking at the tenant level.
Streaming — Server-Sent Events (SSE) streaming for real-time, token-by-token responses.
Usage metering — Track tokens, requests, and costs per tenant with built-in rate limiting and quota enforcement.
Skills — SKILL.md file discovery and loading via thulp-skill-files. Scope-based priority resolution (project > personal > plugin).
MCP integration — Bridge external MCP servers as agent-callable tools. Connect Eruka, Daedra, or any MCP-compatible service.
Loop detection — Sliding-window hash tracking with 3-tier escalation (warn, force alternative, halt) prevents agents from getting stuck in infinite loops.
Crash recovery — Checkpoint-based state serialization lets agents resume from the last saved state after failures.
Agent versioning — Version history, rollback, and emergency stop (kill switch) for all agent requests.
Research coordination — Deep research agent with configurable depth and max iterations for multi-step investigation tasks.
Deployment automation — Built-in deploy/rollback endpoints with service health monitoring and log streaming.
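The loop-detection capability listed above can be pictured with a short Python sketch. It is a minimal illustration, not ARES's implementation: the `LoopDetector` and `Action` names, the window size of 10, and the thresholds of 2, 3 and 4 repeats are all hypothetical. Each agent step (for example, a serialized tool call) is hashed, the most recent hashes are kept in a sliding window, and the number of repeats of the current step decides how far to escalate.

```python
import hashlib
from collections import deque
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    WARN = "warn"
    FORCE_ALTERNATIVE = "force_alternative"
    HALT = "halt"


class LoopDetector:
    """Sliding-window repeat detector with three escalation tiers (illustrative values)."""

    def __init__(self, window=10, warn_at=2, force_at=3, halt_at=4):
        self.window = deque(maxlen=window)  # hashes of the most recent steps only
        self.warn_at = warn_at
        self.force_at = force_at
        self.halt_at = halt_at

    @staticmethod
    def _hash(step: str) -> str:
        return hashlib.sha256(step.encode("utf-8")).hexdigest()

    def observe(self, step: str) -> Action:
        """Record one agent step and return the escalation action for it."""
        digest = self._hash(step)
        self.window.append(digest)  # oldest hash falls out once the window is full
        repeats = self.window.count(digest)  # includes the step just recorded
        if repeats >= self.halt_at:
            return Action.HALT
        if repeats >= self.force_at:
            return Action.FORCE_ALTERNATIVE
        if repeats >= self.warn_at:
            return Action.WARN
        return Action.CONTINUE
```

Hashing each step keeps the window a fixed size regardless of how large the steps are. A real detector would also have to decide what counts as "the same" step, for instance by normalizing argument order before hashing.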

Who is ARES for?

Platform teams building internal AI infrastructure who need a reliable, multi-provider abstraction layer.
Enterprise clients who want managed AI agents with tenant isolation, usage visibility, and SLA guarantees.
Developers building AI applications who want a clean API without managing provider credentials, rate limits, and failover logic themselves.

Base URL

All API requests are made to:

http://localhost:3000

Quick links

Resource	Description
Quickstart	Zero to first API call in 5 minutes
Authentication	API keys, JWT tokens, and admin auth
Models & Providers	Available models, tiers, and provider configuration
Changelog	Release history and breaking changes

Quickstart

Get from zero to your first ARES API call in under 5 minutes.

Prerequisites

An ARES API key (format: ares_xxx). Contact your administrator or use the Dirmacs Admin provisioning UI to generate one.

1. Make your first chat request

Send a message to an ARES agent using the chat endpoint.

curl

curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What can you help me with?",
    "agent_type": "product"
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/v1/chat",
    headers={
        "Authorization": "Bearer ares_xxx",
        "Content-Type": "application/json",
    },
    json={
        "message": "What can you help me with?",
        "agent_type": "product",
    },
)

data = response.json()
print(data["response"])

JavaScript

const response = await fetch("http://localhost:3000/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    message: "What can you help me with?",
    agent_type: "product",
  }),
});

const data = await response.json();
console.log(data.response);

Response

{
  "response": "I can help you with product information, recommendations, and questions...",
  "agent": "product",
  "context_id": "ctx_a1b2c3d4"
}

The context_id is returned with every response. Pass it back in subsequent requests to maintain conversation context.
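If you keep the conversation in a small helper object, threading `context_id` through every request becomes automatic. The sketch below is illustrative only: `Conversation` is a hypothetical helper, and `send` stands in for whatever performs the HTTP call.

```python
from typing import Callable, Optional

# Hypothetical transport: takes the JSON payload, returns the decoded JSON response.
Send = Callable[[dict], dict]


class Conversation:
    """Carries context_id from each /v1/chat response into the next request."""

    def __init__(self, send: Send, agent_type: str = "product"):
        self.send = send
        self.agent_type = agent_type
        self.context_id: Optional[str] = None

    def ask(self, message: str) -> str:
        payload = {"message": message, "agent_type": self.agent_type}
        if self.context_id is not None:
            payload["context_id"] = self.context_id  # continue the same conversation
        data = self.send(payload)
        self.context_id = data["context_id"]  # always keep the latest value
        return data["response"]
```

In practice `send` would wrap the `requests.post` call from step 1 and return `response.json()`.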

2. Try streaming

For real-time, token-by-token output, use the streaming endpoint. ARES streams responses using Server-Sent Events (SSE).

curl

curl -N -X POST http://localhost:3000/v1/chat/stream \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain how LLM routing works",
    "agent_type": "product"
  }'

The -N flag disables output buffering so you see tokens as they arrive.

Python

import requests

response = requests.post(
    "http://localhost:3000/v1/chat/stream",
    headers={
        "Authorization": "Bearer ares_xxx",
        "Content-Type": "application/json",
    },
    json={
        "message": "Explain how LLM routing works",
        "agent_type": "product",
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            print(decoded[6:], end="", flush=True)

JavaScript

const response = await fetch("http://localhost:3000/v1/chat/stream", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    message: "Explain how LLM routing works",
    agent_type: "product",
  }),
});

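const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters that span chunks intact
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep the incomplete last line for the next chunk

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      process.stdout.write(line.slice(6)); // Node.js; append to the DOM in browsers
    }
  }
}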

3. Continue a conversation

Use the context_id from a previous response to maintain conversation history:

curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Tell me more about that",
    "agent_type": "product",
    "context_id": "ctx_a1b2c3d4"
  }'

Next steps

Authentication — Learn about API keys, JWT tokens, and admin authentication.
Models & Providers — Understand which models are available and how to choose the right one.

Authentication

ARES supports three authentication methods, each designed for a different use case.

Method	Header	Routes	Use case
API Key	`Authorization: Bearer ares_xxx`	`/v1/*`	Client applications, backend services
JWT	`Authorization: Bearer <access_token>`	`/api/*`	End-user sessions, frontend apps
Admin Secret	`X-Admin-Secret: <secret>`	`/api/admin/*`	Internal administration

API Key authentication

API keys are the simplest way to authenticate with ARES. Each key is scoped to a single tenant and carries that tenant's permissions and rate limits.

Format: ares_ followed by a random string (e.g., ares_k7Gx9mPqR2vLwN4s).

How to get one: API keys are generated during tenant provisioning via the Dirmacs Admin dashboard, or through the admin API.

Usage

Pass the API key in the Authorization header on any /v1/* endpoint:

curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_k7Gx9mPqR2vLwN4s" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "agent_type": "product"}'

import requests

headers = {
    "Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
    "Content-Type": "application/json",
}

response = requests.post(
    "http://localhost:3000/v1/chat",
    headers=headers,
    json={"message": "Hello", "agent_type": "product"},
)

const response = await fetch("http://localhost:3000/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ message: "Hello", agent_type: "product" }),
});

Security: Treat API keys like passwords. Do not embed them in client-side code, commit them to version control, or expose them in logs. Use environment variables or a secrets manager.

JWT authentication

JWT authentication is designed for end-user sessions. Users register and log in to receive short-lived access tokens and long-lived refresh tokens.

Access tokens expire after 15 minutes.
Refresh tokens are used to obtain new access tokens without re-entering credentials.

Register a new user

curl -X POST http://localhost:3000/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "developer@example.com",
    "password": "your-secure-password",
    "name": "Jane Developer"
  }'

Response:

{
  "message": "Registration successful",
  "user_id": "usr_abc123"
}

Log in

curl -X POST http://localhost:3000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "developer@example.com",
    "password": "your-secure-password"
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "refresh_token": "rt_x9Kp2mQvL8wN3rTs...",
  "expires_in": 900
}
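`expires_in` gives the token's lifetime at the moment it was issued. If you persist the access token and need its expiry later, you can read it from the token itself, assuming it carries the standard JWT `exp` claim (this page does not document the claim set). The sketch decodes the payload without verifying the signature, which is acceptable only for client-side refresh scheduling, never for trust decisions. `jwt_claims` and `seconds_until_expiry` are hypothetical helpers.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (scheduling use only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now=None) -> float:
    """Seconds left before the token's `exp` claim (negative once expired)."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now
```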

Use the access token

Pass the access token in the Authorization header on any /api/* endpoint:

curl -X POST http://localhost:3000/api/chat \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "agent_type": "product"}'

Refresh an expired token

When your access token expires, use the refresh token to get a new one:

curl -X POST http://localhost:3000/api/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "expires_in": 900
}

Log out

Invalidate a refresh token when the user logs out:

curl -X POST http://localhost:3000/api/auth/logout \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
  }'

Token management in Python

import requests
import time


class AresClient:
    def __init__(self, base_url="http://localhost:3000"):
        self.base_url = base_url
        self.access_token = None
        self.refresh_token = None
        self.token_expiry = 0

    def login(self, email, password):
        response = requests.post(
            f"{self.base_url}/api/auth/login",
            json={"email": email, "password": password},
        )
        response.raise_for_status()  # fail loudly on bad credentials
        data = response.json()
        self.access_token = data["access_token"]
        self.refresh_token = data["refresh_token"]
        self.token_expiry = time.time() + data["expires_in"]

    def _ensure_valid_token(self):
        if self.refresh_token is None:
            raise RuntimeError("Not logged in: call login() first")
        if time.time() >= self.token_expiry - 30:  # Refresh 30s before expiry
            response = requests.post(
                f"{self.base_url}/api/auth/refresh",
                json={"refresh_token": self.refresh_token},
            )
            response.raise_for_status()  # a rejected refresh token requires logging in again
            data = response.json()
            self.access_token = data["access_token"]
            self.token_expiry = time.time() + data["expires_in"]

    def chat(self, message, agent_type="product"):
        self._ensure_valid_token()
        response = requests.post(
            f"{self.base_url}/api/chat",
            headers={"Authorization": f"Bearer {self.access_token}"},
            json={"message": message, "agent_type": agent_type},
        )
        response.raise_for_status()
        return response.json()

Token management in JavaScript

class AresClient {
  constructor(baseUrl = "http://localhost:3000") {
    this.baseUrl = baseUrl;
    this.accessToken = null;
    this.refreshToken = null;
    this.tokenExpiry = 0;
  }

  async login(email, password) {
    const response = await fetch(`${this.baseUrl}/api/auth/login`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email, password }),
    });
    if (!response.ok) throw new Error(`Login failed: HTTP ${response.status}`);
    const data = await response.json();
    this.accessToken = data.access_token;
    this.refreshToken = data.refresh_token;
    this.tokenExpiry = Date.now() + data.expires_in * 1000;
  }

  async ensureValidToken() {
    if (!this.refreshToken) throw new Error("Not logged in: call login() first");
    if (Date.now() >= this.tokenExpiry - 30000) { // Refresh 30s before expiry
      const response = await fetch(`${this.baseUrl}/api/auth/refresh`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ refresh_token: this.refreshToken }),
      });
      if (!response.ok) throw new Error(`Token refresh failed: HTTP ${response.status}`);
      const data = await response.json();
      this.accessToken = data.access_token;
      this.tokenExpiry = Date.now() + data.expires_in * 1000;
    }
  }

  async chat(message, agentType = "product") {
    await this.ensureValidToken();
    const response = await fetch(`${this.baseUrl}/api/chat`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ message, agent_type: agentType }),
    });
    if (!response.ok) throw new Error(`Chat request failed: HTTP ${response.status}`);
    return response.json();
  }
}

Admin Secret authentication

The admin secret provides full access to ARES administration endpoints. It is intended for internal tools and the Dirmacs Admin dashboard only.

Pass the secret in the X-Admin-Secret header:

curl http://localhost:3000/api/admin/tenants \
  -H "X-Admin-Secret: your-admin-secret"

Warning: The admin secret grants unrestricted access to all tenants, agents, and configuration. Never expose it outside your infrastructure. It should only be used in server-to-server calls from trusted internal services.

Error responses

Authentication, authorization, and rate-limit failures return standard HTTP status codes:

Status	Meaning
`401 Unauthorized`	Missing or invalid credentials
`403 Forbidden`	Valid credentials but insufficient permissions
`429 Too Many Requests`	Rate limit exceeded for this API key or tenant

Example error response:

{
  "error": "Invalid or expired token",
  "code": "AUTH_INVALID_TOKEN"
}
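A client can turn these status codes into typed exceptions so callers handle each case deliberately. The exception classes and `raise_for_ares_error` below are hypothetical; the sketch relies only on the `error` and `code` fields shown in the example response.

```python
class AresError(Exception):
    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code


class AuthenticationError(AresError): ...    # 401: missing or invalid credentials
class PermissionDeniedError(AresError): ...  # 403: valid credentials, insufficient permissions
class RateLimitError(AresError): ...         # 429: rate limit exceeded


_BY_STATUS = {401: AuthenticationError, 403: PermissionDeniedError, 429: RateLimitError}


def raise_for_ares_error(status: int, body: dict) -> None:
    """Raise a typed exception for an error response; do nothing on 2xx."""
    if 200 <= status < 300:
        return
    cls = _BY_STATUS.get(status, AresError)  # unknown statuses get the base class
    raise cls(status, body.get("code", "UNKNOWN"), body.get("error", ""))
```

With `requests`, you would call `raise_for_ares_error(response.status_code, response.json())` after each request. Catching `AuthenticationError` on a JWT route is a natural place to refresh the access token and retry once.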

Models & Providers

ARES routes LLM requests across multiple providers through a single API. You do not call providers directly — ARES selects the appropriate model based on the agent configuration and handles credentials, rate limits, and failover transparently.

Available models

Tier	Provider	Model	Best for
`fast`	Groq	`llama-3.1-8b-instant`	Quick responses, classification, simple Q&A
`balanced`	Groq	`llama-3.3-70b-versatile`	General-purpose tasks, GPT-4 class quality
`powerful`	Anthropic	`claude-3.5-sonnet`	Complex reasoning, long-form analysis, nuanced tasks
`deepseek`	NVIDIA	`deepseek-r1-distill-llama-70b`	Code generation, technical documentation, structured output
`local`	Ollama	`ministral-3:3b`	Development, testing, offline use

How model selection works

You do not specify a model directly in your API calls. Instead, you specify an agent_type, and each agent is configured with a model tier.

# This request is routed to whichever model the "product" agent is configured to use
curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{"message": "Compare these two options", "agent_type": "product"}'

The mapping between agents and models is configured by your tenant administrator. A typical setup might look like:

Agent	Model tier	Rationale
`classifier`	`fast`	Needs speed, not depth
`product`	`balanced`	General-purpose, good quality
`analyst`	`powerful`	Complex reasoning required
`code-review`	`deepseek`	Specialized for code tasks

This design means you can upgrade an agent's underlying model without changing any client code.
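The indirection can be modeled as two lookups: agent to tier, then tier to model. The dictionaries below simply mirror the example tables on this page and are not a real ARES configuration; `resolve_model` is a hypothetical name.

```python
# Illustrative mapping only: mirrors the example tables above, not a real ARES config.
TIER_MODELS = {
    "fast": "llama-3.1-8b-instant",
    "balanced": "llama-3.3-70b-versatile",
    "powerful": "claude-3.5-sonnet",
    "deepseek": "deepseek-r1-distill-llama-70b",
    "local": "ministral-3:3b",
}

AGENT_TIERS = {
    "classifier": "fast",
    "product": "balanced",
    "analyst": "powerful",
    "code-review": "deepseek",
}


def resolve_model(agent_type: str, agent_tiers=AGENT_TIERS, tier_models=TIER_MODELS) -> str:
    """Two-step lookup: agent_type -> tier -> model. Clients only ever send agent_type."""
    return tier_models[agent_tiers[agent_type]]
```

Because clients send only `agent_type`, moving `"product"` from `balanced` to `powerful` in the server-side mapping changes the model without any client change.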

Provider architecture

ARES uses a named-provider system. Each provider is configured with its API endpoint, credentials, and rate limits. Models reference their provider by name.

┌─────────────┐
│  Your App   │
│  agent_type │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌──────────┐
│    ARES     │────▶│   Groq   │  fast, balanced
│   Router    │     └──────────┘
│             │     ┌──────────┐
│             │────▶│Anthropic │  powerful
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  NVIDIA  │  deepseek
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  Ollama  │  local
└─────────────┘     └──────────┘

Provider details

Groq — High-throughput inference on custom LPUs. Extremely fast response times. Hosts open-source models (Llama, Mixtral). Free tier available with rate limits.

Anthropic — Claude models. Best-in-class for complex reasoning, instruction following, and safety. Requires a paid API key.

NVIDIA (DeepSeek) — NVIDIA-hosted DeepSeek models via the NVIDIA AI API. Strong at code generation and structured technical output.

Ollama — Self-hosted, local inference. No external API calls. Useful for development, air-gapped environments, or when you need to keep data on-premises.

Rate limits

Rate limits are enforced per provider and per tenant. The following are default limits for the Groq free tier:

Model tier	Requests per day	Tokens per minute
`fast` (llama-3.1-8b)	14,400	20,000
`balanced` (llama-3.3-70b)	6,000	6,000

Anthropic and NVIDIA rate limits depend on your API plan with those providers. ARES surfaces rate limit errors transparently:

{
  "error": "Rate limit exceeded for provider 'groq'",
  "code": "RATE_LIMIT_EXCEEDED",
  "retry_after": 60
}
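When a request fails with `429`, the `retry_after` field indicates how long to wait. A minimal retry wrapper might look like the sketch below, assuming `retry_after` is in seconds (the example value of 60 suggests so, but the unit is not stated here). `call_with_retry` and the `(status, body)` request callable are hypothetical; `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Tuple

# Hypothetical request callable: performs one HTTP call, returns (status_code, json_body).
Request = Callable[[], Tuple[int, dict]]


def call_with_retry(request: Request, max_attempts: int = 3,
                    default_wait: float = 1.0, sleep=time.sleep) -> Tuple[int, dict]:
    """Retry on 429, waiting for the server's retry_after hint when present."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status != 429 or attempt == max_attempts:
            return status, body  # success, a non-retryable error, or out of attempts
        # Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s, ...).
        sleep(body.get("retry_after", default_wait * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```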

Tenant-level rate limits and quotas are configured separately by your administrator and enforced by ARES regardless of provider limits.

Adding your own providers

If you are self-hosting ARES, you can add providers in your ares.toml configuration:

[[providers]]
name = "my-openai"
kind = "openai"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"

[[models]]
name = "gpt-4o"
provider = "my-openai"
model_id = "gpt-4o"
tier = "powerful"

Any provider that exposes an OpenAI-compatible API (vLLM, Together AI, Fireworks, etc.) can be added using the openai provider kind.

Choosing the right tier

If you need...	Use tier
Fastest possible response	`fast`
Good quality at reasonable speed	`balanced`
Maximum reasoning capability	`powerful`
Code generation or technical tasks	`deepseek`
Offline or local development	`local`

When in doubt, start with balanced. It provides the best trade-off between quality, speed, and cost for most use cases.

Chat & Conversations

Send messages to ARES agents and manage multi-turn conversations.

Send a message

POST /api/chat

Send a message to an agent and receive a response. ARES routes the message to the appropriate agent based on the agent_type parameter, or uses the default router agent if none is specified.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

Parameter	Type	Required	Description
`message`	string	Yes	The user's message or prompt.
`agent_type`	string	No	Which agent handles the request (e.g., `"product"`, `"research"`, `"router"`). Defaults to the router agent.
`context_id`	string	No	Conversation context ID. Pass this value back on subsequent requests to continue a multi-turn conversation.

Response

{
  "response": "Here's what I found about your question...",
  "agent": "product",
  "context_id": "ctx_a1b2c3d4",
  "sources": null
}

Field	Type	Description
`response`	string	The agent's response text.
`agent`	string	The agent that handled the request.
`context_id`	string	Context identifier. Pass this back to continue the conversation.
`sources`	array or null	Source references, if the agent performed retrieval; otherwise `null`.
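If you prefer typed access over raw dictionaries, the fields above map naturally onto a dataclass. `ChatResponse` is a hypothetical client-side type; the element shape of `sources` is not documented here, so it is left as `Any`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChatResponse:
    response: str
    agent: str
    context_id: str
    sources: Optional[list[Any]] = None  # null unless the agent performed retrieval

    @classmethod
    def from_json(cls, data: dict) -> "ChatResponse":
        """Build from the decoded JSON body; missing required fields raise KeyError."""
        return cls(
            response=data["response"],
            agent=data["agent"],
            context_id=data["context_id"],
            sources=data.get("sources"),
        )
```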

Examples

curl

curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "message": "What pricing plans do you offer?",
    "agent_type": "product"
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/chat",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "message": "What pricing plans do you offer?",
        "agent_type": "product"
    }
)

data = response.json()
print(data["response"])

# Continue the conversation using the returned context_id
follow_up = requests.post(
    "http://localhost:3000/api/chat",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "message": "How does the Pro plan compare to Enterprise?",
        "context_id": data["context_id"]
    }
)

JavaScript

const response = await fetch("http://localhost:3000/api/chat", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    message: "What pricing plans do you offer?",
    agent_type: "product"
  })
});

const data = await response.json();
console.log(data.response);

// Continue the conversation
const followUp = await fetch("http://localhost:3000/api/chat", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    message: "How does the Pro plan compare to Enterprise?",
    context_id: data.context_id
  })
});

Stream a response

POST /api/chat/stream

Send a message and receive the response as a stream of Server-Sent Events (SSE). Each event contains a text chunk. This is the recommended approach for user-facing applications where you want to display the response as it is generated.

The request body is identical to POST /api/chat.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Response format

The response uses the text/event-stream content type. Each SSE event contains a chunk of the agent's response:

data: Here's
data:  what I
data:  found about
data:  your question...

Collect all chunks to form the complete response. The connection closes automatically when the response is complete.
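Network chunks can end mid-line or in the middle of a multi-byte UTF-8 character, so a robust client buffers partial lines and decodes bytes incrementally. The sketch below follows this page's convention of concatenating every `data:` payload and strips the single optional space after `data:`, as the SSE format specifies. It does not implement the rest of the SSE event model (event names, IDs, blank-line event boundaries). `iter_sse_data` is a hypothetical helper name.

```python
import codecs
from typing import Iterable, Iterator, Optional


def _data_payload(line: str) -> Optional[str]:
    """Return the value of an SSE `data:` line, or None for any other line."""
    line = line.rstrip("\r")  # tolerate CRLF line endings
    if not line.startswith("data:"):
        return None
    value = line[len("data:"):]
    return value[1:] if value.startswith(" ") else value  # drop one optional space


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield each `data:` payload from raw byte chunks, whatever their boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)  # holds back incomplete multi-byte sequences
        *lines, buffer = buffer.split("\n")  # last piece may be an unfinished line
        for line in lines:
            payload = _data_payload(line)
            if payload is not None:
                yield payload
    # Flush anything left if the stream ended without a trailing newline.
    payload = _data_payload(buffer + decoder.decode(b"", final=True))
    if payload is not None:
        yield payload
```

With `requests`, pass `response.iter_content(chunk_size=None)` from a `stream=True` response as `chunks`, then join or print the yielded pieces.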

Examples

curl

curl -N -X POST http://localhost:3000/api/chat/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -H "Accept: text/event-stream" \
  -d '{
    "message": "Explain quantum computing",
    "agent_type": "research"
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/chat/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi...",
        "Accept": "text/event-stream"
    },
    json={
        "message": "Explain quantum computing",
        "agent_type": "research"
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]  # Strip "data: " prefix
            print(chunk, end="", flush=True)

JavaScript

const response = await fetch("http://localhost:3000/api/chat/stream", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi...",
    "Accept": "text/event-stream"
  },
  body: JSON.stringify({
    message: "Explain quantum computing",
    agent_type: "research"
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep a partial line until the next chunk completes it

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const chunk = line.slice(6);
      process.stdout.write(chunk); // Node.js
      // Or append to DOM in browsers
    }
  }
}

Conversations

Manage stored conversations and their message history.

List conversations

GET /api/conversations

Returns all conversations for the authenticated user.

Authentication: JWT required.

curl http://localhost:3000/api/conversations \
  -H "Authorization: Bearer eyJhbGciOi..."

Get a conversation

GET /api/conversations/{id}

Returns a single conversation along with its full message history.

Authentication: JWT required.

Parameter	Type	In	Description
`id`	string	path	The conversation ID

curl http://localhost:3000/api/conversations/conv_abc123 \
  -H "Authorization: Bearer eyJhbGciOi..."

Update a conversation

PUT /api/conversations/{id}

Update the title of a conversation.

Authentication: JWT required.

Request body:

{
  "title": "Pricing discussion"
}

curl -X PUT http://localhost:3000/api/conversations/conv_abc123 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{"title": "Pricing discussion"}'

Delete a conversation

DELETE /api/conversations/{id}

Permanently delete a conversation and all its messages.

Authentication: JWT required.

curl -X DELETE http://localhost:3000/api/conversations/conv_abc123 \
  -H "Authorization: Bearer eyJhbGciOi..."

User memory

GET /api/memory

Retrieve memory and preferences that ARES has learned from your conversations. This includes user preferences, context, and behavioral patterns the system has observed.

Authentication: JWT required.

curl http://localhost:3000/api/memory \
  -H "Authorization: Bearer eyJhbGciOi..."

Agents

ARES agents are autonomous units that process requests using a configured LLM model, a system prompt, and a set of tools. Each agent is specialized for a particular domain or task — routing, research, product knowledge, risk analysis, and more.

Agents are defined by four properties:

Model — The LLM that powers the agent (e.g., llama-3.3-70b, claude-3-5-sonnet, deepseek-r1).
System prompt — Instructions that shape the agent's behavior, personality, and domain knowledge.
Tools — Capabilities the agent can invoke during processing (e.g., calculator, web_search, code_interpreter).
Name — A unique identifier used to route requests to this agent.

Agents can be platform-provided (available to all users) or user-defined (private, created via API or TOON config).

List all agents

GET /api/agents

Returns all available agents on the platform. This endpoint does not require authentication.

Response

[
  {
    "name": "router",
    "description": "Routes incoming requests to the most appropriate specialist agent.",
    "model": "llama-3.3-70b-versatile",
    "tools": []
  },
  {
    "name": "research",
    "description": "Conducts deep multi-step research with source synthesis.",
    "model": "deepseek-r1-distill-llama-70b",
    "tools": ["web_search", "calculator"]
  },
  {
    "name": "product",
    "description": "Answers product-related questions with detailed knowledge.",
    "model": "llama-3.3-70b-versatile",
    "tools": []
  }
]

Examples

curl

curl http://localhost:3000/api/agents

Python

import requests

response = requests.get("http://localhost:3000/api/agents")
agents = response.json()

for agent in agents:
    print(f"{agent['name']}: {agent['description']}")

JavaScript

const response = await fetch("http://localhost:3000/api/agents");
const agents = await response.json();

agents.forEach(agent => {
  console.log(`${agent.name}: ${agent.description}`);
});

User agents

Create and manage your own custom agents. User agents are private to your account and can be configured with any available model, custom system prompts, and tool selections.

All user agent endpoints require JWT authentication: Authorization: Bearer <jwt_access_token>

List your agents

GET /api/user/agents

Returns all custom agents owned by the authenticated user.

curl http://localhost:3000/api/user/agents \
  -H "Authorization: Bearer eyJhbGciOi..."

Create an agent

POST /api/user/agents

Create a new custom agent.

Request body

Parameter	Type	Required	Description
`name`	string	Yes	Unique agent name (alphanumeric, hyphens).
`model`	string	Yes	LLM model identifier.
`system_prompt`	string	Yes	Instructions that define agent behavior.
`tools`	string[]	No	List of tool names the agent can use.

Example

curl -X POST http://localhost:3000/api/user/agents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "name": "code-reviewer",
    "model": "llama-3.3-70b-versatile",
    "system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
    "tools": ["calculator"]
  }'

import requests

response = requests.post(
    "http://localhost:3000/api/user/agents",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "name": "code-reviewer",
        "model": "llama-3.3-70b-versatile",
        "system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
        "tools": ["calculator"]
    }
)
response.raise_for_status()  # surface validation or auth errors

Get agent details

GET /api/user/agents/{name}

Retrieve the full configuration of a specific user agent.

Parameter	Type	In	Description
`name`	string	path	The agent's name

curl http://localhost:3000/api/user/agents/code-reviewer \
  -H "Authorization: Bearer eyJhbGciOi..."

Update an agent

PUT /api/user/agents/{name}

Update an existing agent's configuration. You can modify the model, system prompt, or tools.

curl -X PUT http://localhost:3000/api/user/agents/code-reviewer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "system_prompt": "You are a senior code reviewer specializing in Rust and TypeScript.",
    "tools": ["calculator", "web_search"]
  }'

Delete an agent

DELETE /api/user/agents/{name}

Permanently delete a user agent.

curl -X DELETE http://localhost:3000/api/user/agents/code-reviewer \
  -H "Authorization: Bearer eyJhbGciOi..."

TOON import/export

TOON is ARES's agent configuration format. You can import and export agent configs as TOON to share agent definitions, back up configurations, or migrate agents between environments.

Import a TOON config

POST /api/user/agents/import

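Import an agent definition from a TOON configuration file. Use --data-binary so curl sends the file byte-for-byte; -d @file strips newlines from the file contents.

curl -X POST http://localhost:3000/api/user/agents/import \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  --data-binary @agent-config.toon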

Export as TOON

GET /api/user/agents/{name}/export

Export an agent's configuration in TOON format. Useful for sharing agent definitions or version-controlling them alongside your codebase.

curl http://localhost:3000/api/user/agents/code-reviewer/export \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -o code-reviewer.toon

Tools

ARES provides a type-safe tool calling framework with automatic schema generation.

Built-in Tools

Tool	Description	Feature
Calculator	Mathematical expression evaluation	default
Web Search	Search via Daedra	`search-tools`
Web Scrape	Fetch URL and extract readable text content	`search-tools`

Tool Trait

Implement Tool to create custom tools:

use ares::tools::registry::Tool;
use async_trait::async_trait;
use serde_json::Value;

struct MyTool;

#[async_trait]
impl Tool for MyTool {
    fn name(&self) -> &str { "my_tool" }

    fn description(&self) -> &str { "Does something useful" }

    fn parameters_schema(&self) -> Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "input": { "type": "string" }
            }
        })
    }

    async fn execute(&self, args: Value) -> ares::Result<Value> {
        let input = args["input"].as_str().unwrap_or("");
        Ok(serde_json::json!({ "result": format!("Processed: {}", input) }))
    }
}

Tool Registry

use ares::tools::ToolRegistry;
use std::sync::Arc;

// The calls below assume an async context (for `.await`) that returns a Result (for `?`),
// plus a loaded `config` and the `MyTool` type defined above.

// Create empty registry
let mut registry = ToolRegistry::new();

// Or create from config (auto-registers configured tools)
let mut registry = ToolRegistry::with_config(&config);

// Register a custom tool
registry.register(Arc::new(MyTool));

// Get tool definitions for LLM function calling
let definitions = registry.get_tool_definitions();

// Get definitions for specific tools only
let subset = registry.get_tool_definitions_for(&["calculator", "my_tool"]);

// Execute a tool by name
let result = registry.execute("my_tool", serde_json::json!({"input": "hello"})).await?;

// Check tool availability
assert!(registry.has_tool("calculator"));

Tool Configuration

Tools support per-tool configuration (enabled/disabled, timeouts):

// Check if a tool is enabled
let enabled = registry.is_enabled("web_search");

// Get the tool's configured timeout, in seconds
let timeout_secs = registry.get_timeout("web_search");

ToolCoordinator

The ToolCoordinator (in ares::llm) handles multi-turn tool calling conversations with any LLM provider:

use ares::llm::{ToolCoordinator, ToolCallingConfig};

// `client` is an LLM client for any supported provider; `registry` is the tool registry built above
let coordinator = ToolCoordinator::new(client, registry, ToolCallingConfig::default());

// Execute a conversation with automatic tool calling
// (async: call from an async fn that returns ares::Result)
let result = coordinator.execute(
    Some("You are a helpful assistant."),
    "What is 25 * 4 + 100?"
).await?;

println!("Response: {}", result.content);
println!("Tool calls: {}", result.tool_calls.len());
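This section does not document ToolCoordinator's internals. Multi-turn tool calling generally follows the loop below: call the model, run any tools it requests, feed the results back, and repeat until it answers in plain text. The Python sketch uses a hypothetical `llm` callable and a made-up message-dict shape (not ARES types) to show that control flow. The `max_rounds` cap is an added assumption that prevents endless loops; check ToolCallingConfig for ARES's actual settings.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]  # hypothetical message shape, not an ARES type


def run_tool_loop(
    llm: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., Any]],
    messages: List[Message],
    max_rounds: int = 5,
) -> str:
    """Generic tool-calling loop: call the model, run requested tools, repeat.

    If the model's reply contains "tool_calls", each call is executed and its
    result is appended as a "tool" message before the model is asked again.
    """
    for _ in range(max_rounds):
        reply = llm(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # the model answered without needing tools
        messages.append(reply)
        for call in calls:
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    raise RuntimeError(f"no final answer after {max_rounds} rounds")


def fake_llm(messages: List[Message]) -> Message:
    # Stand-in model: requests the calculator once, then answers with its result.
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"The answer is {last['content']}."}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "calculator", "arguments": {"expression": "25 * 4 + 100"}}],
    }


tools = {"calculator": lambda expression: 25 * 4 + 100}  # fake tool: ignores its input
history = [{"role": "user", "content": "What is 25 * 4 + 100?"}]
print(run_tool_loop(fake_llm, tools, history))  # The answer is 200.
```

The round cap matters because a model that keeps requesting tools would otherwise never return.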

Per-Agent Tool Filtering

Agents can be restricted to specific tools via TOON configuration:

[agent.math-helper]
tools = ["calculator"]
# This agent can ONLY use the calculator

MCP Bridge

MCP servers are bridged into the tool ecosystem. See MCP Integration.

Skills

ARES supports SKILL.md file discovery and loading via the skills feature flag, powered by thulp-skill-files.

Feature Flag

[dependencies]
ares-server = { version = "0.7", features = ["skills"] }

Configuration

Configure skill directories in your ares.toml:

[skills]
project_dir = "./.claude/skills/"
personal_dir = "~/.claude/skills/"
plugin_dirs = ["./plugins/my-plugin/skills"]

API

List Skills

GET /api/skills

Returns all discovered skills with scope-based priority (project > personal > enterprise > plugin).

Get Skill

GET /api/skills/{name}

Returns a single skill by qualified name, including full body content.

Library Usage

Skills are also available as a library API for direct Rust usage:

use ares::skills::{SkillsConfig, load_skills, list_skills, get_skill};

let config = SkillsConfig {
    project_dir: Some("./.claude/skills/".into()),
    personal_dir: Some("~/.claude/skills/".into()),
    ..Default::default()
};

// Load all skills
let skills = load_skills(&config);

// List summaries (name, description, scope)
let summaries = list_skills(&config);

// Get specific skill
let skill = get_skill(&config, "my-skill");

Skill File Format

Skills are SKILL.md files with YAML frontmatter:

---
name: my-skill
description: What this skill does
---

# My Skill

Instructions for the AI agent...

Scope Priority

When multiple skills share the same name, scope priority determines which wins:

Project — ./.claude/skills/ (highest priority)
Personal — ~/.claude/skills/
Enterprise — organization-wide skills
Plugin — from plugin directories (lowest priority)
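The rule above amounts to "for each skill name, keep the entry from the highest-priority scope." The Python sketch below models that rule. The `Skill` tuple and the scope labels are illustrative assumptions, not the ares::skills types.

```python
from typing import NamedTuple

# Lower number = higher priority, following the order listed above.
SCOPE_PRIORITY = {"project": 0, "personal": 1, "enterprise": 2, "plugin": 3}


class Skill(NamedTuple):
    # Hypothetical stand-in for a discovered SKILL.md entry.
    name: str
    scope: str
    description: str


def resolve_skills(discovered: list[Skill]) -> dict[str, Skill]:
    """Keep one skill per name: the one from the highest-priority scope."""
    winners: dict[str, Skill] = {}
    for skill in discovered:
        current = winners.get(skill.name)
        if current is None or SCOPE_PRIORITY[skill.scope] < SCOPE_PRIORITY[current.scope]:
            winners[skill.name] = skill
    return winners


skills = [
    Skill("review", "plugin", "Plugin-provided review skill"),
    Skill("review", "project", "Project-specific review skill"),
    Skill("deploy", "personal", "My deploy checklist"),
]
resolved = resolve_skills(skills)
print(resolved["review"].scope)  # project
```

Discovery order does not affect the result. A plugin skill seen first is still replaced by a project skill of the same name.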

MCP Integration

ARES integrates with Model Context Protocol servers, allowing agents to use external tools as first-class capabilities.

Feature Flag

[dependencies]
ares-server = { version = "0.7", features = ["mcp"] }

MCP is included in the default feature set, so you only need to list it explicitly if you build with default-features = false.

Configuration

Each MCP server is configured in its own .toon file in your config directory.

How It Works

ARES discovers MCP server configs from the config directory
McpRegistry::from_dir() loads and connects to configured servers
Each server provides an McpClient for tool invocation
Agents access MCP tools through the registry

Architecture

Agent Request → McpRegistry → get_client("eruka") → McpClient → MCP Server
                                                                      ↓
Agent Response ← Tool Result ←────────────────────────────────────────┘

Library Usage

use ares::mcp::McpRegistry;

// Load MCP servers from the config directory
// (`?` requires an enclosing fn that returns a Result)
let registry = McpRegistry::from_dir("config/mcp")?;

// List connected servers
let names = registry.client_names();

// Get a specific client
if let Some(client) = registry.get_client("eruka") {
    // Use the client to call MCP tools
}

// Convenience method for Eruka specifically
if let Some(eruka) = registry.eruka() {
    // Direct access to Eruka MCP client
}

Per-Agent MCP Access

Agents can be configured with specific MCP server access via TOON configuration:

[agent.researcher]
mcp_servers = ["eruka", "search"]

Memory

ARES provides conversation memory and user context management for maintaining state across agent interactions.

Features

Sliding window over conversation history (DEFAULT_HISTORY_WINDOW = 10)
Token-budget-aware history truncation
User memory formatting (facts, preferences) for system prompts
Integration with Eruka for persistent cross-session context

Core Functions

History Management

use ares::memory::{truncate_history, truncate_history_to_tokens};

// Keep the last N messages
let recent = truncate_history(&messages, 10);

// Keep messages within a token budget
let within_budget = truncate_history_to_tokens(&messages, 4096);

Context Building

use ares::memory::{build_context, format_memory_for_prompt};

// Format user memory (facts + preferences) into a system prompt section
let memory_text = format_memory_for_prompt(&user_memory);

// Build the full context with a history window and memory injection
// (window_size = number of messages to keep, e.g. DEFAULT_HISTORY_WINDOW = 10)
let context = build_context(&user_memory, &history, window_size);

Filtering

use ares::memory::{filter_facts_by_category, filter_preferences_by_category};

// Filter facts by category (e.g., "health", "technical")
let health_facts = filter_facts_by_category(&facts, "health");

// Filter preferences similarly
let prefs = filter_preferences_by_category(&preferences, "communication");

Constants

Constant	Value	Purpose
`DEFAULT_HISTORY_WINDOW`	10	Default number of messages to keep
`MAX_FACTS_IN_PROMPT`	20	Max facts injected into system prompt
`MAX_PREFERENCES_IN_PROMPT`	10	Max preferences injected

Token Estimation

use ares::memory::estimate_tokens;

let tokens = estimate_tokens("Hello, how are you?");
// Rough estimate: 4 words * 1.3 ≈ 5 tokens
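ARES's truncation code is not shown here. The Python sketch below models two things: the word-count heuristic from the comment above, and one plausible budget rule (keep the newest messages that fit). It uses plain strings instead of ARES message types. Truncating to an int is also an assumption; the real `estimate_tokens` may round differently.

```python
def estimate_tokens(text: str) -> int:
    # Word count * 1.3, truncated to an int (assumed rounding).
    return int(len(text.split()) * 1.3)


def truncate_history_to_tokens(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits in `budget`."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk newest -> oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # stop at the first message that doesn't fit, keeping the window contiguous
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three four five six seven eight nine ten", "a b c", "d e f"]
print(truncate_history_to_tokens(history, 6))  # ['a b c', 'd e f']
```

Stopping at the first message that does not fit (rather than skipping it) keeps the retained history a contiguous, most-recent window.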

Eruka Integration

When ARES is paired with Eruka (via the ContextProvider trait), the memory flow becomes:

On session start, ContextProvider::get_context() fetches user state from Eruka
Facts and preferences are formatted and injected into the agent system prompt
After exchanges, agent signals (emotional state, topics, preferences) are written back to Eruka
Next session starts with updated context — agents remember users across conversations

RAG (Retrieval-Augmented Generation)

The RAG API lets you ingest documents, search them using multiple retrieval strategies, and manage document collections. RAG powers knowledge-grounded responses by retrieving relevant context from your documents before generating answers.

Feature flag: The RAG API requires ARES to be built with the ares-vector feature. If your deployment does not include this feature, these endpoints will return 404.

Ingest documents

POST /api/rag/ingest

Ingest content into a named collection. The content is automatically chunked and indexed for retrieval.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

Parameter	Type	Required	Default	Description
`collection`	string	Yes	--	Name of the collection to ingest into. Created automatically if it doesn't exist.
`content`	string	Yes	--	The text content to ingest.
`metadata`	object	No	`{}`	Arbitrary key-value metadata attached to the document.
`chunking_strategy`	string	No	`"word"`	How to split the content into chunks. Options: `"word"`, `"sentence"`, `"paragraph"`.

Response

{
  "chunks_created": 5,
  "document_ids": [
    "doc_a1b2c3d4",
    "doc_e5f6g7h8",
    "doc_i9j0k1l2",
    "doc_m3n4o5p6",
    "doc_q7r8s9t0"
  ],
  "collection": "docs"
}

Field	Type	Description
`chunks_created`	integer	Number of chunks produced from the content.
`document_ids`	string[]	IDs assigned to each chunk.
`collection`	string	The collection the content was ingested into.

Examples

curl

curl -X POST http://localhost:3000/api/rag/ingest \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "collection": "product-docs",
    "content": "ARES is a multi-agent AI platform that orchestrates specialized agents to handle complex queries. It supports multiple LLM providers including Groq, Anthropic, and NVIDIA...",
    "metadata": {
      "source": "documentation",
      "version": "2.0",
      "author": "engineering"
    },
    "chunking_strategy": "paragraph"
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/rag/ingest",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "collection": "product-docs",
        "content": "ARES is a multi-agent AI platform...",
        "metadata": {"source": "documentation", "version": "2.0"},
        "chunking_strategy": "paragraph"
    }
)

result = response.json()
print(f"Created {result['chunks_created']} chunks in '{result['collection']}'")

JavaScript

const response = await fetch("http://localhost:3000/api/rag/ingest", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    collection: "product-docs",
    content: "ARES is a multi-agent AI platform...",
    metadata: { source: "documentation", version: "2.0" },
    chunking_strategy: "paragraph"
  })
});

const result = await response.json();
console.log(`Created ${result.chunks_created} chunks in '${result.collection}'`);

Search documents

POST /api/rag/search

Search a collection using one of several retrieval strategies. Returns the most relevant document chunks.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

Parameter	Type	Required	Default	Description
`collection`	string	Yes	--	Collection to search.
`query`	string	Yes	--	The search query.
`strategy`	string	No	`"hybrid"`	Retrieval strategy (see below).
`top_k`	integer	No	5	Maximum number of results to return.
`rerank`	boolean	No	`false`	Whether to rerank results for improved relevance ordering.

Search strategies

Strategy	Description
`semantic`	Vector similarity search. Best for conceptual or meaning-based queries.
`bm25`	Classic keyword-based ranking (BM25 algorithm). Best for exact term matching.
`fuzzy`	Tolerates typos and approximate matches. Useful for user-facing search with imprecise input.
`hybrid`	Combines semantic and keyword search, then merges results. Best overall performance for most use cases.
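The table does not say how `hybrid` merges the semantic and keyword result lists. Reciprocal rank fusion (RRF) is one common technique for this. The Python sketch below shows it for illustration only; ARES may use a different merge, such as weighted score blending. The constant k = 60 is the value commonly used with RRF, chosen here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60, top_k: int = 5) -> list[str]:
    """Merge several ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. A higher total score means more relevant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by ID for stable output.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


semantic = ["doc_a", "doc_b", "doc_c"]
bm25 = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([semantic, bm25], top_k=3))  # ['doc_a', 'doc_c', 'doc_b']
```

RRF only uses ranks, so it avoids normalizing BM25 scores and cosine similarities onto a common scale. Documents that rank well in both lists rise to the top.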

Response

The response contains an array of matching document chunks, each with its content, relevance score, and metadata.

Examples

curl

curl -X POST http://localhost:3000/api/rag/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "collection": "product-docs",
    "query": "how does agent routing work",
    "strategy": "hybrid",
    "top_k": 5,
    "rerank": true
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/rag/search",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "collection": "product-docs",
        "query": "how does agent routing work",
        "strategy": "hybrid",
        "top_k": 5,
        "rerank": True
    }
)

results = response.json()
for result in results:
    print(result)

JavaScript

const response = await fetch("http://localhost:3000/api/rag/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    collection: "product-docs",
    query: "how does agent routing work",
    strategy: "hybrid",
    top_k: 5,
    rerank: true
  })
});

const results = await response.json();
results.forEach(result => console.log(result));

List collections

GET /api/rag/collections

Returns all document collections for the authenticated user.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

curl http://localhost:3000/api/rag/collections \
  -H "Authorization: Bearer eyJhbGciOi..."

Delete a collection

DELETE /api/rag/collection

Permanently delete a collection and all its indexed documents.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

{
  "collection": "product-docs"
}

Example

curl -X DELETE http://localhost:3000/api/rag/collection \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{"collection": "product-docs"}'

Workflows

Workflows are multi-agent orchestration pipelines. A workflow defines an entry point agent (typically a router) that analyzes the incoming query and delegates to specialist agents in sequence. The result is a coordinated, multi-step response that leverages the strengths of different agents.

How workflows operate:

The query enters through an entry agent (usually a router).
The router analyzes intent and selects the most appropriate specialist agent.
The specialist processes the query, optionally delegating further.
Each step is recorded in the reasoning path, providing full transparency into the decision chain.
The final response is returned along with metadata about the execution.

List workflows

GET /api/workflows

Returns the names of all available workflows.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Response

["default", "research", "support"]

Example

curl http://localhost:3000/api/workflows \
  -H "Authorization: Bearer eyJhbGciOi..."

Execute a workflow

POST /api/workflows/{workflow_name}

Execute a named workflow. The query is routed through the workflow's agent chain, and the final synthesized response is returned along with execution metadata.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Path parameters

Parameter	Type	Description
`workflow_name`	string	Name of the workflow to execute

Request body

Parameter	Type	Required	Description
`query`	string	Yes	The input query or task for the workflow.
`context`	object	No	Additional context passed to agents during execution.

Response

{
  "final_response": "Based on our analysis, the Pro plan at $49/month offers the best value for your use case. It includes 100K API calls, priority support, and access to all models. The Enterprise plan adds dedicated infrastructure and SLA guarantees, which may be worth considering if you expect to exceed 500K calls/month.",
  "steps_executed": 3,
  "agents_used": ["router", "sales", "product"],
  "reasoning_path": [
    {
      "agent": "router",
      "action": "Classified as pricing inquiry. Routing to sales agent."
    },
    {
      "agent": "sales",
      "action": "Retrieved pricing tiers. Consulting product agent for feature comparison."
    },
    {
      "agent": "product",
      "action": "Compared Pro vs Enterprise feature sets. Synthesized final recommendation."
    }
  ]
}

Field	Type	Description
`final_response`	string	The synthesized response from the workflow.
`steps_executed`	integer	Total number of agent steps in the execution.
`agents_used`	string[]	Ordered list of agents that participated.
`reasoning_path`	array	Step-by-step trace of each agent's reasoning and actions.

Examples

curl

curl -X POST http://localhost:3000/api/workflows/default \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
    "context": {
      "company_size": "50-200 employees",
      "expected_volume": "200K calls/month"
    }
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/workflows/default",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
        "context": {
            "company_size": "50-200 employees",
            "expected_volume": "200K calls/month"
        }
    }
)

result = response.json()
print(result["final_response"])

# Inspect the reasoning chain
for step in result["reasoning_path"]:
    print(f"  [{step['agent']}] {step['action']}")

JavaScript

const response = await fetch(
  "http://localhost:3000/api/workflows/default",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer eyJhbGciOi..."
    },
    body: JSON.stringify({
      query: "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
      context: {
        company_size: "50-200 employees",
        expected_volume: "200K calls/month"
      }
    })
  }
);

const result = await response.json();
console.log(result.final_response);

// Inspect the reasoning chain
result.reasoning_path.forEach(step => {
  console.log(`  [${step.agent}] ${step.action}`);
});

Workflow behavior

Agent selection. The entry agent examines the query and routes to the specialist best suited to handle it. If a specialist determines it needs input from another agent, it can delegate further, creating a multi-hop chain.

Context propagation. The optional context object is available to every agent in the chain. Use it to pass structured information (user tier, session metadata, domain-specific parameters) that agents can reference during processing.

Determinism. Routing is not deterministic. It is driven by the entry agent's LLM reasoning, so the same underlying question may be routed differently depending on how it is phrased. The reasoning_path in the response shows exactly which routing decisions were made.

Research

The Research API performs deep, multi-step research on a topic using parallel sub-agents. Unlike a single chat request, a research query spawns multiple agents that independently explore facets of the question, synthesize findings, and produce a comprehensive result with source attribution.

Execute a research query

POST /api/research

Submit a research query for deep, multi-step investigation.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

Parameter	Type	Required	Default	Description
`query`	string	Yes	--	The research question or topic.
`depth`	integer	No	3	How many levels deep the research goes. Higher values explore sub-topics more thoroughly.
`max_iterations`	integer	No	5	Maximum total agent calls. Acts as a cost/time ceiling.

Understanding depth: At depth 1, the research agent answers the query directly. At depth 2, it identifies sub-questions, spawns agents to answer each, then synthesizes. At depth 3+, sub-agents can spawn their own sub-agents, creating a tree of investigation.

Understanding max_iterations: This is a hard cap on total agent invocations across all depth levels. If the research tree would require more calls than max_iterations, it stops expanding and synthesizes what it has. Use this to control cost and response time.
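The exact expansion order ARES uses is not specified. The Python sketch below is a hypothetical model of how a hard call cap interacts with depth. It assumes breadth-first expansion and one agent call per question, uses a caller-supplied `split` function as a stand-in for sub-question generation, and does not count the final synthesis step.

```python
from collections import deque
from typing import Callable


def plan_research(
    query: str,
    depth: int,
    max_iterations: int,
    split: Callable[[str], list[str]],
) -> list[str]:
    """Return the questions that would receive an agent call, in BFS order.

    Hypothetical model: each question costs one call, questions above the
    depth limit are expanded via `split`, and expansion stops once
    `max_iterations` calls have been scheduled.
    """
    scheduled: list[str] = []
    queue = deque([(query, 1)])  # (question, level); the root is level 1
    while queue and len(scheduled) < max_iterations:
        question, level = queue.popleft()
        scheduled.append(question)
        if level < depth:
            for sub in split(question):
                queue.append((sub, level + 1))
    return scheduled


# Hypothetical splitter: every question yields three sub-questions.
three_way = lambda q: [f"{q}/a", f"{q}/b", f"{q}/c"]
print(plan_research("Q", depth=3, max_iterations=5, split=three_way))
```

Under these assumptions, a full depth-3 tree with three sub-questions per node needs 1 + 3 + 9 = 13 calls. The default max_iterations of 5 therefore stops after the first question of the third level.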

Response

{
  "findings": "## Market Analysis: Edge Computing in Healthcare\n\nEdge computing adoption in healthcare is accelerating, driven by three primary factors...\n\n### Key Findings\n1. **Latency requirements** — Real-time patient monitoring demands sub-10ms response times...\n2. **Data sovereignty** — HIPAA compliance increasingly favors on-premise processing...\n3. **Cost dynamics** — Edge deployment reduces cloud egress costs by 40-60% for imaging workloads...\n\n### Sources\n- Gartner Healthcare IT Report 2025\n- IEEE Edge Computing Survey\n- HHS HIPAA Guidance Update",
  "sources": [
    "Gartner Healthcare IT Report 2025",
    "IEEE Edge Computing Survey",
    "HHS HIPAA Guidance Update"
  ],
  "duration_ms": 8432
}

Field	Type	Description
`findings`	string	The synthesized research output, typically in Markdown.
`sources`	string[]	References and sources discovered during research.
`duration_ms`	integer	Total time taken for the research in milliseconds.

Examples

curl

curl -X POST http://localhost:3000/api/research \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "query": "What are the current trends in edge computing for healthcare?",
    "depth": 3,
    "max_iterations": 5
  }'

Python

import requests

response = requests.post(
    "http://localhost:3000/api/research",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "query": "What are the current trends in edge computing for healthcare?",
        "depth": 3,
        "max_iterations": 5
    }
)

result = response.json()
print(result["findings"])
print(f"\nCompleted in {result['duration_ms']}ms")
print(f"Sources: {', '.join(result['sources'])}")

JavaScript

const response = await fetch("http://localhost:3000/api/research", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    query: "What are the current trends in edge computing for healthcare?",
    depth: 3,
    max_iterations: 5
  })
});

const result = await response.json();
console.log(result.findings);
console.log(`\nCompleted in ${result.duration_ms}ms`);
console.log(`Sources: ${result.sources.join(", ")}`);

Tuning research parameters

Scenario	Recommended `depth`	Recommended `max_iterations`
Quick factual lookup	1	2
Standard research question	2	5
Deep competitive analysis	3	10
Exhaustive literature review	4+	15+

Higher depth and iteration values produce more comprehensive results but take longer and consume more API quota. For most use cases, the defaults (depth: 3, max_iterations: 5) provide a good balance of thoroughness and speed.

Streaming

ARES supports real-time streaming responses via Server-Sent Events (SSE). Instead of waiting for the full response to be generated, you receive text chunks as they are produced. This enables responsive UIs that display text as it appears.

Endpoint

POST /api/chat/stream

JWT authentication: Authorization: Bearer <jwt_access_token>

POST /v1/chat/stream

API key authentication: Authorization: Bearer ares_xxx

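POST /api/chat/stream accepts the same request body as POST /api/chat, and the examples in this section use that endpoint and its plain-text chunk format. POST /v1/chat/stream is documented under V1 Client API, which specifies a messages-array request body and JSON-encoded delta events.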

SSE format

The response uses Content-Type: text/event-stream. Each event contains a data: field with a text chunk:

data: The
data:  answer
data:  to your
data:  question is
data:  as follows...

Each data: line represents one chunk of the response. Concatenate all chunks in order to reconstruct the complete response. The server closes the connection when generation is complete.
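The line-by-line examples below rely on their HTTP libraries to reassemble lines. A raw network read, however, can end mid-line or even mid-character, so if you consume raw bytes yourself, buffer until each line is complete. The minimal Python sketch below follows two rules: the plain-text format above (one chunk per `data:` line), and the SSE rule that a single space after `data:` is not part of the value.

```python
from typing import Iterable, Iterator


def iter_sse_data(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each `data:` line from a stream of raw byte chunks.

    Bytes are buffered until a newline arrives, so a line (or a multi-byte
    UTF-8 character) split across network reads is decoded only once complete.
    Other SSE fields such as `event:` or `id:` are ignored.
    """
    buffer = b""
    for raw in byte_chunks:
        buffer += raw
        # Split off every complete line; the trailing partial line stays buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            text = line.rstrip(b"\r").decode("utf-8")
            if text.startswith("data:"):
                payload = text[5:]
                # Per the SSE spec, one space after the colon is not part of the value.
                yield payload[1:] if payload.startswith(" ") else payload
    # An unterminated final line is discarded, as the SSE spec prescribes.


chunks = [b"data: The\n\ndata:  ans", b"wer\n\ndata: caf\xc3", b"\xa9\n\n"]
print("".join(iter_sse_data(chunks)))  # The answercafé
```

With requests, you could feed it `response.iter_content(chunk_size=None)` from a `stream=True` response. Splitting on the newline byte is safe because `\n` never occurs inside a multi-byte UTF-8 sequence.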

Examples

curl

The -N flag disables output buffering so chunks appear immediately:

curl -N -X POST http://localhost:3000/api/chat/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -H "Accept: text/event-stream" \
  -d '{
    "message": "Explain how neural networks learn",
    "agent_type": "research"
  }'

Python

Using the requests library with stream=True:

import requests

response = requests.post(
    "http://localhost:3000/api/chat/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi...",
        "Accept": "text/event-stream"
    },
    json={
        "message": "Explain how neural networks learn",
        "agent_type": "research"
    },
    stream=True
)

full_response = []

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]
            print(chunk, end="", flush=True)
            full_response.append(chunk)

complete_text = "".join(full_response)

For production use, consider using httpx with async streaming:

import asyncio

import httpx


async def stream_chat(message: str, token: str) -> str:
    chunks = []

    # httpx's default 5-second timeout also applies to the wait between
    # streamed chunks, so allow a longer read timeout for slow generations.
    timeout = httpx.Timeout(10.0, read=60.0)

    async with httpx.AsyncClient(timeout=timeout) as client:
        async with client.stream(
            "POST",
            "http://localhost:3000/api/chat/stream",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
                "Accept": "text/event-stream"
            },
            json={"message": message}
        ) as response:
            # Errors arrive as ordinary HTTP responses, not SSE
            response.raise_for_status()
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    chunk = line[6:]
                    print(chunk, end="", flush=True)
                    chunks.append(chunk)

    return "".join(chunks)


result = asyncio.run(stream_chat("Explain how neural networks learn", "eyJhbGciOi..."))

JavaScript (Browser)

Using the Fetch API with ReadableStream:

async function streamChat(message, token) {
  const response = await fetch("http://localhost:3000/api/chat/stream", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "Accept": "text/event-stream"
    },
    body: JSON.stringify({
      message: message,
      agent_type: "research"
    })
  });

  if (!response.ok) {
    throw new Error(`Error ${response.status}: ${await response.text()}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // partial line carried over between reads
  let fullResponse = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    // A read can end mid-line: parse only complete lines and keep the remainder
    const lines = buffer.split("\n");
    buffer = lines.pop();

    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, "");
      if (line.startsWith("data: ")) {
        const chunk = line.slice(6);
        fullResponse += chunk;

        // Update your UI here
        document.getElementById("output").textContent = fullResponse;
      }
    }
  }

  return fullResponse;
}

JavaScript (Node.js)

async function streamChat(message, token) {
  const response = await fetch("http://localhost:3000/api/chat/stream", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "Accept": "text/event-stream"
    },
    body: JSON.stringify({ message })
  });

  if (!response.ok) {
    throw new Error(`Error ${response.status}: ${await response.text()}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // partial line carried over between reads
  let fullResponse = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    // A read can end mid-line: parse only complete lines and keep the remainder
    const lines = buffer.split("\n");
    buffer = lines.pop();

    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, "");
      if (line.startsWith("data: ")) {
        const chunk = line.slice(6);
        fullResponse += chunk;
        process.stdout.write(chunk);
      }
    }
  }

  return fullResponse;
}

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

func streamChat(message, token string) (string, error) {
	// Marshalling a map of strings cannot fail, so the error is ignored
	body, _ := json.Marshal(map[string]string{
		"message":    message,
		"agent_type": "research",
	})

	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:3000/api/chat/stream",
		bytes.NewReader(body))
	if err != nil {
		return "", err
	}

	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "text/event-stream")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Errors arrive as ordinary HTTP responses, not SSE
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("error %d: %s", resp.StatusCode, msg)
	}

	var fullResponse strings.Builder
	scanner := bufio.NewScanner(resp.Body)

	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			chunk := strings.TrimPrefix(line, "data: ")
			fmt.Print(chunk)
			fullResponse.WriteString(chunk)
		}
	}

	return fullResponse.String(), scanner.Err()
}

func main() {
	result, err := streamChat("Explain how neural networks learn", "eyJhbGciOi...")
	if err != nil {
		panic(err)
	}
	fmt.Printf("\n\nFull response length: %d characters\n", len(result))
}

Error handling

If the request is invalid or authentication fails, the server returns a standard HTTP error response (not SSE). Always check the response status before attempting to read the stream:

response = requests.post(url, headers=headers, json=body, stream=True)

if response.status_code != 200:
    print(f"Error {response.status_code}: {response.text}")
else:
    for line in response.iter_lines():
        ...  # process SSE events as in the examples above

const response = await fetch(url, { method: "POST", headers, body });

if (!response.ok) {
  throw new Error(`Error ${response.status}: ${await response.text()}`);
}

// proceed with stream reading

Best practices

Always set Accept: text/event-stream to signal that you expect a streaming response.
Disable client-side buffering where possible (e.g., -N in curl, stream=True in Python requests).
Handle connection drops gracefully. The stream may close unexpectedly due to network issues. Implement retry logic for production applications.
Set reasonable timeouts. Long research queries may stream for 30+ seconds. Configure your HTTP client timeout accordingly.
Concatenate chunks for the final result. Individual chunks may split mid-word. Only process the complete response for downstream use.
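The streaming endpoints document no way to resume a response mid-stream, so retrying means re-sending the request and discarding any partial output. The minimal Python sketch below retries with exponential backoff and jitter. `with_retries` is a hypothetical helper, and its attempt count and delays are illustrative defaults, not ARES recommendations.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each time (0.5s, 1s, 2s, ...); jitter spreads out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("attempts must be at least 1")
```

Wrap the whole request-and-read step so a retry starts a fresh generation. Pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.ChunkedEncodingError)` when using requests, because its exception classes are not subclasses of Python's built-in ConnectionError. Each retry is a new request and may count against your usage quota.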

V1 Client API

The V1 API is the primary interface for enterprise clients integrating ARES into their applications. All endpoints are scoped to the authenticated tenant — you only see your own agents, runs, and usage.

Base URL: http://localhost:3000

Authentication

Every request to /v1/* must include your API key in the Authorization header:

Authorization: Bearer ares_xxx

API keys are issued during tenant provisioning. You can create additional keys via the API or request them from your platform administrator.

Agents

List Agents

GET /v1/agents?page=1&per_page=20

Returns a paginated list of agents configured for your tenant.

Query Parameters:

Parameter	Type	Default	Description
`page`	integer	`1`	Page number
`per_page`	integer	`20`	Results per page

Response:

{
  "agents": [
    {
      "id": "uuid",
      "name": "risk-analyzer",
      "agent_type": "classifier",
      "status": "active",
      "config": { "model": "llama-3.3-70b", "tools": ["calculator"] },
      "created_at": "2026-03-01T00:00:00Z",
      "last_run": "2026-03-13T14:22:00Z",
      "total_runs": 1547,
      "success_rate": 0.982
    }
  ],
  "total": 4,
  "page": 1,
  "per_page": 20
}
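To walk every page, request successive pages until you have seen `total` agents. The minimal Python sketch below assumes that `total` counts agents (not pages) across all pages. The `fetch_page` callable is a hypothetical hook that keeps the paging logic independent of your HTTP client.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]  # decoded JSON body of GET /v1/agents


def iter_all_agents(fetch_page: Callable[[int, int], Page], per_page: int = 20) -> Iterator[dict]:
    """Yield every agent across all pages of GET /v1/agents."""
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["agents"]
        # Stop once this page reaches the reported total, or if a page comes back empty.
        if not body["agents"] or page * per_page >= body["total"]:
            return
        page += 1
```

In practice, `fetch_page` could wrap `requests.get(f"{base_url}/v1/agents", params={"page": page, "per_page": per_page}, headers={"Authorization": "Bearer ares_xxx"})`, then call `raise_for_status()` and return `.json()`.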

Get Agent Details

GET /v1/agents/{name}

Returns full details for a single agent.

Response:

{
  "id": "uuid",
  "name": "risk-analyzer",
  "agent_type": "classifier",
  "status": "active",
  "config": {
    "model": "llama-3.3-70b",
    "system_prompt": "You are a risk analysis agent...",
    "tools": ["calculator"],
    "max_tokens": 2048
  },
  "created_at": "2026-03-01T00:00:00Z",
  "last_run": "2026-03-13T14:22:00Z",
  "total_runs": 1547,
  "success_rate": 0.982
}

Run an Agent

POST /v1/agents/{name}/run

Execute an agent with the provided input. This is the core endpoint for triggering agent work.

Request Body:

{
  "input": {
    "message": "Analyze the risk profile for transaction TX-9921",
    "context": {
      "amount": 15000,
      "currency": "USD",
      "merchant_category": "electronics"
    }
  }
}

Response:

{
  "id": "run-uuid",
  "agent_id": "agent-uuid",
  "status": "completed",
  "input": { "message": "Analyze the risk profile..." },
  "output": {
    "risk_score": 0.73,
    "risk_level": "medium",
    "reasoning": "Elevated amount for merchant category..."
  },
  "error": null,
  "started_at": "2026-03-13T14:22:00Z",
  "finished_at": "2026-03-13T14:22:01Z",
  "duration_ms": 1243,
  "tokens_used": 847
}

If the agent fails, status will be "failed" and error will contain a description.

List Agent Runs

GET /v1/agents/{name}/runs?page=1&per_page=20

Returns the run history for a specific agent, newest first.

Chat

Send a Chat Message

POST /v1/chat

Send a message to a model or agent and receive a complete response.

Request Body:

{
  "messages": [
    { "role": "user", "content": "Summarize Q1 revenue trends." }
  ],
  "model": "llama-3.3-70b",
  "agent_type": "analyst"
}

Response:

{
  "id": "msg-uuid",
  "content": "Based on the data, Q1 revenue showed...",
  "model": "llama-3.3-70b",
  "tokens_used": 312,
  "finish_reason": "stop"
}

Stream a Chat Response

POST /v1/chat/stream

Same request body as /v1/chat, but returns a Server-Sent Events (SSE) stream.

data: {"delta": "Based on", "finish_reason": null}
data: {"delta": " the data,", "finish_reason": null}
data: {"delta": " Q1 revenue", "finish_reason": null}
...
data: {"delta": "", "finish_reason": "stop", "tokens_used": 312}
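To rebuild the full reply, concatenate each event's delta until an event arrives with a non-null finish_reason. The Python sketch below assumes each event is a single `data: ` line carrying JSON in the shape shown above. `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str], Optional[int]]:
    """Join SSE deltas into one string; returns (text, finish_reason, tokens_used).

    Assumes every event is a single "data: {json}" line as in the example above.
    Blank lines and any non-data lines are skipped.
    """
    parts = []
    finish_reason = None
    tokens_used = None
    for line in lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        parts.append(event.get("delta", ""))
        if event.get("finish_reason") is not None:
            # The final event carries the finish reason and the token count.
            finish_reason = event["finish_reason"]
            tokens_used = event.get("tokens_used")
            break
    return "".join(parts), finish_reason, tokens_used
```

With requests, you can pass `(raw.decode("utf-8") for raw in response.iter_lines())` as the `lines` argument.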

Usage

Get Usage Summary

GET /v1/usage

Returns your tenant's usage for the current billing period.

Response:

{
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "total_api_calls": 5290,
  "quota_runs": 100000,
  "quota_tokens": 10000000,
  "daily_usage": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 },
    { "date": "2026-03-12", "runs": 287, "tokens": 171003, "api_calls": 315 }
  ]
}

API Keys

List API Keys

GET /v1/api-keys

Returns all API keys for your tenant. The full key secret is never returned after creation.

Response:

{
  "keys": [
    {
      "id": "key-uuid",
      "name": "android-production",
      "prefix": "ares_a1b2",
      "created_at": "2026-03-01T00:00:00Z",
      "expires_at": "2027-03-01T00:00:00Z",
      "last_used": "2026-03-13T14:00:00Z"
    }
  ]
}

Create API Key

POST /v1/api-keys

Request Body:

{
  "name": "mobile-app-key",
  "expires_in_days": 365
}

expires_in_days is optional. If omitted, the key does not expire.

Response:

{
  "key": "key-uuid",
  "secret": "ares_x7k9m2p4q8r1s5t3..."
}

Important: The secret field is only returned once at creation time. Store it securely — it cannot be retrieved again.

Revoke API Key

DELETE /v1/api-keys/{id}

Immediately invalidates the key. Returns 204 No Content on success.
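A secret cannot be retrieved after creation. Rotating a key safely therefore means creating the replacement first, distributing its secret, and only then revoking the old key. The sketch below uses only the two endpoints above. `rotate_api_key` is a hypothetical helper. `session` is assumed to be a `requests.Session`, or any object with compatible `post`/`delete` methods, that already sends the Authorization header. `deploy_secret` is your own callback for storing the new secret.

```python
from typing import Callable


def rotate_api_key(session, base_url: str, old_key_id: str, new_name: str,
                   deploy_secret: Callable[[str], None]) -> str:
    """Create a replacement key, hand its secret to deploy_secret, then revoke the old key.

    Returns the new key's id. deploy_secret should store the secret wherever
    clients read it before the old key stops working.
    """
    created = session.post(f"{base_url}/v1/api-keys", json={"name": new_name})
    created.raise_for_status()
    body = created.json()
    # Per the Create API Key response, "key" holds the key id and "secret" the raw key.
    deploy_secret(body["secret"])

    revoked = session.delete(f"{base_url}/v1/api-keys/{old_key_id}")
    if revoked.status_code != 204:
        raise RuntimeError(f"revoking {old_key_id} returned HTTP {revoked.status_code}")
    return body["key"]
```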

Examples

Run an Agent (curl)

curl -X POST http://localhost:3000/v1/agents/risk-analyzer/run \
  -H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "message": "Evaluate this transaction",
      "context": {"amount": 15000, "currency": "USD"}
    }
  }'

Run an Agent (Python)

import requests

API_KEY = "ares_x7k9m2p4q8r1s5t3"
BASE_URL = "http://localhost:3000"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Run an agent
response = requests.post(
    f"{BASE_URL}/v1/agents/risk-analyzer/run",
    headers=headers,
    json={
        "input": {
            "message": "Evaluate this transaction",
            "context": {"amount": 15000, "currency": "USD"},
        }
    },
)

response.raise_for_status()  # Raise on 4xx/5xx (e.g. 401, 404, 429) before parsing
result = response.json()
print(f"Status: {result['status']}")
print(f"Output: {result['output']}")
print(f"Duration: {result['duration_ms']}ms")
print(f"Tokens: {result['tokens_used']}")

Check Usage (curl)

curl http://localhost:3000/v1/usage \
  -H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3"

Check Usage (Python)

# Reuses requests, BASE_URL, and headers from the previous example
response = requests.get(f"{BASE_URL}/v1/usage", headers=headers)
usage = response.json()

print(f"Runs this month: {usage['total_runs']} / {usage['quota_runs']}")
print(f"Tokens this month: {usage['total_tokens']} / {usage['quota_tokens']}")

Chat with Streaming (Python)

import requests
import json

response = requests.post(
    f"{BASE_URL}/v1/chat/stream",
    headers=headers,
    json={
        "messages": [{"role": "user", "content": "Explain quantum computing."}],
        "model": "llama-3.3-70b",
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        text = line.decode("utf-8")
        if text.startswith("data: "):
            data = json.loads(text[6:])
            print(data.get("delta", ""), end="", flush=True)

Chat with Streaming (JavaScript)

const response = await fetch("http://localhost:3000/v1/chat/stream", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_x7k9m2p4q8r1s5t3",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Explain quantum computing." }],
    model: "llama-3.3-70b",
  }),
});

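const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network chunk can end mid-line (or mid-character), so decode in
  // streaming mode and only parse complete lines.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep the incomplete trailing line for the next chunk

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const data = JSON.parse(line.slice(6));
      process.stdout.write(data.delta || "");
    }
  }
}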

Admin API

The Admin API provides full platform management capabilities for ARES operators. Use it to provision tenants, manage agents, monitor usage, and operate the platform.

Base URL: http://localhost:3000

Authentication

Every request to /api/admin/* must include the admin secret:

X-Admin-Secret: <secret>

This secret is set in your ares.toml configuration. Guard it carefully — it grants full platform access.

Tenants

Create Tenant

POST /api/admin/tenants

Request Body:

{
  "name": "acme-corp",
  "tier": "pro"
}

Valid tiers: free, dev, pro, enterprise.

Response:

{
  "id": "tenant-uuid",
  "name": "acme-corp",
  "tier": "pro",
  "created_at": "2026-03-13T00:00:00Z"
}

List Tenants

GET /api/admin/tenants

Response:

{
  "tenants": [
    {
      "id": "tenant-uuid",
      "name": "acme-corp",
      "tier": "pro",
      "agent_count": 4,
      "created_at": "2026-03-13T00:00:00Z"
    }
  ]
}

Get Tenant Details

GET /api/admin/tenants/{id}

Response:

{
  "id": "tenant-uuid",
  "name": "acme-corp",
  "tier": "pro",
  "agent_count": 4,
  "api_key_count": 2,
  "total_runs": 12849,
  "total_tokens": 7291034,
  "created_at": "2026-03-13T00:00:00Z"
}

Update Tenant Tier

PUT /api/admin/tenants/{id}/quota

Request Body:

{
  "tier": "enterprise"
}

Response: Updated tenant object.

Provisioning

Provision a Client

POST /api/admin/provision-client

This is the recommended way to onboard a new enterprise client. It atomically creates a tenant, clones the appropriate agent templates, and generates an API key — all in a single transaction. If any step fails, everything is rolled back.

Request Body:

{
  "name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_name": "production"
}

Field	Type	Required	Description
`name`	string	Yes	Unique tenant name (lowercase, alphanumeric + hyphens)
`tier`	string	Yes	One of: `free`, `dev`, `pro`, `enterprise`
`product_type`	string	Yes	Template set to clone: `generic`, `kasino`, `ehb`
`api_key_name`	string	Yes	Label for the initial API key
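Checking a provisioning body locally catches most 400 and 422 responses before the request reaches the server. The sketch below encodes the rules from the table above. The name pattern is a literal reading of "lowercase, alphanumeric + hyphens". The server may apply additional rules that are not listed here.

```python
import re

VALID_TIERS = {"free", "dev", "pro", "enterprise"}
VALID_PRODUCT_TYPES = {"generic", "kasino", "ehb"}
NAME_PATTERN = re.compile(r"[a-z0-9-]+")  # "lowercase, alphanumeric + hyphens"
REQUIRED_FIELDS = ("name", "tier", "product_type", "api_key_name")


def validate_provision_request(body: dict) -> list:
    """Return human-readable problems with a provision-client body; [] means it looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = body.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing required field: {field}")

    name = body.get("name")
    if isinstance(name, str) and name and not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name {name!r}: use lowercase letters, digits, and hyphens")

    tier = body.get("tier")
    if isinstance(tier, str) and tier and tier not in VALID_TIERS:
        problems.append(f"invalid tier {tier!r}; valid tiers: {', '.join(sorted(VALID_TIERS))}")

    product = body.get("product_type")
    if isinstance(product, str) and product and product not in VALID_PRODUCT_TYPES:
        problems.append(f"invalid product_type {product!r}")
    return problems
```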

Response:

{
  "tenant_id": "tenant-uuid",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_id": "key-uuid",
  "api_key_prefix": "ares_a1b2",
  "raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
  "agents_created": [
    "kasino-classifier",
    "kasino-risk",
    "kasino-transaction",
    "kasino-report"
  ]
}

Important: The raw_api_key is only returned once. Store it securely and deliver it to the client through a secure channel.

curl Example:

curl -X POST http://localhost:3000/api/admin/provision-client \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "tier": "pro",
    "product_type": "kasino",
    "api_key_name": "production"
  }'

API Keys

Create API Key for Tenant

POST /api/admin/tenants/{id}/api-keys

Request Body:

{
  "name": "staging-key"
}

Response:

{
  "id": "key-uuid",
  "prefix": "ares_x7k9",
  "raw_key": "ares_x7k9m2p4q8r1s5t3...",
  "created_at": "2026-03-13T00:00:00Z"
}

List API Keys for Tenant

GET /api/admin/tenants/{id}/api-keys

Response:

{
  "keys": [
    {
      "id": "key-uuid",
      "name": "production",
      "prefix": "ares_a1b2",
      "created_at": "2026-03-13T00:00:00Z",
      "last_used": "2026-03-13T14:00:00Z"
    }
  ]
}

Tenant Agents

List Tenant Agents

GET /api/admin/tenants/{id}/agents

Response:

{
  "agents": [
    {
      "id": "agent-uuid",
      "name": "kasino-classifier",
      "agent_type": "classifier",
      "status": "active",
      "model": "llama-3.3-70b",
      "total_runs": 2841,
      "success_rate": 0.991
    }
  ]
}

Create Tenant Agent

POST /api/admin/tenants/{id}/agents

Request Body:

{
  "name": "custom-analyzer",
  "agent_type": "analyzer",
  "config": {
    "model": "llama-3.3-70b",
    "system_prompt": "You are a financial data analyzer...",
    "tools": ["calculator"],
    "max_tokens": 4096
  }
}

Update Tenant Agent

PUT /api/admin/tenants/{id}/agents/{name}

Request Body: Same structure as create. Fields provided will be updated.

Delete Tenant Agent

DELETE /api/admin/tenants/{id}/agents/{name}

Returns 204 No Content on success.

Templates and Models

List Agent Templates

GET /api/admin/agent-templates?product_type=kasino

Returns the pre-configured agent templates available for a given product type. These are cloned during provisioning.

Response:

{
  "templates": [
    {
      "name": "kasino-classifier",
      "agent_type": "classifier",
      "product_type": "kasino",
      "config": {
        "model": "llama-3.3-70b",
        "system_prompt": "You are a transaction classifier...",
        "tools": []
      }
    }
  ]
}

List Available Models

GET /api/admin/models

Returns all models configured across all providers.

Response:

{
  "models": [
    {
      "id": "llama-3.3-70b",
      "provider": "groq",
      "context_length": 131072,
      "supports_tools": true
    },
    {
      "id": "deepseek-r1",
      "provider": "nvidia-deepseek",
      "context_length": 65536,
      "supports_tools": false
    },
    {
      "id": "claude-3.5-sonnet",
      "provider": "anthropic",
      "context_length": 200000,
      "supports_tools": true
    }
  ]
}

Usage and Analytics

Tenant Usage Summary

GET /api/admin/tenants/{id}/usage

Response:

{
  "tenant_id": "tenant-uuid",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "quota_runs": 100000,
  "quota_tokens": 10000000
}

Daily Usage Breakdown

GET /api/admin/tenants/{id}/usage/daily?days=30

Response:

{
  "daily": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920 },
    { "date": "2026-03-12", "runs": 287, "tokens": 171003 }
  ]
}

Agent Run History

GET /api/admin/tenants/{id}/agents/{name}/runs?limit=50

Response:

{
  "runs": [
    {
      "id": "run-uuid",
      "status": "completed",
      "started_at": "2026-03-13T14:22:00Z",
      "duration_ms": 1243,
      "tokens_used": 847
    }
  ]
}

Agent Stats

GET /api/admin/tenants/{id}/agents/{name}/stats

Response:

{
  "agent_name": "kasino-classifier",
  "total_runs": 2841,
  "successful_runs": 2815,
  "failed_runs": 26,
  "success_rate": 0.991,
  "avg_duration_ms": 1102,
  "avg_tokens": 723,
  "last_run": "2026-03-13T14:22:00Z"
}

Cross-Tenant Agent List

GET /api/admin/agents

Returns agents across all tenants. Useful for platform-wide visibility.

Platform Stats

GET /api/admin/stats

Response:

{
  "total_tenants": 12,
  "total_agents": 47,
  "total_runs_today": 3291,
  "total_tokens_today": 1948271,
  "active_alerts": 2
}

Alerts and Audit

List Alerts

GET /api/admin/alerts?severity=critical&resolved=false&limit=100

Query Parameters:

Parameter	Type	Default	Description
`severity`	string	all	Filter by: `info`, `warning`, `critical`
`resolved`	boolean	all	Filter by resolution status
`limit`	integer	`100`	Maximum results to return
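When you build this query string in code, watch the boolean parameter. Python's `urllib.parse.urlencode` renders `False` as `False`, while the example URL uses lowercase `false`. The sketch below converts booleans explicitly and omits unset filters. `alert_query` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def alert_query(severity: Optional[str] = None, resolved: Optional[bool] = None,
                limit: int = 100) -> str:
    """Build the query string for GET /api/admin/alerts, omitting unset filters."""
    params = {}
    if severity is not None:
        if severity not in ("info", "warning", "critical"):
            raise ValueError(f"unknown severity: {severity}")
        params["severity"] = severity
    if resolved is not None:
        # urlencode would emit "False"; send lowercase to match the documented form.
        params["resolved"] = "true" if resolved else "false"
    params["limit"] = limit
    return urlencode(params)
```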

Response:

{
  "alerts": [
    {
      "id": "alert-uuid",
      "severity": "critical",
      "message": "Tenant acme-corp approaching token quota (92%)",
      "tenant_id": "tenant-uuid",
      "created_at": "2026-03-13T10:00:00Z",
      "resolved": false
    }
  ]
}

Resolve Alert

POST /api/admin/alerts/{id}/resolve

Returns 200 OK with the updated alert object.

Audit Log

GET /api/admin/audit-log?limit=50

Response:

{
  "entries": [
    {
      "id": "entry-uuid",
      "action": "tenant.created",
      "actor": "admin",
      "details": { "tenant_name": "acme-corp", "tier": "pro" },
      "timestamp": "2026-03-13T00:00:00Z"
    },
    {
      "id": "entry-uuid",
      "action": "agent.deleted",
      "actor": "admin",
      "details": { "tenant_id": "...", "agent_name": "old-agent" },
      "timestamp": "2026-03-12T23:00:00Z"
    }
  ]
}

Deployment API

The Deployment API allows you to trigger, monitor, and inspect deployments of ARES platform services. Deployments run server-side on the VPS. Their build output is captured and returned in the output field when you poll the deployment status.

Base URL: http://localhost:3000

Authentication

All deployment endpoints require the admin secret:

X-Admin-Secret: <secret>

Trigger a Deployment

POST /api/admin/deploy

Starts a deployment for the specified target service. The deployment runs asynchronously — you receive a deployment ID immediately and poll for completion.

Request Body:

{
  "target": "ares"
}

Target	Description
`ares`	ARES backend — pulls latest code, rebuilds, and restarts
`admin`	dirmacs-admin dashboard — rebuilds Leptos frontend
`eruka`	Eruka backend — pulls, rebuilds, and restarts

Response:

{
  "id": "deploy-uuid",
  "status": "running",
  "message": "Deployment started for ares"
}

curl Example:

curl -X POST http://localhost:3000/api/admin/deploy \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"target": "ares"}'

Poll Deployment Status

GET /api/admin/deploy/{id}

Returns the current status of a deployment. Poll this endpoint until status is no longer "running".

Response:

{
  "id": "deploy-uuid",
  "target": "ares",
  "status": "success",
  "started_at": "2026-03-13T14:00:00Z",
  "finished_at": "2026-03-13T14:03:42Z",
  "output": "Pulling latest changes...\nCompiling ares-server v0.1.0...\nFinished release target(s) in 3m 41s\nRestarting ares.service...\nService started successfully."
}

Status Values:

Status	Meaning
`running`	Deployment is in progress
`success`	Deployment completed successfully
`failed`	Deployment failed — check `output` for details

Polling Pattern

The recommended approach is to trigger a deployment, then poll every 3 seconds until it completes:

# 1. Trigger deployment
DEPLOY_ID=$(curl -s -X POST http://localhost:3000/api/admin/deploy \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"target": "ares"}' | jq -r '.id')

echo "Deployment started: $DEPLOY_ID"

# 2. Poll until complete
while true; do
  RESULT=$(curl -s http://localhost:3000/api/admin/deploy/$DEPLOY_ID \
    -H "X-Admin-Secret: your-admin-secret")

  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"

  if [ "$STATUS" != "running" ]; then
    echo "$RESULT" | jq -r '.output'
    break
  fi

  sleep 3
done

Python Example:

import requests
import time

ADMIN_SECRET = "your-admin-secret"
BASE_URL = "http://localhost:3000"
headers = {
    "X-Admin-Secret": ADMIN_SECRET,
    "Content-Type": "application/json",
}

# Trigger
resp = requests.post(
    f"{BASE_URL}/api/admin/deploy",
    headers=headers,
    json={"target": "ares"},
)
deploy_id = resp.json()["id"]
print(f"Deployment started: {deploy_id}")

# Poll
while True:
    resp = requests.get(
        f"{BASE_URL}/api/admin/deploy/{deploy_id}",
        headers=headers,
    )
    result = resp.json()
    print(f"Status: {result['status']}")

    if result["status"] != "running":
        print(result["output"])
        break

    time.sleep(3)

List Recent Deployments

GET /api/admin/deploys

Returns the 20 most recent deployments, newest first.

Response:

{
  "deploys": [
    {
      "id": "deploy-uuid",
      "target": "ares",
      "status": "success",
      "started_at": "2026-03-13T14:00:00Z",
      "finished_at": "2026-03-13T14:03:42Z"
    },
    {
      "id": "deploy-uuid-2",
      "target": "admin",
      "status": "failed",
      "started_at": "2026-03-12T10:00:00Z",
      "finished_at": "2026-03-12T10:02:15Z"
    }
  ]
}

curl Example:

curl http://localhost:3000/api/admin/deploys \
  -H "X-Admin-Secret: your-admin-secret"

Service Health

List All Services

GET /api/admin/services

Returns the runtime status of all managed services.

Response:

{
  "ares": {
    "status": "running",
    "pid": 12847,
    "port": 3000
  },
  "eruka": {
    "status": "running",
    "pid": 12901,
    "port": 8081
  },
  "admin": {
    "status": "running",
    "pid": null,
    "port": null
  }
}

Status	Meaning
`running`	Service is up and healthy
`stopped`	Service is not running
`degraded`	Service is running but unhealthy
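A monitoring script can flag any service that is not reported as running. The minimal sketch below works on the response shape shown above. How you alert on the result, for example with a cron job or a pager, is up to you.

```python
def unhealthy_services(services: dict) -> dict:
    """Map service name -> status for every service whose status is not "running"."""
    return {
        name: info.get("status", "unknown")
        for name, info in services.items()
        if info.get("status") != "running"
    }
```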

curl Example:

curl http://localhost:3000/api/admin/services \
  -H "X-Admin-Secret: your-admin-secret"

Get Service Logs

GET /api/admin/services/{name}/logs

Returns recent log output from the service's systemd journal.

Response:

{
  "service": "ares",
  "lines": [
    "Mar 13 14:03:42 vps ares-server[12847]: Listening on 0.0.0.0:3000",
    "Mar 13 14:03:42 vps ares-server[12847]: Connected to PostgreSQL",
    "Mar 13 14:03:43 vps ares-server[12847]: Loaded 29 agents, 4 providers, 11 models",
    "Mar 13 14:04:01 vps ares-server[12847]: POST /v1/agents/risk-analyzer/run 200 1243ms"
  ]
}

curl Example:

curl http://localhost:3000/api/admin/services/ares/logs \
  -H "X-Admin-Secret: your-admin-secret"

Multi-Tenant Architecture

ARES is a multi-tenant platform. Each enterprise client operates within an isolated tenant, with their own agents, API keys, usage quotas, and data boundaries. This page explains the tenancy model and how to provision new clients.

Core Concepts

Tenants

A tenant is an isolated namespace on the ARES platform. Each tenant has:

A unique name and ID
A tier that determines rate limits and quotas
Its own set of agents (cloned from templates or created manually)
One or more API keys for authentication
Independent usage tracking and billing data

Tenants cannot see or interact with each other's resources. A request authenticated with Tenant A's API key will never return Tenant B's agents, runs, or usage data.

Tiers

Every tenant is assigned a tier that governs their resource limits:

Tier	Monthly Requests	Monthly Tokens	Daily Rate Limit	Use Case
Free	1,000	100,000	100/day	Evaluation and testing
Dev	10,000	1,000,000	1,000/day	Development and staging
Pro	100,000	10,000,000	10,000/day	Production workloads
Enterprise	Unlimited	Unlimited	Unlimited	High-volume clients

Tiers can be changed at any time via the Admin API without disrupting the tenant's service.

Agent Templates

When a tenant is provisioned, ARES clones a set of pre-configured agent templates based on the specified product_type. Templates provide a working starting point that can be customized after creation.

Available product types:

Product Type	Templates Included	Description
`generic`	General-purpose agents	Default chat and analysis agents
`kasino`	`kasino-classifier`, `kasino-risk`, `kasino-transaction`, `kasino-report`	Transaction analysis and reporting
`ehb`	Health-oriented agents	eHealthBuddy clinical agents

Each template defines the agent's model, system prompt, tool access, and default configuration. After provisioning, agents can be freely modified or new ones added.

API Key Scoping

Every API key is bound to exactly one tenant. When a request arrives with an API key:

ARES looks up the key and identifies the associated tenant
All operations execute within that tenant's scope
Usage is tracked against that tenant's quotas
The response only includes that tenant's data

A tenant can have multiple API keys (e.g., separate keys for production, staging, and mobile). Each key's usage is tracked individually but counts toward the shared tenant quota.

Data Isolation

Tenant isolation is enforced at the database query level. Every data-accessing query includes the tenant ID as a filter condition. This means:

Agent listings only return the requesting tenant's agents
Run history only shows runs from the requesting tenant
Usage data only reflects the requesting tenant's consumption
There is no API surface to query across tenant boundaries (except via the Admin API)
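The pattern works like this: the tenant ID comes from the authenticated API key, never from the request body, and every query binds it as a parameter. The illustrative sketch below uses Python's built-in sqlite3. ARES itself uses PostgreSQL via sqlx, and the table and column names here are hypothetical.

```python
import sqlite3


class AgentRepository:
    """Every method takes the caller's tenant_id and filters on it."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        # Names are unique per tenant, not globally (compare the 409 duplicate error).
        conn.execute(
            "CREATE TABLE IF NOT EXISTS agents ("
            " tenant_id TEXT NOT NULL, name TEXT NOT NULL,"
            " PRIMARY KEY (tenant_id, name))"
        )

    def create(self, tenant_id: str, name: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO agents (tenant_id, name) VALUES (?, ?)", (tenant_id, name)
            )

    def list_agents(self, tenant_id: str) -> list:
        rows = self.conn.execute(
            "SELECT name FROM agents WHERE tenant_id = ? ORDER BY name", (tenant_id,)
        )
        return [name for (name,) in rows]

    def get(self, tenant_id: str, name: str):
        # Another tenant's agent is indistinguishable from a missing one (HTTP 404).
        row = self.conn.execute(
            "SELECT name FROM agents WHERE tenant_id = ? AND name = ?", (tenant_id, name)
        ).fetchone()
        return row[0] if row else None
```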

Provisioning Flow

The recommended way to onboard a new client is the atomic provisioning endpoint. It creates all required resources in a single database transaction.

Step 1: Provision the Client

curl -X POST http://localhost:3000/api/admin/provision-client \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "tier": "pro",
    "product_type": "kasino",
    "api_key_name": "production"
  }'

Response:

{
  "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_id": "key-uuid",
  "api_key_prefix": "ares_a1b2",
  "raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
  "agents_created": [
    "kasino-classifier",
    "kasino-risk",
    "kasino-transaction",
    "kasino-report"
  ]
}

This single call:

Creates the tenant with the specified tier
Looks up the agent templates for the given product_type
Clones each template as a tenant-specific agent
Generates an API key bound to the new tenant
Returns the raw API key (shown only once)

If any step fails, the entire operation is rolled back. You will never end up with a half-provisioned tenant.
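The all-or-nothing behavior comes from running every step inside one transaction. The sketch below illustrates the idea with sqlite3. Its connection context manager commits on success and rolls back if anything raises. The schema and the `provision` function are hypothetical and are not ARES's sqlx code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tenants (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, tier TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS agents (tenant_id INTEGER NOT NULL, name TEXT NOT NULL, PRIMARY KEY (tenant_id, name));
"""


def provision(conn: sqlite3.Connection, name: str, tier: str, templates: list) -> int:
    """Create a tenant and clone templates in one transaction; return the tenant id."""
    with conn:  # commit if the block succeeds, roll back if anything raises
        cur = conn.execute("INSERT INTO tenants (name, tier) VALUES (?, ?)", (name, tier))
        tenant_id = cur.lastrowid
        for template in templates:
            conn.execute(
                "INSERT INTO agents (tenant_id, name) VALUES (?, ?)", (tenant_id, template)
            )
        return tenant_id
```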

Step 2: Deliver the API Key

Securely deliver the raw_api_key to your client. This is the only time the full key is visible — ARES stores only a hashed version internally.

Step 3: Verify the Setup

Confirm the tenant's agents are accessible using their new API key:

curl http://localhost:3000/v1/agents \
  -H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5"

The client should see their four provisioned agents.

Step 4: Test an Agent Run

curl -X POST http://localhost:3000/v1/agents/kasino-classifier/run \
  -H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "message": "Classify this transaction: $500 at electronics store"
    }
  }'

Managing Tenants After Provisioning

Add More Agents

curl -X POST http://localhost:3000/api/admin/tenants/{tenant_id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "custom-summarizer",
    "agent_type": "summarizer",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You summarize financial reports concisely.",
      "tools": [],
      "max_tokens": 2048
    }
  }'

Issue Additional API Keys

curl -X POST http://localhost:3000/api/admin/tenants/{tenant_id}/api-keys \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"name": "staging-key"}'

Upgrade a Tenant's Tier

curl -X PUT http://localhost:3000/api/admin/tenants/{tenant_id}/quota \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"tier": "enterprise"}'

Monitor Usage

# Current period summary
curl http://localhost:3000/api/admin/tenants/{tenant_id}/usage \
  -H "X-Admin-Secret: your-admin-secret"

# Daily breakdown for the last 30 days
curl "http://localhost:3000/api/admin/tenants/{tenant_id}/usage/daily?days=30" \
  -H "X-Admin-Secret: your-admin-secret"

Architecture Notes

Shared infrastructure: All tenants run on the same ARES instance and database. Isolation is logical, not physical. This keeps operational costs low for the MVP phase.
Atomic provisioning: The provisioning endpoint uses a database transaction. If agent template cloning fails halfway through, the tenant and any partially created resources are rolled back.
Key hashing: API keys are hashed before storage. The raw key is returned exactly once during creation. Lost keys must be revoked and replaced.
Auto-migration: ARES runs database migrations on startup (sqlx::migrate!()). New tenant-related schema changes are applied automatically when the server restarts.
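The store-only-a-hash approach can be sketched with Python's standard library. This document does not say which hash ARES uses, so SHA-256 here is an assumption. The key length and the 9-character display prefix are also assumptions, chosen to match examples like ares_a1b2. For long random tokens, a fast digest plus constant-time comparison is generally considered sufficient; a slow password hash is not needed.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> tuple:
    """Return (raw_key, prefix, key_hash). Only prefix and key_hash are stored."""
    raw_key = "ares_" + secrets.token_hex(16)  # 32 hex characters of randomness
    prefix = raw_key[:9]                        # e.g. "ares_a1b2", safe to display later
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, prefix, key_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```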

Rate Limits and Quotas

ARES enforces two independent layers of rate limiting to protect the platform and ensure fair resource allocation across tenants.

Layer 1: IP-Based Rate Limiting

Every incoming request is subject to per-IP rate limiting via tower_governor. This layer protects against abuse, brute-force attacks, and accidental request floods regardless of authentication status.

IP-based limits apply to all routes, including unauthenticated endpoints like /health. The specific thresholds are configured server-side and are intentionally generous for normal usage patterns.

If you hit the IP rate limit, you will receive a 429 Too Many Requests response. Back off and retry after a short delay.

Layer 2: Tenant Quotas

Authenticated requests to /v1/* are also subject to tenant-level quotas based on the tenant's tier. Monthly quotas reset at the start of each calendar month (UTC). The daily rate limit resets at the start of each UTC day.

Tier	Monthly Requests	Monthly Tokens	Daily Rate Limit
Free	1,000	100,000	100/day
Dev	10,000	1,000,000	1,000/day
Pro	100,000	10,000,000	10,000/day
Enterprise	Unlimited	Unlimited	Unlimited

What Counts as a Request

Each API call to a metered endpoint counts as one request:

POST /v1/agents/{name}/run — 1 request
POST /v1/chat — 1 request
POST /v1/chat/stream — 1 request
GET /v1/agents — 1 request

Read-only endpoints such as GET /v1/usage and GET /v1/api-keys are also metered and count toward the request total.

What Counts as Tokens

Token usage is tracked per request based on the combined input and output token count from the LLM provider. Both the prompt tokens and completion tokens are summed.

Response Headers

When you make a request to a metered endpoint, ARES includes rate limit information in the response headers:

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed in the current period
`X-RateLimit-Remaining`	Requests remaining in the current period
`X-RateLimit-Reset`	UTC timestamp when the current period resets
`X-Quota-Tokens-Remaining`	Tokens remaining in the current monthly period

Example headers:

X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 7482
X-RateLimit-Reset: 2026-04-01T00:00:00Z
X-Quota-Tokens-Remaining: 8241037
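Clients can read these headers after every call instead of polling /v1/usage. The sketch below parses them from any mapping, such as `response.headers` in requests. `RateLimitInfo` is a hypothetical name. It assumes integer header values as in the example above. X-RateLimit-Reset has a trailing Z, which `datetime.fromisoformat` only accepts natively from Python 3.11, so the code rewrites it to +00:00 first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[datetime]
    tokens_remaining: Optional[int]


def _int_or_none(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract ARES rate-limit headers; missing headers become None."""
    reset_raw = headers.get("X-RateLimit-Reset")
    reset = (
        datetime.fromisoformat(reset_raw.replace("Z", "+00:00")) if reset_raw else None
    )
    return RateLimitInfo(
        limit=_int_or_none(headers.get("X-RateLimit-Limit")),
        remaining=_int_or_none(headers.get("X-RateLimit-Remaining")),
        reset=reset,
        tokens_remaining=_int_or_none(headers.get("X-Quota-Tokens-Remaining")),
    )
```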

Exceeding Limits

When you exceed either rate limit layer, ARES returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "Rate limit exceeded. Daily request limit reached for your tier."
}

The error message indicates which limit was hit:

Error Message	Cause	Resolution
`Rate limit exceeded`	IP-based rate limit	Wait and retry. Reduce request frequency.
`Daily request limit reached for your tier`	Tenant daily cap	Wait until the next UTC day, or upgrade your tier.
`Monthly request quota exceeded`	Tenant monthly cap	Wait until the next billing period, or upgrade.
`Monthly token quota exceeded`	Tenant token cap	Wait until the next billing period, or upgrade.
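All four cases return HTTP 429, so a client has to inspect the message to decide whether a short retry makes sense. The sketch below matches on the messages in the table above. The daily-limit example earlier also begins with "Rate limit exceeded", so the more specific phrases are checked first. Matching on message text is fragile, and the function name and return values are hypothetical.

```python
def classify_429(message: str) -> str:
    """Return "retry_soon", "wait_for_next_day", "wait_for_next_period", or "unknown"."""
    text = message.lower()
    # Check tenant-specific phrases before the generic IP-limit phrase.
    if "daily request limit" in text:
        return "wait_for_next_day"
    if "monthly request quota" in text or "monthly token quota" in text:
        return "wait_for_next_period"
    if "rate limit exceeded" in text:
        return "retry_soon"
    return "unknown"
```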

Checking Your Usage

You can proactively monitor your consumption to avoid hitting limits:

curl http://localhost:3000/v1/usage \
  -H "Authorization: Bearer ares_xxx"

Response:

{
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "total_api_calls": 5290,
  "quota_runs": 100000,
  "quota_tokens": 10000000,
  "daily_usage": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 }
  ]
}

Compare total_runs against quota_runs and total_tokens against quota_tokens to see how much headroom you have.
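With the figures above, for example, the tenant has used about 4.8% of its run quota (4,821 / 100,000) and about 28.5% of its token quota (2,847,193 / 10,000,000). The sketch below performs that calculation and flags anything past a warning threshold. It treats a missing or null quota as unlimited, which is an assumption, because the response for Enterprise tenants is not shown.

```python
from typing import Optional


def usage_fraction(used: int, quota: Optional[int]) -> Optional[float]:
    """Fraction of a quota consumed, or None if no quota is reported (treated as unlimited)."""
    if quota is None:
        return None
    return used / quota


def headroom_report(usage: dict, warn_at: float = 0.8) -> dict:
    """Summarize run and token consumption and flag anything at or above warn_at."""
    report = {}
    for kind in ("runs", "tokens"):
        fraction = usage_fraction(usage[f"total_{kind}"], usage.get(f"quota_{kind}"))
        report[kind] = {
            "fraction": fraction,
            "warn": fraction is not None and fraction >= warn_at,
        }
    return report
```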

Best Practices

Monitor usage proactively. Poll GET /v1/usage periodically rather than waiting for 429 errors.
Implement exponential backoff. When you receive a 429, wait before retrying. A simple strategy: wait 1s, then 2s, then 4s, up to a maximum of 30s.
Cache where possible. Agent listings and model metadata change infrequently. Cache these responses to reduce unnecessary API calls.
Use streaming for chat. POST /v1/chat/stream counts as a single request regardless of response length, the same as POST /v1/chat, so you get token-by-token output at no extra request cost.
Request a tier upgrade early. If you anticipate hitting your quota before month-end, contact your platform administrator to upgrade your tier. Tier changes take effect immediately.
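The backoff schedule from the list above (1s, 2s, 4s, and so on, capped at 30s) can be wrapped around any request function. In the sketch below, `call` is any zero-argument function that returns an object with a `status_code` attribute, such as a requests call wrapped in a lambda. The limit of 6 attempts is an arbitrary choice.

```python
import time
from typing import Callable, List


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays between attempts: base, 2*base, 4*base, ... never exceeding cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def with_backoff(call: Callable, max_attempts: int = 6, sleep: Callable = time.sleep):
    """Invoke call() until it returns a non-429 response or attempts run out."""
    response = call()
    for delay in backoff_delays(max_attempts - 1):
        if response.status_code != 429:
            break
        sleep(delay)
        response = call()
    return response  # may still be a 429 if every attempt was rate-limited
```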

Loop Detection & Safety

ARES includes built-in safety mechanisms to prevent agents from getting stuck in infinite loops or crashing mid-execution.

Loop Detection

The LoopDetector monitors agent tool-calling conversations for repetitive patterns using a sliding-window hash approach.

How It Works

Each agent response is hashed (after whitespace normalization)
Hashes are stored in a sliding window (configurable size, default 10)
When duplicate hashes exceed a threshold, a loop is detected
The detector escalates through 3 tiers of intervention

Escalation Tiers

Tier	Action	Description
1	`InjectWarning`	Adds a system message warning the agent it's repeating itself
2	`ForceAlternative`	Forces the agent to take a different approach
3	`HaltAgent`	Stops the agent entirely and returns an error to the caller
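The steps above can be illustrated with a small Python model. It is not ARES's Rust LoopDetector. Three details are assumptions made for illustration: the SHA-256 digest, detecting a loop when a hash's count in the window reaches threshold, and escalating one tier per repeated detection. The defaults mirror the LoopDetectorConfig values shown below.

```python
import hashlib
from collections import deque
from enum import Enum


class Action(Enum):
    NONE = 0
    INJECT_WARNING = 1     # tier 1
    FORCE_ALTERNATIVE = 2  # tier 2
    HALT_AGENT = 3         # tier 3


class LoopDetectorSketch:
    """Illustrative sliding-window duplicate detector (not the ARES implementation)."""

    def __init__(self, window_size: int = 10, threshold: int = 3,
                 min_response_length: int = 20):
        self.window = deque(maxlen=window_size)  # oldest hash drops off automatically
        self.threshold = threshold
        self.min_response_length = min_response_length
        self.detections = 0

    def check(self, response: str) -> Action:
        # Normalize whitespace so reformatted repeats hash identically.
        normalized = " ".join(response.split())
        if len(normalized) < self.min_response_length:
            return Action.NONE
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self.window.append(digest)
        if self.window.count(digest) < self.threshold:
            return Action.NONE
        # Each further detection escalates one tier, stopping at HALT_AGENT.
        self.detections += 1
        return Action(min(self.detections, 3))
```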

Configuration

use ares::agents::loop_detector::{LoopDetector, LoopDetectorConfig};

let config = LoopDetectorConfig {
    window_size: 10,         // Number of recent responses to track
    threshold: 3,            // Duplicates before triggering
    min_response_length: 20, // Ignore very short responses
};

let mut detector = LoopDetector::new(config);

Usage in Agents

Loop detection is automatically applied during multi-turn tool-calling conversations. The ConfigurableAgent checks the detector after each response.

Crash Recovery

The CheckpointManager provides state serialization for long-running agent tasks.

Checkpoints

use ares::agents::checkpoint::{CheckpointManager, Checkpoint};

let manager = CheckpointManager::new("/data/checkpoints");

// Save a checkpoint
let checkpoint = Checkpoint {
    session_id: "session-123".to_string(),
    step: 5,
    messages: vec![/* conversation history */],
    tool_calls: vec![/* pending tool calls */],
    partial_results: vec![/* results so far */],
    status: "in_progress".to_string(),
};
manager.save(&checkpoint)?;

// Resume from the latest checkpoint
if let Some(restored) = manager.load_latest("session-123")? {
    // Continue from where we left off
}

Cleanup

Old checkpoints are cleaned up based on age by calling cleanup:

use std::time::Duration;

// Remove checkpoints older than 24 hours
manager.cleanup(Duration::from_secs(86400))?;
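Here is a Python sketch of a file-based checkpoint store that makes the save, resume, and cleanup cycle concrete. It is not ARES's CheckpointManager. The following details are assumptions:

- one folder per session
- one zero-padded step-NNNNNN.json file per checkpoint
- JSON encoding
- cleanup based on modification time
- session IDs that are trusted and safe to use as directory names

Each checkpoint is written to a temporary file and then moved into place with `os.replace`, which is atomic on POSIX filesystems. As a result, a process crash mid-write never leaves a truncated checkpoint behind.

```python
import json
import os
import tempfile
import time
from pathlib import Path


class JsonCheckpointStore:
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, checkpoint: dict) -> Path:
        session_dir = self.root / checkpoint["session_id"]
        session_dir.mkdir(parents=True, exist_ok=True)
        path = session_dir / f"step-{checkpoint['step']:06d}.json"
        # Write to a temp file in the same directory, then atomically rename it.
        fd, tmp = tempfile.mkstemp(dir=session_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(checkpoint, f)
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise
        return path

    def load_latest(self, session_id: str):
        session_dir = self.root / session_id
        # Zero-padded step numbers make lexical order match numeric order.
        files = sorted(session_dir.glob("step-*.json")) if session_dir.exists() else []
        if not files:
            return None
        return json.loads(files[-1].read_text(encoding="utf-8"))

    def cleanup(self, max_age_seconds: float) -> int:
        """Delete checkpoints last modified more than max_age_seconds ago; return the count."""
        cutoff = time.time() - max_age_seconds
        removed = 0
        for path in self.root.glob("*/step-*.json"):
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed += 1
        return removed
```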

Emergency Stop

The emergency stop is a global kill switch that immediately rejects all agent requests with HTTP 503.

# Activate emergency stop
curl -X POST http://localhost:3000/api/admin/agents/emergency-stop \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"active": true}'

# Deactivate
curl -X POST http://localhost:3000/api/admin/agents/emergency-stop \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"active": false}'

When active, all /api/chat, /api/chat/stream, /v1/chat, and agent execution endpoints return:

{
  "error": "Emergency stop is active. All agent requests are suspended.",
  "code": "EMERGENCY_STOP"
}

Error Handling

ARES uses conventional HTTP status codes and a consistent JSON error format across all endpoints. This page documents the error response structure, status code meanings, and common errors with their solutions.

Error Response Format

All errors return a JSON object with an error field containing a human-readable message:

{
  "error": "Human-readable error message"
}

The HTTP status code indicates the category of error. The error string provides specific details about what went wrong.

HTTP Status Codes

Success Codes

Code	Meaning	When Used
`200`	OK	Successful read or update operation
`201`	Created	Resource successfully created (tenant, agent, API key)
`204`	No Content	Successful delete with no response body

Client Error Codes

Code	Meaning	When Used
`400`	Bad Request	Malformed JSON, missing required fields, invalid parameter types
`401`	Unauthorized	Missing or invalid authentication credentials
`403`	Forbidden	Valid credentials but insufficient permissions for this operation
`404`	Not Found	Resource does not exist, or does not belong to your tenant
`409`	Conflict	Resource already exists (e.g., duplicate tenant name or agent name)
`422`	Unprocessable Entity	Request is well-formed but contains invalid values (e.g., unknown tier, invalid model name)
`429`	Too Many Requests	Rate limit or quota exceeded

Server Error Codes

Code	Meaning	When Used
`500`	Internal Server Error	Unexpected server-side failure
`503`	Service Unavailable	Emergency stop is active; all agent requests are suspended

Common Errors and Solutions

Authentication Errors

Missing API key:

HTTP 401
{"error": "Missing authorization header"}

Add the Authorization: Bearer ares_xxx header to your request.

Invalid API key:

HTTP 401
{"error": "Invalid API key"}

Verify that the API key is correct and has not been revoked. API keys start with ares_.

Missing admin secret:

HTTP 401
{"error": "Missing X-Admin-Secret header"}

Admin endpoints require the X-Admin-Secret header, not the Authorization header.

Invalid admin secret:

HTTP 401
{"error": "Invalid admin secret"}

Verify that the admin secret matches the one configured on the server (the API_KEY environment variable; see Self-Hosting).

Resource Errors

Agent not found:

HTTP 404
{"error": "Agent not found: risk-analyzer"}

The agent does not exist for your tenant. Check the agent name with GET /v1/agents. Agent names are case-sensitive.

Tenant not found:

HTTP 404
{"error": "Tenant not found"}

The tenant ID does not exist. List tenants with GET /api/admin/tenants to find the correct ID.

Duplicate resource:

HTTP 409
{"error": "Agent with name 'risk-analyzer' already exists for this tenant"}

An agent with this name already exists. Use a different name or update the existing agent.

Validation Errors

Invalid tier:

HTTP 422
{"error": "Invalid tier: 'gold'. Valid tiers: free, dev, pro, enterprise"}

Use one of the supported tier values.

Missing required field:

HTTP 400
{"error": "Missing required field: name"}

Include all required fields in your request body. Refer to the API documentation for the specific endpoint.

Invalid JSON:

HTTP 400
{"error": "Invalid JSON in request body"}

Ensure your request body is valid JSON. Check for trailing commas, unquoted keys, or mismatched brackets. Verify the Content-Type: application/json header is set.

Rate Limit Errors

Quota exceeded:

HTTP 429
{"error": "Monthly request quota exceeded"}

Your tenant has used all allocated requests for the current billing period. Wait until the period resets or contact your administrator to upgrade your tier.

Daily limit:

HTTP 429
{"error": "Daily request limit reached for your tier"}

Your tenant has hit the daily rate cap. Wait until the next UTC day or upgrade your tier.
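If a client should pause until the daily window resets, it needs the time remaining until the next UTC day. A minimal Python sketch using only the standard library, assuming the reset happens exactly at 00:00 UTC; the function name seconds_until_utc_midnight is illustrative, not part of ARES:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_utc_midnight(now: Optional[datetime] = None) -> float:
    """Seconds remaining until the next 00:00 UTC (assumed daily reset time)."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


# At 18:30 UTC there are 5.5 hours left in the day
example = datetime(2026, 1, 15, 18, 30, tzinfo=timezone.utc)
print(seconds_until_utc_midnight(example))  # 19800.0
```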

See Rate Limits and Quotas for details on limits by tier.

Server Errors

Internal server error:

HTTP 500
{"error": "Internal server error"}

An unexpected error occurred on the server. These are not caused by your request. If the error persists, check service health via GET /api/admin/services or inspect server logs.

Error Handling Best Practices

Always check the HTTP status code first. The status code tells you the error category before you parse the response body.
Parse the error message for user display. The error field is written to be human-readable and safe to show to end users.
Retry on 429 and 500. Rate limit errors (429) should be retried with exponential backoff. Server errors (500) may be transient — retry once or twice before treating as a permanent failure.
Do not retry on 400, 401, 403, 404, 409, or 422. These indicate problems with the request itself. Fix the request before retrying.
Log the full response. When debugging, log both the HTTP status code and the response body. The error message often contains the specific field or value that caused the problem.
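The retry guidance above can be sketched as a small Python policy. This is an illustration, not an ARES client library: the send callable, the retry budgets (five retries for 429, two for 500), and the 1-second base delay with a 30-second cap are assumptions to tune for your workload.

```python
import time
from typing import Callable, Optional, Tuple

# Retry budget per retryable status (assumed values); other statuses are never retried
MAX_RETRIES = {429: 5, 500: 2}


def retry_delay(status: int, attempt: int, base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to stop."""
    limit = MAX_RETRIES.get(status)
    if limit is None or attempt >= limit:
        return None  # 400/401/403/404/409/422: fix the request instead of retrying
    return min(cap, base * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...


def call_with_retries(
    send: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it succeeds or the retry policy gives up."""
    attempt = 0
    while True:
        status, body = send()
        if status < 400:
            return status, body
        delay = retry_delay(status, attempt)
        if delay is None:
            return status, body
        sleep(delay)
        attempt += 1
```

In practice, send would wrap a requests.post call and return response.status_code together with the parsed body. Adding random jitter to each delay helps avoid synchronized retries when many clients are throttled at once.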

Example: Robust Error Handling (Python)

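import requests


class APIError(Exception):
    """Base class for errors returned by the ARES API."""

class AuthenticationError(APIError): pass
class AgentNotFoundError(APIError): pass
class RateLimitError(APIError): pass
class ServerError(APIError): pass


def run_agent(api_key, agent_name, input_data):
    response = requests.post(
        f"http://localhost:3000/v1/agents/{agent_name}/run",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"input": input_data},
        timeout=60,  # never wait forever on a hung connection
    )

    if response.status_code == 200:
        return response.json()

    # Fall back to the raw body if the error response is not JSON (e.g. a proxy error page)
    try:
        error = response.json().get("error", "Unknown error")
    except ValueError:
        error = response.text or "Unknown error"

    if response.status_code == 401:
        raise AuthenticationError(f"Authentication failed: {error}")
    elif response.status_code == 404:
        raise AgentNotFoundError(f"Agent '{agent_name}' not found: {error}")
    elif response.status_code == 429:
        raise RateLimitError(f"Rate limited: {error}")
    elif response.status_code >= 500:
        raise ServerError(f"Server error: {error}")
    else:
        raise APIError(f"API error ({response.status_code}): {error}")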

Example: Robust Error Handling (JavaScript)

async function runAgent(apiKey, agentName, inputData) {
  const response = await fetch(
    `http://localhost:3000/v1/agents/${encodeURIComponent(agentName)}/run`,
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ input: inputData }),
    }
  );

  if (response.ok) {
    return await response.json();
  }

  // Fall back to a generic message if the body is not JSON (e.g. a proxy error page)
  const { error = "Unknown error" } = await response.json().catch(() => ({}));

  switch (response.status) {
    case 401: throw new Error(`Authentication failed: ${error}`);
    case 404: throw new Error(`Agent '${agentName}' not found: ${error}`);
    case 429: throw new Error(`Rate limited: ${error}`);
    default:  throw new Error(`API error (${response.status}): ${error}`);
  }
}

Self-Hosting

Run your own ARES instance on your infrastructure. This guide covers local development setup, production deployment, and configuration options.

Prerequisites

Requirement	Minimum Version	Notes
Rust	1.91+	Install via rustup
PostgreSQL	15+	Used for tenants, agents, usage tracking
Git	2.x	For cloning the repository

Optional, depending on your provider configuration:

Requirement	When Needed
Groq API key	Using Groq as an LLM provider
Anthropic API key	Using Anthropic as an LLM provider
NVIDIA API key	Using NVIDIA-hosted DeepSeek models
Ollama	Running local models

Quick Start

1. Clone the Repository

git clone https://github.com/dirmacs/ares
cd ares

2. Set Up the Database

Create a PostgreSQL database for ARES:

createdb ares

ARES runs migrations automatically on startup. No manual schema setup is required.

3. Create Configuration

Copy the example config and customize it:

cp ares.example.toml ares.toml

Edit ares.toml to configure your providers and models. At minimum, you need one LLM provider:

[server]
port = 3000

[database]
url = "postgres://localhost/ares"

[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"

[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072

4. Set Environment Variables

export DATABASE_URL="postgres://localhost/ares"
export JWT_SECRET="your-secret-key-at-least-32-characters-long"
export API_KEY="your-admin-api-secret"
export GROQ_API_KEY="gsk_..."

Variable	Required	Description
`DATABASE_URL`	Yes	PostgreSQL connection string
`JWT_SECRET`	Yes	Secret for signing JWT tokens (32+ characters)
`API_KEY`	Yes	Admin secret for `/api/admin/*` endpoints
`GROQ_API_KEY`	If using Groq	Groq API key
`ANTHROPIC_API_KEY`	If using Anthropic	Anthropic API key
`NVIDIA_API_KEY`	If using NVIDIA	NVIDIA API key

5. Build

cargo build --release --features openai,postgres,mcp

See Feature Flags for all available options.

6. Run

./target/release/ares-server

7. Verify

curl http://localhost:3000/health

You should receive a 200 OK response. ARES is running.

Feature Flags

ARES uses Cargo feature flags to control which capabilities are compiled into the binary. This keeps the binary lean — only include what you need.

Feature	Default	Description
`openai`	Yes	OpenAI-compatible provider support (also used for Groq, NVIDIA)
`anthropic`	No	Anthropic Claude provider support
`ollama`	No	Local Ollama model support
`postgres`	Yes	PostgreSQL database backend
`mcp`	No	Model Context Protocol support for external tool servers
`ares-vector`	No	Vector storage and semantic search

Build Examples

Minimal build (Groq only):

cargo build --release --no-default-features --features openai,postgres

Full build (all providers and optional features):

cargo build --release --features openai,anthropic,ollama,postgres,mcp,ares-vector

Production build (recommended for VPS deployment):

cargo build --release --no-default-features --features openai,postgres,mcp

Production Deployment

systemd Service

Create a systemd unit file at /etc/systemd/system/ares.service:

[Unit]
Description=ARES AI Agent Platform
After=network.target postgresql.service
Wants=postgresql.service

[Service]
Type=simple
User=ares
Group=ares
WorkingDirectory=/opt/ares
ExecStart=/opt/ares/target/release/ares-server
Restart=on-failure
RestartSec=5
Environment=DATABASE_URL=postgres://ares:strong-password-here@localhost/ares
Environment=JWT_SECRET=your-production-jwt-secret
Environment=API_KEY=your-admin-secret
Environment=GROQ_API_KEY=gsk_...
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable ares
sudo systemctl start ares
sudo systemctl status ares

View logs:

journalctl -u ares -f

Caddy Reverse Proxy

Caddy provides automatic HTTPS with Let's Encrypt. Create a Caddyfile (for the packaged Caddy service, this is /etc/caddy/Caddyfile):

api.ares.yourdomain.com {
    reverse_proxy localhost:3000
}

Start Caddy (if it is already running, apply the new Caddyfile with sudo systemctl reload caddy instead):

sudo systemctl enable caddy
sudo systemctl start caddy

Caddy automatically provisions and renews TLS certificates. No manual certificate management is needed.

PostgreSQL Setup

For production, create a dedicated database user:

CREATE USER ares WITH PASSWORD 'strong-password-here';
CREATE DATABASE ares OWNER ares;

Update your DATABASE_URL accordingly:

DATABASE_URL=postgres://ares:strong-password-here@localhost/ares

Configuration Reference

The ares.toml file is the primary configuration file. It controls server settings, providers, models, and agent definitions.

Server Section

[server]
port = 3000          # HTTP port (overrides PORT env var)
host = "0.0.0.0"     # Bind address

Database Section

[database]
url = "postgres://ares:password@localhost/ares"
max_connections = 10

Provider Section

Each provider is defined as a [[providers]] entry:

[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"

[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072

[[providers.models]]
id = "llama-3.1-8b-instant"
name = "llama-3.1-8b"
context_length = 131072

[[providers]]
name = "anthropic"
type = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"

[[providers.models]]
id = "claude-3-5-sonnet-20241022"
name = "claude-3.5-sonnet"
context_length = 200000

[[providers]]
name = "local"
type = "ollama"
base_url = "http://localhost:11434"

[[providers.models]]
id = "mistral"
name = "mistral-7b"
context_length = 32768

Agent Section

Static agents can be defined in the config file:

[[agents]]
name = "general-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a helpful assistant."
tools = ["calculator", "web_search"]
max_tokens = 4096

For tenant-specific agents, use the Admin API instead of config file definitions.

Updating

To update a running ARES instance:

cd /opt/ares
git pull origin main
cargo build --release --no-default-features --features openai,postgres,mcp
sudo systemctl restart ares

Database migrations run automatically on startup. No manual migration steps are needed.

Troubleshooting

Port already in use:

Error: Address already in use (os error 98)

Another process is using port 3000. Either stop it or change the port in ares.toml.

Database connection failed:

Error: error communicating with database

Verify PostgreSQL is running and your DATABASE_URL is correct. Check that the database user has permissions on the database.

Provider API key missing:

Error: Environment variable GROQ_API_KEY not set

Set the required API key environment variable, or remove the provider from ares.toml if you do not need it.

JWT secret too short:

Error: JWT_SECRET must be at least 32 characters

Use a longer secret. Generate one with: openssl rand -hex 32
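If openssl is not available, Python's standard-library secrets module produces an equivalent value: 32 random bytes encoded as 64 hexadecimal characters.

```python
import secrets

# 32 random bytes, hex-encoded: 64 characters, above the 32-character minimum
jwt_secret = secrets.token_hex(32)
print(f'export JWT_SECRET="{jwt_secret}"')
```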

ContextProvider Trait

ARES provides a ContextProvider trait that lets extension crates inject external context into every agent call before LLM invocation.

How It Works

Before every LLM call, ARES checks state.context_provider.get_context(agent_name, tenant_id). If it returns Some(context), the context is prepended to the agent's system prompt.

By default, ARES uses NoOpContextProvider which returns None — agents run with their configured system prompt only.

Implementing Your Own

use ares::agents::context_provider::ContextProvider;
use async_trait::async_trait;

struct MyKnowledgeProvider {
    api_url: String,
}

#[async_trait]
impl ContextProvider for MyKnowledgeProvider {
    async fn get_context(
        &self,
        agent_name: &str,
        tenant_id: &str,
    ) -> Option<String> {
        // Fetch relevant context from your knowledge base.
        // Any failure (network error, non-2xx status, unreadable body) yields None,
        // so the agent runs with its configured system prompt only.
        let url = format!("{}/context/{}/{}", self.api_url, tenant_id, agent_name);
        let response = reqwest::get(&url).await.ok()?.error_for_status().ok()?;
        let context = response.text().await.ok()?;
        // Treat an empty body as "no context available"
        (!context.trim().is_empty()).then_some(context)
    }
}

Wiring Into AppState

use std::sync::Arc;

let state = AppState {
    context_provider: Arc::new(MyKnowledgeProvider {
        api_url: "http://localhost:8081".to_string(),
    }),
    // ... other fields
};

Use Cases

Knowledge base injection — fetch relevant docs per agent and tenant
User preference injection — personalize agent behavior based on user history
Compliance constraints — inject regulatory rules into agent prompts
RAG augmentation — supplement the built-in RAG with external retrieval

Building on base_router()

ARES exports base_router(state) which returns a fully configured Axum router with all generic endpoints. Extension crates can build managed platforms by merging additional routes on top.

Pattern

use ares::{base_router, AppState};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build AppState (config, DB, LLM, tools, agents); build_my_state is your own function
    let state: AppState = build_my_state().await?;

    // Start with all ARES generic routes, then add your own on top.
    // my_routes is your own function returning an axum Router.
    let app = base_router(state.clone())
        .nest("/v1/my-feature", my_routes(state));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
    axum::serve(listener, app).await?;
    Ok(())
}

What base_router() Includes

Route Group	Endpoints
Auth	`/api/auth/register`, `/api/auth/login`, `/api/auth/refresh`, `/api/auth/logout`
Chat	`/api/chat`, `/api/chat/stream`
Agents	`/api/agents`
Research	`/api/research`
Workflows	`/api/workflows`, `/api/workflows/{name}`
User Agents	`/api/user/agents/*`
Conversations	`/api/conversations/*`
Admin	`/api/admin/tenants/`, `/api/admin/agents/`, `/api/admin/deploy/*`
V1 (API Key)	`/api/v1/chat`, `/api/v1/agents/*`, `/api/v1/usage`
RAG	`/api/rag/ingest`, `/api/rag/search` (requires `local-embeddings` + `ares-vector` features)

Registering Custom Tools

use std::sync::Arc;

let mut tool_registry = ToolRegistry::with_config(&config);

// Built-in tools
tool_registry.register(Arc::new(ares::tools::calculator::Calculator));

// Your custom tools
tool_registry.register(Arc::new(MyCustomTool::new()));

Adding Middleware

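// my_auth_middleware and my_logging_middleware are your own layers.
// Each .layer() wraps everything added before it, so the last layer
// (logging here) sees each request first.
let app = base_router(state.clone())
    .layer(my_auth_middleware())
    .layer(my_logging_middleware());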

Guide: Build a Chat Agent

This guide walks you through creating a custom chat agent on ARES — from defining its behavior to testing it in production.

What is an Agent?

An ARES agent is a configured LLM endpoint with a specific personality, instructions, and tool access. Each agent has:

A name — unique identifier used in API calls
A model — which LLM powers it (e.g., llama-3.3-70b, claude-3.5-sonnet)
A system prompt — instructions that define the agent's behavior
Tools — optional capabilities like calculator or web_search
Configuration — max tokens, temperature, and other parameters

You can create agents in two ways: via the configuration file or via the API.

Option 1: Define in ares.toml

For agents that are part of your core platform, define them in the ares.toml configuration file:

[[agents]]
name = "financial-analyst"
model = "llama-3.3-70b"
system_prompt = """
You are a senior financial analyst. You help users understand financial data,
calculate metrics, and provide clear explanations of financial concepts.

Guidelines:
- Always show your calculations step by step
- Use the calculator tool for arithmetic to ensure accuracy
- Present numbers with appropriate formatting (commas, decimal places)
- When uncertain, clearly state your assumptions
"""
tools = ["calculator"]
max_tokens = 4096

Restart ARES to load the new agent. Once the server is back up, the agent is available at /api/chat with agent_type set to "financial-analyst".

TOON Config Format

ARES also supports the TOON configuration format for more structured agent definitions:

[[agents]]
name = "support-agent"
model = "llama-3.3-70b"

[agents.toon]
role = "Customer Support Specialist"
personality = "Professional, empathetic, solution-oriented"
knowledge = ["product documentation", "pricing plans", "common issues"]
constraints = [
    "Never make up information about products",
    "Escalate billing disputes to human agents",
    "Always confirm the customer's issue before proposing a solution",
]
tools = ["web_search"]

The TOON format structures the system prompt into semantic fields that ARES assembles into a coherent prompt. This makes agent behavior easier to reason about and modify.
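ARES's exact assembly template is not documented here. As a rough illustration only, a hypothetical assemble_toon_prompt function might flatten the fields into a single system prompt like this (the wording and field order are assumptions; the tools field is left out because it controls tool access rather than prompt text):

```python
def assemble_toon_prompt(toon: dict) -> str:
    """Hypothetical: flatten TOON-style fields into one system prompt string."""
    lines = []
    if toon.get("role"):
        lines.append(f"You are a {toon['role']}.")
    if toon.get("personality"):
        lines.append(f"Personality: {toon['personality']}.")
    if toon.get("knowledge"):
        lines.append("Knowledge areas: " + ", ".join(toon["knowledge"]) + ".")
    if toon.get("constraints"):
        lines.append("Constraints:")
        lines.extend(f"- {rule}" for rule in toon["constraints"])
    return "\n".join(lines)


support_agent = {
    "role": "Customer Support Specialist",
    "personality": "Professional, empathetic, solution-oriented",
    "knowledge": ["product documentation", "pricing plans", "common issues"],
    "constraints": [
        "Never make up information about products",
        "Escalate billing disputes to human agents",
        "Always confirm the customer's issue before proposing a solution",
    ],
}
print(assemble_toon_prompt(support_agent))
```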

Option 2: Create via API

For tenant-specific agents or agents you want to manage programmatically, use the API.

As a Platform Admin

curl -X POST http://localhost:3000/api/admin/tenants/{tenant_id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "financial-analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a senior financial analyst...",
      "tools": ["calculator"],
      "max_tokens": 4096
    }
  }'

As an Authenticated User

curl -X POST http://localhost:3000/api/user/agents \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a senior financial analyst...",
      "tools": ["calculator"],
      "max_tokens": 4096
    }
  }'

Testing Your Agent

Basic Chat

Send a message to your agent:

curl -X POST http://localhost:3000/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the compound annual growth rate if revenue went from $1M to $1.8M over 3 years?",
    "agent_type": "financial-analyst"
  }'

Expected response:

{
  "response": "To calculate the Compound Annual Growth Rate (CAGR):\n\nCAGR = (Ending Value / Beginning Value)^(1/n) - 1\nCAGR = ($1,800,000 / $1,000,000)^(1/3) - 1\nCAGR = (1.8)^(0.3333) - 1\nCAGR = 1.2164 - 1\nCAGR = 0.2164\n\n**The CAGR is 21.64%.**\n\nThis means revenue grew at an average annual rate of approximately 21.6% over the 3-year period.",
  "agent": "financial-analyst",
  "context_id": "ctx_abc123"
}
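To check the agent's arithmetic independently, the same CAGR formula can be reproduced in a few lines of Python (cagr is an illustrative helper, not part of ARES):

```python
def cagr(begin: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end / begin) ** (1 / years) - 1."""
    return (end / begin) ** (1 / years) - 1


rate = cagr(1_000_000, 1_800_000, 3)
print(f"{rate:.2%}")  # 21.64%
```

Calling cagr(1_000_000, 1_800_000, 5) answers the follow-up question in the multi-turn example: roughly 12.47% per year.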

Multi-Turn Conversation

Pass the context_id from the previous response to continue the conversation. ARES manages history server-side:

curl -X POST http://localhost:3000/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What if the period was 5 years instead?",
    "agent_type": "financial-analyst",
    "context_id": "ctx_abc123"
  }'
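On the client side, the only conversation state you need to keep is the most recent context_id. A minimal Python sketch of a hypothetical ChatSession helper; the send callable is an assumption standing in for an HTTP POST to /api/chat with your JWT, and the response fields follow the example response above:

```python
from typing import Callable, Optional


class ChatSession:
    """Hypothetical helper that threads context_id through /api/chat calls."""

    def __init__(self, agent_type: str, send: Callable[[dict], dict]):
        self.agent_type = agent_type
        self.send = send  # posts a JSON body to /api/chat and returns the parsed reply
        self.context_id: Optional[str] = None

    def ask(self, message: str) -> str:
        body = {"message": message, "agent_type": self.agent_type}
        if self.context_id:
            body["context_id"] = self.context_id  # continue the server-side history
        reply = self.send(body)
        self.context_id = reply.get("context_id", self.context_id)
        return reply["response"]
```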

With Tool Usage

If your agent has tools enabled, ARES handles the tool calling loop automatically. You send a normal chat message, and the agent uses tools as needed:

curl -X POST http://localhost:3000/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Calculate 15% annual compound interest on $50,000 over 10 years",
    "agent_type": "financial-analyst"
  }'

The agent will internally call the calculator tool to compute 50000 * (1.15)^10 and return the formatted result.

Streaming

For real-time responses, use the streaming endpoint:

curl -X POST http://localhost:3000/api/chat/stream \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain the difference between NPV and IRR",
    "agent_type": "financial-analyst"
  }'

This returns a Server-Sent Events stream. See the V1 API docs for client-side streaming examples.
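A client consumes this stream by reading lines and grouping them into events. The following Python sketch follows the generic SSE wire format (data: lines, a blank line ends an event, lines starting with a colon are comments). It assumes nothing about what each ARES event payload contains, so decode the returned strings according to the V1 API docs.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the event collected so far
            if data:
                yield "\n".join(data)
            data = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format strips one leading space
        if field == "data":
            data.append(value)
        # event, id, and retry fields are ignored in this sketch
    # Per the SSE format, an event not terminated by a blank line is discarded
```

With the requests library, you could pass response.iter_lines(decode_unicode=True) from a requests.post(..., stream=True) call as the lines argument.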

Iterating on the System Prompt

The system prompt is the most important part of your agent. Here are practical guidelines:

Be Specific About Format

Bad:

You are a helpful assistant.

Good:

You are a financial analyst. When presenting calculations:
- Show each step on its own line
- Use the calculator tool for all arithmetic
- Format currency with $ and commas
- Round percentages to 2 decimal places
- End with a bold summary line

Define Boundaries

Tell the agent what it should not do:

Constraints:
- Never provide specific investment advice or recommend buying/selling securities
- If asked about tax implications, recommend consulting a tax professional
- Do not speculate about future market movements
- If you don't have enough data to answer accurately, say so

Include Examples

For complex formatting requirements, show the agent what you want:

When comparing metrics, use this format:

| Metric | 2024 | 2025 | Change |
|--------|------|------|--------|
| Revenue | $1.2M | $1.8M | +50% |
| EBITDA | $300K | $480K | +60% |

Test Edge Cases

After writing your system prompt, test these scenarios:

Off-topic requests — Does the agent stay in character or helpfully redirect?
Ambiguous inputs — Does the agent ask for clarification?
Tool failures — Does the agent handle tool errors gracefully?
Long conversations — Does the agent maintain context over multiple turns?

Adding Tool Access

Agents can use built-in tools to extend their capabilities:

[[agents]]
name = "research-agent"
model = "llama-3.3-70b"
system_prompt = "You are a research agent with access to web search and calculation tools."
tools = ["calculator", "web_search"]

Available built-in tools:

Tool	Description
`calculator`	Evaluate mathematical expressions
`web_search`	Search the web for current information

See the Tool Calling guide for details on how tool execution works.

Choosing a Model

Different models have different strengths. Consider these factors when choosing:

Model	Provider	Best For
`llama-3.3-70b`	Groq	General-purpose, fast, good reasoning
`llama-3.1-8b`	Groq	Simple tasks, lowest latency
`deepseek-r1`	NVIDIA	Complex reasoning, chain-of-thought
`claude-3.5-sonnet`	Anthropic	Nuanced writing, careful analysis

Start with llama-3.3-70b for most use cases. It offers a strong balance of capability, speed, and cost. Move to a specialized model only if you have a specific need.

Check available models with:

curl http://localhost:3000/api/admin/models \
  -H "X-Admin-Secret: your-admin-secret"

Guide: Tool Calling

ARES supports tool calling (also known as function calling), allowing agents to use external tools during a conversation. When an agent needs to perform a calculation, search the web, or interact with an external system, it requests a tool call. ARES executes the tool and feeds the result back to the agent, which then incorporates it into its response.

How It Works

Tool calling in ARES follows a multi-turn loop managed by the ToolCoordinator:

User message
    |
    v
Agent (LLM) generates response
    |
    ├── If response is final text → return to user
    |
    └── If response contains tool_calls →
            |
            v
        ARES executes each tool
            |
            v
        Results sent back to agent
            |
            v
        Agent generates next response (may call more tools or return final text)

This loop continues until the agent produces a final text response or the maximum iteration limit is reached. The entire process is transparent to the caller — you send a chat message and receive a complete response.
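The loop in the diagram can be sketched in Python as follows. This is not the ToolCoordinator's implementation: call_model, the message dictionaries, and the closing prompt are hypothetical stand-ins, and tools run one after another here rather than concurrently.

```python
from typing import Callable, Dict, List

FINAL_PROMPT = "Answer now using the information gathered so far."


def run_tool_loop(
    call_model: Callable[[List[dict]], dict],
    tools: Dict[str, Callable[..., object]],
    messages: List[dict],
    max_iterations: int = 10,
) -> str:
    """Call the model, execute any requested tools, and repeat until it answers in text."""
    for _ in range(max_iterations):
        reply = call_model(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # final text response: return to the user
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            try:
                content = {"result": tools[call["name"]](**call["arguments"])}
            except Exception as exc:  # unknown tool, bad arguments, or tool failure
                content = {"error": str(exc)}
            messages.append({"role": "tool", "name": call["name"], "content": content})
    # Iteration limit reached: ask for a final answer from what has been gathered
    messages.append({"role": "user", "content": FINAL_PROMPT})
    return call_model(messages).get("content", "")
```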

Built-in Tools

ARES ships with two built-in tools:

calculator

Evaluates mathematical expressions and returns the result.

Capabilities:

Basic arithmetic: +, -, *, /
Exponents: ^ or **
Parentheses for grouping
Common functions: sqrt, sin, cos, log, ln, abs
Constants: pi, e
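To make the capability list concrete, here is a Python sketch of a safe evaluator covering the same operators, functions, and constants. It is not ARES's calculator implementation; in particular, treating log as base 10 and ln as the natural log is an assumption. It walks Python's ast module instead of calling eval, so anything outside the whitelist is rejected (it does not guard against very large exponents).

```python
import ast
import math
import operator

FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
             "log": math.log10, "ln": math.log, "abs": abs}  # log as base 10: assumption
CONSTANTS = {"pi": math.pi, "e": math.e}
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression: str) -> float:
    """Evaluate an arithmetic expression; ^ is accepted as an alias for **."""
    tree = ast.parse(expression.replace("^", "**"), mode="eval")

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in CONSTANTS:
            return CONSTANTS[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and len(node.args) == 1 and not node.keywords):
            return FUNCTIONS[node.func.id](walk(node.args[0]))
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return walk(tree)


print(round(evaluate("50000 * (1.15 ^ 10)"), 2))  # 202277.89
```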

Example tool call from agent:

{
  "name": "calculator",
  "arguments": {
    "expression": "50000 * (1.15 ^ 10)"
  }
}

Result returned to agent:

{
  "result": 202277.89
}

web_search

Searches the web and returns relevant results.

Example tool call from agent:

{
  "name": "web_search",
  "arguments": {
    "query": "current US federal interest rate 2026"
  }
}

Result returned to agent:

{
  "results": [
    {
      "title": "Federal Reserve holds rate at 4.25%",
      "url": "https://...",
      "snippet": "The Federal Reserve maintained its benchmark rate..."
    }
  ]
}

Configuring Tool Access

Per-Agent Tool Filtering

Each agent specifies which tools it can use. An agent without tools configured cannot make tool calls, even if the underlying model supports them.

In ares.toml:

[[agents]]
name = "research-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a research assistant with access to web search and calculation tools."
tools = ["calculator", "web_search"]

[[agents]]
name = "math-tutor"
model = "llama-3.3-70b"
system_prompt = "You are a math tutor. Use the calculator to verify your work."
tools = ["calculator"]

[[agents]]
name = "simple-chat"
model = "llama-3.3-70b"
system_prompt = "You are a conversational assistant."
tools = []

Via the API:

curl -X POST http://localhost:3000/api/admin/tenants/{id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a data analyst.",
      "tools": ["calculator", "web_search"],
      "max_tokens": 4096
    }
  }'

ToolCoordinator

The ToolCoordinator is the internal component that manages the tool calling loop. It handles:

Multi-turn orchestration — Sending tool results back to the model and processing follow-up tool calls
Parallel execution — When the model requests multiple tools in a single turn, they execute concurrently
Timeout enforcement — Individual tool calls are bounded by a configurable timeout
Iteration limits — Prevents infinite tool-calling loops

Configuration

Tool calling behavior is configured at the server level:

Setting	Default	Description
`max_iterations`	`10`	Maximum tool-calling rounds before forcing a text response
`parallel_execution`	`true`	Execute multiple tool calls concurrently within a single turn
`tool_timeout`	`30s`	Maximum time for a single tool execution

If an agent hits the iteration limit, ARES instructs the model to produce a final response using the information gathered so far.
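Parallel execution and per-tool timeouts can be illustrated with Python's asyncio. This is a sketch of the two behaviors described above, not the ToolCoordinator itself; the tool registry shape is hypothetical, and the error result mirrors the shape shown under Error Handling below.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def execute_tool_calls(
    calls: List[dict],
    tools: Dict[str, Callable[..., Awaitable[object]]],
    timeout: float = 30.0,
) -> List[dict]:
    """Run every tool call from one model turn concurrently, each with its own timeout."""

    async def run_one(call: dict) -> dict:
        name = call["name"]
        try:
            result = await asyncio.wait_for(tools[name](**call["arguments"]), timeout)
            return {"name": name, "result": result}
        except asyncio.TimeoutError:
            return {"name": name, "error": f"Tool timed out after {timeout:g} seconds"}
        except Exception as exc:  # unknown tool or tool failure becomes an error result
            return {"name": name, "error": str(exc)}

    # gather() runs the calls concurrently and keeps results in request order
    return await asyncio.gather(*(run_one(c) for c in calls))
```

Because failures come back as error results instead of exceptions, one slow or broken tool does not discard the results of the others in the same turn.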

Provider Compatibility

Tool calling requires model support. Not all providers and models support function calling:

Provider	Models	Tool Calling
Groq	llama-3.3-70b, llama-3.1-8b	Supported
Anthropic	claude-3.5-sonnet	Supported
NVIDIA	deepseek-r1	Not supported
Ollama	Varies by model	Model-dependent

If you assign tools to an agent using a model that does not support tool calling, the tools will be ignored and the agent will respond with text only.
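The combined effect of per-agent tool filtering and model support can be sketched as a single selection step. The capability table below mirrors the compatibility table above; treating unlisted models (such as most Ollama models) as unsupported is an assumption of this sketch, and tools_for_request is a hypothetical name.

```python
from typing import Dict, List

# Mirrors the compatibility table above; unlisted models default to "no tools" here
SUPPORTS_TOOLS = {"llama-3.3-70b": True, "llama-3.1-8b": True,
                  "claude-3.5-sonnet": True, "deepseek-r1": False}


def tools_for_request(agent_tools: List[str], model: str,
                      registry: Dict[str, dict]) -> List[dict]:
    """Tool definitions to send with a request: the agent's allow-list, or none."""
    if not SUPPORTS_TOOLS.get(model, False):
        return []  # model cannot call tools: the agent answers with text only
    return [registry[name] for name in agent_tools if name in registry]
```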

Example: Conversation with Tool Calls

Here is what happens internally when a user asks a question that requires tool use.

User sends:

curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the monthly payment on a $400,000 mortgage at 6.5% for 30 years?"}
    ],
    "agent_type": "financial-analyst"
  }'

Internal flow:

ARES sends the message to the LLM with the calculator tool definition

The LLM responds with a tool call:

{
  "tool_calls": [{
    "name": "calculator",
    "arguments": {"expression": "(400000 * (0.065/12) * (1 + 0.065/12)^360) / ((1 + 0.065/12)^360 - 1)"}
  }]
}

ARES executes the calculator and gets 2528.27
ARES sends the result back to the LLM
The LLM produces a final text response incorporating the calculated value

User receives:

{
  "content": "The monthly payment on a $400,000 mortgage at 6.5% APR over 30 years would be **$2,528.27**.\n\nThis is calculated using the standard amortization formula...",
  "model": "llama-3.3-70b",
  "tokens_used": 412
}

The tool-calling steps are invisible to the caller. You send a question and receive a complete answer.
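The 2528.27 figure in the example comes from the standard amortization formula that the calculator expression encodes. A short Python check (monthly_payment is an illustrative helper, not part of ARES):

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Standard amortization formula: P * r * (1 + r)^n / ((1 + r)^n - 1)."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


print(f"${monthly_payment(400_000, 0.065, 30):,.2f}")  # $2,528.27
```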

Example: Multiple Tool Calls in One Turn

Models can request multiple tools simultaneously. For example, a research agent asked to "Compare the population of Tokyo and New York" might request two web searches in parallel:

{
  "tool_calls": [
    {"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
    {"name": "web_search", "arguments": {"query": "New York population 2026"}}
  ]
}

With parallel_execution enabled (the default), both searches execute concurrently. The results are sent back to the model together, and it produces a response comparing both cities.

Example: Multi-Turn Tool Usage

Some questions require multiple rounds of tool use. For example:

User: "What is 15% of the GDP of France?"

Turn 1 — Agent calls web_search:

{"name": "web_search", "arguments": {"query": "France GDP 2026 USD"}}

Result: France's GDP is approximately $3.1 trillion.

Turn 2 — Agent calls calculator:

{"name": "calculator", "arguments": {"expression": "3100000000000 * 0.15"}}

Result: 465,000,000,000

Turn 3 — Agent produces final response: "15% of France's GDP (approximately $3.1 trillion) is $465 billion."

Each round counts toward the max_iterations limit.

Error Handling

If a tool call fails (timeout, invalid input, etc.), ARES returns an error result to the model:

{
  "tool_result": {
    "name": "web_search",
    "error": "Search timed out after 30 seconds"
  }
}

The model can then decide to:

Retry the tool call with different parameters
Use a different tool
Respond with what it knows, noting the tool failure

Well-designed system prompts should instruct the agent on how to handle tool failures gracefully.

Changelog

All notable changes to ARES are documented here. This project follows Semantic Versioning.

0.6.3

Multi-provider LLM, tenant agents, and enterprise metering.

This release transforms ARES from a single-provider system into a full multi-provider LLM platform with enterprise-grade tenant management.

Added

Multi-provider LLM routing — Support for 4 providers (Groq, Anthropic, NVIDIA DeepSeek, Ollama) and 11 models through a unified API.
Model tier system — fast, balanced, powerful, deepseek, and local tiers with automatic provider routing.
Tenant agent system — Agents stored in the database per tenant. Template-based provisioning with full CRUD via admin API.
Agent templates — Seed templates applied automatically on startup. New tenants receive a default agent set.
Usage metering — usage_events table, monthly_usage_cache, and daily_rate_limits for tracking tokens, requests, and costs per tenant.
API key authentication — Authorization: Bearer ares_xxx on /v1/* routes with tenant scoping.
Kasino enterprise agents — 4 specialized agent templates (kasino-classifier, kasino-risk, kasino-transaction, kasino-report) for the first enterprise client.
Kasino API routes — Both JWT-protected (/api/kasino/*) and API-key (/v1/kasino/*) endpoints.
Admin provisioning API — Atomic tenant creation: schema + agents + API key in a single operation.
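On the server side, Bearer-key authentication with tenant scoping comes down to mapping a presented key to a tenant_id. This sketch assumes keys are stored as SHA-256 hashes in a lookup table. `API_KEYS`, `tenant_from_authorization`, and `AuthError` are hypothetical names. Only the header format and the `ares_` prefix shown in the `ares_xxx` example come from the entry above.

```python
import hashlib
from typing import Optional

def _hash(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()

# Hypothetical key table: SHA-256 hash of the raw key -> tenant_id.
# Storing only hashes is an assumption made for this sketch.
API_KEYS = {_hash("ares_demo_key_123"): "tenant-acme"}

class AuthError(Exception):
    pass

def tenant_from_authorization(header: Optional[str]) -> str:
    """Resolve the tenant for a request carrying 'Authorization: Bearer ares_...'."""
    if not header:
        raise AuthError("missing Authorization header")
    scheme, _, token = header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token.startswith("ares_"):
        raise AuthError("expected 'Authorization: Bearer ares_...'")
    tenant = API_KEYS.get(_hash(token))
    if tenant is None:
        raise AuthError("unknown API key")
    return tenant
```

Every /v1/* handler would then scope its queries to the returned tenant rather than trusting a tenant ID sent in the request body.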

Changed

Chat handler now resolves tenant_id from the authentication context instead of a hardcoded value.
Provider configuration moved from code to ares.toml for runtime flexibility.
Rate limit enforcement now operates at both the provider and tenant level.
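Enforcing limits at both the provider and tenant level means a request goes through only when both budgets have room. This is a minimal in-memory sketch that assumes per-day request counts like those the `daily_rate_limits` table tracks. `DailyLimiter` and the limit values are hypothetical. The sketch is not safe for concurrent use or for multiple server instances.

```python
from collections import defaultdict
from datetime import date

class DailyLimiter:
    """Hypothetical in-memory stand-in for per-tenant and per-provider daily limits."""

    def __init__(self, tenant_limits: dict[str, int], provider_limits: dict[str, int]):
        self.tenant_limits = tenant_limits
        self.provider_limits = provider_limits
        self.counts: dict[tuple[str, str, date], int] = defaultdict(int)

    def try_acquire(self, tenant: str, provider: str, day: date) -> bool:
        tenant_key = ("tenant", tenant, day)
        provider_key = ("provider", provider, day)
        if self.counts[tenant_key] >= self.tenant_limits[tenant]:
            return False  # tenant quota exhausted
        if self.counts[provider_key] >= self.provider_limits[provider]:
            return False  # provider quota exhausted
        # Count the request only once both levels have capacity.
        self.counts[tenant_key] += 1
        self.counts[provider_key] += 1
        return True
```

Checking both levels before incrementing either one avoids charging a tenant for a request that the provider limit then rejects.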

Fixed

Chat handler tenant_id resolution for multi-tenant requests.

0.6.2

Streaming and SSE support.

Added

Server-Sent Events streaming — POST /v1/chat/stream endpoint for real-time, token-by-token responses.
Stream handler — Unified streaming across all providers with consistent SSE format.
Context continuation — context_id parameter for maintaining conversation history across requests.
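Any SSE parser can consume the body returned by POST /v1/chat/stream. The sketch below follows the standard event-stream rules: `data:` and `event:` fields, `:` comment lines, and dispatch on a blank line. `parse_sse` is a hypothetical helper. This page doesn't document the event payloads, so any JSON fields you decode from `data` are an assumption.

```python
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event": name, "data": text} for each complete SSE event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
        # id, retry, and unknown fields are ignored in this sketch.
    # Per the SSE rules, an event without a terminating blank line is discarded.
```

Feed it the lines of the response body from any HTTP client that can iterate lines. If the payload is JSON, decode each `data` string with `json.loads`.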

Changed

Response format standardized to a JSON object with response, agent, and context_id fields across all endpoints.
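A client can rely on the standardized shape to carry a conversation forward. Read `context_id` from one response and send it with the next request. In this sketch, `parse_chat_response`, `follow_up_request`, and the `message` request field are hypothetical. The documented parts are the three response fields and the `context_id` parameter.

```python
import json

REQUIRED_FIELDS = ("response", "agent", "context_id")

def parse_chat_response(body: str) -> dict:
    """Decode a chat response and check it has the standardized fields."""
    payload = json.loads(body)
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return payload

def follow_up_request(previous: dict, message: str) -> dict:
    # "message" is a hypothetical request field name; context_id is the
    # documented parameter for continuing the same conversation.
    return {"message": message, "context_id": previous["context_id"]}
```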

0.6.1

Tool calling and RAG foundations.

Added

Tool calling framework — Define tools per agent. ARES manages the tool-call loop, execution, and response assembly.
RAG pipeline — Retrieval-augmented generation with pluggable document stores.
Workflow engine — Chain multiple agents into multi-step workflows with deterministic execution.
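You can picture a deterministic workflow as a fixed, ordered list of steps in which each agent's output becomes the next agent's input. This sketch assumes exactly that linear shape. `run_workflow` and the example agents are hypothetical, and the changelog doesn't say whether ARES workflows also branch or run steps in parallel.

```python
from typing import Callable

Agent = Callable[[str], str]  # hypothetical: an agent maps input text to output text

def run_workflow(steps: list[tuple[str, Agent]], initial_input: str) -> list[tuple[str, str]]:
    """Run agents in a fixed order, feeding each output into the next step."""
    trace = []
    current = initial_input
    for name, agent in steps:
        current = agent(current)
        trace.append((name, current))  # record each step's output for inspection
    return trace
```

Because the step order is fixed, running the same agents on the same input produces comparable traces, which makes step-by-step debugging easier.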

Changed

Agent configuration schema extended to support tool definitions and RAG settings.

0.5.0

JWT authentication and user management.

Added

User registration and login — POST /api/auth/register, POST /api/auth/login.
JWT token lifecycle — 15-minute access tokens, refresh token rotation, logout/invalidation.
Role-based access — User roles with permission checks on protected routes.
Admin authentication — X-Admin-Secret header for internal administration endpoints.
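The token lifecycle above pairs short-lived access tokens with single-use refresh tokens. The sketch below hand-rolls HS256 signing with the standard library for illustration only, and a real service should use a maintained JWT library. The signing key, the claim names, and the in-memory `RefreshStore` are hypothetical. The 15-minute lifetime, rotation, and logout invalidation come from the entries above.

```python
import base64
import hashlib
import hmac
import json
import secrets

SECRET = b"change-me"   # hypothetical signing key
ACCESS_TTL_S = 15 * 60  # 15-minute access tokens

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _sign(signing_input: str) -> str:
    return _b64(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())

def issue_access_token(user_id: str, now: float) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64(json.dumps({"sub": user_id, "exp": int(now) + ACCESS_TTL_S}).encode())
    return f"{header}.{claims}.{_sign(f'{header}.{claims}')}"

def verify_access_token(token: str, now: float) -> str:
    header, claims, sig = token.split(".")
    if not hmac.compare_digest(_sign(f"{header}.{claims}"), sig):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(claims + "=" * (-len(claims) % 4)))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload["sub"]

class RefreshStore:
    """Hypothetical store of single-use refresh tokens."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = user_id
        return token

    def rotate(self, token: str, now: float) -> tuple[str, str]:
        # Rotation: the old refresh token is consumed and a new pair is issued.
        user_id = self._active.pop(token, None)
        if user_id is None:
            raise ValueError("refresh token invalid or already used")
        return issue_access_token(user_id, now), self.issue(user_id)

    def revoke(self, token: str) -> None:
        # Logout: invalidate the refresh token so it can't be rotated again.
        self._active.pop(token, None)
```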

Changed

All /api/* routes now require JWT authentication.
Error responses standardized with error and code fields.

0.4.0

PostgreSQL backend and multi-tenant schema.

Added

PostgreSQL integration — Full migration from in-memory storage to PostgreSQL with sqlx.
Auto-migration — sqlx::migrate!() runs on startup. No manual SQL required.
Tenant schema — tenants, tenant_agents, and api_keys tables with foreign key relationships.
Tenant tiers — Free, Dev, Pro, and Enterprise tiers with configurable limits.

Changed

All state persistence moved from in-memory structures to PostgreSQL.
Connection pooling via sqlx::PgPool with configurable pool size.

For the complete commit history, see the ARES repository on GitHub.

ARES Documentation