Introduction
ARES is a multi-provider LLM platform that gives you a single, unified API to route requests across Groq, Anthropic, NVIDIA DeepSeek, and Ollama. It handles tool calling, retrieval-augmented generation (RAG), multi-step workflows, streaming, usage metering, and multi-tenant isolation out of the box — so you can focus on building your AI application instead of stitching together provider SDKs.
Key capabilities
- Multi-provider LLM routing — Send requests to Groq, Anthropic, NVIDIA, or Ollama through one API. Switch models without changing your integration.
- Tool calling — Define tools your agents can invoke. ARES manages the tool-call loop, execution, and response assembly.
- Retrieval-augmented generation (RAG) — Ground LLM responses in your own data with built-in retrieval pipelines.
- Workflows — Chain multiple agents and processing steps into multi-step workflows, with every routing decision recorded in a reasoning trace.
- Multi-tenant enterprise support — Tenant isolation, per-tenant agent configuration, API key scoping, and usage tracking at the tenant level.
- Streaming — Server-Sent Events (SSE) streaming for real-time, token-by-token responses.
- Usage metering — Track tokens, requests, and costs per tenant with built-in rate limiting and quota enforcement.
Who is ARES for?
- Platform teams building internal AI infrastructure who need a reliable, multi-provider abstraction layer.
- Enterprise clients who want managed AI agents with tenant isolation, usage visibility, and SLA guarantees.
- Developers building AI applications who want a clean API without managing provider credentials, rate limits, and failover logic themselves.
Base URL
All API requests are made to:
https://api.ares.dirmacs.com
Quick links
| Resource | Description |
|---|---|
| Quickstart | Zero to first API call in 5 minutes |
| Authentication | API keys, JWT tokens, and admin auth |
| Models & Providers | Available models, tiers, and provider configuration |
| Changelog | Release history and breaking changes |
Quickstart
Get from zero to your first ARES API call in under 5 minutes.
Prerequisites
- An ARES API key (format: ares_xxx). Contact your administrator or use the Dirmacs Admin provisioning UI to generate one.
1. Make your first chat request
Send a message to an ARES agent using the chat endpoint.
curl
curl -X POST https://api.ares.dirmacs.com/v1/chat \
-H "Authorization: Bearer ares_xxx" \
-H "Content-Type: application/json" \
-d '{
"message": "What can you help me with?",
"agent_type": "product"
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/v1/chat",
headers={
"Authorization": "Bearer ares_xxx",
"Content-Type": "application/json",
},
json={
"message": "What can you help me with?",
"agent_type": "product",
},
)
data = response.json()
print(data["response"])
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/v1/chat", {
method: "POST",
headers: {
"Authorization": "Bearer ares_xxx",
"Content-Type": "application/json",
},
body: JSON.stringify({
message: "What can you help me with?",
agent_type: "product",
}),
});
const data = await response.json();
console.log(data.response);
Response
{
"response": "I can help you with product information, recommendations, and questions...",
"agent": "product",
"context_id": "ctx_a1b2c3d4"
}
The context_id is returned with every response. Pass it back in subsequent requests to maintain conversation context.
2. Try streaming
For real-time, token-by-token output, use the streaming endpoint. ARES streams responses using Server-Sent Events (SSE).
curl
curl -N -X POST https://api.ares.dirmacs.com/v1/chat/stream \
-H "Authorization: Bearer ares_xxx" \
-H "Content-Type: application/json" \
-d '{
"message": "Explain how LLM routing works",
"agent_type": "product"
}'
The -N flag disables output buffering so you see tokens as they arrive.
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/v1/chat/stream",
headers={
"Authorization": "Bearer ares_xxx",
"Content-Type": "application/json",
},
json={
"message": "Explain how LLM routing works",
"agent_type": "product",
},
stream=True,
)
for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            print(decoded[6:], end="", flush=True)
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/v1/chat/stream", {
method: "POST",
headers: {
"Authorization": "Bearer ares_xxx",
"Content-Type": "application/json",
},
body: JSON.stringify({
message: "Explain how LLM routing works",
agent_type: "product",
}),
});
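const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters that straddle two reads intact
  buffer += decoder.decode(value, { stream: true });
  // A read can end mid-line; keep the trailing partial line for the next read
  const lines = buffer.split("\n");
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      process.stdout.write(line.slice(6));
    }
  }
}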
3. Continue a conversation
Use the context_id from a previous response to maintain conversation history:
curl -X POST https://api.ares.dirmacs.com/v1/chat \
-H "Authorization: Bearer ares_xxx" \
-H "Content-Type: application/json" \
-d '{
"message": "Tell me more about that",
"agent_type": "product",
"context_id": "ctx_a1b2c3d4"
}'
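Passing context_id back by hand becomes repetitive over many turns. The sketch below relies only on the request and response fields shown above. It sends a list of messages in order and feeds each returned context_id into the next request. The transport is a plain callable (a hypothetical `post` that takes the JSON payload and returns the decoded response), so you can test the logic without network access.

```python
from typing import Callable, Iterable, Optional

# Hypothetical transport: takes the JSON payload, returns the decoded JSON response.
PostFn = Callable[[dict], dict]


def run_conversation(messages: Iterable[str], post: PostFn, agent_type: str = "product") -> list:
    """Send messages in order, threading context_id from each response into the next request."""
    context_id: Optional[str] = None
    replies = []
    for message in messages:
        payload = {"message": message, "agent_type": agent_type}
        if context_id is not None:
            # Only include context_id once the server has issued one.
            payload["context_id"] = context_id
        data = post(payload)
        replies.append(data["response"])
        context_id = data["context_id"]
    return replies
```

In real use, `post` could call `requests.post("https://api.ares.dirmacs.com/v1/chat", headers=..., json=payload)`, then `raise_for_status()`, and return `.json()`.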
Next steps
- Authentication — Learn about API keys, JWT tokens, and admin authentication.
- Models & Providers — Understand which models are available and how to choose the right one.
Authentication
ARES supports three authentication methods, each designed for a different use case.
| Method | Header | Routes | Use case |
|---|---|---|---|
| API Key | Authorization: Bearer ares_xxx | /v1/* | Client applications, backend services |
| JWT | Authorization: Bearer <access_token> | /api/* | End-user sessions, frontend apps |
| Admin Secret | X-Admin-Secret: <secret> | /api/admin/* | Internal administration |
API Key authentication
API keys are the simplest way to authenticate with ARES. Each key is scoped to a single tenant and carries that tenant’s permissions and rate limits.
Format: ares_ followed by a random string (e.g., ares_k7Gx9mPqR2vLwN4s).
How to get one: API keys are generated during tenant provisioning via the Dirmacs Admin dashboard, or through the admin API.
Usage
Pass the API key in the Authorization header on any /v1/* endpoint:
curl -X POST https://api.ares.dirmacs.com/v1/chat \
-H "Authorization: Bearer ares_k7Gx9mPqR2vLwN4s" \
-H "Content-Type: application/json" \
-d '{"message": "Hello", "agent_type": "product"}'
import requests
headers = {
"Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
"Content-Type": "application/json",
}
response = requests.post(
"https://api.ares.dirmacs.com/v1/chat",
headers=headers,
json={"message": "Hello", "agent_type": "product"},
)
const response = await fetch("https://api.ares.dirmacs.com/v1/chat", {
method: "POST",
headers: {
"Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
"Content-Type": "application/json",
},
body: JSON.stringify({ message: "Hello", agent_type: "product" }),
});
Security: Treat API keys like passwords. Do not embed them in client-side code, commit them to version control, or expose them in logs. Use environment variables or a secrets manager.
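Here is a minimal sketch of the environment-variable approach. The variable name `ARES_API_KEY` and the function `ares_headers` are hypothetical. The only ARES-specific details are the `ares_` key prefix and the `Authorization: Bearer` header described above.

```python
import os


def ares_headers(env_var: str = "ARES_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your ARES API key")
    if not api_key.startswith("ares_"):
        # Catch copy/paste mistakes early without echoing the secret itself.
        raise ValueError(f"{env_var} does not look like an ARES API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```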
JWT authentication
JWT authentication is designed for end-user sessions. Users register and log in to receive short-lived access tokens and long-lived refresh tokens.
- Access tokens expire after 15 minutes.
- Refresh tokens are used to obtain new access tokens without re-entering credentials.
Register a new user
curl -X POST https://api.ares.dirmacs.com/api/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "developer@example.com",
"password": "your-secure-password",
"name": "Jane Developer"
}'
Response:
{
"message": "Registration successful",
"user_id": "usr_abc123"
}
Log in
curl -X POST https://api.ares.dirmacs.com/api/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "developer@example.com",
"password": "your-secure-password"
}'
Response:
{
"access_token": "eyJhbGciOiJIUzI1NiIs...",
"refresh_token": "rt_x9Kp2mQvL8wN3rTs...",
"expires_in": 900
}
Use the access token
Pass the access token in the Authorization header on any /api/* endpoint:
curl https://api.ares.dirmacs.com/api/chat \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
-H "Content-Type: application/json" \
-d '{"message": "Hello", "agent_type": "product"}'
Refresh an expired token
When your access token expires, use the refresh token to get a new one:
curl -X POST https://api.ares.dirmacs.com/api/auth/refresh \
-H "Content-Type: application/json" \
-d '{
"refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
}'
Response:
{
"access_token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 900
}
Log out
Invalidate a refresh token when the user logs out:
curl -X POST https://api.ares.dirmacs.com/api/auth/logout \
-H "Content-Type: application/json" \
-d '{
"refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
}'
Token management in Python
import requests
import time
class AresClient:
    def __init__(self, base_url="https://api.ares.dirmacs.com"):
        self.base_url = base_url
        self.access_token = None
        self.refresh_token = None
        self.token_expiry = 0
    def login(self, email, password):
        response = requests.post(
            f"{self.base_url}/api/auth/login",
            json={"email": email, "password": password},
        )
        response.raise_for_status()  # Surface 401s instead of failing on a missing key
        data = response.json()
        self.access_token = data["access_token"]
        self.refresh_token = data["refresh_token"]
        self.token_expiry = time.time() + data["expires_in"]
    def _ensure_valid_token(self):
        if self.refresh_token is None:
            raise RuntimeError("Call login() before making requests")
        if time.time() >= self.token_expiry - 30:  # Refresh 30s before expiry
            response = requests.post(
                f"{self.base_url}/api/auth/refresh",
                json={"refresh_token": self.refresh_token},
            )
            response.raise_for_status()
            data = response.json()
            self.access_token = data["access_token"]
            self.token_expiry = time.time() + data["expires_in"]
    def chat(self, message, agent_type="product"):
        self._ensure_valid_token()
        response = requests.post(
            f"{self.base_url}/api/chat",
            headers={"Authorization": f"Bearer {self.access_token}"},
            json={"message": message, "agent_type": agent_type},
        )
        response.raise_for_status()
        return response.json()
Token management in JavaScript
class AresClient {
constructor(baseUrl = "https://api.ares.dirmacs.com") {
this.baseUrl = baseUrl;
this.accessToken = null;
this.refreshToken = null;
this.tokenExpiry = 0;
}
async login(email, password) {
const response = await fetch(`${this.baseUrl}/api/auth/login`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ email, password }),
});
const data = await response.json();
this.accessToken = data.access_token;
this.refreshToken = data.refresh_token;
this.tokenExpiry = Date.now() + data.expires_in * 1000;
}
async ensureValidToken() {
if (Date.now() >= this.tokenExpiry - 30000) {
const response = await fetch(`${this.baseUrl}/api/auth/refresh`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ refresh_token: this.refreshToken }),
});
const data = await response.json();
this.accessToken = data.access_token;
this.tokenExpiry = Date.now() + data.expires_in * 1000;
}
}
async chat(message, agentType = "product") {
await this.ensureValidToken();
const response = await fetch(`${this.baseUrl}/api/chat`, {
method: "POST",
headers: {
"Authorization": `Bearer ${this.accessToken}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ message, agent_type: agentType }),
});
return response.json();
}
}
Admin Secret authentication
The admin secret provides full access to ARES administration endpoints. It is intended for internal tools and the Dirmacs Admin dashboard only.
Pass the secret in the X-Admin-Secret header:
curl https://api.ares.dirmacs.com/api/admin/tenants \
-H "X-Admin-Secret: your-admin-secret"
Warning: The admin secret grants unrestricted access to all tenants, agents, and configuration. Never expose it outside your infrastructure. It should only be used in server-to-server calls from trusted internal services.
Error responses
Authentication failures return standard HTTP status codes:
| Status | Meaning |
|---|---|
| 401 Unauthorized | Missing or invalid credentials |
| 403 Forbidden | Valid credentials but insufficient permissions |
| 429 Too Many Requests | Rate limit exceeded for this API key or tenant |
Example error response:
{
"error": "Invalid or expired token",
"code": "AUTH_INVALID_TOKEN"
}
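One way to surface these failures in Python is to map each status code to its own exception class and keep the `code` field from the error body. The class and function names here are illustrative and are not part of an ARES SDK. The sketch assumes error bodies have the `error`/`code` shape shown above.

```python
class AresError(Exception):
    """Base class for ARES error responses (hypothetical name)."""

    def __init__(self, status: int, body: dict):
        self.status = status
        self.code = body.get("code")  # e.g. "AUTH_INVALID_TOKEN"
        self.message = body.get("error", "")
        super().__init__(f"HTTP {status} {self.code}: {self.message}")


class Unauthorized(AresError):  # 401: missing or invalid credentials
    pass


class Forbidden(AresError):  # 403: valid credentials, insufficient permissions
    pass


class RateLimited(AresError):  # 429: rate limit exceeded for this key or tenant
    pass


_ERRORS_BY_STATUS = {401: Unauthorized, 403: Forbidden, 429: RateLimited}


def check_response(status: int, body: dict) -> dict:
    """Return body for 2xx responses; otherwise raise the matching AresError subclass."""
    if 200 <= status < 300:
        return body
    raise _ERRORS_BY_STATUS.get(status, AresError)(status, body)
```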
Models & Providers
ARES routes LLM requests across multiple providers through a single API. You do not call providers directly — ARES selects the appropriate model based on the agent configuration and handles credentials, rate limits, and failover transparently.
Available models
| Tier | Provider | Model | Best for |
|---|---|---|---|
| fast | Groq | llama-3.1-8b-instant | Quick responses, classification, simple Q&A |
| balanced | Groq | llama-3.3-70b-versatile | General-purpose tasks, GPT-4 class quality |
| powerful | Anthropic | claude-sonnet-4-6 | Complex reasoning, long-form analysis, nuanced tasks |
| deepseek | NVIDIA | deepseek-v3.2 | Code generation, technical documentation, structured output |
| local | Ollama | ministral-3:3b | Development, testing, offline use |
How model selection works
You do not specify a model directly in your API calls. Instead, you specify an agent_type, and each agent is configured with a model tier.
# This request is routed to whichever model the "product" agent is configured to use
curl -X POST https://api.ares.dirmacs.com/v1/chat \
-H "Authorization: Bearer ares_xxx" \
-H "Content-Type: application/json" \
-d '{"message": "Compare these two options", "agent_type": "product"}'
The mapping between agents and models is configured by your tenant administrator. A typical setup might look like:
| Agent | Model tier | Rationale |
|---|---|---|
| classifier | fast | Needs speed, not depth |
| product | balanced | General-purpose, good quality |
| analyst | powerful | Complex reasoning required |
| code-review | deepseek | Specialized for code tasks |
This design means you can upgrade an agent’s underlying model without changing any client code.
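The lookup can be modeled as two dictionaries: one maps agents to tiers, and the other maps tiers to a provider and model. This illustration is built from the two tables above and is not ARES's routing code. In practice, the agent mapping lives in your tenant's configuration.

```python
# Tier -> (provider, model), copied from the "Available models" table.
TIERS = {
    "fast": ("Groq", "llama-3.1-8b-instant"),
    "balanced": ("Groq", "llama-3.3-70b-versatile"),
    "powerful": ("Anthropic", "claude-sonnet-4-6"),
    "deepseek": ("NVIDIA", "deepseek-v3.2"),
    "local": ("Ollama", "ministral-3:3b"),
}

# Agent -> tier, from the example mapping (set by a tenant administrator in practice).
AGENT_TIERS = {
    "classifier": "fast",
    "product": "balanced",
    "analyst": "powerful",
    "code-review": "deepseek",
}


def resolve(agent_type: str, agent_tiers: dict = AGENT_TIERS) -> tuple:
    """Return the (provider, model) an agent_type would be routed to under this mapping."""
    tier = agent_tiers[agent_type]
    return TIERS[tier]
```

Changing one entry in the agent mapping (for example, moving `product` to `powerful`) changes where requests land. The client keeps sending the same `agent_type`.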
Provider architecture
ARES uses a named-provider system. Each provider is configured with its API endpoint, credentials, and rate limits. Models reference their provider by name.
┌─────────────┐
│ Your App │
│ agent_type │
└──────┬──────┘
│
▼
┌─────────────┐ ┌──────────┐
│ ARES │────▶│ Groq │ fast, balanced
│ Router │ └──────────┘
│ │ ┌──────────┐
│ │────▶│Anthropic │ powerful
│ │ └──────────┘
│ │ ┌──────────┐
│ │────▶│ NVIDIA │ deepseek
│ │ └──────────┘
│ │ ┌──────────┐
│ │────▶│ Ollama │ local
└─────────────┘ └──────────┘
Provider details
Groq — High-throughput inference on custom LPUs. Extremely fast response times. Hosts open-source models (Llama, Mixtral). Free tier available with rate limits.
Anthropic — Claude models. Best-in-class for complex reasoning, instruction following, and safety. Requires a paid API key.
NVIDIA (DeepSeek) — NVIDIA-hosted DeepSeek models via the NVIDIA AI API. Strong at code generation and structured technical output.
Ollama — Self-hosted, local inference. No external API calls. Useful for development, air-gapped environments, or when you need to keep data on-premises.
Rate limits
Rate limits are enforced per provider and per tenant. The following are default limits for the Groq free tier:
| Model tier | Requests per day | Tokens per minute |
|---|---|---|
| fast (llama-3.1-8b) | 14,400 | 20,000 |
| balanced (llama-3.3-70b) | 6,000 | 6,000 |
Anthropic and NVIDIA rate limits depend on your API plan with those providers. ARES surfaces rate limit errors transparently:
{
"error": "Rate limit exceeded for provider 'groq'",
"code": "RATE_LIMIT_EXCEEDED",
"retry_after": 60
}
Tenant-level rate limits and quotas are configured separately by your administrator and enforced by ARES regardless of provider limits.
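A client can use `retry_after` to back off instead of retrying immediately. The sketch below retries only on HTTP 429. It sleeps for `retry_after` seconds when the body provides that field and falls back to exponential backoff otherwise. The `send` callable (which returns a status code and a decoded body) and the default wait values are assumptions for illustration.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: performs one request and returns (HTTP status, decoded JSON body).
SendFn = Callable[[], Tuple[int, dict]]


def send_with_retry(
    send: SendFn,
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send(), retrying on HTTP 429 and honoring retry_after when present."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body  # Success or a non-rate-limit error: the caller decides
        if attempt == max_attempts:
            break
        # Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s, ...).
        wait = body.get("retry_after", default_wait * 2 ** (attempt - 1))
        sleep(float(wait))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts ({body.get('code')})")
```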
Adding your own providers
If you are self-hosting ARES, you can add providers in your ares.toml configuration:
[[providers]]
name = "my-openai"
kind = "openai"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"
[[models]]
name = "gpt-4o"
provider = "my-openai"
model_id = "gpt-4o"
tier = "powerful"
Any provider that exposes an OpenAI-compatible API (vLLM, Together AI, Fireworks, etc.) can be added using the openai provider kind.
Choosing the right tier
| If you need… | Use tier |
|---|---|
| Fastest possible response | fast |
| Good quality at reasonable speed | balanced |
| Maximum reasoning capability | powerful |
| Code generation or technical tasks | deepseek |
| Offline or local development | local |
When in doubt, start with balanced. It provides the best trade-off between quality, speed, and cost for most use cases.
Chat & Conversations
Send messages to ARES agents and manage multi-turn conversations.
Send a message
POST /api/chat
Send a message to an agent and receive a response. ARES routes the message to the appropriate agent based on the agent_type parameter, or uses the default router agent if none is specified.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The user’s message or prompt. |
| agent_type | string | No | Which agent handles the request (e.g., "product", "research", "router"). Defaults to the router agent. |
| context_id | string | No | Conversation context ID. Pass this value back on subsequent requests to continue a multi-turn conversation. |
Response
{
"response": "Here's what I found about your question...",
"agent": "product",
"context_id": "ctx_a1b2c3d4",
"sources": null
}
| Field | Type | Description |
|---|---|---|
| response | string | The agent’s response text. |
| agent | string | The agent that handled the request. |
| context_id | string | Context identifier. Pass this back to continue the conversation. |
| sources | array or null | Source references, if the agent performed retrieval. Otherwise null. |
Examples
curl
curl -X POST https://api.ares.dirmacs.com/api/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"message": "What pricing plans do you offer?",
"agent_type": "product"
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/chat",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"message": "What pricing plans do you offer?",
"agent_type": "product"
}
)
data = response.json()
print(data["response"])
# Continue the conversation using the returned context_id
follow_up = requests.post(
"https://api.ares.dirmacs.com/api/chat",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"message": "How does the Pro plan compare to Enterprise?",
"context_id": data["context_id"]
}
)
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/chat", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
message: "What pricing plans do you offer?",
agent_type: "product"
})
});
const data = await response.json();
console.log(data.response);
// Continue the conversation
const followUp = await fetch("https://api.ares.dirmacs.com/api/chat", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
message: "How does the Pro plan compare to Enterprise?",
context_id: data.context_id
})
});
Stream a response
POST /api/chat/stream
Send a message and receive the response as a stream of Server-Sent Events (SSE). Each event contains a text chunk. This is the recommended approach for user-facing applications where you want to display the response as it is generated.
The request body is identical to POST /api/chat.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Response format
The response uses the text/event-stream content type. Each SSE event contains a chunk of the agent’s response:
data: Here's
data: what I
data: found about
data: your question...
Collect all chunks to form the complete response. The connection closes automatically when the response is complete.
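The streaming examples strip the `data: ` prefix line by line. That approach assumes every network read ends on a line boundary, but a read can end mid-line or even mid-character. A more robust client buffers bytes until a full line arrives. The sketch below does this for an iterable of byte chunks. It yields each `data:` payload the same way the examples do and ignores blank lines, comments, and other SSE fields. It does not merge multi-line events, because the stream format above shows one chunk per data line.

```python
from typing import Iterable, Iterator, Optional


def _data_payload(raw: bytes) -> Optional[str]:
    """Return the value of a `data:` line, or None for any other line."""
    line = raw.rstrip(b"\r").decode("utf-8")
    if not line.startswith("data:"):
        return None  # Blank lines, ":" comments, event:/id:/retry: fields
    value = line[len("data:"):]
    # SSE removes one optional space after the colon.
    return value[1:] if value.startswith(" ") else value


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield each data payload from raw byte chunks that may split lines anywhere."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every element except the last is a complete line; the last may be partial.
        *complete, buffer = buffer.split(b"\n")
        for raw in complete:
            payload = _data_payload(raw)
            if payload is not None:
                yield payload
    # Flush a final line that arrived without a trailing newline.
    if buffer:
        payload = _data_payload(buffer)
        if payload is not None:
            yield payload
```

With `requests`, make the request with `stream=True` and pass `response.iter_content(chunk_size=None)` to `iter_sse_data`. Then print each yielded string with `end=""` and `flush=True`.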
Examples
curl
curl -N -X POST https://api.ares.dirmacs.com/api/chat/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-H "Accept: text/event-stream" \
-d '{
"message": "Explain quantum computing",
"agent_type": "research"
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/chat/stream",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi...",
"Accept": "text/event-stream"
},
json={
"message": "Explain quantum computing",
"agent_type": "research"
},
stream=True
)
for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]  # Strip "data: " prefix
            print(chunk, end="", flush=True)
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi...",
"Accept": "text/event-stream"
},
body: JSON.stringify({
message: "Explain quantum computing",
agent_type: "research"
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // A read can end mid-line; keep the trailing partial line for the next read
  const lines = buffer.split("\n");
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const chunk = line.slice(6);
      process.stdout.write(chunk); // Node.js
      // Or append to DOM in browsers
    }
  }
}
Conversations
Manage stored conversations and their message history.
List conversations
GET /api/conversations
Returns all conversations for the authenticated user.
Authentication: JWT required.
curl https://api.ares.dirmacs.com/api/conversations \
-H "Authorization: Bearer eyJhbGciOi..."
Get a conversation
GET /api/conversations/{id}
Returns a single conversation along with its full message history.
Authentication: JWT required.
| Parameter | Type | In | Description |
|---|---|---|---|
| id | string | path | The conversation ID |
curl https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
-H "Authorization: Bearer eyJhbGciOi..."
Update a conversation
PUT /api/conversations/{id}
Update the title of a conversation.
Authentication: JWT required.
Request body:
{
"title": "Pricing discussion"
}
curl -X PUT https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{"title": "Pricing discussion"}'
Delete a conversation
DELETE /api/conversations/{id}
Permanently delete a conversation and all its messages.
Authentication: JWT required.
curl -X DELETE https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
-H "Authorization: Bearer eyJhbGciOi..."
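The four conversation endpoints share one URL pattern, so a thin wrapper can keep the paths in one place. The class name `Conversations` and the injected `send(method, url, body)` callable are hypothetical. In practice, `send` would call `requests.request(method, url, json=body, headers=...)` with a JWT in the Authorization header.

```python
from typing import Any, Callable, Optional
from urllib.parse import quote

# Hypothetical transport: (HTTP method, URL, JSON body or None) -> decoded response.
SendFn = Callable[[str, str, Optional[dict]], Any]


class Conversations:
    """Thin wrapper around the /api/conversations endpoints (illustrative, not an official SDK)."""

    def __init__(self, send: SendFn, base_url: str = "https://api.ares.dirmacs.com"):
        self._send = send
        self._root = f"{base_url}/api/conversations"

    def _url(self, conversation_id: str) -> str:
        # Percent-encode the ID so it is always a single path segment.
        return f"{self._root}/{quote(conversation_id, safe='')}"

    def list_all(self) -> Any:
        return self._send("GET", self._root, None)

    def get(self, conversation_id: str) -> Any:
        return self._send("GET", self._url(conversation_id), None)

    def rename(self, conversation_id: str, title: str) -> Any:
        return self._send("PUT", self._url(conversation_id), {"title": title})

    def delete(self, conversation_id: str) -> Any:
        return self._send("DELETE", self._url(conversation_id), None)
```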
User memory
GET /api/memory
Retrieve memory and preferences that ARES has learned from your conversations. This includes user preferences, context, and behavioral patterns the system has observed.
Authentication: JWT required.
curl https://api.ares.dirmacs.com/api/memory \
-H "Authorization: Bearer eyJhbGciOi..."
Agents
ARES agents are autonomous units that process requests using a configured LLM model, a system prompt, and a set of tools. Each agent is specialized for a particular domain or task — routing, research, product knowledge, risk analysis, and more.
Agents are defined by four properties:
- Model — The LLM that powers the agent (e.g., llama-3.3-70b, claude-3-5-sonnet, deepseek-r1).
- System prompt — Instructions that shape the agent’s behavior, personality, and domain knowledge.
- Tools — Capabilities the agent can invoke during processing (e.g., calculator, web_search, code_interpreter).
- Name — A unique identifier used to route requests to this agent.
Agents can be platform-provided (available to all users) or user-defined (private, created via API or TOON config).
List all agents
GET /api/agents
Returns all available agents on the platform. This endpoint does not require authentication.
Response
[
{
"name": "router",
"description": "Routes incoming requests to the most appropriate specialist agent.",
"model": "llama-3.3-70b-versatile",
"tools": []
},
{
"name": "research",
"description": "Conducts deep multi-step research with source synthesis.",
"model": "deepseek-r1-distill-llama-70b",
"tools": ["web_search", "calculator"]
},
{
"name": "product",
"description": "Answers product-related questions with detailed knowledge.",
"model": "llama-3.3-70b-versatile",
"tools": []
}
]
Examples
curl
curl https://api.ares.dirmacs.com/api/agents
Python
import requests
response = requests.get("https://api.ares.dirmacs.com/api/agents")
agents = response.json()
for agent in agents:
    print(f"{agent['name']}: {agent['description']}")
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/agents");
const agents = await response.json();
agents.forEach(agent => {
console.log(`${agent.name}: ${agent.description}`);
});
User agents
Create and manage your own custom agents. User agents are private to your account and can be configured with any available model, custom system prompts, and tool selections.
All user agent endpoints require JWT authentication: Authorization: Bearer <jwt_access_token>
List your agents
GET /api/user/agents
Returns all custom agents owned by the authenticated user.
curl https://api.ares.dirmacs.com/api/user/agents \
-H "Authorization: Bearer eyJhbGciOi..."
Create an agent
POST /api/user/agents
Create a new custom agent.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique agent name (alphanumeric, hyphens). |
| model | string | Yes | LLM model identifier. |
| system_prompt | string | Yes | Instructions that define agent behavior. |
| tools | string[] | No | List of tool names the agent can use. |
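A client-side check can catch malformed requests before they reach the API. This sketch reads "alphanumeric, hyphens" as ASCII letters, digits, and hyphens, which is an assumption; the server's exact rules may differ. The function name `build_agent_payload` is hypothetical.

```python
import re
from typing import List, Optional

# Assumption: agent names are ASCII letters, digits, and hyphens only.
_NAME_RE = re.compile(r"[A-Za-z0-9-]+")


def build_agent_payload(
    name: str, model: str, system_prompt: str, tools: Optional[List[str]] = None
) -> dict:
    """Validate fields client-side and return a body for POST /api/user/agents."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid agent name {name!r}: use letters, digits, and hyphens")
    if not model or not system_prompt:
        raise ValueError("model and system_prompt are required")
    payload = {"name": name, "model": model, "system_prompt": system_prompt}
    if tools:
        payload["tools"] = list(tools)  # Optional field; omitted when empty
    return payload
```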
Example
curl -X POST https://api.ares.dirmacs.com/api/user/agents \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"name": "code-reviewer",
"model": "llama-3.3-70b-versatile",
"system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
"tools": ["calculator"]
}'
import requests
requests.post(
"https://api.ares.dirmacs.com/api/user/agents",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"name": "code-reviewer",
"model": "llama-3.3-70b-versatile",
"system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
"tools": ["calculator"]
}
)
Get agent details
GET /api/user/agents/{name}
Retrieve the full configuration of a specific user agent.
| Parameter | Type | In | Description |
|---|---|---|---|
| name | string | path | The agent’s name |
curl https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
-H "Authorization: Bearer eyJhbGciOi..."
Update an agent
PUT /api/user/agents/{name}
Update an existing agent’s configuration. You can modify the model, system prompt, or tools.
curl -X PUT https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"model": "deepseek-r1-distill-llama-70b",
"system_prompt": "You are a senior code reviewer specializing in Rust and TypeScript.",
"tools": ["calculator", "web_search"]
}'
Delete an agent
DELETE /api/user/agents/{name}
Permanently delete a user agent.
curl -X DELETE https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
-H "Authorization: Bearer eyJhbGciOi..."
TOON import/export
TOON is ARES’s agent configuration format. You can import and export agent configs as TOON to share agent definitions, back up configurations, or migrate agents between environments.
Import a TOON config
POST /api/user/agents/import
Import an agent definition from a TOON configuration file.
curl -X POST https://api.ares.dirmacs.com/api/user/agents/import \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
--data-binary @agent-config.toon
Export as TOON
GET /api/user/agents/{name}/export
Export an agent’s configuration in TOON format. Useful for sharing agent definitions or version-controlling them alongside your codebase.
curl https://api.ares.dirmacs.com/api/user/agents/code-reviewer/export \
-H "Authorization: Bearer eyJhbGciOi..." \
-o code-reviewer.toon
Workflows
Workflows are multi-agent orchestration pipelines. A workflow defines an entry point agent (typically a router) that analyzes the incoming query and delegates to specialist agents in sequence. The result is a coordinated, multi-step response that leverages the strengths of different agents.
How workflows operate:
- The query enters through an entry agent (usually a router).
- The router analyzes intent and selects the most appropriate specialist agent.
- The specialist processes the query, optionally delegating further.
- Each step is recorded in the reasoning path, providing full transparency into the decision chain.
- The final response is returned along with metadata about the execution.
List workflows
GET /api/workflows
Returns the names of all available workflows.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Response
["default", "research", "support"]
Example
curl https://api.ares.dirmacs.com/api/workflows \
-H "Authorization: Bearer eyJhbGciOi..."
Execute a workflow
POST /api/workflows/{workflow_name}
Execute a named workflow. The query is routed through the workflow’s agent chain, and the final synthesized response is returned along with execution metadata.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Path parameters
| Parameter | Type | Description |
|---|---|---|
| workflow_name | string | Name of the workflow to execute |
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The input query or task for the workflow. |
| context | object | No | Additional context passed to agents during execution. |
Response
{
"final_response": "Based on our analysis, the Pro plan at $49/month offers the best value for your use case. It includes 100K API calls, priority support, and access to all models. The Enterprise plan adds dedicated infrastructure and SLA guarantees, which may be worth considering if you expect to exceed 500K calls/month.",
"steps_executed": 3,
"agents_used": ["router", "sales", "product"],
"reasoning_path": [
{
"agent": "router",
"action": "Classified as pricing inquiry. Routing to sales agent."
},
{
"agent": "sales",
"action": "Retrieved pricing tiers. Consulting product agent for feature comparison."
},
{
"agent": "product",
"action": "Compared Pro vs Enterprise feature sets. Synthesized final recommendation."
}
]
}
| Field | Type | Description |
|---|---|---|
| final_response | string | The synthesized response from the workflow. |
| steps_executed | integer | Total number of agent steps in the execution. |
| agents_used | string[] | Ordered list of agents that participated. |
| reasoning_path | array | Step-by-step trace of each agent’s reasoning and actions. |
Examples
curl
curl -X POST https://api.ares.dirmacs.com/api/workflows/default \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
"context": {
"company_size": "50-200 employees",
"expected_volume": "200K calls/month"
}
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/workflows/default",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
"context": {
"company_size": "50-200 employees",
"expected_volume": "200K calls/month"
}
}
)
result = response.json()
print(result["final_response"])
# Inspect the reasoning chain
for step in result["reasoning_path"]:
    print(f" [{step['agent']}] {step['action']}")
JavaScript
const response = await fetch(
"https://api.ares.dirmacs.com/api/workflows/default",
{
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
query: "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
context: {
company_size: "50-200 employees",
expected_volume: "200K calls/month"
}
})
}
);
const result = await response.json();
console.log(result.final_response);
// Inspect the reasoning chain
result.reasoning_path.forEach(step => {
console.log(` [${step.agent}] ${step.action}`);
});
Workflow behavior
Agent selection. The entry agent examines the query and routes to the specialist best suited to handle it. If a specialist determines it needs input from another agent, it can delegate further, creating a multi-hop chain.
Context propagation. The optional context object is available to every agent in the chain. Use it to pass structured information (user tier, session metadata, domain-specific parameters) that agents can reference during processing.
Determinism. Routing is not guaranteed to be deterministic. It is driven by the entry agent’s LLM reasoning, so queries with the same intent may route differently depending on how they are phrased. The reasoning_path in the response gives full visibility into every routing decision.
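To compare routes across phrasings, it can help to log the hop sequence of each request. A minimal Python sketch, assuming only the response shape shown above (a reasoning_path list whose entries carry an agent key, in execution order); routing_chain is a hypothetical helper, not part of any ARES SDK:

```python
from typing import Any


def routing_chain(result: dict[str, Any]) -> str:
    """Render the agent hops of a workflow response as 'a -> b -> c'.

    Assumes the documented shape: result["reasoning_path"] is a list of
    {"agent": ..., "action": ...} entries in execution order.
    """
    hops = [step["agent"] for step in result.get("reasoning_path", [])]
    return " -> ".join(hops)
```

Logging routing_chain(result) next to the original query makes it easy to spot when two phrasings of the same question took different paths.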
Research
The Research API performs deep, multi-step research on a topic using parallel sub-agents. Unlike a single chat request, a research query spawns multiple agents that independently explore facets of the question, synthesize findings, and produce a comprehensive result with source attribution.
Execute a research query
POST /api/research
Submit a research query for deep, multi-step investigation.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Request body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | – | The research question or topic. |
| depth | integer | No | 3 | How many levels deep the research goes. Higher values explore sub-topics more thoroughly. |
| max_iterations | integer | No | 5 | Maximum total agent calls. Acts as a cost/time ceiling. |
Understanding depth: At depth 1, the research agent answers the query directly. At depth 2, it identifies sub-questions, spawns agents to answer each, then synthesizes. At depth 3+, sub-agents can spawn their own sub-agents, creating a tree of investigation.
Understanding max_iterations: This is a hard cap on total agent invocations across all depth levels. If the research tree would require more calls than max_iterations, it stops expanding and synthesizes what it has. Use this to control cost and response time.
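The interaction between the two parameters can be made concrete with a back-of-envelope model. The sketch below is purely illustrative: it assumes every non-leaf agent spawns the same hypothetical number of sub-agents (branching) and counts one call per node, ignoring any separate synthesis calls. This document does not say exactly how ARES counts invocations.

```python
def estimated_agent_calls(depth: int, branching: int) -> int:
    """Nodes in a full research tree: 1 + b + b^2 + ... + b^(depth - 1).

    Illustrative model only: every non-leaf agent is assumed to spawn
    exactly `branching` sub-agents, and each node counts as one call.
    """
    if depth < 1 or branching < 1:
        raise ValueError("depth and branching must be >= 1")
    return sum(branching**level for level in range(depth))


def capped_agent_calls(depth: int, branching: int, max_iterations: int) -> int:
    """Apply the max_iterations hard cap to the modelled tree size."""
    return min(estimated_agent_calls(depth, branching), max_iterations)
```

Under this model, the defaults (depth 3, max_iterations 5) reach the cap whenever each agent spawns two or more sub-agents (1 + 2 + 4 = 7 nodes), so expansion would stop early and the research would be synthesized from what was gathered.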
Response
{
"findings": "## Market Analysis: Edge Computing in Healthcare\n\nEdge computing adoption in healthcare is accelerating, driven by three primary factors...\n\n### Key Findings\n1. **Latency requirements** — Real-time patient monitoring demands sub-10ms response times...\n2. **Data sovereignty** — HIPAA compliance increasingly favors on-premise processing...\n3. **Cost dynamics** — Edge deployment reduces cloud egress costs by 40-60% for imaging workloads...\n\n### Sources\n- Gartner Healthcare IT Report 2025\n- IEEE Edge Computing Survey\n- HHS HIPAA Guidance Update",
"sources": [
"Gartner Healthcare IT Report 2025",
"IEEE Edge Computing Survey",
"HHS HIPAA Guidance Update"
],
"duration_ms": 8432
}
| Field | Type | Description |
|---|---|---|
| findings | string | The synthesized research output, typically in Markdown. |
| sources | string[] | References and sources discovered during research. |
| duration_ms | integer | Total time taken for the research in milliseconds. |
Examples
curl
curl -X POST https://api.ares.dirmacs.com/api/research \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"query": "What are the current trends in edge computing for healthcare?",
"depth": 3,
"max_iterations": 5
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/research",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"query": "What are the current trends in edge computing for healthcare?",
"depth": 3,
"max_iterations": 5
}
)
result = response.json()
print(result["findings"])
print(f"\nCompleted in {result['duration_ms']}ms")
print(f"Sources: {', '.join(result['sources'])}")
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/research", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
query: "What are the current trends in edge computing for healthcare?",
depth: 3,
max_iterations: 5
})
});
const result = await response.json();
console.log(result.findings);
console.log(`\nCompleted in ${result.duration_ms}ms`);
console.log(`Sources: ${result.sources.join(", ")}`);
Tuning research parameters
| Scenario | Recommended depth | Recommended max_iterations |
|---|---|---|
| Quick factual lookup | 1 | 2 |
| Standard research question | 2 | 5 |
| Deep competitive analysis | 3 | 10 |
| Exhaustive literature review | 4+ | 15+ |
Higher depth and iteration values produce more comprehensive results but take longer and consume more API quota. For most use cases, the defaults (depth: 3, max_iterations: 5) provide a good balance of thoroughness and speed.
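If your application issues several kinds of research queries, encoding the table as named presets keeps the choice in one place. A sketch; the preset names are hypothetical, and the open-ended "4+" and "15+" rows are encoded as their minimum values:

```python
# Settings taken from the tuning table; "4+" and "15+" become 4 and 15.
RESEARCH_PRESETS: dict[str, dict[str, int]] = {
    "quick_lookup": {"depth": 1, "max_iterations": 2},
    "standard": {"depth": 2, "max_iterations": 5},
    "competitive_analysis": {"depth": 3, "max_iterations": 10},
    "literature_review": {"depth": 4, "max_iterations": 15},
}


def research_request(query: str, preset: str = "standard") -> dict:
    """Build a POST /api/research body from a named preset."""
    try:
        params = RESEARCH_PRESETS[preset]
    except KeyError:
        raise ValueError(
            f"unknown preset {preset!r}; choose from {sorted(RESEARCH_PRESETS)}"
        ) from None
    return {"query": query, **params}
```

The resulting dict can be passed directly as the json= argument in the Python example above.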
RAG (Retrieval-Augmented Generation)
The RAG API lets you ingest documents, search them using multiple retrieval strategies, and manage document collections. RAG powers knowledge-grounded responses by retrieving relevant context from your documents before generating answers.
Feature flag: The RAG API requires ARES to be built with the ares-vector feature. If your deployment does not include this feature, these endpoints return 404.
Ingest documents
POST /api/rag/ingest
Ingest content into a named collection. The content is automatically chunked and indexed for retrieval.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Request body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| collection | string | Yes | – | Name of the collection to ingest into. Created automatically if it doesn’t exist. |
| content | string | Yes | – | The text content to ingest. |
| metadata | object | No | {} | Arbitrary key-value metadata attached to the document. |
| chunking_strategy | string | No | "word" | How to split the content into chunks. Options: "word", "sentence", "paragraph". |
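To build intuition for the three chunking_strategy options before ingesting, the sketch below splits text locally on word counts, sentence boundaries, or blank lines. It is an approximation for experimentation, not ARES's chunker: the 200-word window and the regular expressions are assumptions.

```python
import re


def chunk(text: str, strategy: str = "word", words_per_chunk: int = 200) -> list[str]:
    """Split text roughly the way each chunking_strategy name suggests.

    Illustration only: the boundary rules and the words_per_chunk window
    are assumptions, not ARES's actual chunking logic.
    """
    if strategy == "word":
        # Fixed-size windows of whitespace-separated words.
        words = text.split()
        return [
            " ".join(words[i : i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)
        ]
    if strategy == "sentence":
        # Naive split after ., ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
    elif strategy == "paragraph":
        # Paragraphs are separated by one or more blank lines.
        parts = re.split(r"\n\s*\n", text.strip())
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [p.strip() for p in parts if p.strip()]
```

With this splitting rule, paragraph chunking only helps when the source text has blank lines between paragraphs; a single long paragraph comes back as one large chunk.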
Response
{
"chunks_created": 5,
"document_ids": [
"doc_a1b2c3d4",
"doc_e5f6g7h8",
"doc_i9j0k1l2",
"doc_m3n4o5p6",
"doc_q7r8s9t0"
],
"collection": "docs"
}
| Field | Type | Description |
|---|---|---|
| chunks_created | integer | Number of chunks produced from the content. |
| document_ids | string[] | IDs assigned to each chunk. |
| collection | string | The collection the content was ingested into. |
Examples
curl
curl -X POST https://api.ares.dirmacs.com/api/rag/ingest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"collection": "product-docs",
"content": "ARES is a multi-agent AI platform that orchestrates specialized agents to handle complex queries. It supports multiple LLM providers including Groq, Anthropic, and NVIDIA...",
"metadata": {
"source": "documentation",
"version": "2.0",
"author": "engineering"
},
"chunking_strategy": "paragraph"
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/rag/ingest",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"collection": "product-docs",
"content": "ARES is a multi-agent AI platform...",
"metadata": {"source": "documentation", "version": "2.0"},
"chunking_strategy": "paragraph"
}
)
result = response.json()
print(f"Created {result['chunks_created']} chunks in '{result['collection']}'")
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/rag/ingest", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
collection: "product-docs",
content: "ARES is a multi-agent AI platform...",
metadata: { source: "documentation", version: "2.0" },
chunking_strategy: "paragraph"
})
});
const result = await response.json();
console.log(`Created ${result.chunks_created} chunks in '${result.collection}'`);
Search documents
POST /api/rag/search
Search a collection using one of several retrieval strategies. Returns the most relevant document chunks.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Request body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| collection | string | Yes | – | Collection to search. |
| query | string | Yes | – | The search query. |
| strategy | string | No | "hybrid" | Retrieval strategy (see below). |
| top_k | integer | No | 5 | Maximum number of results to return. |
| rerank | boolean | No | false | Whether to rerank results for improved relevance ordering. |
Search strategies
| Strategy | Description |
|---|---|
| semantic | Vector similarity search. Best for conceptual or meaning-based queries. |
| bm25 | Classic keyword-based ranking (BM25 algorithm). Best for exact term matching. |
| fuzzy | Tolerates typos and approximate matches. Useful for user-facing search with imprecise input. |
| hybrid | Combines semantic and keyword search, then merges results. Best overall performance for most use cases. |
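This reference does not say how hybrid merges the semantic and keyword result lists. Reciprocal rank fusion (RRF) is one widely used technique for that step; the sketch below applies it to plain lists of document IDs. Treat it as an illustration of the idea, not as ARES's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is the constant commonly used for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF only looks at rank positions, so it can combine a vector-similarity list and a BM25 list without normalizing their very different score scales.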
Response
The response contains an array of matching document chunks, each with its content, relevance score, and metadata.
Examples
curl
curl -X POST https://api.ares.dirmacs.com/api/rag/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{
"collection": "product-docs",
"query": "how does agent routing work",
"strategy": "hybrid",
"top_k": 5,
"rerank": true
}'
Python
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/rag/search",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
json={
"collection": "product-docs",
"query": "how does agent routing work",
"strategy": "hybrid",
"top_k": 5,
"rerank": True
}
)
results = response.json()
for result in results:
    print(result)
JavaScript
const response = await fetch("https://api.ares.dirmacs.com/api/rag/search", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi..."
},
body: JSON.stringify({
collection: "product-docs",
query: "how does agent routing work",
strategy: "hybrid",
top_k: 5,
rerank: true
})
});
const results = await response.json();
results.forEach(result => console.log(result));
List collections
GET /api/rag/collections
Returns all document collections for the authenticated user.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
curl https://api.ares.dirmacs.com/api/rag/collections \
-H "Authorization: Bearer eyJhbGciOi..."
Delete a collection
DELETE /api/rag/collection
Permanently delete a collection and all its indexed documents.
Authentication
Requires a JWT access token: Authorization: Bearer <jwt_access_token>
Request body
{
"collection": "product-docs"
}
Example
curl -X DELETE https://api.ares.dirmacs.com/api/rag/collection \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-d '{"collection": "product-docs"}'
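The same call from Python, using only the standard library's urllib.request. The helper name delete_collection_request is hypothetical; it builds the request without sending it, so you can inspect or log it before performing a destructive delete.

```python
import json
import urllib.request

BASE_URL = "https://api.ares.dirmacs.com"


def delete_collection_request(token: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /api/rag/collection with a JSON body."""
    body = json.dumps({"collection": collection}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/rag/collection",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="DELETE",
    )


# To actually send it:
# urllib.request.urlopen(delete_collection_request(token, "product-docs"))
```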
Streaming
ARES supports real-time streaming responses via Server-Sent Events (SSE). Instead of waiting for the full response to be generated, you receive text chunks as they are produced. This enables responsive UIs that display text as it appears.
Endpoint
POST /api/chat/stream
JWT authentication: Authorization: Bearer <jwt_access_token>
POST /v1/chat/stream
API key authentication: Authorization: Bearer ares_xxx
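POST /api/chat/stream accepts the same request body as POST /api/chat. POST /v1/chat/stream instead follows the V1 chat format described in the V1 Client API section: a messages array in the request and JSON-encoded delta events in the stream. The SSE format and examples below use /api/chat/stream.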
SSE format
The response uses Content-Type: text/event-stream. Each event contains a data: field with a text chunk:
data: The
data: answer
data: to your
data: question is
data: as follows...
Each data: line represents one chunk of the response. Concatenate all chunks in order to reconstruct the complete response. The server closes the connection when generation is complete.
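A client that splits each network read on newlines can break a data: line that happens to span two reads. The sketch below buffers incomplete input until its newline arrives. It follows the SSE rule that a single space after data: is stripped, ignores other fields and comment lines, and returns each data line as a separate chunk to match the per-line model described above. SSEDataParser is a hypothetical name.

```python
class SSEDataParser:
    """Incrementally extract data: payloads from decoded SSE text.

    Network reads can end in the middle of a line, so the trailing
    fragment is buffered until its newline arrives.
    """

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, text: str) -> list[str]:
        """Add newly received text; return payloads of completed data: lines."""
        self._buffer += text
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buffer = self._buffer.split("\n")
        payloads = []
        for line in lines:
            line = line.rstrip("\r")  # tolerate CRLF line endings
            if line.startswith("data:"):
                value = line[5:]
                # Per the SSE spec, one leading space is not part of the value.
                payloads.append(value[1:] if value.startswith(" ") else value)
        return payloads
```

Decode raw bytes with an incremental decoder, for example codecs.getincrementaldecoder("utf-8")(), before calling feed, so multi-byte characters split across reads are also handled.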
Examples
curl
The -N flag disables output buffering so chunks appear immediately:
curl -N -X POST https://api.ares.dirmacs.com/api/chat/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOi..." \
-H "Accept: text/event-stream" \
-d '{
"message": "Explain how neural networks learn",
"agent_type": "research"
}'
Python
Using the requests library with stream=True:
import requests
response = requests.post(
"https://api.ares.dirmacs.com/api/chat/stream",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer eyJhbGciOi...",
"Accept": "text/event-stream"
},
json={
"message": "Explain how neural networks learn",
"agent_type": "research"
},
stream=True
)
full_response = []
for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]
            print(chunk, end="", flush=True)
            full_response.append(chunk)
complete_text = "".join(full_response)
For production use, consider using httpx with async streaming:
import httpx
import asyncio
async def stream_chat(message: str, token: str) -> str:
    chunks = []
    # httpx's default timeout is 5 seconds; raise it so slow generations are not cut off
    async with httpx.AsyncClient(timeout=60.0) as client:
        async with client.stream(
            "POST",
            "https://api.ares.dirmacs.com/api/chat/stream",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
                "Accept": "text/event-stream"
            },
            json={"message": message}
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    chunk = line[6:]
                    print(chunk, end="", flush=True)
                    chunks.append(chunk)
    return "".join(chunks)
result = asyncio.run(stream_chat("Explain how neural networks learn", "eyJhbGciOi..."))
JavaScript (Browser)
Using the Fetch API with ReadableStream:
async function streamChat(message, token) {
const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${token}`,
"Accept": "text/event-stream"
},
body: JSON.stringify({
message: message,
agent_type: "research"
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // partial line carried over from the previous read
let fullResponse = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// A network read can end mid-line, so only process complete lines
const lines = buffer.split(/\r?\n/);
buffer = lines.pop();
for (const line of lines) {
if (line.startsWith("data: ")) {
const chunk = line.slice(6);
fullResponse += chunk;
// Update your UI here
document.getElementById("output").textContent = fullResponse;
}
}
}
return fullResponse;
}
JavaScript (Node.js)
async function streamChat(message, token) {
const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${token}`,
"Accept": "text/event-stream"
},
body: JSON.stringify({ message })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // partial line carried over from the previous read
let fullResponse = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// A network read can end mid-line; keep the trailing fragment for the next read
const lines = buffer.split(/\r?\n/);
buffer = lines.pop();
for (const line of lines) {
if (line.startsWith("data: ")) {
const chunk = line.slice(6);
fullResponse += chunk;
process.stdout.write(chunk);
}
}
}
return fullResponse;
}
Go
package main
import (
"bufio"
"bytes"
"encoding/json"
"fmt"
"net/http"
"strings"
)
func streamChat(message, token string) (string, error) {
body, _ := json.Marshal(map[string]string{
"message": message,
"agent_type": "research",
})
req, err := http.NewRequest("POST",
"https://api.ares.dirmacs.com/api/chat/stream",
bytes.NewReader(body))
if err != nil {
return "", err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+token)
req.Header.Set("Accept", "text/event-stream")
resp, err := http.DefaultClient.Do(req)
if err != nil {
return "", err
}
defer resp.Body.Close()
var fullResponse strings.Builder
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "data: ") {
chunk := line[6:]
fmt.Print(chunk)
fullResponse.WriteString(chunk)
}
}
return fullResponse.String(), scanner.Err()
}
func main() {
result, err := streamChat("Explain how neural networks learn", "eyJhbGciOi...")
if err != nil {
panic(err)
}
fmt.Printf("\n\nFull response length: %d characters\n", len(result))
}
Error handling
If the request is invalid or authentication fails, the server returns a standard HTTP error response (not SSE). Always check the response status before attempting to read the stream:
Python
response = requests.post(url, headers=headers, json=body, stream=True)
if response.status_code != 200:
    print(f"Error {response.status_code}: {response.text}")
else:
    for line in response.iter_lines():
        ...  # process SSE events
JavaScript
const response = await fetch(url, { method: "POST", headers, body });
if (!response.ok) {
throw new Error(`Error ${response.status}: ${await response.text()}`);
}
// proceed with stream reading
Best practices
- Always set Accept: text/event-stream to signal that you expect a streaming response.
- Disable client-side buffering where possible (e.g., -N in curl, stream=True in Python requests).
- Handle connection drops gracefully. The stream may close unexpectedly due to network issues. Implement retry logic for production applications.
- Set reasonable timeouts. Long research queries may stream for 30+ seconds. Configure your HTTP client timeout accordingly.
- Concatenate chunks for the final result. Individual chunks may split mid-word. Only process the complete response for downstream use.
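Retry logic for dropped streams can be as simple as exponential backoff around the whole request. A minimal sketch; the attempt count and delays are arbitrary defaults, and because this document does not describe a way to resume a stream, each retry is assumed to restart generation from the beginning (text already displayed will be produced again).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff.

    Delays grow as base_delay * 2**(attempt - 1), capped at max_delay.
    The last failure is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

requests raises its own exception types (requests.exceptions.ConnectionError is not a subclass of the built-in ConnectionError), so pass the ones your client actually raises, for example retry_on=(requests.exceptions.ConnectionError, requests.exceptions.ChunkedEncodingError, requests.exceptions.Timeout).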
V1 Client API
The V1 API is the primary interface for enterprise clients integrating ARES into their applications. All endpoints are scoped to the authenticated tenant — you only see your own agents, runs, and usage.
Base URL: https://api.ares.dirmacs.com
Authentication
Every request to /v1/* must include your API key in the Authorization header:
Authorization: Bearer ares_xxx
API keys are issued during tenant provisioning. You can create additional keys via the API or request them from your platform administrator.
Agents
List Agents
GET /v1/agents?page=1&per_page=20
Returns a paginated list of agents configured for your tenant.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number |
| per_page | integer | 20 | Results per page |
Response:
{
"agents": [
{
"id": "uuid",
"name": "risk-analyzer",
"agent_type": "classifier",
"status": "active",
"config": { "model": "llama-3.3-70b", "tools": ["calculator"] },
"created_at": "2026-03-01T00:00:00Z",
"last_run": "2026-03-13T14:22:00Z",
"total_runs": 1547,
"success_rate": 0.982
}
],
"total": 4,
"page": 1,
"per_page": 20
}
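To read every agent regardless of page size, a client can keep requesting pages until it has seen total agents. A sketch that relies only on the agents and total fields shown above; fetch_page is a hypothetical function you supply, which keeps the loop independent of any particular HTTP client.

```python
from typing import Any, Callable, Iterator

# fetch_page(page, per_page) returns the parsed JSON of
# GET /v1/agents?page=...&per_page=... (hypothetical, caller-supplied).
FetchPage = Callable[[int, int], dict[str, Any]]


def iter_agents(fetch_page: FetchPage, per_page: int = 20) -> Iterator[dict[str, Any]]:
    """Yield every agent across all pages of GET /v1/agents."""
    page, seen = 1, 0
    while True:
        data = fetch_page(page, per_page)
        agents = data.get("agents", [])
        yield from agents
        seen += len(agents)
        total = data.get("total")
        # Stop on an empty or short page, or once `total` agents have been seen.
        if len(agents) < per_page or (total is not None and seen >= total):
            break
        page += 1
```

In practice fetch_page might wrap requests.get(f"{BASE_URL}/v1/agents", params={"page": page, "per_page": per_page}, headers=headers).json().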
Get Agent Details
GET /v1/agents/{name}
Returns full details for a single agent.
Response:
{
"id": "uuid",
"name": "risk-analyzer",
"agent_type": "classifier",
"status": "active",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a risk analysis agent...",
"tools": ["calculator"],
"max_tokens": 2048
},
"created_at": "2026-03-01T00:00:00Z",
"last_run": "2026-03-13T14:22:00Z",
"total_runs": 1547,
"success_rate": 0.982
}
Run an Agent
POST /v1/agents/{name}/run
Execute an agent with the provided input. This is the core endpoint for triggering agent work.
Request Body:
{
"input": {
"message": "Analyze the risk profile for transaction TX-9921",
"context": {
"amount": 15000,
"currency": "USD",
"merchant_category": "electronics"
}
}
}
Response:
{
"id": "run-uuid",
"agent_id": "agent-uuid",
"status": "completed",
"input": { "message": "Analyze the risk profile..." },
"output": {
"risk_score": 0.73,
"risk_level": "medium",
"reasoning": "Elevated amount for merchant category..."
},
"error": null,
"started_at": "2026-03-13T14:22:00Z",
"finished_at": "2026-03-13T14:22:01Z",
"duration_ms": 1243,
"tokens_used": 847
}
If the agent fails, status will be "failed" and error will contain a description.
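Because a failed run is reported in the response body rather than only through the HTTP status, callers typically check status before using output. A minimal sketch; AgentRunFailed and run_output are hypothetical names, and only the "completed" and "failed" statuses shown in this section are assumed to exist.

```python
from typing import Any


class AgentRunFailed(RuntimeError):
    """Raised when an agent run comes back with status "failed"."""


def run_output(run: dict[str, Any]) -> dict[str, Any]:
    """Return the output of a completed run, or raise with the run's error."""
    status = run.get("status")
    if status == "failed":
        raise AgentRunFailed(f"run {run.get('id')} failed: {run.get('error')}")
    if status != "completed":
        # Only "completed" and "failed" are documented; treat others as unexpected.
        raise ValueError(f"unexpected run status: {status!r}")
    return run["output"]
```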
List Agent Runs
GET /v1/agents/{name}/runs?page=1&per_page=20
Returns the run history for a specific agent, newest first.
Chat
Send a Chat Message
POST /v1/chat
Send a message to a model or agent and receive a complete response.
Request Body:
{
"messages": [
{ "role": "user", "content": "Summarize Q1 revenue trends." }
],
"model": "llama-3.3-70b",
"agent_type": "analyst"
}
Response:
{
"id": "msg-uuid",
"content": "Based on the data, Q1 revenue showed...",
"model": "llama-3.3-70b",
"tokens_used": 312,
"finish_reason": "stop"
}
Stream a Chat Response
POST /v1/chat/stream
Same request body as /v1/chat, but returns a Server-Sent Events (SSE) stream.
data: {"delta": "Based on", "finish_reason": null}
data: {"delta": " the data,", "finish_reason": null}
data: {"delta": " Q1 revenue", "finish_reason": null}
...
data: {"delta": "", "finish_reason": "stop", "tokens_used": 312}
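Each event carries a delta to append, and the final event is the one with a non-null finish_reason (it also reports tokens_used). A sketch that reassembles a reply from already-split lines, based only on the event shape shown above; collect_v1_stream is a hypothetical helper.

```python
import json
from typing import Iterable, Optional


def collect_v1_stream(lines: Iterable[str]) -> tuple[str, Optional[str], Optional[int]]:
    """Join the delta fields of /v1/chat/stream events into the full reply.

    Returns (text, finish_reason, tokens_used); the last two come from the
    event whose finish_reason is non-null.
    """
    parts: list[str] = []
    finish_reason: Optional[str] = None
    tokens_used: Optional[int] = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        event = json.loads(line[len("data: "):])
        parts.append(event.get("delta", ""))
        if event.get("finish_reason") is not None:
            finish_reason = event["finish_reason"]
            tokens_used = event.get("tokens_used")
    return "".join(parts), finish_reason, tokens_used
```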
Usage
Get Usage Summary
GET /v1/usage
Returns your tenant’s usage for the current billing period.
Response:
{
"period_start": "2026-03-01T00:00:00Z",
"period_end": "2026-03-31T23:59:59Z",
"total_runs": 4821,
"total_tokens": 2847193,
"total_api_calls": 5290,
"quota_runs": 100000,
"quota_tokens": 10000000,
"daily_usage": [
{ "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 },
{ "date": "2026-03-12", "runs": 287, "tokens": 171003, "api_calls": 315 }
]
}
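The summary reports raw totals alongside quotas, so utilization is a simple ratio. A sketch for flagging tenants close to their limits; the 90% threshold is an arbitrary choice, and the helper names are hypothetical.

```python
from typing import Any


def quota_utilization(usage: dict[str, Any]) -> dict[str, float]:
    """Fraction of each quota consumed in the current billing period."""
    return {
        "runs": usage["total_runs"] / usage["quota_runs"],
        "tokens": usage["total_tokens"] / usage["quota_tokens"],
    }


def near_quota(usage: dict[str, Any], threshold: float = 0.9) -> list[str]:
    """Names of quotas at or above `threshold` (0.9 means 90%)."""
    return [name for name, used in quota_utilization(usage).items() if used >= threshold]
```

For the sample response above, runs are at about 4.8% of quota and tokens at about 28.5%, so near_quota returns an empty list.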
API Keys
List API Keys
GET /v1/api-keys
Returns all API keys for your tenant. The full key secret is never returned after creation.
Response:
{
"keys": [
{
"id": "key-uuid",
"name": "android-production",
"prefix": "ares_a1b2",
"created_at": "2026-03-01T00:00:00Z",
"expires_at": "2027-03-01T00:00:00Z",
"last_used": "2026-03-13T14:00:00Z"
}
]
}
Create API Key
POST /v1/api-keys
Request Body:
{
"name": "mobile-app-key",
"expires_in_days": 365
}
expires_in_days is optional. If omitted, the key does not expire.
Response:
{
"key": "key-uuid",
"secret": "ares_x7k9m2p4q8r1s5t3..."
}
Important: The secret field is only returned once, at creation time. Store it securely; it cannot be retrieved again.
Revoke API Key
DELETE /v1/api-keys/{id}
Immediately invalidates the key. Returns 204 No Content on success.
Examples
Run an Agent (curl)
curl -X POST https://api.ares.dirmacs.com/v1/agents/risk-analyzer/run \
-H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3" \
-H "Content-Type: application/json" \
-d '{
"input": {
"message": "Evaluate this transaction",
"context": {"amount": 15000, "currency": "USD"}
}
}'
Run an Agent (Python)
import requests
API_KEY = "ares_x7k9m2p4q8r1s5t3"
BASE_URL = "https://api.ares.dirmacs.com"
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
# Run an agent
response = requests.post(
f"{BASE_URL}/v1/agents/risk-analyzer/run",
headers=headers,
json={
"input": {
"message": "Evaluate this transaction",
"context": {"amount": 15000, "currency": "USD"},
}
},
)
result = response.json()
print(f"Status: {result['status']}")
print(f"Output: {result['output']}")
print(f"Duration: {result['duration_ms']}ms")
print(f"Tokens: {result['tokens_used']}")
Check Usage (curl)
curl https://api.ares.dirmacs.com/v1/usage \
-H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3"
Check Usage (Python)
response = requests.get(f"{BASE_URL}/v1/usage", headers=headers)
usage = response.json()
print(f"Runs this month: {usage['total_runs']} / {usage['quota_runs']}")
print(f"Tokens this month: {usage['total_tokens']} / {usage['quota_tokens']}")
Chat with Streaming (Python)
import requests
import json
response = requests.post(
f"{BASE_URL}/v1/chat/stream",
headers=headers,
json={
"messages": [{"role": "user", "content": "Explain quantum computing."}],
"model": "llama-3.3-70b",
},
stream=True,
)
for line in response.iter_lines():
    if line:
        text = line.decode("utf-8")
        if text.startswith("data: "):
            data = json.loads(text[6:])
            print(data.get("delta", ""), end="", flush=True)
Chat with Streaming (JavaScript)
const response = await fetch("https://api.ares.dirmacs.com/v1/chat/stream", {
method: "POST",
headers: {
"Authorization": "Bearer ares_x7k9m2p4q8r1s5t3",
"Content-Type": "application/json",
},
body: JSON.stringify({
messages: [{ role: "user", content: "Explain quantum computing." }],
model: "llama-3.3-70b",
}),
});
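const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // partial line carried over between reads
while (true) {
const { done, value } = await reader.read();
if (done) break;
// stream: true keeps multi-byte characters intact across reads
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split(/\r?\n/);
buffer = lines.pop(); // may be an incomplete line; wait for more data
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = JSON.parse(line.slice(6));
process.stdout.write(data.delta || "");
}
}
}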
Admin API
The Admin API provides full platform management capabilities for ARES operators. Use it to provision tenants, manage agents, monitor usage, and operate the platform.
Base URL: https://api.ares.dirmacs.com
Authentication
Every request to /api/admin/* must include the admin secret:
X-Admin-Secret: <secret>
This secret is set in your ares.toml configuration. Guard it carefully — it grants full platform access.
Tenants
Create Tenant
POST /api/admin/tenants
Request Body:
{
"name": "acme-corp",
"tier": "pro"
}
Valid tiers: free, dev, pro, enterprise.
Response:
{
"id": "tenant-uuid",
"name": "acme-corp",
"tier": "pro",
"created_at": "2026-03-13T00:00:00Z"
}
List Tenants
GET /api/admin/tenants
Response:
{
"tenants": [
{
"id": "tenant-uuid",
"name": "acme-corp",
"tier": "pro",
"agent_count": 4,
"created_at": "2026-03-13T00:00:00Z"
}
]
}
Get Tenant Details
GET /api/admin/tenants/{id}
Response:
{
"id": "tenant-uuid",
"name": "acme-corp",
"tier": "pro",
"agent_count": 4,
"api_key_count": 2,
"total_runs": 12849,
"total_tokens": 7291034,
"created_at": "2026-03-13T00:00:00Z"
}
Update Tenant Tier
PUT /api/admin/tenants/{id}/quota
Request Body:
{
"tier": "enterprise"
}
Response: Updated tenant object.
Provisioning
Provision a Client
POST /api/admin/provision-client
This is the recommended way to onboard a new enterprise client. It atomically creates a tenant, clones the appropriate agent templates, and generates an API key — all in a single transaction. If any step fails, everything is rolled back.
Request Body:
{
"name": "acme-corp",
"tier": "pro",
"product_type": "kasino",
"api_key_name": "production"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique tenant name (lowercase, alphanumeric + hyphens) |
| tier | string | Yes | One of: free, dev, pro, enterprise |
| product_type | string | Yes | Template set to clone: generic, kasino, ehb |
| api_key_name | string | Yes | Label for the initial API key |
Response:
{
"tenant_id": "tenant-uuid",
"tenant_name": "acme-corp",
"tier": "pro",
"product_type": "kasino",
"api_key_id": "key-uuid",
"api_key_prefix": "ares_a1b2",
"raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
"agents_created": [
"kasino-classifier",
"kasino-risk",
"kasino-transaction",
"kasino-report"
]
}
Important: The raw_api_key is only returned once. Store it securely and deliver it to the client through a secure channel.
curl Example:
curl -X POST https://api.ares.dirmacs.com/api/admin/provision-client \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{
"name": "acme-corp",
"tier": "pro",
"product_type": "kasino",
"api_key_name": "production"
}'
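The same provisioning call from Python, with the field constraints from the table checked locally before anything is sent. A sketch using the standard library; the name pattern is a literal reading of "lowercase, alphanumeric + hyphens" and may differ from what the server enforces, and provision_client_request is a hypothetical helper.

```python
import json
import re
import urllib.request

BASE_URL = "https://api.ares.dirmacs.com"
TIERS = {"free", "dev", "pro", "enterprise"}
PRODUCT_TYPES = {"generic", "kasino", "ehb"}
# Literal reading of "lowercase, alphanumeric + hyphens" (an assumption).
TENANT_NAME = re.compile(r"[a-z0-9-]+")


def provision_client_request(
    admin_secret: str, name: str, tier: str, product_type: str, api_key_name: str
) -> urllib.request.Request:
    """Validate inputs locally, then build POST /api/admin/provision-client."""
    if not TENANT_NAME.fullmatch(name):
        raise ValueError(f"invalid tenant name: {name!r}")
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {sorted(TIERS)}")
    if product_type not in PRODUCT_TYPES:
        raise ValueError(f"product_type must be one of {sorted(PRODUCT_TYPES)}")
    body = {"name": name, "tier": tier, "product_type": product_type, "api_key_name": api_key_name}
    return urllib.request.Request(
        f"{BASE_URL}/api/admin/provision-client",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Admin-Secret": admin_secret, "Content-Type": "application/json"},
        method="POST",
    )
```

Since raw_api_key is returned only once, the code that sends this request should persist the response before doing anything else with it.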
API Keys
Create API Key for Tenant
POST /api/admin/tenants/{id}/api-keys
Request Body:
{
"name": "staging-key"
}
Response:
{
"id": "key-uuid",
"prefix": "ares_x7k9",
"raw_key": "ares_x7k9m2p4q8r1s5t3...",
"created_at": "2026-03-13T00:00:00Z"
}
List API Keys for Tenant
GET /api/admin/tenants/{id}/api-keys
Response:
{
"keys": [
{
"id": "key-uuid",
"name": "production",
"prefix": "ares_a1b2",
"created_at": "2026-03-13T00:00:00Z",
"last_used": "2026-03-13T14:00:00Z"
}
]
}
Tenant Agents
List Tenant Agents
GET /api/admin/tenants/{id}/agents
Response:
{
"agents": [
{
"id": "agent-uuid",
"name": "kasino-classifier",
"agent_type": "classifier",
"status": "active",
"model": "llama-3.3-70b",
"total_runs": 2841,
"success_rate": 0.991
}
]
}
Create Tenant Agent
POST /api/admin/tenants/{id}/agents
Request Body:
{
"name": "custom-analyzer",
"agent_type": "analyzer",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a financial data analyzer...",
"tools": ["calculator"],
"max_tokens": 4096
}
}
Update Tenant Agent
PUT /api/admin/tenants/{id}/agents/{name}
Request Body: Same structure as create. Fields provided will be updated.
Delete Tenant Agent
DELETE /api/admin/tenants/{id}/agents/{name}
Returns 204 No Content on success.
Templates and Models
List Agent Templates
GET /api/admin/agent-templates?product_type=kasino
Returns the pre-configured agent templates available for a given product type. These are cloned during provisioning.
Response:
{
"templates": [
{
"name": "kasino-classifier",
"agent_type": "classifier",
"product_type": "kasino",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a transaction classifier...",
"tools": []
}
}
]
}
List Available Models
GET /api/admin/models
Returns all models configured across all providers.
Response:
{
"models": [
{
"id": "llama-3.3-70b",
"provider": "groq",
"context_length": 131072,
"supports_tools": true
},
{
"id": "deepseek-r1",
"provider": "nvidia-deepseek",
"context_length": 65536,
"supports_tools": false
},
{
"id": "claude-3.5-sonnet",
"provider": "anthropic",
"context_length": 200000,
"supports_tools": true
}
]
}
Usage and Analytics
Tenant Usage Summary
GET /api/admin/tenants/{id}/usage
Response:
{
"tenant_id": "tenant-uuid",
"tenant_name": "acme-corp",
"tier": "pro",
"period_start": "2026-03-01T00:00:00Z",
"period_end": "2026-03-31T23:59:59Z",
"total_runs": 4821,
"total_tokens": 2847193,
"quota_runs": 100000,
"quota_tokens": 10000000
}
Daily Usage Breakdown
GET /api/admin/tenants/{id}/usage/daily?days=30
Response:
{
"daily": [
{ "date": "2026-03-13", "runs": 312, "tokens": 184920 },
{ "date": "2026-03-12", "runs": 287, "tokens": 171003 }
]
}
Agent Run History
GET /api/admin/tenants/{id}/agents/{name}/runs?limit=50
Response:
{
"runs": [
{
"id": "run-uuid",
"status": "completed",
"started_at": "2026-03-13T14:22:00Z",
"duration_ms": 1243,
"tokens_used": 847
}
]
}
Agent Stats
GET /api/admin/tenants/{id}/agents/{name}/stats
Response:
{
"agent_name": "kasino-classifier",
"total_runs": 2841,
"successful_runs": 2815,
"failed_runs": 26,
"success_rate": 0.991,
"avg_duration_ms": 1102,
"avg_tokens": 723,
"last_run": "2026-03-13T14:22:00Z"
}
Cross-Tenant Agent List
GET /api/admin/agents
Returns agents across all tenants. Useful for platform-wide visibility.
Platform Stats
GET /api/admin/stats
Response:
{
"total_tenants": 12,
"total_agents": 47,
"total_runs_today": 3291,
"total_tokens_today": 1948271,
"active_alerts": 2
}
Alerts and Audit
List Alerts
GET /api/admin/alerts?severity=critical&resolved=false&limit=100
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| severity | string | all | Filter by: info, warning, critical |
| resolved | boolean | all | Filter by resolution status |
| limit | integer | 100 | Maximum results to return |
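Omitting a filter means "all", so a client only needs to add the parameters it actually wants. A sketch of building the query string with urllib.parse.urlencode; booleans are sent as lowercase true/false to match the example URL above rather than Python's True/False. alerts_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.ares.dirmacs.com"


def alerts_url(
    severity: Optional[str] = None, resolved: Optional[bool] = None, limit: int = 100
) -> str:
    """Build a GET /api/admin/alerts URL; omitted filters mean "all"."""
    params: dict[str, str] = {}
    if severity is not None:
        if severity not in {"info", "warning", "critical"}:
            raise ValueError(f"unknown severity: {severity!r}")
        params["severity"] = severity
    if resolved is not None:
        params["resolved"] = "true" if resolved else "false"
    params["limit"] = str(limit)
    return f"{BASE_URL}/api/admin/alerts?{urlencode(params)}"
```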
Response:
{
"alerts": [
{
"id": "alert-uuid",
"severity": "critical",
"message": "Tenant acme-corp approaching token quota (92%)",
"tenant_id": "tenant-uuid",
"created_at": "2026-03-13T10:00:00Z",
"resolved": false
}
]
}
Resolve Alert
POST /api/admin/alerts/{id}/resolve
Returns 200 OK with the updated alert object.
Audit Log
GET /api/admin/audit-log?limit=50
Response:
{
"entries": [
{
"id": "entry-uuid",
"action": "tenant.created",
"actor": "admin",
"details": { "tenant_name": "acme-corp", "tier": "pro" },
"timestamp": "2026-03-13T00:00:00Z"
},
{
"id": "entry-uuid",
"action": "agent.deleted",
"actor": "admin",
"details": { "tenant_id": "...", "agent_name": "old-agent" },
"timestamp": "2026-03-12T23:00:00Z"
}
]
}
Deployment API
The Deployment API allows you to trigger, monitor, and inspect deployments of ARES platform services. Deployments run server-side on the VPS and stream build output for observability.
Base URL: https://api.ares.dirmacs.com
Authentication
All deployment endpoints require the admin secret:
X-Admin-Secret: <secret>
Trigger a Deployment
POST /api/admin/deploy
Starts a deployment for the specified target service. The deployment runs asynchronously — you receive a deployment ID immediately and poll for completion.
Request Body:
{
"target": "ares"
}
| Target | Description |
|---|---|
| ares | ARES backend: pulls latest code, rebuilds, and restarts |
| admin | dirmacs-admin dashboard: rebuilds Leptos frontend |
| eruka | Eruka backend: pulls, rebuilds, and restarts |
Response:
{
"id": "deploy-uuid",
"status": "running",
"message": "Deployment started for ares"
}
curl Example:
curl -X POST https://api.ares.dirmacs.com/api/admin/deploy \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{"target": "ares"}'
Poll Deployment Status
GET /api/admin/deploy/{id}
Returns the current status of a deployment. Poll this endpoint until status is no longer "running".
Response:
{
"id": "deploy-uuid",
"target": "ares",
"status": "success",
"started_at": "2026-03-13T14:00:00Z",
"finished_at": "2026-03-13T14:03:42Z",
"output": "Pulling latest changes...\nCompiling ares-server v0.1.0...\nFinished release target(s) in 3m 41s\nRestarting ares.service...\nService started successfully."
}
Status Values:
| Status | Meaning |
|---|---|
| running | Deployment is in progress |
| success | Deployment completed successfully |
| failed | Deployment failed; check output for details |
Polling Pattern
The recommended approach is to trigger a deployment, then poll every 3 seconds until it completes:
# 1. Trigger deployment
DEPLOY_ID=$(curl -s -X POST https://api.ares.dirmacs.com/api/admin/deploy \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"target": "ares"}' | jq -r '.id')
echo "Deployment started: $DEPLOY_ID"
# 2. Poll until complete
while true; do
  RESULT=$(curl -s "https://api.ares.dirmacs.com/api/admin/deploy/$DEPLOY_ID" \
    -H "X-Admin-Secret: your-admin-secret")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" != "running" ]; then
    echo "$RESULT" | jq -r '.output'
    break
  fi
  sleep 3
done
Python Example:
import requests
import time
ADMIN_SECRET = "your-admin-secret"
BASE_URL = "https://api.ares.dirmacs.com"
headers = {
    "X-Admin-Secret": ADMIN_SECRET,
    "Content-Type": "application/json",
}
# Trigger
resp = requests.post(
    f"{BASE_URL}/api/admin/deploy",
    headers=headers,
    json={"target": "ares"},
    timeout=30,
)
resp.raise_for_status()  # fail fast on HTTP errors such as 401 (bad admin secret)
deploy_id = resp.json()["id"]
print(f"Deployment started: {deploy_id}")
# Poll every 3 seconds until the status is no longer "running"
while True:
    resp = requests.get(
        f"{BASE_URL}/api/admin/deploy/{deploy_id}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()
    print(f"Status: {result['status']}")
    if result["status"] != "running":
        print(result["output"])
        break
    time.sleep(3)
List Recent Deployments
GET /api/admin/deploys
Returns the 20 most recent deployments, newest first.
Response:
{
"deploys": [
{
"id": "deploy-uuid",
"target": "ares",
"status": "success",
"started_at": "2026-03-13T14:00:00Z",
"finished_at": "2026-03-13T14:03:42Z"
},
{
"id": "deploy-uuid-2",
"target": "admin",
"status": "failed",
"started_at": "2026-03-12T10:00:00Z",
"finished_at": "2026-03-12T10:02:15Z"
}
]
}
curl Example:
curl https://api.ares.dirmacs.com/api/admin/deploys \
-H "X-Admin-Secret: your-admin-secret"
Service Health
List All Services
GET /api/admin/services
Returns the runtime status of all managed services.
Response:
{
"ares": {
"status": "running",
"pid": 12847,
"port": 3000
},
"eruka": {
"status": "running",
"pid": 12901,
"port": 8081
},
"admin": {
"status": "running",
"pid": null,
"port": null
}
}
| Status | Meaning |
|---|---|
| running | Service is up and healthy |
| stopped | Service is not running |
| degraded | Service is running but unhealthy |
curl Example:
curl https://api.ares.dirmacs.com/api/admin/services \
-H "X-Admin-Secret: your-admin-secret"
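For monitoring (for example, from a cron job), a small check can flag every service whose status is not running. This Python sketch works on the response shape shown above; the function names and the choice to treat both stopped and degraded as unhealthy are assumptions.

```python
import json
import urllib.request


def unhealthy_services(services):
    """Names of services whose status is anything other than "running"."""
    return sorted(
        name for name, info in services.items() if info.get("status") != "running"
    )


def fetch_services(admin_secret, base_url="https://api.ares.dirmacs.com"):
    """GET /api/admin/services and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/admin/services",
        headers={"X-Admin-Secret": admin_secret},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A monitor could call unhealthy_services(fetch_services(secret)) and, for each name returned, pull recent output from GET /api/admin/services/{name}/logs.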
Get Service Logs
GET /api/admin/services/{name}/logs
Returns recent log output from the service’s systemd journal.
Response:
{
"service": "ares",
"lines": [
"Mar 13 14:03:42 vps ares-server[12847]: Listening on 0.0.0.0:3000",
"Mar 13 14:03:42 vps ares-server[12847]: Connected to PostgreSQL",
"Mar 13 14:03:43 vps ares-server[12847]: Loaded 29 agents, 4 providers, 11 models",
"Mar 13 14:04:01 vps ares-server[12847]: POST /v1/agents/risk-analyzer/run 200 1243ms"
]
}
curl Example:
curl https://api.ares.dirmacs.com/api/admin/services/ares/logs \
-H "X-Admin-Secret: your-admin-secret"
Multi-Tenant Architecture
ARES is a multi-tenant platform. Each enterprise client operates within an isolated tenant, with their own agents, API keys, usage quotas, and data boundaries. This page explains the tenancy model and how to provision new clients.
Core Concepts
Tenants
A tenant is an isolated namespace on the ARES platform. Each tenant has:
- A unique name and ID
- A tier that determines rate limits and quotas
- Its own set of agents (cloned from templates or created manually)
- One or more API keys for authentication
- Independent usage tracking and billing data
Tenants cannot see or interact with each other’s resources. A request authenticated with Tenant A’s API key will never return Tenant B’s agents, runs, or usage data.
Tiers
Every tenant is assigned a tier that governs their resource limits:
| Tier | Monthly Requests | Monthly Tokens | Daily Rate Limit | Use Case |
|---|---|---|---|---|
| Free | 1,000 | 100,000 | 100/day | Evaluation and testing |
| Dev | 10,000 | 1,000,000 | 1,000/day | Development and staging |
| Pro | 100,000 | 10,000,000 | 10,000/day | Production workloads |
| Enterprise | Unlimited | Unlimited | Unlimited | High-volume clients |
Tiers can be changed at any time via the Admin API without disrupting the tenant’s service.
Agent Templates
When a tenant is provisioned, ARES clones a set of pre-configured agent templates based on the specified product_type. Templates provide a working starting point that can be customized after creation.
Available product types:
| Product Type | Templates Included | Description |
|---|---|---|
| generic | General-purpose agents | Default chat and analysis agents |
| kasino | kasino-classifier, kasino-risk, kasino-transaction, kasino-report | Transaction analysis and reporting |
| ehb | Health-oriented agents | eHealthBuddy clinical agents |
Each template defines the agent’s model, system prompt, tool access, and default configuration. After provisioning, agents can be freely modified or new ones added.
API Key Scoping
Every API key is bound to exactly one tenant. When a request arrives with an API key:
- ARES looks up the key and identifies the associated tenant
- All operations execute within that tenant’s scope
- Usage is tracked against that tenant’s quotas
- The response only includes that tenant’s data
A tenant can have multiple API keys (e.g., separate keys for production, staging, and mobile). Each key’s usage is tracked individually but counts toward the shared tenant quota.
Data Isolation
Tenant isolation is enforced at the database query level. Every data-accessing query includes the tenant ID as a filter condition. This means:
- Agent listings only return the requesting tenant’s agents
- Run history only shows runs from the requesting tenant
- Usage data only reflects the requesting tenant’s consumption
- There is no API surface to query across tenant boundaries (except via the Admin API)
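The tenant-filter rule can be illustrated with a small sketch. It uses Python's built-in sqlite3 as an in-memory stand-in for PostgreSQL, with a hypothetical agents table keyed on (tenant_id, name); this is not ARES's actual schema. A lookup of another tenant's agent returns nothing, which is consistent with such requests surfacing as 404 Not Found.

```python
import sqlite3


def open_db():
    """In-memory stand-in for the ARES database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    # Agent names are unique per tenant, not globally
    conn.execute(
        "CREATE TABLE agents ("
        " tenant_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " PRIMARY KEY (tenant_id, name))"
    )
    return conn


def list_agents(conn, tenant_id):
    """Every data-access query carries the caller's tenant ID as a filter."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]


def get_agent(conn, tenant_id, name):
    """Another tenant's agent is indistinguishable from a missing one."""
    row = conn.execute(
        "SELECT name FROM agents WHERE tenant_id = ? AND name = ?",
        (tenant_id, name),
    ).fetchone()
    return row[0] if row else None  # None would map to HTTP 404
```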
Provisioning Flow
The recommended way to onboard a new client is the atomic provisioning endpoint. It creates all required resources in a single database transaction.
Step 1: Provision the Client
curl -X POST https://api.ares.dirmacs.com/api/admin/provision-client \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{
"name": "acme-corp",
"tier": "pro",
"product_type": "kasino",
"api_key_name": "production"
}'
Response:
{
"tenant_id": "550e8400-e29b-41d4-a716-446655440000",
"tenant_name": "acme-corp",
"tier": "pro",
"product_type": "kasino",
"api_key_id": "key-uuid",
"api_key_prefix": "ares_a1b2",
"raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
"agents_created": [
"kasino-classifier",
"kasino-risk",
"kasino-transaction",
"kasino-report"
]
}
This single call:
- Creates the tenant with the specified tier
- Looks up the agent templates for the given product_type
- Clones each template as a tenant-specific agent
- Generates an API key bound to the new tenant
- Returns the raw API key (shown only once)
If any step fails, the entire operation is rolled back. You will never end up with a half-provisioned tenant.
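The all-or-nothing behavior comes from running every insert inside one database transaction. This Python sketch uses sqlite3 as a stand-in for PostgreSQL with a hypothetical two-table schema, and omits API key generation. A None template config stands in for a template that fails to clone: it violates a NOT NULL constraint, and the rollback also removes the tenant row created earlier in the same transaction.

```python
import sqlite3


def open_db():
    """Hypothetical minimal schema; not ARES's real tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tenants (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            tier TEXT NOT NULL
        );
        CREATE TABLE agents (
            tenant_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            config TEXT NOT NULL,
            PRIMARY KEY (tenant_id, name)
        );
        """
    )
    return conn


def provision_client(conn, name, tier, templates):
    """Create a tenant and clone its agent templates in one transaction.

    templates maps agent name -> config JSON string.
    """
    with conn:  # commit if the block succeeds, roll back if it raises
        cur = conn.execute(
            "INSERT INTO tenants (name, tier) VALUES (?, ?)", (name, tier)
        )
        tenant_id = cur.lastrowid
        for agent_name, config in templates.items():
            # config is NOT NULL, so a None config raises sqlite3.IntegrityError
            conn.execute(
                "INSERT INTO agents (tenant_id, name, config) VALUES (?, ?, ?)",
                (tenant_id, agent_name, config),
            )
    return tenant_id
```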
Step 2: Deliver the API Key
Securely deliver the raw_api_key to your client. This is the only time the full key is visible — ARES stores only a hashed version internally.
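Storing only a hash means a leaked database does not leak usable keys. This document does not specify ARES's key format or hash algorithm, so the following Python sketch assumes a random ares_ key, a 9-character display prefix (matching api_key_prefix in the example response), and SHA-256 with a constant-time comparison. Because the random part carries 128 bits of entropy, a fast unsalted hash is acceptable here, unlike for user-chosen passwords.

```python
import hashlib
import hmac
import secrets


def generate_api_key():
    """Return (raw_key, prefix, key_hash). Only prefix and key_hash are stored."""
    raw_key = "ares_" + secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    prefix = raw_key[:9]                       # e.g. "ares_a1b2", safe to display
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, prefix, key_hash


def verify_api_key(presented, stored_hash):
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```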
Step 3: Verify the Setup
Confirm the tenant’s agents are accessible using their new API key:
curl https://api.ares.dirmacs.com/v1/agents \
-H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5"
The client should see their four provisioned agents.
Step 4: Test an Agent Run
curl -X POST https://api.ares.dirmacs.com/v1/agents/kasino-classifier/run \
-H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5" \
-H "Content-Type: application/json" \
-d '{
"input": {
"message": "Classify this transaction: $500 at electronics store"
}
}'
Managing Tenants After Provisioning
Add More Agents
curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/agents \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{
"name": "custom-summarizer",
"agent_type": "summarizer",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You summarize financial reports concisely.",
"tools": [],
"max_tokens": 2048
}
}'
Issue Additional API Keys
curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/api-keys \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{"name": "staging-key"}'
Upgrade a Tenant’s Tier
curl -X PUT https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/quota \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{"tier": "enterprise"}'
Monitor Usage
# Current period summary
curl https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/usage \
-H "X-Admin-Secret: your-admin-secret"
# Daily breakdown for the last 30 days
curl "https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/usage/daily?days=30" \
-H "X-Admin-Secret: your-admin-secret"
Architecture Notes
- Shared infrastructure: All tenants run on the same ARES instance and database. Isolation is logical, not physical. This keeps operational costs low for the MVP phase.
- Atomic provisioning: The provisioning endpoint uses a database transaction. If agent template cloning fails halfway through, the tenant and any partially created resources are rolled back.
- Key hashing: API keys are hashed before storage. The raw key is returned exactly once during creation. Lost keys must be revoked and replaced.
- Auto-migration: ARES runs database migrations on startup (sqlx::migrate!()). New tenant-related schema changes are applied automatically when the server restarts.
Rate Limits and Quotas
ARES enforces two independent layers of rate limiting to protect the platform and ensure fair resource allocation across tenants.
Layer 1: IP-Based Rate Limiting
Every incoming request is subject to per-IP rate limiting via tower_governor. This layer protects against abuse, brute-force attacks, and accidental request floods regardless of authentication status.
IP-based limits apply to all routes, including unauthenticated endpoints like /health. The specific thresholds are configured server-side and are intentionally generous for normal usage patterns.
If you hit the IP rate limit, you will receive a 429 Too Many Requests response. Back off and retry after a short delay.
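As an illustration of per-IP limiting, the Python sketch below implements a token bucket: each client IP gets a bucket that holds a burst of requests and refills at a steady rate, and an empty bucket yields a 429. It shows the general idea only and is not necessarily the algorithm tower_governor uses; the capacity and rate are placeholders, since the server-side thresholds are not published.

```python
import time


class TokenBucket:
    """Allows `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last check, capped at capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIpLimiter:
    """One bucket per client IP, regardless of authentication status.

    A production limiter would also evict idle buckets to bound memory.
    """

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def check(self, ip):
        """Return the HTTP status this request would get: 200 or 429."""
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.capacity, self.rate, now)
        return 200 if bucket.allow(now) else 429
```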
Layer 2: Tenant Quotas
Authenticated requests to /v1/* are additionally subject to tenant-level quotas based on the tenant’s tier. Monthly quotas reset at the beginning of each calendar month; daily limits reset at the start of each UTC day.
| Tier | Monthly Requests | Monthly Tokens | Daily Rate Limit |
|---|---|---|---|
| Free | 1,000 | 100,000 | 100/day |
| Dev | 10,000 | 1,000,000 | 1,000/day |
| Pro | 100,000 | 10,000,000 | 10,000/day |
| Enterprise | Unlimited | Unlimited | Unlimited |
What Counts as a Request
Each API call to a metered endpoint counts as one request:
- POST /v1/agents/{name}/run: 1 request
- POST /v1/chat: 1 request
- POST /v1/chat/stream: 1 request
- GET /v1/agents: 1 request
Read-only endpoints such as GET /v1/usage and GET /v1/api-keys are metered as well, so each call to them also counts toward the request total.
What Counts as Tokens
Token usage is tracked per request based on the combined input and output token count from the LLM provider. Both the prompt tokens and completion tokens are summed.
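The tier table translates into a simple admission check. In this Python sketch the limits come from the table above, while the order of the checks (daily before monthly) and the rule that a request is rejected once usage reaches the cap are assumptions. The error strings match the messages listed under Exceeding Limits.

```python
# tier: (monthly requests, monthly tokens, daily requests); None means unlimited
TIER_LIMITS = {
    "free": (1_000, 100_000, 100),
    "dev": (10_000, 1_000_000, 1_000),
    "pro": (100_000, 10_000_000, 10_000),
    "enterprise": (None, None, None),
}


def request_tokens(prompt_tokens, completion_tokens):
    """Tokens charged for one request: prompt plus completion."""
    return prompt_tokens + completion_tokens


def quota_error(tier, requests_today, requests_this_month, tokens_this_month):
    """Return the 429 message a new request would get, or None if it is allowed."""
    monthly_requests, monthly_tokens, daily_requests = TIER_LIMITS[tier]
    checks = (
        (daily_requests, requests_today, "Daily request limit reached for your tier"),
        (monthly_requests, requests_this_month, "Monthly request quota exceeded"),
        (monthly_tokens, tokens_this_month, "Monthly token quota exceeded"),
    )
    for cap, used, message in checks:
        if cap is not None and used >= cap:
            return message
    return None
```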
Response Headers
When you make a request to a metered endpoint, ARES includes rate limit information in the response headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current period |
| X-RateLimit-Remaining | Requests remaining in the current period |
| X-RateLimit-Reset | UTC timestamp when the current period resets |
| X-Quota-Tokens-Remaining | Tokens remaining in the current monthly period |
Example headers:
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 7482
X-RateLimit-Reset: 2026-04-01T00:00:00Z
X-Quota-Tokens-Remaining: 8241037
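A client can read these headers to pace itself. This Python sketch (function names hypothetical) converts them to typed values. Pass it a case-insensitive header mapping, such as requests' response.headers or the headers object of a urllib.request.urlopen response.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Convert ARES rate-limit headers to typed values; missing headers become None."""

    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    reset = headers.get("X-RateLimit-Reset")
    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so rewrite it as an explicit UTC offset first
        "reset": datetime.fromisoformat(reset.replace("Z", "+00:00")) if reset else None,
        "tokens_remaining": as_int("X-Quota-Tokens-Remaining"),
    }


def seconds_until_reset(info, now=None):
    """Seconds until the period resets (0 if unknown or already past)."""
    if info["reset"] is None:
        return 0.0
    now = now or datetime.now(timezone.utc)
    return max(0.0, (info["reset"] - now).total_seconds())
```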
Exceeding Limits
When you exceed either rate limit layer, ARES returns:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
"error": "Rate limit exceeded. Daily request limit reached for your tier."
}
The error message indicates which limit was hit:
| Error Message | Cause | Resolution |
|---|---|---|
| Rate limit exceeded | IP-based rate limit | Wait and retry. Reduce request frequency. |
| Daily request limit reached for your tier | Tenant daily cap | Wait until the next UTC day, or upgrade your tier. |
| Monthly request quota exceeded | Tenant monthly cap | Wait until the next billing period, or upgrade. |
| Monthly token quota exceeded | Tenant token cap | Wait until the next billing period, or upgrade. |
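Only the IP-based limit clears after a short wait, so it helps to classify a 429 before deciding whether to retry. A minimal Python sketch based on the messages in this table: it checks the tenant-quota messages first because, as the example response shows, "Rate limit exceeded" can prefix a quota message. The cause labels are hypothetical.

```python
def classify_429(message):
    """Map an ARES 429 error message to (cause, worth_retrying_soon)."""
    # Tenant quota messages first: they may appear after "Rate limit exceeded."
    if "Daily request limit reached" in message:
        return "tenant_daily_cap", False
    if "Monthly request quota exceeded" in message:
        return "tenant_monthly_requests", False
    if "Monthly token quota exceeded" in message:
        return "tenant_monthly_tokens", False
    if "Rate limit exceeded" in message:
        return "ip_rate_limit", True
    return "unknown", False
```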
Checking Your Usage
You can proactively monitor your consumption to avoid hitting limits:
curl https://api.ares.dirmacs.com/v1/usage \
-H "Authorization: Bearer ares_xxx"
Response:
{
"period_start": "2026-03-01T00:00:00Z",
"period_end": "2026-03-31T23:59:59Z",
"total_runs": 4821,
"total_tokens": 2847193,
"total_api_calls": 5290,
"quota_runs": 100000,
"quota_tokens": 10000000,
"daily_usage": [
{ "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 }
]
}
Compare total_runs against quota_runs and total_tokens against quota_tokens to see how much headroom you have.
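That comparison can be automated. This Python sketch computes remaining runs and tokens, plus the percentage used, from a /v1/usage response body. Treating a null quota as unlimited (for example, on the Enterprise tier) is an assumption, since no Enterprise usage response is shown here.

```python
def usage_headroom(usage):
    """Summarize remaining quota from a GET /v1/usage response body."""

    def remaining(used, quota):
        # Assumption: an unlimited quota is reported as null (None)
        if quota is None:
            return None, None
        return quota - used, 100.0 * used / quota

    runs_left, runs_pct = remaining(usage["total_runs"], usage["quota_runs"])
    tokens_left, tokens_pct = remaining(usage["total_tokens"], usage["quota_tokens"])
    return {
        "runs_remaining": runs_left,
        "runs_used_pct": runs_pct,
        "tokens_remaining": tokens_left,
        "tokens_used_pct": tokens_pct,
    }
```

With the example response above, this reports 95,179 runs and 7,152,807 tokens remaining (about 4.8% and 28.5% used).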
Best Practices
- Monitor usage proactively. Poll GET /v1/usage periodically rather than waiting for 429 errors.
- Implement exponential backoff. When you receive a 429, wait before retrying. A simple strategy: wait 1s, then 2s, then 4s, up to a maximum of 30s.
- Cache where possible. Agent listings and model metadata change infrequently. Cache these responses to reduce unnecessary API calls.
- Use streaming for chat. POST /v1/chat/stream counts as a single request regardless of response length, the same as the non-streaming variant, so you get token-by-token output at no extra request cost.
- Request a tier upgrade early. If you anticipate hitting your quota before month-end, contact your platform administrator to upgrade your tier. Tier changes take effect immediately.
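The 1s, 2s, 4s, capped-at-30s strategy can be sketched in Python as follows. send is any zero-argument function you supply that performs the request and returns a (status_code, body) pair; both it and the default of six retries are illustrative choices. Injecting sleep keeps the helper testable.

```python
import time


def backoff_delays(base=1.0, cap=30.0, attempts=6):
    """Yield 1s, 2s, 4s, ... doubling each time and never exceeding cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


def call_with_backoff(send, max_retries=6, sleep=time.sleep):
    """Call send() and retry while it returns 429, waiting longer each time.

    Returns the last (status_code, body); it is still 429 if retries ran out.
    """
    status, body = send()
    for delay in backoff_delays(attempts=max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

In practice, adding a little random jitter to each delay helps keep many clients from retrying in lockstep.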
Error Handling
ARES uses conventional HTTP status codes and a consistent JSON error format across all endpoints. This page documents the error response structure, status code meanings, and common errors with their solutions.
Error Response Format
All errors return a JSON object with an error field containing a human-readable message:
{
"error": "Human-readable error message"
}
The HTTP status code indicates the category of error. The error string provides specific details about what went wrong.
HTTP Status Codes
Success Codes
| Code | Meaning | When Used |
|---|---|---|
| 200 | OK | Successful read or update operation |
| 201 | Created | Resource successfully created (tenant, agent, API key) |
| 204 | No Content | Successful delete with no response body |
Client Error Codes
| Code | Meaning | When Used |
|---|---|---|
| 400 | Bad Request | Malformed JSON, missing required fields, invalid parameter types |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Valid credentials but insufficient permissions for this operation |
| 404 | Not Found | Resource does not exist, or does not belong to your tenant |
| 409 | Conflict | Resource already exists (e.g., duplicate tenant name or agent name) |
| 422 | Unprocessable Entity | Request is well-formed but contains invalid values (e.g., unknown tier, invalid model name) |
| 429 | Too Many Requests | Rate limit or quota exceeded |
Server Error Codes
| Code | Meaning | When Used |
|---|---|---|
| 500 | Internal Server Error | Unexpected server-side failure |
Common Errors and Solutions
Authentication Errors
Missing API key:
HTTP 401
{"error": "Missing authorization header"}
Add the Authorization: Bearer ares_xxx header to your request.
Invalid API key:
HTTP 401
{"error": "Invalid API key"}
Verify that the API key is correct and has not been revoked. API keys start with ares_.
Missing admin secret:
HTTP 401
{"error": "Missing X-Admin-Secret header"}
Admin endpoints require the X-Admin-Secret header, not the Authorization header.
Invalid admin secret:
HTTP 401
{"error": "Invalid admin secret"}
Verify the admin secret matches the value configured on the server (for self-hosted instances, the API_KEY environment variable).
Resource Errors
Agent not found:
HTTP 404
{"error": "Agent not found: risk-analyzer"}
The agent does not exist for your tenant. Check the agent name with GET /v1/agents. Agent names are case-sensitive.
Tenant not found:
HTTP 404
{"error": "Tenant not found"}
The tenant ID does not exist. List tenants with GET /api/admin/tenants to find the correct ID.
Duplicate resource:
HTTP 409
{"error": "Agent with name 'risk-analyzer' already exists for this tenant"}
An agent with this name already exists. Use a different name or update the existing agent.
Validation Errors
Invalid tier:
HTTP 422
{"error": "Invalid tier: 'gold'. Valid tiers: free, dev, pro, enterprise"}
Use one of the supported tier values.
Missing required field:
HTTP 400
{"error": "Missing required field: name"}
Include all required fields in your request body. Refer to the API documentation for the specific endpoint.
Invalid JSON:
HTTP 400
{"error": "Invalid JSON in request body"}
Ensure your request body is valid JSON. Check for trailing commas, unquoted keys, or mismatched brackets. Verify the Content-Type: application/json header is set.
Rate Limit Errors
Quota exceeded:
HTTP 429
{"error": "Monthly request quota exceeded"}
Your tenant has used all allocated requests for the current billing period. Wait until the period resets or contact your administrator to upgrade your tier.
Daily limit:
HTTP 429
{"error": "Daily request limit reached for your tier"}
Your tenant has hit the daily rate cap. Wait until the next UTC day or upgrade your tier.
See Rate Limits and Quotas for details on limits by tier.
Server Errors
Internal server error:
HTTP 500
{"error": "Internal server error"}
An unexpected error occurred on the server. These are not caused by your request. If the error persists, check service health via GET /api/admin/services or inspect server logs.
Error Handling Best Practices
- Always check the HTTP status code first. The status code tells you the error category before you parse the response body.
- Parse the error message for user display. The error field is written to be human-readable and safe to show to end users.
- Retry on 429 and 500. A 429 caused by the IP-based limit should be retried with exponential backoff; tenant quota errors will not clear until the daily or monthly period resets. Server errors (500) may be transient, so retry once or twice before treating them as a permanent failure.
- Do not retry on 400, 401, 403, 404, 409, or 422. These indicate problems with the request itself. Fix the request before retrying.
- Log the full response. When debugging, log both the HTTP status code and the response body. The error message often contains the specific field or value that caused the problem.
Example: Robust Error Handling (Python)
import requests
class APIError(Exception):
    """Base class for ARES API errors."""
class AuthenticationError(APIError):
    pass
class AgentNotFoundError(APIError):
    pass
class RateLimitError(APIError):
    pass
class ServerError(APIError):
    pass
def run_agent(api_key, agent_name, input_data):
    response = requests.post(
        f"https://api.ares.dirmacs.com/v1/agents/{agent_name}/run",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"input": input_data},
        timeout=60,
    )
    if response.status_code == 200:
        return response.json()
    # ARES errors are {"error": "..."}; fall back to raw text if the body is not JSON
    try:
        error = response.json().get("error", "Unknown error")
    except ValueError:
        error = response.text or "Unknown error"
    if response.status_code == 401:
        raise AuthenticationError(f"Authentication failed: {error}")
    elif response.status_code == 404:
        raise AgentNotFoundError(f"Agent '{agent_name}' not found: {error}")
    elif response.status_code == 429:
        raise RateLimitError(f"Rate limited: {error}")
    elif response.status_code >= 500:
        raise ServerError(f"Server error: {error}")
    else:
        raise APIError(f"API error ({response.status_code}): {error}")
Example: Robust Error Handling (JavaScript)
async function runAgent(apiKey, agentName, inputData) {
  const response = await fetch(
    `https://api.ares.dirmacs.com/v1/agents/${agentName}/run`,
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ input: inputData }),
    }
  );
  if (response.ok) {
    return await response.json();
  }
  // ARES errors are {"error": "..."}; tolerate non-JSON bodies (e.g., from a proxy)
  const { error = "Unknown error" } = await response.json().catch(() => ({}));
  switch (response.status) {
    case 401: throw new Error(`Authentication failed: ${error}`);
    case 404: throw new Error(`Agent '${agentName}' not found: ${error}`);
    case 429: throw new Error(`Rate limited: ${error}`);
    default: throw new Error(`API error (${response.status}): ${error}`);
  }
}
Self-Hosting
Run your own ARES instance on your infrastructure. This guide covers local development setup, production deployment, and configuration options.
Prerequisites
| Requirement | Minimum Version | Notes |
|---|---|---|
| Rust | 1.91+ | Install via rustup |
| PostgreSQL | 15+ | Used for tenants, agents, usage tracking |
| Git | 2.x | For cloning the repository |
Optional, depending on your provider configuration:
| Requirement | When Needed |
|---|---|
| Groq API key | Using Groq as an LLM provider |
| Anthropic API key | Using Anthropic as an LLM provider |
| NVIDIA API key | Using NVIDIA-hosted DeepSeek models |
| Ollama | Running local models |
Quick Start
1. Clone the Repository
git clone https://github.com/dirmacs/ares
cd ares
2. Set Up the Database
Create a PostgreSQL database for ARES:
createdb ares
ARES runs migrations automatically on startup. No manual schema setup is required.
3. Create Configuration
Copy the example config and customize it:
cp ares.example.toml ares.toml
Edit ares.toml to configure your providers and models. At minimum, you need one LLM provider:
[server]
port = 3000
[database]
url = "postgres://localhost/ares"
[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"
[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072
4. Set Environment Variables
export DATABASE_URL="postgres://localhost/ares"
export JWT_SECRET="your-secret-key-at-least-32-characters-long"
export API_KEY="your-admin-api-secret"
export GROQ_API_KEY="gsk_..."
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| JWT_SECRET | Yes | Secret for signing JWT tokens (32+ characters) |
| API_KEY | Yes | Admin secret for /api/admin/* endpoints |
| GROQ_API_KEY | If using Groq | Groq API key |
| ANTHROPIC_API_KEY | If using Anthropic | Anthropic API key |
| NVIDIA_API_KEY | If using NVIDIA | NVIDIA API key |
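A preflight check can catch missing configuration before the server starts. This Python sketch validates the variables in the table above, including the 32-character minimum for JWT_SECRET. The provider labels (groq, anthropic, nvidia) are assumptions about how you name your providers; Ollama needs no key. new_jwt_secret() is the Python equivalent of openssl rand -hex 32.

```python
import secrets

REQUIRED_VARS = ("DATABASE_URL", "JWT_SECRET", "API_KEY")
PROVIDER_VARS = {
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
}


def preflight(env, providers=()):
    """Return a list of problems with the environment; empty means ready."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    jwt_secret = env.get("JWT_SECRET", "")
    if jwt_secret and len(jwt_secret) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")
    for provider in providers:
        var = PROVIDER_VARS.get(provider)
        if var and not env.get(var):
            problems.append(f"{var} is not set (needed for provider {provider!r})")
    return problems


def new_jwt_secret():
    """64 hex characters from 32 random bytes."""
    return secrets.token_hex(32)
```

For example, preflight(os.environ, ["groq"]) returns an empty list when the environment is ready to start ares-server.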
5. Build
cargo build --release --features openai,postgres,mcp
See Feature Flags for all available options.
6. Run
./target/release/ares-server
7. Verify
curl http://localhost:3000/health
You should receive a 200 OK response. ARES is running.
Feature Flags
ARES uses Cargo feature flags to control which capabilities are compiled into the binary. This keeps the binary lean — only include what you need.
| Feature | Default | Description |
|---|---|---|
| openai | Yes | OpenAI-compatible provider support (also used for Groq, NVIDIA) |
| anthropic | No | Anthropic Claude provider support |
| ollama | No | Local Ollama model support |
| postgres | Yes | PostgreSQL database backend |
| mcp | No | Model Context Protocol support for external tool servers |
| ares-vector | No | Vector storage and semantic search |
Build Examples
Minimal build (Groq only):
cargo build --release --no-default-features --features openai,postgres
Full build (all providers):
cargo build --release --features openai,anthropic,ollama,postgres,mcp,ares-vector
Production build (recommended for VPS deployment):
cargo build --release --no-default-features --features openai,postgres,mcp
Production Deployment
systemd Service
Create a systemd unit file at /etc/systemd/system/ares.service:
[Unit]
Description=ARES AI Agent Platform
After=network.target postgresql.service
Wants=postgresql.service
[Service]
Type=simple
User=ares
Group=ares
WorkingDirectory=/opt/ares
ExecStart=/opt/ares/target/release/ares-server
Restart=on-failure
RestartSec=5
Environment=DATABASE_URL=postgres://dirmacs:password@localhost/ares
Environment=JWT_SECRET=your-production-jwt-secret
Environment=API_KEY=your-admin-secret
Environment=GROQ_API_KEY=gsk_...
Environment=RUST_LOG=info
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable ares
sudo systemctl start ares
sudo systemctl status ares
View logs:
journalctl -u ares -f
Caddy Reverse Proxy
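Caddy provides automatic HTTPS with Let’s Encrypt. Add a site block to your Caddyfile (/etc/caddy/Caddyfile when Caddy is installed from its official Debian/Ubuntu package):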
api.ares.yourdomain.com {
reverse_proxy localhost:3000
}
Start Caddy:
sudo systemctl enable caddy
sudo systemctl start caddy
Caddy automatically provisions and renews TLS certificates. No manual certificate management is needed.
PostgreSQL Setup
For production, create a dedicated database user:
CREATE USER ares WITH PASSWORD 'strong-password-here';
CREATE DATABASE ares OWNER ares;
Update your DATABASE_URL accordingly:
DATABASE_URL=postgres://ares:strong-password-here@localhost/ares
Configuration Reference
The ares.toml file is the primary configuration file. It controls server settings, providers, models, and agent definitions.
Server Section
[server]
port = 3000 # HTTP port (overrides PORT env var)
host = "0.0.0.0" # Bind address
Database Section
[database]
url = "postgres://ares:password@localhost/ares"
max_connections = 10
Provider Section
Each provider is defined as a [[providers]] entry:
[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"
[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072
[[providers.models]]
id = "llama-3.1-8b-instant"
name = "llama-3.1-8b"
context_length = 131072
[[providers]]
name = "anthropic"
type = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"
[[providers.models]]
id = "claude-3-5-sonnet-20241022"
name = "claude-3.5-sonnet"
context_length = 200000
[[providers]]
name = "local"
type = "ollama"
base_url = "http://localhost:11434"
[[providers.models]]
id = "mistral"
name = "mistral-7b"
context_length = 32768
Agent Section
Static agents can be defined in the config file:
[[agents]]
name = "general-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a helpful assistant."
tools = ["calculator", "web_search"]
max_tokens = 4096
For tenant-specific agents, use the Admin API instead of config file definitions.
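In the examples above, agents refer to a model by the name field of a [[providers.models]] entry (model = "llama-3.3-70b" matches name = "llama-3.3-70b", not the provider-side id), so a typo in an agent's model may only surface at runtime. This hypothetical Python check catches that, plus a missing provider, from the parsed config. Flagging duplicate model names is an assumption about what ARES would reject; load_config needs Python 3.11+ for tomllib.

```python
def check_config(config):
    """Return a list of problems in a parsed ares.toml; empty means it looks consistent."""
    problems = []
    providers = config.get("providers", [])
    if not providers:
        problems.append("no [[providers]] defined; at least one LLM provider is required")
    model_names = set()
    for provider in providers:
        for model in provider.get("models", []):
            name = model.get("name")
            if name in model_names:
                problems.append(f"duplicate model name {name!r}")
            model_names.add(name)
    for agent in config.get("agents", []):
        # Assumed from the examples: agents reference a model by its `name`
        if agent.get("model") not in model_names:
            problems.append(
                f"agent {agent.get('name')!r} uses unknown model {agent.get('model')!r}"
            )
    return problems


def load_config(path="ares.toml"):
    """Parse the TOML file; tomllib ships with Python 3.11 and later."""
    import tomllib

    with open(path, "rb") as f:
        return tomllib.load(f)
```

Usage: check_config(load_config("ares.toml")) before restarting the service.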
Updating
To update a running ARES instance:
cd /opt/ares
git pull origin main
cargo build --release --no-default-features --features openai,postgres,mcp
sudo systemctl restart ares
Database migrations run automatically on startup. No manual migration steps are needed.
Troubleshooting
Port already in use:
Error: Address already in use (os error 98)
Another process is using port 3000. Either stop it or change the port in ares.toml.
Database connection failed:
Error: error communicating with database
Verify PostgreSQL is running and your DATABASE_URL is correct. Check that the database user has permissions on the database.
Provider API key missing:
Error: Environment variable GROQ_API_KEY not set
Set the required API key environment variable, or remove the provider from ares.toml if you do not need it.
JWT secret too short:
Error: JWT_SECRET must be at least 32 characters
Use a longer secret. Generate one with: openssl rand -hex 32
Guide: Build a Chat Agent
This guide walks you through creating a custom chat agent on ARES — from defining its behavior to testing it in production.
What is an Agent?
An ARES agent is a configured LLM endpoint with a specific personality, instructions, and tool access. Each agent has:
- A name: unique identifier used in API calls
- A model: which LLM powers it (e.g., llama-3.3-70b, claude-3.5-sonnet)
- A system prompt: instructions that define the agent’s behavior
- Tools: optional capabilities like calculator or web_search
- Configuration: max tokens, temperature, and other parameters
You can create agents in two ways: via the configuration file or via the API.
Option 1: Define in ares.toml
For agents that are part of your core platform, define them in the ares.toml configuration file:
[[agents]]
name = "financial-analyst"
model = "llama-3.3-70b"
system_prompt = """
You are a senior financial analyst. You help users understand financial data,
calculate metrics, and provide clear explanations of financial concepts.
Guidelines:
- Always show your calculations step by step
- Use the calculator tool for arithmetic to ensure accuracy
- Present numbers with appropriate formatting (commas, decimal places)
- When uncertain, clearly state your assumptions
"""
tools = ["calculator"]
max_tokens = 4096
Restart ARES to load the new agent. It will be available immediately at /api/chat using agent_type: "financial-analyst".
TOON Config Format
ARES also supports the TOON configuration format for more structured agent definitions:
[[agents]]
name = "support-agent"
model = "llama-3.3-70b"
[agents.toon]
role = "Customer Support Specialist"
personality = "Professional, empathetic, solution-oriented"
knowledge = ["product documentation", "pricing plans", "common issues"]
constraints = [
"Never make up information about products",
"Escalate billing disputes to human agents",
"Always confirm the customer's issue before proposing a solution",
]
tools = ["web_search"]
The TOON format structures the system prompt into semantic fields that ARES assembles into a coherent prompt. This makes agent behavior easier to reason about and modify.
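The exact prompt ARES builds from TOON fields is not documented here. Purely as a hypothetical illustration of the idea, this Python sketch flattens role, personality, knowledge, and constraints into one system prompt; the layout and labels are invented for the example.

```python
def assemble_toon_prompt(toon):
    """Hypothetical: flatten TOON-style fields into one system prompt string."""
    lines = [f"Role: {toon['role']}"]
    if toon.get("personality"):
        lines.append(f"Personality: {toon['personality']}")
    if toon.get("knowledge"):
        lines.append("Knowledge areas: " + ", ".join(toon["knowledge"]))
    if toon.get("constraints"):
        lines.append("Constraints:")
        lines.extend(f"- {rule}" for rule in toon["constraints"])
    return "\n".join(lines)
```

Keeping each field on its own labeled line is what makes a structured format easier to edit than a free-form prompt: changing one constraint does not require rewriting the rest.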
Option 2: Create via API
For tenant-specific agents or agents you want to manage programmatically, use the API.
As a Platform Admin
curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/agents \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{
"name": "financial-analyst",
"agent_type": "analyst",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a senior financial analyst...",
"tools": ["calculator"],
"max_tokens": 4096
}
}'
As an Authenticated User
curl -X POST https://api.ares.dirmacs.com/api/user/agents \
-H "Authorization: Bearer <jwt_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "my-analyst",
"agent_type": "analyst",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a senior financial analyst...",
"tools": ["calculator"],
"max_tokens": 4096
}
}'
Testing Your Agent
Basic Chat
Send a message to your agent:
curl -X POST https://api.ares.dirmacs.com/api/chat \
-H "Authorization: Bearer <jwt_token>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is the compound annual growth rate if revenue went from $1M to $1.8M over 3 years?"}
],
"agent_type": "financial-analyst"
}'
Expected response:
{
"content": "To calculate the Compound Annual Growth Rate (CAGR):\n\nCAGR = (Ending Value / Beginning Value)^(1/n) - 1\nCAGR = ($1,800,000 / $1,000,000)^(1/3) - 1\nCAGR = (1.8)^(0.3333) - 1\nCAGR = 1.2164 - 1\nCAGR = 0.2164\n\n**The CAGR is 21.64%.**\n\nThis means revenue grew at an average annual rate of approximately 21.6% over the 3-year period.",
"model": "llama-3.3-70b",
"tokens_used": 287
}
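You can reproduce the formula from the response locally to sanity-check the agent's arithmetic. The function name cagr below is only for illustration:

```python
def cagr(begin_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / begin) ** (1 / years) - 1."""
    if begin_value <= 0 or years <= 0:
        raise ValueError("begin_value and years must be positive")
    return (end_value / begin_value) ** (1 / years) - 1


print(f"{cagr(1_000_000, 1_800_000, 3):.2%}")  # 21.64%
print(f"{cagr(1_000_000, 1_800_000, 5):.2%}")  # 12.47% (same growth over 5 years)
```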
Multi-Turn Conversation
Include the conversation history in the messages array:
curl -X POST https://api.ares.dirmacs.com/api/chat \
-H "Authorization: Bearer <jwt_token>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is the CAGR from $1M to $1.8M over 3 years?"},
{"role": "assistant", "content": "The CAGR is 21.64%..."},
{"role": "user", "content": "What if the period was 5 years instead?"}
],
"agent_type": "financial-analyst"
}'
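Each request carries the full history, so a client must record both the user's turns and the agent's replies before the next call. The sketch below assumes the request and response shapes shown above: messages and agent_type go in, and content comes out. The Conversation class is hypothetical and performs no networking, so you can pair it with any HTTP client.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Client-side history for /api/chat calls (hypothetical helper)."""
    agent_type: str
    messages: list[dict] = field(default_factory=list)

    def request_body(self, user_text: str) -> dict:
        # Record the new user turn, then send a snapshot of the whole history.
        self.messages.append({"role": "user", "content": user_text})
        return {"messages": list(self.messages), "agent_type": self.agent_type}

    def record_reply(self, response: dict) -> str:
        # Store the agent's answer so the next turn has full context.
        content = response["content"]
        self.messages.append({"role": "assistant", "content": content})
        return content


convo = Conversation("financial-analyst")
first = convo.request_body("What is the CAGR from $1M to $1.8M over 3 years?")
convo.record_reply({"content": "The CAGR is 21.64%...", "model": "llama-3.3-70b"})
second = convo.request_body("What if the period was 5 years instead?")
```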
With Tool Usage
If your agent has tools enabled, ARES handles the tool calling loop automatically. You send a normal chat message, and the agent uses tools as needed:
curl -X POST https://api.ares.dirmacs.com/api/chat \
-H "Authorization: Bearer <jwt_token>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Calculate 15% annual compound interest on $50,000 over 10 years"}
],
"agent_type": "financial-analyst"
}'
The agent will internally call the calculator tool to compute 50000 * (1.15)^10 and return the formatted result.
Streaming
For real-time responses, use the streaming endpoint:
curl -X POST https://api.ares.dirmacs.com/api/chat/stream \
-H "Authorization: Bearer <jwt_token>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Explain the difference between NPV and IRR"}
],
"agent_type": "financial-analyst"
}'
This returns a Server-Sent Events stream. See the V1 API docs for client-side streaming examples.
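The V1 API docs describe what each event carries. The sketch below assumes only standard SSE framing: data: lines accumulate until a blank line ends the event. iter_sse_data is a hypothetical helper that takes decoded lines, such as those read from a streaming HTTP response, and yields each event's data.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    'data:' lines accumulate until a blank line dispatches the event.
    Comment lines starting with ':' are ignored, and other fields
    (event:, id:, retry:) are skipped in this sketch. An event that is
    still incomplete when the stream ends is discarded, as in the SSE spec.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue
        field_name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field_name == "data":
            buffer.append(value)


sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: world\n", "\n"]
print(list(iter_sse_data(sample)))  # ['Hello', 'world']
```

With urllib, you could pass (line.decode("utf-8") for line in urllib.request.urlopen(req)). Whether each payload is a plain text token or a JSON object depends on the ARES stream format.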
Iterating on the System Prompt
The system prompt is the most important part of your agent. Here are practical guidelines:
Be Specific About Format
Bad:
You are a helpful assistant.
Good:
You are a financial analyst. When presenting calculations:
- Show each step on its own line
- Use the calculator tool for all arithmetic
- Format currency with $ and commas
- Round percentages to 2 decimal places
- End with a bold summary line
Define Boundaries
Tell the agent what it should not do:
Constraints:
- Never provide specific investment advice or recommend buying/selling securities
- If asked about tax implications, recommend consulting a tax professional
- Do not speculate about future market movements
- If you don't have enough data to answer accurately, say so
Include Examples
For complex formatting requirements, show the agent what you want:
When comparing metrics, use this format:
| Metric | 2024 | 2025 | Change |
|--------|------|------|--------|
| Revenue | $1.2M | $1.8M | +50% |
| EBITDA | $300K | $480K | +60% |
Test Edge Cases
After writing your system prompt, test these scenarios:
- Off-topic requests — Does the agent stay in character or helpfully redirect?
- Ambiguous inputs — Does the agent ask for clarification?
- Tool failures — Does the agent handle tool errors gracefully?
- Long conversations — Does the agent maintain context over multiple turns?
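Scripting these checks lets you rerun them every time you edit the prompt. In the minimal sketch below, send is a placeholder for your own function that posts a message list to /api/chat and returns the content string. The predicates are deliberately crude keyword heuristics, so replace them with your own review criteria. Tool failures are hard to trigger from the client side and are left out.

```python
from typing import Callable

# Each case pairs a message list with a predicate the reply should satisfy.
# The predicates are rough keyword heuristics, for illustration only.
EDGE_CASES = {
    "off_topic": (
        [{"role": "user", "content": "Write me a poem about the ocean."}],
        lambda reply: "financ" in reply.lower(),  # redirects to its domain?
    ),
    "ambiguous": (
        [{"role": "user", "content": "What's the growth rate?"}],
        lambda reply: "?" in reply,  # asks a clarifying question?
    ),
    "long_context": (
        [
            {"role": "user", "content": "Revenue went from $1M to $1.8M."},
            {"role": "assistant", "content": "Noted. What would you like to know?"},
            {"role": "user", "content": "What was the starting revenue?"},
        ],
        lambda reply: any(s in reply for s in ("1M", "1,000,000", "1 million")),
    ),
}


def run_edge_cases(send: Callable[[list[dict]], str]) -> dict[str, bool]:
    """Run every case through `send` and report which predicates passed."""
    return {name: check(send(messages))
            for name, (messages, check) in EDGE_CASES.items()}
```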
Adding Tool Access
Agents can use built-in tools to extend their capabilities:
[[agents]]
name = "research-agent"
model = "llama-3.3-70b"
system_prompt = "You are a research agent with access to web search and calculation tools."
tools = ["calculator", "web_search"]
Available built-in tools:
| Tool | Description |
|---|---|
| calculator | Evaluate mathematical expressions |
| web_search | Search the web for current information |
See the Tool Calling guide for details on how tool execution works.
Choosing a Model
Different models have different strengths. Consider these factors when choosing:
| Model | Provider | Best For |
|---|---|---|
| llama-3.3-70b | Groq | General-purpose, fast, good reasoning |
| llama-3.1-8b | Groq | Simple tasks, lowest latency |
| deepseek-r1 | NVIDIA | Complex reasoning, chain-of-thought |
| claude-3.5-sonnet | Anthropic | Nuanced writing, careful analysis |
Start with llama-3.3-70b for most use cases. It offers a strong balance of capability, speed, and cost. Move to a specialized model only if you have a specific need.
Check available models with:
curl https://api.ares.dirmacs.com/api/admin/models \
-H "X-Admin-Secret: your-admin-secret"
Guide: Tool Calling
ARES supports tool calling (also known as function calling), allowing agents to use external tools during a conversation. When an agent needs to perform a calculation, search the web, or interact with an external system, it requests a tool call. ARES executes the tool and feeds the result back to the agent, which then incorporates it into its response.
How It Works
Tool calling in ARES follows a multi-turn loop managed by the ToolCoordinator:
User message
|
v
Agent (LLM) generates response
|
├── If response is final text → return to user
|
└── If response contains tool_calls →
|
v
ARES executes each tool
|
v
Results sent back to agent
|
v
Agent generates next response (may call more tools or return final text)
This loop continues until the agent produces a final text response or the maximum iteration limit is reached. The entire process is transparent to the caller — you send a chat message and receive a complete response.
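The control flow can be sketched in a few lines of Python. This illustrates the loop described above and is not the ToolCoordinator's actual code. call_model and the tools mapping are hypothetical stand-ins, and the message shapes (an assistant turn with tool_calls, followed by role "tool" results) are assumptions. The sketch also assumes the model is asked for a final answer through an extra user message once the limit is hit.

```python
from typing import Callable

# A model call returns {"content": "..."} or {"tool_calls": [{"name", "arguments"}]}.
ModelFn = Callable[[list[dict]], dict]
ToolFn = Callable[[dict], dict]


def run_tool_loop(call_model: ModelFn, tools: dict[str, ToolFn],
                  messages: list[dict], max_iterations: int = 10) -> str:
    """Alternate between the model and tools until a final text answer."""
    messages = list(messages)
    for _ in range(max_iterations):
        reply = call_model(messages)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # final text: return it to the caller
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            tool = tools.get(call["name"])
            result = (tool(call["arguments"]) if tool is not None
                      else {"error": f"unknown tool {call['name']}"})
            messages.append({"role": "tool", "name": call["name"], "content": result})
    # Iteration limit reached: ask for a final answer from what was gathered.
    messages.append({"role": "user",
                     "content": "Answer now using the information gathered so far."})
    return call_model(messages).get("content", "")
```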
Built-in Tools
ARES ships with two built-in tools:
calculator
Evaluates mathematical expressions and returns the result.
Capabilities:
- Basic arithmetic: +, -, *, /
- Exponents: ^ or **
- Parentheses for grouping
- Common functions: sqrt, sin, cos, log, ln, abs
- Constants: pi, e
Example tool call from agent:
{
"name": "calculator",
"arguments": {
"expression": "50000 * (1.15 ^ 10)"
}
}
Result returned to agent:
{
"result": 202277.89
}
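To try the expression syntax offline, you can use a small, safe evaluator. This is not ARES's implementation. It assumes ^ and ** both mean exponentiation, and it rewrites ^ to ** before parsing so that exponents bind tighter than multiplication. It also assumes log is base 10 and ln is the natural log. It walks Python's ast instead of calling eval. It does not guard against huge exponents, so treat it as a sketch.

```python
import ast
import math
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
# Assumption: log is base 10, ln is the natural logarithm.
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "log": math.log10, "ln": math.log, "abs": abs}
_CONSTS = {"pi": math.pi, "e": math.e}


def evaluate(expression: str) -> float:
    """Evaluate a calculator expression by walking its syntax tree."""
    tree = ast.parse(expression.replace("^", "**"), mode="eval")

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and len(node.args) == 1
                and not node.keywords):
            return _FUNCS[node.func.id](walk(node.args[0]))
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(tree)


print(round(evaluate("50000 * (1.15 ^ 10)"), 2))  # 202277.89
```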
web_search
Searches the web and returns relevant results.
Example tool call from agent:
{
"name": "web_search",
"arguments": {
"query": "current US federal interest rate 2026"
}
}
Result returned to agent:
{
"results": [
{
"title": "Federal Reserve holds rate at 4.25%",
"url": "https://...",
"snippet": "The Federal Reserve maintained its benchmark rate..."
}
]
}
Configuring Tool Access
Per-Agent Tool Filtering
Each agent specifies which tools it can use. An agent without tools configured cannot make tool calls, even if the underlying model supports them.
In ares.toml:
[[agents]]
name = "research-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a research assistant with access to web search and calculation tools."
tools = ["calculator", "web_search"]
[[agents]]
name = "math-tutor"
model = "llama-3.3-70b"
system_prompt = "You are a math tutor. Use the calculator to verify your work."
tools = ["calculator"]
[[agents]]
name = "simple-chat"
model = "llama-3.3-70b"
system_prompt = "You are a conversational assistant."
tools = []
Via the API:
curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{id}/agents \
-H "X-Admin-Secret: your-admin-secret" \
-H "Content-Type: application/json" \
-d '{
"name": "analyst",
"agent_type": "analyst",
"config": {
"model": "llama-3.3-70b",
"system_prompt": "You are a data analyst.",
"tools": ["calculator", "web_search"],
"max_tokens": 4096
}
}'
ToolCoordinator
The ToolCoordinator is the internal component that manages the tool calling loop. It handles:
- Multi-turn orchestration — Sending tool results back to the model and processing follow-up tool calls
- Parallel execution — When the model requests multiple tools in a single turn, they execute concurrently
- Timeout enforcement — Individual tool calls are bounded by a configurable timeout
- Iteration limits — Prevents infinite tool-calling loops
Configuration
Tool calling behavior is configured at the server level:
| Setting | Default | Description |
|---|---|---|
| max_iterations | 10 | Maximum tool-calling rounds before forcing a text response |
| parallel_execution | true | Execute multiple tool calls concurrently within a single turn |
| tool_timeout | 30s | Maximum time for a single tool execution |
If an agent hits the iteration limit, ARES instructs the model to produce a final response using the information gathered so far.
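The asyncio sketch below shows one way the parallel_execution and tool_timeout settings could combine within a single turn. It assumes async tool functions. execute_tool_calls and its parallel and timeout parameters are hypothetical. Failures become a {"name", "error"} result modeled on the Error Handling example, not on any confirmed internal type.

```python
import asyncio
from typing import Awaitable, Callable

AsyncTool = Callable[[dict], Awaitable[dict]]


async def _run_one(tools: dict[str, AsyncTool], call: dict, timeout: float) -> dict:
    """Run a single tool call, converting failures into error results."""
    name = call["name"]
    try:
        result = await asyncio.wait_for(tools[name](call["arguments"]), timeout)
        return {"name": name, "result": result}
    except asyncio.TimeoutError:
        return {"name": name, "error": f"Tool timed out after {timeout:g} seconds"}
    except Exception as exc:  # unknown tool, invalid input, tool bug, ...
        return {"name": name, "error": str(exc)}


async def execute_tool_calls(tools: dict[str, AsyncTool], calls: list[dict], *,
                             parallel: bool = True, timeout: float = 30.0) -> list[dict]:
    """Run one turn's tool calls concurrently or one at a time, in order."""
    if parallel:
        return list(await asyncio.gather(*(_run_one(tools, c, timeout) for c in calls)))
    return [await _run_one(tools, c, timeout) for c in calls]
```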
Provider Compatibility
Tool calling requires model support. Not all providers and models support function calling:
| Provider | Models | Tool Calling |
|---|---|---|
| Groq | llama-3.3-70b, llama-3.1-8b | Supported |
| Anthropic | claude-3.5-sonnet | Supported |
| NVIDIA | deepseek-r1 | Not supported |
| Ollama | Varies by model | Model-dependent |
If you assign tools to an agent using a model that does not support tool calling, the tools will be ignored and the agent will respond with text only.
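You can express this fallback as a lookup made before tool definitions are sent to the model. This minimal sketch assumes the compatibility table above is complete for the hosted models. It treats any other model, such as one hosted on Ollama, as unknown, so the caller must state whether that model supports tools. effective_tools is a hypothetical helper.

```python
from typing import Optional

# Tool-calling support per model, taken from the compatibility table above.
TOOL_SUPPORT: dict[str, Optional[bool]] = {
    "llama-3.3-70b": True,
    "llama-3.1-8b": True,
    "claude-3.5-sonnet": True,
    "deepseek-r1": False,
}


def effective_tools(model: str, configured: list[str],
                    unknown_model_supports_tools: bool = False) -> list[str]:
    """Return the tools that would actually be offered to the model."""
    support = TOOL_SUPPORT.get(model)
    if support is None:
        # Model-dependent (e.g. Ollama): fall back to the caller's knowledge.
        support = unknown_model_supports_tools
    return list(configured) if support else []  # unsupported: text-only agent
```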
Example: Conversation with Tool Calls
Here is what happens internally when a user asks a question that requires tool use.
User sends:
curl -X POST https://api.ares.dirmacs.com/v1/chat \
-H "Authorization: Bearer ares_xxx" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is the monthly payment on a $400,000 mortgage at 6.5% for 30 years?"}
],
"agent_type": "financial-analyst"
}'
Internal flow:
- ARES sends the message to the LLM along with the calculator tool definition
- The LLM responds with a tool call:
{
  "tool_calls": [
    {"name": "calculator", "arguments": {"expression": "(400000 * (0.065/12) * (1 + 0.065/12)^360) / ((1 + 0.065/12)^360 - 1)"}}
  ]
}
- ARES executes the calculator and gets 2528.27
- ARES sends the result back to the LLM
- The LLM produces a final text response incorporating the calculated value
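The expression in that tool call is the standard fixed-rate amortization formula, M = P * r * (1 + r)^n / ((1 + r)^n - 1), where r is the monthly rate and n is the number of monthly payments. Reproducing it locally is a quick way to confirm the 2528.27 figure. monthly_payment is just an illustrative name.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate amortized payment: P * r * (1 + r)**n / ((1 + r)**n - 1)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n      # zero-interest edge case
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


print(f"{monthly_payment(400_000, 0.065, 30):,.2f}")  # 2,528.27
```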
User receives:
{
"content": "The monthly payment on a $400,000 mortgage at 6.5% APR over 30 years would be **$2,528.27**.\n\nThis is calculated using the standard amortization formula...",
"model": "llama-3.3-70b",
"tokens_used": 412
}
The tool-calling steps are invisible to the caller. You send a question and receive a complete answer.
Example: Multiple Tool Calls in One Turn
Models can request multiple tools simultaneously. For example, a research agent asked to “Compare the population of Tokyo and New York” might request two web searches in parallel:
{
"tool_calls": [
{"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
{"name": "web_search", "arguments": {"query": "New York population 2026"}}
]
}
With parallel_execution enabled (the default), both searches execute concurrently. The results are sent back to the model together, and it produces a response comparing both cities.
Example: Multi-Turn Tool Usage
Some questions require multiple rounds of tool use. For example:
User: “What is 15% of the GDP of France?”
Turn 1 — Agent calls web_search:
{"name": "web_search", "arguments": {"query": "France GDP 2026 USD"}}
Result: France’s GDP is approximately $3.1 trillion.
Turn 2 — Agent calls calculator:
{"name": "calculator", "arguments": {"expression": "3100000000000 * 0.15"}}
Result: 465,000,000,000
Turn 3 — Agent produces final response: “15% of France’s GDP (approximately $3.1 trillion) is $465 billion.”
Each round counts toward the max_iterations limit.
Error Handling
If a tool call fails (timeout, invalid input, etc.), ARES returns an error result to the model:
{
"tool_result": {
"name": "web_search",
"error": "Search timed out after 30 seconds"
}
}
The model can then decide to:
- Retry the tool call with different parameters
- Use a different tool
- Respond with what it knows, noting the tool failure
Well-designed system prompts should instruct the agent on how to handle tool failures gracefully.
Changelog
All notable changes to ARES are documented here. This project follows Semantic Versioning.
0.6.3
Multi-provider LLM, tenant agents, and enterprise metering.
This release transforms ARES from a single-provider system into a full multi-provider LLM platform with enterprise-grade tenant management.
Added
- Multi-provider LLM routing — Support for 4 providers (Groq, Anthropic, NVIDIA DeepSeek, Ollama) and 11 models through a unified API.
- Model tier system — fast, balanced, powerful, deepseek, and local tiers with automatic provider routing.
- Tenant agent system — Agents stored in the database per tenant. Template-based provisioning with full CRUD via admin API.
- Agent templates — Seed templates applied automatically on startup. New tenants receive a default agent set.
- Usage metering — usage_events table, monthly_usage_cache, and daily_rate_limits for tracking tokens, requests, and costs per tenant.
- API key authentication — Authorization: Bearer ares_xxx on /v1/* routes with tenant scoping.
- Kasino enterprise agents — 4 specialized agent templates (kasino-classifier, kasino-risk, kasino-transaction, kasino-report) for the first enterprise client.
- Kasino API routes — Both JWT-protected (/api/kasino/*) and API-key (/v1/kasino/*) endpoints.
- Admin provisioning API — Atomic tenant creation: schema + agents + API key in a single operation.
Changed
- Chat handler now resolves tenant_id from the authentication context instead of hardcoded values.
- Provider configuration moved from code to ares.toml for runtime flexibility.
- Rate limit enforcement now operates at both the provider and tenant level.
Fixed
- Chat handler tenant_id resolution for multi-tenant requests.
0.6.2
Streaming and SSE support.
Added
- Server-Sent Events streaming — POST /v1/chat/stream endpoint for real-time, token-by-token responses.
- Stream handler — Unified streaming across all providers with consistent SSE format.
- Context continuation — context_id parameter for maintaining conversation history across requests.
Changed
- Response format standardized to {"response", "agent", "context_id"} across all endpoints.
0.6.1
Tool calling and RAG foundations.
Added
- Tool calling framework — Define tools per agent. ARES manages the tool-call loop, execution, and response assembly.
- RAG pipeline — Retrieval-augmented generation with pluggable document stores.
- Workflow engine — Chain multiple agents into multi-step workflows with deterministic execution.
Changed
- Agent configuration schema extended to support tool definitions and RAG settings.
0.5.0
JWT authentication and user management.
Added
- User registration and login — POST /api/auth/register, POST /api/auth/login.
- JWT token lifecycle — 15-minute access tokens, refresh token rotation, logout/invalidation.
- Role-based access — User roles with permission checks on protected routes.
- Admin authentication — X-Admin-Secret header for internal administration endpoints.
Changed
- All /api/* routes now require JWT authentication.
- Error responses standardized with error and code fields.
0.4.0
PostgreSQL backend and multi-tenant schema.
Added
- PostgreSQL integration — Full migration from in-memory storage to PostgreSQL with sqlx.
- Auto-migration — sqlx::migrate!() runs on startup. No manual SQL required.
- Tenant schema — tenants, tenant_agents, and api_keys tables with foreign key relationships.
- Tenant tiers — Free, Dev, Pro, and Enterprise tiers with configurable limits.
Changed
- All state persistence moved from in-memory structures to PostgreSQL.
- Connection pooling via sqlx::PgPool with configurable pool size.
For the complete commit history, see the ARES repository on GitHub.