End-to-end llama.cpp toolkit in Rust: API client, Hugging Face Hub model management, server orchestration, and a 5-test benchmark suite.
Async, OpenAI-compatible API client: chat completions (streaming and non-streaming), text completions, and embeddings, with a type-safe builder pattern for requests.
Pure-Rust Hugging Face Hub client: search models, list GGUF files, and download with progress callbacks. Auto-detects HF_TOKEN from the environment and manages a local cache.
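A minimal sketch of what Hub usage could look like, modeled on the builder style of the chat example below. HubClient, search, download, and the model id field are illustrative assumptions, not the crate's confirmed API:

```rust
use lancor::HubClient; // hypothetical type name

// Hypothetical: the client picks up HF_TOKEN from the environment if set.
let hub = HubClient::new()?;

// Search the Hub and print matching model ids.
for model in hub.search("qwen3.5 gguf").await? {
    println!("{}", model.id);
}

// Download one GGUF file, reporting progress through a callback.
hub.download(
    "unsloth/Qwen3.5-35B-A3B-GGUF",
    "model-Q4_K_M.gguf",
    |downloaded, total| eprintln!("{downloaded}/{total} bytes"),
)
.await?;
```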
Programmatic control of llama-server, llama-cli, llama-quantize, and llama-bench: start a server, wait for it to report healthy, and stop it, with options for GPU layers, flash attention, and mlock.
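As a rough sketch of the orchestration flow, under the assumption of a builder-style config mirroring the options above (ServerConfig and the method names here are illustrative, not confirmed API):

```rust
use lancor::ServerConfig; // hypothetical type name

// Hypothetical: configure and launch llama-server, then wait for health.
let server = ServerConfig::new("model-Q4_K_M.gguf")
    .gpu_layers(99)        // offload all layers to the GPU
    .flash_attention(true) // enable flash attention
    .mlock(true)           // lock model weights in RAM
    .start()?;
server.wait_healthy().await?;

// ... issue requests against the running server ...

server.stop()?;
```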
5-test triage suite: throughput (tok/s), single tool calling, multi-tool calling, code generation (FizzBuzz), and reasoning. Compares quantizations, exports JSON, and starts/stops the server automatically.
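A hypothetical sketch of driving the benchmark suite from code rather than the CLI; Benchmark and its methods are illustrative assumptions, not confirmed API:

```rust
use lancor::Benchmark; // hypothetical type name

// Hypothetical: run all 5 tests against one quantization;
// the suite starts and stops the server itself.
let report = Benchmark::new("model-Q4_K_M.gguf")
    .gpu_layers(99)
    .run()
    .await?;
println!("{}", report.to_json()?); // JSON export
```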
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};

// Point the client at a running llama-server instance.
let client = LlamaCppClient::new("http://localhost:8080")?;

// Build the request with the fluent builder.
let request = ChatCompletionRequest::new("your-model")
    .message(Message::user("What is Rust?"))
    .max_tokens(100);

// Requires an async runtime (e.g. run inside an async fn).
let response = client.chat_completion(request).await?;
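For the streaming mode mentioned above, usage might look like the following; chat_completion_stream and the chunk accessor are assumptions for illustration, not confirmed API:

```rust
use futures_util::StreamExt; // assuming a futures Stream-based API

let mut stream = client.chat_completion_stream(request).await?;
while let Some(chunk) = stream.next().await {
    // Hypothetical accessor for the incremental text delta.
    print!("{}", chunk?.delta());
}
```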
# download a GGUF file from the Hugging Face Hub
lancor pull unsloth/Qwen3.5-35B-A3B-GGUF model-Q4_K_M.gguf
# list locally cached models
lancor list
# search the Hub
lancor search "qwen3.5 gguf"
# run the 5-test benchmark with all layers on GPU, exporting JSON
lancor bench model-Q4_K_M.gguf --ngl 99 --json