End-to-end llama.cpp toolkit in Rust: API client, Hugging Face Hub model management, server orchestration, and a 5-test benchmark suite.
Async, OpenAI-compatible API client: chat completions (streaming and non-streaming), text completions, and embeddings, with a type-safe builder pattern for requests.
Pure-Rust Hugging Face Hub client: search models, list GGUF files, and download with progress callbacks. Auto-detects HF_TOKEN from the environment and manages a local cache.
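A minimal sketch of what Hub usage could look like, modeled on the builder style of the chat example below. HubClient, search, download, and the model id field are illustrative assumptions, not the crate's confirmed API:

```rust
use lancor::HubClient; // hypothetical type name

// Hypothetical: the client picks up HF_TOKEN from the environment if set.
let hub = HubClient::new()?;

// Search the Hub and print matching model ids.
for model in hub.search("qwen3.5 gguf").await? {
    println!("{}", model.id);
}

// Download one GGUF file, reporting progress through a callback.
hub.download(
    "unsloth/Qwen3.5-35B-A3B-GGUF",
    "model-Q4_K_M.gguf",
    |downloaded, total| eprintln!("{downloaded}/{total} bytes"),
)
.await?;
```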
Programmatic control of llama-server, llama-cli, llama-quantize, and llama-bench: start a server, wait for it to report healthy, and stop it, with options for GPU layers, flash attention, and mlock.
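As a rough sketch of the orchestration flow, under the assumption of a builder-style config mirroring the options above (ServerConfig and the method names here are illustrative, not confirmed API):

```rust
use lancor::ServerConfig; // hypothetical type name

// Hypothetical: configure and launch llama-server, then wait for health.
let server = ServerConfig::new("model-Q4_K_M.gguf")
    .gpu_layers(99)        // offload all layers to the GPU
    .flash_attention(true) // enable flash attention
    .mlock(true)           // lock model weights in RAM
    .start()?;
server.wait_healthy().await?;

// ... issue requests against the running server ...

server.stop()?;
```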
5-test triage suite: throughput (tok/s), single tool calling, multi-tool calling, code generation (FizzBuzz), and reasoning. Compares quantizations, exports JSON, and starts/stops the server automatically.
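A hypothetical sketch of driving the benchmark suite from code rather than the CLI; Benchmark and its methods are illustrative assumptions, not confirmed API:

```rust
use lancor::Benchmark; // hypothetical type name

// Hypothetical: run all 5 tests against one quantization;
// the suite starts and stops the server itself.
let report = Benchmark::new("model-Q4_K_M.gguf")
    .gpu_layers(99)
    .run()
    .await?;
println!("{}", report.to_json()?); // JSON export
```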
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};

// Point the client at a running llama-server instance.
let client = LlamaCppClient::new("http://localhost:8080")?;

// Build the request with the fluent builder.
let request = ChatCompletionRequest::new("your-model")
    .message(Message::user("What is Rust?"))
    .max_tokens(100);

// Requires an async runtime (e.g. run inside an async fn).
let response = client.chat_completion(request).await?;
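For the streaming mode mentioned above, usage might look like the following; chat_completion_stream and the chunk accessor are assumptions for illustration, not confirmed API:

```rust
use futures_util::StreamExt; // assuming a futures Stream-based API

let mut stream = client.chat_completion_stream(request).await?;
while let Some(chunk) = stream.next().await {
    // Hypothetical accessor for the incremental text delta.
    print!("{}", chunk?.delta());
}
```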
# download a GGUF file from the Hugging Face Hub
lancor pull unsloth/Qwen3.5-35B-A3B-GGUF model-Q4_K_M.gguf
# list locally cached models
lancor list
# search the Hub
lancor search "qwen3.5 gguf"
# run the 5-test benchmark with all layers on GPU, exporting JSON
lancor bench model-Q4_K_M.gguf --ngl 99 --json