
ollama-server (Rust)

Ollama-compatible HTTP surface for hum — drop-in for any Ollama client

A bee that fronts hum’s local thrum socket with the Ollama REST API. Drop-in for ollama, open-webui, lobe-chat, cline, LangChain’s ChatOllama, or anything else that speaks Ollama’s /api/chat + /api/generate. Default port 11434 matches Ollama’s own default so most clients work with zero config.

Built in Rust with axum. Workspace member of the main hum repo.

Propensity

| statefulness | richness | wire shape | hides |
|---|---|---|---|
| convention-stateful | medium | Ollama /api/chat + /api/generate NDJSON | pulse, breath, drone, perf-mark, tendril, permission-ask, tool-meta |

Ollama’s streaming response is already line-delimited JSON — it maps to thrum’s frame format 1:1. No SSE re-framing.
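
Because each streamed line is a complete JSON object, a client needs only a line splitter and a JSON parser to consume the response. The following is a minimal sketch, not part of ollama-server itself. It relies only on the two fields shown on this page (`message.content` and `done`); the helper name `collect_content` is hypothetical, and real responses may carry additional fields.

```python
import json
from typing import Iterable


def collect_content(lines: Iterable[str]) -> str:
    """Join message.content across NDJSON lines, stopping at done:true."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between objects
        obj = json.loads(raw)
        # The final line may omit "message", so fall back to an empty dict.
        parts.append((obj.get("message") or {}).get("content", ""))
        if obj.get("done"):
            break
    return "".join(parts)


sample = [
    '{"message":{"content":"Hi"},"done":false}',
    '{"message":{"content":" there"},"done":false}',
    '{"done":true}',
]
print(collect_content(sample))  # Hi there
```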

What it does

client                                ollama-server                    humd
│                                           │                            │
│ POST /api/chat                            │                            │
├──────────────────────────────────────────►│                            │
│ { model, messages, stream:true,           │                            │
│   tools? }                                │ chi:"prompt"               │
│                                           ├───────────────────────────►│
│                                           │ chi:"chunk"                │
│                                           │◄───────────────────────────┤
│ {"message":{"content":"Hi"},"done":false} │                            │
│◄──────────────────────────────────────────┤                            │
│                                           │ chi:"finish"               │
│                                           │◄───────────────────────────┤
│ {"done":true,...}                         │                            │
│◄──────────────────────────────────────────┤                            │
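
The relay step in the diagram can be pictured as a function from one humd frame to one Ollama NDJSON line. This is a rough sketch under stated assumptions, not the server's actual code (which is Rust). The `chi` values come from this page. The `text` field on a chunk frame, the `message` field on an error frame, and the name `frame_to_ollama_line` are hypothetical, since the thrum frame layout is not specified here. It also assumes that the frame kinds listed under "hides" are simply dropped.

```python
import json
from typing import Optional


def frame_to_ollama_line(frame: dict, model: str) -> Optional[str]:
    """Translate one humd frame into one NDJSON line, or None if hidden."""
    chi = frame.get("chi")
    if chi == "chunk":
        body = {
            "model": model,
            # "text" is an assumed field name for the chunk's text part.
            "message": {"role": "assistant", "content": frame.get("text", "")},
            "done": False,
        }
    elif chi == "finish":
        body = {"model": model, "done": True}
    elif chi == "error":
        # "message" is an assumed field name on the error frame.
        body = {"error": frame.get("message", "unknown error")}
    else:
        # pulse, breath, drone, etc. are not surfaced to Ollama clients.
        return None
    return json.dumps(body) + "\n"


for f in [{"chi": "chunk", "text": "Hi"}, {"chi": "pulse"}, {"chi": "finish"}]:
    line = frame_to_ollama_line(f, "claude-sonnet-4")
    if line is not None:
        print(line, end="")
```

Returning `None` for hidden frames keeps the filtering decision in one place, so the write loop only has to skip empty results.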

Endpoints

| route | what |
|---|---|
| POST /api/chat | multi-message turn; NDJSON streaming |
| POST /api/generate | single prompt; NDJSON streaming |
| GET /api/tags | list of available models (synthesized) |
| GET / | health probe; returns "Ollama is running" like the real Ollama |
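
Since /api/tags is described as synthesized, the response is presumably built from the configured model names rather than from a model store. A minimal sketch of that idea follows. The `models` array with `name` and `model` keys follows Ollama's published response shape. Which other fields (such as size, digest, or modification time) ollama-server fills in is not stated here, so they are omitted. The function name `synthesize_tags` is hypothetical.

```python
def synthesize_tags(models_csv: str) -> dict:
    """Build a minimal /api/tags-style body from a comma-separated list."""
    names = [m.strip() for m in models_csv.split(",") if m.strip()]
    # Only the name fields are emitted; other metadata is left out on purpose.
    return {"models": [{"name": n, "model": n} for n in names]}


print(synthesize_tags("claude-sonnet-4,claude-haiku-4.5,claude-opus-4.7"))
```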

Configure

| env | default | what |
|---|---|---|
| OLLAMA_SERVER_PORT | 11434 | HTTP listen port (matches Ollama’s default) |
| OLLAMA_SERVER_HOST | 127.0.0.1 | HTTP listen host |
| OLLAMA_SERVER_MODELS | claude-sonnet-4,claude-haiku-4.5,claude-opus-4.7 | comma-separated list returned by /api/tags |
| HUM_THRUM_SOCK | $XDG_RUNTIME_DIR/hum/thrum.sock | humd’s NDJSON socket |

Optional per-kind config file at ~/.config/hum/hives/ollama-server.json:

{
  "host": "127.0.0.1",
  "port": 11434,
  "models": ["claude-sonnet-4", "claude-haiku-4.5", "claude-opus-4.7"]
}

Precedence: env > config file > built-in defaults.
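
The precedence rule can be illustrated with a layered merge: start from the defaults, overlay the config file, then overlay the environment. This is a sketch of the rule as stated, not the server's Rust implementation. The function names `load_config_file` and `resolve_config` are hypothetical, and ignoring unknown keys in the file is an assumption.

```python
import json
from pathlib import Path

DEFAULTS = {
    "host": "127.0.0.1",
    "port": 11434,
    "models": ["claude-sonnet-4", "claude-haiku-4.5", "claude-opus-4.7"],
}


def load_config_file(path: Path) -> dict:
    """Return the parsed config file, or {} if it does not exist."""
    return json.loads(path.read_text()) if path.is_file() else {}


def resolve_config(env: dict, file_cfg: dict) -> dict:
    """Apply env > config file > built-in defaults."""
    cfg = dict(DEFAULTS)
    # Layer 2: config file overrides defaults (unknown keys are ignored).
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    # Layer 3: environment overrides everything else.
    if "OLLAMA_SERVER_HOST" in env:
        cfg["host"] = env["OLLAMA_SERVER_HOST"]
    if "OLLAMA_SERVER_PORT" in env:
        cfg["port"] = int(env["OLLAMA_SERVER_PORT"])
    if "OLLAMA_SERVER_MODELS" in env:
        raw = env["OLLAMA_SERVER_MODELS"]
        cfg["models"] = [m.strip() for m in raw.split(",") if m.strip()]
    return cfg


file_cfg = {"host": "0.0.0.0", "port": 9000}
print(resolve_config({"OLLAMA_SERVER_PORT": "14620"}, file_cfg))
```

In the usage line, the port comes from the environment, the host from the file, and the model list from the defaults.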

Run

# From the workspace root.
cargo run -p ollama-server
# Listen on a different port:
OLLAMA_SERVER_PORT=14620 cargo run -p ollama-server

Use

Plain curl:

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "stream": true,
    "messages": [{ "role": "user", "content": "ping" }]
  }'

Ollama JS client:

import { Ollama } from "ollama";

const ollama = new Ollama({ host: "http://localhost:11434" });
const r = await ollama.chat({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "ping" }],
  stream: true,
});
for await (const chunk of r) process.stdout.write(chunk.message.content);

LangChain:

from langchain_ollama import ChatOllama
llm = ChatOllama(model="claude-sonnet-4", base_url="http://localhost:11434")
print(llm.invoke("ping"))

What flows where

| Ollama surface | hum chi |
|---|---|
| POST /api/chat | chi:"prompt" |
| messages[].role=system | prompt.systemPrompt |
| tools[] (function) | prompt.tools[] |
| streamed message.content | chi:"chunk" (text part) |
| streamed message.tool_calls | chi:"chunk" (tool_use part) |
| done:true line | chi:"finish" |
| error line | chi:"error" |
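
The request-side rows of this mapping can be sketched as a single translation function. The `chi`, `systemPrompt`, and `tools` names come from the mapping above. Everything else is an assumption made for illustration: the `model` and `messages` keys on the frame, joining multiple system messages with newlines, forwarding only the inner `function` object of each tool, and the name `chat_request_to_prompt`.

```python
def chat_request_to_prompt(req: dict) -> dict:
    """Turn an Ollama /api/chat request body into a chi:"prompt" frame."""
    system_parts = []
    turns = []
    for msg in req.get("messages", []):
        if msg.get("role") == "system":
            # System messages are lifted out of the turn list.
            system_parts.append(msg.get("content", ""))
        else:
            turns.append({"role": msg.get("role"), "content": msg.get("content", "")})

    frame = {"chi": "prompt", "model": req.get("model"), "messages": turns}
    if system_parts:
        frame["systemPrompt"] = "\n".join(system_parts)

    # Ollama wraps each tool as {"type": "function", "function": {...}}.
    tools = [t["function"] for t in req.get("tools", []) if t.get("type") == "function"]
    if tools:
        frame["tools"] = tools
    return frame


request = {
    "model": "claude-sonnet-4",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "ping"},
    ],
}
print(chat_request_to_prompt(request))
```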

Status

Reference implementation. Both streaming and non-streaming responses are supported. Tool use is forwarded; embedding endpoints and image generation are not supported.

See also