Claude remembers Claude. ChatGPT remembers ChatGPT. Your custom agent remembers nothing. They don't share a brain; ContextOS gives them one. It's the shared memory layer that runs on your machine: any LLM reads from it, any LLM writes to it. Your data. No vendor lock-in.
ContextOS sits between your apps and your LLMs. Write from any app. Read from any app.
The same user_id is the only key — that's all it takes for cross-app memory.
POST the conversation from Claude, GPT, or your custom agent. ContextOS extracts discrete memory fragments, embeds them, deduplicates, and stores. Returns 202 immediately — extraction runs in the background.
GET memory for a user with your query string — from any app, any model. ContextOS runs hybrid search, applies decay scoring, and returns a prompt_block ready to inject. Memory written by your GPT app is now available to your Claude app.
Cross-app memory by default. Production-grade from day one.
user_id is the only key. Memory written by your Claude app is instantly available to your GPT app — and any other LLM you build. ContextOS is the shared layer.
BM25 full-text search combined with pgvector cosine similarity, fused with Reciprocal Rank Fusion. Catches what pure vector search misses.
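For intuition, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. The constant k = 60 and the fragment IDs are illustrative, not ContextOS internals:

```python
def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    """Fuse two ranked lists of fragment IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, frag_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank), so a fragment ranked
            # well by either retriever floats to the top of the fusion.
            scores[frag_id] = scores.get(frag_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 catches the exact keyword; vector search catches the paraphrase.
print(rrf_fuse(["f3", "f1", "f7"], ["f1", "f9", "f3"]))  # f1 and f3 lead
```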
GET /memory results cached in Redis with a 60s TTL, keyed by SHA-256 of user + query + params. Repeat queries are instant.
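A minimal sketch of that caching pattern with redis-py. `run_hybrid_search` is a hypothetical stand-in for the search call, and the `memory:` key prefix is an assumption:

```python
import hashlib, json
import redis

r = redis.Redis()

def memory_cache_key(user_id: str, q: str, **params) -> str:
    # SHA-256 over the full query shape, so any parameter change is a miss.
    payload = json.dumps({"user_id": user_id, "q": q, **params}, sort_keys=True)
    return "memory:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_memory(user_id: str, q: str, **params):
    key = memory_cache_key(user_id, q, **params)
    if (hit := r.get(key)) is not None:
        return json.loads(hit)  # repeat query: answered straight from Redis
    result = run_hybrid_search(user_id, q, **params)  # hypothetical search fn
    r.setex(key, 60, json.dumps(result))  # 60s TTL, matching the docs
    return result
```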
Near-match fragments (cosine 0.75–0.94) automatically supersede outdated ones. Contradictions resolve in favor of the newest information.
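In pseudocode terms, the write path applies a rule like the sketch below. The 0.75–0.94 band comes from the line above; treating anything over 0.94 as an exact duplicate is an assumption:

```python
def reconcile(new_frag, existing_frags, cosine):
    """Decide what to do with a freshly extracted fragment (sketch)."""
    best = max(existing_frags, key=lambda f: cosine(new_frag, f), default=None)
    if best is None:
        return "insert"
    sim = cosine(new_frag, best)
    if sim > 0.94:
        return "skip"        # near-identical: deduplicate, keep the old one
    if sim >= 0.75:
        return "supersede"   # same topic, newer info: newest fragment wins
    return "insert"          # unrelated: store alongside the rest
```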
Exponential time decay with a 30-day half-life. Stale fragments lose weight automatically — fresher memories surface first.
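A 30-day half-life means a fragment's weight halves every 30 days. How the decay combines with the relevance score is an assumption in this sketch (multiplicative is the common choice):

```python
HALF_LIFE_DAYS = 30

def decayed_score(relevance: float, age_days: float) -> float:
    # Exponential decay: weight halves every HALF_LIFE_DAYS.
    return relevance * 0.5 ** (age_days / HALF_LIFE_DAYS)

# A perfect match from two months ago loses to a good match from today.
print(decayed_score(1.0, 60))  # 0.25
print(decayed_score(0.8, 0))   # 0.8
```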
Full app isolation with per-app API keys. Admin API for app lifecycle, key rotation, usage stats, and GDPR bulk delete.
Extraction retries with exponential backoff (3×, 2s / 4s / 8s). Exhausted jobs land in a dead-letter table for inspection and replay.
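The policy maps to a loop like this sketch, where `extract` and `dead_letter` stand in for the real extraction call and dead-letter insert:

```python
import time

def run_with_retries(job, extract, dead_letter):
    """One initial attempt plus 3 retries at 2s/4s/8s, then park the job."""
    for attempt in range(4):
        try:
            return extract(job)
        except Exception as exc:
            if attempt == 3:  # retries exhausted
                dead_letter(job, reason=str(exc))  # inspect and replay later
                raise
            time.sleep(2 ** (attempt + 1))  # 2s, 4s, then 8s before retrying
```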
JSON logs via structlog with X-Request-ID tracing on every request. Trace a session write through extraction, embedding, and storage.
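One common way to wire that up in a FastAPI/Starlette stack is a middleware that binds the request ID into structlog's context. This is a sketch of the pattern, not ContextOS source, and it assumes structlog is configured with the `merge_contextvars` processor and a JSON renderer:

```python
import uuid

import structlog
from fastapi import FastAPI, Request

app = FastAPI()
log = structlog.get_logger()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Reuse the caller's X-Request-ID or mint one, then bind it so every
    # log line emitted while handling this request carries it.
    rid = request.headers.get("X-Request-ID") or str(uuid.uuid4())
    structlog.contextvars.bind_contextvars(request_id=rid)
    try:
        response = await call_next(request)
    finally:
        structlog.contextvars.clear_contextvars()
    response.headers["X-Request-ID"] = rid
    return response
```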
Per-API-key rate limits via slowapi. 60 req/min on writes, 120 req/min on reads. Falls back to IP if no key is present.
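A sketch of that pattern with slowapi: a key function that prefers the API key and falls back to the client IP. The endpoint name and limits mirror the docs; the rest is illustrative:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

def api_key_or_ip(request: Request) -> str:
    # Rate-limit per API key; fall back to the client IP when no key is sent.
    auth = request.headers.get("Authorization", "")
    return auth.removeprefix("Bearer ").strip() or get_remote_address(request)

limiter = Limiter(key_func=api_key_or_ip)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/memory")
@limiter.limit("120/minute")  # reads; writes would use "60/minute"
async def get_memory(request: Request):
    ...
```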
No API keys needed for local dev — uses the mock extractor and local embeddings.
```bash
# Clone and start the Docker stack
git clone https://github.com/bythebug/context-os
cd context-os
cp .env.example .env
docker compose up -d

# Create your first API key
python scripts/seed_api_key.py \
  --app-name "my-app" \
  --database-url postgresql://contextos:contextos@localhost:5433/contextos
# → API key: sk-...
```
```bash
curl -X POST http://localhost:8000/sessions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "conversation": "User: I prefer async Python and deploy on Fly.io\nAssistant: Got it.",
    "source_client": "my-app"
  }'

# Response: 202 Accepted — extraction runs in background
{ "session_id": "uuid", "status": "accepted" }
```
curl "http://localhost:8000/memory?user_id=alice&q=deployment" \ -H "Authorization: Bearer sk-..." # Response { "user_id": "alice", "fragments": [ { "content": "User deploys on Fly.io", "type": "decision", "importance": 4, "score": 0.89 } ], "prompt_block": "Relevant context about this user:\n- [decision] User deploys on Fly.io (relevance: 0.89)" }
```python
import httpx, anthropic

async def chat(user_id, message):
    # 1. Fetch memory
    mem = httpx.get(
        "http://localhost:8000/memory",
        params={"user_id": user_id, "q": message},
        headers={"Authorization": "Bearer sk-..."}
    ).json()

    # 2. Inject into system prompt — one line
    system = f"You are helpful.\n\n{mem['prompt_block']}"

    # 3. Call LLM — nothing else changes
    reply = anthropic.Anthropic().messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,  # required by the Messages API
        system=system,
        messages=[{"role": "user", "content": message}]
    ).content[0].text

    # 4. Save conversation
    httpx.post(
        "http://localhost:8000/sessions",
        json={"user_id": user_id,
              "conversation": f"User: {message}\nAssistant: {reply}"},
        headers={"Authorization": "Bearer sk-..."}
    )
    return reply
```
```python
# Two apps. One ContextOS server. One brain.

# App 1: Claude app writes Alice's memory
import httpx

claude_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-claude-app"}
)

# Alice tells your Claude app she prefers async Python
claude_app.post("/sessions", json={
    "user_id": "alice",
    "conversation": "User: I always use async Python and deploy on Fly.io\nAssistant: Got it."
})

# App 2: GPT app reads Alice's memory — written by the Claude app
gpt_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-gpt-app"}
)

mem = gpt_app.get("/memory", params={
    "user_id": "alice",
    "q": "what does alice prefer?"
}).json()

# → "Relevant context: [preference] User prefers async Python. Deploys on Fly.io."
# Alice never re-introduced herself. Your GPT app already knows her.
print(mem["prompt_block"])
```
All endpoints require an `Authorization: Bearer <api-key>` header. Admin endpoints require the `Admin-Key` header.
Sync and async Python. Typed TypeScript with zero runtime dependencies.
Sync + async · httpx only · Python 3.9+
```python
from contextos import ContextOS

client = ContextOS(api_key="sk-...")

# Write
client.write(user_id="alice", conversation=text)

# Query
mem = client.query(user_id="alice", q=message)
system = f"You are helpful.\n\n{mem.prompt_block}"

# Async versions: awrite(), aquery(), adelete()
```
Zero runtime deps · Node.js + edge runtimes
```typescript
import { ContextOS } from "./lib/contextos"

const client = new ContextOS({ apiKey: "sk-..." })

// Write
await client.write("alice", conversation)

// Query
const mem = await client.query("alice", message)
const system = `You are helpful.\n\n${mem.prompt_block}`
```
Postgres handles both relational data and vector search — no separate vector database.