Open Source · MIT License

Your AI tools have
different brains.

ContextOS gives them one.

Claude remembers Claude. ChatGPT remembers ChatGPT. Your custom agent remembers nothing. ContextOS is the shared memory layer that runs on your machine — any LLM reads from it, any LLM writes to it. Your data. No vendor lock-in.

Get started · View on GitHub
After conversation: POST /sessions → extract → embed → store
Before LLM call: GET /memory → hybrid search → re-rank → prompt_block

One memory store. Every AI tool.

ContextOS sits between your apps and your LLMs. Write from any app. Read from any app. The same user_id is the only key — that's all it takes for cross-app memory.

01

Write after any conversation

POST the conversation from Claude, GPT, or your custom agent. ContextOS extracts discrete memory fragments, embeds them, deduplicates, and stores. Returns 202 immediately — extraction runs in the background.

LLM extraction (Anthropic / OpenAI / mock)
Embed (local sentence-transformers or OpenAI)
Deduplicate + consolidate near-matches
Store in Postgres + pgvector
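
The write side is a single POST from any client. A minimal Python sketch of the same call shown in the quickstart below (the API key is a placeholder):

import httpx

# Write a finished conversation to ContextOS from any app.
# Returns 202 immediately; extraction runs in the background.
resp = httpx.post(
    "http://localhost:8000/sessions",
    headers={"Authorization": "Bearer sk-..."},
    json={
        "user_id": "alice",
        "conversation": "User: I prefer async Python\nAssistant: Noted.",
        "source_client": "my-app",
    },
)
assert resp.status_code == 202
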
02

Query before any LLM call

GET memory for a user with your query string — from any app, any model. ContextOS runs hybrid search, applies decay scoring, and returns a prompt_block ready to inject. Memory written by your GPT app is now available to your Claude app.

Redis cache check (60s TTL)
BM25 (Postgres FTS) + cosine vector search
Reciprocal Rank Fusion (k=60)
Re-rank: similarity + importance + decay
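
Reciprocal Rank Fusion is simple enough to sketch: with k = 60, each fragment's fused score is the sum of 1 / (k + rank) across the two ranked result lists. A minimal sketch (function and variable names are illustrative, not ContextOS internals):

def rrf_fuse(bm25_ids: list, vector_ids: list, k: int = 60) -> list:
    # Sum 1 / (k + rank) for each fragment across both ranked lists (RRF, k=60).
    scores = {}
    for ranked in (bm25_ids, vector_ids):
        for rank, frag_id in enumerate(ranked, start=1):
            scores[frag_id] = scores.get(frag_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; importance and decay re-ranking happen afterwards.
    return sorted(scores, key=scores.get, reverse=True)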

The shared brain your AI tools need.

Cross-app memory by default. Production-grade from day one.

🌐

Cross-app Memory

user_id is the only key. Memory written by your Claude app is instantly available to your GPT app — and any other LLM you build. ContextOS is the shared layer.

🧠

Hybrid Retrieval

BM25 full-text search combined with pgvector cosine similarity, fused with Reciprocal Rank Fusion. Catches what pure vector search misses.
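
Under the hood, that is two queries against one Postgres table. A hedged sketch of the two legs (the fragments table and its column names are assumptions, not the real schema):

# Hypothetical schema: fragments(id, user_id, content, embedding).
# Full-text leg (Postgres FTS, ranked with ts_rank_cd):
BM25_SQL = """
    SELECT id, ts_rank_cd(to_tsvector('english', content),
                          plainto_tsquery('english', %(q)s)) AS rank
    FROM fragments
    WHERE user_id = %(user_id)s
      AND to_tsvector('english', content) @@ plainto_tsquery('english', %(q)s)
    ORDER BY rank DESC
    LIMIT 50
"""

# Vector leg (pgvector; <=> is cosine distance, so similarity = 1 - distance):
VECTOR_SQL = """
    SELECT id, 1 - (embedding <=> %(query_vec)s) AS similarity
    FROM fragments
    WHERE user_id = %(user_id)s
    ORDER BY embedding <=> %(query_vec)s
    LIMIT 50
"""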

⚡

Redis Hot Cache

GET /memory results cached in Redis with a 60s TTL, keyed by SHA-256 of user + query + params. Repeat queries are instant.
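
The cache key is just a hash of the request. A minimal sketch (the exact canonicalization ContextOS uses is an assumption):

import hashlib
import json

def cache_key(user_id: str, q: str, params: dict) -> str:
    # SHA-256 over user + query + params; sort_keys makes the key order-stable.
    canonical = json.dumps({"user_id": user_id, "q": q, **params}, sort_keys=True)
    return "memory:" + hashlib.sha256(canonical.encode()).hexdigest()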

🔀

Memory Consolidation

Near-match fragments (cosine 0.75–0.94) automatically supersede outdated ones. Contradictions resolve in favor of the newest information.
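
The decision rule, spelled out (the 0.75–0.94 band is as stated above; treating anything at or above 0.95 as an exact duplicate is an assumption):

def classify(similarity: float) -> str:
    # Fate of a new fragment given cosine similarity to its nearest stored neighbor.
    if similarity >= 0.95:
        return "duplicate"   # assumed: effectively the same memory, drop it
    if similarity >= 0.75:
        return "supersede"   # stated band: consolidate, newest information wins
    return "new"             # distinct enough to store as its own fragment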

📉

Decay Scoring

Exponential time decay with a 30-day half-life. Stale fragments lose weight automatically — fresher memories surface first.
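
A 30-day half-life means a fragment's weight simply halves every 30 days:

def decay_weight(age_days: float, half_life_days: float = 30.0) -> float:
    # Exponential time decay: weight halves once per half-life.
    return 0.5 ** (age_days / half_life_days)

decay_weight(0)    # 1.0    (brand new)
decay_weight(30)   # 0.5    (one half-life)
decay_weight(90)   # 0.125  (three half-lives)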

🏢

Multi-tenancy

Full app isolation with per-app API keys. Admin API for app lifecycle, key rotation, usage stats, and GDPR bulk delete.

🔁

Retry + Dead Letter

Extraction retries with exponential backoff (3×, 2s / 4s / 8s). Exhausted jobs land in a dead-letter table for inspection and replay.
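
The schedule in miniature (a sketch; the actual job runner and its dead-letter write are not shown):

import asyncio

async def run_with_retries(job, retries: int = 3, base_delay: float = 2.0):
    # Initial attempt plus up to 3 retries, sleeping 2s, 4s, then 8s between them.
    for attempt in range(retries + 1):
        try:
            return await job()
        except Exception:
            if attempt == retries:
                raise  # exhausted: the caller records the job in the dead-letter table
            await asyncio.sleep(base_delay * 2 ** attempt)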

📊

Structured Logging

JSON logs via structlog with X-Request-ID tracing on every request. Trace a session write through extraction, embedding, and storage.
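
Request-scoped tracing with structlog usually looks like this in FastAPI. A hedged sketch (not necessarily ContextOS's exact middleware):

import uuid

import structlog
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Bind X-Request-ID so every structlog line in this request carries it.
    rid = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(request_id=rid)
    response = await call_next(request)
    response.headers["X-Request-ID"] = rid
    return response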

🛡️

Rate Limiting

Per-API-key rate limits via slowapi. 60 req/min on writes, 120 req/min on reads. Falls back to IP if no key is present.
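
The key function is the interesting part: limit per API key, fall back to the client IP. A hedged sketch with slowapi (route decorators not shown):

from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

def api_key_or_ip(request: Request) -> str:
    # Rate-limit per API key; fall back to the client IP when no key is present.
    auth = request.headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth.removeprefix("Bearer ")
    return get_remote_address(request)

limiter = Limiter(key_func=api_key_or_ip)
# e.g. @limiter.limit("60/minute") on POST /sessions, "120/minute" on GET /memory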

Up in three commands.

No API keys needed for local dev — uses the mock extractor and local embeddings.

# Clone and start the Docker stack
git clone https://github.com/bythebug/context-os
cd context-os
cp .env.example .env
docker compose up -d

# Create your first API key
python scripts/seed_api_key.py \
  --app-name "my-app" \
  --database-url postgresql://contextos:contextos@localhost:5433/contextos
# → API key: sk-...
curl -X POST http://localhost:8000/sessions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "conversation": "User: I prefer async Python and deploy on Fly.io\nAssistant: Got it.",
    "source_client": "my-app"
  }'

# Response: 202 Accepted — extraction runs in background
{
  "session_id": "uuid",
  "status": "accepted"
}
curl "http://localhost:8000/memory?user_id=alice&q=deployment" \
  -H "Authorization: Bearer sk-..."

# Response
{
  "user_id": "alice",
  "fragments": [
    {
      "content": "User deploys on Fly.io",
      "type": "decision",
      "importance": 4,
      "score": 0.89
    }
  ],
  "prompt_block": "Relevant context about this user:\n- [decision] User deploys on Fly.io (relevance: 0.89)"
}
import anthropic
import httpx

async def chat(user_id, message):
    async with httpx.AsyncClient(
        base_url="http://localhost:8000",
        headers={"Authorization": "Bearer sk-..."},
    ) as ctx:
        # 1. Fetch memory
        mem = (await ctx.get("/memory", params={"user_id": user_id, "q": message})).json()

        # 2. Inject into the system prompt (one line)
        system = f"You are helpful.\n\n{mem['prompt_block']}"

        # 3. Call the LLM; nothing else changes
        reply = (await anthropic.AsyncAnthropic().messages.create(
            model="claude-opus-4-6", max_tokens=1024, system=system,
            messages=[{"role": "user", "content": message}],
        )).content[0].text

        # 4. Save the conversation for next time
        await ctx.post("/sessions", json={
            "user_id": user_id,
            "conversation": f"User: {message}\nAssistant: {reply}",
        })
    return reply
# Two apps. One ContextOS server. One brain.
# App 1: Claude app writes Alice's memory
import httpx

claude_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-claude-app"}
)

# Alice tells your Claude app she prefers async Python
claude_app.post("/sessions", json={
    "user_id": "alice",
    "conversation": "User: I always use async Python and deploy on Fly.io\nAssistant: Got it."
})

# App 2: GPT app reads Alice's memory — written by the Claude app
gpt_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-gpt-app"}
)

mem = gpt_app.get("/memory", params={
    "user_id": "alice",
    "q": "what does alice prefer?"
}).json()

# → "Relevant context: [preference] User prefers async Python. Deploys on Fly.io."
# Alice never re-introduced herself. Your GPT app already knows her.
print(mem["prompt_block"])

Simple, RESTful endpoints.

All endpoints except GET /health require Authorization: Bearer <api-key>. Admin endpoints require the Admin-Key header.

POST /sessions · Ingest a conversation. Extraction runs async in the background. Returns 202 immediately. · 60 req/min
GET /memory · Retrieve top-k relevant fragments for a user. Params: user_id, q, top_k, scope, type. · 120 req/min
DELETE /memory/:id · Delete a specific fragment by ID. Scoped to the calling app. · app-scoped
GET /health · Check service health. Returns status of Postgres and Redis. · public
POST /admin/apps · Create an app. Returns the app ID used to issue API keys. · Admin-Key
POST /admin/apps/:id/keys · Issue a new API key. The raw key is returned once; store it immediately. · Admin-Key
GET /admin/apps/:id/usage · Fragment count, unique users, dead-letter count, and last active time. · Admin-Key
DELETE /admin/memory · GDPR bulk delete: wipe all fragments for a user_id, optionally scoped to one app. · Admin-Key
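
For example, the GDPR wipe from Python (a sketch based on the endpoint list above; the Admin-Key value and the exact app-scoping parameter name are assumptions):

import httpx

resp = httpx.delete(
    "http://localhost:8000/admin/memory",
    params={"user_id": "alice"},  # optionally add an app scope param (name assumed)
    headers={"Admin-Key": "admin-secret"},  # placeholder admin key
)
resp.raise_for_status()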

Native clients for every stack.

Sync and async Python. Typed TypeScript with zero runtime dependencies.

🐍

Python SDK

Sync + async · httpx only · Python 3.9+

$ pip install contextos # PyPI — coming soon; for now: pip install ./sdk/python
from contextos import ContextOS

client = ContextOS(api_key="sk-...")

# Write
client.write(user_id="alice", conversation=text)

# Query
mem = client.query(user_id="alice", q=message)
system = f"You are helpful.\n\n{mem.prompt_block}"

# Async versions: awrite(), aquery(), adelete()
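
The async variants in use (awrite and aquery are named above; their signatures are assumed to mirror write() and query()):

import asyncio
from contextos import ContextOS

async def main():
    client = ContextOS(api_key="sk-...")
    await client.awrite(user_id="alice", conversation="User: hi\nAssistant: hello")
    mem = await client.aquery(user_id="alice", q="greeting")
    print(mem.prompt_block)

asyncio.run(main())
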
🟦

TypeScript SDK

Zero runtime deps · Node.js + edge runtimes

$ cp sdk/typescript/src/index.ts ./lib/contextos.ts
import { ContextOS } from "./lib/contextos"

const client = new ContextOS({ apiKey: "sk-..." })

// Write
await client.write("alice", conversation)

// Query
const mem = await client.query("alice", message)
const system = `You are helpful.\n\n${mem.prompt_block}`

One stack. No extra infra.

Postgres handles both relational data and vector search — no separate vector database.

API · FastAPI · Async Python, automatic OpenAPI docs, native background tasks.
Vector store · Postgres + pgvector · Single DB for relational + vector. No Pinecone, no Weaviate.
Cache · Redis · 60s TTL on GET /memory. Repeat queries skip embedding + DB entirely.
Embeddings · Local / OpenAI · sentence-transformers (384-dim) for dev. OpenAI (1536-dim) for production.
Extraction · Anthropic / OpenAI / Mock · Mock provider for local dev; no API keys needed.
Migrations · Alembic · Versioned schema changes, async-compatible, 3 revisions so far.
Deploy · Fly.io / Docker · fly.toml included. docker-compose.yml for local development.
Auth · Bearer API key · SHA-256 hashed, familiar OpenAI-style sk- key format, raw key shown once on creation.