Open Source · MIT License

Your AI tools have
different brains.

ContextOS gives them one.

Claude remembers Claude. ChatGPT remembers ChatGPT. Your custom agent remembers nothing. ContextOS is the shared memory layer that runs on your machine — any LLM reads from it, any LLM writes to it. Your data. No vendor lock-in.

Get started · View on GitHub
After conversation: POST /sessions → extract → embed → store
Before LLM call: GET /memory → hybrid search → re-rank → prompt_block

One memory store. Every AI tool.

ContextOS sits between your apps and your LLMs. Write from any app. Read from any app. The same user_id is the only key — that's all it takes for cross-app memory.

01

Write after any conversation

POST the conversation from Claude, GPT, or your custom agent. ContextOS extracts discrete memory fragments, embeds them, deduplicates, and stores. Returns 202 immediately — extraction runs in the background.

LLM extraction (Anthropic / OpenAI / mock)
Embed (local sentence-transformers or OpenAI)
Deduplicate + consolidate near-matches
Store in Postgres + pgvector
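
The write side is a single POST from any client. A minimal Python sketch of the same call shown in the quickstart below (the API key is a placeholder):

import httpx

# Write a finished conversation to ContextOS from any app.
# Returns 202 immediately; extraction runs in the background.
resp = httpx.post(
    "http://localhost:8000/sessions",
    headers={"Authorization": "Bearer sk-..."},
    json={
        "user_id": "alice",
        "conversation": "User: I prefer async Python\nAssistant: Noted.",
        "source_client": "my-app",
    },
)
assert resp.status_code == 202
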
02

Query before any LLM call

GET memory for a user with your query string — from any app, any model. ContextOS runs hybrid search, applies decay scoring, and returns a prompt_block ready to inject. Memory written by your GPT app is now available to your Claude app.

Redis cache check (60s TTL)
BM25 (Postgres FTS) + cosine vector search
Reciprocal Rank Fusion (k=60)
Re-rank: similarity + importance + decay
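
Reciprocal Rank Fusion is simple enough to sketch: with k = 60, each fragment's fused score is the sum of 1 / (k + rank) across the two ranked result lists. A minimal sketch (function and variable names are illustrative, not ContextOS internals):

def rrf_fuse(bm25_ids: list, vector_ids: list, k: int = 60) -> list:
    # Sum 1 / (k + rank) for each fragment across both ranked lists (RRF, k=60).
    scores = {}
    for ranked in (bm25_ids, vector_ids):
        for rank, frag_id in enumerate(ranked, start=1):
            scores[frag_id] = scores.get(frag_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; importance and decay re-ranking happen afterwards.
    return sorted(scores, key=scores.get, reverse=True)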

The shared brain your AI tools need.

Cross-app memory by default. Production-grade from day one.

🌐

Cross-app Memory

user_id is the only key. Memory written by your Claude app is instantly available to your GPT app — and any other LLM you build. ContextOS is the shared layer.

🧠

Hybrid Retrieval

BM25 full-text search combined with pgvector cosine similarity, fused with Reciprocal Rank Fusion. Catches what pure vector search misses.
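
Under the hood, that is two queries against one Postgres table. A hedged sketch of the two legs (the fragments table and its column names are assumptions, not the real schema):

# Hypothetical schema: fragments(id, user_id, content, embedding).
# Full-text leg (Postgres FTS, ranked with ts_rank_cd):
BM25_SQL = """
    SELECT id, ts_rank_cd(to_tsvector('english', content),
                          plainto_tsquery('english', %(q)s)) AS rank
    FROM fragments
    WHERE user_id = %(user_id)s
      AND to_tsvector('english', content) @@ plainto_tsquery('english', %(q)s)
    ORDER BY rank DESC
    LIMIT 50
"""

# Vector leg (pgvector; <=> is cosine distance, so similarity = 1 - distance):
VECTOR_SQL = """
    SELECT id, 1 - (embedding <=> %(query_vec)s) AS similarity
    FROM fragments
    WHERE user_id = %(user_id)s
    ORDER BY embedding <=> %(query_vec)s
    LIMIT 50
"""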

⚡

Redis Hot Cache

GET /memory results cached in Redis with a 60s TTL, keyed by SHA-256 of user + query + params. Repeat queries are instant.
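
The cache key is just a hash of the request. A minimal sketch (the exact canonicalization ContextOS uses is an assumption):

import hashlib
import json

def cache_key(user_id: str, q: str, params: dict) -> str:
    # SHA-256 over user + query + params; sort_keys makes the key order-stable.
    canonical = json.dumps({"user_id": user_id, "q": q, **params}, sort_keys=True)
    return "memory:" + hashlib.sha256(canonical.encode()).hexdigest()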

🔀

Memory Consolidation

Near-match fragments (cosine 0.75–0.94) automatically supersede outdated ones. Contradictions resolve in favor of the newest information.
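
The decision rule, spelled out (the 0.75–0.94 band is as stated above; treating anything at or above 0.95 as an exact duplicate is an assumption):

def classify(similarity: float) -> str:
    # Fate of a new fragment given cosine similarity to its nearest stored neighbor.
    if similarity >= 0.95:
        return "duplicate"   # assumed: effectively the same memory, drop it
    if similarity >= 0.75:
        return "supersede"   # stated band: consolidate, newest information wins
    return "new"             # distinct enough to store as its own fragment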

📉

Decay Scoring

Exponential time decay with a 30-day half-life. Stale fragments lose weight automatically — fresher memories surface first.
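
A 30-day half-life means a fragment's weight simply halves every 30 days:

def decay_weight(age_days: float, half_life_days: float = 30.0) -> float:
    # Exponential time decay: weight halves once per half-life.
    return 0.5 ** (age_days / half_life_days)

decay_weight(0)    # 1.0    (brand new)
decay_weight(30)   # 0.5    (one half-life)
decay_weight(90)   # 0.125  (three half-lives)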

🏢

Multi-tenancy

Full app isolation with per-app API keys. Admin API for app lifecycle, key rotation, usage stats, and GDPR bulk delete.

🔁

Retry + Dead Letter

Extraction retries with exponential backoff (3×, 2s / 4s / 8s). Exhausted jobs land in a dead-letter table for inspection and replay.
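
The schedule in miniature (a sketch; the actual job runner and its dead-letter write are not shown):

import asyncio

async def run_with_retries(job, retries: int = 3, base_delay: float = 2.0):
    # Initial attempt plus up to 3 retries, sleeping 2s, 4s, then 8s between them.
    for attempt in range(retries + 1):
        try:
            return await job()
        except Exception:
            if attempt == retries:
                raise  # exhausted: the caller records the job in the dead-letter table
            await asyncio.sleep(base_delay * 2 ** attempt)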

📊

Structured Logging

JSON logs via structlog with X-Request-ID tracing on every request. Trace a session write through extraction, embedding, and storage.
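
Request-scoped tracing with structlog usually looks like this in FastAPI. A hedged sketch (not necessarily ContextOS's exact middleware):

import uuid

import structlog
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Bind X-Request-ID so every structlog line in this request carries it.
    rid = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(request_id=rid)
    response = await call_next(request)
    response.headers["X-Request-ID"] = rid
    return response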

🛡️

Rate Limiting

Per-API-key rate limits via slowapi. 60 req/min on writes, 120 req/min on reads. Falls back to IP if no key is present.
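
The key function is the interesting part: limit per API key, fall back to the client IP. A hedged sketch with slowapi (route decorators not shown):

from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

def api_key_or_ip(request: Request) -> str:
    # Rate-limit per API key; fall back to the client IP when no key is present.
    auth = request.headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth.removeprefix("Bearer ")
    return get_remote_address(request)

limiter = Limiter(key_func=api_key_or_ip)
# e.g. @limiter.limit("60/minute") on POST /sessions, "120/minute" on GET /memory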

Up in three commands.

No API keys needed for local dev — uses the mock extractor and local embeddings.

# Clone and start the Docker stack
git clone https://github.com/bythebug/context-os
cd context-os
cp .env.example .env
docker compose up -d

# Create your first API key
python scripts/seed_api_key.py \
  --app-name "my-app" \
  --database-url postgresql://contextos:contextos@localhost:5433/contextos
# → API key: sk-...
curl -X POST http://localhost:8000/sessions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "conversation": "User: I prefer async Python and deploy on Fly.io\nAssistant: Got it.",
    "source_client": "my-app"
  }'

# Response: 202 Accepted — extraction runs in background
{
  "session_id": "uuid",
  "status": "accepted"
}
curl "http://localhost:8000/memory?user_id=alice&q=deployment" \
  -H "Authorization: Bearer sk-..."

# Response
{
  "user_id": "alice",
  "fragments": [
    {
      "content": "User deploys on Fly.io",
      "type": "decision",
      "importance": 4,
      "score": 0.89
    }
  ],
  "prompt_block": "Relevant context about this user:\n- [decision] User deploys on Fly.io (relevance: 0.89)"
}
import anthropic
import httpx

async def chat(user_id, message):
    async with httpx.AsyncClient(
        base_url="http://localhost:8000",
        headers={"Authorization": "Bearer sk-..."},
    ) as ctx:
        # 1. Fetch memory
        mem = (await ctx.get("/memory", params={"user_id": user_id, "q": message})).json()

        # 2. Inject into the system prompt (one line)
        system = f"You are helpful.\n\n{mem['prompt_block']}"

        # 3. Call the LLM; nothing else changes
        reply = (await anthropic.AsyncAnthropic().messages.create(
            model="claude-opus-4-6", max_tokens=1024, system=system,
            messages=[{"role": "user", "content": message}],
        )).content[0].text

        # 4. Save the conversation for next time
        await ctx.post("/sessions", json={
            "user_id": user_id,
            "conversation": f"User: {message}\nAssistant: {reply}",
        })
    return reply
# Two apps. One ContextOS server. One brain.
# App 1: Claude app writes Alice's memory
import httpx

claude_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-claude-app"}
)

# Alice tells your Claude app she prefers async Python
claude_app.post("/sessions", json={
    "user_id": "alice",
    "conversation": "User: I always use async Python and deploy on Fly.io\nAssistant: Got it."
})

# App 2: GPT app reads Alice's memory — written by the Claude app
gpt_app = httpx.Client(
    base_url="http://localhost:8000",
    headers={"Authorization": "Bearer sk-gpt-app"}
)

mem = gpt_app.get("/memory", params={
    "user_id": "alice",
    "q": "what does alice prefer?"
}).json()

# → "Relevant context: [preference] User prefers async Python. Deploys on Fly.io."
# Alice never re-introduced herself. Your GPT app already knows her.
print(mem["prompt_block"])

Simple, RESTful endpoints.

All endpoints except GET /health require Authorization: Bearer <api-key>. Admin endpoints require the Admin-Key header.

POST /sessions · Ingest a conversation. Extraction runs async in the background. Returns 202 immediately. · 60 req/min
GET /memory · Retrieve top-k relevant fragments for a user. Params: user_id, q, top_k, scope, type. · 120 req/min
DELETE /memory/:id · Delete a specific fragment by ID. Scoped to the calling app. · app-scoped
GET /health · Check service health. Returns status of Postgres and Redis. · public
POST /admin/apps · Create an app. Returns the app ID used to issue API keys. · Admin-Key
POST /admin/apps/:id/keys · Issue a new API key. The raw key is returned once; store it immediately. · Admin-Key
GET /admin/apps/:id/usage · Fragment count, unique users, dead-letter count, and last active time. · Admin-Key
DELETE /admin/memory · GDPR bulk delete: wipe all fragments for a user_id, optionally scoped to one app. · Admin-Key
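
For example, the GDPR wipe from Python (a sketch based on the endpoint list above; the Admin-Key value and the exact app-scoping parameter name are assumptions):

import httpx

resp = httpx.delete(
    "http://localhost:8000/admin/memory",
    params={"user_id": "alice"},  # optionally add an app scope param (name assumed)
    headers={"Admin-Key": "admin-secret"},  # placeholder admin key
)
resp.raise_for_status()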

Native clients for every stack.

Sync and async Python. Typed TypeScript with zero runtime dependencies.

🐍

Python SDK

Sync + async · httpx only · Python 3.9+

$ pip install contextos # PyPI — coming soon; for now: pip install ./sdk/python
from contextos import ContextOS

client = ContextOS(api_key="sk-...")

# Write
client.write(user_id="alice", conversation=text)

# Query
mem = client.query(user_id="alice", q=message)
system = f"You are helpful.\n\n{mem.prompt_block}"

# Async versions: awrite(), aquery(), adelete()
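
The async variants in use (awrite and aquery are named above; their signatures are assumed to mirror write() and query()):

import asyncio
from contextos import ContextOS

async def main():
    client = ContextOS(api_key="sk-...")
    await client.awrite(user_id="alice", conversation="User: hi\nAssistant: hello")
    mem = await client.aquery(user_id="alice", q="greeting")
    print(mem.prompt_block)

asyncio.run(main())
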
🟦

TypeScript SDK

Zero runtime deps · Node.js + edge runtimes

$ cp sdk/typescript/src/index.ts ./lib/contextos.ts
import { ContextOS } from "./lib/contextos"

const client = new ContextOS({ apiKey: "sk-..." })

// Write
await client.write("alice", conversation)

// Query
const mem = await client.query("alice", message)
const system = `You are helpful.\n\n${mem.prompt_block}`

One stack. No extra infra.

Postgres handles both relational data and vector search — no separate vector database.

API · FastAPI · Async Python, automatic OpenAPI docs, native background tasks.
Vector store · Postgres + pgvector · Single DB for relational + vector. No Pinecone, no Weaviate.
Cache · Redis · 60s TTL on GET /memory. Repeat queries skip embedding + DB entirely.
Embeddings · Local / OpenAI · sentence-transformers (384-dim) for dev. OpenAI (1536-dim) for production.
Extraction · Anthropic / OpenAI / Mock · Mock provider for local dev; no API keys needed.
Migrations · Alembic · Versioned schema changes, async-compatible, 3 revisions so far.
Deploy · Fly.io / Docker · fly.toml included. docker-compose.yml for local development.
Auth · Bearer API key · SHA-256 hashed, familiar OpenAI-style sk- key format, raw key shown once on creation.