Production Guide¶
Engram is a beta-stage cognitive architecture. This guide outlines the best practices, security requirements, and operational strategies required to run it safely and reliably in a production environment.
[!WARNING] Because Engram is in active development, the database schema and API contracts are subject to change. Ensure you have automated backups and a staged deployment environment before upgrading versions.
1. Deployment Checklist¶
Before taking an Engram-backed application live, ensure you have addressed the following:
- [ ] Database Setup: You are running PostgreSQL 16+ with the
pgvectorandpg_trgmextensions enabled. - [ ] Process Isolation: You have separated your synchronous API web servers from your asynchronous background Memory Workers.
- [ ] Multi-Tenant Security: Every
engramoperation is strictly scoped byuser_idandagent_idat the API layer. - [ ] Connection Pooling: You are using PgBouncer (or similar) in front of PostgreSQL, and have tuned
ENGRAM_MIN_POOL_SIZEandENGRAM_MAX_POOL_SIZEappropriately for your app nodes. - [ ] Policy Selection: You have explicitly chosen or written a
MemoryPolicythat matches your domain's needs (e.g.,coding_agent,legal). - [ ] Metrics: You are actively monitoring the
memory_jobstable for backlog saturation and derivation failures.
2. Recommended Runtime Architecture¶
Engram is designed to be highly concurrent, but to achieve low latency for the end user, you must separate retrieval from derivation.
| Component | Responsibility | Scaling Strategy |
|---|---|---|
| API / Web Process | Executes engram.recall() to fetch context, streams LLM responses, and appends raw events to the ledger. |
Scale horizontally based on user traffic. |
| Memory Worker | Runs engram.run_memory_worker() continuously. Pops raw events off the queue, calls the LLM to extract facts, and creates task checkpoints. |
Scale vertically or horizontally based on LLM throughput limits and job queue depth. |
| PostgreSQL | Stores vectors, task ledgers, and handles recursive graph queries. | Scale vertically (RAM) to keep the active pgvector indexes in memory. |
3. The Production Turn Loop¶
In production, you should rely on the recall() operator to intelligently route the user's intent, rather than manually hacking together vector searches and keyword lookups.
async def handle_turn(engram, task_id, agent_id, user_id, user_message):
# 1. Build a prompt-ready memory block (Vector + Ledger + Lineage)
# trace_recall returns the assembled context plus retrieval observability,
# not an auto-reply.
trace = await engram.trace_recall(
query=user_message,
agent_id=agent_id,
user_id=user_id,
)
# 2. Call your LLM, streaming the result to the user
response = await call_your_llm(
memory_context=trace.context,
user_message=user_message,
)
# 3. Append to the immutable ledger and queue background processing
await engram.record_turn(
task_id,
user_message=user_message,
assistant_response=response,
agent_id=agent_id,
user_id=user_id,
enqueue_processing=True, # Crucial: defers the heavy extraction
)
return response
The Worker Process¶
In a completely separate deployment service (e.g., a background container):
from engram import Engram
async def boot_worker():
async with Engram(memory_policy="coding_agent") as engram:
# Runs infinitely, processing 20 events per second
await engram.run_memory_worker(batch_size=20, interval_seconds=1.0)
4. Security & Privacy¶
Engram stores raw conversations and distilled insights. Treat this data as highly sensitive.
- App-Level Auth: Engram does not enforce authentication. Your FastAPI/Django/Express layer must authenticate the user before passing their
user_iddown to Engram. - Provider Leaks: Be extremely careful about which LLM provider you use for background derivation. Do not send sensitive enterprise data to public models unless you have a zero-retention data processing agreement (DPA).
- Data Deletion: To comply with GDPR or CCPA, use
purge(user_id="...")to destroy all semantic facts, and ensure your app layer drops the corresponding task ledgers. - Redaction: If a user pastes API keys or PII, use the
redact_event()API on the raw ledger, and supersede any semantic facts derived from that event.
5. Long Inputs & Exact Context¶
For legal, compliance, or financial applications, vector similarity is not enough. You must be able to cite the exact source document.
Do not dump 50-page documents into standard add() calls. Use record_long_input() to securely chunk the document, generate precise quote hashes, and create bounded artifact events.
[!TIP] When building enterprise apps, force your LLM's system prompt to cite the
chunk_idorquote_hashreturned bybuild_long_input_context(). This guarantees hallucination-free citations.
6. Observability & Failure Modes¶
When running Engram in production, monitor the following signals:
| Signal | Metric / Query | Mitigation |
|---|---|---|
| Worker Saturation | Count of pending rows in memory_jobs |
If the backlog grows indefinitely, scale up your worker instances or switch to a faster extraction LLM model. |
| Extraction Failures | Count of failed rows in memory_jobs |
Typically caused by LLM rate limits. Engram will backoff and retry, but sustained failures require requesting higher quotas from OpenAI/Anthropic. |
| Missed Retrieval | Monitor the missing_expected_terms list in the RecallTrace object. |
Tune your MemoryPolicy to pin facts to critical_slots so they bypass vector math entirely. |
| Vector Index Wipes | Application crash on boot with ConfigurationError. |
You changed the embedding model dimension on an existing DB. You must export ENGRAM_ALLOW_EMBEDDING_DIMENSION_CHANGE=true and manually trigger a re-embedding script. |
7. FastAPI Integration Sketch¶
This is the recommended way to manage the Engram connection pool inside a modern async Python web framework.
from contextlib import asynccontextmanager
from fastapi import FastAPI
from engram import Engram
# Global client
engram: Engram | None = None
@asynccontextmanager
async def lifespan(app: FastAPI):
global engram
# Initialize the client and DB connection pool on startup
engram = Engram(memory_policy="default")
await engram.connect()
try:
yield
finally:
# Drain connections cleanly on shutdown
await engram.close()
app = FastAPI(lifespan=lifespan)
@app.post("/chat")
async def chat(body: dict, current_user: User = Depends(get_user)):
assert engram is not None
# Build a prompt-ready context block (agent_id is required)
trace = await engram.trace_recall(
query=body["message"],
agent_id="assistant",
user_id=current_user.id,
)
memory_context = trace.context
# ... Stream LLM, record_turn, etc ...