Operations
How to run a StaticOwl deployment in production: env vars, secrets, deploy scripts, monitoring, common operational tasks.
For day-to-day "I need to fix X" runbook-style content, see RUNBOOK.md (parent project) and TROUBLESHOOTING.md. This doc focuses on the CMS server itself.
Environment variables
Set on the CMS host. Sources:
- `ecosystem.config.cjs` `env` block (PM2; the primary mechanism on EC2)
- SSM Parameter Store (preferred for secrets; pulled at deploy time)
- Local `.env` for development
Required
| Var | Purpose |
|---|---|
| `GRAPHIQUITY_ENDPOINT` | Graph engine base URL (e.g., https://api.graphiquity.com) |
| `GRAPHIQUITY_API_KEY` | Graph-level engine API key (`gq_…`). Pulled from SSM in production. |
| `USER_POOL_ID` | Cognito user pool ID (e.g., `us-east-1_tmdwfgcPz`) |
| `USER_POOL_CLIENT_ID` | Cognito app client ID |
| `AWS_REGION` | AWS region (`us-east-1` is the default deployment) |
| `DYNAMO_TABLE` | DynamoDB table for CMS admin data (e.g., `Graphiquity`) |
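A fail-fast boot check is one way to enforce the table above. The sketch below is TypeScript (the language of the CMS packages); `missingRequiredEnv` is a hypothetical helper, not part of the CMS code, and it takes the environment as a plain object so it can be tested without `process.env`.

```typescript
// Hypothetical startup guard; the names mirror the "Required" table above.
const REQUIRED_ENV = [
  "GRAPHIQUITY_ENDPOINT",
  "GRAPHIQUITY_API_KEY",
  "USER_POOL_ID",
  "USER_POOL_CLIENT_ID",
  "AWS_REGION",
  "DYNAMO_TABLE",
] as const;

type Env = Record<string, string | undefined>;

/** Returns the required variables that are unset or blank, in table order. */
function missingRequiredEnv(env: Env): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// At boot (in Node you would pass process.env):
// const missing = missingRequiredEnv(process.env);
// if (missing.length > 0) throw new Error(`Missing env: ${missing.join(", ")}`);
```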
Deploy mode + targets
| Var | Default | Purpose |
|---|---|---|
| `STATICOWL_DEPLOYMENT_MODEL` | `new` | `legacy` / `dual` / `new`; controls the migration path |
| `STATICOWL_DEPLOY_TARGET` | `static-paths` | `static-paths` / `manifest-pointer` / `github` / `both` |
Per-target vars:
- `static-paths`: nothing extra (defaults to S3 + the shared bucket)
- `manifest-pointer`:
  - `STATICOWL_ARTIFACTS_BUCKET` (default `staticowl-artifacts`)
  - `STATICOWL_MANIFESTS_BUCKET` (default `staticowl-manifests`)
  - `STATICOWL_BUILDLOGS_BUCKET` (default `staticowl-buildlogs`)
  - `STATICOWL_REGION` (default `us-east-1`)
- `github`:
  - `STATICOWL_GITHUB_TOKEN` (required): PAT with `contents: write`
  - `STATICOWL_GITHUB_REPO` (required): `owner/name` shape
  - `STATICOWL_GITHUB_BRANCH` (default `main`)
  - `STATICOWL_GITHUB_AUTHOR_NAME` (default `StaticOwl`)
  - `STATICOWL_GITHUB_AUTHOR_EMAIL` (default `bot@staticowl.com`)
See deploy-targets.md for the canonical reference.
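The per-target defaults can be resolved along these lines. This is a sketch under assumptions: the function names are hypothetical, an empty string is treated the same as unset, and the `owner/name` check is a simple regex; deploy-targets.md remains the authority. It deliberately does not interpret `both`, whose exact expansion is defined there.

```typescript
// Sketch only: resolves per-target variables with the defaults listed above.
type Env = Record<string, string | undefined>;

interface ManifestPointerConfig {
  artifactsBucket: string;
  manifestsBucket: string;
  buildlogsBucket: string;
  region: string;
}

interface GithubTargetConfig {
  token: string;
  owner: string;
  repo: string;
  branch: string;
  authorName: string;
  authorEmail: string;
}

// Empty strings count as unset.
function readOr(env: Env, name: string, fallback: string): string {
  const value = env[name];
  return value !== undefined && value !== "" ? value : fallback;
}

function readRequired(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") throw new Error(`${name} is required`);
  return value;
}

function resolveManifestPointerConfig(env: Env): ManifestPointerConfig {
  return {
    artifactsBucket: readOr(env, "STATICOWL_ARTIFACTS_BUCKET", "staticowl-artifacts"),
    manifestsBucket: readOr(env, "STATICOWL_MANIFESTS_BUCKET", "staticowl-manifests"),
    buildlogsBucket: readOr(env, "STATICOWL_BUILDLOGS_BUCKET", "staticowl-buildlogs"),
    region: readOr(env, "STATICOWL_REGION", "us-east-1"),
  };
}

function resolveGithubTargetConfig(env: Env): GithubTargetConfig {
  const token = readRequired(env, "STATICOWL_GITHUB_TOKEN");
  const fullName = readRequired(env, "STATICOWL_GITHUB_REPO");
  // Expect exactly "owner/name". The token is never echoed in errors.
  const match = /^([^\/\s]+)\/([^\/\s]+)$/.exec(fullName);
  if (!match) throw new Error(`STATICOWL_GITHUB_REPO must look like owner/name, got "${fullName}"`);
  return {
    token,
    owner: match[1],
    repo: match[2],
    branch: readOr(env, "STATICOWL_GITHUB_BRANCH", "main"),
    authorName: readOr(env, "STATICOWL_GITHUB_AUTHOR_NAME", "StaticOwl"),
    authorEmail: readOr(env, "STATICOWL_GITHUB_AUTHOR_EMAIL", "bot@staticowl.com"),
  };
}
```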
AI providers (optional, unlocks AI features)
| Var | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Claude (drafting, derive, bulk rewrite, site health) |
| `OPENAI_API_KEY` | GPT chat + DALL-E / gpt-image-1 |
| `REPLICATE_API_TOKEN` | Image gen / edit / upscale (Flux, SDXL, rembg, Real-ESRGAN) |
| `FAL_API_KEY` | fal.ai image gen (faster Replicate alternative) |
| `PEXELS_API_KEY` | Pexels stock search |
| `UNSPLASH_ACCESS_KEY` | Unsplash stock search |
| `PIXABAY_API_KEY` | Pixabay stock search |
If a key is empty or unset, the corresponding feature is disabled in the UI and the assistant routes around it.
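The "disabled when empty" rule amounts to one presence check per key. A minimal sketch; the feature labels on the left (`claude`, `openai`, ...) are illustrative, not identifiers from the UI code, and whitespace-only values are treated as empty by assumption.

```typescript
// Hypothetical mapping from feature label to the env var that unlocks it.
type Env = Record<string, string | undefined>;

const AI_FEATURES = {
  claude: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  replicate: "REPLICATE_API_TOKEN",
  fal: "FAL_API_KEY",
  pexels: "PEXELS_API_KEY",
  unsplash: "UNSPLASH_ACCESS_KEY",
  pixabay: "PIXABAY_API_KEY",
} as const;

type AiFeature = keyof typeof AI_FEATURES;

// A feature is enabled only when its key is present and non-blank.
function enabledAiFeatures(env: Env): Record<AiFeature, boolean> {
  const result = {} as Record<AiFeature, boolean>;
  for (const feature of Object.keys(AI_FEATURES) as AiFeature[]) {
    const value = env[AI_FEATURES[feature]];
    result[feature] = value !== undefined && value.trim() !== "";
  }
  return result;
}
```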
Storage + sandboxing
| Var | Default | Purpose |
|---|---|---|
| `MEDIA_DIR` | `/opt/graphiquity/data/cms-media` | Where uploaded media lives |
| `GCMS_LAMBDA_MODE` | unused | Legacy env var; the runner now always uses isolated-vm. Retained for back-compat reads but ignored. |
| `MAX_QUERY_TIMEOUT_MS` | `120000` | Wall-clock timeout for any single query (120 s) |
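`MAX_QUERY_TIMEOUT_MS` is a wall-clock budget. One common way to enforce such a budget is to race the work against a timer, sketched below; `withTimeout`, `parseTimeoutMs` and `QueryTimeoutError` are hypothetical names, and the CMS's actual mechanism (and how it maps to the 504 listed under Monitoring) may differ.

```typescript
// Hypothetical helpers illustrating a wall-clock query budget.
class QueryTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`query exceeded ${timeoutMs} ms`);
    this.name = "QueryTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Falls back to the documented default (120000) for missing or invalid values.
function parseTimeoutMs(raw: string | undefined, fallback = 120_000): number {
  const n = raw === undefined ? Number.NaN : Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Rejects with QueryTimeoutError if `work` has not settled within timeoutMs.
// This only stops waiting; it does not cancel the underlying work.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new QueryTimeoutError(timeoutMs)), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // always release the timer so it cannot hold the event loop
  }
}
```

Because `Promise.race` does not abort the losing query, engine-side work still needs its own limit if it must actually stop.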
Secrets management
All secrets live in SSM Parameter Store as SecureString parameters. ecosystem.config.cjs contains only process.env.<NAME> placeholders — never plaintext keys. deploy-cms.sh pulls each parameter via aws ssm get-parameter --with-decryption immediately before pm2 reload --update-env, so the secret only lives in the running process's env, not in any disk file.
Parameter paths
| Param | Required | Purpose |
|---|---|---|
| `/graphiquity/cms/api-key` | yes | Engine API key (`gq_…`) |
| `/graphiquity/cms/ai/anthropic-api-key` | optional | Claude features |
| `/graphiquity/cms/ai/openai-api-key` | optional | GPT chat + DALL-E |
| `/graphiquity/cms/ai/replicate-token` | optional | Image gen / upscale |
| `/graphiquity/cms/ai/fal-key` | optional | fal.ai image gen |
| `/graphiquity/cms/ai/pexels-key` | optional | Pexels stock |
| `/graphiquity/cms/ai/unsplash-key` | optional | Unsplash stock |
| `/graphiquity/cms/ai/pixabay-key` | optional | Pixabay stock |
| `/graphiquity/cms/github/token` | only if `STATICOWL_DEPLOY_TARGET=github` | GitHub PAT (`contents: write`) |
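The table implies a mapping from parameter path to environment variable. The sketch below makes that mapping explicit and reports missing required parameters before a reload. The path-to-variable pairing is inferred from the names and is an assumption, and the SSM fetch itself (done by `deploy-cms.sh`) is left out so the logic can be tested on its own.

```typescript
// Assumed pairing of SSM parameter paths with the env vars they populate.
interface SecretSpec {
  path: string;
  envVar: string;
  required: boolean;
}

const SECRETS: readonly SecretSpec[] = [
  { path: "/graphiquity/cms/api-key", envVar: "GRAPHIQUITY_API_KEY", required: true },
  { path: "/graphiquity/cms/ai/anthropic-api-key", envVar: "ANTHROPIC_API_KEY", required: false },
  { path: "/graphiquity/cms/ai/openai-api-key", envVar: "OPENAI_API_KEY", required: false },
  { path: "/graphiquity/cms/ai/replicate-token", envVar: "REPLICATE_API_TOKEN", required: false },
  { path: "/graphiquity/cms/ai/fal-key", envVar: "FAL_API_KEY", required: false },
  { path: "/graphiquity/cms/ai/pexels-key", envVar: "PEXELS_API_KEY", required: false },
  { path: "/graphiquity/cms/ai/unsplash-key", envVar: "UNSPLASH_ACCESS_KEY", required: false },
  { path: "/graphiquity/cms/ai/pixabay-key", envVar: "PIXABAY_API_KEY", required: false },
];

// The GitHub token is only needed for the github target, per the table.
function secretsToFetch(deployTarget: string): SecretSpec[] {
  const specs = [...SECRETS];
  if (deployTarget === "github") {
    specs.push({ path: "/graphiquity/cms/github/token", envVar: "STATICOWL_GITHUB_TOKEN", required: true });
  }
  return specs;
}

// Turns fetched values (path -> value) into env entries and lists missing required paths.
function buildEnv(
  specs: readonly SecretSpec[],
  fetched: ReadonlyMap<string, string | undefined>,
): { env: Record<string, string>; missing: string[] } {
  const env: Record<string, string> = {};
  const missing: string[] = [];
  for (const spec of specs) {
    const value = fetched.get(spec.path);
    if (value) env[spec.envVar] = value;
    else if (spec.required) missing.push(spec.path);
  }
  return { env, missing };
}
```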
First-time setup
```bash
# 1. Set values in your shell
export GRAPHIQUITY_API_KEY=gq_yourkey
export ANTHROPIC_API_KEY=sk-ant-...
# ...etc

# 2. Seed SSM
./scripts/seed-ssm-secrets.sh

# 3. Deploy
./deploy-cms.sh
```
Rotation
```bash
# Just put-parameter with --overwrite, then deploy.
aws ssm put-parameter \
  --name /graphiquity/cms/ai/anthropic-api-key \
  --value sk-ant-newkey \
  --type SecureString --overwrite \
  --profile signonix --region us-east-1

./deploy-cms.sh   # reload pulls the new value
```
GitHub PAT specifically
`STATICOWL_GITHUB_TOKEN` is the highest-blast-radius secret because it grants `contents: write` on a customer's repo. The publisher redacts it from every error message and log line (verified in tests). Rotate it on any suspicion of exposure.
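Redaction can be as simple as replacing every occurrence of the known token value before a message reaches a log. The sketch below is not the publisher's code. The second helper, which masks anything shaped like a GitHub token (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_` and `github_pat_` prefixes), is an extra defense-in-depth assumption.

```typescript
// Replace every literal occurrence of the configured secret.
// split/join avoids having to regex-escape the secret.
function redactSecret(message: string, secret: string | undefined, mask = "[REDACTED]"): string {
  if (!secret) return message;
  return message.split(secret).join(mask);
}

// Defense in depth: mask anything that looks like a GitHub token,
// e.g. one embedded in a remote URL echoed back by git.
function redactGithubTokens(message: string, mask = "[REDACTED]"): string {
  return message.replace(/\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})/g, mask);
}
```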
Deploy scripts
CMS deploy
- `deploy-cms.sh`: primary deploy script (pulls SSM secrets → tars source → pushes to S3 → SSM RunCommand → `pm2 reload`). Lives at the parent project root.
- `pm2 reload ecosystem.config.cjs --update-env`: zero-downtime reload (the new process pre-loads and signals ready, then the old process drains and exits).
Engine deploy (separate concern)
- `deploy-engine.sh`: engine + reader/writer split (in the parent project). Updates the graph engine that the CMS depends on.
CI parity
npm run ci locally runs npm run build && npm test. The GitHub Actions workflow at .github/workflows/ci.yml runs the same on every push / PR.
To get the E2E suite in CI, add the Playwright steps from tests/e2e/README.md.
Monitoring
Structured logs
The CMS server emits one JSON line per event. Fields:
```json
{
  "ts": "2026-04-29T12:00:00.000Z",
  "level": "info | warn | error",
  "component": "compile | release-service | review-gate | ...",
  "msg": "build started",
  "siteId": "site:...",
  "envId": "...",
  "requestId": "...",
  "...": "additional event-specific fields"
}
```
Pipe to CloudWatch / Datadog / Elasticsearch as-is.
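A logger that produces this shape only needs one `JSON.stringify` call per event; without indentation, JSON escapes embedded newlines as `\n`, so each event stays on one line. This is a minimal sketch: `formatLogLine` and `logEvent` are hypothetical names, and the rule that extra fields cannot overwrite `ts`, `level`, `component` or `msg` is a design choice, not documented CMS behavior.

```typescript
// Minimal one-JSON-object-per-line event logger matching the fields above.
type Level = "info" | "warn" | "error";

interface LogFields {
  siteId?: string;
  envId?: string;
  requestId?: string;
  [extra: string]: unknown;
}

const RESERVED = ["ts", "level", "component", "msg"];

function formatLogLine(
  level: Level,
  component: string,
  msg: string,
  fields: LogFields = {},
  now: Date = new Date(),
): string {
  // Core fields first, then extras; extras may not overwrite core fields.
  const line: Record<string, unknown> = { ts: now.toISOString(), level, component, msg };
  for (const [key, value] of Object.entries(fields)) {
    if (!RESERVED.includes(key)) line[key] = value;
  }
  return JSON.stringify(line);
}

function logEvent(level: Level, component: string, msg: string, fields?: LogFields): void {
  console.log(formatLogLine(level, component, msg, fields));
}
```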
Metrics endpoint
GET /api/health — basic liveness, includes lastBuildAt and lastBuildId.
If the parent engine project's /metrics endpoint is mounted, you'll also have Prometheus-format metrics (heap, RSS, query counts, latency p50/p95/p99). See parent project's graph_api.js.
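`lastBuildAt` makes `/api/health` useful for more than liveness: if a site is expected to rebuild on a regular cadence, a probe can flag a stale build. A minimal sketch, assuming the endpoint returns JSON with `lastBuildAt` as an ISO-8601 string; the response shape beyond the two documented field names, the helper name and the threshold are assumptions.

```typescript
// Assumed response shape for GET /api/health; only the two documented
// fields are modeled, and both are treated as optional.
interface HealthBody {
  lastBuildAt?: string | null; // assumed ISO-8601 timestamp
  lastBuildId?: string | null;
}

type FreshnessVerdict = { ok: true; ageMs: number } | { ok: false; reason: string };

// Flags a deployment whose most recent build is older than maxAgeMs.
function checkBuildFreshness(body: HealthBody, maxAgeMs: number, now: number = Date.now()): FreshnessVerdict {
  if (!body.lastBuildAt) return { ok: false, reason: "no lastBuildAt reported" };
  const builtAt = Date.parse(body.lastBuildAt);
  if (Number.isNaN(builtAt)) return { ok: false, reason: `unparseable lastBuildAt: ${body.lastBuildAt}` };
  const ageMs = now - builtAt;
  if (ageMs > maxAgeMs) {
    return { ok: false, reason: `build ${body.lastBuildId ?? "(unknown)"} is ${Math.round(ageMs / 1000)} s old` };
  }
  return { ok: true, ageMs };
}
```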
Things worth watching
- Memory pressure: the server sheds requests with 503 under critical pressure (5 s sample interval). If you see 503s, it is time to bump RAM or scale out.
- Query timeout (504): default 120 s. Long-running deploys / replays that routinely exceed it should be moved to the background.
- Deploy gate 409s: high counts mean Reviews are stale; investigate dependency-hash drift.
- GitHub publisher errors: token rotation, branch protection, repo size growth.
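Shedding on a sampled signal keeps the per-request check cheap: a timer samples every 5 s and each request only reads the cached verdict. The sketch below illustrates that pattern. `PressureMonitor`, the 0.9 ratio and the sampler are hypothetical, since this doc does not say which metric or threshold the server uses.

```typescript
// Sample-and-shed sketch. The sampler returns a pressure ratio in [0, 1],
// e.g. RSS divided by a configured memory limit (hypothetical).
type PressureLevel = "normal" | "critical";

class PressureMonitor {
  private level: PressureLevel = "normal";
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly sample: () => number;
  private readonly criticalRatio: number;

  constructor(sample: () => number, criticalRatio = 0.9) {
    this.sample = sample;
    this.criticalRatio = criticalRatio;
  }

  /** Takes one sample now; separate from start() so it can be tested without timers. */
  tick(): PressureLevel {
    this.level = this.sample() >= this.criticalRatio ? "critical" : "normal";
    return this.level;
  }

  start(intervalMs = 5_000): void {
    this.tick();
    this.timer = setInterval(() => this.tick(), intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
  }

  /** Status to short-circuit a request with, or undefined to let it through. */
  shedStatus(): 503 | undefined {
    return this.level === "critical" ? 503 : undefined;
  }
}
```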
Common operational tasks
Add an environment to a site
```http
PUT /api/releases/environments/env:staging

{
  "name": "Staging",
  "order": 1,
  "autoBuild": true,
  "requiresApproval": true,
  "publicUrlOverride": "https://staging.example.com"
}
```
Roll forward to a specific Release
```http
POST /api/releases/:id/deploy

{ "environmentId": "env:prod", "intent": "deploy" }
```
Roll back
```http
POST /api/releases/deployments/:currentDeploymentId/rollback

{ "rollbackToReleaseId": "release:older-one" }
```
Rollback is atomic: under `manifest-pointer` it is a single S3 pointer write. It is also audit-stamped: it creates a new Deployment fact and marks the prior one as superseded.
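These rollback semantics can be modeled in a few lines: Deployment facts are append-only, a rollback is itself a new fact, and the only mutable state is one pointer. This in-memory sketch is for illustration only; `EnvironmentLedger`, the `supersededBy` field and the ID format are hypothetical, and the real facts live in the engine with the pointer in S3.

```typescript
interface Deployment {
  id: string;
  releaseId: string;
  environmentId: string;
  intent: "deploy" | "rollback";
  createdAt: string;
  supersededBy?: string;
}

class EnvironmentLedger {
  readonly environmentId: string;
  readonly deployments: Deployment[] = [];
  pointer: string | undefined; // stands in for the manifest-pointer object
  private seq = 0;

  constructor(environmentId: string) {
    this.environmentId = environmentId;
  }

  /** The active deployment is the only one not yet superseded. */
  current(): Deployment | undefined {
    return this.deployments.find((d) => d.supersededBy === undefined);
  }

  record(releaseId: string, intent: Deployment["intent"], at: Date = new Date()): Deployment {
    const prior = this.current();
    this.seq += 1;
    const next: Deployment = {
      id: `deployment:${this.seq}`,
      releaseId,
      environmentId: this.environmentId,
      intent,
      createdAt: at.toISOString(),
    };
    this.deployments.push(next);
    if (prior) prior.supersededBy = next.id; // audit trail: mark, never delete
    this.pointer = releaseId; // the single write that changes what is served
    return next;
  }

  rollback(currentDeploymentId: string, rollbackToReleaseId: string): Deployment {
    const active = this.current();
    if (!active || active.id !== currentDeploymentId) {
      throw new Error(`${currentDeploymentId} is not the active deployment`);
    }
    return this.record(rollbackToReleaseId, "rollback");
  }
}
```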
Force a re-review
Bumping a Versionable's version invalidates dependent Reviews via the dependency-hash check. Or directly:
```http
POST /api/releases/:id/transition

{ "state": "pending" }
```
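The dependency-hash check compares a hash recorded when a Review was approved with a hash of the current dependency versions. A sketch under stated assumptions: the hash inputs (sorted `id@version` pairs), SHA-256 and the `Review` shape are hypothetical, as this doc does not specify them. Sorting makes the hash independent of dependency order, so only an actual version change marks a Review stale.

```typescript
import { createHash } from "node:crypto";

interface VersionedRef {
  id: string;
  version: number;
}

interface Review {
  releaseId: string;
  approvedHash: string; // dependency hash captured at approval time
}

// Canonicalize (sort by id), then hash, so ordering never matters.
function dependencyHash(deps: readonly VersionedRef[]): string {
  const canonical = [...deps]
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((d) => `${d.id}@${d.version}`)
    .join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}

/** A Review is stale when any dependency's version moved since approval. */
function isReviewStale(review: Review, currentDeps: readonly VersionedRef[]): boolean {
  return review.approvedHash !== dependencyHash(currentDeps);
}
```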
Rotate the engine API key
- Mint a new key: `gq_new...`
- Update SSM at the path from the parameter table: `aws ssm put-parameter --name /graphiquity/cms/api-key --value gq_new... --type SecureString --overwrite`
- Run `./deploy-cms.sh`; it pulls the new secret and reloads.
- After verifying, revoke the old key.
Rotate GitHub PAT
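- Mint a new PAT in the GitHub UI with `contents: write` on the target repos.
- Update SSM: `aws ssm put-parameter --name /graphiquity/cms/github/token --value ghp_new... --type SecureString --overwrite`
- Run `./deploy-cms.sh`. It is the step that pulls values from SSM before calling `pm2 reload ecosystem.config.cjs --update-env`, so reloading PM2 on its own would not pick up the new parameter.
- Force a deploy to verify the new token works.
- Revoke the old PAT.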
Check what's queued
```http
GET /api/releases/queue?envId=env:prod&limit=100
```
Lists future-dated Deployments ordered by validFrom ascending — the drip-publishing review surface.
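The endpoints in this section can be wrapped in a thin typed client. This is a hypothetical sketch: request paths and bodies come from the examples above, while the bearer-token header, percent-encoding of IDs and the `unknown` response type are assumptions. Injecting `fetch` keeps it testable; in Node 18+ the global `fetch` fits the `FetchLike` type.

```typescript
interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body?: string;
}
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: RequestInitLike) => Promise<ResponseLike>;

interface EnvironmentInput {
  name: string;
  order: number;
  autoBuild: boolean;
  requiresApproval: boolean;
  publicUrlOverride?: string;
}

function createReleasesClient(baseUrl: string, token: string, fetchFn: FetchLike) {
  const enc = encodeURIComponent; // IDs such as "env:prod" are percent-encoded

  async function call(method: string, path: string, body?: unknown): Promise<unknown> {
    const init: RequestInitLike = {
      method,
      headers: { "content-type": "application/json", authorization: `Bearer ${token}` },
    };
    if (body !== undefined) init.body = JSON.stringify(body);
    const res = await fetchFn(`${baseUrl}${path}`, init);
    // Deploy-gate rejections surface here as HTTP 409.
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  return {
    putEnvironment: (envId: string, env: EnvironmentInput) =>
      call("PUT", `/api/releases/environments/${enc(envId)}`, env),
    deploy: (releaseId: string, environmentId: string) =>
      call("POST", `/api/releases/${enc(releaseId)}/deploy`, { environmentId, intent: "deploy" }),
    rollback: (currentDeploymentId: string, rollbackToReleaseId: string) =>
      call("POST", `/api/releases/deployments/${enc(currentDeploymentId)}/rollback`, { rollbackToReleaseId }),
    transition: (releaseId: string, state: string) =>
      call("POST", `/api/releases/${enc(releaseId)}/transition`, { state }),
    queue: (envId: string, limit = 100) =>
      call("GET", `/api/releases/queue?envId=${enc(envId)}&limit=${limit}`),
  };
}
```

For example, `createReleasesClient("https://cms.example.com", token, fetch).queue("env:prod")` issues the queue request shown above.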
Backup + DR
The parent Graphiquity project ships a 4-layer DR architecture:
- DLM snapshots (EBS-level, automatic)
- Per-graph S3 backups (`POST /graphs/:name/backup` → tar+gzip to S3)
- Cross-account replication to a DR account
- EC2 recovery alarms
The CMS itself is largely stateless — its data lives in the engine. CMS-specific concerns:
- Media uploads: `MEDIA_DIR` is content-addressed by SHA and backed up nightly via `aws s3 sync`.
- DynamoDB management table: PITR enabled, daily backup.
- PM2 logs: rotated; older logs shipped to S3.
See parent project's docs/backup-strategy.md (or memory entry project_backup_strategy.md) for the full DR shape.
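Content addressing derives the file name from the bytes, so identical uploads share one path and existing files never change in place, which helps keep the nightly `aws s3 sync` incremental. A sketch only: the doc says "content-addressed by SHA", so SHA-256, the two-character fan-out directory and keeping the original extension are assumptions.

```typescript
import { createHash } from "node:crypto";
import { extname, join } from "node:path";

// Hypothetical content-addressed layout: <mediaDir>/<first 2 hex>/<sha256><ext>
function mediaPath(mediaDir: string, bytes: Uint8Array, originalName: string): string {
  const digest = createHash("sha256").update(bytes).digest("hex");
  const ext = extname(originalName).toLowerCase();
  return join(mediaDir, digest.slice(0, 2), `${digest}${ext}`);
}
```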
Multi-tenant routing
Tenancy lives at three layers:
- Engine: Graphiquity hosts N graphs; engine API keys are graph-scoped.
- CMS server: one CMS server can serve many tenants by routing per `X-Site-Id` header.
- Per-tenant EC2 isolation (planned): `graph_router.json` EC2-A fronts reader/writer processes; high-value tenants get dedicated EC2s. Path to ECS/EKS.
Default deployment is single-CMS-many-sites. For high-value enterprise customers, consider per-tenant EC2 with the same CMS image; route via graph_router.js.
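Routing on `X-Site-Id` reduces to a header lookup against a site registry. A minimal sketch; `resolveTenant`, the registry and the `graphName` field are hypothetical. Because the header is client-supplied, a real implementation also has to check that the authenticated user belongs to that site; that check is out of scope here.

```typescript
interface TenantBinding {
  siteId: string;
  graphName: string; // hypothetical: which engine graph backs this site
}

type Resolution =
  | { ok: true; tenant: TenantBinding }
  | { ok: false; status: 400 | 404; error: string };

function resolveTenant(
  headers: Record<string, string | string[] | undefined>,
  registry: ReadonlyMap<string, TenantBinding>,
): Resolution {
  // Node's http module lower-cases incoming header names.
  const raw = headers["x-site-id"];
  const siteId = Array.isArray(raw) ? raw[0] : raw;
  if (!siteId) return { ok: false, status: 400, error: "missing X-Site-Id header" };
  const tenant = registry.get(siteId);
  if (!tenant) return { ok: false, status: 404, error: `unknown site ${siteId}` };
  return { ok: true, tenant };
}
```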
Known operational issues + mitigations
Rate-limit spam from /api/auth/me-style endpoints
The platformContext middleware queries the engine on every request to resolve req.platformUser. With 3+ queries per request and a busy admin session, customers can hit the engine's 300/60s rate limit.
Fix in flight: a 30 s in-memory cache on `getOrCreateUser`, keyed by Cognito `sub`. The change is roughly 15 lines and already designed; implementation is pending.
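The planned cache can be sketched as a small TTL map keyed by `sub`. This is not the pending implementation: `cacheBySub` is a hypothetical wrapper, and it assumes `getOrCreateUser` can be called as `(sub) => Promise<User>`. Caching the in-flight promise also collapses concurrent lookups for the same user, which matters when several queries per request would otherwise each hit the engine.

```typescript
// Hypothetical TTL cache keyed by Cognito `sub`; `load` stands in for the
// real engine lookup behind getOrCreateUser.
function cacheBySub<T>(
  load: (sub: string) => Promise<T>,
  ttlMs = 30_000,
  now: () => number = Date.now,
): (sub: string) => Promise<T> {
  const entries = new Map<string, { expiresAt: number; value: Promise<T> }>();

  return (sub: string): Promise<T> => {
    const t = now();
    const hit = entries.get(sub);
    if (hit && hit.expiresAt > t) return hit.value;

    // Cache the promise, not the result, so concurrent requests share one lookup.
    const value = load(sub);
    entries.set(sub, { expiresAt: t + ttlMs, value });

    // Failures are not cached: drop the entry so the next request retries.
    value.catch(() => {
      if (entries.get(sub)?.value === value) entries.delete(sub);
    });

    // Sweep expired entries once the map grows large, so it cannot grow without bound.
    if (entries.size > 10_000) {
      for (const [key, entry] of entries) if (entry.expiresAt <= t) entries.delete(key);
    }
    return value;
  };
}
```

Wrap once at module scope, for example `const getUserCached = cacheBySub(getOrCreateUser)`. Each PM2 process keeps its own cache, and a user record changed in the engine can be served stale for up to 30 s.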
Lifecycle hook RCE risk — FIXED
The runner now uses isolated-vm (a separate V8 isolate, no shared heap) by default. Hostile transform code cannot reach process / require / fs / Node intrinsics — verified by isolation tests in packages/core/src/__tests__/transform-runner.test.ts. The GCMS_LAMBDA_MODE env var is kept for back-compat but ignored.
Plaintext secrets in ecosystem.config.cjs — FIXED
The config file now reads everything via process.env.<NAME>; deploy-cms.sh pulls each value from SSM as a SecureString before reload. See "Secrets management" above.
CMS UI auth bypass risk
The CMS MCP today runs with a graph-level engine API key. Treat the MCP credential like the engine API key it wraps. The proper fix is agent-scoped tokens — see architecture.md → scoped agent tokens.
See also
- Getting started: local dev setup
- Deploy targets: env-var contract per target
- Architecture: what the moving pieces are
- API reference: every `/api/*` endpoint
- Parent project's `RUNBOOK.md` and `TROUBLESHOOTING.md` for engine-side concerns