Troubleshooting & Support
Resolution protocols for common infrastructure anomalies, HTTP error codes, performance characteristics, and direct engineering support.
HTTP Error Reference
401 Unauthorized: The OIDC JWT is missing, expired, or has an invalid signature. All non-PUBLIC endpoints require an Authorization: Bearer <token> header. Verify that ZITADEL_ISSUER and ZITADEL_AUDIENCE are set correctly and that the token has not expired.
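When a request is rejected for authentication reasons, inspecting the token's claims is often quicker than digging into server configuration. The following is a minimal Python sketch, standard library only, that decodes a JWT payload without verifying its signature and compares the exp, iss, and aud claims against expected values. The function names are hypothetical, and the issuer and audience arguments are assumed to mirror the values configured in ZITADEL_ISSUER and ZITADEL_AUDIENCE. It is a diagnostic aid, not a substitute for signature validation.

```python
import base64
import json
import time
from typing import List, Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def diagnose_token(token: str, issuer: str, audience: str,
                   now: Optional[float] = None) -> List[str]:
    """Return human-readable problems found in the token's claims (hypothetical helper)."""
    claims = decode_jwt_claims(token)
    now = time.time() if now is None else now
    problems = []
    exp = claims.get("exp")
    if exp is None:
        problems.append("missing exp claim")
    elif exp <= now:
        problems.append("token expired")
    if claims.get("iss") != issuer:
        problems.append("issuer mismatch")
    # The aud claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("audience mismatch")
    return problems
```

An empty list means the claims look plausible, in which case the signature or the server-side configuration is the next thing to check.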
403 Forbidden: The authenticated user lacks the resource-level permission or system role required for this action. Check the user's effective permission on the resource via GET /api/v1/permissions/my/document/{id}.
404 Not Found: The requested resource does not exist, has been soft-deleted, or belongs to a different organization. Soft-deleted resources are not returned in standard queries; only an ADMIN can recover them.
Another user holds the exclusive write-lock on this document. Locks auto-expire after 1 hour. Check the locked_by_id and locked_at fields on the document object. An ADMIN can forcibly release the lock via POST /documents/{id}/unlock.
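To decide whether to wait for a lock or ask an ADMIN to release it, a client can estimate how long the lock has left. The sketch below assumes that locked_at is an ISO 8601 timestamp and that a timestamp without an offset is UTC; neither is stated here. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_TTL = timedelta(hours=1)  # auto-expiry window stated in the text


def lock_expires_at(locked_at: str) -> datetime:
    """Return when a lock taken at `locked_at` (ISO 8601) auto-expires."""
    # A trailing "Z" is normalized because datetime.fromisoformat()
    # only accepts it on Python 3.11 and later.
    taken = datetime.fromisoformat(locked_at.replace("Z", "+00:00"))
    if taken.tzinfo is None:
        taken = taken.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return taken + LOCK_TTL


def seconds_until_unlock(document: dict, now: Optional[datetime] = None) -> float:
    """Seconds left before the lock expires; 0.0 if unlocked or already expired."""
    if not document.get("locked_by_id") or not document.get("locked_at"):
        return 0.0
    now = now or datetime.now(timezone.utc)
    remaining = (lock_expires_at(document["locked_at"]) - now).total_seconds()
    return max(0.0, remaining)
```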
413 Payload Too Large: The file exceeds the organization's configured upload size limit. File bytes are streamed directly to MinIO; the API server never buffers file content in memory. Increase the org storage quota via PUT /organizations/me or split large files.
422 Unprocessable Entity: The request body or query parameters failed Pydantic v2 validation. The detail array in the response lists each failing field with the specific constraint violated.
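A client can flatten the detail array into one line per failing field for logs or error messages. This sketch assumes each entry carries loc, msg, and type keys, as in FastAPI's default validation error body; the exact shape returned by this API is not confirmed here. The function name is hypothetical.

```python
from typing import List


def summarize_validation_errors(body: dict) -> List[str]:
    """Turn a validation-error response body into one line per failing field."""
    lines = []
    for item in body.get("detail", []):
        # "loc" is a path such as ["body", "title"]; join it into "body.title".
        field = ".".join(str(part) for part in item.get("loc", []))
        lines.append(f"{field}: {item.get('msg', '')} ({item.get('type', 'unknown')})")
    return lines
```

For example, an entry with loc ["body", "title"], msg "Field required", and type "missing" is rendered as `body.title: Field required (missing)`.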
429 Too Many Requests: The request was rejected by slowapi rate limiting. Authentication and other sensitive endpoints have stricter limits. The Retry-After header indicates when the rate limit window resets.
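Clients that honor Retry-After avoid hitting the limiter again before the window resets. The sketch below computes the wait from the header value and handles both forms HTTP allows, a number of seconds or an HTTP-date. The function name and the fallback default are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(retry_after: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Seconds to wait before retrying, derived from a Retry-After header value."""
    if not retry_after:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would sleep for the returned number of seconds before reissuing the request, ideally with a cap on total retries.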
Semantic search exceeded the maximum allowed latency. This typically indicates missing or sub-optimal HNSW indexing on large vector collections. Run EXPLAIN ANALYZE against the embedding similarity query to confirm whether the HNSW index is being used, and rebuild the index if it is not.
Performance Characteristics
Non-functional requirements and system guarantees under normal production load.
| Characteristic | Detail |
|---|---|
| API Response Time | < 200 ms p95 for all non-AI endpoints under normal load |
| File Upload | Streamed directly to MinIO — API server does not buffer file bytes in memory |
| File Download | Presigned URLs (15-min expiry) — zero API server bandwidth for file content |
| AI Processing | Classification and summarization complete within 5 minutes of upload (async Celery) |
| WebSocket | Persistent connection per document session; heartbeat-based keep-alive |
| Permission Cache | Redis cache for permission lookups; invalidated immediately on mutation |
| Rate Limiting | Configurable per-endpoint rate limits via slowapi; strictest on auth endpoints |
| Test Coverage | 327+ tests; ≥ 72% line coverage across the backend service layer |
| DB Migrations | Alembic-managed; zero-downtime via expand/contract migration pattern |
| Scalability | API and Celery worker tiers are stateless and horizontally scalable |
| Observability | Structured JSON logs (structlog); Prometheus metrics; Celery monitoring via Flower (:5555) |
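
To check measured latencies against the p95 target in the table, one option is a nearest-rank percentile over a sample of response times. This is a minimal sketch with hypothetical function names; the 200 ms default comes from the API Response Time row, and how the samples are collected is left open.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float], target_ms: float = 200.0) -> bool:
    """True if the p95 of the samples is strictly below the target."""
    return percentile(samples_ms, 95) < target_ms
```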
Performance Tuning
PostgreSQL / pgvector and infrastructure optimization recommendations.
HNSW vs IVFFlat Index
Use HNSW for production (better recall on semantic queries). Use IVFFlat for faster initial index builds on large datasets during migration.
Worker Parallelism
Set max_parallel_workers_per_gather to 50% of CPU cores for vector similarity operations. Monitor via EXPLAIN ANALYZE.
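The 50% rule can be turned into a concrete setting with a small helper. The sketch below only builds the SQL statement and does not execute it. The function name is hypothetical, and the core count should be that of the database host, which is an assumption when the script runs elsewhere.

```python
import os
from typing import Optional


def parallel_workers_setting(cpu_cores: Optional[int] = None) -> str:
    """Build the SQL that sets max_parallel_workers_per_gather to half the cores."""
    # Falls back to the local machine's core count when none is given.
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    workers = max(1, cores // 2)  # never go below one worker
    return f"ALTER SYSTEM SET max_parallel_workers_per_gather = {workers};"
```

After ALTER SYSTEM, PostgreSQL needs a configuration reload (for example, SELECT pg_reload_conf();) before the new value applies. The effective parallelism is also capped by max_parallel_workers and max_worker_processes.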
Redis Eviction Policy
Set maxmemory-policy to allkeys-lru. Permission cache entries are re-populated on the next request — brief cache misses are non-fatal.
Celery Concurrency
Scale celery_worker replicas independently from the API tier. AI processing is CPU-bound; allocate dedicated worker pods for large document volumes.
-- Widen the HNSW candidate list for this session (higher recall, slower queries)
SET hnsw.ef_search = 100;
-- Rebuild index concurrently (zero downtime)
CREATE INDEX CONCURRENTLY ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Verify query plan
EXPLAIN ANALYZE SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
Health Endpoints
Use for load balancer health checks and Kubernetes liveness/readiness probes.
| Endpoint | Purpose |
|---|---|
| /api/v1/health | Liveness probe |
| /api/v1/health/ready | Readiness probe |

Direct Support
Can't resolve the issue through the documentation? Open a high-priority ticket with our core engineers.
Open Support Ticket
Technical Resources
| Resource | Description |
|---|---|
| /api/v1/health | Liveness |
| /api/v1/health/ready | Readiness |
| /metrics | Prometheus |
| :5555 | Flower (Celery) |
| /api/docs | Swagger UI |
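
Before opening a ticket, a quick first step is to check the liveness and readiness endpoints. This standard-library sketch probes both; it assumes a healthy probe answers with HTTP 200, which is not stated here. The helper names and the base URL in the usage note are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

PROBES = {"liveness": "/api/v1/health", "readiness": "/api/v1/health/ready"}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, etc.


def check_health(base_url: str,
                 fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Probe both health endpoints; a probe passes only on HTTP 200 (assumed)."""
    base = base_url.rstrip("/")
    return {name: fetch(base + path) == 200 for name, path in PROBES.items()}
```

Calling `check_health("http://localhost:8000")` returns a dictionary such as `{"liveness": True, "readiness": False}`. That particular result would suggest the process is up but a dependency is not yet ready.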