From upload to insight in under five minutes.
A full technical walkthrough of the VyXlo Content Service Platform — from OIDC authentication through AI extraction, dual-mode search, real-time collaboration, approval workflows, and immutable audit — deployed across seven containerized services with documented non-functional requirements.
API response latency: < 200 ms p95 on synchronous endpoints.
// Layered Architecture
System Architecture
VyXlo is organized into three horizontal layers. The Client Layer covers any consumer of the platform API — the reference Next.js web application, partner-built UIs, mobile applications, and automated integration pipelines. All communication to the platform traverses HTTPS, with WebSocket upgrade for real-time presence channels and Server-Sent Events for token-streaming AI responses.
The API Layer is a single FastAPI application running on Python 3.12 under Uvicorn ASGI. It serves all 64 REST endpoints plus the WebSocket endpoint and SSE streaming responses. The Celery worker and Celery Beat scheduler run in separate containers built from the same codebase image — all three containers share one image, configured via environment variables to run different entry points. Flower provides a real-time Celery task monitoring dashboard on port 5555.
The Data Layer is composed of four independent services: PostgreSQL 16 with pgvector for all persistent data and both search modes; Redis 7 for session/permission caching and as the Celery task queue + result backend; MinIO for S3-compatible object storage with presigned URL generation; and ZITADEL as the external identity provider for OIDC/PKCE token issuance, SSO, MFA, and SAML/LDAP federation.
PostgreSQL 16 + pgvector
Primary data store for all persistent records. The pgvector extension adds HNSW vector indexes for semantic search alongside standard tsvector full-text indexes. Async access via asyncpg + SQLAlchemy 2.0.
Redis 7 + Celery
Dual role: in-memory cache for user sessions and permission lookups (reducing DB hits on hot paths), and message broker + result backend for the Celery task queue. Rate limit counters also stored in Redis via slowapi.
MinIO + ZITADEL
MinIO provides S3-compatible blob storage for all uploaded files with server-side encryption and 15-minute presigned URL generation. ZITADEL handles all identity operations — no credentials are stored in VyXlo's database.
// Identity Layer
ZITADEL OIDC / PKCE Authentication
VyXlo delegates all identity operations to ZITADEL, an enterprise-grade open-source identity platform. VyXlo stores no passwords — authentication uses the OAuth 2.0 Authorization Code flow with PKCE (Proof Key for Code Exchange), which is the secure-by-default standard for public clients including single-page applications and mobile apps.
ZITADEL supports multiple authentication mechanisms out of the box: username/password with TOTP or SMS MFA; social login via Google, GitHub, and Microsoft; passkeys and WebAuthn for passwordless authentication; enterprise SSO via SAML 2.0; and LDAP/Active Directory federation for organizations with existing identity infrastructure.
The OIDC discovery document at /.well-known/openid-configuration is fetched by the API server at startup to resolve signing keys. Every inbound request with a Bearer token is validated: signature checked, expiry verified, and organization_id extracted from the JWT claims. Token validation never touches ZITADEL on the hot path — it uses cached public keys with TTL refresh.
Connect your existing SAML 2.0 IdP (Okta, Azure AD, Ping) or LDAP/AD to ZITADEL. Users authenticate against their corporate credentials; ZITADEL issues OIDC tokens to VyXlo. Zero new passwords to manage.
Use ZITADEL's hosted login page or embed authentication using ZITADEL's component library. Supports MFA, passkeys, and social login with your custom branding.
For automated pipelines and backend-to-backend integrations, ZITADEL issues machine credentials using the OAuth 2.0 client credentials flow. No user interaction required.
Step 1: Token Acquisition
Client initiates PKCE flow with code_challenge. ZITADEL authenticates user, returns authorization code. Client exchanges code + verifier for access_token + id_token.
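The client-side PKCE material from Step 1 can be generated with nothing but the Python standard library. A minimal sketch — the function name is illustrative, but the S256 challenge derivation follows RFC 7636:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method."""
    # RFC 7636 requires 43-128 unreserved characters; token_urlsafe(32) yields 43.
    verifier = secrets.token_urlsafe(32)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, per RFC 7636 §4.2.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge
```

The client sends the challenge on the authorize request and the verifier on the token exchange; ZITADEL recomputes the hash server-side to bind the two.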
Step 2: API Call
Authorization: Bearer <JWT>
FastAPI middleware validates signature using ZITADEL's cached JWKS. Extracts sub, org_id, role from claims.
Step 3: Authorization
Four-layer check: valid JWT → org_id scope → minimum role level → resource permission (NONE→ADMIN). All layers enforced on every request.
// System Architecture
The Six-Stage Processing Pipeline
Every document uploaded to VyXlo traverses six discrete pipeline stages. Stages 01–03 are executed automatically and asynchronously by the Celery worker immediately after upload. Stages 04–06 represent ongoing operational capabilities available throughout the document's active life.
Ingestion
Two-step upload protocol: first POST /documents to create a metadata record and receive a document ID, then POST /documents/{id}/upload to stream raw file bytes directly to MinIO object storage. The API server never buffers file content in memory — the upload is streamed byte-for-byte to the MinIO S3-compatible bucket, eliminating memory pressure regardless of file size. Supported formats include PDF, DOCX, XLSX, PPTX, images, and plain text. Per-organization size limits are configurable in the admin dashboard. Every upload automatically creates a new document version, preserving the full version history forever — prior versions remain downloadable even after the document is updated.
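A client-side sketch of the two-step protocol, assuming a requests-style session — the helper names and base URL are illustrative; only the endpoint paths come from the docs above. The chunk generator is what keeps the file from ever being buffered whole:

```python
def iter_chunks(fileobj, chunk_size=1 << 20):
    """Yield fixed-size chunks so the file is streamed, never read in full."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

def upload_document(session, base_url, path, title):
    """Hypothetical client: create the metadata record, then stream the bytes."""
    # Step 1: POST /documents returns the new document ID.
    doc = session.post(f"{base_url}/documents", json={"title": title}).json()
    # Step 2: a generator body is sent as a chunked request, mirroring the
    # server's byte-for-byte streaming into MinIO.
    with open(path, "rb") as f:
        session.post(f"{base_url}/documents/{doc['id']}/upload",
                     data=iter_chunks(f))
    return doc["id"]
```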
Extraction
Upload completion triggers an asynchronous Celery task that runs the complete 7-step AI pipeline without blocking the API response. Text is extracted using pdfplumber for PDF files (preserving layout and table structure) or python-docx for DOCX files. The extracted text is passed to LangChain with the configured LLM provider (OpenAI GPT-4, Anthropic Claude, Google Gemini, or Ollama for air-gapped deployments). The LLM classifies the document into one of the supported categories (Legal, Financial, Medical, Logistics, Engineering, HR, Procurement, Real Estate, Compliance, R&D) and returns a confidence score between 0 and 1. Summarization, keyword extraction (up to 20 terms), and named entity extraction (people, organizations, dates, locations) follow as separate LLM calls. All results are stored as JSON fields on the Document record: ai_classification, ai_confidence, ai_summary, ai_keywords, and ai_entities.
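The results of those LLM calls have to land in the JSON fields above in a predictable shape. A small normalization sketch — the function and its validation rules are assumptions, but the field names and limits (confidence in 0–1, up to 20 keywords, the fixed category list) come straight from the pipeline description:

```python
ALLOWED_CATEGORIES = {
    "Legal", "Financial", "Medical", "Logistics", "Engineering",
    "HR", "Procurement", "Real Estate", "Compliance", "R&D",
}

def normalize_ai_results(classification, confidence, summary, keywords, entities):
    """Illustrative guard applied before writing the Document JSON fields."""
    if classification not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {classification}")
    return {
        "ai_classification": classification,
        # Clamp to the documented 0-1 range in case the model drifts.
        "ai_confidence": min(max(float(confidence), 0.0), 1.0),
        "ai_summary": summary,
        # The docs cap keyword extraction at 20 terms.
        "ai_keywords": keywords[:20],
        "ai_entities": entities,
    }
```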
Embedding
After extraction, the document text is split into overlapping chunks using tiktoken — the same tokenizer used by OpenAI models — to ensure chunks never exceed the model's context window. Each chunk is independently embedded into a 1536-dimension dense vector using the configured embedding model (text-embedding-3-small by default). Vectors are stored in PostgreSQL via the pgvector extension with an HNSW (Hierarchical Navigable Small World) index, which delivers approximate nearest-neighbor queries in sub-millisecond time even at millions of vectors. The chunk_index_status field transitions from PENDING → INDEXING → INDEXED or ERROR, allowing the frontend to show real-time progress. Once indexed, the document is queryable via semantic cosine-distance search — finding conceptually related content across the entire organization corpus without requiring exact keyword matches.
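The overlapping-window logic is independent of the tokenizer: given the token list produced by tiktoken's encoder, a sketch like the following yields chunks that share a configurable overlap. Window and overlap sizes here are illustrative, not the platform's actual settings:

```python
def chunk_tokens(tokens, window=512, overlap=64):
    """Split a token sequence into overlapping windows for embedding."""
    step = window - overlap
    chunks = []
    # max(..., 1) guarantees at least one chunk for short documents.
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + window])
    return chunks
```

Each resulting chunk is embedded independently, so the overlap region keeps sentences that straddle a boundary retrievable from either side.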
Interaction
Every indexed document supports two independent search modes operating in parallel. Full-text search uses PostgreSQL tsvector weighted indexes to execute ts_rank-scored keyword queries with folder scoping, status filtering, and document type filtering — sub-10ms at scale. Semantic search uses pgvector cosine-distance similarity against the document's chunk embeddings to find conceptually related content across all indexed documents in the organization. For document Q&A, POST /chat/document/{id} initiates a Retrieval-Augmented Generation (RAG) session: the user's question is embedded, the top-K most semantically relevant chunks are retrieved from pgvector, assembled into a context window, and passed to the LLM. The response is streamed back token-by-token via Server-Sent Events (SSE), so users watch the answer form in real time. The final done event in the SSE stream carries citations — the exact chunk IDs and document sections that informed the answer.
Workflow
Documents progress through structured approval chains before reaching PUBLISHED status. Workflows support sequential and parallel step configurations: sequential steps must complete one at a time in order, while parallel steps all activate simultaneously and require unanimous approval before the workflow advances. Each step can be assigned to an individual user or an entire department — in department-assigned steps, any member of the department with sufficient role level may approve. Approvers must provide a reason when rejecting a step, which is surfaced to the document owner along with email notification. Deadline tracking with SLA-based escalation ensures overdue approvals trigger automatic escalation to the assignee's manager. Full workflow cancellation is available to document owners and ADMINs at any time. Workflow state changes update the document status: PENDING_APPROVAL while active, APPROVED on completion, and back to IN_REVIEW on rejection.
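The advance rule — sequential steps in order, parallel groups requiring unanimous approval — reduces to a pair of pure functions. A sketch, with the data shapes assumed for illustration:

```python
def group_complete(steps):
    """A parallel group advances only when every step in it is approved."""
    return all(s["status"] == "APPROVED" for s in steps)

def next_pending(groups):
    """Groups run in order: return the first incomplete group, or None
    when the whole workflow has passed (document becomes APPROVED)."""
    for group in groups:
        if not group_complete(group):
            return group
    return None
```

A sequential step is just a group of one; a parallel configuration puts several steps in the same group, which is exactly the unanimity requirement described above.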
Audit
Every action that mutates or accesses protected resources generates an immutable audit record — there is no way to bypass this at the API level. The audit log captures: actor (user ID, email, role), action type (one of CREATE, UPDATE, DELETE, ACCESS, DOWNLOAD, PERMISSION_CHANGE, WORKFLOW, EXPORT), target resource (resource type + ID), timestamp (ISO 8601 with timezone), IP address, HTTP method and path, before/after JSON diff for UPDATE events, and the outcome (success or failure with error code). Audit records are append-only — once written, they cannot be edited or deleted by any role including SUPER_ADMIN. Administrators can query and export the full audit log filtered by date range, actor, event type, or resource. Retention policies per organization control how long audit records are kept before archival.
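The captured fields map naturally onto an immutable record type. An illustrative sketch — the class and field names are assumptions, with frozen=True mirroring the append-only guarantee at the object level (the real guarantee is enforced in the database and API layer):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditRecord:
    actor_id: str
    actor_email: str
    actor_role: str
    action: str          # CREATE | UPDATE | DELETE | ACCESS | DOWNLOAD | ...
    resource_type: str
    resource_id: str
    timestamp: str       # ISO 8601 with timezone
    ip: str
    method: str          # HTTP method
    path: str            # HTTP path
    outcome: str         # success, or failure with error code
```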
// State Machine
Document Lifecycle
Every document in VyXlo follows a strictly validated status lifecycle. State transitions are enforced server-side — invalid transitions are rejected with a descriptive 409 Conflict error. Every transition is captured in the immutable audit log with the actor, timestamp, and reason.
Documents begin in DRAFT state upon creation. The owner submits for review, transitioning to IN_REVIEW. When a workflow is attached and activated, the document advances to PENDING_APPROVAL. Successful completion of all approval steps transitions to APPROVED. The owner or an administrator may then publish the document — making it visible to all org members with READ permission or above — transitioning to PUBLISHED. Finally, documents reaching the end of their useful life are moved to ARCHIVED, where they remain readable but appear in a separate archive view and are subject to retention policy enforcement.
Rejection at any workflow step returns the document to IN_REVIEW with the rejection reason surfaced to the owner. Documents can be returned to DRAFT from IN_REVIEW or APPROVED states. Soft-delete is a separate status flag and does not affect the lifecycle state — soft-deleted documents retain their last lifecycle status and are recoverable by ADMINs.
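The transitions described above amount to a small server-side table. A sketch — the real API answers an invalid transition with 409 Conflict, while this illustration simply raises:

```python
# Allowed transitions, as implied by the lifecycle description above.
TRANSITIONS = {
    "DRAFT": {"IN_REVIEW"},
    "IN_REVIEW": {"PENDING_APPROVAL", "DRAFT"},
    "PENDING_APPROVAL": {"APPROVED", "IN_REVIEW"},   # rejection path
    "APPROVED": {"PUBLISHED", "DRAFT"},
    "PUBLISHED": {"ARCHIVED"},
    "ARCHIVED": set(),                                # terminal state
}

def transition(status: str, new_status: str) -> str:
    if new_status not in TRANSITIONS[status]:
        raise ValueError(f"invalid transition {status} -> {new_status}")
    return new_status
```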
Document created. AI pipeline runs. Owner editing permitted. Not visible to other users beyond those with explicit permission.
Sent for review. Read access granted to reviewers. Workflow may be attached and activated.
Active approval workflow. Steps are approved in sequence or in parallel groups, per the workflow configuration. Rejection returns to IN_REVIEW.
All workflow steps passed. Document ready for publication. Approvers notified.
Organization-visible. All users with org READ permission or above can discover and view.
End-of-life state. Readable but not editable. Retention policy enforced by Celery Beat scheduler.
// Live Collaboration
Real-Time Layer: WebSocket + SSE
WebSocket Presence Channel
When a user opens a document, the frontend upgrades to a WebSocket connection on WS /ws/documents/{id}. The server maintains a presence map for that document and broadcasts join/leave/cursor events to all connected clients in real time. Document locking is also mediated through the WebSocket: a client requests a write lock via the lock_request message; the server responds with lock_acquired or lock_denied, and broadcasts the lock state to all presence channel members so they see the "locked by X" indicator immediately.
# WebSocket message types
presence_join { user_id, name, avatar }
presence_leave { user_id }
cursor { user_id, position }
lock_request { user_id }
lock_acquired { user_id, expires_at }
lock_denied { user_id }
lock_released { user_id }
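Server-side, the presence map and lock arbitration reduce to a small amount of per-document state. A single-process sketch — the real server presumably coordinates lock expiry via Celery Beat and may span multiple workers; those details are assumptions here:

```python
class PresenceChannel:
    """Per-document presence and exclusive write-lock state (illustrative)."""

    def __init__(self):
        self.members = {}          # user_id -> display name
        self.lock_holder = None    # user_id currently holding the write lock

    def join(self, user_id, name):
        self.members[user_id] = name
        return {"type": "presence_join", "user_id": user_id, "name": name}

    def request_lock(self, user_id):
        # Granted if free, or re-granted to the current holder; denied otherwise.
        if self.lock_holder in (None, user_id):
            self.lock_holder = user_id
            return {"type": "lock_acquired", "user_id": user_id}
        return {"type": "lock_denied", "user_id": user_id}

    def leave(self, user_id):
        self.members.pop(user_id, None)
        if self.lock_holder == user_id:   # leaving releases any held lock
            self.lock_holder = None
        return {"type": "presence_leave", "user_id": user_id}
```

Every returned event would be broadcast to all connected clients, which is how the "locked by X" indicator appears immediately.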
SSE Token Streaming
Document Q&A uses Server-Sent Events to stream the LLM response token-by-token to the browser. The client sends a question to POST /chat/document/{id} which immediately opens an SSE stream. Tokens arrive as data: {token} events, giving the user real-time feedback as the AI formulates its answer. The final event type is done and carries the full citations array — the specific chunk IDs and source passages from the document corpus that informed the answer. Chat sessions persist server-side, enabling multi-turn document conversations.
# SSE stream events
data: "The Q4 revenue"
data: " grew by 18%"
data: " year-on-year"
event: done
data: {"citations": [{"chunk_id": 14}, {"chunk_id": 22}]}
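On the client, consuming this stream is a matter of splitting on blank lines and dispatching on the event field. A minimal parser sketch covering only the fields this endpoint emits — a production client should use a full SSE implementation:

```python
def parse_sse(lines):
    """Parse SSE lines into (event_type, data) pairs.

    Handles only the `event:` and `data:` fields; a blank line
    terminates each event, per the SSE wire format.
    """
    events, event_type, data = [], "message", []
    for line in lines + [""]:          # trailing "" flushes the last event
        if line.startswith("event:"):
            event_type = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].strip())
        elif line == "" and data:
            events.append((event_type, "\n".join(data)))
            event_type, data = "message", []
    return events
```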
// Infrastructure
Seven-Container Deployment
The VyXlo platform ships as a Docker Compose stack with seven containers. Three containers share the same application image — api, celery_worker, and celery_beat — differentiated purely by their entry point command. The remaining four containers are off-the-shelf infrastructure images pinned to specific versions.
The entire stack launches with a single docker compose up -d command. Health checks on each container ensure the API server only starts after PostgreSQL and Redis are ready. Database migrations run automatically via alembic upgrade head in the API container entrypoint.
The Docker Compose specification is Kubernetes-ready — the service definitions translate directly to Kubernetes Deployments, with MinIO replaceable by any S3-compatible service (AWS S3, GCS, Azure Blob) and ZITADEL deployable on the same cluster or consumed as ZITADEL Cloud SaaS. Horizontal scaling of the API and Celery worker containers is supported; PostgreSQL and Redis use standard cloud-managed variants in production deployments.
| Container | Port | Role |
|---|---|---|
| api | 8000 | FastAPI application server — REST, WebSocket, SSE |
| celery_worker | — | AI pipeline, email dispatch, file cleanup tasks |
| celery_beat | — | Scheduled tasks: lock expiry, link expiry, digests |
| flower | 5555 | Celery task monitoring and management dashboard |
| postgres | 5432 | Primary data store with vector extension |
| redis | 6379 | Session cache, task queue, rate limit counters |
| minio | 9000/9001 | S3-compatible object storage + admin console |
Key Environment Variables
| Variable | Example | Description |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://user:pass@postgres:5432/vyxlo | Async SQLAlchemy connection string |
| REDIS_URL | redis://redis:6379/0 | Redis connection for cache and task broker |
| MINIO_ENDPOINT | minio:9000 | MinIO server host:port |
| MINIO_ACCESS_KEY | minioadmin | MinIO access key |
| MINIO_SECRET_KEY | •••••••• | MinIO secret key |
| MINIO_BUCKET | vyxlo-documents | Bucket name for all uploaded files |
| ZITADEL_DOMAIN | auth.example.com | ZITADEL issuer domain for OIDC discovery |
| ZITADEL_CLIENT_ID | 123456789@vyxlo | OAuth 2.0 client ID registered in ZITADEL |
| OPENAI_API_KEY | sk-… | OpenAI key for classification + embeddings |
| ANTHROPIC_API_KEY | sk-ant-… | Anthropic key for summarization + Q&A |
| LLM_PROVIDER | openai | Active LLM backend: openai / anthropic / gemini / ollama |
| CELERY_BROKER_URL | redis://redis:6379/1 | Celery task broker (Redis DB 1) |
| SECRET_KEY | •••••••• | Application secret for token signing |
| CORS_ORIGINS | https://app.example.com | Comma-separated allowed CORS origins |
| MAX_UPLOAD_SIZE_MB | 100 | Per-upload size cap (org-level override available) |
| PRESIGNED_URL_EXPIRY_S | 900 | MinIO presigned download URL TTL in seconds |
// Search Architecture
Dual-Mode Search Engine
Full-Text Search (tsvector)
PostgreSQL's native full-text search using GIN-indexed tsvector columns. Supports ts_rank-scored keyword queries with phrase matching, prefix matching, and stemming. Queries can be scoped by folder, filtered by document status, filtered by document type, and sorted by relevance or date. Results return document metadata, AI summary, and matched keyword highlight snippets. Typical latency under 10ms on a corpus of 100,000+ documents.
GET /documents/search?q=annual+report&status=PUBLISHED
&folder_id=5&doc_type=FINANCIAL
Semantic Search (pgvector)
The user's query is embedded into a 1536-dimension vector using the same embedding model used during document indexing. pgvector computes cosine similarity between the query vector and all indexed document chunk vectors using the HNSW approximate nearest-neighbor index. Results surface conceptually related documents even when no exact keywords are shared — finding "revenue projections" when the user queries "sales forecast", for example. The API returns the top-K results sorted by cosine similarity score.
POST /documents/semantic-search
{ "query": "quarterly sales forecast", "top_k": 10 }
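For intuition, the score can be written out in plain Python — pgvector's <=> operator returns cosine distance (1 − similarity), and the HNSW index merely makes the nearest-neighbor lookup approximate and fast:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal (unrelated content in embedding space)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The top-K results are simply the chunk vectors with the smallest cosine distance to the 1536-dimension query vector.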
// Performance Characteristics
Non-Functional Requirements
VyXlo publishes documented non-functional targets — not aspirational marketing claims, but verified benchmarks with measurement methodology. The API p95 latency target of < 200ms applies to all synchronous endpoints. AI processing tasks run asynchronously via Celery and complete the full 7-step pipeline (extraction → classification → summarization → keywords → entities → embedding → chunk indexing) in under five minutes for documents up to 100 pages.
The test suite covers 327+ test cases including unit tests for service layer functions, integration tests against a real PostgreSQL + Redis + MinIO stack, and end-to-end API tests. Line coverage is maintained at or above 72%. Prometheus metrics export request latency histograms, error rates, Celery task throughput, and queue depth — compatible with Grafana dashboards and any alerting platform that supports the Prometheus scrape protocol.
| Metric | Target | Scope |
|---|---|---|
| API p95 response time | < 200 ms | Non-AI synchronous endpoints |
| AI pipeline (async) | < 5 min | Celery task from upload to INDEXED |
| SSE first token latency | < 2 s | RAG Q&A first streamed token |
| Search query latency | < 50 ms | tsvector + pgvector combined |
| Test suite | 327+ cases | Unit + integration + E2E |
| Line coverage | ≥ 72% | Backend Python (measured via pytest-cov) |
| Presigned URL TTL | 15 min (900 s) | MinIO download token expiry |
| Document lock expiry | 1 hour | Exclusive write-lock auto-release |
// Partner Integration
Integration Patterns
API-First Integration
Consume the full 64-endpoint REST API from any language. All endpoints are documented with OpenAPI 3.1 at /docs (ReDoc) and /openapi.json. Authenticate using ZITADEL service accounts for backend integrations, or PKCE flows for user-facing applications. SDKs for Python and TypeScript are available.
Embedded Document Layer
Use VyXlo as the document intelligence backend for your existing platform. Upload documents via API, consume AI extraction results, and render document search and Q&A within your own UI. White-label friendly — org-level branding, custom domains, and CORS configuration per deployment.
Event-Driven Pipelines
Build automated document pipelines: ingest from external sources, trigger classification via API, read AI results, route to approval workflows, and archive on completion. The Celery task queue and Flower monitoring dashboard give full visibility into async pipeline health.
See it in action.
Deploy in minutes.
Full Docker Compose stack · ZITADEL OIDC included · OpenAPI docs at /docs