Platform Capabilities

Every feature built for
zero-error environments.

VyXlo CSP is not a folder system with AI bolted on. Every capability was designed from first principles to handle institutional-grade document workflows — with a 7-step AI processing pipeline, an 8-level permission model applied independently per resource, immutable audit trails, and a 64-endpoint REST API that exposes every operation programmatically.

64 REST Endpoints
8 Permission Levels
7 AI Pipeline Steps
< 200ms p95 API Latency

01

AI Extraction Suite

Every document uploaded to VyXlo runs through a seven-step asynchronous AI processing pipeline executed by Celery workers. The pipeline is triggered automatically on upload (configurable via AI_PROCESS_ON_UPLOAD) or on-demand via the API. Classification and summarization complete within five minutes of upload, after which AI fields are permanently stored on the document object.

The AI layer is provider-agnostic. VyXlo routes through LangChain, allowing you to swap between OpenAI (GPT-4 + Embeddings), Anthropic (Claude), Google Gemini, or fully local Ollama models for air-gapped deployments — with no code changes required. Toggle the feature per organization via the ENABLE_AI_FEATURES flag.

1. Text Extraction: PDF content is extracted via pdfplumber, DOCX via python-docx, and plain text files are read as raw text. Token counts are tracked via tiktoken for downstream LLM context management.
2. Classification: An LLM assigns a document type category (FINANCIAL_REPORT, CONTRACT, LEGAL, HR, etc.) with a floating-point confidence score (0 to 1). Stored as ai_classification and ai_confidence on the document.
3. Summarization: An LLM generates a concise natural-language summary of the full document. Stored as ai_summary and used in search result previews and RAG context construction.
4. Keyword Extraction: The most relevant domain-specific terms are extracted as a string array (ai_keywords). These keywords are indexed alongside the full-text content to improve search recall.
5. Entity Extraction: Named entity recognition identifies people, organizations, dates, and geographic locations. Results are stored in the ai_entities JSONB field and are filterable via the search API.
6. Embedding Generation: A 1536-dimension vector embedding is generated for the full document text using the OpenAI Embeddings model. It is stored in pgvector for cosine-distance semantic search across the organization.
7. Chunk Indexing: Document text is split into overlapping chunks, and each chunk is embedded individually and stored in pgvector. This enables Retrieval-Augmented Generation (RAG): the chat endpoint retrieves the most relevant chunks as context for LLM Q&A responses.
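Step 7 depends on splitting text into overlapping windows. This means a passage cut at one chunk boundary still appears intact in a neighbouring chunk. The sketch below is a minimal character-based version. The chunk_text name and the chunk_size and overlap defaults are illustrative assumptions, not VyXlo's actual settings. A production splitter, such as one of LangChain's text splitters, would usually respect token or sentence boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    The defaults are illustrative only, not VyXlo's configured values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Any span no longer than `overlap` characters appears intact in at least one chunk.
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```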

02

Document Management

VyXlo treats every document as a first-class entity with a controlled lifecycle, full version history, and comprehensive metadata model. Documents are never destroyed silently — soft-delete architecture ensures every file remains recoverable by an administrator regardless of user action.

Every document carries: title, description, document type, department, folder assignment, custom JSONB key-value pairs, tags, language, page count, word count, and a SHA-256 checksum stored at upload time. The checksum is validated on subsequent downloads to detect file corruption or tampering at rest. Usage analytics — view count, download count, and comment count — are tracked automatically and returned in every document API response.
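Upload-time checksumming and later verification can be sketched with Python's standard hashlib module. The helper names are hypothetical. The pattern is to hash the file in blocks at upload, store the hex digest, and then recompute and compare it when the bytes are read back.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, block_size: int = 64 * 1024) -> str:
    """Hash a file-like object in blocks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_checksum(stream: BinaryIO, stored_hex: str) -> bool:
    """Recompute the digest and compare it with the value recorded at upload time."""
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of(stream), stored_hex)


stored = sha256_of(io.BytesIO(b"quarterly report"))              # recorded at upload
print(verify_checksum(io.BytesIO(b"quarterly report"), stored))  # True
print(verify_checksum(io.BytesIO(b"quarterly rep0rt"), stored))  # False (altered bytes)
```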

Documents move through a validated status lifecycle: DRAFT → IN_REVIEW → PENDING_APPROVAL → APPROVED → PUBLISHED → ARCHIVED. Every status transition is validated server-side and rejected with a descriptive error if invalid. Each change generates an immutable audit record. Administrators can also set per-document retention dates, enforced automatically by a Celery Beat scheduled task.
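Server-side transition validation can be expressed as a lookup table of allowed moves. This minimal sketch assumes that only the next step in the chain above is permitted. The actual rule set may allow other moves, for example a rejected document returning to an earlier state. The class and function names are hypothetical.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    DRAFT = "DRAFT"
    IN_REVIEW = "IN_REVIEW"
    PENDING_APPROVAL = "PENDING_APPROVAL"
    APPROVED = "APPROVED"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"


# Assumption: only the next step in the documented chain is allowed.
ALLOWED = {
    DocumentStatus.DRAFT: {DocumentStatus.IN_REVIEW},
    DocumentStatus.IN_REVIEW: {DocumentStatus.PENDING_APPROVAL},
    DocumentStatus.PENDING_APPROVAL: {DocumentStatus.APPROVED},
    DocumentStatus.APPROVED: {DocumentStatus.PUBLISHED},
    DocumentStatus.PUBLISHED: {DocumentStatus.ARCHIVED},
    DocumentStatus.ARCHIVED: set(),
}


class InvalidTransition(ValueError):
    pass


def transition(current: DocumentStatus, target: DocumentStatus) -> DocumentStatus:
    """Return the new status, or raise a descriptive error if the move is not allowed."""
    if target not in ALLOWED[current]:
        allowed = ", ".join(s.value for s in ALLOWED[current]) or "none"
        raise InvalidTransition(
            f"Cannot move from {current.value} to {target.value}; allowed: {allowed}"
        )
    return target


print(transition(DocumentStatus.DRAFT, DocumentStatus.IN_REVIEW).value)  # IN_REVIEW
try:
    transition(DocumentStatus.DRAFT, DocumentStatus.PUBLISHED)
except InvalidTransition as exc:
    print(exc)  # Cannot move from DRAFT to PUBLISHED; allowed: IN_REVIEW
```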

Version History

Every upload to an existing document creates a new numbered version. All prior versions remain downloadable indefinitely — versions are never deleted. Version metadata (size, checksum, upload timestamp) is tracked per version.

Document Locking

Any user with edit permission can claim an exclusive write-lock via POST /documents/{id}/lock. The lock prevents concurrent edits by returning 409 Conflict to others. Locks auto-expire after one hour; administrators can release any lock manually.
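The lock rules above (an exclusive holder, 409 for everyone else, and one-hour expiry) can be sketched as a pure function. The names are hypothetical. Letting the current holder re-acquire the lock to refresh it is also an assumption. A real service would first check edit permission. It would also store the lock with an atomic compare-and-set, such as a conditional database update, so that two simultaneous requests cannot both succeed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_TTL = timedelta(hours=1)  # the documented one-hour auto-expiry


@dataclass
class Lock:
    user_id: int
    acquired_at: datetime

    def expired(self, now: datetime) -> bool:
        return now - self.acquired_at >= LOCK_TTL


class LockConflict(Exception):
    """Raised when another user holds a live lock; would map to HTTP 409 Conflict."""


def acquire_lock(current: Optional[Lock], user_id: int, now: datetime) -> Lock:
    """Grant the lock if it is free, expired, or held by this user (assumed to refresh it)."""
    if current is not None and not current.expired(now) and current.user_id != user_id:
        raise LockConflict(f"Document is locked by user {current.user_id}")
    return Lock(user_id=user_id, acquired_at=now)


start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
lock = acquire_lock(None, user_id=12, now=start)
try:
    acquire_lock(lock, user_id=7, now=start + timedelta(minutes=30))
except LockConflict as exc:
    print(exc)  # Document is locked by user 12
print(acquire_lock(lock, user_id=7, now=start + timedelta(hours=1)).user_id)  # 7
```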

Soft Delete & Recovery

DELETE /documents/{id} performs a soft delete — the document becomes invisible in listings and search results but remains in the database. Administrators can recover any soft-deleted document. No data is permanently destroyed without an explicit, intentional ADMIN action.

Retention Policies

Administrators set per-document retention_until dates via the API. A Celery Beat scheduled task enforces expiry automatically — when a document passes its retention date, it is flagged accordingly and no longer served in standard queries.
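Soft deletion and retention expiry can both be modelled as flags that hide a row from standard queries without removing it. The sketch below assumes a deleted_at timestamp and a retention_expired flag. Both field names, and the function names, are hypothetical. flag_expired stands in for the body of the scheduled Celery Beat task.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Doc:
    id: int
    deleted_at: Optional[datetime] = None       # hypothetical soft-delete marker
    retention_until: Optional[datetime] = None
    retention_expired: bool = False             # hypothetical flag set by the scheduled job


def flag_expired(docs: list[Doc], now: datetime) -> list[int]:
    """Mark documents whose retention date has passed; return the newly flagged IDs."""
    flagged = []
    for d in docs:
        if not d.retention_expired and d.retention_until is not None and d.retention_until < now:
            d.retention_expired = True
            flagged.append(d.id)
    return flagged


def visible(docs: list[Doc]) -> list[Doc]:
    """Standard listings hide soft-deleted and expired rows; nothing is physically removed."""
    return [d for d in docs if d.deleted_at is None and not d.retention_expired]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
docs = [
    Doc(1),
    Doc(2, deleted_at=now),                                             # soft-deleted
    Doc(3, retention_until=datetime(2025, 1, 1, tzinfo=timezone.utc)),  # past retention
]
print(flag_expired(docs, now))        # [3]
print([d.id for d in visible(docs)])  # [1]
```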

Folder Hierarchy

Arbitrary-depth folder trees scoped per organization, stored using materialized path notation (/1/5/12/) for efficient subtree queries. Move operations include cycle detection. Deleting a folder cascades soft-deletes to all children and their documents.
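The materialized-path scheme turns both subtree queries and move validation into string prefix checks. The helper names below are hypothetical. The path format follows the /1/5/12/ example above.

```python
def child_path(parent_path: str, folder_id: int) -> str:
    """Build a folder's path from its parent's, e.g. '/1/5/' + 12 -> '/1/5/12/'."""
    return f"{parent_path}{folder_id}/"


def in_subtree(path: str, root_path: str) -> bool:
    """A folder is in root's subtree if its path starts with root's path.

    In SQL this becomes a prefix match such as WHERE path LIKE '/1/5/%'.
    The trailing slash keeps '/1/5/' from matching '/1/50/'.
    """
    return path.startswith(root_path)


def validate_move(folder_path: str, new_parent_path: str) -> None:
    """Reject moving a folder into itself or one of its own descendants (a cycle)."""
    if in_subtree(new_parent_path, folder_path):
        raise ValueError("Cannot move a folder into its own subtree")


def rebase(path: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite a descendant's path after its ancestor moved from old_prefix to new_prefix."""
    return new_prefix + path[len(old_prefix):]


print(child_path("/1/5/", 12))                      # /1/5/12/
print(in_subtree("/1/50/", "/1/5/"))                # False
print(rebase("/1/5/12/40/", "/1/5/12/", "/3/12/"))  # /3/12/40/
```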

Rich Metadata & Tags

Documents support custom JSONB key-value metadata for any domain-specific fields beyond the standard schema. A tag library (per-organization, with autocomplete API) allows multi-tag assignment and removal in a single operation. Filter by tag in both full-text and semantic search.

Starring & Bookmarks

Users can star any accessible document. The GET /documents/starred endpoint returns the current user's starred documents as a paginated list. Starred status is included in the document object response (is_starred field) for UI state management.

Share Links

Generate cryptographically signed, time-limited share tokens for external access. Optional password protection and optional email restriction (only the specified address may use the link). Access analytics per link. No user account required to access a share link (configurable per link). Token validation always hits the database directly — never cached.

03

Dual Search Architecture

Full-Text Search

Powered by PostgreSQL tsvector index — the same engine that powers production-grade search at scale. Full-text queries match on document title, description, extracted AI keywords, and body text simultaneously. Support for folder scoping (restrict search to a subtree), status filtering (search only APPROVED documents), and document type filtering (show only FINANCIAL or LEGAL documents).

Results are ranked by relevance using PostgreSQL's built-in ts_rank function. Paginated response with standard page and size parameters. Every result includes the full document object with AI fields, so no secondary fetch is required to display classification or summary in search results.

GET /api/v1/search
  ?q=quarterly+financial+report
  &folder_id=17
  &status=APPROVED
  &document_type=FINANCIAL
  &page=1&size=20

Semantic Vector Search

Powered by pgvector with HNSW (Hierarchical Navigable Small World) indexing. When a user submits a semantic search query, it is first embedded using the same model that processed documents — producing a 1536-dimension vector. A cosine-distance query against all document embeddings in the organization returns conceptually related results even when no exact keyword matches exist.

This means a query for "financial performance last quarter" can surface a document titled "Q4 Board Review" that never uses those exact words. The HNSW index provides approximate nearest-neighbor retrieval with high recall at sub-linear query time. You can tune the recall/latency trade-off for your dataset size at query time via the hnsw.ef_search setting. Higher values examine more candidates, which improves recall but increases latency.

GET /api/v1/search/semantic
  ?q=financial+performance+last+quarter
  &limit=10
# Query embedded → cosine distance → ranked
# Returns documents matching by CONCEPT,
# not requiring exact keyword overlap
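The ranking behind semantic search can be illustrated without a database. This sketch computes cosine distance, the quantity pgvector's <=> operator returns, and performs an exact brute-force scan. An HNSW index replaces that scan with an approximate graph search but orders results by the same distance. The three-dimensional vectors are toy stand-ins for 1536-dimension embeddings, and the document names are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Exact k-nearest-neighbour search by cosine distance (what HNSW approximates)."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))[:k]


docs = {
    "Q4 Board Review": [0.9, 0.1, 0.0],
    "Office Seating Plan": [0.0, 0.2, 0.9],
    "Annual Budget": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # toy stand-in for the embedded query text
print(nearest(query, docs, k=2))  # ['Q4 Board Review', 'Annual Budget']
```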

Choosing between modes: Full-text search is ideal for exact document retrieval (find the specific "Q4 2025 Financials" file), compliance audit queries (find all APPROVED documents in the Legal folder), and structured filtering workflows. Semantic search is ideal for knowledge discovery (what documents discuss supplier risk mitigation?), research across large document collections, and natural-language queries from non-technical users. Both search endpoints are paginated, and both return the full document object, including AI classification, summary, and tag assignments. Search results can therefore be rendered with full context without additional API calls.

04

Approval Workflow Engine

VyXlo's workflow engine transforms unstructured document review into a deterministic, auditable approval chain. Every step generates an immutable audit record so there is a complete, tamper-proof trail of who approved or rejected what, when, and why — meeting the evidentiary requirements of regulated industries.

Workflows are created against one or more documents simultaneously. Once created, each document's status transitions from its current state toward PENDING_APPROVAL. The workflow object exposes all steps with their current status: PENDING, APPROVED, or REJECTED. When all steps in a workflow are APPROVED, the document is automatically promoted to APPROVED status.

Assignees can be individual users or entire departments. When a department is assigned, all current and future members of that department can action the workflow step — making the workflow resilient to personnel changes. Reject actions require a mandatory written reason, creating a documented rationale that satisfies audit requirements. Any workflow can be fully cancelled via the DELETE endpoint, reverting the document to its prior status.

Sequential Chains

Steps execute one after another. Step 2 does not become active until Step 1 is approved. Ideal for hierarchical approval processes: reviewer → manager → legal → executive sign-off.

Parallel Nodes

Multiple steps become active simultaneously. All must resolve before the workflow advances. Ideal when multiple departments must independently approve before a document is published.
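Sequential and parallel steps can be unified by grouping steps into stages. This sketch assumes each step carries a hypothetical order field. Steps that share an order run in parallel, and stages open one after another. It also assumes a single rejection halts the workflow. The API description above does not confirm either the field or the rule.

```python
from dataclasses import dataclass


@dataclass
class Step:
    step_id: int
    order: int                # hypothetical: steps sharing an order value run in parallel
    status: str = "PENDING"   # PENDING, APPROVED, or REJECTED


def active_steps(steps: list[Step]) -> list[Step]:
    """Steps that can be actioned now: the pending steps of the first unfinished stage."""
    if any(s.status == "REJECTED" for s in steps):
        return []  # assumption: a rejection halts the workflow
    for order in sorted({s.order for s in steps}):
        stage = [s for s in steps if s.order == order]
        if any(s.status != "APPROVED" for s in stage):
            return [s for s in stage if s.status == "PENDING"]
    return []  # every stage approved


def workflow_outcome(steps: list[Step]) -> str:
    if any(s.status == "REJECTED" for s in steps):
        return "REJECTED"
    if all(s.status == "APPROVED" for s in steps):
        return "APPROVED"  # the document would be promoted to APPROVED
    return "IN_PROGRESS"


steps = [Step(1, order=1), Step(2, order=2), Step(3, order=2), Step(4, order=3)]
print([s.step_id for s in active_steps(steps)])  # [1]
steps[0].status = "APPROVED"
print([s.step_id for s in active_steps(steps)])  # [2, 3] (parallel stage)
```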

Per-Step Assignees

Each step can be assigned to a specific user or an entire department. Department assignments are dynamic — any member of the department can action the step regardless of when they joined.

Approve with Comment

Approvers can attach an optional comment to their approval decision. Stored on the workflow step object for audit purposes. Comments are returned in workflow API responses.

Reject with Reason

Rejection requires a mandatory reason string. This creates a documented rationale for every rejection, providing the audit trail required for compliance in regulated document management.

Deadline Tracking

Each step tracks when it was created and when it was decided. Overdue steps are identifiable programmatically. Auto-escalation can be triggered by monitoring step age via the Celery Beat scheduler.
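Overdue detection reduces to comparing each undecided step's age with a threshold. A periodic task, for example one scheduled by Celery Beat, could run this check before escalating. The dictionary keys and the three-day threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overdue_steps(steps: list[dict], now: datetime, max_age: timedelta) -> list[int]:
    """IDs of steps that are still undecided and older than max_age.

    Each step is assumed to carry 'id', 'created_at', and 'decided_at'
    (None while the step is pending).
    """
    return [
        s["id"]
        for s in steps
        if s["decided_at"] is None and now - s["created_at"] > max_age
    ]


now = datetime(2025, 3, 10, tzinfo=timezone.utc)
steps = [
    {"id": 201, "created_at": now - timedelta(days=5), "decided_at": None},
    {"id": 202, "created_at": now - timedelta(days=1), "decided_at": None},
    {"id": 203, "created_at": now - timedelta(days=9), "decided_at": now - timedelta(days=8)},
]
print(overdue_steps(steps, now, max_age=timedelta(days=3)))  # [201]
```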

// Approve step 201 in workflow 88
POST /api/v1/workflows/88/steps/201/approve
{ "comment": "Financials verified against ledger." }

// Reject step 202
POST /api/v1/workflows/88/steps/202/reject
{ "reason": "Missing CFO sign-off on section 4." }

// Cancel entire workflow
DELETE /api/v1/workflows/88

05

Real-Time Collaboration

WebSocket — Live Presence & Locking

Each document has a dedicated WebSocket channel at ws://host/api/v1/ws/{document_id}?token=<jwt>. The connection is authenticated with the same OIDC JWT used for REST calls — no separate session management required. The server broadcasts typed events to all connected clients: presence_join when a user opens the document, presence_leave when they close it, cursor position updates for collaborative editing awareness, lock_acquired when any user claims the write-lock, and lock_released when it is freed. This means every connected client always has a live, accurate view of who is present and whether the document is editable — without polling.

// Incoming events on the WebSocket channel:
{"type":"presence_join","user_id":12,"display_name":"Sarah K."}
{"type":"cursor","user_id":12,"position":{"x":420,"y":220}}
{"type":"lock_acquired","locked_by_id":12}
{"type":"lock_released"}
{"type":"presence_leave","user_id":12}
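Every state change arrives as a typed event, so a client can keep its view current with a small reducer and no polling. This sketch assumes events arrive as the JSON strings shown above. Opening the socket itself, with any WebSocket client library, is left out. The DocumentView name is hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentView:
    """Client-side state rebuilt purely from server-pushed events."""
    present: dict[int, str] = field(default_factory=dict)   # user_id -> display name
    cursors: dict[int, dict] = field(default_factory=dict)  # user_id -> {"x": ..., "y": ...}
    locked_by: Optional[int] = None

    def apply(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event["type"]
        if kind == "presence_join":
            self.present[event["user_id"]] = event["display_name"]
        elif kind == "presence_leave":
            self.present.pop(event["user_id"], None)
            self.cursors.pop(event["user_id"], None)
        elif kind == "cursor":
            self.cursors[event["user_id"]] = event["position"]
        elif kind == "lock_acquired":
            self.locked_by = event["locked_by_id"]
        elif kind == "lock_released":
            self.locked_by = None
        # Unknown event types are ignored so that newer servers don't break older clients.


view = DocumentView()
for raw in [
    '{"type":"presence_join","user_id":12,"display_name":"Sarah K."}',
    '{"type":"lock_acquired","locked_by_id":12}',
]:
    view.apply(raw)
print(view.present, view.locked_by)  # {12: 'Sarah K.'} 12
```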

Comments — Threaded & Resolved

Comments are attached directly to documents and support unlimited reply nesting. Any comment thread can be resolved by the document owner or an administrator, marking it as closed and visually collapsing it for other reviewers. Emoji reactions can be added and removed on any comment. Comment authors can edit their own comments; administrators can delete any comment. The comment list endpoint returns threaded structure (replies nested under their parent) so frontend rendering requires no secondary queries. Each comment and reply is part of the immutable audit record.
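Returning replies nested under their parents amounts to grouping a flat list by parent ID and recursing. The id and parent_id keys are assumptions about the storage shape. Recursion is fine for ordinary thread depths. An iterative version would avoid Python's recursion limit for extremely deep threads.

```python
from collections import defaultdict


def build_threads(comments: list[dict]) -> list[dict]:
    """Nest a flat comment list into threads using each comment's parent_id.

    Top-level comments have parent_id None. Sibling order follows input order.
    """
    children = defaultdict(list)
    for c in comments:
        children[c["parent_id"]].append(c)

    def attach(node: dict) -> dict:
        return {**node, "replies": [attach(child) for child in children[node["id"]]]}

    return [attach(root) for root in children[None]]


flat = [
    {"id": 1, "parent_id": None, "text": "Check clause 4"},
    {"id": 2, "parent_id": 1, "text": "Agreed"},
    {"id": 3, "parent_id": 2, "text": "Fixed in v3"},
    {"id": 4, "parent_id": None, "text": "Typo on page 2"},
]
threads = build_threads(flat)
print([t["id"] for t in threads])                      # [1, 4]
print(threads[0]["replies"][0]["replies"][0]["text"])  # Fixed in v3
```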

Notifications — In-App & Email

Event-driven notifications are generated automatically for: new comments on documents you own or follow, workflow step assignments, approval and rejection decisions on workflows you created, share link accesses on your documents, and @mentions in comments. The notification center provides an unread count badge endpoint (GET /notifications/unread-count) ideal for real-time badge updates in navigation UI. Individual notifications can be marked as read. Email delivery and daily digest configuration are toggleable per organization.

Document Q&A — RAG via SSE

Once a document is chunk-indexed, users can have a conversational Q&A session against its content via POST /chat/document/{id}. The endpoint uses Server-Sent Events to stream the LLM response token-by-token, enabling progressive rendering in the UI without waiting for the full response. Sessions are persisted — users can resume prior conversations by passing the session_id. The final done event includes source citations pointing to the exact document chunks used to generate the answer.

// SSE stream: tokens arrive one by one
data: {"type":"token","content":"The"}
data: {"type":"token","content":" contract"}
data: {"type":"token","content":" expires"}
...
data: {"type":"done","session_id":42,"citations":[{"chunk_id":7,"score":0.94}]}

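A client has to reassemble the streamed tokens and pick up the final done event. The sketch below follows SSE framing: data lines accumulate until a blank line dispatches the event, and lines starting with a colon are comments. It uses the event shapes shown above. The function names are hypothetical, and a list of strings stands in for the HTTP response body. In browsers, the built-in EventSource API only issues GET requests. A POST endpoint like this one is therefore typically consumed by reading a streamed fetch response instead.

```python
import json
from typing import Iterable, Iterator


def sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from SSE lines.

    Consecutive data: lines form one event and a blank line dispatches it.
    Lines starting with ':' are comments (often keep-alives).
    """
    buffer: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            if buffer:
                yield json.loads("\n".join(buffer))
                buffer = []
        elif line.startswith(":"):
            continue
        elif line.startswith("data:"):
            buffer.append(line[5:].removeprefix(" "))


def collect_answer(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate streamed tokens; return the full text and the final done event."""
    text = []
    for event in sse_events(lines):
        if event["type"] == "token":
            text.append(event["content"])
        elif event["type"] == "done":
            return "".join(text), event
    raise RuntimeError("stream ended without a done event")


stream = [
    'data: {"type":"token","content":"The"}', '',
    'data: {"type":"token","content":" contract"}', '',
    ': keep-alive', '',
    'data: {"type":"done","session_id":42,"citations":[{"chunk_id":7,"score":0.94}]}', '',
]
answer, done = collect_answer(stream)
print(answer, done["session_id"])  # The contract 42
```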
Notification Events
New comment on owned document
Workflow step assignment
Approval / rejection decision
Share link access on your document
@mention in a comment

06

Security & Compliance

VyXlo stores no passwords. All identity is delegated to ZITADEL — an enterprise-grade open-source identity platform — using the OAuth 2.0 Authorization Code flow with PKCE. ZITADEL handles MFA (TOTP and SMS), passkeys, WebAuthn, social login, enterprise SSO via SAML 2.0, and LDAP/Active Directory federation. VyXlo validates OIDC JWTs against the ZITADEL JWKS endpoint on every request.

Authorization is layered. Every API endpoint enforces (1) valid authentication, (2) organization isolation — all queries are scoped to the authenticated user's organization and this filter is never optional, (3) role-based minimum role requirements per endpoint, and (4) resource-level permission checks before any data is returned or mutated.

The permission model has eight levels — NONE, READ, DOWNLOAD, COMMENT, CONTRIBUTOR, WRITE, EDITOR, ADMIN — applied independently per document and per folder. Permissions can target individual users or entire departments (all current and future members inherit). Optional expiry dates enable time-bounded access grants. Permission cache in Redis is invalidated immediately on any mutation.
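One way to resolve a user's effective access from user and department grants is to take the strongest unexpired grant that applies. This sketch assumes the eight levels are cumulative in the order listed, so each implies those before it. It also assumes the effective level is the maximum over matching grants. Both are assumptions, as are the class and function names. Evaluating department membership at check time is what gives grants the "current and future members" behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Permission(IntEnum):
    # Assumed cumulative, weakest to strongest, in the documented order.
    NONE = 0
    READ = 1
    DOWNLOAD = 2
    COMMENT = 3
    CONTRIBUTOR = 4
    WRITE = 5
    EDITOR = 6
    ADMIN = 7


@dataclass
class Grant:
    level: Permission
    user_id: Optional[int] = None        # set for a user grant
    department_id: Optional[int] = None  # set for a department grant
    expires_at: Optional[datetime] = None


def effective_permission(
    grants: list[Grant], user_id: int, departments: set[int], now: datetime
) -> Permission:
    """Highest unexpired level granted to the user directly or via a department."""
    best = Permission.NONE
    for g in grants:
        if g.expires_at is not None and g.expires_at <= now:
            continue  # time-bounded grant has lapsed
        applies = g.user_id == user_id or (
            g.department_id is not None and g.department_id in departments
        )
        if applies and g.level > best:
            best = g.level
    return best


now = datetime.now(timezone.utc)
grants = [Grant(Permission.READ, user_id=12), Grant(Permission.EDITOR, department_id=3)]
print(effective_permission(grants, user_id=12, departments={3}, now=now).name)  # EDITOR
```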

ZITADEL OIDC / PKCE

Zero passwords in VyXlo. MFA, passkeys, SAML, LDAP supported. JWT validated against ZITADEL JWKS endpoint on every request.

Multi-Tenant Isolation

Shared DB with org_id enforced at ORM level. Cross-org data leakage is architecturally impossible — the filter is embedded in every service layer query.

8-Level Permissions

NONE / READ / DOWNLOAD / COMMENT / CONTRIBUTOR / WRITE / EDITOR / ADMIN applied per document and per folder independently. Grants to users or departments with optional expiry.

7-Level Role Hierarchy

SUPER_ADMIN(100) → ADMIN(80) → MANAGER(60) → EDITOR(40) → USER(20) → VIEWER(10) → GUEST(5). Hierarchical — each role inherits capabilities of all roles below it.
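The numeric weights make minimum-role checks a plain integer comparison. Here is a minimal sketch; the Forbidden and require_role names are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    # Weights from the documented hierarchy; a higher weight includes every lower role.
    GUEST = 5
    VIEWER = 10
    USER = 20
    EDITOR = 40
    MANAGER = 60
    ADMIN = 80
    SUPER_ADMIN = 100


class Forbidden(Exception):
    """Raised when the caller's role is too low; would map to HTTP 403."""


def require_role(user_role: Role, minimum: Role) -> None:
    # IntEnum members compare by their numeric weights.
    if user_role < minimum:
        raise Forbidden(f"{user_role.name} is below the required role {minimum.name}")


require_role(Role.ADMIN, Role.MANAGER)  # passes: 80 >= 60
try:
    require_role(Role.VIEWER, Role.EDITOR)
except Forbidden as exc:
    print(exc)  # VIEWER is below the required role EDITOR
```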

Immutable Audit Trail

8 event types (CREATE, UPDATE, DELETE, ACCESS, DOWNLOAD, PERMISSION_CHANGE, WORKFLOW, EXPORT) with actor, resource, timestamp, IP address, and before/after diff.
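Restricting the before/after diff to changed fields keeps audit entries compact. The sketch below assembles an entry with the fields listed above; the key names are hypothetical. Immutability itself would be enforced in storage, for example with an append-only table on which the application role has no UPDATE or DELETE rights. This snippet does not show that part.

```python
from datetime import datetime, timezone
from typing import Any


def field_diff(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Changed fields only, as {field: {"before": old, "after": new}}; absent keys show as None."""
    keys = before.keys() | after.keys()
    return {
        k: {"before": before.get(k), "after": after.get(k)}
        for k in sorted(keys)
        if before.get(k) != after.get(k)
    }


def audit_record(event: str, actor_id: int, resource: str, ip: str,
                 before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Assemble one audit entry: actor, resource, UTC timestamp, IP address, and diff."""
    return {
        "event": event,  # e.g. UPDATE, PERMISSION_CHANGE
        "actor_id": actor_id,
        "resource": resource,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip_address": ip,
        "diff": field_diff(before, after),
    }


entry = audit_record("UPDATE", actor_id=12, resource="document:5", ip="203.0.113.7",
                     before={"status": "DRAFT", "title": "Q4"},
                     after={"status": "IN_REVIEW", "title": "Q4"})
print(entry["diff"])  # {'status': {'before': 'DRAFT', 'after': 'IN_REVIEW'}}
```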

MinIO Presigned Downloads

File bytes never proxy through the API server. Downloads return 15-minute presigned MinIO URLs. SHA-256 checksums detect corruption or tampering.

Soft-Delete Architecture

No data permanently destroyed without explicit ADMIN action. All deletes are soft. Retention dates enforced by Celery Beat scheduler. Data recoverable by administrators.

Rate Limiting & CORS

Per-endpoint rate limits via slowapi. Strictest limits on authentication endpoints. CORS policy enforced with configurable allowed origins via ALLOWED_ORIGINS env var.

07

REST API — 64 Endpoints

The VyXlo API exposes 64 REST endpoints across 18 resource groups, covering every capability available in the product. Every operation that can be performed in the UI can be performed via API — enabling full automation, custom frontends, mobile applications, and system integrations without any undocumented surface area.

All endpoints are prefixed with /api/v1/ and return application/json. Pagination uses page (1-based) and size query parameters, returning a consistent envelope: { items, total, page, page_size, pages }. All timestamps are ISO 8601 UTC. All IDs are 64-bit integers. Errors return { "detail": "..." } with the appropriate HTTP status code.
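The envelope fields relate to each other in a simple way, sketched below for an in-memory list. In the service itself, the slice would come from LIMIT/OFFSET and total from a COUNT query. Returning pages = 0 for an empty result is an assumption, and the paginate name is hypothetical.

```python
import math
from typing import Any, Sequence


def paginate(rows: Sequence[Any], page: int = 1, size: int = 20) -> dict[str, Any]:
    """Slice rows for a 1-based page and wrap them in the documented envelope."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    total = len(rows)
    start = (page - 1) * size
    return {
        "items": list(rows[start:start + size]),
        "total": total,
        "page": page,
        "page_size": size,
        "pages": math.ceil(total / size),  # 0 when there are no rows (assumption)
    }


page3 = paginate(list(range(45)), page=3, size=20)
print(page3["items"], page3["pages"])  # [40, 41, 42, 43, 44] 3
```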

Interactive Swagger UI is available at /api/docs on any deployment. ReDoc documentation at /api/redoc. The full OpenAPI 3.0 JSON schema is available at /api/openapi.json for client generation in any language.

Endpoints per resource group:
Health: 2
Authentication: 4
Documents: 16
Folders: 8
Search: 2
Workflows: 5
Comments: 7
Tags: 9
Permissions: 8
Share Links: 3
Notifications: 3
Organizations: 4
Users: 5
Invitations: 3
Departments: 5
AI: 2
Chat (SSE): 2
WebSocket: 1

# All endpoints follow these conventions:
Base:        /api/v1/
Auth:        Authorization: Bearer <oidc_jwt>
Pagination:  ?page=1&size=20
Response:    { items, total, page, page_size, pages }
Errors:      { "detail": "<message>" }
Deletes:     soft-delete (recoverable)
Timestamps:  ISO 8601 UTC
IDs:         64-bit integers

# Live explorer:
Swagger:  /api/docs
ReDoc:    /api/redoc
OpenAPI:  /api/openapi.json

Technology Foundation

Built on production-proven infrastructure.

FastAPI: Python 3.12 · ASGI
PostgreSQL 16: pgvector · tsvector
Redis 7: Cache · Celery queue
MinIO: S3-compatible · SSE
ZITADEL: OIDC · PKCE · SAML
LangChain: AI orchestration
Celery: Async workers
Celery Beat: Scheduled tasks
SQLAlchemy: Async ORM · Alembic
Pydantic v2: Request validation
structlog: JSON structured logs
Prometheus: Metrics export

Ready to deploy?

The full stack runs with a single docker compose up -d. All 64 API endpoints, the complete AI pipeline, and real-time collaboration are available from day one.