Your open-source AIaaS platform. Self-hosted.
Open-source AIaaS · Self-hosted AI as a ServiceRun your own AI as a Service stack — RAG,Agents,MCP tools,Visual logic,App builder — behind one production-ready REST API and a complete Web UI. Apache 2.0, on your infrastructure.
Demo login: demo / demodemo
A complete AI-as-a-Service stack — not just another wrapper. Full Web UI, analytics, security, and enterprise features out of the box, on your infrastructure, under Apache 2.0.
RAG with SQL-to-NL and auto-sync, Agents with MCP tools, Block visual logic, App builder, and direct Inference — all in one platform.
React dashboard with token tracking, cost analytics, latency monitoring, per-project usage charts, and model fleet view. Not just an API.
OpenAI, Anthropic, Ollama, Gemini, Groq, Grok, LiteLLM, vLLM, Azure, AWS Bedrock, and any OpenAI-compatible endpoint.
Teams, RBAC, OAuth/LDAP/OIDC, TOTP 2FA, input/output guardrails, audit logging, per-project rate limits, and budget caps.
Model Context Protocol support for unlimited agent integrations. Connect any MCP server via HTTP/SSE or stdio.
Custom logos, colors, and app names per team. Built-in knowledge sync from S3, Confluence, SharePoint, and Google Drive.
Track token usage, costs, latency, and project activity from a centralised dashboard. Daily charts for tokens, costs, and response latency per project — identify performance regressions at a glance.
Upload documents and query them with LLM-powered retrieval. Multiple vector stores, ColBERT and LLM-based reranking, and natural language to SQL.


Zero-shot ReAct agents with built-in tools and MCP (Model Context Protocol) server support. Connect any MCP-compatible server via HTTP/SSE or stdio for unlimited tool access.
Create and manage AI projects with their own LLM, system prompt, tools, and configuration. Test instantly in the built-in chat playground with streaming responses and multimodal support.
Build processing logic visually using a Blockly-based IDE — no LLM required. Drag-and-drop blocks to define how input is transformed into output. Use the "Call Project" block to compose AI pipelines.
Built-in evaluation system to measure and track AI project quality over time. Create test datasets, run evaluations with multiple metrics, and visualise score trends.
Every system prompt change is automatically versioned. Browse history, compare versions, and restore any previous prompt. Eval runs link to prompt versions for A/B comparison.
Local and remote image generators loaded dynamically. Supports Stable Diffusion, Flux, DALL-E, RMBG2, and more. Auto-detects NVIDIA GPUs with detailed hardware monitoring.
Multi-tenant with teams, RBAC, custom branding per team (white-labelling), TOTP 2FA, input/output guardrails, and a full audit log.


Automatically keep your RAG knowledge base up-to-date by syncing from external sources on a schedule. Configure per project with independent settings.
Add an AI chat bubble to any website with a single <script> tag. Streams responses in real-time, maintains conversation context, and works on any domain.
Drop a single zip into any WordPress site and turn your RESTai instance into the AI engine behind it. Each capability maps to its own RESTai project, so models, prompts and budgets stay tunable per task — and the plugin auto-provisions the starter projects on first connect.
Use LLMs, image generators, and audio transcription directly via OpenAI-compatible endpoints — no project required. Team-level permissions control access, and all usage counts toward budgets.
POST /v1/chat/completions — Chat with any LLM (streaming supported)POST /v1/images/generations — Generate images via DALL-E, Flux, SD, etc.POST /v1/audio/transcriptions — Transcribe audio filesWorks with any OpenAI-compatible SDK — just point base_url to your RESTai instance.
Talks directly to provider SDKs. Configurable context windows with automatic chat memory management.
# Install pip install restai-core # Setup database restai init restai migrate # Start server restai serve # Open http://localhost:9000/admin # Login: admin / admin
# Pull and run — multi-arch (amd64/arm64) docker run -p 9000:9000 apocas/restai:latest # Or pin a release (with env file) docker run -p 9000:9000 \ --env-file .env \ apocas/restai:6.2.13 # Also on GHCR: ghcr.io/apocas/restai
# Clone and install git clone https://github.com/apocas/restai cd restai && make install # Open http://localhost:9000/admin # Login: admin / admin
With env file: restai serve -e .env -p 8080 -w 4
·
Compose: docker compose --env-file .env up --build
AIaaS — AI as a Service — is a delivery model where AI capabilities (LLM inference, RAG, agents, image generation, embeddings) are exposed as managed services consumable via APIs. RESTai is an open-source, self-hosted AIaaS platform: you run the same kind of platform that hosted AI vendors offer, but on your own infrastructure.
Yes. RESTai is fully self-hosted and released under the Apache 2.0 open-source license. Install via PyPI, Docker, or a Helm chart on Kubernetes. No vendor lock-in, no telemetry phone-home, your data and your models stay on your infrastructure.
OpenAI, Anthropic (Claude), Ollama (local), Google Gemini, Groq, Grok (xAI), LiteLLM, vLLM, Azure OpenAI, AWS Bedrock, and any OpenAI-compatible endpoint. Mix multiple providers per team, with per-project budgets and fallbacks.
Hosted AIaaS providers run AI workloads on their cloud and bill per token. RESTai gives you the same product surface — RAG projects, agents, MCP tools, eval, analytics, branding — as open-source software you self-host. You keep full control of data, model selection, prompts, and budgets.
RAG over your own documents, MCP-powered agents, visual logic pipelines (Block IDE), full apps via the App Builder, embeddable chat widgets, WordPress AI integrations, and direct OpenAI-compatible LLM/image/audio endpoints — all behind one self-hosted REST API.