Introduction

Overview

DataDot is a RAG (Retrieval-Augmented Generation) application that pairs a Next.js frontend with Python backend services. It provides:

Multi-LLM Support: Integration with OpenAI, Anthropic, Google Gemini, Cohere, and Ollama
Smart Document Processing: Convert and process PDFs, Word docs, Excel files, images, audio, and more
Intelligent Chat Interface: Real-time streaming conversations with AI agents
Workspace Management: Organize documents and conversations by project
Vector Search: Multiple vector database support (LanceDB, ChromaDB, Pinecone, Qdrant, Weaviate)
MCP Server: Model Context Protocol server for AI agent integration

Features

Document Collector

Intelligent ingestion system that processes PDFs, websites, and YouTube videos into RAG-ready formats.

MCP Server

Model Context Protocol server that exposes documentation and tools to AI agents like Claude and Cursor.

Vector Search

High-performance similarity search using Pinecone and OpenAI embeddings.
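As a rough illustration of what similarity search does, the sketch below ranks stored embedding vectors by cosine similarity to a query vector. It is not DataDot's implementation and it does not call Pinecone or OpenAI. The in-memory `index` dictionary, the three-dimensional vectors, and the function names `cosine_similarity` and `top_k` are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: real embeddings have hundreds or thousands of dimensions.
index = {
    "intro.md": [1.0, 0.0, 0.0],
    "setup.md": [0.9, 0.1, 0.0],
    "faq.md": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A hosted vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.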

Admin Dashboard

Next.js-based interface for managing workspaces, documents, and API keys.

Customizable Themes: Dark/light mode with custom branding
Internationalization: Multi-language support via i18next
Built with Next.js 16 (App Router) and React 19
Real-time chat with streaming responses (SSE + WebSockets)
Responsive design with mobile support
Custom theme system with CSS variables
JWT-based authentication
Drag-and-drop file uploads
Text-to-speech and speech-to-text capabilities
Data visualization with Recharts

Backend Server

FastAPI with async support
Multi-user mode with workspace isolation
Secure authentication with JWT tokens
Unified LLM interface for multiple providers
Vector database operations for RAG
Background job scheduling
Comprehensive API documentation (OpenAPI/Swagger)
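One common way to build a unified LLM interface is to put a small shared protocol in front of every provider and route calls by name. The sketch below shows that pattern under stated assumptions: `LLMProvider`, `EchoProvider`, and `LLMRouter` are hypothetical names, and the stand-in provider only echoes the prompt instead of calling a real API. DataDot's actual backend may be organized differently.

```python
from typing import Dict, Protocol


class LLMProvider(Protocol):
    """Hypothetical minimal contract every provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in adapter that echoes the prompt; a real one would call a vendor API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class LLMRouter:
    """Dispatches completion requests to a registered provider by name."""

    def __init__(self) -> None:
        self._providers: Dict[str, LLMProvider] = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def complete(self, provider_name: str, prompt: str) -> str:
        try:
            provider = self._providers[provider_name]
        except KeyError:
            raise ValueError(f"unknown provider: {provider_name}") from None
        return provider.complete(prompt)


router = LLMRouter()
router.register("openai", EchoProvider("openai"))
router.register("ollama", EchoProvider("ollama"))
reply = router.complete("ollama", "Summarize this document.")
```

Callers depend only on `LLMRouter.complete`, so adding a provider means writing one adapter and registering it.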

Document Collector

Support for 9+ file formats (PDF, DOCX, XLSX, EPUB, MBOX, images, audio)
Audio transcription with OpenAI Whisper
OCR for images using Tesseract
Web scraping with depth control
GitHub/GitLab repository ingestion
YouTube transcript extraction
Obsidian Vault processing
Automatic token counting and text chunking
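Token counting and chunking can be sketched as follows. This is a simplified illustration, not the collector's actual code: it approximates tokens by whitespace-separated words (a real pipeline would use the tokenizer of the target model), and the names `count_tokens` and `chunk_text` and the `max_tokens` and `overlap` parameters are assumptions.

```python
def count_tokens(text):
    """Approximate the token count as the number of whitespace-separated words."""
    return len(text.split())


def chunk_text(text, max_tokens=200, overlap=20):
    """Split text into chunks of at most max_tokens words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, max_tokens=4, overlap=1)
```

With these settings each chunk starts on the last word of the previous one, which is the overlap at work.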

MCP Server

Model Context Protocol (MCP) server for AI agent integration
SSE (Server-Sent Events) transport for real-time communication
Dynamic configuration updates at runtime
API key authentication
Multiple LLM provider support
Document upload and processing tools
Vector database query tools
Workspace management tools
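For readers unfamiliar with the SSE transport, the sketch below shows how a single Server-Sent Events frame is laid out on the wire: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line that ends the event. The `format_sse` helper and the example payload are hypothetical and are not taken from DataDot's MCP server.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Serialize one Server-Sent Events frame with a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on a single data line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse({"token": "Hi"}, event="message", event_id=1)
```

A streaming endpoint would write one such frame per chunk of output to a response served with the `text/event-stream` content type.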
