🏢

Enterprise Knowledge Assistant

A production-oriented RAG system that simulates an enterprise internal knowledge base, with a clean layered architecture, source-backed answers, and Milvus vector retrieval as the next milestone.

🐍 Python · FastAPI · Milvus · RAG · 🚧 Work in progress

🚧 Work in progress. The backend foundation is fully implemented: API contract, document loader, chunker, and generator abstraction. The next milestone is plugging in Milvus for real vector retrieval and grounded answer generation. This page reflects the current state honestly.

Context

Enterprise knowledge management is one of the clearest ROI use cases for RAG. Employees ask questions in natural language; the system retrieves the right internal documents and generates a concise, source-backed answer rather than hallucinated policies or outdated procedures.

This project builds that system from scratch with a focus on engineering discipline: a stable API contract locked before retrieval is implemented, a tested document pipeline, and a clean layered architecture that makes each component replaceable. The goal is a system that could be deployed in a real company context, not just a notebook demo.

Architecture & Pipeline

Ingestion pipeline
Markdown docs (data/sample_docs/) → KnowledgeDocument Loader → KnowledgeChunk Chunker (overlap) → [next: embed chunks and index into Milvus]

Query pipeline
POST /query {question} → QueryService → [next: vector retrieval and grounded generation] → answer + sources[]

Bracketed steps are the next milestone; everything before them is implemented and tested. The API contract (a response schema with answer and sources[]) is already locked, so the interface will not change when retrieval is plugged in.
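
As a rough illustration of the implemented ingestion steps (load Markdown, then chunk with overlap), here is a minimal sketch. The field names on KnowledgeDocument and KnowledgeChunk, the function names, and the character-based window are all assumptions made for the example; the project's actual loader and chunker may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KnowledgeDocument:
    source: str  # assumed field: path of the Markdown file
    text: str    # assumed field: raw file contents


@dataclass(frozen=True)
class KnowledgeChunk:
    source: str  # assumed field: document the chunk came from
    index: int   # assumed field: position of the chunk in its document
    text: str


def load_documents(root: Path) -> list[KnowledgeDocument]:
    """Read every Markdown file under root, in a stable (sorted) order."""
    return [
        KnowledgeDocument(source=str(path), text=path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("*.md"))
    ]


def chunk_document(
    doc: KnowledgeDocument, size: int = 500, overlap: int = 100
) -> list[KnowledgeChunk]:
    """Split a document into character windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks: list[KnowledgeChunk] = []
    for index, start in enumerate(range(0, len(doc.text), step)):
        chunks.append(KnowledgeChunk(doc.source, index, doc.text[start:start + size]))
        if start + size >= len(doc.text):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which tends to help once retrieval is added. A token-based or heading-aware splitter would be a reasonable alternative to the character window used here.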

Layered Architecture

🌐 api/
FastAPI routes (/health, /query), request/response schemas, input validation. Thin: delegates everything to services.
⚙️ core/
Centralized settings, shared dependency wiring, structured logging. Single source of truth for configuration.
🔀 services/
Orchestrates use cases (QueryService). Keeps route handlers lightweight by owning the business logic coordination.
📚 rag/
Document loading, chunking with overlap, generator abstraction. Will expand to cover embeddings, Milvus adapter, retriever, and prompt building.
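
To make the split between services/ and rag/ concrete, the sketch below shows one way a generator abstraction and a QueryService could fit together. Only the name QueryService comes from this page; the Generator protocol, MockGenerator, QueryResult, and every method signature are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class QueryResult:
    answer: str
    sources: list[str] = field(default_factory=list)


class Generator(Protocol):
    """Hypothetical interface: anything with a matching generate() qualifies."""

    def generate(self, question: str, contexts: list[str]) -> str: ...


class MockGenerator:
    """Deterministic stand-in for development and tests (no LLM call)."""

    def generate(self, question: str, contexts: list[str]) -> str:
        return f"[mock] {len(contexts)} context(s) for: {question}"


class QueryService:
    """Owns the use case so that route handlers stay thin."""

    def __init__(self, generator: Generator) -> None:
        self._generator = generator

    def answer(self, question: str) -> QueryResult:
        question = question.strip()
        if not question:
            raise ValueError("question must not be empty")
        # Retrieval is not implemented yet, so no contexts or sources exist.
        contexts: list[str] = []
        answer = self._generator.generate(question, contexts)
        return QueryResult(answer=answer, sources=[])
```

Because the service depends only on the protocol, swapping MockGenerator for a real LLM-backed generator would not require touching the route handlers or the service itself.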

Implementation Status

| Feature | Details | Status |
| --- | --- | --- |
| FastAPI app + routing | Entrypoint, route registration, CORS, config wiring | ✓ Done |
| API contract (GET /health, POST /query) | Response schema locked with answer + sources[] | ✓ Done |
| KnowledgeDocument loader | Markdown corpus under data/sample_docs/, tested | ✓ Done |
| KnowledgeChunk chunker | Overlap support, tested, retrieval-ready | ✓ Done |
| Generator abstraction + mock | Interface defined, mock implementation for dev/testing | ✓ Done |
| Dev tooling (uv, Ruff, pytest, MkDocs) | Make targets for API, tests, and documentation | ✓ Done |
| Milvus collection + chunk ingestion | Embed chunks → index into Milvus → enable vector search | ⬡ Next |
| Vector retrieval → grounded generation | Wire retrieval results into /query, real LLM answers | ⬡ Next |
| Streamlit UI + Docker packaging | Interactive frontend, containerized deployment | ○ Later |

API Endpoints

GET /health Service health check
POST /query Natural language question → answer + sources

The contract is locked: POST /query already returns { answer, sources[] } even with the mock generator. Real retrieval will plug in without changing the public interface.
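
The idea of a locked contract can be sketched without any framework: the response always carries the same two keys, whether the answer comes from the mock or from real retrieval. The example below uses only the standard library. The Source fields (document, excerpt), the helper parse_query_request, and the sample file name are assumptions for illustration; this page only specifies that the response contains answer and sources[].

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Source:
    document: str  # hypothetical field: file the passage came from
    excerpt: str   # hypothetical field: the supporting passage


@dataclass(frozen=True)
class QueryResponse:
    answer: str
    sources: list[Source] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses, so sources become plain dicts.
        return json.dumps(asdict(self))


def parse_query_request(body: str) -> str:
    """Validate a POST /query body and return the cleaned question."""
    payload = json.loads(body)
    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str) or not question.strip():
        raise ValueError("body must be a JSON object with a non-empty 'question'")
    return question.strip()


# Today: mock answer, empty sources. Later: same shape, populated sources.
mock = QueryResponse(answer="mock answer")
grounded = QueryResponse(
    answer="Submit the form to HR.",
    # Hypothetical file name, used only to show a populated source entry.
    sources=[Source(document="data/sample_docs/leave.md", excerpt="Submit the form to HR.")],
)
```

Both mock.to_json() and grounded.to_json() produce objects with exactly the keys answer and sources, which is what lets clients be written against the mock before retrieval exists. In the actual FastAPI app the same shape would more naturally be declared as request and response models.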