Context
Enterprise knowledge management is one of the clearest ROI use cases for RAG. Employees ask questions in natural language; the system retrieves the right internal documents and generates a concise, source-backed answer, with no hallucinated policies and no outdated procedures.
This project builds that system from scratch with a focus on engineering discipline: a stable API contract locked before retrieval is implemented, a tested document pipeline, and a clean layered architecture that makes each component replaceable. The goal is a system that could be deployed in a real company context, not just a notebook demo.
Architecture & Pipeline
In the pipeline diagram, dashed boxes mark the next milestone; everything before them is implemented and tested. The API contract (a response schema with answer and sources[]) is already locked, so the interface doesn't change as retrieval is plugged in.
Layered Architecture
API layer: endpoints (/health, /query), request/response schemas, and input validation. Thin: delegates everything to services.
Implementation Status
| Feature | Status |
|---|---|
| FastAPI app + routing: entrypoint, route registration, CORS, config wiring | Done |
| API contract (GET /health, POST /query): response schema locked with answer + sources[] | Done |
| KnowledgeDocument loader: Markdown corpus under data/sample_docs/, tested | Done |
| KnowledgeChunk chunker: overlap support, tested, retrieval-ready | Done |
| Generator abstraction + mock: interface defined, mock implementation for dev/testing | Done |
| Dev tooling (uv, Ruff, pytest, MkDocs): Make targets for API, tests, and documentation | Done |
| Milvus collection + chunk ingestion: embed chunks → index into Milvus → enable vector search | Next |
| Vector retrieval → grounded generation: wire retrieval results into /query, real LLM answers | Next |
| Streamlit UI + Docker packaging: interactive frontend, containerized deployment | Later |

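
To make the "overlap support" row concrete, here is a minimal sketch of a fixed-size character chunker with overlap. It is not the project's actual chunker: the `KnowledgeChunk` fields, the `chunk_text` function, and the `size`/`overlap` parameters are assumptions chosen for illustration, and the real implementation may split on tokens or Markdown structure instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeChunk:
    # Hypothetical fields; the project's real schema is not shown in the source.
    doc_id: str
    index: int
    text: str


def chunk_text(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[KnowledgeChunk]:
    """Split text into windows of `size` characters.

    Each window starts `size - overlap` characters after the previous one,
    so consecutive chunks share `overlap` characters.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: list[KnowledgeChunk] = []
    start = 0
    while start < len(text):
        chunks.append(KnowledgeChunk(doc_id, len(chunks), text[start:start + size]))
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("handbook.md", "abcdefghij", size=4, overlap=2):
        print(chunk.index, chunk.text)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.
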
API Endpoints
The contract is locked: POST /query already returns { answer, sources[] } even with the mock generator. Real retrieval will plug in without changing the public interface.
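
One way to picture the locked contract is the sketch below, which uses only the standard library so it runs without the project's dependencies. Everything named here is an assumption for illustration: `Source` and its fields, `QueryResponse`, the `Generator` protocol signature, `MockGenerator`, and `handle_query`. The project itself uses FastAPI and presumably declares its own request/response schemas.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional, Protocol


@dataclass
class Source:
    # Hypothetical fields; the source only states that the response has sources[].
    doc_id: str
    snippet: str


@dataclass
class QueryResponse:
    # The two top-level keys named in the contract.
    answer: str
    sources: list[Source] = field(default_factory=list)


class Generator(Protocol):
    # Assumed interface: any generator turns a question plus context into text.
    def generate(self, question: str, context: list[Source]) -> str: ...


class MockGenerator:
    """Stand-in generator for development and tests; calls no LLM."""

    def generate(self, question: str, context: list[Source]) -> str:
        return f"[mock] {len(context)} source(s) for: {question}"


def handle_query(
    question: str,
    generator: Generator,
    retrieved: Optional[list[Source]] = None,
) -> dict:
    """Build the response body; the shape is the same with or without retrieval."""
    sources = retrieved or []
    answer = generator.generate(question, sources)
    return asdict(QueryResponse(answer=answer, sources=sources))


if __name__ == "__main__":
    print(handle_query("What is the travel policy?", MockGenerator()))
```

Because `handle_query` depends only on the `Generator` protocol and a list of sources, swapping the mock for a real LLM client or passing in retrieved chunks leaves the returned keys unchanged, which is the property the locked contract is meant to guarantee.
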