Enterprise Knowledge Assistant

Context

Enterprise knowledge management is one of the clearest ROI use cases for RAG. Employees ask questions in natural language; the system retrieves the relevant internal documents and generates a concise answer that cites them. Grounding every answer in retrieved documents is meant to rule out hallucinated policies and outdated procedures.

This project builds that system from scratch with a focus on engineering discipline: a stable API contract locked before retrieval is implemented, a tested document pipeline, and a clean layered architecture that makes each component replaceable. The goal is a system that could be deployed in a real company context, not just a notebook demo.

Architecture & Pipeline

Ingestion pipeline

Markdown docs (data/sample_docs/) → KnowledgeDocument loader → KnowledgeChunk chunker (with overlap) → Embeddings → Milvus index

Query pipeline

POST /query {question} → QueryService → Retriever (top-k chunks) → Generator (LLM + context) → answer + sources[]

The embedding, Milvus indexing, and vector retrieval stages, plus real LLM generation, are the next milestone; everything upstream of them is implemented and tested. The API contract (a response schema with answer and sources[]) is already locked, so the interface does not change when retrieval is plugged in.
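The Retriever → top-k chunks stage is not built yet; in the planned design Milvus performs the vector search. To make the step concrete, here is a minimal in-memory sketch of what top-k retrieval computes, assuming the chunks are already available as text. The bag-of-words `embed` function, the `cosine` and `top_k` helpers, and the `k=3` default are hypothetical stand-ins, not the project's embedding model or Milvus adapter.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (hypothetical): lowercase bag-of-words term counts.

    A real system would call an embedding model and store dense vectors in Milvus.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Score every chunk against the question and return the k best, highest first."""
    q = embed(question)
    scored = [(chunk, cosine(q, embed(chunk))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = [
    "Employees accrue 25 vacation days per year.",
    "Expense reports are due within 30 days.",
    "VPN access requires a company laptop.",
]
best_chunk, _score = top_k("How many vacation days do employees get?", chunks, k=1)[0]
print(best_chunk)  # prints: Employees accrue 25 vacation days per year.
```

A vector database replaces the brute-force scoring loop with an approximate nearest-neighbour index, but the contract stays the same: a question goes in, and the k highest-scoring chunks come out.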

Layered Architecture

api/

FastAPI routes (/health, /query), request/response schemas, input validation. Kept thin: route handlers delegate all business logic to services.
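To show where validation and the response shape live without depending on FastAPI here, this sketch uses standard-library dataclasses as stand-ins for the request and response models. In a FastAPI app these would typically be Pydantic models declared on the route. The class names, the `question` field, the `Source` fields (`document`, `snippet`), and the 2000-character limit are all hypothetical.

```python
from dataclasses import asdict, dataclass, field

MAX_QUESTION_CHARS = 2000  # hypothetical limit


@dataclass(frozen=True)
class QueryRequest:
    """Body of POST /query (field name assumed)."""

    question: str

    def __post_init__(self) -> None:
        # Validation lives with the schema, so the route handler stays thin.
        if not isinstance(self.question, str) or not self.question.strip():
            raise ValueError("question must be a non-empty string")
        if len(self.question) > MAX_QUESTION_CHARS:
            raise ValueError(f"question must be at most {MAX_QUESTION_CHARS} characters")


@dataclass(frozen=True)
class Source:
    """One cited chunk (hypothetical fields)."""

    document: str
    snippet: str


@dataclass(frozen=True)
class QueryResponse:
    """Response contract: an answer plus the sources it was built from."""

    answer: str
    sources: list[Source] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() also converts the nested Source objects into plain dicts.
        return asdict(self)
```

With real Pydantic models, FastAPI performs this validation automatically and returns a 422 response for invalid bodies.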

core/

Centralized settings, shared dependency wiring, structured logging. Single source of truth for configuration.
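A "single source of truth" for configuration usually means one settings object that is built once and shared. FastAPI projects often use pydantic-settings for this. The standard-library sketch below shows the same pattern. The `Settings` fields, the `EKA_` environment-variable prefix, and the default values are hypothetical; only the data/sample_docs path comes from this project.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """All configuration in one immutable object (field names are hypothetical)."""

    docs_dir: str
    chunk_size: int
    chunk_overlap: int
    top_k: int
    log_level: str


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read environment variables once; every caller shares the same instance."""
    return Settings(
        docs_dir=os.getenv("EKA_DOCS_DIR", "data/sample_docs"),
        chunk_size=int(os.getenv("EKA_CHUNK_SIZE", "500")),
        chunk_overlap=int(os.getenv("EKA_CHUNK_OVERLAP", "50")),
        top_k=int(os.getenv("EKA_TOP_K", "3")),
        log_level=os.getenv("EKA_LOG_LEVEL", "INFO"),
    )
```

Route handlers can receive the cached object through FastAPI's `Depends(get_settings)`. After changing environment variables, tests can call `get_settings.cache_clear()` to force a fresh read.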

services/

Orchestrates use cases (QueryService). Owning the coordination of business logic here keeps route handlers lightweight.
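Here is a minimal sketch of how a service layer can own the orchestration, assuming the retriever and generator sit behind small interfaces. `Chunk`, `Answer`, `Retriever`, `Generator`, `MockGenerator`, `StaticRetriever`, and the `ask` method are hypothetical names, not the project's actual classes. `StaticRetriever` is a placeholder for the Milvus-backed retriever that is still on the roadmap.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Chunk:
    document: str
    text: str


@dataclass(frozen=True)
class Answer:
    answer: str
    sources: list[str] = field(default_factory=list)


class Retriever(Protocol):
    def retrieve(self, question: str, k: int) -> list[Chunk]: ...


class Generator(Protocol):
    def generate(self, question: str, context: list[Chunk]) -> str: ...


class MockGenerator:
    """Deterministic stand-in for an LLM, useful in dev and tests."""

    def generate(self, question: str, context: list[Chunk]) -> str:
        return f"[mock] {len(context)} chunk(s) retrieved for: {question}"


class StaticRetriever:
    """Returns fixed chunks; placeholder until vector search exists."""

    def __init__(self, chunks: list[Chunk]) -> None:
        self._chunks = chunks

    def retrieve(self, question: str, k: int) -> list[Chunk]:
        return self._chunks[:k]


class QueryService:
    """Coordinates retrieval, then generation; route handlers only call ask()."""

    def __init__(self, retriever: Retriever, generator: Generator, k: int = 3) -> None:
        self._retriever = retriever
        self._generator = generator
        self._k = k

    def ask(self, question: str) -> Answer:
        chunks = self._retriever.retrieve(question, self._k)
        text = self._generator.generate(question, chunks)
        # Deduplicate document names while keeping retrieval order.
        sources = list(dict.fromkeys(c.document for c in chunks))
        return Answer(answer=text, sources=sources)
```

`QueryService` depends only on the two interfaces. Swapping `MockGenerator` for a real LLM client, or `StaticRetriever` for vector search, therefore leaves the route handler and the response shape unchanged.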

rag/

Document loading, chunking with overlap, and the generator abstraction. This layer will expand to cover embeddings, the Milvus adapter, the retriever, and prompt building.
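Overlap repeats the tail of each chunk at the start of the next, so text near a boundary is not cut off from its surrounding context. The sketch below shows a loader and an overlapping chunker. The `KnowledgeDocument` and `KnowledgeChunk` names come from this project, but their fields, the fixed-size character windows, and the 500/50 defaults are assumptions. The project's chunker may split on tokens, paragraphs, or Markdown headings instead.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KnowledgeDocument:
    """A loaded source file (fields are assumptions)."""

    source: str
    text: str


@dataclass(frozen=True)
class KnowledgeChunk:
    """A window of a document, tagged with its origin and position."""

    source: str
    index: int
    text: str


def load_markdown(docs_dir: Path) -> list[KnowledgeDocument]:
    """Read every *.md file under docs_dir, in a stable (sorted) order."""
    paths = sorted(Path(docs_dir).rglob("*.md"))
    return [KnowledgeDocument(source=str(p), text=p.read_text(encoding="utf-8")) for p in paths]


def chunk_document(doc: KnowledgeDocument, size: int = 500, overlap: int = 50) -> list[KnowledgeChunk]:
    """Split text into fixed-size character windows; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(doc.text), step)):
        chunks.append(KnowledgeChunk(doc.source, i, doc.text[start:start + size]))
        if start + size >= len(doc.text):
            break  # this window already reaches the end of the text
    return chunks


doc = KnowledgeDocument("example.md", "abcdefghij")
print([c.text for c in chunk_document(doc, size=4, overlap=2)])
# prints: ['abcd', 'cdef', 'efgh', 'ghij']
```

Keeping `source` and `index` on every chunk is what later allows an answer to list its sources[].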

Implementation Status

Feature	Details	Status
FastAPI app + routing	Entrypoint, route registration, CORS, config wiring	✓ Done
API contract (GET /health, POST /query)	Response schema locked with answer + sources[]	✓ Done
KnowledgeDocument loader	Markdown corpus under data/sample_docs/, tested	✓ Done
KnowledgeChunk chunker	Overlap support, tested, retrieval-ready	✓ Done
Generator abstraction + mock	Interface defined, mock implementation for dev/testing	✓ Done
Dev tooling (uv, Ruff, pytest, MkDocs)	Make targets for API, tests, and documentation	✓ Done
Milvus collection + chunk ingestion	Embed chunks → index into Milvus → enable vector search	⬡ Next
Vector retrieval → grounded generation	Wire retrieval results into /query, real LLM answers	⬡ Next
Streamlit UI + Docker packaging	Interactive frontend, containerized deployment	○ Later

API Endpoints

GET /health: service health check

POST /query: natural-language question → answer + sources

The contract is locked: POST /query already returns { answer, sources[] } even with the mock generator. Real retrieval will plug in without changing the public interface.
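One way to keep the contract locked is a test that asserts the response shape and runs unchanged against the mock generator today and the real pipeline later. The helper below is a hypothetical pytest-style check on the decoded JSON body. Feeding it from FastAPI's `TestClient` (for example `client.post("/query", json={"question": "..."}).json()`) is an assumption about how the project's tests are wired.

```python
import json


def check_query_contract(payload: dict) -> None:
    """Fail if a /query response breaks the locked { answer, sources[] } shape."""
    assert set(payload) == {"answer", "sources"}, f"unexpected keys: {sorted(payload)}"
    assert isinstance(payload["answer"], str), "answer must be a string"
    assert isinstance(payload["sources"], list), "sources must be a list"


# Example body as a mock-backed endpoint might return it (content is made up).
body = json.loads('{"answer": "[mock] answer", "sources": []}')
check_query_contract(body)
```

The exact-key comparison is deliberately strict. Relax it to a subset check if adding optional response fields later should not count as a contract change.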