🎓

Course RAG API

Semantic course retrieval and LLM-powered recommendations for EdTech, built at Inokufu with hexagonal architecture, FAISS, and multi-provider LLM adapters.

๐Ÿข Inokufu ๐Ÿ Python 3.14 FastAPI FAISS ๐ŸŸข Open Source

Context

Inokufu is a French EdTech company building learning intelligence products. The challenge: learners search for courses across a massive catalog in natural language, and keyword matching is not enough to surface the right content.

Course RAG API addresses this by combining semantic vector search (FAISS + sentence-transformers) with LLM-generated recommendations that explain why each course was selected. The system is designed from day one to be production-ready: pluggable backends, strict output validation, full observability, and clean hexagonal architecture.

Key Metrics

5 API endpoints
4 vector backends
2 LLM providers
6-sprint roadmap

System Architecture

The system has two distinct pipelines that run independently:

Offline Pipeline (run once / on data refresh):
Raw Datasets (CSV) → prepare-data → Curated CSV → build-index → FAISS Index + JSONL Meta
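
A minimal sketch of what the build-index step could look like, assuming the all-MiniLM-L6-v2 sentence-transformers model and cosine similarity via normalized inner product; the file names and CSV column names are illustrative, not the project's actual ones.

import json
import faiss
import pandas as pd
from sentence_transformers import SentenceTransformer

# Load the curated catalog produced by prepare-data (column names are illustrative).
courses = pd.read_csv("curated_courses.csv")
texts = (courses["title"] + ". " + courses["description"]).tolist()

# Embed with the same model the online pipeline uses (all-MiniLM-L6-v2, 384 dims).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(texts, normalize_embeddings=True, show_progress_bar=True)

# Inner product over L2-normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "courses.faiss")

# Sidecar JSONL metadata, one record per FAISS row, for hydration at query time.
with open("courses_meta.jsonl", "w", encoding="utf-8") as f:
    for _, row in courses.iterrows():
        f.write(json.dumps({"course_id": row["course_id"], "title": row["title"]}) + "\n")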
Online Pipeline (per request):
User Query → Embedding Adapter (MiniLM-L6-v2) → FAISS Search → Metadata Hydration → Ranked Results
Ranked Results + Learner Profile → Context Builder → Prompt Builder (versioned) → LLM (OpenAI / Anthropic) → Structured Recommendation
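
A rough sketch of the per-request retrieval path, under the same assumptions as above (normalized MiniLM embeddings, inner-product FAISS index, JSONL sidecar); the function and file names are illustrative.

import json
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
index = faiss.read_index("courses.faiss")
# Row i in the index corresponds to line i in the JSONL sidecar.
with open("courses_meta.jsonl", encoding="utf-8") as f:
    meta = [json.loads(line) for line in f]

def retrieve(query: str, top_k: int = 5) -> list[dict]:
    """Embed the query, search FAISS, and hydrate results with catalog metadata."""
    vec = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(vec, top_k)
    return [
        {**meta[i], "score": float(s)}
        for s, i in zip(scores[0], ids[0])
        if i != -1  # FAISS pads with -1 when fewer than top_k hits exist
    ]

print(retrieve("introductory python course for data analysis"))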

Hexagonal Architecture Layers

📋 contracts/ (Ports)
Boundary interfaces: RetrieverServicePort, RecommendationLLMFactoryPort, LearnerProfileStorePort, RecommendationLogStorePort. No implementation, no dependencies.
⚙️ services/ (Use Cases)
Application logic: RetrieveCoursesUseCase, RecommendCoursesUseCase, offline jobs (prepare_data, build_index). Depends only on ports.
🔌 infrastructure/ (Adapters)
Concrete implementations: FAISS/Qdrant/Milvus/Pinecone vector adapters, sentence-transformers embedding, files/postgres/mongo catalog, memory/redis cache, OpenAI/Anthropic LLM adapters.
🌐 api/ (Delivery)
FastAPI routers, request validation, response serialization, CORS. Completely decoupled from business logic.

API Endpoints

GET /v1/health Service health check
GET /v1/info Index metadata & config
GET /v1/retrieve?q=...&top_k=5 Semantic course retrieval
POST /v1/recommend LLM recommendation with explanation
POST /v1/recommend/prompt-preview Debug prompt rendering
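
Example calls against a locally running instance; the host, port, and request-body field names are assumptions for illustration.

import httpx

BASE = "http://localhost:8000"

# Semantic retrieval: plain vector search, no LLM involved.
hits = httpx.get(f"{BASE}/v1/retrieve", params={"q": "beginner machine learning", "top_k": 5})
print(hits.json())

# LLM recommendation: grounded in retrieved candidates, with an explanation per course.
rec = httpx.post(
    f"{BASE}/v1/recommend",
    json={
        "query": "beginner machine learning",
        "learner_profile": {"level": "novice", "language": "fr"},  # illustrative fields
        "top_k": 5,
    },
)
print(rec.json())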

Recommendation Pipeline Design

The POST /v1/recommend endpoint implements a full grounded recommendation flow: the LLM never generates course content from memory; it can only recommend courses already present in the retrieved candidate set.

Grounding validation rejects any recommendation that references a course ID not present in the FAISS search results. Structured output parsing validates JSON schema before returning any response. Fallback strategies handle provider timeouts, parse failures, and quota exhaustion gracefully.
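
A minimal sketch of the grounding check and structured output validation described above, assuming Pydantic v2 for schema validation; the model fields mirror the ideas in this section but their exact shape is an assumption.

import json
from pydantic import BaseModel, ValidationError

class RecommendationItem(BaseModel):
    course_id: str
    reason: str

class RecommendationOutput(BaseModel):
    recommendations: list[RecommendationItem]

class RecommendationOutputParseError(Exception): ...
class RecommendationGroundingError(Exception): ...

def validate_llm_output(raw: str, candidate_ids: set[str]) -> RecommendationOutput:
    # 1. Structured output parsing: the reply must be valid JSON matching the schema.
    try:
        output = RecommendationOutput.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        raise RecommendationOutputParseError(str(exc)) from exc

    # 2. Grounding: every recommended course must come from the FAISS candidate set.
    unknown = {r.course_id for r in output.recommendations} - candidate_ids
    if unknown:
        raise RecommendationGroundingError(f"LLM referenced non-retrieved courses: {unknown}")
    return output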

Each recommendation includes why the course was selected, which query signals matched, which profile signals matched, and retrieval score evidence, making the system explainable by design.

Query + Learner Profile → RetrieveCoursesUseCase → Top-K Candidates
→ RecommendationContextBuilder → PromptBuilder (versioned) → LLM call
→ JSON Parser + Validator → Grounding Check → RecommendationResult
→ Structured Log (trace_id, latency, tokens, cost)

Observability & Error Handling

Every recommendation request is fully traceable: each log event captures request ID, user ID, provider, model, prompt version, pipeline version, retrieval latency, LLM latency, input/output tokens, estimated cost, parse success, and fallback usage.
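
A sketch of the kind of structured log event each request could emit, using the fields listed above; field names and sample values are indicative, not the project's exact schema.

import json
import time
import uuid

def log_recommendation_event(**fields) -> None:
    """Emit one JSON log line per recommendation request."""
    event = {"trace_id": str(uuid.uuid4()), "timestamp": time.time(), **fields}
    print(json.dumps(event))

log_recommendation_event(
    request_id="req-123", user_id="u-42",
    provider="openai", model="gpt-4o-mini",
    prompt_version="v3", pipeline_version="2025.1",
    retrieval_latency_ms=18, llm_latency_ms=920,
    input_tokens=1450, output_tokens=210, estimated_cost_usd=0.0021,
    parse_success=True, fallback_used=False,
)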

The exception hierarchy maps failure modes explicitly (RecommendationGroundingError, RecommendationOutputParseError, RecommendationProviderError) with deterministic HTTP status codes (422, 502, 503, 504) and structured log payloads.
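
A possible way to wire that hierarchy into FastAPI exception handlers; which error maps to which status code is an assumption based on the codes listed above, not the project's actual mapping.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class RecommendationError(Exception): ...
class RecommendationGroundingError(RecommendationError): ...
class RecommendationOutputParseError(RecommendationError): ...
class RecommendationProviderError(RecommendationError): ...

# Illustrative mapping: validation-style failures -> 422, upstream LLM failures -> 5xx.
STATUS_BY_ERROR = {
    RecommendationGroundingError: 422,
    RecommendationOutputParseError: 502,
    RecommendationProviderError: 503,
}

@app.exception_handler(RecommendationError)
async def recommendation_error_handler(request: Request, exc: RecommendationError) -> JSONResponse:
    status = STATUS_BY_ERROR.get(type(exc), 500)
    return JSONResponse(status_code=status, content={"error": type(exc).__name__, "detail": str(exc)})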