Context
Inokufu is a French EdTech company building learning intelligence products. The challenge: learners search for courses across a massive catalog in natural language, and keyword matching is not enough to surface the right content.
Course RAG API addresses this by combining semantic vector search (FAISS + sentence-transformers) with LLM-generated recommendations that explain why each course was selected. The system is designed from day one to be production-ready: pluggable backends, strict output validation, full observability, and clean hexagonal architecture.
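The retrieval half of this pairing can be sketched without the real stack: FAISS's `IndexFlatIP` over L2-normalized embeddings is exact cosine-similarity search, so a minimal NumPy stand-in (the toy 4-dim vectors below stand in for MiniLM-L6-v2's real embeddings; all data is illustrative) looks like:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    # L2-normalize rows so inner product == cosine similarity,
    # mirroring FAISS IndexFlatIP over normalized vectors.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(index_vecs: np.ndarray, query_vec: np.ndarray, k: int = 3):
    # Score every course against the query, return top-k (id, score) pairs.
    scores = normalize(index_vecs) @ normalize(query_vec)
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Toy "embeddings" standing in for the model's 384-dim output.
courses = np.array([[1.0, 0.1, 0.0, 0.0],
                    [0.0, 1.0, 0.2, 0.0],
                    [0.1, 0.0, 0.0, 1.0]])
query = np.array([0.9, 0.2, 0.0, 0.1])
hits = search(courses, query, k=2)
```

In production the same shape of call goes through FAISS, with sentence-transformers producing the vectors; the scores returned here are the "retrieval score evidence" attached to each recommendation.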
Key Metrics
System Architecture
The system has two distinct pipelines that run independently:
[Architecture diagram: MiniLM-L6-v2 embedding model; versioned prompts]
Hexagonal Architecture Layers
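In a hexagonal layout the domain layer depends only on ports; FAISS, LLM providers, and test doubles plug in as adapters from the outside. A minimal sketch of that inversion (the names `VectorSearchPort`, `Candidate`, and the in-memory adapter are illustrative, not the project's actual interfaces):

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Candidate:
    course_id: str
    score: float

class VectorSearchPort(Protocol):
    # Port: the domain sees only this interface, never FAISS itself.
    def search(self, query: str, k: int) -> list[Candidate]: ...

class InMemorySearchAdapter:
    # Test adapter; swappable with a FAISS-backed adapter in production.
    def __init__(self, canned: list[Candidate]):
        self._canned = canned

    def search(self, query: str, k: int) -> list[Candidate]:
        return self._canned[:k]

def recommend(search_port: VectorSearchPort, query: str) -> list[str]:
    # Domain service: depends on the port, not on any concrete backend.
    return [c.course_id for c in search_port.search(query, k=3)]

adapter = InMemorySearchAdapter([Candidate("c1", 0.9), Candidate("c2", 0.5)])
ids = recommend(adapter, "intro to python")
```

This is what makes the backends pluggable: swapping FAISS for another vector store touches only the adapter, never the domain logic.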
API Endpoints
Recommendation Pipeline Design
The POST /v1/recommend endpoint implements a full grounded recommendation flow: the LLM never generates course content from memory; it can only recommend courses already present in the retrieved candidate set.
Grounding validation rejects any recommendation that references a course ID not present in the FAISS search results. Structured output parsing validates JSON schema before returning any response. Fallback strategies handle provider timeouts, parse failures, and quota exhaustion gracefully.
Each recommendation includes: why the course was selected, which query signals matched, which profile signals matched, and retrieval score evidence, making the system explainable by design.
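Those four evidence fields can be modeled as a small schema; the field names below are illustrative, not the project's actual response model:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Explanation:
    # One entry per recommended course; all four evidence kinds are required,
    # so an unexplained recommendation cannot be constructed.
    why: str                    # why the course was selected
    query_signals: list[str]    # which query terms/intents matched
    profile_signals: list[str]  # which learner-profile attributes matched
    retrieval_score: float      # similarity evidence from vector search

exp = Explanation(
    why="Covers the requested topic at beginner level",
    query_signals=["python", "beginner"],
    profile_signals=["prefers video courses"],
    retrieval_score=0.87,
)
payload = asdict(exp)
```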
Observability & Error Handling
Every recommendation request is fully traceable: each log event captures request ID, user ID, provider, model, prompt version, pipeline version, retrieval latency, LLM latency, input/output tokens, estimated cost, parse success, and fallback usage.
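Emitting all of those fields in a single structured event is what makes a request reconstructable from one log line. A sketch with plain stdlib JSON logging (field values below are illustrative; a real deployment would use a structured-logging library):

```python
import json
import time
import uuid

def log_recommendation_event(**fields) -> str:
    # One JSON event per request: every traced field travels together,
    # so a single log line reconstructs the full call.
    event = {
        "event": "recommendation_served",
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        **fields,
    }
    return json.dumps(event)

line = log_recommendation_event(
    user_id="u42", provider="openai", model="example-model",
    prompt_version="v3", pipeline_version="v1",
    retrieval_latency_ms=12, llm_latency_ms=840,
    input_tokens=1200, output_tokens=310,
    estimated_cost_usd=0.0021, parse_success=True, fallback_used=False,
)
```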
The exception hierarchy maps failure modes explicitly (RecommendationGroundingError, RecommendationOutputParseError, RecommendationProviderError) with deterministic HTTP status codes (422, 502, 503, 504) and structured log payloads.
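One way to keep that mapping deterministic is to carry the status code on the exception class itself. Which code pairs with which exception below is an assumption (the source lists 422, 502, 503, 504 but not the assignment; 504 would cover provider timeouts in the same way):

```python
class RecommendationError(Exception):
    status_code = 500

class RecommendationGroundingError(RecommendationError):
    status_code = 422  # output referenced a course outside the candidate set

class RecommendationOutputParseError(RecommendationError):
    status_code = 502  # upstream LLM returned unparseable output

class RecommendationProviderError(RecommendationError):
    status_code = 503  # provider unavailable or quota exhausted

def to_http(exc: RecommendationError) -> tuple[int, dict]:
    # Deterministic mapping: the same failure mode always yields the
    # same status code and a structured, machine-readable payload.
    return exc.status_code, {"error": type(exc).__name__, "detail": str(exc)}

status, body = to_http(RecommendationGroundingError("ungrounded id: zzz"))
```

A single framework-level exception handler can then call `to_http` for every `RecommendationError` subclass, so no endpoint hand-rolls its own error responses.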