🎓

Course RAG API

Semantic course retrieval and LLM-powered recommendations for EdTech, built at Inokufu with hexagonal architecture, FAISS, and multi-provider LLM adapters.

๐Ÿข Inokufu ๐Ÿ Python 3.14 FastAPI FAISS ๐ŸŸข Open Source

Context

Inokufu is a French EdTech company building learning intelligence products. The challenge: learners search for courses across a massive catalog in natural language, and keyword matching is not enough to surface the right content.

Course RAG API addresses this by combining semantic vector search (FAISS + sentence-transformers) with LLM-generated recommendations that explain why each course was selected. The system is designed from day one to be production-ready: pluggable backends, strict output validation, full observability, and clean hexagonal architecture.

Key Metrics

5 API endpoints
4 vector backends
2 LLM providers
6-sprint roadmap

System Architecture

The system has two distinct pipelines that run independently:

Offline Pipeline (run once / on data refresh):
Raw Datasets (CSV) → prepare-data → Curated CSV → build-index → FAISS Index + JSONL Meta
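
A minimal sketch of what the build-index step could look like, assuming the all-MiniLM-L6-v2 sentence-transformers model and cosine similarity via normalized inner product; the file names and CSV column names are illustrative, not the project's actual ones.

import json
import faiss
import pandas as pd
from sentence_transformers import SentenceTransformer

# Load the curated catalog produced by prepare-data (column names are illustrative).
courses = pd.read_csv("curated_courses.csv")
texts = (courses["title"] + ". " + courses["description"]).tolist()

# Embed with the same model the online pipeline uses (all-MiniLM-L6-v2, 384 dims).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(texts, normalize_embeddings=True, show_progress_bar=True)

# Inner product over L2-normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "courses.faiss")

# Sidecar JSONL metadata, one record per FAISS row, for hydration at query time.
with open("courses_meta.jsonl", "w", encoding="utf-8") as f:
    for _, row in courses.iterrows():
        f.write(json.dumps({"course_id": row["course_id"], "title": row["title"]}) + "\n")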
Online Pipeline (per request):
User Query → Embedding Adapter (MiniLM-L6-v2) → FAISS Search → Metadata Hydration → Ranked Results
Ranked Results + Learner Profile → Context Builder → Prompt Builder (versioned) → LLM (OpenAI / Anthropic) → Structured Recommendation
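
A rough sketch of the per-request retrieval path, under the same assumptions as above (normalized MiniLM embeddings, inner-product FAISS index, JSONL sidecar); the function and file names are illustrative.

import json
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
index = faiss.read_index("courses.faiss")
# Row i in the index corresponds to line i in the JSONL sidecar.
with open("courses_meta.jsonl", encoding="utf-8") as f:
    meta = [json.loads(line) for line in f]

def retrieve(query: str, top_k: int = 5) -> list[dict]:
    """Embed the query, search FAISS, and hydrate results with catalog metadata."""
    vec = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(vec, top_k)
    return [
        {**meta[i], "score": float(s)}
        for s, i in zip(scores[0], ids[0])
        if i != -1  # FAISS pads with -1 when fewer than top_k hits exist
    ]

print(retrieve("introductory python course for data analysis"))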

Hexagonal Architecture Layers

📋 contracts/ (Ports)
Boundary interfaces: RetrieverServicePort, RecommendationLLMFactoryPort, LearnerProfileStorePort, RecommendationLogStorePort. No implementation, no dependencies.
⚙️ services/ (Use Cases)
Application logic: RetrieveCoursesUseCase, RecommendCoursesUseCase, offline jobs (prepare_data, build_index). Depends only on ports.
🔌 infrastructure/ (Adapters)
Concrete implementations: FAISS/Qdrant/Milvus/Pinecone vector adapters, sentence-transformers embedding, files/postgres/mongo catalog, memory/redis cache, OpenAI/Anthropic LLM adapters.
🌐 api/ (Delivery)
FastAPI routers, request validation, response serialization, CORS. Completely decoupled from business logic.

API Endpoints

GET /v1/health Service health check
GET /v1/info Index metadata & config
GET /v1/retrieve?q=...&top_k=5 Semantic course retrieval
POST /v1/recommend LLM recommendation with explanation
POST /v1/recommend/prompt-preview Debug prompt rendering
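
Example calls against a locally running instance; the host, port, and request-body field names are assumptions for illustration.

import httpx

BASE = "http://localhost:8000"

# Semantic retrieval: plain vector search, no LLM involved.
hits = httpx.get(f"{BASE}/v1/retrieve", params={"q": "beginner machine learning", "top_k": 5})
print(hits.json())

# LLM recommendation: grounded in retrieved candidates, with an explanation per course.
rec = httpx.post(
    f"{BASE}/v1/recommend",
    json={
        "query": "beginner machine learning",
        "learner_profile": {"level": "novice", "language": "fr"},  # illustrative fields
        "top_k": 5,
    },
)
print(rec.json())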

Recommendation Pipeline Design

The POST /v1/recommend endpoint implements a full grounded recommendation flow: the LLM never generates course content from memory; it can only recommend courses already present in the retrieved candidate set.

Grounding validation rejects any recommendation that references a course ID not present in the FAISS search results. Structured output parsing validates JSON schema before returning any response. Fallback strategies handle provider timeouts, parse failures, and quota exhaustion gracefully.
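
A minimal sketch of the grounding check and structured output validation described above, assuming Pydantic v2 for schema validation; the model fields mirror the ideas in this section but their exact shape is an assumption.

import json
from pydantic import BaseModel, ValidationError

class RecommendationItem(BaseModel):
    course_id: str
    reason: str

class RecommendationOutput(BaseModel):
    recommendations: list[RecommendationItem]

class RecommendationOutputParseError(Exception): ...
class RecommendationGroundingError(Exception): ...

def validate_llm_output(raw: str, candidate_ids: set[str]) -> RecommendationOutput:
    # 1. Structured output parsing: the reply must be valid JSON matching the schema.
    try:
        output = RecommendationOutput.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        raise RecommendationOutputParseError(str(exc)) from exc

    # 2. Grounding: every recommended course must come from the FAISS candidate set.
    unknown = {r.course_id for r in output.recommendations} - candidate_ids
    if unknown:
        raise RecommendationGroundingError(f"LLM referenced non-retrieved courses: {unknown}")
    return output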

Each recommendation includes why the course was selected, which query signals matched, which profile signals matched, and retrieval score evidence, making the system explainable by design.

Query + Learner Profile → RetrieveCoursesUseCase → Top-K Candidates
→ RecommendationContextBuilder → PromptBuilder (versioned) → LLM call
→ JSON Parser + Validator → Grounding Check → RecommendationResult
→ Structured Log (trace_id, latency, tokens, cost)

Observability & Error Handling

Every recommendation request is fully traceable: each log event captures request ID, user ID, provider, model, prompt version, pipeline version, retrieval latency, LLM latency, input/output tokens, estimated cost, parse success, and fallback usage.
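
A sketch of the kind of structured log event each request could emit, using the fields listed above; field names and sample values are indicative, not the project's exact schema.

import json
import time
import uuid

def log_recommendation_event(**fields) -> None:
    """Emit one JSON log line per recommendation request."""
    event = {"trace_id": str(uuid.uuid4()), "timestamp": time.time(), **fields}
    print(json.dumps(event))

log_recommendation_event(
    request_id="req-123", user_id="u-42",
    provider="openai", model="gpt-4o-mini",
    prompt_version="v3", pipeline_version="2025.1",
    retrieval_latency_ms=18, llm_latency_ms=920,
    input_tokens=1450, output_tokens=210, estimated_cost_usd=0.0021,
    parse_success=True, fallback_used=False,
)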

The exception hierarchy maps failure modes explicitly (RecommendationGroundingError, RecommendationOutputParseError, RecommendationProviderError) with deterministic HTTP status codes (422, 502, 503, 504) and structured log payloads.
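
A possible way to wire that hierarchy into FastAPI exception handlers; which error maps to which status code is an assumption based on the codes listed above, not the project's actual mapping.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class RecommendationError(Exception): ...
class RecommendationGroundingError(RecommendationError): ...
class RecommendationOutputParseError(RecommendationError): ...
class RecommendationProviderError(RecommendationError): ...

# Illustrative mapping: validation-style failures -> 422, upstream LLM failures -> 5xx.
STATUS_BY_ERROR = {
    RecommendationGroundingError: 422,
    RecommendationOutputParseError: 502,
    RecommendationProviderError: 503,
}

@app.exception_handler(RecommendationError)
async def recommendation_error_handler(request: Request, exc: RecommendationError) -> JSONResponse:
    status = STATUS_BY_ERROR.get(type(exc), 500)
    return JSONResponse(status_code=status, content={"error": type(exc).__name__, "detail": str(exc)})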