Final Exam: Enterprise RAG System#

Overview#

| Field | Value |
| --- | --- |
| Course | RAG and Optimization |
| Duration | 240 minutes (4 hours) |
| Passing Score | 70% |
| Total Points | 100 |


Description#

You have been hired as an AI Engineer at TechDocs Inc., a company that provides enterprise documentation solutions. Your task is to build a production-ready Enterprise RAG System that can answer complex questions about technical documentation, company policies, and product specifications.

The current basic RAG system has several limitations:

  • Poor retrieval quality due to fixed-size chunking

  • Slow search performance with growing document collections

  • Inability to handle keyword-specific queries (error codes, product IDs)

  • Redundant and irrelevant results in retrieved documents

  • Missing relationship information between entities (policies, stakeholders, regulations)

You must apply all five optimization techniques learned in this module to build a comprehensive, production-grade RAG system.


Objectives#

By completing this exam, you will demonstrate mastery of:

  • Implementing Semantic Chunking for intelligent document segmentation

  • Configuring HNSW Index for high-performance vector search

  • Building Hybrid Search combining BM25 and Vector Search with RRF fusion

  • Applying Query Transformation techniques (HyDE and Query Decomposition)

  • Implementing Post-Retrieval Processing with Cross-Encoder and MMR

  • Designing a GraphRAG architecture for relationship-aware retrieval


Problem Description#

Build an Enterprise RAG System named enterprise-rag-system that processes a collection of technical documents and provides accurate, contextual answers to user queries. The system must handle:

  1. Technical documentation with code snippets, error codes, and specifications

  2. Policy documents with stakeholder relationships and regulatory references

  3. Product catalogs with model numbers, features, and comparisons

The system should intelligently route queries to the appropriate retrieval strategy and provide high-quality, diverse, and accurate results.


Assumptions#

  • You have access to sample documents (technical docs, policies, product specs) or will use provided sample data

  • OpenAI API key or compatible LLM endpoint is available

  • Neo4j database is available (local Docker or cloud instance)

  • Python 3.10+ environment with necessary packages installed

  • Basic understanding of all five RAG optimization techniques


Technical Requirements#

Environment Setup#

  • Python 3.10 or higher

  • Required packages:

    • langchain >= 0.1.0

    • langchain-neo4j >= 0.1.0

    • openai >= 1.0.0

    • sentence-transformers >= 2.2.0

    • chromadb >= 0.4.0 OR qdrant-client >= 1.7.0

    • rank-bm25 >= 0.2.2

    • pydantic >= 2.0.0

    • neo4j >= 5.0.0

Infrastructure#

  • Vector Database: ChromaDB or Qdrant with HNSW indexing

  • Graph Database: Neo4j (Docker recommended)

  • Embedding Model: text-embedding-3-small or all-MiniLM-L6-v2

  • Cross-Encoder: cross-encoder/ms-marco-MiniLM-L-6-v2

  • LLM: GPT-4 or equivalent


Tasks#

Task 1: Advanced Indexing Pipeline (20 points)#

Time Allocation: 45 minutes

Implement an intelligent document indexing pipeline that preserves semantic coherence.

Requirements:#

  1. Semantic Chunking Implementation

    • Build a chunker that splits documents based on semantic similarity between sentences

    • Configure similarity threshold (0.7-0.85) and chunk size limits

    • Handle edge cases: code blocks, tables, lists, short documents

  2. HNSW Index Configuration

    • Set up vector database with HNSW indexing

    • Configure optimal parameters: M=32, ef_construction=200, ef_search=100

    • Document the trade-offs for your chosen configuration

  3. Indexing Pipeline

    • Process at least 20 documents through the pipeline

    • Store metadata (source, chunk_id, document_type) with each vector

    • Implement batch processing for efficiency
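
As an illustration, the sentence-grouping logic of requirement 1 can be sketched with a pluggable embedding function. This is a minimal sketch, not the prescribed implementation: `embed_fn` stands in for a sentence-transformers model's `encode`, and the default threshold is an assumed value inside the suggested 0.7-0.85 band.

```python
from typing import Callable, List

def semantic_chunk(
    sentences: List[str],
    embed_fn: Callable[[str], List[float]],  # e.g. a sentence-transformers encode wrapper
    threshold: float = 0.8,                  # assumed default within the 0.7-0.85 band
    max_sentences: int = 10,                 # hard cap so chunks stay bounded
) -> List[List[str]]:
    """Group consecutive sentences; start a new chunk when similarity drops."""
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    chunks: List[List[str]] = []
    current: List[str] = []
    prev_vec: List[float] = []
    for sent in sentences:
        vec = embed_fn(sent)
        # Split when the topic shifts (low similarity) or the chunk is full.
        if current and (cosine(prev_vec, vec) < threshold or len(current) >= max_sentences):
            chunks.append(current)
            current = []
        current.append(sent)
        prev_vec = vec
    if current:
        chunks.append(current)
    return chunks
```

Edge cases from the requirements (code blocks, tables, lists) would be handled upstream by treating each such block as a single atomic "sentence" before calling this function.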

Deliverables:#

  • indexing/semantic_chunker.py

  • indexing/vector_store.py

  • Indexed document collection with metadata
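
If you choose ChromaDB, the required HNSW parameters map onto its `hnsw:*` collection-metadata keys. The key names below are an assumption and are version-dependent; verify them against your installed chromadb release.

```python
# Assumed ChromaDB-style HNSW settings, mirroring the required
# M=32, ef_construction=200, ef_search=100 configuration.
HNSW_SETTINGS = {
    "hnsw:space": "cosine",       # distance metric for the embedding space
    "hnsw:M": 32,                 # graph degree: higher = better recall, more memory
    "hnsw:construction_ef": 200,  # build-time candidate list: higher = better index quality, slower build
    "hnsw:search_ef": 100,        # query-time candidate list: higher = better recall, slower queries
}

# Usage (sketch): client.create_collection("docs", metadata=HNSW_SETTINGS)
```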


Task 2: Hybrid Search Implementation (20 points)#

Time Allocation: 45 minutes

Build a hybrid retrieval system that combines keyword and semantic search.

Requirements:#

  1. BM25 Retriever

    • Implement BM25 indexing for all document chunks

    • Proper tokenization with case normalization and punctuation handling

    • Return top-K results with BM25 scores

  2. Hybrid Search with RRF

    • Execute both BM25 and Vector Search in parallel

    • Implement RRF fusion: RRF(d) = Σ_r 1/(k + rank_r(d)) with k = 60, summed over each retriever's ranked result list

    • Handle documents appearing in only one result list

  3. Query Router

    • Analyze query to determine optimal search strategy

    • Route keyword-heavy queries to prioritize BM25

    • Route semantic queries to prioritize Vector Search

    • Use Hybrid Search as default

Deliverables:#

  • retrieval/bm25_retriever.py

  • retrieval/hybrid_search.py

  • retrieval/query_router.py


Task 3: Query Transformation Layer (15 points)#

Time Allocation: 35 minutes

Implement query transformation to handle vague and complex queries.

Requirements:#

  1. HyDE Implementation

    • Generate hypothetical answer paragraphs using LLM

    • Use hypothetical answer embedding for retrieval

    • Design domain-appropriate generation prompts

  2. Query Decomposition

    • Detect multi-part questions requiring information from multiple sources

    • Generate independent sub-queries for parallel retrieval

    • Aggregate results from all sub-queries

  3. Transformation Router

    • Classify queries: simple, vague (use HyDE), complex (use Decomposition)

    • Apply appropriate transformation before retrieval
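
One possible shape for the transformation router is a rule-based classifier. The keyword and length signals below are illustrative assumptions, not prescribed rules; an LLM-based classifier is an equally valid design.

```python
def classify_query(query: str) -> str:
    """Toy rule-based classifier for the transformation router:
    'complex' -> Query Decomposition, 'vague' -> HyDE, 'simple' -> none.
    """
    q = query.lower()
    # Multi-part signals suggest the query needs decomposition.
    multi_part_signals = (" and ", " vs ", "compare", "difference between", "both")
    if any(s in q for s in multi_part_signals) or q.count("?") > 1:
        return "complex"
    # Very short queries with no specific identifiers tend to be vague;
    # HyDE's hypothetical answer gives retrieval more to work with.
    if len(q.split()) <= 4 and not any(ch.isdigit() for ch in q):
        return "vague"
    return "simple"
```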

Deliverables:#

  • transformation/hyde.py

  • transformation/query_decomposition.py

  • transformation/transformation_router.py


Task 4: Post-Retrieval Processing (15 points)#

Time Allocation: 35 minutes

Implement re-ranking and diversity optimization for retrieved results.

Requirements:#

  1. Cross-Encoder Re-ranking

    • Retrieve top-50 candidates with Bi-Encoder

    • Re-rank using Cross-Encoder (cross-encoder/ms-marco-MiniLM-L-6-v2)

    • Return top-10 re-ranked results

  2. MMR for Diversity

    • Implement MMR algorithm with configurable λ parameter

    • Default λ=0.5 for balanced relevance/diversity

    • Ensure diverse information coverage in final results

  3. Configurable Pipeline

    • Support both: Cross-Encoder → MMR and MMR → Cross-Encoder orders

    • Allow configuration of k values at each stage
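
A pure-Python MMR sketch for requirement 2 (vectors are plain lists here; in practice you would reuse the embeddings already stored in your vector database):

```python
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec: List[float], doc_vecs: List[List[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Select k document indices by Maximal Marginal Relevance:
    argmax_d [ lam * sim(d, query) - (1 - lam) * max_{s in selected} sim(d, s) ].
    lam near 1 favors relevance; lam near 0 favors diversity.
    """
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(doc_vecs[i], query_vec)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ above 0.5 the near-duplicate of an already-selected document still wins on relevance; below 0.5 the redundancy penalty pushes a more diverse document ahead of it.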

Deliverables:#

  • post_retrieval/cross_encoder_reranker.py

  • post_retrieval/mmr.py

  • post_retrieval/post_retrieval_pipeline.py


Task 5: GraphRAG Integration (20 points)#

Time Allocation: 50 minutes

Build a knowledge graph for relationship-aware retrieval.

Requirements:#

  1. Entity Extraction

    • Define Pydantic models for domain entities (Policy, Stakeholder, Product, Regulation, etc.)

    • Extract entities and relationships using LLM with structured output

    • Validate extracted data against schema

  2. Knowledge Graph Construction

    • Populate Neo4j with extracted entities and relationships

    • Use MERGE to prevent duplicates

    • Create appropriate indexes for query performance

  3. Graph-Aware Retrieval

    • Implement natural language to Cypher translation

    • Support relationship traversal queries

    • Combine graph results with vector search results
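
Requirement 2's duplicate prevention can be sketched as parameterized MERGE builders. The entity labels and the `name` key are assumptions drawn from the schema in requirement 1; labels and relationship types cannot be parameterized in Cypher, hence the whitelist check.

```python
ALLOWED_LABELS = {"Policy", "Stakeholder", "Product", "Regulation"}  # assumed schema

def merge_entity_query(label: str, name: str) -> tuple:
    """Build a parameterized Cypher MERGE for one entity.

    MERGE matches an existing node with the same label and name, or creates
    it, so re-running extraction never duplicates nodes.
    """
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown entity label: {label}")
    return f"MERGE (n:{label} {{name: $name}}) RETURN n", {"name": name}

def merge_relationship_query(src_label: str, rel: str, dst_label: str) -> str:
    """MERGE a relationship between two already-merged entities."""
    return (
        f"MATCH (a:{src_label} {{name: $src}}), (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )

# Usage (sketch, assuming the neo4j driver):
#   session.run(*merge_entity_query("Policy", "GDPR Data Retention"))
```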

Deliverables:#

  • graph/entity_models.py

  • graph/entity_extractor.py

  • graph/knowledge_graph.py

  • graph/graph_retriever.py


Task 6: Integration and Orchestration (10 points)#

Time Allocation: 30 minutes

Integrate all components into a unified RAG system.

Requirements:#

  1. Unified Query Pipeline

    • Accept user query as input

    • Apply query classification and routing

    • Execute appropriate retrieval strategy

    • Apply post-retrieval processing

    • Generate final answer using LLM

  2. Configuration Management

    • Externalize all configurable parameters

    • Support different modes: fast (less accurate), accurate (slower), balanced

  3. Error Handling and Logging

    • Graceful degradation if a component fails

    • Structured logging for debugging and monitoring
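
One way to externalize the fast/balanced/accurate modes of requirement 2 is a preset table. The knob names and values below are illustrative assumptions that mirror the components above, not prescribed settings.

```python
# Hypothetical mode presets trading speed for accuracy.
MODES = {
    "fast":     {"use_hyde": False, "rerank": False, "candidates_k": 10, "final_k": 5,  "ef_search": 50},
    "balanced": {"use_hyde": True,  "rerank": True,  "candidates_k": 30, "final_k": 8,  "ef_search": 100},
    "accurate": {"use_hyde": True,  "rerank": True,  "candidates_k": 50, "final_k": 10, "ef_search": 200},
}

def get_mode_config(mode: str = "balanced") -> dict:
    """Fall back to 'balanced' on unknown modes (graceful degradation)."""
    return MODES.get(mode, MODES["balanced"])
```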

Deliverables:#

  • main.py or enterprise_rag.py

  • config.py or config.yaml

  • README.md with setup and usage instructions


Questions to Answer#

Include written answers to these questions in your README.md or a separate ANSWERS.md file:

  1. Architecture Decision: Explain why you chose your specific HNSW parameters and how they balance speed vs. accuracy for this use case.

  2. Hybrid Search Trade-offs: Describe a scenario where Hybrid Search significantly outperforms pure Vector Search, and explain why.

  3. Query Transformation Selection: How does your system decide when to use HyDE vs. Query Decomposition? What signals does it look for?

  4. Re-ranking Strategy: Why did you choose your specific order of Cross-Encoder and MMR? What would change if the use case prioritized diversity over precision?

  5. GraphRAG Value: Provide an example query that your GraphRAG component can answer that would be impossible or very difficult with vector search alone.


Submission Rules#

Required Deliverables#

  • Complete source code organized in the specified directory structure

  • README.md with:

    • Setup instructions (dependencies, environment variables, database setup)

    • Usage examples for different query types

    • Architecture diagram (can be text-based)

  • ANSWERS.md with written responses to the 5 questions

  • docker-compose.yml for Neo4j and any other services

  • Sample queries demonstrating each component’s functionality

  • Screenshots or logs showing successful execution

Submission Checklist#

  • All code runs without errors

  • Semantic Chunking preserves document semantics

  • HNSW index is properly configured and benchmarked

  • Hybrid Search correctly combines BM25 and Vector results

  • Query Transformation handles vague and complex queries

  • Cross-Encoder improves ranking precision

  • MMR ensures result diversity

  • GraphRAG answers relationship queries

  • All components are integrated in unified pipeline

  • Documentation is complete and clear


Grading Rubrics#

| Criterion | Weight | Excellent (90-100%) | Good (70-89%) | Satisfactory (50-69%) | Needs Improvement (<50%) |
| --- | --- | --- | --- | --- | --- |
| Advanced Indexing | 20% | Semantic chunking preserves context perfectly; HNSW optimally configured with benchmarks | Chunking works with minor issues; HNSW configured but not optimized | Basic chunking implemented; HNSW uses default parameters | Chunking breaks context; HNSW not implemented |
| Hybrid Search | 20% | BM25 and RRF perfectly implemented; Query router makes intelligent decisions | Hybrid search works; Router has some misclassifications | Basic hybrid search; No query routing | Hybrid search not functional |
| Query Transformation | 15% | HyDE and Decomposition both work excellently; Smart routing between them | Both techniques work; Routing is rule-based | One technique works; No routing | Neither technique functional |
| Post-Retrieval | 15% | Cross-Encoder significantly improves precision; MMR provides diverse results | Both components work; Measurable improvement | One component works | Neither component functional |
| GraphRAG | 20% | Complete entity extraction; Rich graph; Answers complex relationship queries | Graph populated; Basic queries work | Partial graph; Limited queries | Graph not functional |
| Integration | 10% | Seamless pipeline; Excellent error handling; Clean configuration | Components integrated; Some rough edges | Partial integration | Components not connected |


Estimated Time#

| Task | Time Allocation |
| --- | --- |
| Task 1: Advanced Indexing | 45 minutes |
| Task 2: Hybrid Search | 45 minutes |
| Task 3: Query Transformation | 35 minutes |
| Task 4: Post-Retrieval | 35 minutes |
| Task 5: GraphRAG | 50 minutes |
| Task 6: Integration | 30 minutes |
| **Total** | **240 minutes (4 hours)** |


Hints#

General Tips:

  • Start by setting up the infrastructure (Neo4j, Vector DB) before writing code

  • Test each component independently before integration

  • Use the companion notebooks from assignments as references

  • Cache LLM responses during development to save API costs

Component-Specific Tips:

  • For Semantic Chunking: Use sentence-transformers for efficient similarity calculation

  • For HNSW: Prioritize ef_search tuning for query-time optimization

  • For BM25: Use nltk.word_tokenize() for consistent tokenization

  • For HyDE: The hypothetical answer doesn’t need to be factually correct

  • For Cross-Encoder: Batch processing significantly improves throughput

  • For GraphRAG: Test Cypher queries in Neo4j Browser before implementing in code


Notes#

  • You may use your implementation from the previous assignment lab as a starting point.