# Basic AI Fundamentals - Final Assignment

## Assignment Metadata

| Field | Description |
|---|---|
| Assignment Name | RAG Agent for FPT Policy Document Q&A |
| Course | Unit 1: Basic AI Fundamentals |
| Project Name | fpt-policy-rag-agent |
| Estimated Time | 150 minutes |
| Framework | Python 3.10+, LangChain 1.0.5+, Amazon S3, Amazon Bedrock |
## Learning Objectives

By completing this assignment, you will be able to:

- Build a complete RAG (Retrieval-Augmented Generation) pipeline from scratch
- Implement document loading, chunking, and embedding strategies
- Configure a vector database (FAISS, Chroma, or S3 Vector) for efficient similarity search
- Design effective prompts for accurate and grounded answers
- Apply retrieval techniques including similarity search and metadata filtering
- Create a conversational agent that can answer questions about FPT policies
- Validate system responses for faithfulness and relevance
## Assignment Description

### Background

FPT Corporation maintains various internal policy documents that employees need to reference frequently. Your task is to build a RAG-based Q&A system that lets employees ask natural-language questions and receive accurate answers grounded in those documents. Use the provided dataset at `./data/FSoft_HR.pdf`.
### Requirements

Build a RAG agent that:

- Loads and processes FPT policy documents (PDF/TXT format)
- Chunks documents using appropriate strategies (semantic or fixed-size)
- Creates embeddings and stores them in a vector database
- Retrieves relevant document chunks based on user queries
- Generates accurate answers grounded in the retrieved context
- Handles edge cases when information is not available
## Technical Requirements

### Framework Requirements

| Component | Requirement |
|---|---|
| Python | 3.10 or higher |
| LangChain | 1.0.5 or higher |
| Vector Storage | Amazon S3 Vector |
| Embedding Model | Amazon Bedrock Nova Embedding |
| LLM | Amazon Bedrock Nova Pro |
| Document Loaders | PyPDFLoader, TextLoader |
### AWS Configuration

- Configure AWS credentials via the AWS CLI or environment variables
- Ensure IAM permissions for Amazon Bedrock and S3 access
- Set an AWS region where Bedrock is available (e.g., `us-east-1`)
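Either approach works; a typical setup looks like this (the values below are placeholders for your own credentials):

```shell
# Option 1: interactive setup via the AWS CLI
aws configure

# Option 2: standard AWS environment variables (placeholder values)
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_DEFAULT_REGION="us-east-1"   # a region where Bedrock is available
```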
### Code Quality Standards

- Use meaningful variable and function names
- Organize code into modular functions/classes
- Include docstrings for all functions
- Implement proper error handling with try/except blocks
- Use environment variables for AWS credentials (never hardcode them)
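A small sketch of what these standards look like in practice (the function names and error message are illustrative, not part of the assignment):

```python
import os


def get_aws_region(default: str = "us-east-1") -> str:
    """Return the AWS region from the environment, falling back to a default.

    Reading configuration from environment variables keeps credentials and
    region settings out of source code.
    """
    return os.environ.get("AWS_DEFAULT_REGION", default)


def safe_load_document(path: str) -> str:
    """Read a text document, raising a clear error if it is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as exc:
        raise RuntimeError(f"Document not found: {path}") from exc
```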
## Implementation Tasks

### Task 1: Document Loading & Chunking (30 mins)

- Load policy documents from the provided dataset
- Implement chunking with an appropriate chunk size (500-1000 tokens)
- Add metadata to chunks (source file, page number)
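The fixed-size strategy can be sketched in plain Python (in a real submission you would more likely use LangChain's `RecursiveCharacterTextSplitter`; the chunk size here is in characters, and the metadata fields mirror the ones required above):

```python
def chunk_document(text, source, page, chunk_size=800, overlap=100):
    """Split text into overlapping fixed-size chunks, each tagged with metadata."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({
            "text": text[start:end],
            "metadata": {"source": source, "page": page},
        })
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```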
### Task 2: Embedding & Vector Store (30 mins)

- Create embeddings using the Amazon Bedrock Nova Embedding model
- Store embeddings in an Amazon S3 bucket
- Implement persistence (save/load vectors from S3)
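Conceptually, persistence just means serializing your (vector, text, metadata) records and writing them to S3 (e.g., via boto3's `put_object`/`get_object`). A minimal local sketch of the save/load round trip, using JSON for clarity (the record layout is illustrative):

```python
import json


def save_vectors(records, path):
    """Persist embedding records (vector + text + metadata) as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_vectors(path):
    """Load embedding records back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```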
### Task 3: Retrieval System (30 mins)

- Create a retriever with Top-K = 4
- Implement similarity search
- (Optional) Add metadata filtering capability
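The core of similarity search can be sketched without any library: score every stored vector against the query by cosine similarity and keep the Top-K. In the assignment, the query vector would come from the Bedrock embedding model and the search would run inside the vector store:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, records, k=4):
    """Return the k stored records most similar to the query vector."""
    scored = sorted(
        records,
        key=lambda r: cosine_similarity(query_vector, r["vector"]),
        reverse=True,
    )
    return scored[:k]
```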
### Task 4: Generation & Prompt Engineering (30 mins)

- Design a system prompt that instructs the LLM to:
  - Only use information from the provided context
  - Say “I don’t have information about this” when the context is insufficient
  - Cite the source document when possible
- Build the complete RAG chain using LCEL
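The grounding rules above translate directly into a system prompt. A minimal prompt-assembly sketch (the exact wording is up to you; in LCEL this would typically live in a `ChatPromptTemplate` piped into the model):

```python
SYSTEM_PROMPT = (
    "You are an assistant for FPT policy questions. "
    "Answer ONLY from the context below. "
    "If the context does not contain the answer, reply exactly: "
    "\"I don't have information about this.\" "
    "Cite the source document when possible."
)


def build_prompt(question, chunks):
    """Assemble the final prompt from retrieved chunks and the user question."""
    context = "\n\n".join(
        f"[{c['metadata']['source']}] {c['text']}" for c in chunks
    )
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"
```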
### Task 5: Testing & Validation (30 mins)

- Test with at least 5 sample questions
- Verify answers are grounded in the context (faithfulness)
- Handle edge cases (out-of-scope questions)
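Faithfulness can be spot-checked with a crude heuristic: the fraction of the answer's content words that also appear in the retrieved context. This is only a sanity check, not a real metric (a proper evaluation would use an LLM judge or an evaluation framework), and the word-length cutoff here is arbitrary:

```python
def grounding_score(answer, context):
    """Fraction of the answer's words (length > 3) that appear in the context."""
    context_words = set(context.lower().split())
    answer_words = [w for w in answer.lower().split() if len(w) > 3]
    if not answer_words:
        return 1.0  # nothing substantive to check
    hits = sum(1 for w in answer_words if w in context_words)
    return hits / len(answer_words)
```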
## Submission Requirements

### Required Deliverables

| Deliverable | Description |
|---|---|
| Source Code | Jupyter Notebook (.ipynb) or Python script (.py) |
| README.md | Setup instructions and usage guide |
| Screenshots | Screenshots showing successful Q&A interactions |
| Sample Q&A | At least 5 question-answer pairs demonstrating the system |
### Submission Checklist

- [ ] Project runs without errors
- [ ] Vector database is created successfully
- [ ] RAG chain generates accurate answers
- [ ] System handles “unknown” questions gracefully
- [ ] Code is well-documented with comments
- [ ] README.md with setup instructions included
## Evaluation Criteria

| Criteria | Weight | Description |
|---|---|---|
| Functionality | 40% | RAG pipeline works correctly end-to-end |
| Code Quality | 20% | Clean, modular, well-documented code |
| Prompt Engineering | 15% | Effective prompts that reduce hallucination |
| Error Handling | 15% | Graceful handling of edge cases and errors |
| Documentation | 10% | Clear README and code comments |
## Sample Questions for Testing

Use these questions to test your RAG agent:

1. “What is the leave policy for employees?”
2. “How do I request work-from-home approval?”
3. “What are the working hours at FPT?”
4. “What is the process for expense reimbursement?”
5. “Tell me about the weather today” (out-of-scope test)
## Notes

- **Be Specific**: Ensure your retriever returns relevant chunks
- **Be Grounded**: Answers should only use information from the context
- **Be Transparent**: When information is not available, say so clearly
- **Be Efficient**: Optimize chunk size and Top-K for best results