AI Theory Exams#

This page consolidates theory exam question banks from all AI training modules.


AI Fundamentals Theory#

Basic AI Fundamentals Quiz#

No.

Training Unit

Lecture

Training content

Question

Level

Mark

Answer

Answer Option A

Answer Option B

Answer Option C

Answer Option D

Explanation

1

Unit 1: Basic AI Fundamentals

Lec1

RAG Architecture

What is the hybrid AI architecture RAG (Retrieval-Augmented Generation) designed to do?

Medium

1

C

Increase the speed of natural language processing

Reduce the cost of training language models

Enhance the quality and reliability of Large Language Models

Increase the creativity of language models

RAG is designed to enhance the quality and reliability of Large Language Models (LLMs) by integrating an information retrieval step from an external knowledge base before the LLM generates text.

2

Unit 1: Basic AI Fundamentals

Lec1

RAG Core Problems

What is one of the core technical problems that RAG solves?

Easy

1

A

Reduce hallucination (making up information)

Improve data retrieval speed

Increase data storage capacity

Enhance information security

RAG addresses limitations of traditional LLMs such as hallucination, outdated knowledge, lack of transparency, and difficulty accessing specialized knowledge.

3

Unit 1: Basic AI Fundamentals

Lec1

RAG vs Fine-tuning

What is the advantage of RAG over fine-tuning when updating knowledge for LLMs?

Medium

1

D

RAG is only suitable for unstructured data

RAG requires greater computing resources

RAG has lower transparency

RAG allows faster knowledge updates

RAG allows quick and nearly instant knowledge updates by updating the vector database, while fine-tuning requires retraining the model, which is expensive and slower.

4

Unit 1: Basic AI Fundamentals

Lec1

RAG Use Cases

When should you choose RAG instead of fine-tuning an LLM?

Medium

1

A

When you need to add factual knowledge and answer questions based on new data

When you need to reduce model operating costs

When you need to enhance the model’s reasoning ability

When you need to adjust the model’s behavior and style

RAG is suitable when you need to add factual knowledge and answer questions based on new data, while fine-tuning is appropriate when you need to adjust behavior, style, or learn a new skill.

5

Unit 1: Basic AI Fundamentals

Lec1

RAG Pipeline

In the RAG architecture, which phase occurs once or periodically to prepare data?

Easy

1

D

Query vectorization phase

Similarity search phase

Retrieval and answer generation phase (Retrieval Generation Online)

Data indexing phase (Indexing Offline)

The Data Indexing phase (Indexing Offline) occurs once or periodically to prepare data for RAG.

6

Unit 1: Basic AI Fundamentals

Lec1

Chunking

What is the purpose of dividing data into smaller text chunks in the ‘Load and Chunk’ step?

Easy

1

A

To ensure semantics are not lost and optimize for searching

To simplify the vectorization process

To reduce the storage capacity of data

To speed up data loading into the system

Chunking ensures that semantics are not lost and optimizes for searching.
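The chunking step described above can be sketched minimally. This is an illustrative fixed-size splitter with overlap (the `chunk_size` and `overlap` values are assumptions, not values from the lecture); overlap helps an idea that straddles a boundary survive in at least one chunk.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap so that ideas
    spanning a chunk boundary are not completely lost."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` characters
    return chunks
```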

7

Unit 1: Basic AI Fundamentals

Lec1

Vector Similarity

What is the most common method for measuring similarity between query vectors and document vectors in a Vector Database?

Medium

1

C

Manhattan distance

Jaccard similarity

Cosine Similarity

Euclidean distance

Cosine Similarity is the most common method for measuring the cosine angle between two vectors.
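Cosine similarity, as used above, is the dot product of two vectors divided by the product of their magnitudes. A minimal reference implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    Returns 1.0 for identical directions, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```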

8

Unit 1: Basic AI Fundamentals

Lec1

RAG Online Phase

What happens to the user’s question in the first step of the ‘Retrieval and Answer Generation’ phase?

Easy

1

D

The question is stored in the database

The question is divided into smaller chunks

The question is translated to another language

The question is vectorized using an Embedding model

The user’s question is vectorized using an Embedding model.

9

Unit 1: Basic AI Fundamentals

Lec1

Embedding Quality

The quality of which component directly affects the effectiveness of the entire RAG system?

Medium

1

A

Embedding model

Similarity search method

Vector database

Prompting technique

The quality of the Embedding model directly affects the effectiveness of the entire system.

10

Unit 1: Basic AI Fundamentals

Lec1

Softmax Function

In the LLM model, what is the role of the Softmax function?

Hard

1

A

Convert scores (logits) into a probability distribution to select the most likely word

Filter out irrelevant sentences or information in text chunks

Calculate scores (logits) for all words in the vocabulary

Search for suitable text chunks

The Softmax function converts scores (logits) into a probability distribution, helping the model select the most likely word to appear.
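The logits-to-probabilities conversion described above can be written directly. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick, not something specific to this lecture:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into a probability distribution
    summing to 1, so the most likely next word can be selected."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```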

11

Unit 1: Basic AI Fundamentals

Lec1

HyDE Technique

What is the HyDE (Hypothetical Document Embeddings) technique used for?

Hard

1

A

Expand the input query to improve retrieval results

Re-evaluate the relevance of each (question, chunk) pair

Filter out irrelevant information in text chunks

Combine the power of keyword search and vector search

HyDE uses a small LLM to generate a hypothetical document containing the answer, then uses this document’s vector for searching, improving retrieval results.

12

Unit 1: Basic AI Fundamentals

Lec1

Hybrid Search

What is Hybrid Search?

Medium

1

A

A method that combines the power of keyword search and vector search

A method that re-evaluates the relevance of each (question, chunk) pair

A method that transforms questions to improve retrieval results

A method that compresses context before putting it into the prompt

Hybrid Search combines keyword search (e.g., BM25) and vector search to achieve more comprehensive results.

13

Unit 1: Basic AI Fundamentals

Lec1

Context Compression

What is the purpose of Context Compression?

Medium

1

D

Rearrange potential candidates to select the top quality chunks

Transform input questions to improve retrieval results

Improve the accuracy of information retrieval

Reduce prompt length and help LLM focus on core information

Context Compression helps reduce prompt length and helps the LLM focus on core information by filtering out irrelevant information.

14

Unit 1: Basic AI Fundamentals

Lec1

Re-ranker

What is the role of a Re-ranker in the RAG process?

Medium

1

C

Compress text chunks to reduce prompt length

Transform the original question to improve retrieval results

Re-evaluate the relevance of each (question, chunk) pair and reorder them

Search for text chunks based on keywords

Re-ranker re-evaluates the relevance of each (question, chunk) pair and reorders them to select the top quality chunks.

15

Unit 1: Basic AI Fundamentals

Lec1

Retriever Failure

What happens if the retrieval system (retriever) does not find accurate documents in the RAG system?

Medium

1

B

The system will automatically adjust retrieval parameters to find more suitable documents

The Large Language Model (LLM) cannot answer correctly

The Large Language Model (LLM) will search for information from external sources to compensate for missing data

The Large Language Model (LLM) can still generate accurate answers based on prior knowledge

If the retriever does not find the correct documents, no matter how smart the LLM is, it cannot answer correctly.

16

Unit 1: Basic AI Fundamentals

Lec1

Lost in the Middle

What does the ‘Lost in the Middle’ syndrome in RAG systems refer to?

Hard

1

A

The tendency of LLMs to focus on information at the beginning and end of long contexts, ignoring information in the middle

Text chunks having duplicate information in the middle, causing noise in processing

Difficulty integrating LLMs in the middle of the retrieval and generation process

Delays in information retrieval when relevant documents are in the middle position in the database

When prompts contain long contexts, LLMs tend to focus only on information at the beginning and end, easily ignoring important details in the middle.

17

Unit 1: Basic AI Fundamentals

Lec1

Faithfulness Evaluation

What does ‘Faithfulness’ evaluation in RAG systems measure?

Medium

1

A

The degree to which the generated answer adheres to the provided context

The speed of processing and generating answers by the system

The relevance of the answer to the user’s question

The system’s ability to retrieve information from different sources

Faithfulness measures the degree to which the generated answer adheres to the provided context, i.e., whether the system adds information of its own.

18

Unit 1: Basic AI Fundamentals

Lec1

Attention Mechanism

What role does the Attention Mechanism play in the Transformer architecture of RAG systems?

Hard

1

C

Improve the model’s parallel processing capability, helping to speed up computation

Reduce dependence on fully connected layers in the model

Allow the model to weigh the importance of different words in the input sequence for deep context understanding

Enhance the ability to encode input information into semantic vectors

The Attention Mechanism allows the model to weigh the importance of different words in the input sequence for deep context understanding.

19

Unit 1: Basic AI Fundamentals

Lec1

MRR Metric

What does the Mean Reciprocal Rank (MRR) metric measure in Retrieval Evaluation?

Hard

1

C

Measure the system’s ability to synthesize information from different sources

Measure the relevance between the question and the generated answer

Measure the position of the first correct chunk in the returned result list

Measure the percentage of questions for which the system retrieves at least one chunk containing correct answer information

Mean Reciprocal Rank (MRR) measures the position of the first correct chunk in the returned result list. The earlier that chunk appears, the higher the MRR score.
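MRR as described above is the average over queries of 1/rank of the first correct chunk. A minimal sketch (chunk IDs and the single-gold-answer setup are simplifying assumptions):

```python
def mean_reciprocal_rank(result_lists: list[list[str]], relevant: list[str]) -> float:
    """MRR: average of 1/rank of the first correct chunk per query;
    a query contributes 0 when no correct chunk is retrieved."""
    total = 0.0
    for results, gold in zip(result_lists, relevant):
        for rank, chunk_id in enumerate(results, start=1):
            if chunk_id == gold:
                total += 1.0 / rank
                break
    return total / len(result_lists)
```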

20

Unit 1: Basic AI Fundamentals

Lec1

Value in RAG

In the attention mechanism used by RAG models, which element represents the actual extracted information?

Medium

1

D

Key

Query

Key vector dimension (d_k)

Value

Value represents the actual extracted information in the attention mechanism, while Query and Key determine which positions to attend to.

21

Unit 1: Basic AI Fundamentals

Lec1

Multimodal RAG

Which RAG development direction allows retrieving information from different types of data such as images, audio, and text?

Easy

1

A

Multimodal RAG

Internal RAG system

Agentic RAG

RAG Chatbot

Multimodal RAG allows retrieving information from different data sources, not just text.

22

Unit 1: Basic AI Fundamentals

Lec1

Agentic RAG

Which type of RAG application has the ability to ask sub-questions and interact with external tools to gather information?

Medium

1

B

Internal document RAG system

Agentic RAG

Multimodal RAG

RAG Chatbot

Agentic RAG is more proactive in gathering information by asking sub-questions and interacting with external tools.

23

Unit 1: Basic AI Fundamentals

Lec1

Enterprise RAG

Which RAG application helps employees search for information in the company’s internal documents quickly and accurately?

Easy

1

D

Multimodal RAG

Research and specialized analysis assistant

Smart customer support chatbots

Enterprise internal document RAG system

Enterprise internal document RAG systems help employees search for information quickly and accurately.

24

Unit 1: Basic AI Fundamentals

Lec1

Interactive Learning

What problem does RAG (Retrieval-Augmented Generation) application solve in interactive learning?

Medium

1

C

Limited access to learning materials

Inaccurate assessment of learning outcomes

Boredom and passivity when learning through textbooks

Lack of updated information in textbooks

RAG creates interactive tools that allow students to interact with learning materials more actively compared to reading traditional textbooks.

25

Unit 1: Basic AI Fundamentals

Lec1

Financial RAG

In the financial field, how can RAG support analysts?

Medium

1

A

Summarize and analyze risks from long financial reports

Manage personal investment portfolios

Predict stock market fluctuations

Automatically create financial reports

RAG can summarize and analyze risks from long financial reports, helping analysts save time and make decisions faster.

26

Unit 1: Basic AI Fundamentals

Lec1

E-commerce RAG

How does RAG improve product recommendation systems on e-commerce sites?

Medium

1

A

Retrieve information from detailed descriptions, product reviews, and technical specifications

Optimize product prices based on competitors

Provide 24/7 online customer support services

Enhance the ability to predict customer needs

RAG retrieves information from detailed descriptions, product reviews, and technical specifications to provide personalized recommendations, rather than relying solely on click history.

27

Unit 1: Basic AI Fundamentals

Lec1

RAG Distinctive Feature

What is the distinctive feature of RAG compared to traditional generative AI systems?

Medium

1

D

Integration with cloud platforms to increase scalability

Using the most advanced deep learning algorithms

Ability to automatically adjust parameters to optimize performance

Combining the deep language capabilities of LLMs with the accuracy of external knowledge bases

RAG combines the language capabilities of LLMs with the accuracy and up-to-date nature of external knowledge bases, creating more reliable and transparent AI applications.

28

Unit 1: Basic AI Fundamentals

Lec1

Vector Database

What is the primary purpose of a Vector Database in a RAG system?

Easy

1

B

Store raw text documents for quick retrieval

Store and efficiently search through vector embeddings

Manage user authentication and access control

Cache frequently asked questions and answers

A Vector Database is specifically designed to store and efficiently search through vector embeddings, enabling fast similarity searches in the RAG pipeline.

29

Unit 1: Basic AI Fundamentals

Lec1

Chunking Strategies

Which chunking strategy maintains the logical structure of a document by splitting at natural boundaries?

Medium

1

C

Fixed-size chunking

Random chunking

Semantic chunking

Overlapping chunking

Semantic chunking splits documents at natural boundaries (paragraphs, sentences, sections) to maintain logical structure and preserve meaning within each chunk.

30

Unit 1: Basic AI Fundamentals

Lec1

Top-K Retrieval

What does the ‘Top-K’ parameter control in RAG retrieval?

Easy

1

A

The number of most similar documents to retrieve

The maximum length of each chunk

The threshold for similarity scores

The number of re-ranking iterations

The Top-K parameter controls how many of the most similar documents are retrieved from the vector database to provide context for the LLM.
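Top-K selection over similarity scores is a simple sort-and-slice. A sketch, assuming each chunk has already been scored (e.g., by cosine similarity):

```python
def top_k(scored_chunks: list[tuple[str, float]], k: int = 3) -> list[str]:
    """Return the k chunks with the highest similarity scores,
    highest first."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```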

31

Unit 1: Basic AI Fundamentals

Lec1

Prompt Engineering

In RAG systems, what is the role of the system prompt when generating answers?

Medium

1

B

To store retrieved documents permanently

To instruct the LLM on how to use the retrieved context to generate answers

To perform the similarity search in the vector database

To convert user queries into embeddings

The system prompt instructs the LLM on how to use the retrieved context to generate accurate, grounded answers and may include formatting guidelines and constraints.

32

Unit 1: Basic AI Fundamentals

Lec1

Answer Relevance

What does ‘Answer Relevance’ measure in RAG evaluation?

Medium

1

C

How fast the system generates responses

The accuracy of the embedding model

How well the generated answer addresses the user’s original question

The number of retrieved documents used

Answer Relevance measures how well the generated answer addresses the user’s original question, ensuring the response is pertinent and useful.

33

Unit 1: Basic AI Fundamentals

Lec1

Context Window

What limitation does the ‘context window’ impose on RAG systems?

Hard

1

D

The maximum number of documents that can be stored

The time limit for generating responses

The minimum similarity score for retrieval

The maximum amount of text that can be processed by the LLM at once

The context window limits the maximum amount of text (retrieved chunks + query + system prompt) that can be processed by the LLM at once, requiring careful management of chunk sizes.
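Managing the context window typically means greedily packing the top-ranked chunks until the budget runs out. This is an illustrative sketch: the `len(text) // 4` token estimate and the `overhead_tokens` reserve are rough assumptions; real systems use the model's actual tokenizer.

```python
def pack_context(chunks: list[str], budget_tokens: int, overhead_tokens: int = 200) -> list[str]:
    """Greedily pack ranked chunks into the LLM context window.
    Token counts are roughly estimated as len(text) // 4 (an assumption;
    use the model's tokenizer in practice)."""
    remaining = budget_tokens - overhead_tokens  # reserve room for query + system prompt
    packed = []
    for chunk in chunks:
        cost = len(chunk) // 4
        if cost > remaining:
            break  # next chunk no longer fits
        packed.append(chunk)
        remaining -= cost
    return packed
```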

34

Unit 1: Basic AI Fundamentals

Lec1

Metadata Filtering

What is the benefit of using metadata filtering in RAG retrieval?

Medium

1

A

Narrow down search results based on document attributes before semantic search

Increase the size of the vector database

Speed up the embedding generation process

Reduce the cost of LLM API calls

Metadata filtering allows narrowing down search results based on document attributes (date, source, category) before or during semantic search, improving retrieval precision.

35

Unit 1: Basic AI Fundamentals

Lec1

Hallucination Prevention

Which technique helps prevent hallucination in RAG systems by ensuring answers are grounded in retrieved content?

Hard

1

B

Increasing the temperature parameter

Instructing the LLM to only use information from the provided context

Using larger embedding dimensions

Reducing the Top-K value to 1

Instructing the LLM through the system prompt to only use information from the provided context and to say “I don’t know” when information is not available helps prevent hallucination and ensures answers are grounded in retrieved content.
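The grounding instruction described above is usually implemented as a prompt template. The wording below is illustrative, not a canonical prompt:

```python
def build_grounded_prompt(context_chunks: list[str], question: str) -> str:
    """Build a prompt instructing the LLM to answer only from the
    retrieved context, and to admit when the answer is not present."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```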


RAG Optimization Theory#

Exam Theory: RAG and Optimization#

This exam theory focuses on assessing advanced topics within Retrieval-Augmented Generation (RAG) and its optimization techniques, drawing specifically from Advanced Indexing, Hybrid Search, Query Transformation, Post-Retrieval Processing, and GraphRAG Implementations.

No.

Training Unit

Lecture

Training content

Question

Level

Mark

Answer

Answer Option A

Answer Option B

Answer Option C

Answer Option D

Explanation

1

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What is a major disadvantage of fixed-size chunking when applied to large amounts of documents?

Easy

1

A

It causes a loss of semantics by breaking ideas arbitrarily.

It is too computationally expensive.

It prevents vector search from indexing numbers.

It requires advanced linguistic models to parse.

Mechanical chunking arbitrarily breaks the flow of the text, making the LLM unable to understand the context when an idea is split mid-thought.

2

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

Why does Brute-force Flat Indexing become a serious problem as a system scales?

Easy

1

B

It consumes too much disk space.

It causes high latency when sequentially scanning millions of vectors.

It is incompatible with neural network architectures.

It only supports English text.

Sequentially scanning through millions of vectors in a Flat Index is too slow to meet real-time requirements.

3

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What is the core idea driving Semantic Chunking?

Medium

1

C

To chunk text strictly by paragraph breaks.

To split texts after exactly 1000 characters.

To detect shifts to a new topic and perform a break precisely at the intersection of two topics.

To summarize the text before splitting it.

Semantic Chunking detects when sentences or content shift to a new topic (when vector direction abruptly changes) to perform a break.

4

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What metric is typically calculated between consecutive sentences during Semantic Chunking?

Medium

1

A

Cosine similarity

Word count ratio

Token frequency

Character limits

In Semantic Chunking, the similarity (for example cosine similarity) is calculated between the current sentence and the next one.

5

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

In Semantic Chunking, when does the algorithm decide to split the text?

Medium

1

D

When similarity is above 90%.

After a fixed number of punctuation marks.

When the sentence length exceeds the threshold.

When similarity drops significantly below a threshold.

If similarity drops significantly below the threshold, it means the topic has changed, breaking the chunk there.
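The split rule from questions 3-5 can be sketched as follows. Sentence embeddings are assumed to come from an external embedding model, and the `threshold` value is illustrative:

```python
import math

def semantic_chunks(sentences: list[str], embeddings: list[list[float]],
                    threshold: float = 0.5) -> list[list[str]]:
    """Break between consecutive sentences when their cosine similarity
    drops below `threshold`, signalling a topic shift."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cos(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append(current)  # topic changed: break the chunk here
            current = []
        current.append(sentences[i])
    chunks.append(current)
    return chunks
```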

6

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What is a notable advantage of Semantic Chunking over Recursive Chunking?

Medium

1

B

It runs extremely fast.

It preserves ideas fully and perfectly follows the flow of text.

It does not consume any computational resources.

It is specifically designed for codebases.

Semantic Chunking preserves ideas fully, strictly follows the text flow, and increases accuracy when searching.

7

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What is a major disadvantage of Semantic Chunking?

Easy

1

C

It cuts through important ideas frequently.

It returns very noisy contexts.

It consumes computational resources due to running a model to compare each sentence.

It only works for legal or contract documents.

Because it must run an ML model to compare the similarity of each consecutive sentence, it consumes computational resources.

8

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What does HNSW stand for in the context of Vector Databases?

Easy

1

A

Hierarchical Navigable Small World

High Neural State Weights

Heuristic Node Searching Window

Hierarchical Numeric Sequence Word

HNSW stands for Hierarchical Navigable Small World, an effective algorithm balancing retrieval speed and accuracy.

9

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What kind of data structure does HNSW organize data into?

Medium

1

C

A flat SQL table

A chronological file system

A multi-layered graph structure

A raw byte stream

HNSW organizes data in the form of a multi-layered graph structure utilizing short and long shortcut links.

10

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

In HNSW, what is the role of Layer 0?

Medium

1

D

It contains the shortest summary of the dataset.

It stores the sparse shortcut links.

It is empty and serves as a placeholder.

It contains all data points and the most detailed links between them.

Layer 0 contains all data points and the most detailed links between them; it holds the most complete information needed to find the exact target.

11

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

What does parameter M (Max Links per Node) dictate in HNSW?

Hard

1

A

The maximum number of links a node can create with neighbor nodes.

The memory limit in megabytes.

The number of documents returned.

The margin of error allowed.

M specifies the maximum number of links a node can create with other neighbor nodes. The larger M is, the denser the network.

12

Unit 1: RAG and Optimization

Lec 1

Advanced Indexing

How should ef_search be configured for a real-time Chatbot application?

Hard

1

B

It should be set to 0.

It should be kept at a low level (e.g., 50-100) to optimize latency.

It should be set to maximum allowed bounds.

It should equal the total number of documents.

Keeping ef_search at a low level optimizes the system response time for a chatbot where small error margins are acceptable in favor of speed.

13

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What is an inherent weakness of standard Vector Search?

Easy

1

C

It lacks speed when processing basic synonyms.

It struggles with multilingual queries.

It reveals weaknesses when encountering queries requiring absolute accuracy in wording.

It ignores document meaning entirely.

Vector Search reveals weaknesses when processing queries requiring absolute accuracy (e.g., proper names, error codes).

14

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What exactly constitutes a Hybrid Search mechanism?

Easy

1

A

Combining the power of semantic vector search with traditional keyword search.

Merging structured and unstructured relational databases.

Running two identical LLMs simultaneously.

Compiling queries in both Python and Java.

Hybrid search combines semantic search (Vector) and traditional keyword search (BM25).

15

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

Which keyword frequency-based statistical algorithm is standard for Hybrid Search?

Easy

1

D

BERT

HNSW

HyDE

BM25

BM25 is the gold standard for traditional keyword retrieval algorithms in Hybrid Search.

16

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

How does BM25 solve the keyword spamming problem found in TF-IDF?

Medium

1

B

By manually blacklisting frequent spammers.

By applying a saturation mechanism where scoring asymptotes after several keyword occurrences.

By analyzing the semantic meaning of repetitive words.

By deleting any document that repeats a word.

BM25 applies a saturation mechanism: a keyword's 101st occurrence adds barely more score than its 10th.

17

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What does Inverse Document Frequency (IDF) do in the BM25 formula?

Medium

1

A

It penalizes common words and massively rewards rare words.

It ranks shorter documents higher than longer ones.

It limits the number of query words sent to the server.

It inverses the vectors created by the model.

IDF penalizes common words heavily while attributing more importance and score weight to rare words.

18

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

Why is Length Normalization an important feature of BM25?

Medium

1

C

It forces all documents to be exactly 1000 characters.

It compresses long queries to save bandwidth.

A single keyword in a short paragraph gets rated higher than the same keyword diluted in a long novel.

It converts all characters to lowercase.

BM25 scales the score based on document length to prevent long documents from unfairly dominating over concise information.
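The three BM25 ingredients from questions 16-18 (IDF, term-frequency saturation via k1, and length normalization via b) combine in one scoring formula. A minimal sketch for scoring a single document; the k1 and b defaults are common conventions, and the term-list representation is a simplifying assumption:

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len,
               k1=1.5, b=0.75):
    """BM25 score of one document (as a list of terms) for a query.
    - IDF rewards rare terms and penalizes common ones.
    - The k1 saturation term caps the benefit of repeating a keyword.
    - The b term normalizes for document length vs. the corpus average."""
    score = 0.0
    dl = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)  # number of documents containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score
```

Note how repeating a keyword ten times more yields only a slightly higher score, which is exactly the anti-spamming saturation behaviour described above.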

19

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

In a typical Hybrid Search pipeline, how are the two algorithms executed?

Medium

1

D

Vector search completes first, then BM25 is run on the results.

BM25 runs entirely locally before running Vector remotely.

Only one is executed depending on a query classifier.

They are executed in parallel simultaneously.

The system sends the query simultaneously to both search engines (Parallel Execution).

20

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

Why can’t we simply add the BM25 score and the Vector Search score together?

Hard

1

B

Vector search scores are negative integers.

The scoring scales are fundamentally different (Vector uses [0, 1] cosine similarity; BM25 is arbitrary positive numbers).

They are processed on different neural network architectures.

BM25 produces alphabetical grading ranges.

The scoring scales of the two algorithms are completely different, so their raw scores cannot be combined directly.

21

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What algorithm solves the score compatibility issue in Hybrid Search?

Medium

1

C

GraphRAG Convolution

Maximal Marginal Relevance

Reciprocal Rank Fusion (RRF)

TF-IDF Smoothing

Reciprocal Rank Fusion (RRF) merges these two lists effectively.

22

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

Upon what theoretical basis does Reciprocal Rank Fusion (RRF) operate?

Hard

1

A

Instead of scores, it assumes that if a document appears at a high rank in both lists, it is certainly important.

It averages the raw text chunks of both documents.

It only accounts for the longest document.

It uses an LLM to assign arbitrary ranks.

RRF cares about rank rather than score; a high consensus of rank across disparate algorithms signifies an important document.

23

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What is the purpose of the smoothing constant k within the RRF formula?

Hard

1

D

It identifies the number of total documents in the database.

It sets the maximum allowed token count.

It determines the strictness of exact keyword matching.

It helps reduce score disparity between very high ranks, ensuring fairness.

The constant k (usually 60) reduces massive score disparities between adjacent high ranks (such as Top 1 vs. Top 2), ensuring a smoother gradient of rank scoring.
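The RRF scoring described in questions 21-23 is short enough to implement directly: each document earns 1 / (k + rank) from every list it appears in, and the fused ranking sorts by that sum.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists by rank position rather than raw score:
    RRF(d) = sum over lists of 1 / (k + rank(d)).
    k (commonly 60) smooths the gap between adjacent top ranks."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both BM25 and vector search rises to the top even though the two engines' raw scores were never comparable.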

24

Unit 1: RAG and Optimization

Lec 2

Hybrid Search

What does Hybrid Search primarily sacrifice to gain balanced Context and Keyword accuracy?

Easy

1

B

Security and Privacy

System resources, as it is complex to deploy and consumes resources running 2 parallel streams.

API documentation clarity

Multi-lingual support

Hybrid Search is more complex to deploy and consumes more resources due to running parallel streams simultaneously.

25

Unit 1: RAG and Optimization

Lec 3

Query Transformation

Why do raw user questions often yield poor Vector Search results natively?

Easy

1

C

LLMs cannot read unformatted text.

Vector databases reject single words.

Questions are short/interrogative, lacking context compared to long descriptive documents.

Search algorithms intentionally delay short queries.

Vector Search faces semantic asymmetry; questions are short and interrogative while documents are long and descriptive.

26

Unit 1: RAG and Optimization

Lec 3

Query Transformation

What is the core idea of Query Transformation?

Easy

1

A

Using an LLM to rewrite, expand, or break down the user’s question into better versions before searching.

Encrypting user queries before transmission.

Replacing semantic searches with strict SQL SELECT queries.

Running the user’s prompt through a grammar checker.

It uses an LLM to intelligently edit, expand, or rewrite poor raw queries before sending them to the retrieval component.

27

Unit 1: RAG and Optimization

Lec 3

Query Transformation

What does HyDE stand for in Query Transformation?

Medium

1

B

Heavy Yield Database Execution

Hypothetical Document Embeddings

Hybrid Y-axis Dense Encapsulation

Hex-layered Data Encryption

HyDE stands for Hypothetical Document Embeddings.

28

Unit 1: RAG and Optimization

Lec 3

Query Transformation

What happens during the “Generate” phase of a HyDE strategy?

Medium

1

D

It generates Python scripts.

It generates a dense vector representing the question.

It generates an index mapping inside the SQL table.

The system asks the LLM to write a hypothetical answer paragraph for the user’s question.

The LLM is forced to draft a fake, hypothetical answer for the question so it matches the expected document vocabulary.

29

Unit 1: RAG and Optimization

Lec 3

Query Transformation

Does the hypothetical “fake answer” drafted in HyDE need to be factually correct?

Hard

1

B

Yes, exact factual accuracy guarantees precise matches.

No, but the writing style and technical vocabulary should resemble the actual document.

Yes, the model refuses to output hallucinated responses.

No, it just generates a sequence of random numbers.

The information in the paragraph might be factually incorrect, but its style and technical vocabulary mimic real documents to enable better semantic matching.

30

Unit 1: RAG and Optimization

Lec 3

Query Transformation

Why is the vector generated from the “fake answer” in HyDE more useful than the user’s question vector?

Medium

1

A

The fake answer vector is semantically closer to the real document vector than the short interrogative question vector.

It consumes 0 RAM.

It maps perfectly to sparse BM25 arrays.

The user’s query vector is permanently deleted.

The drafted answer contains similar sentence structures/buzzwords to real documents, closing the asymmetric semantic gap.
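The three HyDE phases covered in questions 27-30 (generate a fake answer, embed it, search with its vector) fit in a few lines. This is a sketch only: `generate`, `embed`, and `vector_db` are hypothetical stand-ins for an LLM call, an embedding model, and a vector store.

```python
def hyde_search(question: str, generate, embed, vector_db) -> list[str]:
    """HyDE sketch: search with the vector of a hypothetical answer
    instead of the vector of the short interrogative question."""
    # 1. Generate: draft a fake answer. It may be factually wrong, but its
    #    style and vocabulary mimic real documents.
    fake_answer = generate(f"Write a short passage answering: {question}")
    # 2. Embed: vectorize the draft, not the original question.
    query_vector = embed(fake_answer)
    # 3. Search: the draft's vector lands closer to real document vectors,
    #    closing the asymmetric semantic gap.
    return vector_db.search(query_vector)
```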

31

Unit 1: RAG and Optimization

Lec 3

Query Transformation

When is the Query Decomposition strategy particularly useful?

Medium

1

C

When querying single words.

When parsing simple FAQ menus.

When a question requires comparing or aggregating information from multiple independent scattered sources.

When reading codebases in completely unknown programming languages.

It handles complex multi-intent questions comparing or gathering data from multiple sources where a single text snippet fails to contain the whole answer.

32

Unit 1: RAG and Optimization

Lec 3

Query Transformation

What happens during the first phase (Breakdown) of Query Decomposition?

Medium

1

A

The LLM analyzes the original question and splits it into a sequence of separate independent sub-questions.

The system shreds the database documents into chunks.

The LLM provides the final answer immediately without searching.

The database is partitioned across multiple distinct servers.

The system identifies multi-intent questions and logically breaks them into single-intent targeted sub-questions.

33

Unit 1: RAG and Optimization

Lec 3

Query Transformation

How does Query Decomposition run searches for multiple sub-questions?

Medium

1

B

It merges all sub-questions back into one query.

It performs standard document searches individually for each separate sub-question.

It relies exclusively on cached external queries.

It skips queries containing conjunctions.

It executes distinct targeted retrieval queries for every identified independent sub-question.

34

Unit 1: RAG and Optimization

Lec 3

Query Transformation

Which phase of Query Decomposition requires the LLM to process text found from all separate sub-searches?

Easy

1

C

Breakdown

Encapsulation

Synthesis

Verification

In Synthesis, text segments found from all previous distinct steps are aggregated and fed into the LLM to form a complete final answer.

35

Unit 1: RAG and Optimization

Lec 3

Query Transformation

In summary, what role does Query Transformation act as?

Easy

1

D

An internet firewall proxy.

A database administrator deleting old records.

A compiler translating queries to binary.

An intelligent editor reorienting questions to ensure the system correctly understands true intent.

It performs intelligent preprocessing (via drafting or splitting) so concise or poor user queries execute properly against the technical index.
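The three-phase Query Decomposition pipeline covered in the questions above can be sketched as follows. All three inner functions are hypothetical stubs (a real system would use an LLM for `breakdown` and `synthesize`, and a retriever for `retrieve`); only the Breakdown → Search → Synthesis control flow is the point.

```python
# Query Decomposition sketch: Breakdown -> per-sub-question Search -> Synthesis.

def breakdown(question: str) -> list[str]:
    """Stub for the LLM Breakdown phase: split a multi-intent question
    into independent single-intent sub-questions (naive split on ' and ')."""
    return [part.strip() + "?" for part in question.rstrip("?").split(" and ")]

def retrieve(sub_question: str) -> str:
    """Stub retriever: one standard document search per sub-question."""
    return f"[doc snippet for: {sub_question}]"

def synthesize(question: str, snippets: list[str]) -> str:
    """Stub for the LLM Synthesis phase: aggregate all retrieved text."""
    return f"Answer to '{question}' based on {len(snippets)} snippets."

def decompose_and_answer(question: str) -> str:
    subs = breakdown(question)              # Phase 1: Breakdown
    snippets = [retrieve(s) for s in subs]  # Phase 2: independent searches
    return synthesize(question, snippets)   # Phase 3: Synthesis
```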

36

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

Why is the Top-K list returned directly from standard retrievers often suboptimal for an LLM?

Medium

1

A

Standard embedding models trade deep semantic accuracy for retrieval speed, and may return contextually incorrect “noisy” keyword matches.

The returned list is usually empty.

The standard top-K size is too large for modern hardware.

The returned documents are always translated to a random language.

Embedding models heavily prioritize index speed over complex relationship comprehension, often returning documents with matching keywords but wrong contextual intents.

37

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What represents the main goal of Re-ranking in a RAG pipeline?

Easy

1

C

To randomly shuffle the document list.

To format the output HTML for the frontend.

To act as a final filter processing a small pool of candidates to pick the absolutely best ones.

To permanently alter the dataset ordering.

Re-ranking takes a small pool (like 50) and spends extra computational time reading them carefully to pick the top 5 highest-quality documents.

38

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What architectural method do standard Embedding Models use during the Retrieval step?

Medium

1

B

Graph-Encoder

Bi-Encoder

Cross-Encoder

Recursive-Encoder

Retrieval embeddings process questions and documents separately via Bi-Encoders.

39

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What are the major pros and cons of the Bi-Encoder architecture?

Hard

1

A

Fast speed (via pre-computation), but loses detailed nuanced interaction information between question and document words.

Extreme accuracy, but consumes too much API quota.

Perfectly handles complex negations, but fails at simple keywords.

It guarantees data privacy, but prevents external web searches.

Because the vectors are calculated independently ahead of time, it runs fast but misses deeper interrelated context (like negations vs subjects).

40

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

How does a Cross-Encoder fundamentally differ from a Bi-Encoder?

Hard

1

D

It translates everything into Spanish.

It maps vectors onto a graph database exclusively.

It bypasses the attention mechanism entirely.

The question and document are concatenated into a single text sequence, processed simultaneously via a full Self-Attention mechanism.

Instead of separated outputs, Cross-Encoders read both strings concurrently to understand complex logic, negation, and interactions between all words simultaneously.

41

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

If Cross-Encoders are incredibly accurate, why don’t we use them to search the entire database?

Medium

1

C

They cannot run on GPUs.

They only output integers.

They are very slow and resource-consuming to run across millions of documents.

They are blocked by vector database protocols.

Processing millions of documents concurrently through strict Self-Attention is too computationally slow.

42

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What describes the Funnel Strategy in Post-Retrieval?

Medium

1

B

Running Bi-Encoder and Cross-Encoder on separate clusters entirely.

Using Bi-Encoder to fast-retrieve a Top 50, then using Cross-Encoder to slowly re-score those 50 into a Top 5.

Splitting documents into smaller funnels based on character limits.

Re-ranking the vector database before queries arrive.

The funnel strategy combines the speed of Bi-Encoders (finding 50 candidates) with the precision of Cross-Encoders (filtering down to 5).
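The two-stage funnel can be sketched with stub scorers. Both scoring functions here are hypothetical stand-ins: a real pipeline would use a Bi-Encoder embedding similarity for stage 1 and a Cross-Encoder model (reading query and document as one sequence) for stage 2.

```python
# Funnel Strategy sketch: cheap stage-1 scoring narrows the corpus to a pool,
# then an expensive stage-2 scorer re-ranks only that pool.

def bi_score(query: str, doc: str) -> float:
    """Stub bi-encoder: fast, independent scoring (word overlap here)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cross_score(query: str, doc: str) -> float:
    """Stub cross-encoder: 'reads' query and doc jointly; here we simply
    reward documents containing the query as a phrase."""
    return 1.0 if query.lower() in doc.lower() else bi_score(query, doc)

def funnel(query: str, docs: list[str], pool: int = 50, top: int = 5) -> list[str]:
    # Stage 1: fast retrieval of a candidate pool (e.g. Top 50).
    candidates = sorted(docs, key=lambda d: bi_score(query, d), reverse=True)[:pool]
    # Stage 2: slow, precise re-scoring of only that small pool (e.g. Top 5).
    return sorted(candidates, key=lambda d: cross_score(query, d), reverse=True)[:top]
```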

43

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

In scenarios dealing with biological negation (e.g., “What does Python NOT eat”), why does a Cross-Encoder succeed where a Bi-Encoder fails?

Hard

1

A

The Cross-Encoder recognizes the negation structure and biological context perfectly since it reads the query and document concurrently.

The Cross-Encoder has a specialized biology database pre-installed.

The Bi-Encoder deletes the word “NOT”.

The Cross-Encoder ignores keywords entirely.

Bi-Encoders mistakenly link the keywords “Python” and “eat”, while Cross-Encoders accurately recognize the negation modifier mapping to the biological logic.

44

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What does MMR stand for in the context of Post-Retrieval processing?

Medium

1

D

Minimum Marginal Rating

Multi-Model Retrieval

Memory Mapping Resolution

Maximal Marginal Relevance

MMR stands for Maximal Marginal Relevance, an algorithm used to diversify query results.

45

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

What twofold problem does MMR aim to solve when selecting final documents?

Medium

1

B

Size vs Compression

Relevance to the query vs Diversity to prevent identical redundant documents.

API Latency vs Local Storage

Token allowance vs Security constraints

When similarity search returns 5 nearly identical paragraphs of text, MMR resolves the redundancy by ensuring selected documents are relevant but distinctly diverse.

46

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

In the MMR algorithm, what occurs after picking the most similar document (Step 1)?

Hard

1

C

The system clears the cache.

The system returns immediately.

It finds the next document similar to the query but least similar to previously selected documents.

It picks the document that is completely irrelevant to the query.

Step 2 balances relevance by filtering for the next document containing the query’s answer but differing heavily from the document already selected.

47

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

In the MMR optimization formula, what does lowering lambda (\(\lambda\)) do?

Hard

1

A

Prioritizes diversity by increasing the penalty for selecting text similar to existing selected documents.

Causes the system to crash.

Forces exact keyword matching.

Elevates relevance entirely over diversity.

Decreasing lambda gives more mathematical weight to the diversity penalty term of the MMR formula, forcing varied information.
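The MMR selection loop from questions 46–47 can be written out directly. The similarity function below is a hypothetical word-overlap stub (a real system uses cosine similarity over embeddings); the selection logic follows the standard MMR objective, with `lambda_` trading relevance against the diversity penalty.

```python
# MMR sketch. Greedy selection per:
#   MMR = argmax_d [ lambda * sim(d, query) - (1 - lambda) * max_s sim(d, s) ]
# where s ranges over already-selected documents.

def sim(a: str, b: str) -> float:
    """Stub similarity: word-overlap ratio (real systems use cosine similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def mmr(query: str, docs: list[str], k: int = 2, lambda_: float = 0.5) -> list[str]:
    selected: list[str] = []
    remaining = list(docs)
    while remaining and len(selected) < k:
        def score(d: str) -> float:
            # Penalty: similarity to the closest already-selected document.
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lambda_ * sim(d, query) - (1 - lambda_) * redundancy
        best = max(remaining, key=score)  # Step 1 picks pure relevance;
        selected.append(best)             # later steps penalize redundancy.
        remaining.remove(best)
    return selected
```

With a lower `lambda_`, the redundancy penalty dominates and the second pick diverges further from the first — exactly the behavior described above.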

48

Unit 1: RAG and Optimization

Lec 4

Post-Retrieval

If a user asks a broad question (“Features of VF8 Car”) and wants comprehensive overall coverage, which Re-ranker is optimal?

Medium

1

C

Flat Indexing

Recursive Chunking

Maximal Marginal Relevance (MMR)

Simple Bi-Encoder similarity

MMR guarantees diverse, non-redundant documents giving the LLM text detailing multiple broad vehicle features, not just repeated text about its engine.

49

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What does GraphRAG combine to create a comprehensive knowledge representation system?

Easy

1

B

Cloud storage and Edge devices

Structured graph databases with vector-based retrieval

Dense and Sparse chunking limits

Hybrid APIs and NoSQL mappings

GraphRAG merges structured graph DBs (like Neo4j) and vector retrieval.

50

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What popular graph database is used for storing GraphRAG entities in the implementation example?

Easy

1

A

Neo4j

PostgreSQL

ElasticSearch

MongoDB

Neo4j is utilized to construct and store the nodes and relationship graphs.

51

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What is the purpose of Pydantic models in the implementation pipeline?

Medium

1

D

To render the Neo4j visualization frontend.

To manage API timeout failures.

To download PDF files correctly.

To enforce validation schemas for structured entity/relationship output from the LLM.

Pydantic classes like PolicyClauseExtraction compel the LLM to output consistent, strictly validated object types representing entities.

52

Unit 1: RAG and Optimization

Lec 5

GraphRAG

According to the implementation extraction rules, what constitutes a “commitment”?

Medium

1

C

Simple definitions and jargon.

Any sentence ending in a period.

A clear promise, obligation, or prohibition found in the text.

A numeric calculation executed by the CPU.

The LLM is instructed to identify clear promises, obligations, or prohibitions as Commitments.

53

Unit 1: RAG and Optimization

Lec 5

GraphRAG

How are measurable numeric limits inside obligations handled during extraction?

Hard

1

D

They are discarded mathematically.

They are summed together.

They are sent to a calculator API.

They are explicitly extracted as Constraint unit parameters.

If a commitment contains numeric limits, the agent extracts them strictly as linked Constraints.

54

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What does the .with_structured_output(PolicyClauseExtraction) method achieve in LangChain?

Medium

1

A

Forces the LLM to reply via JSON adhering precisely to the Pydantic schema class.

Translates the output into Neo4j graph visualizations natively.

Prevents the model from reading files.

Outputs Python code running in a sandbox.

It guarantees the unstructured text processed by the ChatGPT API is accurately deserialized back into structured PolicyClauseExtraction objects.
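The schema-enforcement idea behind `.with_structured_output(PolicyClauseExtraction)` can be illustrated without the LangChain dependency. The class below is a simplified dataclass stand-in for the Pydantic model — the field names and validation rules are illustrative, not the course's actual schema.

```python
# Simplified stand-in for a Pydantic extraction schema: a class that rejects
# malformed entity output. The real pipeline uses a Pydantic model passed to
# .with_structured_output(), which forces the LLM's JSON to match the schema.

from dataclasses import dataclass, field

@dataclass
class PolicyClauseExtraction:
    clause_text: str
    commitments: list[str] = field(default_factory=list)  # promises/obligations/prohibitions

    def __post_init__(self):
        # Validation mimicking what Pydantic enforces automatically.
        if not self.clause_text.strip():
            raise ValueError("clause_text must be non-empty")
        if not all(isinstance(c, str) for c in self.commitments):
            raise ValueError("commitments must be strings")
```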

55

Unit 1: RAG and Optimization

Lec 5

GraphRAG

In the designed graph schema, what do PolicyClause nodes specifically track?

Easy

1

C

The user identities processing the data.

The hardware metrics.

The overarching policy topics/units from chunked texts.

The exact numeric values from commitments.

PolicyClause nodes store the actual chunked policy texts/topics serving as central nodes linking other entities.

56

Unit 1: RAG and Optimization

Lec 5

GraphRAG

In Cypher (Neo4j), which operation ensures duplicate nodes are not created during ingestion?

Medium

1

B

INSERT IGNORE

MERGE

UPSERT

ADD DISTINCT

The MERGE clause checks for an existing match before creating a node, preventing duplicated nodes.
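A minimal sketch of idempotent ingestion with MERGE: the function builds a parameterized Cypher query (the node label and property names are illustrative). Executing it would require a live Neo4j driver session, shown only as a comment.

```python
# Sketch: build a parameterized Cypher MERGE query for idempotent ingestion.
# MERGE creates the PolicyClause node only if one with the same id is absent.

def merge_clause_query(clause_id: str, topic: str) -> tuple[str, dict]:
    query = (
        "MERGE (c:PolicyClause {id: $clause_id}) "
        "SET c.topic = $topic"
    )
    return query, {"clause_id": clause_id, "topic": topic}

# With the official neo4j driver this would run as (not executed here):
#   with driver.session() as session:
#       query, params = merge_clause_query("c1", "data retention")
#       session.run(query, **params)
```

Running the same ingestion twice leaves a single node per id, which is why MERGE (rather than CREATE) is the answer above.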

57

Unit 1: RAG and Optimization

Lec 5

GraphRAG

How are Stakeholder nodes structurally linked in the Neo4j graph?

Hard

1

A

Via the AFFECTS relationship incoming from the PolicyClause node.

Via a standalone IS_A class instance mapping.

Via CONTAINS relationships stemming from Regulation nodes.

They are completely unlinked.

Stakeholder nodes reflect affected parties, mapped using [:AFFECTS] from the PolicyClause.

58

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What represents a distinct advantage of GraphRAG over standard vector similarity search?

Medium

1

B

It consumes zero system memory.

Relationships explicitly define how entities connect, solving queries needing context-aware traversal mapping.

It requires no chunking.

It automatically resolves grammatical mistakes.

Graph traversal natively exposes how discrete entities explicitly connect, answering intricate logical queries that vector distances alone cannot deduce.

59

Unit 1: RAG and Optimization

Lec 5

GraphRAG

Which LangChain module converts natural language into Cypher queries for the LLM?

Medium

1

A

GraphCypherQAChain

VectorDBQAChain

PydanticOutputParser

DocumentConverter

GraphCypherQAChain converts English questions into Cypher code capable of traversing the graph structure.

60

Unit 1: RAG and Optimization

Lec 5

GraphRAG

What is noted as a core limitation or consideration when implementing GraphRAG?

Medium

1

D

It deletes all prior indexes upon restart.

It requires user authentication before every search.

The LLM must be hosted locally.

It relies heavily on specific types of structured data linking to form an effective knowledge base.

GraphRAG’s power originates strictly from highly structured data mappings; mapping unstructured erratic data yields poor relationships.


LangGraph and Agentic AI Theory#

Final Exam#

No.

Training Unit

Lecture

Training content

Question

Level

Mark

Answer

Answer Option A

Answer Option B

Answer Option C

Answer Option D

Explanation

1

LangGraph & Agentic AI

Lec1

State Management

What is the core field used for ALL input/output from nodes in a LangGraph State?

Easy

1

C

context

history

messages

state_vars

The messages field is the core channel for all conversational I/O between nodes in LangGraph.

2

LangGraph & Agentic AI

Lec1

State Management

Which concept allows LangGraph to support complex workflows compared to standard LangChain chains?

Easy

1

B

Linear flows only

Cyclic flows and conditional routing

Stateless operations

Basic sequential pipelines

Extends basic chains with cyclic flows and conditional routing for loops / complex logic.

3

LangGraph & Agentic AI

Lec1

State Management

What is the role of add_messages reducer in a TypedDict State?

Easy

1

A

Appending new messages and handling deduplication

Deleting old messages automatically

Summarizing long conversations

Replacing the current message list with a new one

add_messages automatically appends new messages and handles deduplication via message IDs.

4

LangGraph & Agentic AI

Lec1

State Management

Which of the following is NOT a standard LangChain message type used in LangGraph?

Easy

1

D

AIMessage

HumanMessage

ToolMessage

DataMessage

Standard types are AIMessage, HumanMessage, SystemMessage, ToolMessage. DataMessage is not standard.

5

LangGraph & Agentic AI

Lec1

State Management

In LangGraph’s State structure, what should non-conversational context like user_id or max_iterations be used for?

Easy

1

B

Sent directly to the LLM response

Storing configuration and metadata

Replacing the standard message history

Caching LLM tokens

Context fields are meant for metadata and configuration, not standard I/O messages.

6

LangGraph & Agentic AI

Lec1

State Management

Which object serves as the core director engine orchestrating LLM workflows in LangGraph?

Easy

1

D

MessageGraph

GraphPipeline

WorkflowGraph

StateGraph

StateGraph is the core class orchestrating directed graph workflows based on state.

7

LangGraph & Agentic AI

Lec1

State Management

How does LangGraph handle context injection before starting the graph execution?

Medium

1

C

By loading it from an external JSON file automatically.

By sending a special SystemMessage at the end of the conversation.

By initializing the state with context variables when calling app.invoke(initial_state).

Context cannot be injected; the LLM must generate it.

Context is provided to app.invoke() alongside initial messages.

8

LangGraph & Agentic AI

Lec1

State Management

When building a multi-agent system, how do different agents (nodes) share findings with one another in a messages-centric pattern?

Medium

1

A

By appending AIMessage tagged with their name to the group’s messages list.

By modifying the global context object directly.

By resetting the messages list every time an agent switches.

By sending direct peer-to-peer API calls bypassing the state.

Agents append named AIMessages to the shared state’s messages list.

9

LangGraph & Agentic AI

Lec1

State Management

What is the primary purpose of adding nodes and edges to a StateGraph object?

Medium

1

D

To train a new deep learning model.

To clean the data before input into a LangChain chain.

To replace the standard LLM reasoning layers.

To map out functions as nodes and execution paths as edges.

Nodes represent functions/agents; edges dictate the workflow paths and conditionals.

10

LangGraph & Agentic AI

Lec1

State Management

If an LLM node returns {"messages": [AIMessage("Hello")]} without the add_messages reducer setup, what happens to the state?

Medium

1

B

It merges the new message safely.

It overwrites the existing message list.

It throws a syntax error.

It drops the message entirely.

Without a reducer like add_messages, standard dictionary update behavior would overwrite the list rather than append.
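The overwrite-vs-append distinction can be shown with a simplified stand-in for the reducer. This is not LangGraph's actual `add_messages` implementation — just a dependency-free model of its append-and-dedupe-by-id behavior, contrasted with plain dict update.

```python
# Simplified stand-in for LangGraph's add_messages reducer: append new
# messages, replacing any existing message that shares an id.

def add_messages(existing: list[dict], new: list[dict]) -> list[dict]:
    by_id = {m["id"]: m for m in existing}
    for m in new:
        by_id[m["id"]] = m  # same id -> replace (dedupe); new id -> append
    return list(by_id.values())

state = {"messages": [{"id": "1", "content": "Hi"}]}
update = {"messages": [{"id": "2", "content": "Hello"}]}

# Without a reducer: standard dict update overwrites the whole list.
overwritten = {**state, **update}
# With the reducer: the new message is appended to the history.
merged = {"messages": add_messages(state["messages"], update["messages"])}
```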

11

LangGraph & Agentic AI

Lec1

State Management

According to LangGraph Best Practices, why should conversational data (I/O) be kept strictly in messages while keeping context fields separate?

Hard

1

B

Because LangChain parsers crash if state contains integers.

It enables robust State Persistence (Checkpointers) which rely on deterministic, append-only message histories.

It saves tokens directly since context fields are automatically hidden from the LLM.

Context fields are only valid in the END node.

Checkpointers reconstruct and replay the state efficiently when conversational history relies on the standardized, append-only messages slice.

12

LangGraph & Agentic AI

Lec1

State Management

How can conditional routing leverage the State to decide whether to call a tool or end the workflow?

Hard

1

A

By inspecting state["messages"][-1] to check for tool_calls attributes.

By manually polling an external database at every node.

By counting the number of characters in the previous AIMessage.

By throwing an exception when the state is exhausted.

The conditional edge function looks at the last message to see if the LLM populated tool_calls.
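A conditional-edge router of this shape can be sketched as below. The message objects are plain stand-ins for LangChain's `AIMessage`, and the `END` string stands in for LangGraph's `END` sentinel.

```python
# Sketch of a conditional edge: inspect the last message for tool_calls to
# decide between routing to the tool node or ending the workflow.

from types import SimpleNamespace

END = "__end__"  # stand-in for LangGraph's END sentinel

def route(state: dict) -> str:
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"   # the LLM requested a tool -> execute it
    return END           # no tool calls -> finish

# Hypothetical last messages for illustration:
with_call = SimpleNamespace(tool_calls=[{"name": "search"}])
plain = SimpleNamespace(tool_calls=[])
```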

13

LangGraph & Agentic AI

Lec2

Agentic Patterns

What does the ReAct pattern stand for in agentic workflows?

Easy

1

B

Refresh and Activate

Reason and Act

Respond and Acknowledge

Request and Action

ReAct combines explicit reasoning (Think) before acting (Tool Use) in a loop.

14

LangGraph & Agentic AI

Lec2

Agentic Patterns

Why is a Multi-Expert pattern generally preferred over a single generic web search tool for complex research?

Easy

1

A

It provides specialized domain knowledge and structured reasoning.

It uses fewer tokens.

It operates completely offline.

It requires zero prompt engineering.

Specialized LLMs acting as tools provide better domain insights and consistent reasoning.

15

LangGraph & Agentic AI

Lec2

Agentic Patterns

What is the purpose of the ToolNode in LangGraph?

Easy

1

D

To prompt the LLM to generate code.

To browse the internet using a headless browser.

To compress message history.

To automatically handle the parsing and execution of multiple tools.

ToolNode automatically executes the tools called by the LLM and formats them as ToolMessages.

16

LangGraph & Agentic AI

Lec2

Agentic Patterns

In a ReAct loop, what is the sequence of steps the coordinator LLM usually follows?

Easy

1

C

Act \(\to\) Think \(\to\) Stop

Observe \(\to\) Act \(\to\) Think

Think \(\to\) Act \(\to\) Observe

Stop \(\to\) Observe \(\to\) Think

The standard ReAct loop is: Think (Reason), Act (Call Tool), Observe (Tool Result), and Repeat.

17

LangGraph & Agentic AI

Lec2

Agentic Patterns

What is a common way to prevent an agent from getting trapped in an infinite ReAct loop?

Easy

1

B

Disabling all tools permanently.

Adding an iteration_count field in State and routing to END when a limit is reached.

Forcing the LLM to answer in 10 words or less.

Unplugging the server.

Checking an iteration limit in the conditional edge is best practice to stop runaway loops.
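The loop-guard pattern can be sketched as a pair of plain functions; node and field names here are illustrative, and the `END` string stands in for LangGraph's sentinel.

```python
# Sketch of a ReAct loop guard: a conditional edge routes to END once
# iteration_count in the State reaches a hard limit.

END = "__end__"  # stand-in for LangGraph's END sentinel
MAX_ITERATIONS = 5

def guard(state: dict) -> str:
    if state.get("iteration_count", 0) >= MAX_ITERATIONS:
        return END    # hard stop: prevents a runaway loop
    return "agent"    # otherwise continue the Think/Act/Observe cycle

def agent_node(state: dict) -> dict:
    # Each pass through the agent increments the counter in its state update.
    return {"iteration_count": state.get("iteration_count", 0) + 1}

# Simulated loop: runs the agent until the guard routes to END.
state = {"iteration_count": 0}
steps = 0
while guard(state) == "agent":
    state.update(agent_node(state))
    steps += 1
```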

18

LangGraph & Agentic AI

Lec2

Agentic Patterns

How do Multi-Expert Tools differ technically from standard external API tools (like web search) inside a LangGraph setup?

Easy

1

C

They don’t use the @tool decorator.

They execute JavaScript code.

They are themselves LLM invocations with specialized system prompts.

They bypass the messages state entirely.

Expert tools invoke another instance of an LLM primed with a specific expert persona.

19

LangGraph & Agentic AI

Lec2

Agentic Patterns

If an agent is deciding which expert to call during the “Act” phase, what enables the LLM to provide structured function calls automatically?

Medium

1

B

Regular Expressions parsing.

Using llm.bind_tools([expert1, expert2]).

Writing manual JSON format instructions in the prompt.

Training a custom fine-tuned router model.

bind_tools() maps the tool schema natively to the LLM’s function-calling capabilities.

20

LangGraph & Agentic AI

Lec2

Agentic Patterns

What is the main architectural upgrade introduced when adding a Planning Agent to a simple ReAct flow?

Medium

1

A

The Coordinator is relieved of analyzing the user’s initial message; a separate Planner handles decomposition first.

Tools are executed synchronously without LLM intervention.

The agent switches to using a completely different model provider.

State management is no longer required.

A Planner separates the complex task of understanding and task decomposition from the execution/coordinator task.

21

LangGraph & Agentic AI

Lec2

Agentic Patterns

During the “Observe” phase of standard ReAct with the LangGraph ToolNode, what specific message object is appended to the state?

Medium

1

D

SystemMessage

AIMessage

FunctionMessage

ToolMessage

After executing a tool, ToolMessages containing the tool output are returned to the state.

22

LangGraph & Agentic AI

Lec2

Agentic Patterns

What happens if multiple expert tools are called simultaneously by the Coordinator LLM?

Medium

1

B

They are ignored and skipped.

The ToolNode executes them in parallel and returns all their ToolMessages.

The graph crashes due to a concurrency error.

Only the first tool is executed.

Modern models can return multiple tool calls at once, which ToolNode handles naturally by executing them and appending all results.

23

LangGraph & Agentic AI

Lec2

Agentic Patterns

In a robust production-ready Multi-Expert Research agent, how should tool execution failures be handled?

Hard

1

D

By shutting down the LangGraph server.

By letting the unhandled exception crash the application so developers can debug.

By automatically switching model providers mid-workflow.

By catching the exception inside the tool or custom node and returning a ToolMessage stating the error, so the LLM can try a fallback.

Returning the error as a string message allows the Coordinator LLM to “Reason” about the failure and take alternative action.

24

LangGraph & Agentic AI

Lec2

Agentic Patterns

Why does a Multi-Expert ReAct pattern consume significantly more tokens than a simple linear agent?

Hard

1

C

Because it stores all memory in a vector database.

Because LangGraph adds a large metadata overhead to every variable.

The complete conversation history (messages list) including all intermediate reasoning and tool outputs must be sent back to the LLM upon every iteration.

Because expert LLMs generate longer responses to simple questions.

In ReAct loops, the context window grows each cycle, as new AIMessage and ToolMessage entries are appended and fed back in full on the next iteration.

25

LangGraph & Agentic AI

Lec3

Tool Calling

What is the main difference between traditional LLM prompts and Tool Calling capabilities?

Easy

1

D

Prompts use more tokens.

Tool Calling avoids external APIs.

Tool Calling is only available in open-source models.

Tool Calling enables the model to issue structured JSON parameters to invoke external code automatically.

Structured return formats from the LLM via defined JSON schemas are the core innovation in Tool Calling.

26

LangGraph & Agentic AI

Lec3

Tool Calling

Which terminology specifically refers to OpenAI’s native API parameter for passing a JSON schema?

Easy

1

A

Function Calling

Agentic Use

Execution Action

Tool Prompting

OpenAI specifically categorizes the schema object passing under “Function Calling.”

27

LangGraph & Agentic AI

Lec3

Tool Calling

Which python decorator is used in LangChain to easily convert a standard Python function into a Tool?

Easy

1

C

@langchain_tool

@chain

@tool

@func

The @tool decorator automatically infers schema from the python function and its docstring.
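What the decorator does — harvest the function name, docstring, and type annotations into a schema the LLM can read — can be shown with a simplified stand-in. The real decorator lives in `langchain_core.tools`; this sketch only mimics the schema-inference idea.

```python
# Simplified stand-in for LangChain's @tool decorator: attach a schema built
# from the function's name, docstring, and parameter annotations.

import inspect

def tool(fn):
    fn.tool_schema = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            name: str(param.annotation)
            for name, param in inspect.signature(fn).parameters.items()
        },
    }
    return fn

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city. Use when the user asks
    about weather conditions."""
    return f"Sunny in {city}"  # stub body; a real tool would call an API
```

The docstring becomes the tool description the LLM reasons over — which is why the best practice above says descriptions should be detailed.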

28

LangGraph & Agentic AI

Lec3

Tool Calling

What makes Tavily Search specifically optimized for AI applications compared to standard generic web search APIs?

Easy

1

B

It is slower but cheaper.

It pre-formats results for LLMs, filters noise, and provides context for RAG.

It only searches Wikipedia.

It bypasses the internet using a local database.

Tavily removes clutter (HTML/Ads) and extracts clean content structured for immediate LLM context window ingestion.

29

LangGraph & Agentic AI

Lec3

Tool Calling

What is a common best practice regarding Tool Descriptions in the code?

Easy

1

A

They should be highly detailed so the LLM knows exactly when and how to call the tool.

They are ignored by the LLM, so they can be left blank.

They must be written in JSON.

They should be under 5 words to save tokens.

High-quality descriptions help the model “Reason” appropriately about when the tool is useful.

30

LangGraph & Agentic AI

Lec3

Tool Calling

What is “Tool Chaining”?

Easy

1

D

Storing tool outputs in a blockchain.

Running the same tool 100 times to check consistency.

Restricting tool execution to an administrator.

Using the output of one tool as the direct input argument for another tool recursively.

A common pattern is having one tool’s result guide the parameter execution of the next tool (like extracting a company name, then passing a stock ticker to a finance tool).

31

LangGraph & Agentic AI

Lec3

Tool Calling

How should developers securely manage API keys (like TAVILY_API_KEY) when building tool-calling applications?

Medium

1

B

Hardcoding them at the top of the python script.

Using Environment Variables or a Secret Management service (like Azure KeyVault).

Passing them directly inside the user prompt.

Storing them inside the StateGraph object.

Best practices strongly dictate loading secrets via ENV variables (e.g. dotenv) or cloud secret managers.

32

LangGraph & Agentic AI

Lec3

Tool Calling

When handling tool execution errors (such as network timeouts or API failures), what is the recommended fallback strategy?

Medium

1

C

Raising a fatal exception to stop the script immediately.

Silently ignoring the error and proceeding with an empty string.

Catching the exception and returning a ToolMessage containing the error text for the LLM.

Switching to an older language model automatically.

Returning the exception as a string in ToolMessage gives the LLM context to either reason about the failure, apologize to the user, or try another tool.
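The catch-and-report pattern can be sketched as a small wrapper; `flaky_search` is a hypothetical tool that simulates a network failure.

```python
# Sketch of the recommended fallback: catch tool exceptions and return the
# error as text so the LLM can reason about it instead of the app crashing.

def safe_tool_call(tool_fn, *args, **kwargs) -> str:
    try:
        return str(tool_fn(*args, **kwargs))
    except Exception as exc:
        # In LangGraph this string would be wrapped in a ToolMessage and
        # appended to the state, giving the LLM context to try a fallback.
        return f"Tool error: {type(exc).__name__}: {exc}. Try another approach."

def flaky_search(query: str) -> str:
    raise TimeoutError("search API timed out")  # simulated network failure
```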

33

LangGraph & Agentic AI

Lec3

Tool Calling

What optimization technique can significantly reduce duplicate external API calls from tools?

Medium

1

A

Implementing a caching layer (e.g. lru_cache or a dictionary buffer) keyed by the tool query.

Disabling the @tool decorator.

Limiting the LLM to 1 iteration entirely.

Removing the system prompt.

Caching recent tool queries locally drastically saves external latency and cost for repeated inquiries.
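A minimal version of the caching layer uses the standard-library `functools.lru_cache`; the search function body is a stub standing in for an expensive external API call.

```python
# Sketch of a tool-level cache: repeated identical queries hit the cache
# instead of the external API, cutting latency and cost.

from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show how often the "API" runs

@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    CALLS["count"] += 1                 # stands in for the real API call
    return f"results for {query}"

cached_search("rag")   # miss -> calls the "API"
cached_search("rag")   # hit  -> served from cache
cached_search("mmr")   # miss -> calls the "API" again
```

Note the cache is keyed by the exact query string; semantically similar but differently worded queries would still miss.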

34

LangGraph & Agentic AI

Lec3

Tool Calling

If you want to use a Custom Tool class in LangChain instead of a decorator, which base class must you inherit from?

Medium

1

D

ToolDecorator

GraphNode

LLMChain

BaseTool

Class-based tools need to inherit from BaseTool and override the _run and _arun methods.

35

LangGraph & Agentic AI

Lec3

Tool Calling

How does the Tavily API search_depth="advanced" configuration differ conceptually from standard execution?

Hard

1

C

It executes SQL queries on the backend instead.

It forces the agent to ask the user permission.

It performs a multi-step semantic search to extract comprehensive answers rather than returning simple link snippets.

It parses local PDF files instead of the web.

Advanced depth leverages an AI sub-agent during search to synthesize answers and return higher-quality textual analysis.

36

LangGraph & Agentic AI

Lec3

Tool Calling

When building an architecture where an Orchestrator routes tasks, why would you implement a specific “Web Search Agent” rather than just giving the generic tools directly to the primary assistant?

Hard

1

B

Because the primary assistant cannot accept tools format APIs.

To separate concerns: a specialized agent can execute multi-step tool queries recursively without overloading the main router’s prompt context.

Because Tavily Search restricts execution to sub-nodes by design.

Web Search agents use zero tokens.

Sub-agents handle the cognitive load of browsing, reading snippets, and re-searching autonomously, returning only polished synthesis to the main router.

37

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

What is the main structural advantage of a Hierarchical (Supervisor) multi-agent system?

Easy

1

A

A Primary Assistant coordinates user intent and cleanly routes requests to specialized sub-agents.

Every agent talks to every other agent at the same time.

It prevents the use of external APIs.

It runs on a single linear LangChain pipeline.

Supervisors manage the workflow orchestration cleanly while sub-agents handle specific deep domains.

38

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

Why would a system designer choose multi-agent architectures over a single sophisticated LLM?

Easy

1

C

Single LLMs cannot use Python code.

A single LLM always hallucinates.

It promotes specialization, modularity, parallel processing, and avoids prompt overloading.

Multi-agent systems guarantee faster latency in all scenarios.

Splitting into separate specialized models (e.g., Architect, Coder, Reviewer) improves accuracy and creates maintainable codebases.

39

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

What does a Network (Peer-to-Peer) coordination pattern imply?

Easy

1

C

Agents are executed manually by humans.

All agents must report back to a supervisor before interacting.

Agents can communicate with each other directly without central supervision.

It is a centralized routing protocol.

Unlike supervisors, peer-to-peer agents message each other directly to resolve tasks.

40

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

In a Hierarchical system, how does a Sub-Agent signal that its task is complete and it wishes to return control to the Primary Assistant?

Easy

1

D

By crashing the program.

By calling the end user via SMS.

By erasing the shared state’s message list.

By executing a “CompleteOrEscalate” tool call, signaling the workflow to pop the dialog stack.

The common pattern relies on returning a specific signal (like pop_dialog_state) transitioning back to the orchestrator.

41

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

In multi-agent LangGraph architectures, what prevents agents from losing the overarching conversation context?

Easy

1

B

They read the local filesystem.

They all read and append to a centralized shared messages list managed in the AgenticState.

The developer manually pastes the JSON transcript into each prompt.

They query a vector database at every step.

A shared TypedDict state whose messages field uses the add_messages reducer tracks history across all nodes, keeping every agent aligned.
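A minimal sketch of such a shared state, using a toy `add_messages` reducer in place of LangGraph's built-in one (field names are illustrative):

```python
from typing import Annotated, TypedDict

def add_messages(existing, new):
    # Toy append-only reducer standing in for langgraph's add_messages:
    # every node's output is appended, never overwritten.
    return existing + new

class AgenticState(TypedDict):
    messages: Annotated[list, add_messages]
    dialog_state: list
```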

42

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

What is the purpose of the dialog_state stack in a hierarchical multi-agent state?

Easy

1

A

To push and pop agent identifiers corresponding to the current active agent in the conversation tree.

To log errors to a debugging console.

To translate different languages.

To count the number of LLM tokens used.

The dialog stack (["primary", "ticket_agent"]) acts analogously to a programming call stack, remembering which agent is currently active.
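The stack semantics can be sketched as a plain reducer function (the name mirrors the question bank; the implementation is a simplified illustration):

```python
def update_dialog_stack(stack, action):
    # "pop" removes the most recently activated agent;
    # any other value pushes a new agent identifier.
    if action == "pop":
        return stack[:-1]
    return stack + [action]
```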

43

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

What is “Context Injection” referring to in multi-agent tool execution?

Medium

1

D

Injecting system prompts into the vector database.

Overriding the user’s internet connection.

Re-training the model mid-conversation.

Automatically supplying known session metadata (like user_id or email) into tool arguments without the LLM needing to derive them explicitly.

Context fields defined in the AgenticState are injected quietly into tool schemas by intermediate functions to provide precise references automatically.

44

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

How do routing functions (conditional edges) decide to shift execution from the Primary Assistant to a designated Sub-Agent?

Medium

1

C

The user types “Route” in the chat window.

A random hash evaluates to true.

By inspecting the tool_calls generated by the Primary Assistant and matching the tool_name to a subgraph node.

They execute raw SQL queries tracking agent status.

Standard routers look at the Assistant’s final AIMessage; if it includes tool_calls for a particular sub-agent, the edge routes to that corresponding node.
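A minimal routing-function sketch; the tool and node names are hypothetical, and the message is modeled as a plain dict rather than a real AIMessage:

```python
def route_primary_assistant(state):
    # Map each delegation tool name to the matching sub-agent node.
    routes = {"ToTicketAgent": "ticket_agent", "ToBillingAgent": "billing_agent"}
    last = state["messages"][-1]
    calls = last.get("tool_calls") or []
    if not calls:
        return "END"  # no tool call: the assistant answered the user directly
    return routes.get(calls[0]["name"], "primary_tools")
```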

45

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

Why might an agentic architecture include an “Entry Node” when transitioning to a child agent?

Medium

1

B

To charge the user additional credits.

To silently append a ToolMessage providing the child agent with instructions, task context, and a reminder to call a return tool when done.

To block external API requests permanently.

To delete previous session checkpoints.

Entry nodes serve as a trampoline, providing localized instructions to the incoming sub-agent without confusing the Primary Assistant’s prompt.

46

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

During multi-agent fallback, what happens when a tool execution fails inside an agent’s subgraph?

Medium

1

A

A custom create_tool_node_with_fallback catches the exception and returns the error within a standard ToolMessage for the corresponding agent to review.

The PrimaryAssistant automatically shuts down.

The system crashes.

It switches out the open-source LLM for an OpenAI model.

A structured fallback catcher prevents silent failures or crashes and turns exceptions into conversational events the agent can rectify.
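The fallback idea can be sketched without LangGraph; the function name and message shape below are illustrative:

```python
def call_tool_with_fallback(tool, args):
    # Exceptions become ordinary tool messages the agent can read and repair,
    # instead of silent failures or crashes.
    try:
        return {"role": "tool", "content": str(tool(**args))}
    except Exception as exc:
        return {"role": "tool",
                "content": f"Error: {exc!r}. Please fix your mistakes and retry."}
```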

47

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

In a highly complex Competitive multi-agent arrangement, how do agents ultimately converge on a single answer?

Hard

1

C

They execute a random dice roll.

The graph hangs infinitely until restarted.

A separate Evaluator/Synthesizer agent compares the outputs of all competing agents and selects or merges the best response into the final message.

Only the agent that responds first is recorded in state.

Competitive architectures require downstream synthesis nodes that “Observe” multiple paths and judge the optimal conclusion analytically.

48

LangGraph & Agentic AI

Lec4

Multi-Agent Collab

Consider the structure: state["dialog_state"] = update_dialog_stack(["primary", "ticket_agent"], "pop"). What state does the graph enter next based on hierarchical stack principles?

Hard

1

B

It adds a third string to the stack.

It returns the list to ["primary"].

It deletes the entire stack.

It loops infinitely within ticket_agent.

The custom reducer pops the last active element (ticket_agent), gracefully restoring control to the base primary_assistant.

49

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

Why is a “Human-in-the-Loop” (HITL) step strongly recommended for applications performing financial transactions?

Easy

1

A

They involve irreversible critical actions that require human oversight to prevent costly AI mistakes.

It accelerates the transaction speed natively.

Models cannot do math.

HITL is an obsolete pattern replaced by GPT-4.

Financial transactions are high-stakes operations requiring human intervention and compliance audit trails before final execution.

50

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

In LangGraph, what prevents all computation from being lost when an agent pauses to wait for human input?

Easy

1

C

Writing logs to a simple text file.

LangChain’s built-in ConversationBufferMemory.

LangGraph’s native Checkpointing mechanism (e.g., MemorySaver or SqliteSaver) tightly coupled with interrupt_before/interrupt_after.

Caching the prompt on the client side.

Checkpointers serialize the exact graph state, letting it rest safely in memory or a database until resumed.

51

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

How does passing interrupt_before=["approval_node"] change the execution behavior of the graph?

Easy

1

B

It forces the node to timeout after 3 seconds.

It suspends execution right before the specified node executes, returning control back to the application.

It skips the node altogether.

It triggers an infinite loop of human questions.

interrupt_before natively halts the graph, saves state, and acts as a boundary pause expecting the app to resume it later.

52

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

What is the main drawback of using MemorySaver as a checkpointer in LangGraph?

Easy

1

D

It requires setting up a massive cluster.

It runs too slowly for modern models.

It writes to a file that fills up the hard drive instantly.

Checkpoints disappear completely when the Python process dies or the server restarts.

MemorySaver keeps data purely in process RAM; process death equals checkpoint death.

53

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

Which checkpointer is recommended for a scalable, production-grade distributed LangGraph service?

Easy

1

C

MemorySaver

SqliteSaver

PostgresSaver

FileSaver

PostgresSaver leverages robust PostgreSQL servers built for concurrent, heavy-scale transactions needed in production.

54

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

How does LangGraph distinguish parallel user conversations hitting the same graph application simultaneously?

Easy

1

B

By creating separate python processes.

By assigning each conversation a unique thread_id in the RunnableConfig.

By deleting the older users’ conversations.

By using separate API keys.

thread_id segregates checkpoint namespaces so each conversation keeps its own independent state.
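A toy stand-in showing how a thread_id keys isolated state; real LangGraph checkpointers follow the same idea with a {"configurable": {"thread_id": ...}} config shape, but this class is purely illustrative:

```python
class ThreadScopedStore:
    """Toy checkpointer stand-in: one state slot per thread_id."""

    def __init__(self):
        self._states = {}

    def save(self, config, state):
        self._states[config["configurable"]["thread_id"]] = state

    def load(self, config):
        return self._states.get(config["configurable"]["thread_id"])
```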

55

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

What information does LangGraph’s app.get_state_history(config) feature provide?

Medium

1

A

A complete historical log of all checkpointed states, parent markers, and metadata modifications across a conversation.

Only the very first HumanMessage sent.

The system prompt token usage.

Live streaming characters from the LLM.

Pulling state history allows time-travel debugging and viewing the explicit step-by-step data modification over the thread’s lifespan.

56

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

Given a graph paused before a “Publishing” node, what code pattern can update the state manually, say, switching approved: False to approved: True?

Medium

1

C

app.publish(approved=True)

Modifying the global variables inside the python script.

Calling app.update_state(config, {"approved": True}) before invoking the graph again.

Redefining the TypedDict.

update_state lets developers patch the state tree with manual human reviews before releasing the lock on the paused graph.
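A toy analogue of the update_state patch pattern over a plain dict store, not the real LangGraph API, showing how a human-review patch merges into the paused thread's state:

```python
def update_state(store, config, patch):
    # Merge a manual patch (e.g., a human approval flag) into the
    # checkpointed state for this thread before resuming the graph.
    thread_id = config["configurable"]["thread_id"]
    state = dict(store.get(thread_id, {}))
    state.update(patch)
    store[thread_id] = state
    return state
```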

57

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

Why would a multi-agent framework require separate short-term Checkpointers vs explicit long-term external vector databases?

Medium

1

D

Because LangChain deprecates long-term storage natively.

Short-term databases always truncate after 1 megabyte.

To prevent open-source models from scraping data.

Checkpointers handle immediate conversational state securely per thread, while Vector stores aggregate historical knowledge and profiles persistently across unrelated sessions.

Checkpointers = Thread-scoped conversational state. VectorDB = Global user-scoped background context fetching.

58

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

How does the SqliteSaver schema manage nested state timelines within the same thread if the user “rewinds” to an earlier step and branches context?

Medium

1

B

It overwrites the database completely.

It creates a new checkpoint_id pointing back to the specific parent_checkpoint_id, preserving branching forks natively.

It throws a primary key error.

It switches back to MemorySaver.

The DB schema retains parent-child snapshot ID graphs, effectively allowing true non-destructive time travel.
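A toy model of the parent-pointer schema; the field names mirror the explanation, but the class itself is illustrative:

```python
import itertools

class CheckpointLog:
    # Every new checkpoint records which snapshot it descended from,
    # so rewinding and branching never overwrites history.
    def __init__(self):
        self._ids = itertools.count(1)
        self.checkpoints = {}

    def put(self, state, parent_id=None):
        cid = next(self._ids)
        self.checkpoints[cid] = {"state": state, "parent_checkpoint_id": parent_id}
        return cid
```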

59

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

If an agent architecture has a manual Node simulating an “As-Node” state update (app.update_state(config, {"fix": 1}, as_node="human_check")), what is the technical outcome in the graph context?

Hard

1

C

The app skips ahead 10 checkpoints automatically.

The update is discarded silently because the node was skipped.

It behaves as if the actual human_check node was evaluated, allowing the graph’s conditional edges mapped from human_check to traverse properly during resumption.

The agent loops forever.

as_node mocks the node's output, resolving the edge transitions that wait for that specific node's signature.

60

LangGraph & Agentic AI

Lec5

Human-in-the-Loop

In a scenario where an AI is suggesting Medical treatment protocols, how might interrupt_after be used successfully in a LangGraph structure?

Hard

1

A

Pausing after the Generate_Diagnosis node, sending the raw output downstream to a UI so a Senior Doctor can review and inject corrections before the Finalize_Report executes.

Halting the system if the internet disconnects.

Interrupting the LLM mid-token generation.

Making the LLM stream results to a text-to-speech engine.

This allows the state to fully materialize the AI’s proposal, giving the human doctor a complete object to assess before continuing.


LLMOps and Evaluation Theory#

LLMOps and Evaluation Question Bank#

No.

Training Unit

Lecture

Training content

Question

Level

Mark

Answer

Answer Option A

Answer Option B

Answer Option C

Answer Option D

Explanation

1

Unit 1: LLMOps

Lec2

RAGAS Metrics

What does the Faithfulness metric measure in RAGAS?

Easy

1

A

The truthfulness of the generated answer compared to the retrieved context

The relevance of the answer to the original question

The accuracy of the ranking of contexts

The coverage of the retrieval process

Faithfulness checks if all statements in the answer can be supported by the retrieved context, avoiding hallucinations.
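The scoring step reduces to a supported-statement ratio, sketched below; in real RAGAS the per-statement verdicts come from an LLM judge:

```python
def faithfulness_score(verdicts):
    # verdicts: 1 if the decomposed statement is supported by the
    # retrieved context, 0 otherwise. Score = supported ratio.
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

For example, 2 supported statements out of 3 yields roughly 0.67, matching the worked case later in this bank.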

2

Unit 1: LLMOps

Lec2

RAGAS Metrics

Which LLM framework is RAGAS designed to evaluate?

Easy

1

B

Agents

RAG systems

Fine-tuned models

Traditional Search Engines

Ragas is an automated evaluation framework designed specifically for RAG systems.

3

Unit 1: LLMOps

Lec2

RAGAS Metrics

What manual data annotation is required when using RAGAS?

Easy

1

C

Large scale human annotations

Only expert domain knowledge

Nothing, it uses LLMs like GPT-4 to automate evaluation

Both standard Q&A pairs and ranking queries

Unlike traditional methods, Ragas uses LLMs to automate the evaluation process without needing heavy human annotations.

4

Unit 1: LLMOps

Lec2

RAGAS Metrics

Which dimension is measured by Context Precision?

Easy

1

C

Quality of generation

Semantic similarity to the user query

Accuracy of the retrieval process

Coverage of expected facts

Context Precision measures the accuracy of the retrieval process by assessing the ranking of contexts.

5

Unit 1: LLMOps

Lec2

RAGAS Metrics

What is the main purpose of Answer Relevancy?

Easy

1

D

Fact-checking the answer

Verifying truthfulness

Guaranteeing context coverage

Measuring relevance between answer and original question

It evaluates the relevance between the answer and question to confirm it addresses the problem asked.

6

Unit 1: LLMOps

Lec2

RAGAS Metrics

What value range do Ragas metrics return?

Easy

1

B

0 to 100

0 to 1

-1 to 1

1 to 5

Each metric gives a value from 0 to 1, with higher values indicating better quality.

7

Unit 1: LLMOps

Lec2

RAGAS Metrics

Which metric evaluates if relevant chunks are ranked high in retrieved contexts?

Easy

1

C

Faithfulness

Context Recall

Context Precision

Answer Relevancy

Context Precision checks if relevant chunks are ranked high in the list of retrieved contexts.

8

Unit 1: LLMOps

Lec2

RAGAS Metrics

How many main metrics are covered in the RAGAS documentation?

Easy

1

A

4

5

3

6

The four main metrics are faithfulness, answer relevancy, context precision, and context recall.

9

Unit 1: LLMOps

Lec2

RAGAS Metrics

If Context Recall is 0, what does that indicate?

Easy

1

A

Retriever failed to find necessary context

Rank 1 is an irrelevant context

LLM generated hallucination

The answer is irrelevant to the query

It indicates the retriever failed to find context containing necessary information to answer the question.

10

Unit 1: LLMOps

Lec2

RAGAS Metrics

Which two metrics evaluate the “retrieval” performance?

Easy

1

B

Faithfulness & Answer Relevancy

Context Precision & Context Recall

Answer Relevancy & Context Recall

Context Precision & Faithfulness

Context precision and context recall evaluate retrieval performance.

11

Unit 1: LLMOps

Lec2

RAGAS Metrics

Describe the calculation process for Faithfulness in Ragas.

Medium

2

A

Decompose answer to statements, verify against context, calculate ratio

Generate questions, embed them, calculate cosine similarity

Determine context relevance, calculate Precision@k, aggregate

Decompose reference answer, verify if inferences exist in retrieved context

The process is: Decomposition (claims), Verification (checked against context), and Scoring (ratio).

12

Unit 1: LLMOps

Lec2

RAGAS Metrics

How does Answer Relevancy determine its score technically?

Medium

2

C

By classifying the answer using a trained classifier

By matching keywords between answer and question

By reverse-engineering questions from answer and calculating embedding cosine similarity

By comparing the character count of answer vs question

LLM generates N questions from the given answer, converts them to embeddings, and compares cosine similarity with the original question.
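The final aggregation can be sketched with plain cosine similarity; in practice the vectors come from an embedding model, and the helper names here are illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer_relevancy(question_vec, generated_question_vecs):
    # Average cosine similarity between the original question embedding
    # and the N questions reverse-engineered from the answer.
    sims = [cosine(question_vec, g) for g in generated_question_vecs]
    return sum(sims) / len(sims)
```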

13

Unit 1: LLMOps

Lec2

RAGAS Metrics

A low Context Recall score means what in terms of information availability?

Medium

2

D

The information is hallucinated

The answer has redundant information

The retrieved information is scattered

The necessary facts from the reference answer are missing in the retrieved contexts

It means the necessary information from the reference answer was not found in the retrieved contexts.

14

Unit 1: LLMOps

Lec2

RAGAS Metrics

In Context Precision calculation, what is \(v_k\)?

Medium

2

C

Velocity of retrieval

Volume of chunks

Relevance indicator at position k

Value of cosine similarity

\(v_k \in \{0, 1\}\) is the relevance indicator at position k.
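The RAGAS-style formula, \(\text{Context Precision} = \sum_k (\text{Precision@}k \cdot v_k) / (\text{number of relevant chunks})\), sketched in code:

```python
def context_precision(v):
    # v: relevance indicators v_k in {0, 1}, one per retrieved rank k.
    relevant = sum(v)
    if relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, vk in enumerate(v, start=1):
        hits += vk
        total += (hits / k) * vk  # Precision@k counts only up to rank k
    return total / relevant
```

With [1, 0, 1] the score is (1 + 2/3) / 2 ≈ 0.83: relevant chunks ranked high score better than the same chunks ranked low.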

15

Unit 1: LLMOps

Lec2

RAGAS Metrics

Why might an answer score high in Faithfulness but low in Answer Relevancy?

Medium

2

B

The answer is hallucinated but relevant

The answer is entirely true based on context but fails to address the user’s specific question

The retriever brought back poor context

The context precision is very low

It can be completely faithful to retrieved context, but that context (and answer) might not be what the user asked for.

16

Unit 1: LLMOps

Lec2

RAGAS Metrics

Why is Faithfulness strictly compared to retrieved context and not world knowledge?

Medium

2

A

To prevent LLM hallucinations from being counted as correct if the retriever failed

Ragas has no access to world knowledge

The LLM doesn’t know facts

World knowledge costs more tokens

RAG’s core value is grounding generation on specific private/provided context, so it measures adherence to that context only to prevent unaccounted hallucinations.

17

Unit 1: LLMOps

Lec2

RAGAS Metrics

If LLM splits an answer into 3 statements, and only 2 are verified in context, Faithfulness is?

Medium

2

B

0.5

0.67

0.33

1.0

Faithfulness relies on the ratio of correct statements: 2 out of 3 makes it ~0.67.

18

Unit 1: LLMOps

Lec2

RAGAS Metrics

Given a scenario where a user asks about Einstein’s death, but the context only contains his birth, and the LLM answers “Einstein died in 1955” using its internal knowledge. What are the RAGAS metric implications?

Hard

3

B

High Faithfulness, Low Answer Relevancy

Low Faithfulness, High Answer Relevancy

Low Faithfulness, Low Context Recall

High Context Precision, High Context Recall

It answers the user (High Relevancy), but the claim isn’t in context, making Faithfulness low.

19

Unit 1: LLMOps

Lec2

RAGAS Metrics

To improve Context Precision in a RAG pipeline, what architecture modification would you introduce?

Hard

3

C

Increase LLM temperature

Swap FAISS for ChromaDB

Add a Cross-encoder reranking step

Generate multiple answers and average them

Reranking specifically improves the order/ranking of retrieved chunks, heavily impacting Context Precision metrics.

20

Unit 1: LLMOps

Lec2

RAGAS Metrics

Detail the mathematical rationale behind using N reverse-engineered questions for calculating Answer Relevancy.

Hard

3

A

Averages out the stochastic nature of LLMs generating questions to provide a stable semantic similarity

It is required to satisfy vector dimensions

One question uses up too few tokens

N acts as a padding token for embeddings

Generating N questions and averaging their cosine similarities mitigates the variance inherent in LLM generation, ensuring a robust relevancy score.

21

Unit 2: Observability

Lec6

Observability Concepts

What is Observability in the context of LLM applications?

Easy

1

A

The ability to track flows, errors and costs of LLM apps acting as black boxes

A library for generating UI code

A vector database

The algorithm used for chunking texts

It tracks probabilistic components acting as black boxes, aiding in tracing, tracking costs, and debugging.

22

Unit 2: Observability

Lec6

LangFuse Basics

Which of these tools is known for being Open Source?

Easy

1

B

LangChain

LangFuse

LangSmith

OpenAI

LangFuse is a popular open-source tool focusing on engineering observability.

23

Unit 2: Observability

Lec6

Observability Challenges

What makes LLM applications harder to debug than traditional software?

Easy

1

C

They use more memory

They require internet connections

They involve probabilistic, non-deterministic components

They use Python

Traditional software is deterministic: the same input gives the same output. LLMs act as probabilistic black boxes.

24

Unit 2: Observability

Lec6

LangSmith Basics

Who built LangSmith?

Easy

1

B

Google

The LangChain Team

OpenAI

Meta

LangSmith is built by the LangChain team for native integration.

25

Unit 2: Observability

Lec6

LangFuse Integration

In LangFuse, what is used to automatically instrument LangChain chains?

Easy

1

C

System.out.println

VectorEmbeddings

CallbackHandler

FAISS

LangFuse provides a CallbackHandler that automatically instruments chains.

26

Unit 2: Observability

Lec6

Prompt Management

Why should you manage prompts in a tool like LangFuse instead of hardcoding in Git?

Easy

1

A

To allow non-engineers to tweak them

Because Git is too slow

Because Git charges per token

To hide prompts from developers

It acts as a CMS for prompts so non-engineers can comfortably inspect and tweak them.

27

Unit 2: Observability

Lec6

Setup

How can you enable LangSmith auto-tracing in a LangChain project usually?

Easy

1

D

Rewrite all code to use LangSmith classes

Contact support to enable it

Import enable_smith module

Just set environment variables

LangSmith integrates natively with LangChain; you usually need no code changes, just environment variables.

28

Unit 2: Observability

Lec6

Production Best Practices

What is the recommended tracing sampling rate for Production environments?

Easy

1

C

100%

50%

1-5% of traffic

None

In production, tracing every request is noisy and expensive, so sampling 1-5% of traffic, plus high-importance traces, is recommended.
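A minimal sampling-decision sketch (the rate and flag names are illustrative):

```python
import random

def should_trace(sample_rate=0.05, important=False):
    # Always trace requests flagged as important; otherwise sample
    # a small fraction of traffic to control noise and cost.
    return important or random.random() < sample_rate
```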

29

Unit 2: Observability

Lec6

Privacy

How should you handle PII data privacy before logging to a cloud observability tool?

Easy

1

B

Do nothing

Run PII Masking/Redaction functions

Encrypt with simple base64

Delete all logs

Never log sensitive data; run PII Masking or use enterprise redacting features.
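A naive regex-based redaction sketch; production systems typically use NER-based PII detectors instead of hand-written patterns:

```python
import re

def mask_pii(text):
    # Replace email addresses and US-style phone numbers with placeholders
    # before the text leaves the secure perimeter.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", "[PHONE]", text)
    return text
```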

30

Unit 2: Observability

Lec6

Alerts

What is an example of a good alert to set up in observability?

Easy

1

A

Error Rate Spike > 10% in 5 min

“Hello World” printed

CPU temperature

Single user logged out

You should alert on things like Error Rate > 10%, Latency Spikes, or Cost Anomalies.

31

Unit 2: Observability

Lec6

LangFuse vs LangSmith

If self-hosting data privacy is an absolute requirement and budget is zero, which tool is recommended?

Medium

2

C

Weights & Biases

LangSmith

LangFuse

CloudWatch

LangFuse is Open Source (MIT) and offers easy self-hosting (Docker Compose) for free.

32

Unit 2: Observability

Lec6

LangSmith Playground

What is the “Playground: Edit and Re-run” feature in LangSmith useful for?

Medium

2

A

You can take a failed production trace, change the prompt, and test a fix immediately

Training new models

Deploying code to AWS

Chatting with other developers

It allows you to take failed real-world traces and edit prompts/parameters to instantly see if the issue resolves.

33

Unit 2: Observability

Lec6

Latency Debugging

If a RAG request takes 10 seconds, how does tracing help?

Medium

2

B

It makes the query faster

It breaks down the latency per component (e.g., Vector DB vs API completion)

It charges the user for the wait time

It cancels requests longer than 5 seconds

Tracing visualizes the execution flow, pinpointing exactly which step (Vector Search vs Generate) is the bottleneck.

34

Unit 2: Observability

Lec6

Cost Tracking

Why is Cost Tracking a critical feature in LLM Observability compared to traditional app monitoring?

Medium

2

D

Because AWS charges are cheap

Because you don’t need servers

Because LLMs don’t cost real money

Because LLM API calls are charged per-token and single runaway loops can cost hundreds of dollars quickly

API calls are expensive, requiring real-time tracking to prevent unmanaged financial overruns.

35

Unit 2: Observability

Lec6

Langchain Integration

What environment variable activates LangSmith tracing?

Medium

2

B

LANGCHAIN_DEBUG=1

LANGCHAIN_TRACING_V2=true

LANGCHAIN_LOG=all

LANGSMITH_ACTIVE=1

export LANGCHAIN_TRACING_V2=true activates LangSmith native tracing.
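A typical environment setup might look like the following; the project name is illustrative, and the API key should stay out of source control:

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-api-key>"
export LANGCHAIN_PROJECT="my-rag-app"
```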

36

Unit 2: Observability

Lec6

Prompt CMS

How do you fetch a production prompt dynamically using LangFuse SDK?

Medium

2

A

Using langfuse.get_prompt(name, version)

Reading from a local .json file

Executing a GraphQL query to Github

Using prompt = os.getenv('PROMPT')

Langfuse acts as a CMS and lets you retrieve prompts with get_prompt, pinning either a specific version number or a deployment label such as "production".

37

Unit 2: Observability

Lec6

Alerts & Best Practices

Why shouldn’t you just “stare at dashboards” for production LLM apps?

Medium

2

A

You need automated alerts (error spikes, costs) to respond fast to anomalies

Dashboards are always broken

It slows down the computer

Observability doesn’t provide dashboards

Dashboards are passive. Automated alerts are needed to actively manage sudden cost, latency, or error anomalies.

38

Unit 2: Observability

Lec6

Advanced LangChain Integration

You have a complex application utilizing standard Python code, LangChain agent loops, and custom API calls. Should you prefer LangSmith or LangFuse, and why?

Hard

3

B

LangSmith, because it supports Python natively better

LangFuse, because it is platform-agnostic and instruments cleanly across non-LangChain code too.

LangSmith, because LangChain is mandatory.

LangFuse, because it has an “Edit and Re-run” playground.

LangFuse is platform-agnostic for non-LangChain code, making it better for mixed-stack integrations, while LangSmith is highly specific and native to LangChain execution loops.

39

Unit 2: Observability

Lec6

Debugging Scenarios

In production, users report the chatbot occasionally ignores their negative feedback instructions. How would you leverage LangSmith to resolve this?

Hard

3

C

By deleting the user history and trying again

Check the VectorDB logs

Locate the failed traces in LangSmith, transition them to the Playground, adjust the system prompt, and replay to verify compliance

Re-index the FAISS database

LangSmith’s Playground allows you to take directly failed traces, manipulate the prompt, and replay the exact trace environment to find the fix.

40

Unit 2: Observability

Lec6

Data Security Architecture

Explain a robust architectural design for handling HIPAA/PII compliance while using a SaaS LLM Observability platform like LangSmith Enterprise.

Hard

3

A

Run an edge/middleware service that performs localized PII Entity masking/redaction before transmitting traces to the LangSmith API

Avoid observability tools completely

Share passwords directly via the agent

Mask PII inside the LangSmith GUI

PII must not leave the secure perimeter; redaction must happen at the application layer or middleware before data is shipped via logs/traces.