Quiz
Advanced Indexing
Question 1: What characterizes ‘Fixed-size Chunking’?
A. Cuts based on changes in content meaning.
B. Text is cut mechanically every 500 or 1000 characters, regardless of sentence completion.
C. Uses an LLM to rewrite the text.
D. Relies on BM25 for text division.
Answer: B
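The mechanical cut described in Question 1 can be sketched in a few lines; the sample text and chunk size here are illustrative, not from the original material:

```python
def fixed_size_chunks(text: str, size: int = 500) -> list[str]:
    # Mechanical cut every `size` characters -- sentence and word
    # boundaries are ignored entirely, which is exactly why this
    # method can split an idea mid-sentence.
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = "The NVIDIA H100 GPU targets large training workloads. " * 25
chunks = fixed_size_chunks(doc, size=500)
print([len(c) for c in chunks])  # -> [500, 500, 350]
```

Note that the last chunk simply holds whatever characters remain, and any chunk boundary may fall in the middle of a word.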
Question 2: What happens when search utilizes ‘Flat Indexing’?
A. The system calculates the similarity of consecutive sentences.
B. The system has to compare the question with every single vector in a long list.
C. Data points are structured hierarchically.
D. The semantic flow of text is perfectly preserved.
Answer: B
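The brute-force scan behind Flat Indexing in Question 2 can be sketched as follows; the tiny 2-D vectors are placeholders for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flat_search(query: list[float], index: list[list[float]], top_k: int = 2) -> list[int]:
    # Flat indexing: the query is compared against EVERY vector in the
    # list -- O(N) similarity computations per search, which is why it
    # becomes too slow once the collection scales to millions of vectors.
    scored = [(cosine(query, vec), i) for i, vec in enumerate(index)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]

index = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(flat_search([1.0, 0.0], index, top_k=2))  # -> [0, 2]
```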
Question 3: What does the abbreviation ANN stand for in vector searching?
A. Artificial Neural Network
B. Advanced Node Navigation
C. Approximate Nearest Neighbor
D. Algorithm for Numerical Nodes
Answer: C
Question 4: What determines the split points in the Semantic Chunking method?
A. A fixed character limit.
B. Understanding the content to decide split points.
C. The number of punctuation marks.
D. Random allocation by the system.
Answer: B
Question 5: What is the first step in the Semantic Chunking implementation process?
A. Thresholding
B. Similarity Calculation
C. Generating hypothetical documents.
D. Sentence Splitting: Split the text into complete sentences based on punctuation.
Answer: D
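The three steps from Questions 4-5 (sentence splitting, similarity calculation, thresholding) can be sketched end to end. The `embed` function below is a toy bag-of-words stand-in for a real embedding model, and the vocabulary, threshold, and sample text are all assumptions made for illustration:

```python
import math
import re

def embed(sentence: str) -> list[float]:
    # Placeholder for a real embedding model: a tiny bag-of-words
    # vector over a fixed, made-up vocabulary.
    vocab = ["gpu", "h100", "training", "llama", "tokens", "dataset"]
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def semantic_chunks(text: str, threshold: float = 0.3) -> list[str]:
    # Step 1 -- Sentence Splitting: cut on end-of-sentence punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    # Step 2 -- Similarity Calculation, Step 3 -- Thresholding:
    # merge consecutive sentences while similarity stays high;
    # a drop below the threshold signals a topic change and a new chunk.
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(nxt)) >= threshold:
            current.append(nxt)               # same topic -> same chunk
        else:
            chunks.append(" ".join(current))  # topic changed -> new chunk
            current = [nxt]
    chunks.append(" ".join(current))
    return chunks

text = ("The H100 GPU accelerates training. The H100 GPU has very fast memory. "
        "Llama-3 was trained on fifteen trillion tokens.")
print(len(semantic_chunks(text)))  # -> 2 (H100 sentences merged, Llama-3 split off)
```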
Question 6: Why does ‘Loss of Semantics’ occur with mechanical chunking?
A. Because embedding models ignore large chunks.
B. Because mechanical chunking accidentally breaks the flow of text, splitting complete ideas into meaningless chunks.
C. Because it uses the HNSW algorithm.
D. Because it relies heavily on BM25.
Answer: B
Question 7: How does Semantic Chunking detect a change in topic between sentences?
A. By counting the characters.
B. When vectors abruptly change direction, creating a large distance.
C. By checking for new paragraph formatting.
D. By querying a Graph Database.
Answer: B
Question 8: What happens in Semantic Chunking if the similarity between a sentence and the next one is high?
A. The system generates a Cypher query.
B. The topic has changed, so a new chunk starts.
C. Two sentences are on the same topic and are merged into one chunk.
D. The algorithm performs a brute-force search.
Answer: C
Question 9: What is one of the main cons of Semantic Chunking compared to Recursive Chunking?
A. It easily cuts through important ideas.
B. It creates noisy context.
C. It consumes computational resources due to running a model to compare each sentence.
D. It is only suitable for strictly structured documents like laws.
Answer: C
Question 10: What is the structure of data in the HNSW algorithm?
A. A single long list.
B. A flat SQL table.
C. A multi-layered graph structure.
D. A recursive binary tree.
Answer: C
Question 11: In the HNSW graph, what does Layer 0 contain?
A. Only sparse shortcut links.
B. The most recently added data only.
C. All data points, and the most detailed links between them.
D. Only vectors that match exactly.
Answer: C
Question 12: What role do the ‘higher layers’ serve in an HNSW graph?
A. They contain the full detailed index.
B. They hold backup data for failure recovery.
C. They act as shortcuts helping the algorithm move quickly through the large data space.
D. They are used exclusively for BM25 processing.
Answer: C
Question 13: What does the parameter ‘M’ control in HNSW, and what is the trade-off of increasing it?
A. Controls search depth, increases speed but lowers accuracy.
B. Specifies the maximum number of links per node, increases accuracy but RAM consumption increases significantly.
C. Controls document chunk size, improves context but slows embedding.
D. Specifies the similarity threshold for semantic chunking.
Answer: B
Question 14: How does the parameter ‘ef_construction’ affect index construction?
A. It sets the maximum token length for documents.
B. A larger value makes the algorithm scan more candidates to find optimal links, ensuring high structure quality but slower loading.
C. It limits the number of nodes at Layer 0 to save RAM.
D. It determines how heavily rare words are penalized.
Answer: B
Question 15: If you are configuring HNSW for a Real-time Chatbot Application, what is the recommended practical strategy?
A. Set M to 64 and ef_search to 500.
B. Disable HNSW and use brute-force.
C. Keep ‘ef_search’ at a low level (e.g., 50–100) to optimize latency, accepting a small margin of error.
D. Set ef_construction to 0 to bypass indexing.
Answer: C
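The parameter trade-offs in Questions 13-15 can be illustrated with a configuration sketch. This assumes the `hnswlib` library; the dimensionality and all concrete values are illustrative, not prescriptive:

```python
import hnswlib
import numpy as np

# Illustrative HNSW setup for a latency-sensitive chatbot;
# dim and all parameter values below are assumptions.
dim = 384
index = hnswlib.Index(space="cosine", dim=dim)

# M: maximum links per node -- higher M improves accuracy but
#    significantly increases RAM consumption.
# ef_construction: candidates scanned while linking a new node --
#    larger values yield a higher-quality graph but slower index builds.
index.init_index(max_elements=100_000, M=16, ef_construction=200)

vectors = np.random.rand(1_000, dim).astype("float32")
index.add_items(vectors, ids=np.arange(1_000))

# ef_search: kept low (e.g., 50-100) for real-time latency, accepting a
# small risk that the returned neighbors are not the exact nearest ones.
index.set_ef(64)
labels, distances = index.knn_query(vectors[0], k=5)
```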
Question 16: What describes the search process transition between layers in HNSW?
A. It scans all nodes on every layer sequentially.
B. It moves to the nearest neighbor until a local extremum is reached, which becomes the entry point for the next layer down.
C. It calculates the cosine similarity for all nodes on Layer 0 first.
D. It performs an exact BM25 keyword match at the highest layer.
Answer: B
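The layer-to-layer descent in Question 16 can be sketched with a toy layered graph. The 1-D points, the shortcut layer, and the greedy rule are simplifications of the real algorithm (which keeps a candidate list of size `ef_search` rather than a single node):

```python
def greedy_layer_search(layers, entry, query, dist):
    # HNSW-style descent: on each layer, greedily move to the neighbor
    # closest to the query; when no neighbor improves (a local extremum),
    # that node becomes the entry point for the next layer down.
    node = entry
    for graph in layers:  # ordered top (sparse shortcuts) -> Layer 0 (dense)
        improved = True
        while improved:
            improved = False
            for nb in graph.get(node, []):
                if dist(nb, query) < dist(node, query):
                    node, improved = nb, True
    return node

# Toy example: six 1-D points; node ids map to coordinates.
points = {0: 0.0, 1: 2.0, 2: 4.0, 3: 6.0, 4: 8.0, 5: 10.0}
d = lambda node, q: abs(points[node] - q)
top_layer = {0: [4], 4: [0]}  # sparse shortcut links only
layer0 = {i: [j for j in points if abs(i - j) == 1] for i in points}  # dense links
print(greedy_layer_search([top_layer, layer0], entry=0, query=7.2, dist=d))  # -> 4
```

The shortcut on the top layer jumps most of the way across the space in one hop; Layer 0 then refines the answer among the densely linked neighbors.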
Question 17: In the context of evaluating Chunking Methods, why is Recursive Chunking considered ‘suitable for documents with clear structure’?
A. Because it relies heavily on LLM semantic analysis.
B. Because it cuts based on punctuation and a fixed number of characters, which aligns well with laws and contracts.
C. Because it is the only method that supports HNSW.
D. Because it automatically merges sentences on the same topic.
Answer: B
Question 18: What is the specific risk of decreasing the ‘ef_search’ parameter?
A. The RAM consumption will skyrocket.
B. The system will crash during index building.
C. The system responds extremely fast, but there is a risk of returning results not closest to the question.
D. It will trigger an endless loop in the top layer.
Answer: C
Question 19: Why did Semantic Chunking successfully keep all ‘NVIDIA H100 GPU’ information intact in the visual example, unlike Recursive Chunking?
A. Because the word count was exactly 500 characters.
B. Because it detected the semantic break point between H100 and Llama-3 and cut at the right time.
C. Because H100 was designated as a keyword.
D. Because it bypassed the punctuation rules entirely.
Answer: B
Question 20: What is an explicit consequence of a ‘brute-force’ approach when a system scales up many times over?
A. The vectors lose their semantic meaning completely.
B. Sequentially scanning through millions of vectors is too slow to meet real-time requirements.
C. The LLM hallucinates technical terms.
D. BM25 saturation limits are bypassed.
Answer: B