Quiz

Post-Retrieval Processing

Question 1: Why is the raw Top-K list often not good enough to feed directly into the LLM?

  • A. Because the documents are too short.

  • B. Due to Limited Semantic Accuracy and Information Noise.

  • C. Because it contains too many images.

  • D. Because it bypasses the embedding model completely.

Answer: B

Question 2: What is the primary role of Re-ranking in the retrieval process?

  • A. To act as a filter in the final step to select the best documents.

  • B. To generate hypothetical answers.

  • C. To split documents into chunks.

  • D. To combine vector and keyword search.

Answer: A

Question 3: What architecture is used when a system processes a question and a document separately?

  • A. Cross-Encoder

  • B. BM25

  • C. Bi-Encoder

  • D. HyDE

Answer: C

Question 4: What does the MMR algorithm aim to balance?

  • A. Relevance and Diversity

  • B. Speed and Cost

  • C. Latency and Accuracy

  • D. Keywords and Vectors

Answer: A

Question 5: Which encoding method is known for high speed because calculations can be pre-computed?

  • A. Cross-Encoder

  • B. Bi-Encoder

  • C. MMR

  • D. Semantic Chunking

Answer: B

Question 6: What is a major consequence of Embedding models prioritizing retrieval speed on large amounts of data?

  • A. They consume too much RAM.

  • B. They are forced to trade off the ability to understand complex semantic relationships.

  • C. They can only process English text.

  • D. They require manual keyword tagging.

Answer: B

Question 7: How does a Cross-Encoder process the question and the document?

  • A. It converts them into separate numerical IDs.

  • B. It concatenates them into a single text sequence fed into the model simultaneously.

  • C. It translates the question before comparing it.

  • D. It strictly uses term frequency algorithms.

Answer: B
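The contrast in Question 7 can be sketched without any real model: a Bi-Encoder encodes the question and the document in two separate passes, while a Cross-Encoder feeds one concatenated sequence through the model. The `[CLS]`/`[SEP]` markers below follow the common BERT-style convention; the functions are illustrative stand-ins, not a real tokenizer API.

```python
# Sketch of the two input patterns (no real model calls).
# A bi-encoder embeds query and document independently, so document
# vectors can be pre-computed; a cross-encoder must see both together.

def bi_encoder_inputs(query: str, doc: str) -> list[str]:
    # Two separate sequences -> two independent forward passes.
    return [f"[CLS] {query} [SEP]", f"[CLS] {doc} [SEP]"]

def cross_encoder_input(query: str, doc: str) -> str:
    # One concatenated sequence -> a single joint forward pass, letting
    # self-attention relate every query token to every document token.
    return f"[CLS] {query} [SEP] {doc} [SEP]"
```

Because the cross-encoder's input depends on the query, nothing can be pre-computed, which is exactly why it scores high on accuracy but low on speed.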

Question 8: What is a significant disadvantage of using a Cross-Encoder?

  • A. It cannot handle negation.

  • B. It is very slow and resource-intensive, so it cannot be applied across the entire database.

  • C. It only relies on keyword matching.

  • D. It splits complete ideas into meaningless chunks.

Answer: B

Question 9: What is the first step in the ‘Funnel Strategy’ for re-ranking?

  • A. Use a Bi-Encoder to quickly retrieve the top 50 documents from millions.

  • B. Apply MMR to all documents.

  • C. Use Cross-Encoder to re-score millions of documents.

  • D. Generate a hypothetical response using HyDE.

Answer: A
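The Funnel Strategy from Question 9 can be sketched as a two-stage pipeline: a cheap Bi-Encoder pass over precomputed vectors narrows the corpus to a small shortlist, and the expensive Cross-Encoder re-scores only that shortlist. Both scoring functions below are toy stand-ins (a dot product and a token-overlap ratio), not real models.

```python
# Toy two-stage "funnel" re-ranking sketch. The scorers are dummies:
# in practice stage 1 uses precomputed embeddings and stage 2 a
# transformer cross-encoder.

def bi_encoder_score(query_vec, doc_vec):
    # Dot product over precomputed vectors: fast enough to run at scale.
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query_text, doc_text):
    # Stand-in for a joint model; token overlap as a dummy signal.
    q = set(query_text.lower().split())
    d = set(doc_text.lower().split())
    return len(q & d) / max(len(q), 1)

def funnel_rerank(query_text, query_vec, corpus, shortlist_size=50, final_k=5):
    """corpus: list of (doc_text, doc_vec) pairs.
    Stage 1 keeps the top `shortlist_size` by bi-encoder score;
    stage 2 re-ranks only that shortlist with the expensive scorer."""
    stage1 = sorted(corpus,
                    key=lambda item: bi_encoder_score(query_vec, item[1]),
                    reverse=True)[:shortlist_size]
    stage2 = sorted(stage1,
                    key=lambda item: cross_encoder_score(query_text, item[0]),
                    reverse=True)
    return [text for text, _ in stage2[:final_k]]
```

Note how the expensive scorer only ever sees `shortlist_size` documents, regardless of how large the corpus is.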

Question 10: In MMR, what is the purpose of the ‘Diversity’ factor?

  • A. To ensure the document is related to the question.

  • B. To ensure the document is different from previously selected documents.

  • C. To ensure multiple languages are included.

  • D. To increase the retrieval speed.

Answer: B

Question 11: What does a Bi-Encoder fail to capture that a Cross-Encoder excels at?

  • A. The length of the document.

  • B. The detailed interaction information between each word in the question and each word in the document.

  • C. The exact number of keyword occurrences.

  • D. The file type of the retrieved document.

Answer: B

Question 12: When querying ‘What does Python not eat?’, why might a Bi-Encoder return irrelevant results?

  • A. It cannot process the word ‘Python’.

  • B. It may search with the wrong intent because it captures only the keywords and ignores the negation.

  • C. It automatically translates ‘Python’ to a programming language.

  • D. It requires a Graph Database to function.

Answer: B

Question 13: According to the Simplified Formula for MMR, what does a smaller lambda value prioritize?

  • A. Relevance over diversity.

  • B. Diversity more heavily.

  • C. Execution speed over accuracy.

  • D. The Cross-Encoder score.

Answer: B

Question 14: In the MMR calculation, what does the term ‘max Sim_2(d_i, d_j)’ aim to penalize?

  • A. Documents that do not match the user’s keywords.

  • B. Documents that are too long.

  • C. Documents that are highly similar to documents already selected in the set S.

  • D. Documents that come from different external sources.

Answer: C
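Questions 13 and 14 both follow from the MMR scoring rule: each candidate is scored as lambda times its relevance to the query minus (1 - lambda) times its maximum similarity to any already-selected document, so a smaller lambda weights diversity more heavily. A minimal sketch over toy vectors, assuming cosine similarity for both Sim terms (the vectors and lambda values are illustrative only):

```python
# Minimal MMR sketch: greedily select k documents balancing relevance
# to the query against redundancy with already-selected documents.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k, lam=0.5):
    """Return indices of k selected documents.
    score(d_i) = lam * Sim_1(q, d_i) - (1 - lam) * max_j Sim_2(d_i, d_j),
    where the max runs over documents already in the selected set S."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything already chosen.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a high lambda the second pick is the near-duplicate of the first (relevance dominates); with a low lambda the redundancy penalty pushes the selection toward a different document, mirroring the VF8 example where the ADAS document beats a second electric-motor document.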

Question 15: How does a Cross-Encoder understand complex interactions within text?

  • A. By using keyword density mapping.

  • B. By relying on a full Self-Attention mechanism to read the concatenated sequence in parallel.

  • C. By splitting sentences and indexing them in an HNSW graph.

  • D. By relying purely on BM25 frequency calculations.

Answer: B

Question 16: Why does Information Noise occur in the initial Retrieval step?

  • A. Documents may contain matching keywords but deviate in context or true intent.

  • B. The embedding model compresses text too heavily.

  • C. The user’s internet connection was unstable.

  • D. The query was rewritten poorly by HyDE.

Answer: A

Question 17: If you need extremely accurate answers for difficult questions, which Re-ranking method should you choose?

  • A. Maximal Marginal Relevance (MMR)

  • B. Hybrid Search

  • C. Cross-Encoder

  • D. Bi-Encoder

Answer: C

Question 18: If your goal is to provide a general answer covering many aspects, which Re-ranking method is most appropriate?

  • A. Cross-Encoder

  • B. Maximal Marginal Relevance (MMR)

  • C. Bi-Encoder

  • D. HyDE

Answer: B

Question 19: In the VF8 car example, why is Doc 2 selected when using MMR?

  • A. Because it has a higher Bi-Encoder score than Doc 1.

  • B. Because it contains the word ‘VF8’ more frequently.

  • C. Because its content (ADAS system) differs from Doc 1 (electric motor).

  • D. Because it is the longest document available.

Answer: C

Question 20: What specific structural element of the query ‘What does Python not eat?’ is successfully recognized by a Cross-Encoder but often missed by a Bi-Encoder?

  • A. The biological context and negation structure ‘not eat’.

  • B. The capital letter ‘P’.

  • C. The interrogative word ‘What’.

  • D. The length of the query string.

Answer: A