Query Transformations#
In the previous sections, we assumed that the user’s question is always clear, semantically complete, and matches the content in the document. However, reality is rarely that perfect.
Users tend to ask short questions that lack context, or bundle several issues into one query. For example: instead of asking “What date do remote work regulations apply from?”, they might just type “remote work”. If we feed this raw question directly into the RAG system, search results are often very poor because the question’s vector does not match the vectors of detailed legal documents.
To solve this problem, we use the Query Transformation technique. The core idea is to use an LLM to rewrite, expand, or break down the user’s question into better versions before performing the search.
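As a minimal sketch, the rewriting step can be a single LLM call wrapped around the raw query. Here `call_llm` is a hypothetical stand-in for any chat-completion API, hard-coded to return the expanded question from the “remote work” example above:

```python
# Sketch of query rewriting before retrieval.
# `call_llm` is a hypothetical placeholder, not a real API.

REWRITE_PROMPT = (
    "Rewrite the following search query as a complete, specific question.\n"
    "Query: {query}"
)

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM here. For the
    # "remote work" example, a model might plausibly return:
    return "What date do the remote work regulations apply from?"

def transform_query(query: str) -> str:
    """Expand a terse user query into a retrieval-friendly question."""
    return call_llm(REWRITE_PROMPT.format(query=query))

print(transform_query("remote work"))
```

The rewritten question, not the raw query, is then embedded and sent to the vector store.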
Hypothetical document embeddings (HyDE)#
One of the biggest challenges of Vector Search is the semantic asymmetry between the question and the answer. Questions are often short and interrogative, while documents are long and affirmative/descriptive.
HyDE is a technique that overcomes this by leveraging the generative ability of the LLM. Instead of searching with the question itself, we ask the LLM to draft a hypothetical answer, then use this hypothetical answer to find the real document.
Mechanism of Operation
The HyDE process takes place in three steps:
Generate: The system asks the LLM to write a hypothetical answer paragraph for the user’s question. Note that the information in this paragraph may be factually incorrect, but the writing style and technical vocabulary used will resemble the actual document.
Encode: Pass this hypothetical paragraph through the embedding model to create a vector.
Retrieve: Use this vector to search in the database. Since the vector of the “fake answer” will be closer to the vector of the “real answer” than the vector of the “question”, search results are usually more accurate.
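The three steps above can be sketched end to end. This is a toy illustration, not a production implementation: the embedding here is a simple bag-of-words vector with cosine similarity, and `generate_hypothetical_answer` is a hard-coded placeholder for the LLM call:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # sentence-embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical_answer(question: str) -> str:
    # Step 1 (Generate): placeholder for an LLM call such as
    # "Write a passage that answers: {question}".
    return ("To fix the BSOD error, restart the computer, check the stop "
            "code, update the driver, and enter Safe Mode.")

def hyde_retrieve(question: str, docs: list[str]) -> str:
    # Step 2 (Encode): embed the hypothetical answer, not the question.
    hv = embed(generate_hypothetical_answer(question))
    # Step 3 (Retrieve): nearest document by cosine similarity.
    return max(docs, key=lambda d: cosine(hv, embed(d)))

docs = [
    "Choosing a blue screen color theme for your desktop wallpaper.",
    "BSOD troubleshooting: check the stop code, update the driver, "
    "and boot into Safe Mode.",
]
print(hyde_retrieve("How to handle blue screen error", docs))
```

Note how the hypothetical answer shares vocabulary (“BSOD”, “stop code”, “driver”) with the technical document but almost none with the wallpaper article, so the right document wins.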
Illustrative Example
```mermaid
graph LR
    Q["User Question\n'How to handle blue screen error'"]
    Q -->|"1. Generate"| LLM[LLM]
    LLM --> HD["Hypothetical Answer\n'To fix BSOD, restart, check stop code,\nupdate driver, enter Safe Mode...'"]
    HD -->|"2. Encode"| EM[Embedding Model]
    EM --> HV[Hypothetical Vector]
    HV -->|"3. Retrieve"| SS[Similarity Search]
    DB[(Vector Store)] --> SS
    SS --> RD["Real Documents\n(technical instructions)"]
```
User Question: “How to handle blue screen error.”
Problem: The question is too short; its vector might mistakenly match documents describing screen colors.
HyDE Generation (LLM drafting): “To fix the Blue Screen of Death (BSOD) error on Windows, you need to restart the computer, check the stop code, update the graphics card driver, or enter Safe Mode to remove conflicting software…”
Result: The system uses the draft paragraph above to search. Thanks to technical keywords like “BSOD”, “driver”, “Safe Mode” appearing in the draft, the system easily finds the exact technical instruction document in the database.
Query Decomposition#
This technique is particularly useful for complex questions where a single text passage cannot contain enough information to answer.
If a user asks a question that requires comparing or aggregating information from multiple sources, simple searching often fails because the question’s vector falls somewhere between several different topics. Query Decomposition solves this by breaking the large problem into simpler sub-problems.
Strategy
The system uses an LLM to analyze the original question and split it into a sequence of independent sub-questions.
Breakdown: Split multi-intent questions into single-intent questions.
Retrieval: Perform document search for each separate sub-question. This ensures each search has a clear goal and high accuracy.
Synthesis: Aggregate the text segments found in all the steps above and give them to the LLM to answer the original question.
```mermaid
graph TD
    OQ["Original Complex Question"] -->|"1. Breakdown"| LLM[LLM]
    LLM --> SQ1["Sub-query 1\n(single intent)"]
    LLM --> SQ2["Sub-query 2\n(single intent)"]
    SQ1 -->|"2. Retrieval"| R1["Retrieved Docs 1"]
    SQ2 -->|"2. Retrieval"| R2["Retrieved Docs 2"]
    R1 --> SYN[LLM Synthesis]
    R2 --> SYN
    SYN -->|"3. Synthesis"| ANS["Final Answer"]
```
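The breakdown–retrieval–synthesis loop can be sketched as follows. Everything here is a hypothetical stub: `decompose` stands in for an LLM call that splits the question, `retrieve` looks up a toy two-document corpus, and the final step only assembles the synthesis prompt that a real system would send to the LLM:

```python
def decompose(question: str) -> list[str]:
    # Placeholder for an LLM call that splits a multi-intent question;
    # hard-coded here for the revenue-comparison example.
    return [
        "What is iPhone 15 revenue in Q1 2024?",
        "What is Samsung S24 revenue in Q1 2024?",
    ]

def retrieve(sub_query: str) -> str:
    # Placeholder retriever over a toy corpus; figures are invented
    # stand-ins, not real data.
    corpus = {
        "iphone": "Apple report: iPhone 15 revenue figures for Q1 2024.",
        "samsung": "Samsung report: S24 revenue figures for Q1 2024.",
    }
    for key, doc in corpus.items():
        if key in sub_query.lower():
            return doc
    return ""

def build_synthesis_prompt(question: str) -> str:
    # 1. Breakdown, 2. per-sub-query retrieval, 3. synthesis prompt.
    contexts = [retrieve(sq) for sq in decompose(question)]
    return (f"Using the context below, answer: {question}\n\n"
            + "\n".join(contexts))

print(build_synthesis_prompt(
    "Compare the revenue of iPhone 15 and Samsung S24 in Q1 2024."))
```

Because each sub-query targets a single source, each retrieval step has a clear goal, and the synthesis prompt ends up containing context from both reports.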
Illustrative Example
User Question: “Compare the revenue of iPhone 15 and Samsung S24 in Q1 2024.”
Problem: There is no single document containing this comparison table. Information is scattered in Apple’s financial report and Samsung’s report.
Decomposition Process:
Sub-query 1: “What is iPhone 15 revenue in Q1 2024?” → Found in Apple Report.
Sub-query 2: “What is Samsung S24 revenue in Q1 2024?” → Found in Samsung Report.
Final Generation: The LLM receives both figures from the two searches and self-aggregates them into a complete comparison answer.
In summary, Query Transformation acts as an intelligent editor, rewriting and reorienting user questions before they reach the retrieval stage, ensuring that the system correctly understands the true intent behind terse queries.