Beyond Traditional RAG: How GraphRAG Enhances Information Retrieval

GraphRAG vs. Traditional RAG: Understanding Why GraphRAG Matters

The GraphRAG Manifesto (https://neo4j.com/blog/genai/graphrag-manifesto/)

Traditional Retrieval-Augmented Generation (RAG) systems behave like smart search engines. They find candidate passages by comparing the words and meaning of a query against stored text. This works well most of the time, but the system does not really understand what you are asking. It is closer to guessing from word patterns and semantic similarity.

GraphRAG takes a different approach. Instead of just looking for text that “sounds right,” it builds a map of knowledge: a network connecting people, companies, documents, and events. When you ask a question, GraphRAG does not just hunt for similar words and meanings. It follows the connections in this map to find what you need.

Think of it this way:

Traditional RAG is like flipping through a book, scanning for familiar words and similar phrases, hoping you’ll land on the right page by chance.
GraphRAG, on the other hand, is like using the table of contents or an index — it shows you how everything is connected, so you can go directly to the information you need.

This shift adds a layer of understanding to the search process, turning retrieval into something that starts to look like reasoning. But the big question is: can this new method truly deliver on that promise?

Key Insights at a Glance

Conventional retrieval systems already work remarkably well. In fact, they are sufficient for approximately 95% of user queries — not because they answer 95% of questions perfectly, but because their operational model fits the vast majority of real-world needs.

For example, in the financial industry, where we operate, a typical question like:
“What is our latest take on investing in NVIDIA?”
can be answered well using traditional RAG techniques. A similarity-based search quickly finds recent analyses or memos discussing NVIDIA and passes them to the language model for summarization or interpretation.

However, not all questions are this straightforward. Consider a different query:
“List all articles we have that mention NVIDIA.”
This seemingly simple request requires a comprehensive and structured scan of all relevant documents. Traditional RAG often struggles with completeness in such cases — it might find a few related snippets, but miss others due to how text chunks are matched.

This is where GraphRAG shows its strength. By using a structured knowledge graph, it can follow relationships like “NVIDIA” → “mentioned in” → “document” and return a precise and complete list.

Still, we don’t see GraphRAG as a replacement. Rather, it complements traditional RAG. Our goal is to combine both approaches, the speed and sufficiency of traditional retrieval with the structure and reasoning power of graphs, to build a system that gives the best possible answer for each type of query.

A Simple Overview of Retrieval-Augmented Generation

https://k21academy.com/ai-ml/an-overview-of-retrieval-augmented-generationrag-and-ragops/

At its core, a RAG system is designed to find and use relevant information from a dataset to answer user questions. In a typical chatbot use case, RAG retrieves the most relevant pieces of data and feeds them to a large language model (LLM), which then generates a response that is grounded in the original source material.

RAG works through a multi-step process:

Ingestion: It begins with a collection of documents from various sources. These documents are processed and split into manageable text chunks — typically around 400 words each. This chunk size is chosen to preserve enough context for semantic understanding when generating vector embeddings.
Storage: These chunks are stored in a database and serve as the basic retrievable units.
Retrieval: When a user submits a question, the system performs retrieval using a combination of similarity search (via vector embeddings) and full-text search (e.g., Elasticsearch). These methods aim to surface the chunks most likely to contain useful information. However, the system doesn’t know in advance which chunk contains the actual answer — it simply retrieves the most promising ones based on relevance scoring. The hope is that the golden chunk — the one that truly answers the question — is among them.
Generation: The selected chunks are then passed as context to the LLM, which uses them to generate a coherent and context-aware answer. The quality of the response depends heavily on whether the right chunks — especially the golden one — were included in the retrieval step.
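The ingestion and generation steps above can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the function names `chunk_text`, `build_prompt`, and `prepare_context` are hypothetical, the prompt wording is invented, and the retrieval step is passed in as a function so that any search method can be plugged in.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine the retrieved chunks and the user question into one LLM prompt."""
    context = "\n\n".join(
        f"[Chunk {i}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def prepare_context(
    question: str,
    documents: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    top_k: int = 3,
) -> str:
    """Ingest documents, retrieve the top_k chunks, and build the prompt.

    `retrieve` is a placeholder for any ranking function that returns
    chunks ordered from most to least relevant.
    """
    # Ingestion and storage: here the "database" is just an in-memory list.
    chunks = [chunk for doc in documents for chunk in chunk_text(doc)]
    # Retrieval: keep only the most promising chunks.
    selected = retrieve(question, chunks)[:top_k]
    # Generation input: the prompt that would be sent to the LLM.
    return build_prompt(question, selected)
```

If the golden chunk is not among the `top_k` chunks returned by `retrieve`, nothing in the later steps can recover it, which is why retrieval quality dominates answer quality.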

Traditional RAG: How it Works and Where it Struggles

Traditional RAG implementations commonly utilize two primary methods to identify relevant information or "golden chunks": full-text search and similarity search.

Similarity Search: Similarity search retrieves chunks by meaning rather than by exact wording. During ingestion, when the dataset is split into chunks, a vector embedding is generated for each chunk. These embeddings turn text chunks into multidimensional vectors: numerical representations that capture semantic meaning, so that texts with similar meaning end up close together in vector space.

When a user poses a question, the system creates a vector embedding for the query or a variation of it and compares it against the chunk embeddings. The chunks with embeddings most similar to the query embedding are retrieved, hence the term "similarity search."
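As a rough sketch of this comparison, the example below ranks chunks by cosine similarity, one common way to measure how close two embeddings are. The three-dimensional vectors and chunk names are invented for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k chunk ids whose embeddings are most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, made up for this example.
CHUNK_VECTORS = {
    "nvidia_memo": [0.9, 0.1, 0.0],
    "bond_outlook": [0.1, 0.9, 0.2],
    "chip_sector_note": [0.7, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for an embedded question about NVIDIA

results = top_k_chunks(query_vector, CHUNK_VECTORS, k=2)
```

With these toy vectors, `nvidia_memo` ranks first and `chip_sector_note` second, while `bond_outlook` is left out because its vector points in a different direction.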

Full-text Search: Full-text search retrieves chunks by matching query keywords directly against the text. It relies on techniques akin to those used by search engines like Google, is fast, and significantly improves RAG's accuracy for straightforward queries.
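To make the keyword-matching idea concrete, here is a small sketch built on an inverted index, the data structure that maps each term to the chunks containing it. The scoring is deliberately simplistic (it counts matching query terms); production engines such as Elasticsearch use more refined ranking functions like BM25. The chunk ids and texts are invented.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(chunks: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of chunk ids in which it occurs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for token in tokenize(text):
            index[token].add(chunk_id)
    return index


def keyword_search(query: str, index: Dict[str, Set[str]]) -> List[str]:
    """Rank chunk ids by how many distinct query terms they contain."""
    hits: Counter = Counter()
    for token in set(tokenize(query)):
        for chunk_id in index.get(token, ()):
            hits[chunk_id] += 1
    # Sort by descending hit count, then by id for a stable order.
    return [cid for cid, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy chunks, made up for this example.
CHUNKS = {
    "c1": "NVIDIA reported record data center revenue.",
    "c2": "Bond yields fell after the rate decision.",
    "c3": "Our latest view on NVIDIA and the chip sector.",
}
INDEX = build_inverted_index(CHUNKS)
ranked = keyword_search("NVIDIA chip outlook", INDEX)
```

Here `c3` ranks ahead of `c1` because it matches two query terms instead of one, and `c2` is not returned at all because it shares no term with the query.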

While these two methods are robust and suitable for most query types, they can struggle with complex questions involving comprehensive lists or nested information requests. In such cases, advanced techniques like GraphRAG are employed to overcome these limitations, enabling the system to deliver more comprehensive and nuanced answers.

GraphRAG: How Structured Knowledge Enhances Retrieval

Financial institutions frequently rely on rapid, accurate information retrieval for critical decision-making. However, traditional RAG systems based on similarity searches often struggle with seemingly simple yet contextually rich queries. For instance, consider the question:

"Which investment reports reference UBS's sustainable finance initiatives from Q3 2024?"

A conventional full-text search might return multiple documents mentioning UBS, sustainability, and finance separately, but fail to capture the specific context required. Without structured metadata and relationships, the returned chunks of information are overwhelming and unstructured, making it challenging for analysts to generate precise, actionable insights.

GraphRAG, on the other hand, leverages a financial-specific knowledge graph, connecting entities such as financial institutions, investment topics, quarters, and reports through defined relationships. In this case, UBS would be a central entity, connected via a relationship such as "MENTIONED_IN_REPORT" to specific investment reports explicitly tagged with "Sustainable Finance" and timestamped with "Q3 2024."

The power of GraphRAG lies in its structured knowledge representation, allowing analysts to rapidly traverse and extract precise subgraphs relevant to the query. Analysts gain immediate visibility into how entities like UBS, specific investment themes, and report timings interrelate. The extracted subgraph provides targeted, coherent context directly to the language model, enabling it to generate concise, accurate summaries or insights tailored specifically for financial industry needs.
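The UBS example can be sketched with a tiny in-memory graph of (source, relationship, target) triples. Only the "MENTIONED_IN_REPORT" relationship comes from the description above; the relationship names `HAS_TOPIC` and `PUBLISHED_IN`, the report ids, and the helper functions are assumptions made for illustration. A real deployment would store the graph in a graph database rather than a Python list.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (source node, relationship, target node)

# Toy knowledge graph; report ids and two relationship names are hypothetical.
TRIPLES: List[Triple] = [
    ("UBS", "MENTIONED_IN_REPORT", "report_1"),
    ("UBS", "MENTIONED_IN_REPORT", "report_2"),
    ("UBS", "MENTIONED_IN_REPORT", "report_3"),
    ("report_1", "HAS_TOPIC", "Sustainable Finance"),
    ("report_2", "HAS_TOPIC", "Equity Strategy"),
    ("report_3", "HAS_TOPIC", "Sustainable Finance"),
    ("report_1", "PUBLISHED_IN", "Q3 2024"),
    ("report_2", "PUBLISHED_IN", "Q3 2024"),
    ("report_3", "PUBLISHED_IN", "Q1 2024"),
]


def targets(triples: List[Triple], source: str, relation: str) -> Set[str]:
    """All nodes reachable from `source` over one edge of type `relation`."""
    return {t for s, r, t in triples if s == source and r == relation}


def reports_for(
    triples: List[Triple], institution: str, topic: str, quarter: str
) -> List[str]:
    """Reports that mention the institution and match both topic and quarter."""
    candidates = targets(triples, institution, "MENTIONED_IN_REPORT")
    return sorted(
        report
        for report in candidates
        if topic in targets(triples, report, "HAS_TOPIC")
        and quarter in targets(triples, report, "PUBLISHED_IN")
    )


matching_reports = reports_for(TRIPLES, "UBS", "Sustainable Finance", "Q3 2024")
```

In this toy graph only `report_1` satisfies all three conditions. Because the query walks explicit edges instead of ranking text snippets, the result is complete with respect to what the graph contains.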

Consider another financial scenario:

"What recent market analysis is recommended by the Chief Investment Officer of UBS’s biggest competitor?"

Traditional similarity-based systems would likely miss critical relationships: identifying UBS's biggest competitor, their CIO, and finally the CIO's recommended market analysis. GraphRAG handles this smoothly by traversing explicit "COMPETITOR_OF" relationships, then "HAS_CIO," and finally "RECOMMENDED_REPORT."

In this example, JPMorgan is identified as UBS’s largest competitor. The graph allows traversal from UBS to JPMorgan through a "COMPETITOR_OF" edge, and from there to JPMorgan’s Chief Investment Officer, followed by a link to the CIO’s recommended reports on recent market trends. This capability enables precise, insightful responses beyond what traditional RAG pipelines can achieve.

By embedding such structured reasoning directly within the retrieval stage, GraphRAG significantly enhances the precision, efficiency, and reliability of information retrieval, which is essential for fast-paced, data-driven financial services environments.
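The three-hop traversal in this scenario can be illustrated with a short sketch. The relationship names come from the text; the node ids `cio_jpmorgan`, `market_analysis_a`, and `market_analysis_b` are placeholders, and the sketch assumes that the "COMPETITOR_OF" edge already points to the biggest competitor rather than ranking several competitors.

```python
from typing import Dict, List, Sequence, Tuple

# Adjacency map: (node, relationship) -> list of target nodes.
Graph = Dict[Tuple[str, str], List[str]]

# Toy graph; the CIO and report node ids are hypothetical placeholders.
GRAPH: Graph = {
    ("UBS", "COMPETITOR_OF"): ["JPMorgan"],
    ("JPMorgan", "HAS_CIO"): ["cio_jpmorgan"],
    ("cio_jpmorgan", "RECOMMENDED_REPORT"): [
        "market_analysis_a",
        "market_analysis_b",
    ],
}


def follow_path(graph: Graph, start: str, relations: Sequence[str]) -> List[str]:
    """Follow a fixed sequence of relationship types, one hop per relation."""
    frontier = [start]
    for relation in relations:
        next_frontier: List[str] = []
        for node in frontier:
            next_frontier.extend(graph.get((node, relation), []))
        # Remove duplicates while keeping the discovery order.
        frontier = list(dict.fromkeys(next_frontier))
    return frontier


recommended = follow_path(
    GRAPH, "UBS", ["COMPETITOR_OF", "HAS_CIO", "RECOMMENDED_REPORT"]
)
```

Each hop narrows or expands the set of candidate nodes, and the final frontier holds the reports to pass to the language model. If any hop finds no edge, the result is an empty list, which is a clearer signal than a loosely related text snippet.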

Addressing Scalability in Knowledge Graph Construction

Constructing knowledge graphs traditionally requires considerable domain expertise and manual intervention, which is impractical at scale. Automated methods become essential but present significant challenges:

Unknown Document Content: Often, the specifics of documents are unknown upfront, complicating the creation of an ontology—the structured definition of entities and relationships.
Changing Document Sets: Documents frequently change or expand over time, requiring adaptable ontologies and knowledge graphs.
Indirect Document Access: In some cases, direct access to documents might be constrained, necessitating fully automated, scalable methods.

Pitfalls of Direct Ontology Mining

Intuitively, one might try extracting an ontology directly from the documents, identifying as many entities and relationships as possible. However, this approach typically yields overly broad and noisy results. Without specific guidance, it becomes challenging to discern genuinely valuable relationships from trivial ones.

For instance, direct document-based ontology extraction might generate connections that, while true, are irrelevant or rarely queried by users. Consequently, this method often leads to inflated, inefficient, and impractical knowledge graphs.

QuestionRAG: Ontology Mining Driven by Real User Questions

To circumvent these issues, we propose QuestionRAG—a strategy driven by user questions rather than raw document content. User questions provide targeted insight into which entities and relationships matter most, enabling precise and meaningful ontology extraction.

This approach follows a structured process:

Clustering Questions: Group similar questions into semantic clusters.
Ontology Extraction: Use these clusters and relevant document contexts to let an LLM identify essential entity types and relationships.
Ontology Aggregation: Consolidate and refine these outputs into a coherent, targeted ontology.
Knowledge Graph Construction: Implement the ontology using the Neo4j SDK, creating a scalable, adaptive knowledge graph.
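The clustering and aggregation steps of this process can be sketched as follows. This is a simplified stand-in, not the method itself: word overlap (Jaccard similarity) replaces real semantic embeddings, the greedy clustering rule and the 0.4 threshold are arbitrary choices, and the per-cluster ontology proposals, which an LLM would normally produce, are hard-coded. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set, Tuple

SchemaTriple = Tuple[str, str, str]  # (entity type, relationship, entity type)


def tokens(question: str) -> Set[str]:
    """Lowercased set of alphanumeric words in a question."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two questions have in common (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_questions(questions: List[str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose first question is similar."""
    clusters: List[List[str]] = []
    for question in questions:
        for cluster in clusters:
            if jaccard(tokens(question), tokens(cluster[0])) >= threshold:
                cluster.append(question)
                break
        else:
            clusters.append([question])
    return clusters


def aggregate_ontology(
    per_cluster: List[List[SchemaTriple]],
) -> List[Tuple[SchemaTriple, int]]:
    """Merge per-cluster proposals and count how many clusters support each."""
    support = Counter(t for proposals in per_cluster for t in set(proposals))
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


QUESTIONS = [
    "List all reports that mention NVIDIA",
    "List all reports that mention UBS",
    "Who is the CIO of JPMorgan",
    "Who is the CIO of UBS",
]
clusters = cluster_questions(QUESTIONS)

# Hard-coded stand-in for what an LLM might propose for each cluster.
PROPOSALS = [
    [("Company", "MENTIONED_IN_REPORT", "Report")],
    [("Company", "HAS_CIO", "Person"), ("Company", "MENTIONED_IN_REPORT", "Report")],
]
ontology = aggregate_ontology(PROPOSALS)
```

The support count gives a simple signal for refinement: relationships proposed by many question clusters are strong candidates for the final ontology, while rarely proposed ones can be reviewed or dropped.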

The advantage of this method lies in its precision and adaptability. Unlike direct document-based extraction, QuestionRAG continuously aligns with real user needs, creating a focused, relevant ontology that evolves naturally as questions emerge.

Internal Session: Unique – GraphRAG Input (Video)

For those interested in a more detailed walkthrough of the ideas presented above, we recommend watching our recorded internal session, “GraphRAG Input”, held on April 15, 2025.

In this session, we covered the core concepts of RAG, GraphRAG, and the emerging approach we call QuestionRAG — including how these methods apply in real-world financial use cases.

Key topics discussed include:

A practical introduction to traditional RAG, including full-text and vector similarity search.
The importance of identifying the “golden chunk” and how retrieval affects LLM performance.
How GraphRAG improves retrieval by leveraging entities, relationships, and graph traversal.
Challenges in building knowledge graphs manually, and the benefits of automated ontology construction using LLMs and user questions.
Use cases such as listing all documents related to a specific entity or resolving complex queries through reasoning paths.
An introduction to Text-to-Cypher, context expansion, and the integration of GraphRAG into existing systems.
A preview of QuestionRAG, showing how user questions drive the ontology and guide knowledge graph construction.

The session concludes with a live Q&A covering taxonomy automation, error handling in relationships, graph scalability, and future plans for benchmarking GraphRAG’s effectiveness.

Outlook: Toward Implementation and Exploration

In our next blog post, we will delve deeper into implementing QuestionRAG in practice. We will explore:

Practical methods for effectively clustering user queries (similarity vs. dissimilarity).
Techniques for refining LLM-driven ontology extraction.
Effective resolution and processing of the ontology.
Real-world examples demonstrating the adaptive evolution of knowledge graphs.

Through this exploration, we aim to illustrate precisely how QuestionRAG provides a scalable solution that extends traditional methods with genuinely reasoning-based information retrieval.