LightRAG Evaluation

Evaluating LightRAG for Complex Workflows

Our recent exploratory R&D focused on whether LightRAG, a graph-based retrieval system, could improve our existing agentic workflow, UniqueAI-Chat. We tested both systems on two specific challenges: multi-step reasoning and document-wide information aggregation.

Using our due diligence dataset (370 pages across 9 PDF documents), we benchmarked how these architectures handle complex queries that go beyond simple fact retrieval.

1. Multi-Step Reasoning

Multi-step reasoning involves questions that require a sequence of dependent logic. For example: "Does the company with the largest market cap in the S&P 500 tech sector have a higher dividend yield than its closest competitor?"

Required reasoning steps:

Identify the largest company by market cap in the S&P 500 tech sector (e.g., Nvidia).
Identify its closest competitor by market cap (e.g., Apple).
Retrieve and compare the dividend yields for both.
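The dependency between these steps can be sketched in a few lines, assuming the relevant figures have already been retrieved into a hypothetical `Company` record. The company names and numbers are placeholders, not real market data, and treating the runner-up by market cap as the "closest competitor" is an assumption that mirrors the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str
    market_cap: float      # hypothetical units, e.g. USD billions
    dividend_yield: float  # hypothetical, as a fraction (0.01 == 1%)


def largest_and_runner_up(companies: list[Company]) -> tuple[Company, Company]:
    # Steps 1 and 2: rank by market cap; the runner-up stands in for "closest competitor".
    ranked = sorted(companies, key=lambda c: c.market_cap, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two companies")
    return ranked[0], ranked[1]


def leader_has_higher_yield(companies: list[Company]) -> bool:
    leader, competitor = largest_and_runner_up(companies)
    # Step 3 can only run once steps 1 and 2 have resolved which two entities to compare.
    return leader.dividend_yield > competitor.dividend_yield


# Placeholder sector data, not real figures.
sector = [
    Company("CompanyA", 3000.0, 0.0003),
    Company("CompanyB", 2800.0, 0.0050),
    Company("CompanyC", 900.0, 0.0100),
]
print(leader_has_higher_yield(sector))  # False
```

The design point is that step 3 takes its inputs from the answers to steps 1 and 2, so a system that retrieves once for the whole question has to get all three right from a single context.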

The Findings:

UniqueAI-Chat: Answered 5 out of 6 complex questions correctly, demonstrating high competence in navigating these intermediate logic steps.
LightRAG: Answered 2 out of 6 questions correctly.
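As plain accuracy, the two results work out as follows. The `accuracy` helper is illustrative only; the counts are the ones reported above. With six questions, each question moves accuracy by about 16.7 percentage points.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


results = {"UniqueAI-Chat": (5, 6), "LightRAG": (2, 6)}  # counts from the findings above
for system, (correct, total) in results.items():
    print(f"{system}: {correct}/{total} = {accuracy(correct, total):.1%}")
# UniqueAI-Chat: 5/6 = 83.3%
# LightRAG: 2/6 = 33.3%

# With only six questions, a single answer shifts accuracy by 1/6.
print(f"per-question step: {1 / 6:.1%}")  # per-question step: 16.7%
```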

Conclusion: In its default configuration, LightRAG struggled with sequential logic. Current evidence suggests that agentic workflows are already highly capable in this area; LightRAG did not provide a measurable performance boost and in fact scored lower.

2. Document-Wide Information Aggregation

This task requires scanning a large volume of documents to compile a complete list or comparison, a kind of task where conventional RAG typically fails. We used the task "Map employees to office locations" as our benchmark because this data is scattered across various org charts and text sections.
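A toy sketch can show why a fixed top-k retrieval step tends to fall short on this kind of task. It assumes each chunk's (employee, location) pairs have already been extracted and that the retriever passes only the k highest-scoring chunks onward; the names, scores, and helper functions are hypothetical and do not describe how either UniqueAI-Chat or LightRAG is implemented.

```python
from typing import Iterable

Pair = tuple[str, str]  # (employee, office location)


def aggregate(chunks: Iterable[list[Pair]]) -> dict[str, str]:
    """Merge per-chunk (employee, location) pairs into one mapping."""
    mapping: dict[str, str] = {}
    for pairs in chunks:
        for employee, location in pairs:
            mapping.setdefault(employee, location)  # keep the first location seen
    return mapping


def top_k_then_aggregate(scored_chunks: list[tuple[float, list[Pair]]], k: int) -> dict[str, str]:
    """Only the k highest-scoring chunks ever reach the aggregation step."""
    top = sorted(scored_chunks, key=lambda sc: sc[0], reverse=True)[:k]
    return aggregate(pairs for _, pairs in top)


# Hypothetical chunks: the relevant facts are spread over three chunks.
scored = [
    (0.9, [("Alice", "Zurich")]),
    (0.8, [("Bob", "Geneva")]),
    (0.4, [("Carol", "Zurich"), ("Dan", "Basel")]),
]
full = aggregate(pairs for _, pairs in scored)
limited = top_k_then_aggregate(scored, k=2)
print(len(full), len(limited))  # 4 2
```

When the facts are spread over more chunks than k, everything in the lower-ranked chunks is lost before the LLM sees it, however well the answer is written.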

The Findings:

UniqueAI-Chat: Retrieved an average of 72.4% of relevant employees with a 4.5% error rate.
LightRAG: Retrieved 44.5% of relevant employees with a 3.8% error rate.
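The write-up does not state the exact formulas behind these two numbers. One plausible reading, sketched below as an assumption, is set-based: recall is the share of gold employees mapped to the correct office, and error rate is the share of returned mappings that are wrong, averaged over repeated runs. The helper names and the toy data are hypothetical, not from the due diligence dataset.

```python
from statistics import mean


def score_mapping(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float]:
    """Return (recall, error_rate) for one employee -> location mapping.

    Assumed definitions (not confirmed by the evaluation):
      recall     = gold employees mapped to their correct location / all gold employees
      error_rate = returned mappings that are wrong / all returned mappings
    A returned employee who is missing from the gold set counts as wrong.
    """
    correct = sum(1 for emp, loc in predicted.items() if gold.get(emp) == loc)
    wrong = len(predicted) - correct
    recall = correct / len(gold) if gold else 0.0
    error_rate = wrong / len(predicted) if predicted else 0.0
    return recall, error_rate


def average_over_runs(runs: list[dict[str, str]], gold: dict[str, str]) -> tuple[float, float]:
    """Mean recall and mean error rate across repeated runs."""
    scores = [score_mapping(run, gold) for run in runs]
    return mean(r for r, _ in scores), mean(e for _, e in scores)


# Hypothetical toy data.
gold = {"Alice": "Zurich", "Bob": "Geneva", "Carol": "Zurich", "Dan": "Basel"}
run = {"Alice": "Zurich", "Bob": "Zurich", "Carol": "Zurich"}
recall, error_rate = score_mapping(run, gold)
print(f"recall={recall:.1%}, error_rate={error_rate:.1%}")  # recall=50.0%, error_rate=33.3%
```

Under these assumed definitions the two metrics move independently: returning fewer mappings can lower the error rate while also lowering recall, so they are best read together.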

To understand why LightRAG underperformed, we analyzed its internal pipeline across five runs for the query "Which employees work in Zurich?" Errors compounded at every stage:

Knowledge Graph (KG) Extraction: Only 17 of 23 employees were correctly identified and linked to the right location in the initial graph.
Retrieval: Relevant information present in the KG was not always successfully passed to the LLM.
Final Output: Even when the context reached the LLM, the final written answer often omitted correct details.
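The compounding can be pictured as a funnel. The sketch below assumes each stage is summarized by the set of gold employees it still has right and that a stage can only pass on what the previous one kept; the `stage_recall` helper and the employee IDs are hypothetical and are not part of LightRAG's API. Only the 17-of-23 KG count comes from the analysis above.

```python
def stage_recall(
    gold: set[str], in_kg: set[str], in_context: set[str], in_answer: set[str]
) -> dict[str, float]:
    """Fraction of gold employees still correct after each pipeline stage.

    Assumes the surviving set is intersected cumulatively:
    KG extraction -> retrieved context -> final answer.
    """
    if not gold:
        raise ValueError("gold set must be non-empty")
    kept = gold & in_kg  # new set, so gold itself is not modified below
    out = {"kg": len(kept) / len(gold)}
    kept &= in_context
    out["retrieval"] = len(kept) / len(gold)
    kept &= in_answer
    out["answer"] = len(kept) / len(gold)
    return out


# 23 Zurich employees, 17 correctly extracted into the KG (IDs are placeholders).
gold = {f"employee_{i:02d}" for i in range(23)}
in_kg = set(sorted(gold)[:17])
# Assume perfect retrieval and generation: nothing further is lost downstream.
perfect = set(gold)
print({stage: round(r, 3) for stage, r in stage_recall(gold, in_kg, perfect, perfect).items()})
# {'kg': 0.739, 'retrieval': 0.739, 'answer': 0.739}
```

Because each stage can only shrink the surviving set, the final answer's recall can never exceed the KG stage's 17/23, no matter how good retrieval and generation are.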

Conclusion: LightRAG’s initial extraction layer created a bottleneck. Even with perfect retrieval and generation, LightRAG's answers could be no more complete than its underlying graph, and that graph was already less complete than the results produced by UniqueAI-Chat's standard methods.