A central part of any retrieval operation is a component called a retriever. Its job is to fetch relevant content for a given query.
In the AI era, retrievers have been used as part of retrieval-augmented generation (RAG) pipelines. The approach is simple: retrieve the relevant documents, pass them to an LLM and let the model generate a response based on that context.
Although retrieval may have seemed like a solved problem, it has not actually been solved for modern agentic AI workflows.
In research released this week, Databricks introduced Instructed Retriever, a new architecture that the company says delivers up to a 70% improvement over traditional RAG on complex, instruction-heavy enterprise question-answering tasks. The difference lies in how the system understands and uses metadata.
"Many systems designed for retrieval before the era of large language models were actually designed for use by humans, not agents." Michael Bendersky, research director at Databricks, told VentureBeat. "What we found is that in many cases, errors coming from the agent are not due to the agent not being able to reason about the data. This is because the agent is not able to retrieve the right data in the first place."
The core problem comes from how traditional RAG handles what Bendersky calls "system-level specifications." These include the full context of user instructions, metadata schemas, and examples that define what successful retrieval should look like.
In a typical RAG pipeline, a user query is converted to an embedding, similar documents are retrieved from a vector database, and those results feed into a language model for generation. The system may incorporate basic filtering, but it essentially treats each query as an isolated text-matching exercise.
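As a point of reference, a minimal sketch of such a pipeline might look like the following. It assumes the sentence-transformers library for embeddings; the sample documents, the model choice and the generate_answer helper are illustrative stand-ins, not anything from Databricks' system.

```python
# Minimal traditional-RAG sketch (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "FooBrand Lite launch notes, April 2023",
    "FooBrand Pro five-star review, September 2025",
    "BarCo quarterly report, June 2025",
]
# Embed the corpus once; every query is answered by vector similarity alone.
doc_vecs = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Metadata such as dates, ratings or brands never enters this ranking.
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def generate_answer(query: str, context: list[str]) -> str:
    # Hypothetical LLM call: the retrieved passages are simply prepended
    # to the prompt; swap in whatever completion API you actually use.
    prompt = "\n".join(context) + "\n\nQuestion: " + query
    return prompt

answer = generate_answer("Recent FooBrand reviews", retrieve("Recent FooBrand reviews"))
```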
This approach breaks down with real business data. Business documents often carry rich metadata such as timestamps, author information, product ratings, document types, and domain-specific attributes. When a user asks a question that requires reasoning over these metadata fields, traditional RAG struggles.
Consider this example: "Show me five-star product reviews from the last six months, but exclude anything from brand X." Traditional RAG cannot reliably translate this natural-language constraint into appropriate database filters and structured queries.
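To see why, note that the request above has to become something like the structured query below before a database can execute it. The field names (rating, review_date, brand) are hypothetical, not Databricks' actual schema.

```python
# A hypothetical structured form of the natural-language request above.
from datetime import date, timedelta

structured_query = {
    "semantic_query": "product reviews",
    "filters": {
        "rating": {"eq": 5},                                                        # "five-star"
        "review_date": {"gte": (date.today() - timedelta(days=183)).isoformat()},   # "last six months"
        "brand": {"neq": "Brand X"},                                                 # "exclude brand X"
    },
}
```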
"If you’re just using a traditional RAG system, there’s no way to use all of these different signals on the data that’s encapsulated in the metadata," » said Bendersky. "They must be passed to the agent itself so that it can do the correct recovery work."
The problem becomes more acute as companies move beyond simple document searching to agent-based workflows. A human using a search system can rephrase queries and apply filters manually when early results miss the mark. An autonomously functioning AI agent needs the retrieval system itself to understand and execute complex, multi-faceted instructions.
Databricks' approach fundamentally rethinks the retrieval pipeline. The system propagates the complete system specifications to each stage of retrieval and generation. These specifications include user instructions, labeled examples, and index schemas.
The architecture adds three key features:
Query decomposition: The system breaks a complex query down into a search plan containing multiple keyword searches and filter instructions. A request for "Recent FooBrand products excluding Lite models" is decomposed into structured queries with the appropriate metadata filters; traditional systems would attempt a single semantic search.
Metadata reasoning: Natural-language instructions are translated into database filters. "From last year" becomes a date filter; "five-star reviews" becomes a rating filter. The system understands both what metadata is available and how to match it to user intent.
Contextual reranking: The reranking step uses the full context of user instructions to boost documents that match intent, even when their keyword match is weaker. The system can prioritize recency or specific document types based on the specifications rather than simple text similarity.
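Taken together, the pattern looks roughly like the sketch below. This is a simplified illustration of the ideas described above, not Databricks' implementation; keyword_search and llm_rerank stand in for whatever search index and reranking model a given stack provides, and the hard-coded plan only shows the shape an LLM-generated plan would take.

```python
from dataclasses import dataclass, field

@dataclass
class PlannedSearch:
    query: str                                    # keyword/semantic search string
    filters: dict = field(default_factory=dict)   # metadata filters derived from instructions

def build_search_plan(user_query: str, schema: dict, instructions: str) -> list[PlannedSearch]:
    # In a real system an LLM would produce this plan from the query, the
    # index schema and the system-level instructions. Hard-coded here:
    # "Recent FooBrand products excluding Lite models" becomes two filtered
    # searches instead of one semantic lookup.
    return [
        PlannedSearch("FooBrand product announcements",
                      {"brand": "FooBrand", "model": {"neq": "Lite"},
                       "published": {"gte": "2025-01-01"}}),
        PlannedSearch("FooBrand product reviews",
                      {"brand": "FooBrand", "model": {"neq": "Lite"}}),
    ]

def instructed_retrieve(user_query, schema, instructions, keyword_search, llm_rerank):
    plan = build_search_plan(user_query, schema, instructions)
    candidates = []
    for step in plan:
        candidates += keyword_search(step.query, step.filters)
    # Rerank with the full instructions in context, so documents that match
    # intent (recency, document type) can outrank better keyword matches.
    return llm_rerank(user_query, instructions, candidates)
```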
"The magic lies in how we construct queries," » said Bendersky. "We’re sort of trying to use the tool like an agent would, not like a human would. It has all the intricacies of the API and uses them to best effect."
In the second half of 2025, industry attention shifted away from RAG and toward agentic AI memory, sometimes called contextual memory. Approaches including Hindsight and A-MEM have emerged, offering the promise of a RAG-free future.
Bendersky argues that contextual memory and sophisticated retrieval serve different purposes. Both are necessary for enterprise AI systems.
"It is not possible to put everything in your context memory," Bendersky noted. "You kind of need both. You need context memory to provide specifications, to provide schemas, but you still need access to data, which may be spread across multiple tables and documents."
Contextual memory excels at retaining task specifications, user preferences, and metadata patterns within a session. It keeps the "rules of the game" readily available. But the actual business data corpus lives outside that context window. Most businesses have volumes of data that exceed even generous context windows by orders of magnitude.
Instructed Retriever leverages context memory for system-level specifications while using retrieval to access a broader data set. In-context specifications inform how the retriever constructs queries and interprets results. The retrieval system then extracts specific documents from potentially billions of candidates.
This division of labor matters for practical deployment. Loading millions of documents into context is neither feasible nor efficient. Metadata alone can be substantial when dealing with heterogeneous systems within an enterprise. Instructed Retriever addresses this by making metadata directly usable without requiring everything to fit into context.
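One way to picture that division of labor: the compact, stable specifications ride along in the model's context, while the large, changing corpus is only ever touched through retrieval. The sketch below assumes a generic vector_index.search call and an llm callable; neither is a specific Databricks API.

```python
# System-level specifications stay in context; documents are fetched per query.
SYSTEM_SPECS = """
Index schema: brand, model, rating, review_date, doc_type
Instructions: prefer documents from the last 12 months; exclude drafts.
Example: "five-star reviews since June" -> filters: rating=5, review_date>=2025-06-01
"""

def answer(query: str, vector_index, llm) -> str:
    # The corpus may span billions of documents, so only the retrieved
    # handful ever enters the prompt alongside the specifications.
    docs = vector_index.search(query, top_k=8)
    prompt = SYSTEM_SPECS + "\nContext:\n" + "\n".join(docs) + "\n\nQuestion: " + query
    return llm(prompt)
```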
Instructed Retriever is available now as part of Databricks Agent Bricks; it is integrated into the Knowledge Assistant product. Companies that use Knowledge Assistant to create question-and-answer systems on their documents automatically leverage the Instructed Retriever architecture without creating custom RAG pipelines.
The system is not available as open source, although Bendersky said Databricks is considering wider availability. For now, the company’s strategy is to offer benchmark tests such as StaRK-Instruct to the research community while maintaining proprietary implementation of its enterprise products.
The technology holds particular promise for businesses with complex, highly structured data that includes rich metadata. Bendersky cited use cases in finance, e-commerce and healthcare. Essentially, any area where documents have meaningful attributes beyond plain text can benefit.
"What we have seen in some cases unlocks things that the customer cannot do without," » said Bendersky.
He explained that without Instructed Retriever, users must perform more data management tasks to place content in the right structure and tables for an LLM to properly retrieve the correct information.
“Here you can just create an index with the right metadata, point your retriever at it, and it will work straight away,” he said.
For companies building RAG-based systems today, the research raises a crucial question: Is your retrieval pipeline actually capable of following the instructions and reasoning over the metadata that your use case requires?
The 70% improvement demonstrated by Databricks is not achievable through incremental optimization. It reflects an architectural difference in how system specifications flow through the retrieval and generation process. Organizations that have invested in carefully structuring their data with detailed metadata may find that traditional RAG leaves much of that structure's value on the table.
For companies looking to implement AI systems that can reliably follow complex, multi-part instructions across heterogeneous data sources, the research indicates that retrieval architecture can be a key differentiator.
Those still relying on basic RAG for production use cases involving rich metadata should evaluate whether their current approach can fundamentally meet their requirements. The performance gap demonstrated by Databricks suggests that a more sophisticated retrieval architecture is now a pressing concern for companies with complex data estates.