I have watched smart teams ship impressive agent demos — and then watch those same agents fail the only test that matters: would an expert trust this under pressure?
The model is rarely the villain. The retrieval design is.
The expert retrieval gap
Most enterprise AI agents are built around a comforting pipeline: chunk, embed, retrieve, generate. It looks scientific. It scales on slides. It also assumes experts think in paragraphs — flat, interchangeable, equally worthy of attention.
They do not.
Experts navigate information landscapes. They anchor on structure — headers, diagrams, relationships, cross-references — before they commit to detail. They jump strategically. They ignore noise with practiced ruthlessness. That choreography is not a personality quirk. It is trained perception.
Current RAG systems often invert the order experts use. They retrieve isolated passages and present them with equal weight, as if the page were a bag of words. The result feels plausible and lands wrong — the special failure mode of probabilistic tools.
What eye-tracking keeps showing
In controlled studies — documentation, dashboards, incident timelines — the pattern repeats. Experts spend disproportionate time on scaffolding before depth. Novices read linearly and pay for it in time and error.
Heatmaps reveal anchor points: visual landmarks experts return to when complexity spikes. Miss those anchors in your UI or retrieval ordering, and you force every user to rebuild a mental map from scratch.
This is why I keep saying RAG is not only an embedding problem. It is a foraging problem.
A better path forward
Agent design should model expert information behavior, not textbook reading:
- Hierarchical retrieval — structure before chunks; relationships before isolated facts.
- Context-aware synthesis — answers that respect where information lived on the page, not just that it appeared in the top-k results.
- Interfaces that surface decision order — what to check first, what proves trust, what can wait.
If you are building enterprise agents, ask an uncomfortable question: does your system help experts see, or does it ask them to wade through confident text?
The organizations that figure this out will not win on model size. They will win on respect for how judgment actually forms — one scan path at a time.