The Two Paths of Document Intelligence
Whitepaper · May 1, 2026

RAG and agent-native document access are diverging. This guide covers both, then demonstrates each on a 456-page NFL contract with working code.

Antonio Bustamante

Every production team working with unstructured data faces the same fork in the road. On one side: the retrieval-augmented generation (RAG) pipeline that became the default approach in 2023. On the other: a newer pattern where AI agents access documents directly, navigating them the way a human would. Both work. They solve different problems. And the teams shipping the most resilient systems are starting to use both.

This guide covers the architecture, tradeoffs, and production considerations of each approach. Then it demonstrates both on a real document, the NFL's 456-page Collective Bargaining Agreement, with working code you can run today.

How We Got Here

Document intelligence has gone through three distinct waves.

The first wave was template-based extraction. OCR engines with hand-written rules for known layouts. Accurate on the templates they were built for, brittle on everything else. If a vendor changed their invoice format, the pipeline broke.

The second wave was ML classification. Models trained on labeled datasets to classify document types and extract named fields. Better at handling variation, but still bounded by training data. Adding a new document type meant weeks of labeling and fine-tuning.

The third wave, the one we're in now, was enabled by large language models. LLMs can read documents with no task-specific training. They understand context, handle variation, and can extract data from formats they've never seen before. This unlocked two fundamentally different architectures for making unstructured data usable at scale.

Approach 1: Compressed Semantics (RAG)

The RAG pipeline compresses a document's meaning ahead of time. You chunk the text, generate vector embeddings for each chunk, store them in a database, and retrieve the most relevant pieces at query time. The semantics are pre-computed and frozen at ingest.

The architecture looks like this:

  • Ingest: Split documents into overlapping chunks (typically 256–1024 tokens)
  • Embed: Generate vector representations using an embedding model
  • Store: Write vectors to a database (Pinecone, Weaviate, pgvector, etc.)
  • Retrieve: At query time, embed the question and find nearest-neighbor chunks
  • Generate: Feed the retrieved chunks to an LLM as context for the answer
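The five stages can be sketched end to end in a few lines of standard-library Python. This is a toy under stated assumptions, not a production pipeline: `embed` is a hashed bag-of-words vector standing in for a real embedding model, chunk sizes are counted in words rather than tokens, an in-memory list stands in for the vector database, and the final LLM call is omitted. None of the function names come from a specific library.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # dimensionality of the toy embedding space


def chunk(words: list[str], size: int, overlap: int) -> list[str]:
    """Ingest: split a word list into windows of `size` words sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    """Embed: hashed bag-of-words, L2-normalised (a stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(document: str, size: int = 40, overlap: int = 10) -> list[tuple[str, list[float]]]:
    """Store: pre-compute (chunk, vector) pairs once, at ingest time."""
    return [(c, embed(c)) for c in chunk(document.split(), size, overlap)]


def retrieve(index: list[tuple[str, list[float]]], question: str, k: int = 3) -> list[str]:
    """Retrieve: rank chunks by cosine similarity (dot product of unit vectors)."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(chunks: list[str], question: str) -> str:
    """Generate: the retrieved chunks become the LLM's context (the LLM call is omitted)."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


doc = ("the salary cap limits total player costs each league year "
       "drug testing occurs during training camp for every player")
index = build_index(doc, size=8, overlap=2)
top = retrieve(index, "salary cap", k=1)
print(build_prompt(top, "How is the salary cap applied?"))
```

Everything up to `build_index` runs once per document; only `retrieve` and `build_prompt` run per question, which is where RAG's low query-time latency comes from.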

This architecture works well when:

  • You have known question patterns, and the types of queries are predictable
  • Latency matters: pre-computed embeddings make retrieval fast
  • Volume is high: the same corpus is queried thousands of times
  • The domain is narrow: a well-tuned chunking strategy captures the information you need

RAG has earned its place in production. The challenge is what happens at the boundaries. Chunking is lossy by nature. A table header on page 1 that defines the unit of measurement for numbers on page 47 gets separated during chunking. Footnotes that modify the meaning of a clause three pages earlier are embedded in a different vector. Cross-references like "as defined in Article 12, Section 6(c)" point to context that may not be retrieved together.
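To make the boundary problem concrete, here is a minimal sketch using an invented three-sentence "contract" (not text from the CBA) and fixed-size, non-overlapping chunks counted in words, both simplifying assumptions. The clause that uses a defined term lands in one chunk and the definition in another, so retrieving the clause alone loses the definition.

```python
# Invented toy text for illustration; not quoted from the CBA.
TEXT = (
    '"Cap Year" means the League Year beginning March 15. '
    "Unrelated boilerplate about notices and addresses fills the middle. "
    "Clubs stay under the Salary Cap each Cap Year."
)


def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


chunks = fixed_chunks(TEXT, size=9)
# Naive keyword retrieval: take the chunk that mentions "Salary Cap".
hit = next(c for c in chunks if "Salary Cap" in c)
print(hit)                                                # Clubs stay under the Salary Cap each Cap Year.
print("definition present:", "means the League Year" in hit)  # definition present: False
```

Overlapping windows shrink this problem at chunk edges but cannot fix it when the definition sits pages away from the clause that depends on it.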

For structured extraction (invoices, rate confirmations, forms), this rarely matters. For interpretive reasoning over long documents (contracts, regulatory filings, compliance audits), it can be a liability.

Approach 2: Just-in-Time Semantics (Agent-Native)

The agent-native approach skips chunking entirely. Instead, the document is parsed into a navigable structure (sections, entities, relationships) and made available through an API that agents can call on demand. The semantics aren't compressed ahead of time. They're resolved in the moment, by the agent, as it reasons through a task.

The access pattern mirrors how coding agents like Claude Code or Cursor navigate a codebase. Rather than relying solely on a vector index of every file, they list the directory with ls, grep for a symbol, cat the relevant file, and read what they need. The same pattern works for documents: list what's available, search for relevant sections, read the content, follow cross-references.
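The same ls / grep / cat loop can be sketched against an in-memory stand-in for a parsed document. The `Section` shape borrows the label, page, and content fields that appear in the parse output later in this guide; the section text is invented, and `ls`, `grep`, and `cat` here are hypothetical local functions, not Bem's SDK.

```python
from dataclasses import dataclass


@dataclass
class Section:
    label: str
    page: int
    content: str


# Toy "parsed library": document name -> ordered sections (invented text).
LIBRARY: dict[str, list[Section]] = {
    "cba": [
        Section("Article 1 Definitions", 1, '"Cap Year" means the League Year.'),
        Section("Article 13 Salary Cap", 106, "See Article 1 for Cap Year."),
    ]
}


def ls() -> list[str]:
    """List what's available."""
    return sorted(LIBRARY)


def grep(pattern: str) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search over every section label and body."""
    p = pattern.lower()
    return [
        (doc, i, sec.label)
        for doc, sections in LIBRARY.items()
        for i, sec in enumerate(sections)
        if p in sec.label.lower() or p in sec.content.lower()
    ]


def cat(doc: str, index: int) -> str:
    """Read one section in full."""
    return LIBRARY[doc][index].content


# One agent episode: search, read, then follow the defined term to its definition.
doc, idx, label = grep("salary cap")[0]
clause = cat(doc, idx)                     # uses "Cap Year" without defining it
doc, idx, _ = grep('"Cap Year" means')[0]  # locate the definition
print(label, "|", clause, "|", cat(doc, idx))
```

Nothing is pre-computed here: the agent decides at each step what to read next, which is what "just-in-time semantics" refers to.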

This architecture works well when:

  • Questions are open-ended, and you can't predict what the user will ask
  • Cross-section reasoning is required: the answer spans multiple parts of the document
  • Context integrity matters: losing a footnote or cross-reference changes the meaning
  • Documents are large and complex: contracts, regulatory filings, technical manuals

The tradeoff is token cost. An agent exploring a 456-page contract will consume more tokens per query than a vector lookup. For high-volume, repetitive queries on short documents, RAG is more cost-efficient. For complex reasoning over long documents, the additional token cost buys you something RAG can't provide: the full picture.
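A back-of-the-envelope comparison shows where the gap comes from. Every number below is an illustrative assumption, not a measurement from this guide: top-5 retrieval of 512-token chunks for RAG, versus an agent that makes eight tool calls (about 150 tokens of overhead each) and reads 15 sections of about 400 tokens. It also ignores that many agent runtimes resend the accumulated conversation on every turn, which widens the gap further.

```python
def rag_context_tokens(top_k: int, chunk_tokens: int, question_tokens: int) -> int:
    """Tokens sent to the LLM for one RAG answer: retrieved chunks plus the question."""
    return top_k * chunk_tokens + question_tokens


def agent_context_tokens(sections_read: int, section_tokens: int,
                         tool_calls: int, call_overhead: int, question_tokens: int) -> int:
    """Tokens an agent reads: every section it opens, plus per-call overhead."""
    return sections_read * section_tokens + tool_calls * call_overhead + question_tokens


# Illustrative assumptions only.
rag = rag_context_tokens(top_k=5, chunk_tokens=512, question_tokens=50)
agent = agent_context_tokens(sections_read=15, section_tokens=400,
                             tool_calls=8, call_overhead=150, question_tokens=50)
print(rag, agent, round(agent / rag, 1))  # 2610 7250 2.8
```

The RAG figure is fixed by `top_k` and chunk size; the agent figure moves with `sections_read`, so the ratio depends on how far the agent has to traverse to answer.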

Side by Side

| | RAG (Compressed Semantics) | Agent-Native (Just-in-Time) |
|---|---|---|
| Pre-processing | Chunk → embed → store in vector DB | Parse into navigable structure (sections, entities) |
| At query time | Vector similarity search → top-K chunks → LLM | Agent traverses document via API (ls, grep, cat, find) |
| Context window | Fixed chunk size; cross-chunk context is lost | Full document available; agent reads what it needs |
| Cross-references | Separated during chunking; may not be retrieved together | Preserved in document structure; agent can follow them |
| Best for | High-volume structured extraction, known question patterns | Open-ended reasoning, compliance, multi-section analysis |
| Latency | Fast and predictable (pre-computed embeddings) | Variable (depends on how much the agent reads) |
| Cost per query | Low (embedding lookup + small LLM call) | Higher (token-proportional to traversal depth) |
| Infrastructure | Requires vector DB, embedding pipeline, chunk tuning | Requires document parsing layer and agent runtime |

The Convergence

These two approaches aren't competing. They're complementary. The same document can be structurally extracted and made available for agent exploration. In practice, the most robust systems do exactly that.

Consider a contract management platform. Structured extraction pulls the parties, effective dates, termination clauses, and renewal terms into the system of record. That feeds dashboards, alerts, and automated workflows. Meanwhile, the parsed document is available for an agent that can answer ad-hoc questions: "Does this contract have a non-compete clause?", "What happens if the vendor misses the SLA for three consecutive months?", "Compare the indemnification language in our two largest supplier contracts."

Different access patterns, same document, same API. The rest of this guide demonstrates both.

The Test Document

We're using the NFL-NFLPA Collective Bargaining Agreement, a 456-page contract that governs player compensation, salary caps, free agency, the college draft, drug testing, disciplinary procedures, and more. It's publicly available from the NFLPA website. It represents exactly the kind of document that stress-tests both approaches: long, dense, heavily cross-referenced, and full of defined terms that modify meaning across sections.

Figure: Eight pages from the NFL Collective Bargaining Agreement, showing the scope and density of a 456-page contract.

Structured Extraction with Extract

Extract is for when you know exactly what data you need from a document. You define a schema, send the document, and get back structured JSON. This is the RAG-adjacent approach: deterministic, schema-enforced, designed for automation.

Define a schema

Create an extraction function with the fields you care about. Here we're pulling key contract terms: the parties, dates, salary cap structure, and major provisions.

```bash
curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "contract-terms",
    "type": "extract",
    "displayName": "Contract Terms Extractor",
    "outputSchemaName": "ContractTerms",
    "outputSchema": {
      "type": "object",
      "required": ["parties", "effectiveDate", "termLength"],
      "properties": {
        "parties": {
          "type": "array",
          "description": "Named parties to the agreement",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "role": { "type": "string" }
            }
          }
        },
        "effectiveDate": { "type": "string" },
        "expirationDate": { "type": "string" },
        "termLength": { "type": "string" },
        "salaryCap": {
          "type": "object",
          "properties": {
            "amount": { "type": "string" }
          }
        },
        "keyProvisions": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "summary": { "type": "string" }
            }
          }
        },
        "disputeResolution": { "type": "string" }
      }
    }
  }'
```
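If you'd rather stay in Python, the same request can be assembled with the standard library. This sketch only rebuilds what the curl command above sends (endpoint, headers, JSON body), trims the schema to the three required fields for brevity, and does not send anything; `build_create_function_request` is a hypothetical helper, not part of the Bem SDK.

```python
import json
import os
import urllib.request

API_BASE = "https://api.bem.ai/v3"


def build_create_function_request(api_key: str) -> urllib.request.Request:
    """Build, but do not send, the POST /v3/functions request from the curl example."""
    body = {
        "functionName": "contract-terms",
        "type": "extract",
        "displayName": "Contract Terms Extractor",
        "outputSchemaName": "ContractTerms",
        "outputSchema": {
            "type": "object",
            "required": ["parties", "effectiveDate", "termLength"],
            # Trimmed for brevity: copy the remaining properties from the curl example.
            "properties": {
                "parties": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"name": {"type": "string"}, "role": {"type": "string"}},
                    },
                },
                "effectiveDate": {"type": "string"},
                "termLength": {"type": "string"},
            },
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/functions",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_function_request(os.environ.get("BEM_API_KEY", "YOUR_API_KEY"))
print(req.get_method(), req.full_url)
# To actually create the function:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```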

Send the document

Wrap the function in a workflow and submit the PDF. For a 456-page document, use async mode by setting wait=false.

```bash
# Create the workflow
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "contract-analysis",
    "mainNodeName": "contract-terms",
    "nodes": [{
      "name": "contract-terms",
      "function": { "name": "contract-terms" }
    }]
  }'

# Submit the 456-page CBA (async: wait=false)
curl -X POST https://api.bem.ai/v3/workflows/contract-analysis/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "callReferenceID=nfl-cba-001" \
  -F "file=@nfl-cba-2020.pdf"
```
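With wait=false the call returns before processing finishes, so a client has to check back for the result. This guide doesn't show the result-retrieval endpoint, so the sketch below takes the status lookup as a caller-supplied function (`fetch_status` is hypothetical) and only implements the polling loop, with exponential backoff and a deadline. The terminal status names are also assumptions.

```python
import time
from typing import Callable

# Assumed terminal status values; check the API reference for the real ones.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_call(
    fetch_status: Callable[[], str],
    *,
    timeout: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() with exponential backoff until a terminal state or timeout."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"call still {status!r} after {timeout:.0f} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # 1, 2, 4, 8, 15, 15, ... seconds


# Offline demo: a scripted status sequence and a sleep that only records delays.
statuses = iter(["running", "running", "completed"])
slept: list[float] = []
result = wait_for_call(lambda: next(statuses), sleep=slept.append)
print(result, slept)  # completed [1.0, 2.0]
```

Injecting `sleep` keeps the loop testable; in real use you would leave the default and pass a `fetch_status` that queries the call by its callReferenceID.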

The result

This is the actual output from running the NFL CBA through Extract. 456 pages processed in 95 seconds:

```json
{
  "parties": [
    {
      "name": "National Football League Management Council",
      "role": "Management Council"
    },
    {
      "name": "National Football League Players Association",
      "role": "Union"
    }
  ],
  "effectiveDate": "2020-03-15",
  "termLength": "11 years",
  "salaryCap": {
    "amount": "Calculated based on AR, Projected AR, and Player Cost Amount"
  },
  "keyProvisions": [
    {
      "title": "No Strike/Lockout/Suit",
      "summary": "Neither party will engage in strikes or lockouts"
    },
    {
      "title": "College Draft",
      "summary": "Rules for annual and supplemental drafts, including eligibility"
    },
    {
      "title": "Veteran Free Agency",
      "summary": "Rules for unrestricted and restricted free agents"
    },
    {
      "title": "Franchise and Transition Players",
      "summary": "Rules for designating franchise/transition players"
    },
    {
      "title": "Anti-Collusion",
      "summary": "Prohibited conduct, enforcement provisions, burden of proof"
    }
  ],
  "disputeResolution": "System Arbitrator and Impartial Arbitrator with binding authority"
}
```

Every field maps to the schema you defined. If a field can't be determined with high confidence, it's flagged, not hallucinated.
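It is still worth spot-checking outputs on your side before they reach a system of record. The sketch below is a hand-rolled validator that understands only the JSON Schema keywords used in this guide's schema (type, required, properties, items); for full JSON Schema support you would use a dedicated validator library instead. `validate` is a hypothetical helper.

```python
from typing import Any

# Only the JSON Schema types used in this guide's schema.
TYPES = {"object": dict, "array": list, "string": str}


def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of problems; an empty list means `value` fits `schema`."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected and not isinstance(value, TYPES[expected]):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


SCHEMA = {
    "type": "object",
    "required": ["parties", "effectiveDate", "termLength"],
    "properties": {
        "parties": {"type": "array", "items": {"type": "object", "properties": {
            "name": {"type": "string"}, "role": {"type": "string"}}}},
        "effectiveDate": {"type": "string"},
        "termLength": {"type": "string"},
    },
}

result = {
    "parties": [{"name": "National Football League Players Association", "role": "Union"}],
    "effectiveDate": "2020-03-15",
    "termLength": "11 years",
}
print(validate(result, SCHEMA))                 # []
print(validate({"effectiveDate": 2020}, SCHEMA))
```

The second call reports two missing required fields and one type mismatch, with a JSON-path-style location for each.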

The same flow in Python

```python
from bem import Bem

client = Bem()  # reads BEM_API_KEY from environment

# wait=True blocks until the workflow has finished processing the document
call = client.workflows.call(
    workflow_name="contract-analysis",
    file_path="nfl-cba-2020.pdf",
    wait=True,
)

# The extracted JSON, shaped by the ContractTerms schema
terms = call.outputs[0].transformed_content
print(f"Parties: {terms['parties'][0]['name']} & {terms['parties'][1]['name']}")
print(f"Term: {terms['effectiveDate']}, {terms['termLength']}")

for p in terms['keyProvisions']:
    print(f"  {p['title']}: {p['summary']}")
```

Agent-Native Access with Parse

Parse takes the opposite approach. Instead of defining what you want out of the document, you give Bem the document and it builds a navigable knowledge layer: sections with labeled content, named entities, and relationships. Your agents explore this layer through file-system-style operations.

Parse the document

```bash
# Create a parse function
curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "document-parser",
    "type": "parse",
    "displayName": "Document Parser"
  }'

# Create workflow and send the CBA
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "doc-parser",
    "mainNodeName": "document-parser",
    "nodes": [{ "name": "document-parser", "function": { "name": "document-parser" } }]
  }'

curl -X POST https://api.bem.ai/v3/workflows/doc-parser/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "callReferenceID=nfl-cba-parse" \
  -F "file=@nfl-cba-2020.pdf"
```

The NFL CBA parses into 574 sections and 29 named entities. Once parsed, the document is accessible through the File System API.
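It helps to picture what the parse produces before calling any operations. The dataclasses below are a hypothetical client-side model whose fields are copied from the fs responses shown later in this guide (content, label, page, type for sections; canonical, type, description, mentionCount, surfaceForms for entities). They are not types from the Bem SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    content: str
    label: str
    page: int
    type: str  # e.g. "title", "header", "metadata", "list"


@dataclass
class Entity:
    canonical: str
    type: str  # e.g. "committee", "agreement"
    description: str = ""
    mention_count: int = 0
    surface_forms: list[str] = field(default_factory=list)


@dataclass
class ParsedDocument:
    reference_id: str
    sections: list[Section]
    entities: list[Entity]

    def on_page(self, page: int) -> list[Section]:
        """All sections whose page field equals `page`."""
        return [s for s in self.sections if s.page == page]


def section_from_json(raw: dict) -> Section:
    """Map one section object from an fs response onto the dataclass."""
    return Section(content=raw["content"], label=raw["label"], page=raw["page"], type=raw["type"])


def entity_from_json(raw: dict) -> Entity:
    """Map one entity object from a find response, converting camelCase keys."""
    return Entity(
        canonical=raw["canonical"],
        type=raw["type"],
        description=raw.get("description", ""),
        mention_count=raw.get("mentionCount", 0),
        surface_forms=list(raw.get("surfaceForms", [])),
    )


doc = ParsedDocument(
    reference_id="nfl-cba-parse-demo",
    sections=[section_from_json({"content": "COLLECTIVE BARGAINING AGREEMENT",
                                 "label": "Document Title", "page": 1, "type": "title"})],
    entities=[entity_from_json({"canonical": "Arbitration Panel", "type": "committee",
                                "mentionCount": 1, "surfaceForms": ["Arbitration Panel"]})],
)
print(len(doc.on_page(1)), doc.entities[0].mention_count)  # 1 1
```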

The File System API

All file system operations go through a single endpoint: POST /v3/fs. The op field determines the operation. This is the same access pattern that coding agents use to navigate codebases, so LLM agents already know how to work with it.
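Because every operation shares one endpoint, a thin client needs only one request builder. This is a standard-library sketch assuming the body carries `callID`, `op`, and, for the operations that use them in the examples below, `path` or `pattern`; it also assumes responses wrap results in a `data` field, as those examples show. `fs_request` and `fs_call` are hypothetical names, and the op allow-list covers only the operations demonstrated in this guide.

```python
import json
import urllib.request

FS_URL = "https://api.bem.ai/v3/fs"
KNOWN_OPS = {"ls", "stat", "head", "grep", "find"}  # the ops demonstrated in this guide


def fs_request(api_key: str, call_id: str, op: str, **params: str) -> urllib.request.Request:
    """Build a POST /v3/fs request; extra keyword args (path=, pattern=) go into the body."""
    if op not in KNOWN_OPS:
        raise ValueError(f"unsupported op: {op!r}")
    body = {"callID": call_id, "op": op, **params}
    return urllib.request.Request(
        FS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fs_call(req: urllib.request.Request) -> object:
    """Send the request and return the `data` field of the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


req = fs_request("YOUR_API_KEY", "YOUR_CALL_ID", "grep", pattern="salary cap")
print(json.loads(req.data))  # {'callID': 'YOUR_CALL_ID', 'op': 'grep', 'pattern': 'salary cap'}
```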

ls: List parsed documents

See all documents in your environment with metadata:

```bash
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "ls" }'
```

```json
{
  "op": "ls",
  "data": [
    {
      "referenceID": "nfl-cba-parse-demo",
      "functionName": "document-parser",
      "parsedAt": "2026-04-30T23:05:23Z",
      "pageCount": 116,
      "sectionCount": 574,
      "entityCount": 29,
      "previewEntities": [
        "National Football League",
        "National Football League Players Association",
        "NFL Collective Bargaining Agreement",
        "NFL Player Contract",
        "NFLPA Group Licensing Program",
        "College Draft"
      ]
    }
  ]
}
```

stat: Document metadata

Get detailed metadata for a specific document:

```bash
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "stat", "path": "nfl-cba-parse-demo" }'
```

```json
{
  "op": "stat",
  "data": {
    "kind": "parsed_document",
    "path": "nfl-cba-parse-demo",
    "referenceID": "nfl-cba-parse-demo",
    "functionName": "document-parser",
    "pageCount": 116,
    "sectionCount": 574,
    "entityCount": 29,
    "parsedAt": "2026-04-30T23:05:23Z"
  }
}
```

head: First sections of a document

Read the opening sections to understand document structure:

```bash
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "head", "path": "nfl-cba-parse-demo" }'
```

```json
{
  "op": "head",
  "data": {
    "sections": [
      { "content": "NFL", "label": "Organization Logo", "page": 1, "type": "header" },
      { "content": "COLLECTIVE BARGAINING AGREEMENT", "label": "Document Title", "page": 1, "type": "title" },
      { "content": "MARCH 15, 2020", "label": "Effective Date", "page": 1, "type": "metadata" },
      { "content": "TABLE OF CONTENTS", "label": "Document Title", "page": 2, "type": "title" },
      {
        "content": "PREAMBLE ... xvi\nARTICLE 1 DEFINITIONS ... 1\nARTICLE 2 GOVERNING AGREEMENT ... 5\nARTICLE 3 NO STRIKE/LOCKOUT/SUIT ... 7\nARTICLE 4 NFL PLAYER CONTRACT ... 9\nARTICLE 5 OPTION CLAUSES ... 16\nARTICLE 6 COLLEGE DRAFT ... 17\nARTICLE 7 ROOKIE COMPENSATION ... 21",
        "label": "Table of Contents Entries",
        "page": 2,
        "type": "list"
      }
    ]
  }
}
```
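The table-of-contents section is plain text, so an agent (or your own code) can turn it into a map of articles to starting pages before deciding where to read next. A minimal regex sketch, assuming the "TITLE ... page" layout shown in the head output above; `parse_toc` is a hypothetical helper. Page labels stay strings because the preamble uses a roman numeral, and they are the agreement's printed page labels, which may not match the page numbers the API reports.

```python
import re

# "TITLE ... page", where page is arabic digits or a lowercase roman numeral.
TOC_LINE = re.compile(r"^(?P<title>.+?)\s*\.\.\.\s*(?P<page>[0-9]+|[ivxlcdm]+)$")


def parse_toc(content: str) -> list[tuple[str, str]]:
    """Return (title, printed page label) pairs from 'TITLE ... page' lines."""
    entries = []
    for line in content.splitlines():
        m = TOC_LINE.match(line.strip())
        if m:
            entries.append((m["title"], m["page"]))
    return entries


toc = ("PREAMBLE ... xvi\nARTICLE 1 DEFINITIONS ... 1\n"
       "ARTICLE 2 GOVERNING AGREEMENT ... 5\nARTICLE 3 NO STRIKE/LOCKOUT/SUIT ... 7")
for title, page in parse_toc(toc):
    print(f"{page:>4}  {title}")
```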

grep: Search across sections

Search the entire document for a term or phrase. Results include the page, section label, and a snippet with context:

```bash
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "grep", "pattern": "salary cap" }'
```

```json
{
  "op": "grep",
  "data": [
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 4,
      "sectionLabel": "ARTICLE 13 SALARY CAP ACCOUNTING RULES",
      "snippet": "ARTICLE 13 SALARY CAP ACCOUNTING RULES ... 106\nSection 1. Calculation of the Salary Cap ..."
    },
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 4,
      "sectionLabel": "ARTICLE 14 ENFORCEMENT OF THE SALARY CAP AND ROOKIE COMPENSATION POOL",
      "snippet": "ARTICLE 14 ENFORCEMENT OF THE SALARY CAP AND ROOKIE COMPENSATION POOL ... 127\nSection 1. Undisclosed Terms ..."
    },
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 19,
      "sectionLabel": "Definitions",
      "snippet": "\"Free Agent\" means a player who is not under contract and is free to negotiate and sign a Player Contract with any NFL Club ..."
    }
  ]
}
```

An agent can use grep to find every mention of a concept across all 574 sections, then drill into the specific pages that matter. No chunking strategy needed. The search is over the full parsed document.
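Snippets often point to other parts of the agreement, and those pointers are what an agent follows next. A small regex can pull them out so they can be queued as the next searches. This sketch assumes references are written as "Article N", optionally followed by ", Section M" and a one-letter parenthetical subsection, as in the "Article 12, Section 6(c)" example earlier in this guide; `extract_refs` is a hypothetical helper.

```python
import re

REF = re.compile(r"\bArticle\s+(\d+)(?:,\s*Section\s+(\d+)(\([a-z]\))?)?", re.IGNORECASE)


def extract_refs(text: str) -> list[str]:
    """Return normalised cross-references in order of first appearance, without duplicates."""
    seen: list[str] = []
    for article, section, sub in REF.findall(text):
        ref = f"Article {article}" + (f", Section {section}{sub}" if section else "")
        if ref not in seen:
            seen.append(ref)
    return seen


snippet = ("as defined in Article 12, Section 6(c) and subject to "
           "Article 14; see also ARTICLE 12, Section 6(c).")
print(extract_refs(snippet))  # ['Article 12, Section 6(c)', 'Article 14']
```

Each reference is a candidate pattern for the agent's next grep, which is how it follows a chain of cross-references without any chunking decisions made in advance.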

find: Discover named entities

List the canonical entities Bem has identified in the document: organizations, committees, agreements, people, and concepts.

```bash
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "find" }'
```

```json
{
  "op": "find",
  "data": [
    {
      "entityID": "ent_...",
      "canonical": "NFL Head, Neck and Spine Committee",
      "type": "committee",
      "description": "A committee that reviews medical reports regarding head, neck, and spine health",
      "mentionCount": 4,
      "surfaceForms": ["NFL Head, Neck and Spine Committee", "NFL HEAD, NECK AND SPINE COMMITTEE"]
    },
    {
      "entityID": "ent_...",
      "canonical": "Personal Conduct Policy",
      "type": "agreement",
      "description": "A policy governing the conduct of individuals associated with the league",
      "mentionCount": 3,
      "surfaceForms": ["Personal Conduct Policy"]
    },
    {
      "entityID": "ent_...",
      "canonical": "Arbitration Panel",
      "type": "committee",
      "description": "A panel designated to handle arbitration hearings for non-injury and injury grievances",
      "mentionCount": 1,
      "surfaceForms": ["Arbitration Panel"]
    }
  ]
}
```

Entities are deduplicated across surface forms (the different ways the same entity is referred to in the text) and enriched with descriptions derived from the document context. An agent can use find to build a mental model of the document's key concepts before diving into specific sections.
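The simplest form of surface-form deduplication is string normalisation: case-fold, strip punctuation, collapse whitespace, and group mentions that normalise to the same key. This toy sketch shows only that idea and is not a description of how Bem resolves entities; aliases such as a short form that depends on context cannot be caught by string rules. `normalise` and `group_surface_forms` are hypothetical helpers.

```python
import re
from collections import defaultdict


def normalise(mention: str) -> str:
    """Case-fold, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", mention.casefold()).split())


def group_surface_forms(mentions: list[str]) -> dict[str, list[str]]:
    """Map a canonical form (first spelling seen) to every distinct surface form."""
    groups: dict[str, list[str]] = defaultdict(list)
    canonical: dict[str, str] = {}
    for m in mentions:
        key = normalise(m)
        canonical.setdefault(key, m)
        if m not in groups[canonical[key]]:
            groups[canonical[key]].append(m)
    return dict(groups)


mentions = [
    "NFL Head, Neck and Spine Committee",
    "NFL HEAD, NECK AND SPINE COMMITTEE",
    "Personal Conduct Policy",
    "NFL Head, Neck and Spine Committee",
]
print(group_surface_forms(mentions))
```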

Connecting to an agent

The File System API is designed to be called directly by LLM agents. Here's what it looks like when an agent explores the NFL CBA:

```python
from bem import Bem

client = Bem()  # reads BEM_API_KEY from environment
CALL_ID = "YOUR_CALL_ID"  # ID of the parse call

# Orient: what's in this document?
docs = client.fs.ls(call_id=CALL_ID)
print(f"{docs[0]['sectionCount']} sections, {docs[0]['entityCount']} entities")

# Search for a concept across all sections
matches = client.fs.grep(call_id=CALL_ID, pattern="injury grievance")
for m in matches:
    print(f"  p.{m['page']} [{m['sectionLabel']}]: {m['snippet'][:80]}...")

# Follow a cross-reference: find every section that mentions Article 43
article_43 = client.fs.grep(call_id=CALL_ID, pattern="Article 43")
for m in article_43:
    print(f"  p.{m['page']} [{m['sectionLabel']}]")

# Discover all named entities, then keep only the committees
entities = client.fs.find(call_id=CALL_ID)
committees = [e for e in entities if e['type'] == 'committee']
print(f"Found {len(committees)} committees referenced in the CBA")
```
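To let an LLM drive these calls, describe each operation as a tool and route the model's tool calls back to the client. This sketch is framework-neutral: parameters are described with JSON Schema, the convention tool-calling APIs share (each provider wraps it in its own envelope), and `dispatch` forwards to the same `fs.ls` / `fs.grep` / `fs.find` methods used above, assuming they return plain lists and dicts as the indexing above implies. `TOOLS`, `dispatch`, and the offline `FakeFs` stand-in are hypothetical.

```python
import json
from typing import Any

# Framework-neutral tool descriptions; adapt the envelope to your LLM provider.
TOOLS = [
    {"name": "ls", "description": "List parsed documents with section and entity counts.",
     "parameters": {"type": "object", "properties": {}}},
    {"name": "grep", "description": "Search every section for a phrase; returns page, sectionLabel and snippet.",
     "parameters": {"type": "object", "properties": {"pattern": {"type": "string"}},
                    "required": ["pattern"]}},
    {"name": "find", "description": "List named entities such as committees, agreements and people.",
     "parameters": {"type": "object", "properties": {}}},
]
ALLOWED = {tool["name"] for tool in TOOLS}


def dispatch(fs: Any, call_id: str, tool_name: str, arguments: dict) -> str:
    """Run one model-requested tool call and return the result as JSON text."""
    if tool_name not in ALLOWED:
        return json.dumps({"error": f"unknown tool {tool_name!r}"})
    method = getattr(fs, tool_name)  # fs.ls, fs.grep or fs.find
    return json.dumps(method(call_id=call_id, **arguments))


class FakeFs:
    """Offline stand-in for client.fs with canned responses."""

    def ls(self, call_id: str) -> list[dict]:
        return [{"sectionCount": 574, "entityCount": 29}]

    def grep(self, call_id: str, pattern: str) -> list[dict]:
        return [{"page": 4, "sectionLabel": "ARTICLE 13 SALARY CAP ACCOUNTING RULES", "snippet": pattern}]

    def find(self, call_id: str) -> list[dict]:
        return []


print(dispatch(FakeFs(), "YOUR_CALL_ID", "grep", {"pattern": "salary cap"}))
```

In a real agent loop you would pass `client.fs` instead of `FakeFs()` and append each returned string to the conversation as the tool result.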

Combining Both in a Single Workflow

Extract and Parse are composable. A single workflow can parse the document for agent access and extract specific fields into your system of record.

```bash
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "full-contract-pipeline",
    "mainNodeName": "document-parser",
    "nodes": [
      { "name": "document-parser", "function": { "name": "document-parser" } },
      { "name": "contract-terms", "function": { "name": "contract-terms" } }
    ],
    "edges": [
      { "from": "document-parser", "to": "contract-terms" }
    ]
  }'

# One call produces both outputs
curl -X POST https://api.bem.ai/v3/workflows/full-contract-pipeline/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "file=@nfl-cba-2020.pdf" \
  -F "wait=false"
```

The result: your agents can explore the full document through the File System API while your downstream systems receive clean, structured JSON. One document, two access patterns.
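Since the workflow definition is a small JSON graph, it is easy to sanity-check before you POST it. A minimal sketch, assuming only the fields in the request above (mainNodeName, nodes, edges with from/to): it confirms that the main node and every edge endpoint are declared, then returns an execution order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on cycles. `check_workflow` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


def check_workflow(wf: dict) -> list[str]:
    """Validate node and edge references; return node names in dependency order."""
    names = {n["name"] for n in wf["nodes"]}
    if wf["mainNodeName"] not in names:
        raise ValueError(f"mainNodeName {wf['mainNodeName']!r} is not a declared node")
    graph: dict[str, set[str]] = {name: set() for name in names}
    for edge in wf.get("edges", []):
        for end in (edge["from"], edge["to"]):
            if end not in names:
                raise ValueError(f"edge references unknown node {end!r}")
        graph[edge["to"]].add(edge["from"])  # "to" runs after "from"
    return list(TopologicalSorter(graph).static_order())  # CycleError (a ValueError) on cycles


workflow = {
    "name": "full-contract-pipeline",
    "mainNodeName": "document-parser",
    "nodes": [
        {"name": "document-parser", "function": {"name": "document-parser"}},
        {"name": "contract-terms", "function": {"name": "contract-terms"}},
    ],
    "edges": [{"from": "document-parser", "to": "contract-terms"}],
}
print(check_workflow(workflow))  # ['document-parser', 'contract-terms']
```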

Decision Framework

Use Extract when:

  • You know the fields you need before you see the document
  • The same schema applies across many documents (invoices, claims, rate confirmations)
  • You need deterministic, auditable outputs that feed automated workflows
  • Throughput matters: thousands of documents per day

Use Parse when:

  • Questions are open-ended or impossible to predict in advance
  • The answer requires reasoning across multiple sections or pages
  • You're building search, Q&A, or interactive document exploration
  • Context integrity is non-negotiable: losing a cross-reference changes the answer

Use both when:

  • You need structured data in your ERP and agent-accessible documents for ad-hoc questions
  • Contracts are both processed (extract renewal dates, parties, terms) and explored (answer compliance questions)
  • Your system serves both automated workflows and human users

Getting Started

Install the SDK:

```bash
# Python
pip install bem-sdk

# TypeScript / Node.js
npm install bem-ai-sdk

# Go
go get github.com/bem-team/bem-go-sdk

# C#
dotnet add package Bem
```

Or use the CLI:

```bash
brew install bem-team/tools/bem
bem workflows call contract-analysis --input ./contract.pdf
```

For agent-native workflows, add the MCP server to Claude, Cursor, or any MCP-compatible agent:

```bash
claude mcp add bem -- npx -y bem-ai-sdk-mcp
```

The agent can then parse documents, extract structured data, and navigate your document library directly.

Where This Is Going

The document intelligence landscape is converging. Teams that started with RAG are adding agent-native access for the questions their chunking strategy can't handle. Teams that started with agents are adding structured extraction for the workflows that need deterministic outputs. The end state isn't one approach replacing the other. It's infrastructure that supports both.

The code in this guide is production-ready. The NFL CBA results are real. If you want to try it on your own documents, start with the Getting Started section above.

Ready to see it in action?

Talk to our team to walk through how Bem can work inside your stack.
