Whitepaper—May 1, 2026

The Two Paths of Document Intelligence

RAG and agent-native document access are diverging. This guide covers both, then demonstrates each on a 456-page NFL contract with working code.

Antonio Bustamante

Every production team working with unstructured data faces the same fork in the road. On one side: the retrieval-augmented generation (RAG) pipeline that became the default approach in 2023. On the other: a newer pattern where AI agents access documents directly, navigating them the way a human would. Both work. They solve different problems. And the teams shipping the most resilient systems are starting to use both.

This guide covers the architecture, tradeoffs, and production considerations of each approach. Then it demonstrates both on a real document, the NFL's 456-page Collective Bargaining Agreement, with working code you can run today.

How We Got Here

Document intelligence has gone through three distinct waves.

The first wave was template-based extraction. OCR engines with hand-written rules for known layouts. Accurate on the templates they were built for, brittle on everything else. If a vendor changed their invoice format, the pipeline broke.

The second wave was ML classification. Models trained on labeled datasets to classify document types and extract named fields. Better at handling variation, but still bounded by training data. Adding a new document type meant weeks of labeling and fine-tuning.

The third wave, the one we're in now, was enabled by large language models. LLMs can read documents with no task-specific training. They understand context, handle variation, and can extract data from formats they've never seen before. This unlocked two fundamentally different architectures for making unstructured data usable at scale.

Approach 1: Compressed Semantics (RAG)

The RAG pipeline compresses a document's meaning ahead of time. You chunk the text, generate vector embeddings for each chunk, store them in a database, and retrieve the most relevant pieces at query time. The semantics are pre-computed and frozen at ingest.

The architecture looks like this:

Ingest: Split documents into overlapping chunks (typically 256–1024 tokens)
Embed: Generate vector representations using an embedding model
Store: Write vectors to a database (Pinecone, Weaviate, pgvector, etc.)
Retrieve: At query time, embed the question and find nearest-neighbor chunks
Generate: Feed the retrieved chunks to an LLM as context for the answer
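The five steps above can be sketched in a few lines of standard-library Python. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector database, and every function name here is hypothetical. The Generate step is omitted because it needs an LLM call.

```python
import math
import re
from collections import Counter


def chunk(words, size=40, overlap=10):
    """Ingest: split a word list into overlapping windows."""
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Embed: a stand-in 'vector' made of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document, size=40, overlap=10):
    """Store: keep (chunk_text, vector) pairs in a plain list."""
    return [(c, embed(c)) for c in chunk(document.split(), size, overlap)]


def retrieve(index, question, k=2):
    """Retrieve: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Invented two-topic text, used only to exercise the functions.
doc = ("The salary cap is calculated each league year. " * 5
       + "Drug testing procedures apply to all players. " * 5)
index = build_index(doc, size=20, overlap=5)
top = retrieve(index, "how is the salary cap calculated", k=1)
```

Note that the index is built once and frozen: whatever the chunk boundaries separated at ingest stays separated at query time.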

This architecture works well when:

You have known question patterns, and the types of queries are predictable
Latency matters: pre-computed embeddings make retrieval fast
Volume is high: the same corpus is queried thousands of times
The domain is narrow: a well-tuned chunking strategy captures the information you need

RAG has earned its place in production. The challenge is what happens at the boundaries. Chunking is lossy by nature. A table header on page 1 that defines the unit of measurement for numbers on page 47 gets separated during chunking. Footnotes that modify the meaning of a clause three pages earlier are embedded in a different vector. Cross-references like "as defined in Article 12, Section 6(c)" point to context that may not be retrieved together.
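To make the failure mode concrete, here is a minimal sketch using an invented eleven-paragraph document. A definition sits in the first paragraph and a clause that depends on it sits in the last. The helper names and the sample text are hypothetical; the point is only that fixed-size chunking puts the two in different chunks.

```python
def fixed_chunks(paragraphs, per_chunk):
    """Group consecutive paragraphs into fixed-size chunks."""
    return [paragraphs[i:i + per_chunk]
            for i in range(0, len(paragraphs), per_chunk)]


def chunk_containing(chunks, needle):
    """Return the index of the first chunk whose text contains needle."""
    for index, group in enumerate(chunks):
        if any(needle in paragraph for paragraph in group):
            return index
    return None


# Invented toy document: a definition up front, a dependent clause much later.
paragraphs = (
    ['"Covered Amount" means an amount stated in thousands of dollars.']
    + [f"Filler paragraph {n}." for n in range(1, 10)]
    + ["The Club shall pay a Covered Amount of 610."]
)

chunks = fixed_chunks(paragraphs, per_chunk=4)
definition_chunk = chunk_containing(chunks, "means an amount")
usage_chunk = chunk_containing(chunks, "shall pay")
print(definition_chunk, usage_chunk)
```

A retriever that returns only the chunk holding the payment clause hands the LLM the number 610 without the sentence that says what unit it is in.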

For structured extraction (invoices, rate confirmations, forms), this rarely matters. For interpretive reasoning over long documents (contracts, regulatory filings, compliance audits), it can be a liability.

Approach 2: Just-in-Time Semantics (Agent-Native)

The agent-native approach skips chunking entirely. Instead, the document is parsed into a navigable structure (sections, entities, relationships) and made available through an API that agents can call on demand. The semantics aren't compressed ahead of time. They're resolved in the moment, by the agent, as it reasons through a task.

The access pattern mirrors how coding agents like Claude Code or Cursor navigate a codebase. They don't embed every file into a vector database. They ls the directory, grep for a symbol, cat the relevant file, and read what they need. The same pattern works for documents: list what's available, search for relevant sections, read the content, follow cross-references.
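The list, search, read loop can be sketched with an in-memory store. This is a hypothetical stand-in written for illustration, not Bem's API: the `DocumentStore` class, its method names, and the sample sections are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    label: str
    page: int
    content: str


class DocumentStore:
    """Hypothetical in-memory store exposing file-system-style operations."""

    def __init__(self):
        self._docs = {}

    def add(self, name, sections):
        self._docs[name] = list(sections)

    def ls(self):
        """List the documents that are available."""
        return sorted(self._docs)

    def grep(self, name, pattern):
        """Return (index, label, page) for sections matching the pattern."""
        rx = re.compile(pattern, re.IGNORECASE)
        return [
            (i, s.label, s.page)
            for i, s in enumerate(self._docs[name])
            if rx.search(s.label + "\n" + s.content)
        ]

    def cat(self, name, index):
        """Read the full content of one section."""
        return self._docs[name][index].content


store = DocumentStore()
store.add("toy-contract", [
    Section("Definitions", 1, '"Term" means the period defined in Article 2.'),
    Section("Article 2", 3, "The Term runs for eleven years."),
    Section("Article 9", 12, "Disputes go to the arbitrator."),
])

# Follow the cross-reference from the definition to the article it names.
hits = store.grep("toy-contract", r"article 2")
target = store.cat("toy-contract", hits[-1][0])
```

Nothing is pre-computed beyond the section boundaries; the agent decides at query time which sections are worth reading.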

This architecture works well when:

Questions are open-ended, and you can't predict what the user will ask
Cross-section reasoning is required: the answer spans multiple parts of the document
Context integrity matters: losing a footnote or cross-reference changes the meaning
Documents are large and complex: contracts, regulatory filings, technical manuals

The tradeoff is token cost. An agent exploring a 456-page contract will consume more tokens per query than a vector lookup. For high-volume, repetitive queries on short documents, RAG is more cost-efficient. For complex reasoning over long documents, the additional token cost buys you something RAG can't provide: the full picture.
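A back-of-the-envelope calculation shows the shape of this tradeoff. Every number below is a placeholder chosen for illustration (token counts, number of reads, and per-token prices are all assumptions, not measurements or vendor pricing); substitute your own.

```python
def query_cost(input_tokens, output_tokens,
               usd_per_million_input, usd_per_million_output):
    """Cost of one LLM call given token counts and per-million-token prices."""
    return (input_tokens * usd_per_million_input
            + output_tokens * usd_per_million_output) / 1_000_000


# Placeholder prices, not any vendor's actual rates.
PRICE_IN, PRICE_OUT = 3.0, 15.0

# RAG: a 100-token question plus 4 retrieved chunks of 512 tokens each.
rag_input = 100 + 4 * 512
rag = query_cost(rag_input, 300, PRICE_IN, PRICE_OUT)

# Agent: the same question plus 12 section reads of 3,000 tokens each.
agent_input = 100 + 12 * 3000
agent = query_cost(agent_input, 300, PRICE_IN, PRICE_OUT)

print(f"RAG: ${rag:.4f}  agent: ${agent:.4f}  ratio: {agent / rag:.1f}x")
```

Under these made-up inputs the agent query costs roughly ten times more. The ratio moves with how many sections the agent reads, which is why the cost is described as proportional to traversal depth.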

Side by Side

	RAG (Compressed Semantics)	Agent-Native (Just-in-Time)
Pre-processing	Chunk → embed → store in vector DB	Parse into navigable structure (sections, entities)
At query time	Vector similarity search → top-K chunks → LLM	Agent traverses document via API (ls, grep, cat, find)
Context window	Fixed chunk size; cross-chunk context is lost	Full document available; agent reads what it needs
Cross-references	Separated during chunking; may not co-retrieve	Preserved in document structure; agent can follow them
Best for	High-volume structured extraction, known question patterns	Open-ended reasoning, compliance, multi-section analysis
Latency	Fast and predictable (pre-computed embeddings)	Variable (depends on how much the agent reads)
Cost per query	Low (embedding lookup + small LLM call)	Higher (token-proportional to traversal depth)
Infrastructure	Requires vector DB, embedding pipeline, chunk tuning	Requires document parsing layer and agent runtime

The Convergence

These two approaches aren't competing. They're complementary. The same document can be structurally extracted and made available for agent exploration. In practice, the most robust systems do exactly that.

Consider a contract management platform. Structured extraction pulls the parties, effective dates, termination clauses, and renewal terms into the system of record. That feeds dashboards, alerts, and automated workflows. Meanwhile, the parsed document is available for an agent that can answer ad-hoc questions: "Does this contract have a non-compete clause?", "What happens if the vendor misses the SLA for three consecutive months?", "Compare the indemnification language in our two largest supplier contracts."

Different access patterns, same document, same API. The rest of this guide demonstrates both.

The Test Document

We're using the NFL-NFLPA Collective Bargaining Agreement, a 456-page contract that governs player compensation, salary caps, free agency, the college draft, drug testing, disciplinary procedures, and more. It's publicly available from the NFLPA website. It represents exactly the kind of document that stress-tests both approaches: long, dense, heavily cross-referenced, and full of defined terms that modify meaning across sections.

Figure: Eight pages from the NFL Collective Bargaining Agreement, showing the scope and density of the 456-page contract.

Structured Extraction with Extract

Extract is for when you know exactly what data you need from a document. You define a schema, send the document, and get back structured JSON. This is the RAG-adjacent approach: deterministic, schema-enforced, designed for automation.

Define a schema

Create an extraction function with the fields you care about. Here we're pulling key contract terms: the parties, dates, salary cap structure, and major provisions.

bash

curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "contract-terms",
    "type": "extract",
    "displayName": "Contract Terms Extractor",
    "outputSchemaName": "ContractTerms",
    "outputSchema": {
      "type": "object",
      "required": ["parties", "effectiveDate", "termLength"],
      "properties": {
        "parties": {
          "type": "array",
          "description": "Named parties to the agreement",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "role": { "type": "string" }
            }
          }
        },
        "effectiveDate": { "type": "string" },
        "expirationDate": { "type": "string" },
        "termLength": { "type": "string" },
        "salaryCap": {
          "type": "object",
          "properties": {
            "amount": { "type": "string" }
          }
        },
        "keyProvisions": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "summary": { "type": "string" }
            }
          }
        },
        "disputeResolution": { "type": "string" }
      }
    }
  }'

Send the document

Wrap the function in a workflow and submit the PDF. For a 456-page document, use async mode (wait=false in the request below).

bash

# Create the workflow
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "contract-analysis",
    "mainNodeName": "contract-terms",
    "nodes": [{
      "name": "contract-terms",
      "function": { "name": "contract-terms" }
    }]
  }'

# Submit the 456-page CBA
curl -X POST https://api.bem.ai/v3/workflows/contract-analysis/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "callReferenceID=nfl-cba-001" \
  -F "file=@nfl-cba-2020.pdf"
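An async submission means something has to check back for the result. This guide does not show how results are retrieved, so the sketch below keeps that part abstract: `fetch_status` is a hypothetical callable you would implement against whatever retrieval mechanism the Bem documentation describes, returning `None` while the call is still running. The helper itself is a generic poll-with-backoff loop, not part of any SDK.

```python
import time


def poll_until_done(fetch_status, interval=2.0, max_interval=30.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a non-None result.

    Waits `interval` seconds after the first miss and doubles the wait
    after each further miss, capped at `max_interval`. Raises TimeoutError
    if the next wait would run past `timeout` seconds.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        result = fetch_status()
        if result is not None:
            return result
        if clock() + delay > deadline:
            raise TimeoutError("call did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real waiting; in normal use, leave them at their defaults.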

The result

This is the actual output from running the NFL CBA through Extract. 456 pages processed in 95 seconds:

json

{
  "parties": [
    {
      "name": "National Football League Management Council",
      "role": "Management Council"
    },
    {
      "name": "National Football League Players Association",
      "role": "Union"
    }
  ],
  "effectiveDate": "2020-03-15",
  "termLength": "11 years",
  "salaryCap": {
    "amount": "Calculated based on AR, Projected AR, and Player Cost Amount"
  },
  "keyProvisions": [
    {
      "title": "No Strike/Lockout/Suit",
      "summary": "Neither party will engage in strikes or lockouts"
    },
    {
      "title": "College Draft",
      "summary": "Rules for annual and supplemental drafts, including eligibility"
    },
    {
      "title": "Veteran Free Agency",
      "summary": "Rules for unrestricted and restricted free agents"
    },
    {
      "title": "Franchise and Transition Players",
      "summary": "Rules for designating franchise/transition players"
    },
    {
      "title": "Anti-Collusion",
      "summary": "Prohibited conduct, enforcement provisions, burden of proof"
    }
  ],
  "disputeResolution": "System Arbitrator and Impartial Arbitrator with binding authority"
}

Every field maps to the schema you defined. If a field can't be determined with high confidence, it's flagged, not hallucinated.
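Before a result feeds an automated workflow, it is cheap to check it against the schema on your side as well. The sketch below is a minimal client-side check with hypothetical helper names, using a trimmed copy of the schema and result; it is not how Bem flags fields internally, and a full JSON Schema validator would cover types and nesting too.

```python
def missing_required(schema, payload):
    """Required keys that are absent or empty in the payload."""
    return [
        key for key in schema.get("required", [])
        if key not in payload or payload[key] in (None, "", [])
    ]


def unexpected_fields(schema, payload):
    """Top-level payload keys the schema does not declare."""
    return sorted(set(payload) - set(schema.get("properties", {})))


# Trimmed copies of the schema and result, for illustration only.
schema = {
    "required": ["parties", "effectiveDate", "termLength"],
    "properties": {
        "parties": {"type": "array"},
        "effectiveDate": {"type": "string"},
        "expirationDate": {"type": "string"},
        "termLength": {"type": "string"},
    },
}
result = {
    "parties": [{"name": "National Football League Players Association",
                 "role": "Union"}],
    "effectiveDate": "2020-03-15",
    "termLength": "11 years",
}

problems = missing_required(schema, result) + unexpected_fields(schema, result)
```

An empty `problems` list means every required field is present and nothing outside the schema slipped in.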

The same flow in Python

python

from bem import Bem

client = Bem()  # reads BEM_API_KEY from environment

# wait=True blocks until processing finishes
# (about 95 seconds for the CBA run shown above)
call = client.workflows.call(
    workflow_name="contract-analysis",
    file_path="nfl-cba-2020.pdf",
    wait=True
)

terms = call.outputs[0].transformed_content
print(f"Parties: {terms['parties'][0]['name']} & {terms['parties'][1]['name']}")
print(f"Term: {terms['effectiveDate']}, {terms['termLength']}")

for p in terms['keyProvisions']:
    print(f"  {p['title']}: {p['summary']}")

Agent-Native Access with Parse

Parse takes the opposite approach. Instead of defining what you want out of the document, you give Bem the document and it builds a navigable knowledge layer: sections with labeled content, named entities, and relationships. Your agents explore this layer through file-system-style operations.

Parse the document

bash

# Create a parse function
curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "document-parser",
    "type": "parse",
    "displayName": "Document Parser"
  }'

# Create workflow and send the CBA
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "doc-parser",
    "mainNodeName": "document-parser",
    "nodes": [{ "name": "document-parser", "function": { "name": "document-parser" } }]
  }'

curl -X POST https://api.bem.ai/v3/workflows/doc-parser/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "callReferenceID=nfl-cba-parse" \
  -F "file=@nfl-cba-2020.pdf"

The NFL CBA parses into 574 sections and 29 named entities. Once parsed, the document is accessible through the File System API.

The File System API

All file system operations go through a single endpoint: POST /v3/fs. The op field determines the operation. This is the same access pattern that coding agents use to navigate codebases, and it's immediately familiar to any LLM agent.
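Because every operation shares one endpoint and one body shape, a single helper can build all of the requests. The sketch below uses only the standard library and only the URL, header, and body fields that appear in this guide's curl examples; the helper name `build_fs_request` is hypothetical, and it constructs the request without sending it.

```python
import json
import os
import urllib.request

FS_URL = "https://api.bem.ai/v3/fs"


def build_fs_request(op, call_id, api_key=None, **params):
    """Build (but do not send) a POST /v3/fs request for the given op.

    Extra keyword arguments such as path= or pattern= are merged into the
    JSON body alongside callID and op.
    """
    body = {"callID": call_id, "op": op, **params}
    key = api_key or os.environ.get("BEM_API_KEY", "")
    return urllib.request.Request(
        FS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_fs_request("grep", "YOUR_CALL_ID", api_key="test-key",
                           pattern="salary cap")
```

Passing the request to `urllib.request.urlopen` would send it; the operation-specific sections that follow show the response shapes to expect.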

`ls`: List parsed documents

See all documents in your environment with metadata:

bash

curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "ls" }'

json

{
  "op": "ls",
  "data": [
    {
      "referenceID": "nfl-cba-parse-demo",
      "functionName": "document-parser",
      "parsedAt": "2026-04-30T23:05:23Z",
      "pageCount": 116,
      "sectionCount": 574,
      "entityCount": 29,
      "previewEntities": [
        "National Football League",
        "National Football League Players Association",
        "NFL Collective Bargaining Agreement",
        "NFL Player Contract",
        "NFLPA Group Licensing Program",
        "College Draft"
      ]
    }
  ]
}

`stat`: Document metadata

Get detailed metadata for a specific document:

bash

curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "stat", "path": "nfl-cba-parse-demo" }'

json

{
  "op": "stat",
  "data": {
    "kind": "parsed_document",
    "path": "nfl-cba-parse-demo",
    "referenceID": "nfl-cba-parse-demo",
    "functionName": "document-parser",
    "pageCount": 116,
    "sectionCount": 574,
    "entityCount": 29,
    "parsedAt": "2026-04-30T23:05:23Z"
  }
}

`head`: First sections of a document

Read the opening sections to understand document structure:

bash

curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "head", "path": "nfl-cba-parse-demo" }'

json

{
  "op": "head",
  "data": {
    "sections": [
      { "content": "NFL", "label": "Organization Logo", "page": 1, "type": "header" },
      { "content": "COLLECTIVE BARGAINING AGREEMENT", "label": "Document Title", "page": 1, "type": "title" },
      { "content": "MARCH 15, 2020", "label": "Effective Date", "page": 1, "type": "metadata" },
      { "content": "TABLE OF CONTENTS", "label": "Document Title", "page": 2, "type": "title" },
      {
        "content": "PREAMBLE ... xvi\nARTICLE 1 DEFINITIONS ... 1\nARTICLE 2 GOVERNING AGREEMENT ... 5\nARTICLE 3 NO STRIKE/LOCKOUT/SUIT ... 7\nARTICLE 4 NFL PLAYER CONTRACT ... 9\nARTICLE 5 OPTION CLAUSES ... 16\nARTICLE 6 COLLEGE DRAFT ... 17\nARTICLE 7 ROOKIE COMPENSATION ... 21",
        "label": "Table of Contents Entries",
        "page": 2,
        "type": "list"
      }
    ]
  }
}

`grep`: Search across sections

Search the entire document for a term or phrase. Results include the page, section label, and a snippet with context:

bash

curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "grep", "pattern": "salary cap" }'

json

{
  "op": "grep",
  "data": [
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 4,
      "sectionLabel": "ARTICLE 13 SALARY CAP ACCOUNTING RULES",
      "snippet": "ARTICLE 13 SALARY CAP ACCOUNTING RULES ... 106\nSection 1. Calculation of the Salary Cap ..."
    },
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 4,
      "sectionLabel": "ARTICLE 14 ENFORCEMENT OF THE SALARY CAP AND ROOKIE COMPENSATION POOL",
      "snippet": "ARTICLE 14 ENFORCEMENT OF THE SALARY CAP AND ROOKIE COMPENSATION POOL ... 127\nSection 1. Undisclosed Terms ..."
    },
    {
      "referenceID": "nfl-cba-parse-demo",
      "scope": "section",
      "page": 19,
      "sectionLabel": "Definitions",
      "snippet": "\"Free Agent\" means a player who is not under contract and is free to negotiate and sign a Player Contract with any NFL Club ..."
    }
  ]
}

An agent can use grep to find every mention of a concept across all 574 sections, then drill into the specific pages that matter. No chunking strategy needed. The search is over the full parsed document.
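The "find every mention, then drill in" step amounts to grouping hits by page. Here is a minimal sketch that works on the response shape shown above; the function name `pages_to_read` is hypothetical, and the sample is the grep response with snippets omitted.

```python
from collections import defaultdict


def pages_to_read(grep_response):
    """Group grep hits by page so an agent can plan which pages to read."""
    by_page = defaultdict(list)
    for hit in grep_response.get("data", []):
        by_page[hit["page"]].append(hit["sectionLabel"])
    return dict(sorted(by_page.items()))


# The grep response from above, with snippets omitted for brevity.
response = {
    "op": "grep",
    "data": [
        {"page": 4, "sectionLabel": "ARTICLE 13 SALARY CAP ACCOUNTING RULES"},
        {"page": 4, "sectionLabel": "ARTICLE 14 ENFORCEMENT OF THE SALARY CAP AND ROOKIE COMPENSATION POOL"},
        {"page": 19, "sectionLabel": "Definitions"},
    ],
}

plan = pages_to_read(response)
```

Here the plan has two entries: page 4 (the table of contents lines for Articles 13 and 14) and page 19 (the Definitions section).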

`find`: Discover named entities

List the canonical entities Bem has identified in the document: organizations, committees, agreements, people, and concepts.

bash

curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "callID": "YOUR_CALL_ID", "op": "find" }'

json

{
  "op": "find",
  "data": [
    {
      "entityID": "ent_...",
      "canonical": "NFL Head, Neck and Spine Committee",
      "type": "committee",
      "description": "A committee that reviews medical reports regarding head, neck, and spine health",
      "mentionCount": 4,
      "surfaceForms": ["NFL Head, Neck and Spine Committee", "NFL HEAD, NECK AND SPINE COMMITTEE"]
    },
    {
      "entityID": "ent_...",
      "canonical": "Personal Conduct Policy",
      "type": "agreement",
      "description": "A policy governing the conduct of individuals associated with the league",
      "mentionCount": 3,
      "surfaceForms": ["Personal Conduct Policy"]
    },
    {
      "entityID": "ent_...",
      "canonical": "Arbitration Panel",
      "type": "committee",
      "description": "A panel designated to handle arbitration hearings for non-injury and injury grievances",
      "mentionCount": 1,
      "surfaceForms": ["Arbitration Panel"]
    }
  ]
}

Entities are deduplicated across surface forms (the same entity referred to different ways in the text) and enriched with descriptions derived from the document context. An agent can use find to build a mental model of the document's key concepts before diving into specific sections.
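To illustrate what deduplicating across surface forms means, here is a deliberately simple sketch that merges mentions differing only in case and whitespace. It is an assumption-laden toy, not Bem's entity resolution, which this guide does not describe; the function names and the mention list are made up.

```python
from collections import defaultdict


def normalize(form):
    """Case-fold and collapse whitespace so trivially different forms match."""
    return " ".join(form.casefold().split())


def group_surface_forms(mentions):
    """Merge mentions that normalize to the same key into one entity record."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[normalize(mention)].append(mention)

    entities = []
    for forms in groups.values():
        distinct = sorted(set(forms))
        # Prefer a mixed-case form as the canonical name when one exists.
        canonical = next((f for f in distinct if not f.isupper()), distinct[0])
        entities.append({
            "canonical": canonical,
            "mentionCount": len(forms),
            "surfaceForms": distinct,
        })
    return entities


# Made-up mention list for illustration.
mentions = [
    "NFL Head, Neck and Spine Committee",
    "NFL HEAD, NECK AND SPINE COMMITTEE",
    "NFL Head, Neck and Spine Committee",
    "Personal Conduct Policy",
]
entities = group_surface_forms(mentions)
```

Real entity resolution also has to handle abbreviations and paraphrases ("the Committee"), which string normalization alone cannot merge.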

Connecting to an agent

The File System API is designed to be called directly by LLM agents. Here's what it looks like when an agent explores the NFL CBA:

python

from bem import Bem

client = Bem()

# Agent explores: what's in this document?
docs = client.fs.ls(call_id="YOUR_CALL_ID")
print(f"{docs[0]['sectionCount']} sections, {docs[0]['entityCount']} entities")

# Search for a concept
matches = client.fs.grep(call_id="YOUR_CALL_ID", pattern="injury grievance")
for m in matches:
    print(f"  p.{m['page']} [{m['sectionLabel']}]: {m['snippet'][:80]}...")

# Follow a cross-reference to Article 43
article_43 = client.fs.grep(call_id="YOUR_CALL_ID", pattern="Article 43")
for m in article_43:
    print(f"  p.{m['page']} [{m['sectionLabel']}]")

# Discover all named entities
entities = client.fs.find(call_id="YOUR_CALL_ID")
committees = [e for e in entities if e['type'] == 'committee']
print(f"Found {len(committees)} committees referenced in the CBA")
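When an LLM drives these calls itself, it helps to validate each tool call before it reaches the API. The sketch below turns a model-issued tool call into a request body for POST /v3/fs. The tool-call shape (`name` plus `arguments`) is a hypothetical convention, and the per-operation parameter sets are inferred from the curl examples in this guide, so treat them as assumptions that may be incomplete.

```python
# Parameters each op takes, as inferred from the examples in this guide.
OP_PARAMS = {
    "ls": set(),
    "stat": {"path"},
    "head": {"path"},
    "grep": {"pattern"},
    "find": set(),
}


def to_fs_payload(tool_call, call_id):
    """Validate a tool call and convert it to a /v3/fs request body."""
    op = tool_call.get("name")
    if op not in OP_PARAMS:
        raise ValueError(f"unsupported op: {op!r}")

    args = tool_call.get("arguments", {})
    unknown = set(args) - OP_PARAMS[op]
    missing = OP_PARAMS[op] - set(args)
    if unknown:
        raise ValueError(f"{op}: unknown arguments {sorted(unknown)}")
    if missing:
        raise ValueError(f"{op}: missing arguments {sorted(missing)}")

    return {"callID": call_id, "op": op, **args}


payload = to_fs_payload(
    {"name": "grep", "arguments": {"pattern": "injury grievance"}},
    call_id="YOUR_CALL_ID",
)
```

Rejecting malformed calls locally gives the model a clear error to correct instead of spending a round trip on a request that would fail.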

Combining Both in a Single Workflow

Extract and Parse are composable. A single workflow can parse the document for agent access and extract specific fields into your system of record.

bash

curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "full-contract-pipeline",
    "mainNodeName": "document-parser",
    "nodes": [
      { "name": "document-parser", "function": { "name": "document-parser" } },
      { "name": "contract-terms", "function": { "name": "contract-terms" } }
    ],
    "edges": [
      { "from": "document-parser", "to": "contract-terms" }
    ]
  }'

# One call produces both outputs
curl -X POST https://api.bem.ai/v3/workflows/full-contract-pipeline/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "file=@nfl-cba-2020.pdf" \
  -F "wait=false"

The result: your agents can explore the full document through the File System API while your downstream systems receive clean, structured JSON. One document, two access patterns.
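A workflow definition is a small graph, so it can be sanity-checked locally before it is submitted. The sketch below checks that `mainNodeName` and every edge endpoint refer to a declared node, then walks the edges breadth-first from the main node. The helper names are hypothetical, and the traversal is one plausible reading of how the edges are followed, not a statement of Bem's scheduler behavior.

```python
from collections import deque


def validate_workflow(definition):
    """Return a list of structural problems; an empty list means none found."""
    names = [node["name"] for node in definition["nodes"]]
    problems = []
    if len(names) != len(set(names)):
        problems.append("duplicate node names")
    if definition["mainNodeName"] not in names:
        problems.append("mainNodeName is not a declared node")
    for edge in definition.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in names:
                problems.append(f"edge references unknown node {edge[end]!r}")
    return problems


def execution_order(definition):
    """Breadth-first walk of the edges, starting at the main node."""
    children = {}
    for edge in definition.get("edges", []):
        children.setdefault(edge["from"], []).append(edge["to"])
    order, seen = [], set()
    queue = deque([definition["mainNodeName"]])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(children.get(node, []))
    return order


workflow = {
    "name": "full-contract-pipeline",
    "mainNodeName": "document-parser",
    "nodes": [
        {"name": "document-parser", "function": {"name": "document-parser"}},
        {"name": "contract-terms", "function": {"name": "contract-terms"}},
    ],
    "edges": [{"from": "document-parser", "to": "contract-terms"}],
}

problems = validate_workflow(workflow)
order = execution_order(workflow)
```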

Decision Framework

Use Extract when:

You know the fields you need before you see the document
The same schema applies across many documents (invoices, claims, rate confirmations)
You need deterministic, auditable outputs that feed automated workflows
Throughput matters: thousands of documents per day

Use Parse when:

Questions are open-ended or impossible to predict in advance
The answer requires reasoning across multiple sections or pages
You're building search, Q&A, or interactive document exploration
Context integrity is non-negotiable: losing a cross-reference changes the answer

Use both when:

You need structured data in your ERP and agent-accessible documents for ad-hoc questions
Contracts are both processed (extract renewal dates, parties, terms) and explored (answer compliance questions)
Your system serves both automated workflows and human users

Getting Started

Install the SDK:

bash

# Python
pip install bem-sdk

# TypeScript / Node.js
npm install bem-ai-sdk

# Go
go get github.com/bem-team/bem-go-sdk

# C#
dotnet add package Bem

Or use the CLI:

bash

brew install bem-team/tools/bem
bem workflows call contract-analysis --input ./contract.pdf

For agent-native workflows, add the MCP server to Claude, Cursor, or any MCP-compatible agent:

bash

claude mcp add bem -- npx -y bem-ai-sdk-mcp

The agent can then parse documents, extract structured data, and navigate your document library directly.

Where This Is Going

The document intelligence landscape is converging. Teams that started with RAG are adding agent-native access for the questions their chunking strategy can't handle. Teams that started with agents are adding structured extraction for the workflows that need deterministic outputs. The end state isn't one approach replacing the other. It's infrastructure that supports both.

The code in this guide is production-ready. The NFL CBA results are real. If you want to try it on your own documents, start with the install steps in Getting Started above.
