Processing a 456-Page Contract in Seconds: Two Approaches to Document Intelligence
Whitepaper · Apr 30, 2026

The market is splitting into two approaches for working with unstructured data: compressed semantics (RAG) and just-in-time agent access. This guide walks through both using the NFL's 456-page CBA, with working code examples in curl and Python.

Antonio Bustamante

The way production teams access unstructured data is splitting into two fundamentally different approaches. Understanding both is the difference between building a system that scales and one that breaks at 10x volume.

This guide walks through both approaches using a real document: the NFL's 456-page Collective Bargaining Agreement. We'll process it with Bem's V3 API, show working code, and explain when to use which approach.

The Market Is Diverging

Over the past 18 months, we've watched hundreds of production teams build document intelligence systems. A clear pattern has emerged: there are now two schools of thought on how to make unstructured data usable, and they're moving in opposite directions.

Approach 1: Compress Semantics Ahead of Time (RAG)

The first approach, popularized by the RAG (Retrieval-Augmented Generation) wave, works like this: ingest your documents, chunk them, embed the chunks into a vector database, and retrieve relevant pieces at query time. The semantics are compressed before anyone asks a question.

  • Predictable latency at query time
  • Works well for known question patterns
  • Cost-efficient for high-volume, repetitive queries
  • Schema-enforced outputs with confidence scoring

The tradeoff: chunking is lossy. A table header on page 1 that defines the unit of measurement for numbers on page 47 gets separated during chunking. Context that spans sections, pages, or documents is lost. For many use cases, this is fine. For claims adjudication on a 456-page contract, it's a liability.
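
To make the loss concrete, here is a minimal sketch of naive fixed-size chunking. The sample text, the chunk size, and the `chunk_fixed` helper are illustrative assumptions, not part of any particular RAG library. The unit declared at the top of the text lands in a different chunk from the number it governs.

```python
# Illustrative only: naive fixed-size chunking with no overlap.
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical document: a unit is declared once, far from the value it governs.
header = "All amounts in this schedule are in millions of USD. "
filler = "Intervening clauses. " * 20
value = "Team minimum spend: 182.5"  # invented figure, not taken from the CBA
document = header + filler + value

chunks = chunk_fixed(document, size=120)

# Find which chunk holds the unit definition and which holds the value.
unit_chunk = next(i for i, c in enumerate(chunks) if "millions of USD" in c)
value_chunk = next(i for i, c in enumerate(chunks) if "182.5" in c)

print(f"unit defined in chunk {unit_chunk}, value sits in chunk {value_chunk}")
```

A retriever that returns only the chunk containing the value has no way to know the unit. Overlapping chunks or larger windows reduce the problem but do not remove it when the distance is measured in pages.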

Approach 2: Just-in-Time Semantics (Agent-Native)

The second approach skips chunking entirely. Instead, the full document is made available to an agent, which traverses it using file-system-level operations: ls, grep, cat, find. The agent decides what to read, when to read it, and how deep to go. Semantics are resolved just-in-time.

  • No context loss from chunking
  • Handles cross-reference and multi-section reasoning
  • Agents can follow the document's own structure
  • Better for interpretive, open-ended questions

The tradeoff: token cost can be higher per query, and the agent's reasoning path is less predictable. For a chatbot answering "what holidays do union members get?", this approach is powerful. For processing 10,000 invoices per hour, it's overkill.
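
A minimal in-memory sketch of that access pattern follows. The `sections` dictionary and the `ls`, `grep`, and `cat` helpers are hypothetical stand-ins, not Bem's API. The point is that the agent searches first and reads only the sections that match, while each section's text stays whole.

```python
# Toy stand-in for a parsed document: section path -> full section text.
# Paths and text are invented for illustration.
sections = {
    "article-01/definitions.txt": "Defines the term Salary Cap and other terms.",
    "article-12/revenue.txt": "Revenue rules that feed the salary cap calculation.",
    "article-39/benefits.txt": "Benefits available to players.",
}

def ls() -> list[str]:
    """List available section paths."""
    return sorted(sections)

def grep(query: str) -> list[str]:
    """Return paths of sections whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [path for path in ls() if q in sections[path].lower()]

def cat(path: str) -> str:
    """Return the full, unchunked text of one section."""
    return sections[path]

# Search first, then read only the sections that matched.
hits = grep("salary cap")
for path in hits:
    print(path, "->", cat(path))
```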

You Need Both

The best production systems use both approaches. Structured extraction for the workflows that run millions of times a day. Agent-native document access for the questions that require judgment. Bem supports both through two composable primitives: Extract and Parse.

The Document: NFL's 456-Page CBA

To demonstrate both approaches, we're using the NFL-NFLPA Collective Bargaining Agreement. It's 456 pages of dense legal language covering player compensation, salary caps, benefits, drug policies, disciplinary procedures, and more. It's the kind of document that takes a legal team days to review manually.

You can download the full PDF from the NFLPA website.

Approach 1: Extract (Structured Automation)

Extract is for when you know exactly what data you need from a document. You define a schema, Bem selects the right model, and you get back verified JSON with confidence scores on every field.

Step 1: Define Your Schema

First, create an extraction function with the fields you care about. Here we're pulling the key contract terms: parties, dates, salary cap schedule, and major provisions.

bash
curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "contract-terms",
    "type": "extract",
    "displayName": "Contract Terms Extractor",
    "outputSchemaName": "ContractTerms",
    "outputSchema": {
      "type": "object",
      "required": ["parties", "effectiveDate", "termLength"],
      "properties": {
        "parties": {
          "type": "array",
          "description": "Named parties to the agreement",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "role": { "type": "string" }
            }
          }
        },
        "effectiveDate": {
          "type": "string",
          "description": "When the agreement takes effect"
        },
        "expirationDate": {
          "type": "string",
          "description": "When the agreement expires"
        },
        "termLength": {
          "type": "string",
          "description": "Duration in years"
        },
        "salaryCap": {
          "type": "object",
          "properties": {
            "amount": { "type": "string" },
            "yearlySchedule": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "year": { "type": "string" },
                  "amount": { "type": "string" }
                }
              }
            }
          }
        },
        "keyProvisions": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "summary": { "type": "string" }
            }
          }
        },
        "disputeResolution": { "type": "string" }
      }
    }
  }'
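
Once a schema like this exists, it can help to check a returned payload against it on the client side as well. The sketch below is a deliberately small validator covering only `required`, `type`, `properties`, and `items`. It is an illustration under those assumptions, not Bem's verification logic, and a full JSON Schema library would be the better choice in production.

```python
from typing import Any

# Maps JSON Schema type names to Python types (subset used by the schema above).
_TYPES = {"object": dict, "array": list, "string": str}

def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of human-readable problems; empty means the value conforms."""
    problems: list[str] = []
    expected = _TYPES.get(schema.get("type", ""))
    if expected is not None and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: required field missing")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems += validate(value[key], sub, f"{path}.{key}")
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems += validate(item, schema["items"], f"{path}[{i}]")
    return problems

# Trimmed copy of the ContractTerms schema defined above.
schema = {
    "type": "object",
    "required": ["parties", "effectiveDate", "termLength"],
    "properties": {
        "parties": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "role": {"type": "string"}},
            },
        },
        "effectiveDate": {"type": "string"},
        "termLength": {"type": "string"},
    },
}

# Hypothetical payload with one missing required field and one wrong type.
candidate = {"parties": [{"name": 42, "role": "Union"}], "effectiveDate": "2020-03-15"}
for problem in validate(candidate, schema):
    print(problem)
```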

Step 2: Create a Workflow and Send the Document

Wrap the function in a workflow and send the PDF. For a 456-page document, use async mode and poll for results.

bash
# Create workflow
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "contract-analysis",
    "mainNodeName": "contract-terms",
    "nodes": [{
      "name": "contract-terms",
      "function": { "name": "contract-terms" }
    }]
  }'

# Send the 456-page CBA
curl -X POST https://api.bem.ai/v3/workflows/contract-analysis/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "callReferenceID=nfl-cba-001" \
  -F "file=@nfl-cba-2020.pdf"
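
Polling for an async call can be sketched as a loop with capped exponential backoff. The `fetch_call` callable is a hypothetical stand-in for whatever retrieves the call record (for example a GET on the calls endpoint shown in Step 3). Only the `completed` status appears in this guide, so the `pending` status name used in the demo is an assumption.

```python
import time
from typing import Callable

def poll_until_complete(
    fetch_call: Callable[[], dict],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_call() until its result reports status 'completed' or time runs out."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        record = fetch_call()
        # Response shape follows the sample in this guide: {"call": {"status": ...}}.
        if record.get("call", {}).get("status") == "completed":
            return record
        if time.monotonic() + delay > deadline:
            raise TimeoutError("call did not complete before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but cap the wait

# Demo with a fake fetcher that completes on the third poll.
responses = iter([
    {"call": {"status": "pending"}},  # "pending" is an assumed status name
    {"call": {"status": "pending"}},
    {"call": {"status": "completed", "outputs": []}},
])
waits: list[float] = []
result = poll_until_complete(lambda: next(responses), sleep=waits.append)
print(result["call"]["status"], waits)
```

Injecting `sleep` keeps the loop testable without real delays. A production version would also stop early on whatever failure status the API reports, which this guide does not show.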

Step 3: Get Verified Results

When the call completes, you get back schema-enforced JSON. Every field has been verified against the document. Confidence scores tell you exactly how certain the extraction is.

bash
# Poll for results
curl https://api.bem.ai/v3/calls/{callID} \
  -H "x-api-key: $BEM_API_KEY"

Real output from the NFL CBA (456 pages, processed in ~95 seconds):

json
{
  "call": {
    "status": "completed",
    "outputs": [{
      "transformedContent": {
        "parties": [
          {
            "name": "National Football League Management Council",
            "role": "Management Council"
          },
          {
            "name": "National Football League Players Association",
            "role": "Union"
          }
        ],
        "effectiveDate": "2020-03-15",
        "termLength": "11 years",
        "salaryCap": {
          "amount": "Calculated based on AR, Projected AR, and Player Cost Amount, with specific percentages for League Media AR, NFL Ventures/Postseason AR, and Local AR."
        },
        "keyProvisions": [
          {
            "title": "No Strike/Lockout/Suit",
            "summary": "Neither party will engage in strikes or lockouts"
          },
          {
            "title": "College Draft",
            "summary": "Rules for annual and supplemental drafts, including eligibility and required tenders"
          },
          {
            "title": "Veteran Free Agency",
            "summary": "Rules for unrestricted and restricted free agents, qualifying offers, and right of first refusal"
          },
          {
            "title": "Franchise and Transition Players",
            "summary": "Rules for designating franchise/transition players, required tenders, and signing periods"
          },
          {
            "title": "Anti-Collusion",
            "summary": "Prohibited conduct, enforcement provisions, burden of proof"
          }
        ],
        "disputeResolution": "System Arbitrator and Impartial Arbitrator with binding authority, appeals panel, and confidentiality provisions"
      }
    }]
  }
}

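That's a real response: 456 pages of dense legal language, distilled into structured JSON in about 95 seconds. Every field maps to your schema. If a field can't be extracted with high confidence, it's flagged, not hallucinated. Note that expirationDate and salaryCap.yearlySchedule, both optional in the schema, are absent from this output rather than filled with guesses.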
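
Downstream code should check which schema properties actually came back before using them. A small sketch, using a trimmed copy of the response above with long strings replaced by placeholders, lists the top-level schema properties that are absent. The `SCHEMA_FIELDS` constant and the `missing_fields` helper are illustrative names.

```python
import json

# Top-level properties from the ContractTerms schema defined in Step 1.
SCHEMA_FIELDS = [
    "parties", "effectiveDate", "expirationDate", "termLength",
    "salaryCap", "keyProvisions", "disputeResolution",
]

# Trimmed copy of the response shown above (long strings replaced by placeholders).
raw = """
{
  "call": {
    "status": "completed",
    "outputs": [{
      "transformedContent": {
        "parties": [{"name": "National Football League Management Council",
                     "role": "Management Council"}],
        "effectiveDate": "2020-03-15",
        "termLength": "11 years",
        "salaryCap": {"amount": "(formula description)"},
        "keyProvisions": [{"title": "College Draft", "summary": "(summary)"}],
        "disputeResolution": "(description)"
      }
    }]
  }
}
"""

def missing_fields(response: dict) -> list[str]:
    """Return schema properties that are absent from the first output."""
    content = response["call"]["outputs"][0]["transformedContent"]
    return [name for name in SCHEMA_FIELDS if name not in content]

response = json.loads(raw)
print("absent from output:", missing_fields(response))
```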

Using the Python SDK

The same flow in Python:

python
from bem import Bem

client = Bem()  # reads BEM_API_KEY from environment

# Call the existing contract-analysis workflow and wait for the result
call = client.workflows.call(
    workflow_name="contract-analysis",
    file_path="nfl-cba-2020.pdf",
    wait=True,
)

# Access structured output
terms = call.outputs[0].transformed_content
parties = [party["name"] for party in terms["parties"]]
print(f"Agreement: {' and '.join(parties)}")

# expirationDate is optional in the schema and absent from the sample output above
print(f"Term: {terms['effectiveDate']} to {terms.get('expirationDate', 'not extracted')}")

# yearlySchedule is optional too, so fall back to an empty list
for entry in terms.get("salaryCap", {}).get("yearlySchedule", []):
    print(f"Salary cap {entry['year']}: {entry['amount']}")

for provision in terms.get("keyProvisions", []):
    print(f"  - {provision['title']}: {provision['summary']}")

Approach 2: Parse (Agent-Native Document Access)

Parse is for when you don't know the questions in advance. Instead of defining a schema, you give Bem the document and it creates a fully navigable knowledge layer: raw text by section, named entities, and relationships between them. Your agents access this layer through file-system-style operations.

Step 1: Parse the Document

bash
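# Create a parse function
curl -X POST https://api.bem.ai/v3/functions \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "functionName": "doc-parser",
    "type": "parse",
    "displayName": "Document Parser"
  }'

# Create a workflow that wraps the parse function
# (same shape as the contract-analysis workflow in the Extract section)
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "doc-parser",
    "mainNodeName": "doc-parser",
    "nodes": [{
      "name": "doc-parser",
      "function": { "name": "doc-parser" }
    }]
  }'

# Send the CBA
curl -X POST https://api.bem.ai/v3/workflows/doc-parser/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=false" \
  -F "file=@nfl-cba-2020.pdf"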

Step 2: Navigate with File-System Operations

Once parsed, the document is accessible through operations your agents already understand: ls, cat, grep, find, stat. This is the same access pattern that coding agents like Claude Code use to navigate codebases. It's immediately familiar.

bash
# List all parsed documents
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "operation": "ls" }'

# Search for "salary cap" across the entire document
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "operation": "grep", "query": "salary cap" }'

# Get metadata: how many pages, sections, entities
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "operation": "stat", "documentId": "..." }'

# Find all named entities (organizations, people, financial terms)
curl -X POST https://api.bem.ai/v3/fs \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "operation": "find" }'

Step 3: Let Your Agent Explore

The real power comes when you connect this to an AI agent. The agent can traverse the document the same way a human lawyer would: start with the table of contents, zoom into relevant sections, cross-reference terms, and follow the document's own structure.

python
from bem import Bem

client = Bem()

# Parse the document
call = client.workflows.call(
    workflow_name="doc-parser",
    file_path="nfl-cba-2020.pdf",
    wait=True,
)

# Agent-style exploration: list what has been parsed
documents = client.fs.ls()
print(f"Found {len(documents)} parsed documents")

# Search for specific content
results = client.fs.grep("holiday")
for match in results:
    print(f"Found in section: {match['section']}")
    print(f"  Context: {match['snippet']}")

# Get all entities (people, organizations, financial concepts)
entities = client.fs.find()
for entity in entities[:10]:
    print(f"  {entity['type']}: {entity['name']}")
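
The traversal itself can be sketched without any API at all. Below, a toy document with invented section text stands in for the parsed CBA, and a breadth-first walk follows "Article N" cross-references from a starting section. The section contents, the regular expression, and the `follow_references` helper are all assumptions made for illustration.

```python
import re
from collections import deque

# Invented section text; the real CBA wording is not reproduced here.
articles = {
    "Article 1": "Definitions. Salary Cap is governed by Article 13.",
    "Article 12": "Revenue accounting rules.",
    "Article 13": "Salary Cap rules. Revenue inputs are set out in Article 12.",
    "Article 46": "Discipline procedures.",
}

REF = re.compile(r"Article \d+")

def follow_references(start: str) -> list[str]:
    """Breadth-first walk over cross-references, returning sections in read order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ref in REF.findall(articles.get(current, "")):
            # Only follow references to sections that exist and are not yet read.
            if ref in articles and ref not in seen:
                seen.append(ref)
                queue.append(ref)
    return seen

# Start from the definitions and read only what the text itself points to.
print(follow_references("Article 1"))
```

Sections that nothing points to (the discipline article here) are never read, which is where the token savings over reading the whole document come from.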

When to Use Which

The decision is straightforward:

Use Extract when:

  • You know exactly what fields you need
  • The same schema applies across many documents
  • You need deterministic, auditable outputs
  • You're automating a process (claims, invoices, rate confirmations)
  • Volume matters: thousands of documents per day

Use Parse when:

  • Questions are open-ended or unpredictable
  • Users or agents need to explore documents interactively
  • Cross-section reasoning is required
  • You're building chatbots, search, or Q&A systems
  • Context that spans pages can't be lost to chunking

Use both when:

  • You want to parse a contract library for agent access AND extract specific fields into your ERP
  • You need a searchable knowledge layer AND automated workflow triggers
  • You want to run both in one workflow (Extract and Parse are chainable)

Chaining Extract + Parse in a Single Workflow

Bem's workflow engine lets you compose both approaches. Parse the full document for knowledge access, then extract specific fields into your system of record. One API call, two outputs.

bash
curl -X POST https://api.bem.ai/v3/workflows \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "full-contract-pipeline",
    "mainNodeName": "doc-parser",
    "nodes": [
      {
        "name": "doc-parser",
        "function": { "name": "doc-parser" }
      },
      {
        "name": "contract-terms",
        "function": { "name": "contract-terms" }
      }
    ],
    "edges": [
      { "from": "doc-parser", "to": "contract-terms" }
    ]
  }'

# One call, both outputs
curl -X POST https://api.bem.ai/v3/workflows/full-contract-pipeline/call \
  -H "x-api-key: $BEM_API_KEY" \
  -F "file=@nfl-cba-2020.pdf" \
  -F "wait=false"
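
Before registering a multi-node workflow, it can be useful to sanity-check the graph locally. This sketch takes the workflow body from the request above and uses Python's `graphlib` to confirm that every edge points at a declared node and that the nodes have a valid execution order. How Bem itself validates or schedules nodes is not described in this guide, so treat this purely as a client-side check.

```python
from graphlib import TopologicalSorter

# Workflow body copied from the curl request above.
workflow = {
    "name": "full-contract-pipeline",
    "mainNodeName": "doc-parser",
    "nodes": [
        {"name": "doc-parser", "function": {"name": "doc-parser"}},
        {"name": "contract-terms", "function": {"name": "contract-terms"}},
    ],
    "edges": [{"from": "doc-parser", "to": "contract-terms"}],
}

def execution_order(wf: dict) -> list[str]:
    """Return node names in dependency order, raising on unknown nodes or cycles."""
    names = {node["name"] for node in wf["nodes"]}
    if wf["mainNodeName"] not in names:
        raise ValueError(f"mainNodeName {wf['mainNodeName']!r} is not a declared node")
    sorter = TopologicalSorter({name: set() for name in names})
    for edge in wf.get("edges", []):
        if edge["from"] not in names or edge["to"] not in names:
            raise ValueError(f"edge references unknown node: {edge}")
        sorter.add(edge["to"], edge["from"])  # "to" depends on "from"
    return list(sorter.static_order())  # raises graphlib.CycleError on cycles

print(execution_order(workflow))
```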

The result: your agents can explore the full document through the file-system API, while your downstream systems receive clean, structured JSON. Both outputs are verified, auditable, and improve with every correction.

Getting Started

Install the SDK in your language of choice:

bash
# Python
pip install bem-sdk

# TypeScript / Node.js
npm install bem-ai-sdk

# Go
go get github.com/bem-team/bem-go-sdk

# C#
dotnet add package Bem

Or use the CLI:

bash
# Install
brew install bem-team/tools/bem

# Process a document
bem workflows call contract-analysis --input ./nfl-cba-2020.pdf

For agent-native workflows, add the MCP server to Claude, Cursor, or any MCP-compatible agent:

bash
claude mcp add bem -- npx -y bem-ai-sdk-mcp

Your agent can now call Bem directly. Ask it to parse a document, extract specific fields, or search across your entire document library.

The Bottom Line

The industry is moving past the "RAG vs. no-RAG" debate. Production teams need both structured extraction and agent-native document access. The question isn't which approach to use. It's whether your infrastructure supports both.

Bem is the production layer for unstructured data. One API, both approaches, verified outputs. Get started at bem.ai.
