Experimental · Coming summer 2026

A new shape of inference for the work that matters most.

bem Local runs your verified document workflows on the silicon you already own. Parse, extract, classify, and reconcile the unstructured data your business runs on, entirely inside your boundary. No shared cloud. No data egress. Just your hardware, fully used.

01 / backend
NVIDIA
CUDA · the GPUs you own

02 / backend
Apple Silicon
MLX · unified memory

03 / posture
Zero egress
nothing leaves your walls

04 / status
Summer 2026
early access opening
§01 / why local

The work that can’t leave the building.

Claims, underwriting, KYC, procurement: the operations that run an enterprise carry legal, financial, and clinical weight. The documents behind them often cannot be sent to a shared cloud at all. bem started as the production layer for unstructured data. bem Local runs that same layer where the data already lives, so the operations your business depends on never have to choose between automation and control.

§02 / where bem sits

The operating system on top of inference.

Foundation models are the engine. bem is the layer above them: it routes every request to the right model on the right hardware, then verifies the result before it ships. The models underneath can change, cloud or local, one or fifteen. The layer on top does not. Move the compute and bem moves with it, and nothing leaves your walls.

§03 / heterogeneous compute

Built for the silicon you already run.

NVIDIA · CUDA

Workstation to datacenter

The stack most local inference runs on. bem Local targets CUDA across the NVIDIA GPUs your teams already have, from a single workstation to a rack in your datacenter.

Apple Silicon · MLX

Unified memory, fully used

On M-series Macs, the CPU and GPU share one physical memory pool. Through MLX, a single machine can hold models too large for the dedicated memory of a typical discrete GPU, and tensors pass between CPU and GPU without being copied. A quiet box on a desk becomes a capable inference node.

On the roadmap: AMD ROCm, Vulkan compute, and broader heterogeneous backends. bem Local is built on a compute abstraction, so as your fleet changes, the workflows do not.
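To make the idea of a compute abstraction concrete, here is a minimal sketch in Python. It is an illustration only, not bem Local's actual interface: the `Backend` type, the backend names, and the preference order are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical preference order; not bem Local's actual backend identifiers.
PREFERENCE = ["cuda", "mlx", "cpu"]


@dataclass
class Backend:
    name: str
    is_available: Callable[[], bool]  # probe for the hardware
    run: Callable[[str], str]         # execute one workflow step


def pick_backend(backends: Dict[str, Backend],
                 preference: List[str] = PREFERENCE) -> Backend:
    """Return the first available backend in preference order."""
    for name in preference:
        backend = backends.get(name)
        if backend is not None and backend.is_available():
            return backend
    raise RuntimeError("no compute backend available")


def make_backend(name: str, available: bool) -> Backend:
    """Build a stand-in backend that tags its output with its own name."""
    return Backend(name, lambda: available, lambda doc: f"{name}:{doc}")


def run_workflow(doc: str, backends: Dict[str, Backend]) -> str:
    """The workflow code never names a device; it only asks for a backend."""
    return pick_backend(backends).run(doc)
```

The workflow calls `run_workflow` the same way whether the machine has an NVIDIA GPU, an M-series chip, or neither. Only the set of registered backends changes.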

§04 / token efficiency

Drive inference without the waste.

The cheapest token is the one you never spend. bem earns its cost back at the layer it controls: it routes each call to the smallest model that clears your quality bar, never re-processes a document it has already seen, and makes models return only the fields you asked for.

50–70%
target cost reduction, depending on the workload
01

Route to the cheapest model that clears the bar

Easy work goes to a small or local model; only the hard cases reach a frontier model. Published routing methods report retaining roughly 95% of frontier quality at a fraction of the cost.
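One simple way to picture this is a cascade: try the cheapest model first and escalate only when its confidence falls below the bar. The sketch below assumes each model returns an answer with a confidence score. The `Model` type, the costs, and the confidence values are hypothetical, and a cascade is only one of several published routing approaches, not necessarily the one bem uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float
    # Hypothetical interface: returns (answer, confidence in [0, 1]).
    predict: Callable[[str], Tuple[str, float]]


def route(doc: str, models: List[Model],
          quality_bar: float) -> Tuple[str, str, float]:
    """Try models from cheapest to most expensive.

    Returns (model name, answer, total cost spent). Stops at the first
    model whose confidence clears the bar; if none does, returns the
    answer of the most expensive model.
    """
    if not models:
        raise ValueError("at least one model is required")
    ordered = sorted(models, key=lambda m: m.cost_per_call)
    spent = 0.0
    for model in ordered:
        answer, confidence = model.predict(doc)
        spent += model.cost_per_call
        if confidence >= quality_bar:
            return model.name, answer, spent
    return ordered[-1].name, answer, spent
```

Note the trade-off: a hard document pays for every model it passes through, so a cascade saves money only when most documents stop at the first step.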

02

Never pay for the same document twice

A file bem has already parsed is served from memory, not re-run. Across enterprise corpora, a quarter to a third of content is duplicate or boilerplate.
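A minimal sketch of the idea, assuming documents are keyed by a hash of their bytes. The `ParseCache` class is hypothetical and only catches byte-identical files; near-duplicate or boilerplate content would need a different matching technique.

```python
import hashlib
from typing import Callable, Dict


class ParseCache:
    """Serve repeated documents from a content-addressed store."""

    def __init__(self, parse: Callable[[bytes], dict]):
        self._parse = parse                 # the expensive call to avoid
        self._store: Dict[str, dict] = {}   # sha256 hex digest -> result
        self.hits = 0
        self.misses = 0

    def get(self, content: bytes) -> dict:
        key = hashlib.sha256(content).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._parse(content)
        self._store[key] = result
        return result
```

Keying on content rather than filename means the same invoice uploaded twice under different names is still parsed once.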

03

Generate the answer, not the prose

Schema-constrained decoding emits the structured fields you defined, not paragraphs or hidden reasoning. Output tokens are the ones you pay the most for, so you produce only the ones you need.
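The core move can be shown with a toy: given a model's scores for possible outputs, discard everything the schema does not allow and pick the best of what remains. This is a simplified, per-field sketch with made-up scores and hypothetical function names; real constrained decoders apply the same kind of mask to token logits at every generation step.

```python
from typing import Dict, List


def constrained_choice(scores: Dict[str, float], allowed: List[str]) -> str:
    """Pick the highest-scoring candidate among those the schema allows."""
    candidates = {tok: s for tok, s in scores.items() if tok in allowed}
    if not candidates:
        raise ValueError("no allowed value has a score")
    return max(candidates, key=candidates.get)


def decode_record(field_scores: Dict[str, Dict[str, float]],
                  schema: Dict[str, List[str]]) -> Dict[str, str]:
    """Emit exactly the fields named in the schema, nothing else."""
    return {
        field: constrained_choice(field_scores[field], allowed)
        for field, allowed in schema.items()
    }


# Hypothetical schema: each field lists the values it may take.
SCHEMA = {
    "doc_type": ["invoice", "receipt", "claim"],
    "currency": ["USD", "EUR"],
}

# Made-up scores: unconstrained, the model would start writing prose.
SCORES = {
    "doc_type": {"Sure, this looks like": 0.5, "invoice": 0.3, "receipt": 0.1},
    "currency": {"The": 0.6, "EUR": 0.25, "USD": 0.15},
}

record = decode_record(SCORES, SCHEMA)
# record == {"doc_type": "invoice", "currency": "EUR"}
```

In both fields the top-scoring continuation is conversational filler, and the mask removes it before it can be generated.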

04

Deterministic replay

Same inputs, same outputs, every run. Auditable and reproducible, which is what regulated work requires.
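One way to check that property is to fingerprint each run and compare on replay. The sketch below is an assumption about how such an audit could work, not a description of bem's mechanism. The function names are hypothetical, and inputs, config, and output are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def fingerprint(payload: Any) -> str:
    """Hash a JSON-serializable value in a key-order-independent way."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_run(inputs: dict, config: dict, output: Any) -> Dict[str, str]:
    """Store what went in and what came out as two hashes."""
    return {
        "input_hash": fingerprint({"inputs": inputs, "config": config}),
        "output_hash": fingerprint(output),
    }


def verify_replay(record: Dict[str, str], inputs: dict, config: dict,
                  rerun: Callable[[dict, dict], Any]) -> bool:
    """Re-run the workflow and confirm the output hash is unchanged."""
    current = fingerprint({"inputs": inputs, "config": config})
    if current != record["input_hash"]:
        raise ValueError("inputs or config differ from the recorded run")
    return fingerprint(rerun(inputs, config)) == record["output_hash"]
```

Including the config in the input hash matters: a changed model version or decoding setting counts as a different run, not a failed replay.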

§05 / verified workflow building

Probabilistic models. Deterministic guardrails.

Running locally changes where inference happens, not whether you can trust it. Every function carries accuracy scores (precision, recall, F1). Low-confidence outputs route to human review, and every correction becomes training data that sharpens your own models. You compose the same functions into the same auditable workflows you build in the bem cloud. bem never guesses. It shows its work.
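For reference, the three scores have standard definitions, and confidence-based review is a threshold check. The sketch below uses those standard formulas. The `triage` function, the `confidence` key, and the threshold value are hypothetical, not bem's API.

```python
from typing import Dict, List, Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def triage(outputs: List[Dict], threshold: float) -> Tuple[List[Dict], List[Dict]]:
    """Split outputs into (accepted, needs human review) by confidence."""
    accepted, review = [], []
    for item in outputs:
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            review.append(item)
    return accepted, review
```

With 8 correct extractions, 2 spurious ones, and 2 missed ones, `prf1(8, 2, 2)` gives a precision and recall of 0.8 each. Raising the threshold in `triage` sends more items to reviewers and fewer straight through.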

§06 / data sovereignty

Your data never leaves your boundary.

On-device

Inference runs on your machine. Nothing is sent to an external service.

Air-gapped

Deploy with no network connection at all, for the most contained environments.

Data residency

Data physically stays inside the jurisdiction and boundary you choose.

SOC 2 Type II · HIPAA · EU data sovereignty · End-to-end encryption
§07 / questions

What teams ask about bem Local.

What is bem Local?

bem Local is bem running on your own hardware. The same verified workflows you build in the bem cloud (parse, extract, classify, join, enrich) run on the silicon inside your boundary, so unstructured documents become structured, audited data without leaving your infrastructure.

Which hardware does bem Local support?

At launch, bem Local accelerates on NVIDIA GPUs through CUDA and on Apple Silicon through MLX, the framework built for the M-series unified-memory architecture. AMD ROCm, Vulkan compute, and broader heterogeneous backends are on the roadmap. bem Local is built on a compute abstraction, so it runs where your hardware lives.

Does my data leave my infrastructure?

No. bem Local is designed for zero data egress. Inputs, model weights, and outputs stay on hardware you control. It can run fully on-premises, in your own cloud VPC, or air-gapped with no network connection at all.

How is bem Local different from the bem cloud?

It is the same platform and the same V3 API, placed inside your boundary. You compose the same auditable functions into workflows. The difference is where inference happens: on your silicon, under your control, with the orchestration layer scheduling work across the CPUs, GPUs, and accelerators you already run.

Is bem Local verified and compliant?

Yes. Every function carries accuracy scores (precision, recall, F1), low-confidence outputs route to human review, and corrections become training data. bem is SOC 2 Type II attested and HIPAA compliant, supports EU data sovereignty, and offers on-premises and air-gapped deployment.

When is bem Local available?

bem Local is coming summer 2026. Early access is opening to teams whose work cannot leave the building. You can request access now.

Be first to run bem locally.

bem Local is coming summer 2026. Early access is opening to teams whose work cannot leave the building.
