Question 1

What is bem Local?

Accepted Answer

bem Local is bem running on your own hardware. The same verified workflows you build in the bem cloud (parse, extract, classify, join, enrich) run on the silicon inside your boundary, so unstructured documents become structured, audited data without leaving your infrastructure.

Question 2

Which hardware does bem Local support?

Accepted Answer

At launch, bem Local accelerates on NVIDIA GPUs through CUDA and on Apple Silicon through MLX, the framework built for the M-series unified-memory architecture. AMD ROCm, Vulkan compute, and broader heterogeneous backends are on the roadmap. bem Local is built on a compute abstraction, so it runs where your hardware lives.

Question 3

Does my data leave my infrastructure?

Accepted Answer

No. bem Local is designed for zero data egress. Inputs, model weights, and outputs stay on hardware you control. It can run fully on-premise, in your own cloud VPC, or air-gapped with no network connection at all.

Question 4

How is bem Local different from the bem cloud?

Accepted Answer

It is the same platform and the same V3 API, placed inside your boundary. You build workflows from the same composable, auditable functions. The difference is where inference happens: on your silicon, under your control, with the orchestration layer scheduling work across the CPUs, GPUs, and accelerators you already run.

Question 5

Is bem Local verified and compliant?

Accepted Answer

Yes. Every function reports accuracy scores (precision, recall, F1) and routes low-confidence outputs to human review; reviewer corrections become training data. bem is SOC 2 Type II attested and HIPAA compliant, supports EU data sovereignty, and offers on-premise and air-gapped deployment.
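
As a rough illustration of the two ideas in this answer, the sketch below computes precision, recall, and F1 from labeled counts, and splits outputs into an accepted set and a human-review queue by confidence. It is not bem's implementation or API: the `Output` type, the `route` function, and the 0.9 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Output:
    """Hypothetical record for one function output (not a bem type)."""
    doc_id: str
    value: str
    confidence: float  # assumed to be a score in [0, 1]


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def route(outputs: list[Output], threshold: float = 0.9) -> tuple[list[Output], list[Output]]:
    """Split outputs into (accepted, needs_review) using an assumed confidence threshold."""
    accepted = [o for o in outputs if o.confidence >= threshold]
    needs_review = [o for o in outputs if o.confidence < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    batch = [
        Output("invoice-1", "total=120.00", 0.97),
        Output("invoice-2", "total=88.10", 0.62),
    ]
    accepted, needs_review = route(batch)
    print("accepted:", [o.doc_id for o in accepted])
    print("review:", [o.doc_id for o in needs_review])
    print("scores:", precision_recall_f1(tp=8, fp=2, fn=2))
```

In a setup like this, the corrected values coming back from the review queue would be the natural source of new labeled examples, which is the feedback loop the answer describes.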

Question 6

When is bem Local available?

Accepted Answer

bem Local is coming in summer 2026. Early access is opening to teams whose work cannot leave the building. You can request access now.

A new shape of inference for the work that matters most.

The work that can’t leave the building.

The operating system on top of inference.

Built for the silicon you already run.

Workstation to datacenter

Unified memory, fully used

Drive inference without the waste.

Route to the cheapest model that clears the bar

Never pay for the same document twice

Generate the answer, not the prose

Deterministic replay

Probabilistic models. Deterministic guardrails.

Your data never leaves your boundary.

What teams ask about bem Local.