How to build a Retail AI Agent with the bem API
Create an agent with bem that can detect products in planograms, CCTV, social media posts, e-commerce, and more
We talk a lot about "Unstructured to Structured" data. Usually, that means PDFs. But the operational reality of retail isn't a PDF—it's a chaotic stream of video feeds, audio logs, and user-generated content.
I want to show you how to build the Universal Retail Agent we demoed live.
The goal? A single API endpoint that accepts any input (CCTV footage, a TikTok review, a PDF planogram) and returns the exact same rigid, enriched JSON.
We'll build this in four phases using the bem API, then run it end to end.
(Note: Everything I’m showing here via curl can also be built visually in our UI. If you prefer clicking to coding, check out the video tutorial below.)
Phase 1: The Knowledge Base (Collections)
The Business Outcome: Your AI needs to know what it is looking at. It needs a source of truth.
You can upload your entire Product Catalog, SKU list, or Planogram database into a bem Collection. This allows us to perform RAG (Retrieval-Augmented Generation) and semantic search during the pipeline.
I’ve already created a collection called master-product-catalog containing 50,000 SKUs using our POST /v2/collections endpoint. You can sync this programmatically via CDC (Change Data Capture) from your PIM/ERP, or just drag-and-drop a CSV in the UI.
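If you're going the programmatic route rather than the drag-and-drop UI, the first step is usually turning your catalog CSV into JSON-ready records before pushing them to the collection. A minimal sketch (the column names here are illustrative, not a required bem schema):

```python
import csv
import io

# Illustrative catalog CSV -- your PIM/ERP export will have its own columns.
CSV_DATA = """sku,brand_name,description
YM-BLUE-16,Yerba Mate Co,Blue 16oz can
YM-GRN-12,Yerba Mate Co,Green 12oz can
"""

def csv_to_records(csv_text: str) -> list[dict]:
    """Parse a catalog CSV into a list of dicts, one per SKU row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

records = csv_to_records(CSV_DATA)
# Each record is now ready to be serialized into the body of a
# POST /v2/collections upload (exact upload payload shape not shown here).
```

From here, a CDC job can diff these records against the live collection on each PIM sync so the catalog never drifts.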
Phase 2: The Eyes (The Analyze Function)
The Business Outcome: Transforming pixels and audio into raw observations.
We need a function that "watches" the video or "reads" the PDF and extracts visual facts. We are going to create an Analyze function using the Universal Retail Schema.
Notice that this schema doesn't ask for database IDs. It asks for observable facts: "Blue can," "Eye-level shelf," "Frustrated sentiment."
The API Call:
```shell
curl -X POST https://api.bem.ai/v2/functions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "analyze",
    "functionName": "retail-vision-agent",
    "outputSchemaName": "Universal Retail Analysis",
    "outputSchema": {
      "type": "object",
      "description": "Universal Retail Analysis",
      "properties": {
        "detected_entities": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "visual_identity": {
                "type": "object",
                "properties": {
                  "brand_name": { "type": "string" },
                  "visual_attributes": { "type": "string" }
                }
              },
              "operational_context": {
                "type": "object",
                "properties": {
                  "environment_type": { "type": "string", "enum": ["Shelf/Aisle", "Consumer_Home", "Unknown"] },
                  "stock_condition": { "type": "string", "enum": ["Full", "Gap_Visible", "N/A"] }
                }
              },
              "sentiment_context": {
                "type": "object",
                "properties": {
                  "overall_vibe": { "type": "string", "enum": ["Positive", "Negative", "Mixed"] },
                  "positive_remarks": { "type": "array", "items": { "type": "string" } },
                  "pain_points": { "type": "array", "items": { "type": "string" } }
                }
              }
            }
          }
        },
        "agent_triage": {
          "type": "object",
          "required": ["category", "urgency"],
          "properties": {
            "category": { "type": "string", "enum": ["Inventory", "Marketing", "Security"] },
            "urgency": { "type": "string", "enum": ["Immediate", "Review", "Ignore"] }
          }
        }
      },
      "required": ["detected_entities", "agent_triage"]
    }
  }'
```
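To see what the schema buys you downstream, here is a hedged sketch of an output shaped like that schema, plus a minimal check that the triage enums hold (a stand-in for full JSON Schema validation; the sample values are invented for illustration):

```python
# Sample output in the shape of the Universal Retail Analysis schema.
# All field values below are invented for illustration.
sample = {
    "detected_entities": [{
        "visual_identity": {
            "brand_name": "Yerba Mate Co",
            "visual_attributes": "blue 16oz can, matte finish",
        },
        "operational_context": {
            "environment_type": "Shelf/Aisle",
            "stock_condition": "Gap_Visible",
        },
        "sentiment_context": {
            "overall_vibe": "Mixed",
            "positive_remarks": ["great flavor"],
            "pain_points": ["can dents too easily"],
        },
    }],
    "agent_triage": {"category": "Inventory", "urgency": "Immediate"},
}

# Enum values copied from the schema's agent_triage block.
CATEGORIES = {"Inventory", "Marketing", "Security"}
URGENCIES = {"Immediate", "Review", "Ignore"}

def triage_is_valid(doc: dict) -> bool:
    """Check the required agent_triage fields against their enums."""
    t = doc.get("agent_triage", {})
    return t.get("category") in CATEGORIES and t.get("urgency") in URGENCIES
```

Because every input type lands in this same shape, anything you build against it (alerting, dashboards, routing) works for CCTV and TikTok alike.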
Phase 3: The Brain (The Enrich Function)
The Business Outcome: Connecting observation to database reality.
The vision model sees "A blue can of Yerba Mate." But your ERP needs "SKU: YM-BLUE-16".
We use an Enrich function to take the messy output from Phase 2, look it up in our master-product-catalog collection, and append the rigid SKU data.
```shell
curl -X POST https://api.bem.ai/v2/functions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "enrich",
    "functionName": "sku-resolver",
    "config": {
      "steps": [
        {
          "sourceField": "detected_entities[*].visual_identity.brand_name",
          "collectionName": "master-product-catalog",
          "targetField": "matched_sku_record",
          "searchMode": "hybrid",
          "topK": 1
        }
      ]
    }
  }'
```
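Conceptually, the enrich step is a lookup-and-attach. Here is a hedged Python illustration, with a plain dict standing in for the `master-product-catalog` collection and an exact-match lookup standing in for hybrid search:

```python
# Toy catalog standing in for the bem Collection; values are invented.
CATALOG = {
    "Yerba Mate Co": {"sku": "YM-BLUE-16", "backroom_count": 42},
}

def enrich(entities: list[dict], catalog: dict) -> list[dict]:
    """For each entity, look up its brand and attach the top match
    under matched_sku_record -- the topK=1 analogue of the enrich step."""
    for entity in entities:
        brand = entity["visual_identity"]["brand_name"]
        entity["matched_sku_record"] = catalog.get(brand)
    return entities

out = enrich([{"visual_identity": {"brand_name": "Yerba Mate Co"}}], CATALOG)
```

The real function does this with semantic + keyword (hybrid) search, so "blue Yerba Mate can" still resolves even when the brand string isn't an exact key.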
Phase 4: The Logic (The Workflow)
The Business Outcome: A single deployable asset.
We don't want to call these functions separately. We want a pipeline. We chain them together so that data flows Input -> Analyze -> Enrich -> Output.
The API Call:
```shell
curl -X POST https://api.bem.ai/v2/workflows \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "retail-intelligence-pipeline",
    "mainFunction": { "name": "retail-vision-agent", "versionNum": 1 },
    "relationships": [
      {
        "sourceFunction": { "name": "retail-vision-agent", "versionNum": 1 },
        "destinationFunction": { "name": "sku-resolver", "versionNum": 1 }
      }
    ]
  }'
```
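The `relationships` array is just a list of edges: the payload enters at `mainFunction` and flows to each destination in turn. A hedged sketch of that execution model (stub lambdas stand in for the real functions):

```python
def run_pipeline(payload, functions, relationships, main):
    """Walk the relationship edges from the main function,
    threading the payload through each function in order."""
    name = main
    while name:
        payload = functions[name](payload)
        nexts = [r["destinationFunction"] for r in relationships
                 if r["sourceFunction"] == name]
        name = nexts[0] if nexts else None
    return payload

# Stubs standing in for the Analyze and Enrich functions.
funcs = {
    "retail-vision-agent": lambda x: x + ["analyzed"],
    "sku-resolver": lambda x: x + ["enriched"],
}
rels = [{"sourceFunction": "retail-vision-agent",
         "destinationFunction": "sku-resolver"}]

result = run_pipeline([], funcs, rels, "retail-vision-agent")
```

Adding a third stage (say, a webhook action) is just one more edge in `relationships`; no caller needs to change.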
Phase 5: Running it (The "Magic")
This is the best part. We can now throw anything at this workflow endpoint. The pipeline handles the orchestration, the visual perception, and the database lookup in one go.
Example A: The UGC Review (Video)
Throw a 15-second TikTok review at the endpoint. The Analyze function extracts the brand and the "Pain Points" (e.g., "The can dents too easily"). The Enrich function then automatically attaches the manufacturing SKU so your Product Team knows exactly which production line to audit.
Example B: The CCTV Feed (Video)
Throw a dashcam or security clip at the same endpoint. Analyze sees the "Gap_Visible" on the shelf. Enrich maps that gap to a specific SKU and its backroom inventory count.
The Single API Call:
```shell
curl -X POST https://api.bem.ai/v2/calls \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "calls": [
      {
        "workflowName": "retail-intelligence-pipeline",
        "callReferenceID": "store-104-aisle-4",
        "input": {
          "singleFile": {
            "inputType": "mp4",
            "inputContent": "[BASE64_VIDEO_DATA]"
          }
        }
      }
    ]
  }'
```
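The `[BASE64_VIDEO_DATA]` placeholder is where your raw video bytes go, base64-encoded. A short sketch of producing that fragment in Python (the byte string here is a fake stand-in for a real mp4 file):

```python
import base64

def encode_video(raw: bytes) -> str:
    """Base64-encode raw video bytes into the ASCII string
    expected by inputContent."""
    return base64.b64encode(raw).decode("ascii")

# Fake mp4 header bytes standing in for open("clip.mp4", "rb").read().
fake_mp4 = b"\x00\x00\x00\x18ftypmp42"

payload_fragment = {
    "singleFile": {
        "inputType": "mp4",
        "inputContent": encode_video(fake_mp4),
    }
}
```

Base64 inflates the payload by roughly a third, so for long CCTV clips you'll want to check whether a URL or upload-based input option fits better than inlining.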
The Payoff: Computer Vision as Infrastructure
Notice what happened here. We didn’t write code to "handle a video" or "parse a PDF." We built infrastructure that treats the physical world as a queryable database.
Whether it’s a seed-stage startup or a Fortune 50 enterprise, the operational reality is multimodal. Your data pipelines should be, too.
Stop building brittle, one-off scripts. Start building resilient pipelines.
Want to see it in action?
Talk to our team to walk through how bem can work inside your stack.

