Introduction
Welcome to bem. We're building the next generation of data transformation primitives so you don't have to. In this reference, you'll find a comprehensive list of all available endpoints, with their parameters and responses. Please give us feedback; that's how we build an amazing product.
API Base URL
Unless otherwise specified, all endpoints use https://api.bem.ai as their base URL.
API Keys
For all requests, you'll need an API key. Pass it in the x-api-key header.
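Here is a minimal sketch of an authenticated request that uses only Python's standard library. The base URL and the x-api-key header come from this section. The path argument and the build_request name are placeholders for illustration, not documented endpoints.

```python
import urllib.request

BASE_URL = "https://api.bem.ai"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build a request against the bem base URL with the API key header set."""
    # Every request carries the API key in the x-api-key header.
    # urllib normalizes the name to "X-api-key"; HTTP header names are case-insensitive.
    return urllib.request.Request(
        url=BASE_URL + path,
        headers={"x-api-key": api_key},
        method=method,
    )
```

To send the request, pass the result to urllib.request.urlopen. Load the key from your secret store or environment rather than hard-coding it.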
Webhook Authentication
To let you confirm the authenticity of webhook requests coming from bem, we include a bem-signature header on every outgoing request to the endpoint specified in your pipeline. The header value contains a timestamp (t=) and a signature (v1=), separated by a comma. The scheme is versioned (hence v1) so that it can change in future updates.
bem-signature: t=1492774577,v1=0734be64d748aa8e8ee9dfe87407665541f2c33f9b0ebf19dfd0dd80f08f504c
Signatures are generated using HMAC with SHA-256. You can generate, retrieve, and revoke your account's webhook secret through our API. We use that secret as the HMAC key when signing each payload, and the resulting signature is the one that appears in the header.
To verify the signature, you must complete the following steps:
Step 1: Extract timestamp and signature from header
Split the raw header value on the comma. The value after t= is the timestamp, and the value after v1= is the signature.
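Step 1 can be written as a small Python helper. It assumes the header value has the t=<timestamp>,v1=<signature> form shown in the example header, possibly with whitespace around each part. The parse_signature_header name was chosen for this sketch.

```python
from typing import Dict, Tuple


def parse_signature_header(header_value: str) -> Tuple[str, str]:
    """Return the (timestamp, signature) pair from a bem-signature header value."""
    fields: Dict[str, str] = {}
    # The value is a comma-separated list of key=value pairs, e.g. "t=...,v1=...".
    for item in header_value.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    if "t" not in fields or "v1" not in fields:
        raise ValueError("bem-signature header is missing t= or v1=")
    return fields["t"], fields["v1"]
```

The helper ignores unrecognized keys instead of rejecting them, which leaves room for the versioned scheme to add entries later.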
Step 2: Prepare the signed payload string
The payload string is created by concatenating:
- The timestamp (as a string)
- The period character (.)
- The actual JSON payload (stringified request body)
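Step 2 looks like this in Python. The sketch assumes the signature covers the request body exactly as bem sent it. For that reason it takes the raw body bytes from your web framework instead of re-serializing parsed JSON, because re-serializing can change whitespace or key order. The build_signed_payload name is illustrative.

```python
def build_signed_payload(timestamp: str, raw_body: bytes) -> bytes:
    """Concatenate the timestamp, a period, and the raw request body."""
    # Work in bytes so the body is used exactly as received, with no re-encoding.
    return timestamp.encode("utf-8") + b"." + raw_body
```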
Step 3: Determine the expected signature
Compute an HMAC with the SHA-256 hash function (the string output should be in hex). Use your account's webhook secret as the key, and the signed payload string as the message.
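Step 3 uses Python's hmac and hashlib modules. The secret is your account's webhook secret. Encoding it as UTF-8 to get the key bytes is an assumption about how the secret string maps to bytes.

```python
import hashlib
import hmac


def compute_signature(webhook_secret: str, signed_payload: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the signed payload."""
    return hmac.new(
        webhook_secret.encode("utf-8"),  # key: your account's webhook secret
        signed_payload,                  # message: b"<timestamp>.<body>"
        hashlib.sha256,
    ).hexdigest()                        # lowercase hex string, 64 characters
```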
Step 4: Compare the signatures
Compare your computed signature with the v1 signature from the header using a string equality check. If the signatures match, you've confirmed that the request to your webhook endpoint came from bem.
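The function below combines all four steps. It makes one deliberate change to the text above. In place of a plain == comparison, it uses hmac.compare_digest, the constant-time comparison in Python's standard library. It gives the same result but does not leak timing information about how many characters matched. The sketch assumes bem signed the raw request body bytes, and verify_bem_signature is a name chosen for illustration.

```python
import hashlib
import hmac


def verify_bem_signature(header_value: str, raw_body: bytes, webhook_secret: str) -> bool:
    """Return True if header_value carries a valid v1 signature for raw_body."""
    # Step 1: pull t= and v1= out of the comma-separated header value.
    fields = {}
    for item in header_value.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    timestamp, provided = fields.get("t"), fields.get("v1")
    if not timestamp or not provided:
        return False

    # Step 2: signed payload is "<timestamp>.<raw body>".
    signed_payload = timestamp.encode("utf-8") + b"." + raw_body

    # Step 3: hex-encoded HMAC-SHA256 keyed with the webhook secret.
    expected = hmac.new(webhook_secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()

    # Step 4: constant-time equality check (compared as bytes so any input is accepted).
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

In a webhook handler, call it with the bem-signature header value, the unparsed request body, and your secret, and reject the request when it returns False.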
Building an Output Schema
For best practices and tips on shaping your output schema effectively, see our guide on building output schemas.
Email inputs
In addition to the Create Transformation endpoint below, every pipeline has an automatically assigned @pipeline.bem.ai email address to which you can forward emails. Attachments on emails sent to this address are handled the same way as attachments on emails sent through our API, so you can send CSV, XLSX, XLS, and PDF files to be processed along with the email body. For each email processed through the pipeline address, we store the value of the email's Message-ID header as its referenceID.
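To know an email's referenceID before bem processes it, you can set the Message-ID yourself when you compose the message. The sketch below uses Python's email package to build a message with a CSV attachment and a generated Message-ID. The recipient should be your pipeline's @pipeline.bem.ai address. The sender address, subject, and the build_pipeline_email name are placeholders, and the approach assumes the Message-ID reaches bem unchanged. Sending is left to smtplib or your mail provider.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_pipeline_email(to_address: str, from_address: str, body: str,
                         csv_name: str, csv_text: str) -> EmailMessage:
    """Compose an email with a CSV attachment and a Message-ID known in advance."""
    msg = EmailMessage()
    msg["To"] = to_address          # your pipeline's @pipeline.bem.ai address
    msg["From"] = from_address
    msg["Subject"] = "Data for processing"  # placeholder subject
    # bem stores this header's value as the email's referenceID.
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content(body)           # the body content is processed too
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(csv_text, subtype="csv", filename=csv_name)
    return msg
```

Record msg["Message-ID"] before sending, and you can later match it against the referenceID of the resulting transformation.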
Processing Collections
By default, pipelines perform one-to-one transformations over your inputs: each input data point produces a single output object that follows your schema.
To process a collection of data points (where a single input produces an array of outputs), enable the independentDocumentProcessingEnabled boolean option on the pipeline and give it an output schema that defines a single object. The pipeline then treats each row as a discrete entity, and each output transformation is a batched array of objects that follow your output schema. Each transformation in the batch includes an itemOffset field that maps the object back to its row in your input data.
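Here is one way to map batched outputs back to input rows by itemOffset. The sketch assumes itemOffset is a zero-based index into the rows you sent and that each output is a dict carrying that field. pair_with_rows is a hypothetical helper name.

```python
from typing import Any, Dict, List, Sequence, Tuple


def pair_with_rows(rows: Sequence[Any],
                   outputs: List[Dict[str, Any]]) -> List[Tuple[Any, Dict[str, Any]]]:
    """Pair each output object with the input row its itemOffset points to."""
    paired = []
    # Sort by itemOffset so the pairs follow input order regardless of arrival order.
    for output in sorted(outputs, key=lambda o: o["itemOffset"]):
        offset = output["itemOffset"]
        if not 0 <= offset < len(rows):
            raise IndexError(f"itemOffset {offset} is outside the input rows")
        paired.append((rows[offset], output))
    return paired
```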
Order of operations
All jobs are asynchronous, so we don't guarantee that transformed data points are returned in the order you sent them. If you need to preserve order, we recommend generating time-sortable KSUIDs internally and using them as the referenceID, so you can sort the results later.
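The following is a minimal standard-library sketch of a KSUID generator that follows the published segmentio/ksuid layout. Each ID is a 4-byte big-endian timestamp (seconds since the KSUID epoch, 1400000000) followed by 16 random bytes, base62-encoded to a fixed 27 characters. The timestamp comes first and the encoding is fixed-width with an ASCII-ordered alphabet, so sorting the strings sorts them by creation second. In production, a maintained KSUID library is the safer choice.

```python
import os
import time
from typing import Optional

KSUID_EPOCH = 1_400_000_000  # seconds; KSUID timestamps count from this offset
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
ENCODED_LENGTH = 27  # 20 raw bytes always fit in 27 base62 digits


def new_ksuid(now: Optional[float] = None) -> str:
    """Return a time-sortable KSUID string for the given Unix time (default: now)."""
    seconds = int(time.time() if now is None else now) - KSUID_EPOCH
    raw = seconds.to_bytes(4, "big") + os.urandom(16)  # timestamp first, then random payload
    value = int.from_bytes(raw, "big")
    digits = []
    while value:
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    # Left-pad with "0" so every ID has the same length and sorts lexicographically.
    return "".join(reversed(digits)).rjust(ENCODED_LENGTH, "0")
```

Generate one ID per input before you submit it, use it as the referenceID, and sort returned results by referenceID to restore submission order. Ordering only holds to one-second resolution, and IDs created within the same second are ordered randomly.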
Pagination
Our pagination follows the same conventions as the Stripe API. You page forward and backward through results with cursors, passed in the startingAfter and endingBefore parameters. Both parameters accept an existing object ID and return objects in chronological order. The startingAfter parameter returns objects listed after the given object, and the endingBefore parameter returns objects listed before it. The two parameters are mutually exclusive: you can use either one, but not both in the same request. The optional limit parameter controls the page size. If you omit it, the API returns 50 objects per page.
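This sketch pages forward through a list endpoint with startingAfter. This section doesn't describe the request and response shapes, so the function takes a fetch_page callable that you supply. For example, it could call a list endpoint with the given query parameters and return the decoded list of objects. The function also takes the name of the object ID field. It assumes that a page shorter than limit is the last page. If the API exposes an explicit has-more indicator, prefer that. iterate_all and fetch_page are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

FetchPage = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def iterate_all(fetch_page: FetchPage, id_field: str, limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every object by following startingAfter cursors page by page."""
    cursor: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if cursor is not None:
            # Only startingAfter is sent; it must never be combined with endingBefore.
            params["startingAfter"] = cursor
        page = fetch_page(params)
        yield from page
        if len(page) < limit:
            return  # assumed: a short (or empty) page means there is nothing more
        cursor = page[-1][id_field]  # the last object's ID becomes the next cursor
```

To page backward, the same loop would send endingBefore with the ID of the first (earliest) object on the current page.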