How to use Cosmos DB as an online feature store from Databricks

littlereddotdata
May 26, 2023

Features stored in Databricks’ Feature Store can be made available via Azure Cosmos DB for low-latency access

ICYMI: Databricks now supports Cosmos DB as an online feature store. This announcement comes in addition to existing support for other online databases such as DynamoDB and Azure Database for MySQL.

TL;DR: In this post, we highlight how to get started using Databricks Feature Store with Cosmos DB. Online databases are increasingly prevalent as more applications move towards serving machine learning predictions at scale and with low latency. However, many users are unfamiliar with concepts such as the difference between online and offline features, and when to use one over the other. This article therefore covers both an end-to-end example of using Cosmos DB with Databricks Feature Store and an introduction to what online stores are and how they fit into a machine learning model's lifecycle.

Data scientists can't rely on passing raw data straight into their models if they want good predictions. Instead, raw data needs to be transformed into features. In some cases, a feature is an aggregation (for example, how much a customer has spent in the past week). In other cases, it is a transformation, such as taking the logarithm of the Price variable. Essentially, features capture the aspects of the data that increase its predictive power.

We can categorize features into different flavors (as shown in the diagram below). For pre-computed features, data usually arrives through ETL pipelines before being transformed and stored in a Feature Store such as Databricks Feature Store. For real-time features, data comes in on the fly rather than through scheduled workflows. Feature transformations are calculated as data arrives and passed directly to the model for inference, rather than being stored in a table to be looked up later.

Precomputed features can be further subdivided into features stored in offline or online storage layers. Offline storage layers, such as Delta Lake, can hold large amounts of historical data, which makes them useful for training machine learning models or for running model inference in batch mode. Online storage layers, such as Cosmos DB, typically store only the most recent feature values for a given primary key, and they return those values at much lower latency than an offline store. This makes them a good choice when a model needs to perform inference at low latency. Examples of low-latency applications include fraud detection and real-time recommendations that must return predictions to a client within a second or less.

Diagram 1: Types of features used by machine learning models. Precomputed features are computed offline and include features such as a customer’s average spend over the past 30 days or the number of movies watched in the past month. On-demand features can be computed and used as soon as data arrives at an application, for example a list of pages that a user has visited in the last 30 seconds. (For more information on on-demand features, see our blogpost.)

For low-latency use cases like these, we want to take the features we write to a Delta-backed Feature Store table and publish them to a downstream online database that offers the fast lookup times we need. Examples of such databases include Cosmos DB, DynamoDB, and Postgres.

The typical workflow for publishing features to an online store is outlined in the diagram below. After a scheduled feature computation pipeline writes our features into a Databricks Feature Store table, we use the Feature Store client to publish these same features to an online database. Our model then queries this database at inference time to get the latest feature values.

How online feature stores work

Diagram 2: A batch preprocessing pipeline first writes features to an offline store such as Databricks Feature Store. Offline features are then published to an online store (such as Cosmos DB) so downstream ML models can access them at low latency. This enables real-time applications, such as fraud detection models, to meet their latency requirements.

The rest of this article will go over an end-to-end example of how to use Cosmos DB as an online feature store. Our workflow will involve:

  1. Ingesting data
  2. Creating features and storing them inside a Databricks Feature Store table
  3. Publishing these features from our Databricks Feature Store table into Cosmos DB
  4. Using these features and other raw data to train a machine learning model with scikit-learn and MLflow
  5. Creating a Databricks Model Serving endpoint to serve our predictions

This serving option is more seamless than the alternatives for a few reasons. A Databricks Model Serving endpoint fetches the feature values the model needs from the online feature store automatically. Model Serving endpoints can also scale up or down automatically to adjust to a changing volume of incoming requests. Finally, they are built for production workloads: they support up to 3,000 QPS and come with dashboards that monitor endpoint health using metrics such as QPS, latency, and error rate.

Alternatively, if not using Databricks Model Serving:

  1. Create a FastAPI endpoint that can serve our trained model’s predictions
  2. Have this endpoint receive raw, real-time data, join it (via custom code) with the features stored in Cosmos DB, and pass the resulting inference dataset to our model to get predictions back

For steps 1–5, we make use of the Online Feature Store example notebook from the Databricks online store documentation. This notebook uses the Wine Quality dataset to predict the quality of a bottle of wine. In our scenario, we treat Alcohol by Volume (ABV) and Wine ID as real-time features, while the remaining features in the dataset are saved to a Feature Store table and looked up during model training and inference. This separation is typical of an online application, which usually needs a combination of real-time and precomputed features to handle model inference.

Ingesting data
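To sketch what this step looks like (the dataset path and wine_id construction follow the pattern of the example notebook; treat the exact names as assumptions):

from pyspark.sql.functions import monotonically_increasing_id

# Load the Wine Quality dataset shipped with Databricks (path assumed from the example notebook)
raw_df = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-red.csv",
    sep=";", header=True, inferSchema=True,
)

# Replace spaces in column names so they are valid feature names
for col_name in raw_df.columns:
    raw_df = raw_df.withColumnRenamed(col_name, col_name.replace(" ", "_"))

# Add a synthetic primary key so each wine can be looked up later
raw_df = raw_df.withColumn("wine_id", monotonically_increasing_id())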

Creating features and storing them inside a Databricks Feature Store table
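A minimal sketch of this step, assuming a features_df Spark DataFrame and a hypothetical table name (the notebook's own names may differ):

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Register a feature table keyed on wine_id; features_df holds the precomputed feature columns
fs.create_table(
    name="online_feature_store_example.wine_features",
    primary_keys=["wine_id"],
    df=features_df,
    description="Precomputed wine features",
)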

Publishing these features from our Databricks Feature Store table into Cosmos DB
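Publishing boils down to a publish_table call with an AzureCosmosDBSpec. The account URI and secret prefixes below are placeholders; the secret scopes they point to must contain your Cosmos DB authorization keys:

from databricks.feature_store.online_store_spec import AzureCosmosDBSpec

online_store_spec = AzureCosmosDBSpec(
    account_uri="https://<your-account>.documents.azure.com:443/",
    write_secret_prefix="feature-store-example-write/cosmos",
    read_secret_prefix="feature-store-example-read/cosmos",
    database_name="feature_store_online_wine_features",
    container_name="feature_store_online_wine_features",
)

# Push the latest feature values from the offline table to Cosmos DB
fs.publish_table("online_feature_store_example.wine_features", online_store_spec)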

We can see that the online store has been registered in our Feature Store UI.

Going to the Azure Portal, we can cross-check that a Cosmos DB container named feature_store_online_wine_features has been created.

Using these features and other raw data to train a machine learning model using scikit-learn and MLflow

Example training code can be found in the Online Feature Store example notebook. Once we have trained the model, we can view our metrics, results, and registered model in the MLflow Experiment Tracking UI.
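In outline, the training step looks something like the sketch below (the table, label, and model names are assumptions). The key detail is that logging the model through the Feature Store client records the feature lineage, which is what lets Model Serving look features up automatically later:

import mlflow
from databricks.feature_store import FeatureLookup
from sklearn.ensemble import RandomForestRegressor

feature_lookups = [
    FeatureLookup(
        table_name="online_feature_store_example.wine_features",
        lookup_key="wine_id",
    )
]

with mlflow.start_run():
    # inference_data_df carries wine_id, the real-time alcohol feature, and the quality label
    training_set = fs.create_training_set(
        inference_data_df,
        feature_lookups=feature_lookups,
        label="quality",
        exclude_columns=["wine_id"],
    )
    training_df = training_set.load_df().toPandas()

    X = training_df.drop("quality", axis=1)
    y = training_df["quality"]
    model = RandomForestRegressor().fit(X, y)

    # fs.log_model (rather than mlflow.sklearn.log_model) packages the feature metadata with the model
    fs.log_model(
        model,
        artifact_path="model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name="wine_quality_model",
    )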

Creating a Serverless Real-time Inference endpoint to serve our trained model’s predictions

  1. Navigate to the model inside the MLflow Model Registry and click “Use model for inference”
  2. Inside the pop-up, specify the model version and endpoint name, then click “Create endpoint” on the bottom right
  3. A page will appear showing the status of the endpoint
  4. Once the endpoint is ready, we can use the Python requests library to query it

import os

import pandas as pd
import requests

# Personal access token for the workspace, read here from an environment variable
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

def create_tf_serving_json(data):
    return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def predict(dataset):
    url = '<Replace with the URL shown in the Serving page>'
    headers = {'Authorization': f'Bearer {DATABRICKS_TOKEN}'}
    data_json = dataset.to_dict(orient='split') if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
    response = requests.request(method='POST', headers=headers, url=url, json=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

Note that with Serverless Real-time Inference (SRTI), the payload data should be formatted as:

{"dataframe_records": [
    {"wine_id": 25, "alcohol": 7.9},
    {"wine_id": 25, "alcohol": 11.0}
]}
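For example, reusing the requests import and DATABRICKS_TOKEN from above, we can post that payload directly (the URL placeholder stands in for your endpoint’s invocation URL):

payload = {"dataframe_records": [
    {"wine_id": 25, "alcohol": 7.9},
    {"wine_id": 25, "alcohol": 11.0},
]}

response = requests.post(
    url='<Replace with the URL shown in the Serving page>',
    headers={'Authorization': f'Bearer {DATABRICKS_TOKEN}'},
    json=payload,
)
print(response.json())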

Creating a FastAPI endpoint that can serve our trained model’s predictions

Here, we move away from our notebook environment into an IDE. Within our app/ folder, we specify a FastAPI predict/ endpoint that can serve our model’s predictions. The project folder structure is shown below, and the code for this can be found on Github.

This endpoint will accept a JSON input of {"alcohol": 0, "wine_id": 0} (validated by a pydantic Item model) and output a predicted wine quality score.

@app.post("/predict")
async def predict(item: Item):
    """Prediction endpoint."""

Behind the scenes, upon receiving a request, this endpoint passes the wine_id as a query parameter to our Cosmos DB container to fetch the features relevant to that particular wine_id.

query_text = "SELECT * FROM feature_store_online_wine_features f WHERE f._feature_store_internal__primary_keys = @id"
res = await query_items(container_obj, query_text, id)
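query_items is a small helper in the project; one way it might be written against the async Cosmos SDK (a sketch, not the repository’s exact code):

async def query_items(container, query_text, wine_id):
    # Run a parameterized query; the async SDK returns an async iterator of result items
    items = container.query_items(
        query=query_text,
        parameters=[{"name": "@id", "value": str(wine_id)}],
    )
    return [item async for item in items]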

Then, these features are joined to our input data to create a dataset to use for model inference

inputs_and_features = input_df.join(lookups).drop("wine_id", axis=1)
# Make predictions and log
model_output = MODEL.predict(inputs_and_features).tolist()

If we run a local instance of our FastAPI application, we can test that our endpoint is working.
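For example, with the app running locally under uvicorn (port 8000 is the assumed default):

import requests

resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"alcohol": 11.0, "wine_id": 25},
)
print(resp.json())  # a predicted wine quality score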

We have shown how features registered to Databricks Feature Store can be published to an online database such as Cosmos DB for low-latency lookups during model serving. We have also shown how a REST application can query Cosmos DB for the most up-to-date features, and then join these features with real-time raw data to create a dataset that is used for model inference. Online stores are increasingly prevalent for applications that need to return results in real-time or near real-time. Hopefully, this guide serves as a starting point for how to think about batch versus online features and how they inform the model building and serving process.

Thank you to Feifei Wang and Maxim Lukiyanov for their valuable feedback on this post.
