Vertex AI Feature Store: The Single Source of Truth

Stop duplicating feature engineering code. Learn how Feature Store unifies Online (Serving) and Offline (Training) feature access.

The "Two Pipelines" Problem

Without a Feature Store, you usually build two pipelines:

  1. Training Pipeline: A massive SQL query that joins tables to calculate Avg_Spend_30d.
  2. Serving Pipeline: A fast Java/Go function that queries the database to calculate Avg_Spend_30d for the user right now.

Risk: If the SQL logic and the Java logic differ by even 1%, your model's production predictions silently degrade. This mismatch is called training/serving skew.
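A toy illustration of how that divergence creeps in (hypothetical functions, not Feature Store code): the same "average spend" feature, reimplemented twice.

```python
# Training-side logic (mirrors the SQL: exact mean of recent spend)
def avg_spend_training(amounts):
    return sum(amounts) / len(amounts)

# Serving-side port that silently diverges
# (e.g. integer division introduced during a rewrite)
def avg_spend_serving(amounts):
    return sum(amounts) // len(amounts)

amounts = [10, 15, 24]
print(avg_spend_training(amounts) == avg_spend_serving(amounts))  # False
```

The model was trained on one distribution of values and is served another; nothing crashes, the predictions are just quietly wrong.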

Vertex AI Feature Store provides a centralized repository so you define the logic once.


1. Architecture

  • EntityType: The "Noun" (e.g., User, Product, Store).
  • Feature: The "Adjective" (e.g., age, average_rating, zip_code).
  • Ingestion: You stream or batch write values into the store.

The Two Interfaces

  1. Offline Store (BigQuery backed):
    • Used for: Training.
    • Query: "Give me the values of age and spend for these 100k users."
    • Key capability: Point-in-Time Lookup (Time Travel); see Section 2.
  2. Online Store (Bigtable/Redis backed):
    • Used for: Serving.
    • Query: "Give me the latest values for User_123."
    • Latency: < 10ms.

2. Point-in-Time Correctness (Time Travel)

This is the killer feature. Imagine you are training a fraud model on a fraudulent transaction that occurred on Jan 1st.

  • User's spend on Jan 1st was $500.
  • User's spend today (Feb 1st) is $1000.

If you simply query "Current Spend" when building your training set, you leak future information ($1000): the model learns a spurious rule like "high spend predicts past fraud." Feature Store instead lets you ask: "Give me the feature values as they existed at the timestamp of the event."
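Feature Store performs this point-in-time join for you. Conceptually it is a backward "as-of" join, which can be sketched with pandas on illustrative data (this is not the Feature Store API, just the semantics):

```python
import pandas as pd

# Feature history: spend as it was recorded over time
features = pd.DataFrame({
    "user_id": ["u1", "u1"],
    "feature_time": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "avg_spend": [500, 1000],
}).sort_values("feature_time")

# Training labels: the fraud event happened on Jan 1st
labels = pd.DataFrame({
    "user_id": ["u1"],
    "event_time": pd.to_datetime(["2024-01-01"]),
    "is_fraud": [1],
}).sort_values("event_time")

# Backward as-of join: take the latest feature value at or
# before each label's event_time, so no future value leaks in
training_set = pd.merge_asof(
    labels, features,
    left_on="event_time", right_on="feature_time",
    by="user_id", direction="backward",
)
print(training_set["avg_spend"].iloc[0])  # 500, not today's 1000
```

The Feb 1st value ($1000) exists in the feature table but is invisible to the Jan 1st training row.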


3. Code Example: Fetching Features

from google.cloud import aiplatform
import pandas as pd

aiplatform.init(project="my-project", location="us-central1")

# Connect to an existing Feature Store and its "users" EntityType
fs = aiplatform.Featurestore(featurestore_name="my_featurestore")
users = fs.get_entity_type(entity_type_id="users")

# 1. SERVING (Online)
# Get the latest values for User 123 (low-latency online read)
features = users.read(
    entity_ids="123",
    feature_ids=["age", "avg_spend"],
)
# Returns a one-row DataFrame with age and avg_spend columns

# 2. TRAINING (Offline)
# Get values for a list of users as of specific timestamps
# (point-in-time lookup against the offline store)
read_instances = pd.read_csv("gs://my-bucket/training_ids_and_timestamps.csv")
training_df = fs.batch_serve_to_df(
    serving_feature_ids={"users": ["age", "avg_spend"]},
    read_instances_df=read_instances,
)

4. Summary

  • Feature Store prevents skew by unifying logic.
  • Offline = High Throughput, Time Travel (Training).
  • Online = Low Latency (Serving).
  • Point-in-Time prevents data leakage.

In the next module, we enter the lab: Model Prototyping.

