A code-first data catalog
Data products defined in code, stored in your repo, and synced to your catalog on every push.
Define and validate data products from your terminal. Everything lives as YAML in your repo, following the open OpenDPI spec.
Read the tutorial →

Write your schema once, then generate code for PySpark, dbt, Pydantic, Go, and more. Run it locally, in CI, or on deploy. Your definitions stay in sync across your entire stack.
```yaml
ports:
  daily_metrics:
    schema:
      type: object
      properties:
        customer_id:
          type: string
        date:
          type: string
          format: date
        total_orders:
          type: integer
        revenue:
          type: number
```
```python
from pyspark.sql.types import *

daily_metrics_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("date", DateType()),
    StructField("total_orders", IntegerType()),
    StructField("revenue", DoubleType()),
])
```
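To give a feel for the Python side of code generation, here is a minimal, stdlib-only sketch of what generated bindings for the `daily_metrics` port might look like. This is an illustrative approximation, not the catalog's actual output; the field names and types come from the schema above, while the class name, coercion logic, and validation checks are assumptions:

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class DailyMetrics:
    """Record shape mirroring the daily_metrics port schema (illustrative sketch)."""
    customer_id: str
    date: date
    total_orders: int
    revenue: float

    def __post_init__(self):
        # The YAML declares `format: date`, so accept an ISO-8601 string and coerce it.
        if isinstance(self.date, str):
            self.date = datetime.strptime(self.date, "%Y-%m-%d").date()
        # A simple sanity check a generator could emit alongside the types.
        if self.total_orders < 0:
            raise ValueError("total_orders must be non-negative")


# Validate a single record against the declared shape.
row = DailyMetrics(customer_id="c-42", date="2024-01-15", total_orders=3, revenue=99.5)
```

Real generated code would more likely target Pydantic (as listed above) and carry richer validation, but the idea is the same: one YAML definition, many language-specific artifacts.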
No surprises. Pick a plan that fits your team and scale when you need to.
For teams managing data products together.
For organizations that need full control.
Tutorials, updates, and thoughts on data cataloging.
Get notified when we publish new blog posts.
Ask questions, share ideas, or just follow along.