Git for Data

Version control for data and ETL pipelines. Roll back mistakes in minutes, not weeks.

Data Version
main • v42
Catalog
sales
customer_data
customer_segments
orders
revenue_metrics
marketing_costs
product_sales
pricing_history
product_catalog
raw_transactions
sales_analysis
tax
SELECT
    date,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS customers
FROM sales VERSION AS OF 40
WHERE date >= '2025-01-01'
GROUP BY date
Results
date        total_revenue  customers
2025-01-01  $45,230        128
2025-01-02  $52,180        142
2025-01-03  $58,920        156

The $50M Data Problem

Every data team faces these expensive, time-consuming challenges

Impossible Rollbacks

When mistakes happen, data teams face 2-3 week backfill campaigns costing $50K-200K per incident. Recovery requires manual intervention and tribal knowledge. One bad query can corrupt your entire pipeline.

Data Discovery Waste

Analysts waste 30-40% of their time asking "Where is the data?" Manual dataset discovery from outdated spreadsheets and tribal knowledge creates duplicate work, inconsistent analysis, and slow time-to-insight.

Dependency Chaos

Tables depend on other tables, but tracking these relationships manually is error-prone. One delayed upstream job breaks everything downstream. No lineage means debugging becomes a nightmare.

Cost Explosion

Companies have zero visibility into data spending and waste 60-80% of budgets on redundant datasets, inefficient queries, and over-provisioned infrastructure. Bills keep growing with no way to optimize.

Time Travel for Your Data Lake

Bring Git-like superpowers to your data pipelines. Branch, version, and roll back with confidence.

Instant Rollback & Cascade

95% faster incident response. One-click rollback with automatic downstream DAG cascade. Select any point in time, click rollback, and the entire dependency tree cascades automatically. 3 weeks → 5 minutes.

ROLLBACK TO VERSION 122
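
To make the cascade idea concrete, here is a minimal Python sketch of how a rollback plan could be derived from lineage. It is an illustration only, not Data Version's implementation: the `UPSTREAM` map and the `rollback_plan` helper are hypothetical, and the dependencies between the catalog tables are assumed for the example.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical lineage (assumed for illustration): table -> tables it reads from.
UPSTREAM = {
    "sales": {"raw_transactions"},
    "revenue_metrics": {"sales"},
    "sales_analysis": {"sales", "marketing_costs"},
    "customer_segments": {"customer_data"},
}

def rollback_plan(upstream, root):
    """Return root plus every transitive downstream table, sources first."""
    # Invert the edges so we can walk from a table to its consumers.
    downstream = {}
    for table, sources in upstream.items():
        for src in sources:
            downstream.setdefault(src, set()).add(table)

    # A breadth-first walk collects everything affected by the root.
    affected, queue = {root}, deque([root])
    while queue:
        for child in downstream.get(queue.popleft(), ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Order the affected subgraph so each table comes after its sources.
    subgraph = {t: upstream.get(t, set()) & affected for t in affected}
    return list(TopologicalSorter(subgraph).static_order())

plan = rollback_plan(UPSTREAM, "sales")
```

With this assumed lineage, rolling back `sales` also covers `revenue_metrics` and `sales_analysis`, while `customer_segments` is left alone because it does not read from `sales`.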

Version Everything

Complete audit trail. Every query, transformation, and schema change is automatically versioned. Git-like branching for data and schema with 60%+ storage reduction vs table copies.

SELECT * FROM sales VERSION AS OF 123
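
One common way to support `VERSION AS OF` reads is to keep an immutable snapshot per commit. The toy Python class below sketches that idea under simplifying assumptions; `VersionedTable` and its methods are hypothetical names, and the product's actual storage layout is not described in this page.

```python
class VersionedTable:
    """Toy table where every commit adds an immutable snapshot."""

    def __init__(self):
        self._snapshots = [()]  # version 0 is the empty table

    @property
    def version(self):
        return len(self._snapshots) - 1

    def _check(self, v):
        if not 0 <= v <= self.version:
            raise ValueError(f"unknown version {v}")

    def commit(self, rows):
        """Append rows as a new snapshot and return its version number."""
        self._snapshots.append(self._snapshots[-1] + tuple(rows))
        return self.version

    def read(self, as_of=None):
        """Return the rows at a given version (latest by default)."""
        v = self.version if as_of is None else as_of
        self._check(v)
        return list(self._snapshots[v])

    def rollback(self, to_version):
        """Roll back by re-committing an old snapshot, keeping full history."""
        self._check(to_version)
        self._snapshots.append(self._snapshots[to_version])
        return self.version

sales = VersionedTable()
v1 = sales.commit([("2025-01-01", 45230)])
v2 = sales.commit([("2025-01-02", 52180)])
v3 = sales.rollback(v1)
```

In this sketch a rollback is recorded as a new version rather than a deletion, so earlier versions stay readable and the history remains auditable.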

Smart Dependencies & Lineage

Lineage captured at authoring time. Automatically track table dependencies at the partition level. Jobs run in the right order. Enables discovery, operations monitoring, and change impact assessment.

DEPENDS ON sales.date = YESTERDAY
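
A partition-level dependency such as the one above can be read as "this job may run once yesterday's `sales` partition exists". The Python sketch below illustrates that reading; the exact semantics of `DEPENDS ON` are an assumption here, and `required_partition` and `is_ready` are hypothetical helpers.

```python
from datetime import date, timedelta

def required_partition(run_date):
    """Partition a job needs when it depends on yesterday's data."""
    return run_date - timedelta(days=1)

def is_ready(available_partitions, table, run_date):
    """True if the upstream table already has the partition the job needs."""
    return required_partition(run_date) in available_partitions.get(table, set())

# Assumed state: sales has partitions for January 1 and 2, 2025.
available = {"sales": {date(2025, 1, 1), date(2025, 1, 2)}}

ready_jan3 = is_ready(available, "sales", date(2025, 1, 3))
ready_jan5 = is_ready(available, "sales", date(2025, 1, 5))
```

Checking readiness per partition, rather than per table, lets a scheduler hold back only the runs whose input is actually missing.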

AI-Powered Discovery

Hours → seconds. AI agents understand schemas and lineage. Ask "Show me sales by region for last month" in natural language—get instant SQL generation and query execution.

"Show me sales by region last month"

Enterprise Capabilities, Zero Operations

Production-ready features with serverless simplicity

Built-in Native Lineage

Lineage captured at authoring time—not reverse-engineered from logs. Enables discovery, operations monitoring, and change impact assessment across the entire pipeline.

Version Control for Data

Git-like branching for data and schema. 60%+ storage reduction vs table copies. Safe schema testing with complete isolation.

Unified SQL/Python Pipelines

Single platform for analysts and engineers. No more tool fragmentation—unified version control, dependency tracking, and rollback for both SQL and Python.

Zero-Setup Deployment

Install to your AWS account in under 5 minutes. Start querying within 60 seconds. Serverless architecture means no servers to manage, no ops expertise required.

Interoperability via Iceberg

Connect any Iceberg-compatible tool (Spark, Trino, Presto, Dremio). Import from S3, Glue, or paste data directly. Never locked in: built on the open Apache Iceberg standard.

Cross-Instance Sharing

Spin up private data lakes per team, share across departments, manage centrally. Organizational flexibility with data accessibility.

Visual Query Builder

Modern React interface for browsing catalogs, building queries, and managing versions. No SQL required—intuitive IDE experience for all skill levels.

Cloud Native

Serverless architecture built on modern cloud infrastructure. No servers to manage, infinite scalability. Deploys to your cloud account in under 5 minutes.

Get Started Today

Download the Data Version desktop application for Windows, macOS, or Linux

Free for individual use • AWS account required for deployment

Have Questions?

We're here to help you get started with Data Version