Git for Data

Version control for data and ETL pipelines. Roll back mistakes in minutes, not weeks.

Data Version

main • v42

Catalog

sales

customer_data

customer_segments

orders

revenue_metrics

marketing_costs

product_sales

pricing_history

product_catalog

raw_transactions

sales_analysis

tax

 SELECT
     date,
     SUM(revenue) as total_revenue,
     COUNT(DISTINCT customer_id) as customers
 FROM sales VERSION AS OF 40
 WHERE date >= '2025-01-01'
 GROUP BY date

Results

date	total_revenue	customers
2025-01-01	$45,230	128
2025-01-02	$52,180	142
2025-01-03	$58,920	156

3 rows • 0.42s

The $50M Data Problem

Every data team faces these expensive, time-consuming challenges

Impossible Rollbacks

When mistakes happen, data teams face 2-3 week backfill campaigns costing $50K-200K per incident. Recovery requires manual intervention and tribal knowledge. One bad query can corrupt your entire pipeline.

Data Discovery Waste

Analysts waste 30-40% of their time asking "Where is the data?" Manual dataset discovery from outdated spreadsheets and tribal knowledge creates duplicate work, inconsistent analysis, and slow time-to-insight.

Dependency Chaos

Tables depend on other tables, but tracking these relationships manually is error-prone. One delayed upstream job breaks everything downstream. No lineage means debugging becomes a nightmare.

Cost Explosion

Companies have zero visibility into data spending and waste 60-80% of budgets on redundant datasets, inefficient queries, and over-provisioned infrastructure. Bills keep growing with no way to optimize.

Time Travel for Your Data Lake

Bring Git-like superpowers to your data pipelines. Branch, version, and roll back with confidence.

Instant Rollback & Cascade

95% faster incident response. Select any point in time, click rollback, and the rollback cascades automatically through every downstream table in the DAG. 3 weeks → 5 minutes.

ROLLBACK TO VERSION 122
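As a rough illustration of what a downstream cascade involves, the sketch below walks a lineage graph from the rolled-back table and collects every table that would also need to be reverted. The table names come from the catalog shown earlier, but the edges between them and the `cascade_targets` helper are assumptions made for the example, not the product's actual implementation.

```python
from collections import deque

# Hypothetical lineage graph: each table maps to the tables that read from it.
DOWNSTREAM = {
    "raw_transactions": ["sales"],
    "sales": ["revenue_metrics", "sales_analysis"],
    "revenue_metrics": [],
    "sales_analysis": [],
}


def cascade_targets(root, downstream):
    """Return root plus every table reachable downstream, in breadth-first order."""
    seen = [root]
    queue = deque([root])
    while queue:
        table = queue.popleft()
        for child in downstream.get(table, []):
            if child not in seen:  # skip tables already scheduled for rollback
                seen.append(child)
                queue.append(child)
    return seen


# Rolling back "sales" would also touch everything derived from it.
print(cascade_targets("sales", DOWNSTREAM))
```

Breadth-first order means a table is always listed after the table it was reached from, which is one simple way to revert upstream data before its consumers.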

Version Everything

Complete audit trail. Every query, transformation, and schema change is automatically versioned. Git-like branching for data and schema with 60%+ storage reduction vs table copies.

SELECT * FROM sales VERSION AS OF 123
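To make the idea behind `VERSION AS OF` concrete, here is a toy, in-memory model of a versioned table. The `VersionedTable` class and its methods are hypothetical names invented for this sketch. It stores a full copy of the rows per version for simplicity; a real system built on Iceberg-style snapshots would share unchanged data files between versions instead of copying them.

```python
class VersionedTable:
    """Toy version log: every commit records a snapshot of the table's rows."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    @property
    def current_version(self):
        return len(self._snapshots) - 1

    def commit(self, rows):
        """Record a new snapshot and return its version number."""
        self._snapshots.append(list(rows))
        return self.current_version

    def as_of(self, version):
        """Read the table exactly as it was at the given version."""
        return list(self._snapshots[version])

    def rollback_to(self, version):
        """Restore an old snapshot as a new commit, so history is never rewritten."""
        return self.commit(self._snapshots[version])


sales = VersionedTable()
good = sales.commit([{"date": "2025-01-01", "revenue": 45230}])
bad = sales.commit([{"date": "2025-01-01", "revenue": -1}])  # a faulty load
restored = sales.rollback_to(good)
print(sales.as_of(restored) == sales.as_of(good))
```

Recording the rollback as a new version keeps the faulty version queryable, which is what makes a complete audit trail possible.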

Smart Dependencies & Lineage

Lineage captured at authoring time. Automatically track table dependencies at the partition level. Jobs run in the right order. Enables discovery, operations monitoring, and change impact assessment.

DEPENDS ON sales.date = YESTERDAY
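One way to picture partition-level dependencies is shown below: a topological sort decides the order in which jobs run, and a readiness check confirms that yesterday's partition of each upstream table exists before a job starts. The job graph, the `run_order` and `is_ready` helpers, and the reading of `DEPENDS ON sales.date = YESTERDAY` as "needs the previous day's partition" are all assumptions made for this sketch.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job graph: each job maps to the upstream tables it depends on.
DEPENDS_ON = {
    "sales": {"raw_transactions"},
    "revenue_metrics": {"sales"},
    "sales_analysis": {"sales", "product_catalog"},
}


def run_order(depends_on):
    """Return the jobs in an order where every upstream table comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def is_ready(job, run_date, depends_on, available):
    """A job is ready when yesterday's partition of each upstream table exists.

    `available` is a set of (table, partition_date) pairs that have landed.
    """
    needed = run_date - timedelta(days=1)
    return all((up, needed) in available for up in depends_on.get(job, ()))


landed = {("sales", date(2025, 1, 2))}
print(run_order(DEPENDS_ON))
print(is_ready("revenue_metrics", date(2025, 1, 3), DEPENDS_ON, landed))
```

Checking readiness per partition rather than per table lets a delayed upstream load hold back only the jobs that need that specific day.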

AI-Powered Discovery

Hours → seconds. AI agents understand schemas and lineage. Ask "Show me sales by region for last month" in natural language—get instant SQL generation and query execution.

"Show me sales by region last month"

Enterprise Capabilities, Zero Operations

Production-ready features with serverless simplicity

Built-in Native Lineage

Lineage captured at authoring time—not reverse-engineered from logs. Enables discovery, operations monitoring, and change impact assessment across the entire pipeline.
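A minimal sketch of authoring-time capture: when a query is saved, the tables it reads are extracted from the SQL text and stored against the table it produces. The regular expression here is a deliberately naive stand-in for a real SQL parser, and `upstream_tables` and `record_lineage` are hypothetical helpers, not part of the product.

```python
import re

# Naive pattern: an identifier directly after FROM or JOIN.
# It ignores subqueries, CTEs, and quoted names, which a real parser would handle.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(sql):
    """Return the sorted, de-duplicated table names a query reads from."""
    return sorted({name.lower() for name in TABLE_REF.findall(sql)})


def record_lineage(lineage, target, sql):
    """Store the upstream tables for `target` at the moment the query is saved."""
    lineage[target] = upstream_tables(sql)
    return lineage


query = """
SELECT s.date, SUM(s.revenue) AS total_revenue
FROM sales s
JOIN customer_data c ON c.customer_id = s.customer_id
GROUP BY s.date
"""
print(record_lineage({}, "revenue_metrics", query))
```

Because the dependency is recorded when the query is written, the lineage graph exists before the job has ever run, rather than being pieced together later from execution logs.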

Version Control for Data

Git-like branching for data and schema. 60%+ storage reduction vs table copies. Safe schema testing with complete isolation.

Unified SQL/Python Pipelines

Single platform for analysts and engineers. No more tool fragmentation—unified version control, dependency tracking, and rollback for both SQL and Python.

Zero-Setup Deployment

Install to your AWS account in under 5 minutes. Start querying within 60 seconds. Serverless architecture means no servers to manage, no ops expertise required.

Interoperability via Iceberg

Connect any tool (Spark, Trino, Presto, Dremio). Import from S3, Glue, or paste data directly. Never locked in—built on Apache Iceberg open standard.

Cross-Instance Sharing

Spin up private data lakes per team, share across departments, manage centrally. Organizational flexibility with data accessibility.

Visual Query Builder

Modern React interface for browsing catalogs, building queries, and managing versions. No SQL required—intuitive IDE experience for all skill levels.

Cloud Native

Serverless architecture built on modern cloud infrastructure. No servers to manage, infinite scalability. Deploys to your cloud account in under 5 minutes.


Get Started Today

Have Questions?

Email Us

GitHub

Community
