What is Data Version?
Data Version is a data lake management platform that brings Git-like version control to your data pipelines. Built on Apache Iceberg, it lets you branch, version, and roll back data with the same confidence you have with code.
The Problem We Solve
Modern data teams face a critical challenge: when mistakes happen in data pipelines, recovery is expensive and time-consuming. Traditional approaches require 2-3 week backfill campaigns costing $50K-200K per incident. Without a cheap way to undo mistakes, teams lack the confidence to experiment, which leads to brittle, untested pipelines.
Data Version addresses these problems with:
- Instant Rollback: Recover from pipeline failures in seconds, not weeks
- Time Travel: Query historical data states without maintaining copies
- Safe Experimentation: Branch and test schema changes with complete isolation
- Zero Operations: Serverless architecture requires no infrastructure management
How It Works
Data Version transforms your SQL queries into versioned, scheduled ETL pipelines with a single click. The platform provides:
1. Query to Pipeline Transformation
Write a SQL query, click "Save as Table", and Data Version automatically creates a managed pipeline with scheduling, dependency tracking, and version control. Your query results become versioned Iceberg tables that you can branch, merge, and roll back.
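As a rough illustration of what "Save as Table" might record, the sketch below turns a SQL string into a pipeline definition. The `PipelineDefinition` class, its fields, the `save_as_table` function, and the table names are hypothetical and are not Data Version's API. The regex-based `extract_sources` helper is deliberately naive; a real implementation would use a proper SQL parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical structure: Data Version's real pipeline metadata is not shown in this document.
@dataclass
class PipelineDefinition:
    target_table: str
    sql: str
    schedule: str                                  # e.g. an EventBridge-style rate expression
    sources: list = field(default_factory=list)    # tables the query reads from

# Naive pattern: an identifier that directly follows FROM or JOIN.
# It skips subqueries ("FROM (") but would wrongly count CTE names as source tables.
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def extract_sources(sql: str) -> list:
    """Return referenced table names in first-seen order, without duplicates."""
    seen = {}
    for name in _SOURCE_RE.findall(sql):
        seen.setdefault(name.lower(), None)
    return list(seen)

def save_as_table(target_table: str, sql: str, schedule: str) -> PipelineDefinition:
    """Bundle the query, its schedule, and its detected dependencies."""
    return PipelineDefinition(target_table, sql, schedule, extract_sources(sql))

pipeline = save_as_table(
    "daily_revenue",
    "SELECT o.day, SUM(o.amount) FROM sales.orders o "
    "JOIN sales.customers c ON o.customer_id = c.id GROUP BY o.day",
    "rate(1 day)",
)
print(pipeline.sources)
```

Recording the sources when the query is saved is what makes dependency tracking possible without inspecting execution logs later.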
2. Git-Like Operations for Data
Every table version is immutable and addressable. You can:
- Time travel to any historical snapshot
- Create branches for testing schema changes
- Merge tested changes back to production
- Roll back bad deployments in seconds
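The four operations above can be pictured with a toy in-memory model. This is a sketch of the general idea behind snapshot-based versioning, not Iceberg's or Data Version's implementation; the `VersionedTable` class and its method names are invented for illustration, and the merge shown is fast-forward only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    rows: tuple  # immutable table contents for this version

@dataclass
class VersionedTable:
    snapshots: dict = field(default_factory=dict)  # snapshot_id -> Snapshot
    branches: dict = field(default_factory=dict)   # branch name -> head snapshot_id

    def commit(self, branch: str, rows) -> int:
        """Append a new immutable snapshot and advance the branch head."""
        snapshot_id = len(self.snapshots) + 1
        parent = self.branches.get(branch)
        self.snapshots[snapshot_id] = Snapshot(snapshot_id, parent, tuple(rows))
        self.branches[branch] = snapshot_id
        return snapshot_id

    def read(self, branch: str = "main", snapshot_id: Optional[int] = None) -> tuple:
        """Read a branch head, or time travel to a specific snapshot."""
        target = snapshot_id if snapshot_id is not None else self.branches[branch]
        return self.snapshots[target].rows

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        """A branch is only a new pointer; no data is copied."""
        self.branches[name] = self.branches[from_branch]

    def _is_ancestor(self, ancestor_id: int, snapshot_id: Optional[int]) -> bool:
        while snapshot_id is not None:
            if snapshot_id == ancestor_id:
                return True
            snapshot_id = self.snapshots[snapshot_id].parent_id
        return False

    def merge(self, source: str, target: str = "main") -> None:
        """Fast-forward only: the target head must be an ancestor of the source head."""
        if not self._is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError("target has diverged; fast-forward not possible")
        self.branches[target] = self.branches[source]

    def rollback(self, branch: str, snapshot_id: int) -> None:
        """Repoint the branch at an earlier snapshot; nothing is deleted."""
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.branches[branch] = snapshot_id

table = VersionedTable()
v1 = table.commit("main", [("a", 1)])
table.create_branch("experiment")
v2 = table.commit("experiment", [("a", 1), ("b", 2)])
table.merge("experiment")   # main now points at v2
table.rollback("main", v1)  # main points at v1 again; v2 is still readable
```

Because snapshots are never modified, rollback in this model is a pointer update and does not rewrite any data, which is why it can be fast regardless of table size.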
3. Native Lineage and Dependencies
Lineage is captured at authoring time, not reverse-engineered from logs. This enables:
- Automatic impact analysis when schemas change
- Intelligent query generation with full context awareness
- Cascading dependency management across your entire pipeline
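Impact analysis over captured lineage amounts to a graph traversal. The sketch below assumes lineage is available as a mapping from each table to the tables it reads from; the `UPSTREAM` data and the `downstream_of` function are hypothetical examples, not Data Version's storage format or API.

```python
from collections import deque

# Hypothetical lineage: each table maps to the upstream tables it reads from.
UPSTREAM = {
    "raw_orders": [],
    "clean_orders": ["raw_orders"],
    "daily_revenue": ["clean_orders"],
    "customer_ltv": ["clean_orders"],
    "exec_dashboard": ["daily_revenue", "customer_ltv"],
}

def downstream_of(table: str, upstream: dict) -> list:
    """Return every table affected by a change to `table`, in breadth-first order."""
    # Invert the edges: upstream table -> tables that read from it.
    children = {name: [] for name in upstream}
    for name, parents in upstream.items():
        for parent in parents:
            children.setdefault(parent, []).append(name)

    affected, queue, seen = [], deque([table]), {table}
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # a table reachable by two paths is listed once
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected

impacted = downstream_of("clean_orders", UPSTREAM)
print(impacted)
```

A schema change to `clean_orders` in this example would flag both direct consumers and the dashboard that depends on them.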
4. AI-Powered Query Generation
Ask business questions in natural language and get production-ready SQL queries. The AI understands your schema, lineage, and query patterns, generating queries that integrate seamlessly with your existing pipelines.
Architecture
Data Version consists of three integrated components:
Desktop Application
An Electron-based desktop client with a deployment wizard that guides you through deploying to your AWS account.
React Web Interface
Modern web application for browsing your data catalog, writing queries, and managing versions. Features include:
- Interactive SQL editor with AI assistance
- Visual query builder
- Data catalog browser with schema exploration
- Pipeline scheduling and dependency management
- Version control interface for branches and snapshots
Serverless Backend
AWS CDK-based infrastructure deployed to your AWS account:
- Lambda Functions: Query execution, pipeline orchestration, version management
- DynamoDB: Metadata storage and catalog management
- S3: Iceberg table storage with versioning
- Athena: SQL query engine
- EMR Serverless: Python pipeline execution
- EventBridge: Scheduled pipeline triggers
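Since tables are stored in Iceberg format and queried through Athena, historical states can in principle be read with Athena's Iceberg time travel clauses (`FOR TIMESTAMP AS OF` and `FOR VERSION AS OF`). The helpers below only build such query strings. Whether Data Version issues queries in this form is an assumption, and the function names and the table name are hypothetical.

```python
from datetime import datetime, timezone

def time_travel_by_timestamp(table: str, as_of: datetime) -> str:
    """Build a query that reads the table as it existed at `as_of` (converted to UTC)."""
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # The table name is interpolated as-is; validate it before using untrusted input.
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC'"

def time_travel_by_snapshot(table: str, snapshot_id: int) -> str:
    """Build a query that reads one specific Iceberg snapshot."""
    return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)}"

query = time_travel_by_timestamp(
    "analytics.daily_revenue",  # hypothetical table name
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(query)
```

Reading an old snapshot this way does not require keeping a separate copy of the table, because the data files referenced by that snapshot are still in place.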
Use Cases
Pipeline Development and Testing
Create a branch of your production table, test schema changes or new transformations, then merge back when validated. No need to maintain separate dev/staging environments.
Disaster Recovery
When a bad deployment corrupts data, roll back to the last good snapshot in seconds. No manual intervention, no tribal knowledge required.
Data Quality Monitoring
Track data quality metrics across versions. When quality degrades, identify which version introduced the regression, trace it to the pipeline change behind it, and roll back if needed.
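Finding the version that introduced a regression can be as simple as scanning a per-version metric history. The sketch below assumes one metric value is recorded for each table version, ordered oldest to newest; the data, the threshold, and the `first_degraded_version` function are illustrative and not part of Data Version.

```python
# Hypothetical per-version metric: null rate of a key column, oldest to newest.
NULL_RATE_BY_VERSION = [
    (101, 0.002),
    (102, 0.003),
    (103, 0.041),
    (104, 0.044),
]

def first_degraded_version(history, threshold):
    """Return (bad_version, last_good_version), or None if no version exceeds the threshold."""
    last_good = None
    for version, value in history:
        if value > threshold:
            return version, last_good
        last_good = version
    return None

result = first_degraded_version(NULL_RATE_BY_VERSION, threshold=0.01)
print(result)
```

The second element of the result is the natural rollback target: the most recent version that still met the quality threshold.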
Cross-Team Collaboration
Spin up isolated data lake instances per team, share tables across departments, and manage everything centrally. Teams keep their organizational flexibility without losing access to shared data.
Technology Stack
- Apache Iceberg: Open table format enabling time travel and versioning
- AWS Serverless: Lambda, Athena, EMR Serverless, EventBridge
- React + Electron: Modern, responsive user interface
- CDK (Python): Infrastructure as code for reproducible deployments
Next Steps
Ready to get started? Head over to the Getting Started Guide to install the desktop client and deploy to your AWS account.