Data Processing

Transform data at scale

A production-ready toolkit for transforming CSV and JSONL data into Parquet, available as a CLI and a REST API. Includes validation, compression, checksums, and encryption.

# Transform CSV to optimized Parquet
dpkit ingest data.csv --output warehouse/

# Validate with TruthKit
dpkit verify warehouse/manifest.json

# REST API
curl -X POST localhost:8000/ingest -d '{"source": "s3://..."}'

Features

Everything you need for production data pipelines.

📦 Format Conversion

CSV, JSONL, and JSON to optimized Parquet. Automatic schema inference and type coercion.
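dpkit's inference rules aren't documented here, but the general idea of schema inference, trying progressively wider types per column, can be sketched with only the standard library (the type names and `infer_schema` helper are illustrative, not dpkit's API):

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type name that fits every observed value."""
    for caster, name in ((int, "int64"), (float, "float64")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"

def infer_schema(csv_text):
    """Map each CSV column to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

sample = "id,price,label\n1,9.99,a\n2,12.50,b\n"
print(infer_schema(sample))  # {'id': 'int64', 'price': 'float64', 'label': 'string'}
```

Note that `int` is tried before `float` so whole-number columns don't get widened unnecessarily; a real engine would also handle nulls and dates.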

🗜️ Compression

Multiple codecs: Snappy, ZSTD, Gzip, LZ4. Configurable row group sizes for query optimization.
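Snappy, ZSTD, and LZ4 bindings ship with Parquet engines rather than the standard library, but the size-versus-speed trade-off behind codec choice can be demonstrated with the stdlib codecs alone (the sample payload is made up for illustration):

```python
import gzip
import lzma

# Highly repetitive data, the best case for any codec.
payload = b"timestamp,value\n" + b"2024-01-01,42\n" * 10_000

for name, compress in [
    ("gzip level 1", lambda d: gzip.compress(d, compresslevel=1)),
    ("gzip level 9", lambda d: gzip.compress(d, compresslevel=9)),
    ("lzma", lzma.compress),
]:
    out = compress(payload)
    print(f"{name}: {len(payload)} -> {len(out)} bytes")
```

Fast codecs like Snappy and LZ4 sit at the "level 1" end of this spectrum; ZSTD and LZMA trade CPU for smaller files.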

Validation

SHA-256 checksums, manifest generation, and a TruthKit validation harness.
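The manifest layout below is an assumption (dpkit's actual schema isn't shown here), but streamed SHA-256 checksumming itself is standard `hashlib` usage:

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(directory, manifest_path="manifest.json"):
    """Record a SHA-256 checksum for every file under the output directory."""
    entries = {
        str(p): sha256_file(p)
        for p in sorted(Path(directory).rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(entries, indent=2))
    return entries
```

A verifier then recomputes each hash and compares it against the manifest entry, flagging any file that changed in transit.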

🔐 Encryption

Optional Fernet encryption for secure data transfers. Key management included.

📁 Partitioning

Hive-style partitioning for efficient query pruning. Date, category, or custom keys.
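Hive-style layout encodes partition key values directly in directory names (e.g. `category=books/`), which lets query engines skip directories whose keys can't match a filter. A minimal stdlib sketch of the layout (the `warehouse` base path and column names are invented for the example):

```python
from collections import defaultdict
from pathlib import PurePosixPath

def partition_path(base, row, keys):
    """Build a Hive-style path like base/category=books for one row."""
    parts = [f"{k}={row[k]}" for k in keys]
    return str(PurePosixPath(base, *parts))

def group_by_partition(rows, keys, base="warehouse"):
    """Bucket rows by the partition directory they would be written to."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[partition_path(base, row, keys)].append(row)
    return dict(buckets)

rows = [
    {"date": "2024-01-01", "category": "books", "amount": 10},
    {"date": "2024-01-01", "category": "games", "amount": 7},
]
print(sorted(group_by_partition(rows, ["category"])))
# ['warehouse/category=books', 'warehouse/category=games']
```

A query filtered on `category = 'books'` then only reads the matching directory, which is the pruning the feature card refers to.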

🚀 Chunked Processing

Memory-efficient streaming handles files of any size without out-of-memory errors.
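Streaming in fixed-size row batches is what keeps memory bounded: only one batch is ever materialized, regardless of total file size. A generic stdlib sketch of the pattern (batch size and sample data are arbitrary):

```python
import csv
import io
from itertools import islice

def iter_batches(reader, batch_size):
    """Yield lists of at most batch_size rows without reading the whole input."""
    it = iter(reader)
    while batch := list(islice(it, batch_size)):
        yield batch

data = io.StringIO("id,value\n" + "".join(f"{i},{i * i}\n" for i in range(10)))
batches = list(iter_batches(csv.DictReader(data), batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each batch can be converted and appended to the output independently, so peak memory is proportional to the batch size, not the file size.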

Build reliable data pipelines

Open source CLI. Enterprise API available.

Get Started