Production-ready toolkit for converting CSV and JSONL to Parquet, available as a CLI and a REST API. Includes validation, compression, checksums, and encryption.
Everything you need for production data pipelines.
Converts CSV, JSONL, and JSON to optimized Parquet, with automatic schema inference and type coercion.
Multiple codecs: Snappy, ZSTD, Gzip, LZ4. Configurable row group sizes for query optimization.
SHA-256 checksums, manifest generation, and a TruthKit validation harness.
Optional Fernet encryption for secure data transfers. Key management included.
Hive-style partitioning for efficient query pruning. Partition by date, category, or custom keys.
Memory-efficient streaming for files of any size, so large inputs do not trigger out-of-memory (OOM) errors.
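
The checksum, manifest, and streaming features above can be illustrated with a minimal sketch using only the Python standard library. It is not the toolkit's own code: the function names (`sha256_file`, `build_manifest`, `verify_manifest`), the manifest fields, and the `date=.../part-0.parquet` layout are all assumptions made for illustration, and the toolkit's actual manifest format and TruthKit harness may differ. The sketch hashes files in fixed-size chunks so memory use stays bounded, records one entry per output file, and re-checks the files against the manifest later.

```python
import hashlib
import json
import tempfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so memory use stays bounded


def sha256_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir: Path, pattern: str = "*.parquet") -> dict:
    """Record relative path, size, and checksum for every matching file.

    The manifest structure here is hypothetical, not the toolkit's format.
    """
    entries = []
    for path in sorted(output_dir.rglob(pattern)):
        entries.append({
            "path": path.relative_to(output_dir).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_file(path),
        })
    return {"algorithm": "sha256", "files": entries}


def verify_manifest(output_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose checksum changed."""
    failed = []
    for entry in manifest["files"]:
        target = output_dir / entry["path"]
        if not target.is_file() or sha256_file(target) != entry["sha256"]:
            failed.append(entry["path"])
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Placeholder bytes stand in for a real Parquet file in a
        # Hive-style partition directory (assumed layout).
        part = root / "date=2024-01-01" / "part-0.parquet"
        part.parent.mkdir(parents=True)
        part.write_bytes(b"placeholder parquet bytes")

        manifest = build_manifest(root)
        print(json.dumps(manifest, indent=2))
        print("failed files:", verify_manifest(root, manifest))
```

Storing relative paths keeps the manifest valid when the output directory is moved or copied to another host, which is typically when a checksum verification step is most useful.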