Kore: Binary File Format Optimized for Modern Data Systems (Open Source)
Kore is a new high-performance binary file format for modern data systems, optimized for analytical workloads. The project reports a 38% compression ratio (compared with 63% for Parquet) and a 131x query speedup from column pruning and predicate pushdown. The format also integrates natively with Spark and provides tools for reading and writing data.
- Kore offers a 38% compression ratio compared to 63% for Parquet.
- It provides a 131x query speedup through features like column pruning and predicate pushdown.
- The format includes native integration with Spark for efficient data handling.
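
To make column pruning and predicate pushdown concrete, here is a minimal in-memory sketch of the two ideas. It is not Kore's implementation or API: the `Chunk`, `Column`, `scan_greater_equal`, and `sample_table` names are hypothetical, and the per-chunk min/max statistics are an assumption about how a columnar reader might skip data. Pruning means only the requested column is touched; pushdown means a chunk is skipped when its statistics show that no value can satisfy the filter.

```rust
// Hypothetical model for illustration only; none of these names come from the Kore crate.

/// One chunk (row group) of an i64 column, with min/max statistics.
struct Chunk {
    min: i64,
    max: i64,
    values: Vec<i64>,
}

impl Chunk {
    fn new(values: Vec<i64>) -> Self {
        let min = *values.iter().min().expect("chunk must be non-empty");
        let max = *values.iter().max().expect("chunk must be non-empty");
        Chunk { min, max, values }
    }
}

/// A named column stored as a sequence of chunks.
struct Column {
    name: String,
    chunks: Vec<Chunk>,
}

/// Returns the values `>= threshold` in column `name`, plus the number of
/// chunks that had to be scanned. Returns `None` if the column is missing.
fn scan_greater_equal(columns: &[Column], name: &str, threshold: i64) -> Option<(Vec<i64>, usize)> {
    // Column pruning: locate the one requested column and ignore all others.
    let column = columns.iter().find(|c| c.name == name)?;

    let mut hits = Vec::new();
    let mut scanned = 0;
    for chunk in &column.chunks {
        // Predicate pushdown: the statistics prove no value can match, so skip.
        if chunk.max < threshold {
            continue;
        }
        scanned += 1;
        if chunk.min >= threshold {
            // Every value matches, so no per-value check is needed.
            hits.extend_from_slice(&chunk.values);
        } else {
            hits.extend(chunk.values.iter().copied().filter(|v| *v >= threshold));
        }
    }
    Some((hits, scanned))
}

/// A tiny two-column table with three chunks per column.
fn sample_table() -> Vec<Column> {
    vec![
        Column {
            name: "price".to_string(),
            chunks: vec![
                Chunk::new(vec![10, 20, 30]),
                Chunk::new(vec![90, 120, 150]),
                Chunk::new(vec![200, 250, 300]),
            ],
        },
        Column {
            name: "qty".to_string(),
            chunks: vec![
                Chunk::new(vec![1, 2, 3]),
                Chunk::new(vec![4, 5, 6]),
                Chunk::new(vec![7, 8, 9]),
            ],
        },
    ]
}

fn main() {
    let table = sample_table();
    let (hits, scanned) = scan_greater_equal(&table, "price", 100).expect("column exists");
    println!("matches: {:?}, chunks scanned: {}", hits, scanned);
}
```

With the sample table, the query `price >= 100` never reads the `qty` column and skips the first `price` chunk outright, so only two of the six stored chunks are scanned. Real formats apply the same idea to on-disk byte ranges, which is where the I/O savings come from.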
Opening excerpt (first ~120 words)
🚀 Kore: Killer Optimized Record Exchange
The fastest, most compressed columnar format for big data | v0.1.0

KORE is a high-performance binary file format optimized for analytical workloads. It provides:
- 38% compression ratio (vs 63% for Parquet)
- 131x query speedup with column pruning & predicate pushdown
- Zero data loss verification (400K+ cells tested)
- Native Spark integration: read/write with PySpark

Quick Start
Rust Library
Add this crate as a dependency (when published) or include from path:

    use kore_fileformat::*;

    // Write data
    kore_write_simple("output.kore", schema_json, data_json)?;

    // Read data
    let data = kore_read_simple("output.kore")?;

    // Read specific column
    let col = kore_read_col_simple("output.kore", "column_name")?;

    // Get file info
    let info =…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.