.. pairs_to_parquet documentation master file, created by sphinx-quickstart on Fri Oct 24 13:37:25 2025. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. Overview ============================== `pairs_to_parquet` is a .parquet extention of .pairs file format. The main purpose of this extension is to leverage the row groups and metadata features of the Parquet format in order to: - speed up data selection, filtering and sorting - address a minor limitation of the .pairs format, where metadata cannot be easily parsed by generic CSV readers - reduce storage space required for the data - improve I/O performance The main problem was to find a tool, which will be able to store additional metadata in .parquet file. `Duckdb` was a perfect tool to do so, in particularly able to store key-value metadata. Here is, how the process is organised: .. figure:: _static/header_to_kv_metadata.png :width: 100% :alt: Convertation of header(.pairs.gz) to key-value metadata(.parquet) :align: center .. raw:: html

.. toctree:: :maxdepth: 2 :caption: Contents: quickstart.md .. toctree:: :maxdepth: 2 :caption: Tutorials tutorials/sort.ipynb tutorials/pairs_to_parquet.ipynb tutorials/parquet_to_pairs.ipynb