{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sort anything to anything" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is what a classic .pairs file looks like:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\".pairs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details about the .pairs file format, see the [documentation](https://pairtools.readthedocs.io/en/latest/index.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To sort pairs in either the .pairs or .parquet format, you only need to run a single CLI command (with parameters similar to those used in pairtools).\n", "Note that sorting can be combined with conversion. In most cases, you will save time by combining processing (in this case, sorting) and conversion rather than performing them separately.\n", "\n", "Here’s an example with some useful parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pairs_to_parquet sort -o test_sorted.parquet --nproc 4 --tmpdir /temp_directory --memory 35G test.pairs.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, pairs are sorted by `chr1-chr2-pos1-pos2-pair_type`, where the chr and pair_type columns are sorted in `lexicographical` order and the pos columns in `numerical` order." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to sort your pairs in a different order, specify the `c1, c2, p1, p2, pt or even extra_col` parameters." ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 2 }