Nucleotide Archival Format (NAF)
NAF is a binary file format for biological sequence data. It is based on zstd and features strong compression and fast decompression. It can store DNA, RNA, protein or text sequences, with or without qualities. It supports FASTA and FASTQ-formatted sequences, ambiguous IUPAC codes and masked sequence, and has no limit on sequence length or number of sequences. It supports Unix pipes, which allows easy integration into pipelines. See the NAF homepage for details.
Example benchmark: SILVA 132 LSURef database (610 MB).
From the Sequence Compression Benchmark project; visit it for details and more benchmarks.
The NAF specification is in the public domain: NAFv2.pdf
Encoder and decoder
NAF encoder and decoder are called "ennaf" and "unnaf". After compressing your data with ennaf, you suddenly have enough space. However, if you decompress it back with unnaf, your space is again un-enough.
Installing with bioconda
To install NAF with bioconda:
conda install -c bioconda naf
See package page for details: naf at bioconda.
Building from source
Prerequisites: git, gcc, make, diff, perl (diff and perl are only used for test suite).
E.g., to install on Ubuntu:
sudo apt install git gcc make diffutils perl
On Mac OS you may have to install Xcode Command Line Tools.
Building and installing:
git clone --recurse-submodules https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install
To install in an alternative location, add "prefix=DIR" to the "make install" command. E.g.,
sudo make prefix=/usr/local/bio install
For a staged install, add "DESTDIR=DIR". E.g.,
make DESTDIR=/tmp/stage install
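If you cannot use sudo, the "prefix=DIR" option can point at a directory you own. The following is a minimal sketch, not part of NAF itself: the function name install_naf_user is hypothetical, the default prefix of $HOME/.local is only a common convention, and it assumes the binaries are installed into the "bin" subdirectory of the prefix.

```shell
# Hypothetical helper: install NAF into a per-user prefix, without sudo.
# Run it from inside the cloned "naf" directory after "make" has succeeded.
install_naf_user() {
    prefix="${1:-$HOME/.local}"

    # Same "make install" as above, with prefix set to a user-writable directory.
    make prefix="$prefix" install || return 1

    # Assumption: ennaf and unnaf are placed in "$prefix/bin".
    # Add that directory to PATH for the current shell if it is not there yet.
    case ":$PATH:" in
        *":$prefix/bin:"*) ;;
        *) PATH="$prefix/bin:$PATH"; export PATH ;;
    esac
}
```

Calling install_naf_user with no argument uses $HOME/.local; pass a directory to install elsewhere. The PATH change only affects the current shell session.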
Building from latest unreleased source
For testing purposes only:
git clone --recurse-submodules --branch develop https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install
Compression
ennaf file.fa -o file.naf
See ennaf -h and the Compression Manual for detailed usage.
Decompression
unnaf file.naf -o file.fa
See unnaf -h and the Decompression Manual.
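Because both tools work with Unix pipes, they can be chained with other commands without intermediate files. The sketch below is illustrative only: the function names gz_to_naf and naf_count_seqs are hypothetical, and it assumes that ennaf reads standard input when no input file is named and that unnaf writes to standard output when -o is omitted, as the multi-file commands further below suggest.

```shell
# Hypothetical helper: recompress a gzipped FASTA file as NAF.
# Assumption: ennaf reads FASTA from standard input when no input file is given.
gz_to_naf() {
    in_gz="$1"
    out_naf="$2"
    gzip -dc "$in_gz" | ennaf -o "$out_naf"
}

# Hypothetical helper: count sequences in a NAF file holding FASTA data.
# Assumption: unnaf writes FASTA to standard output when -o is omitted.
naf_count_seqs() {
    unnaf "$1" | grep -c '^>'
}
```

For example, gz_to_naf genome.fa.gz genome.naf followed by naf_count_seqs genome.naf would print the number of FASTA header lines. Check ennaf -h and unnaf -h for the options your installed version actually provides.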
Compressing multiple files
Working with multiple files is possible using Multi-Multi-FASTA as an intermediate format. Example commands:
mumu.pl --dir 'Helicobacter' 'Helicobacter pylori*' | ennaf -22 --text -o Hp.nafnaf
Decompressing and unpacking:
unnaf Hp.nafnaf | mumu.pl --unpack --dir 'Helicobacter'
The filename of a single NAF-compressed file normally ends with ".naf". To avoid ambiguity, ".nafnaf" is the recommended suffix for multi-file NAF archives.
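A script can rely on this suffix convention to pick the right decompression command. The following is a rough sketch under stated assumptions: naf_kind and naf_extract are hypothetical names, and the ".fa" output name assumes the single-file archive holds FASTA data.

```shell
# Classify a file by the suffix convention described above.
naf_kind() {
    case "$1" in
        *.nafnaf) echo multi ;;
        *.naf)    echo single ;;
        *)        echo unknown ;;
    esac
}

# Hypothetical wrapper: decompress FILE into DIR, choosing the command by suffix.
naf_extract() {
    file="$1"
    dir="$2"
    case "$(naf_kind "$file")" in
        multi)
            # Multi-file archive: unpack through mumu.pl, as in the commands above.
            unnaf "$file" | mumu.pl --unpack --dir "$dir" ;;
        single)
            # Single file. Assumption: the content is FASTA, hence the ".fa" name.
            unnaf "$file" -o "$dir/$(basename "$file" .naf).fa" ;;
        *)
            echo "unrecognized suffix: $file" >&2
            return 1 ;;
    esac
}
```

With these definitions, naf_extract Hp.nafnaf Helicobacter runs the same unpacking pipeline as shown above, while a plain ".naf" file is decompressed to a single file in the target directory.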
If you use NAF, please cite:
- Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi (2019) "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences" Bioinformatics, 35(19), 3826-3828, doi: 10.1093/bioinformatics/btz144.
For compressor benchmark, please cite: