- The .lzma File Format
- =====================
- 0. Preface
- 0.1. Notices and Acknowledgements
- 0.2. Changes
- 1. File Format
- 1.1. Header
- 1.1.1. Properties
- 1.1.2. Dictionary Size
- 1.1.3. Uncompressed Size
- 1.2. LZMA Compressed Data
- 2. References
- 0. Preface
- This document describes the .lzma file format, which is
- sometimes also called LZMA_Alone format. It is a legacy file
- format, which is being or has been replaced by the .xz format.
- The MIME type of the .lzma format is `application/x-lzma'.
- The most commonly used programs for handling .lzma files are
- LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
- describes some of the differences between these implementations
- and gives hints about which subset of the .lzma format is the
- most portable.
- 0.1. Notices and Acknowledgements
- This file format was designed by Igor Pavlov for use in
- LZMA SDK. This document was written by Lasse Collin
- <lasse.collin@tukaani.org> using the documentation found
- in the LZMA SDK.
- This document has been put into the public domain.
- 0.2. Changes
- Last modified: 2022-07-13 21:00+0300
- Compared to the previous version (2011-04-12 11:55+0300),
- Section 1.1.3 was modified to allow End of Payload Marker
- with a known Uncompressed Size.
- 1. File Format
- +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
- |         Header          |   LZMA Compressed Data   |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
- A .lzma format file consists of a 13-byte Header followed by
- the LZMA Compressed Data.
- Unlike the .gz, .bz2, and .xz formats, it is not possible to
- concatenate multiple .lzma files as is and expect the
- decompression tool to decode the resulting file as if it were
- a single .lzma file.
- For example, the command line tools from LZMA Utils and
- LZMA SDK silently ignore all the data after the first .lzma
- stream. In contrast, the command line tool from XZ Utils
- considers the .lzma file to be corrupt if there is data after
- the first .lzma stream.
- 1.1. Header
- +------------+----+----+----+----+--+--+--+--+--+--+--+--+
- | Properties |  Dictionary Size  |   Uncompressed Size   |
- +------------+----+----+----+----+--+--+--+--+--+--+--+--+
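The layout above can be read into a structure with a few lines of C. The following is only a minimal sketch: the names lzma_alone_header and parse_lzma_alone_header are hypothetical and do not come from any of the implementations mentioned in this document. It assumes the byte order given in Sections 1.1.2 and 1.1.3 (little endian) and does not validate any of the field values.

```c
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13

/* Hypothetical container for the three raw Header fields. */
struct lzma_alone_header {
    uint8_t  properties;        /* byte 0 */
    uint32_t dict_size;         /* bytes 1-4, little endian */
    uint64_t uncompressed_size; /* bytes 5-12, little endian */
};

/* Returns 0 on success, -1 if fewer than 13 bytes are available.
 * The fields are assembled byte by byte, so the result does not
 * depend on the endianness of the host. */
static int parse_lzma_alone_header(const uint8_t *buf, size_t len,
                                   struct lzma_alone_header *out)
{
    if (len < LZMA_ALONE_HEADER_SIZE)
        return -1;

    out->properties = buf[0];

    out->dict_size = 0;
    for (int i = 0; i < 4; ++i)
        out->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    out->uncompressed_size = 0;
    for (int i = 0; i < 8; ++i)
        out->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);

    return 0;
}
```

A caller would typically read the first 13 bytes of the file into a buffer, pass them to this function, and then interpret each field as described in the following subsections.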
- 1.1.1. Properties
- The Properties field contains three properties. An abbreviation
- is given in parentheses, followed by the value range of the
- property. The field consists of
- 1) the number of literal context bits (lc, [0, 8]);
- 2) the number of literal position bits (lp, [0, 4]); and
- 3) the number of position bits (pb, [0, 4]).
- The properties are encoded using the following formula:
- Properties = (pb * 5 + lp) * 9 + lc
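As an illustration of the formula, the sketch below packs lc, lp, and pb into a Properties byte. The function name encode_lzma_properties is hypothetical; the range checks simply follow the value ranges listed above.

```c
#include <stdint.h>

/* Hypothetical helper: packs lc/lp/pb into the Properties byte.
 * Returns the byte value (0..224) on success, or -1 if a value is
 * outside the ranges lc [0, 8], lp [0, 4], pb [0, 4]. */
static int encode_lzma_properties(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;

    return (int)((pb * 5 + lp) * 9 + lc);
}
```

For example, lc = 3, lp = 0, pb = 2 gives (2 * 5 + 0) * 9 + 3 = 93, that is, 0x5D. The largest valid value is (4 * 5 + 4) * 9 + 8 = 224.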
- The following C code illustrates a straightforward way to
- decode the Properties field:
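- uint8_t lc, lp, pb;
- uint8_t prop = get_lzma_properties();
-
- /* The largest valid value is pb = 4, lp = 4, lc = 8. */
- if (prop > (4 * 5 + 4) * 9 + 8)
-     return LZMA_PROPERTIES_ERROR;
-
- /* Undo Properties = (pb * 5 + lp) * 9 + lc. */
- pb = prop / (9 * 5);
- prop -= pb * 9 * 5;
- lp = prop / 9;
- lc = prop - lp * 9;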
- XZ Utils has an additional requirement: lc + lp <= 4. Files
- which don't follow this requirement cannot be decompressed
- with XZ Utils. Usually this isn't a problem since the most
- common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
- combination that the files created by LZMA Utils can have,
- but LZMA Utils can decompress files with any lc/lp/pb.
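A tool that wants to warn about files XZ Utils cannot decompress could check the lc + lp <= 4 requirement along these lines. This is a sketch only: xz_utils_accepts_properties is a hypothetical name, and the code merely restates the rule given above rather than reproducing the actual XZ Utils source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: returns true if the Properties byte is valid
 * and also satisfies the additional lc + lp <= 4 requirement. */
static bool xz_utils_accepts_properties(uint8_t prop)
{
    /* Reject values that cannot be produced by the encoding formula. */
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    unsigned rest = prop % (9 * 5); /* strip pb, leaving lp * 9 + lc */
    unsigned lp = rest / 9;
    unsigned lc = rest % 9;

    return lc + lp <= 4;
}
```

With the common 3/0/2 setting (Properties byte 0x5D) the check passes, whereas a byte encoding lc = 8 does not.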
- 1.1.2. Dictionary Size
- Dictionary Size is stored as an unsigned 32-bit little endian
- integer. Any 32-bit value is possible, but for maximum
- portability, only sizes of 2^n and 2^n + 2^(n-1) should be
- used.
- LZMA Utils creates only files with dictionary size 2^n,
- 16 <= n <= 25. LZMA Utils can decompress files with any
- dictionary size.
- XZ Utils creates and decompresses .lzma files only with
- dictionary sizes 2^n and 2^n + 2^(n-1). If some other
- dictionary size is specified when compressing, the value
- stored in the Dictionary Size field is rounded up, but the
- specified value is still used in the actual compression code.
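The two portable forms can be tested, and an arbitrary size rounded up to the next portable one, as sketched below. The function names are hypothetical and this is not the code used by XZ Utils. Two choices here are assumptions of the sketch: a size of 0 is rounded up to 1, and sizes larger than 2^31 + 2^30 (for which no larger portable value fits in 32 bits) are mapped to UINT32_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True for sizes of the form 2^n or 2^n + 2^(n-1).
 * Note that 2^n + 2^(n-1) equals 3 * 2^(n-1). */
static bool is_portable_dict_size(uint32_t size)
{
    if (is_pow2(size))
        return true;

    return size % 3 == 0 && is_pow2(size / 3);
}

/* Rounds size up to the nearest 2^n or 2^n + 2^(n-1).
 * Assumption of this sketch: returns UINT32_MAX when no such
 * value fits in 32 bits. */
static uint32_t round_up_dict_size(uint32_t size)
{
    for (uint64_t pow2 = 1; pow2 <= UINT32_MAX; pow2 <<= 1) {
        if (pow2 >= size)
            return (uint32_t)pow2;

        uint64_t mid = pow2 + (pow2 >> 1); /* 2^n + 2^(n-1) */
        if (pow2 >= 2 && mid >= size && mid <= UINT32_MAX)
            return (uint32_t)mid;
    }

    return UINT32_MAX;
}
```

For instance, 4096 is already portable and is returned unchanged, while 4097 is rounded up to 6144 (2^12 + 2^11).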
- 1.1.3. Uncompressed Size
- Uncompressed Size is stored as an unsigned 64-bit little endian
- integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
- that Uncompressed Size is unknown. End of Payload Marker (*)
- is used if Uncompressed Size is unknown. End of Payload Marker
- is allowed but rarely used if Uncompressed Size is known.
- XZ Utils 5.2.5 and older don't support .lzma files that have
- End of Payload Marker together with a known Uncompressed Size.
- XZ Utils rejects files whose Uncompressed Size field specifies
- a known size that is 256 GiB or more. This is to reject false
- positives when trying to guess if the input file is in the
- .lzma format. When Uncompressed Size is unknown, there is no
- limit for the uncompressed size of the file.
- (*) Some tools use the term End of Stream (EOS) marker
- instead of End of Payload Marker.
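The rules for the Uncompressed Size field can be summarized in code as follows. This is a sketch under the assumptions stated in this section; the macro and function names are hypothetical, and the second function only restates the XZ Utils limit described above rather than quoting its source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Special value 0xFFFF_FFFF_FFFF_FFFF: Uncompressed Size is unknown. */
#define LZMA_ALONE_SIZE_UNKNOWN UINT64_MAX

/* 256 GiB = 256 * 2^30 = 2^38 bytes. */
#define XZ_UTILS_KNOWN_SIZE_LIMIT (UINT64_C(1) << 38)

static bool uncompressed_size_is_known(uint64_t size)
{
    return size != LZMA_ALONE_SIZE_UNKNOWN;
}

/* Known sizes of 256 GiB or more are rejected; an unknown size is
 * always accepted because no limit applies in that case. */
static bool xz_utils_accepts_uncompressed_size(uint64_t size)
{
    if (!uncompressed_size_is_known(size))
        return true;

    return size < XZ_UTILS_KNOWN_SIZE_LIMIT;
}
```

A decoder that sees an unknown size has to rely on the End of Payload Marker to detect the end of the data, while a known size lets it stop after producing exactly that many bytes.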
- 1.2. LZMA Compressed Data
- A detailed description of the format of this field is outside
- the scope of this document.
- 2. References
- LZMA SDK - The original LZMA implementation
- http://7-zip.org/sdk.html
- 7-Zip
- http://7-zip.org/
- LZMA Utils - LZMA adapted to POSIX-like systems
- http://tukaani.org/lzma/
- XZ Utils - The next generation of LZMA Utils
- http://tukaani.org/xz/
- The .xz file format - The successor of the .lzma format
- http://tukaani.org/xz/xz-file-format.txt