The .lzma File Format
=====================

    0. Preface
        0.1. Notices and Acknowledgements
        0.2. Changes
    1. File Format
        1.1. Header
            1.1.1. Properties
            1.1.2. Dictionary Size
            1.1.3. Uncompressed Size
        1.2. LZMA Compressed Data
    2. References

0. Preface

This document describes the .lzma file format, which is
sometimes also called the LZMA_Alone format. It is a legacy file
format, which is being or has been replaced by the .xz format.
The MIME type of the .lzma format is `application/x-lzma'.

The most commonly used software packages for handling .lzma
files are LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
describes some of the differences between these implementations
and gives hints about which subset of the .lzma format is the
most portable.

0.1. Notices and Acknowledgements

This file format was designed by Igor Pavlov for use in
LZMA SDK. This document was written by Lasse Collin
<lasse.collin@tukaani.org> using the documentation found
in the LZMA SDK.

This document has been put into the public domain.

0.2. Changes

Last modified: 2022-07-13 21:00+0300

Compared to the previous version (2011-04-12 11:55+0300),
section 1.1.3 was modified to allow an End of Payload Marker
together with a known Uncompressed Size.

1. File Format

    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
    |         Header          |   LZMA Compressed Data   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

A .lzma file consists of a 13-byte Header followed by
the LZMA Compressed Data.

Unlike the .gz, .bz2, and .xz formats, it is not possible to
concatenate multiple .lzma files as is and expect the
decompression tool to decode the resulting file as if it were
a single .lzma file.

For example, the command line tools from LZMA Utils and
LZMA SDK silently ignore all the data after the first .lzma
stream. In contrast, the command line tool from XZ Utils
considers the .lzma file to be corrupt if there is data after
the first .lzma stream.

1.1. Header

    +------------+----+----+----+----+--+--+--+--+--+--+--+--+
    | Properties |  Dictionary Size  |   Uncompressed Size   |
    +------------+----+----+----+----+--+--+--+--+--+--+--+--+

1.1.1. Properties

The Properties field contains three properties. An abbreviation
is given in parentheses, followed by the value range of the
property. The field consists of

    1) the number of literal context bits (lc, [0, 8]);
    2) the number of literal position bits (lp, [0, 4]); and
    3) the number of position bits (pb, [0, 4]).

The properties are encoded using the following formula:

    Properties = (pb * 5 + lp) * 9 + lc
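
As a worked example of the formula, the common lc/lp/pb combination
3/0/2 gives (2 * 5 + 0) * 9 + 3 = 93, that is, a Properties byte of
0x5D. The following is a minimal sketch of the encoding direction.
The function name example_pack_props is hypothetical and does not
come from LZMA SDK or XZ Utils.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of any real library): pack lc/lp/pb
 * into the Properties byte. Returns false if a value is outside the
 * ranges listed above.
 */
static bool example_pack_props(uint8_t lc, uint8_t lp, uint8_t pb,
        uint8_t *prop)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return false;

    /* Properties = (pb * 5 + lp) * 9 + lc; the largest result is 224. */
    *prop = (uint8_t)((pb * 5 + lp) * 9 + lc);
    return true;
}
```

Because lc < 9 and lp < 5, the three values can be recovered from the
byte with plain division and remainder.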

The following C code illustrates a straightforward way to
decode the Properties field:

    uint8_t lc, lp, pb;
    uint8_t prop = get_lzma_properties();

    /* 224 is the largest valid value (pb = 4, lp = 4, lc = 8). */
    if (prop > (4 * 5 + 4) * 9 + 8)
        return LZMA_PROPERTIES_ERROR;

    /* Undo the formula: peel off pb first, then lp; lc remains. */
    pb = prop / (9 * 5);
    prop -= pb * 9 * 5;
    lp = prop / 9;
    lc = prop - lp * 9;

XZ Utils has an additional requirement: lc + lp <= 4. Files
which don't follow this requirement cannot be decompressed
with XZ Utils. Usually this isn't a problem since the most
common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
combination that the files created by LZMA Utils can have,
but LZMA Utils can decompress files with any lc/lp/pb.
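
A tool that writes .lzma files may want to test a Properties byte
against this stricter rule before using it. The sketch below decodes
the byte as described in this section and then applies the
lc + lp <= 4 restriction. The name example_props_ok_for_xz is an
assumption made for illustration; it is not an XZ Utils function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the Properties byte is valid
 * and also satisfies the XZ Utils restriction lc + lp <= 4.
 */
static bool example_props_ok_for_xz(uint8_t prop)
{
    /* Reject bytes that do not encode any valid lc/lp/pb triple. */
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    uint8_t pb = prop / (9 * 5);
    prop -= pb * 9 * 5;
    uint8_t lp = prop / 9;
    uint8_t lc = prop - lp * 9;

    return lc + lp <= 4;
}
```

With this check, the byte 0x5D (lc/lp/pb = 3/0/2) is accepted, while
a byte that encodes lc = 8 is valid .lzma but is reported as not
usable with XZ Utils.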

1.1.2. Dictionary Size

Dictionary Size is stored as an unsigned 32-bit little endian
integer. Any 32-bit value is possible, but for maximum
portability, only sizes of 2^n and 2^n + 2^(n-1) should be
used.

LZMA Utils creates only files with dictionary size 2^n,
16 <= n <= 25. LZMA Utils can decompress files with any
dictionary size.

XZ Utils creates and decompresses .lzma files only with
dictionary sizes 2^n and 2^n + 2^(n-1). If some other
dictionary size is specified when compressing, the value
stored in the Dictionary Size field is rounded up, but the
specified value is still used in the actual compression code.
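
The portable dictionary sizes form the sequence 1, 2, 3, 4, 6, 8,
12, 16, 24, and so on. The sketch below assumes that "rounded up"
means the smallest value of the form 2^n or 2^n + 2^(n-1) that is
not less than the requested size. It uses a simple loop and is not
the code used by XZ Utils; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest value of the form 2^n or
 * 2^n + 2^(n-1) that is >= size. Returns 0 if no such value fits
 * in 32 bits, that is, if size > 2^31 + 2^30.
 */
static uint32_t example_dict_size_round_up(uint32_t size)
{
    for (unsigned n = 0; n < 32; ++n) {
        uint32_t pow2 = UINT32_C(1) << n;
        if (pow2 >= size)
            return pow2;

        /* 2^n + 2^(n-1) is an integer only for n >= 1. */
        if (n >= 1 && pow2 + (pow2 >> 1) >= size)
            return pow2 + (pow2 >> 1);
    }
    return 0;
}

/* True if size is already of the form 2^n or 2^n + 2^(n-1). */
static bool example_dict_size_is_portable(uint32_t size)
{
    return size != 0 && example_dict_size_round_up(size) == size;
}
```

For example, a requested size of 2^23 + 1 bytes would be stored as
2^23 + 2^22 (12 MiB) under this assumption, while 2^23 (8 MiB) would
be stored unchanged.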

1.1.3. Uncompressed Size

Uncompressed Size is stored as an unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if Uncompressed Size is unknown. End of Payload Marker
is allowed but rarely used if Uncompressed Size is known.
XZ Utils 5.2.5 and older don't support .lzma files that have
End of Payload Marker together with a known Uncompressed Size.

XZ Utils rejects files whose Uncompressed Size field specifies
a known size that is 256 GiB or more. This is to reject false
positives when trying to guess if the input file is in the
.lzma format. When Uncompressed Size is unknown, there is no
limit for the uncompressed size of the file.

(*) Some tools use the term End of Stream (EOS) marker
    instead of End of Payload Marker.
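
Putting the three fields together, the 13-byte Header can be split
as in the following sketch. The struct, macro, and function names
are hypothetical and chosen only for this illustration. The
plausibility check is a sketch of the 256 GiB rule described in this
section (256 GiB = 2^38 bytes), not code taken from XZ Utils.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_LZMA_HEADER_SIZE 13
#define EXAMPLE_SIZE_UNKNOWN UINT64_MAX

/* Hypothetical container for the three header fields. */
struct example_lzma_header {
    uint8_t  properties;        /* byte 0 */
    uint32_t dict_size;         /* bytes 1-4, little endian */
    uint64_t uncompressed_size; /* bytes 5-12, little endian */
};

/* Read an unsigned little endian integer of len bytes (len <= 8). */
static uint64_t example_read_le(const uint8_t *buf, size_t len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len; ++i)
        value |= (uint64_t)buf[i] << (8 * i);
    return value;
}

/* Split the 13 header bytes into fields; no validation is done here. */
static void example_parse_header(
        const uint8_t buf[EXAMPLE_LZMA_HEADER_SIZE],
        struct example_lzma_header *h)
{
    h->properties = buf[0];
    h->dict_size = (uint32_t)example_read_le(buf + 1, 4);
    h->uncompressed_size = example_read_le(buf + 5, 8);
}

/*
 * Accept an unknown size, or a known size below 256 GiB (2^38 bytes).
 */
static bool example_size_is_plausible(uint64_t uncompressed_size)
{
    if (uncompressed_size == EXAMPLE_SIZE_UNKNOWN)
        return true;
    return uncompressed_size < (UINT64_C(1) << 38);
}
```

For instance, the header bytes 5D 00 00 80 00 followed by eight FF
bytes decode to Properties 0x5D (lc/lp/pb = 3/0/2), a Dictionary
Size of 0x00800000 (8 MiB), and an unknown Uncompressed Size, so the
stream is expected to end with an End of Payload Marker.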

1.2. LZMA Compressed Data

A detailed description of the format of this field is outside
the scope of this document.

2. References

LZMA SDK - The original LZMA implementation
http://7-zip.org/sdk.html

7-Zip
http://7-zip.org/

LZMA Utils - LZMA adapted to POSIX-like systems
http://tukaani.org/lzma/

XZ Utils - The next generation of LZMA Utils
http://tukaani.org/xz/

The .xz file format - The successor of the .lzma format
http://tukaani.org/xz/xz-file-format.txt
  119. http://tukaani.org/xz/xz-file-format.txt