123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109 |
- XZ Utils To-Do List
- ===================
- Known bugs
- ----------
- The test suite is too incomplete.
- If the memory usage limit is less than about 13 MiB, xz is unable to
- automatically scale down the compression settings enough even though
- it would be possible by switching from BT2/BT3/BT4 match finder to
- HC3/HC4.
- XZ Utils compress some files significantly worse than LZMA Utils.
- This is due to faster compression presets used by XZ Utils, and
- can often be worked around by using "xz --extreme". With some files
- --extreme isn't enough though: it's most likely with files that
- compress extremely well, so going from compression ratio of 0.003
- to 0.004 means big relative increase in the compressed file size.
- xz doesn't quote unprintable characters when it displays file names
- given on the command line.
- tuklib_exit() doesn't block signals => EINTR is possible.
- SIGTSTP is not handled. If xz is stopped, the estimated remaining
- time and calculated (de)compression speed won't make sense in the
- progress indicator (xz --verbose).
- If liblzma has created threads and fork() gets called, liblzma
- code will break in the child process unless it calls exec() and
- doesn't touch liblzma.
- Missing features
- ----------------
- Add support for storing metadata in .xz files. A preliminary
- idea is to create a new Stream type for metadata. When both
- metadata and data are wanted in the same .xz file, two or more
- Streams would be concatenated.
- The state stored in lzma_stream should be cloneable, which would
- be mostly useful when using a preset dictionary in LZMA2, but
- it may have other uses too. Compare to deflateCopy() in zlib.
- Support LZMA_FINISH in raw decoder to indicate end of LZMA1 and
- other streams that don't have an end of payload marker.
- Adjust dictionary size when the input file size is known.
- Maybe do this only if an option is given.
- xz doesn't support copying extended attributes, access control
- lists etc. from source to target file.
- Multithreaded compression:
- - Reduce memory usage of the current method.
- - Implement threaded match finders.
- - Implement pigz-style threading in LZMA2.
- Buffer-to-buffer coding could use less RAM (especially when
- decompressing LZMA1 or LZMA2).
- I/O library is not implemented (similar to gzopen() in zlib).
- It will be a separate library that supports uncompressed, .gz,
- .bz2, .lzma, and .xz files.
- Support changing lzma_options_lzma.mode with lzma_filters_update().
- Support LZMA_FULL_FLUSH for lzma_stream_decoder() to stop at
- Block and Stream boundaries.
- lzma_strerror() to convert lzma_ret to human readable form?
- This is tricky, because the same error codes are used with
- slightly different meanings, and this cannot be fixed anymore.
- Make it possible to adjust LZMA2 options in the middle of a Block
- so that the encoding speed vs. compression ratio can be optimized
- when the compressed data is streamed over network.
- Improved BCJ filters. The current filters are small but they aren't
- so great when compressing binary packages that contain various file
- types. Specifically, they make things worse if there are static
- libraries or Linux kernel modules. The filtering could also be
- more effective (without getting overly complex), for example,
- streamable variant BCJ2 from 7-Zip could be implemented.
- Filter that autodetects specific data types in the input stream
- and applies appropriate filters for the corrects parts of the input.
- Perhaps combine this with the BCJ filter improvement point above.
- Long-range LZ77 method as a separate filter or as a new LZMA2
- match finder.
- Documentation
- -------------
- More tutorial programs are needed for liblzma.
- Document the LZMA1 and LZMA2 algorithms.
- Miscellaneous
- ------------
- Try to get the media type for .xz registered at IANA.
|