lzma-native =========== [![NPM Version](https://img.shields.io/npm/v/lzma-native.svg?style=flat)](https://npmjs.org/package/lzma-native) [![NPM Downloads](https://img.shields.io/npm/dm/lzma-native.svg?style=flat)](https://npmjs.org/package/lzma-native) [![Build Status](https://travis-ci.org/addaleax/lzma-native.svg?style=flat&branch=master)](https://travis-ci.org/addaleax/lzma-native?branch=master) [![Windows](https://img.shields.io/appveyor/ci/addaleax/lzma-native/master.svg?label=windows)](https://ci.appveyor.com/project/addaleax/lzma-native) [![Coverage Status](https://coveralls.io/repos/addaleax/lzma-native/badge.svg?branch=master)](https://coveralls.io/r/addaleax/lzma-native?branch=master) [![Dependency Status](https://david-dm.org/addaleax/lzma-native.svg?style=flat)](https://david-dm.org/addaleax/lzma-native) [![devDependency Status](https://david-dm.org/addaleax/lzma-native/dev-status.svg?style=flat)](https://david-dm.org/addaleax/lzma-native#info=devDependencies) Node.js interface to the native liblzma compression library (.xz file format, among others) This package provides interfaces for compression and decompression of `.xz` (and legacy `.lzma`) files, both stream-based and string-based. ## Example usage ### Installation Simply install `lzma-native` via npm: ```bash $ npm install --save lzma-native ``` *Note*: As of version 1.0.0, this module provides pre-built binaries for multiple Node.js versions and all major OS using [node-pre-gyp](https://github.com/mapbox/node-pre-gyp), so for 99 % of users no compiler toolchain is necessary. Please [create an issue here](https://github.com/addaleax/lzma-native/issues/new) if you have any trouble installing this module. *Note*: `lzma-native@2.x` requires a Node version >= 4. If you want to support Node `0.10` or `0.12`, you can feel free to use `lzma-native@1.x`. ### For streams If you don’t have any fancy requirements, using this library is quite simple: ```js var lzma = require('lzma-native'); var compressor = lzma.createCompressor(); var input = fs.createReadStream('README.md'); var output = fs.createWriteStream('README.md.xz'); input.pipe(compressor).pipe(output); ``` For decompression, you can simply use `lzma.createDecompressor()`. Both functions return a stream where you can pipe your input in and read your (de)compressed output from. ### For simple strings/Buffers If you want your input/output to be Buffers (strings will be accepted as input), this even gets a little simpler: ```js lzma.compress('Banana', function(result) { console.log(result); // }); ``` Again, replace `lzma.compress` with `lzma.decompress` and you’ll get the inverse transformation. `lzma.compress()` and `lzma.decompress()` will return promises and you don’t need to provide any kind of callback ([Example code](#api-q-compress-examle)). ## API ### Compatibility implementations Apart from the API described here, `lzma-native` implements the APIs of the following other LZMA libraries so you can use it nearly as a drop-in replacement: * [node-xz][node-xz] via `lzma.Compressor` and `lzma.Decompressor` * [LZMA-JS][LZMA-JS] via `lzma.LZMA().compress` and `lzma.LZMA().decompress`, though without actual support for progress functions and returning `Buffer` objects instead of integer arrays. (This produces output in the `.lzma` file format, *not* the `.xz` format!) ### Multi-threaded encoding Since version `1.5.0`, lzma-native supports liblzma’s built-in multi-threading encoding capabilities. To make use of them, set the `threads` option to an integer value: `lzma.createCompressor({ threads: n });`. You can use value of `0` to use the number of processor cores. This option is only available for the `easyEncoder` (the default) and `streamEncoder` encoders. Note that, by default, encoding will take place in Node’s libuv thread pool regardless of this option, and setting it when multiple encoders are running is likely to affect performance negatively. ### Reference [Encoding strings and Buffer objects](#api-encoding-buffers) * [`compress()`](#api-compress) – Compress strings and Buffers * [`decompress()`](#api-decompress) – Decompress strings and Buffers * [`LZMA().compress()`](#api-LZMA_compress) ([LZMA-JS][LZMA-JS] compatibility) * [`LZMA().decompress()`](#api-LZMA_decompress) ([LZMA-JS][LZMA-JS] compatibility) [Creating streams for encoding](#api-creating-streams) * [`createCompressor()`](#api-create-compressor) – Compress streams * [`createDecompressor()`](#api-create-decompressor) – Decompress streams * [`createStream()`](#api-create-stream) – (De-)Compression with advanced options * [`Compressor()`](#api-robey_compressor) ([node-xz][node-xz] compatibility) * [`Decompressor()`](#api-robey_decompressor) ([node-xz][node-xz] compatibility) [.xz file metadata](#api-parse-indexes) * [`isXZ()`](#api-isxz) – Test Buffer for `.xz` file format * [`parseFileIndex()`](#api-parse-file-index) – Read `.xz` file metadata * [`parseFileIndexFD()`](#api-parse-file-index-fd) – Read `.xz` metadata from a file descriptor [Miscellaneous functions](#api-functions) * [`crc32()`](#api-crc32) – Calculate CRC32 checksum * [`checkSize()`](#api-check-size) – Return required size for specific checksum type * [`easyDecoderMemusage()`](#api-easy-decoder-memusage) – Expected memory usage * [`easyEncoderMemusage()`](#api-easy-encoder-memusage) – Expected memory usage * [`rawDecoderMemusage()`](#api-raw-decoder-memusage) – Expected memory usage * [`rawEncoderMemusage()`](#api-raw-encoder-memusage) – Expected memory usage * [`versionString()`](#api-version-string) – Native library version string * [`versionNumber()`](#api-version-number) – Native library numerical version identifier ### Encoding strings and Buffer objects #### `lzma.compress()`, `lzma.decompress()` * `lzma.compress(string, [opt, ]on_finish)` * `lzma.decompress(string, [opt, ]on_finish)` Param | Type | Description ------------ | ---------------- | -------------- `string` | Buffer / String | Any string or buffer to be (de)compressed (that can be passed to `stream.end(…)`) [`opt`] | Options / int | Optional. See [options](#api-options) `on_finish` | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as `on_finish(null, err)` in case of an error. These methods will also return a promise that you can use directly. Example code: ```js lzma.compress('Bananas', 6, function(result) { lzma.decompress(result, function(decompressedResult) { assert.equal(decompressedResult.toString(), 'Bananas'); }); }); ``` Example code for promises: ```js lzma.compress('Bananas', 6).then(function(result) { return lzma.decompress(result); }).then(function(decompressedResult) { assert.equal(decompressedResult.toString(), 'Bananas'); }).catch(function(err) { // ... }); ``` #### `lzma.LZMA().compress()`, `lzma.LZMA().decompress()` * `lzma.LZMA().compress(string, mode, on_finish[, on_progress])` * `lzma.LZMA().decompress(string, on_finish[, on_progress])` (Compatibility; See [LZMA-JS][LZMA-JS] for the original specs.) **Note that the result of compression is in the older LZMA1 format (`.lzma` files).** This is different from the more universally used LZMA2 format (`.xz` files) and you will have to take care of possible compatibility issues with systems expecting `.xz` files. Param | Type | Description ------------- | ----------------------- | -------------- `string` | Buffer / String / Array | Any string, buffer, or array of integers or typed integers (e.g. `Uint8Array`) `mode` | int | [A number between 0 and 9](#api-options-preset), indicating compression level `on_finish` | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as `on_finish(null, err)` in case of an error. `on_progress` | Callback | Indicates progress by passing a number in [0.0, 1.0]. Currently, this package only invokes the callback with 0.0 and 1.0. These methods will also return a promise that you can use directly. This does not work exactly as described in the original [LZMA-JS][LZMA-JS] specification: * The results are `Buffer` objects, not integer arrays. This just makes a lot more sense in a Node.js environment. * `on_progress` is currently only called with `0.0` and `1.0`. Example code: ```js lzma.LZMA().compress('Bananas', 4, function(result) { lzma.LZMA().decompress(result, function(decompressedResult) { assert.equal(decompressedResult.toString(), 'Bananas'); }); }); ``` For an example using promises, see [`compress()`](#api-q-compress-examle). ### Creating streams for encoding #### `lzma.createCompressor()`, `lzma.createDecompressor()` * `lzma.createCompressor([options])` * `lzma.createDecompressor([options])` Param | Type | Description ----------- | ---------------- | -------------- [`options`] | Options / int | Optional. See [options](#api-options) Return a [duplex][duplex] stream, i.e. a both readable and writable stream. Input will be read, (de)compressed and written out. You can use this to pipe input through this stream, i.e. to mimick the `xz` command line util, you can write: ```js var compressor = lzma.createCompressor(); process.stdin.pipe(compressor).pipe(process.stdout); ``` The output of compression will be in LZMA2 format (`.xz` files), while decompression will accept either format via automatic detection. #### `lzma.Compressor()`, `lzma.Decompressor()` * `lzma.Compressor([preset], [options])` * `lzma.Decompressor([options])` (Compatibility; See [node-xz][node-xz] for the original specs.) These methods handle the `.xz` file format. Param | Type | Description ----------- | ---------------- | -------------- [`preset`] | int | Optional. See [options.preset](#api-options-preset) [`options`] | Options | Optional. See [options](#api-options) Return a [duplex][duplex] stream, i.e. a both readable and writable stream. Input will be read, (de)compressed and written out. You can use this to pipe input through this stream, i.e. to mimick the `xz` command line util, you can write: ```js var compressor = lzma.Compressor(); process.stdin.pipe(compressor).pipe(process.stdout); ``` #### `lzma.createStream()` * `lzma.createStream(coder, options)` Param | Type | Description ----------- | ---------------- | -------------- [`coder`] | string | Any of the [supported coder names](#api-coders), e.g. `"easyEncoder"` (default) or `"autoDecoder"`. [`options`] | Options / int | Optional. See [options](#api-options) Return a [duplex][duplex] stream for (de-)compression. You can use this to pipe input through this stream. The available coders are (the most interesting ones first): * `easyEncoder` Standard LZMA2 ([`.xz` file format](https://en.wikipedia.org/wiki/.xz)) encoder. Supports [`options.preset`](#api-options-preset) and [`options.check`](#api-options-check) options. * `autoDecoder` Standard LZMA1/2 (both `.xz` and `.lzma`) decoder with auto detection of file format. Supports [`options.memlimit`](#api-options-memlimit) and [`options.flags`](#api-options-flags) options. * `aloneEncoder` Encoder which only uses the legacy `.lzma` format. Supports the whole range of [LZMA options](#api-options-lzma). Less likely to be of interest to you, but also available: * `aloneDecoder` Decoder which only uses the legacy `.lzma` format. Supports the [`options.memlimit`](#api-options-memlimit) option. * `rawEncoder` Custom encoder corresponding to `lzma_raw_encoder` (See the native library docs for details). Supports the [`options.filters`](#api-options-filters) option. * `rawDecoder` Custom decoder corresponding to `lzma_raw_decoder` (See the native library docs for details). Supports the [`options.filters`](#api-options-filters) option. * `streamEncoder` Custom encoder corresponding to `lzma_stream_encoder` (See the native library docs for details). Supports [`options.filters`](#api-options-filters) and [`options.check`](#api-options-check) options. * `streamDecoder` Custom decoder corresponding to `lzma_stream_decoder` (See the native library docs for details). Supports [`options.memlimit`](#api-options-memlimit) and [`options.flags`](#api-options-flags) options. #### Options Option name | Type | Description ------------- | ---------- | ------------- `check` | check | Any of `lzma.CHECK_CRC32`, `lzma.CHECK_CRC64`, `lzma.CHECK_NONE`, `lzma.CHECK_SHA256` `memlimit` | float | A memory limit for (de-)compression in bytes `preset` | int | A number from 0 to 9, 0 being the fastest and weakest compression, 9 the slowest and highest compression level. (Please also see the [xz(1) manpage][xz-manpage] for notes – don’t just blindly use 9!) You can also OR this with `lzma.PRESET_EXTREME` (the `-e` option to the `xz` command line utility). `flags` | int | A bitwise or of `lzma.LZMA_TELL_NO_CHECK`, `lzma.LZMA_TELL_UNSUPPORTED_CHECK`, `lzma.LZMA_TELL_ANY_CHECK`, `lzma.LZMA_CONCATENATED` `synchronous` | bool | If true, forces synchronous coding (i.e. no usage of threading) `bufsize` | int | The default size for allocated buffers `threads` | int | Set to an integer to use liblzma’s multi-threading support. 0 will choose the number of CPU cores. `blockSize` | int | Maximum uncompressed size of a block in multi-threading mode `timeout` | int | Timeout for a single encoding operation in multi-threading mode `options.filters` can, if the coder supports it, be an array of filter objects, each with the following properties: * `.id` Any of `lzma.FILTERS_MAX`, `lzma.FILTER_ARM`, `lzma.FILTER_ARMTHUMB`, `lzma.FILTER_IA64`, `lzma.FILTER_POWERPC`, `lzma.FILTER_SPARC`, `lzma.FILTER_X86` or `lzma.FILTER_DELTA`, `lzma.FILTER_LZMA1`, `lzma.FILTER_LZMA2` The delta filter supports the additional option `.dist` for a distance between bytes (see the [xz(1) manpage][xz-manpage]). The LZMA filter supports the additional options `.dict_size`, `.lp`, `.lc`, `pb`, `.mode`, `nice_len`, `.mf`, `.depth` and `.preset`. See the [xz(1) manpage][xz-manpage] for meaning of these parameters and additional information. ### Miscellaneous functions #### `lzma.crc32()` * `lzma.crc32(input[, encoding[, previous]])` Compute the CRC32 checksum of a Buffer or string. Param | Type | Description ------------ | ---------------- | -------------- `input` | string / Buffer | Any string or Buffer. [`encoding`] | string | Optional. If `input` is a string, an encoding to use when converting into binary. [`previous`] | int | The result of a previous CRC32 calculation so that you can compute the checksum per each chunk Example usage: ```js lzma.crc32('Banana') // => 69690105 ``` #### `lzma.checkSize()` * `lzma.checkSize(check)` Return the byte size of a check sum. Param | Type | Description ------------ | ---------------- | -------------- `check` | check | Any supported check constant. Example usage: ```js lzma.checkSize(lzma.CHECK_SHA256) // => 16 lzma.checkSize(lzma.CHECK_CRC32) // => 4 ``` #### `lzma.easyDecoderMemusage()` * `lzma.easyDecoderMemusage(preset)` Returns the approximate memory usage when decoding using easyDecoder for a given preset. Param | Type | Description ------------ | ----------- | -------------- `preset` | preset | A compression level from 0 to 9 Example usage: ```js lzma.easyDecoderMemusage(6) // => 8454192 ``` #### `lzma.easyEncoderMemusage()` * `lzma.easyEncoderMemusage(preset)` Returns the approximate memory usage when encoding using easyEncoder for a given preset. Param | Type | Description ------------ | ----------- | -------------- `preset` | preset | A compression level from 0 to 9 Example usage: ```js lzma.easyEncoderMemusage(6) // => 97620499 ``` #### `lzma.rawDecoderMemusage()` * `lzma.rawDecoderMemusage(filters)` Returns the approximate memory usage when decoding using rawDecoder for a given filter list. Param | Type | Description ------------ | ----------- | -------------- `filters` | array | An array of [filters](#api-options-filters) #### `lzma.rawEncoderMemusage()` * `lzma.rawEncoderMemusage(filters)` Returns the approximate memory usage when encoding using rawEncoder for a given filter list. Param | Type | Description ------------ | ----------- | -------------- `filters` | array | An array of [filters](#api-options-filters) #### `lzma.versionString()` * `lzma.versionString()` Returns the version of the underlying C library. Example usage: ```js lzma.versionString() // => '5.2.3' ``` #### `lzma.versionNumber()` * `lzma.versionNumber()` Returns the version of the underlying C library. Example usage: ```js lzma.versionNumber() // => 50020012 ``` ### .xz file metadata #### `lzma.isXZ()` * `lzma.isXZ(input)` Tells whether an input buffer is an XZ file (`.xz`, LZMA2 format) using the file format’s magic number. This is not a complete test, i.e. the data following the file header may still be invalid in some way. Param | Type | Description ------------ | ---------------- | -------------- `input` | string / Buffer | Any string or Buffer (integer arrays accepted). Example usage: ```js lzma.isXZ(fs.readFileSync('test/hamlet.txt.xz')); // => true lzma.isXZ(fs.readFileSync('test/hamlet.txt.lzma')); // => false lzma.isXZ('Banana'); // => false ``` (The magic number of XZ files is hex `fd 37 7a 58 5a 00` at position 0.) #### `lzma.parseFileIndex()` * `lzma.parseFileIndex(options[, callback])` Read `.xz` file metadata. `options.fileSize` needs to be an integer indicating the size of the file being inspected, e.g. obtained by `fs.stat()`. `options.read(count, offset, cb)` must be a function that reads `count` bytes from the underlying file, starting at position `offset`. If that is not possible, e.g. because the file does not have enough bytes, the file should be considered corrupt. On success, `cb` should be called with a `Buffer` containing the read data. `cb` can be invoked as `cb(err, buffer)`, in which case `err` will be passed along to the original `callback` argument when set. `callback` will be called with `err` and `info` as its arguments. If no `callback` is provided, `options.read()` must work synchronously and the file info will be returned from `lzma.parseFileIndex()`. Example usage: ```js fs.readFile('test/hamlet.txt.xz', function(err, content) { // handle error lzma.parseFileIndex({ fileSize: content.length, read: function(count, offset, cb) { cb(content.slice(offset, offset + count)); } }, function(err, info) { // handle error // do something with e.g. info.uncompressedSize }); }); ``` #### `lzma.parseFileIndexFD()` * `lzma.parseFileIndexFD(fd, callback)` Read `.xz` metadata from a file descriptor. This is like [`parseFileIndex()`](#api-parse-file-index), but lets you pass an file descriptor in `fd`. The file will be inspected using `fs.stat()` and `fs.read()`. The file descriptor will not be opened or closed by this call. Example usage: ```js fs.open('test/hamlet.txt.xz', 'r', function(err, fd) { // handle error lzma.parseFileIndexFD(fd, function(err, info) { // handle error // do something with e.g. info.uncompressedSize fs.close(fd, function(err) { /* handle error */ }); }); }); ``` ## Installation This package includes the native C library, so there is no need to install it separately. ## Licensing The original C library package contains code under various licenses, with its core (liblzma) being public domain. See its contents for details. This wrapper is licensed under the MIT License. ## Related projects Other implementations of the LZMA algorithms for node.js and/or web clients include: * [lzma-purejs](https://github.com/cscott/lzma-purejs) * [LZMA-JS](https://github.com/nmrugg/LZMA-JS) * [node-xz](https://github.com/robey/node-xz) (native) * [node-liblzma](https://github.com/oorabona/node-liblzma) (native) Note that LZMA has been designed to have much faster decompression than compression, which is something you may want to take into account when choosing an compression algorithm for large files. Almost always, LZMA achieves higher compression ratios than other algorithms, though. ## Acknowledgements Initial development of this project was financially supported by [Tradity](https://tradity.de/). [node-xz]: https://github.com/robey/node-xz [LZMA-JS]: https://github.com/nmrugg/LZMA-JS [Q]: https://github.com/kriskowal/q [duplex]: https://nodejs.org/api/stream.html#stream_class_stream_duplex [xz-manpage]: https://www.freebsd.org/cgi/man.cgi?query=xz&sektion=1&manpath=FreeBSD+8.3-RELEASE