gzip

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU (the "g" is from "GNU"). Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.

Gzip
Original author(s): Jean-loup Gailly, Mark Adler
Developer(s): GNU Project
Initial release: 31 October 1992
Stable release: 1.10 (GNU Gzip)[1] / 29 December 2018
Repository: git.savannah.gnu.org/cgit/gzip.git
Written in: C
Operating system: Unix-like
Type: Data compression
License: GNU GPLv3
Website: www.gnu.org/software/gzip/

File format

gzip
Filename extension: .gz
Internet media type: application/gzip[2]
Uniform Type Identifier (UTI): org.gnu.gnu-zip-archive
Developed by: Jean-loup Gailly and Mark Adler
Type of format: Data compression
Open format?: Yes
Website: gzip.org (obsolete)

gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. DEFLATE was intended as a replacement for LZW and other patent-encumbered data compression algorithms which, at the time, limited the usability of compress and other popular archivers.

"gzip" is often also used to refer to the gzip file format, which is:

  • a 10-byte header, containing a magic number (1f 8b), compression ID (08 for DEFLATE), file flags, a 32-bit timestamp, compression flags and operating system ID.
  • optional extra headers denoted by file flags, such as the original filename
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data, modulo 2^32.[3]
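
The layout above can be checked with a short Python sketch. It assumes a single-member file with no optional headers, so the fixed header is the first 10 bytes and the footer is the last 8; the helper name parse_gzip_member is made up for this illustration.

```python
import gzip
import struct
import zlib


def parse_gzip_member(blob: bytes) -> dict:
    """Decode the fixed 10-byte header and 8-byte footer of one gzip member."""
    # Header: magic (2 bytes), compression ID, file flags,
    # 32-bit timestamp, compression flags, operating system ID.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", blob[:10])
    # Footer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {
        "magic": magic,
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "xfl": xfl,
        "os_id": os_id,
        "crc32": crc32,
        "isize": isize,
    }


payload = b"hello, gzip\n" * 100
blob = gzip.compress(payload, mtime=0)  # mtime=0 keeps the output reproducible
info = parse_gzip_member(blob)

print(info["magic"].hex(), info["method"], info["isize"])
print(info["crc32"] == zlib.crc32(payload))
```

All multi-byte fields are little-endian, which is why the struct format strings start with "<".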

Although its file format also allows multiple such streams to be concatenated (concatenated gzip streams are simply decompressed as if they were originally one file[4]), gzip is normally used to compress single files.[5] Compressed archives are typically created by assembling collections of files into a single tar archive (also called a tarball[6]), and then compressing that archive with gzip. The final compressed file usually has the extension .tar.gz or .tgz.
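
As a small illustration of the concatenation behaviour, the following Python sketch compresses two byte strings separately, joins the results, and decompresses them in one call. It relies on the standard library's gzip.decompress, which accepts multi-member data; the sample strings are arbitrary.

```python
import gzip

# Two independent gzip members, each with its own header and footer.
first = gzip.compress(b"first part\n")
second = gzip.compress(b"second part\n")

# Joining the members byte-for-byte yields a valid gzip file.
combined = first + second

# Decompression returns the two payloads as if they had been one file.
restored = gzip.decompress(combined)
print(restored)
```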

gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).
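
The effect of solid compression can be seen in a rough Python sketch that packs the same files both ways in memory. The input is deliberately artificial (twenty copies of one incompressible 2000-byte block, an assumption chosen to exaggerate redundancy between files), so the size gap is much larger than for typical data.

```python
import io
import random
import tarfile
import zipfile

# Twenty files with identical, individually incompressible contents.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(2000))
files = {f"copy{i}.bin": block for i in range(20)}


def zip_size(files: dict) -> int:
    """Size of a ZIP archive in which each file is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files: dict) -> int:
    """Size of a tar archive compressed as a whole with gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            member = tarfile.TarInfo(name)
            member.size = len(data)
            tf.addfile(member, io.BytesIO(data))
    return len(buf.getvalue())


print("zip:", zip_size(files), "bytes")
print("tar.gz:", tar_gz_size(files), "bytes")
```

Because each copy sits well within DEFLATE's 32 KB window of the previous one inside the tar stream, gzip can encode the repeats as back-references, which per-file ZIP compression cannot do.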

Implementations

NetBSD Gzip / FreeBSD Gzip
Developer(s): The NetBSD Foundation
Repository: cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/gzip/
Written in: C
Type: Data compression
License: Simplified BSD License

Various implementations of the program have been written. The most commonly known is the GNU Project's implementation using Lempel-Ziv coding (LZ77). OpenBSD's version of gzip is actually the compress program, to which support for the gzip format was added in OpenBSD 3.4. The 'g' in this specific version stands for gratis.[7] FreeBSD, DragonFly BSD and NetBSD use a BSD-licensed implementation instead of the GNU version; it is actually a command-line interface for zlib intended to be compatible with the GNU implementation's options.[8] These implementations originally come from NetBSD, and support decompression of bzip2 and the Unix pack format.

Zopfli is an alternative compression program that achieves 3-8% better compression. It produces gzip-compatible output using more exhaustive algorithms, at the cost of longer compression time. Decompression time is not affected.

pigz, written by Mark Adler, is compatible with gzip and speeds up compression by using all available CPU cores and threads.[9]
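
The general idea of parallel gzip-compatible compression can be sketched in Python. This is not pigz's actual method (pigz emits a single gzip stream): the sketch compresses chunks independently and joins them as separate gzip members, which any gzip decompressor reads back as one file. The function name parallel_gzip and the 128 KiB chunk size are assumptions made for the example.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 128 * 1024) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the members."""
    # An empty input still needs one (empty) member to be a valid gzip file.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor() as pool:
        # map() returns results in input order, so the members stay in sequence.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


data = b"some repetitive log line\n" * 50_000
compressed = parallel_gzip(data)
print(len(data), "->", len(compressed))
```

Splitting the input costs some compression ratio, because matches cannot cross chunk boundaries and every member carries its own header and footer.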

Derivatives and other uses

The tar utility included in most Linux distributions can extract .tar.gz files by passing the z option, e.g., tar -zxf file.tar.gz.

zlib is an abstraction of the DEFLATE algorithm in library form which includes support both for the gzip file format and a lightweight stream format in its API. The zlib stream format, DEFLATE, and the gzip file format were standardized respectively as RFC 1950, RFC 1951, and RFC 1952.
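
The three formats can be compared through Python's zlib module, which selects the wrapper with the wbits argument: negative values give raw DEFLATE (RFC 1951), 9 to 15 give the zlib stream format (RFC 1950), and 25 to 31 give the gzip format (RFC 1952). The helper name deflate_with is made up for this sketch.

```python
import zlib


def deflate_with(wbits: int, data: bytes) -> bytes:
    """Compress data with the container selected by wbits."""
    compressor = zlib.compressobj(level=9, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


data = b"the same bytes in three wrappers " * 50

raw = deflate_with(-15, data)         # raw DEFLATE, no header or trailer
zlib_stream = deflate_with(15, data)  # 2-byte header + Adler-32 trailer
gzip_member = deflate_with(31, data)  # 10-byte header + CRC-32 and length

for name, blob in [("raw", raw), ("zlib", zlib_stream), ("gzip", gzip_member)]:
    print(f"{name:5} {len(blob)} bytes")
```

The compressed payload is the same in all three cases; only the framing around it differs.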

The gzip format is used in HTTP compression, a technique used to speed up the sending of HTML and other content on the World Wide Web. It is one of the three standard formats for HTTP compression as specified in RFC 2616. This RFC also specifies a zlib format (called "DEFLATE"), which wraps the same DEFLATE data in a lighter container: the zlib header and trailer take 6 bytes, whereas the gzip header and trailer take at least 18. Still, the gzip format is sometimes recommended over zlib because Internet Explorer does not implement the standard correctly and cannot handle the zlib format as specified in RFC 1950.[10]

zlib DEFLATE is used internally by the Portable Network Graphics (PNG) format.

Since the late 1990s, bzip2, a file compression utility based on a block-sorting algorithm, has gained some popularity as a gzip replacement. It produces considerably smaller files (especially for source code and other structured text), but at the cost of memory and processing time (up to a factor of 4).[11]

AdvanceCOMP and 7-Zip can produce gzip-compatible files, using an internal DEFLATE implementation with better compression ratios than gzip itself—at the cost of more processor time compared to the reference implementation.

Notes

  1. Meyering, Jim (29 December 2018). "gzip-1.10 released [stable]". The Free Software Foundation. Retrieved 31 December 2018.
  2. The 'application/zlib' and 'application/gzip' Media Types. Tools.ietf.org. doi:10.17487/RFC6713. RFC 6713. Retrieved 1 March 2014.
  3. Jean-loup Gailly. "GNU Gzip". Gnu.org. Retrieved 11 October 2015.
  4. "GNU Gzip: Advanced usage". Gnu.org. Retrieved 28 November 2012.
  5. "Can gzip compress several files into a single archive?". Gnu.org. Retrieved 27 January 2010.
  6. "tarball, The Jargon File, version 4.4.7". Catb.org. Retrieved 27 January 2010.
  7. "OpenBSD gzip(1) manual page". Openbsd.org. OpenBSD. Retrieved 4 February 2018.
  8. "gzip". Man.freebsd.org. 9 October 2011. Retrieved 1 March 2014.
  9. Mark Adler (2017). "pigz: A parallel implementation of gzip for modern multi-processor, multi-core machines". zlib.net.
  10. Lawrence, Eric (21 November 2014). "Compressing the Web". MSDN Blogs > IEInternals. Microsoft.
  11. "Comparison Tool: 7-zip vs bzip2 vs gzip". compressionratings.com. Archived from the original on 1 November 2014. Retrieved 1 November 2014.

References

  • RFC 1952 – GZIP file format specification version 4.3

Ark (software)

Ark is a file archiver and compressor developed by KDE and included in the KDE Applications software bundle. It supports various common archive and compression formats including zip, 7z, rar, lha and tar (both uncompressed and compressed with e.g. gzip, bzip2, lzip or xz).

Brotli

Brotli is a data format specification for data streams compressed with a specific combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd order context modelling. Google employees Jyrki Alakuijala and Zoltan Szabadka initially developed Brotli to decrease the size of transmissions of WOFF2 web fonts, and in that context Brotli was a continuation of the development of zopfli, which is a zlib-compatible implementation of the standard gzip and deflate specifications. Brotli allows a denser packing than gzip and deflate because of several algorithmic and format level improvements: the use of context models for literals and copy distances, describing copy distances through past distances, use of move-to-front queue in entropy code selection, joint-entropy coding of literal and copy lengths, the use of graph algorithms in block splitting, and a larger backward reference window are example improvements. The Brotli specification was generalized in September 2015 for HTTP stream compression (content-encoding type 'br'), and can now be used to encode any data sent by a web server to a web browser if both client and server support the format. This generalized iteration also improved the compression ratio by using a pre-defined dictionary of frequently used words and phrases.

Alakuijala and Szabadka completed the Brotli specification during 2013–2016. The specification was accompanied by a reference implementation developed by two additional authors, Evgenii Kliuchnikov and Lode Vandevenne, who had previously developed Google's zopfli reimplementation of deflate/gzip compression formats in 2013. Unlike zopfli, which was a reimplementation of an existing data format specification, Brotli was a new data format, and allowed the authors to improve compression ratios even further.

Brotli was designed for use on a sequentially processed data stream (a bitstream), rather than on discrete random-access files. This makes Brotli particularly suitable for compressing data as it is sent over a network connection. Under ideal circumstances, this reduces the volume of data being transmitted. The transmission of a compressed stream may then also complete sooner than would be the case for an uncompressed stream, or a stream compressed with a less efficient stream compressor such as gzip or deflate. While gzip and deflate are comparatively light-weight compressors (i.e. less processor- and memory-intensive than Brotli), and are already widely supported by many web servers, Brotli has not yet been implemented as widely. The Brotli compressed data format was submitted to the Internet Engineering Task Force with a request for comment (RFC 7932) in July 2016. The Brotli data format is an integral part of the 2nd iteration of the Web Open Font Format.

While Google's zopfli implementation of the deflate compression algorithm is named after a Swiss German word for a braided sweet bread and literally means "little plait", brotli is a Swiss German word for a bread roll and literally means "small bread". Google's own implementation of the Brotli specification was released under the terms of the permissive free software MIT license in 2016. A formal validation of the Brotli specification was independently implemented by Mark Adler, one of the co-authors of the zlib/gzip compression format and library. Adler's implementation was released under the terms of the similarly permissive Apache license. Other implementations of the specification also exist, including one in the source-to-source haxe language.

Comparison of archive formats

There are many popular computer data archive formats for creating and maintaining archive files. The tables below compare many popular archive formats.

Comparison of boot loaders

The following tables compare general and technical information for a number of available boot loaders.

Compress

compress is a Unix shell compression program based on the LZW compression algorithm. Compared to more modern compression utilities such as gzip and bzip2, compress performs faster and with less memory usage, at the cost of a significantly lower compression ratio.

The uncompress utility will restore files to their original state after they have been compressed using the compress utility. If no files are specified, the standard input will be uncompressed to the standard output.

In the upcoming revision of POSIX and the Single Unix Specification, these utilities are planned to support the DEFLATE algorithm used in the gzip format.

DEFLATE

In computing, Deflate is a lossless data compression algorithm and associated file format that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool. The file format was later specified in RFC 1951.

The original algorithm as designed by Katz was patented as U.S. Patent 5,051,745 and assigned to PKWARE, Inc. As stated in the RFC document, an algorithm producing Deflate files is widely thought to be implementable in a manner not covered by patents. This has led to its widespread use, for example in gzip compressed files, PNG image files and the ZIP file format for which Katz originally designed it.

GnuWin32

The GnuWin32 project provides native ports in the form of runnable computer programs, patches, and source code for various GNU and open source tools and software, much of it modified to run on the 32-bit Windows platform. The ports included in the GnuWin32 packages are:

GNU utilities such as bc, bison, chess, Coreutils, diffutils, ed, Flex, gawk, gettext, grep, Groff, gzip, iconv, less, m4, patch, readline, rx, sharutils, sed, tar, texinfo, units, Wget, which.

Archive management and compression tools, such as: arc, arj, bzip2, gzip, lha, zip, zlib.

Non-GNU utilities such as: cygutils, file, ntfsprogs, OpenSSL, PCRE.

Graphics tools.

PDCurses.

Tools for processing text.

Mathematical software and statistics software.

Most programs have dependencies (typically DLLs), so that the executable files cannot simply be run in Windows unless files they depend upon are available. An alternative set of ported programs is UnxUtils; these are usually older versions, but depend only on the Microsoft C-runtime msvcrt.dll.

There is a package maintenance utility, GetGnuWin32, to download and install or update current versions of all GnuWin32 packages.

HTTP compression

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

HTTP data is compressed before it is sent from the server: compliant browsers announce which methods they support to the server before downloading the correct format; browsers that do not support a compliant compression method download uncompressed data. The most common compression schemes include gzip and Deflate; however, a full list of available schemes is maintained by the IANA. Additionally, third parties develop new methods and include them in their products, such as the Google Shared Dictionary Compression for HTTP (SDCH) scheme implemented in the Google Chrome browser and used on Google servers.

There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate that the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression to avoid triggering bugs in servers.
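
The Content-Encoding negotiation can be sketched in Python as a pair of hypothetical helper functions (parse_accept_encoding and encode_body are names invented here, not part of any web framework). The sketch is simplified: it ignores the "*" wildcard and other details of the Accept-Encoding grammar, and it always prefers gzip when the client allows it.

```python
import gzip
import zlib


def parse_accept_encoding(header: str) -> dict:
    """Map each coding named in an Accept-Encoding value to its q-value."""
    prefs = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[token] = q
    return prefs


def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers) for the codings the client accepts."""
    prefs = parse_accept_encoding(accept_encoding)
    if prefs.get("gzip", 0) > 0:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    if prefs.get("deflate", 0) > 0:
        # In HTTP, "deflate" denotes the zlib container, which zlib.compress emits.
        return zlib.compress(body), {"Content-Encoding": "deflate"}
    return body, {}  # no common coding: send the body uncompressed


body = b"<html><body>hello</body></html>" * 20
payload, headers = encode_body(body, "gzip, deflate;q=0.5")
print(headers, len(body), "->", len(payload))
```

A client that sends no acceptable coding, for example "identity", simply receives the original bytes with no Content-Encoding header.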

Krusader

Krusader is an advanced orthodox file manager for KDE and other desktops in the Unix world. It is similar to the console-based GNU Midnight Commander, GNOME Commander for the GNOME desktop environment, or Total Commander for Windows, all of which can trace their paradigmatic features to the original Norton Commander for DOS. It supports extensive archive handling, mounted filesystem support, FTP, advanced search, viewer/editor, directory synchronisation, file content comparisons, batch renaming, etc.

It supports the following archive formats: tar, ZIP, bzip2, gzip, RAR, ace, ARJ, LHA, 7z and RPM and can handle other KIO Slaves such as smb or fish.

Krusader is published under the GNU General Public License.

Lzip

lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of usual Unix compression tools, such as gzip and bzip2.

Like gzip and bzip2, lzip supports concatenation as a way to compress multiple files, but the convention is to compress a single file that is itself an archive, such as those created by the tar or cpio Unix programs. Lzip can split the output for the creation of multivolume archives.

The file that is produced by lzip is usually given .lz as its filename extension, and the data is described by the MIME type application/x-lzip.

The lzip suite of programs was written in C++ and C by Antonio Diaz Diaz and is being distributed as free software under the terms of version 2 or later of the GNU General Public License (GPL).

Lzop

lzop is a free software file compression tool which implements the LZO algorithm and is licensed under the GPL.

Aimed at being very fast, lzop produces files slightly larger than gzip while only requiring a tenth of the CPU use and only slightly higher memory utilization. lzop is one of the fastest compressors available, a close second to lz4.

Named pipe

In computing, a named pipe (also known as a FIFO for its behavior) is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication (IPC). The concept is also found in OS/2 and Microsoft Windows, although the semantics differ substantially. A traditional pipe is "unnamed" and lasts only as long as the process. A named pipe, however, can last as long as the system is up, beyond the life of the process. It can be deleted if no longer used. Usually a named pipe appears as a file, and generally processes attach to it for IPC.

Pack (compression)

Pack is a (now deprecated) Unix shell compression program based on Huffman coding.

The unpack utility will restore files to their original state after they have been compressed using the pack utility. If no files are specified, the standard input will be uncompressed to the standard output.

Although obsolete, support for packed files exists in modern compression tools such as gzip and 7-zip.

Snappy (compression)

Snappy (previously known as Zippy) is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

Snappy is widely used in Google projects like Bigtable, MapReduce and in compressing data for Google's internal RPC systems. It can be used in open-source projects like MariaDB ColumnStore, Cassandra, Hadoop, LevelDB, MongoDB, RocksDB, Lucene. Decompression is tested to detect any errors in the compressed stream. Snappy does not use inline assembler (except some optimizations) and is portable.

TUGZip

TUGZip is a freeware file archiver for Microsoft Windows. It handles a great variety of archive formats, including some of the commonly used ones like zip, rar, gzip, bzip2, sqx and 7z. It can also view disk image files like BIN, C2D, IMG, ISO and NRG. TUGZip repairs corrupted ZIP archives and can encrypt files with 6 different algorithms.

Since the release of TUGZip 3.5.0.0, development has been suspended due to lack of time from Kindahl's side.

Tar (computing)

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from (t)ape (ar)chive, as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, time stamps, ownership, file access permissions, and directory organization. The command line utility was first introduced in the Version 7 Unix in January 1979, replacing the tp program. The file structure to store this information was standardized in POSIX.1-1988 and later POSIX.1-2001, and became a format supported by most modern file archiving systems.

XZ Utils

XZ Utils (previously LZMA Utils) is a set of free command-line lossless data compressors, including LZMA and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows.

XZ Utils consists of two major components:

xz, the command-line compressor and decompressor (analogous to gzip)

liblzma, a software library with an API similar to zlib

Various command shortcuts exist, such as lzma (for xz --format=lzma), unxz (for xz --decompress; analogous to gunzip) and xzcat (for unxz --stdout; analogous to zcat).

XZ Utils can compress and decompress both the xz and lzma file formats, but since the LZMA format is now legacy, XZ Utils compresses by default to xz.
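
The two container formats can be tried from Python's standard lzma module, which is built on liblzma. In this sketch, FORMAT_XZ selects the xz container and FORMAT_ALONE the legacy .lzma container; the sample data is arbitrary.

```python
import lzma

data = b"XZ Utils handles two container formats\n" * 200

# The default container, equivalent in format to what xz writes.
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# The legacy .lzma container, kept for compatibility.
legacy_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# xz files start with a fixed six-byte magic number.
print(xz_blob[:6])

# With the default FORMAT_AUTO, decompress() detects either container.
print(lzma.decompress(xz_blob) == lzma.decompress(legacy_blob) == data)
```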

Xar (archiver)

XAR (short for eXtensible ARchive format) is an open source file archiver and the archiver's file format. It was created within the OpenDarwin project and is used in Mac OS X 10.5 and later for software installation routines, as well as for browser extensions in Safari 5.0 and later. Xar replaced the use of gzipped pax files.

One development branch of RPM, RPM5, uses xar.

Zlib

zlib is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms including Linux, Mac OS X, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

The first public version of zlib, 0.9, was released on 1 May 1995 and was originally intended for use with the libpng image library. It is free software, distributed under the zlib license.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.