Filename extension

A filename extension is an identifier specified as a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A file extension is typically delimited from the filename with a full stop (period), but in some systems it is separated with spaces.

Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction.

Usage

Filename extensions may be considered a type of metadata.[1] They are commonly used to imply information about the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the specific filesystem used; usually the extension is the substring which follows the last occurrence, if any, of the dot character (example: txt is the extension of the filename readme.txt, and html the extension of mysite.index.html). On file systems of some mainframe systems such as CMS in VM, VMS, and of PC systems such as CP/M and derivative systems such as MS-DOS, the extension is a separate namespace from the filename. Under Microsoft's DOS and Windows, extensions such as EXE, COM or BAT indicate that a file is a program executable. In OS/360 and successors, the part of the dataset name following the last period is treated as an extension by some software, e.g., TSO EDIT, but it has no special significance to the operating system itself; the same applies to Unix files in MVS.
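
The common "last dot" rule described above can be sketched in a few lines of Python. The helper name split_extension is hypothetical, and treating a leading-dot name such as ".profile" as having no extension is an assumption made for this sketch; actual rules depend on the file system in use.

```python
def split_extension(filename: str) -> tuple[str, str]:
    """Split a file name at its last dot and return (base, extension).

    The extension is returned without the dot. A name with no dot, or
    whose only dot is the leading character (e.g. ".profile"), is
    treated here as having no extension (an assumption of this sketch).
    """
    base, dot, ext = filename.rpartition(".")
    if not dot or not base:
        return filename, ""
    return base, ext


print(split_extension("readme.txt"))         # ('readme', 'txt')
print(split_extension("mysite.index.html"))  # ('mysite.index', 'html')
print(split_extension("Makefile"))           # ('Makefile', '')
```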

Filesystems for UNIX-like operating systems do not separate the extension metadata from the rest of the file name. The dot character is just another character in the main filename. A file name can have no extensions, a single extension, or more than one extension. More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that the file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user. It is more common, especially in binary files, for the file itself to contain internal metadata describing its contents. This model generally requires the full filename to be provided in commands, whereas the metadata approach often allows the extension to be omitted.
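
As an illustration of nested extensions, Python's standard pathlib module can list every suffix of a name such as files.tar.gz and peel them off one at a time. This is only a sketch of the naming convention; it does not decompress or unpack anything.

```python
from pathlib import PurePosixPath

name = PurePosixPath("files.tar.gz")

# Every dot-separated suffix, with the outermost transformation last.
print(name.suffixes)  # ['.tar', '.gz']

# Remove the suffixes one layer at a time, mirroring the order in which
# the transformations would be undone (gunzip first, then tar extraction).
without_gz = name.with_suffix("")          # files.tar
without_tar = without_gz.with_suffix("")   # files
print(without_gz, without_tar)
```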

The VFAT, NTFS, and ReFS file systems for Windows also do not separate the extension metadata from the rest of the file name, and allow multiple extensions.

With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon.

The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked. macOS, however, uses filename suffixes, as well as type and creator codes, as a consequence of being derived from the UNIX-like NeXTSTEP operating system.

Improvements

The filename extension was originally used to determine the file's generic type. The need to condense a file's type into three characters frequently led to abbreviated extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in a variety of ways, filename extensions started to become closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Conflicting uses of some filename extensions also developed. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files.[2] Others are .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures;[3] .gba, shared by GrabIt scripts and Game Boy Advance ROM images;[4] and .sb, used for SmallBasic and Scratch.

Some other operating systems that used filename extensions generally had much more liberal sizes for filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths up to 255 were not uncommon. The file systems in operating systems such as Multics and UNIX stored the file name as a single string, not split into base name and extension components, with the "." being just another character allowed in file names. Such systems generally allow for variable-length filenames, permitting more than one dot, and hence multiple suffixes. Some components of Multics and UNIX, and applications running on them, used suffixes, in some cases, to indicate file types, but they did not use them as much; for example, executables and ordinary text files had no suffixes in their names.

The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. The convention of using suffixes continued, even though HPFS supported extended attributes for files, allowing a file's type to be stored in the file as an extended attribute.

NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension, but again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows.

When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM, while those using Macintosh or UNIX computers could use the recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language, since it requires source code files to have the four-letter suffix .java and compiles object code output files with the five-letter .class suffix.[5]
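
The length limits behind these problems can be sketched as a simple check. The function name fits_8_3 is hypothetical, and the check is deliberately simplified: it looks only at the lengths of the two parts and ignores the character restrictions that real 8.3 file systems also impose.

```python
def fits_8_3(filename: str) -> bool:
    """Return True if the name fits the 8.3 length limits.

    Simplified sketch: requires at most one dot, a base name of 1 to 8
    characters, and an extension of at most 3 characters. Restrictions
    on which characters are allowed are not checked.
    """
    base, _dot, ext = filename.partition(".")
    if "." in ext:
        return False  # more than one dot
    return 1 <= len(base) <= 8 and len(ext) <= 3


for name in ["INDEX.HTM", "index.html", "Main.java", "Main.class"]:
    print(name, fits_8_3(name))
```

Under these assumptions only INDEX.HTM passes; the four-letter .html and .java suffixes and the five-letter .class suffix all exceed the three-character limit.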

Eventually, Windows 95 introduced support for long file names, and removed the 8.3 name/extension split in file names from non-NT Windows, in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is largely considered to be a kludge, but it removed the important length restriction and allowed files to have a mix of upper case and lower case letters, on machines that would not run Windows NT well. However, the use of three-character extensions under Microsoft Windows has continued, originally for backward compatibility with older versions of Windows and now by habit, along with the problems it creates.

Command name issues

A filename extension occasionally appears in a command name, usually as a side effect of the command having been implemented as a script (e.g., for the Bourne shell or for Python) with the interpreter name suffixed to the command name. This practice is common on systems that rely on associations between filename extension and interpreter, but sharply deprecated[6] in UNIX-derived systems like Linux and Apple's macOS, where the interpreter is normally specified as a header in the script ("shebang").

On association-based systems, the filename extension is generally mapped to a single, system-wide selection of interpreter for that extension (such as ".py" meaning to use Python), and the command itself is runnable from the command line even if the extension is omitted (assuming appropriate setup is done). If the implementation language is changed, the command name extension is changed as well, and the OS provides a consistent API by allowing the same extension-less version of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, and from the fact that developers do not consistently omit extensions when calling programs and cannot be forced to do so. Windows is the only remaining widespread employer of this mechanism.

On systems with interpreter directives, including virtually all versions of Unix, command name extensions have no special significance, and are by standard practice not used, since the primary method to set interpreters for scripts is to start them with a single line specifying the interpreter to use (which could be viewed as a degenerate resource fork). In these environments, including the extension in a command name unnecessarily exposes an implementation detail which puts all references to the commands from other programs at future risk if the implementation changes. For example, it would be perfectly normal for a shell script to be reimplemented in Python or Ruby, and later in C or C++, all of which would change the name of the command were extensions used. Without extensions, a program always has the same extension-less name, with only the interpreter directive and/or magic number changing, and references to the program from other programs remain valid.
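
To illustrate, the sketch below reads the interpreter directive from the first line of a script. The function name and the two sample scripts are hypothetical, and splitting the directive on whitespace is a simplification for display; operating systems differ in how they pass the rest of the line to the interpreter.

```python
from typing import List, Optional


def interpreter_from_shebang(script_text: str) -> Optional[List[str]]:
    """Return the words of the interpreter directive, or None if absent."""
    first_line = script_text.splitlines()[0] if script_text else ""
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].split()


# The same extension-less command, first as a shell script and later
# reimplemented in Python: only the first line changes, not the name.
shell_version = "#!/bin/sh\necho 'backing up'\n"
python_version = "#!/usr/bin/env python3\nprint('backing up')\n"

print(interpreter_from_shebang(shell_version))   # ['/bin/sh']
print(interpreter_from_shebang(python_version))  # ['/usr/bin/env', 'python3']
```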

Security issues

The default behavior of File Explorer, the file browser provided with Microsoft Windows, is for filename extensions to not be displayed. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that this will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case, written in VBScript. Default behavior for ReactOS is to display file extensions in ReactOS Explorer.
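
The effect of hiding the final extension can be sketched as follows. Both function names are hypothetical, and the logic is a rough model of the display behavior described above rather than the actual implementation of any file browser.

```python
import os.path


def displayed_name(filename: str, hide_extension: bool = True) -> str:
    """Return the name as a file browser might show it."""
    if not hide_extension:
        return filename
    base, _ext = os.path.splitext(filename)
    return base


def looks_like_double_extension(filename: str) -> bool:
    """Flag names that still appear to end in an extension once the
    real (final) extension has been hidden."""
    base, ext = os.path.splitext(filename)
    return bool(ext) and bool(os.path.splitext(base)[1])


name = "LOVE-LETTER-FOR-YOU.TXT.vbs"
print(displayed_name(name))               # LOVE-LETTER-FOR-YOU.TXT
print(looks_like_double_extension(name))  # True
```

A check this simple also flags legitimate names such as files.tar.gz, so it would at most be one signal among several.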

Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.

Some viruses take advantage of the similarity between the ".com" top-level domain and the ".COM" file extension by emailing malicious, executable command-file attachments under names superficially similar to URLs (e.g., "myparty.yahoo.com"), with the effect that some naive users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments.

There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension.

The file extension is just a marker, and the content of the file does not have to match it.[7] This can be used to disguise malicious content. When trying to identify a file for security reasons, it is therefore considered dangerous to rely on the extension alone, and a proper analysis of the content of the file is preferred. For example, on UNIX-derived systems, it is not uncommon to find files with no extensions at all, as commands such as file are meant to be used instead, and will read the file's header to determine its content.
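
A minimal sketch of content-based identification, in the spirit of the file command, compares the first bytes of a file against a few well-known signatures. The table below is a tiny illustrative subset, and the names SIGNATURES and sniff are hypothetical; real tools use far larger databases and more elaborate rules.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def sniff(header: bytes) -> str:
    """Identify content from its first bytes, ignoring the file name."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


# A file named "holiday.png" whose content is actually a ZIP archive
# would be reported by its content, not by its name.
print(sniff(b"PK\x03\x04" + b"\x00" * 8))  # ZIP archive
```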

Alternatives

In many Internet protocols, such as HTTP and MIME email, the type of a bitstream is stated as the media type, or MIME type, of the stream, rather than a filename extension. This is given in a line of text preceding the stream, such as Content-type: text/plain.
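
As a small illustration, Python's standard email package can read the media type from such a header line. The message text below is a made-up example.

```python
from email import message_from_string

# A header line stating the media type, a blank line, then the content.
raw = "Content-type: text/plain\n\nHello, world.\n"
message = message_from_string(raw)

print(message.get_content_type())  # text/plain
print(message.get_payload())
```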

There is no standard mapping between filename extensions and media types, resulting in possible mismatches in interpretation between authors, web servers, and client software when transferring files over the Internet. For instance, a content author may specify the extension svgz for a compressed Scalable Vector Graphics file, but a web server that does not recognize this extension may not send the proper content type image/svg+xml and its required compression header (Content-Encoding: gzip), leaving web browsers unable to correctly interpret and display the image.
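
The svgz mismatch can be sketched with a hypothetical server-side lookup table. The names KNOWN_TYPES and response_headers are illustrative, the sketch assumes the registered SVG media type image/svg+xml with gzip content encoding, and the fallback to application/octet-stream is one common choice rather than a rule.

```python
# Hypothetical server table: extension -> (media type, content encoding).
KNOWN_TYPES = {
    ".svg": ("image/svg+xml", None),
    ".svgz": ("image/svg+xml", "gzip"),
    ".txt": ("text/plain", None),
}


def response_headers(filename: str, table=KNOWN_TYPES) -> dict:
    """Choose response headers for a file based only on its extension."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    media_type, encoding = table.get(ext, ("application/octet-stream", None))
    headers = {"Content-Type": media_type}
    if encoding:
        headers["Content-Encoding"] = encoding
    return headers


# A server that knows the extension sends both headers...
print(response_headers("logo.svgz"))

# ...while one that does not falls back to a generic type, and the
# browser cannot tell that the body is a compressed SVG image.
print(response_headers("logo.svgz", table={".txt": ("text/plain", None)}))
```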

BeOS, whose BFS file system supports extended attributes, would tag a file with its media type as an extended attribute. The KDE and GNOME desktop environments associate a media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on that media type, reducing the dependency on filename extensions. macOS uses both filename extensions and media types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.

References

  1. Stauffer, Todd; McElhearn, Kirk (2006). Mastering Mac OS X. John Wiley & Sons. pp. 95–96. ISBN 9780782151282. Retrieved 2 October 2017.
  2. File Extension .RPM Details from filext.com
  3. File Extension .QIF Details from filext.com
  4. File Extension .GBA Details from filext.com
  5. "javac – Java programming language compiler". Sun Microsystems, Inc. 2004. Retrieved 2009-05-31. Source code file names must have .java suffixes, class file names must have .class suffixes, and both source and class files must have root names that identify the class.
  6. Commandname Extensions Considered Harmful
  7. "What Is a File Extension?".

.app

.app is a short form of the word application often used in the IT sector. It may refer to:

.app (gTLD), a top-level internet domain

.m2ts

M2TS is a filename extension used for the Blu-ray Disc Audio-Video (BDAV) MPEG-2 Transport Stream (M2TS) container file format. It is used for multiplexing audio, video and other streams. It is based on the MPEG-2 transport stream container. This container format is commonly used for high definition video on Blu-ray Disc and AVCHD.

Adobe Illustrator Artwork

Adobe Illustrator Artwork (AI) is a proprietary file format developed by Adobe Systems for representing single-page vector-based drawings in either the EPS or PDF formats. The .ai filename extension is used by Adobe Illustrator.

The AI file format was originally a native format called PGF. PDF compatibility is achieved by embedding a complete copy of the PGF data within the saved PDF format file. This format is unrelated to the Progressive Graphics Format, which uses the .pgf extension and shares the same abbreviation. The same “dual path” approach as for PGF is used when saving EPS-compatible files in recent versions of Illustrator. Early versions of the AI file format are true EPS files with a restricted, compact syntax, with additional semantics represented by Illustrator-specific DSC comments that conform to DSC's Open Structuring Conventions. These files are identical to their corresponding Illustrator EPS counterparts, but with the EPS procsets (procedure sets) omitted from the file and instead externally referenced using %%Include directives.

Analyze (imaging software)

Analyze is a software package developed by the Biomedical Imaging Resource (BIR) at Mayo Clinic for multi-dimensional display, processing, and measurement of multi-modality biomedical images. It is a commercial program and is used for medical tomographic scans from magnetic resonance imaging, computed tomography and positron emission tomography.

The Analyze 7.5 file format has been widely used in the functional neuroimaging field, and other programs such as SPM, FreeSurfer, AIR, MRIcro and Mango are able to read and write the format. The files can be used to store voxel-based volumes. One data item consists of two files: one file with the actual data in a binary format, with the filename extension .img, and a header file, with the filename extension .hdr, holding information about the data such as voxel size and number of voxels in each dimension. SPM has defined changes to this format, including to the voxel ordering within the file.
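
Because each data item is a pair of files that differ only in their extension, software typically derives one name from the other. A minimal sketch, with a hypothetical helper name and example path:

```python
from pathlib import PurePosixPath


def analyze_pair(path: str) -> tuple[str, str]:
    """Given either file of an Analyze data item, return (header, image)."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in (".hdr", ".img"):
        raise ValueError(f"not an Analyze file name: {path}")
    return str(p.with_suffix(".hdr")), str(p.with_suffix(".img"))


print(analyze_pair("scans/subject01.img"))
# ('scans/subject01.hdr', 'scans/subject01.img')
```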

Android application package

Android Package (APK) is the package file format used by the Android operating system for distribution and installation of mobile apps and middleware.

APK files are analogous to other software packages such as APPX in Microsoft Windows or a Debian package in Debian-based operating systems. To make an APK file, a program for Android is first compiled, and then all of its parts are packaged into one container file. An APK file contains all of a program's code (such as .dex files), resources, assets, certificates, and manifest file. As is the case with many file formats, APK files can have any name needed, provided that the file name ends in the file extension ".apk".

APK files are a type of archive file, specifically in zip format-type packages, based on the JAR file format, with .apk as the filename extension. The MIME type associated with APK files is application/vnd.android.package-archive.

APK files can be installed on Android-powered devices just like installing software on a PC. When a user downloads and installs an Android application, from either an official source (such as the Google Play Store), or from an unofficial site, they are installing an APK file on to their device. A user or developer can also install an APK file directly to a device (that is, not via download from the network) from a desktop computer, using a communication program such as adb, or from within a file manager app in a process known as sideloading. The installation of APK files downloaded outside of Google Play is disabled by default. Users can install unknown APK files by enabling "Unknown sources" from "Accounts and Security" in Settings.
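
Because an APK is a ZIP-format archive, generic ZIP tools can list its contents. The sketch below builds a tiny stand-in archive in memory and inspects it with Python's standard zipfile module; the entries contain placeholder bytes and do not form an installable package.

```python
import io
import zipfile

# Build a small stand-in archive in memory. The entry names mirror the
# kinds of parts an APK contains; the contents are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("AndroidManifest.xml", b"placeholder manifest")
    archive.writestr("classes.dex", b"placeholder code")
    archive.writestr("res/layout/main.xml", b"placeholder resource")

# Any ZIP reader recognizes the container, whatever the file is named.
buffer.seek(0)
print(zipfile.is_zipfile(buffer))  # True

with zipfile.ZipFile(buffer) as archive:
    entries = archive.namelist()
print(entries)
```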

Cabinet (file format)

Cabinet (or CAB) is an archive-file format for Microsoft Windows that supports lossless data compression and embedded digital certificates used for maintaining archive integrity. Cabinet files have .cab filename extensions and are recognized by their first 4 bytes MSCF. Cabinet files were known originally as Diamond files.
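
Recognition by the leading four bytes can be sketched as below. The function name is hypothetical, and the check covers only the MSCF signature mentioned above; it does not validate the rest of the archive.

```python
CAB_SIGNATURE = b"MSCF"


def is_cabinet(data: bytes) -> bool:
    """Return True if the data starts with the cabinet signature."""
    return data[:4] == CAB_SIGNATURE


print(is_cabinet(b"MSCF\x00\x00\x00\x00"))  # True
print(is_cabinet(b"PK\x03\x04"))            # False
```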

The CAB file format may employ the following compression algorithms:

DEFLATE – invented by Phil Katz, the author of the ZIP file format

Quantum compression – licensed from David Stafford, the author of the Quantum archiver

LZX – invented by Jonathan Forbes and Tomi Poutanen, given to Microsoft when Forbes joined the company

A CAB archive can reserve empty spaces in the archive as well as for each file in the archive, for some application-specific uses like digital signatures or arbitrary data. A variety of Microsoft installation technologies use the CAB format; these include Windows Installer, Setup API, Device Installer and AdvPack (used by Internet Explorer to install ActiveX components). CAB files are also often associated with self-extracting programs like IExpress where the executable program extracts the associated CAB file. CAB files are also sometimes embedded into other files. For example, MSI and MSU files (the latter are CAB files with just another filename extension) usually include one or more embedded CAB files.

Cartesian Perceptual Compression

Cartesian Perceptual Compression (abbreviated CPC, with filename extension .cpc) is a proprietary image file format. It was designed for high compression of black-and-white raster Document Imaging for archival scans.

CPC is lossy, has no lossless mode, and is restricted to bi-tonal images. The company which controls the patented format claims it is highly effective in the compression of text, black-and-white (halftone) photographs, and line art. The format is intended for use in the web distribution of legal documents, design plans, and geographical plot maps.

Viewing and converting documents in the CPC format currently requires the download of proprietary software. Although viewing CPC documents is free, as is converting CPC images to other formats, conversion to CPC format requires a purchase.

JSTOR, a United States-based online system for archiving academic journals, converted its online archives to CPC in 1997. The CPC files are used to reduce storage requirements for its online collection, but are temporarily converted on their servers to GIF for display, and to PDF for printing. JSTOR still scans to TIFF G4 and considers those files its preservation masters.

DirectDraw Surface

The DirectDraw Surface container file format (filename extension .dds) is a Microsoft format for storing data compressed with the proprietary S3 Texture Compression (S3TC) algorithm, which can be decompressed in hardware by GPUs. This makes the format useful for storing graphical textures and cubic environment maps as a data file, both compressed and uncompressed.

Doc (computing)

In computing, DOC or doc (an abbreviation of "document") is a filename extension for word processing documents, most commonly in the proprietary Microsoft Word Binary File Format. Historically, the extension was used for documentation in plain text, particularly of programs or computer hardware on a wide range of operating systems. During the 1980s, WordPerfect used DOC as the extension of its proprietary format. In 1983, Microsoft chose to use the DOC extension for its proprietary Microsoft Word format. The earlier uses of the extension have largely disappeared from the PC world.

FictionBook

FictionBook is an open XML-based e-book format which originated and gained popularity in Russia. FictionBook files have the .fb2 filename extension. Some readers also support ZIP-compressed FictionBook files (.fb2.zip or .fbz).

The FictionBook format does not specify the appearance of a document; instead, it describes its structure. For example, there are special tags for epigraphs, verses and quotations. All ebook metadata, such as author name, title, and publisher, are also present in the ebook file. This makes the format convenient for automatic processing, indexing, and ebook collection management, and allows automatic conversion into other formats.

File association

In computing, a file association associates a file with an application capable of opening that file. More commonly, a file association associates a class of files (usually determined by their filename extension, such as .txt) with a corresponding application (such as a text editor).
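
An extension-based association can be sketched as a simple lookup table. The table contents and application names below are placeholders, not the registry of any real system.

```python
import os.path

# Hypothetical association table: extension -> application name.
ASSOCIATIONS = {
    ".txt": "text-editor",
    ".html": "web-browser",
    ".png": "image-viewer",
}


def application_for(filename, associations=ASSOCIATIONS):
    """Return the associated application, or None if there is none."""
    ext = os.path.splitext(filename)[1].lower()
    return associations.get(ext)


print(application_for("README.TXT"))   # text-editor
print(application_for("archive.xyz"))  # None
```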

Interactive Ruby Shell

Interactive Ruby Shell (IRB or irb) is a REPL for programming in the object-oriented scripting language Ruby. The abbreviation irb comes from the fact that the filename extension for Ruby is ".rb", although interactive Ruby files do not have an extension of ".irb".

The program is launched from a command line and allows the execution of Ruby commands with immediate response, experimenting in real-time. It features command history, line editing capabilities, and job control, and is able to communicate directly as a shell script over the Internet and interact with a live server. It was developed by Keiju Ishitsuka.

Lzip

lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of usual Unix compression tools, such as gzip and bzip2.

As with gzip and bzip2, concatenation is supported to compress multiple files, but the convention is to bundle a file that is an archive itself, such as those created by the tar or cpio Unix programs. Lzip can split the output for the creation of multivolume archives.

The file that is produced by lzip is usually given .lz as its filename extension, and the data is described by the MIME type application/x-lzip.

The lzip suite of programs was written in C++ and C by Antonio Diaz Diaz and is being distributed as free software under the terms of version 2 or later of the GNU General Public License (GPL).

MHTML

MHTML, an initialism of MIME encapsulation of aggregate HTML documents, is a web page archive format used to combine, in a single computer file, the HTML code and its companion resources (such as images, Flash animations, Java applets, and audio and video files) that are represented by external hyperlinks in the web page's HTML code. The content of an MHTML file is encoded using the same techniques that were first developed for HTML email messages, using the MIME content type multipart/related. MHTML files use a .mhtml or .mht filename extension.

The first part of the file is an e-mail header. The second part is normally HTML code. Subsequent parts are additional resources identified by their original uniform resource locators (URLs) and encoded in base64 binary-to-text encoding. MHTML was proposed as an open standard, then circulated in a revised edition in 1999 as RFC 2557.
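
The layout described above can be approximated with Python's standard email package. This is a rough sketch of a multipart/related message, not a byte-exact reproduction of what any browser saves; the URLs, subject, and image bytes are placeholders.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The outer message carries the e-mail style header.
archive = MIMEMultipart("related")
archive["Subject"] = "Saved page"

# The main HTML document.
html = MIMEText(
    '<html><body><img src="http://example.com/logo.png"></body></html>',
    "html",
)
html["Content-Location"] = "http://example.com/index.html"
archive.attach(html)

# An additional resource, identified by its original URL and
# base64-encoded by default.
image = MIMEImage(b"placeholder image bytes", _subtype="png")
image["Content-Location"] = "http://example.com/logo.png"
archive.attach(image)

print(archive.get_content_type())  # multipart/related
print(archive.as_string()[:200])
```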

The .mhtml (Web archive) and .eml (email) filename extensions are interchangeable: either filename extension can be changed from one to the other. An .eml message can be sent by e-mail, and it can be displayed by an email client. An email message can be saved using a .mhtml or .mht filename extension and then opened for display in a web browser or for editing in other programs, including word processors and text editors.

MPEG-4 Part 14

MPEG-4 Part 14 or MP4 is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4. MPEG-4 Part 14 (formally ISO/IEC 14496-14:2003) is a standard specified as a part of MPEG-4.

Portable media players are sometimes advertised as "MP4 Players", although some are simply MP3 Players that also play AMV video or some other video format, and do not necessarily play the MPEG-4 Part 14 format.

MrSID

MrSID (pronounced Mister Sid) is an acronym that stands for multiresolution seamless image database. It is a file format (filename extension .sid) developed and patented by LizardTech for encoding of georeferenced raster graphics, such as orthophotos.

MrSID originated as the result of research efforts at Los Alamos National Laboratory (LANL).

NFO

NFO may refer to:

National Farmers Organization, a producerist movement founded in the United States in 1955

Naval Flight Officer in the United States Navy or United States Marine Corps

New Fund Offer, a term used in mutual funds in India

NFO, EMS code for Norwegian Forest Cat in the Fédération Internationale Féline

.nfo, a filename extension for de facto standard info text files accompanying compressed software

.nfo, a filename extension used for Folio Infobase data files

Property list

In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files.

Property list files are often used to store a user's settings. They are also used to store information about bundles and applications, a task served by the resource fork in the old Mac OS.
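
As an illustration, Python's standard plistlib module can serialize a settings dictionary to property list data and read it back. The keys and values below are made up for the example.

```python
import plistlib

# Hypothetical user settings.
settings = {
    "WindowWidth": 800,
    "RecentFiles": ["a.txt", "b.txt"],
    "DarkMode": True,
}

data = plistlib.dumps(settings)  # XML property list by default
restored = plistlib.loads(data)

print(data.decode("utf-8"))
print(restored == settings)  # True
```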

Uniform Type Identifier

A Uniform Type Identifier (UTI) is a text string used on software provided by Apple Inc. to uniquely identify a given class or type of item. Apple provides built-in UTIs to identify common system objects – document or image file types, folders and application bundles, streaming data, clipping data, movie data – and allows third party developers to add their own UTIs for application-specific or proprietary uses. Support for UTIs was added in the Mac OS X 10.4 operating system, integrated into the Spotlight desktop search technology, which uses UTIs to categorize documents. One of the primary design goals of UTIs was to eliminate the ambiguities and problems associated with inferring a file's content from its MIME type, filename extension, or type or creator code.

UTIs use a reverse-DNS naming structure. Names may include the ASCII characters A-Z, a-z, 0-9, hyphen ("-"), and period ("."), and all Unicode characters above U+007F. Colons and slashes are prohibited for compatibility with Macintosh and POSIX file path conventions. UTIs support multiple inheritance, allowing files to be identified with any number of relevant types, as appropriate to the contained data.
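
The character rules for UTI names can be sketched as a regular expression. The function name is hypothetical, and the check covers only the permitted characters listed above; it does not verify that a name is actually declared on any system.

```python
import re

# Permitted: ASCII letters, digits, hyphen, period, and any character
# above U+007F. Colons, slashes, and other ASCII punctuation are rejected.
_UTI_NAME = re.compile(r"[A-Za-z0-9.\-\u0080-\U0010FFFF]+")


def is_valid_uti_name(name: str) -> bool:
    """Return True if every character in the name is permitted."""
    return _UTI_NAME.fullmatch(name) is not None


print(is_valid_uti_name("public.jpeg"))            # True
print(is_valid_uti_name("com.example.my-format"))  # True
print(is_valid_uti_name("com.example/format"))     # False
```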

This page is based on a Wikipedia article.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.