Audio signal processing

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves: longitudinal waves which travel through air, consisting of compressions and rarefactions. The level of an audio signal is typically expressed in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.

History
The motivation for audio signal processing began at the beginning of the 20th century with inventions like the telephone, phonograph, and radio that allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links.[1] The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid-20th century. Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory, and pulse-code modulation laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio from a computer, giving birth to computer music.

Analog signals

An analog audio signal is a continuous signal represented by an electrical voltage or current that is “analogous” to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via electrical circuits.

Historically, before the advent of widespread digital technology, analog was the only method by which to manipulate a signal. Since that time, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. However, in music applications analog technology is often still desirable, as it produces nonlinear responses that are difficult to replicate with digital filters.

Digital signals

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing.[2]

Application areas

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.).

Audio broadcasting

Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust the overall loudness to the desired level.

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.
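The cancellation principle can be sketched in a few lines. The example below is a hypothetical numerical illustration; a real controller must estimate the noise and the acoustic path adaptively (e.g., with an LMS filter), since a perfect inverted copy is never directly available.

```python
import numpy as np

# Sketch of the cancellation principle: the "anti-noise" signal is the
# unwanted noise with inverted polarity; summing the two at the listener
# cancels them by destructive interference.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
noise = 0.5 * np.sin(2 * np.pi * 120.0 * t)   # unwanted 120 Hz hum
anti_noise = -noise                            # same signal, opposite polarity
residual = noise + anti_noise                  # what the listener hears

print(np.max(np.abs(residual)))  # → 0.0
```

In practice the residual is never exactly zero: any error in amplitude, phase, or timing of the anti-noise signal leaves some noise uncancelled, which is why active noise control works best at low frequencies.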

Audio synthesis

Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.

Audio effects

Audio effects are systems designed to alter how an audio signal sounds. Unprocessed audio is metaphorically referred to as dry, while processed audio is referred to as wet.[3]

  • delay or echo - To simulate the effect of reverberation in a large hall or cavern, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be on the order of 35 milliseconds or more. Short of actually playing a sound in the desired environment, the effect of echo can be implemented using either digital or analog methods. Analog echo effects are implemented using tape delays or bucket-brigade devices. When large numbers of delayed signals are mixed, a reverberation effect is produced; the resulting sound has the effect of being presented in a large room.
  • flanger - to create an unusual sound, a delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms). This effect is now done electronically using DSP, but originally the effect was created by playing the same recording on two synchronized tape players, and then mixing the signals together. As long as the machines were synchronized, the mix would sound more or less normal, but if the operator placed his finger on the flange of one of the players (hence "flanger"), that machine would slow down and its signal would fall out of phase with its partner, producing a phasing comb filter effect. Once the operator took his finger off, the player would speed up until it was back in phase with the master, and as this happened, the phasing effect would appear to slide up the frequency spectrum. This phasing up-and-down the register can be performed rhythmically.
  • phaser - another way of creating an unusual sound; the signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter. The phaser effect was originally a simpler implementation of the flanger effect since delays were difficult to implement with analog equipment.
  • chorus - a delayed signal is added to the original signal with a constant delay. The delay has to be short in order not to be perceived as echo, but above 5 ms to be audible. If the delay is too short, it will destructively interfere with the un-delayed signal and create a flanging effect. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization - different frequency bands are attenuated or boosted to produce desired spectral characteristics. Moderate use of equalization (often abbreviated as "EQ") can be used to "fine-tune" the tone quality of a recording; extreme use of equalization, such as heavily cutting a certain frequency can create more unusual effects.
  • filtering - Equalization is a form of filtering. In the general sense, frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters. Band-pass filtering of voice can simulate the effect of a telephone because telephones use band-pass filters.
  • overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic (e.g., the radio chatter between starfighter pilots in the science fiction film Star Wars). The most basic overdrive effect involves clipping the signal when its absolute value exceeds a certain threshold.
  • pitch shift - this effect shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice. Another application of pitch shifting is pitch correction. Here a musical signal is tuned to the correct pitch using digital signal processing techniques. This effect is ubiquitous in karaoke machines and is often used to assist pop singers who sing out of tune. It is also used intentionally for aesthetic effect in such pop songs as Cher's "Believe" and Madonna's "Die Another Day".
  • time stretching - the complement of pitch shift, that is, the process of changing the speed of an audio signal without affecting its pitch.
  • resonators - emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
  • robotic voice effects are used to make an actor's voice sound like a synthesized human voice.
  • modulation - to change the frequency or amplitude of a carrier signal in relation to a predefined signal. Ring modulation, a form of amplitude modulation, is an effect made famous by Doctor Who's Daleks and commonly used throughout sci-fi.
  • compression - the reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects - place sounds outside the stereo basis
  • reverse echo - a swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse. When played back forward, the last echoes are heard before the effected sound, creating a rush-like swell preceding and during playback. Jimmy Page of Led Zeppelin used this effect in the bridge of "Whole Lotta Love".[4][5][6]
  • wave field synthesis - a spatial audio rendering technique for the creation of virtual acoustic environments
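Several of the effects above reduce to a few lines of arithmetic on the sample array. The sketch below illustrates two of them, a single-reflection echo and hard-clipping overdrive, using arbitrary assumed parameter values (a 440 Hz test tone at an 8 kHz sample rate); it is an outline of the technique, not a production implementation.

```python
import numpy as np

def echo(signal, delay_samples, decay=0.5):
    """Single-reflection echo: add a delayed, attenuated copy of the input."""
    out = np.concatenate([signal, np.zeros(delay_samples)])
    out[delay_samples:] += decay * signal
    return out

def overdrive(signal, threshold=0.5):
    """Basic overdrive: hard-clip samples whose magnitude exceeds a threshold."""
    return np.clip(signal, -threshold, threshold)

fs = 8000                                         # assumed sample rate, Hz
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # one second of a 440 Hz tone
wet = echo(x, delay_samples=int(0.35 * fs))       # 350 ms: clearly heard as echo
clipped = overdrive(x)
```

Chaining the `echo` call with progressively shorter delays and decays approximates the many mixed reflections of a reverberation effect, and `overdrive` with a low threshold gives the heavily distorted "fuzz" character.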

See also


  1. ^ Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio Signal Processing and Coding. Hoboken, NJ: John Wiley & Sons. p. 464. ISBN 0-471-79147-4.
  2. ^ Zölzer, Udo (1997). Digital Audio Signal Processing. John Wiley and Sons. ISBN 0-471-97226-6.
  3. ^ Hodgson, Jay (2010). Understanding Records, p.95. ISBN 978-1-4411-5607-5.
  4. ^ "WHOLE LOTTA LOVE by LED ZEPPELIN". Retrieved 5 January 2018.
  5. ^ O'Neil, Bill. "Page's Studio Tricks III (Backwards echo)". Retrieved 5 January 2018.
  6. ^ Audrey. "The History of Reverse Reverb". Retrieved 5 January 2018.

Further reading

Acoustical engineering

Acoustical engineering (also known as acoustic engineering) is the branch of engineering dealing with sound and vibration. It is the application of acoustics, the science of sound and vibration, in technology. Acoustical engineers are typically concerned with the design, analysis and control of sound.

One goal of acoustical engineering can be the reduction of unwanted noise, which is referred to as noise control. Unwanted noise can have significant impacts on animal and human health and well-being, reduce attainment by students in schools, and cause hearing loss. Noise control principles are implemented into technology and design in a variety of ways, including control by redesigning sound sources, the design of noise barriers, sound absorbers, suppressors, and buffer zones, and the use of hearing protection (earmuffs or earplugs).

But acoustical engineering is not just about noise control; it also covers positive uses of sound, from the use of ultrasound in medicine to the programming of digital sound synthesizers, and from designing a concert hall to enhance the sound of an orchestra to specifying a railway station's sound system so announcements are intelligible.

Aphex Systems

Aphex is a brand of audio signal processing equipment. Aphex Systems was founded in 1975 in Massachusetts. The company changed its name to Aphex in 2010.

Audio engineer

An audio engineer (also known as a sound engineer or recording engineer) helps to produce a recording or a live performance, balancing and adjusting sound sources using equalization and audio effects, mixing, reproduction, and reinforcement of sound. Audio engineers work on the "...technical aspect of recording—the placing of microphones, pre-amp knobs, the setting of levels. The physical recording of any project is done by an engineer ... the nuts and bolts." It is a creative profession in which musical instruments and technology are used to produce sound for film, radio, television, music, and video games. Audio engineers also set up, sound check and do live sound mixing using a mixing console and a sound reinforcement system for music concerts, theatre, sports games and corporate events.

Alternatively, audio engineer can refer to a scientist or professional engineer who holds an engineering degree and who designs, develops and builds audio or musical technology working under terms such as acoustical engineering, electronic/electrical engineering or (musical) signal processing.

Computer Music Journal

Computer Music Journal is a peer-reviewed academic journal that covers a wide range of topics related to digital audio signal processing and electroacoustic music. It is published on-line and in hard copy by MIT Press. The journal is accompanied by an annual CD/DVD that collects audio and video work by various electronic artists. Computer Music Journal was established in 1977. According to the Journal Citation Reports, the journal has a 2016 impact factor of 0.405.

DirectX Media

DirectX Media is a set of multimedia-related APIs for Microsoft Windows complementing DirectX. It included DirectAnimation for 2D/3D web animation, DirectShow for multimedia playback and streaming media, DirectX Transform for web interactivity, and Direct3D Retained Mode for higher level 3D graphics. DirectShow additionally contained DirectX plugins for audio signal processing and DirectX Video Acceleration for accelerated video playback.

DirectX Media runtime components were distributed as part of Internet Explorer. DirectX Media SDK and DirectX SDK existed as two separate SDKs until DirectX 6.0. Later on, Microsoft deprecated DirectX Media and integrated DirectShow, the key part of DirectX Media, into DirectX. As of April 2005, DirectShow was removed from DirectX and moved to the Microsoft Platform SDK instead. DirectX is, however, still required to build the DirectShow samples. DirectShow and its components are to be gradually deprecated in favor of the newer Media Foundation.

Retained Mode was used by a variety of applications and can still be implemented on systems newer than XP by copying the d3drm.dll file from an older version of Windows to the system32 directory (for 32 bit Windows) or SysWOW64 directory (for 64 bit Windows) to regain system-wide support.


In audio signal processing and acoustics, echo is a reflection of sound that arrives at the listener with a delay after the direct sound. The delay is directly proportional to the distance of the reflecting surface from the source and the listener. Typical examples are the echo produced by the bottom of a well, by a building, or by the walls of an enclosed, empty room. A true echo is a single reflection of the sound source. The word echo derives from the Greek ἠχώ (ēchō), itself from ἦχος (ēchos), "sound". In Greek folklore, Echo is a mountain nymph who was cursed so that she could only repeat the last words anyone spoke to her. Some animals, such as cetaceans (dolphins and whales) and bats, use echo for location sensing and navigation.
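The proportionality between distance and delay is simple round-trip arithmetic, sketched below with the speed of sound taken as roughly 343 m/s (air at 20 °C):

```python
# For a reflecting surface at distance d from a co-located source and
# listener, the sound travels there and back, so the delay is 2*d / c.
SPEED_OF_SOUND = 343.0  # m/s, in air at about 20 °C

def echo_delay_seconds(distance_m):
    """Round-trip delay of an echo from a surface `distance_m` metres away."""
    return 2.0 * distance_m / SPEED_OF_SOUND

# A wall 17.15 m away returns an echo after about 0.1 s, comfortably
# beyond the roughly 35 ms threshold for hearing a distinct echo.
print(round(echo_delay_seconds(17.15), 3))  # → 0.1
```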

Exciter (effect)

An exciter (also called a harmonic exciter or aural exciter) is an audio signal processing technique used to enhance a signal by dynamic equalization, phase manipulation, harmonic synthesis of (usually) high frequency signals, and through the addition of subtle harmonic distortion. Dynamic equalization involves variation of the equalizer characteristics in the time domain as a function of the input. Because the equalization varies with the signal, noise is reduced compared to static equalizers. Harmonic synthesis involves the creation of higher-order harmonics from the fundamental frequency signals present in the recording. As noise is usually more prevalent at higher frequencies, the harmonics are derived from a purer frequency band, resulting in clearer highs. Exciters are also used to synthesize harmonics of low frequency signals to simulate deep bass in smaller speakers.
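The harmonic-synthesis step can be illustrated with a hypothetical even-order waveshaper; real exciters use more refined nonlinearities plus filtering, but the principle, a nonlinearity creating energy at multiples of the input frequency, is the same:

```python
import numpy as np

fs, f0, n = 8000, 500, 8000          # sample rate, test-tone frequency, length
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f0 * t)       # pure tone: energy only at f0

# Hypothetical waveshaper: a mild even-order nonlinearity. Squaring the
# input generates a component at 2*f0 that was absent from the original.
excited = x + 0.2 * x**2

# With n = fs, each FFT bin is exactly 1 Hz wide.
spectrum = np.abs(np.fft.rfft(excited)) / n
print(spectrum[2 * f0] > spectrum[2 * f0 + 100])  # → True
```

A third-order term (`x**3`) would instead add an odd harmonic at 3*f0; commercial units blend such terms and high-pass filter the result before mixing it back in.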

Originally made in valve (tube) based equipment, exciters are now implemented as part of a digital signal processor, often trying to emulate analogue exciters. They are mostly found as plug-ins for sound editing software and in sound enhancement processors.

Feedback suppressor

A feedback suppressor is an audio signal processing device which is used in the signal path in a live sound reinforcement system to prevent or suppress audio feedback.

Headroom (audio signal processing)

In digital and analog audio, headroom refers to the amount by which the signal-handling capabilities of an audio system exceed a designated nominal level. Headroom can be thought of as a safety zone allowing transient audio peaks to exceed the nominal level without damaging the system or the audio signal, e.g., via clipping. Standards bodies differ in their recommendations for nominal level and headroom.
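The relationship can be sketched numerically; the levels below are illustrative figures for a digital system, not any standard's recommendation:

```python
import math

def db_from_amplitude(a):
    """Convert a linear amplitude ratio to decibels."""
    return 20.0 * math.log10(a)

# Illustrative figures: a converter clips at full scale (amplitude 1.0)
# while the nominal alignment level sits at amplitude 0.1 (-20 dBFS).
# Headroom is the gap between the two, here 20 dB: transient peaks up to
# 20 dB above nominal pass through without clipping.
headroom = db_from_amplitude(1.0) - db_from_amplitude(0.1)
print(round(headroom, 1))  # → 20.0
```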

IEEE James L. Flanagan Speech and Audio Processing Award

The IEEE James L. Flanagan Speech and Audio Processing Award is a Technical Field Award presented by the IEEE for an outstanding contribution to the advancement of speech and/or audio signal processing. It may be presented to an individual or a team of up to three people. The award was established by the IEEE Board of Directors in 2002. The award is named after James L. Flanagan, who was a scientist from Bell Labs where he worked on acoustics for many years.

Recipients of this award receive a bronze medal, certificate and honorarium.


LADSPA is an acronym for Linux Audio Developer's Simple Plugin API. It is an application programming interface (API) standard for handling audio filters and audio signal processing effects, licensed under the GNU Lesser General Public License (LGPL). It was originally designed for Linux through consensus on the Linux Audio Developers Mailing List, but works on a variety of other platforms. It is used in many free audio software projects and there is a wide range of LADSPA plug-ins available.

LADSPA exists primarily as a header file written in the programming language C.

There are many audio plugin standards and most major modern software synthesizers and sound editors support a variety. The best known standard is probably Steinberg's Virtual Studio Technology (VST). LADSPA is unusual in that it attempts to provide only the "Greatest Common Divisor" of other standards. This means that its scope is limited, but it is simple and plugins written using it are easy to embed in many other programs. The standard has changed little with time, so compatibility problems are rare.

DSSI extends LADSPA to cover instrument plugins.

LV2 is a successor, based on LADSPA and DSSI, but permitting easy extensibility, allowing custom user interfaces, MIDI messages, and custom extensions.

Linear predictive coding

Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good quality speech at a low bit rate; it provides extremely accurate estimates of speech parameters.
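The core idea, predicting each sample as a linear combination of the samples that precede it, can be sketched with a least-squares fit. This is a simplified stand-in for the autocorrelation/Levinson-Durbin method used in practice, and the test signal is a sinusoid, which a 2nd-order predictor models exactly:

```python
import numpy as np

def lpc(signal, order):
    """Estimate linear-prediction coefficients by least squares:
    predict each sample from the `order` samples preceding it."""
    n = len(signal)
    # Each row holds the preceding samples (most recent first);
    # the target is the sample that follows them.
    X = np.array([signal[i:i + order][::-1] for i in range(n - order)])
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# A pure sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so a 2nd-order
# predictor reconstructs it almost perfectly.
t = np.arange(200)
x = np.sin(0.3 * t)
a = lpc(x, order=2)
pred = a[0] * x[1:-1] + a[1] * x[:-2]
print(np.max(np.abs(x[2:] - pred)) < 1e-8)  # → True
```

In speech coding, only the few coefficients (plus a residual/excitation description) are transmitted instead of the raw samples, which is where the compression comes from.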

List of audio programming languages

This is a list of notable programming languages optimized for sound production, algorithmic composition, and sound synthesis.

ABC notation, a language for notating music using the ASCII character set

ChucK, strongly timed, concurrent, and on-the-fly audio programming language

Real-time Cmix, a MUSIC-N synthesis language somewhat similar to Csound

Common Lisp Music (CLM), a music synthesis and signal processing package in the Music V family

Csound, a MUSIC-N synthesis language released under the LGPL with many available unit generators

Extempore, a live-coding environment which borrows a core foundation from the Impromptu environment

FAUST, Functional Audio Stream, a functional compiled language for efficient real-time audio signal processing

Hierarchical Music Specification Language (HMSL), optimized more for music than synthesis, developed in the 1980s in Forth

Impromptu, a Scheme language environment for Mac OS X capable of sound and video synthesis, algorithmic composition, and 2D and 3D graphics programming

JFugue, a Java and JVM library for programming music that outputs to MIDI and has the ability to convert to formats including ABC Notation, Lilypond, and MusicXML



Kyma (sound design language)

Max/MSP, a proprietary, modular visual programming language aimed at sound synthesis for music

Music Macro Language (MML), often used to produce chiptune music in Japan

Music21, an algorithmic composition toolkit written in Python, allowing integration of machine learning for advanced musicology and algorithmic composition

MUSIC-N, includes versions I, II, III, IV, IV-B, IV-BF, V, 11, and 360



Pure Data, a modular visual programming language for signal processing aimed at music creation


Sonic Pi

Structured Audio Orchestra Language (SAOL), part of the MPEG-4 Structured Audio standard



Audulus, a visual programming language and suite in which individual nodes modeled on modular-synthesis components are combined into modules, or pre-built modules from a community-built library are used, to design custom patches for sound synthesis. It bears some similarity to Max/MSP.

Phasing (disambiguation)

Phasing may refer to:

Phasing, a technique in musical composition

Phasing, the use of the Phaser (effect), an audio signal processing technique


Q-LAN is the audio-over-IP networking technology component of the Q-Sys audio signal processing platform from QSC Audio Products.

Speech coding

Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. The two most important applications of speech coding are mobile telephony and voice over IP. The techniques employed in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 Hz to 3500 Hz is transmitted, but the reconstructed signal is still adequate for intelligibility.
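The voiceband-limiting idea can be sketched with a crude FFT-based filter. This is a hypothetical illustration; real voiceband codecs use proper filter designs, not spectral zeroing:

```python
import numpy as np

def bandlimit(signal, fs, low=400.0, high=3500.0):
    """Crude FFT band-pass: zero all components outside [low, high] Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 8000                          # the classic telephony sample rate
t = np.arange(fs) / fs
# Test signal: one component below the voiceband, one inside it.
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 1000 * t)
y = bandlimit(x, fs)
# The 200 Hz component is removed; the 1000 Hz component survives,
# mimicking what a voiceband channel passes on to the listener.
```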

Speech coding differs from other forms of audio coding in that speech is a simpler signal than most other audio signals, and a lot more statistical information is available about the properties of speech. As a result, some auditory information which is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech, with a constrained amount of transmitted data. In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.

Speech enhancement

Speech enhancement aims to improve speech quality by using various algorithms.

The objective of enhancement is improvement in intelligibility and/or overall perceptual quality of degraded speech signal using audio signal processing techniques.

Enhancement of speech degraded by noise, or noise reduction, is the most important field of speech enhancement, and is used in many applications such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids.


Universal Audio (company)

Universal Audio is a designer and importer of audio signal processing hardware and DSP software founded in 1958 by Bill Putnam. The current incarnation of the company was re-established in 1999 by brothers Bill Putnam, Jr. and Jim Putnam. Universal Audio merged with Kind of Loud Technologies to "...reproduce classic analog recording equipment designed by their father and his colleagues," and "...research and design new recording tools in the spirit of vintage analog technology." Universal Audio replicates modern versions of vintage UREI and Teletronix designs. Universal Audio also designs and imports DSP cards and audio plugins for music production on the UAD-2 platform. The company has won several TEC Awards and a FutureMusic Platinum award. The founder's son, Bill Putnam Jr., is CEO.

Waves Audio

Waves Audio Ltd. is a developer and supplier of professional digital audio signal processing technologies and audio effects, used in recording, mixing, mastering, post production, broadcast, and live sound. The company's corporate headquarters and main development facilities are located in Tel Aviv, with additional offices in the United States, China, and Taiwan, and development centers in India and Ukraine.

In 2011, Waves won a Technical Grammy Award.

This page is based on Wikipedia articles written by their contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.