Bit slicing

Bit slicing is a technique for constructing a processor from modules of processors of smaller bit width, for the purpose of increasing the word length; in theory to make an arbitrary n-bit CPU. Each of these component modules processes one bit field or "slice" of an operand. The grouped processing components would then have the capability to process the chosen full word-length of a particular software design.

Bit slicing more or less died out due to the advent of the microprocessor. Recently it's been used in ALUs for quantum computers, and has been used as a software technique (e.g. in x86 CPUs, for cryptography.[1])

Operational details

Bit slice processors usually include an arithmetic logic unit (ALU) of 1, 2, 4, 8 or 16 bits and control lines (including carry or overflow signals that are internal to the processor in non-bitsliced CPU designs).

For example, two 4-bit ALU chips could be arranged side by side, with control lines between them, to form an 8-bit ALU (result need not be power of two, e.g. three 1-bit can make a 3-bit ALU,[2] thus 3-bit (or n-bit) CPU, while 3-bit, or any CPU with higher odd-number of bits, hasn't been manufactured and sold in volume). Four 4-bit ALU chips could be used to build a 16-bit ALU. It would take eight chips to build a 32-bit word ALU. The designer could add as many slices as required to manipulate increasingly longer word lengths.

A microsequencer or control ROM would be used to execute logic to provide data and control signals to regulate function of the component ALUs.

Known bit-slice microprocessor modules:

  • 1-bit slice:
    • ...
  • 8-bit slice:
    • National IMP-8 family (1974), cascadable up to 32-bit
    • Texas Instruments SN54AS888 / SN74AS888
    • Fairchild 100K
    • ZMD U830C (1978/1981), cascadable up to 32 bit

Historical necessity

Bit slicing, although not called that at the time, was also used in computers before large scale integrated circuits (LSI, the predecessor to today's VLSI, or very-large-scale integration circuits). The first bit-sliced machine was EDSAC 2, built at the University of Cambridge Mathematical Laboratory in 1956–1958.

Prior to the mid-1970s and late 1980s there was some debate over how much bus width was necessary in a given computer system to make it function. Silicon chip technology and parts were much more expensive than today. Using multiple, simpler, and thus less expensive ALUs was seen as a way to increase computing power in a cost effective manner. While 32-bit architecture microprocessors were being discussed at the time, few were in production.

The UNIVAC 1100 series mainframes (one of the oldest series, originating in the 1950s) has a 36-bit architecture and the 1100/60 introduced in 1979 used nine Motorola MC10800 4-bit ALU[11] chips to implement the needed word width while using modern integrated circuits.[12]

At the time 16-bit processors were common but expensive, and 8-bit processors, such as the Z80, were widely used in the nascent home computer market.

Combining components to produce bit slice products allowed engineers and students to create more powerful and complex computers at a more reasonable cost, using off-the-shelf components that could be custom-configured. The complexities of creating a new computer architecture were greatly reduced when the details of the ALU were already specified (and debugged).

The main advantage was that bit slicing made it economically possible in smaller processors to use bipolar transistors, which switch much faster than NMOS or CMOS transistors. This allowed for much higher clock rates, where speed was needed; for example DSP functions or matrix transformation, or as in the Xerox Alto, the combination of flexibility and speed, before discrete CPUs were able to deliver that.

Modern use

Software use on non-bit-slice hardware

In more recent times, the term bit-slicing was re-coined by Matthew Kwan[13] to refer to the technique of using a general purpose CPU to implement multiple parallel simple virtual machines using general logic instructions to perform Single Instruction Multiple Data (SIMD) operations. This technique is also known as SIMD Within A Register (SWAR).

This was initially in reference to Eli Biham's 1997 paper A Fast New DES Implementation in Software,[14] which achieved significant gains in performance of DES by using this method.

Bit-sliced quantum computers

To simplify the circuit structure and reduce the hardware cost of quantum computers (proposed to run the MIPS32 instruction set) a 50 GHz superconducting "4-bit bit-slice arithmetic logic unit (ALU) for 32-bit rapid single-flux-quantum microprocessors was demonstrated."[15]

See also


  1. ^ Benadjila, Ryad; Guo, Jian; Lomné, Victor; Peyrin, Thomas (2014-03-21) [2013-07-15]. "Implementing Lightweight Block Ciphers on x86 Architectures". Cryptology Archive Report 2013/445.
  2. ^ "How to Create a 1-bit ALU". Archived from the original on 2017-05-08. […] Here's how you would put three 1-bit ALU to create a 3-bit ALU […]
  3. ^ "3002 - The CPU Shack Museum". Retrieved 2017-11-05.
  4. ^ "(unknown)" (PDF).
  5. ^ "IMP-4 - National Semiconductor". Retrieved 2017-11-05.
  6. ^ "6701 - The CPU Shack Museum". Retrieved 5 November 2017.
  7. ^ "5700/6700 - Monolithic Memories". Retrieved 5 November 2017.
  8. ^ "File:MMI 5701-6701 MCU (August, 1974).pdf" (PDF). Retrieved 5 November 2017.
  9. ^ "Archived copy" (PDF). Archived from the original (PDF) on 2011-02-11. Retrieved 2017-05-21.CS1 maint: Archived copy as title (link)
  10. ^ "SN74S481 - The CPU Shack Museum". Retrieved 5 November 2017.
  11. ^ a b Mueller, Dieter (2012). "The MC10800". Archived from the original on 2018-07-18. Retrieved 2017-11-05.
  12. ^ "Computers Sperry Univac 1100/60 System" (PDF). Delran, NJ, USA: Datapro Research Corporation. January 1983. 70C-877-12. Archived from the original (PDF) on 2016-06-11. Retrieved 2016-01-28.
  13. ^ "Bitslice DES". Retrieved 2017-11-05.
  14. ^ Biham, Eli (1997). "A Fast New DES Implementation in Software". Retrieved 2017-11-05.
  15. ^ Tang, Guang-Ming; Takata, Kensuke; Tanaka, Masamitsu; Fujimaki, Akira; Takagi, Kazuyoshi; Takagi, Naofumi (January 2016) [2015-12-09]. "4-bit Bit-Slice Arithmetic Logic Unit for 32-bit RSFQ Microprocessors". IEEE Transactions on Applied Superconductivity. 26 (1): 1–6. doi:10.1109/TASC.2015.2507125. 1300106. […] 4-bit bit-slice arithmetic logic unit (ALU) for 32-bit rapid single-flux-quantum microprocessors was demonstrated. The proposed ALU covers all of the ALU operations for the MIPS32 instruction set. […] It consists of 3481 Josephson junctions with an area of 3.09 × 1.66 mm2. It achieved the target frequency of 50 GHz and a latency of 524 ps for a 32-bit operation, at the designed DC bias voltage of 2.5 mV […] Another 8-bit parallel ALU has been designed and fabricated with target processing frequency of 30 GHz […] To achieve comparable performance to CMOS parallel microprocessors operating at 2–3 GHz, 4-bit bit-slice processing should be performed with a clock frequency of several tens of gigahertz. Several bit-serial arithmetic circuits have been successfully demonstrated with high-speed clocks of above 50 GHz […]

External links

1-bit architecture

A 1-bit computer architecture is an instruction set architecture for a processor that has datapath widths and data register widths of 1 bit (1/8 octet) wide.

An example of a 1-bit computer built from discrete logic SSI chips were the Wang 700 (1968/1970) and Wang 500 (1970/1971) calculator as well as the Wang 1200 (1971/1972) word processor series of Wang Laboratories.

An example of a 1-bit architecture that was marketed as a CPU is the Motorola MC14500B Industrial Control Unit (ICU), introduced in 1977 and manufactured at least up into the mid 1990s. One of the computers known to be based on this CPU was the WDR 1-bit computer. A typical sequence of instructions from a program for a 1-bit architecture might be:

load digital input 1 into a 1-bit register;

OR the value in the 1-bit register with input 2, leaving the result in the register;

write the value in the 1-bit register to output 1.This architecture was considered superior for programs making decisions rather than performing arithmetic computations, for ladder logic as well as for serial data processing.There are also several design studies for 1-bit architectures in academia, and corresponding 1-bit logic can also be found in programming.

Other examples of 1-bit architectures are programmable logic controllers (PLCs), programmed in instruction list (IL).

Several early massively parallel computers used 1-bit architectures for the processors as well. Examples include the Goodyear MPP and the Connection Machine. By using a 1-bit architecture for the individual processors a very large array (e.g.: the Connection Machine had 65,536 processors) could be constructed with the chip technology available at the time. In this case the slow computation of a 1-bit processor was traded off against the large number of processors.

1-bit CPUs can meanwhile be considered obsolete, not many kinds have been produced and none are known to be available in the major computer component stores (as of 2019, a few MC14500B chips are still available from brokers for obsolete parts.).


In computer architecture, 4-bit integers, memory addresses, or other data units are those that are 4 bits wide. Also, 4-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. A group of four bits is also called a nibble and has 24 = 16 possible values.

Some of the first microprocessors had a 4-bit word length and were developed around 1970. The TMS 1000, the world's first single-chip microprocessor, was a 4-bit CPU; it had a Harvard architecture, with an on-chip instruction ROM, 8-bit-wide instructions and an on-chip data RAM with 4-bit words. The first commercial microprocessor was the binary-coded decimal (BCD-based) Intel 4004, developed for calculator applications in 1971; it had a 4-bit word length, but had 8-bit instructions and 12-bit addresses.

The HP Saturn processors, used in many Hewlett-Packard calculators between 1984 and 2003 (including the HP 48 series of scientific calculators) are "4-bit" (or hybrid 64-/4-bit) machines; as the Intel 4004 did, they string multiple 4-bit words together, e.g. to form a 20-bit memory address, and most of the registers are 64 bits wide, storing 16 4-bit digits.The 4-bit processors were programmed in assembly language or Forth, e.g. "MARC4 Family of 4 bit Forth CPU" because of the extreme size constraint on programs and because common programming languages (for microcontrollers, 8-bit and larger), such as the C programming language, do not support 4-bit data types (C requires that the size of the char data type be at least 8 bits, and that all data types other than bitfields have a size that is a multiple of the character size). While larger than 4-bit values can be used by combining more than one manually, the language has to support the smaller values used in the combining. If not, assembly is the only option.The 1970s saw the emergence of 4-bit software applications for mass markets like pocket calculators. During the 1980s 4-bit microprocessor were used in handheld electronic games to keep costs low.

In the 1970s and 1980s, a number of research and commercial computers used bit slicing, in which the CPU's arithmetic logic unit (ALU) was built from multiple 4-bit-wide sections, each section including a chip such as an Am2901 or 74181 chip.

The Zilog Z80, although it is an 8-bit microprocessor, has a 4-bit ALU.

AMD Am2900

Am2900 is a family of integrated circuits (ICs) created in 1975 by Advanced Micro Devices (AMD). They were constructed with bipolar devices, in a bit-slice topology, and were designed to be used as modular components each representing a different aspect of a computer control unit (CCU). By using the bit slicing technique, Am2900 family was able to implement a CCU with data, addresses, and instructions to be any multiple of 4 bits by multiplying the number of ICs. One major problem with this modular technique was that it required a larger number of ICs to implement what could be done on a single CPU IC. The Am2901 chip was the arithmetic-logic unit (ALU), and the "core" of the series. It could count using 4 bits and implement binary operations as well as various bit-shifting operations.

The 2901 and some other chips in the family were second sourced by an unusually large number of other manufacturers, starting with Motorola and then Raytheon – both in 1975 – and also Cypress Semiconductor, National Semiconductor, NEC, Thomson, and Signetics. In the Soviet Union and later Russia the Am2900 family was manufactured as the 1804 series (with e.g. the Am2901 designated as KR1804VS1 / Russian: КР1804ВС1) which was still in production as of 2016.

Bit-serial architecture

In digital logic applications, bit-serial architectures send data one bit at a time, along a single wire, in contrast to bit-parallel word architectures, in which data values are sent all bits or a word at once along a group of wires.

All computers before 1951, and most of the early massive parallel processing machines used a bit-serial architecture—they were serial computers.

Bit-serial architectures were developed for digital signal processing in the 1960s through 1980s, including efficient structures for bit-serial multiplication and accumulation.Often N serial processors will take less FPGA area and have a higher total performance than a single N-bit parallel processor.

Common Scrambling Algorithm

The Common Scrambling Algorithm (CSA) is the encryption algorithm used in the DVB digital television broadcasting for encrypting video streams.

CSA was specified by ETSI and adopted by the DVB consortium in May 1994. It is being succeeded by CSA3, based on a combination of 128-bit AES and a confidential block cipher, XRC. However, CSA3 is not yet in any significant use, so CSA continues to be the dominant cipher for protecting DVB broadcasts.

JH (hash function)

JH is a cryptographic hash function submitted to the NIST hash function competition by Hongjun Wu. Though chosen as one of the five finalists of the competition, JH ultimately lost to NIST hash candidate Keccak. JH has a 1024-bit state, and works on 512-bit input blocks. Processing an input block consists of three steps:

XOR the input block into the left half of the state.

Apply a 42-round unkeyed permutation (encryption function) to the state. This consists of 42 repetitions of:

Break the input into 256 4-bit blocks, and map each through one of two 4-bit S-boxes, the choice being made by a 256-bit round-dependent key schedule. Equivalently, combine each input block with a key bit, and map the result through a 5→4 bit S-box.

Mix adjacent 4-bit blocks using a maximum distance separable code over GF(24).

Permute 4-bit blocks so that they will be adjacent to different blocks in following rounds.

XOR the input block into the right half of the state.The resulting digest is the first 224, 256, 384 or 512 bits from the 1024-bit final value.

It is well suited to a bit slicing implementation using the SSE2 instruction set, giving speeds of 16.8 cycles per byte.

Joel McCormack

Joel McCormack is the designer of the NCR Corporation version of the p-code machine, which is a kind of stack machine popular in the 1970s as the preferred way to implement new computing architectures and languages such as Pascal and BCPL. The NCR design shares no common architecture with the Pascal MicroEngine designed by Western Digital but both were meant to execute the UCSD p-System.[1,2]


MatterHackers is an Orange County-based company founded in 2012 that supplies 3D printing materials and tools. MatterHackers is developing their 3D printer control software, MatterControl.

Multigate device

A multigate device or multiple-gate field-effect transistor (MuGFET) refers to a MOSFET (metal–oxide–semiconductor field-effect transistor) that incorporates more than one gate into a single device. The multiple gates may be controlled by a single gate electrode, wherein the multiple gate surfaces act electrically as a single gate, or by independent gate electrodes. A multigate device employing independent gate electrodes is sometimes called a multiple-independent-gate field-effect transistor (MIGFET).

Multigate transistors are one of the several strategies being developed by CMOS semiconductor manufacturers to create ever-smaller microprocessors and memory cells, colloquially referred to as extending Moore's law.Development efforts into multigate transistors have been reported by AMD, Hitachi, IBM, Infineon Technologies, Intel Corporation, TSMC, Freescale Semiconductor, University of California, Berkeley, and others, and the ITRS predicted correctly that such devices will be the cornerstone of sub-32 nm technologies. The primary roadblock to widespread implementation is manufacturability, as both planar and non-planar designs present significant challenges, especially with respect to lithography and patterning. Other complementary strategies for device scaling include channel strain engineering, silicon-on-insulator-based technologies, and high-κ/metal gate materials.

Dual-gate MOSFETs are commonly used in very high frequency (VHF) mixers and in sensitive VHF front-end amplifiers. They are available from manufacturers such as Motorola, NXP Semiconductors, and Hitachi.


Slice may refer to:


This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.