In computer engineering, computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems. Some definitions of architecture define it as describing the capabilities and programming model of a computer but not a particular implementation. In other definitions computer architecture involves instruction set architecture design, microarchitecture design, logic design, and implementation.
The first documented computer architecture was in the correspondence between Charles Babbage and Ada Lovelace, describing the analytical engine. When building the computer Z1 in 1936, Konrad Zuse described in two patent applications for his future projects that machine instructions could be stored in the same storage used for data, i.e. the stored-program concept. Two other early and important examples are:
The term “architecture” in computer literature can be traced to the work of Lyle R. Johnson and Frederick P. Brooks, Jr., members of the Machine Organization department in IBM’s main research center in 1959. Johnson had the opportunity to write a proprietary research communication about the Stretch, an IBM-developed supercomputer for Los Alamos National Laboratory (at the time known as Los Alamos Scientific Laboratory). To describe the level of detail for discussing the luxuriously embellished computer, he noted that his description of formats, instruction types, hardware parameters, and speed enhancements were at the level of “system architecture” – a term that seemed more useful than “machine organization.”
Subsequently, Brooks, a Stretch designer, started Chapter 2 of a book (Planning a Computer System: Project Stretch, ed. W. Buchholz, 1962) by writing,
Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints.
Brooks went on to help develop the IBM System/360 (now called the IBM zSeries) line of computers, in which “architecture” became a noun defining “what the user needs to know”. Later, computer users came to use the term in many less-explicit ways.
The earliest computer architectures were designed on paper and then directly built into the final hardware form. Later, computer architecture prototypes were physically built in the form of a transistor–transistor logic (TTL) computer—such as the prototypes of the 6800 and the PA-RISC—tested, and tweaked, before committing to the final hardware form. As of the 1990s, new computer architectures are typically "built", tested, and tweaked—inside some other computer architecture in a computer architecture simulator; or inside a FPGA as a soft microprocessor; or both—before committing to the final hardware form.
The discipline of computer architecture has three main subcategories:
There are other types of computer architecture. The following types are used in bigger companies like Intel, and count for 1% of all of computer architecture
The purpose is to design a computer that maximizes performance while keeping power consumption in check, costs low relative to the amount of expected performance, and is also very reliable. For this, many aspects are to be considered which includes instruction set design, functional organization, logic design, and implementation. The implementation involves integrated circuit design, packaging, power, and cooling. Optimization of the design requires familiarity with compilers, operating systems to logic design, and packaging.
An instruction set architecture (ISA) is the interface between the computer's software and hardware and also can be viewed as the programmer's view of the machine. Computers do not understand high-level programming languages such as Java, C++, or most programming languages used. A processor only understands instructions encoded in some numerical fashion, usually as binary numbers. Software tools, such as compilers, translate those high level languages into instructions that the processor can understand.
Besides instructions, the ISA defines items in the computer that are available to a program—e.g. data types, registers, addressing modes, and memory. Instructions locate these available items with register indexes (or names) and memory addressing modes.
The ISA of a computer is usually described in a small instruction manual, which describes how the instructions are encoded. Also, it may define short (vaguely) mnemonic names for the instructions. The names can be recognized by a software development tool called an assembler. An assembler is a computer program that translates a human-readable form of the ISA into a computer-readable form. Disassemblers are also widely available, usually in debuggers and software programs to isolate and correct malfunctions in binary computer programs.
ISAs vary in quality and completeness. A good ISA compromises between programmer convenience (how easy the code is to understand), size of the code (how much code is required to do a specific action), cost of the computer to interpret the instructions (more complexity means more hardware needed to decode and execute the instructions), and speed of the computer (with more complex decoding hardware comes longer decode time). Memory organization defines how instructions interact with the memory, and how memory interacts with itself.
During design emulation software (emulators) can run programs written in a proposed instruction set. Modern emulators can measure size, cost, and speed to determine if a particular ISA is meeting its goals.
Computer organization helps optimize performance-based products. For example, software engineers need to know the processing power of processors. They may need to optimize software in order to gain the most performance for the lowest price. This can require quite detailed analysis of the computer's organization. For example, in a SD card, the designers might need to arrange the card so that the most data can be processed in the fastest possible way.
Computer organization also helps plan the selection of a processor for a particular project. Multimedia projects may need very rapid data access, while virtual machines may need fast interrupts. Sometimes certain tasks need additional components as well. For example, a computer capable of running a virtual machine needs virtual memory hardware so that the memory of different virtual computers can be kept separated. Computer organization and features also affect power consumption and processor cost.
Once an instruction set and micro-architecture are designed, a practical machine must be developed. This design process is called the implementation. Implementation is usually not considered architectural design, but rather hardware design engineering. Implementation can be further broken down into several steps:
The exact form of a computer system depends on the constraints and goals. Computer architectures usually trade off standards, power versus performance, cost, memory capacity, latency (latency is the amount of time that it takes for information from one node to travel to the source) and throughput. Sometimes other considerations, such as features, size, weight, reliability, and expandability are also factors.
The most common scheme does an in depth power analysis and figures out how to keep power consumption low, while maintaining adequate performance.
Modern computer performance is often described in IPC (instructions per cycle). This measures the efficiency of the architecture at any clock frequency. Since a faster rate can make a faster computer, this is a useful measurement. Older computers had IPC counts as low as 0.1 instructions per cycle. Simple modern processors easily reach near 1. Superscalar processors may reach three to five IPC by executing several instructions per clock cycle.
Counting machine language instructions would be misleading because they can do varying amounts of work in different ISAs. The "instruction" in the standard measurements is not a count of the ISA's actual machine language instructions, but a unit of measurement, usually based on the speed of the VAX computer architecture.
Many people used to measure a computer's speed by the clock rate (usually in MHz or GHz). This refers to the cycles per second of the main clock of the CPU. However, this metric is somewhat misleading, as a machine with a higher clock rate may not necessarily have greater performance. As a result, manufacturers have moved away from clock speed as a measure of performance.
There are two main types of speed: latency and throughput. Latency is the time between the start of a process and its completion. Throughput is the amount of work done per unit time. Interrupt latency is the guaranteed maximum response time of the system to an electronic event (like when the disk drive finishes moving some data).
Performance is affected by a very wide range of design choices — for example, pipelining a processor usually makes latency worse, but makes throughput better. Computers that control machinery usually need low interrupt latencies. These computers operate in a real-time environment and fail if an operation is not completed in a specified amount of time. For example, computer-controlled anti-lock brakes must begin braking within a predictable, short time after the brake pedal is sensed or else failure of the brake will occur.
Benchmarking takes all these factors into account by measuring the time a computer takes to run through a series of test programs. Although benchmarking shows strengths, it shouldn't be how you choose a computer. Often the measured machines split on different measures. For example, one system might handle scientific applications quickly, while another might render video games more smoothly. Furthermore, designers may target and add special features to their products, through hardware or software, that permit a specific benchmark to execute quickly but don't offer similar advantages to general tasks.
Power efficiency is another important measurement in modern computers. A higher power efficiency can often be traded for lower speed or higher cost. The typical measurement when referring to power consumption in computer architecture is MIPS/W (millions of instructions per second per watt).
Modern circuits have less power required per transistor as the number of transistors per chip grows. This is because each transistor that is put in a new chip requires its own power supply and requires new pathways to be built to power it. However the number of transistors per chip is starting to increase at a slower rate. Therefore, power efficiency is starting to become as important, if not more important than fitting more and more transistors into a single chip. Recent processor designs have shown this emphasis as they put more focus on power efficiency rather than cramming as many transistors into a single chip as possible. In the world of embedded computers, power efficiency has long been an important goal next to throughput and latency.
Increases in clock frequency have grown more slowly over the past few years, compared to power reduction improvements. This has been driven by the end of Moore's Law and demand for longer battery life and reductions in size for mobile technology. This change in focus from higher clock rates to power consumption and miniaturization can be shown by the significant reductions in power consumption, as much as 50%, that were reported by Intel in their release of the Haswell microarchitecture; where they dropped their power consumption benchmark from 30-40 watts down to 10-20 watts. Comparing this to the processing speed increase of 3 GHz to 4 GHz (2002 to 2006) it can be seen that the focus in research and development are shifting away from clock frequency and moving towards consuming less power and taking up less space.
Architecture describes the internal organization of a computer in an abstract way; that is, it defines the capabilities of the computer and its programming model. You can have two computers that have been constructed in different ways with different technologies but with the same architecture.
This task has many aspects, including instruction set design, functional organization, logic design,and implementation.
ACM SIGARCH is the Association for Computing Machinery's Special Interest Group on computer architecture, a community of computer professionals and students from academia and industry involved in research and professional practice related to computer architecture and design. The organization sponsors many prestigious international conferences in this area, including the International Symposium on Computer Architecture (ISCA), recognized as the top conference in this area since 1975. Together with IEEE Computer Society's Technical Committee on Computer Architecture (TCCA), it is one of the two main professional organizations for people working in computer architecture.ACM SIGARCH was formed in August 1971, initially as a Special Interest Committee (a precursor to a SIG), with Michael J. Flynn as the founding chairman. Flynn was also the founding chairman of IEEE Computer Society's TCCA and encouraged from the beginning, joint cooperation between the two groups. Many of the joint symposiums and conferences are the leading events in the field.Abstraction layer
In computing, an abstraction layer or abstraction level is a way of hiding the working details of a subsystem, allowing the separation of concerns to facilitate interoperability and platform independence. Examples of software models that use layers of abstraction include the OSI model for network protocols, OpenGL and other graphics libraries.
In computer science, an abstraction layer is a generalization of a conceptual model or algorithm, away from any specific implementation. These generalizations arise from broad similarities that are best encapsulated by models that express similarities present in various specific implementations. The simplification provided by a good abstraction layer allows for easy reuse by distilling a useful concept or design pattern so that situations where it may be accurately applied can be quickly recognized.
A layer is considered to be on top of another if it depends on it. Every layer can exist without the layers above it, and requires the layers below it to function. Frequently abstraction layers can be composed into a hierarchy of abstraction levels. The OSI model comprises seven abstraction layers. Each layer of the model encapsulates and addresses a different part of the needs of digital communications, thereby reducing the complexity of the associated engineering solutions.
A famous aphorism of David Wheeler is "All problems in computer science can be solved by another level of indirection". This is often deliberately misquoted with "abstraction" substituted for "indirection". It is also sometimes misattributed to Butler Lampson. Kevlin Henney's corollary to this is, "...except for the problem of too many layers of indirection."Cellular architecture
A cellular architecture is a type of computer architecture prominent in parallel computing. Cellular architectures are relatively new, with IBM's Cell microprocessor being the first one to reach the market. Cellular architecture takes multi-core architecture design to its logical conclusion, by giving the programmer the ability to run large numbers of concurrent threads within a single processor. Each 'cell' is a compute node containing thread units, memory, and communication. Speed-up is achieved by exploiting thread-level parallelism inherent in many applications.
Cell, a cellular architecture containing 9 cores, is the processor used in the PlayStation 3. Another prominent cellular architecture is Cyclops64, a massively parallel architecture currently under development by IBM.
Cellular architectures follow the low-level programming paradigm, which exposes the programmer to much of the underlying hardware. This allows the programmer to greatly optimize their code for the platform, but at the same time makes it more difficult to develop software.Dataflow architecture
Dataflow architecture is a computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures do not have a program counter, or (at least conceptually) the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions, so that the order of instruction execution is unpredictable: i. e. behavior is nondeterministic.
Although no commercially successful general-purpose computer hardware has used a dataflow architecture, it has been successfully implemented in specialized hardware such as in digital signal processing, network routing, graphics processing, telemetry, and more recently in data warehousing. It is also very relevant in many software architectures today including database engine designs and parallel computing frameworks.Synchronous dataflow architectures tune to match the workload presented by real-time data path applications such as wire speed packet forwarding. Dataflow architectures that are deterministic in nature enable programmers to manage complex tasks such as processor load balancing, synchronization and accesses to common resources.Meanwhile, there is a clash of terminology, since the term dataflow is used for a subarea of parallel programming: for dataflow programming.HP Labs
HP Labs is the exploratory and advanced research group for HP Inc. HP Labs' headquarters is in Palo Alto, California and the group has research and development facilities in Bristol, UK. The development of programmable desktop calculators, inkjet printing, and 3D graphics are credited to HP Labs researchers.
HP Labs was established on March 3, 1966, by founders Bill Hewlett and David Packard, seeking to create an organization not bound by day-to-day business concerns.The labs have downsized dramatically; in August 2007, HP executives drastically diminished the number of projects, down from 150 to 30. As of 2018, HP Labs has just over 200 researchers, compared to earlier staffing levels of 500 researchers.With the Hewlett Packard Enterprise being spun off from Hewlett-Packard in November 1, 2015 and renamed to and HP Inc., the research lab also spun off Hewlett Packard Labs to Hewlett Packard Enterprise and HP Labs was kept for HP Inc.Harvard architecture
The Harvard architecture is a computer architecture with physically separate storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape (24 bits wide) and data in electro-mechanical counters. These early machines had data storage entirely contained within the central processing unit, and provided no access to the instruction storage as data. Programs needed to be loaded by an operator; the processor could not initialize itself.
Today, most processors implement such separate signal pathways for performance reasons, but actually implement a modified Harvard architecture, so they can support tasks like loading a program from disk storage as data and then executing it.Hazard (computer architecture)
In the domain of central processing unit (CPU) design, hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle, and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control hazards (branching hazards).There are several methods used to deal with hazards, including pipeline stalls/pipeline bubbling, operand forwarding, and in the case of out-of-order execution, the scoreboarding method and the Tomasulo algorithm.International Symposium on Computer Architecture
The International Symposium on Computer Architecture (ISCA) is an annual academic conference on computer architecture, generally viewed as the top-tier in the field. Association for Computing Machinery's Special Interest Group on Computer Architecture (ACM SIGARCH) and Institute of Electrical and Electronics Engineers Computer Society are technical sponsors.
ISCA has participated in the Federated Computing Research Conference in 1993, 1996, 1999, 2003, 2007, 2011 and 2015, every year that the conference has been organized.Load–store architecture
In computer engineering, a load–store architecture is an instruction set architecture that divides instructions into two categories: memory access (load and store between memory and registers), and ALU operations (which only occur between registers).RISC instruction set architectures such as PowerPC, SPARC, RISC-V, ARM, and MIPS are load–store architectures.For instance, in a load–store approach both operands and destination for an ADD operation must be in registers. This differs from a register–memory architecture (for example, a CISC instruction set architecture such as x86) in which one of the operands for the ADD operation may be in memory, while the other is in a register.The earliest example of a load–store architecture was the CDC 6600. Almost all vector processors (including many GPUs) use the load–store approach.Load–store unit
In computer engineering a load–store unit is a specialized execution unit responsible for executing all load and store instructions, generating virtual addresses of load and store operations and loading data from memory or storing it back to memory from registers.The load–store unit usually includes a queue which acts as a waiting area for memory instructions, and the unit itself operates independently of other processor units.Load–store units may also be used in vector processing, and in such cases the term "load–store vector" may be used.Some load-store units are also capable of executing simple fixed-point and/or integer operations.Memory hierarchy
In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower level programming constructs involving locality of reference.
Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m1,m2,...,mn) in which each member mi is typically smaller and faster than the next highest member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling for activating the transfer.
There are four major storage levels.
Internal – Processor registers and cache.
Main – the system RAM and controller cards.
On-line mass storage – Secondary storage.
Off-line bulk storage – Tertiary and Off-line storage.This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.Multithreading (computer architecture)
In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to execute multiple processes or threads concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the processes and threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).
Where multiprocessing systems include multiple complete processing units in one or more cores, multithreading aims to increase utilization of a single core by using thread-level parallelism, as well as instruction-level parallelism. As the two techniques are complementary, they are sometimes combined in systems with multiple multithreading CPUs and with CPUs with multiple multithreading cores.Register memory architecture
In computer engineering, a register–memory architecture is an instruction set architecture that allows operations to be performed on (or from) memory, as well as registers. If the architecture allows all operands to be in memory or in registers, or in combinations, it is called a "register plus memory" architecture.In a register–memory approach one of the operands for ADD operation may be in memory, while the other is in a register. This differs from a load/store architecture (used by RISC designs such as MIPS) in which both operands for an ADD operation must be in registers before the ADD.Examples of register memory architecture are IBM System/360, its successors, and Intel x86. Examples of register plus memory architecture are VAX and the Motorola 68000 family.SISD
In computing, SISD (single instruction stream, single data stream) is a computer architecture in which a single uni-core processor, executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture.
SISD is one of the four main classifications as defined in Flynn's taxonomy. In this system, classifications are based upon the number of concurrent instructions and data streams present in the computer architecture. According to Michael J. Flynn, SISD can have concurrent processing characteristics. Pipelined processors and superscalar processors are common examples found in most modern SISD computers.Instructions are sent to the control unit from the Memory Module and are decoded and sent to the processing unit which processes on the data retrieved from Memory module and sents back to it.Single instruction, multiple threads
Single instruction, multiple thread (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading.Temporal multithreading
Temporal multithreading is one of the two main forms of multithreading that can be implemented on computer processor hardware, the other being simultaneous multithreading. The distinguishing difference between the two forms is the maximum number of concurrent threads that can execute in any given pipeline stage in a given cycle. In temporal multithreading the number is one, while in simultaneous multithreading the number is greater than one. Some authors use the term super-threading synonymously.Von Neumann architecture
The von Neumann architecture—also known as the von Neumann model or Princeton architecture—is a computer architecture based on a 1945 description by the mathematician and physicist John von Neumann and others in the First Draft of a Report on the EDVAC. That document describes a design architecture for an electronic digital computer with these components:
A processing unit that contains an arithmetic logic unit and processor registers
A control unit that contains an instruction register and program counter
Memory that stores data and instructions
External mass storage
Input and output mechanismsThe word has evolved to mean any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus. This is referred to as the von Neumann bottleneck and often limits the performance of the system.The design of a von Neumann architecture machine is simpler than a Harvard architecture machine—which is also a stored-program system but has one dedicated set of address and data buses for reading and writing to memory, and another set of address and data buses to fetch instructions.
A stored-program digital computer keeps both program instructions and data in read-write, random-access memory (RAM). Stored-program computers were an advancement over the program-controlled computers of the 1940s, such as the Colossus and the ENIAC. Those were programmed by setting switches and inserting patch cables to route data and control signals between various functional units. The vast majority of modern computers use the same memory for both data and program instructions. The von Neumann vs. Harvard distinction applies to the cache architecture, not the main memory (split cache architecture).Word (computer architecture)
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word (the word size, word width, or word length) is an important characteristic of any specific processor design or computer architecture.
The size of a word is reflected in many aspects of a computer's structure and operation; the majority of the registers in a processor are usually word sized and the largest piece of data that can be transferred to and from the working memory in a single operation is a word in many (not all) architectures. The largest possible address size, used to designate a location in memory, is typically a hardware word (here, "hardware word" means the full-sized natural word of the processor, as opposed to any other definition used).
Modern processors, including those in embedded systems, usually have a word size of 8, 16, 24, 32, or 64 bits; those in modern general-purpose computers in particular usually use 32 or 64 bits. Special-purpose digital processors, such as DSPs for instance, may use other sizes, and many other sizes have been used historically, including 9, 12, 18, 24, 26, 36, 39, 40, 48, and 60 bits. Several of the earliest computers (and a few modern as well) used binary-coded decimal rather than plain binary, typically having a word size of 10 or 12 decimal digits, and some early decimal computers had no fixed word length at all.
The size of a word can sometimes differ from the expected due to backward compatibility with earlier computers. If multiple compatible variations or a family of processors share a common architecture and instruction set but differ in their word sizes, their documentation and software may become notationally complex to accommodate the difference (see Size families below).