Von Neumann Machine | Reduced Instruction Set Computers | Increasing Speeds | Superscalar Architecture
A digital computer consists of an interconnected system of processors, memories, and input/output devices.
PROCESSORS
The CPU (Central Processing Unit) is the "brain" of the computer. It executes programs stored in memory in repeated cycles of fetch, decode, execute and store. The components of the computer are connected by a collection of parallel wires called a bus. Buses exist both inside the CPU and outside it.
The CPU consists of the Control Unit, the ALU (Arithmetic Logic Unit), and a small high-speed internal memory. The Control Unit directs the operation of the ALU, which performs arithmetic calculations and the Boolean operations used to compare values. The internal memory consists of a number of registers of different sizes that perform various functions. Each register can hold just one number.
Two important registers are the Program Counter (PC), which points to the next instruction to be fetched for execution, and the Instruction Register (IR), which holds the instruction currently being executed. Since most registers are used for temporary storage, we will study them in more detail when we study memory as part of the digital logic level.
This basic design is known as a von Neumann machine. It has five basic parts: the memory, the arithmetic logic unit, the control unit, the input equipment, and the output equipment. The data path typically consists of 1 to 32 registers which feed into the ALU input registers. The ALU performs addition, subtraction, and other simple operations on its inputs and stores the result in the output register. The result can then, if necessary, be stored in memory.
The CPU executes each instruction in a series of small steps, known as the fetch-decode-execute cycle:
1. Fetch the next instruction from memory into the Instruction Register (IR).
2. Change the Program Counter (PC) to point to the following instruction.
3. Determine the type of instruction just fetched.
4. If the instruction uses a word in memory, determine where it is.
5. Fetch the word, if needed, into a CPU register.
6. Execute the instruction.
7. Return to Step 1 and repeat.
von Neumann pioneered this concept.
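The seven steps above can be sketched as a short Python simulation. This is only an illustration, not a model of any real CPU: the four-instruction set (LOAD, ADD, STORE, HALT), the single accumulator register, and the names run and program are all assumptions made for the example.

```python
# Toy von Neumann machine: program and data share one memory.
# The instruction set (LOAD, ADD, STORE, HALT) is hypothetical.

def run(memory):
    """Run the fetch-decode-execute cycle until HALT; return the memory."""
    pc = 0    # Program Counter: address of the next instruction
    acc = 0   # a single accumulator register (an assumption of this sketch)
    while True:
        ir = memory[pc]              # Step 1: fetch the instruction into the IR
        pc += 1                      # Step 2: point the PC at the following instruction
        opcode, address = ir         # Step 3: determine the type of instruction
        if opcode == "HALT":
            return memory
        if opcode == "STORE":
            memory[address] = acc    # Step 6: execute (write the register to memory)
            continue                 # Step 7: return to Step 1
        operand = memory[address]    # Steps 4-5: locate and fetch the memory word
        if opcode == "LOAD":         # Step 6: execute
            acc = operand
        elif opcode == "ADD":
            acc = acc + operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD", 4),    # address 0
    ("ADD", 5),     # address 1
    ("STORE", 6),   # address 2
    ("HALT", 0),    # address 3
    2,              # address 4: data
    3,              # address 5: data
    0,              # address 6: the result is stored here
]
result = run(list(program))
print(result[6])  # prints 5
```

Note that instructions and data sit in the same memory list, which is the defining feature of the von Neumann design.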
In the early days of the computer, computers were expensive, while people were comparatively poorly paid. So it made sense to keep the hardware simple and to carry out complicated instructions in software, by having a microprogram inside the CPU interpret them. A computer with such a large set of complicated instructions is known as a CISC (Complex Instruction Set Computer). Now, on the other hand, the price of computers has come down, and people are highly paid. It now makes sense to use simple instructions and let the hardware execute them directly. This is known as RISC (Reduced Instruction Set Computer). A RISC has a small number of simple instructions that execute in one cycle of the data path. RISC instructions are not interpreted. Although RISC has many advantages, it has not displaced CISC, because of backward compatibility and because some of the RISC ideas have been incorporated into CISC designs. The RISC design principles are:
1. All instructions are directly executed by hardware, without interpretation by microinstructions. This provides high speeds.
2. Maximize the rate at which instructions are issued. Multiple instructions are executed simultaneously using parallel processing. If two or more instructions use the same register, the first instruction must be finished with the register before the second one uses it.
3. Instructions should be easy to decode. They should be regular and of fixed length, with a small number of fields and few different formats.
4. Only loads and stores should reference memory. Access to memory can take a long time and the delay is unpredictable. So overlap these instructions with others.
5. Provide plenty of registers (at least 32). Running out of registers means that words must be placed in memory until they are required again, which takes additional time.
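Nothing can travel faster than light. The speed of light is about 186,000 miles per second, or about 1 foot per nanosecond (one billionth of a second). An electrical signal therefore needs at least 1 nanosecond to travel 1 foot, and in practice somewhat longer, since signals in wires travel slower than light in a vacuum. Because this limit cannot be removed, other means, chiefly doing several things at once, are used to reduce execution time.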
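The "1 foot per nanosecond" figure can be checked with a few lines of Python. The clock rate used in the last line (3 GHz) is an assumed example value, not one taken from the text, and feet_per_cycle is a hypothetical helper name.

```python
MILES_PER_SECOND = 186_000            # speed of light, as given above
FEET_PER_MILE = 5_280
NANOSECONDS_PER_SECOND = 1_000_000_000

feet_per_second = MILES_PER_SECOND * FEET_PER_MILE
feet_per_ns = feet_per_second / NANOSECONDS_PER_SECOND
print(round(feet_per_ns, 3))          # prints 0.982


def feet_per_cycle(clock_hz):
    """Upper bound on how far a signal can travel in one clock cycle."""
    return feet_per_second / clock_hz


print(round(feet_per_cycle(3_000_000_000), 3))  # prints 0.327 (assumed 3 GHz clock)
```

At the assumed 3 GHz, a signal can cover at most about a third of a foot per clock cycle, which is why simply speeding up the clock runs into physical limits.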
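Pipelining: The principle is similar to a conveyor belt. The execution of an instruction is divided into units called stages, each handled by its own piece of hardware, so that all stages can work at the same time on different instructions. An example with five stages is shown below.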
Instruction Fetch Unit       1  2  3  4  5  6  7  8  9
Instruction Decode Unit         1  2  3  4  5  6  7  8
Operand Fetch Unit                 1  2  3  4  5  6  7
Instruction Execution Unit            1  2  3  4  5  6
Write Back Unit                          1  2  3  4  5
Time                         1  2  3  4  5  6  7  8  9
In the first time frame, the first instruction is fetched. In the second time frame, while it is being decoded, a second instruction is fetched. In the third time frame, the first instruction's operand is fetched, the second instruction is decoded, and a third instruction is fetched. In the fourth time frame, the first instruction is executed, the second instruction's operand is fetched, the third instruction is decoded, and a fourth instruction is fetched. In the fifth time frame, all stages are active: the first instruction is written back, the second is executed, the third's operand is fetched, the fourth is decoded, and a fifth instruction is fetched. From then on all stages remain active, as shown in the table, until the pipeline drains at the end of the program. This works fine if all instructions are executed in sequence. Problems are encountered with selections and loops, because the program then has to branch and the instructions already fetched may not be the ones needed. Such instructions have to be handled differently and cost extra time.
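The timing described above can be reproduced with a small Python sketch. It assumes that every stage takes exactly one time frame and that there are no branches; the function name pipeline_schedule is hypothetical. Unlike the table, which shows a continuing stream of instructions, the example runs just five instructions so that the drain at the end is visible.

```python
STAGES = [
    "Instruction Fetch Unit",
    "Instruction Decode Unit",
    "Operand Fetch Unit",
    "Instruction Execution Unit",
    "Write Back Unit",
]


def pipeline_schedule(num_instructions, num_stages=len(STAGES)):
    """Return rows (one per stage) of instruction numbers per time frame.

    Assumes one time frame per stage and no branches. None means the
    stage is idle in that time frame.
    """
    total_frames = num_instructions + num_stages - 1
    schedule = []
    for stage in range(num_stages):
        row = []
        for frame in range(1, total_frames + 1):
            # Instruction i is in stage s (0-based) during time frame i + s.
            instruction = frame - stage
            if 1 <= instruction <= num_instructions:
                row.append(instruction)
            else:
                row.append(None)
        schedule.append(row)
    return schedule


schedule = pipeline_schedule(5)
for name, row in zip(STAGES, schedule):
    cells = " ".join("." if n is None else str(n) for n in row)
    print(f"{name:<27} {cells}")
print(len(schedule[0]))  # prints 9
```

Under these assumptions five instructions finish in 9 time frames, compared with 5 x 5 = 25 if each instruction had to complete all five stages before the next one started.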
Two pipelines working in parallel can run up to twice as fast as one. In one version, the main pipeline, called the u pipeline, could execute an arbitrary Pentium instruction, while the second pipeline, called the v pipeline, could execute only simple integer instructions.
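A rough Python sketch of how two such pipelines might share the work is given below. It is a deliberately simplified model, not the actual Pentium pairing rules: it assumes the second instruction may go to the v pipeline only if it is a simple integer instruction and does not touch a register written by the instruction in the u pipeline. The function name issue_cycles and the dictionary fields "simple", "reads" and "writes" are hypothetical.

```python
def issue_cycles(instructions):
    """Count the cycles needed to issue instructions in order on a u/v pair.

    Each instruction is a dict with hypothetical fields:
      "simple": True if it is a simple integer instruction,
      "reads":  set of registers it reads,
      "writes": set of registers it writes.
    """
    cycles = 0
    i = 0
    while i < len(instructions):
        first = instructions[i]          # always goes to the u pipeline
        cycles += 1
        i += 1
        if i < len(instructions):
            second = instructions[i]     # candidate for the v pipeline
            conflict = first["writes"] & (second["reads"] | second["writes"])
            if second["simple"] and not conflict:
                i += 1                   # issued in the same cycle as first
    return cycles


program = [
    {"simple": True, "reads": {"r1"}, "writes": {"r2"}},
    {"simple": True, "reads": {"r3"}, "writes": {"r4"}},  # independent: pairs
    {"simple": True, "reads": {"r4"}, "writes": {"r5"}},
    {"simple": True, "reads": {"r5"}, "writes": {"r6"}},  # needs r5: cannot pair
]
print(issue_cycles(program))  # prints 3
```

The four instructions take 3 cycles rather than 2, which shows why two pipelines give at most, not always, a doubling of speed.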
An array processor consists of a large number of identical processors that perform the same sequence of instructions on different sets of data.
A vector processor is similar, but all additions are performed in a single, heavily pipelined adder. Both work well only on problems that require the same computation to be performed on many data sets simultaneously.
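The difference between the two can be sketched in Python. Both functions are simplified models under stated assumptions, with hypothetical names: array_add treats each list position as having its own processor, and vector_add_cycles assumes one pipelined adder that accepts a new pair of operands every cycle.

```python
def array_add(a, b):
    """Array-processor model: element i is added by its own processor,
    so every addition happens in the same step."""
    return [x + y for x, y in zip(a, b)]


def vector_add_cycles(length, adder_stages):
    """Vector-processor model: a single pipelined adder takes one new
    pair of operands per cycle (assumes length >= 1)."""
    return adder_stages + length - 1


print(array_add([1, 2, 3, 4], [10, 20, 30, 40]))  # prints [11, 22, 33, 44]
print(vector_add_cycles(4, 3))                     # prints 6
```

In both models the same operation is applied to every element, which is why these machines suit only problems with that regular structure.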
A multiprocessor is a system with more than one CPU sharing a common memory, so the CPUs must coordinate their access to it. The simplest design is a single bus with multiple CPUs and one memory plugged into it. To reduce contention, each processor can also be given its own local memory, not accessible to the others, for program code and data that need not be shared. Accesses to this local memory do not use the main bus, so bus traffic is reduced.
A multicomputer is a large number of interconnected computers, each having its own memory, with no common memory. The CPUs communicate by sending each other messages, something like an extremely fast e-mail. For larger systems, topologies such as 2D and 3D grids, trees, and rings are used.
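The message-passing style can be imitated in Python. This is only a simulation: the "nodes" are threads inside one program, so the separation of memories is by convention rather than enforced by hardware, and the names node and distributed_sum, as well as the use of None as an end-of-work marker, are assumptions made for the example.

```python
import queue
import threading


def node(inbox, outbox):
    """A node with private memory: it only sees data that arrives as messages."""
    local_memory = []                  # used by this node only
    while True:
        message = inbox.get()
        if message is None:            # hypothetical "no more work" marker
            break
        local_memory.append(message)
    outbox.put(sum(local_memory))      # the reply is also sent as a message


def distributed_sum(values, num_nodes=2):
    """Deal the values out to the nodes and add up their partial sums."""
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=node, args=(inbox, results)) for inbox in inboxes
    ]
    for worker in workers:
        worker.start()
    for i, value in enumerate(values):
        inboxes[i % num_nodes].put(value)   # send each value as a message
    for inbox in inboxes:
        inbox.put(None)
    for worker in workers:
        worker.join()
    return sum(results.get() for _ in range(num_nodes))


print(distributed_sum([1, 2, 3, 4, 5]))  # prints 15
```

Every piece of data a node works on has to be sent to it explicitly, which illustrates why such systems take more programming effort than a shared memory.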
Multiprocessors are easier to program, while multicomputers are easier to build. Hybrid systems that combine the benefits of both are being researched.