|
|
|
|
|
|
|
The next level is the microarchitecture level which implements the Instruction Set Architecture (ISA), the level above it. This contains a microprogram which fetches, decodes and executes instructions (in this case Integer Virtual Java Machine instructions). Each instruction has one or two fields to serve some specific purpose. The first is the opcode (short for operation code), which identifies the instruction such as ADD or BRANCH. The second field specifies the operand (which variable).
For more information of the Java Virtual Machine click below:
The data path contains the ALU, its inputs, and its outputs. It also contains a number of 32-bit registers. There are three busses, "A", "B", and the "C" bus. The ALU has six inputs, F0, F1, ENA (Enable A), ENB (Enable B), INVA (Inverse A), and INC (Increment). We can get up to 26 or 64 possible outputs, such as A, B, NOT A, NOT B, A + B, A + B + 1, A + 1, B + 1, B - A, B - 1, -A, A AND B, A OR B, 0, 1, AND -1.
A short pulse, derived from the main clock, is produced at the start of each clock cycle. On the falling edge of the pulse, the bits that drive all the gates are set up in finite and known time. Then the register needed on the B bus is selected and driven onto the B bus. After this is stable, the ALU and shifter begin operation on valid data. After that is stable, the results are propagated along the C bus to the registers, where they can be loaded on the rising edge of the next pulse.
The machine has two different ways to communicate with memory: a 32-bit, word-addressable memory port, controlled by two registers, MAR (Memory Address Register) and MDR (Memory Data Register), and an 8-bit, byte-addressable memory port controlled by one register, PC (Program Counter), which reads 1 byte into the low-order 8 bits of MBR (Memory Buffer Register). This port can only read data from memory; it cannot write data to memory.
All registers are driven by one or two control signals. MAR does not have a connection to B bus, only to main memory. H is connected through A bus to ALU. Fore more details of registers click below:
We need 29 signals to control the data path. They are:
9 Signals to control writing data from the C bus into registers.
They are:
H (Holding register), OPC and TOS (Extra registers),
CPP (Constant Pool Pointer), LV(Local Variable), SP(Stack
Pointer), PC, MDR, and MAR.
9 Signals to control enabling registers onto the B bus for ALU input.
They are:
MDR, PC, MBR, MBRU(An additional signal from MBR),
SP, LV, CPP, TOS, and OPC.
8 Signals to control the ALU and shifter functions. They are:
SLL8(Shift Left Logical 8 bits), SRA1(Shift Right
Arithmetic 1 bit), F0 and F1 (Control Signals), ENA,
ENB, INVA, INC
2 Signals to indicate memory read/write via MAR/MDR.
1 Signal to indicate memory fetch via PC/MBR.
With a small increase in circuitry, we can reduce the number of bits needed to select among the possible sources for driving the B bus. We can encode B bus information in 4 bits and use a decoder to generate 24 or 16 control signals, 7 of which are not needed. So the total number of signals required is now 24 which is the microinstruction format Mic-1..
The table below shows the microinstruction format for Mic-1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
NEXT_ADDRESS contains the address of a potential next microinstruction. It has 9 bits. So, 29 or 512 addresses can be located. JAM determines how the next microinstruction is selected. The signals are: JMPC (Jump to Micro Program Counter), JAMN(Jump if negative) and JAMZ(Jump if zero). ALU are ALU and shifter functions. Mem is Memory functions. They are WRITE, READ, and FETCH.
The largest and most important item in the control portion of the machine is a memory called the control store, which holds the complete microprogram. It contains 512 words, each of 36-bit microinstruction.
All programming languages support the concept of procedures (also called methods, functions, or subroutines), which have local variables, which are accessed from inside the procedures, but cease to exist once the procedures have returned. Where should these variables be kept in memory? Giving each an absolute memory address does not work in case of recursive methods which can call themselves. Even if it is called twice, the second invocation will interfere with the first.
So, an area of memory called the stack is reserved for variables, but individual variables do not get absolute addresses in it. Instead, a register LV is set to point to the base of the local variables for the current procedure. The figures below explain the process.
|
|
|
|
|
SP ->
|
|
108 |
|
|
104 | |
|
LV ->
|
|
100 |
While A is active.
|
|
|
|
SP ->
|
|
|
|
|
|
|
|
|
LV ->
|
|
|
|
|
|
|
|
|
|
After A calls B
|
|
|
|
SP ->
|
|
|
LV ->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
After B calls C
|
|
|
|
SP ->
|
|
|
|
|
|
|
|
|
|
|
|
LV ->
|
|
|
|
|
|
|
|
|
|
After C and B return and A calls D
In addition to holding local variables, stacks can also be used to hold
operands during the computation of an arithmetic operation, in which case
it is referred to as the operand stack. The figures below
shows how a1 = a2 + a3 is accomplished.
|
|
|
|
|
|
|
|
|
|
SP ->
|
|
||||||
|
SP ->
|
|
|
SP ->
|
|
|||
|
|
|
|
SP ->
|
|
|||
|
|
|
|
|
||||
|
LV ->
|
|
LV ->
|
|
LV ->
|
|
LV ->
|
|
The IJVM (Integer Java Virtual Machine) Memory Model
It consists of a memory that can be viewed as 4,294,967,296 bytes (4 GB) or 1,073,741,824 words each of 4 bytes. The areas of memory are:
1. The Constant Pool, which cannot be written by an IJVM program and consists of constants, strings, and pointers to other areas of memory that can be referenced. It is loaded when the program is brought into memory and cannot be changed later.
2. The Local Variable Frame stores variables during the lifetime of a method. The parameters (or arguments) are at the beginning of the frame. The operand stack is separate.
3. The operand Stack guaranteed not to exceed a certain size, computed in advance by the Java compiler, and is allocated directly above the local variable frame. The stack pointer, SP, changes during the execution of the method, unlike CPP and LV.
4. The Method Area is a region of memory containing
the program, referred to as the "text" area in a UNIX process. This
is a byte area and the pointer is the Program Counter, PC.
|
|
<- SP | ||||
|
|
<- LV | ||||
|
|
|||||
|
|
<- CPP |
|
Method Area | <- PC |
The IJVM Instruction Set
|
|
|
|
|
|
BIPUSH byte | Push byte onto stack |
|
|
DUP | Copy top word on stack and push onto stack |
|
|
GOTO offset | Unconditional branch |
|
|
IADD | Pop two words from stack; push their sum |
|
|
IAND | Pop two words from stack; push Boolean AND |
|
|
IFEQ offset | Pop word from stack and branch if it is zero |
|
|
IFLT offset | Pop word from stack and branch if it is less than zero |
|
|
IF_ICMPEQ offset | Pop two words from stack; branch if equal |
|
|
IINC varnum const | add a constant to a local variable |
|
|
ILOAD varnum | Push local variable onto stack |
|
|
INVOKEVIRTUAL disp | Invoke a method |
|
|
IOR | Pop two words from stack; push Boolean OR |
|
|
IRETURN | Return from method with integer value |
|
|
ISTORE varnum | Pop word from stack and store in local variable |
|
|
ISUB | Pop two words from stack; push their difference |
|
|
LDC_W index | Push constant from constant pool onto stack |
|
|
NOP | Do nothing |
|
|
POP | Delete word on top of stack |
|
|
SWAP | Swap the top two words on the stack |
|
|
WIDE | Prefix instruction; next instruction has a 16-bit index |
Consider the java program:
i = j + k;
if (i == 3)
k = 0;
else
j = j - 1
The IJVM program is
1.
ILOAD j
// i = j + k
0x15 0x02
2.
ILOAD k
0x15 0x03
3.
IADD
0x60
4.
ISTORE i
0x36 0x01
5.
ILOAD i
// if (i == 3)
0x15 0x01
6.
BIPUSH 3
0x10 0x03
7.
IF_ICMPEQ L1
0x9F 0x00 0x0D
8.
ILOAD j
// j = j - 1
0x15 0x02
9.
BIPUSH 1
0x10 0x01
10.
ISUB
0x64
11.
ISTORE j
0x36 0x02
12.
GOTO L2
0xA7 0x00 0x0F
13. L1: BIPUSH 0
// k = 0
0x10 0x00
14.
ISTORE k
0x36 0x03
15. L2:
First j and k are pushed on onto the stack, added, and the result stored in i. Then i and the constant 3 are pushed onto the stack and compared. If they are equal, a branch is taken to L1, where k is set to 0. if they are unequal, the code following IF_ICMPEQ is executed. When it is done, it branches to L2, where the then and else parts merge.
DESIGN OF THE MICROARCHITECTURE LEVEL
Speed versus cost
There are three basic approaches for increasing the sped of execution:
1. Reduce the number of clock cycles needed to execute an instruction, known as the path length. It can be shortened by adding specialized hardware, such as an incrementer (an adder with one side permanently wired to add 1). This means more hardware. Reducing the number of cycles to fetch instructions requires an additional circuit to increment the PC.
2. Simplify the organization so that the clock cycle can be shorter.
3. Overlap the execution of instructions offers the most opportunity for dramatic increases in speed such a s parallel processing.
Cost was a count of the number of components when processors were built of discrete components that were purchased and assembled. Today, the entire processor exists on a single chip, and bigger, more complex chips are more expensive than smaller, simpler ones. Manufacturing cost of the chip grows faster than its area (measured in pico-acres). The system designer must decide whether greater speed is worth the additional cost.
An instruction fetch unit
For every instruction the following operations occur:
1. The PC is passed thorough the ALU and incremented.
2. The PC is used to fetch the next byte in the instruction
stream.
3. Operands are read from memory.
4. Operands are written to memory.
5. The ALU does a computation and the results are
stored back.
Fetching and assembling a field ties up the ALU for at least one cycle per byte to increment the PC, and then again to assemble the resulting index or offset. In order to overlap the main loop, the ALU must be freed from some of these tasks. This can be done by introducing a second, but not full ALU. Or cycles can be eliminated by introducing additional data paths not going through the ALU. The main path can be eliminated by branching directly to the next instruction at the end of the previous instruction.
Cache memory
A cache holds the most recently used memory word in a small, fast memory, speeding up access to them. If a large enough percentage of the memory words needed are in the cache, the effective memory latency can be reduced. Multiple caches can be used as a separate cache for instructions and data known as split cache.
Direct-mapped cache
Each entry (row) in the cache can hold exactly one cache line from main memory. With a 32-bit cache line size, the cache can hold 64KB. Each cache entry consists of three parts:
1. The Valid bit indicates whether there is any valid
data in this entry or not. When the system is booted (started), all
entries are marked as invalid.
2. The Tag field consists of a unique, 16-bit value
identifying the corresponding line of memory from which the data came.
3. The Data field contains a copy of the data in
memory. This field holds one cache line of 32 bytes.
In a direct-mapped cache, a given memory word can be stored in exactly one place within the cache. Given a memory address, there is only one place to look for it in the cache. If it is not there, it is not in the cache. For storing and retrieving data from the cache, the address is broken into four components:
1. The TAG field corresponds to the Tag bits stored
in cache entry.
2. The LINE field indicates which cache entry holds
the corresponding data, if they are present.
3. The WORD field tells which word within a line
is referenced.
4. The BYTE field is used only if a single byte is
requested. It tells which byte within the word is needed. For
a cache supplying only 32-bit words, this field will always be 0.
Set-associative cache
This allows two or more lines in each entry. A cache with n possible entries for each address is called an n-way set-associative cache. It is more complicated than a direct-mapped cache as a set of n cache entries must be checked to see if the needed line is present. When a new entry is to be brought into the cache, which of the present items should be discarded. Not being able to see the future, LRU (Least Recently Used) is the one discarded.
Branch Prediction
Programs are not linear code sequences, but full of branch instructions. As an example, if i is compared to 0, and the value of k depends on the result. The program is:
if (i == 0)
k = 1;
else
k = 2;
In assembly language, it is:
CMP i, 0
; compare i to 0
BNE Else
; branch to Else if not equal
Then:
MOV k, 1 ;
move 1 to k
BR Next
; unconditional branch to Next
Else:
MOV k, 2
; move 2 to k
Next:
The first way is to allow instructions fetched after a predicted conditional branch to execute until they try to change the machine's state (store into a register). The second way is to record the value of any register about to be overwritten (in a secret scratch register), so the machine can be rolled back to the state it had at the time of the mispredicted branch.
Dynamic Branch Prediction
Accurate predictions allow the CPU to proceed at full speed. The CPU can maintain a history table (in special hardware), in which it logs conditional branches as they occur, so they can be looked up when they occur again. The history table contains one entry for each conditional branch instruction. The entry contains the address of the branch instruction along with a bit telling whether it was taken the last time it was executed. The prediction is that the branch will go the same way it went last time. If the prediction is wrong, the bit in the history table is changed.
Static Branch Prediction
Dynamic branch predictions require specialized and expensive hardware
and complex chips. The compiler can help. For instance, with
a statement like for (i = 0; i < 1000000; i++), the compiler knows
the branch at the of the loop will always be taken.
|
|
|
|