Fundamentals of computer design¶
约 1193 个字 37 行代码 1 张图片 预计阅读时间 5 分钟
Introduction¶
Classer of computers¶
-
Desktop computers (or personal computers .aka. PC)
- General-purpose,vatiety of software
- Emphasize good performance for a single user at relatively low cost
- Mostly execute third-party software(第三方软件)
-
Server Computers
- Emphasize great performance for a few complex applications
- Or emphasize reliable performance for many users at once
- Greater computing power, storage, or network capacity than personal computers
-
Embedded computers
- Largest class and most diverse
- Hidden as components of systems
- Stringent power, cost, and size constraints,performance.
-
Personal Mobile Devices
- Such as smartphones, tablets/ipad, and wearables
- generally have the same design requirements as PCs
-
Supercomputer
- Computer cluster(集群)
- High capacity,prformance, and reliability
- Range to building size(即大型机)
Class By Flynn¶
福林分类法
- SISD: Single Instruction Stream, Single Data Stream
- SIMD: Single Instruction Stream, Multiple Data Stream
- MISD: Multiple Instruction Stream, Single Data Stream
- MIMD: Multiple Instruction Stream, Multiple Data Stream
definition
- IS: Instruction Stream
- DS: Data Stream
- CS: Control Stream
- CU: Control Unit
- PU: Processing Unit
- MM&SM: Memory Module & Storage Module
See as follows:

Performance¶
There are a lot of factors that can affect the performance:
- Architecture
- Hardware implementation
- Compiler
- Operating system
- Application
Two major performance metrics:
- Latency: The time it takes to complete a task(Response time)
- Throughput: The rate at which a task can be completed (Bandwidth)
We define the performance as:
性能与执行时间成反比
Quantitative approaches¶
- Bit
- Nibble: 4 bits
- Byte: 8 bits
- Word: 32 bits on many embedded systems
- Word: 64 bits on most desktop and server systems
- Kibibyte: \(2^{10}\) bytes,KB
- Mebibyte: \(2^{20}\) bytes,MB
- Gibibyte: \(2^{30}\) bytes,GB
- Tebibyte: \(2^{40}\) bytes,TB
CPU Performance¶
CPU time example
- Computer A: 2GHz clock, 10s CPU time
- Designing Computer B
- Aim for 6s CPU time
- Can do faster clock, but causes 1.2 × clock cycles
To determine how fast Computer B's clock must be:
Calculate Clock Cycles for Computer A:
Calculate Clock Rate for Computer B:
CPI: Cycles Per Instruction is the average number of clock cycles per instruction.
some CPU time could also be written as:
The Big Picture:
Performance depence on:
- Algorithm:affects IC,CPI
- Programming language: affects IC,CPI
- Compiler: affects CPI,IC
- Instruction set architecture: affects CPI,IC,T
Amdahl's Law¶
Amdahl's Law
Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
Amdahl's Law depends on two factors:
- The fraction of the time the enhancement can be exploited.
- The improvement gained by the enhancement while it is exploited.
即部分改进的执行时间除以改进的倍数,加上未改进的执行时间,等于改进后的执行时间。
So based on Amdahl's Law, we can get the following formula:
from the formula above, we can conclude that:
- The overall speedup is limited by the enhancement that is used the least.
- If Fraction is close to 1, the overall speedup is close to the speedup of the enhancement.
- If speedup of the enhancement is close to infinity, the overall speedup is close to 1/(1-Fraction).Which ,in fact ,is hard to achieve.
Amdahl's Law Example
- Original Execution Time: 10 seconds
- Enhancement: 2x speedup
- Fraction of Execution Time Affected: 0.5 The execution time after enhancement is:
The overall speedup is:
Great Architecture ideas¶
Moore's Law¶
Moore's Law is a principle that states that the number of transistors on a microchip doubles approximately every two years.
Use abstraction to simplify design¶
Abstraction is a technique that allows us to simplify a complex system by hiding unnecessary details.Low-level details are hidden from the higher-level components.
Make the common case fast¶
- Identify the common case and try to improve it.
- Most cost-efficient method to obtain improvements.
Improve performance via parallelism¶
- Improve performance by performing operations in parallel.
- There are many levels of parallelism—instruction-level, process-level, etc.
Improve performance via pipelining¶
- Break tasks into stages so that multiple tasks can be simultaneously performed in different stages.
- Commonly used to improve instruction throughput.
Improve performance via prediction¶
- Sometimes faster to assume a particular result than waiting until the result is known.
- Known as speculation and is used to guess results of branches.
Use a hierarchy of memories¶
- Make the fastest, smallest, and most expensive per bit memory the first level accessed and the slowest, largest, and cheapest per bit memory the last level accessed.
- Allows most accesses to be caught at the first level and be able to retain most of the information at the lowest level.
Improve dependability via redundancy¶
- Include redundant components that can both detect and correct failures.
- Used at many different levels.
Instruction Set Architecture¶
instruction set architecture(ISA) is the set of instructions that a processor can execute.
when designing an ISA, we need to consider the following basic principles:
-
Compatibility:
- The ISA should be compatible with the existing software.
-
Versatility:
- The ISA should be versatile enough to support a variety of applications.
-
High efficiency:
- The ISA should be efficient in terms of the number of instructions and the amount of data that can be processed in a single instruction.
-
Security:
- The ISA should be secure enough to prevent attacks.
Instruction Set design issues¶
- Where are operands stored?: registers,memory,stack,accumulator,etc.
- How many operands are needed?: 0,1,2,3,etc. Classification of instructions
- How is the operand location specified?: immediate,direct,indirect,relative,etc. Addressing modes
- What type and size of operands are needed?: integer,floating-point,vector,etc. Data representation
- What operations are supported?: arithmetic,logic,control,etc. Types of instructions
ISA Classes
- Stack architecture
- Accumulator architecture
- Register architecture
- General-purpose registers architecture(GPR)
对于A*B-(A+C*B)
; Stack architecture
PUSH A
PUSH B
MUL ; A * B
PUSH A
PUSH C
PUSH B
MUL ; C * B
ADD ; A + (C * B)
SUB ; (A * B) - (A + C * B)
; Accumulator architecture
LOAD A ; Acc = A
MUL B ; Acc = A * B
STORE TEMP ; Store result in TEMP
LOAD B ; Acc = B
MUL C ; Acc = C * B
ADD A ; Acc = A + (C * B)
STORE TEMP2 ; Store result in TEMP2
LOAD TEMP ; Acc = (A * B)
SUB TEMP2 ; Acc = (A * B) - (A + C * B)