Nvidia GPU Architecture
Each Nvidia GPU contains multiple Streaming Multiprocessors (SMs).
Each SM has 8 Stream Processors (SPs), two Special Function Units (SFUs), and one double-precision FPU. In full, an SM contains:
• 8 SPs
• 2 Special Function Units (SFUs)
– 4 FP multiply units, used for transcendental operations (e.g. sin) and interpolation
• 64-bit double-precision FPU
• MT issue unit: dispatches instructions to SPs and SFUs
• Cache
– Very small instruction cache
– Read-only data cache
• 16 KB read/write shared memory
• Multi-threaded instruction dispatch
– 1 to 1024 threads active
– Shared instruction fetch per 32 threads
– Covers latency of texture/memory loads
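The per-SM resources listed above can be captured in a small record, which makes the thread arithmetic explicit. This is only a sketch: the class and method names are hypothetical, and the "cycles per instruction" figure assumes each SP processes one thread per clock, which the list above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamingMultiprocessor:
    """Per-SM resources as listed above (hypothetical helper class)."""
    sps: int = 8                    # stream processors
    sfus: int = 2                   # special function units
    dp_fpus: int = 1                # 64-bit double-precision FPU
    shared_memory_kb: int = 16      # read/write shared memory
    max_active_threads: int = 1024
    threads_per_fetch: int = 32     # threads sharing one instruction fetch

    def max_fetch_groups(self) -> int:
        """Number of 32-thread groups that can be active at once."""
        return self.max_active_threads // self.threads_per_fetch

    def cycles_per_group_instruction(self) -> int:
        """Clocks to push one group through the SPs.

        Assumption: each SP handles one thread per clock.
        """
        return self.threads_per_fetch // self.sps


sm = StreamingMultiprocessor()
print(sm.max_fetch_groups())              # 32
print(sm.cycles_per_group_instruction())  # 4
```

With up to 32 such groups resident, the issue unit can switch to another group while one waits on a texture or memory load, which is how the latency gets covered.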
FLOPS
• 30 Streaming Multiprocessors
– 8 SPs: 1 MAD (2 ops) per cycle per SP
– 2 SFUs: 4 MULs per cycle per SFU
• An SP can dual-issue MAD and MUL operations in conjunction with the SFUs
– performing 3 floating-point operations per clock cycle
• 1476 MHz (GTX285) or 1296 MHz (GTX280) clock for the SM functional units
• Flops = 30 SMs * 8 SPs * 3 ops/cycle * 1476 MHz ≈ 1063 GFlops
• Flops without dual-issue = 30 SMs * 8 SPs * 2 ops/cycle * 1476 MHz ≈ 708 GFlops
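The peak-throughput arithmetic above can be reproduced in a few lines, which also makes it easy to plug in the GTX280 clock. This is a minimal sketch; the function name `peak_gflops` is made up for illustration, and the result is a theoretical peak, not a measured rate.

```python
def peak_gflops(num_sms, units_per_sm, ops_per_cycle, clock_mhz):
    """Theoretical peak in GFlops: SMs * units per SM * ops/cycle * clock."""
    return num_sms * units_per_sm * ops_per_cycle * clock_mhz / 1000.0


# Shader clocks quoted above, in MHz.
CLOCKS_MHZ = {"GTX285": 1476, "GTX280": 1296}

for name, mhz in CLOCKS_MHZ.items():
    dual_issue = peak_gflops(30, 8, 3, mhz)  # MAD + MUL: 3 ops/cycle
    mad_only = peak_gflops(30, 8, 2, mhz)    # MAD only: 2 ops/cycle
    print(f"{name}: {dual_issue:.1f} GFlops dual-issue, "
          f"{mad_only:.1f} GFlops MAD only")

# GTX285: 1062.7 GFlops dual-issue, 708.5 GFlops MAD only
# GTX280: 933.1 GFlops dual-issue, 622.1 GFlops MAD only
```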
Double precision performance
• 30 Streaming Multiprocessors
– 1 double-precision FPU: 1 double-precision MAD (2 ops) per cycle
• Flops = 30 SMs * 1 FPU * 2 ops/cycle * 1476 MHz ≈ 88.6 GFlops
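To put the double-precision figure in context, the sketch below recomputes it next to the dual-issue single-precision peak at the same 1476 MHz clock. The constant and function names are illustrative only.

```python
NUM_SMS = 30
CLOCK_MHZ = 1476  # GTX285 shader clock


def gflops(units_per_sm, ops_per_cycle):
    """Theoretical peak in GFlops for one kind of functional unit."""
    return NUM_SMS * units_per_sm * ops_per_cycle * CLOCK_MHZ / 1000.0


double_precision = gflops(1, 2)  # 1 DP FPU per SM, MAD = 2 ops
single_precision = gflops(8, 3)  # 8 SPs per SM, dual-issue = 3 ops
ratio = single_precision / double_precision

print(f"double precision: {double_precision:.1f} GFlops")  # 88.6
print(f"single precision: {single_precision:.1f} GFlops")  # 1062.7
print(f"single/double ratio: {ratio:.0f}x")                # 12x
```

The 12x gap follows directly from the unit counts: (8 SPs * 3 ops) / (1 FPU * 2 ops) = 12, independent of the clock.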