Sunday 21 October 2012

Nvidia GPU Architecture

An Nvidia GPU is built from a number of Streaming Multiprocessors (SMs). The figures below are for the GTX 280/285 generation, which has 30 SMs.

 
Each SM has 8 Stream Processors (SPs), two Special Function Units (SFUs) and one double-precision FPU.
 
• 8 SPs
• 2 Special Function Units (SFUs)
    – 4 FP multiply units each, used for transcendental operations (e.g. sin) and interpolation
• 64-bit double-precision FPU
• MT issue unit: dispatches instructions to the SPs and SFUs
• Cache
    – Very small instruction cache
    – Read-only data cache
• 16 KB read/write shared memory
• Multi-threaded instruction dispatch
    – 1 to 1024 threads active
    – Shared instruction fetch per 32 threads
    – Covers the latency of texture/memory loads
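
The thread figures above imply a simple bit of arithmetic: if instructions are fetched once per group of 32 threads, then an SM with 1024 active threads is juggling 32 such groups. Nvidia calls these groups warps (a term not used in the list above). The short Python sketch below just does that division; the names `WARP_SIZE`, `MAX_THREADS_PER_SM` and `warps_needed` are made up for illustration and are not part of any Nvidia API.

```python
# Illustrative constants taken from the list above (names are hypothetical).
WARP_SIZE = 32             # threads that share one instruction fetch
MAX_THREADS_PER_SM = 1024  # upper bound on active threads per SM


def warps_needed(threads):
    """Number of 32-thread groups needed to hold `threads` threads.

    A partially filled group still occupies a whole slot, so round up.
    """
    if not 1 <= threads <= MAX_THREADS_PER_SM:
        raise ValueError("threads must be between 1 and 1024")
    return (threads + WARP_SIZE - 1) // WARP_SIZE


if __name__ == "__main__":
    for n in (1, 33, 256, 1024):
        print(n, "threads ->", warps_needed(n), "warp(s)")
```

With many warps resident, the issue unit can switch to another warp while one is waiting on a texture or memory load, which is how the latency gets covered.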
 
FLOPS
Single-precision performance
• 30 Streaming Multiprocessors
    – 8 SPs: 1 MAD (2 ops) per cycle per SP
    – 2 SFUs: 4 MUL per cycle per SFU
• Each SP can dual-issue a MAD and a MUL in conjunction with the SFUs
    – 3 floating-point operations per clock cycle per SP
• SM functional units are clocked at 1476 MHz (GTX 285) or 1296 MHz (GTX 280)
• FLOPS = 30 SMs * 8 SPs * 3 ops/cycle * 1476 MHz ≈ 1063 GFLOPS
• FLOPS without dual-issue = 30 SMs * 8 SPs * 2 ops/cycle * 1476 MHz ≈ 708 GFLOPS
Double-precision performance
• 30 Streaming Multiprocessors
    – 1 double-precision FPU: 1 double-precision MAD (2 ops) per cycle
• FLOPS = 30 SMs * 1 FPU * 2 ops/cycle * 1476 MHz ≈ 89 GFLOPS
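
The peak figures above all come from the same product: number of SMs, times units per SM, times operations per cycle, times clock rate. Here is a small Python sketch of that arithmetic; the function name `peak_gflops` and the constant names are just for illustration. Before rounding, the three results are 1062.72, 708.48 and 88.56 GFLOPS.

```python
def peak_gflops(num_sms, units_per_sm, ops_per_cycle, clock_mhz):
    """Theoretical peak in GFLOPS = SMs * units * ops/cycle * clock."""
    ops_per_clock = num_sms * units_per_sm * ops_per_cycle
    return ops_per_clock * clock_mhz / 1000.0  # MHz -> GHz, so result is in GFLOPS


NUM_SMS = 30
GTX285_CLOCK_MHZ = 1476  # clock of the SM functional units

# Single precision, MAD + MUL dual-issue: 8 SPs, 3 ops per cycle each
sp_dual_issue = peak_gflops(NUM_SMS, 8, 3, GTX285_CLOCK_MHZ)
# Single precision, MAD only: 8 SPs, 2 ops per cycle each
sp_mad_only = peak_gflops(NUM_SMS, 8, 2, GTX285_CLOCK_MHZ)
# Double precision: 1 FPU per SM, 1 MAD (2 ops) per cycle
dp = peak_gflops(NUM_SMS, 1, 2, GTX285_CLOCK_MHZ)

if __name__ == "__main__":
    print(f"SP dual-issue: {sp_dual_issue:.2f} GFLOPS")
    print(f"SP MAD only:   {sp_mad_only:.2f} GFLOPS")
    print(f"DP:            {dp:.2f} GFLOPS")
```

Swapping in the GTX 280 clock, `peak_gflops(30, 8, 3, 1296)`, gives 933.12 GFLOPS. These are theoretical peaks; real kernels rarely keep every unit busy on every cycle.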

 
 
