HPC: Nvidia GPU Architecture

Sunday, 21 October 2012

Nvidia GPU Architecture

In the GT200 generation (e.g. GTX 280/285), each Nvidia GPU is built from a number of Streaming Multiprocessors (SMs).

Each SM has 8 Streaming Processors (SPs), two Special Function Units (SFUs) and one double-precision FPU.

• 8 SPs

• 2 Special Function Units (SFUs)

– each SFU has 4 FP multiply units and handles transcendental operations (e.g. sin) and interpolation

• 64-bit double-precision FPU

• MT (multithreaded) issue unit: dispatches instructions to the SPs and SFUs.

• Cache

– Very small instruction cache.

– Read-only data cache

• 16KB read/write shared memory.

• Multi-threaded instruction dispatch

– 1 to 1024 threads active per SM

– One instruction fetch is shared by each group of 32 threads (a warp)

– Switching among many active threads hides the latency of texture/memory loads
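The dispatch figures above translate into simple warp arithmetic. The Python sketch below assumes the numbers from this list (one instruction fetch per group of 32 threads, up to 1024 active threads per SM); the helper names `warps_needed`, `idle_lanes` and `max_resident_warps` are hypothetical, chosen only for illustration.

```python
# Warp arithmetic for one SM, using the figures from the list above (assumed GT200 values).
WARP_SIZE = 32             # threads that share one instruction fetch (a warp)
MAX_THREADS_PER_SM = 1024  # maximum active threads per SM


def warps_needed(threads: int) -> int:
    """Number of warps needed for `threads` threads; a partial warp still takes a whole warp."""
    if threads < 0:
        raise ValueError("thread count must be non-negative")
    return (threads + WARP_SIZE - 1) // WARP_SIZE  # integer ceiling division


def idle_lanes(threads: int) -> int:
    """Thread slots left unused in the last, partially filled warp."""
    return warps_needed(threads) * WARP_SIZE - threads


def max_resident_warps() -> int:
    """Largest number of warps an SM can keep active at once."""
    return MAX_THREADS_PER_SM // WARP_SIZE


if __name__ == "__main__":
    for t in (1, 32, 100, 256, 1024):
        print(f"{t:5d} threads -> {warps_needed(t):2d} warps, {idle_lanes(t):2d} idle lanes")
    print("max resident warps per SM:", max_resident_warps())
```

For example, 100 threads still occupy 4 warps (128 thread slots), so 28 slots sit idle; thread counts that are multiples of 32 avoid that waste.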

Single-precision performance (FLOPS)

• 30 Streaming Multiprocessors

– 8 SPs: 1 MAD (2 ops) per cycle per SP

– 2 SFUs: 4 MULs per cycle per SFU

• Each SP can dual-issue a MAD together with a MUL; the MUL runs on the SFUs' multipliers (2 SFUs * 4 MULs = 8 MULs per SM per cycle, i.e. one per SP)

– so each SP performs 3 floating-point operations per clock cycle

• Shader clock for the SM functional units: 1476 MHz (GTX 285) or 1296 MHz (GTX 280)

• Flops = 30 SMs * 8 SPs * 3 Ops/cycle * 1476 MHz ≈ 1062.7 GFLOPS (GTX 285)

• Flops without dual-issue = 30 SMs * 8 SPs * 2 Ops/cycle * 1476 MHz ≈ 708.5 GFLOPS
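The single-precision arithmetic above can be reproduced with a short Python sketch. It assumes the GT200 per-SM figures listed here (8 SPs doing one MAD each, 2 SFUs doing 4 MULs each, 30 SMs) and computes a theoretical peak only, not achievable throughput; `sp_ops_per_sm_cycle` and `peak_sp_gflops` are hypothetical names.

```python
# Theoretical single-precision peak for a GT200-class GPU, from the per-unit figures above.

def sp_ops_per_sm_cycle(sps: int = 8, sfus: int = 2, dual_issue: bool = True) -> int:
    """Single-precision flops per SM per clock."""
    ops = sps * 2              # each SP issues 1 MAD = 2 flops per cycle
    if dual_issue:
        ops += sfus * 4        # each SFU contributes 4 MULs = 4 flops per cycle
    return ops


def peak_sp_gflops(clock_mhz: float, sms: int = 30, dual_issue: bool = True) -> float:
    """Peak = SMs * flops per SM per cycle * clock; MHz * flops gives MFLOPS, / 1000 gives GFLOPS."""
    return sms * sp_ops_per_sm_cycle(dual_issue=dual_issue) * clock_mhz / 1000.0


if __name__ == "__main__":
    for name, clock in (("GTX 285", 1476), ("GTX 280", 1296)):
        with_dual = peak_sp_gflops(clock)
        without = peak_sp_gflops(clock, dual_issue=False)
        print(f"{name}: {with_dual:.1f} GFLOPS (dual-issue), {without:.1f} GFLOPS (MAD only)")
```

The per-SM count of 24 flops per cycle (16 from MADs plus 8 from SFU MULs) is exactly 8 SPs * 3 ops, which is where the "3 Ops/cycle" figure comes from. Plugging in the GTX 280's 1296 MHz clock the same way gives about 933.1 GFLOPS with dual-issue and 622.1 GFLOPS without.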

Double precision performance

• 30 Streaming Multiprocessors

– 1 double-precision FPU per SM: 1 double-precision MAD (2 ops) per cycle

• Flops = 30 SMs * 1 FPU * 2 Ops/cycle * 1476 MHz ≈ 88.6 GFLOPS (GTX 285)
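The double-precision figure follows the same pattern with a single FPU per SM. This Python sketch assumes the values listed above (30 SMs, one DP MAD of 2 flops per FPU per clock) and also computes the single- to double-precision peak ratio; `peak_dp_gflops` and `sp_to_dp_ratio` are hypothetical helper names.

```python
# Theoretical double-precision peak, plus its ratio to the single-precision dual-issue peak.

def peak_dp_gflops(clock_mhz: float, sms: int = 30, dp_fpus_per_sm: int = 1) -> float:
    """Each DP FPU completes one MAD (2 flops) per clock; result in GFLOPS."""
    return sms * dp_fpus_per_sm * 2 * clock_mhz / 1000.0


def sp_to_dp_ratio(sps: int = 8, sp_flops_per_sp: int = 3, dp_fpus: int = 1) -> float:
    """Per-SM single-precision flops per cycle (dual-issue) over double-precision flops per cycle."""
    return (sps * sp_flops_per_sp) / (dp_fpus * 2)


if __name__ == "__main__":
    for name, clock in (("GTX 285", 1476), ("GTX 280", 1296)):
        print(f"{name}: {peak_dp_gflops(clock):.2f} GFLOPS double precision")
    print("single/double peak ratio:", sp_to_dp_ratio())
```

At the GTX 280's 1296 MHz the same formula gives 77.76 GFLOPS. In both cases the double-precision peak is 1/12 of the dual-issue single-precision peak, since each SM does 8 SPs * 3 ops = 24 single-precision flops per cycle but only 1 FPU * 2 ops = 2 double-precision flops.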
