PPT on Graphics Processing Unit (GPU)


Presentation Description

A GPU provides separate, dedicated graphics management and makes a 3D world as realistic as possible.


Presentation Transcript

Slide 1:


Why GPU? :

To provide separate, dedicated graphics management resources, including a graphics processor and memory.
To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory and the system bus, which would otherwise get saturated with graphical operations and I/O requests.
A GPU is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.


WHAT IS A GPU? A Graphics Processing Unit or GPU (occasionally called a Visual Processing Unit, or VPU, by ATI) is, like the CPU (Central Processing Unit), a single-chip processor. Unlike the CPU, it is dedicated to manipulating and displaying computer graphics efficiently. The abstract goal of a GPU is to enable a representation of a 3D world as realistically as possible, so GPUs are designed to provide additional computational power customized specifically for these 3D tasks.


BRIEF HISTORY:
First-generation GPUs (up to 1998): Nvidia’s TNT2, ATI’s Rage and 3dfx’s Voodoo3; DX6 feature set.
Second-generation GPUs (1999-2000): Nvidia’s GeForce 256 and GeForce2, ATI’s Radeon 7500.
Third-generation GPUs (2001): GeForce3/GeForce4 Ti, Radeon 8500, Microsoft’s Xbox; OpenGL ARB, DX7/8.
Fourth-generation GPUs (2002 onwards): GeForce FX family, Radeon 9700; OpenGL plus extensions, DX9.
Fifth-generation GPUs: GeForce 8X; DirectX 10.
Sixth-generation GPUs (2004): GeForce 6800, with High Dynamic Range (HDR) imaging and PureVideo capabilities.
Seventh-generation GPUs (2005): G70 (GeForce 7 series).
Eighth-generation GPUs (2006): G80, with Direct3D 10 support.
Ninth-generation GPUs (February 2008): G92 / GeForce 100 series, for better 3D effects.


GPU ARCHITECTURE
How many processing units? – Lots.
How many ALUs? – Hundreds.
Do you need a cache? – Sort of.
What kind of memory? – Very fast.

ARCHITECTURE OF 6th-GENERATION GPU -- GEFORCE 6800:

We start with six parallel vertex processors that receive data from the host (the CPU) and perform operations such as transformation (moving 3D objects and mapping them into the viewer's view) and lighting (computing the effect of light on objects). Next, the output goes into the triangle setup stage, which takes care of primitive assembly, culling and clipping. It then goes into the rasterizer, which converts the vector primitives into raster form (including point sprites and anti-aliasing) and produces the fragments. The GeForce 6800 has an additional Z-cull unit that performs an early fragment visibility check based on depth, further improving efficiency. We then move on to the sixteen fragment processors, which operate in four parallel units and compute the output color of each fragment. The fragment crossbar is a linking element that directs output pixels to any available pixel engine (also called a ROP, short for Raster Operator), thus avoiding pipeline stalls. The 16 pixel engines are the final stage of processing. They perform operations such as alpha blending and depth tests before delivering the final pixel to the frame buffer.
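The fragment crossbar's job can be illustrated with a toy scheduling model. This is a minimal Python sketch built on stated assumptions. The functions `fixed_route` and `crossbar_route`, the idea of a per-fragment "home" ROP, and the one-fragment-per-ROP-per-cycle rule are all hypothetical simplifications. They do not describe how the GeForce 6800 hardware is actually wired.

```python
# Hypothetical model of the fragment crossbar: each fragment has a "home" ROP
# (e.g. chosen by screen position). A fixed wiring must wait for that ROP;
# a crossbar may hand the fragment to any ROP that is free this cycle.

def fixed_route(home_rops, busy):
    """Route each fragment only to its home ROP; stall if it is busy or already taken."""
    placed, stalled, used = {}, [], set(busy)
    for frag, rop in enumerate(home_rops):
        if rop in used:
            stalled.append(frag)
        else:
            placed[frag] = rop
            used.add(rop)
    return placed, stalled


def crossbar_route(home_rops, busy, num_rops):
    """Route each fragment to any free ROP; stall only when no ROP is left."""
    free = [r for r in range(num_rops) if r not in busy]
    placed, stalled = {}, []
    for frag, _home in enumerate(home_rops):
        if free:
            placed[frag] = free.pop(0)
        else:
            stalled.append(frag)
    return placed, stalled


# Four ROPs, two of them busy; two fragments whose home ROPs are exactly the busy ones.
print(fixed_route([1, 2], busy={1, 2}))                 # ({}, [0, 1]): both stall
print(crossbar_route([1, 2], busy={1, 2}, num_rops=4))  # ({0: 0, 1: 3}, []): none stall
```

With the fixed mapping, both fragments wait for their busy home ROPs. The crossbar keeps them flowing by using the two idle ROPs, which is the stall-avoidance effect the slide describes.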

Vertex Processor (or vertex shader):

Allows a shader (a program that produces a rendering effect) to be applied to each vertex.
Performs transformation, lighting and other per-vertex operations.
Allows the vertex shader to fetch texture data.
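Transformation and lighting, the two per-vertex operations listed above, can be sketched in a few lines of Python. The helpers `transform`, `normalize` and `lambert` are hypothetical names. The sketch assumes a row-major 4x4 matrix, homogeneous (x, y, z, w) vertices, and simple Lambertian diffuse lighting. Lambertian diffuse is only one of many lighting models a vertex shader might implement.

```python
import math

def transform(matrix, vertex):
    """Multiply a 4x4 row-major matrix by a homogeneous vertex (x, y, z, w)."""
    return tuple(sum(matrix[r][c] * vertex[c] for c in range(4)) for r in range(4))

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, to_light):
    """Diffuse intensity max(0, N·L), after normalizing N and L."""
    n, l = normalize(normal), normalize(to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

# Translation by (1, 2, 3) expressed as a 4x4 matrix.
translate = [[1, 0, 0, 1],
             [0, 1, 0, 2],
             [0, 0, 1, 3],
             [0, 0, 0, 1]]

print(transform(translate, (0, 0, 0, 1)))  # (1, 2, 3, 1)
print(lambert((0, 0, 1), (0, 0, 1)))       # 1.0: light hits the surface head-on
print(lambert((0, 0, 1), (0, 0, -1)))      # 0.0: light is behind the surface
```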

Clipping, Z Culling and Rasterization:

Cull/clip – per-primitive operations and data preparation for rasterization.
Rasterization – primitive-to-pixel mapping.
Z culling – quick pixel elimination based on depth.
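The cull and rasterization steps can be illustrated with the common edge-function approach. This sketch is not the GeForce 6800's actual rasterizer. The names `edge` and `rasterize` are hypothetical. The sketch assumes coordinates with y pointing up, counter-clockwise front faces, and pixel-center sampling. Clipping is reduced to visiting only pixels inside the viewport.

```python
# Toy rasterizer for one triangle (hypothetical helper names). Assumed
# conventions: y axis points up, front faces are wound counter-clockwise, and a
# pixel is covered when its center lies inside or on an edge of the triangle.

def edge(ax, ay, bx, by, px, py):
    """Signed area term: positive when (px, py) lies to the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    (x0, y0), (x1, y1), (x2, y2) = tri
    # Per-primitive cull: zero or negative area means degenerate or back-facing.
    if edge(x0, y0, x1, y1, x2, y2) <= 0:
        return []
    fragments = []
    # Only pixels inside the width x height viewport are visited, which stands
    # in for clipping in this sketch.
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments.append((x, y))
    return fragments

print(len(rasterize([(0, 0), (4, 0), (0, 4)], 8, 8)))  # 10 covered pixels
print(rasterize([(0, 0), (0, 4), (4, 0)], 8, 8))       # []: clockwise, so culled
```

Each fragment in the returned list is a candidate pixel. A Z-cull stage would then discard candidates whose depth is already known to be hidden.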

Fragment processor and Texel pipeline:

Fragment – a candidate pixel.
Varying number of pixel pipelines.
Operates on quads (2x2 blocks of pixels), which is needed to compute the texture level of detail (LOD), and computes the output color of each pixel. Fragment processors can take position, color, depth, fog and other arbitrary 4-dimensional attributes as input.
SIMD processing hides texture fetch latency.
Texture caches.
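A quad gives the hardware four neighboring texture coordinates. From these it can estimate how many texels one pixel step covers and pick a mipmap level. The sketch below uses the widely documented approximation LOD = log2(rho), where rho is the larger of the texel-space derivative lengths along x and y. The function `quad_lod` and the uv layout are assumptions for illustration. Real hardware may use a different approximation.

```python
import math

def quad_lod(uv, tex_w, tex_h):
    """Approximate texture LOD for a 2x2 quad.

    uv[row][col] holds normalized (u, v) texture coordinates: row 0 is the
    pixels (x0, y0), (x1, y0) and row 1 is the pixels (x0, y1), (x1, y1).
    """
    (u00, v00), (u10, v10) = uv[0]
    (u01, v01), _ = uv[1]
    # Finite differences across the quad, scaled from [0, 1] to texel units.
    du_dx, dv_dx = (u10 - u00) * tex_w, (v10 - v00) * tex_h
    du_dy, dv_dy = (u01 - u00) * tex_w, (v01 - v00) * tex_h
    rho = max(math.hypot(du_dx, dv_dx), math.hypot(du_dy, dv_dy))
    # log2 of the footprint: 0 = base level, 1 = half resolution, and so on.
    # Negative values mean magnification; clamp to the base level here.
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

step = 4 / 256  # each pixel step advances 4 texels in a 256x256 texture
quad = [[(0.0, 0.0), (step, 0.0)],
        [(0.0, step), (step, step)]]
print(quad_lod(quad, 256, 256))  # 2.0: sample mip level 2
```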

Z compare and blend/PIXEL ENGINES:

Depth testing.
Stencil tests.
Alpha operations.
Render the final color to the target buffer.
Z-cull unit – performs an early fragment visibility check based on depth, further improving efficiency.
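The pixel-engine steps of depth testing and alpha blending can be sketched as follows. The helper `depth_test_and_blend` is hypothetical. It assumes a "less than" depth comparison, depth values in [0, 1] with 1.0 as the far plane, and the standard "over" blend: alpha x source + (1 - alpha) x destination. Stencil tests are left out.

```python
def depth_test_and_blend(depth_buf, color_buf, x, y, z, rgba):
    """Hypothetical pixel-engine step: LESS depth test, then 'over' alpha blend.

    Returns True if the fragment survived the depth test and was written.
    Stencil testing is not modeled.
    """
    if z >= depth_buf[y][x]:
        return False  # hidden behind what is already in the buffer
    r, g, b, a = rgba
    dr, dg, db = color_buf[y][x]
    # result = alpha * source + (1 - alpha) * destination, per channel
    color_buf[y][x] = (a * r + (1 - a) * dr,
                       a * g + (1 - a) * dg,
                       a * b + (1 - a) * db)
    # This sketch writes depth for every surviving fragment; renderers often
    # disable depth writes for translucent surfaces.
    depth_buf[y][x] = z
    return True

depth = [[1.0]]              # cleared to the far plane
color = [[(0.0, 0.0, 0.0)]]  # cleared to black
depth_test_and_blend(depth, color, 0, 0, 0.5, (1.0, 0.0, 0.0, 0.5))  # half-transparent red
depth_test_and_blend(depth, color, 0, 0, 0.7, (0.0, 1.0, 0.0, 1.0))  # farther green: rejected
print(color[0][0], depth[0][0])  # (0.5, 0.0, 0.0) 0.5
```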

Processing units:

Focus on floating-point math (GPGPU).
fp32 (full) and fp16 (partial) precision support for intermediate calculations.
fp16 format for High Dynamic Range (HDR) imaging, from DX9.
Dedicated fp16 normalization hardware.

MEMORY:
Uses dedicated but standard memory architectures (e.g., DRAM).
Multiple small, independent memory partitions for improved latency.
Memory is used to store buffers and, optionally, textures.
In low-end systems (e.g., Intel 855GM), system memory is shared as graphics memory.
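The practical difference between fp32 (full) and fp16 (partial) precision shows up when values are rounded through each format. This Python sketch relies on the standard `struct` module's `f` (single) and `e` (half) format codes. The names `to_fp16` and `to_fp32` are helper names introduced here. fp16 keeps only about three decimal digits. Its largest finite value is 65504, which still covers far more than the 0-to-1 range of 8-bit color, and this is what makes it useful for HDR intermediate values.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE 754 half precision (fp16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to IEEE 754 single precision (fp32) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

third = 1 / 3
print(to_fp32(third))    # close to 1/3, error below 1e-7
print(to_fp16(third))    # 0.333251953125: only about 3 decimal digits survive
print(to_fp16(2049.0))   # 2048.0: fp16 has an 11-bit significand
print(to_fp16(65504.0))  # 65504.0: the largest finite fp16 value

try:
    to_fp16(70000.0)
except OverflowError:
    print("70000 does not fit in fp16")
```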


Caches:
Texture caches (2 levels), shared between the vertex processor and the fragment processor.
Cache for processed, filtered textures.
Vertex caches store processed and unprocessed vertices, improving computation and fetch performance.
Z and buffer caches.
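A toy model of texel reuse shows why texture caches pay off. Everything here is an assumption for illustration. `TileCache`, the 4x4-texel tile size, the LRU replacement policy and the scanline access order are not taken from any real GPU's documentation.

```python
from collections import OrderedDict

class TileCache:
    """Hypothetical texture cache: holds `capacity` tiles of tile x tile texels, LRU eviction."""

    def __init__(self, capacity, tile=4):
        self.capacity = capacity
        self.tile = tile
        self.lines = OrderedDict()  # tile coordinate -> present; order = recency
        self.hits = 0
        self.misses = 0

    def fetch(self, x, y):
        key = (x // self.tile, y // self.tile)
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # would trigger a memory fetch
            self.lines[key] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used tile

def scan(capacity, size=16):
    """Fetch every texel of a size x size region in row-major (scanline) order."""
    cache = TileCache(capacity)
    for y in range(size):
        for x in range(size):
            cache.fetch(x, y)
    return cache.hits, cache.misses

print(scan(capacity=4))  # (240, 16): each tile is fetched from memory once
print(scan(capacity=2))  # (192, 64): tiles are evicted before the next row reuses them
```

Because neighboring pixels read neighboring texels, a cache that holds one row of tiles turns most reads into hits. Shrinking it below that forces repeated fetches of the same data.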

System Interface & BUSES:

The GPU interfaces with the CPU using fast buses such as AGP and PCI Express.
Port speeds:
– PCI Express: up to 8 GB/s
– AGP: up to 2 GB/s
Such bus speeds are important because textures and vertex data need to come from the CPU to the GPU (after that, it's the internal GPU bandwidth that matters).
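The bus figures above translate directly into upload times. This is a minimal sketch that uses the peak figures quoted on the slide and assumes no protocol overhead, so real transfers would be slower. The 64 MB payload is hypothetical, and `transfer_ms` is a helper name introduced here.

```python
def transfer_ms(num_bytes, gb_per_s):
    """Milliseconds to move num_bytes at a peak rate of gb_per_s (1 GB = 1e9 bytes)."""
    return num_bytes * 1000.0 / (gb_per_s * 1e9)

texture_bytes = 64e6  # hypothetical 64 MB of texture and vertex data
print(transfer_ms(texture_bytes, 8))  # 8.0 ms over PCI Express at 8 GB/s
print(transfer_ms(texture_bytes, 2))  # 32.0 ms over AGP at 2 GB/s
```

At 60 frames per second a frame lasts about 16.7 ms. Re-sending this much data every frame would not fit over AGP and would take about half a frame over PCI Express. This is why data stays in GPU memory once uploaded and internal GPU bandwidth becomes the limit.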


SOME APPLICATIONS…
Computer-generated holography can be performed on a graphics processing unit.
Computer graphics in games.
Improving the performance of CAD tools.
3D image generation and processing.


So long, and thanks for all the fish
