stream computing


STREAM COMPUTING : 

Stream Computing: A Programming Paradigm

Slide 2: 

Report submitted by: Sujat Khan, 7th Semester, Department of Computer Science & Engineering. Registered No: 0601288069. In partial fulfillment for award of the degree of Bachelor of Technology in Computer Science & Engineering, Biju Pattanaik University of Technology, Orissa.

TOPICS UNDER DISCUSSION : 

Introduction
Computing
Stream Computing
Need for stream computing
Characteristics of stream computing
Difference between stream computing & computation on CPU
Evolution of processors
Stream processors

Slide 4: 

Enabling technologies
Stream Processor Architecture
Processing stages in stream processing systems
StreamIt
Language Overview
Filters as computational elements
Applications
Extracting Linear Representation
Combining Linear Filters
Linear optimization of Stream graph
Development Support
StreamIt Development Tool
StreamIt Graphical Editor
StreamIt Debugging Environment
Conclusion

Introduction : 

What exactly is stream computing? “Stream computing is a programming paradigm that models a computer program as a stream of data between several processing units, rather than as an implemented algorithm processing data.” This is the general definition, but understanding the topic properly requires more than the definition alone. We need to examine a number of terms that appear in the discussion. While many of these terms lack precise scientific definitions, general definitions can be given.

Slide 6: 

Computing: Computing can be described as any activity of using and/or developing computer hardware and software. It includes everything that sits in the bottom layer, i.e. everything from raw compute power to storage capabilities. Stream processing is a computer programming paradigm, related to SIMD, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating-point units on a GPU, without explicitly managing allocation, synchronization, or communication among those units.

Stream Computing : 

Stream computing (or stream processing) refers to a class of compute problems, applications or tasks that can be broken down into parallel, identical operations and run simultaneously on a single processor device. Stream computing is defined by the parallel data streams entering the processor device, the computations taking place on them, and the output leaving the device. Today, stream computing is primarily the realm of the graphics processor unit (GPU), where the parallel processes used to produce graphics imagery are used instead to perform arithmetic calculations.

Slide 8: 

In stream computing, advanced software algorithms analyze the data as it streams in. Text, voice and image-recognition technology, for example, can be used to determine that some data is more relevant to a particular problem than other data. The priority data is then passed to a program tailored to work on complex, fast-changing problems, such as tracking an epidemic and predicting its spread, or culling data from electronic sensors in a computer chip plant to quickly correct flaws in manufacturing.

Need for stream computing : 

The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a set of data (a stream), a series of operations (kernel functions) is applied to each element in the stream. Uniform streaming, where one kernel function is applied to all elements in the stream, is typical. Kernel functions are usually pipelined, and local on-chip memory is reused to minimize external memory bandwidth. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use scoreboarding, for example, to launch DMAs at runtime, when dependencies become known. The elimination of manual DMA management reduces software complexity, and the elimination of hardware caches reduces the amount of die area not dedicated to computational units such as ALUs.
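The kernel-and-stream idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `apply_kernel`, `pipeline`, `scale` and `offset` are hypothetical and do not come from any stream processing library, and Python generators stand in for the hardware pipelining of kernels.

```python
from typing import Callable, Iterable, Iterator

# A kernel is modelled here as a function from one element to one element.
Kernel = Callable[[float], float]


def apply_kernel(kernel: Kernel, stream: Iterable[float]) -> Iterator[float]:
    """Uniform streaming: apply one kernel to every element of the stream."""
    for element in stream:
        yield kernel(element)


def pipeline(stream: Iterable[float], *kernels: Kernel) -> Iterator[float]:
    """Chain kernels so each element flows through every stage in order."""
    for kernel in kernels:
        stream = apply_kernel(kernel, stream)
    return iter(stream)


def scale(x: float) -> float:
    """Hypothetical first kernel: double the element."""
    return 2.0 * x


def offset(x: float) -> float:
    """Hypothetical second kernel: add one to the element."""
    return x + 1.0


# Each element passes through scale, then offset, one element at a time.
result = list(pipeline([1.0, 2.0, 3.0], scale, offset))
```

Because each kernel only sees one element and has no shared state, the stages could in principle be run on separate units, which is the property the paradigm relies on.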

Slide 10: 

Characteristics of stream computing:
Enable new applications on new architecture.
Parallel problems other than graphics that map well on GPU architecture.
Transition from fixed-function to programmable pipelines.
Various proof points in research and industry under the name GPGPU.
Data dependencies and parallelism.
A great advantage of the stream programming model lies in the kernel defining independent and local data usage. Kernel operations define the basic data unit, both as input and output. This allows the hardware to better allocate resources and schedule global I/O. Although usually not exposed in the programming model, the I/O operations seem to be much more advanced on stream processors (at least, on GPUs). I/O operations are also usually pipelined themselves, while the chip structure can help hide latencies.

Slide 11: 

The definition of the data unit is usually explicit in the kernel, which is expected to have well-defined inputs (possibly using structures, which is encouraged) and outputs. In some environments, output values are fixed (in GPUs, for example, there is a fixed set of output attributes, unless this is relaxed). Having each computing block clearly independent and defined allows bulk read or write operations to be scheduled, greatly increasing cache and memory bus efficiency.

Slide 12: 

How does stream computing differ from computation on CPU? Stream computing takes advantage of a SIMD methodology (single instruction, multiple data), whereas a CPU follows a modified SISD methodology (single instruction, single data), with the modifications taking various parallelism techniques into account. The benefit of stream computing stems from the highly parallel architecture of the GPU, whereby tens to hundreds of parallel operations are performed with each clock cycle, whereas the CPU can at best perform only a small handful of parallel operations per clock cycle.

Slide 13: 

Flynn's classification of architecture:
1. SISD (single instruction stream, single data stream): Corresponds to the usual Von Neumann architecture. A single CPU executes one instruction at a time (single instruction stream) and fetches/stores one data value at a time (single data stream).
2. SIMD (single instruction stream, multiple data stream): Executes one instruction at a time (single instruction stream), but the same operation is performed on many data values at the same time (multiple data stream). These are the so-called 'vector machines', such as CDC 6600 / 7600 / Cyber machines. A vector operation with n elements can be performed in one instruction cycle on a SIMD architecture.
3. MISD (multiple instruction stream, single data stream): Multiple programs operating on the same data (performing different computations). No MISD machines exist at this point.
4. MIMD (multiple instruction stream, multiple data stream): These are multiprocessor systems. Each processor can execute a different program on its own data; hence, multiple instruction streams (programs) and multiple data streams.
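The difference between the SISD and SIMD entries above can be illustrated by counting instruction issues. The Python sketch below is a model only: Python itself runs sequentially, and the function names and the `lanes` parameter are hypothetical. It assumes an idealized SIMD unit that handles `lanes` element pairs per instruction.

```python
def sisd_add(a, b):
    """SISD model: one add instruction is issued per element pair."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def simd_add(a, b, lanes):
    """SIMD model: one add instruction covers up to `lanes` element pairs."""
    out, issued = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # All pairs in the chunk are treated as a single issued instruction.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1
    return out, issued


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8

scalar_out, scalar_issued = sisd_add(a, b)
vector_out, vector_issued = simd_add(a, b, lanes=4)
```

Both versions produce the same sums; in this model the scalar version issues 8 instructions for 8 elements, while the 4-lane version issues 2.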

Slide 14: 

Both SIMD and MIMD are parallel processing architectures since the processors execute operations in parallel. They are, by default, multiprocessor architectures, which can be subdivided into two categories:
1. Global memory architectures: a common, global memory is shared by all processors.
2. Local memory architectures: one local memory per processor.
Global memory MIMD architectures are also known as tightly-coupled multiprocessor systems; local memory MIMD systems are known as loosely-coupled multiprocessor systems.

Slide 15: 

Evolution of Processors : The evolution of processing units can be compared to the evolution of man. [Illustrative pictures not included in the transcript.]

Slide 17: 

Stream processors : The widely available varieties of processors in the market today are those produced by two giant manufacturers: Intel and AMD. IBM was the first to introduce a high-performance computer system intended to rapidly analyze data as it streams in from many sources, increasing the speed and accuracy of decision making in fields as diverse as security surveillance and Wall Street trading. The system, called System S, was developed to address the need for faster data handling and analysis in business and science, and the growing flood of information in digital form, including Web sites, blogs, e-mail, video and news clips, telephone conversations, transaction data and electronic sensors.

Slide 18: 

The initial system runs on about 800 microprocessors, though it can scale up to tens of thousands as needed. The most notable step lies in the System S software, which enables software applications to split up tasks like image recognition and text recognition, and then reassemble the pieces of the puzzle into an answer. Apart from IBM's system, the stream processors developed by AMD are built around the graphics processor unit (GPU), where the parallel processes used to produce graphics imagery are used instead to perform arithmetic calculations. Stream processors developed by AMD include: FireStream 9170 and FireStream 9250.

Slide 19: 

AMD's FireStream™ 9170: The latest-generation stream computing GPU. Features enabled:
320 stream cores (compute units or ALUs)
2GB on-board GDDR3 memory
Double precision floating point support
PCIe 2.0 x16 interface
Product Advantages :
Only company positioned to offer a unique platform with strengths in accelerated GPU as well as CPU computing
Stream computing today leading to fusion tomorrow
AMD's open systems SDK approach: CTM initiative

Slide 20: 

Deliver high-level, multi-targeted compilers through Brook, 3rd parties like RapidMind, and partnerships with universities and industry.
Deliver library functions through AMD's ACML, APL, Cobra, and through university partner program.

Stream Processor architecture :

Slide 21: 

Processing stages in Stream Processing systems :

Slide 22: 

StreamIt: StreamIt is a programming language and a compilation infrastructure, specifically engineered for modern streaming systems. It is designed to facilitate the programming of large streaming applications, as well as their efficient and effective mapping to a wide variety of target architectures, including commercial-off-the-shelf uniprocessors, multicore architectures, and clusters of workstations. Stream computing is an effort to deal with two issues: the need for faster data handling and analysis in business and science, and the growing flood of information in digital form, including Web sites, blogs, e-mail, video and news clips, telephone conversations, transaction data and electronic sensors.

Slide 23: 

StreamIt Language Overview : StreamIt is an architecture-independent language for streaming applications. It adopts the Cyclo-Static Dataflow [1] model of computation, which is a generalization of Synchronous Dataflow. StreamIt programs are represented as graphs where nodes represent computation and edges represent FIFO-ordered communication of data over tapes. The basic programmable unit in StreamIt is a filter. Each filter contains a work function that executes atomically, popping (i.e., reading) a fixed number of items from the filter's input tape and pushing (i.e., writing) a fixed number of items to the filter's output tape.
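The filter-and-tape model described above can be mimicked in Python to make the fixed pop and push rates concrete. This is a rough sketch and not StreamIt itself: the `Filter` class, `run` helper and `pair_sum` example are hypothetical names, and a `collections.deque` is assumed as a stand-in for a FIFO tape.

```python
from collections import deque


class Filter:
    """Hypothetical stand-in for a filter with fixed pop and push rates."""

    def __init__(self, pop_rate, push_rate, work):
        self.pop_rate = pop_rate
        self.push_rate = push_rate
        # `work` maps a list of pop_rate items to a list of push_rate items.
        self.work = work

    def can_fire(self, in_tape):
        """The work function may run only when enough input is available."""
        return len(in_tape) >= self.pop_rate

    def fire(self, in_tape, out_tape):
        """Execute the work function once against the two tapes."""
        popped = [in_tape.popleft() for _ in range(self.pop_rate)]
        pushed = self.work(popped)
        if len(pushed) != self.push_rate:
            raise ValueError("work must honour its declared push rate")
        out_tape.extend(pushed)


def run(filt, in_tape, out_tape):
    """Fire the filter repeatedly until the input tape runs short."""
    while filt.can_fire(in_tape):
        filt.fire(in_tape, out_tape)


# Example filter: pops 2 items, pushes their sum (pop 2, push 1).
pair_sum = Filter(pop_rate=2, push_rate=1, work=lambda items: [items[0] + items[1]])

source_tape = deque([1, 2, 3, 4, 5])
sink_tape = deque()
run(pair_sum, source_tape, sink_tape)
```

Because the rates are declared up front, a scheduler can tell how many times each filter may fire from the tape lengths alone; here the fifth item stays on the input tape because a pair is not yet available.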

Slide 24: 

Filter as computational elements : Filters are the programmable units. Each filter has an initialization function and a steady-state work function. Filters communicate via FIFOs using pop(), peek(index) and push(value).

    float->float filter FIR (int N) {
        float[N] weights;
        init {
            // Runs once, before the steady state begins.
            weights = calculate_weights(N);
        }
        work push 1 pop 1 peek N {
            // Weighted sum over the next N input items.
            float result = 0;
            for (int i = 0; i < N; i++) {
                result += weights[i] * peek(i);
            }
            push(result);
            pop();
        }
    }
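For readers unfamiliar with StreamIt, the FIR filter above can be approximated in Python. This is a sketch under assumptions: the source does not define `calculate_weights`, so the weights are passed in directly, and a `deque` plays the role of the input tape.

```python
from collections import deque


def fir(weights, samples):
    """Run an N-tap FIR filter with peek/pop/push-style semantics."""
    n = len(weights)
    tape = deque(samples)          # input tape (FIFO)
    output = []                    # output tape
    # The work function can fire only while N items are available to peek.
    while len(tape) >= n:
        result = 0.0
        for i in range(n):
            result += weights[i] * tape[i]   # peek(i)
        output.append(result)                # push(result)
        tape.popleft()                       # pop()
    return output


# Two-tap averaging filter over a short input stream.
smoothed = fir([0.5, 0.5], [1.0, 3.0, 5.0, 7.0])
```

Each firing peeks at N items but pops only one, so consecutive outputs share N - 1 inputs; four samples through a two-tap filter therefore yield three outputs.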

Slide 25: 

Applications : Stream processing is especially suitable for applications that exhibit three application characteristics.
Compute Intensity: the number of arithmetic operations per I/O or global memory reference. In many signal processing applications today it is well over 50:1 and increasing with algorithmic complexity.
Data Parallelism: exists in a kernel if the same function is applied to all records of an input stream and a number of records can be processed simultaneously without waiting for results from previous records.
Data Locality: a specific type of temporal locality common in signal and media processing applications, where data is produced once, read once or twice later in the application, and never read again. Intermediate streams passed between kernels, as well as intermediate data within kernel functions, can capture this locality directly using the stream processing programming model.

Slide 26: 

Extracting Linear Representation :
Resembles constant propagation.
Maintains linear form v, b for each variable.
Peek expression: generate fresh v.
Push expression: copy v into A.
Pop expression: increment o.
Combining Linear Filters : Pipelines and split-joins can be collapsed. The accompanying figure (not reproduced in this transcript) shows a combination example.
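To make the idea of collapsing a pipeline concrete, the Python sketch below combines two FIR-style linear filters into one. It is a simplified illustration, not the StreamIt compiler's algorithm: it assumes both filters pop 1 and push 1 per firing and have no constant term, in which case the combined weights are the convolution of the two weight vectors. The function names are hypothetical.

```python
def fir(weights, samples):
    """Apply a linear filter: output[k] = sum_i weights[i] * samples[k + i]."""
    n = len(weights)
    return [
        sum(weights[i] * samples[k + i] for i in range(n))
        for k in range(len(samples) - n + 1)
    ]


def combine(first, second):
    """Weights of the single filter equivalent to `first` followed by `second`."""
    combined = [0] * (len(first) + len(second) - 1)
    for i, w in enumerate(first):
        for j, v in enumerate(second):
            # An input at offset i + j contributes through both stages.
            combined[i + j] += w * v
    return combined


first = [1, 2]
second = [3, 1, 2]
samples = [1, 0, 2, 5, 3, 4]

two_stage = fir(second, fir(first, samples))        # pipeline of two filters
one_stage = fir(combine(first, second), samples)    # collapsed single filter
```

The collapsed filter produces the same outputs as the two-stage pipeline while removing the intermediate stream between the stages.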

Slide 27: 

Linear optimization of Stream graph :

StreamIt Development tool : The StreamIt Development Tool (SDT) features many aspects of an IDE, including a text editor and a debugger. The SDT graphically represents StreamIt programs, and preserves hierarchical information to allow an application engineer to focus on the parts of the stream program that are of interest. In addition, the SDT can track the flow of data between filters and, most importantly, it provides a deterministic mechanism to debug parallel streams. The SDT is implemented in Java as an Eclipse [3] plug-in.

Slide 28: 

StreamIt graphical editor : It is implemented in Java as an Eclipse plug-in, and is intended for developing, debugging, and visualizing programs written in StreamIt. A StreamIt program can be visually depicted as a hierarchical directed graph of streams, with graph nodes representing streams and graph edges representing tapes or channels. The containers are rendered according to the code declarations, and the visualization tools in the SDT allow the user to selectively collapse and expand containers. This is useful in large streams where the application developers are only interested in visualizing a particular subset, for example to verify the interconnect topology of the graph.

StreamIt Debugging Environment :

Slide 29: 

Conclusion : Stream processing has been shown to outperform mainstream programmable computing solutions while consuming less power for data parallel applications. Exploiting the data- and instruction-level parallelism inherent in these applications, stream processors sustain many operations in parallel, and overlap them with memory accesses in order to improve computation throughput. Realizing the performance potential of stream processing, however, depends on the ability to manage bandwidth demands in the memory hierarchy to sustain the operands needed for highly parallel computation. We introduced an indexed stream register file architecture that enabled data reuse patterns found in a broad range of data parallel applications to be captured in on-chip memories of stream processors, reducing off-chip bandwidth demands by several fold in some cases. This, in effect, enables classes of data parallel applications that, due to bandwidth bottlenecks, could not previously be efficiently executed on stream processors to be supported efficiently.

Slide 31: 

Thank You