Multiprocessor systems

Views:
 
     
 

Presentation Description

Introduction about Multiprocessor systems

Comments

Presentation Transcript

Multiprocessor Systems:

CS502 Spring 2006 Multiprocessor Systems CS-502 Operating Systems Spring 2006

Overview –Interrelated topics:

CS502 Spring 2006 Overview –Interrelated topics Multiprocessor Systems Distributed Systems Distributed File Systems

Distributed Systems:

CS502 Spring 2006 Distributed Systems Nearly all systems today are distributed in some way, e.g.: they use email they access files over a network they access printers over a network they are backed up over a network they share other physical or logical resources they cooperate with other people on other machines they receive video, audio, etc.

Distributed Systems – Why?:

CS502 Spring 2006 Distributed Systems – Why? Distributed systems are now a requirement: Economics – small computers are very cost effective Resource sharing sharing and printing files at remote sites processing information in a distributed database using remote specialized hardware devices Many applications are by their nature distributed (bank teller machines, airline reservations, ticket purchasing) Computation speedup – To solve the largest or most data intensive problems , we use many cooperating small machines (parallel programming) Reliability

What is a Distributed System?:

CS502 Spring 2006 What is a Distributed System? There are several levels of distribution. Earliest systems used simple explicit network programs: FTP: file transfer program Telnet ( rlogin ): remote login program mail remote job entry (or rsh ): run jobs remotely Each system was a completely autonomous independent system, connected to others on the network

Loosely Coupled Systems:

CS502 Spring 2006 Loosely Coupled Systems Most distributed systems are “loosely-coupled Each CPU runs an independent autonomous OS Hosts communicate through message passing . Computers/systems don’t really trust each other Some resources are shared, but most are not The system may look differently from different hosts Typically, communication times are long Relative to processing times

Closely-Coupled Systems:

CS502 Spring 2006 Closely-Coupled Systems Distributed system becomes more “closely coupled” as it: appears more uniform in nature runs a “single” operating system (cooperating across all machines) has a single security domain shares all logical resources (e.g., files) shares all physical resources (CPUs, memory, disks, printers, etc.) In the limit, a closely coupled distributed system: – Multicomputer Multiple computers – CPU and memory and network interface (NIC) High performance interconnect Looks a lot like a single system E.g., Beowulf clusters

Tightly Coupled Systems:

CS502 Spring 2006 Tightly Coupled Systems Tightly coupled systems usually are multiprocessor systems Have a single address space Usually has a single bus or backplane to which all processors and memories are connected Low communication latency Shared memory for processor communication Shared I/O device access Example: Multiprocessor Windows PC

Distributed Systems – a Spectrum:

CS502 Spring 2006 Distributed Systems – a Spectrum Loosely coupled Latency – milliseconds Closely coupled Multicomputer Latency – microseconds Tightly coupled Multiprocessor Latency – nanoseconds

Distributed Systems – Software Overview (1):

CS502 Spring 2006 Distributed Systems – Software Overview (1) Network Operating System Users are aware of multiplicity of machines. Access to resources of various machines is done explicitly by: Remote logging into the appropriate remote machine. Transferring data from remote machines to local machines, via the File Transfer Protocol (FTP) mechanism.

Distributed Systems – Software Overview (2):

CS502 Spring 2006 Distributed Systems – Software Overview (2) Distributed Operating System Users not aware of multiplicity of machines. Access to remote resources similar to access to local resources. Data Migration – transfer data by transferring entire file, or transferring only those portions of the file necessary for the immediate task. Computation Migration – transfer the computation, rather than the data, across the system. However, The distinction between Networked Operating Systems and Distributed Operating Systems is shrinking E.g., CCC cluster; Windows XP on home network

Multiprocessor Systems:

CS502 Spring 2006 Multiprocessor Systems Loosely coupled Latency – milliseconds Closely coupled Multicomputer Latency – microseconds Tightly coupled Multiprocessor Latency – nanoseconds

Multiprocessors (1) – Bus-based:

CS502 Spring 2006 Multiprocessors (1) – Bus-based Bus contention limits the number of CPUs Lower bus contention Caches need to be synced ( big deal ) Compiler places data and text in private or shared memory

Multiprocessors (2) - Crossbar:

CS502 Spring 2006 Multiprocessors (2) - Crossbar Can support a large number of CPUs - Non-blocking network Cost/performance effective up to about 100 CPU – growing as n 2

Multiprocessors(3) – Multistage Switching Networks:

CS502 Spring 2006 Multiprocessors(3) – Multistage Switching Networks Omega Network – blocking Lower cost, longer latency For N CPUs and N memories – log 2 n stages of n/2 switches

Type of Multiprocessors – UMA vs. NUMA:

CS502 Spring 2006 Type of Multiprocessors – UMA vs. NUMA UMA (Uniform Memory Access) Shared Memory Multiprocessor Familiar programming model Number of CPUs are limited Completely symmetrical NUMA (Non-Uniform Memory Access) Single address space visible to all CPUs Access to remote memory via commands LOAD & STORE remote memory access slower than to local

Caching vs. Non-caching:

CS502 Spring 2006 Caching vs. Non-caching No caching Remote access time not hidden Slows down a fast processor May impact programming model Caching Hide remote memory access times Complex cache management hardware Some data must be marked as non-cachable Visible to programming model

Multiprocessor Systems:

CS502 Spring 2006 Multiprocessor Systems Loosely coupled Latency – milliseconds Closely coupled Multicomputer Latency – microseconds Tightly coupled Multiprocessor Latency – nanoseconds

Multiprocessor OS – Private OS:

CS502 Spring 2006 Multiprocessor OS – Private OS Each processor has a copy of the OS Looks and generally acts like N independent computers May share OS code OS Data is separate I/O devices and some memory shared Synchronization issues While simple, benefits are limited

Multiprocessor OS – Master-Slave:

CS502 Spring 2006 Multiprocessor OS – Master-Slave One CPU (master) runs the OS and applies most policies Other CPUs run applications Minimal OS to acquire and terminate processes Relatively simple OS Master processor can become a bottleneck for a large number of slave processors

Multiprocessor OS – Symmetric Multi-Processor (SMP):

CS502 Spring 2006 Multiprocessor OS – Symmetric Multi-Processor (SMP) Any processor can execute the OS and applications Synchronization within the OS is the issue Lock the whole OS – poor utilization – long queues waiting to use OS OS critical regions – much preferred Identify independent OS critical regions that be executed independently – protect with mutex Identify independent critical OS tables – protect access with MUTEX Design OS code to avoid deadlocks The art of the OS designer Maintenance requires great care

Multiprocessor OS – SMP (continued):

CS502 Spring 2006 Multiprocessor OS – SMP (continued) Multiprocessor Synchronization Need special instructions – test-and-set Spinlocks are common Can context switch if time in critical region is greater than context switch time OS designer must understand the performance of OS critical regions Context switch time could be onerous Data cached on one processor needs to be re-cached on another

Multiprocessor Scheduling:

CS502 Spring 2006 Multiprocessor Scheduling When processes are independent (e.g., timesharing) Allocate CPU to highest priority process Tweaks For a process with a spinlock, let it run until it releases the lock To reduce TLB and memory cache flushes, try to run a process on the same CPU each time it runs For groups of related processes Attempt to simultaneously allocate CPUs to all related processes ( space sharing ) Run all threads to termination or block Gang schedule – apply a scheduling policy to related processes together

Multicomputer Systems:

CS502 Spring 2006 Multicomputer Systems Loosely coupled Latency – milliseconds Closely coupled Multicomputer Latency – microseconds Tightly coupled Multiprocessor Latency – nanoseconds

Multicomputers:

CS502 Spring 2006 Multicomputers Multiprocessor size is limited Multicomputers – closely coupled processors that do not physically share memory Cluster computers Networks or clusters of computers (NOWs or COWs) Can grow to a very large number of processors Consist of Processing nodes – CPU, memory and network interface (NIC) I/O nodes – device controller and NIC Interconnection network Many topologies – e.g. grid, hypercube, torus Can be packet switched or circuit switched

Inter-Process Communication (IPC) among computers:

CS502 Spring 2006 Inter-Process Communication (IPC) among computers Processes on separate processors communicate by messages Message moved to NIC send buffer Message moved across the network Message copied into NIC receive buffer destination host addr. header source host addr. application ID msg length msg data checksum destination host addr.

Interprocessor Communication:

CS502 Spring 2006 Interprocessor Communication Copying of messages is a major barrier to achieving high performance Network latency may involve copying message (hardware issue) Must copy message to NIC on send and from NIC on receive Might have additional copies between user processes and kernel (e.g., for error recovery) Could map NIC into user space – creates some additional usage and synchronization problems

Multicomputer IPC (continued):

CS502 Spring 2006 Multicomputer IPC (continued) Message Passing mechanisms MPI (p. 123) and PVM are two standards Basic operations are send ( destinationID , & message ) receive ( senderID , & message ) Blocking calls – process blocks until message is moved from (to) NIC buffer to (from) network {for send (receive)} We will look at alternative interprocess communication methods in a few minutes

Multicomputer Scheduling:

CS502 Spring 2006 Multicomputer Scheduling Typically each node has its own scheduler With a coordinator on one node, gang scheduling is possible for some applications Most scheduling is done when processes are created i.e., allocation to a processor for life of process Load Balancing – efficiently use the system’s resources Many models – dependent on what is important Examples Sender-initiated - when overloaded send process to another processor Receiver-initiated – when underloaded ask another processor for a job

Multicomputer IPC Distributed Shared Memory (DSM) :

CS502 Spring 2006 Multicomputer IPC Distributed Shared Memory (DSM) A method of allowing processes on different processors to share regions of virtual memory Programming model (alleged to be) simpler Implementation is essentially paging over the network Backing file lives in mutually accessible place Can easily replicate read-only pages to improve performance Writeable pages One copy and move as needed Multiple copies Make each frame read-only On write tell other processors to invalidate page to be written Write through

Distributed System – Remote Procedure Call (RPC):

CS502 Spring 2006 Distributed System – Remote Procedure Call (RPC) The most common means for remote communication Used both by operating systems and by applications NFS is implemented as a set of RPCs DCOM, CORBA, Java RMI, etc., are just RPC systems Fundamental idea: – Servers export an interface of procedures/functions that can be called by client programs similar to library API, class definitions, etc. Clients make local procedure/function calls As if directly linked with the server process Under the covers, procedure/function call is converted into a message exchange with remote server process

RPC – Issues:

CS502 Spring 2006 RPC – Issues How to make the “remote” part of RPC invisible to the programmer? What are semantics of parameter passing? E.g., pass by reference? How to bind (locate/connect-to) servers? How to handle heterogeneity? OS, language, architecture, … How to make it go fast?

RPC Model:

CS502 Spring 2006 RPC Model A server defines the service interface using an interface definition language (IDL) the IDL specifies the names, parameters, and types for all client-callable server procedures example: Sun’s XDR (external data representation) A stub compiler reads the IDL declarations and produces two stub functions for each server function Server-side and client-side Linking:– Server programmer implements the service’s functions and links with the server-side stubs Client programmer implements the client program and links it with client-side stubs Operation:– Stubs manage all of the details of remote communication between client and server

RPC Stubs:

CS502 Spring 2006 RPC Stubs A client-side stub is a function that looks to the client as if it were a callable server function I.e., same API as the server’s implementation of the function A server-side stub looks like a caller to the server I.e., like a hunk of code invoking the server function The client program thinks it’s invoking the server but it’s calling into the client-side stub The server program thinks it’s called by the client but it’s really called by the server-side stub The stubs send messages to each other to make the RPC happen transparently (almost!)

Marshalling Arguments:

CS502 Spring 2006 Marshalling Arguments Marshalling is the packing of function parameters into a message packet the RPC stubs call type-specific functions to marshal or unmarshal the parameters of an RPC Client stub marshals the arguments into a message Server stub unmarshals the arguments and uses them to invoke the service function on return: the server stub marshals return values the client stub unmarshals return values, and returns to the client program

RPC Binding:

CS502 Spring 2006 RPC Binding Binding is the process of connecting the client to the server the server, when it starts up, exports its interface identifies itself to a network name server tells RPC runtime that it is alive and ready to accept calls the client, before issuing any calls, imports the server RPC runtime uses the name server to find the location of the server and establish a connection The import and export operations are explicit in the server and client programs

RPC Systems:

CS502 Spring 2006 RPC Systems Validation of Lauer-Needham hypothesis about system organization Management of shared system resources or functions encapsulated in modules Interchangeability of function call and message passing

Summary:

CS502 Spring 2006 Summary There are many forms of multiple processor systems The system software to support them involves substantial additional complexity over single processor systems The core OS must be carefully designed to fully utilize the multiple resources Programming model support is essential to help application developers