reliable multicast

Reliable Multicast Transport for Distributed Middleware: 

Gidon Gershinsky, IBM Haifa Research Lab

Talk outline: 

- IP multicast overview
- IETF
- Market adoption
- Our work

IP Multicast: 

- Deering's PhD work, 1988
- RFC 1112, “Host Extensions for IP Multicasting”, 1989
- Class D IP addresses: 224.0.0.0 – 239.255.255.255
- Administratively scoped multicast (RFC 2365): 239.0.0.0 – 239.255.255.255
- Supported over IP and UDP; Windows, Unix, Linux, AS; Java (UDP); Ethernet, Token Ring, FDDI; satellite; Cisco and Nortel routers
- “Without a doubt, multicast has become a hot topic”
- “Next year is the year of multicast”

IP to Ethernet: 

Ethernet multicast: the 23 low-order bits of the IP multicast address are placed into the low-order 23 bits of the Ethernet address 01.00.5E.00.00.00.
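The mapping can be sketched in a few lines of Python. This is an illustrative helper (the function name is hypothetical) that keeps the 23 low-order bits of a class D address and ORs them into the 01.00.5E.00.00.00 prefix.

```python
import ipaddress

def multicast_ip_to_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not a class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF      # keep only the 23 low-order bits
    mac = 0x01005E000000 | low23      # place them into 01.00.5E.00.00.00
    # Print the 48-bit value as six dotted hex bytes, most significant first.
    return ".".join(f"{(mac >> s) & 0xFF:02X}" for s in range(40, -1, -8))

print(multicast_ip_to_mac("224.1.1.1"))    # 01.00.5E.01.01.01
print(multicast_ip_to_mac("239.129.1.1"))  # 01.00.5E.01.01.01 as well
```

Because a class D address has 28 variable bits and only 23 are copied, 32 IP groups share each Ethernet address, so hosts still have to filter at the IP layer.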

Switching: 

Switches: multicast forwarding is "on" by default (frames are flooded to all ports)
- CGMP (Cisco Group Management Protocol)
  - Must be configured on both routers and Layer 2 switches
  - Receiver: IGMP join to the router; router: CGMP join to the switch
- IGMP snooping
  - The switch eavesdrops on Layer 3 IGMP messages
  - Must examine every multicast packet, so it is not suitable for low-end switches
  - Can require dedicated hardware circuits
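A toy model of IGMP snooping helps show what the switch has to track. The sketch below is hypothetical (class and method names are made up, and router ports, timers and querier logic are ignored): the switch learns group membership per port from snooped IGMP joins and leaves, forwards only to member ports, and floods groups it has no state for.

```python
from collections import defaultdict

class SnoopingSwitch:
    """Toy IGMP-snooping forwarding table (a sketch, not a vendor implementation)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.members = defaultdict(set)  # multicast group -> ports with joined receivers

    def on_igmp(self, port, group, kind):
        # Snooped IGMP membership report ("join") or leave seen on `port`.
        if kind == "join":
            self.members[group].add(port)
        elif kind == "leave":
            self.members[group].discard(port)
            if not self.members[group]:
                del self.members[group]

    def egress_ports(self, in_port, group):
        # Groups with no snooped state are flooded, as a plain switch does by default.
        targets = self.members.get(group, self.ports)
        return sorted(targets - {in_port})

sw = SnoopingSwitch([1, 2, 3, 4])
sw.on_igmp(2, "239.1.1.1", "join")
print(sw.egress_ports(1, "239.1.1.1"))  # [2]
print(sw.egress_ports(1, "239.9.9.9"))  # [2, 3, 4]
```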

Routing: 

- Routers: multicast routing is "off" by default
- Dense mode, flood and prune: Reverse Path Forwarding (RPF); PIM-DM, DVMRP
- Dense mode, explicit topology: MOSPF
- Sparse mode, rendezvous points: PIM-SM
- Cisco: PIM-DM, PIM-SM
- IGMP: v2 adds active (explicit) leave; v3 adds source-specific multicast, (S,G)
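The RPF check behind flood-and-prune can be sketched as follows. A router accepts a multicast packet only if it arrived on the interface the router itself would use to reach the packet's source; everything else is dropped, which prevents forwarding loops. The routing table here is a hypothetical list of (prefix, interface) pairs with longest-prefix match.

```python
import ipaddress

def rpf_accept(source, in_iface, unicast_routes):
    """Reverse Path Forwarding check.

    unicast_routes: hypothetical list of (prefix, outgoing interface) pairs
    standing in for the router's unicast routing table.
    """
    src = ipaddress.ip_address(source)
    best = None
    for prefix, iface in unicast_routes:
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match towards the source address.
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    # Accept only packets arriving on the interface that leads back to the source.
    return best is not None and best[1] == in_iface

routes = [("10.0.0.0/8", "eth0"), ("10.1.0.0/16", "eth1"), ("0.0.0.0/0", "eth2")]
print(rpf_accept("10.1.2.3", "eth1", routes))  # True
print(rpf_accept("10.1.2.3", "eth0", routes))  # False
```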

PIM-SM: 

[Figure: PIM-SM diagram, not included in the transcript]

Reliable Multicast: 

- “No single reliable multicast protocol will likely meet the needs of all applications” (RFC 2357)
- Transport characteristics: reliability, real-time delivery, number of transmitters, late joins, network topology
- Reliability protocol families (RFC 3048): Negative Acknowledgment (NACK), tree-based ACK, Asynchronous Layered Coding (ALC), router assist

Reliable Multicast: 

- Bulk data
  - Examples: corporate data, server cache replication, software distribution
  - Data: files, large memory segments
  - Content: static
  - Requirements: full reliability, no real-time, one sender
- Streaming data
  - Examples: stock quotes, news, video, audio
  - Data: messages, video/audio formats
  - Content: dynamic
  - Requirements: full-to-none reliability, varying real-time requirements, one or a few senders
- Collaborative
  - Examples: whiteboard interaction, multimedia conferencing
  - Data: short messages, video/audio formats
  - Content: dynamic and/or static
  - Requirements: full-to-moderate reliability, moderate real-time requirements, many senders

IETF RMT WG: 

- No standard protocol defined yet
- Design space for bulk data transfer (RFC 2887); “building blocks” for one-to-many bulk-data transfer (RFC 3048)
- Drafts: NACK-Oriented Reliable Multicast (NORM), Tree-ACK (TRACK) reliable multicast, Layered Coding Transport, Generic Router Assist, Forward Error Correction, Tree Auto-Configuration

IETF WGs: 

- Multicast Address Allocation (malloc)
- Multicast Security (msec): group key management; Subset-Difference (Dalit Naor, IBM Haifa)
- Source-Specific Multicast (ssm): channels identified by (source IP, multicast group)
- Protocol Independent Multicast (pim)
- Inter-Domain Multicast Routing (idmr): DVMRP

Nack-based reliability: 

- PGM protocol, Experimental RFC 3208
  - Packet stream; transmission window
- NACK suppression
  - Exponentially distributed random back-off: $t = \frac{T}{L}\ln\left(x\,(e^{L}-1)+1\right)$, where $x$ is uniform on $[0,1)$, $T$ is the maximum back-off and $L$ shapes the distribution
  - Dependent on gap size
  - Designated local repairers (DLR)
- Network elements
  - Additional NACK suppression
  - Repair data filtering
  - Cisco router support
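The exponential back-off can be illustrated by inverse-transform sampling of the truncated exponential CDF $F(t) = (e^{Lt/T}-1)/(e^{L}-1)$ on $[0,T]$, which yields $t = \frac{T}{L}\ln(x(e^{L}-1)+1)$. This is a sketch, not the PGM or IBM implementation; `nacks_sent` is a hypothetical helper that assumes every receiver hears the first multicast NACK after the same fixed `delay`.

```python
import math
import random

def nack_backoff(T, L, x=None):
    """NACK back-off delay in [0, T] with a truncated exponential distribution.

    Larger L pushes most receivers towards T, so usually only a few fire early.
    """
    if x is None:
        x = random.random()  # uniform on [0, 1)
    return (T / L) * math.log(x * (math.exp(L) - 1) + 1)


def nacks_sent(n_receivers, T, L, delay):
    """Count NACKs for one loss shared by n_receivers.

    Simplifying assumption: all receivers hear the first NACK `delay` seconds
    after it is sent; receivers whose timer fires before that also send one.
    """
    timers = [nack_backoff(T, L) for _ in range(n_receivers)]
    first = min(timers)
    return sum(1 for t in timers if t < first + delay)

random.seed(1)
print(nacks_sent(1000, 1.0, 10.0, 0.01))
```

Compared with a uniform timer, the exponential shape keeps the earliest timers sparse even for large groups, which is why the number of duplicate NACKs grows slowly with the receiver count.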

Erasure codes: 

Erasure codes, a.k.a. Forward Error Correction (FEC)
- Reed-Solomon codes: improve scalability by reducing (NACK) feedback traffic
  - k packets -> k+n packets
  - Streaming data: proactive encoding, reactive encoding
- “Tornado codes”: k packets -> an unlimited number of encoded packets
  - Content distribution; no back-link
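Reed-Solomon coding does not fit in a short example, so the sketch below swaps in the simplest erasure code: a single XOR parity packet (k -> k+1) that repairs any one lost packet. It still shows the property that cuts feedback: in reactive mode, one parity packet repairs different single losses at different receivers. Function names are hypothetical, and packets are assumed to have equal length.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets):
    """k equal-length packets -> k+1 packets; the last one is the XOR parity."""
    return list(packets) + [reduce(xor_bytes, packets)]

def recover(received):
    """received: k+1 slots, None marks a lost packet. Returns the k data packets."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    if missing:
        present = [p for p in received if p is not None]
        received = list(received)
        # XOR of all surviving packets (data and parity) equals the lost one.
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]

data = [b"abcd", b"efgh", b"ijkl"]
sent = encode(data)
sent[1] = None           # packet lost at this receiver
print(recover(sent))     # [b'abcd', b'efgh', b'ijkl']
```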

Congestion Control: 

- Datagram (IP or UDP) traffic is unfair to self-managed data streams (including TCP)
- No standard
- Feedback from end-points (receivers) and network elements; NACK suppression
- TCP-friendly CC: $\text{Rate} \sim \frac{\text{MTU}}{\text{RTT}\cdot\sqrt{\text{loss}}}$
- TCP-like window slide, ACKer selection
- Rate is bound by the worst-throughput receiver: the “crying-baby” problem
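ACKer selection with the TCP-friendly rate can be sketched as: estimate each receiver's rate from its reported RTT and loss, then let the receiver with the lowest estimate drive the sender. The constant $C = \sqrt{3/2}$ comes from the simple periodic-loss TCP model; the report format and function names are assumptions for illustration.

```python
import math

C = math.sqrt(1.5)  # constant of the simple periodic-loss TCP throughput model

def tcp_friendly_rate(mtu_bytes, rtt_s, loss):
    """Approximate TCP-friendly throughput in bytes/s: C * MTU / (RTT * sqrt(loss))."""
    if loss <= 0:
        return math.inf  # no observed loss: the model gives no upper bound
    return C * mtu_bytes / (rtt_s * math.sqrt(loss))

def select_acker(reports, mtu_bytes=1500):
    """reports: {receiver: (rtt_s, loss)}. Return (receiver, rate) with the lowest rate."""
    rates = {r: tcp_friendly_rate(mtu_bytes, rtt, p) for r, (rtt, p) in reports.items()}
    worst = min(rates, key=rates.get)
    return worst, rates[worst]

reports = {"a": (0.02, 0.001), "b": (0.1, 0.01), "c": (0.05, 0.0)}
print(select_acker(reports))  # receiver "b" limits the whole group
```

The same sketch shows the crying-baby problem: a single receiver with a long RTT or high loss sets the rate for every member of the group.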

Layered Architecture: 

Same (or multi-layer) data sent to a number of multicast groups; possibly no feedback
- Static layer scheme
  - Base group rate $r_0$; $r_1 = r_0 k$, $r_2 = r_0 k^2$, etc.
  - Time slots: leave if a packet is lost; join a new group if zero loss and the join trigger is set
- Receiver-driven congestion control
  - Equation-based reception rate; RTT measured by, e.g., the time required to join
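The static layer scheme reduces to a geometric rate ladder plus a per-slot join/leave rule. In this sketch (hypothetical names), `level` is the highest group a receiver has joined, with 0 the base group that is never left. Whether $r_i$ is the rate of group $i$ alone or the cumulative rate at level $i$ depends on whether the groups carry replicated or layered data, so the sketch only tracks the level.

```python
def layer_rates(r0, k, n_layers):
    """Static layer scheme: r_i = r0 * k**i for i = 0..n_layers-1."""
    return [r0 * k**i for i in range(n_layers)]

def next_level(level, lost_packet, join_trigger, n_layers):
    """Receiver-driven decision at the end of a time slot."""
    if lost_packet and level > 0:
        return level - 1      # loss: leave the top group
    if not lost_packet and join_trigger and level < n_layers - 1:
        return level + 1      # zero loss and trigger set: join the next group
    return level              # otherwise stay (the base group is always kept)

print(layer_rates(100, 2, 4))       # [100, 200, 400, 800]
print(next_level(2, True, False, 4))  # 1
```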

Dynamic layer scheme: 

Wave and Equation Based Rate Control (WEBRC)
- Leave latency (IGMPv1 + leave propagation)
- $G$ groups: $A$ active, $G-A$ quiescent
  - $r(G) = 0, r(G-1) = 0, \ldots, r(A+1) = 0, r(A), r(A-1), \ldots, r(1)$
  - Group $j$ in time slot $t$ has rate $r(((j-t-1) \bmod G)+1)$
- Always join the base group
- Must leave quiescent wave groups
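The wave schedule $r(((j-t-1) \bmod G)+1)$ can be tabulated directly. The rate vector below is a hypothetical example with $G = 5$ and $A = 3$; `r[0]` holds $r(1)$. Each slot shifts the pattern by one group, so every wave group passes through its quiescent slots, then $r(A), r(A-1), \ldots, r(1)$, while the total sent rate stays constant.

```python
def wave_rate(r, j, t):
    """Rate of wave group j (1..G) in time slot t: r(((j - t - 1) mod G) + 1).

    r is 0-based: r[0] = r(1), ..., r[G-1] = r(G); quiescent entries are 0.
    """
    G = len(r)
    return r[(j - t - 1) % G]  # Python's % is non-negative for positive G

def schedule(r, t):
    return [wave_rate(r, j, t) for j in range(1, len(r) + 1)]

r = [1, 2, 4, 0, 0]  # hypothetical: r(1)=1, r(2)=2, r(3)=4, r(4)=r(5)=0
for t in range(3):
    print(t, schedule(r, t))
# 0 [1, 2, 4, 0, 0]
# 1 [0, 1, 2, 4, 0]
# 2 [0, 0, 1, 2, 4]
```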

Marketplace: 

Enterprise “middleware”; no use in the public Internet domain, only private customers
- Messaging (JMS and similar)
  - Tibco Rendezvous: “reliable broadcast” or multicast; 60-second limit, probably a NACK mechanism; routing daemons for subnet and wide area
  - Talarian: PGM implementation; proactive/reactive FEC; network elements; host implementation; bought by Tibco 01/2002 for $110M

Marketplace: 

Messaging (cont.)
- Microsoft: PGM for MSMQ
- SoftWired: proprietary protocol (NACK-based); JMS; automatic assignment of topics to multicast groups; positive ACK per “epoch”; scalable to ~100
- Fiorano: proprietary protocol (NACK-based); uses standard multicast routers; uses only one multicast group

Marketplace: 

- Content (bulk data) delivery
  - Digital Fountain: FEC (Tornado-style erasure codes); useful for unicast streams as well, e.g. on long-distance links
  - Bandwiz: proprietary patented FEC algorithm
- Cluster synchronization (Horus-style)
  - BEA (WebLogic server cluster)
- Software replication
  - Symantec “Norton Ghost” tool
- Multimedia real-time streams

IBM Haifa: 

- Enterprise middleware environment
- NACK and FEC reliability mechanisms
- Security
- Congestion control
- Message streams, bulk data (file) delivery

Burst suppression: 

- Small messages at extreme rates: batching (a Nagle-like algorithm)
- Per-packet cost model: $P = A + B \cdot \text{Size}$
- [Figure: measurements on a Cisco 2500 router, not included in the transcript]
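Under the cost model $P = A + B \cdot \text{Size}$, packing n small messages into one packet saves roughly $(n-1)A$. The sketch below is a hypothetical Nagle-like batcher: it flushes when the next message would overflow the packet budget or when the oldest queued message has waited `max_delay`. Parameter values are illustrative assumptions, message framing is left to the `send` callback, and the delay bound is checked only when a message arrives (a real sender would also flush from a timer).

```python
import time

class Batcher:
    """Nagle-like batching of small messages into packets (hypothetical sketch)."""

    def __init__(self, send, max_packet=1400, max_delay=0.005, clock=time.monotonic):
        self.send = send              # callback that transmits one batch as one packet
        self.max_packet = max_packet  # payload budget per packet, in bytes
        self.max_delay = max_delay    # longest time a message may wait, in seconds
        self.clock = clock
        self.buf, self.size, self.first_ts = [], 0, None

    def submit(self, msg: bytes):
        # Flush first if this message would overflow the current packet.
        if self.buf and self.size + len(msg) > self.max_packet:
            self.flush()
        if not self.buf:
            self.first_ts = self.clock()
        self.buf.append(msg)
        self.size += len(msg)
        if self.clock() - self.first_ts >= self.max_delay:
            self.flush()

    def flush(self):
        if self.buf:
            self.send(list(self.buf))
            self.buf, self.size, self.first_ts = [], 0, None


def cost(packet_sizes, A, B):
    """Total cost under the per-packet model P = A + B * Size."""
    return sum(A + B * s for s in packet_sizes)

print(cost([10], A=5, B=1), cost([4, 4, 2], A=5, B=1))  # 15 25
```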

Hub-centric distribution: 

- Message stream concatenation
- Extreme storage demands
- Reliability definition: $B = T \cdot \text{Rate}$; $T_{\min}$ depends on the loss rate and the number of receivers $N_{rec}$
- Persistence: SSD technologies
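The relation $B = T \cdot \text{Rate}$ explains the storage demands: a hub that promises repairs for the last $T$ seconds must retain that much of the stream. The sketch below (hypothetical names) computes $B$ and keeps a time-bounded repair window; it does not model how $T_{\min}$ follows from loss and $N_{rec}$.

```python
from collections import deque

def buffer_bytes(T_s, rate_Bps):
    """B = T * Rate: bytes retained to serve repairs for T seconds."""
    return T_s * rate_Bps

class RepairWindow:
    """Keep the last T seconds of a message stream for NACK repairs."""

    def __init__(self, T_s):
        self.T = T_s
        self.q = deque()  # (timestamp, seq, payload), oldest first

    def add(self, ts, seq, payload):
        self.q.append((ts, seq, payload))
        # Drop messages older than the reliability window.
        while self.q and self.q[0][0] < ts - self.T:
            self.q.popleft()

    def lookup(self, seq):
        for _, s, payload in self.q:
            if s == seq:
                return payload
        return None  # too old: cannot be repaired

# Example: a 60-second window on a 100 Mbit/s (12.5 MB/s) stream.
print(buffer_bytes(60, 12_500_000))  # 750000000 bytes
```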

Topic mapping to groups: 

- Limited number of multicast groups
- Topics $T_{i1}, \ldots, T_{in(i)} \to G_i$, $i = 1, \ldots, N_g$
- Subscriber $S_k$ subscribes to $\{T\}$; it will receive $\{T\}$ plus unrequested topics $\{T_r\}$ that share its groups
- Perform the mapping taking into account: topic load; network topology; receiver characteristics, interest correlation
- Dynamic re-mapping as a congestion-control tool
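As a baseline for the mapping problem, the sketch below assigns topics to $N_g$ groups with a greedy largest-load-first heuristic (a swapped-in, load-only technique that ignores topology and interest correlation) and computes the unwanted topics $\{T_r\}$ a subscriber receives as a side effect. All names and loads are hypothetical.

```python
import heapq

def map_topics(loads, n_groups):
    """Greedy largest-first assignment of topics to groups, balancing total load.

    loads: {topic: msgs/s}. Returns {group index: set of topics}.
    """
    heap = [(0.0, g) for g in range(n_groups)]  # (current load, group)
    groups = {g: set() for g in range(n_groups)}
    for topic in sorted(loads, key=loads.get, reverse=True):
        load, g = heapq.heappop(heap)           # least-loaded group so far
        groups[g].add(topic)
        heapq.heappush(heap, (load + loads[topic], g))
    return groups

def unwanted(groups, subscription):
    """Topics {Tr} received only because they share a group with subscribed topics."""
    received = set().union(*(ts for ts in groups.values() if ts & subscription))
    return received - subscription

g = map_topics({"A": 10, "B": 8, "C": 3, "D": 2}, 2)
print(g)                   # {0: {'A', 'D'}, 1: {'B', 'C'}}
print(unwanted(g, {"A"}))  # {'D'}
```

A mapping that also weighs interest correlation would try to place topics with overlapping subscribers in the same group, shrinking $\{T_r\}$ at the cost of load balance.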

Publications: 

- Patent on NACK suppression (filed)
- Patent on multidocument multicasting (filed)
- Patent on multicast caching (filed)
- Paper on Networked Group Communication (to be submitted)