Data Centric Storage using GHTLecture 13 October 14, 2004EENG 460a / CPSC 436 / ENAS 960 Networked Embedded Systems &Sensor Networks: Data Centric Storage using GHT Lecture 13 October 14, 2004 EENG 460a / CPSC 436 / ENAS 960 Networked Embedded Systems & Sensor Networks Andreas Savvides
andreas.savvides@yale.edu
Office: AKW 212
Tel 432-1275
Course Website
http://www.eng.yale.edu/enalab/courses/eeng460a
Data centric Storage In Sensornets with GHT: Data centric Storage In Sensornets with GHT S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin and F. Yu
MONET Special Issue on Sensor Networks, August 2003
Overview: Overview
Data Centric Storage
Data is stored inside the network – each name corresponds to a location in space
All data with the same name will be stored at the same sensor network location
E.g an elephant sighting
Why Data centric Storage?
Energy efficiency
Robustness against mobility and node failures
Scalability
Keywords and Terminology: Keywords and Terminology Observation
♦ low-level readings from sensors
♦ e.g. Detailed temperature readings
Events
♦ Predefined constellations of low-level observations
♦ e.g. temperature greater than 75 F
Queries
♦Used to elicit information from sensor network
Performance Metric:Total Usage /Hotspot Usage : Performance Metric:Total Usage /Hotspot Usage Use communication as a cost function for energy consumption
Total Usage
Total number of packets sent in the Sensor network
Hotspot Usage
The maximal number of packets send by a particular sensor node
Costs used in the evaluation
Message flooding cost O(n)
Point-to-point routing cost
n is the number of nodes
Alternative Storage Schemes: Alternative Storage Schemes External Storage (ES)
Events propagated and stored at an external location
Local Storage (LS)
Events stored locally at the detecting node
Queries are flooded to all nodes and the events are sent back
Data Centric Storage (DCS)
Data for an event stored within the sensor network
Queries are directed to the node that stores the data
External Storage (ES): External Storage (ES) External
storage event
Local Storage (LS): Local Storage (LS) event event Queries flooded at all
the nodes
Why do we need DCS?: Why do we need DCS? Scalability
Robustness against Node failures and Node mobility
To achieve Energy-efficiency
Design Criterial: Scalability & Robustness: Design Criterial: Scalability & Robustness Node failures
Topology changes
System scale to large number of nodes
Energy Constraints
Persistence
(k,v) pair must remain available to queries, despite sensor node failures and changes in sensor network topology
Consistency
A query k must be routed correctly to a node where (k,v) pairs are stored – if these node change, then they should do this consisently
Scaling in Database Size
Topological generality – system should scale well on a large number of topologies
Assumptions in DCS: Assumptions in DCS Large Scale networks whose approximate geographic boundaries are known
Nodes have short range communication and are within the radio range of several other nodes
Nodes know their own locations by GPS or some localization scheme
Communication to the outside world takes place by one or more access points
Data Centric Storage: Data Centric Storage Relevant Data are stored by “name” at nodes within the Sensor network
All data with the same general name will be stored at the same sensor-net node.
e.g. (“elephant sightings”)
Queries for data with a particular name are then sent directly to the node storing those named data
Data centric Storage : Data centric Storage Elephant Sighting source:lass.cs.umass.edu
Geographic Hash Table : Geographic Hash Table Events are named with keys and both the storage and the retrieval are performed using keys
GHT provides (key, value) based associative memory
Geographic Hash Table Operations: Geographic Hash Table Operations GHT supports two operations
♦ Put(k,v)-stores v (observed data) according to the key k
♦ Get(k)-retrieve whatever value is associated with key k
Hash function
♦ Hash the key in to the geographic coordinates
♦ Put() and Get() operations on the same key “k” hash k to the same location
Storing Data in GHT: Storing Data in GHT Put (“elephant”, data) (12,24) Hash (‘elephant’)=(12,24) source:lass.cs.umass.edu
Retrieving data in GHT : Retrieving data in GHT Get (“elephant”) Hash (‘elephant’)=(12,24) (12,24)
Geographic Hash Table : Geographic Hash Table Node A Node B
Algorithms Used By GHT: Algorithms Used By GHT Geographic hash Table uses GPSR for Routing(Greedy Perimeter Stateless Routing)
PEER-TO-PEER look up system
(data object is associated with key and each node in the system is responsible for storing a certain range of keys)
Algorithm (Contd): Algorithm (Contd) GPSR- Packets are marked with position of destinations and each node is aware of its position
Greedy forwarding algorithm
Perimeter forwarding algorithm
A B A B
GPSR: Right-Hand Rule In Perimeter Forwarding: GPSR: Right-Hand Rule In Perimeter Forwarding x y z 1 2 3
Home Node and Home perimeter: Home Node and Home perimeter Home node: Node geographically nearest to the destination coordinates of the packet
Serves as the rendezvous point for Get() and Put() operations on the same key
In GHT packet is not addressed to specific node but only to a specific location
Use GPSR to find the home node
only perimeter mode of GPSR to find Home Perimeter
Home Perimeter – perimeter that encloses the destination
Start from the home node, and use perimeter mode to make a cycle and return to the home node
Problems : Problems Robustness could be affected
Nodes could move (i.d. of Home node?)
Node failure can Occur
Deployment of new Nodes
Not Scalable
Storage capacity of the home nodes
Bottleneck at Home nodes
Solutions to the problems: Solutions to the problems
Perimeter refresh protocol
mostly addresses the robustness issue
Structured Replication
address the scalability issue
how to handle storage of many events
Perimeter refresh protocol: Perimeter refresh protocol Replicates stored data for key k at nodes around the location to which k hashes
Stores a copy of the key value pair at each node on the home perimeter
Each node on the perimeter is called a replica node
How do you ensure consistency & persistence
A node becomes the home node if a packet for a particular key arrives at that node
The perimeter refresh protocols periodically sends out refresh packets
After a time period Th generate a refresh packet that contains the data for that key
Packet forwarded on the home perimeter in the same way as Get() and Put()
The refresh packet will take a tour of the home perimeter regardless the changes in the network topology since the key’s insertion
This property maintains the perimeter
Perimeter Refresh Protocol: Perimeter Refresh Protocol How do you guard against node failures
When a replica node receives a packet it did not originate, it caches the data in the refresh and sets up a takeover timer Tt
Timer is reset each time a refresh from another node arrives
If the timer expires the replica node initiates a refresh packet addressed to the key’s hashed location
Note: That particular node does not determine a new home node. The GHT routing causes the refresh to reach a node home node
Perimeter Refresh Protocol: Perimeter Refresh Protocol E F B D A C L home Replica Replica Assume key k hashes
at location L
A is closest to L so it becomes
the home node
Perimeter Refresh Protocol: Perimeter Refresh Protocol E D F C B L Replica Replica home Replica Replica Suppose the node A dies
Time Specifications: Time Specifications Refresh time (Th)
Take over time (Tt)
Death time (Td)
General rule
Td>Th and Tt>Th
In GHT Td=3Th and Tt=2Th
Characteristics Of Refresh Packet: Characteristics Of Refresh Packet Refresh packet is addressed to the hashed location of the key
Every (Th) secs the home node will generate refresh packet
Refresh packet contains the data stored for the key and routed exactly as get() and put() operations
Refresh packet always travels along the home perimeter
Structured Replication: Structured Replication Too many events are detected then home node will become the hotspot of communication.
Structured replication is used to address the scaling problem
Hierarchical decomposition of the key space
Event names have a certain hierarchy depth
Structured Replication: Structured Replication
Structured Replication: Structured Replication A node that detects a new event, stores that event to its closest mirror
this is easily computable
This reduces the storage cost, but increases the query cost
GHT has to route the queries to all mirror nodes
Queries are routes recursively
First route query to the root, then to the first level and then to the second level mirrors
Structured replication becomes more useful for frequently detected events
Evaluation: Evaluation Simulation to test if the protocol is functioning correctly
Done in the ns-2 network simulator using an IEEE 802.11 mac
This is a well known event driven simulator for ad-hoc networks
Larger scale simulations for the comparative study where done with a custom simulator
Comparative Study: Comparative Study
Simulation compares the following schemes
External Storage (ES)
Local Storage (LS)
Normal DCS – A query returns a separate message for each detected event
Summarized DCS(S-DCS): A query returns a single message regardless of the number of detected events
Structured Replication DCS (SR_DCS) – Assuming an optimal level of SR
Comparison based on Cost
Comparison based on Total usage and Hot spot usage
Assumptions in comparison: Assumptions in comparison Asymptotic costs of O(n) for floods and O( n) for point to point routing
Event locations are distributed randomly
Event locations are not known in advance
No more than one query for each event type
(Q –Queries in total)
Assume access points to be the most heavily used area of the sensor network
Comparison based onHot-spot/Total Usage : Comparison based on Hot-spot/Total Usage n - Number of nodes
T - Number of Event types
Q – Number Of Event types queried for
Dtotal – Total number of detected events
DQ- Number of detected events for queries
DCS TYPES: DCS TYPES Normal DCS – Query returns a separate message for each detected event
Summarized DCS – Query returns a single message regardless of the number of detected events
(usually summary is preferred)
Comparison Study – contd..: Comparison Study – contd..
Observations from the Comparison: Observations from the Comparison DCS is preferable only in cases where
Sensor network is Large
There are many detected events and not all event types queried
Dtotal>>max(Dq,Q)
Simulations: Simulations To check the Robustness of GHT
To compare the Storage methods in terms of total and hot spot usage
Simulation Setup: Simulation Setup ns-2
Node Density – 1node/256m2
Radio Range – 40 m
Number of Nodes -50,100,150,200
Mobility Rate -0,0.1,1m/s
Query generation Rate -2qps
Event types – 20
Events detected -10/type
Refresh interval -10 s
Performance metrics: Performance metrics Availability of data stored to Queriers
(In terms of success rate)
Loads placed on the nodes participating in GHT (hotspot usage)
Simulation Results for Robustness: Simulation Results for Robustness GHT offers perfect availability of stored events in static case
It offers high availability when nodes are subjected to mobility and failures
Simulation Results under varying Q: Simulation Results under varying Q Number of nodes is
constant= 10000
Simulation results under varying N: Simulation results under varying N Number of Queries Q =50
Simulation Results for comparison of 3-storage methods: Simulation Results for comparison of 3-storage methods S-DCS have low hot-spot usage under varying “Q”
S-DCS is has the lowest hot-spot usage under varying “n”
Conclusion: Conclusion Data centric storage entails naming of data and storing data at nodes within the sensor network
GHT- hashes the key (events) in to geographical co-ordinates and stores a key-value pair at the sensor node geographically nearest to the hash
GHT uses Perimeter Refresh Protocol and structured replication to enhance robustness and scalability
DCS is useful in large sensor networks and there are many detected events but not all event types are Queried
REFERENCES: REFERENCES Deepak Ganesan, Deborah Estrin, John Heidemann, Dimensions: why do we need a new data handling architecture for sensor networks?, ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003 Scott Shenker, Sylvia Ratnasamy, Brad Karp, Ramesh Govindan, Deborah Estrin, Data-centric storage in sensornets, ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003
Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan, Li Yin, Fang Yu, Data-centric storage in sensornets with GHT, a geographic hash table, Mobile Networks and Applications, Volume 8 Issue 4, August 2003
Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, Fabio Silva, Directed diffusion for wireless sensor networking, IEEE/ACM Transactions on Networking (TON), Volume 11 Issue, February 2003
R. Govindan, J. M. Hellerstein, W. Hong, S. Madden, M. Franklin, S. Shenker, The Sensor Network as a Database, USC Technical Report No. 02-771, September 2002