Distributed Database Systems

Uploaded by dabhijaydeep on September 20, 2010 (Category: Education; License: All Rights Reserved)

Data & Database

A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning; for example, the names, telephone numbers, and addresses of the people you know. A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
Database Management System (DBMS)

A Database Management System (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. Defining a database involves specifying the data types, structures, and constraints for the data to be stored in it. Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a database includes querying it to retrieve specific data, updating it to reflect changes, and generating reports from the data.

Database System Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which it runs, in particular by such aspects of computer architecture as networking, parallelism, and distribution. Networking of computers allows some tasks to be executed on a server system and some tasks to be executed on client systems; this division of work has led to client–server database systems. Parallel processing within a computer system allows database-system activities to be sped up, giving faster response to transactions as well as more transactions per second; the need for parallel query processing has led to parallel database systems. Distributing data across sites or departments in an organization allows those data to reside where they are generated or most needed, while still being accessible from other sites and departments. Distributed database systems handle geographically or administratively distributed data spread across multiple database systems.
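The three DBMS activities described above (defining, constructing, and manipulating a database) can be seen in miniature with SQLite through Python's standard sqlite3 module. This is only an illustrative sketch: the contact table, its columns, and the sample rows are hypothetical, modeled on the names, telephone numbers, and addresses example.

```python
import sqlite3

# In-memory database for illustration; a file path would make it persistent.
conn = sqlite3.connect(":memory:")

# Defining: declare data types, structure, and constraints.
conn.execute(
    """
    CREATE TABLE contact (
        name    TEXT PRIMARY KEY,
        phone   TEXT NOT NULL,
        address TEXT
    )
    """
)

# Constructing: store the (hypothetical) data under the DBMS's control.
conn.executemany(
    "INSERT INTO contact (name, phone, address) VALUES (?, ?, ?)",
    [
        ("Asha", "555-0101", "12 Park Street"),
        ("Ravi", "555-0102", "4 Lake Road"),
    ],
)

# Manipulating: update a row, then query the data.
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Ravi"))
rows = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()
print(rows)  # [('Asha', '555-0101'), ('Ravi', '555-0199')]

# The DBMS enforces the constraints declared when the table was defined.
rejected = False
try:
    conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Mina", None))
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

The last statement shows why defining matters: the NOT NULL constraint is checked by the DBMS itself, so no application code has to remember to validate the phone number.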
Distributed Database Systems

In a distributed database system, the database is stored on several computers. The computers in a distributed system communicate with one another through various communication media, such as high-speed networks or telephone lines; they do not share main memory or disks. They may vary in size and function, ranging from workstations up to mainframe systems, and are referred to by a number of different names, such as sites or nodes. We mainly use the term site, to emphasize the physical distribution of these systems.

[Figure: A distributed system]

Reasons for Building Distributed Database Systems

Sharing data: The major advantage in building a distributed database system is the provision of an environment where users at one site may be able to access the data residing at other sites.

Autonomy: The primary advantage of sharing data by means of data distribution is that each site is able to retain a degree of control over data that are stored locally.

Availability: If one site fails in a distributed system, the remaining sites may be able to continue operating. In particular, if data items are replicated at several sites, a transaction needing a particular data item may find that item at any of several sites. Thus, the failure of a site does not necessarily imply the shutdown of the system.

Types of Distributed Database System

There are two types of distributed database system: homogeneous distributed databases and heterogeneous distributed databases.

Homogeneous Distributed Databases

In a homogeneous distributed database, all sites have identical database management system software, are aware of one another, and agree to cooperate in processing users’ requests.
In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database management system software. That software must also cooperate with other sites in exchanging information about transactions, to make transaction processing possible across multiple sites.

Heterogeneous Distributed Databases

In a heterogeneous distributed database, different sites may use different schemas and different database management system software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing. The differences in schemas are often a major problem for query processing, while the divergence in software becomes a hindrance for processing transactions that access multiple sites. Manipulating information located in a heterogeneous distributed database requires an additional software layer on top of the existing database systems; this software layer is called a multidatabase system.

Distributed Data Storage

Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database:

Replication: The system maintains several identical replicas (copies) of the relation and stores each replica at a different site. The alternative to replication is to store only one copy of relation r.

Fragmentation: The system partitions the relation into several fragments and stores each fragment at a different site.

Fragmentation and replication can be combined: a relation can be partitioned into several fragments, and there may be several replicas of each fragment.

Transparency

The user of a distributed database system should not be required to know either where the data are physically located or how the data can be accessed at the specific local site. This characteristic is called data transparency.
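To make replication, fragmentation, and data transparency concrete, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the relation r(account_number, branch_name, balance) and its rows, the site names S1 to S3, the round-robin placement rule, and the helper names fragment_horizontally, place, and query. It uses horizontal fragmentation (one fragment per branch_name value), which is only one possible fragmentation scheme, replicates each fragment at two sites, and keeps a small catalog so that callers name a fragment rather than a site.

```python
from dataclasses import dataclass, field

# Hypothetical relation r(account_number, branch_name, balance).
R = [
    {"account_number": "A-101", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-215", "branch_name": "Valleyview", "balance": 700},
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 350},
]


@dataclass
class Site:
    name: str
    up: bool = True  # False simulates a failed site
    fragments: dict = field(default_factory=dict)  # fragment name -> stored rows


def fragment_horizontally(rows, attribute):
    """Horizontal fragmentation: one fragment per distinct value of `attribute`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(dict(row))
    return fragments


def place(fragments, sites, copies):
    """Replicate each fragment at `copies` sites, round-robin (distinct if copies <= len(sites))."""
    catalog = {}
    for i, (frag_name, rows) in enumerate(sorted(fragments.items())):
        chosen = [sites[(i + k) % len(sites)] for k in range(copies)]
        for site in chosen:
            site.fragments[frag_name] = [dict(row) for row in rows]  # one replica
        catalog[frag_name] = chosen
    return catalog


def query(catalog, frag_name):
    """Transparent read: the caller names a fragment, never a site."""
    for site in catalog.get(frag_name, []):
        if site.up:  # any live replica will do
            return site.fragments[frag_name]
    raise RuntimeError(f"no available replica of fragment {frag_name!r}")


sites = [Site("S1"), Site("S2"), Site("S3")]
catalog = place(fragment_horizontally(R, "branch_name"), sites, copies=2)

sites[0].up = False  # S1 fails; the Hillside fragment is still readable from S2
hillside = query(catalog, "Hillside")

# Reconstruct r as the union of its fragments.
reconstructed = [row for frag_name in catalog for row in query(catalog, frag_name)]
```

Because query consults the catalog and skips sites that are down, the caller sees neither the fragmentation, the replicas, nor the site locations, and a single site failure does not make the data unavailable. A real system would also have to keep replicas consistent on update, which this sketch does not attempt.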
Data transparency can take several forms:

Fragmentation transparency: users are not required to know how a relation has been fragmented.

Replication transparency: users view each data object as logically unique and are not required to know which objects have been replicated or where the replicas have been placed.

Location transparency: users are not required to know the physical location of the data.

System Structure

Each site has its own local transaction manager, whose function is to ensure the ACID properties of the transactions that execute at that site. The various transaction managers cooperate to execute global transactions.

To understand how such a manager can be implemented, consider an abstract model of a transaction system in which each site contains two subsystems:

Transaction manager: manages the execution of those transactions (or subtransactions) that access data stored at the local site. Each such transaction may be either a local transaction (one that executes at only that site) or part of a global transaction (one that executes at several sites).

Transaction coordinator: coordinates the execution of the various transactions (both local and global) initiated at that site.

Distributed Query Processing

In a centralized system, the primary cost of a query-processing strategy is the number of disk accesses. In a distributed system, we must also take into account the cost of data transmission over the network and the potential gain in performance from having several sites process parts of the query in parallel. The relative cost of data transfer over the network and data transfer to and from disk varies widely, depending on the type of network and on the speed of the disks. Thus, in general, we cannot focus solely on disk costs or on network costs; rather, we must find a good tradeoff between the two.

System Failure Modes

A distributed system may suffer from the same types of failure that a centralized system does (for example, software errors, hardware errors, or disk crashes).
In addition, a distributed environment introduces failure types of its own. The basic failure types are:

Failure of a site
Loss of messages
Failure of a communication link
Network partition

To provide high availability, a distributed database must detect failures, reconfigure itself so that computation may continue, and recover when a processor or a link is repaired. The task is greatly complicated by the fact that it is hard to distinguish between network partitions and site failures: a site that stops responding may have crashed, or it may simply be unreachable because a link has failed or the network has partitioned.

Other Important Issues

Commit Protocols

If we are to ensure atomicity, all the sites at which a transaction T executed must agree on the final outcome of the execution: T must either commit at all sites or abort at all sites. To ensure this property, the transaction coordinator of T must execute a commit protocol.

Timestamping

The principal idea behind the timestamping scheme is that each transaction is given a unique timestamp that the system uses in deciding the serialization order.
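The commit protocol most often used for the atomicity requirement described under Commit Protocols is two-phase commit (2PC). The following Python sketch simulates it in a single process, purely for illustration. The Participant class, the two_phase_commit function, the in-memory lists standing in for stable logs, and the rule that an unreachable site counts as an abort vote are all assumptions of this sketch, not a real networked implementation: there are no timeouts, no recovery procedure, and no handling of coordinator failure.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Participant:
    """One site's transaction manager, reduced to what this 2PC sketch needs."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit  # could this site make T's updates durable?
        self.reachable = reachable    # False simulates a failed site or link
        self.log = []                 # stand-in for the site's stable log
        self.state = "active"

    def prepare(self, tid):
        """Phase 1 at a participant: log and vote ready, or vote abort."""
        if not self.reachable:
            raise ConnectionError(self.name)
        if self.can_commit:
            self.log.append(("ready", tid))
            self.state = "ready"
            return Vote.READY
        self.log.append(("abort", tid))
        self.state = "aborted"
        return Vote.ABORT

    def finish(self, tid, decision):
        """Phase 2 at a participant: record and apply the coordinator's decision."""
        if not self.reachable:
            return  # a real site would learn the outcome during recovery
        self.log.append((decision, tid))
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(tid, participants, coordinator_log):
    # Phase 1: ask every participant to prepare; no reply counts as an abort vote.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(tid))
        except ConnectionError:
            votes.append(Vote.ABORT)
    decision = "commit" if all(v is Vote.READY for v in votes) else "abort"
    # The coordinator records its decision before telling anyone.
    coordinator_log.append((decision, tid))
    # Phase 2: send the decision to every participant.
    for p in participants:
        p.finish(tid, decision)
    return decision


coordinator_log = []
ok_sites = [Participant("S1"), Participant("S2"), Participant("S3")]
d1 = two_phase_commit("T1", ok_sites, coordinator_log)
print(d1)  # commit

split_sites = [Participant("S1"), Participant("S2", reachable=False), Participant("S3")]
d2 = two_phase_commit("T2", split_sites, coordinator_log)
print(d2)  # abort
```

The second run shows the all-or-nothing property: one unreachable site is enough to abort T2 at every site that can be reached, so no site commits while another aborts. The unreachable site is left in the active state; because it never voted ready, it is free to abort T2 on its own.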