Distributed Database Systems

Data & Database
A database is a collection of related data.
By data, we mean known facts that can be recorded and that have implicit meaning.
For example, consider the names, telephone numbers, and addresses of the people you know: each is a fact that can be recorded, and recorded together they form a small database.
A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

Database Management System (DBMS)
A Database Management System (DBMS) is a collection of programs that enables users to create and maintain a database.
The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications.
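The three processes named in this definition (defining, constructing, and manipulating) can be seen in a few lines against a real DBMS. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `contact` table and its rows are hypothetical, chosen to echo the address-book example, and SQLite is only a convenient stand-in for a general-purpose DBMS.

```python
import sqlite3

# In-memory SQLite database; the "contact" table is a hypothetical example.
conn = sqlite3.connect(":memory:")

# Defining: data types, structure, and constraints.
conn.execute(
    """
    CREATE TABLE contact (
        name    TEXT PRIMARY KEY,
        phone   TEXT NOT NULL,
        address TEXT
    )
    """
)

# Constructing: storing the data on a medium the DBMS controls.
conn.executemany(
    "INSERT INTO contact (name, phone, address) VALUES (?, ?, ?)",
    [("Asha", "555-0101", "12 Elm St"), ("Ben", "555-0102", "34 Oak Ave")],
)
conn.commit()

# Manipulating: an update, a query, and a one-line "report".
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Ben"))
conn.commit()
rows = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()
count = conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
print(rows)   # [('Asha', '555-0101'), ('Ben', '555-0199')]
print(count)  # 2

# The DBMS enforces the constraints declared when the data was defined.
rejected = False
try:
    conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Asha", "555-0000"))
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The last step shows why defining matters: the duplicate name is rejected by the DBMS itself, not by the application.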
Defining involves specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
Manipulating a database includes querying the database to retrieve specific data, updating the database to reflect changes, and generating reports from the data.

Database System Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which it runs, in particular by such aspects of computer architecture as networking, parallelism, and distribution.
Networking of computers allows some tasks to be executed on a server system, and some tasks to be executed on client systems. This division of work has led to client–server database systems.
Parallel processing within a computer system speeds up database-system activities, giving faster response to transactions as well as more transactions per second. The need for parallel query processing has led to parallel database systems.
Distributing data across sites or departments in an organization allows those data to reside where they are generated or most needed, but still to be accessible from other sites and from other departments. Distributed database systems handle geographically or administratively distributed data spread across multiple database systems.

Distributed Database Systems
In a distributed database system, the database is stored on several computers.
The computers in a distributed system communicate with one another through various communication media, such as high-speed networks or telephone lines.
They do not share main memory or disks.
The computers in a distributed system may vary in size and function, ranging from workstations up to mainframe systems.
The computers in a distributed system are referred to by a number of different names, such as sites or nodes. We mainly use the term site to emphasize the physical distribution of these systems.

Reasons for Building Distributed Database Systems
Sharing data:
The major advantage in building a distributed database system is the provision of an environment where users at one site may be able to access the data residing at other sites.
The primary advantage of sharing data by means of data distribution is that each site is able to retain a degree of control over data that are stored locally.
If one site fails in a distributed system, the remaining sites may be able to continue operating. In particular, if data items are replicated at several sites, a transaction needing a particular data item may find that item at any of several sites. Thus, the failure of a site does not necessarily imply the shutdown of the system.

Types of Distributed Database System
A distributed database is either a homogeneous distributed database or a heterogeneous distributed database.

Homogeneous Distributed Databases
In a homogeneous distributed database, all sites have identical database management system software and are aware of one another.
Sites agree to cooperate in processing users’ requests.
In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database management system software.
That software must also cooperate with other sites in exchanging information about transactions, to make transaction processing possible across multiple sites.

Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites may use different schemas and different database management system software.
The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.
The differences in schemas are often a major problem for query processing, while the divergence in software becomes a hindrance for processing transactions that access multiple sites.
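The schema problem can be made concrete with a toy example. Assume two hypothetical sites that record the same kind of fact (an account owner and a balance) under different field names and different units. Before any global query can combine their data, each local row has to be translated into one common form. Everything here (the site layouts, the field names, the `Account` type) is invented for illustration; a real system would also push query predicates down to each site instead of fetching every row.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Account:
    """Hypothetical global schema seen by the user: owner and balance in dollars."""
    owner: str
    balance: Decimal


# Site A (hypothetical): tuples of (owner, balance in integer cents).
site_a_rows = [("alice", 1250), ("bob", 300)]

# Site B (hypothetical): different field names, dollars stored as strings.
site_b_rows = [
    {"holder": "carol", "amount_usd": "7.10"},
    {"holder": "alice", "amount_usd": "0.50"},
]


def from_site_a(row):
    owner, cents = row
    return Account(owner=owner, balance=Decimal(cents) / 100)


def from_site_b(row):
    return Account(owner=row["holder"], balance=Decimal(row["amount_usd"]))


def global_accounts():
    """Present both sites' rows in the one global schema."""
    yield from map(from_site_a, site_a_rows)
    yield from map(from_site_b, site_b_rows)


def total_balance_by_owner():
    # A "global query": only meaningful once units and names agree.
    totals = {}
    for acct in global_accounts():
        totals[acct.owner] = totals.get(acct.owner, Decimal(0)) + acct.balance
    return totals


print(total_balance_by_owner())
```

Summing the raw values without the translation functions would silently add cents to dollars, which is exactly the kind of error schema differences invite.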
Manipulation of information located in a heterogeneous distributed database requires an additional software layer on top of existing database systems. This software layer is called a multidatabase system.

Distributed Data Storage
Consider a relation ‘r’ that is to be stored in the database. There are two approaches to storing this relation in the distributed database:
Replication: The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site. The alternative to replication is to store only one copy of relation ‘r’.
Fragmentation: The system partitions the relation into several fragments, and stores each fragment at a different site.
Fragmentation and replication can be combined: a relation can be partitioned into several fragments, and there may be several replicas of each fragment.

Transparency
The user of a distributed database system should not be required to know either where the data are physically located or how the data can be accessed at the specific local site. This characteristic is called data transparency.
Data Transparency can take several forms:
Location transparency

System Structure
Each site has its own local transaction manager, whose function is to ensure the ACID properties of those transactions that execute at that site.
The various transaction managers cooperate to execute global transactions.

To understand how such a manager can be implemented, consider an abstract model of a transaction system, in which each site contains two subsystems:
The transaction manager manages the execution of those transactions (or sub-transactions) that access data stored in a local site. Note that each such transaction may be either a local transaction (that is, a transaction that executes at only that site) or part of a global transaction (that is, a transaction that executes at several sites).
The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site.

Distributed Query Processing
In a centralized system, the primary criterion for the cost of a query-processing strategy is the number of disk accesses. In a distributed system, we must take into account several other matters, including:
The cost of data transmission over the network
The potential gain in performance from having several sites process parts of the query in parallel
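The tradeoff between network and disk costs can be illustrated with a deliberately crude cost model: each candidate strategy is charged for bytes shipped plus disk blocks read, and the cheapest one wins. The two strategies and all their figures are invented for illustration, and the model ignores the parallelism gain listed above; the point is only that the winner flips when the relative price of network and disk changes.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    bytes_shipped: int  # data sent over the network
    blocks_read: int    # disk blocks read, summed over all sites


def cost(s, net_cost_per_byte, disk_cost_per_block):
    # Toy linear model: network charge plus disk charge.
    return s.bytes_shipped * net_cost_per_byte + s.blocks_read * disk_cost_per_block


def cheapest(strategies, net_cost_per_byte, disk_cost_per_block):
    return min(strategies, key=lambda s: cost(s, net_cost_per_byte, disk_cost_per_block))


# Invented figures for two ways of joining relations stored at different sites.
strategies = [
    Strategy("A: ship both relations to the query site", bytes_shipped=2_000_000, blocks_read=500),
    Strategy("B: ship one relation, join remotely, ship the result", bytes_shipped=600_000, blocks_read=900),
]

slow_net = cheapest(strategies, net_cost_per_byte=1e-3, disk_cost_per_block=1.0)
fast_net = cheapest(strategies, net_cost_per_byte=1e-6, disk_cost_per_block=1.0)
print(slow_net.name)  # B wins when the network is expensive
print(fast_net.name)  # A wins when the network is cheap
```

With the slow network, A costs about 2500 units and B about 1500; with the fast network, A costs about 502 and B about 901.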
The relative cost of data transfer over the network and data transfer to and from disk varies widely depending on the type of network and on the speed of the disks.
Thus, in general, we cannot focus solely on disk costs or on network costs. Rather, we must find a good tradeoff between the two.

System Failure Modes
A distributed system may suffer from the same types of failure that a centralized system does (for example, software errors, hardware errors, or disk crashes).
In addition, there are failure types unique to a distributed environment. The basic failure types are:
Failure of a site
Loss of messages
Failure of a communication link
Network partition
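One common way to notice these failures, assumed here rather than taken from the text, is a heartbeat monitor: each site periodically sends "I am alive" messages, and a site that stays silent longer than a timeout becomes suspected. The sketch below uses a hypothetical `HeartbeatMonitor` class and a manually advanced fake clock so the example is deterministic. Note that the monitor only learns that messages stopped arriving, not which of the listed failures caused it.

```python
import time


class HeartbeatMonitor:
    """Suspects a site when none of its heartbeats has arrived within `timeout` seconds.

    Silence alone cannot say why: the site may have crashed, its messages may
    have been lost, or the link or network between the sites may have failed.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}  # site -> time its latest heartbeat arrived

    def heartbeat(self, site):
        self.last_seen[site] = self.clock()

    def suspected(self):
        now = self.clock()
        return sorted(site for site, t in self.last_seen.items() if now - t > self.timeout)


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
monitor = HeartbeatMonitor(timeout=3.0, clock=clock)
monitor.heartbeat("S1")
monitor.heartbeat("S2")
clock.now = 2.0
monitor.heartbeat("S1")      # S2 stays silent
clock.now = 4.0
print(monitor.suspected())   # ['S2']
```

A suspicion is only a guess: if S2 is alive on the far side of a partition, it may be suspecting this site at the same moment.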
To provide high availability, a distributed database must detect failures, reconfigure itself so that computation may continue, and recover when a processor or a link is repaired. The task is greatly complicated by the fact that it is hard to distinguish between network partitions and site failures, since in both cases a site simply stops receiving messages from another site.

Other Important Issues
Commit Protocols
If we are to ensure atomicity, all the sites in which a transaction T executed must agree on the final outcome of the execution. T must either commit at all sites, or it must abort at all sites. To ensure this property, the transaction coordinator of T must execute a commit protocol.
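The text does not name a specific protocol; the most widely used one is two-phase commit (2PC), sketched below in simplified, single-process form. In phase 1 the coordinator asks every site where T executed to prepare and vote; in phase 2 it sends the same decision to all of them, committing only if every vote was "ready". The `Participant` class and its `can_commit` and `reachable` flags are hypothetical stand-ins for real sites, and the sketch omits stable-storage logging, coordinator failure, and the blocking problem that real 2PC must handle.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Participant:
    """In-memory stand-in for one site at which transaction T executed (hypothetical)."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit  # whether this site's part of T succeeded
        self.reachable = reachable    # False simulates a failed site or link
        self.outcome = None           # final decision as recorded at this site

    def prepare(self):
        # Phase 1 vote. A real site would first force a <ready T> (or <no T>)
        # record to stable storage; logging is omitted here.
        if not self.reachable:
            raise TimeoutError(self.name)
        return Vote.READY if self.can_commit else Vote.ABORT

    def finish(self, decision):
        self.outcome = decision


def two_phase_commit(participants):
    """Coordinator side of a simplified two-phase commit for one transaction."""
    # Phase 1: ask every site to prepare; no reply counts as an abort vote.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append(Vote.ABORT)

    # Commit only if every site voted READY; otherwise abort everywhere.
    decision = "commit" if all(v is Vote.READY for v in votes) else "abort"

    # Phase 2: send the one decision to every site. A real coordinator would
    # log the decision first and keep retrying unreachable sites until they
    # acknowledge; this sketch just skips them.
    for p in participants:
        if p.reachable:
            p.finish(decision)
    return decision


sites = [Participant("S1"), Participant("S2"), Participant("S3", can_commit=False)]
print(two_phase_commit(sites))     # abort
print([s.outcome for s in sites])  # ['abort', 'abort', 'abort']
```

A single "abort" vote, or a site that never answers, is enough to abort T everywhere, which is what makes the outcome uniform across sites.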
The principal idea behind the timestamping scheme is that each transaction is given a unique timestamp that the system uses in deciding the serialization order.
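In a distributed system the timestamps must be unique across sites. One common construction, assumed here, pairs a per-site logical counter with the site identifier and compares the pairs lexicographically, so the site identifier only breaks ties; a site that receives a transaction with a larger counter advances its own counter past it. The sketch then applies the basic timestamp-ordering read and write checks to one data item. `TimestampGenerator`, `DataItem`, `ts_read`, and `ts_write` are hypothetical names, and refinements such as the Thomas write rule are omitted.

```python
from dataclasses import dataclass


class TimestampGenerator:
    """Issues globally unique timestamps of the form (local counter, site id)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0  # logical clock local to this site

    def next(self):
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, ts):
        # Called when a transaction stamped ts visits this site: advance the
        # local clock so later local timestamps exceed ts.
        self.counter = max(self.counter, ts[0])


@dataclass
class DataItem:
    value: object = None
    read_ts: tuple = (0, 0)   # largest timestamp that has read the item
    write_ts: tuple = (0, 0)  # timestamp of the last accepted write


def ts_read(item, ts):
    """Basic timestamp ordering: returns (ok, value); ok=False means roll back."""
    if ts < item.write_ts:
        return False, None
    item.read_ts = max(item.read_ts, ts)
    return True, item.value


def ts_write(item, ts, value):
    """Returns False (roll back) if a younger transaction already read or wrote the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.value = value
    item.write_ts = ts
    return True


s1, s2 = TimestampGenerator(site_id=1), TimestampGenerator(site_id=2)
t_old, t_new = s1.next(), s2.next()  # (1, 1) and (1, 2): site id breaks the tie
x = DataItem()
print(ts_write(x, t_new, "written by newer"))  # True
print(ts_write(x, t_old, "written by older"))  # False: the older writer must roll back
```

Because every conflicting operation is checked against timestamps rather than arrival order, the resulting schedule is equivalent to executing transactions in timestamp order.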