6- 7 Relational Algebra I

Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Relational Algebra : 

Relational Algebra Keys: Databases use tables to organize information. Each table consists of a number of rows, each of which corresponds to a single database record. Databases keep all of these records straight through the use of keys. A database key is an attribute utilized to sort and/or identify data in some manner.

Slide 2: 

Why Keys Are Important? Keys are crucial to a table structure for the following reasons: They ensure that each record in a table is precisely identified. They help establish and enforce various types of integrity. They serve to establish table relationships.

Slide 3: 

Primary Key: Every database table should have one or more columns designated as the primary key. The value this key holds should be unique for each record in the database. For example, assume we have a table called Employees that contains personnel information for every employee in our firm. In this case a better choice for primary key might be to use a unique employee ID number that you assign to each employee when they’re hired.

Slide 4: 

Some organizations choose to use Social Security Numbers (or similar government identifiers) for this task because each employee already has one and they’re guaranteed to be unique. Once you decide upon a primary key and set it up in the database, the database management system will enforce the uniqueness of the key. If you try to insert a record into a table with a primary key that duplicates an existing record, the insert will fail.

Slide 5: 

Foreign Key: These keys are used to create relationships between tables. Returning to our employees database, let’s imagine that we wanted to add a table containing departmental information to the database. This new table might be called Departments and would contain a large amount of information about the department as a whole. We’d also want to include information about the employees in the department, but it would be redundant to have the same information in two tables (Employees and Departments). Instead, we can create a relationship between the two tables.

Slide 6: 

Let’s assume that the Departments table uses the Department Name column as the primary key. To create a relationship between the two tables, we add a new column to the Employees table called Department. We then fill in the name of the department to which each employee belongs. We also inform the database management system that the Department column in the Employees table is a foreign key that references the Departments table. The database will then enforce referential integrity by ensuring that all of the values in the Departments column of the Employees table have corresponding entries in the Departments table.

Slide 7: 

Note that there is no uniqueness constraint for a foreign key. We may have more than one employee belonging to a single department. Similarly, there’s no requirement that an entry in the Departments table have any corresponding entry in the Employees table. It is possible that we’d have a department with no employees.

Slide 8: 

Secondary Key: A secondary key is a key that holds the physical location of a record or a portion of a record in a file or database, and provides an alternative means of accessing data. Also known as alternate key. A secondary key is made on a field that you would like to be indexed for faster searches. A secondary key uses an additional structure that is called an index. This is similar to the idea of an index that is used in textbooks. A textbook index alphabetically lists important terms at the end of a book. Next to each term are page numbers. You can quickly search the index to find a list of page numbers (addresses), and you can locate the term by searching the specified pages. The index is an exact indicator that shows where each term occurs in the textbook.

Slide 9: 

When you define a secondary key, an index is automatically maintained and reflects the sorting order that is defined by the key. The fields that make up the secondary keys do not always contain unique data, and the DBMS does not reject records with duplicate data in secondary key fields. If two or more records contain identical information in the secondary key, then the DBMS uses the primary key for the table to resolve this conflict.

Slide 10: 

Composite Keys: A table, has attributes. Those attributes are denoted as the columns of that table. Depending on data and the entity that is modeled, it is possible for one column to serve as the primary key. In other cases, the primary key is composed of more than one column. A composite key is simply a key that is composed of more than one column. A composite key is a combination of keys that produce one unique key.

Slide 11: 

Candidate Key: A candidate key is a field or combination of fields that can act as a primary key field for that table to uniquely identify each record in that table. A candidate key refers to the attributes that are used to identify a database record without the help of any extraneous data. Each table has one or more candidate keys and one of these keys is selected as the primary key of the table.

Slide 12: 

Integrity Rules: Relational tables follow certain integrity rules to ensure that the data they contain are accurate and are always accessible. Integrity rules are laws that govern the operations allowed on data in a database. This ensures data accuracy and consistency. First, the rows in a relational table should all be distinct. A second integrity rule of the traditional relational model is that column values must not be repeating groups.

Slide 13: 

A third aspect of data integrity involves the concept of a null value. A database takes care of situations where data may not be available by using a null value to indicate that a value is missing. It does not equate to a blank or zero. When each row in a table is different, it is possible to use one or more columns to identify a particular row. This unique column or group of columns is called a primary key. Any column that is part of a primary key cannot be null; if it were, the primary key containing it would no longer be a complete identifier. This rule is referred to as entity integrity.

Relation Schema : 

Relation Schema Formally, given domains D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x DnThus, a relation is a set of n-tuples (a1, a2, …, an) where each ai  Di Schema of a relation consists of attribute definitions name type/domain integrity constraints

Relation Instance : 

Relation Instance The current values (relation instance) of a relation are specified by a table. An element t of r is a tuple, represented by a row in a table. Order of tuples is irrelevant (tuples may be stored in an arbitrary order) Jones Smith Curry Lindsay customer_name Main North North Park customer_street Harrison Rye Rye Pittsfield customer_city customer attributes (or columns) tuples (or rows)

Database : 

Database A database consists of multiple relations. Information about an enterprise is broken up into parts, with each relation storing one part of the information. Example: account : information about accountsdepositor : which customer owns which account customer : information about customers

The customer Relation : 

The customer Relation

The depositor Relation : 

The depositor Relation

Why Split Information Across Relations? : 

Why Split Information Across Relations? Storing all information as a single relation such as bank(account_number, balance, customer_name, ..)results in repetition of information e.g., if two customers own an account (What gets repeated?) the need for null values e.g., to represent a customer without an account Normalization theory deals with how to design relational schemas.

Query Languages : 

Query Languages Query language is a language in which user requests information from the database. Categories of languages: Procedural language Non-procedural language, “Pure” languages: Relational algebra Tuple relational calculus Domain relational calculus Pure languages form underlying basis of query languages that people use.

Relational Algebra : 

Relational Algebra Relational Algebra is a procedural Pure language. In Relational Algebra there are six basic operators: select:  project:  union:  set difference: – Cartesian product: x rename:  The operators take one or two relations as inputs and produce a new relation as a result.

Select Operation : 

Select Operation The SELECT operation is used to select a subset of the tuples from a relation that satisfy a selection condition. A select operation is a filter that keeps only those tuples that satisfy a qualifying condition.  (sigma) is used to denote the SELECT operator, and the selection condition is a Boolean expression specified on the attributes of relation R. R is generally a relational algebra expression whose result is a relation. The Boolean expression specified in <selection condition > is made up of a number of clauses of the form: <attribute name> <comparison op> <constant value> OR <attribute name> <comparison op> <attribute name>

Slide 23: 

Where : <attribute name> is the name of an attribute of R, <comparison op> is normally one of the operators{ =,<,<=,>,>=,!=} and <constant value> is a constant value from the attribute domain. Clauses can be arbitrarily connected by Boolean operators AND, OR, and NOT to form a general selection condition.

Slide 24: 

Examples: 1. Select the employee tuples who works in department 4:  DNO=4(EMPLOYEE) 2. Select employees whose salary is greater than 30000:  SALARY>30000 (EMPLOYEE) 3. Select employees who either work in department 4 and make over 25,000 per year or work in department 5 and make over 30,000: (DNO=4 AND SALARY>25000) OR (DNO=5AND salary > 30000) (EMPLOYEE)

Slide 25: 

The PROJECT Operation Project operation keeps only certain attributes (columns) from a relation R specified in an attribute list L. Form of operation:  L(R) Resulting relation has only those attributes of R specified in L.

Slide 26: 

Example: 1. To list down Employee’s First and last name and Salary:  FNAME,LNAME,SALARY(EMPLOYEE) The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a mathematical set (no duplicate elements): Example:  SEX,SALARY(EMPLOYEE) If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation. Duplicate tuples are eliminated by the  operation.

Slide 27: 

Sequences of operations: Several operations can be combined to form a relational algebra expression (query): Example: Retrieve the names and salaries of employees who work in department 4:  FNAME,LNAME,SALARY ( DNO=4) (EMPLOYEE)

Slide 28: 

Alternatively, we can specify explicit intermediate relations for each step: DEPT4_EMPS <-  DNO=4 (EMPLOYEE) R <-  FNAME,LNAME,SALARY(DEPT4_EMPS)

Slide 29: 

Attributes can optionally be renamed in the resulting left-hand-side relation (this may be required for some operations that will be presented later): DEPT4_EMPS <- DNO=4 (EMPLOYEE) R(FIRSTNAME,LASTNAME,SALARY) <- FNAME,LNAME,SALARY(DEPT4_EMPS)

Slide 30: 

Set Operations Set theoretic operations are used to merge the elements of two sets in various ways. These are binary operations; that is, each is applied to two sets. Binary operations form mathematical set theory: UNION: R1 U R2, INTERSECTION: R1 | | R2, SET DIFFERENCE: R1 - R2, CARTESIAN PRODUCT: R1 X R2. When these operations are adapted to relational databases, the two relations on which any of the above three operations are applied must have the same type of tuples; this condition is called Union Compatible if they have the same degree n. This means that the two relations have the same number of attributes and that each pair of corresponding attributes have the same domain.

Slide 31: 

UNION: The result of this operation, denoted by R  S is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. INTERSECTION: The result of this operation, denoted by R n S, is a relation that includes all tuples that are in both R and S. SET DIFFERENCE: The result of this operation denoted by R – S, is a relation that includes all tuples that are in R but not in S. CARTESIAN PRODUCT: It is also known as CROSS PRODUCT or CROSS JOIN denoted by X, which is also a binary set operation, but the relations on which it is applied do not have to be union compatible. This operation is used to combine tuples from two relations in a combinatorial fashion.

Union Operation – Example : 

Union Operation – Example Relations r, s: r  s: A B    1 2 1 A B   2 3 r s A B     1 2 1 3

Intersect Operation – Example : 

Intersect Operation – Example Relations r, s: r n s: A B    1 2 1 A B   2 3 r s A B  2

Set Difference Operation – Example : 

Set Difference Operation – Example Relations r, s: r – s: A B    1 2 1 A B   2 3 r s A B   1 1

Cartesian - Product Operation – Example : 

Cartesian - Product Operation – Example Relations r, s: r x s: A B   1 2 A B         1 1 1 1 2 2 2 2 C D         10 10 20 10 10 10 20 10 E a a b b a a b b C D     10 10 20 10 E a a b b r s

Banking Example : 

Banking Example branch (branch_name, branch_city, assets) customer (customer_name, customer_street, customer_city) account (account_number, branch_name, balance) loan (loan_number, branch_name, amount) depositor (customer_name, account_number) borrower (customer_name, loan_number)

Example Queries : 

Example Queries 1. Find all loans of over 1200 2. Find the loan number for each loan of an amount greater than 1200 amount > 1200 (loan) loan_number (amount > 1200 (loan)) 3. Find the names of all customers who have a loan, an account, or both, from the bank customer_name (borrower)  customer_name (depositor)

Aggregate Functions and Operations : 

Aggregate Functions and Operations Aggregation function takes a collection of values and returns a single value as a result. avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values

Aggregate Operation – Example : 

Aggregate Operation – Example Relation r: A B         C 7 7 3 10 g sum(c) (r) sum(c ) 27

Aggregate Operation – Example : 

Aggregate Operation – Example Relation account grouped by branch-name: branch_name g sum(balance) (account) branch_name account_number balance Perryridge Perryridge Brighton Brighton Redwood A-102 A-201 A-217 A-215 A-222 400 900 750 750 700 branch_name sum(balance) Perryridge Brighton Redwood 1300 1500 700

Aggregate Functions (Cont.) : 

Aggregate Functions (Cont.) Result of aggregation does not have a name. We can use rename operation to give it a name: branch_name g sum(balance) as sum_balance (account)

Slide 42: 

JOINS Inner Joins: It is also called simple join. It returns all rows from both tables if there is any match. E.g:SELECT * FROM R1, R2 WHERE R1.r1_field = R2.r2_field;

Example: : 

Example: Relation loan Relation borrower

Slide 44: 

Example: loan inner join borrower on loan.loan_number = borrower.loan_number

Slide 45: 

Outer Joins: An extension of the join operation that avoids loss of information. Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join. Uses null values. null signifies that the value is unknown or does not exist .

Slide 46: 

LEFT OUTER JOIN: R1 X R2 lets every tuple in R1 appear in the result. Example: loan left outer join borrower on loan.loan_number = borrower.loan_number

Slide 47: 

RIGHT OUTER JOIN: R1 X R2 lets every tuple in R2 appear in the result.

Slide 48: 

FULL OUTER JOIN: R1 X R2 lets every tuple in R1 or R2 appear in the result. Example: Full Outer Join borrower using (loan_number)

Slide 49: 

Codd’s Rules: The information Rule. Systematic treatment of null values. Guaranteed Access Rule. Dynamic online catalog based on relational model. Comprehensive data sublanguage rule. View updating rule. High level insert, update & delete. Physical data independence. Logical data independence. Integrity independence. Distribution independence. Non subversion rule.

Slide 50: 

Information Rule:All information in the database should be represented in one and only one way -- as values in a table. 2. Guaranteed Access Rule: Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.

Slide 51: 

3. Systematic Treatment of Null Values:Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type. 4. Dynamic Online Catalog Based on the Relational Model: The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data.

Slide 52: 

5.Comprehensive Data Sublanguage Rule:A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensible:   a. data definition    b. view definition    c. data manipulation (interactive and by program)    d. integrity constraints    e. authorization f. transaction boundaries (begin, commit, and rollback). 6. View Updating Rule:All views that are theoretically updateable are also updateable by the system.

Slide 53: 

7. High-Level Insert, Update, and Delete:The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update, and deletion of data. 8. Physical Data Independence:Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.

Slide 54: 

9. Logical Data Independence:Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables. 10. Integrity Independence:Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

Slide 55: 

11. Distribution Independence:The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed. 12. Nonsubversion Rule:If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.

authorStream Live Help