Relational Algebra: Relational Algebra CIS 4301
Lecture Notes
Lecture 13 - 2/23/2006
Database Query Languages: Database Query Languages Use 'Campus' schema
Given a database, ask questions, get data as answers
Ex: Get all students with GPA andgt; 3.7 who applied to Berkeley and Stanford and nowhere else
Ex: Get all humanities departments at campuses in Florida with andlt; 1000 applicants
Ex: Get the campus with highest average accept rate over the last five years
Some questions are easy to pose, some are not
Some questions are easy for DBMS to answer, some are not.
'Query language' also used to update the database
Relational Query Languages: Relational Query Languages Formal: relational algebra, relational calculus, Datalog
Actual: SQL, Quel, Query-by-Example (QBE)
In ALL languages, a query is executed over a set of relations, get single relation as the result
Relational Algebra: Relational Algebra Notation for describing queries in the relational model
Relational model has concrete set of 'standard' operations
Operations are not 'Turing Complete'
Not a defect, helps with query processing and optimization
FYI, a language is Turing Complete if it is powerful enough to implement any Turing machine. It's widely believed that Turing machines can do any calculation that can be performed by a modern computer program
Start by introducing operations of relational algebra, SQL next
Algebra applies to sets of tuples, i.e., relations
Commercial DBMS use different notation of relations which are multisets
Relational Algebra: Relational Algebra Construct new relations from old ones
Set of operators
Relations are operands
Build progressively more complex expressions by applying operators to relations or to rela. algebra expressions (which are relations as well)
Query is an expression of relational algebra
First concrete example of a query language
Four broad classes of operations
Set operations, selection operations, operations that combine data from two relations, rename operation
Sample Relational Schema: Sample Relational Schema Movie (Title,Year,length,filmType,
studioName,producerC#)
StarsIn (MovieTitle,MovieYear,StarName)
MovieStar(Name,address,gender,birthdate)
MovieExec(name,address,Cert#,netWorth)
Studio (Name,address,presC#)
Basics: Basics Operations of traditional relational algebra fall into four broad classes:
Set operations
Operations that remove parts of a relation
Operations that combine tuples of two relations
Renaming
Set Operations: Set Operations Union (binary, commutative, associative)
R S
Intersection (binary, commutative, associative)
R S
Set Difference (binary)
R - S
Set of elements in R but not in S
R-S S-R !!
R(A1,A2,…,An), S(B1,B2,…,Bn) must be union compatible
R and S are of the same degree
for each i, dom(Ai) = dom(Bi)
Columns of R and S must be ordered so that order of attributes is same for both relations
Example: Example name address gender birthdate Carrie Fisher Mark Hamil 123 Maple St., Hollywood 456 Oak Rd., Brentwood F M 9/9/99 8/8/88 R name address gender birthdate Carrie Fisher Harrison Ford 123 Maple St., Hollywood 789 Palm Dr., Beverly Hills F M 9/9/99 7/7/77 S
Sample Operations: Sample Operations name address gender birthdate Carrie Fisher 123 Maple St., Hollywood F 9/9/99 R S name address gender birthdate Carrie Fisher Harrison Ford 123 Maple St., Hollywood 789 Palm Dr., Beverly Hills F M 9/9/99 7/7/77 R S Mark Hamil 456 Oak Rd., Brentwood M 8/8/88 name address gender birthdate R - S Mark Hamil 456 Oak Rd., Brentwood M 8/8/88
Relational Operator: Project: Relational Operator: Project Project (unary)
andlt;attr listandgt; (R)
andlt;attr listandgt; is a list of attributes (columns) from R only
Ex: title, year, length (Movie) 'horizontal restriction' A1 A2 A3 … An ... i A1 A2… Ak ... j, i j n k
Project: Project PROJECT can produce many tuples with same value
Relational algebra semantics says remove duplicates
SQL does not -- one difference between formal and actual query languages
Relational Operator: Select: Relational Operator: Select Select or Restrict (unary, commutative)
andlt;predicateandgt; (R)
andlt;predicateandgt; is a conditional expression of the type that we are familiar with from conventional programming languages
andlt;attributeandgt; andlt;opandgt; andlt;attributeandgt;
andlt;attributeandgt; andlt;opandgt; andlt;constantandgt;
attribute in R
op {=,,andlt;,andgt;,, …, AND, OR}
Ex: length100 (Movie) vertical restriction'
Pictorially: Pictorially A1 A2 A3 … An ... i A1 A2 A3 … An ... j, i j title year length filmType Star Wars Mighty
Ducks Wayne’s
World 1977 1991 1992 124 104 95 color color color Movie result set # of selected tuples is referred to as the selectivity of the condition
Cartesian Product: Cartesian Product Cartesian Product (binary, commutative, associative)
R x S
Sets of all pairs that can be formed by choosing the first element of the pair to be any element of R, the second any element of S
Relation schema is union of schemas for R and S
Resulting schema may be ambiguous
Use R.A or S.A to disambiguate an attribute that occurs in both schemas
Example: Example A B 1 2 3 4 B C 2 5 4 7 D 6 8 9 10 11 x A R.B S.B C D R S 1 2 2 5 6 1 2 4 7 8 1 2 9 10 11 3 4 3 4 3 4 2 5 6 4 7 8 9 10 11
Join Operations: Join Operations Natural Join (binary)
R join S
Match only those tuples from R and S that agree in whatever attributes are common to the schemas of R and S
If r and s from r(R) and s(S) are successfully paired, result is called a joined tuple
This join operation is the same we used in earlier section to recombine relations that had been projected onto two subsets of their attributes (e.g., as a result of a BCNF decomposition)
Example: Example A B 1 2 3 4 B C 2 5 4 7 D 6 8 9 10 11 join Resulting schema has attributes from R, either R or S (i.e., joining attribute(s)), and S
Tuples that fail to pair with any tuple of the other relation are called dangling tuples A B C D R S 1 2 5 6 3 4 7 8
Join Operations: Join Operations Theta Join (binary)
R joinC S, where C is an arbitrary join condition
Step 1: take the product of R and S
Step 2: Select from the product only those tuples that satisfy condition C
As with the product operation, the schema for the result is the union of the schemas of R and S
Example: Example B C 2 3 2 3 D 4 5 7 8 10 joinAandlt;D AND U.BV.B A B 1 2 6 7 C 3 8 9 7 8 U V V.B V.C D A U.B U.C 1 2 3 7 8 10
Final Word on Join: Final Word on Join DBMS often implements theta-join as basic operation
Use of term 'join' in implementation circles usually refers to theta-join or sometimes to cross-product