INTRODUCTION TO RELATIONAL MODEL: : INTRODUCTION TO RELATIONAL MODEL: The data in RDBMS is stored in database objects called tables. The table is a collection of related data entries and it consists of columns and rows. Remember, a table is the most common and simplest form of data storage in a relational database. Prof.Indrani Sen MCA,MPhil Computer Science
Database Schema: : Database Schema: It includes descriptions of the database structure, data types, and the constraints on the database. In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in database language, the term is often used to refer to a graphical depiction of the database structure. For e.g. a schema of the Bank database can be as follows: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Instances: A collection of records stored in the database at any particular point of time is called an instance of the database. Instance of Database change over time as information is inserted and deleted. The following section shows an instance of the Bank database. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
Functional dependancy: Functional dependancy Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: AcNo CustID Type BranchID Balance A00023 C02 Savings B04 90,000 A00121 C01 Current B01 80,567 A00234 C02 Savings B04 12,000 A01208 C06 Current B02 45,000 A01208 C01 Current B02 45,000 A00789 C04 Savings B04 4,500 A00893 C05 Savings B06 67,900 A00987 C03 Current B02 78,600 In the above table the AcNo uniquely determines the BranchID or it means that a for a particular AcNo there is only one particular BranchID allotted. It is shown by the expression AcNo - > BranchID . This means attribute BranchID is dependent on the attribute AcNo . Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: A functional dependency is defined as a constraint or a rule between two sets of attributes in a relation (Table) from a database. One attribute set is known as the Determinant set and another set is known as the Dependant set. Thus , in a given record, if the values of the A attributes determine or control the corresponding value of the D attribute, A is the determinant and D is the dependent attribute set. Given a relation R, a set of attributes A in R is said to functionally determine another attribute D, also in R, (written as A D) if and only if each A value is associated with at most one D value. Prof.Indrani Sen MCA,MPhil Computer Science
Closure: Closure For a particular Table We need to consider all functional dependencies that hold. Given a set F of functional dependencies, we can prove that certain other ones also hold. We say these ones are logically implied by F . The closure of a set F of functional dependencies is the set of all functional dependencies logically implied by F . We denote the closure of F by F+ Prof.Indrani Sen MCA,MPhil Computer Science
Armstrong’s Axioms: Armstrong’s Axioms Reflexivity : If B is a subset of A then A functionally determines B For example: {Name, location} -> {name} { AcNo , BranchID }-> { BranchID } It also means that an attribute determines itself. A->A B->B Prof.Indrani Sen MCA,MPhil Computer Science
Augmentation: : Augmentation : If B is a subset of A and C functionally determines D Then A and C functionally determine B and D For example: { EmpID , Name} determines name {Designation} determines salary Then { EmpID , Name and Designation} determines Name and Salary Or AB->B C->D AC->BD Prof.Indrani Sen MCA,MPhil Computer Science
Transitivity: : Transitivity : If A functionally determines B and B functionally Determines C then A functionally determines C { EmpID } determines Designation {Designation} determines Salary Therefore { EmpID } determines Salary A->B B->C A->C Prof.Indrani Sen MCA,MPhil Computer Science
Pseudo transitivity:: Pseudo transitivity : If A functionally determines B and B and C functionally determine D Then A and C functionally determine D A->B BC->D AC->D EmpID determines Designation Designation and Salary determines Provident_Fund Then EmpID and Salary determines Provident fund Prof.Indrani Sen MCA,MPhil Computer Science
Union: : Union : If A functionally determines B and A functionally Determines C then A functionally determines B and C A->B A->C A->BC EmpID determines Designation EmpID determines Name EmpID determines {Name} and {Designation} Prof.Indrani Sen MCA,MPhil Computer Science
Decomposition: : Decomposition : If A functionally determines B and C Then A functionally determines B and A functionally determines C A->BC A->B A->C EmpID determines Name and Contact_No Then EmpID determines Name and EmpID determines Contact-No Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: A->A A->B A->C A->D CD->E[given] BC- >E[ pseudotransitivity ] A->BC[given] A->E[transitivity ] A->ABCDE E->ABCDE CD->ABCDE B->D[given] BC->ABCDE[ pseudotransitivity ] Prof.Indrani Sen MCA,MPhil Computer Science
Candidate Key: Candidate Key A minimal set of attributes which determines all the other attributes in a relation A->ABCDE E->ABCDE CD->ABCDE BC->ABCDE Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
NORMALIZATION : NORMALIZATION Normalization is the process of creating a database structure which is capable of storing consistent data which is easily accessible and is free from redundancies. It protects the database from insertion, update and deletion anomalies which may result in data inconsistency and loss of data integrity. Prof.Indrani Sen MCA,MPhil Computer Science
TYPES OF FUNCTIONAL DEPENDENCIES : TYPES OF FUNCTIONAL DEPENDENCIES The following are the different types of functional dependencies we can consider in a relational DBMS. Trivial Functional Dependencies A functional dependency which is obvious is known as trivial functional dependency. A trivial functional dependency occurs when you derive the functional dependency of an attribute which is a subset of a collection of attribute sets. For example, “{ A, B} -> B” is a trivial functional dependency. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Lec_Room_No + Lec_Date + PK Lec_Time Room_size Programme_Code Lec_Course_code Lect_T_Name 601 20/08/2013 7.00-9.00 50 MScIT -I PP Anita Ghosal 601 20/08/2013 9.30-10.30 50 MScIT-I DS Manisha P 601 25/08/2013 11.00-1.00 50 MSc IT-I DS Manisha P 601 21/08/2013 9.00-11.00 50 MScIT-I IT AparnaPanigrahy 602 21/08/2013 7.00-9.00 100 MScIT-II AI Kavita P 602 20/08/2013 9.00-11.00 100 MScIT-II DW Vaishali Mishra 603 21/09/2013 9.00-11.00 70 TYIT DAT Mahesh Naik 603 25/09/2013 12.00-2.00 70 TYIT DAT Mahesh Naik 603 21/09/2013 11.30-1.30 70 TYIT AC Indrani Sen 603 22/09/2013 9.00-11.00 70 TYIT AC Indrani Sen Prof.Indrani Sen MCA,MPhil Computer Science
Full Functional Dependencies : Full Functional Dependencies An attribute C is said to be fully functionally dependent on another attribute set AB if it is functionally dependent on AB and not functionally dependent on any subset of AB. For example ,Consider the Table Lecture below, In this case Room_Size is not fully functionally dependent On the Composite key which consists of a set of attributes Lec_Room_No,Lec_Date and Lec_time,because Room_Size is dependent on Room_No which is a subset of the Composite key. The Lec_Course_Code on the other hand is fully functionally dependent on Lec_Room_No , Lec_Date and Lec_Time . { Lec_Room_No , Lec_Date , Lec_Time }-> Lec_Course_Code Prof.Indrani Sen MCA,MPhil Computer Science
Partial Functional Dependency : Partial Functional Dependency Partial functional dependency is said to occur on a set of attributes on the LHS when an attribute is dependent on a subset of attributes on the LHS and even on removing an attribute the functional dependency continues to hold. For example consider AB-> C.In this case if B is removed from the LHS, and still A->C continues to exist then C is partial functionally dependent on AB . Consider the Employee Table { EmpID,EmpName }->Designation. This is only partial dependence because even if you remove Name from the LHS, { EmpID } -> {Designation} holds. Similarly Room_Size attribute in the Table Lecture is partially dependent on { Lec_Room_No , Lec_Date , and Lec_Time }. In simple words a partial dependency is said to occur when a non-key attribute is dependent only on a part of the composite key and not on the whole of it. Prof.Indrani Sen MCA,MPhil Computer Science
Transitive Dependencies : Transitive Dependencies Transitive dependency is said to occur if there is an indirect relationship. If A->B and B->C then A->C For Example in the above Lecture Table:- { Lec_Room_No , Lec_Date , Lec_Time } -> { Lec_Course_Code } { Lec_Course_Code }-> Lec_T_Name Then { Lec_Room_No , Lec_Date , Lec_Time }->{ Lec_T_Name } Therefore Lec_T_Name is transitively dependent on the Primary key. A transitive dependency is said to occur if there is a dependency which exists between two non-key attributes. Prof.Indrani Sen MCA,MPhil Computer Science
Multivalued Dependencies : Multivalued Dependencies A multivalued dependency on R , A ->> B, says that if two tuples or records of R have the same value for the attributes of A, then their components in B may be swapped, and the result will be two tuples that are also in the relation. Consider the following tuples of the Account Table which possess the same values for the attribute “ AcNo ” and multiple values for the attribute CustID.If we now swap the values of AcNo,the resulting tuples also belongs to the Account relation hence we can conclude AcNO ->> CustID . Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: AcNo CustID Type BranchID Balance A01208 C06 Current B02 45,000 A01208 C01 Current B02 45,000 Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: AcNo CustID Type BranchID Balance A00121 C01 Current B01 80,567 A01208 C01 Current B02 45,000 Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Anomalies in a Relation: Data Redundancy: Redundant information occurs in a relation if the same fact is duplicated in several records. The most obvious problem of data redundancy is unnecessary usage of memory. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Update anomalies: It occurs when it is possible to change the value of some records leaving the same value unchanged in other records. It leads to data inconsistency. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Deletion anomalies: It occurs while deleting a record if it deletes other records from the database which do not require to be deleted. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Insertion anomalies: It occurs when insertion of a particular fact or data value requires the insertion of some other unwanted facts into the database. Ideally the relation schemas should not exhibit any anomalies. Normalization is a process that can be used to create such good relational schema design. Prof.Indrani Sen MCA,MPhil Computer Science
THE FIRST NORMAL FORM: : THE FIRST NORMAL FORM: First normal form (1NF) sets the very basic rules for an organized database like: Organize the data into related columns-The data should be organized into columns. Each column contains related data. Ensure that there are no repeating groups of data. A row of data cannot contain repeating groups of data. Ensure that there is a primary key. All field values must be atomic. Prof.Indrani Sen MCA,MPhil Computer Science
Elliminate repeating Groups: : Elliminate repeating Groups: For example consider the table in which the data is organized into related columns. But the tables contains a number of NULL values due to the absence of data values for that particular column. This is known as repeating groups. For example, the member “ Sandhya Chowdhury” and “ Mitu Singh” does not have a Mobile phone number. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science There can be several problems faced by the DBA while updating and modifying the above table. Insertion anomalies: If we want to insert a new member, the member should have two addresses and contact numbers, otherwise we end up inserting null values into our table. Updating Anomalies : For example,we want to add a third address or phone number belonging to the member 1002, which implies adding another column to the above table in which only the member 1002 will have a data value. Deletion Anomalies : If we want to delete the secondary address of a particular member, we cannot delete the entire column which will result in undue loss of secondary addresses of the other members as well.
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science Mem_ID PK Mem_Name Mem_DOB Address Phone1 Phone2 1001 Rita Chowdhury 12/04/2001 Malad (E), Mumbai 02224567890 9920879452 1002 Sonia Dutta 1/12/1953 Goregaon (E ), Mumbai 02223456712 9866600235 1003 Romi Singh 1/1/2000 MG road, Bangalore 04028779048 9920478945 1004 Siddhant Kapoor 2/09/1977 Park street, Kolkata 03336789025 9867238970 1005 Sandhya Chowdhury 19/09/1986 Hissar, Gurgaon 01126745600 1006 Tina Dutta 17/07/1980 Koramagla , Bangalore 04025647900 9920456780 1007 Tushar Kapoor 12/04/1957 MaidanGarhi, Delhi 02227947902 9982409798 1008 Mitu Singh 11/02/1997 Esplanade, Kolkata 02247889025
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
Ensuring the presence of a unique Primary key in the Table: : Ensuring the presence of a unique Primary key in the Table: The tables created after removing the repeating groups do not possess an unique value for the Mem_ID any more. Thus Mem_ID cannot continue as a Primary key. Further To eliminate the null values completely, we need to decompose the above Table(Member) into two new Tables “Mem_Address” and.” Mem_ContactNo ”. In Mem_Address we can consider a composite key consisting of Mem_ID and Contact_Address and similarly in Mem_ContactNo we can consider a composite key consisting of Mem_ID and Contact_No . Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
All the fields should be atomic: : All the fields should be atomic: Meaning a field value cannot be decomposed into smaller pieces or should not be decomposed into smaller parts with more than one kind of data in it. The goal of this normalization is to identify redundancy of data and make retrieval of data easier. A relation scheme is said to be in first normal form (1NF) if the values in the domain of each attribute of the relation is atomic. Meaning only one value is associated with each attribute and the value is not a list of values. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
THE SECOND NORMAL FORM: : THE SECOND NORMAL FORM: The second normal form does not allow partial dependency between a non key attribute and a primary attribute. In the above table all the non-primary attributes, i.e. Mem_Fname , Mem_Lname , Mem_Locality , Mem_City and Mem_DOB are fully dependant on the primary key Mem_ID,hence the above table is in 2NF. But it is not the same always. Consider the following table which contains time table data related to the lectures. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: In the above relation a lecture is identified by the Room_No , Date and time where it is scheduled to be held. The Primary key of the above table is a unique combination of the { Lec_room_no,Lec_Date,Lec_time }. According to the 2 nd Normal form all the non key attributes are fully dependant on the Primary key . If you carefully analyze the above relation, we can see that the Room_Size attribute is functionally dependant on the Lec_Room_No . So the above table is not in 2NF. The table or the relation suffers from the problem of data redundancy and can lead to data update anomalies . If, a lecture is rescheduled to a different room, the Room_No as well as the Room_Size attribute needs to be changed. Again we understand that a specific Programme is scheduled in a particular room . So Programme is also functionally dependant on the Room_no . Both of these partial dependencies have to be removed and thus we decompose the relation. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: This transitive dependency results in redundant data and insert, update and deletion anomalies. So we consider the further decomposition of the above “Lecture” relation into “Lecture” and “Course” relation. T he decomposition should be lossless i.e. if we perform a Cartesian product of the decomposed sub relations (Lecture and Course) we should get back the original table without losing any data. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Lec_Room_No + Lec_Date + PK Lect_Time Lec_Course_code 601 20/08/2013 7.00-9.00 PP 601 20/08/2013 9.30-10.30 DS 601 25/08/2013 11.00-1.00 DS 601 21/08/2013 9.00-11.00 IT 602 21/08/2013 7.00-9.00 AI 602 20/08/2013 7.00-9.00 DW 603 21/09/2013 9.00-11.00 DAT 603 25/09/2013 12.00-2.00 DAT 603 21/09/2013 11.30-1.30 AC 603 22/09/2013 9.00-11.00 AC Prof.Indrani Sen MCA,MPhil Computer Science Lec_Course_code Lec_T_Name PP Anita Ghosal DS Manisha P IT AparnaPanigrahy AI Kavita P DW Vaishali Mishra DAT Mahesh Naik AC Indrani Sen
THE BOYCE CODD NORMAL FORM: : THE BOYCE CODD NORMAL FORM: BCNF is an extension of 3rd Normal Form (3NF). 3NF states that all data in a table must depend only on that table’s primary key, and not on any other field in the table. 3NF removes transitive dependencies. BCNF states that if there exists any functional dependency in the table it will solely be on the candidate keys. ”A relation R is in Boyce- Codd normal form (BCNF) if and only if every determinant is a candidate key”. Prof.Indrani Sen MCA,MPhil Computer Science
BCNF: BCNF BCNF is an extension of 3rd Normal Form ( 3NF ). 3NF states that all data in a table must depend only on that table’s primary key, and not on any other field in the table. 3NF removes transitive dependencies. BCNF states that if there exists any functional dependency in the table it will solely be on the candidate keys. ”A relation R is in Boyce- Codd normal form ( BCNF ) if and only if every determinant is a candidate key”. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Example 1: Consider the following table student. Student ( Adhar_ID , Stud_name , address, college_code , college_name , college_City , HSC_percentage , Program) Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science Adhar_ID Stud_name Address college_code college_name college_City HSC_percentage Program 11012 RON GOREGAON,MUMBAI 101 PATKAR MUMBAI 80 BSc 11014 A NI MALAD,MUMBAI 113 NIRMALA MUMBAI 80 B.Com 11016 MANISH PARK STREET,KOLKATA 213 ASHUTOSH KOLKATA 50 B.Com 10020 RITESH NERUL,MUMBAI 101 PATKAR MUMBAI 84 B.Com 10032 MINA BANSDRONI,KOLKATA 113 NIRMALA MUMBAI 56 B.Com
PowerPoint Presentation: The determinants in the Student relation are as follows: Adhar_ID→Stud_Name,Address,HSc_percentage,Program,College_Code,College_Name,College_City ( Stud_Name,HSc_Percentage )→ Adhar_ID,Address,College_code,College_Name,College_City,Program ( Stud_Name )-> Adhar_ID,Address,College_code,College_Name,College_City,Program (Address)-> Adhar_ID,College_code,College_Name,College_City,Program College_Code→College_name,College_city Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science Adhar_ID Stud_name Address HSC_percentage Program college_code 11012 RON GOREGAON,MUMBAI 80 BSc 101 11014 A NI MALAD,MUMBAI 80 B.Com 113 11016 MANISH PARK STREET,KOLKATA 50 B.Com 213 10020 RITESH NERUL,MUMBAI 84 B.Com 101 10032 MINA BANSDRONI,KOLKATA 56 B.Com 113 college_code college_name college_City 101 PATKAR MUMBAI 113 NIRMALA MUMBAI 213 ASHUTOSH KOLKATA
PowerPoint Presentation: Let us find out the dependencies of the Table Student again:- Adhar_ID→Stud_Name,Address,HSc_percentage,Program,College_Code ( Stud_Name,Address )-> Adhar_ID,HSc_percAentage,Program,College_code (Address)-> Adhar_ID,Stud_Name,HSc_percentage , Program,College_Code In the above case, the LHS consists of only Candidate keys and hence the above Table is in BCNF . Prof.Indrani Sen MCA,MPhil Computer Science
But if we consider that a single student can get admitted in more than one program then the scenario changes. : But if we consider that a single student can get admitted in more than one program then the scenario changes. Prof.Indrani Sen MCA,MPhil Computer Science Adhar_ID Stud_name Address HSC_percentage Program college_code 11012 RON GOREGAON,MUMBAI 80 BSc 101 11014 A NI MALAD,MUMBAI 80 B.Com 113 11016 MANISH PARK STREET,KOLKATA 50 B.Com 213 10020 RITESH NERUL,MUMBAI 84 B.Com 101 10032 MINA BANSDRONI,KOLKATA 56 B.Com 113 11012 RON GOREGAON,MUMBAI 80 BCA 113 11016 MANISH PARK STREET,KOLKATA 50 BMS 213
PowerPoint Presentation: ( Adhar_ID,Program )→ Stud_Name,Address,HSc_percentage,College_Code ( Adhar_ID,Program )-> Adhar_ID , HSc_percentage,College_Code ( Adhar_ID )-> Student_Name,Student_Address ( Student_Name )-> Adhar_ID,Student_Address ( Student_Address )-> Adhar_ID,Student_Name Prof.Indrani Sen MCA,MPhil Computer Science
In this case all the LHS (Determinants)are not candidate keys, because Adhar_ID,Student_Name and Student_Address exists in duplicate and do not determine all the other attributes.Hence the relation is not in BCNF SO we decompose the relation separating the determinants. : In this case all the LHS (Determinants)are not candidate keys, because Adhar_ID,Student_Name and Student_Address exists in duplicate and do not determine all the other attributes.Hence the relation is not in BCNF SO we decompose the relation separating the determinants. Prof.Indrani Sen MCA,MPhil Computer Science Adhar_ID Program college_code 11012 BSc 101 11014 B.Com 113 11016 B.Com 213 10020 B.Com 101 10032 B.Com 113 11012 BCA 113 11016 BMS 213 Adhar_ID Stud_name HSC % Address 11012 RON 80 GOREGAON,MUMBAI 11014 A NI 80 MALAD,MUMBAI 11016 MANISH 50 PARK STREET,KOLKATA 10020 RITESH 84 NERUL,MUMBAI 10032 MINA 56 BANSDRONI,KOLKATA
FOURTH NORMAL FORM : FOURTH NORMAL FORM The anomalies which remain in the relation after BCNF can be removed by using Fourth Normal form. A relation is said to be in 4NF, if it is already in BCNF and contain multivalued dependencies. A relation or Table is said to contain a multivalued dependency if more than one attributes possess multivalued functional dependencies and the attributes are independent of each other. In other words, a multivalued dependency occurs when a relation R has attributes A, B, and C such that A determines a set of values for B [A->>B] A determines a set of values for C [A->>C]and B and C are independent of each other. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: The anomalies which remain in the relation after BCNF can be removed by using Fourth Normal form. A relation is said to be in 4NF, if it is already in BCNF and , for every one of its non-trivial multivalued dependencies X Y, X is a superkey —that is, X is either a candidate key or a superset. The above Relation possess a multivalued functional dependency BranchID ->> { AcNo , Balance}. Note: This does not imply that BranchID ->> AcNo and BranchID ->>Balance BranchID is not a super key or candidate key of the Relation .Hence the Relation is not in 4NF. Hence we decompose the relation into R1 (XY) R2(R-X) Prof.Indrani Sen MCA,MPhil Computer Science
Example 1: Consider the Table Student_Program 5.30 which is in BCNF : Prof.Indrani Sen MCA,MPhil Computer Science Example 1: Consider the Table Student_Program 5.30 which is in BCNF Adhar_ID Program college_code 11012 BSc 101 11014 B.Com 113 11016 B.Com 213 10020 B.Com 101 10032 B.Com 113 11012 BCA 113 11016 BMS 213
PowerPoint Presentation: Adhar_ID ->>Program, College_Code Prof.Indrani Sen MCA,MPhil Computer Science
Example 2: Consider the Table Account 5.33 which is in BCNF : Example 2: Consider the Table Account 5.33 which is in BCNF Prof.Indrani Sen MCA,MPhil Computer Science AcNo PK BranchID Type Balance A00023 B04 Savings 90000 A00121 B01 Current 80567 A00234 B04 Savings 12000 A01208 B02 Current 45000 A00789 B04 Savings 4500 A00893 B06 Savings 67900 A00987 B02 Current 78600
PowerPoint Presentation: BranchID ->> AcNo , Balance Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science BranchID + AcNo Balance B01 A00121 80567 B02 A01208 45000 B02 A00987 78600 B04 A00023 90000 B04 A00234 12000 B04 A00789 4500 B06 A00893 67900 BranchID Type B01 Current B02 Current B04 Savings B06 Savings
3D constraint: 3D constraint IF a student S1 takes up program P1 AND College C1 conducts program P1 And student gets admitted in College C1 The student S1 takes up program P1 in college C1 Prof.Indrani Sen MCA,MPhil Computer Science
Courses - tutors - levels (CTL): Courses - tutors - levels (CTL)
CTL - 2 attribute projections: CTL - 2 attribute projections CT TL CL
CTL - 3-decomposable: CTL - 3-decomposable the join of any two projections is not CTL; e.g: join(CT, TL) Extra!
Constraint 3D illustrated on the CTL relation: Constraint 3D illustrated on the CTL relation IF tutor t1 teaches subject s1 AND level l1 studies subject s1 AND tutor t1 teaches level l1 THEN tutor t1 teaches subject s1 for level l1 note: this constraint is not expressed in CTL
Constraint 3D and Join Dependency: Constraint 3D and Join Dependency 4NF does not express the constraint 3D
Join dependency: Join dependency Let R be a relation. Let A, B, ..., Z be arbitrary subsets of R’s attributes. R satisfies the JD ( A, B, ..., Z ) if and only if R is equal to the join of its projections on A, B, ..., Z
5 NF: 5 NF R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R 5NF is always achievable
Explanation: Explanation a join dependency, (A, B, …, Z), is implied by the candidate keys, K1, …, Km of R if the fact that K1, …, Km are candidate keys for R determine the fact that R has the JD (A, B, …, Z)
Illustration - positive example: Illustration - positive example consider R (S_id, S_name, Status, City) with S_id and S_name candidate keys ({S_id, S_name, Status}, {S_id, City}) is a JD because S_id is a candidate key in R ({S_id, S_name}, {S_id, Status}, {S_name, City}) is a JD because S_id and S_name are both candidate keys in R
Concluding remarks: Concluding remarks 5NF is the ultimate normal form with respect to projection / join 5NF is guaranteed to be free of all anomalies that can be eliminated via projections determining whether a relation is in 4NF but not in 5NF is still fuzzy very rare in practice
Recap: Recap JD - a more general constraint than MD a relation can be in 4NF and have un-expressed JDs this results in update anomalies such a relation can be decomposed (via projection) into an equivalent set of 5NF relations a relation is 5NF if all its JDs are deducible from its candidate keys for a relation in 4NF but not in 5NF, an unexpressed JD is a possible decomposition (towards 5NF)
Decomposing Relations: : Decomposing Relations: The anomalies mentioned above can be eliminated by splitting or decomposing the relation schema. Let R be a relation schema which is decomposed into two relation schemas R1 and R2. Let ‘a’ be the set of attributes which belong to the relation R1(a R1). Let ‘b’ be the set of attributes belonging to R2 (b R2). Then R is composed of the union of the attributes from the relations R1 and R2. (a b) R. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: A set of relation schemas {R1, R2,…, Rn} is a decomposition of R if R = R1 U R2 U …..U Rn. Each Ri is a subset of R(for i = 1,2…,n). For relation R(x, y, z) there can be 2 subsets: R1(x , z) and R2(y, z ). The goal of decomposition is to eliminate redundancy by decomposing a relation into several smaller relations . It is important to check that a decomposition does not lead to bad design so that we may lose our data. For example consider the following relation, R. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: ProdID Prodname Supplier Price P1 Hair oil S1 150 P2 Shampoo S1 300 P3 Pen S2 100 P4 Ruler S2 50 P5 Fevicol S3 200 Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: R1 ProdID Supplier P1 S1 P2 S1 P3 S2 P4 S2 P5 S3 Prof.Indrani Sen MCA,MPhil Computer Science R2 Prodname Supplier Price Hair oil S1 150 Shampoo S1 300 Pen S2 100 Ruler S2 50 Fevicol S3 200
PowerPoint Presentation: After doing a Union of the above decomposed relations we get the relation as shown below. This relation is not the same as the original relation and additional records are obtained with the new records and leads to loss of information. Thus the above decomposition is called Lossy Decomposition . The Table obtained after the Union of the decomposed table is called Dependency preserving if it retains the same set of functional dependencies as the original Table. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
Lossless-Join Decomposition : Lossless-Join Decomposition Let R be a relation schema and let F be a set of FDs over R . A decomposition of R into two schemas with attribute sets X and Y is said to be a lossless-join decomposition In general,though , the other direction does not hold . If we take projections of a relation and recombine them using natural join, we typically obtain some tuples that were not in the original relation. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: In other words, the attributes common to R 1 and R 2 must contain a key for either R 1 or R 2. If a relation is decomposed into two relations, this test is a necessary and condition for the decomposition to be lossless-join. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science The decomposition of relation schema R with FDs F into schemas with attribute sets X and Y is dependency-preserving if (FX [ FY )+ = F+. That is, if we take the dependencies in FX and FY and compute the closure of their union, we get back all dependencies in the closure of F. Therefore, we need to enforce only the dependencies in FX and FY ; all FDs in F+ are then sure to be satised . To enforce FX, we need to examine only relation X (on inserts to that relation). To enforce FY , we need to examine only relation Y.
DECOMPOSITION: DECOMPOSITION Prof.Indrani Sen MCA,MPhil Computer Science
Decomposing Relations: : Decomposing Relations: The anomalies mentioned above can be eliminated by splitting or decomposing the relation schema. Let R be a relation schema which is decomposed into two relation schemas R1 and R2. Let ‘a’ be the set of attributes which belong to the relation R1(a R1). Let ‘b’ be the set of attributes belonging to R2 (b R2). Then R is composed of the union of the attributes from the relations R1 and R2. (a b) R. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: A set of relation schemas {R1, R2,…, Rn} is a decomposition of R if R = R1 U R2 U …..U Rn. Each Ri is a subset of R(for i = 1,2…,n). For relation R(x, y, z) there can be 2 subsets: R1(x , z) and R2(y, z ). The goal of decomposition is to eliminate redundancy by decomposing a relation into several smaller relations . It is important to check that a decomposition does not lead to bad design so that we may lose our data. For example consider the following relation, R. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: ProdID Prodname Supplier Price P1 Hair oil S1 150 P2 Shampoo S1 300 P3 Pen S2 100 P4 Ruler S2 50 P5 Fevicol S3 200 Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: R1 ProdID Supplier P1 S1 P2 S1 P3 S2 P4 S2 P5 S3 Prof.Indrani Sen MCA,MPhil Computer Science R2 Prodname Supplier Price Hair oil S1 150 Shampoo S1 300 Pen S2 100 Ruler S2 50 Fevicol S3 200
PowerPoint Presentation: After doing a Union of the above decomposed relations we get the relation as shown below. This relation is not the same as the original relation and additional records are obtained with the new records and leads to loss of information. Thus the above decomposition is called Lossy Decomposition . The Table obtained after the Union of the decomposed table is called Dependency preserving if it retains the same set of functional dependencies as the original Table. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
The desirable properties of decomposition: The desirable properties of decomposition Lossless join decomposition Dependency preserving decomposition Prof.Indrani Sen MCA,MPhil Computer Science
Lossless decomposition: Lossless decomposition A decomposition of the relation scheme R into relations R1, R2... Rn is Lossless if, the original relation can be retrieved by a natural join of the relations which are a projection of the original relation. Let R be a relation schema. Let F be a set of functional dependencies on R . Let R1 and R2 form a decomposition of R . The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies are in : R1∩R2→R1 R1∩R2→R2 Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: The above FDs enforces a constraint that the attributes involved in the natural join R1 and R2 should be a candidate key for at least one of the two relations. This ensures that we can never get the situation where spurious tuples are generated, as for any value on the join attributes there will be a unique tuple in one of the relations. Prof.Indrani Sen MCA,MPhil Computer Science
PowerPoint Presentation: Prof.Indrani Sen MCA,MPhil Computer Science
Dependency preserving decomposition : Dependency preserving decomposition A decomposition of the relation scheme R into Subschemas ’ R1, R2... Rn is Dependency preserving if the closure set of FDs within R can be derived from those within the relations R1, R2... Rn. If F is the set of dependencies defined on R, then the set of dependencies obtained from R1 and R2 together will yield R. Prof.Indrani Sen MCA,MPhil Computer Science