Relational Database Design BCNF, 3NF : Relational Database Design BCNF, 3NF
Looking for a “Good” Form: Looking for a “Good” Form Recall that the goal of a good database design are Lossless decomposition - necessary in order to ensure correctness of the data Dependency preservation – not necessary, but desirable in order to achieve efficiency of updates Good form – desirable in order to avoid redundancy. But what it means for a table to be in good form? If the domains of all attributes in a table contain only atomic values, then the table is in First Normal Form (1NF). In other words, there are no nested tables, multi-valued attributes, or complex structures such as lists. Relational tables are always in 1NF, according to the definition of the relational model.
Second Normal Form (2NF): Second Normal Form ( 2NF ) R is a relation schema, with the set F of FDs R is in 2NF if and only if for each FD: X {A} in F+ Then A X (the FD is trivial) , or X is not a proper subset of a candidate key for R, or A is a prime attribute A prime attribute is an attribute that is part of a candidate key In 2NF, a subset of a candidate key cannot determine a non-prime attribute. HINT: whenever you try to determine the normal form (2NF, 3NF, BCNF) of a table, you always have to find all candidate keys.
2NF Example: 2NF Example Consider the relation scheme {A,B,C,D} with the FDs: {A,B} {C,D} and {A} {D} {A,B} is a candidate key (it is not a proper subset) { A } is a proper subset of a candidate key { D } is not a prime attribute This scheme is not in 2NF because of { A } { D } 2NF is not important (after this slide forget it). We can always achieve a better form (3NF) that is lossless, preserves dependencies and contains less redundancy.
Third Normal Form (3NF): Third Normal Form ( 3NF ) R is a relation schema, with the set F of FDs R is in 3NF if and only if for each FD: X {A} in F+ Then A X (trivial FD), or X is a superkey for R, or A is prime attribute for R In words: For every FD that does not contain extraneous (useless) attributes: the LHS is a candidate key, or the RHS is a prime attribute, i.e., it is an attribute that is part of a candidate key
3NF Example: 3NF Example R = (B, C, E) F = {{E} {B}, {B,C} {E}} Remember that you always have to find all candidate keys in order to determine the normal form of a table Two candidate keys: BC and EC {E} {B} B is prime attribute {B,C} {E} BC is a candidate key None of the FDs violates the rules of the previous slide. Therefore, R is in 3NF
Redundancy in 3NF: Redundancy in 3NF Bank-schema = (Branch B, Customer C, Employee E) F = { {E} {B} , e.g., an employee works in a single branch {B,C} {E} }, e.g., when a customer goes to a certain branch s/he is always served by the same employee Branch Customer Employee HKUST Wong Au HKUST Chin Au Central Wong Jones Central null Cheng A 3NF table still has problems redundancy (e.g., we repeat that Au works at HKUST branch ) need to use null values (e.g., to represent that Cheng works at Central even though he is not assigned any customers).
Boyce-Codd Normal Form (BCNF): Boyce-Codd Normal Form ( BCNF ) R is a relation schema, with the set F of FDs R is in BCNF if and only if for each FD: X {A} in F+ Then A X (trivial FD), or X is a superkey for R In words : For every FD that does not contain extraneous (useless) attributes, the LHS of every FD is a candidate key. BCNF tables have no redundancy. If a table is in BCNF it is also in 3NF (and 2NF and 1NF)
BCNF Example: BCNF Example R = (B, C, E) F = {{E} {B}, {B,C} {E}} Two candidate keys: BC and EC {B,C} {E} does not violate BCNF because BC is a key {E} {B} violates BCNF because E is not a key In order to achieve BCNF we have to decompose the table but how? Since the decomposition must be lossless, we only have one option: R1(B, E ), and R2( C,E ). The common attribute E should be key of one fragment, here R1.
BCNF Example (cont): BCNF Example (cont) Bank-schema = (Branch B, Customer C, Employee E) F = { {E} {B}, {B,C} {E} } Decompose into R1( B, E ), and R2( C,E ) Branch Customer Employee HKUST Wong Au HKUST Chin Au Central Wong Jones Central null Cheng Branch Employee HKUST Au Central Jones Central Cheng Customer Employee Wong Au Chin Au Wong Jones We have avoided the problems of redundancy and null values of 3NF
BCNF Example (cont): BCNF Example (cont) We can generate the original table by joining the two fragments (however, but we must use an outer join -an outer join fills null values for tuples that do not have join partners) Branch Cust. Empl. HKUST Wong Au HKUST Chin Au Central Wong Jones Central null Cheng Branch Employee HKUST Au Central Jones Central Cheng Customer Employee Wong Au Chin Au Wong Jones Is the decomposition dependency preserving? No. We loose {B,C} {E} Can we have a dependency preserving decomposition? No. No matter how we break we loose {B,C} {E} since it involves all attributes =
Observations about BCNF: Observations about BCNF Best Normal Form Avoids the problems of redundancy and all anomalies There is always a lossless decomposition that generates BCNF tables However, we may not be able to preserve all dependencies Next step: an algorithm for automatically generating BCNF tables.
Algorithm for BCNF Decomposition : Algorithm for BCNF Decomposition Let R be the initial table with FDs F S={R} Until all relation schemes in S are in BCNF for each R in S for each FD X Y that violates BCNF for R S = (S – {R}) (R-Y) (X,Y) enduntil This is a simplified version. In words: When we find a table R with BCNF violation X Y we: 1] Remove R from S 2] Add a table that has the same attributes as R except for Y 3] Add a second table that contains the attributes in X and Y
BCNF Decomposition Example : BCNF Decomposition Example Let us consider the relation scheme R=(A,B,C,D,E) and the FDs: {A} {B,E}, {C} {D} Candidate key: AC Both functional dependencies violate BCNF because the LHS is not a candidate key Pick {A} {B,E} We can also choose {C} {D} – different choices lead to different decompositions. (A,B,C,D,E) generates R1= (A,C,D) and R2= (A,B,E) Do we need to decompose further?
BCNF Decomposition Example (cont): BCNF Decomposition Example (cont) ( A,C ,D) and ( A ,B,E) {A} {B,E}, {C} {D} We need to decompose R1= ( A,C ,D) because of the FD {C} {D} Thus ( A,C ,D) is replaced with R3=(A,C) and R4=(C,D). Final decomposition: R2=(A,B,E), R3=(A,C), R4=(C,D) Is the decomposition lossless ? Yes the algorithm always creates lossless decompositions. In step S = (S – {R}) (R-Y) (X,Y) we replace R with tables (R-Y) and (X,Y) that have X as the common attribute and X Y, i.e., X is the key of (X,Y) Is the decomposition dependency preserving ? Yes because F2={{A} {B,E}}, F3= , F4={{C} {D}} and (F2 F3 F4)+=F+ But remember: sometimes we may not be able to preserve dependencies
Testing if a FD violates BCNF: Testing if a FD violates BCNF Important question: which dependencies to check for BCNF violations? F or F + ? Answer-Part 1 : To check if a table R with a given set of FDs F is in BCNF, it suffices to check only the dependencies in F Consider R (A, B, C, D), with F = {{A} {B}, {B} {C}} The key is {A,D}. R violates BCNF because the LHS of both {A} {B} and {B} {C}. Neither A nor B is a key . We can see that by simply using F - we do not need F+ (e.g., we do not need to check the implicit FD {A} {C}) We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
Testing if a FD violates BCNF (cont): Testing if a FD violates BCNF (cont) Answer-Part 2 : However, using only F is insufficient when testing a fragment in the decomposition of R Consider again R(A,B,C,D), with F = {{A} {B}, {B} {C}} that violates BCNF Decompose R into and R1(A,C,D) and R2(A,B) There is no FD in F that contains only attributes from R1(A,C,D) so we might be mislead into thinking that R1 is in BCNF. In fact, dependency {A} {C} in F+ shows that R1 is not in BCNF. Therefore, for the decomposed relations we also need to consider dependencies in F+ (see next slide).
Testing if a FD violates BCNF (cont): Testing if a FD violates BCNF (cont) To check if a relation R i in a decomposition of R is in BCNF, Either test R i for BCNF with respect to the restriction of F+ to R i (that is, all FDs in F+ that contain only attributes from R i ) or use the the following test: for every set of attributes X R i , check that X+ either includes no attribute of R i -X, or includes all attributes of R i . If the condition is violated, the dependency X (X+ - X ) R i holds on R i , and R i violates BCNF. We use above dependency to decompose R i Note : we have seen how to compute X+ in the previous class about FDs.
Testing if a FD violates BCNF - Example: Testing if a FD violates BCNF - Example Consider again: R(A,B,C,D), F = {{A} {B}, {B} {C}} and the decomposition R1(A,C,D) and R2(A,B) A+={A,B,C}, B+={B,C}, C+={C} R2(A,B) is in BCNF because A+ R2 ={A,B,C} {A,B}={A,B} includes all attributes of R2 B+ R2 ={B,C} {A,B}={B} includes no attributes of R2 - {B} In other words, each attribute (e.g., A) determines everything (it is a key) or nothing (e.g., B). R1(A,C,D) is not in BCNF because A+ R1 = {A,B,C} {A,C,D}={A,C} does not include all attributes of R1 Therefore, the dependency {A} {C} causes a BCNF violation and will be used for further decomposing R1 Final decomposition: R2(A,B), R3(A,D), R4(A,C)
Different BCNF Decompositions: Different BCNF Decompositions The different possible orders in which we consider FDs violating BCNF in the algorithm may lead to different decompositions Previous example: R(A,B,C,D), F = {{A} {B}, {B} {C}} Previous BCNF decomposition: R2(A,B), R3(A,D), R4(A,C) Question: is the decomposition dependency preserving ? Answer: No – we lost the dependency {B} {C} Question: Can you obtain a dependency preserving decomposition? Answer: Yes – in the first decomposition we first applied violation {A} {B}. If, instead, we apply {B} {C} we obtain: R1=(A,B,D) and R2=(B,C) We decompose R1=(A,B,D) further using {A} {B} to obtain: R3=(A,D) and R4=(A,B) The final decomposition R2=(B,C), R3=(A,D), R4=(A,B) is dependency preserving .
Third Normal Form: Motivation: Third Normal Form: Motivation We can always obtain a lossless join decomposition in BCNF using the previous algorithm. However, there are some situations where there does not exist a dependency preserving BCNF decomposition, and efficient checking for FD violation on updates is important Solution: use the weaker Third Normal Form (3NF). Allows some redundancy (with related problems) But FDs can be checked on individual relations without computing a join. There is always a lossless-join, dependency-preserving decomposition into 3NF. see next algorithm
Algorithm for 3NF Synthesis : Algorithm for 3NF Synthesis Let R be the initial table with FDs F Compute the canonical cover Fc of F S= for each FD X Y in the canonical cover Fc S=S (X,Y) if no scheme contains a candidate key for R Choose any candidate key CN S=S table with attributes of CN Note: unlike the BCNF algorithm where we break the original relation, in 3NF we synthesize the tables using the FDs in the canonical cover
3NF Example: 3NF Example Bank=(branch-name, customer-name, banker-name, office-number) Functional dependencies (also canonical cover ): {banker-name} {branch-name, office-number} {customer-name, branch-name} {banker-name} Candidate Keys: { customer-name, branch-name } or {customer-name, banker-name} {banker-name} {office-number} violates 3NF 3NF tables – for each FD in the canonical cover create a table Banker = ( banker-name , branch-name, office-number) Customer-Branch = ( customer-name, branch-name , banker-name) Since Customer-Branch contains a candidate key for Bank, we are done. Question: is the decomposition lossless and dependency preserving? Answer: Yes – all decompositions generated by this algorithm have these properties
Normalization Goals: Normalization Goals Goal for a relational database design is: BCNF. Lossless join. Dependency preservation. If we cannot achieve this, we accept one of Lack of dependency preservation in BCNF Redundancy due to use of 3NF Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys. Can specify FDs using assertions/triggers, but they are expensive to test
ER Model and Normalization: ER Model and Normalization When an E-R diagram is carefully designed, the tables generated from the E-R diagram should not need further normalization. However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity E.g. employee entity with attributes department-number and department-address , and an FD department-number department-address Good design would have made department an entity
Universal Relation Approach: Universal Relation Approach We start with a single universal relation and we decompose it using the FDs (no ER diagrams) Assume Loans(branch-name, loan-number, amount, customer-id, customer-name) and FDs: {loan-number} {branch-name, amount, customer-id} {customer-id} {customer-name} We apply existing decomposition algorithms to generate tables : Loan( loan-number , branch-name, amount, customer-id) Customer( customer-id ,customer-name)
Denormalization for Performance: Denormalization for Performance May want to use non-normalized schema for performance E.g. displaying customer-name along with loan-number and amount requires join of loan with customer Alternative 1: Use denormalized relation containing attributes of loan as well as customer with all above attributes faster lookup Extra space and extra execution time for updates extra coding work for programmer and possibility of error in extra code Alternative 2: use a materialized view defined as loan JOIN customer Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors
Other Design Issues: Other Design Issues Some aspects of database design are not caught by normalization Examples of bad database design, to be avoided: Instead of earnings ( company-id, year, amount ), use earnings-2000, earnings-2001, earnings-2002 , etc., all on the schema ( company-id, earnings ). Above are in BCNF, but make querying across years difficult and needs new table each year company-year ( company-id, earnings-2000, earnings-2001, earnings-2002 ) Also in BCNF, but also makes querying across years difficult and requires new attribute each year.