Normalization: Normalization
What it’s all about: What it’s all about Given a relation, R, and a set of functional dependencies, F, on R.
Assume that R is not in a desirable form for enforcing F.
Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF.
While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.
Contents: Contents The Good and the Bad
Bad database design
redundancy of fact
fact clutter
information loss
dependency loss
Good database design
How to compute with meaning
functional dependencies - FDs
Armstrong’s inference rules
the meaning of a set of FDs
minimal cover of a set of FDs
Normal Forms - overview
1NF, 2NF, 3NF, BCNF
The Good: The Good
Primitive Domains: Primitive Domains Attributes must be defined over domains with atomic values
Bad Database Design- redundancy of fact: Bad Database Design - redundancy of fact redundancy: airline name repeated for same flight
inconsistency: when airline name for a flight changes, it must be changed many places
Bad Database Design- fact clutter: Bad Database Design - fact clutter insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?
deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.
update anomalies: if DL242 is flown by Sabena, we must change it everywhere.
Bad Database Design- information loss: Bad Database Design - information loss
Bad Database Design- information loss: Bad Database Design - information loss information loss: we polluted the database with false facts; we can’t find the true facts.
Bad Database Design- dependency loss: Bad Database Design - dependency loss dependency loss: we lost the fact that (flt#, date) ® plane#
Good Database Design: Good Database Design no redundancy of FACT (!)
no inconsistency
no insertion, deletion or update anomalies
no information loss
no dependency loss
Slide12: Let X and Y be sets of attributes in R
Y is functionally dependent on X in R iff for each x Î R.X there is precisely one yÎ R.Y
Y is fully functional dependent on X in R if Y is functional dependent on X and Y is not functional dependent on any proper subset of X
We use keys to enforce functional dependencies in relations: X ® Y Functional Dependencies and Keys
Functional Dependencies and Keys: Functional Dependencies and Keys plane# is not determined by flt# alone airline is not determined by flt# and date the FLIGHT relation will not allow the FDs to be enforced by keys
Functional Dependencies and Keys: Functional Dependencies and Keys real world database name address Consider the meaning combined separate
Functional Dependencies: Functional Dependencies Functional Dependencies in the ER-Diagram
AIRPORT « Airportcode
FLT-SCHEDULE « Flt#
FLT-INSTANCE « (Flt#, Date)
AIRPLANE « Plane#
CUSTOMER « Cust#
RESERVATION « (Cust#, Flt#, Date)
RESERVATION « Ticket#
Airportcode ® name, City, State
Flt# ® Airline, Dtime, Atime, Miles, Price, (from) Airportcode, (to) Airportcode
(Flt#, Date) ® Flt#, Date, Plane#
(Cust#, Flt#, Date) ®Cust#, Flt#, Date, Ticket#, Seat#, CheckInStatus,
Ticket# ® Cust#, Flt#, Date
Cust# ® CustomerName, CustomerAddress, Phone#
How to Compute Meaning- Armstrong’s inference rules: How to Compute Meaning - Armstrong’s inference rules Rules of the computation:
reflexivity: if YÍ X, then X®Y
Augmentation: if X®Y, then WX®WY
Transitivity: if X®Y and Y®Z, then X®Z
Derived rules:
Union: if X®Y and X®Z, the X®YZ
Decomposition: if X®YZ, then X®Y and X®Z
Pseudotransitivity: if X®Y and WY®Z, then XW®Z
Armstrong’s Axioms:
sound
complete
How to Compute Meaning-the meaning of a set of FDs, F+: How to Compute Meaning -the meaning of a set of FDs, F+ Given the ribs of an umbrella, the FDs, what does the whole umbrella, F+, look like?
Determine each set of attributes, X, that appears on a left-hand side of a FD. Determine the set, X+, the closure of X under F.
How to Compute Meaningwhen do sets of FDs mean the same?: How to Compute Meaning when do sets of FDs mean the same? F covers E if every FD in E is also in F+
F and E are equivalent if F covers E and E covers F.
We can determine whether F covers E by calculating X+ with respect to F for each FD, X®Y in E, and then checking whether this X+ includes the attributes in Y+. If this is the case for every FD in E, then F covers E.
How to Compute Meaning- minimal cover of a set of FDs: How to Compute Meaning - minimal cover of a set of FDs
Is there a minimal set of ribs that will hold the umbrella open?
F is minimal if:
every dependency in F has a single attribute as right-hand side
we can’t replace any dependency X®A in F with a dependency Y®A where YÌX and still have a set of dependencies equivalent with F
we can’t remove any dependency from F and still have a set of dependencies equivalent with F
How to guarantee lossless joins: How to guarantee lossless joins Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated functional dependencies, F1 and F2.
The decomposition is lossless iff:
A1ÇA2®A1\A2 is in F+, or
A1ÇA2®A2 \A1 is in F+
How to guarantee preservation of FDs: How to guarantee preservation of FDs Decompose relation, R, with functional dependencies, F, into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk.
The decomposition is dependency preserving iff:
F+=(F1È... È Fk)+
Overview of NFs: Overview of NFs NF2
1NF
2NF
3NF
BCNF
Normal Forms- definitions: Normal Forms - definitions NF2: non-first normal form
1NF: R is in 1NF. iff all domain values are atomic2
2NF: R is in 2. NF. iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent.
Example of Normalization: Example of Normalization
Example of Normalization: Example of Normalization 1NF: 3NF & BCNF: 2NF:
3NF that is not BCNF: 3NF that is not BCNF Candidate keys: {A,B} and {A,C} Determinants: {A,B} and {C}
A decomposition:
Lossless, but not dependency preserving!
Major Results in Normalization Theory: Major Results in Normalization Theory Theorem:
There is an algorithm for testing a decomposition for lossless join wrt. a set of FDs
Theorem:
There is an algorithm for testing a decomposition for dependency preservation
Theorem:
There is an algorithm for lossless join decomposition into BCNF
Theorem:
There is an algorithm for dependency preserving decomposition into 3NF