Level 3.1 CBT course on Database Concepts

Category: Entertainment

Presentation Description

No description available.


Presentation Transcript

Slide 1: 

3/3/2011 TPG Confidential 1 Technology Practices Group Java Competency Framework "Incubate, Nurture and Deploy Technology Experts Innovate, Build and Deliver Technology Solutions”

Slide 2: 

3/3/2011 TPG Confidential 2 Introduction

Slide 3: 

3/3/2011 TPG Confidential 3 TABLE OF CONTENTS


3/3/2011 TPG Confidential 4 TABLE OF CONTENTS 4.4 Object-relational model 4.5 ML database model 4.6 Star schema model 4.7 Snowflake Schemas Model 5. Case Study


3/3/2011 TPG Confidential 5 INTRODUCTION A Data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements. A data model visually represents the nature of data, business rules governing the data, and how it will be organized in the database. A data model is comprised of two parts logical design and physical design.


3/3/2011 TPG Confidential 6 INTRODUCTION In the view of an object-oriented developer data modeling is conceptually similar to class modeling. With data modeling you identify entity types whereas with class modeling you identify classes. Data attributes are assigned to entity types just as you would assign attributes and operations to classes. There are associations between entities, similar to the associations between classes – relationships, inheritance, composition, and aggregation are all applicable concepts in data modeling.


3/3/2011 TPG Confidential 7 DATA MODEL LIFE CYCLE 1) Conceptual Data Modeling (CDM) - First Phase:    This data model includes all major entities, relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE. 2) Logical Data Modeling (LDM) - Second Phase:     This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization. 3) Physical Data Modeling (PDM) - Third Phase:      This is a complete model that includes all required tables, columns, relationship, database properties for the physical implementation of the database.

Conceptual Data Modeling (CDM) : 

3/3/2011 TPG Confidential 8 Conceptual Data Modeling (CDM) Conceptual data model is created by gathering business requirements. i) CDM is the first step in constructing a data model in top-down approach and is a clear and accurate visual representation of the business requirement. ii) CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization. iii) CDM comprises of entity types and relationships. The relationships between the subject areas and the relationship between each entity in a subject area are drawn by symbolic notation(IDEF1X or IE). In a data model, cardinality represents the relationship between two entities. i.e. One to one relationship, or one to many relationship or many to many relationship between the entities. iv) CDM contains data structures that have not been implemented in the database.

CDM - Example : 

3/3/2011 TPG Confidential 9 CDM - Example Below figure shows a conceptual data model that contains major entities from Customer, Sales Order, and Item.

Logical Data Modeling (LDM) : 

3/3/2011 TPG Confidential 10 Logical Data Modeling (LDM) This is the actual implementation and extension of a conceptual data model. A Logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model. As soon as the conceptual data model is accepted by the functional team, development of logical data model gets started. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

LDB - Example : 

3/3/2011 TPG Confidential 11 LDB - Example In the example, we have identified the entity names, attribute names, and relationship.

Physical Data Modeling (PDMs) : 

3/3/2011 TPG Confidential 12 Physical Data Modeling (PDMs) Logical data model is approved by functional team and there-after development of physical data model work gets started. PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. The transformations from logical model to physical model include imposing database rules, implementation of referential integrity, super types and sub types etc. Once physical data model is completed, it is then forwarded to technical teams (developer, group lead, DBA) for review.

Slide 13: 

3/3/2011 TPG Confidential 13 PDM - Example In the example, the entity names have been changed to table names, changed attribute names to column names, assigned nulls and not nulls, and datatype to each column.

How to Model Data? : 

3/3/2011 TPG Confidential 14 How to Model Data? Lets take an example of Customer Order Management to Model Data. Following are the steps involved in designing. Step 1: Identify entity types An entity type, also simply called entity, is similar conceptually to object-orientation’s concept of a class – an entity type represents a collection of similar objects. Examples of entities in an order management system would include the following Customer Address Order Item

How to Model Data? : 

3/3/2011 TPG Confidential 15 Step 2: Identify attributes Each entity type will have one or more data attributes. A column is the implementation of a data attribute within a relational database The Customer entity can have attributes such as First Name, Last Name and Title. The Address entity can have address line 1 and 2, city, state and zip code. The Order entity can have Item, Quantity and Price. The item entity can have the item code, item description, price and available stock. How to Model Data?

How to Model Data? : 

3/3/2011 TPG Confidential 16 How to Model Data? Step 3: Apply naming conventions Follow organization or client guidelines for both logical and physical modeling, the logical naming conventions should be focused on human readability whereas the physical naming conventions will reflect technical considerations. For E.g. Customer Entity will have the following attributes names. Logical Representation Physical Representation

How to Model Data? : 

3/3/2011 TPG Confidential 17 Step 4: Identify relationships Entities have relationships with other entities. For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. The relationships between entities are conceptually identical to the relationships (associations) between objects. How to Model Data?

How to Model Data? : 

3/3/2011 TPG Confidential 18 How to Model Data? Step 5 : Apply data model patterns Identify the best suited data model pattern and apply. We will see the details of each model later, Step 6: Assign keys There are two fundamental strategies for assigning keys to tables Unique Key : A unique key or primary key is a candidate key to uniquely identify each row in a table. A unique key or primary key comprises a single column or set of columns. In our Customer table, CUST_NR and CUST_SSN can be unique keys for identifying a customer. 2) Foreign Key : A foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The columns in the referencing table must be the primary key or other candidate key in the referenced table. In our example Address table will have a foreign key CUST_NR, to identify the addresses of the Customer.

How to Model Data? : 

3/3/2011 TPG Confidential 19 How to Model Data? In the below example CUSTOMER_NR and CUSTOMER_SSN are unique keys. ADDRESS_ID is foreign key to CUST_ADDRESS table.

How to Model Data? : 

3/3/2011 TPG Confidential 20 How to Model Data? Step 7: Normalize to reduce data redundancy Database normalization, is a technique for designing relational database tables to minimize duplication of information and, to safeguard the database against certain types of logical or structural problems, namely data anomalies. Normal Forms: Normal Forms are series of guidelines for ensuring that databases are normalized. First Normal Form (1NF) First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicative columns from the same table. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

How to Model Data? : 

3/3/2011 TPG Confidential 21 How to Model Data? Second Normal Form (2NF) Second normal form (2NF) further addresses the concept of removing duplicative data: Meet all the requirements of the first normal form. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors through the use of foreign keys. Third Normal Form (3NF) Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form. Remove columns that are not dependent upon the primary key.

How to Model Data? : 

3/3/2011 TPG Confidential 22 How to Model Data? Fourth Normal Form (4NF) Fourth normal form (4NF) has one additional requirement: Meet all the requirements of the third normal form. A relation is in 4NF if it has no multi-valued dependencies. Fifth normal form The table must be in 4NF. There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

Benefits of Normalization : 

3/3/2011 TPG Confidential 23 Benefits of Normalization Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page. More flexibility in tuning queries. Index searching is often faster, since indexes tend to be narrower and shorter. More tables allow better use of segments to control physical placement of data. Data modification commands are faster. Fewer null values and less redundant data, making your database more compact. Triggers execute more quickly if you are not maintaining redundant data. Data modification anomalies are reduced. Normalization is conceptually cleaner and easier to maintain and change as your needs change.

How to Model Data? : 

3/3/2011 TPG Confidential 24 Step 8: Denormalize to Improve Performance Denormalization basically is the process of finding the balance between avoiding data duplication and ensuring database performance. Few Types of Denormalization 1) Pre-Joined Tables : used when the cost of joining is prohibitive 2) Report Tables : used when specialized critical reports are needed 3) Mirror Tables : used when tables are required concurrently by two different types of environments 4) Split Tables : used when distinct groups use different parts of a table 5) Combined Tables : used when one to one relationships exist 6) Redundant Data : used to reduce the number of table joints required 7) Repeating Groups : used to reduce I/O 8) Derivable Data : used to eliminate calculations and algorithms 9) Speed Tables : used to support hierarchies How to Model Data?

Slide 25: 

3/3/2011 TPG Confidential 25 Hierarchical model Network model Relational model Object-relational model XML database model Star schema model Snowflake Schemas Model Common Database Models

Hierarchical Model : 

3/3/2011 TPG Confidential 26 Hierarchical Model Data is organized into a tree-like structure. This structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent. Example: See below Organization hierarchical representation of data. Tree Structure Table Representation

Slide 27: 

3/3/2011 TPG Confidential 27 The network model is a database model conceived as a flexible way of representing objects and their relationships. The hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records. Network Model

Relational Model : 

3/3/2011 TPG Confidential 28 Relational Model The relational model stores data in tables. Each table holds data about one particular type of thing: customers, products etc. Within a table, each row represents one instance of the type of things that the tables stores - – one customer, one product and so on Thus a customer table might have columns for customer number, name, telephone number, credit limit, outstanding balance and so on. Simple example of customers and orders:

Object-Relational Model : 

3/3/2011 TPG Confidential 29 Object-Relational Model An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system similar to a relational database, but with an object-oriented database model. Objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data-types and methods.

XML Model : 

3/3/2011 TPG Confidential 30 XML Model An XML database is a data persistence software system that allows data to be stored in XML format. This data can then be queried, exported and serialized into any format the developer wishes. XML support in the database brings with it the advantage of being able to translate relational data into XML and vice versa. XML Data can be queried or modified. Sample Table:

Star Schema : 

3/3/2011 TPG Confidential 31 Star Schema The star schema is the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. A star schema is characterized by one or more very large fact tables that contain the primary information, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

Snowflake Schema Model : 

3/3/2011 TPG Confidential 32 Snowflake Schema Model The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. This saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

Case study : 

3/3/2011 TPG Confidential 33 Case study REQUIREMENT: Territory is a segment of the firm's market - group of present and potential customers assigned to a salesperson or a group of salespeople for a given period of time. The key word in this definition is “customers”. – Territory is typically defined by customers within a geographic boundary – It can be defined by customer types (Industry, Segment etc.). – In cases in which products are highly technical and sophisticated, product specialty is also used to define sales territories.

Case study : 

3/3/2011 TPG Confidential 34 A system needs to be build to allow users to create Territories, Manage Territory Hierarchies The core dimensions for defining sales Territories were: Client Types were 1. Industry 2. Segment – 3. Geography – 4. Product Any level in the hierarchy(ies) of the core dimensions can be used to define Territory Case study

Territory Illustration : 

3/3/2011 TPG Confidential 35 Territory Illustration

Adjacency List Model : 

3/3/2011 TPG Confidential 36 Adjacency List Model The first, and most elegant, approach to represent a hierarchy. It's an elegant approach because you'll need just one, simple function to iterate through your tree. The territories can be represented in table as below (See previous slide for the Territory Tree)

How to fetch Territories : 


Disadvantage of Adjacency Model : 

3/3/2011 TPG Confidential 38 Disadvantage of Adjacency Model It's slow and inefficient. This is mainly caused by the recursion. We need one database query for each node in the tree. As each query takes some time, this makes the function very slow when dealing with large trees. For a tree with four levels, you'll be running four instances of the function at the same time. As each function occupies a slice of memory and takes some time to initiate, recursion is very slow when applied to large trees.

Preorder Tree Traversal : 

3/3/2011 TPG Confidential 39 Preorder Tree Traversal Recursion can be slow, so we would rather not use a recursive function. We'd also like to minimize the number of database queries. Preferably, we'd have just one query for each activity. Lets take an example of Geography Dimension. Start at the root node (‘Americas'), and write a 1 to its left. Follow the tree to ‘Canada' and write a 2 next to it. In this way, you walk (traverse) along the edges of the tree while writing a number on the left and right side of each node. The last number is written at the right side of the Root node.

Preorder Tree Traversal Sample Table : 

3/3/2011 TPG Confidential 40 Preorder Tree Traversal Sample Table

Query To Fetch Geography : 

3/3/2011 TPG Confidential 41 Query To Fetch Geography Query to fetch all the tree under USA: SELECT * FROM GEOGRAPHY WHERE LEFT_NR BETWEEN 4 AND 21 ORDER BY LEVEL; RESULT:

Advantages Of Preorder Tree Traversal : 

3/3/2011 TPG Confidential 42 Advantages Of Preorder Tree Traversal Retrieving the nodes is achieved with only one query Disadvantages Of Preorder Tree Traversal Updating the tree takes more queries and is inefficient

Which Model to Choose : 

3/3/2011 TPG Confidential 43 Which Model to Choose Territories can change constantly as users can add new territories. Using Preorder Tree Traversal might cause slowness in updating the Territories. So, using Adjacency List Model would be the best approach to represent Territories. The dimensions Industry, Geography, Segments, Product etc does not change constantly. There are more static. So, representing them using Preorder Tree Traversal is more efficient.

authorStream Live Help