logging in or signing up Project Report-Aditi Paul aSGuest44690 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 223 Category: Science & Tech.. License: All Rights Reserved Like it (0) Dislike it (0) Added: May 12, 2010 This Presentation is Public Favorites: 1 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript ON-LINE ANALYTICAL PROCESSING : ON-LINE ANALYTICAL PROCESSING - Analyzing Data Resources ADITI PAUL MCS/08/20 REGISTRATION NO – 003834 OF 2008 POST GRADUATE DEPARTMENT OF COMPUTER SCIENCE ST.XAVIERS COLLEGE (AUTONOMOUS) WHAT IS OLAP ? : WHAT IS OLAP ? Basic idea: Quickly answer multi-dimensional analytical queries. Convert data into information that decision makers need It is a continuous , iterative, and preferably interactive process. WHO USES OLAP ? : WHO USES OLAP ? It is used in an organization to carry out the different ORGANIZATIONAL FUNCTIONS in : Finance departments Sales analysis and forecasting Marketing departments Cardinal Goal “ Provide managers with the information they need to make effective decisions ” Understanding Online Analytical Processing - OLAP : Understanding Online Analytical Processing - OLAP 3 part description Part 1 – Online Part 2 – Analytical Part 3 – Processing PART 1 – ONLINE : PART 1 – ONLINE Slide 6: FLASH BACK Data Stored in a Database : Data Stored in a Database TYPE 1 Operational Data Data that “works”. Frequent Updates and Queries. Normalized for efficient search and updates. Fragmented & local relevance. Point Queries . Examples of Operational Data : Examples of Operational Data Account Details of a Customer in a Bank Student Details in a College/School Database Employee Records Etc. Example Queries on Operational Data : Example Queries on Operational Data What is the salary of Mr.Chatterjee ? ( point query) What is the address and phone number of the person in charge of the hardware department ? How many students have received an “distinction” credential in the latest exam? Slide 10: Operational Data pertain to what we call “ONLINE TRANSACTION PROCESSING” As the name suggests these sorts of data are used for day to day ‘operations’ like data entry /retrieval . For example : An ATM is a commercial online transaction system. Types of Data in a Database : Types of Data in a Database Type 2 Historical Data Data that “tells”. Very Infrequent updates. Integrated data set with global relevance. Analytical queries that require huge amounts of aggregation. Performance issues mainly in query response time. Examples of Historical Data : Examples of Historical Data Last set of 10 transactions on a particular bank account of a customer Record of sales of a product in the last 15 years in a company’s database The profits incurred by a company stored month wise in a whole fiscal year. Example Queries on Historical Data : Example Queries on Historical Data How is the student marks percentage scene changing over the years in college? Is there a correlation between the geographical location of a company unit and excellent employee appraisals? How is the employee attrition changing over the years across the company? Slide 14: Historical Data pertains to the phenomenon that is “Online Analytical Processing” where queries thus do not just depend on seeing one part of a tuple . For example to find out the employee attrition, we have to find out some aggregate employee attrition and then map it against time. Thus these queries require “analyzing” certain facts and then producing a correct output . Slide 15: The necessity that these queries be ONLINE means that the queries need to be responded to in an “ONLINE INTERACTIVE RESPONSE TIME” as the waiting time of users is of the order of a few seconds. The differences Between OLAP and OLTP thus are : The differences Between OLAP and OLTP thus are PART 2 - ANALYTICAL : PART 2 - ANALYTICAL Analysis of the Data : Analysis of the Data In order to “Analyze” this Historical Data , it needs to be stored in a certain formatted and organized manner. This is accomplished by a Data Warehouse. Data warehouse is an infrastructure to manage historical data from various sources. It is designed to support OLAP Queries involving gratuitous use of aggregation. Subject Oriented , Integrated ,Time-Variant and Non Volatile collection of data in support of management’s decision making process. WAREHOUSING SCHEMATIC DATA DIAGRAM : WAREHOUSING SCHEMATIC DATA DIAGRAM Dimensions of Data Warehouse Modeling : Dimensions of Data Warehouse Modeling Measures – Key performance indicator that we want to evaluate. Typically numerical , including volume, sales and cost. A Rule of Thumb : if a number makes(business) sense when aggregated, then it is a measure. Affects what should be stored in Data Warehouse. Example : Aggregate daily volume to month , quarter and year Slide 21: Dimensions – Categories of data analysis Typical dimensions include product, time, region. A Rule of Thumb : when a report is requested “by” something, that something is usually a dimension. Example :In sales report , view sales by month,by region,so the two dimensions needed are time and region. Dimensions and measures are physically represented by a STAR SCHEMA. : Dimensions and measures are physically represented by a STAR SCHEMA. Slide 23: The Data Model Which is adhered to while handling Historical Data to populate a Data Warehouse is a “MULTIDIMENSIONAL DATA MODEL.” One way to look at a multidimensional data model is to view it as a cube. Slide 24: CUBE It is a data structure that allows fast analysis of data. It can also be defined as the capability of manipulating and analyzing data from multiple perspectives. The response time of the multidimensiona-l query still depends on how many cells have to be added on the fly : The response time of the multidimensiona-l query still depends on how many cells have to be added on the fly BASIC STRUCTURE OF A CUBE Slide 27: n-D base cube is called a BASE CUBOID. The top most 0-D cuboid, which holds the highest-level of summarization, is called the APEX CUBOID. The lattice of cuboids forms a data CUBE. PART 3 - PROCESSING : PART 3 - PROCESSING PROCESSING DATA TO INFORMATION : PROCESSING DATA TO INFORMATION Now that we have the Required Data in the Requisite form , how do we get the Desired output to a Query which requires analyzing of the data? This is Accomplished by OLAP Operations OLAP Functions SQL Extensions for OLAP. OLAP OPERATIONS : OLAP OPERATIONS Dimension Tables Market (Market_ID, City , Region) Product (Product_ID,Name,Category,Price) Time(Time_ID,Week,Month,Quarter) Fact table Sales(Market_ID, Product_ID,Time_ID,Amount) OLAP OPERATIONS : OLAP OPERATIONS Aggregation – doing the ‘total’ of a measure over one or more dimensions. : SELECT Market_ID ,Product_ID ,SUM(AMOUNT)FROM SalesGROUP BY Market_ID , Product_ID; QUERY : Find the Total Sales (over time) of each product in each market OLAP OPERATIONS : OLAP OPERATIONS 2. ROLL UP Specific grouping on one dimension where we go from lower level of aggregation to a higher. Example : “ROLL UP sales on MARKET from CITY to REGION” Firsty, the TOTAL SALE of a PARTICULAR Product in a city at a given time is done.Then,we use the CITY and Product ID of a city belonging to a REGION to project sales in that region : Firsty, the TOTAL SALE of a PARTICULAR Product in a city at a given time is done.Then,we use the CITY and Product ID of a city belonging to a REGION to project sales in that region Select S.Product_Id,M.City,SUM(S.Amount) INTO City_Sales FROM Sales S,Market M WHERE M.Market_ID = S.Market_ID GROUP BY S.Product_ID,M.City OLAP OPERATIONS : OLAP OPERATIONS 3.DRILL DOWN Finer –grained view on aggregated data,i.e. going from higher to lower aggregation Converse of Roll-up E.g disaggregate county sales by region/city. OLAP OPERATIONS : OLAP OPERATIONS 4.PIVOTING Select A different dimension(orientation) for analysis OLAP OPERATIONS : OLAP OPERATIONS 5. SLICE and DICE Slicing : Selection on one or more dimensions Example : “Choosing sales only in week 12” Slicing the data cube in the Time Dimension SELECT S.* FROM Sales S,Time T WHERE T.Time_ID = S.Time_ID AND T.WEEK=’Week 12’ OLAP OPERATIONS : OLAP OPERATIONS Dicing: A range selection in a hypercube. Partition or group on one or more dimensions. Example : “ Total sales for each product in each quarter “ Dicing sales in the time dimension : SELECT S.Product_ID,T.Quarter,SUM(S.Amount) FROM Sales S,Time T WHERE T.Time_ID=S.Time_ID Group BY T.Quarter,S.Product_ID SQL EXTENSIONS FOR OLAP : SQL EXTENSIONS FOR OLAP 1.ROLL UP SELECT SEM,SUM(MARKS), RANK() OVER (ORDER BY SUM (MARKS) DESC) AS rank FROM TEACHERS GROUP BY ROLL UP(SEM) ORDER BY SEM Slide 40: ROLL UP thus provides subtotals of aggregate rows. SQL EXTENSIONS : SQL EXTENSIONS 2.CUBE SELECT SEM, SUM(MARKS) FROM TEACHERS GROUP BY CUBE(SEM) Slide 42: The CUBE operator provides subtotals of aggregate values in the result set SQL EXTENSIONS : SQL EXTENSIONS 3. GROUPING SETS lets us compute groups on several different sets of grouping columns in the same query. This Query returns subtotal rows for each year, but not for the individual quarters. SQL EXTENSIONS : SQL EXTENSIONS Select YEAR as YEAR , QUARTER as QUARTER, COUNT(*) as ORDERS from SALES GROUP BY GROUPING SETS(YEAR,QUARTER),(YEAR)) ORDER BY YEAR & QUARTER OLAP FUNCTIONS : OLAP FUNCTIONS 1. RANK FUNCTION – Lets us compile a list of values from your data set in ranked order. Example : The SQL query that follows finds the male and female employees from Kolkata and ranks them in descending order according to salary. SELECT emp_lname, salary, sex,RANK () OVER (ORDER BY salary DESC) "Rank"FROM employeeWHERE city IN (’KOL’) : SELECT emp_lname, salary, sex,RANK () OVER (ORDER BY salary DESC) "Rank"FROM employeeWHERE city IN (’KOL’) OLAP FUNCTIONS : OLAP FUNCTIONS 2.REPORTING FUNCTION : Reporting functions lets us compare non-aggregate values to aggregate values. Example : The following query returns a result set that shows a list of the products that sold higher than the average number of sales. The result set is partitioned by year. SELECT *FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS Q,AVG (SUM(quantity))OVER (PARTITION BY Year) AS Average FROM sales_order JOIN sales_order_itemsGROUP BY year(order_date), prod_idORDER BY Year)AS derived_tableWHERE Q > Average : SELECT *FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS Q,AVG (SUM(quantity))OVER (PARTITION BY Year) AS Average FROM sales_order JOIN sales_order_itemsGROUP BY year(order_date), prod_idORDER BY Year)AS derived_tableWHERE Q > Average For the year 2000, the average number of orders was 1787. Four products (700, 601, 600, and 400) sold higher than that amount. In 2001, the average number of orders was 1048 and three products exceeded that amount. OLAP FUNCTIONS : OLAP FUNCTIONS WINDOW FUNCTIONS Window functions lets us analyze ourdata by computing aggregate values over windows surrounding each row. The result set returns a summary value representing a set of rows. The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra, or Arunachal. The column Sum Salary provides the cumulative total of employees’ salaries. SELECT dept_id, emp_lname, start_date, salary,SUM(salary) OVER (PARTITION BY dept_idORDER BY start_dateRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"FROM employeeWHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,’200’)ORDER BY dept_id, start_date; : The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra, or Arunachal. The column Sum Salary provides the cumulative total of employees’ salaries. SELECT dept_id, emp_lname, start_date, salary,SUM(salary) OVER (PARTITION BY dept_idORDER BY start_dateRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"FROM employeeWHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,’200’)ORDER BY dept_id, start_date; On Line Analytical Processing : On Line Analytical Processing Thus Online Analytical Processing as a whole can be understood to be a method which takes in raw data , processes it through various functions and operations and produces Information as a Response to Multidimensional Queries in Real Time SERVER ARCHITECTURES : SERVER ARCHITECTURES MOLAP : Multidimensional OLAP The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis. + : very fast query response time because data is mostly pre-calculated -: practical limit on the size because the time taken to calculate the database and the space required to hold these pre-calculated values SERVER ARCHICTECTURES : SERVER ARCHICTECTURES ROLAP – Relational OLAP The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema. +: more scalable solution -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query SERVER ARCHITECTURES : SERVER ARCHITECTURES HOLAP – HYBRID OLAP A hybrid of ROLAP and MOLAP can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP SERVER ARCHITECTURES : SERVER ARCHITECTURES DOLAP –DESKTOP OLAP The previous terms are used to refer to server based OLAP technologies DOLAP (Desktop OLAP) DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops . COMMERCIAL OLAP SYSTEMS : COMMERCIAL OLAP SYSTEMS IBM DB2 DATAWAREHOUSING ENTERPRIZE EDITION STANDARD EDITION BASE EDITION ORACLE 9i ENTERPRIZE EDITION MICROSOFT SQL SERVER 2005 BUSINESS INTELLIGEN CE WORKBENCH PLATFORM OLAP Challenges and Future Scope : OLAP Challenges and Future Scope Analytical Complexity Business questions can be rarely answered by a single query Complex queries are hard to understand,write and execute efficiently Need for good business analysts Data Cubes can be HUGE But also can be sparse Can compute in advance,compute on demand , or some combination. OLAP forms the underlying structure of DDAS –Distributed Data Analysis and Dissemination System. From On line Analytical Processing to Online Analytical Mining ( OLAP to OLAM) BIBLIOGRAPHY : BIBLIOGRAPHY Data Warehousing , Data Mining and OLAP – Alex Berson,Stephen J.Smith Data Warehousing And OLAp - Hector Garcia-Molina Stanford University A Hitchhiker’s guide to OLAP – Paul Burton and Howard ong. Data mining data warehousing – Dr.Hani Saleeb DATA WAREHOUSE AND OLAP TECHNOLOGY Prof. Anita Wasilewska Data Mining: Concepts and Techniques Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University Wikipedia. Data Warehousing, Filtering, and Mining-Temple University Data Mining- Professor Maytal Saar-Tsechansky THANK YOU ! : THANK YOU ! You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
Project Report-Aditi Paul aSGuest44690 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 223 Category: Science & Tech.. License: All Rights Reserved Like it (0) Dislike it (0) Added: May 12, 2010 This Presentation is Public Favorites: 1 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript ON-LINE ANALYTICAL PROCESSING : ON-LINE ANALYTICAL PROCESSING - Analyzing Data Resources ADITI PAUL MCS/08/20 REGISTRATION NO – 003834 OF 2008 POST GRADUATE DEPARTMENT OF COMPUTER SCIENCE ST.XAVIERS COLLEGE (AUTONOMOUS) WHAT IS OLAP ? : WHAT IS OLAP ? Basic idea: Quickly answer multi-dimensional analytical queries. Convert data into information that decision makers need It is a continuous , iterative, and preferably interactive process. WHO USES OLAP ? : WHO USES OLAP ? It is used in an organization to carry out the different ORGANIZATIONAL FUNCTIONS in : Finance departments Sales analysis and forecasting Marketing departments Cardinal Goal “ Provide managers with the information they need to make effective decisions ” Understanding Online Analytical Processing - OLAP : Understanding Online Analytical Processing - OLAP 3 part description Part 1 – Online Part 2 – Analytical Part 3 – Processing PART 1 – ONLINE : PART 1 – ONLINE Slide 6: FLASH BACK Data Stored in a Database : Data Stored in a Database TYPE 1 Operational Data Data that “works”. Frequent Updates and Queries. Normalized for efficient search and updates. Fragmented & local relevance. Point Queries . Examples of Operational Data : Examples of Operational Data Account Details of a Customer in a Bank Student Details in a College/School Database Employee Records Etc. Example Queries on Operational Data : Example Queries on Operational Data What is the salary of Mr.Chatterjee ? ( point query) What is the address and phone number of the person in charge of the hardware department ? How many students have received an “distinction” credential in the latest exam? Slide 10: Operational Data pertain to what we call “ONLINE TRANSACTION PROCESSING” As the name suggests these sorts of data are used for day to day ‘operations’ like data entry /retrieval . For example : An ATM is a commercial online transaction system. Types of Data in a Database : Types of Data in a Database Type 2 Historical Data Data that “tells”. Very Infrequent updates. Integrated data set with global relevance. Analytical queries that require huge amounts of aggregation. Performance issues mainly in query response time. Examples of Historical Data : Examples of Historical Data Last set of 10 transactions on a particular bank account of a customer Record of sales of a product in the last 15 years in a company’s database The profits incurred by a company stored month wise in a whole fiscal year. Example Queries on Historical Data : Example Queries on Historical Data How is the student marks percentage scene changing over the years in college? Is there a correlation between the geographical location of a company unit and excellent employee appraisals? How is the employee attrition changing over the years across the company? Slide 14: Historical Data pertains to the phenomenon that is “Online Analytical Processing” where queries thus do not just depend on seeing one part of a tuple . For example to find out the employee attrition, we have to find out some aggregate employee attrition and then map it against time. Thus these queries require “analyzing” certain facts and then producing a correct output . Slide 15: The necessity that these queries be ONLINE means that the queries need to be responded to in an “ONLINE INTERACTIVE RESPONSE TIME” as the waiting time of users is of the order of a few seconds. The differences Between OLAP and OLTP thus are : The differences Between OLAP and OLTP thus are PART 2 - ANALYTICAL : PART 2 - ANALYTICAL Analysis of the Data : Analysis of the Data In order to “Analyze” this Historical Data , it needs to be stored in a certain formatted and organized manner. This is accomplished by a Data Warehouse. Data warehouse is an infrastructure to manage historical data from various sources. It is designed to support OLAP Queries involving gratuitous use of aggregation. Subject Oriented , Integrated ,Time-Variant and Non Volatile collection of data in support of management’s decision making process. WAREHOUSING SCHEMATIC DATA DIAGRAM : WAREHOUSING SCHEMATIC DATA DIAGRAM Dimensions of Data Warehouse Modeling : Dimensions of Data Warehouse Modeling Measures – Key performance indicator that we want to evaluate. Typically numerical , including volume, sales and cost. A Rule of Thumb : if a number makes(business) sense when aggregated, then it is a measure. Affects what should be stored in Data Warehouse. Example : Aggregate daily volume to month , quarter and year Slide 21: Dimensions – Categories of data analysis Typical dimensions include product, time, region. A Rule of Thumb : when a report is requested “by” something, that something is usually a dimension. Example :In sales report , view sales by month,by region,so the two dimensions needed are time and region. Dimensions and measures are physically represented by a STAR SCHEMA. : Dimensions and measures are physically represented by a STAR SCHEMA. Slide 23: The Data Model Which is adhered to while handling Historical Data to populate a Data Warehouse is a “MULTIDIMENSIONAL DATA MODEL.” One way to look at a multidimensional data model is to view it as a cube. Slide 24: CUBE It is a data structure that allows fast analysis of data. It can also be defined as the capability of manipulating and analyzing data from multiple perspectives. The response time of the multidimensiona-l query still depends on how many cells have to be added on the fly : The response time of the multidimensiona-l query still depends on how many cells have to be added on the fly BASIC STRUCTURE OF A CUBE Slide 27: n-D base cube is called a BASE CUBOID. The top most 0-D cuboid, which holds the highest-level of summarization, is called the APEX CUBOID. The lattice of cuboids forms a data CUBE. PART 3 - PROCESSING : PART 3 - PROCESSING PROCESSING DATA TO INFORMATION : PROCESSING DATA TO INFORMATION Now that we have the Required Data in the Requisite form , how do we get the Desired output to a Query which requires analyzing of the data? This is Accomplished by OLAP Operations OLAP Functions SQL Extensions for OLAP. OLAP OPERATIONS : OLAP OPERATIONS Dimension Tables Market (Market_ID, City , Region) Product (Product_ID,Name,Category,Price) Time(Time_ID,Week,Month,Quarter) Fact table Sales(Market_ID, Product_ID,Time_ID,Amount) OLAP OPERATIONS : OLAP OPERATIONS Aggregation – doing the ‘total’ of a measure over one or more dimensions. : SELECT Market_ID ,Product_ID ,SUM(AMOUNT)FROM SalesGROUP BY Market_ID , Product_ID; QUERY : Find the Total Sales (over time) of each product in each market OLAP OPERATIONS : OLAP OPERATIONS 2. ROLL UP Specific grouping on one dimension where we go from lower level of aggregation to a higher. Example : “ROLL UP sales on MARKET from CITY to REGION” Firsty, the TOTAL SALE of a PARTICULAR Product in a city at a given time is done.Then,we use the CITY and Product ID of a city belonging to a REGION to project sales in that region : Firsty, the TOTAL SALE of a PARTICULAR Product in a city at a given time is done.Then,we use the CITY and Product ID of a city belonging to a REGION to project sales in that region Select S.Product_Id,M.City,SUM(S.Amount) INTO City_Sales FROM Sales S,Market M WHERE M.Market_ID = S.Market_ID GROUP BY S.Product_ID,M.City OLAP OPERATIONS : OLAP OPERATIONS 3.DRILL DOWN Finer –grained view on aggregated data,i.e. going from higher to lower aggregation Converse of Roll-up E.g disaggregate county sales by region/city. OLAP OPERATIONS : OLAP OPERATIONS 4.PIVOTING Select A different dimension(orientation) for analysis OLAP OPERATIONS : OLAP OPERATIONS 5. SLICE and DICE Slicing : Selection on one or more dimensions Example : “Choosing sales only in week 12” Slicing the data cube in the Time Dimension SELECT S.* FROM Sales S,Time T WHERE T.Time_ID = S.Time_ID AND T.WEEK=’Week 12’ OLAP OPERATIONS : OLAP OPERATIONS Dicing: A range selection in a hypercube. Partition or group on one or more dimensions. Example : “ Total sales for each product in each quarter “ Dicing sales in the time dimension : SELECT S.Product_ID,T.Quarter,SUM(S.Amount) FROM Sales S,Time T WHERE T.Time_ID=S.Time_ID Group BY T.Quarter,S.Product_ID SQL EXTENSIONS FOR OLAP : SQL EXTENSIONS FOR OLAP 1.ROLL UP SELECT SEM,SUM(MARKS), RANK() OVER (ORDER BY SUM (MARKS) DESC) AS rank FROM TEACHERS GROUP BY ROLL UP(SEM) ORDER BY SEM Slide 40: ROLL UP thus provides subtotals of aggregate rows. SQL EXTENSIONS : SQL EXTENSIONS 2.CUBE SELECT SEM, SUM(MARKS) FROM TEACHERS GROUP BY CUBE(SEM) Slide 42: The CUBE operator provides subtotals of aggregate values in the result set SQL EXTENSIONS : SQL EXTENSIONS 3. GROUPING SETS lets us compute groups on several different sets of grouping columns in the same query. This Query returns subtotal rows for each year, but not for the individual quarters. SQL EXTENSIONS : SQL EXTENSIONS Select YEAR as YEAR , QUARTER as QUARTER, COUNT(*) as ORDERS from SALES GROUP BY GROUPING SETS(YEAR,QUARTER),(YEAR)) ORDER BY YEAR & QUARTER OLAP FUNCTIONS : OLAP FUNCTIONS 1. RANK FUNCTION – Lets us compile a list of values from your data set in ranked order. Example : The SQL query that follows finds the male and female employees from Kolkata and ranks them in descending order according to salary. SELECT emp_lname, salary, sex,RANK () OVER (ORDER BY salary DESC) "Rank"FROM employeeWHERE city IN (’KOL’) : SELECT emp_lname, salary, sex,RANK () OVER (ORDER BY salary DESC) "Rank"FROM employeeWHERE city IN (’KOL’) OLAP FUNCTIONS : OLAP FUNCTIONS 2.REPORTING FUNCTION : Reporting functions lets us compare non-aggregate values to aggregate values. Example : The following query returns a result set that shows a list of the products that sold higher than the average number of sales. The result set is partitioned by year. SELECT *FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS Q,AVG (SUM(quantity))OVER (PARTITION BY Year) AS Average FROM sales_order JOIN sales_order_itemsGROUP BY year(order_date), prod_idORDER BY Year)AS derived_tableWHERE Q > Average : SELECT *FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS Q,AVG (SUM(quantity))OVER (PARTITION BY Year) AS Average FROM sales_order JOIN sales_order_itemsGROUP BY year(order_date), prod_idORDER BY Year)AS derived_tableWHERE Q > Average For the year 2000, the average number of orders was 1787. Four products (700, 601, 600, and 400) sold higher than that amount. In 2001, the average number of orders was 1048 and three products exceeded that amount. OLAP FUNCTIONS : OLAP FUNCTIONS WINDOW FUNCTIONS Window functions lets us analyze ourdata by computing aggregate values over windows surrounding each row. The result set returns a summary value representing a set of rows. The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra, or Arunachal. The column Sum Salary provides the cumulative total of employees’ salaries. SELECT dept_id, emp_lname, start_date, salary,SUM(salary) OVER (PARTITION BY dept_idORDER BY start_dateRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"FROM employeeWHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,’200’)ORDER BY dept_id, start_date; : The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra, or Arunachal. The column Sum Salary provides the cumulative total of employees’ salaries. SELECT dept_id, emp_lname, start_date, salary,SUM(salary) OVER (PARTITION BY dept_idORDER BY start_dateRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"FROM employeeWHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,’200’)ORDER BY dept_id, start_date; On Line Analytical Processing : On Line Analytical Processing Thus Online Analytical Processing as a whole can be understood to be a method which takes in raw data , processes it through various functions and operations and produces Information as a Response to Multidimensional Queries in Real Time SERVER ARCHITECTURES : SERVER ARCHITECTURES MOLAP : Multidimensional OLAP The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis. + : very fast query response time because data is mostly pre-calculated -: practical limit on the size because the time taken to calculate the database and the space required to hold these pre-calculated values SERVER ARCHICTECTURES : SERVER ARCHICTECTURES ROLAP – Relational OLAP The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema. +: more scalable solution -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query SERVER ARCHITECTURES : SERVER ARCHITECTURES HOLAP – HYBRID OLAP A hybrid of ROLAP and MOLAP can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP SERVER ARCHITECTURES : SERVER ARCHITECTURES DOLAP –DESKTOP OLAP The previous terms are used to refer to server based OLAP technologies DOLAP (Desktop OLAP) DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops . COMMERCIAL OLAP SYSTEMS : COMMERCIAL OLAP SYSTEMS IBM DB2 DATAWAREHOUSING ENTERPRIZE EDITION STANDARD EDITION BASE EDITION ORACLE 9i ENTERPRIZE EDITION MICROSOFT SQL SERVER 2005 BUSINESS INTELLIGEN CE WORKBENCH PLATFORM OLAP Challenges and Future Scope : OLAP Challenges and Future Scope Analytical Complexity Business questions can be rarely answered by a single query Complex queries are hard to understand,write and execute efficiently Need for good business analysts Data Cubes can be HUGE But also can be sparse Can compute in advance,compute on demand , or some combination. OLAP forms the underlying structure of DDAS –Distributed Data Analysis and Dissemination System. From On line Analytical Processing to Online Analytical Mining ( OLAP to OLAM) BIBLIOGRAPHY : BIBLIOGRAPHY Data Warehousing , Data Mining and OLAP – Alex Berson,Stephen J.Smith Data Warehousing And OLAp - Hector Garcia-Molina Stanford University A Hitchhiker’s guide to OLAP – Paul Burton and Howard ong. Data mining data warehousing – Dr.Hani Saleeb DATA WAREHOUSE AND OLAP TECHNOLOGY Prof. Anita Wasilewska Data Mining: Concepts and Techniques Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University Wikipedia. Data Warehousing, Filtering, and Mining-Temple University Data Mining- Professor Maytal Saar-Tsechansky THANK YOU ! : THANK YOU !