Final

Presentation Transcript

Slide 1:

Index Recommendation for High Dimensional Databases

Project Guide: Mr. B.S.Teertharaj
Project Associates: AkhilRaj.V.Gadagkar, Kavyashree.B.P., Kruthika.B.V., Prabhakara.H.D

Slide 2:

Abstract

High-dimensional databases pose a challenge with respect to efficient access. Users are usually interested in querying data over a relatively small subset of the entire attribute set at a time. A potential solution is to use lower-dimensional indexes that accurately represent the user access patterns. The system output must be a parameter that we can measure and use to make decisions. Query performance is the obvious parameter to monitor; using it, we can compare traditional sequential search with index search.

Slide 3:

Introduction

An increasing number of database applications, such as business data warehouses and scientific data repositories, deal with high-dimensional data sets. As the number of attributes and the overall size of data sets increase, it becomes essential to efficiently retrieve specific queried data from the database. Indexing support is needed to effectively prune out significant portions of the data set that are not relevant to queries.

Solution:

A system that can efficiently retrieve data from large databases through indexing support.

Slide 5:

Collaboration Diagram

[Diagram: the user's request is the input to two operations, Select Without Index and Select With Index. Each sends a request to the database and receives a response.]

Slide 6:

System Requirements

Software Requirements
Front End: J2EE (JSP)
Back End: MS SQL Server 2000

Hardware Requirements
Monitor: 15 inches
RAM: 256 MB
Processor: Intel Pentium 4
Keyboard: 102 keys
Mouse: 3 buttons

Slide 7:

Modules

1. Selection in sequence and using indices
2. Calculate the query cost
3. Support and confidence calculation
4. Batch execution

Selection in sequence and using indices:

In a sequential search, the system scans each file block and tests all records to see whether they satisfy the selection condition. Index structures are referred to as access paths, since they provide a path through which data can be located and accessed.
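The difference between the two selection methods can be illustrated with a minimal Python sketch. It is not the project's implementation: the sample records, the in-memory dictionary used as an index, and the function names are all assumptions made for illustration. It counts how many records each method has to examine to answer the same selection.

```python
from collections import defaultdict

# Hypothetical sample records: (movie_id, actor). Not the project's actual table.
RECORDS = [
    (1, "Sharukh"), (1, "Kajol"),
    (2, "Sharukh"), (2, "Manisha"),
    (4, "Aamir"), (4, "Juhie"),
]


def sequential_select(records, actor):
    """Scan every record and test it against the selection condition."""
    examined = 0
    matches = []
    for record in records:
        examined += 1
        if record[1] == actor:
            matches.append(record)
    return matches, examined


def build_index(records):
    """Build an access path: actor name -> positions of matching records."""
    index = defaultdict(list)
    for position, (_, actor) in enumerate(records):
        index[actor].append(position)
    return index


def index_select(records, index, actor):
    """Follow the index directly to the matching records."""
    positions = index.get(actor, [])
    return [records[p] for p in positions], len(positions)


index = build_index(RECORDS)
seq_rows, seq_examined = sequential_select(RECORDS, "Kajol")
idx_rows, idx_examined = index_select(RECORDS, index, "Kajol")
print(seq_rows == idx_rows, seq_examined, idx_examined)
```

Both methods return the same rows, but the sequential search examines every record while the index lookup touches only the matching ones. A real database index is a disk-based structure such as a B+-tree, so this dictionary only stands in for the idea of an access path.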

Calculate the query cost:

The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses and the CPU time to execute a query. The response time for a query-evaluation plan (that is, the clock time required to execute the plan, assuming no other activity is going on in the computer) would account for all these costs and could be used as a good measure of the cost of the plan.

Slide 10:

In large database systems, however, disk accesses, which we measure as the number of transfers of blocks from disk, are usually the most important cost, since disk accesses are slow compared to in-memory operations. We use the number of block transfers from disk as a measure of the actual cost. A more accurate measure would estimate:
1. The number of seek operations performed
2. The number of blocks read
3. The number of blocks written
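The three quantities above can be combined into a rough time estimate. The sketch below assumes a simple model in which every seek costs a fixed seek time and every block read or written costs a fixed transfer time. The default timings and the two example plans are illustrative assumptions, not measurements from this project.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    """Disk activity of one query-evaluation plan."""
    seeks: int
    blocks_read: int
    blocks_written: int


def estimated_time_ms(cost, seek_ms=4.0, transfer_ms=0.1):
    """Estimate elapsed disk time as seeks * seek_ms + transfers * transfer_ms.

    seek_ms and transfer_ms are assumed values; real figures depend on the disk.
    """
    transfers = cost.blocks_read + cost.blocks_written
    return cost.seeks * seek_ms + transfers * transfer_ms


# Hypothetical plans for the same selection.
full_scan = PlanCost(seeks=1, blocks_read=1000, blocks_written=0)
index_lookup = PlanCost(seeks=4, blocks_read=4, blocks_written=0)

print(estimated_time_ms(full_scan), estimated_time_ms(index_lookup))
```

Under these assumed numbers the index plan performs more seeks but far fewer block transfers, so its estimated cost is lower. With a less selective condition the balance can tip the other way, which is why seeks and transfers are counted separately.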

Slide 11:

Steps in query processing

[Diagram: query -> parser and translator -> relational-algebra expression -> optimizer (using statistics about data) -> execution plan -> evaluation engine (reading data) -> query output]

Support and confidence calculation:

The support (s) of an actor or actress is the percentage of movies in the database in which he or she appeared. The support (s) of an association rule X => Y is the percentage of movies in the database that contain X ∪ Y. The confidence or strength (α) of an association rule X => Y is the ratio of the number of movies that contain X ∪ Y to the number of movies that contain X.

Slide 13:

An Example

Movies data:
M1. Sharukh, Kajol
M2. Sharukh, Manisha
M3. Kajol, Sharukh
M4. Aamir, Juhie
M5. Sharukh, Kajol

Support by number of movies in which the actor or actress appeared:
{Sharukh} [Support=80%]
{Kajol} [Support=60%]

Association rules from the movies data:
Sharukh => Kajol [Support=60%, Confidence=75%]
Kajol => Sharukh [Support=60%, Confidence=100%]
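The figures in this example can be checked with a short Python sketch that applies the support and confidence definitions to the five movies listed on the slide. The function names and the dictionary layout are assumptions for illustration; the project itself stores these values in database tables.

```python
# The five movies from the slide, each with its set of cast members.
MOVIES = {
    "M1": {"Sharukh", "Kajol"},
    "M2": {"Sharukh", "Manisha"},
    "M3": {"Kajol", "Sharukh"},
    "M4": {"Aamir", "Juhie"},
    "M5": {"Sharukh", "Kajol"},
}


def count_containing(itemset, movies):
    """Number of movies whose cast contains every member of itemset."""
    itemset = set(itemset)
    return sum(1 for cast in movies.values() if itemset <= cast)


def support(itemset, movies):
    """Fraction of all movies that contain the itemset."""
    return count_containing(itemset, movies) / len(movies)


def confidence(x, y, movies):
    """For rule X => Y: movies containing X and Y, divided by movies containing X."""
    x_count = count_containing(x, movies)
    if x_count == 0:
        raise ValueError("confidence is undefined when no movie contains X")
    return count_containing(set(x) | set(y), movies) / x_count


print(support({"Sharukh"}, MOVIES), support({"Kajol"}, MOVIES))
print(support({"Sharukh", "Kajol"}, MOVIES))
print(confidence({"Sharukh"}, {"Kajol"}, MOVIES))
print(confidence({"Kajol"}, {"Sharukh"}, MOVIES))
```

Sharukh appears in four of the five movies and Kajol in three, all three of them alongside Sharukh, which gives the 80%, 60%, 75% and 100% values shown above.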

Batch execution:

Batch execution is the execution of a set of queries simultaneously. This module represents the workload on the database: in a workload, several queries are made to the tables in the database. The cost of executing each query and the total cost of the workload are measured here.
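A workload measurement of this kind might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the project's MS SQL Server back end. The `appearances` table, the `idx_actor` index and the three queries are hypothetical, and the queries are run one after another rather than simultaneously. Elapsed clock time is used as the cost of each query.

```python
import sqlite3
import time


def run_workload(conn, queries):
    """Run each (sql, params) query in turn; return a (rows, seconds) pair per query."""
    results = []
    for sql, params in queries:
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        results.append((rows, time.perf_counter() - start))
    return results


# Hypothetical table: 50,000 rows spread evenly over 500 actor names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appearances (movie_id INTEGER, actor TEXT)")
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?)",
    [(i, f"actor{i % 500}") for i in range(50_000)],
)

# Hypothetical workload of three selections on the actor column.
workload = [
    ("SELECT COUNT(*) FROM appearances WHERE actor = ?", ("actor7",)),
    ("SELECT COUNT(*) FROM appearances WHERE actor = ?", ("actor42",)),
    ("SELECT movie_id FROM appearances WHERE actor = ? ORDER BY movie_id",
     ("actor499",)),
]

without_index = run_workload(conn, workload)
conn.execute("CREATE INDEX idx_actor ON appearances(actor)")
with_index = run_workload(conn, workload)

for label, results in (("without index", without_index), ("with index", with_index)):
    per_query = [seconds for _, seconds in results]
    print(label, "per query:", per_query, "total:", sum(per_query))
```

The same workload is run before and after the index is created, so the per-query and total costs can be compared directly. Timings vary from run to run, so in practice each workload would be repeated several times before drawing conclusions.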

Slide 15:

Use-Case Diagram

Slide 16:

Sequence Diagram

[Diagram: the user logs on through the Authenticator, then searches with an index or without an index.]

Slide 17:

Data Flow Diagram [DFD]

Table Structure:

Actors table

Slide 19:

Movies table

Slide 20:

Support and Confidence Table

Slide 21:

Advantages

By creating an index, we can minimize the searching time.

Disadvantages

Efficiency is lower if the structure of the database changes.
Reliability is lower.

Future Enhancement:

The modules implemented in this project can be combined to tune the indices in a system and improve its performance. The search module, the support and confidence module, and the workload module can be used as building blocks for the design of a new index tuning framework.

Bibliography:

Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Data Mining - Margaret H. Dunham, S. Sridhar
Complete Reference J2EE - Herbert Schildt
JAVA2 Complete - Sun Microsystems
IEEE problem statement on Online Index Recommendations for High-Dimensional Databases using Query Workloads - Michael Gibas, Guadalupe Canahuate, Hakan Ferhatosmanoglu

Slide 24:

THANK YOU