assoc2

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

By: chandugaurav24 (20 month(s) ago)

wow can u give it 2 me?

By: rere_swe (43 month(s) ago)

please send me a copy of this presentation Thanks

Presentation Transcript

Association Rules: Advanced Topics: 

Association Rules: Advanced Topics

Apriori Adv/Disadv: 

Apriori Adv/Disadv Advantages: Uses large itemset property. Easily parallelized Easy to implement. Disadvantages: Assumes transaction database is memory resident. Requires up to m database scans.

Vertical Layout: 

Vertical Layout Rather than have Transaction ID – list of items (Transactional) We have Item – List of transactions (TID-list) Now to count itemset AB Intersect TID-list of itemA with TID-list of itemB All data for a particular item is available

Eclat Algorithm: 

Eclat Algorithm Dynamically process each transaction online maintaining 2-itemset counts. Transform Partition L2 using 1-item prefix Equivalence classes - {AB, AC, AD}, {BC, BD}, {CD} Transform database to vertical form Asynchronous Phase For each equivalence class E Compute frequent (E)

Asynchronous Phase: 

Asynchronous Phase Compute Frequent (E_k-1) For all itemsets I1 and I2 in E_k-1 If (I1 ∩ I2 >= minsup) add I1 and I2 to L_k Partition L_k into equivalence classes For each equivalence class E_k in L_k Compute_frequent (E_k) Properties of ECLAT Locality enhancing approach Easy and efficient to parallelize Few scans of database (best case 2)

Max-patterns: 

Max-patterns Frequent pattern {a1, …, a100}  (1001) + (1002) + … + (110000) = 2100-1 = 1.27*1030 frequent sub-patterns! Max-pattern: frequent patterns without proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern Min_sup=2

Frequent Closed Patterns: 

Frequent Closed Patterns Conf(acd)=100%  record acd only For frequent itemset X, if there exists no item y s.t. every transaction containing X also contains y, then X is a frequent closed pattern “acd” is a frequent closed pattern Concise rep. of freq pats Reduce # of patterns and rules N. Pasquier et al. In ICDT’99 Min_sup=2

Mining Various Kinds of Rules or Regularities: 

Mining Various Kinds of Rules or Regularities Multi-level, quantitative association rules, correlation and causality, ratio rules, sequential patterns, emerging patterns, temporal associations, partial periodicity Classification, clustering, iceberg cubes, etc.

Multiple-level Association Rules: 

Multiple-level Association Rules Items often form hierarchy Flexible support settings: Items at the lower level are expected to have lower support. Transaction database can be encoded based on dimensions and levels explore shared multi-level mining

ML/MD Associations with Flexible Support Constraints: 

ML/MD Associations with Flexible Support Constraints Why flexible support constraints? Real life occurrence frequencies vary greatly Diamond, watch, pens in a shopping basket Uniform support may not be an interesting model A flexible model The lower-level, the more dimension combination, and the long pattern length, usually the smaller support General rules should be easy to specify and understand Special items and special group of items may be specified individually and have higher priority

Multi-dimensional Association: 

Multi-dimensional Association Single-dimensional rules: buys(X, “milk”)  buys(X, “bread”) Multi-dimensional rules:  2 dimensions or predicates Inter-dimension assoc. rules (no repeated predicates) age(X,”19-25”)  occupation(X,“student”)  buys(X,“coke”) hybrid-dimension assoc. rules (repeated predicates) age(X,”19-25”)  buys(X, “popcorn”)  buys(X, “coke”)

Multi-level Association: Redundancy Filtering: 

Multi-level Association: Redundancy Filtering Some rules may be redundant due to “ancestor” relationships between items. Example milk  wheat bread [support = 8%, confidence = 70%] 2% milk  wheat bread [support = 2%, confidence = 72%] We say the first rule is an ancestor of the second rule. A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.

Multi-Level Mining: Progressive Deepening: 

Multi-Level Mining: Progressive Deepening A top-down, progressive deepening approach: First mine high-level frequent items: milk (15%), bread (10%) Then mine their lower-level “weaker” frequent itemsets: 2% milk (5%), wheat bread (4%) Different min_support threshold across multi-levels lead to different algorithms: If adopting the same min_support across multi-levels then toss t if any of t’s ancestors is infrequent. If adopting reduced min_support at lower levels then examine only those descendents whose ancestor’s support is frequent/non-negligible.

Interestingness Measure: Correlations (Lift): 

Interestingness Measure: Correlations (Lift) play basketball  eat cereal [40%, 66.7%] is misleading The overall percentage of students eating cereal is 75% which is higher than 66.7%. play basketball  not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence Measure of dependent/correlated events: lift

Constraint-based Data Mining: 

Constraint-based Data Mining Finding all the patterns in a database autonomously? — unrealistic! The patterns could be too many but not focused! Data mining should be an interactive process User directs what to be mined using a data mining query language (or a graphical user interface) Constraint-based mining User flexibility: provides constraints on what to be mined System optimization: explores such constraints for efficient mining—constraint-based mining

Constrained Frequent Pattern Mining: A Mining Query Optimization Problem: 

Constrained Frequent Pattern Mining: A Mining Query Optimization Problem Given a frequent pattern mining query with a set of constraints C, the algorithm should be sound: it only finds frequent sets that satisfy the given constraints C complete: all frequent sets satisfying the given constraints C are found A naïve solution First find all frequent sets, and then test them for constraint satisfaction More efficient approaches: Analyze the properties of constraints comprehensively Push them as deeply as possible inside the frequent pattern computation.

Anti-Monotonicity in Constraint-Based Mining: 

Anti-Monotonicity in Constraint-Based Mining Anti-monotonicity When an intemset S violates the constraint, so does any of its superset sum(S.Price)  v is anti-monotone sum(S.Price)  v is not anti-monotone Example. C: range(S.profit)  15 is anti-monotone Itemset ab violates C So does every superset of ab TDB (min_sup=2)

Which Constraints Are Anti-Monotone?: 

Which Constraints Are Anti-Monotone?

Monotonicity in Constraint-Based Mining: 

Monotonicity in Constraint-Based Mining Monotonicity When an intemset S satisfies the constraint, so does any of its superset sum(S.Price)  v is monotone min(S.Price)  v is monotone Example. C: range(S.profit)  15 Itemset ab satisfies C So does every superset of ab TDB (min_sup=2)

Which Constraints Are Monotone?: 

Which Constraints Are Monotone?

Succinctness: 

Succinctness Succinctness: Given A1, the set of items satisfying a succinctness constraint C, then any set S satisfying C is based on A1 , i.e., S contains a subset belonging to A1 Idea: Without looking at the transaction database, whether an itemset S satisfies constraint C can be determined based on the selection of items min(S.Price)  v is succinct sum(S.Price)  v is not succinct Optimization: If C is succinct, C is pre-counting pushable

Which Constraints Are Succinct?: 

Which Constraints Are Succinct?

The Apriori Algorithm — Example: 

The Apriori Algorithm — Example Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D

Naïve Algorithm: Apriori + Constraint : 

Naïve Algorithm: Apriori + Constraint Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: Sum{S.price < 5}

Pushing the constraint deep into the process : 

Pushing the constraint deep into the process Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: Sum{S.price < 5}

Push a Succinct Constraint Deep : 

Push a Succinct Constraint Deep Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: min{S.price <= 1 }

Converting “Tough” Constraints: 

Converting “Tough” Constraints Convert tough constraints into anti-monotone or monotone by properly ordering items Examine C: avg(S.profit)  25 Order items in value-descending order <a, f, g, d, b, h, c, e> If an itemset afb violates C So does afbh, afb* It becomes anti-monotone! TDB (min_sup=2)

Convertible Constraints: 

Convertible Constraints Let R be an order of items Convertible anti-monotone If an itemset S violates a constraint C, so does every itemset having S as a prefix w.r.t. R Ex. avg(S)  v w.r.t. item value descending order Convertible monotone If an itemset S satisfies constraint C, so does every itemset having S as a prefix w.r.t. R Ex. avg(S)  v w.r.t. item value descending order

Strongly Convertible Constraints: 

Strongly Convertible Constraints avg(X)  25 is convertible anti-monotone w.r.t. item value descending order R: <a, f, g, d, b, h, c, e> If an itemset af violates a constraint C, so does every itemset with af as prefix, such as afd avg(X)  25 is convertible monotone w.r.t. item value ascending order R-1: <e, c, h, b, d, g, f, a> If an itemset d satisfies a constraint C, so does itemsets df and dfa, which having d as a prefix Thus, avg(X)  25 is strongly convertible

What Constraints Are Convertible?: 

What Constraints Are Convertible?

Combing Them Together—A General Picture: 

Combing Them Together—A General Picture

Classification of Constraints: 

Classification of Constraints Convertible anti-monotone Convertible monotone Strongly convertible Inconvertible Succinct Antimonotone Monotone

Mining With Convertible Constraints: 

Mining With Convertible Constraints C: avg(S.profit)  25 List of items in every transaction in value descending order R: <a, f, g, d, b, h, c, e> C is convertible anti-monotone w.r.t. R Scan transaction DB once remove infrequent items Item h in transaction 40 is dropped Itemsets a and f are good TDB (min_sup=2)

Can Apriori Handle Convertible Constraint?: 

Can Apriori Handle Convertible Constraint? A convertible, not monotone nor anti-monotone nor succinct constraint cannot be pushed deep into the an Apriori mining algorithm Within the level wise framework, no direct pruning based on the constraint can be made Itemset df violates constraint C: avg(X)>=25 Since adf satisfies C, Apriori needs df to assemble adf, df cannot be pruned But it can be pushed into frequent-pattern growth framework!

Mining With Convertible Constraints: 

Mining With Convertible Constraints C: avg(X)>=25, min_sup=2 List items in every transaction in value descending order R: <a, f, g, d, b, h, c, e> C is convertible anti-monotone w.r.t. R Scan TDB once remove infrequent items Item h is dropped Itemsets a and f are good, … Projection-based mining Imposing an appropriate order on item projection Many tough constraints can be converted into (anti)-monotone TDB (min_sup=2)

Handling Multiple Constraints: 

Handling Multiple Constraints Different constraints may require different or even conflicting item-ordering If there exists an order R s.t. both C1 and C2 are convertible w.r.t. R, then there is no conflict between the two convertible constraints If there exists conflict on order of items Try to satisfy one constraint first Then using the order for the other constraint to mine frequent itemsets in the corresponding projected database