# Ch1-04

Category: Entertainment

## Presentation Transcript

### Data Mining: Concepts and Techniques, Slides for Textbook, Chapter 1:

Data Mining: Concepts and Techniques. Slides for Textbook, Chapter 1.

© Jiawei Han and Micheline Kamber, Department of Computer Science, University of Illinois at Urbana-Champaign, www.cs.uiuc.edu/~hanj

### Chapter 1 Introduction:

- Data
- What is data mining?

### 1.1 Data:

- Numbers
- Curves, figures
- Sounds
- Papers, books
- Web, telephone process

### Mining Multimedia Databases:

Refining or combining searches:

- Search for “blue sky” (top layout grid is blue)
- Search for “blue sky and green meadows” (top layout grid is blue and bottom is green)
- Search for “airplane in blue sky” (top layout grid is blue and keyword = “airplane”)
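A minimal sketch of how such refinement might work, assuming each image is indexed by a coarse layout grid of dominant colors plus a set of keywords. The `Image` record, its fields, and the condition names are hypothetical, not taken from any particular multimedia database system. Each added condition is AND-ed with the previous ones, so every refinement narrows the result set.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical index entry: a coarse layout grid plus keywords."""
    name: str
    grid: list[list[str]]  # dominant color per cell; row 0 is the top of the image
    keywords: set[str] = field(default_factory=set)


def row_is(image: Image, row: int, color: str) -> bool:
    """True if every cell in the given grid row has the given dominant color."""
    return all(cell == color for cell in image.grid[row])


def blue_sky(image: Image) -> bool:
    return row_is(image, 0, "blue")  # top row is blue


def green_meadow(image: Image) -> bool:
    return row_is(image, -1, "green")  # bottom row is green


def has_airplane(image: Image) -> bool:
    return "airplane" in image.keywords  # keyword match


def search(images, *conditions):
    """Return names of images meeting all conditions (each extra condition refines the search)."""
    return [img.name for img in images if all(cond(img) for cond in conditions)]


images = [
    Image("sky.jpg", [["blue", "blue"], ["blue", "blue"]]),
    Image("meadow.jpg", [["blue", "blue"], ["green", "green"]]),
    Image("plane.jpg", [["blue", "blue"], ["blue", "white"]], {"airplane"}),
]

print(search(images, blue_sky))                # ['sky.jpg', 'meadow.jpg', 'plane.jpg']
print(search(images, blue_sky, green_meadow))  # ['meadow.jpg']
print(search(images, blue_sky, has_airplane))  # ['plane.jpg']
```

Representing each refinement as a separate predicate keeps color-layout and keyword conditions interchangeable, which is what allows them to be combined freely.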

### Get More Data/Information:

- HPLC-DAD 3D chromatogram
- HPLC chromatogram of nucleosides of Cordyceps sinensis (冬蟲草) at one wavelength
- Hyphenated instrument (聯用儀器)
- DNA microarray

### Data Analysis:

- Univariate statistics
- Multivariate statistics
- Data mining

### Multivariate Analysis:

- Regression analysis
- Principal component analysis
- Factor analysis
- Structural equation models
- Canonical correlation analysis
- Discriminant analysis
- Cluster analysis

### 1.2 What Is Data Mining?:

Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and summarize the data in novel ways that are both understandable and useful to the data owner. -- David Hand, Heikki Mannila, and Padhraic Smyth; 2001

### What Is Data Mining?:

Data mining is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules. -- Michael J. A. Berry and Gordon S. Linoff; 2000

### What Is Data Mining?:

Knowledge discovery in databases (KDD): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases, etc.

### What Is Data Mining?:

Inside stories:

- Data mining: a misnomer?
- Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

### What Is Data Mining?:

Data mining draws on several disciplines: database technology, statistics, machine learning, visualization, information science, and others.

### Data Mining: A KDD Process:

Data mining is the core of the knowledge discovery process:

1. Data cleaning and data integration: databases → data warehouse
2. Selection: data warehouse → task-relevant data
3. Data mining: task-relevant data → patterns
4. Pattern evaluation: patterns → knowledge
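The stages can be sketched as a chain of plain functions. This is a toy illustration under stated assumptions: records are Python dicts with hypothetical `id`, `age`, `city`, and `spend` fields, the "mining" step is only a mean per age band, and "evaluation" keeps the result only if the bands differ by a chosen threshold. Real KDD systems use far richer methods at each stage.

```python
# Hypothetical records: dicts with "id", "age", "city", and "spend" fields.

def clean(records):
    """Data cleaning: drop records that have any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: merge sources into one 'warehouse', keeping the first record per id."""
    warehouse = {}
    for source in sources:
        for r in source:
            warehouse.setdefault(r["id"], r)
    return list(warehouse.values())


def select(warehouse, city):
    """Selection: keep only the task-relevant data (here, one city)."""
    return [r for r in warehouse if r["city"] == city]


def mine(data):
    """Data mining (toy): mean spend per age band."""
    bands = {}
    for r in data:
        band = "under 40" if r["age"] < 40 else "40+"
        bands.setdefault(band, []).append(r["spend"])
    return {band: sum(v) / len(v) for band, v in bands.items()}


def evaluate(patterns, min_gap):
    """Pattern evaluation: keep the result only if the bands differ by at least min_gap."""
    values = list(patterns.values())
    if len(values) >= 2 and max(values) - min(values) >= min_gap:
        return patterns
    return {}


db1 = [
    {"id": 1, "age": 25, "city": "Urbana", "spend": 100.0},
    {"id": 2, "age": 52, "city": "Urbana", "spend": 300.0},
    {"id": 3, "age": 33, "city": "Chicago", "spend": 80.0},
]
db2 = [
    {"id": 2, "age": 52, "city": "Urbana", "spend": 300.0},   # duplicate of db1's id 2
    {"id": 4, "age": None, "city": "Urbana", "spend": 50.0},  # missing age, removed by cleaning
    {"id": 5, "age": 61, "city": "Urbana", "spend": 500.0},
]

warehouse = integrate(clean(db1), clean(db2))
knowledge = evaluate(mine(select(warehouse, "Urbana")), min_gap=100.0)
print(knowledge)  # {'under 40': 100.0, '40+': 400.0}
```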

### Problems:

- Classification
- Pattern recognition
- Association (correlation)
- Description
- Visualization
- Etc.

Examples: AT&T, Ernst & Young, IRS, DGBAS, credit card, etc.

### Data Mining Methods:

- Characterization
- Decision tree
- Association or affinity grouping
- Classification/prediction
- Discrimination
- Regression
- Clustering
- Outlier analysis
- Description and visualization

### Data Mining vs. Statistics:

| Data Mining | Statistics |
|---|---|
| Large amount of data: 1,000,000,000 rows, 3,000 columns | Large amount of data: 10,000 rows, 20 columns |
| Happenstance data | Systematically gathered data |
| Why sample? We have a large parallel computer | Sample -- we even get error estimates!! |
| PowerPoint shows | Overhead foils |
| Reasonable price for software: \$2,000,000 | Reasonable price for software |
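The statistics column's point about error estimates can be made concrete: a simple random sample yields both an estimate and its standard error without touching every row. A minimal sketch on synthetic data; the function name and the normal(50, 10) data are assumptions for illustration only.

```python
import random
import statistics


def sample_mean_with_se(data, n, seed=0):
    """Estimate the mean of `data` from a simple random sample of size n.

    Returns (estimate, standard error). The finite-population correction is
    ignored, which is reasonable when n is a small fraction of len(data)."""
    rng = random.Random(seed)
    sample = rng.sample(data, n)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    return estimate, std_error


# Synthetic stand-in for a large data set: values drawn from normal(50, 10).
gen = random.Random(42)
data = [gen.gauss(50, 10) for _ in range(500_000)]

estimate, std_error = sample_mean_with_se(data, 10_000)
print(f"sample estimate: {estimate:.2f} +/- {1.96 * std_error:.2f} (approx. 95% interval)")
print(f"full-data mean:  {statistics.fmean(data):.2f}")
```

With 10,000 of the 500,000 values, the standard error is about 10 / √10,000 = 0.1, so the sample pins the mean down to within a few tenths.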

### Applications of Data Mining:

- Banking, credit cards (銀行, 信用卡)
- Marketing (市場研究)
- Web intelligence (網頁智能)
- Communication (傳媒)
- Risk management (風險管理)
- Genetics (基因)
- Chinese medicine (中藥)
- Chemistry (化學)

### Data Mining (KDD):

- Clear task
- Good data sets
- Methods depend on the task

### 1.3 Complexity:

Data set sizes:

| Descriptor | Data Set Size (bytes) | Storage Mode |
|---|---|---|
| Tiny | $10^{2}$ | Piece of paper |
| Small | $10^{4}$ | A few pieces of paper |
| Medium | $10^{6}$ | A floppy disk |
| Large | $10^{8}$ | Hard disk |
| Huge | $10^{10}$ | Multiple hard disks |
| Massive | $10^{12}$ | Robotic magnetic tape storage silos |
| Supermassive | $10^{15}$ | Distributed data archives |

### 1.3 Complexity:

Algorithmic complexity:

| Complexity | Task |
|---|---|
| $O(n^{1/2})$ | Plot a scatterplot |
| $O(n)$ | Calculate means, variances, kernel density estimates |
| $O(n \log n)$ | Calculate fast Fourier transforms |
| $O(nc)$ | Calculate the singular value decomposition of an $r \times c$ matrix; solve a multiple linear regression |
| $O(n^{2})$ | Run most clustering algorithms |
| $O(a^{n})$ | Detect multivariate outliers |
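A rough back-of-the-envelope sketch of why these classes matter at the data set sizes listed earlier in this section. It treats the byte count as $n$ and assumes a machine doing about $10^9$ simple operations per second with all constants ignored; both are simplifying assumptions, not figures from the slides.

```python
import math

# Data set sizes n in bytes, from the size descriptors in this section.
SIZES = {"Tiny": 1e2, "Small": 1e4, "Medium": 1e6,
         "Large": 1e8, "Huge": 1e10, "Massive": 1e12}

# Operation counts for some of the complexity classes (constants ignored).
GROWTH = {
    "O(n^1/2)": math.sqrt,
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
}

OPS_PER_SECOND = 1e9  # assumption: roughly 10^9 simple operations per second


def seconds(growth, descriptor):
    """Rough running time in seconds for one complexity class on one data set size."""
    return GROWTH[growth](SIZES[descriptor]) / OPS_PER_SECOND


for name in SIZES:
    row = "  ".join(f"{g}: {seconds(g, name):.3g}s" for g in GROWTH)
    print(f"{name:<8} {row}")
```

Under these assumptions a single $O(n)$ pass over a massive ($10^{12}$-byte) data set takes about $10^3$ seconds, while an $O(n^2)$ method needs about $10^{15}$ seconds, i.e. more than 30 million years.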

### Statistical Data Mining:

We need statistical methodologies/algorithms that are computable under the constraints of computer memory and complexity, so every statistical methodology should be labeled with its complexity. For powerful $O(n^{2})$ methodologies, an approximate $O(n)$ algorithm is needed; sampling is recommended.
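A minimal sketch of that trade-off, assuming the statistic of interest is the mean absolute difference over all pairs of observations. This is an $O(n^2)$ computation chosen purely as an example, and the function names are hypothetical. Drawing $m$ random pairs with $m$ proportional to $n$ gives an $O(n)$ approximation whose sampling error shrinks roughly like $1/\sqrt{m}$.

```python
import random


def mean_abs_diff_exact(xs):
    """Exact mean of |x_i - x_j| over all pairs i < j: O(n^2) time."""
    n = len(xs)
    total = sum(abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def mean_abs_diff_sampled(xs, m, seed=0):
    """Approximate the same mean from m randomly drawn pairs: O(m) time."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        i, j = rng.sample(range(len(xs)), 2)  # two distinct indices
        total += abs(xs[i] - xs[j])
    return total / m


gen = random.Random(1)
data = [gen.uniform(0, 1) for _ in range(2_000)]

exact = mean_abs_diff_exact(data)                       # about 2 million pairs
approx = mean_abs_diff_sampled(data, m=10 * len(data))  # m grows linearly with n
print(f"exact: {exact:.4f}  sampled: {approx:.4f}")     # both close to 1/3 for uniform(0, 1) data
```

The sampled version still needs the data in memory, but its running time no longer grows with the number of pairs.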

### 1.4 Data Mining Example:

- The sport of choice for the urban poor is BASKETBALL.
- The sport of choice for maintenance level employees is BOWLING.
- The sport of choice for front-line workers is FOOTBALL.
- The sport of choice for supervisors is BASEBALL.
- The sport of choice for middle management is TENNIS.
- The sport of choice for corporate officers is GOLF.

### CONCLUSION:

The higher you are in the corporate structure, the smaller your balls become.

### Lincoln & Kennedy:

The incidence of coincidence is so prevalent that it cannot be considered coincidence.

- Abraham Lincoln was elected to Congress in 1846.
- John F. Kennedy was elected to Congress in 1946.
- Abraham Lincoln was elected President in 1860.
- John F. Kennedy was elected President in 1960.
- The names Lincoln and Kennedy each contain seven letters.

### Lincoln & Kennedy:

- Both were particularly concerned with civil rights.
- Both wives lost children while living in the White House.
- Both Presidents were shot on a Friday.
- Both Presidents were shot in the head.
- Both were shot in the presence of their wives.
- Each President's secretary warned him not to go to the theatre and to Dallas, respectively.

### Lincoln & Kennedy:

- Lincoln's secretary was named Kennedy.
- Kennedy's secretary was named Lincoln.
- Both were assassinated by Southerners.
- Both were succeeded by Southerners.
- Both successors were named Johnson.
- Andrew Johnson, who succeeded Lincoln, was born in 1808.
- Lyndon Johnson, who succeeded Kennedy, was born in 1908.

### Lincoln & Kennedy:

- John Wilkes Booth, who assassinated Lincoln, was born in 1839.
- Lee Harvey Oswald, who assassinated Kennedy, was born in 1939.
- Both assassins were known by their three names.
- Both names consist of fifteen letters.
- Lincoln was shot at the theatre named 'Kennedy.'
- Kennedy was shot in a car called 'Lincoln.'

### Lincoln & Kennedy:

- Booth ran from the theatre and was caught in a warehouse.
- Oswald ran from a warehouse and was caught in a theatre.
- Booth and Oswald were assassinated before their trials.

And here's the kicker...

- A week before Lincoln was shot, he was in Monroe, Maryland.
- A week before Kennedy was shot, he was in Monroe, Marilyn.