ic 06 v13

Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Multimedia and Graph mining: 

Multimedia and Graph mining Christos Faloutsos CMU

CONGRATULATIONS!: 

CONGRATULATIONS!

Outline: 

Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions

Motivation: 

Motivation Data mining: ~ find patterns (rules, outliers) How do detached cat retinas evolve? How do real graphs look like? How do (numerical) streams look like?

ViVo: cat retina mining: 

ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU)

Detachment Development: 

Detachment Development Normal 1 day after detachment 3 days after detachment 7 days after detachment 28 days after detachment 3 months after detachment

Data and Problem: 

Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification “we want to learn what classifier learned”

Main idea: 

Main idea extract characteristic visual ‘words’ Equivalent to characteristic keywords, in a collection of text documents

Visual vocabulary?: 

Visual vocabulary?

Visual vocabulary?: 

Visual vocabulary? news: president, minister, economic sports: baseball, score, penalty

Visual Vocabulary (ViVo) generation: 

Visual Vocabulary (ViVo) generation Step 1: Tile image Step 2: Extract tile features Step 3: ViVo generation Visual vocabulary Feature 1 Feature 2 8x12 tiles

Biological interpretation: 

Biological interpretation

Which tissue is significant on 7-day?: 

Which tissue is significant on 7-day?

FEMine: Mining Fly Embryos: 

FEMine: Mining Fly Embryos

With: 

With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo)

Outline: 

Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions

Graphs - why should we care?: 

Graphs - why should we care?

Graphs - why should we care?: 

Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]

Joint work with: 

Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Problem: network and graph mining: 

Problem: network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold?

Graph mining: 

Graph mining Are real graphs random?

Laws and patterns: 

Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patterns

Laws – degree distributions: 

Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3

Laws – degree distributions: 

Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree

Solution:: 

Solution: The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15

But:: 

But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

The Peer-to-Peer Topology: 

The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+]

More power laws:: 

More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman

Slide29: 

Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi http://www.nd.edu/~networks/ Publication%20Categories/ 04%20Talks/2005-norway-3hours.ppt

More power laws:: 

More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’

epinions.com: 

epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people user

A famous power law: Zipf’s law: 

A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)

Olympic medals (Sidney’00, Athens’04):: 

Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals)

More power laws: areas – Korcak’s law: 

More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)

But:: 

But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

Time evolution: 

Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)

Evolution of the Diameter: 

Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data?

Evolution of the Diameter: 

Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decrease

Diameter – ArXiv citation graph: 

Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameter

Diameter – “Autonomous Systems”: 

Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameter

Diameter – “Affiliation Network”: 

Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameter

Diameter – “Patents”: 

Diameter – “Patents” Patent citation network 25 years of data time [years] diameter

Temporal Evolution of the Graphs: 

Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)

Temporal Evolution of the Graphs: 

Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’

Densification – Physics Citations: 

Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69

Densification – Physics Citations: 

Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 1: tree

Densification – Physics Citations: 

Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2

Densification – Patent Citations: 

Densification – Patent Citations Citations among patents granted 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) E(t) 1.66

Densification – Autonomous Systems: 

Densification – Autonomous Systems Graph of Internet 2000 6,000 nodes 26,000 edges One graph per day N(t) E(t) 1.18

Densification – Affiliation Network: 

Densification – Affiliation Network Authors linked to their publications 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) E(t) 1.15

Graphs - Conclusions: 

Graphs - Conclusions Real graphs obey some surprising patterns which can help us spot anomalies / outliers A lot of interest from web searching companies recommendation systems link spamming trust propagation HUGE graphs (Millions and Billions of nodes)

Outline: 

Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions

Why care about streams?: 

Why care about streams?

Why care about streams?: 

Why care about streams? Sensor devices Temperature, weather measurements Road traffic data Geological observations Patient physiological data sensor-Andrew project Embedded devices Network routers

Co-evolving time sequences: 

Co-evolving time sequences Joint work with Jimeng Sun (CMU) Dr. Spiros Papadimitriou (CMU/IBM) Dr. Yasushi Sakurai (NTT) Prof. Jeanne VanBriesen (CMU/CEE)

Motivation: 

Motivation water distribution network normal operation sensors near leak sensors away from leak

Motivation: 

Motivation water distribution network normal operation major leak chlorine concentrations sensors near leak sensors away from leak

Motivation: 

Motivation actual measurements (n streams) k hidden variable(s) spot: “hidden (latent) variables”

Motivation: 

Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements (n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables”

Motivation: 

Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements (n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables”

SPIRIT / InteMon: 

SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp http://localhost:8080/demo/graphs.jsp self-* storage system (PDL/CMU) 1 PetaByte storage self-monitoring, self-healing: self-* with Jimeng Sun (CMU/CS) Evan Hoke (CMU/CS-ug) Prof. Greg Ganger (CMU/CS/ECE) John Strunk (CMU/ECE)

Related project: 

Related project Anomaly detection in network traffic (Zhang, Xie)

Conclusions: 

Conclusions Biological images, graphs & streams pose fascinating problems self-similarity, fractals and power laws work, when other methods fail!

Books: 

Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

Contact info: 

Contact info christos@cs.cmu.edu www.cs.cmu.edu/~christos Wean Hall 7107 Ph#: x8.1457 and, again WELCOME!

authorStream Live Help