Ic 06 V13

Uploaded from authorPOINT Lite
Download as
 PPT
Presentation Description 

No description available

Views: 61
Like it  ( Likes) Dislike it  ( Dislikes)
Added: April 15, 2008 This Presentation is Public 
Presentation Category : Education All Rights Reserved
Presentation Transcript

Multimedia and Graph mining: Multimedia and Graph mining Christos Faloutsos CMU


CONGRATULATIONS!: CONGRATULATIONS!


Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions


Motivation: Motivation Data mining: ~ find patterns (rules, outliers) How do detached cat retinas evolve? How do real graphs look like? How do (numerical) streams look like?


ViVo: cat retina mining: ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU)


Detachment Development: Detachment Development Normal 1 day after detachment 3 days after detachment 7 days after detachment 28 days after detachment 3 months after detachment


Data and Problem: Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification “we want to learn what classifier learned”


Main idea: Main idea extract characteristic visual ‘words’ Equivalent to characteristic keywords, in a collection of text documents


Visual vocabulary?: Visual vocabulary?


Visual vocabulary?: Visual vocabulary? news: president, minister, economic sports: baseball, score, penalty


Visual Vocabulary (ViVo) generation: Visual Vocabulary (ViVo) generation Step 1: Tile image Step 2: Extract tile features Step 3: ViVo generation Visual vocabulary Feature 1 Feature 2 8x12 tiles


Biological interpretation: Biological interpretation


Which tissue is significant on 7-day?: Which tissue is significant on 7-day?


FEMine: Mining Fly Embryos: FEMine: Mining Fly Embryos


With: With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo)


Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions


Graphs - why should we care?: Graphs - why should we care?


Graphs - why should we care?: Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]


Joint work with: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)


Problem: network and graph mining: Problem: network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold?


Graph mining: Graph mining Are real graphs random?


Laws and patterns: Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patterns


Laws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3


Laws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree


Solution:: Solution: The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15


But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution?


The Peer-to-Peer Topology: The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+]


More power laws:: More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman


Slide29: Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi http://www.nd.edu/~networks/ Publication%20Categories/ 04%20Talks/2005-norway-3hours.ppt


More power laws:: More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’


epinions.com: epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people user


A famous power law: Zipf’s law: A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)


Olympic medals (Sidney’00, Athens’04):: Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals)


More power laws: areas – Korcak’s law: More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)


But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution?


Time evolution: Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)


Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data?


Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decrease


Diameter – ArXiv citation graph: Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameter


Diameter – “Autonomous Systems”: Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameter


Diameter – “Affiliation Network”: Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameter


Diameter – “Patents”: Diameter – “Patents” Patent citation network 25 years of data time [years] diameter


Temporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)


Temporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’


Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69


Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 1: tree


Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2


Densification – Patent Citations: Densification – Patent Citations Citations among patents granted 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) E(t) 1.66


Densification – Autonomous Systems: Densification – Autonomous Systems Graph of Internet 2000 6,000 nodes 26,000 edges One graph per day N(t) E(t) 1.18


Densification – Affiliation Network: Densification – Affiliation Network Authors linked to their publications 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) E(t) 1.15


Graphs - Conclusions: Graphs - Conclusions Real graphs obey some surprising patterns which can help us spot anomalies / outliers A lot of interest from web searching companies recommendation systems link spamming trust propagation HUGE graphs (Millions and Billions of nodes)


Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions


Why care about streams?: Why care about streams?


Why care about streams?: Why care about streams? Sensor devices Temperature, weather measurements Road traffic data Geological observations Patient physiological data sensor-Andrew project Embedded devices Network routers


Co-evolving time sequences: Co-evolving time sequences Joint work with Jimeng Sun (CMU) Dr. Spiros Papadimitriou (CMU/IBM) Dr. Yasushi Sakurai (NTT) Prof. Jeanne VanBriesen (CMU/CEE)


Motivation: Motivation water distribution network normal operation sensors near leak sensors away from leak


Motivation: Motivation water distribution network normal operation major leak chlorine concentrations sensors near leak sensors away from leak


Motivation: Motivation actual measurements (n streams) k hidden variable(s) spot: “hidden (latent) variables”


Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements (n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables”


Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements (n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables”


SPIRIT / InteMon: SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp http://localhost:8080/demo/graphs.jsp self-* storage system (PDL/CMU) 1 PetaByte storage self-monitoring, self-healing: self-* with Jimeng Sun (CMU/CS) Evan Hoke (CMU/CS-ug) Prof. Greg Ganger (CMU/CS/ECE) John Strunk (CMU/ECE)


Related project: Related project Anomaly detection in network traffic (Zhang, Xie)


Conclusions: Conclusions Biological images, graphs & streams pose fascinating problems self-similarity, fractals and power laws work, when other methods fail!


Books: Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)


Contact info: Contact info christos@cs.cmu.edu www.cs.cmu.edu/~christos Wean Hall 7107 Ph#: x8.1457 and, again WELCOME!