logging in or signing up ic 06 v13 Savin Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 94 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: April 15, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Multimedia and Graph mining: Multimedia and Graph mining Christos Faloutsos CMUCONGRATULATIONS!: CONGRATULATIONS!Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsMotivation: Motivation Data mining: ~ find patterns (rules, outliers) How do detached cat retinas evolve? How do real graphs look like? How do (numerical) streams look like?ViVo: cat retina mining: ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU) Detachment Development: Detachment Development Normal 1 day after detachment 3 days after detachment 7 days after detachment 28 days after detachment 3 months after detachmentData and Problem: Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification “we want to learn what classifier learned”Main idea: Main idea extract characteristic visual ‘words’ Equivalent to characteristic keywords, in a collection of text documents Visual vocabulary?: Visual vocabulary?Visual vocabulary?: Visual vocabulary? news: president, minister, economic sports: baseball, score, penaltyVisual Vocabulary (ViVo) generation: Visual Vocabulary (ViVo) generation Step 1: Tile image Step 2: Extract tile features Step 3: ViVo generation Visual vocabulary Feature 1 Feature 2 8x12 tilesBiological interpretation: Biological interpretationWhich tissue is significant on 7-day?: Which tissue is significant on 7-day?FEMine: Mining Fly Embryos: FEMine: Mining Fly EmbryosWith: With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo)Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsGraphs - why should we care?: Graphs - why should we care?Graphs - why should we care?: Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]Joint work with: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)Problem: network and graph mining: Problem: network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold?Graph mining: Graph mining Are real graphs random?Laws and patterns: Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patternsLaws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3Laws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degreeSolution:: Solution: The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution?The Peer-to-Peer Topology: The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+]More power laws:: More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) UllmanSlide29: Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi http://www.nd.edu/~networks/ Publication%20Categories/ 04%20Talks/2005-norway-3hours.pptMore power laws:: More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’epinions.com: epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people userA famous power law: Zipf’s law: A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)Olympic medals (Sidney’00, Athens’04):: Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals)More power laws: areas – Korcak’s law: More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution? Time evolution: Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data?Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decreaseDiameter – ArXiv citation graph: Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameterDiameter – “Autonomous Systems”: Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameterDiameter – “Affiliation Network”: Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameterDiameter – “Patents”: Diameter – “Patents” Patent citation network 25 years of data time [years] diameterTemporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)Temporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’ Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 1: treeDensification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2Densification – Patent Citations: Densification – Patent Citations Citations among patents granted 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) E(t) 1.66Densification – Autonomous Systems: Densification – Autonomous Systems Graph of Internet 2000 6,000 nodes 26,000 edges One graph per day N(t) E(t) 1.18Densification – Affiliation Network: Densification – Affiliation Network Authors linked to their publications 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) E(t) 1.15Graphs - Conclusions: Graphs - Conclusions Real graphs obey some surprising patterns which can help us spot anomalies / outliers A lot of interest from web searching companies recommendation systems link spamming trust propagation HUGE graphs (Millions and Billions of nodes)Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsWhy care about streams?: Why care about streams?Why care about streams?: Why care about streams? Sensor devices Temperature, weather measurements Road traffic data Geological observations Patient physiological data sensor-Andrew project Embedded devices Network routersCo-evolving time sequences: Co-evolving time sequences Joint work with Jimeng Sun (CMU) Dr. Spiros Papadimitriou (CMU/IBM) Dr. Yasushi Sakurai (NTT) Prof. Jeanne VanBriesen (CMU/CEE)Motivation: Motivation water distribution network normal operation sensors near leak sensors away from leakMotivation: Motivation water distribution network normal operation major leak chlorine concentrations sensors near leak sensors away from leak Motivation: Motivation actual measurements (n streams) k hidden variable(s) spot: “hidden (latent) variables” Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements (n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables”Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements (n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables”SPIRIT / InteMon: SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp http://localhost:8080/demo/graphs.jsp self-* storage system (PDL/CMU) 1 PetaByte storage self-monitoring, self-healing: self-* with Jimeng Sun (CMU/CS) Evan Hoke (CMU/CS-ug) Prof. Greg Ganger (CMU/CS/ECE) John Strunk (CMU/ECE) Related project: Related project Anomaly detection in network traffic (Zhang, Xie) Conclusions: Conclusions Biological images, graphs & streams pose fascinating problems self-similarity, fractals and power laws work, when other methods fail!Books: Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!) Contact info: Contact info christos@cs.cmu.edu www.cs.cmu.edu/~christos Wean Hall 7107 Ph#: x8.1457 and, again WELCOME! You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
ic 06 v13 Savin Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 94 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: April 15, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Multimedia and Graph mining: Multimedia and Graph mining Christos Faloutsos CMUCONGRATULATIONS!: CONGRATULATIONS!Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsMotivation: Motivation Data mining: ~ find patterns (rules, outliers) How do detached cat retinas evolve? How do real graphs look like? How do (numerical) streams look like?ViVo: cat retina mining: ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU) Detachment Development: Detachment Development Normal 1 day after detachment 3 days after detachment 7 days after detachment 28 days after detachment 3 months after detachmentData and Problem: Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification “we want to learn what classifier learned”Main idea: Main idea extract characteristic visual ‘words’ Equivalent to characteristic keywords, in a collection of text documents Visual vocabulary?: Visual vocabulary?Visual vocabulary?: Visual vocabulary? news: president, minister, economic sports: baseball, score, penaltyVisual Vocabulary (ViVo) generation: Visual Vocabulary (ViVo) generation Step 1: Tile image Step 2: Extract tile features Step 3: ViVo generation Visual vocabulary Feature 1 Feature 2 8x12 tilesBiological interpretation: Biological interpretationWhich tissue is significant on 7-day?: Which tissue is significant on 7-day?FEMine: Mining Fly Embryos: FEMine: Mining Fly EmbryosWith: With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo)Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsGraphs - why should we care?: Graphs - why should we care?Graphs - why should we care?: Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]Joint work with: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)Problem: network and graph mining: Problem: network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold?Graph mining: Graph mining Are real graphs random?Laws and patterns: Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patternsLaws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3Laws – degree distributions: Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degreeSolution:: Solution: The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution?The Peer-to-Peer Topology: The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+]More power laws:: More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) UllmanSlide29: Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi http://www.nd.edu/~networks/ Publication%20Categories/ 04%20Talks/2005-norway-3hours.pptMore power laws:: More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’epinions.com: epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people userA famous power law: Zipf’s law: A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)Olympic medals (Sidney’00, Athens’04):: Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals)More power laws: areas – Korcak’s law: More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)But:: But: Q1: How about graphs from other domains? Q2: How about temporal evolution? Time evolution: Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data?Evolution of the Diameter: Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decreaseDiameter – ArXiv citation graph: Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameterDiameter – “Autonomous Systems”: Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameterDiameter – “Affiliation Network”: Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameterDiameter – “Patents”: Diameter – “Patents” Patent citation network 25 years of data time [years] diameterTemporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)Temporal Evolution of the Graphs: Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’ Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69Densification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 1: treeDensification – Physics Citations: Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2Densification – Patent Citations: Densification – Patent Citations Citations among patents granted 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) E(t) 1.66Densification – Autonomous Systems: Densification – Autonomous Systems Graph of Internet 2000 6,000 nodes 26,000 edges One graph per day N(t) E(t) 1.18Densification – Affiliation Network: Densification – Affiliation Network Authors linked to their publications 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) E(t) 1.15Graphs - Conclusions: Graphs - Conclusions Real graphs obey some surprising patterns which can help us spot anomalies / outliers A lot of interest from web searching companies recommendation systems link spamming trust propagation HUGE graphs (Millions and Billions of nodes)Outline: Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting ConclusionsWhy care about streams?: Why care about streams?Why care about streams?: Why care about streams? Sensor devices Temperature, weather measurements Road traffic data Geological observations Patient physiological data sensor-Andrew project Embedded devices Network routersCo-evolving time sequences: Co-evolving time sequences Joint work with Jimeng Sun (CMU) Dr. Spiros Papadimitriou (CMU/IBM) Dr. Yasushi Sakurai (NTT) Prof. Jeanne VanBriesen (CMU/CEE)Motivation: Motivation water distribution network normal operation sensors near leak sensors away from leakMotivation: Motivation water distribution network normal operation major leak chlorine concentrations sensors near leak sensors away from leak Motivation: Motivation actual measurements (n streams) k hidden variable(s) spot: “hidden (latent) variables” Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements (n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables”Motivation: Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements (n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables”SPIRIT / InteMon: SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp http://localhost:8080/demo/graphs.jsp self-* storage system (PDL/CMU) 1 PetaByte storage self-monitoring, self-healing: self-* with Jimeng Sun (CMU/CS) Evan Hoke (CMU/CS-ug) Prof. Greg Ganger (CMU/CS/ECE) John Strunk (CMU/ECE) Related project: Related project Anomaly detection in network traffic (Zhang, Xie) Conclusions: Conclusions Biological images, graphs & streams pose fascinating problems self-similarity, fractals and power laws work, when other methods fail!Books: Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!) Contact info: Contact info christos@cs.cmu.edu www.cs.cmu.edu/~christos Wean Hall 7107 Ph#: x8.1457 and, again WELCOME!