Multimedia and Graph mining Christos Faloutsos
CMU

CONGRATULATIONS!:

CONGRATULATIONS!

Outline:

Outline Problem definition / Motivation
Biological image mining
Graphs and power laws
Streams and forecasting
Conclusions

Motivation:

Motivation Data mining: ~ find patterns (rules, outliers)
How do detached cat retinas evolve?
How do real graphs look like?
How do (numerical) streams look like?

ViVo: cat retina mining:

ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB)
Jia-Yu Tim Pan, HJ Yang (CMU)

Detachment Development:

Detachment Development Normal 1 day after
detachment 3 days after
detachment 7 days after
detachment 28 days after
detachment 3 months after
detachment

Data and Problem:

Data and Problem (Problem) What happens in retina after detachment?
What tissues (regions) are involved?
How do they change over time?
How will a program convey this info?
More than classification “we want to learn what classifier learned”

Main idea:

Main idea extract characteristic visual ‘words’
Equivalent to characteristic keywords, in a collection of text documents

With Eric Xing (CMU CS)
Bob Murphy (CMU – Bio)
Tim Pan (CMU -> Google)
Andre Balan (U. Sao Paulo)

Outline:

Outline Problem definition / Motivation
Biological image mining
Graphs and power laws
Streams and forecasting
Conclusions

Graphs - why should we care?:

Graphs - why should we care?

Graphs - why should we care?:

Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]

Joint work with:

Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Problem: network and graph mining:

Problem: network and graph mining How does the Internet look like?
How does the web look like?
What constitutes a ‘normal’ social network?
What is ‘normal’/‘abnormal’?
which patterns/laws hold?

Graph mining:

Graph mining Are real graphs random?

Laws and patterns:

Laws and patterns NO!!
Diameter
in- and out- degree distributions
other (surprising) patterns

Laws – degree distributions:

Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3

Laws – degree distributions:

Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree

Solution::

Solution: The plot is linear in log-log scale [FFF’99]
freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15

But::

But: Q1: How about graphs from other domains?
Q2: How about temporal evolution?

The Peer-to-Peer Topology:

The Peer-to-Peer Topology Frequency versus degree
Number of adjacent peers follows a power-law [Jovanovic+]

More power laws::

More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman

Slide29:

Swedish sex-web Nodes: people (Females; Males)
Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74;
59% response rate. Albert Laszlo Barabasi
http://www.nd.edu/~networks/
Publication%20Categories/
04%20Talks/2005-norway-3hours.ppt

More power laws::

More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’

A famous power law: Zipf’s law Bible - rank vs frequency (log-log)
similarly, in many other languages; for customers and sales volume; city populations etc etc
log(rank) log(freq)

More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)

But::

But: Q1: How about graphs from other domains?
Q2: How about temporal evolution?

Time evolution:

Time evolution with Jure Leskovec (CMU)
and Jon Kleinberg (Cornell)
(‘best paper’ KDD05)

Evolution of the Diameter:

Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter:
diameter ~ O(log N)
diameter ~ O(log log N)
What is happening in real data?

Evolution of the Diameter:

Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter:
diameter ~ O(log N)
diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
As the network grows the distances between nodes slowly decrease

Diameter – ArXiv citation graph:

Diameter – ArXiv citation graph Citations among physics papers
1992 –2003
One graph per year time [years] diameter

Diameter – “Autonomous Systems”:

Diameter – “Autonomous Systems” Graph of Internet
One graph per day
1997 – 2000 number of nodes diameter

Diameter – “Affiliation Network”:

Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers
10 years of data time [years] diameter

Diameter – “Patents”:

Diameter – “Patents” Patent citation network
25 years of data
time [years] diameter

Temporal Evolution of the Graphs:

Temporal Evolution of the Graphs N(t) … nodes at time t
E(t) … edges at time t
Suppose that
N(t+1) = 2 * N(t)
Q: what is your guess for
E(t+1) =? 2 * E(t)

Temporal Evolution of the Graphs:

Temporal Evolution of the Graphs N(t) … nodes at time t
E(t) … edges at time t
Suppose that
N(t+1) = 2 * N(t)
Q: what is your guess for
E(t+1) =? 2 * E(t)
A: over-doubled!
But obeying the ``Densification Power Law’’

Graphs - Conclusions Real graphs obey some surprising patterns
which can help us spot anomalies / outliers
A lot of interest from web searching companies
recommendation systems
link spamming
trust propagation
HUGE graphs (Millions and Billions of nodes)

Outline:

Outline Problem definition / Motivation
Biological image mining
Graphs and power laws
Streams and forecasting
Conclusions

Why care about streams?:

Why care about streams?

Why care about streams?:

Why care about streams? Sensor devices
Temperature, weather measurements
Road traffic data
Geological observations
Patient physiological data
sensor-Andrew project
Embedded devices
Network routers

Co-evolving time sequences:

Co-evolving time sequences Joint work with
Jimeng Sun (CMU)
Dr. Spiros Papadimitriou (CMU/IBM)
Dr. Yasushi Sakurai (NTT)
Prof. Jeanne VanBriesen (CMU/CEE)

Motivation:

Motivation water distribution network normal operation sensors
near leak sensors
away
from leak

Motivation:

Motivation water distribution network normal operation major leak chlorine concentrations sensors
near leak sensors
away
from leak

Motivation:

Motivation actual measurements
(n streams) k hidden variable(s) spot: “hidden (latent) variables”

Motivation:

Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements
(n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables”

Motivation:

Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements
(n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables”

SPIRIT / InteMon:

SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp
http://localhost:8080/demo/graphs.jsp
self-* storage system (PDL/CMU)
1 PetaByte storage
self-monitoring, self-healing: self-*
with Jimeng Sun (CMU/CS)
Evan Hoke (CMU/CS-ug)
Prof. Greg Ganger (CMU/CS/ECE)
John Strunk (CMU/ECE)

Related project:

Related project Anomaly detection in network traffic (Zhang, Xie)

Conclusions:

Conclusions Biological images, graphs & streams pose fascinating problems
self-similarity, fractals and power laws work, when other methods fail!

Books:

Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

Contact info:

Contact info christos@cs.cmu.edu
www.cs.cmu.edu/~christos
Wean Hall 7107
Ph#: x8.1457
and, again WELCOME!

You do not have the permission to view this presentation. In order to view it, please
contact the author of the presentation.

Send to Blogs and Networks

Processing ....

Premium member

Use HTTPs

HTTPS (Hypertext Transfer Protocol Secure) is a protocol used by Web servers to transfer and display Web content securely. Most web browsers block content or generate a “mixed content” warning when users access web pages via HTTPS that contain embedded content loaded via HTTP. To prevent users from facing this, Use HTTPS option.