Social Web and Flickr

Presentation Transcript

The Social Web or how flickr changed my life:

The Social Web or how flickr changed my life Kristina Lerman USC Information Sciences Institute

Web 1.0:

Web 1.0

Web 2.0:

Web 2.0

Elements of Social Web:

Elements of Social Web
Users contribute content
  Images (Flickr, Zooomr), news stories (Digg, Reddit), bookmarks (Delicious, Bibsonomy), videos (YouTube, Vimeo), …
Users add metadata to content
  Tags: annotate content with freely chosen keywords
  Discussion: leave comments
  Evaluation: active through voting or passive through views & favorites
Users create social networks
  Add other users as friends/contacts
  Sites provide an easy interface to track friends’ activities
Transparency
  Publicly navigable content and metadata


Flickr photo page (screenshot labels): tags, submitter, discussion, image stats

User profile:

User profile

User’s tags:

User’s tags Tags are keyword-based metadata added to content Help users organize their own data Facilitate searching and browsing for information Freely chosen by user

User’s favorite images (by other photographers):

User’s favorite images (by other photographers)

So what?:

So what? By exposing human activity, Social Web allows users to exploit the intelligence and opinions of others to solve problems New way of interacting with information Social Information Processing Exploit collective effects Word of mouth to amplify good information Amenable to analysis Design optimal social information processing systems Challenge for AI: harness the power of collective intelligence to solve information processing problems

Outline for the rest of the talk:

Outline for the rest of the talk
User-contributed metadata can be used to solve the following information processing problems
  Discovery: collectively added tags used for information discovery
  Personalization: user-added metadata, in the form of tags and social networks, used to personalize search results
  Recommendation: social networks for information filtering
  Dynamics of collaboration: mathematical study of a collaborative rating system

Discovery:

Outline: discovery, personalization, recommendation, dynamics of collaboration. Current section: Discovery, with: Anon Plangprasopchok

Information discovery:

Information discovery Goal: Automatically find resources that provide some functionality weather conditions, flight tracking, geocoding, … Simpler goal: Find resources that provide the same functionality as the seed, e.g., Improve robustness of information integration applications Increase coverage of the applications Approach: Leverage user-contributed tags to discover new resources similar to the seed

Anatomy of Delicious:

Anatomy of Delicious (screenshot labels): resource, popular tags, user tags, user notes

Probabilistic approach:

Probabilistic approach
Find a compressed description of the source
  Extract “latent topics” in a collection of sources, using a probabilistic generative model
Compute pair-wise similarity between the seed and a source using the compressed description
[Diagram: Users, Tags, Sources → Probabilistic Model → Compressed description → Compute Source Similarity → Similar sources (sorted)]
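
The similarity step can be illustrated with a minimal sketch. It assumes the compressed description of each source is a probability distribution over latent topics, and it uses Jensen-Shannon divergence as the similarity measure; the slide does not say which measure was used, so both the measure and the example distributions are assumptions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_similarity(seed, sources):
    """Return source names sorted from most to least similar to the seed."""
    return sorted(sources, key=lambda name: js_divergence(seed, sources[name]))


# Hypothetical topic distributions p(z | source) over three latent topics.
seed_topics = [0.8, 0.1, 0.1]
source_topics = {
    "source_a": [0.7, 0.2, 0.1],
    "source_b": [0.1, 0.1, 0.8],
    "source_c": [0.4, 0.3, 0.3],
}
ranking = rank_by_similarity(seed_topics, source_topics)
```

Sorting by divergence puts the sources whose topic mix is closest to the seed first, which matches the "similar sources (sorted)" output in the diagram.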

Alternative models:

Alternative models
  ITM [Plangprasopchok & Lerman, in IIWeb ’07]
  pLSA [Hofmann, in UAI ’99]
  MWA [Wu+, in WWW ’06]
[Figure: graphical-model diagrams of the three models]


Datasets
Seed resources: flytecomm, geocoder, wunderground
  For each seed, retrieve the 20 popular tags
  For each tag, retrieve other resources annotated with same tag
  For each resource, retrieve all resource-user-tag triples

             flytecomm   geocoder   wunderground
  Resources  3,562       5,572      7,176
  Tags       14,297      16,887     77,056
  Users      34,594      46,764     45,852
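
The three retrieval steps can be sketched over an in-memory list of (resource, user, tag) triples. This is only an illustration of the procedure: the real data came from Delicious, and the function name and sample triples below are hypothetical.

```python
from collections import Counter


def collect_candidates(triples, seed, top_n=20):
    """Follow the seed's most popular tags to other resources and their triples."""
    # Step 1: the seed's most frequently used tags.
    tag_counts = Counter(tag for res, user, tag in triples if res == seed)
    popular = {tag for tag, _ in tag_counts.most_common(top_n)}
    # Step 2: other resources annotated with any of those tags.
    resources = {res for res, user, tag in triples if tag in popular and res != seed}
    # Step 3: every triple that belongs to one of those resources.
    return [t for t in triples if t[0] in resources]


# Hypothetical triples; the resource names are placeholders.
triples = [
    ("seed.example", "u1", "weather"),
    ("seed.example", "u2", "weather"),
    ("seed.example", "u2", "forecast"),
    ("other.example", "u3", "weather"),
    ("other.example", "u3", "maps"),
    ("unrelated.example", "u4", "recipes"),
]
candidates = collect_candidates(triples, "seed.example")
```

Note that step 3 keeps all of a candidate's triples, including tags the seed never received, which is why the tag counts in the table are much larger than 20.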

Experimental results:

Experimental results
[Figure: number of sources with similar functionality to the seed found by each method]
  pLSA – ignores users
  MWA – naïve Bayes
  ITM – our model (user interests and source topics)
  Google – ‘find similar pages’
Plangprasopchok & Lerman, “Exploiting Social Annotation for Resource Discovery” in AAAI IIWeb workshop, 2007

Summary and future work:

Summary and future work
Exploit tagging activities of different users to find data sources similar to the seed
Future work
  Extend the probabilistic model to learn topic hierarchies (aka folksonomies)
  Example hierarchy: Travel
    Flights: Booking, Status
    Hotels: Booking, Reviews
    Car rentals
    Destinations

Personalization:

Outline: discovery, personalization, recommendation, dynamics of collaboration. Current section: Personalization, with: Anon Plangprasopchok & Michael Wong

Image search on Flickr:

Image search on Flickr
Tag search finds all images tagged with a given keyword … It is prone to ambiguity
  Beetle: insect, car model
  Tiger: Panthera tigris, house cat, shark (tiger shark), Mac OS X, flower (tiger lily)
  Newborn: baby, kitten, puppy
  Etc…

Plain tag search:

Plain tag search

  Query    Sense            Relevant  Precision
  newborn  baby             412       0.82
  tiger    Panthera tigris  337       0.67
  beetle   insect           232       0.46

Relevance results for top 500 images retrieved by tag search (manually labeled using the first sense of each keyword)

Personalizing search results:

Personalizing search results
Users express their tastes and preferences through the metadata they create
  Contacts they add to their social networks
  Tags they add to their own images
  Images they mark as their favorite
  Groups they join
Use this metadata to improve image search results!
  Personalizing by tags
  Personalizing by contacts: restrict results of image search to those images that were submitted by user u’s friends (Level 1 contacts)

Personalizing by contacts results:

Personalizing by contacts results

  Query    User    #L1  rel.  not rel.  Pr    Re    #L1+L2  Pr    Re
  newborn  user1   719  232   0         1.00  0.56  49,539  0.85  0.85
           user2   154  169   0         1.00  0.41  10,970  0.90  0.77
           user3   174  147   0         1.00  0.36  13,153  0.89  0.79
           user4   128  132   0         1.00  0.32  8,439   0.91  0.75
  tiger    user5   63   11    1         0.92  0.03  13,142  0.78  0.76
           user6   103  78    3         0.96  0.23  14,425  0.76  0.79
           user7   62   65    1         0.98  0.19  7,270   0.79  0.67
           user8   56   30    0         0.97  0.09  7,073   0.79  0.71
  beetle   user9   445  18    1         0.95  0.08  53,480  0.49  0.93
           user10  364  25    8         0.81  0.15  41,568  0.49  0.90
           user11  783  78    25        0.75  0.34  62,610  0.49  0.94
           user12  102  7     1         0.88  0.03  14,324  0.52  0.70

The first Pr/Re pair is for Level 1 (L1) contacts; the second is for L1+L2 contacts.
L1+L2: 9%-16% average improvement in precision
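
Restricting search results to a user's contacts, and scoring the outcome, might look like the sketch below. The image records and the `owner` field are hypothetical. Precision is taken as relevant kept over all kept, and recall as relevant kept over all relevant images in the plain search (412 for newborn, 337 for tiger); this definition reproduces the user1 and user5 rows, though it is an inference rather than something the slides state.

```python
def personalize_by_contacts(results, contacts):
    """Keep only search results whose submitter is one of the user's contacts."""
    return [img for img in results if img["owner"] in contacts]


def precision_recall(relevant_kept, not_relevant_kept, relevant_total):
    """Precision and recall of a filtered result set."""
    kept = relevant_kept + not_relevant_kept
    precision = relevant_kept / kept if kept else 0.0
    recall = relevant_kept / relevant_total if relevant_total else 0.0
    return precision, recall


# Hypothetical search results for one query.
results = [
    {"id": 1, "owner": "ann"},
    {"id": 2, "owner": "bob"},
    {"id": 3, "owner": "ann"},
]
filtered = personalize_by_contacts(results, {"ann"})

# Counts taken from the user1 (newborn) and user5 (tiger) rows.
user1 = precision_recall(232, 0, 412)
user5 = precision_recall(11, 1, 337)
```

The trade-off visible in the table follows directly from the filter: a small contact set keeps few images, so precision is high while recall is low.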

Personalizing by tags:

Personalizing by tags Users often add descriptive metadata to images Tags Titles Image descriptions Add image to groups Personalizing by tags Find (hidden) topics of interest to the user Find images in the search results related to these topics

Probabilistic topic model:

Probabilistic topic model
Tagging as a stochastic process
  User u posts an image i
  Based on u’s interests, topics z are chosen
  Tag t is selected based on z
Use EM to estimate p(t|z) and p(z|u) from data
  To find topics in each search set of 4500 images
[Figure: graphical model over U, Z, T with plates N_t and I]
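
The EM estimation can be sketched in a simplified form. The version below drops the image variable and fits a pLSA-style mixture, p(t|u) = Σ_z p(z|u) p(t|z), to (user, tag) pairs; it is an illustration of the technique under that simplifying assumption, not the exact model from the talk. The function name and sample data are hypothetical.

```python
import random


def fit_topics(pairs, n_topics, n_iter=30, seed=0):
    """Estimate p(z|u) and p(t|z) from (user, tag) pairs with EM."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in pairs})
    tags = sorted({t for _, t in pairs})
    topics = range(n_topics)

    def normalize(weights):
        total = sum(weights.values())
        return {k: v / total for k, v in weights.items()}

    # Random, strictly positive starting points.
    p_z_u = {u: normalize({z: rng.random() + 0.1 for z in topics}) for u in users}
    p_t_z = {z: normalize({t: rng.random() + 0.1 for t in tags}) for z in topics}

    for _ in range(n_iter):
        z_u = {u: {z: 0.0 for z in topics} for u in users}
        t_z = {z: {t: 0.0 for t in tags} for z in topics}
        for u, t in pairs:
            # E-step: posterior p(z | u, t) for this observation.
            joint = [p_z_u[u][z] * p_t_z[z][t] for z in topics]
            total = sum(joint)
            for z in topics:
                resp = joint[z] / total
                z_u[u][z] += resp
                t_z[z][t] += resp
        # M-step: re-estimate both distributions from expected counts.
        p_z_u = {u: normalize(z_u[u]) for u in users}
        p_t_z = {z: normalize(t_z[z]) for z in topics}
    return p_z_u, p_t_z


# Hypothetical tagging data: two users with different senses of "tiger".
pairs = [
    ("u1", "tiger"), ("u1", "zoo"), ("u1", "wildlife"),
    ("u2", "tiger"), ("u2", "osx"), ("u2", "desktop"),
]
p_z_u, p_t_z = fit_topics(pairs, n_topics=2)
```

Once p(z|u) is available, search results can be re-ranked by how well each image's tags match the topics the user cares about, which is the personalization step described earlier.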


p(t|z)

  Topic 1        Topic 2             Topic 5   Topic 8     Topic 10
  tiger          tiger               tiger     tiger       tiger
  zoo            specanimal          cat       apple       lion
  animal         animalkingdomelite  kitty     mac         dog
  nature         abigfave            cute      osx         shark
  animals        flower              kitten    macintosh   nyc
  wild           butterfly           cats      screenshot  cat
  tijger         macro               orange    macosx      man
  wildlife       yellow              eyes      desktop     people
  ilovenature    swallowtail         pet       imac        arizona
  cub            lily                tabby     stevejobs   rock
  siberiantiger  green               stripes   dashboard   beach
  blijdorp       canon               whiskers  macbook     sand
  london         insect              white     powerbook   sleeping
  australia      nature              art       os          tree
  portfolio      pink                feline    104         forest

“tiger” image set: 4500 images trained on 10 topics

Personalizing by tags: Results:

Personalizing by tags: Results
[Figure: precision of the N top-ranked search results, compared to plain search; panels: newborn, beetle]
4 users chosen to be interested in the first sense of search term
Plain search – Flickr’s ordering of search results
Lerman et al., “Personalizing Image Search Results on Flickr” in AAAI ITWP workshop, 2007

Summary & future work:

Summary & future work Improve results of image search for an individual user as long as the user has expressed interest in the topic of search Future work Lots of other metadata to exploit Favorites, groups, image titles and descriptions Discover relevant synonyms to expand search Topics that are new to the user? Exploit collective knowledge to find communities of interest Identify authorities within those communities

Recommendation:

Outline: discovery, personalization, recommendation, dynamics of collaboration. Current section: Recommendation, with: Dipsy Kapoor

Social News Aggregation on Digg:

Social News Aggregation on Digg Users submit stories Users vote on (digg) stories Select stories promoted to the front page based on received votes Collaborative front page emerges from the opinions of many users, not few editors Users create social networks by adding others as friends Friends Interface makes it easy to track friends’ activities Stories friends submitted Stories friends dugg (voted on)

Top users:

Top users
Digg ranks users
  Based on how many of their stories were promoted to front page
  User with most stories is ranked #1, …
Top 1000 users data
  Collected by scraping Digg … now available through the API
Usage statistics
  User rank
  How many stories user submitted, dugg, commented on
Social networks
  Friends: outgoing links; A → B := B is a friend of A
  Reverse friends: incoming links; A → B := A is a reverse friend of B
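
The friend and reverse-friend relations are two views of the same directed graph. A minimal sketch, with hypothetical user names, derives the reverse-friend sets from the friend lists:

```python
from collections import defaultdict


def reverse_friends(friends):
    """Invert friend links: if B is a friend of A, then A is a reverse friend of B."""
    incoming = defaultdict(set)
    for a, listed in friends.items():
        for b in listed:
            incoming[b].add(a)
    return dict(incoming)


# Hypothetical network: each key lists the users it added as friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": set(),
}
rev = reverse_friends(friends)
```

Here carol has added nobody but has two reverse friends, so her submissions would show up in the Friends interface of both alice and bob.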

Digg datasets:

Digg datasets To see how votes change in time Tracked 2858 stories submitted over a period > day in May 2006 Only 98 stories were promoted to the front page To see how users vote on stories For ~200 front page stories Names of users who voted on (dugg) the story

Dynamics of votes:

Dynamics of votes Top users’ stories

`Interestingness’ distribution:

`Interestingness’ distribution Top users are not submitting the most “interesting” stories 50 stories from 14 users ave. max votes=600 48 stories from 45 users ave. max votes=1050

Social filtering as recommendation:

Social filtering as recommendation Social filtering explains why top users are so successful Users express their preferences by creating social networks Use these networks – through the Friends Interface – to find new stories to read Claim 1: Users digg stories their friends submit Claim 2: Users digg stories their friends digg

Social network on Digg:

Social network on Digg Top 1000 Digg users

How Friends interface works:

How Friends interface works submitter ‘ see stories my friends submitted’ … … ‘ see stories my friends dugg’

Users digg stories submitted by friends:

Users digg stories submitted by friends
[Figure: number of diggs coming from submitter’s friends; axes: num reverse friends, num diggs from friends]
Probability that that many friends dugg a story by chance is P=0.005
Lerman, “Social Browsing & Information Filtering in Social Media” submitted to JCMC
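
The slide does not say how the chance probability was computed. One plausible way to estimate such a value, shown here purely as an assumption, is a hypergeometric tail: if voters were drawn at random from the user population, how likely is it that at least k of them are the submitter's reverse friends? All numbers in the example are made up for illustration.

```python
from math import comb


def prob_at_least(k, n_votes, n_friends, n_users):
    """P(at least k of n_votes random voters are among n_friends of n_users)."""
    total = comb(n_users, n_votes)
    upper = min(n_votes, n_friends)
    favorable = sum(
        comb(n_friends, i) * comb(n_users - n_friends, n_votes - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Illustrative numbers only: 10 users, 3 reverse friends, 4 votes, 2+ from friends.
p = prob_at_least(2, 4, 3, 10)
```

A small value from a test like this would indicate that friends vote on the submitter's stories far more often than random voting would predict.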

Users digg stories their friends digg:

Users digg stories their friends digg
Combined social network size of the first m diggers and number of diggs coming from users within the combined network
Probability such numbers could have been observed by chance

  After m diggs  Probability
  m=1            P=0.005
  m=6            P=0.028
  m=16           P=0.060

`Tyranny of the minority’:

`Tyranny of the minority’ Top users submit lion’s share of front page stories Explained by social filtering Top users have bigger, more active social networks Conspiracy: alternative explanation of top user success Top users accused of colluding to automatically promote each other’s stories Resulting uproar led Digg to change its story promotion algorithm … To discount votes coming from friends Led to greater front page diversity, but also unintended consequences

Effect of the new promotion algorithm:

Effect of the new promotion algorithm Intended effects Greater user diversity on the front page Smaller spread in story interestingness Unintended consequences Discourage users from joining social networks Alienating top users 36 stories from 24 users ave. max votes=960 35 stories from 35 users ave. max votes=1270

Design of collaborative rating systems:

Design of collaborative rating systems Designing a collaborative rating system, which exploits the emergent behavior of many independent evaluators, is difficult Small changes can have big consequences Few tools to predict system behavior Execution Simulation Can we explore the effects of promotion algorithms before they are implemented?

Dynamics of collaboration:

Outline: discovery, personalization, recommendation, dynamics of collaboration. Current section: Dynamics of collaboration, with: Dipsy Kapoor

Analysis as a design tool:

Analysis as a design tool Mathematical analysis can help understand and predict the emergent behavior of collaborative information systems Study the choice of the promotion algorithm before it is implemented Effect of design choices on system behavior story timeliness, interestingness, user participation, incentives to join social networks, etc.

Dynamics of collaborative rating:

Dynamics of collaborative rating
Story is characterized by
  Interestingness r: probability a story will receive a vote when seen by a user
  Visibility
    Visibility on the upcoming stories page: decreases with time as new stories are submitted
    Visibility on the front page: decreases with time as new stories are promoted
    Visibility through the friends interface: stories friends submitted, stories friends dugg (voted on)

Mathematical model:

Mathematical model Mathematical model describes how the number of votes m ( t ) changes in time Solve equation Solutions parametrized by S , r Other parameters estimated from data
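
The equation itself is not preserved in the transcript. The sketch below is therefore only a guess at its general shape: votes grow at a rate equal to interestingness r times total visibility, where visibility comes from the upcoming page, the front page (after promotion), and the Friends interface. The exponential decay forms and every constant are illustrative assumptions, not values from the talk.

```python
from math import exp


def simulate_votes(r, S, threshold=40.0, steps=500, dt=1.0):
    """Euler-integrate an assumed vote model; returns (vote history, promotion time)."""
    # All constants below are hypothetical placeholders.
    c_upcoming, decay_upcoming = 1.0, 0.05
    c_front, decay_front = 5.0, 0.01
    c_friends = 0.002

    m = 1.0                      # the submitter's own vote
    unseen_friends = float(S)    # reverse friends who have not yet seen the story
    promoted_at = None
    history = [m]
    for step in range(steps):
        t = step * dt
        if promoted_at is None:
            # Upcoming page: visibility decays as new stories are submitted.
            v_page = c_upcoming * exp(-decay_upcoming * t)
        else:
            # Front page: visibility decays as new stories are promoted.
            v_page = c_front * exp(-decay_front * (t - promoted_at))
        # Friends interface: a share of the remaining friends sees the story.
        v_friends = min(c_friends * unseen_friends, unseen_friends / dt)
        unseen_friends -= v_friends * dt
        m += r * (v_page + v_friends) * dt
        if promoted_at is None and m >= threshold:
            promoted_at = t
        history.append(m)
    return history, promoted_at


history, promoted_at = simulate_votes(r=0.5, S=200)
```

Running the function over a grid of r and S values is one way to explore which stories cross the promotion threshold, in the spirit of the parameter-space study.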

Dynamics of votes:

Dynamics of votes data model Lerman, “Social Information Processing in Social News Aggregation” Internet Computing (in press) 2007

Exploring the parameter space:

Exploring the parameter space Minimum S required for the story to be promoted for a given r for a fixed promotion threshold Time taken for a story with r and S to be promoted to the front page for a fixed promotion threshold

Dynamics of user influence:

Dynamics of user influence Digg ranked users according to how many front page stories they had Model of the dynamics of user influence Number of stories promoted to the front page F User’s social network growth S user1 user2 user3 user4 user5 user6

Model of rank dynamics:

Model of rank dynamics
Number of stories promoted to the front page F
  Number of stories M submitted over Δt = week
  User’s promotion success rate ~ S(t)
User’s social network S grows as
  Others discover him through new front page stories ~ ΔF
  Others discover him through the Top Users list ~ g(F)
Solve equations
  Estimate b, c, g(F) from data
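
The equations are not preserved in the transcript. One reading consistent with the bullets above is ΔF = c·S·M per week and ΔS = b·ΔF + g(F) per week; the sketch below iterates that reading. The functional form, the cap on the success rate, and all parameter values are assumptions made for illustration.

```python
def simulate_rank_dynamics(S0, M, weeks, b, c, g=lambda F: 0.0):
    """Iterate assumed weekly updates for front-page count F and network size S."""
    F, S = 0.0, float(S0)
    trajectory = []
    for _ in range(weeks):
        success_rate = min(1.0, c * S)   # promotion success rate grows with S
        dF = success_rate * M            # new front-page stories this week
        dS = b * dF + g(F)               # discovery via front page and Top Users list
        F += dF
        S += dS
        trajectory.append((F, S))
    return trajectory


# Illustrative parameters only.
trajectory = simulate_rank_dynamics(S0=10, M=5, weeks=2, b=2.0, c=0.01)
```

Because each promoted story enlarges S and a larger S raises the success rate, the two quantities reinforce each other, which is the feedback the fitted curves for individual users are meant to capture.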

Solutions 1:

Solutions 1 user2 data user6 data user2 model user6 model Lerman, “Dynamics of Collaborative Rating of Information” in KDD/SNA workshop, 2007

Solutions 2:

Solutions 2 user1 data user5 data user1 model user5 model Lerman, “Dynamics of Collaborative Rating of Information” in KDD/SNA workshop, 2007

Solutions 3:

Solutions 3 user3 data user4 data user3 model user4 model Lerman, “Dynamics of Collaborative Rating of Information” in KDD/SNA workshop, 2007

Previous works:

Previous works Technologies that exploit independent activities of many users for information discovery and recommendation Collaborative filtering [e.g., Grouplens project 1997-present] Users express opinions by rating many products System finds users with similar opinions and recommends products liked by those users Product recommendation used by Amazon & Netflix Users reluctant to rate products Social navigation [Dieberger et al, 2000] Exposes activity of others to help guide users to quality information sources “N users found X helpful” best seller lists, “what’s popular” pages, etc.


Conclusions
In their everyday use of Social Web sites, users create large quantities of data, which express their knowledge and opinions
  Content: articles, media content, opinion pieces, etc.
  Metadata: tags, ratings, discussion, social networks
  Links between users, content, and metadata
Social Web enables new problem solving approaches
  Social Information Processing: use knowledge, opinions, work of others for own information needs
  Collective problem solving: efficient, robust solutions beyond the scope of individual capabilities

Upcoming events:

Upcoming events Social Information Processing Symposium When: March 2008 Where: AAAI Spring Symposium series @ Stanford Organizers: K. Lerman, B. Huberman (HP Labs), D. Gutelius (SRI), S. Merugu (Yahoo)

The future of the Social Web 2:

The future of the Social Web 2 Instead of connecting data , the Web connects people New applications Collaboration tools Collective intelligence: A large group of connected individuals acts more intelligently than individuals on their own The personalization of everything The more the system learns about me, the better it should filter Discovery, not search What papers do I need to read to know about the research on social networks? Identifying emerging communities Community-based vocabulary Authoritative sources within the community

The future of the Social Web:

The future of the Social Web New challenge for AI: Instead of ever cleverer algorithms , harness the Collective Intelligence Semantic Web vision [Berners-Lee & Hendler in Scientific American, 2001] Web content annotated with machine-readable metadata (a formal classification system) to aid automatic information integration Still unrealized in 2007 Too complicated: specialized training to be used effectively Costly and time-consuming to produce Variety of specialized ontologies: ontology alignment problem Folksonomy “ user generated taxonomy used to categorize and retrieve web content using open-ended labels called tags .” [source: Wikipedia] Bottom-up: decentralized, emergent, scalable Dynamic: adapts to changing needs and priorities Noisy: need tools to extract meaning from data
