Research Trends in Multimedia Content Services Data Mining and Web Search Group Computer and Automation Research Institute Hungarian Academy of Sciences András A. Benczúr
Web 2.0, 3.0 …? • Platform convergence (Web, PC, mobile, television) – information vs. recreation • Emphasis on social content (blogs, Wikipedia, photo and video sharing) • From search towards recommendation (query free, profile based, personalized) • From text towards multimedia • Glocalization (language, geography) • Spam
A sample service (RSS, Web 2.0 content, recommender engine, client software) • Small screen browsing • Recommendation based on user profile (avoid query typing) • Read blogs, view media, …
The user profile • History stored for each user: • Known ratings, preferences, opinions – scarce! • Items read, weighted by time spent (details seen, scrolling, back button) • Terms in documents read, as a tf.idf-weighted top list (see the sketch below) • User language, region, current location and known sociodemographic data • Multimedia!
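A minimal sketch of how such a tf.idf-weighted top-term list could be assembled from the documents a user has read; the function and variable names (build_term_profile, read_docs, corpus_docs) are illustrative and not part of the original system.

    # Sketch: tf.idf-weighted top terms from the documents a user has read.
    # read_docs / corpus_docs are lists of token lists; names are illustrative.
    import math
    from collections import Counter

    def build_term_profile(read_docs, corpus_docs, top_k=20):
        n = len(corpus_docs)
        df = Counter()                      # document frequency over the corpus
        for doc in corpus_docs:
            df.update(set(doc))
        tf = Counter()                      # term frequency over the user's history
        for doc in read_docs:
            tf.update(doc)
        weights = {t: tf[t] * math.log(n / (1 + df[t])) for t in tf}
        return sorted(weights.items(), key=lambda kv: -kv[1])[:top_k]

In practice the term frequencies could also be weighted by the time spent on each document, in line with the profile items listed above.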
Distribution of categories: Reputable 70.0% • Spam 16.5% • Non-existent 7.9% • Ad 3.7% • Weborg 0.8% • Empty 0.4% • Unknown 0.4% • Alias 0.3%
Effect of search result position • Time spent viewing by result position • Time to reach the result
Segmentation: similar objects
ImageCLEF Object Retrieval Task • Class of query image • Pre-classified images (VOC2007) • Query images • Original training set
Networked relations • spam • social network analysis • churn
Social networks (ADSL links: home and business)
Stacked Graphical Learning • Predict churn p(v) of node v • For target node u, aggregate p(v) over its neighbors to form a new feature f(u) • Rerun classification with the added feature f(·) • Iterate (see the sketch below)
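A rough sketch of the iteration described above, with scikit-learn logistic regression standing in as a placeholder churn classifier; averaging neighbor predictions is one possible aggregation and is an assumption here, as is fitting and scoring on the same data.

    # Sketch of stacked graphical learning for churn, as outlined above.
    # Assumptions: features X (n x d numpy array), binary labels y, and an
    # adjacency list `neighbors` (list of index lists, one per node).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def stacked_graphical_learning(X, y, neighbors, iterations=3):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        p = clf.predict_proba(X)[:, 1]            # initial churn estimates p(v)
        for _ in range(iterations):
            # f(u): aggregate neighbor predictions (mean used here, an assumption)
            f = np.array([p[nb].mean() if len(nb) else 0.0 for nb in neighbors])
            X_aug = np.hstack([X, f[:, None]])    # add f(.) as a new feature
            clf = LogisticRegression(max_iter=1000).fit(X_aug, y)
            p = clf.predict_proba(X_aug)[:, 1]    # rerun classification, iterate
        return p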
Why social networks are hard to analyze • Subgraphs of social networks • Tentacles induce noise • Medium-size dense communities attract much algorithmic work
Mapping into the 2D plane: spectral and semidefinite embeddings (spectral sketch below)
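A small sketch of the spectral variant only: embed graph nodes into the plane using the two smallest nontrivial eigenvectors of the graph Laplacian. The choice of the unnormalized Laplacian is an assumption; the semidefinite variant is not shown.

    # Spectral 2D embedding of a graph given its (symmetric) adjacency matrix.
    import numpy as np

    def spectral_2d(adjacency):
        A = np.asarray(adjacency, dtype=float)
        D = np.diag(A.sum(axis=1))
        L = D - A                                 # unnormalized graph Laplacian
        eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
        return eigvecs[:, 1:3]                    # skip the constant eigenvector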
Research Highlights • Recommenders: KDD Cup 2007 Task 1, First Prize – predict the probability that a user rated a movie in 2006, based on training data up to 2005 • Spam filtering: Web Spam Challenge 1, first place • Churn prediction: method presented at the KDD Cup 2009 Workshop (Task XXXX)
Netflix: lessons learned and differences • Ratings: 1–5 stars • Predict an unseen rating • Evaluation: RMSE • $1,000,000 prize at RMSE 0.8572 • Current leader: 0.8650 (Oct/07: 0.8712) • KDD Cup 2007: same data set, but predict the existence of a rating
Results of two separate tasks • BellKor team report [Bell, Koren 2007]: low-rank approximation, Restricted Boltzmann Machine, nearest neighbor • KDD Cup 2007: predict the probability that a user rated a movie in 2006, given a list of 100,000 user–movie pairs with users and movies drawn from the Netflix Prize data set; winner report [K, B, and our colleagues 2007]
Evaluation and Issue 1 • For a given user i and movie j: RMSE = √( (1/N) Σ (p_ij − r_ij)² ), where p_ij is the predicted value and r_ij the actual one • KDD Cup example: our RMSE 0.256; first runner-up 0.263; all-zeroes prediction 0.279 (places 10–13) • But why do we use RMSE and not precision/recall? • RMSE prefers correct probability guesses for the majority of infrequently visited items • The presence of the recommender changes usage
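For concreteness, a few lines computing the RMSE above and the all-zeroes baseline mentioned on the slide; the array names are illustrative.

    # RMSE as used in the evaluation above; `predicted` and `actual` are
    # illustrative numpy arrays over the evaluated user-movie pairs.
    import numpy as np

    def rmse(predicted, actual):
        return np.sqrt(np.mean((predicted - actual) ** 2))

    # All-zeroes baseline: predict probability 0 for every user-movie pair.
    # baseline = rmse(np.zeros_like(actual), actual)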
Method Overview • Probability by naive user–movie independence (sketch below) • Item frequency estimation (time series) • User frequency estimation • Reaches RMSE 0.260 on its own (still first place) • Data mining: SVD, item–item similarities, association rules • Combination (we used linear regression)
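A hedged sketch of the first and last steps of this overview: a naive user–movie independence estimate as a product of per-user and per-item frequency estimates, and a linear-regression blend of several predictors. The product form and the scikit-learn blend are assumptions about the general approach, not the exact winning method.

    # Sketch: naive independence estimate plus a linear blend of predictors.
    # p_user[u] and p_movie[m] are assumed per-user / per-movie rating-frequency
    # estimates; `pairs` is a list of (u, m) user-movie pairs to score.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def naive_independence(p_user, p_movie, pairs):
        return np.array([p_user[u] * p_movie[m] for u, m in pairs])

    def combine(feature_matrix, targets):
        # Blend several predictors (SVD, item-item, association rules, ...)
        # linearly, as the "Combination" bullet above suggests.
        return LinearRegression().fit(feature_matrix, targets)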
Time series prediction • Interest persists over a long time range (several years)
Short lifetime of online items • Very different behavior in time: news articles • Publication day, next-day usage peak, third day and gone … • Example (Origo): http://www.origo.hu/filmklub/20060124kiolte.html
SVD (user × movie and user × news item matrices) • K-dim SVD: noise filtering – the essence of the matrix – minimizes the ℓ2 (Frobenius) error ‖R − R_K‖ over all rank-K approximations R_K • SVD explains ratings as the effect of a few linear factors • RMSE (ℓ2 error) at 10–30 dimensions: 0.93 • Issue: too many news items – 18K Netflix movies vs. a potentially infinite set of items → may recommend the data source but not the item
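A minimal sketch of K-dimensional SVD on a dense, mean-filled rating matrix R; keeping the top K singular triplets gives the rank-K matrix that minimizes the ℓ2 (Frobenius) error mentioned above. Treatment of missing entries is deliberately simplified here.

    # Rank-K SVD reconstruction of a rating matrix R (missing entries assumed
    # pre-filled, e.g. with per-item means); minimizes ||R - R_K||_F over all
    # rank-K matrices R_K.
    import numpy as np

    def truncated_svd(R, k=20):
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Predicted value for (user i, item j): truncated_svd(R, 20)[i, j]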
Lessons learned • Content similarity might be the key feature • Relative success of trivial estimates on KDD Cup! • Data mining techniques overlap, apparently catch similar patterns • Precision/recall is more important than RMSE • Solution must make heavy use of time
Future plans and ideas • New partners and application fields: network infrastructure, new-generation services, bioinformatics, …? • Scaling our solutions to multi-core architectures • Using our search (cross-lingual, multimedia etc.) and recommender system capabilities in major solutions: mobile, new-generation platforms etc. • Expanding our European-level collaboration, e.g. KIC participation
Questions? benczur@sztaki.hu • http://datamining.sztaki.hu • András A. Benczúr