DeviantART Analysis using Image Features

Bart Buter, Davide Modolo, Sander van Noort, Nick Dijkshoorn, Quang Nguyen, Bart van de Poel

Profile Project
• Our project focused on exploratory research into the analysis of the artists, and their images, of a huge art community called deviantART
• The research touched several fields:
  • Visualization (implementation of a toolkit)
  • Data collection
  • Feature extraction (statistical and cognitively inspired)
  • Classification
  • Network analysis

Overview
• Introduction
• Toolkit
• Experiments & Results
• Future work
• Conclusion

Introduction - deviantART
• deviantART (dA) is the largest online community showcasing various forms of user-made artwork
• 13 million registered members (called Deviants)
• Allows emerging and established artists to exhibit, promote, and share their works
• All artwork is well organized (comprehensive category structure)
• Artwork ranges from traditional media (painting and sculpture) to digital art, pixel art, films and anime

Research questions
• Can we visualize important aspects of deviantART?
• Can artists and/or styles be distinguished?
• Are artists influencing each other?
• Do art styles change over time?
• Are there non-artists who are interesting for deviantART?

Toolkit
• General tool to answer research questions about social art communities (deviantART)
• 4 components: data collection, feature extraction, classification and visualization

Data collection from deviantART
• Network of “professional” artists
  • Download each artist’s name and their watchers
  • Output for Pajek and the Matlab graph toolbox
• Artists’ images and information about these images
  • Download users’ galleries as the dataset
  • There is no web API; instead, we follow backend links
  • Parse RSS XML files and download the images

Data collection
• For each image, store an XML file. Example:

    <?xml version="1.0"?>
    <root xml_tb_version="3.1">
      <guid>http://catluvr2.deviantart.com/art/42-Journals-73664427</guid>
      <title>-42 Journals</title>
      <category>customization/screenshots/other</category>
      <filename>_42_Journals_by_catluvr2.jpg</filename>
    </root>
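A minimal sketch of reading such a metadata file back, using Python's standard xml.etree.ElementTree module. The function name read_metadata is hypothetical, and real files would be loaded with ET.parse(path) rather than from an inline string:

```python
import xml.etree.ElementTree as ET

# The example record from above, kept inline so the sketch is self-contained.
EXAMPLE = """<?xml version="1.0"?>
<root xml_tb_version="3.1">
  <guid>http://catluvr2.deviantart.com/art/42-Journals-73664427</guid>
  <title>-42 Journals</title>
  <category>customization/screenshots/other</category>
  <filename>_42_Journals_by_catluvr2.jpg</filename>
</root>"""


def read_metadata(xml_text):
    """Return the children of <root> as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


meta = read_metadata(EXAMPLE)
print(meta["category"])  # customization/screenshots/other
```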

Dataset information
• Downloaded the galleries of 31 users
• About 5000 images
• Daily Deviations of a random day
• Top categories:
  • photography: 2244
  • customization: 906
  • traditional: 842
  • digitalart: 587
  • fanart: 239
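A category count like the one above can be obtained by taking the first segment of each image's category path (e.g. "customization" from "customization/screenshots/other"). This is a sketch under that assumption; the sample paths below are hypothetical, not the real dataset:

```python
from collections import Counter


def top_categories(category_paths):
    """Count images per top-level category, i.e. the first path segment."""
    return Counter(path.split("/", 1)[0] for path in category_paths)


# Hypothetical sample of <category> values, for illustration only.
sample = [
    "customization/screenshots/other",
    "photography/people/portraits",
    "photography/nature/landscapes",
    "traditional/paintings/other",
]
counts = top_categories(sample)
print(counts.most_common())
```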

Feature extraction
• Why we need features:
  • We can’t visualize sets of images in a high-dimensional space
  • Features can be intuitive for toolkit users
  • Easier to work with than raw data (classification)
• Kinds of features:
  • Statistical features
  • Cognitively inspired features

Feature format
• Store features in XML files
  • One XML file per image, describing all of its features
• Easy to add new features to existing images
• Easy to add images
  • Only calculate the features that are not already present in the XML file
  • Add those features to the XML file of the image
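The "only calculate what is missing" rule can be sketched as follows. This assumes, hypothetically, that each feature is stored as a child element of <root> named after the feature; the function update_features and the toy extractors are illustrative, not the toolkit's actual code:

```python
import xml.etree.ElementTree as ET


def update_features(root, extractors, image):
    """Compute and append only the features missing from <root>.

    extractors maps a feature name to a function image -> number.
    Returns the names of the newly added features.
    """
    present = {child.tag for child in root}
    added = []
    for name, extract in extractors.items():
        if name in present:
            continue  # already stored: skip the (possibly expensive) computation
        elem = ET.SubElement(root, name)
        elem.text = repr(float(extract(image)))
        added.append(name)
    return added


# Hypothetical example: avgR is already stored, so only avgG is computed.
root = ET.fromstring("<root><avgR>120.0</avgR></root>")
pixels = [(100, 50, 25), (140, 70, 35)]  # toy image as a list of RGB tuples
extractors = {
    "avgR": lambda img: sum(p[0] for p in img) / len(img),
    "avgG": lambda img: sum(p[1] for p in img) / len(img),
}
added = update_features(root, extractors, pixels)
print(added)  # ['avgG']
```

Writing the result back would then be a call such as ET.ElementTree(root).write(path).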

Statistical features
• Low-level and understandable features:
  • RGB values (average, median)
  • Hue, saturation and intensity values (average, median)
  • Edge-pixel ratio
  • Corner-pixel ratio
  • Entropy of the intensity
  • Variance of the intensity
  • Compositional features
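The colour features can be sketched with the textbook RGB-to-HSI conversion (I = (R+G+B)/3, S = 1 - min(R,G,B)/I, hue from the arccos formula). The slides do not say which conversion the toolkit used, so the formula and the feature names below (e.g. avgSat) are assumptions:

```python
import math
from statistics import mean, median


def rgb_to_hsi(r, g, b):
    """8-bit RGB -> (hue in degrees, saturation in [0, 1], intensity in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i  # gray pixel: hue is undefined, 0 by convention
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return (theta if b <= g else 360.0 - theta), s, i


def color_statistics(pixels):
    """Average and median of R, G, B, hue, saturation and intensity."""
    names = ("R", "G", "B", "Hue", "Sat", "Int")
    values = {name: [] for name in names}
    for r, g, b in pixels:
        for name, v in zip(names, (r, g, b) + rgb_to_hsi(r, g, b)):
            values[name].append(v)
    stats = {}
    for name, vals in values.items():
        stats["avg" + name] = mean(vals)
        stats["med" + name] = median(vals)
    return stats


stats = color_statistics([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(round(stats["avgHue"], 1))
```

A plain average of hue ignores that hue is circular (350° and 10° average to 180°), which is worth keeping in mind when interpreting it.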

Edge-pixel ratio (figure: two example images with ratios 0.0094 and 0.0998)
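The slides do not name the edge detector, so this sketch assumes a plain Sobel gradient with a fixed, hypothetical threshold: the edge-pixel ratio is the fraction of interior pixels whose gradient magnitude exceeds it:

```python
def edge_pixel_ratio(gray, threshold=100.0):
    """Fraction of interior pixels whose Sobel gradient magnitude exceeds threshold.

    gray is a 2-D list of intensities with at least 3 rows and 3 columns.
    """
    h, w = len(gray), len(gray[0])
    edges = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges += 1
    return edges / ((h - 2) * (w - 2))


# A vertical step edge: left half black, right half white.
step = [[0] * 3 + [255] * 3 for _ in range(4)]
print(edge_pixel_ratio(step))  # 0.5
```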

Average of the intensity (figure: example images with AvgIntensity 21.90, 123.96 and 243.67)

Entropy of the intensity (figure: example images with intensity entropy 1.5408 and 7.8799)

Variance of the intensity (figure: example images with intensity variance 506 and 14676)
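The three intensity statistics can be sketched from a list of 8-bit gray levels. The entropy here is the Shannon entropy, in bits, of the gray-level histogram, so it lies between 0 and 8, in line with the example values above; using the population variance and these dictionary keys are assumptions:

```python
import math
from collections import Counter


def intensity_statistics(gray_values):
    """Mean, population variance and histogram entropy (bits) of 8-bit gray levels."""
    n = len(gray_values)
    avg = sum(gray_values) / n
    var = sum((v - avg) ** 2 for v in gray_values) / n
    counts = Counter(gray_values)  # histogram over the gray levels that occur
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"avgIntensity": avg, "intVariance": var, "intEntropy": entropy}


print(intensity_statistics([0, 255]))
# {'avgIntensity': 127.5, 'intVariance': 16256.25, 'intEntropy': 1.0}
```

A constant image gives entropy 0, and an image using all 256 gray levels equally often gives the maximum of 8 bits.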

Compositional edge-pixel ratio
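Feature names used later in the experiments (gridEdgeRatio_4, medIntCells_2, avgHueCells_4) suggest that compositional features apply a statistic to each cell of an image grid. This sketch assumes an n x n grid with cells numbered row by row from 1; the grid size and numbering scheme are guesses, and cell_mean is a hypothetical example statistic:

```python
def grid_cells(image, n):
    """Split a 2-D list into n x n cells, numbered row by row."""
    h, w = len(image), len(image[0])
    ys = [k * h // n for k in range(n + 1)]
    xs = [k * w // n for k in range(n + 1)]
    return [[row[xs[j]:xs[j + 1]] for row in image[ys[i]:ys[i + 1]]]
            for i in range(n) for j in range(n)]


def cell_features(image, n, name, feature):
    """Apply feature to every cell; results are named name_1 ... name_{n*n}."""
    return {f"{name}_{k}": feature(cell)
            for k, cell in enumerate(grid_cells(image, n), start=1)}


def cell_mean(cell):
    return sum(map(sum, cell)) / sum(len(row) for row in cell)


image = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [20, 20, 30, 30],
         [20, 20, 30, 30]]
print(cell_features(image, 2, "avgIntCells", cell_mean))
# {'avgIntCells_1': 0.0, 'avgIntCells_2': 10.0, 'avgIntCells_3': 20.0, 'avgIntCells_4': 30.0}
```

Passing an edge-pixel-ratio function instead of cell_mean would give a compositional edge-pixel ratio per cell.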

Hue and Saturation

Weibull-distribution image contrast
• Why
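The slide gives no details. One common reading of "Weibull image contrast" is to fit a Weibull distribution to an image's gradient magnitudes and use the fitted scale and shape as contrast descriptors. This sketch assumes that reading and estimates the parameters by least squares on the Weibull plot, ln(-ln(1 - F)) = k ln x - k ln λ. The estimator is a swapped-in choice, not necessarily the project's, and the synthetic samples only stand in for real gradient magnitudes:

```python
import math
import random


def fit_weibull(samples):
    """Estimate Weibull (scale, shape) by linear regression on the Weibull plot.

    Uses Bernard's median ranks F_i = (i - 0.3) / (n + 0.4); non-positive
    samples are dropped because ln(x) is undefined for them.
    """
    xs = sorted(x for x in samples if x > 0)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive samples")
    us = [math.log(x) for x in xs]
    vs = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mu, mv = sum(us) / n, sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    if suu == 0:
        raise ValueError("samples must not all be equal")
    shape = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / suu
    scale = math.exp(mu - mv / shape)  # from intercept = -shape * ln(scale)
    return scale, shape


random.seed(0)
# Synthetic stand-in for gradient magnitudes: Weibull with scale 2.0, shape 1.5.
gradients = [random.weibullvariate(2.0, 1.5) for _ in range(5000)]
scale, shape = fit_weibull(gradients)
print(f"scale={scale:.2f} shape={shape:.2f}")
```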

Cognitively inspired features
Model of Saliency-Based Visual Attention
• It has been found that attention influences the processing of visual information even in the earliest areas of the primate visual cortex
• This influence seems to shape an integrated saliency map
• This map is a representation of the environment that weighs every input by its local feature contrast and its current behavioral relevance
• It enables the visual system to integrate a large amount of information

Itti, Koch and Niebur’s Model

Example of saliency map (figure: original image; its color, intensity and orientation maps plus an extra skin map; the resulting saliency map)

What do we have? (figure: intensity, orientation, color, skin and saliency maps of the example image)
• Important visual features about the style of this photo:
  • The subject of the portrait is not exactly in the middle
  • The subject is a human
  • The subject is standing still
  • The colors are fairly uniform, and there are not many of them
• But how can we use all the different maps to represent this information?

Cognitively inspired features (1)
• Shannon entropy of the 5 different maps (the saliency map and the four conspicuity maps)
• Standard deviation of the saliency distribution in the saliency map
• Location of the three most salient points
• Skin intensity

Cognitively inspired features (2)
• The locations are computed using the Inhibition of Return (IOR) procedure (figure: original saliency map; the map after the first and after the second inhibition; the 3 most salient locations)
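The IOR loop can be sketched as "take the maximum, suppress its neighbourhood, repeat". The slides do not give the shape or size of the inhibited region, so the square neighbourhood and the radius parameter below are hypothetical simplifications (Itti's model uses a disk-shaped inhibition):

```python
def salient_points(saliency, count=3, radius=2):
    """Return the count most salient (row, col) locations via inhibition of return.

    After each winner is chosen, every value within Chebyshev distance
    radius of it is suppressed, so the next maximum lies elsewhere.
    """
    sal = [row[:] for row in saliency]  # work on a copy of the map
    points = []
    for _ in range(count):
        value, y, x = max((v, r, c) for r, row in enumerate(sal) for c, v in enumerate(row))
        if value == float("-inf"):
            break  # the whole map has already been inhibited
        points.append((y, x))
        for yy in range(max(0, y - radius), min(len(sal), y + radius + 1)):
            for xx in range(max(0, x - radius), min(len(sal[0]), x + radius + 1)):
                sal[yy][xx] = float("-inf")
    return points


sal = [[0] * 8 for _ in range(8)]
sal[1][1], sal[1][2], sal[6][6], sal[1][6] = 9, 8, 7, 5
print(salient_points(sal))  # [(1, 1), (6, 6), (1, 6)]
```

The value 8 at (1, 2) is never selected because it lies inside the region inhibited around the first winner.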

Cognitively inspired features (3)
• Skin is an extra channel (not part of Itti’s standard model), but it turned out to be very interesting
• It can easily be used to detect nude images (which are quite popular among deviantART’s professional photographers)
(figure: two original images and their skin maps)
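The slides do not describe how the skin map is computed. As an illustration only, this sketch swaps in a widely used explicit RGB skin-colour rule for daylight images and treats the fraction of skin pixels as a hypothetical stand-in for "skin intensity":

```python
def is_skin(r, g, b):
    """Explicit RGB skin-colour rule (uniform daylight illumination)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_map(image):
    """Binary skin map (1 = skin) for a 2-D list of RGB tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]


def skin_ratio(image):
    """Fraction of pixels classified as skin."""
    m = skin_map(image)
    return sum(map(sum, m)) / sum(len(row) for row in m)


image = [[(220, 170, 140), (30, 30, 200)]]  # one skin-like pixel, one blue pixel
print(skin_ratio(image))  # 0.5
```

Fixed colour rules are sensitive to lighting and white balance, which is one reason a learned or adaptive skin model may be preferred.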

OpenCV face detector

Classification
• Given a set of features, classification is used to:
  • Determine whether two artists/categories are distinguishable
  • Determine which features are useful for doing so
• Different classifiers are available in the toolkit:
  • k-Nearest Neighbour (kNN)
  • Naive Bayes (NB)
  • Nearest Mean (NM)
  • Support Vector Machine (libSVM)
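The simplest of these classifiers, Nearest Mean, fits in a few lines: each class is represented by the mean of its feature vectors, and a new image gets the label of the closest mean. This is a generic sketch with hypothetical data, not the toolkit's implementation:

```python
import math
from collections import defaultdict


def fit_nearest_mean(X, y):
    """Compute the mean feature vector of every class."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_nearest_mean(means, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))


# Hypothetical two-feature training data for two artists.
X = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
y = ["artistA", "artistA", "artistB", "artistB"]
means = fit_nearest_mean(X, y)
print(predict_nearest_mean(means, [2.0, 2.0]))  # artistA
```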

Classification
• Pre-processing functions:
  • Reading in XML files and creating a dataset
  • Normalization
  • Dataset filtering on classes and features
  • Parameter optimization using cross-validation
• Current classification capabilities:
  • 1 class against another class
  • 1 class against all other classes
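The slides do not say which normalization is used. This sketch assumes z-score normalization, with statistics estimated on the training data only, and shows the relabelling needed for "1 class against all other classes". All function names are hypothetical:

```python
from statistics import mean, pstdev


def fit_zscore(X):
    """Per-feature mean and standard deviation; a zero deviation is replaced by 1."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def apply_zscore(X, params):
    """Normalize rows of X with statistics from fit_zscore."""
    mu, sd = params
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]


def one_vs_all(labels, positive):
    """1 for the chosen class, 0 for every other class."""
    return [1 if label == positive else 0 for label in labels]


train = [[1.0, 10.0], [3.0, 10.0]]
params = fit_zscore(train)
print(apply_zscore(train, params))  # [[-1.0, 0.0], [1.0, 0.0]]
```

Fitting the statistics on the training folds only, and reusing them for the test fold, keeps cross-validation scores from leaking information about the test data.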

Classification
• Feature selection is needed when dealing with a lot of features
  • Reduces the dimensionality of the data representation
  • Gives the feature combination that best separates a class
• Sequential forward feature selection
  • First select the most informative feature, then iteratively add the next most informative feature to the selection
  • The criterion is based on the inter/intra-class distance
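A sketch of sequential forward selection with one common form of the inter/intra criterion: between-class scatter divided by within-class scatter (the trace ratio), computed on the candidate feature subset. The exact criterion used in the toolkit is not given, so this variant is an assumption:

```python
def inter_intra(X, y, features):
    """Between-class over within-class scatter on the given feature indices."""
    rows = [[x[j] for j in features] for x in X]
    n = len(rows)
    overall = [sum(col) / n for col in zip(*rows)]
    groups = {}
    for row, label in zip(rows, y):
        groups.setdefault(label, []).append(row)
    inter = intra = 0.0
    for members in groups.values():
        m = [sum(col) / len(members) for col in zip(*members)]
        # size-weighted squared distance of the class mean to the overall mean
        inter += len(members) * sum((a - b) ** 2 for a, b in zip(m, overall))
        # squared distances of the class members to their own class mean
        intra += sum(sum((a - b) ** 2 for a, b in zip(row, m)) for row in members)
    if intra == 0:
        return float("inf") if inter > 0 else 0.0
    return inter / intra


def forward_selection(X, y, k):
    """Greedily add the feature that maximises the criterion, k times."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: inter_intra(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 helps a little.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 2.0], [1.0, 4.0, 2.0], [1.2, 2.0, 3.0]]
y = ["artistA", "artistA", "artistB", "artistB"]
print(forward_selection(X, y, 2))  # [0, 2]
```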

Classification
• Evaluation measures:
  • Precision: the percentage of images classified as positive that were indeed positive
  • Recall: the percentage of all positive images that were classified as positive
  • F1-measure: the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall)
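The three measures computed from true and predicted labels, as a generic sketch (the function name is hypothetical). The example shows how the harmonic mean penalizes an imbalance between precision and recall:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One of four positives found, no false alarms:
# precision 1.0, recall 0.25, F1 0.4 (the arithmetic mean would be 0.625).
print(precision_recall_f1([1, 1, 1, 1, 0], [1, 0, 0, 0, 0]))
```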

Visualization
• Purpose of the visualization:
  • Visualize the dataset
    • Find patterns
    • Filtering (relevant information)
    • Input: dataset images (thumbnails and full size) and XML feature files, converted to a single tab-separated file
  • Analyse classification results
    • Express the classification performance
    • Capture the performance in one graph
    • Input: performance output of the classifier
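The conversion of per-image XML feature files into one tab-separated file might look like this. It assumes, hypothetically, that every child element of <root> is one column, and it uses the union of all tags as the header so that images missing a feature get an empty cell:

```python
import csv
import io
import xml.etree.ElementTree as ET


def features_to_tsv(xml_texts, out):
    """Write one row per image and one column per feature name (union over files)."""
    records = []
    for text in xml_texts:
        root = ET.fromstring(text)
        records.append({child.tag: (child.text or "").strip() for child in root})
    columns = sorted({name for rec in records for name in rec})
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical feature files for two images.
xml_texts = [
    "<root><filename>a.jpg</filename><avgR>120.5</avgR></root>",
    "<root><filename>b.jpg</filename><avgHue>33.0</avgHue></root>",
]
buf = io.StringIO()
features_to_tsv(xml_texts, buf)
print(buf.getvalue())
```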

Visualization
• Use an existing visualization application?
  • Mondrian: a general-purpose statistical data-visualization system (http://rosuda.org/mondrian/)

Visualization
• Use an existing visualization application?
  • XmdvTool: interactive visual exploration of multivariate data sets (http://davis.wpi.edu/~xmdv/)
  • Flat version of the data set

Visualization
• Use an existing visualization application?
  • Such tools have generic uses, so they produce only generic displays
  • Data can take many interesting forms that require unique types of display and interaction, which general applications do not capture
  • The UI is not intuitive (there is no easy way to filter the data)
  • (These tools also look outdated)

Visualization
• Which language/framework should we use for our visualization?
  • There are many…
  • Prefuse visualization toolkit (generic displays)
  • Adobe Flash/Flex (expensive, slow for large datasets)

Visualization
• (Partially) implemented in “Processing”
  • Open-source programming language to create images, animations, and interactions
  • Built on top of Java (a collection of Java classes)
• Consists of:
  • Processing Development Environment (PDE) (very minimalistic)
  • A collection of commands (API)
  • Several libraries that support more advanced features (OpenGL, XML)
• Easy to integrate into Java (Eclipse)

Visualization: Processing
• Provides functions that make life easier:
  • image(img, x, y, [width, height])
  • line(x1, y1, x2, y2), stroke(color)
  • But no functions to draw complete graphs/plots
• The right combination of cost, ease of use and speed
• Export the application as a Java Applet
  • Run it on a website
  • Use image URLs instead of copies of the images to avoid legal issues

Experiments & Results

Experiment #1 – Classification
• Goal:
  • Use the toolkit to find which features best separate two artists
• Details of the experiment:
  • The experiment was performed for all artists in the dataset
  • The feature selection algorithm was used to output the 1 to 5 most informative features
  • Evaluation was done using the F-measure

Selecting the classifier
• Select a classifier for the experiment
  • Train all the classifiers on a subset of the training data, using cross-validation to optimize their parameters
  • Selection criterion: F-measure
  • SVM gives the highest F-measure
(figure: average F-measure of 1 vs. 1 classification over all artists)

Result Matrix using the top 1 feature

Result Matrix using top 2 features

Result Matrix using top 3 features

Result Matrix using the top 4 features

Result Matrix using the top 5 features

Result Matrix using all features

Visualization Case (1)
• Artist pair: Kitsunebaka91 and LALAax
  • F-measure pair: 0.952941 and 0.884615
  • Features: medIntCells_2, gridEdgeRatio_4
• Artist pair: fediaFedia and gsphoto
  • F-measure pair: 0.867347 and 0.938095
  • Features: avgHue, intVariance

Visualization Case (2)
• Artist pair: K1lgore and sekcyjny
  • F-measure pair: 0.692308 and 0.640000
  • Features: avgBCells_3, salMapCEntropy
• Artist pair: stereoflow and zihnisinir
  • F-measure pair: 0.649007 and 0.683871
  • Features: avgHueCells_4, avgR

Results
