This presentation treats object recognition as a machine translation problem: it learns a lexicon that maps a fixed vocabulary of image regions to words. The data is an aligned bitext of annotated images, and clustering together with EM is used to predict words for image regions. The reported results show higher recall and precision for some words after retraining and word clustering. A human evaluation further supports the approach.
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary • Miriam Miklofsky
Lexicons • A vocabulary of terms used in a subject • A specialized list of terms • Devices that predict one representation given another representation
Dataset • Aligned bitext • Annotated images • Images with regions • Unknown which region of image goes with which word from text
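The slides do not spell out how the lexicon is learned from data where the region-to-word correspondence is unknown. One common way to handle this kind of aligned bitext is an EM procedure in the style of IBM Model 1, sketched below. This is an illustration under stated assumptions, not the presenter's implementation: the function name `train_lexicon`, the toy blob indices, and the toy words are all hypothetical.

```python
from collections import defaultdict

def train_lexicon(pairs, iterations=20):
    """Estimate p(word | blob) from (blobs, words) pairs with unknown alignment.

    Assumption: IBM-Model-1-style EM; each word in an image is explained
    by exactly one of the blobs in the same image.
    """
    words = sorted({w for _, ws in pairs for w in ws})
    blobs = sorted({b for bs, _ in pairs for b in bs})
    # Start from a uniform lexicon.
    t = {b: {w: 1.0 / len(words) for w in words} for b in blobs}
    for _ in range(iterations):
        counts = {b: defaultdict(float) for b in blobs}
        # E-step: share each word among the blobs of its image.
        for bs, ws in pairs:
            for w in ws:
                norm = sum(t[b][w] for b in bs)
                for b in bs:
                    counts[b][w] += t[b][w] / norm
        # M-step: renormalize the expected counts per blob.
        for b in blobs:
            total = sum(counts[b].values())
            if total > 0:
                t[b] = {w: counts[b][w] / total for w in words}
    return t

# Hypothetical toy bitext: blob indices paired with annotation words.
pairs = [([0, 1], ["sky", "grass"]),
         ([0, 2], ["sky", "water"]),
         ([1], ["grass"])]
lexicon = train_lexicon(pairs)
best = {b: max(probs, key=probs.get) for b, probs in lexicon.items()}
print(best)
```

Even though no image says which blob goes with which word, co-occurrence across images is enough for the most probable word of each blob to separate out on this toy data.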
Clustering • K-means clustering • Vector quantize the image region representation • Kullback-Leibler divergence • Relative entropy • Measure of the difference between two probability distributions over the same event space
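The two ideas on this slide can be sketched in a few lines. The code below is a minimal illustration, assuming region features are plain lists of numbers and that k-means uses squared Euclidean distance with the first k points as initial centroids (a simplification chosen for determinism). The slide does not say where the Kullback-Leibler divergence is applied, so it is shown only as a standalone function.

```python
import math

def quantize(region, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(region, centroids[k])))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[quantize(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same event space")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

# Hypothetical 2-D region features.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids = kmeans(points, k=2)
print(centroids)                                  # the learned "blob" vocabulary
print(quantize([9, 9], centroids))                # blob index for a new region
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

After quantization, each region is represented by a single blob index, which is what makes a fixed image vocabulary possible.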
Evaluation • Auto annotate images • Quantize regions • Use lexicon to determine word • Annotate image with word
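The annotation steps above can be sketched as follows. This is a minimal illustration, assuming the lexicon is stored as a dictionary from blob index to a word-probability table and that each region is annotated with its single most probable word. The names `nearest_blob` and `annotate`, and all data values, are hypothetical.

```python
def nearest_blob(region, centroids):
    """Quantize a region: index of the closest centroid (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(region, centroids[k])))

def annotate(regions, centroids, lexicon):
    """Annotate an image with the most probable word for each region."""
    words = []
    for region in regions:
        blob = nearest_blob(region, centroids)           # quantize the region
        word = max(lexicon[blob], key=lexicon[blob].get)  # look up the lexicon
        if word not in words:                             # one copy per image
            words.append(word)
    return words

# Hypothetical centroids and lexicon p(word | blob).
centroids = [[0.0, 0.0], [1.0, 1.0]]
lexicon = {0: {"sky": 0.7, "water": 0.3},
           1: {"grass": 0.6, "tiger": 0.4}}
image_regions = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
print(annotate(image_regions, centroids, lexicon))  # ['sky', 'grass']
```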
Results - Annotation • Base results • 80 words of the 371-word vocabulary could be predicted • Retraining • Similar results, but some words with higher recall and precision
Results (cont.) • Null probability • Recall decreases • Precision increases • Clustering of like words • Recall values of clusters higher than for single words
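The trade-off reported above can be illustrated with a short sketch. It assumes a null prediction means refusing to name a word when its probability falls below a threshold, and that per-word recall and precision are counted over images. The function names, the threshold values, and the data are hypothetical, not taken from the slides.

```python
def predict_word(word_probs, null_threshold=0.0):
    """Return the most probable word, or None (null) if it is not confident."""
    word = max(word_probs, key=word_probs.get)
    return word if word_probs[word] >= null_threshold else None

def recall_precision(word, predicted, actual):
    """Per-word recall and precision over a list of images.

    predicted and actual are parallel lists of word sets, one set per image.
    """
    correct = sum(1 for p, a in zip(predicted, actual) if word in p and word in a)
    n_actual = sum(1 for a in actual if word in a)
    n_predicted = sum(1 for p in predicted if word in p)
    recall = correct / n_actual if n_actual else 0.0
    precision = correct / n_predicted if n_predicted else 0.0
    return recall, precision

# A weak prediction is dropped once a null threshold is applied.
probs = {"sky": 0.40, "water": 0.35, "grass": 0.25}
print(predict_word(probs))                      # sky
print(predict_word(probs, null_threshold=0.5))  # None

# Hypothetical predicted and true annotations for three images.
predicted = [{"sky"}, {"sky", "grass"}, {"grass"}]
actual = [{"sky"}, {"grass"}, {"sky", "grass"}]
print(recall_precision("sky", predicted, actual))  # (0.5, 0.5)
```

Raising the threshold removes uncertain predictions, which tends to cut false positives (higher precision) at the cost of missing some true words (lower recall), matching the direction reported on the slide.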
Results - Correspondence • Base results • Some good words up to 70% correct prediction • Null prediction • Predict good words with greater probability • Word clustering • Prediction rate generally increases
Evaluation • Human evaluation • Images viewed by hand • Somewhat subjective