Addressing the Medical Image Annotation Task using visual words representation

Uri Avni, Tel Aviv University, Israel • Hayit Greenspan, Tel Aviv University, Israel • Jacob Goldberger, Bar Ilan University, Israel

Outline • Challenge description • Proposed system • Image representation • Classification • Results • Parameter optimization • Performance analysis • Conclusion

ImageCLEF 2009 medical annotation challenge • 12,677 classified x-ray images, 1,733 unknown images • Classification according to four labeling sets: 57 classes, 116 classes, 116 IRMA codes, 196 IRMA codes

IRMA database • Noisy images • Irregular brightness and contrast • Non-uniform class distribution • Source: the IRMA group, Aachen University of Technology (RWTH), Germany

IRMA Database - samples • Great intra-class variability • Category #1121-230-961-700: sagittal, mediolateral, left hip

IRMA Database - samples • Great inter-class similarity • Category #1121-110-500-000: overview image, posteroanterior (PA) • Category #1123-112-500-000: high beam energy, posteroanterior (PA), expiration • Category #1123-121-500-000: high beam energy, anteroposterior (AP), inspiration • Category #1121-127-500-000: overview image, anteroposterior (AP), supine


Image representation • Move from a 2D image to a vector of numbers • The representation should preserve enough information about the image content • It should be insensitive to translation, artifacts and noise • Compare and classify the compact representation [Figure: image model, a histogram over word number]

Patch extraction • Extract raw pixels from patches of fixed size • Dense sampling, ~200,000 patches per image • Normalize intensity and variance • Ignore empty patches • Sample several images to build one collection with millions of patches
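The patch extraction step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `extract_patches`, the `step` and `min_std` parameters, and the choice of zero mean and unit variance as the normalization are all assumptions, since the slide only says that intensity and variance are normalized and that empty patches are ignored.

```python
import numpy as np

def extract_patches(image, size=9, step=1, min_std=1e-3):
    """Densely sample size x size patches and return (patches, centers).

    Each kept patch is flattened, shifted to zero mean and scaled to unit
    variance. Patches with a standard deviation below min_std are treated
    as empty and skipped (the threshold is a hypothetical choice).
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    half = size // 2
    patches, centers = [], []
    for r in range(0, h - size + 1, step):
        for c in range(0, w - size + 1, step):
            p = image[r:r + size, c:c + size].ravel()
            std = p.std()
            if std < min_std:  # near-constant patch: treat as empty
                continue
            patches.append((p - p.mean()) / std)
            centers.append((c + half, r + half))  # (x, y) of the patch center
    # reshape keeps a well-defined 2D shape even when no patch survives
    patches = np.array(patches, dtype=float).reshape(-1, size * size)
    centers = np.array(centers, dtype=float).reshape(-1, 2)
    return patches, centers
```

The patch centers are returned alongside the pixels because the position (x,y) is later added to the feature vector.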

Feature space description • Reduce the dimension of the collection with PCA: 9x9 pixel patches to 6 coefficients • Add position (x,y) to the features; the position weight is important • 8-dimensional feature vector (6 PCA coefficients plus x,y)
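One way to build such an 8-dimensional feature vector is sketched below. The helper names `fit_pca` and `make_features` and the `position_weight` parameter are hypothetical, and the PCA is computed with a plain SVD; the slide does not say how the projection or the position weight were actually obtained.

```python
import numpy as np

def fit_pca(patches, n_components=6):
    """Return the mean patch and the top principal directions (as rows)."""
    patches = np.asarray(patches, dtype=float)
    mean = patches.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:n_components]

def make_features(patches, centers, mean, components, position_weight=1.0):
    """Project patches onto the PCA basis and append the weighted (x, y)."""
    coeffs = (np.asarray(patches, dtype=float) - mean) @ components.T
    position = position_weight * np.asarray(centers, dtype=float)
    return np.hstack([coeffs, position])
```

With 9x9 patches (81 pixels) and `n_components=6`, each row of the result has 6 + 2 = 8 entries. A larger `position_weight` makes the later clustering favor spatially compact words; its value would have to be tuned, for example by cross-validation.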

Build dictionary • Select k feature vectors as far apart as possible • Run k-means clustering [Figures: cluster centers; cluster centers with x,y]
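The two dictionary-building steps might look like the sketch below. It reads the first bullet as the initialization for k-means, which the slide does not state explicitly, and uses a greedy farthest-point rule as one plausible way to pick vectors "as far apart as possible". All function names are hypothetical, and the brute-force distance computation is only suitable for small illustrative inputs, not for millions of patches.

```python
import numpy as np

def farthest_point_init(features, k, first=0):
    """Greedily pick k rows, each as far as possible from those already chosen."""
    chosen = [first]
    dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        # distance of every point to its nearest chosen point
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return features[chosen].copy()

def assign(features, centers):
    """Index of the nearest center for every feature vector."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(features, k, n_iter=20):
    """Plain Lloyd iterations starting from the farthest-point seeds."""
    features = np.asarray(features, dtype=float)
    centers = farthest_point_init(features, k)
    for _ in range(n_iter):
        labels = assign(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers
```

The returned cluster centers play the role of the visual words in the dictionary.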

Image representation • Scan the image and translate its patches into a histogram of words [Figure: dictionary and image mapped to a histogram; axes: word number, probability]
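Translating patches to a word histogram can be sketched as a nearest-center lookup followed by counting. The function name `word_histogram` is hypothetical, and Euclidean distance to the dictionary entries is assumed because the dictionary comes from k-means.

```python
import numpy as np

def word_histogram(features, dictionary):
    """Map each feature vector to its nearest word and return word frequencies."""
    features = np.asarray(features, dtype=float)
    dictionary = np.asarray(dictionary, dtype=float)
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest dictionary entry
    counts = np.bincount(words, minlength=len(dictionary)).astype(float)
    total = counts.sum()
    # normalize so that the histogram sums to 1 (a probability per word)
    return counts / total if total > 0 else counts
```

The result has one entry per dictionary word regardless of image size, which is what makes images directly comparable.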

Image representation • Use multiple scales

Classification • Examine a k-NN classifier with different distance metrics • One-vs-one multiclass SVM classifier, with n(n-1)/2 binary classifiers for n classes • Examine several SVM kernels: radial basis function, chi-square, histogram intersection
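The three kernels and the one-vs-one classifier count can be written out as below. The slide does not give the kernel formulas, so these are common textbook forms and should be read as assumptions: in particular, the chi-square kernel is shown in its exponential form, and `gamma` and `eps` are hypothetical parameters.

```python
import numpy as np

def rbf_kernel(h1, h2, gamma=1.0):
    """Radial basis function kernel on two histograms."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return float(np.exp(-gamma * np.sum((h1 - h2) ** 2)))

def chi_square_kernel(h1, h2, gamma=1.0, eps=1e-12):
    """Exponential chi-square kernel; eps guards bins that are empty in both."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    chi2 = np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    return float(np.exp(-gamma * chi2))

def histogram_intersection_kernel(h1, h2):
    """Sum of bin-wise minima; equals 1 for identical normalized histograms."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return float(np.sum(np.minimum(h1, h2)))

def n_one_vs_one_classifiers(n_classes):
    """Number of binary SVMs in a one-vs-one scheme: n(n-1)/2."""
    return n_classes * (n_classes - 1) // 2
```

For the label sets in this task, `n_one_vs_one_classifiers(57)` gives 1596 binary classifiers and `n_one_vs_one_classifiers(196)` gives 19110, which illustrates how quickly the one-vs-one scheme grows with the number of classes.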

Outline • Our objective • Proposed system • Image representation • Retrieval & classification • Results • Parameter optimization • Performance analysis • Conclusion and future work

Selecting classifier type • Effect of histogram distance metric in k-nearest neighbors vs. SVM classifier [Figure legend: symmetric Kullback-Leibler divergence, Jeffrey divergence, SVM]
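The two histogram distances named here can be sketched as follows. The definitions are assumptions: the slide gives no formulas, and because it lists the two separately, the Jeffrey divergence is taken in the form common in image retrieval, where each histogram is compared against the bin-wise mean of the pair. The `eps` smoothing term is a hypothetical guard against empty bins.

```python
import math

def kl(p, q, eps=1e-10):
    """Kullback-Leibler divergence KL(p || q) with a small smoothing term."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q, eps=1e-10):
    """KL(p || q) + KL(q || p)."""
    return kl(p, q, eps) + kl(q, p, eps)

def jeffrey_divergence(p, q, eps=1e-10):
    """KL of each histogram to their bin-wise mean, summed."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m, eps) + kl(q, m, eps)
```

Unlike the symmetric Kullback-Leibler divergence, this form of the Jeffrey divergence stays bounded when a bin is empty in one histogram but not the other, which is a frequent situation with sparse word histograms.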

Selecting feature space • Effect of parameters on classification accuracy, using 20 cross-validation experiments [Figure legend: with x,y; no x,y]

Selecting features • Selecting the type of features: invariance vs. discriminative power tradeoff • Scale and rotation invariance are not desired

Running time • 12,677 images • Running on an Intel dual quad-core Xeon, 2.33 GHz

Selecting dictionary

Selecting dictionary Using multiple dictionaries for 3 scales increases classification accuracy by 0.5%

Classification results – effect of kernel Effect of kernel function on SVM classifier, for optimal kernel parameters

Classification results – confusion matrix • Confusion matrix of 2,000 random test images (2007 labels): 91.95% correct

Submission to ImageCLEF 2009 medical annotation task • One run submitted • Use the same classifier for the 4 label sets (2005, 2006, 2007, 2008) • Ignore the IRMA code hierarchy • Don’t use wildcards

Conclusion & future work • Using visual words with simple features and dense sampling is efficient and accurate in general x-ray annotation • We are applying the system to pathology classification of chest x-rays, together with Sheba Medical Center [Figure: example chest x-rays labeled healthy, enlarged heart, lung infiltrate, left + right effusion]

Thank you.
