
Clustering Algorithms for Perceptual Image Hashing


Presentation Transcript


  1. IEEE Eleventh DSP Workshop, August 3rd 2004 Clustering Algorithms for Perceptual Image Hashing Vishal Monga, Arindam Banerjee, and Brian L. Evans {vishal, abanerje, bevans}@ece.utexas.edu Embedded Signal Processing Laboratory Dept. of Electrical and Computer Engineering The University of Texas at Austin http://signal.ece.utexas.edu Research supported by a gift from the Xerox Foundation

2. Hash Example • Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed (smaller) number of members • Irreversible • Provides a short, simple representation of a large digital message • Example: sum of the ASCII codes for the characters in a name, modulo N, a prime number (N = 7) • Database name search example
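A minimal Python sketch of the toy hash described on this slide (the modulus N = 7 is from the slide; the example names are made up for illustration):

```python
def name_hash(name: str, n: int = 7) -> int:
    """Toy hash: sum of the ASCII codes of the characters, modulo a prime N."""
    return sum(ord(c) for c in name) % n

# Database name search example: each name lands in one of N buckets.
for name in ["Vishal", "Arindam", "Brian"]:
    print(name, "->", name_hash(name))
```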

3. Perceptual Hash: Desirable Properties • Perceptual robustness • Fragility to distinct inputs • Randomization: necessary in security applications to minimize vulnerability against malicious attacks

4. Hashing Framework [Block diagram: Input Image → Feature Vector Extraction → Visually Robust Feature Vector → Compress (or cluster) Feature Vectors → Final Hash] • Two-stage hash algorithm • Goal: retain perceptual significance • Let (li, lj) denote vectors in the metric space of feature vectors V, and let 0 < ε < δ; then it is desired that feature vectors close to each other map to the same hash while feature vectors far apart do not (see the formalization below) • Minimizing the average distance between clusters is inappropriate
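The desired property is not reproduced in the transcript; the following LaTeX is a plausible formalization, assuming the standard two-sided perceptual-hash requirement, with C(·) denoting the cluster (hash value) assigned to a feature vector:

```latex
% Assumed formalization of the requirement on the clustering C(.):
% nearby feature vectors collide, distant ones do not.
D(l_i, l_j) < \varepsilon \;\Rightarrow\; C(l_i) = C(l_j),
\qquad
D(l_i, l_j) > \delta \;\Rightarrow\; C(l_i) \neq C(l_j),
\qquad 0 < \varepsilon < \delta .
```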

5. Cost Function for Feature Vector Compression • Define joint cost matrices C1 and C2 (n × n), where n is the total number of vectors to be clustered and C(li), C(lj) denote the clusters to which these vectors are mapped • Exponential cost: ensures a severe penalty when feature vectors that are far apart ("perceptually distinct") are clustered together • α > 0, Γ > 1 are algorithm parameters
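The transcript omits the actual cost expressions. The LaTeX below sketches one plausible exponential form consistent with the description above (Γ > 1 makes the penalty grow with distance); the specific expression is an assumption, not the authors' formula, and C2 would be defined analogously for the opposite error (close vectors mapped to different clusters):

```latex
% Assumed form (not the authors' exact expression): C1 penalizes far-apart
% ("perceptually distinct") vectors that are mapped to the same cluster.
C_1(i,j) =
\begin{cases}
\alpha\, \Gamma^{\,D(l_i,\,l_j)} & \text{if } D(l_i, l_j) > \delta \ \text{and}\ C(l_i) = C(l_j),\\[2pt]
0 & \text{otherwise,}
\end{cases}
\qquad \alpha > 0,\ \Gamma > 1 .
```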

6. Cost Function for Feature Vector Compression • Define S1 as a normalization constant for C1 (S2 is defined similarly for C2) • Normalize to obtain the normalized cost matrices • Then, minimize the "expected" cost, weighting entries by the probabilities p(i) = p(li), p(j) = p(lj)
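The definitions of S1 and of the expected cost are not reproduced in the transcript; a plausible reading, assuming S1 simply sums the entries of C1 so that the normalized cost can then be weighted by the feature-vector probabilities:

```latex
% Assumed normalization and expected cost, following the slide's notation.
S_1 = \sum_{i}\sum_{j} C_1(i,j), \qquad
\tilde{C}_1(i,j) = \frac{C_1(i,j)}{S_1}, \qquad
E[\tilde{C}_1] = \sum_{i}\sum_{j} p(i)\, p(j)\, \tilde{C}_1(i,j).
```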

7. Basic Clustering Algorithm 1. Obtain ε, δ; set k = 1. Select the data point with the highest probability mass and label it l1 2. Form the first cluster by including all unclustered points lj such that D(l1, lj) < ε/2 3. k = k + 1. Among the unclustered points, select the highest-probability data point lk that is sufficiently far from every existing cluster S, where C is the set of clusters formed till this step 4. Form the kth cluster Sk by including all unclustered points lj such that D(lk, lj) < ε/2 5. Repeat steps 3–4 until no more clusters can be formed (a sketch of the procedure follows below)
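A minimal Python sketch of this greedy procedure, under stated assumptions: the feature vectors come with a probability mass `p` and a metric `D`, and, since the transcript omits the exact step-3 condition, a new cluster center is simply required to be at least 2ε away from every already-clustered point:

```python
def basic_clustering(points, p, D, eps):
    """Greedy clustering: highest-probability points become centers;
    each cluster collects the unclustered points within eps/2 of its center."""
    unclustered = set(range(len(points)))
    clusters = []
    while unclustered:
        # Candidate centers: far enough (assumed: >= 2*eps) from all clustered points.
        clustered = [j for S in clusters for j in S]
        candidates = [i for i in unclustered
                      if all(D(points[i], points[j]) >= 2 * eps for j in clustered)]
        if not candidates:
            break  # no more clusters can be formed
        center = max(candidates, key=lambda i: p[i])           # highest probability mass
        members = [j for j in unclustered
                   if D(points[center], points[j]) < eps / 2]  # includes the center itself
        clusters.append(members)
        unclustered -= set(members)
    return clusters, unclustered  # leftover points are handled by Approach 1 or 2
```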

8. Observations • For any (li, lj) in cluster Sk, D(li, lj) ≤ ε (both points lie within ε/2 of lk, so the triangle inequality bounds their distance by ε) • No errors up to this stage of the algorithm: each cluster is at least ε away from any other cluster, and within each cluster the maximum distance between any two points is at most ε

9. Approach 1 1. Select the data point l* among the unclustered data points that has the highest probability mass 2. For each existing cluster Si, i = 1, 2, …, k, compute the distance di from l* to Si; let S(δ) = {Si such that di ≤ δ} 3. IF S(δ) = {Φ} THEN k = k + 1 and Sk = {l*} becomes a cluster of its own, ELSE for each Si in S(δ) define a cost F(Si) in terms of Si and its complement, i.e. all clusters in S(δ) except Si; then l* is assigned to the cluster S* = arg min F(Si) 4. Repeat steps 1 through 3 until all data points are exhausted (a sketch follows below)
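A Python sketch of this assignment rule for the leftover points, under stated assumptions: `di` is taken as the minimum distance from l* to any member of Si, and `F` is a caller-supplied cost function, since its exact form is not reproduced in the transcript:

```python
def approach_one(leftover, points, p, D, delta, clusters, F):
    """Assign remaining points: join the admissible cluster minimizing F,
    or start a new cluster if no existing cluster is within delta."""
    order = sorted(leftover, key=lambda i: p[i], reverse=True)  # highest probability first
    for i in order:
        # Distance from l* to each existing cluster (assumed: min over its members).
        dists = [min(D(points[i], points[j]) for j in S) for S in clusters]
        admissible = [k for k, d in enumerate(dists) if d <= delta]  # S(delta)
        if not admissible:
            clusters.append([i])                 # l* becomes a cluster of its own
        else:
            best = min(admissible, key=lambda k: F(i, k, admissible, clusters))
            clusters[best].append(i)
    return clusters
```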

10. Approach 2 1. Select the data point l* among the unclustered data points that has the highest probability mass 2. For each existing cluster Si, i = 1, 2, …, k, define a cost F(Si) in terms of Si and its complement (all existing clusters except Si), where β lies in [1/2, 1]; then l* is assigned to the cluster S* = arg min F(Si) 3. Repeat steps 1 and 2 until all data points are exhausted (a sketch follows below)
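A sketch of the Approach 2 assignment, assuming F(Si) trades off two per-cluster cost terms, `cost_in` (cost of placing l* in Si) and `cost_out` (cost with respect to the other clusters), via β ∈ [1/2, 1]; both terms and the weighting stand in for the transcript's omitted expressions:

```python
def approach_two(leftover, points, p, clusters, cost_in, cost_out, beta=0.5):
    """Assign every remaining point to the cluster minimizing a beta-weighted cost."""
    assert 0.5 <= beta <= 1.0
    order = sorted(leftover, key=lambda i: p[i], reverse=True)  # highest probability first
    for i in order:
        def F(k):
            # beta trades off the two error terms (beta = 1/2: joint minimization).
            return beta * cost_in(i, k, clusters) + (1 - beta) * cost_out(i, k, clusters)
        best = min(range(len(clusters)), key=F)
        clusters[best].append(i)
    return clusters
```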

11. Summary • Approach 1: tries to minimize one normalized cost conditioned on the other being 0 • Approach 2: smoothly trades off the minimization of the two costs against each other via the parameter β; β = ½ → joint minimization, β = 1 → exclusive minimization of one of the two costs • Final hash length determined automatically! Given by ⌈log2 k⌉ bits, where k is the number of clusters formed • Proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein

12. Clustering Results • Compress a binary feature vector of L = 240 bits • Final hash length = 46 bits with Approach 2, β = 1/2 • Value of the cost function is orders of magnitude lower for the proposed clustering

13. Conclusion & Future Work • Two-stage framework for image hashing • Feature extraction followed by feature vector compression • Second stage is media independent • Clustering algorithms for compression • Novel cost function for hashing applications • Applicable to feature vectors in any metric space • Trade-offs facilitated between robustness and fragility • Final hash length determined automatically • Future work • Randomized clustering for secure hashing • Information-theoretically secure hashing
