A New Approach for Video Text Detection and Localization

Presentation Transcript


  1. A New Approach for Video Text Detection and Localization
     M. Cai, J. Song and M.R. Lyu
     VIEW Technologies, The Chinese University of Hong Kong

  2. Related work
     • Text area detection
       • Uncompressed-domain methods
         • Texture-based
         • Color-based
         • Edge-based
       • Compressed-domain methods
         • DCT coefficients
         • Number of intra-coded blocks in P- and B-frames
     • Text string localization
       • Bottom-up scheme
       • Top-down scheme

  3. Language-independent characteristics
     • Contrast
       • An adaptive contrast threshold, set according to background complexity
     • Color
       • Color bleeding caused by compression
     • Orientation
       • Well-defined size and orientation make text easy to read
     • Stationary location
       • Text stays on screen for a relatively long time

  4. Language-dependent characteristics

  5. Workflow
     Sampling & color space conversion → Video text detection and localization on every sampled frame → Multi-frame comparison
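The workflow above can be sketched end to end as follows. The slides give neither the sampling rate, the target color space, nor the rule used for multi-frame comparison, so this is only a sketch under stated assumptions: frames are sampled at a fixed step, RGB is converted to luminance with the ITU-R BT.601 weights, and a text box is kept only if it reappears at roughly the same place (intersection over union at least `min_iou`) in at least `min_frames` consecutive sampled frames, which reflects the "stationary location" characteristic from slide 3. All function and parameter names (`sample_frames`, `to_luma`, `persistent_regions`, `min_iou`, `min_frames`) are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Box = Tuple[int, int, int, int]       # (x0, y0, x1, y1), x1/y1 exclusive


def sample_frames(frames: Sequence[T], step: int) -> List[T]:
    """Hypothetical fixed-rate sampling: keep every `step`-th frame."""
    return list(frames[::step])


def to_luma(frame: List[List[Pixel]]) -> List[List[float]]:
    """RGB -> luminance with ITU-R BT.601 weights (assumed color space)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def persistent_regions(per_frame: List[List[Box]], min_frames: int = 2,
                       min_iou: float = 0.5) -> List[Box]:
    """Keep boxes that stay in roughly the same place for >= min_frames sampled frames."""
    kept: List[Box] = []
    for t, boxes in enumerate(per_frame):
        for b in boxes:
            # Only start a run where the text first appears, to avoid duplicates.
            if t > 0 and any(iou(b, p) >= min_iou for p in per_frame[t - 1]):
                continue
            run, cur = 1, b
            for later in per_frame[t + 1:]:
                match = next((c for c in later if iou(cur, c) >= min_iou), None)
                if match is None:
                    break
                run, cur = run + 1, match
            if run >= min_frames:
                kept.append(b)
    return kept
```

Chaining the match from frame to frame (rather than always comparing against the first box) tolerates small jitter in the detected boxes while still rejecting regions that flash up in a single sampled frame.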

  6. A sequential multi-resolution paradigm
     [Flowchart: Original image → Edge detection → Edge map. Then, for each level l = 1, 2, ..., n in turn: resize by Size/f(l) → Text area detection → Text regions → Text string localization → resize by Size·f(l) → Original coordinates of text regions. Output: final text regions with original coordinates.]
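The multi-resolution loop can be sketched as below. The slides only name the scale factor f(l); this sketch assumes f(l) = 2^(l-1), shrinks the edge map by keeping the strongest edge in each f(l) × f(l) block, and runs the levels one after another, pooling the results. The per-level detector is passed in as a stand-in callable. The names `f`, `downscale_max`, `to_original`, `multi_resolution` and `detect_and_localize` are hypothetical, and the paper may pass information between levels in ways this sketch does not.

```python
from typing import Callable, List, Tuple

EdgeMap = List[List[int]]
Box = Tuple[int, int, int, int]       # (x0, y0, x1, y1), x1/y1 exclusive


def f(level: int) -> int:
    """Hypothetical scale factor: 1 at level 1, doubling at each level."""
    return 2 ** (level - 1)


def downscale_max(edge: EdgeMap, factor: int) -> EdgeMap:
    """Shrink by `factor`, keeping the strongest edge in each factor x factor block."""
    h, w = len(edge), len(edge[0])
    return [
        [max(edge[y][x]
             for y in range(by, min(by + factor, h))
             for x in range(bx, min(bx + factor, w)))
         for bx in range(0, w, factor)]
        for by in range(0, h, factor)
    ]


def to_original(box: Box, k: int, width: int, height: int) -> Box:
    """Map a box found at scale 1/k back to original coordinates (clipped)."""
    x0, y0, x1, y1 = box
    return (x0 * k, y0 * k, min(x1 * k, width), min(y1 * k, height))


def multi_resolution(edge: EdgeMap, levels: int,
                     detect_and_localize: Callable[[EdgeMap], List[Box]]) -> List[Box]:
    height, width = len(edge), len(edge[0])
    results: List[Box] = []
    for level in range(1, levels + 1):
        k = f(level)
        small = edge if k == 1 else downscale_max(edge, k)    # Size / f(l)
        for box in detect_and_localize(small):
            results.append(to_original(box, k, width, height))  # Size * f(l)
    return results
```

Max-pooling is one reasonable way to downscale an edge map because it keeps thin strokes visible at coarse levels, where large text becomes compact enough to detect.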

  7. Text detection
     • Edge detection
       • Sobel edge detector
     • Local thresholding
       • Adaptive to background complexity
     • Text-like area recovery
       • Enhances the density of text areas
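The edge detection step can be illustrated with a plain Sobel operator. The slides name the Sobel detector but not how the horizontal and vertical responses are combined, so this sketch assumes the Euclidean magnitude sqrt(gx² + gy²) and leaves border pixels at 0. `sobel_magnitude` is a hypothetical name; pure Python is used so the example runs without extra libraries.

```python
import math
from typing import List

Gray = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal (KX) and vertical (KY) gradients.
KX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
KY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img: Gray) -> Gray:
    """Edge strength per pixel; the one-pixel border is left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * p
                    gy += KY[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

On an image whose intensity steps from 0 to 10 between two columns, the pixels next to the step get strength 40 and flat areas get 0.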

  8. Local thresholding
     [Figure: (a) concentric kernel (height h) and window (height 3h); (b) a window on a multi-line text area and the horizontal projection P1, ..., P3h in it; (c) local threshold selection from the edge-strength histogram (count vs. edge strength, 0 to MAX), divided into a low part and a high part.]
     • Use a small kernel (shown in gray) to scan the whole edge map row by row.
     • In the bigger window surrounding the kernel, check the background type: "Clear" or "Noisy".
     • For a Clear background, determine the local threshold from the low part of the edge-strength histogram in the bigger window; for a Noisy background, use the high part.
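The local thresholding procedure can be sketched as follows, but the slides leave the key rules unspecified, so every rule here is a hypothetical stand-in. The kernel is assumed to be h × h and the window 3h × 3h around it. The window is called "Clear" if some row of its horizontal projection is nearly empty (as in the gaps between text lines), otherwise "Noisy". The edge-strength histogram is split at its mean into a low and a high part, and the threshold is the mean strength of the low part (Clear) or the high part (Noisy). The names `is_clear`, `window_threshold`, `local_threshold` and `quiet_fraction` are hypothetical.

```python
from collections import Counter
from typing import List

EdgeMap = List[List[int]]


def is_clear(rows: List[List[int]], quiet_fraction: float = 0.1) -> bool:
    """Hypothetical rule: Clear if some projection row is nearly empty."""
    proj = [sum(r) for r in rows]            # horizontal projection P1..P3h
    peak = max(proj)
    return peak == 0 or min(proj) <= quiet_fraction * peak


def window_threshold(values: List[int], clear: bool) -> float:
    """Threshold from the low (Clear) or high (Noisy) part of the histogram."""
    hist = Counter(v for v in values if v > 0)
    if not hist:
        return float("inf")                  # no edges in the window: keep nothing
    mean = sum(v * c for v, c in hist.items()) / sum(hist.values())
    low = {v: c for v, c in hist.items() if v <= mean}
    high = {v: c for v, c in hist.items() if v > mean} or low
    part = low if clear else high
    return sum(v * c for v, c in part.items()) / sum(part.values())


def local_threshold(edge: EdgeMap, h: int) -> EdgeMap:
    """Scan an h x h kernel row by row; judge it by its 3h x 3h window."""
    H, W = len(edge), len(edge[0])
    out = [[0] * W for _ in range(H)]
    for ky in range(0, H, h):
        for kx in range(0, W, h):
            y0, y1 = max(0, ky - h), min(H, ky + 2 * h)
            x0, x1 = max(0, kx - h), min(W, kx + 2 * h)
            rows = [edge[y][x0:x1] for y in range(y0, y1)]
            t = window_threshold([v for r in rows for v in r], is_clear(rows))
            for y in range(ky, min(ky + h, H)):
                for x in range(kx, min(kx + h, W)):
                    if edge[y][x] >= t:
                        out[y][x] = edge[y][x]   # keep strength for the recovery step
    return out
```

Using the low part on Clear backgrounds keeps fainter text strokes, while the high part on Noisy backgrounds rejects more of the clutter; that is the asymmetry the slide describes, whatever exact statistic the authors used.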

  9. Thresholding result comparison
     [Figure: a video image, its global thresholding results, and its local thresholding results.]

  10. Text-like area recovery
     [Figure: edge map before recovery and after recovery.]
     • Labeling: classify the current edge pixels as "TEXT" or "NON_TEXT" based on their local density.
     • Recovery/Suppression:
       • Bring back the lower-strength edge pixels neighboring the TEXT edge pixels.
       • Suppress the NON_TEXT edge pixels.
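Labeling, recovery and suppression can be sketched as below. The slides do not define "local density" or which weaker pixels count as recoverable, so this sketch assumes: density is the fraction of surviving edge pixels in a (2r+1) × (2r+1) neighbourhood, a pixel is TEXT when that fraction is at least `d_min`, and the 8-neighbours of a TEXT pixel are brought back from the unthresholded edge map if their strength is at least `t_low`. The names `density`, `recover`, `r`, `d_min` and `t_low` are hypothetical.

```python
from typing import List

EdgeMap = List[List[int]]


def density(binary: List[List[int]], y: int, x: int, r: int) -> float:
    """Fraction of edge pixels in the (2r+1)^2 neighbourhood (clipped at borders)."""
    H, W = len(binary), len(binary[0])
    ys = range(max(0, y - r), min(H, y + r + 1))
    xs = range(max(0, x - r), min(W, x + r + 1))
    return sum(binary[yy][xx] for yy in ys for xx in xs) / (len(ys) * len(xs))


def recover(edge: EdgeMap, strong: EdgeMap, r: int = 1,
            d_min: float = 0.3, t_low: int = 1) -> EdgeMap:
    """edge: all edge strengths; strong: the map after local thresholding."""
    H, W = len(edge), len(edge[0])
    binary = [[1 if v > 0 else 0 for v in row] for row in strong]
    out = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            if not binary[y][x]:
                continue
            if density(binary, y, x, r) < d_min:
                continue                          # NON_TEXT: suppressed (left at 0)
            out[y][x] = edge[y][x]                # TEXT: kept
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W and edge[ny][nx] >= t_low:
                        out[ny][nx] = edge[ny][nx]  # bring back weaker neighbour
    return out
```

In this sketch an isolated strong edge is removed even though it passed thresholding, while a faint pixel touching a dense text stroke is restored, which is how the step raises the density of text areas.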

  11. Coarse-to-fine text localization
     • Projection-based top-down localization
     • Handles complex text layouts
     [Flowchart: Initialization: the whole edge map is the only region in the processing array. Pop the first region from the processing array; if the array is empty, terminate. Test the region with horizontal and vertical projections: if it is divisible, add each sub-region to the processing array. Indivisible regions are checked by aspect ratio: regions that pass are added to the resulting text regions, and false regions are discarded.]
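The coarse-to-fine loop can be sketched with a FIFO processing array. The slides do not say when a projection makes a region "divisible" or which aspect ratio is accepted, so this sketch assumes: a region splits wherever its projection has a run of at least `min_row_gap` (rows) or `min_col_gap` (columns) zero entries, a region that does not split is tightened to its non-empty extent, and an indivisible region is kept as text only if width/height is at least `min_aspect`. The names `runs`, `localize` and all three parameters are hypothetical. Edge strengths are assumed to be non-negative.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]       # (x0, y0, x1, y1), x1/y1 exclusive


def runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """[start, end) spans of nonzero entries; gaps shorter than min_gap are bridged."""
    spans: List[Tuple[int, int]] = []
    for i, v in enumerate(profile):
        if v == 0:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1] = (spans[-1][0], i + 1)     # gap too small: extend current span
        else:
            spans.append((i, i + 1))
    return spans


def localize(edge: List[List[int]], min_row_gap: int = 1, min_col_gap: int = 3,
             min_aspect: float = 1.5) -> List[Box]:
    H, W = len(edge), len(edge[0])
    queue = deque([(0, 0, W, H)])                 # whole edge map is the only region
    found: List[Box] = []
    while queue:
        x0, y0, x1, y1 = queue.popleft()          # pop the first region
        rows = [sum(edge[y][x0:x1]) for y in range(y0, y1)]   # horizontal projection
        row_spans = runs(rows, min_row_gap)
        if not row_spans:
            continue                              # empty region
        if len(row_spans) > 1:                    # divisible: queue sub-regions
            queue.extend((x0, y0 + a, x1, y0 + b) for a, b in row_spans)
            continue
        y0, y1 = y0 + row_spans[0][0], y0 + row_spans[0][1]
        cols = [sum(edge[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
        col_spans = runs(cols, min_col_gap)       # vertical projection
        if len(col_spans) > 1:
            queue.extend((x0 + a, y0, x0 + b, y1) for a, b in col_spans)
            continue
        x0, x1 = x0 + col_spans[0][0], x0 + col_spans[0][1]
        if (x1 - x0) / (y1 - y0) >= min_aspect:   # aspect-ratio check
            found.append((x0, y0, x1, y1))
        # otherwise the region is discarded as a false region
    return found
```

Because sub-regions go back into the array, a region split by columns can later be split by rows again, which is what lets a top-down scheme cope with nested layouts such as side-by-side captions.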

  12. Localization steps
     [Figure: panels (1) to (4) showing the successive localization steps.]

  13. Experimental results

  14. Experimental results

  15. Performance statistics
     Statistics over 10 news videos:
     • Processing time per frame: 0.25 s (Pentium III 1 GHz CPU)
     • Detection rate: 93.6%
     • Detection accuracy: 87.2%
     • Localization accuracy: > 90%
