Vision

University of Oulu Face Database; Yale Face Database

What are the requirements for a computer vision system capable of recognizing faces? What problems must be solved? What special techniques are required? What uses could be made of a working application?

What are the requirements for a computer vision system capable of interpreting sign language? What problems must be solved? What special techniques are required? What uses could be made of a working application?

What are the requirements for a computer vision system capable of autonomous vehicle navigation? What problems must be solved? What special techniques are required? What uses could be made of a working application?

Introduction to the Human Vision System
• Reference: Essentials of Neural Science and Behavior (Kandel, Schwartz, Jessell)
• What the vision system does:
  • Solves the binding problem
  • Creates 3D perception from 2D projections
  • Segments into figure and ground
  • Forms perceptual groupings
  • Conjectures spatial relationships
  • Infers lighting and color
  • Infers edges
  • Feeds higher cognitive centers
• Consider possible similarities between vision and mental imagery (compare to computer vision and graphics)
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

The Vision System… Solves the binding problem (what, where, motion, color, texture)

Three parallel pathways:
• Depth and form
• Motion
• Color
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

The Vision System… Creates 3D perception from 2D projections

The Vision System… Segments into figure and ground

The Vision System… Forms perceptual groupings

The Vision System… Forms perceptual groupings (Wertheimer: laws of similarity, proximity, common fate, good continuation, set, past experience)

The Vision System… Conjectures spatial relationships (Müller-Lyer; vanishing perspective) 

The Vision System… Infers lighting and color

The Vision System… Infers edges (Kanizsa triangles)

The Eye as a Camera
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

The Retina
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

Binocular Vision
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

Feature Map Hypothesis
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

Felleman and Van Essen’s Vision System Circuits (1991)
From Suzuki and Amaral in Crick (1994)

Saccades
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

Vision is more than what happens at the eye
Essentials of neural science and behavior (Kandel, J. Schwartz, & T. Jessell, 1995)

Computer Vision
Objective: Make useful decisions about real physical objects and scenes based on sensed images.
Computer vision = machine vision = image understanding
Critical Issues:
• Sensing: How do sensors obtain images of the world? How do the images encode properties of the world, such as material, shape, illumination, and spatial relationships?
• Encoded Information: How do images yield information for understanding the 3D world, including the geometry, texture, motion, and identity of objects in it?
• Representations: What representations should be used for stored descriptions of objects, their parts, properties, and relationships?
• Algorithms: What methods are there to process image information and construct descriptions of the world and its objects?
Computer Vision (Shapiro and Stockman, 2001)

Some problems:
• Brightness changes across an image
• Color changes across images
• Feature blending (to obscure edges)
• Feature variation across images
• Object occlusion
Computer Vision (Shapiro and Stockman, 2001)

Is it easy for you to decide the gender and approximate age of persons pictured in magazine ads? Psychologists might tell us that humans have the ability to see a face and immediately decide on the age, sex, and degree of hostility of the person. Assume that this is the case. Is such an ability based on image features? If so, what are they? If not, how are such decisions made? Computer Vision (Shapiro and Stockman, 2001)

Identify the age, sex, and degree of hostility for each of the following images:

Who is this?

Which invariant features of the following objects enable you to recognize them in rain or sunshine, alone or alongside other objects, from the front or side:
• your tennis shoes
• your front door
• your mother
• your favorite make of automobile?
Computer Vision (Shapiro and Stockman, 2001)

Selfridge’s Model (cont.)

Image Representation
Digital Images
Computer Vision (Shapiro and Stockman, 2001)

Color Systems
RGB: Red-Green-Blue additive system
• (R,G,B) → (0-255, 0-255, 0-255)
• Colors are created by adding components to black (0,0,0)
• Works well for monitors based on use of three types of phosphors
CMY: Cyan-Magenta-Yellow subtractive system
• Colors are created by subtracting from white (255,255,255)
• Useful for printing on white paper
• Cyan absorbs red, magenta absorbs green, yellow absorbs blue
HSI: Hue-Saturation-Intensity system
• Separates intensity from two chromaticity values (H and S)
• Project color cube (Figure 6.7) along its major diagonal to produce Figure 6.8
YIQ: Uses one luminance value Y and two chromaticity values I and Q
• NTSC television standard
YUV: Used in JPEG and MPEG compression algorithms and some video products
Note that YIQ and YUV permit better compression than other schemes because luminance and chrominance can be encoded using different numbers of bits.
Computer Vision (Shapiro and Stockman, 2001)
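As a rough illustration of how these systems relate, the sketch below converts an RGB triple to CMY by subtracting each component from white, and to YIQ with a linear transform. The class name, method names, and the YIQ coefficients (commonly quoted NTSC values) are assumptions for illustration and do not come from the slides.

```java
public class ColorConversionSketch {

    // RGB -> CMY: each subtractive component is what is left after
    // taking the additive component away from white (255).
    static int[] rgbToCmy(int r, int g, int b) {
        return new int[] {255 - r, 255 - g, 255 - b};
    }

    // RGB -> YIQ: one luminance value (Y) and two chromaticity values (I, Q).
    // The coefficients are commonly quoted NTSC values (an assumption here).
    static double[] rgbToYiq(int r, int g, int b) {
        double y = 0.299 * r + 0.587 * g + 0.114 * b;
        double i = 0.596 * r - 0.275 * g - 0.321 * b;
        double q = 0.212 * r - 0.523 * g + 0.311 * b;
        return new double[] {y, i, q};
    }

    public static void main(String[] args) {
        int[] cmy = rgbToCmy(255, 0, 0); // pure red
        System.out.println("CMY of red: " + cmy[0] + ", " + cmy[1] + ", " + cmy[2]);

        double[] yiq = rgbToYiq(128, 128, 128); // mid gray
        System.out.println("YIQ of gray: " + yiq[0] + ", " + yiq[1] + ", " + yiq[2]);
    }
}
```

For a gray pixel the two chromaticity values come out at approximately zero, which is why luminance can be given more bits than chrominance without a visible penalty on neutral content.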

Color Systems
Computer Vision (Shapiro and Stockman, 2001)

Using Color for Vision Processing
Computer Vision (Shapiro and Stockman, 2001)

Image file format used in this class
• Type: an integer value 1, 2, or 3, where 1 = 24-bit color, 2 = 256-level gray scale, and 3 = binary (black/white)
• Columns: an integer giving the # of columns in the image
• Rows: an integer giving the # of rows in the image
• Pixel data: Each pixel value is stored in a single byte for types 2 and 3, and in three bytes for type 1 (in the order R, G, B). Assumes pixel data has already been inverted by the program that saved the data.
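The pixel-data rule above determines how many bytes a reader should expect after the three header integers. The sketch below computes that count; the class and method names are hypothetical, and it deliberately says nothing about how the header integers themselves are encoded, since the slide does not specify that.

```java
public class ImageFormatSketch {

    // Bytes per pixel for each image type described on the slide:
    // type 1 = 24-bit color (R, G, B), types 2 and 3 = one byte per pixel.
    static int bytesPerPixel(int type) {
        switch (type) {
            case 1:
                return 3;
            case 2:
            case 3:
                return 1;
            default:
                throw new IllegalArgumentException("Unknown image type: " + type);
        }
    }

    // Total size of the pixel-data section in bytes.
    static long pixelDataBytes(int type, int columns, int rows) {
        return (long) bytesPerPixel(type) * columns * rows;
    }

    public static void main(String[] args) {
        System.out.println(pixelDataBytes(1, 640, 480)); // 24-bit color
        System.out.println(pixelDataBytes(2, 640, 480)); // gray scale
    }
}
```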

Thresholding to produce binary images
Computer Vision (Shapiro and Stockman, 2001)

//Method: colorToBinary
//Description: Converts a 24-bit RGB image to a binary (2-color) image.
//Parameters: h        - # rows in the image space
//            w        - # columns in the image space
//            r[ ][ ]  - red values for each pixel
//            g[ ][ ]  - green values for each pixel
//            b[ ][ ]  - blue values for each pixel
//            t        - the binary threshold. RGB color values for a given
//                       pixel are averaged then divided by 2 (i.e.,
//                       summed and divided by 6), then compared to
//                       this value. Values above the threshold are
//                       considered white; all other values, black.
//Returns: binary[ ][ ] - binary pixel values for converted image
//Calls: nothing
public int[][] colorToBinary(int h, int w, int[][] r, int[][] g, int[][] b, int t)
{
    // Size the output from the parameters h and w.
    int[][] binary = new int[h][w];
    for (int row = 0; row < h; row++)
        for (int col = 0; col < w; col++)
            if (((r[row][col] + g[row][col] + b[row][col]) / 6) > t)
                binary[row][col] = 255; //white
            else
                binary[row][col] = 0;   //black
    return binary;
}

Color image (upper left) and binary images with thresholds 127, 65, 33, 180, and 220

Thresholding Gray-Scale Images
• Threshold above: set all pixels with gray-tone values ≥ threshold to foreground value
• Threshold below: set all pixels with gray-tone values ≤ threshold to foreground value
• Threshold inside: set all pixels with gray-tone values ≤ upper threshold and ≥ lower threshold to foreground value
• Threshold outside: set all pixels with gray-tone values ≥ upper threshold or ≤ lower threshold to foreground value
Key concern: How to choose the threshold(s)
One solution: Use histograms for threshold selection
Computer Vision (Shapiro and Stockman, 2001)
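The four threshold modes and the histogram they rely on can be sketched as follows. This is a minimal illustration, not the course's implementation: the class and method names are hypothetical, comparisons are assumed to be inclusive, foreground and background are assumed to be 255 and 0, and gray values are assumed to lie in 0-255. The slide does not say how a threshold is picked from the histogram, so the sketch only builds the histogram.

```java
public class GrayThresholdSketch {
    static final int FOREGROUND = 255; // assumed foreground value
    static final int BACKGROUND = 0;   // assumed background value

    // inside == true:  foreground where lower <= value <= upper
    // inside == false: foreground where value <= lower or value >= upper
    private static int[][] apply(int[][] gray, int lower, int upper, boolean inside) {
        int[][] out = new int[gray.length][];
        for (int r = 0; r < gray.length; r++) {
            out[r] = new int[gray[r].length];
            for (int c = 0; c < gray[r].length; c++) {
                int v = gray[r][c];
                boolean hit = inside ? (v >= lower && v <= upper)
                                     : (v <= lower || v >= upper);
                out[r][c] = hit ? FOREGROUND : BACKGROUND;
            }
        }
        return out;
    }

    static int[][] thresholdAbove(int[][] gray, int t) {
        return apply(gray, t, Integer.MAX_VALUE, true);
    }

    static int[][] thresholdBelow(int[][] gray, int t) {
        return apply(gray, Integer.MIN_VALUE, t, true);
    }

    static int[][] thresholdInside(int[][] gray, int lower, int upper) {
        return apply(gray, lower, upper, true);
    }

    static int[][] thresholdOutside(int[][] gray, int lower, int upper) {
        return apply(gray, lower, upper, false);
    }

    // Counts how many pixels take each gray value (assumes values in 0..255).
    static int[] histogram(int[][] gray) {
        int[] counts = new int[256];
        for (int[] row : gray)
            for (int v : row)
                counts[v]++;
        return counts;
    }

    public static void main(String[] args) {
        int[][] gray = {{10, 100}, {200, 255}};
        int[][] inside = thresholdInside(gray, 100, 200);
        System.out.println(java.util.Arrays.deepToString(inside));
    }
}
```

A typical use would be to inspect the histogram for a valley between two peaks and pass the gray value at that valley to one of the threshold methods.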

Pixel neighborhoods
4-Neighborhood
Includes pixels: [r-1,c], [r+1,c], [r,c-1], [r,c+1]
8-Neighborhood
Includes pixels: [r-1,c], [r+1,c], [r,c-1], [r,c+1], [r-1,c-1], [r-1,c+1], [r+1,c-1], [r+1,c+1]
Computer Vision (Shapiro and Stockman, 2001)
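The two neighborhoods can be expressed as offset tables, as in the sketch below. The class and method names are hypothetical, and dropping neighbors that fall outside the image is an assumption; the slide lists the offsets but does not say how image borders should be handled.

```java
import java.util.ArrayList;
import java.util.List;

public class NeighborhoodSketch {
    // Row/column offsets for the 4-neighborhood of [r, c].
    static final int[][] N4 = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    // The 8-neighborhood adds the four diagonal offsets.
    static final int[][] N8 = {
        {-1, 0}, {1, 0}, {0, -1}, {0, 1},
        {-1, -1}, {-1, 1}, {1, -1}, {1, 1}
    };

    // Returns the [row, col] pairs of the neighbors of (r, c) that lie
    // inside a rows x cols image; out-of-bounds neighbors are skipped.
    static List<int[]> neighbors(int r, int c, int rows, int cols, int[][] offsets) {
        List<int[]> result = new ArrayList<>();
        for (int[] d : offsets) {
            int nr = r + d[0];
            int nc = c + d[1];
            if (nr >= 0 && nr < rows && nc >= 0 && nc < cols) {
                result.add(new int[] {nr, nc});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] p : neighbors(0, 0, 3, 3, N8)) {
            System.out.println("[" + p[0] + "," + p[1] + "]");
        }
    }
}
```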
