Computer Vision Chapter 1 Introduction
The goal of computer vision is to make useful decisions about real physical objects and scenes based on sensed images.
Application areas • Industrial inspection • Medical imaging • Image database and query • Satellite and surveillance imagery • Entertainment • Handwriting and printed character recognition
Image dimensionality • 1D • audio (sound) • 2D • digital camera picture, chest x-ray, ultrasound • 3D • video sequence of 2D images • multispectral 2D images • volumetric medical imagery (CT, MRI) • 4D • PET-CT • MRI
Image types • Binary • Grayscale • Color • Multispectral
Operations on images • Neighborhood (local) operations • Enhancing the entire image • Combining multiple images • Ex. differences, noise reduction, blending • Feature extraction • Ex. area, centroid (center of mass), orientation, lines • invariants
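The feature extraction bullet above names area and centroid. A minimal sketch of both for a binary image follows, assuming the image is stored as a 2D int array in which any non-zero value is foreground. The class and method names (BinaryFeatures, area, centroid) are hypothetical and are not part of the course's starter apps.

```java
/** Sketch only: area and centroid of the foreground of a binary image. */
final class BinaryFeatures {

    /** Counts the foreground (non-zero) pixels. */
    static int area(int[][] img) {
        int count = 0;
        for (int[] row : img)
            for (int v : row)
                if (v != 0) count++;
        return count;
    }

    /** Returns {meanRow, meanCol} of the foreground pixels, or null if there are none. */
    static double[] centroid(int[][] img) {
        long sumRow = 0, sumCol = 0;
        int n = 0;
        for (int r = 0; r < img.length; r++) {
            for (int c = 0; c < img[r].length; c++) {
                if (img[r][c] != 0) {
                    sumRow += r;
                    sumCol += c;
                    n++;
                }
            }
        }
        if (n == 0) return null;  // centroid is undefined for an empty object
        return new double[] { (double) sumRow / n, (double) sumCol / n };
    }

    public static void main(String[] args) {
        int[][] img = {
            { 0, 0, 0, 0 },
            { 0, 1, 1, 0 },
            { 0, 1, 1, 0 },
            { 0, 0, 0, 0 }
        };
        double[] c = centroid(img);
        System.out.println("area = " + area(img));
        System.out.println("centroid = (" + c[0] + ", " + c[1] + ")");
    }
}
```

For the 2x2 square in the example, the area is 4 and the centroid falls between pixels, at row 1.5, column 1.5.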
General hardware discussion • General purpose vs. special purpose (DSP, GPU) • Uniprocessors vs. parallel processors (COWs, i.e., clusters of workstations; multiprocessors) • Sensors (discussed later)
General software discussion • Android SDK • Java-based • freely available from http://developer.android.com/sdk/index.html • Albie start app • doxygen for source code documentation • freely available from doxygen.org • code format • http://www.oracle.com/technetwork/java/codeconv-138413.html
General software discussion • C# • use Visual C# (Express Edition is freely available from Microsoft) • CSImageViewer starter app • main course web page has links • doxygen for source code documentation • code format
What’s in a program file? • Comments • Code
What’s a compiler? • A program • Input • Processing • Output
What’s a compiler? • A program • Input: • Text file (your program) • Processing: • Convert HLL statements into machine code (or similar) • Ignore comments • Output: • A binary file of machine code (or similar)
Traditional documentation • Code files are separate from design documents. • Wouldn’t it be great if we could bring code and documentation together into the same file(s)?
Tools like doxygen and javadoc • A program • Input: • Text file (your program) • Processing: • Convert (specially formatted) comments into documentation • Ignore HLL statements • Output: • Documentation (typically in HTML)
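To make the input/processing/output description above concrete, here is a toy sketch of the "keep the specially formatted comments, ignore the code" step. It is not how doxygen or javadoc is implemented. The class name DocCommentExtractor is hypothetical, and the regular expression would be fooled by a string literal that happens to contain comment delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch: pulls the text of documentation comments (slash-star-star blocks)
// out of a source string and ignores everything else, including plain comments.
final class DocCommentExtractor {

    // Matches a block that opens with slash-star-star and closes with star-slash.
    private static final Pattern DOC =
        Pattern.compile("/\\*\\*(.*?)\\*/", Pattern.DOTALL);

    static List<String> extract(String source) {
        List<String> docs = new ArrayList<>();
        Matcher m = DOC.matcher(source);
        while (m.find()) {
            // Drop the leading " * " decoration on each line, then trim.
            String body = m.group(1).replaceAll("(?m)^\\s*\\*\\s?", "").trim();
            docs.add(body);
        }
        return docs;
    }

    public static void main(String[] args) {
        String src = "/** \\brief adds one */\n"
                   + "int inc(int x) { return x + 1; } // plain comment\n"
                   + "/* not a documentation comment */\n";
        for (String d : extract(src))
            System.out.println(d);
    }
}
```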
Getting started with doxygen • Download from doxygen.org. • Do this only once in the directory (folder) containing your source code: (already done for you) doxygen -g • This creates a doxygen configuration file called Doxyfile, which you may edit to change default options. • Edit Doxyfile and make sure all EXTRACT options are set to YES. • Then, whenever you change your code and wish to update the documentation: doxygen • This updates all documentation in the html subdirectory. • Demonstrate.
Using doxygen: document every (source code) file
/**
 * \file ImageData.java
 * \brief contains ImageData class definition (note that this
 *        class is abstract)
 *
 * <more verbose description here>
 * \author George J. Grevera, Ph.D.
 */
. . .
Using doxygen: document every class
//----------------------------------------------------------------------
/** \brief CSImageViewer class.
 *
 *  Longer description goes here.
 */
public class CSImageViewer : Form {
    . . .
Using doxygen: document every function
//----------------------------------------------------------------
/** \brief Given a pixel's row and column location, this
 *  function returns the gray pixel value.
 *  \param row image row
 *  \param col image column
 *  \returns the gray pixel value at that position
 */
public int getGray ( int row, int col ) {
    int offset = row * mW + col;
    return mOriginalData[ offset ];
}
Using doxygen: document every function (parameters) • In the getGray example above, each \param tag names one parameter (row, col) and describes it.
Using doxygen: document every function (return value) • In the getGray example above, the \returns tag describes the value handed back to the caller.
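The getGray example relies on row-major storage: the pixel at (row, col) lives at offset row * width + col. The sketch below wraps that in a small self-contained class and adds a bounds check, which the slide version does not have. The class name GrayImage and its constructor are assumptions made for illustration, not the course's ImageData class.

```java
/** \brief Sketch of a gray image stored row by row in a 1D array. */
final class GrayImage {
    private final int mW;       ///< image width
    private final int mH;       ///< image height
    private final int[] mData;  ///< gray values in row-major order

    /** \brief Wraps existing pixel data.
     *  \param w    image width
     *  \param h    image height
     *  \param data gray values; length must be w * h
     */
    GrayImage(int w, int h, int[] data) {
        if (w <= 0 || h <= 0 || data.length != w * h)
            throw new IllegalArgumentException("data length must equal w * h");
        mW = w;
        mH = h;
        mData = data;
    }

    /** \brief Returns the gray value at the given position.
     *  \param row image row
     *  \param col image column
     *  \returns the gray pixel value at that position
     */
    int getGray(int row, int col) {
        // Without this check, col == mW would silently read the next row.
        if (row < 0 || row >= mH || col < 0 || col >= mW)
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        return mData[row * mW + col];
    }

    public static void main(String[] args) {
        GrayImage g = new GrayImage(3, 2, new int[] { 10, 11, 12, 20, 21, 22 });
        System.out.println(g.getGray(1, 2));  // last pixel of the second row
    }
}
```

The check matters because an out-of-range column is still a valid array offset: in a 3-wide image, (0, 3) would otherwise return the first pixel of row 1.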
Using doxygen: document all class members (and global and static variables in C/C++)
protected bool mIsColor;        ///< true if color (rgb); false if gray
protected bool mImageModified;  ///< true if image has been modified
protected int mW;               ///< image width
protected int mH;               ///< image height
protected int mMin;             ///< overall min image pixel value
protected int mMax;             ///< overall max image pixel value
protected String mFname;        ///< (optional) file name
doxygen (lengthier example including html)
/** \brief Actual original (unmodified) unpacked (1 component per
 *  array entry) image data.
 *
 *  If the image data are gray, each entry in this array represents a
 *  gray pixel value. So mOriginalData[0] is the first pixel's gray
 *  value, mOriginalData[1] is the second pixel's gray value, and so
 *  on. Each value may be 8 bits or 16 bits. 16 bits allows for
 *  values in the range [0..65535].
 *  <br> <br>
 *  If the image data are color, triples of entries (i.e., 3) represent
 *  each color rgb value. So each value is in [0..255] for 24-bit
 *  color where each component is 8 bits. So mOriginalData[0] is the
 *  first pixel's red value, mOriginalData[1] is the first pixel's green
 *  value, mOriginalData[2] is the first pixel's blue value, mOriginalData[3]
 *  is the second pixel's red value, and so on.
 */
protected int[] mOriginalData;
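The unpacked color layout described in that comment (r, g, b, r, g, b, ...) implies that the pixel at (row, col) starts at offset (row * width + col) * 3. A minimal sketch of that lookup follows; the class and method names (RgbLayout, getRgb) are hypothetical and the data array is passed in explicitly rather than read from a class member.

```java
/** Sketch only: reading one pixel from unpacked, interleaved rgb data. */
final class RgbLayout {

    /**
     * Returns {r, g, b} for the pixel at (row, col).
     * data holds one component per entry, three entries per pixel, row by row.
     */
    static int[] getRgb(int[] data, int width, int row, int col) {
        int offset = (row * width + col) * 3;  // 3 entries per pixel
        return new int[] { data[offset], data[offset + 1], data[offset + 2] };
    }

    public static void main(String[] args) {
        // A 2-pixel-wide, 1-row image: pixel 0 = (1,2,3), pixel 1 = (4,5,6).
        int[] data = { 1, 2, 3, 4, 5, 6 };
        int[] p = getRgb(data, 2, 0, 1);
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```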
Required documentation rules • Each file, class, method, and member variable must be documented w/ doxygen. • Exception is when we follow the one-class-per-file rule. In that case only the class or file needs to be documented. • The contents of the body of each method should contain comments, but none of these comments should be in the doxygen format. (Not every comment is a doxygen comment.)
Not every comment should be a doxygen comment. Required: • every file/class • every function/method • every class member (data) • (in C/C++, every static and/or global variable) Use regular, plain comments in the body of a function/method. (One exception is the \todo.)
int mColorImageData[][][];  ///< should be mColorImageData[mH][mW][3]
//----------------------------------------------------------------------
/** \brief Given a buffered image, this ctor reads the image data, stores
 *  the raw pixel data in an array, and creates a displayable version of
 *  the image. Note that this ctor is protected. The user should only
 *  use ImageData.load( fileName ) to instantiate an object of this type.
 *  \param bi buffered image used to construct this class instance
 *  \param w width of image
 *  \param h height of image
 *  \returns nothing (constructor)
 */
protected ColorImageData ( final BufferedImage bi, final int w, final int h ) {
    mW = w;
    mH = h;
    mOriginalImage = bi;
    mIsColor = true;
    //format TYPE_INT_ARGB will be saved to mDisplayData
    mDisplayData = mOriginalImage.getRGB(0, 0, mW, mH, null, 0, mW);
    mImageData = new int[ mW * mH * 3 ];
    mMin = mMax = mDisplayData[0] & 0xff;
    for (int i=0,j=0; i<mDisplayData.length; i++) {
        mDisplayData[i] &= 0xffffff;  //just to ensure that we only have 24-bit rgb
        final int r = (mDisplayData[i] & 0xff0000) >> 16;
        final int g = (mDisplayData[i] & 0xff00) >> 8;
        final int b = mDisplayData[i] & 0xff;
        if (r<mMin) mMin = r;
        if (g<mMin) mMin = g;
        …
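The constructor above is cut off on the slide, so the following is a separate, self-contained sketch of the same idea rather than a completion of it: unpack packed 0xRRGGBB integers (the layout BufferedImage.getRGB returns, with alpha in the top byte) into one component per array entry, and track the overall minimum and maximum. The names PackedRgb, unpack, and minMax are hypothetical.

```java
/** Sketch only: unpacking packed rgb pixels and finding the overall min/max. */
final class PackedRgb {

    /** Converts packed 0xAARRGGBB pixels into interleaved r, g, b entries (alpha is dropped). */
    static int[] unpack(int[] packed) {
        int[] out = new int[packed.length * 3];
        for (int i = 0, j = 0; i < packed.length; i++) {
            final int p = packed[i] & 0xffffff;  // keep only the 24 rgb bits
            out[j++] = (p >> 16) & 0xff;         // red
            out[j++] = (p >> 8) & 0xff;          // green
            out[j++] = p & 0xff;                 // blue
        }
        return out;
    }

    /** Returns {min, max} over all components; the array must not be empty. */
    static int[] minMax(int[] components) {
        int min = components[0], max = components[0];
        for (int v : components) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[] rgb = unpack(new int[] { 0xFF102030, 0x00000001 });
        int[] mm = minMax(rgb);
        System.out.println("min = " + mm[0] + ", max = " + mm[1]);
    }
}
```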
Summary of most useful tags \file \author \brief \param \returns \todo (not used in assignments) And many, many others.
Back to images and imaging… The good, the bad, and the ugly
The good, the bad, and the ugly. • Success is usually hard won! • Problems: • Matching models to reality • Lighting variation • Sensor noise • Occlusion & rotation/translation/scale • Limited resolution • An image is a discrete model of an underlying continuous function • Spatial discretization • Sensed values quantization • Levels Of Detail (LOD)
So let’s try to recognize chairs, a task that is trivial for us.
The good, the bad, and the ugly. • Problem: Matching models to reality
The good, the bad, and the ugly. • Problem: Matching models to reality • Maybe CAD/CAM models can help! • http://www.3dcadbrowser.com/browse.aspx?category=59
The good, the bad, and the ugly. • Problem: Lighting variation
The good, the bad, and the ugly. • Problem: Sensor noise
The good, the bad, and the ugly. • Problem: Occlusion
The good, the bad, and the ugly. • Problem: Rotation, reflection, translation, & scale
The good, the bad, and the ugly. • Problem: • Limited resolution • An image is a discrete model of an underlying continuous function • Spatial discretization (above and below) • Sensed values quantization (next slide) • Too much of a good thing can be a problem too!
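The two discretization steps named above can be sketched in one dimension: spatial discretization samples a continuous function at a finite number of positions, and quantization maps each sensed value onto a finite number of levels. This is an illustrative sketch under assumed conventions (values normalized to [0, 1], evenly spaced samples); the names Discretize, sample, and quantize are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch only: spatial sampling and value quantization of a 1D signal. */
final class Discretize {

    /** Samples f at n evenly spaced positions from x0 to x1 inclusive (n >= 2). */
    static double[] sample(DoubleUnaryOperator f, double x0, double x1, int n) {
        if (n < 2) throw new IllegalArgumentException("need at least 2 samples");
        double[] out = new double[n];
        for (int i = 0; i < n; i++)
            out[i] = f.applyAsDouble(x0 + i * (x1 - x0) / (n - 1));
        return out;
    }

    /** Maps a value in [0, 1] onto one of the integer levels 0 .. levels-1. */
    static int quantize(double v, int levels) {
        double clamped = Math.max(0.0, Math.min(1.0, v));
        return Math.min(levels - 1, (int) (clamped * levels));
    }

    public static void main(String[] args) {
        // A smooth "scene" with values in [0, 1], sampled at 8 positions
        // and quantized to only 4 levels to make the loss of detail visible.
        double[] s = sample(x -> 0.5 + 0.5 * Math.sin(x), 0.0, 2 * Math.PI, 8);
        for (double v : s)
            System.out.print(quantize(v, 4) + " ");
        System.out.println();
    }
}
```

Fewer samples lose spatial detail, while fewer levels lose intensity detail. With 256 levels, the same quantize call gives the usual 8-bit range of 0 to 255.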