Computer Vision

Fall 2001

Sparse Chapter Outline of Introductory Techniques for Computer Vision

Chapter 6 - Camera Calibration

Extrinsic Parameters

Definition - the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.

R, the 3 x 3 rotation matrix
T, the 3d translation vector

Intrinsic Parameters

Definition - the parameters that are needed to link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame.

f_x = f/s_x, focal length in effective horizontal pixel size units
α = s_y/s_x, aspect ratio
(o_x,o_y), image center coordinates
k₁, radial distortion coefficient
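
As a rough illustration of how these parameters link camera and pixel coordinates, the sketch below projects a camera-frame point to pixels and back-projects the pixel to a viewing ray. It assumes the sign convention x_im = -f_x X/Z + o_x and y_im = -(f_x/α) Y/Z + o_y, and it ignores radial distortion (k₁ = 0). The function names and numeric values are hypothetical.

```python
def camera_to_pixel(point, f_x, alpha, o_x, o_y):
    """Project a camera-frame point (X, Y, Z) to pixel coordinates.

    Assumed convention: image axes point opposite to the camera axes,
    hence the minus signs. Radial distortion is ignored.
    """
    X, Y, Z = point
    f_y = f_x / alpha            # f / s_y, since alpha = s_y / s_x
    x_im = -f_x * X / Z + o_x
    y_im = -f_y * Y / Z + o_y
    return x_im, y_im


def pixel_to_ray(pixel, f_x, alpha, o_x, o_y):
    """Back-project a pixel to the direction (X/Z, Y/Z, 1) in the camera frame."""
    x_im, y_im = pixel
    f_y = f_x / alpha
    return (-(x_im - o_x) / f_x, -(y_im - o_y) / f_y, 1.0)


# Hypothetical intrinsic parameters and scene point.
intrinsics = dict(f_x=800.0, alpha=1.0, o_x=320.0, o_y=240.0)
pixel = camera_to_pixel((0.2, -0.1, 2.0), **intrinsics)
ray = pixel_to_ray(pixel, **intrinsics)
```

The back-projected ray only fixes the direction of the scene point; its depth is lost, which is why a second view is needed in the next chapter.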

Chapter 7 - Stereopsis

Introduction

Definition - Stereo vision refers to the ability to infer information on the 3d structure and distance of a scene from two or more images taken from different viewpoints.

Correspondence Problem - Which parts of the left and right images are projections of the same scene element?
Assumptions

Most scene points are visible from both viewpoints
Corresponding image regions are similar

Correlation-based correspondence algorithms: the elements to match are image windows of fixed size. The matching criterion is a measure of the correlation between the two windows; one might use cross-correlation or the sum of squared differences (SSD), for example. SSD is less biased by the presence of very small or very large intensity values.
Feature-based methods restrict the search for correspondences to a sparse set of features. Instead of image windows, they use numerical and symbolic properties of the features, taken from feature descriptors. Most methods narrow the number of candidate features with which to match by applying constraints: geometric constraints and analytical constraints.
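
A minimal sketch of correlation-based matching with SSD follows. It assumes the images are plain nested lists of intensities, that the pair is already rectified so the search runs along one row, and that the match in the right image lies at column col - d. The function names and the synthetic images are hypothetical.

```python
def ssd(window_a, window_b):
    """Sum of squared differences between two flattened windows."""
    return sum((a - b) ** 2 for a, b in zip(window_a, window_b))


def window(image, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centered at (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def best_disparity(left, right, row, col, half, max_disp):
    """Return the shift d in [0, max_disp] that minimizes the SSD cost."""
    ref = window(left, row, col, half)
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if col - d - half < 0:       # candidate window would leave the image
            break
        cost = ssd(ref, window(right, row, col - d, half))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic pair: the right image is the left image shifted by 2 pixels.
height, width = 5, 10
left = [[c * c + r for c in range(width)] for r in range(height)]
right = [[(c + 2) ** 2 + r for c in range(width)] for r in range(height)]

d = best_disparity(left, right, row=2, col=6, half=1, max_disp=4)
```

With this synthetic pair the cost is zero only at d = 2, so the search recovers the shift that was built into the data.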

Reconstruction Problem - Given a number of corresponding parts of the left and right image, and possibly information on the geometry of the stereo system, what can we say about the 3d location and structure of the observed objects?

Triangulation - The way in which stereo determines the position in space of corresponding points in pairs of images.

Baseline - The distance between the centers of projection

Disparity - The difference in retinal position between the corresponding points in two images. Disparity is inversely proportional to the depth of the point in space.
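
The inverse relation between disparity and depth can be sketched as follows, assuming the simplest stereo geometry: two identical cameras with parallel optical axes, for which Z = f * T / d, where T is the baseline and d the disparity. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth Z = f * T / d for a parallel-axis stereo pair.

    focal_px and disparity_px are in pixels; the result has the units
    of the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline / disparity_px


# Hypothetical rig: focal length 800 pixels, baseline 0.5 m.
near = depth_from_disparity(800.0, 0.5, 40.0)   # larger disparity
far = depth_from_disparity(800.0, 0.5, 20.0)    # smaller disparity
```

Halving the disparity doubles the estimated depth, so distant points are separated by very small disparity differences.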

Intrinsic Stereo Parameters - Characterize the transformation mapping an image point from camera to pixel coordinates in each camera.

Extrinsic Stereo Parameters - Describe the relative position and orientation of the two cameras.

Epipolar Geometry

Definition - The geometry of stereo. The match of each point in the left image is restricted to lie on a given line in the right image, the epipolar line, and vice versa. This is called the epipolar constraint.

Epipoles - The points at which the line through the two centers of projection intersects the image planes. The left epipole is the image of the center of projection of the right camera and vice versa.

Essential Matrix E - Establishes a natural link between the epipolar constraint and the extrinsic parameters of the stereo system. The extrinsic parameters can be retrieved via E. In sum, E is the mapping between points and epipolar lines.
Satisfies the equation: p_r^T E p_l = 0, where p_l and p_r are in camera coordinates
Properties

encodes information on the extrinsic parameters only
has rank 2
its two nonzero singular values are equal
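
The epipolar constraint can be checked numerically with a small sketch. It assumes the convention P_r = R (P_l - T) between the two camera frames, under which E = R S with S the skew-symmetric matrix built from T. The helper names and the example rig are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def skew(t):
    """Matrix S such that S v equals the cross product t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def essential_matrix(R, T):
    """E = R S, under the assumed convention P_r = R (P_l - T)."""
    return matmul(R, skew(T))


# Hypothetical rig: small rotation about the y axis plus a translation.
angle = 0.1
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
T = [0.5, 0.0, 0.05]
E = essential_matrix(R, T)

# One scene point expressed in each camera's own frame. The constraint is
# homogeneous, so the scene points can stand in for their image points.
P_l = [0.3, -0.2, 4.0]
P_r = matvec(R, [P_l[i] - T[i] for i in range(3)])

residual = dot(P_r, matvec(E, P_l))   # p_r^T E p_l, close to zero
null_check = matvec(E, T)             # E T = R (T x T) = 0, so E is rank deficient
```

The vanishing product E T shows why E cannot have full rank: the translation direction always lies in its null space.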

Fundamental Matrix F - Establishes a link between the epipolar constraint and the parameters of the stereo system, both intrinsic and extrinsic. The difference from the Essential Matrix is that F is defined in terms of pixel coordinates, while E is defined in terms of camera coordinates.
Satisfies the equation: p_r^T F p_l = 0, where p_l and p_r are in pixel coordinates
Properties

encodes information on both the intrinsic and extrinsic parameters
has rank 2

NOTE: The relationship between E and F is F = M_r^-T E M_l^-1, where M_l and M_r are the matrices of the left and right intrinsic parameters.
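
The relationship F = M_r^-T E M_l^-1 can be sketched for a simple case. The example assumes two identical cameras related by a pure translation along x (R = I), an intrinsic matrix of the form [[-f_x, 0, o_x], [0, -f_y, o_y], [0, 0, 1]], and no lens distortion. All names and values are hypothetical.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def intrinsic_matrix(f_x, f_y, o_x, o_y):
    """Maps normalized camera coordinates (X/Z, Y/Z, 1) to pixel coordinates."""
    return [[-f_x, 0.0, o_x], [0.0, -f_y, o_y], [0.0, 0.0, 1.0]]


def intrinsic_inverse(f_x, f_y, o_x, o_y):
    """Closed-form inverse of the matrix returned by intrinsic_matrix."""
    return [[-1.0 / f_x, 0.0, o_x / f_x],
            [0.0, -1.0 / f_y, o_y / f_y],
            [0.0, 0.0, 1.0]]


def fundamental_from_essential(E, M_l_inv, M_r_inv):
    """F = M_r^-T E M_l^-1."""
    return matmul(transpose(M_r_inv), matmul(E, M_l_inv))


# Pure translation along x with R = I, so E is the skew matrix of T.
T = [0.5, 0.0, 0.0]
E = [[0.0, -T[2], T[1]], [T[2], 0.0, -T[0]], [-T[1], T[0], 0.0]]
params = dict(f_x=800.0, f_y=800.0, o_x=320.0, o_y=240.0)
M = intrinsic_matrix(**params)
M_inv = intrinsic_inverse(**params)
F = fundamental_from_essential(E, M_inv, M_inv)

# Project one scene point into both images (homogeneous pixel coordinates).
P_l = [0.3, -0.2, 4.0]
P_r = [P_l[i] - T[i] for i in range(3)]
pix_l = matvec(M, [P_l[0] / P_l[2], P_l[1] / P_l[2], 1.0])
pix_r = matvec(M, [P_r[0] / P_r[2], P_r[1] / P_r[2], 1.0])

residual = dot(pix_r, matvec(F, pix_l))   # p_r^T F p_l, close to zero
epipolar_line = matvec(F, pix_l)          # (a, b, c) with a*x + b*y + c = 0
```

For this sideways translation the coefficient a of the epipolar line is zero, so the line is horizontal and the match lies on the same image row.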

Rectification - Given a stereo pair of images, rectification determines a transformation of each image such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes, usually the horizontal one. Why? Because the correspondence problem is then reduced from a 2d search to a 1d search.

3d Reconstruction

The amount of 3d Reconstruction possible depends on the amount of a priori knowledge available on the parameters of the stereo system.

Both Intrinsic and Extrinsic parameters are known --> you can solve the reconstruction unambiguously by triangulation.
If only the intrinsic parameters are known --> you can still solve the problem and estimate the extrinsic parameters, but the reconstruction is only up to an unknown scaling factor. Why? Because we do not know the baseline of the system and therefore cannot recover the actual scale of the scene.
If neither the intrinsic nor the extrinsic parameters are known and only pixel correspondences are available --> you can still obtain a reconstruction of the environment, but only up to an unknown, global projective transformation.
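
For the first case, triangulation can be sketched as the midpoint of the shortest segment joining the two viewing rays. The code assumes the convention P_r = R (P_l - T), image points given as ray directions in each camera frame, and a result expressed in the left camera frame. The names and the example rig are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_transpose_vec(R, v):
    """Compute R^T v for a 3x3 matrix R."""
    return [sum(R[k][i] * v[k] for k in range(3)) for i in range(3)]


def triangulate_midpoint(p_l, p_r, R, T):
    """Midpoint of the shortest segment between the two viewing rays."""
    u = p_l                         # left ray: a * u, from the left center
    v = mat_transpose_vec(R, p_r)   # right ray: T + b * v, from the right center
    w0 = [-t for t in T]            # left center minus right center
    A, B, C = dot(u, u), dot(u, v), dot(v, v)
    D, E = dot(u, w0), dot(v, w0)
    denom = A * C - B * B
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    a = (B * E - C * D) / denom
    b = (A * E - B * D) / denom
    on_left = [a * ui for ui in u]
    on_right = [T[i] + b * v[i] for i in range(3)]
    return [(on_left[i] + on_right[i]) / 2.0 for i in range(3)]


# Hypothetical rig and scene point.
angle = 0.1
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
T = [0.5, 0.0, 0.05]
P_l = [0.3, -0.2, 4.0]
P_r = [sum(R[i][k] * (P_l[k] - T[k]) for k in range(3)) for i in range(3)]

# Image points as ray directions (scene point divided by its depth).
p_l = [c / P_l[2] for c in P_l]
p_r = [c / P_r[2] for c in P_r]

point = triangulate_midpoint(p_l, p_r, R, T)
scaled = triangulate_midpoint(p_l, p_r, R, [2.0 * t for t in T])
```

Feeding the same image points with T doubled returns a point twice as far away, which illustrates the scale ambiguity of the second case: without the true baseline, the image data alone cannot fix the overall size of the scene.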
