August 22, 2012

Welcome to the third post in the series "The Perspecive Camera - An Interactive Tour." In the last post, we learned how to decompose the camera matrix into a product of intrinsic and extrinsic matrices. In the next two posts, we'll explore the extrinsic and intrinsic matrices in greater detail. First we'll explore various ways of looking at the extrinsic matrix, with an interactive demo at the end.

The camera's extrinsic matrix describes the camera's location in the world, and what direction it's pointing. Those familiar with OpenGL know this as the "view matrix" (or rolled into the "modelview matrix"). It has two components: a rotation matrix, *R*, and a translation vector ** t**, but as we'll soon see, these don't exactly correspond to the camera's rotation and translation. First we'll examine the parts of the extrinsic matrix, and later we'll look at alternative ways of describing the camera's pose that are more intuitive.

August 14, 2012

So, you've been playing around a new computer vision library, and you've managed to calibrate your camera... now what do you do with it? It would be a lot more useful if you could get at the camera's position or find out it's field-of view. You crack open your trusty copy of Hartley and Zisserman, which tells you how to decompose your camera into an intrinsic and extrinsic matrix --- great! But when you look at the results, something isn't quite right. Maybe your rotation matrix has a determinant of -1, causing your matrix-to-quaternion function to barf. Maybe your focal-length is negative, and you can't understand why. Maybe your translation vector mistakenly claims that the world origin in *behind* the camera. Or worst of all, everything looks fine, but when you plug it into OpenGL, you just don't see *anything*.

Today we'll cover the process of decomposing a camera matrix into intrinsic and extrinsic matrices, and we'll try to untangle the issues that can crop-up with different coordinate conventions. In later articles, we'll study the intrinsic and extrinsic matrices in more detail, and I'll cover how to convert them into a form usable by OpenGL.

August 13, 2012

On September 27, 1998 a yellow line appeared across the gridiron during an otherwise ordinary football game between the Cincinnati Bengals and the Baltimore Ravens. It had been added by a computer that analyzed the camera's position and the shape of the ground in real-time in order to overlay thin yellow strip onto the field. The line marked marked the position of the next first-down, but it also marked the beginning of a new era of computer vision in live sports, from computerized pitch analysis in baseball to automatic line-refs in tennis.

In 2006, researchers from Microsoft and the University of Washington automatically constructed a 3D tour of the Trevi Fountain in Rome using only images obtained by searching Flickr for "trevi AND rome."

In 2007, Carnegie Mellon PhD student Johnny Lee hacked a $40 Nintento Wii-mote into an impressive head-tracking virtual reality interface.

In 2010, Microsoft released the Kinect, a consumer stereo camera that rivaled the functionality of competitors sold for ten times its price, which continues to disrupt the worlds of both gaming and computer vision.

What do all of these technologies have in common? They all require a precise understanding of how the pixels in a 2D image relate to the 3D world they represent. In other words, they all hinge on a strong camera model. This is the first in a series of articles that explores one of the most important camera models in computer vision: the pinhole perspective camera. We'll start by deconstructing the perspective camera to show how each of its parts affect the rendering of a 3D scene. Next, we'll describe how to import your calibrated camera into OpenGL to render virtual objects into a real image. Finally, we'll show how to use your perspective camera to implement rendering in a virtual-reality system, complete with stereo rendering and head-tracking.

This series of articles is intended as a supplement to a more rigorous treatment available in several excellent textbooks. I will focus on providing what textbooks generally don't provide: interactive demos, runnable code, and practical advice on implementation. I will assume the reader has a basic understanding of 3D graphics and OpenGL, as well as some background in computer vision. In other words, if you've never heard of homogeneous coordinates or a camera matrix, you might want to start with an introductory book on computer vision. I highly recommend Multiple View Geometry in Computer Vision by Hartley and Zisserman, from which I borrow mathematical notation and conventions (e.g. column vectors, right-handed coordinates, etc.)