Computer Vision Project 3: Stereo Depth Estimation

Goal:

Write a program that takes a rectified stereo pair as input and creates a depth map of the scene, where each pixel is color-coded by your estimate of its depth.

Extension
One challenge is images with large blank regions; develop additional heuristics or rules to "guess" the best correspondence in these regions.

Intro

In stereo matching, we find correspondences between the two input images. Generally, we decide whether two points correspond by comparing the pixel neighborhood N around each of them. For each pixel, I select the candidate whose neighborhood has the lowest sum of absolute differences and treat the two as a corresponding point pair. In practice, this means locating a matching block for every pixel in the image. The difference in the location of the two points on their image planes is the disparity of that point. Because the search is constrained to a 1-dimensional space, each pixel has a single scalar disparity, so the disparities can be stored as a 2D disparity map the same size as the image.
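
The matching rule described above can be written compactly. The notation is an assumption for illustration and is not taken from the project: I_L and I_R are the left and right images, N is the set of window offsets, the left image is treated as the reference, and d_max is a hypothetical upper bound on the search range.

```latex
\[
C(x, y, d) = \sum_{(i, j) \in N} \bigl| I_L(x + i,\, y + j) - I_R(x + i - d,\, y + j) \bigr|,
\qquad
\hat{d}(x, y) = \arg\min_{0 \le d \le d_{\max}} C(x, y, d)
\]
```

Here C is the sum of absolute differences for a candidate disparity d, and the estimated disparity at each pixel is the candidate with the lowest cost.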

Assumptions

Constraining the problem to a 1-dimensional search space means I can model disparities as a 2D disparity map.
I assume that the disparity of a point is closely related to its depth (for a rectified pair, larger disparity means a closer point).
I assume that the intensities of corresponding points in the two images are identical. (Note that this holds only when the scene is Lambertian.)
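
To make the second assumption concrete: under the usual pinhole model for a rectified pair, depth is Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. The sketch below assumes that model; the function name and the calibration values in the usage example are hypothetical and do not come from the datasets used here.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from a disparity in pixels (rectified pinhole model).

    focal_px and baseline_m are assumed calibration values supplied by the
    caller; they are not provided anywhere in this write-up.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or no valid match).
        return math.inf
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration: 400 px focal length, 0.5 m baseline.
near = disparity_to_depth(16, focal_px=400, baseline_m=0.5)
far = disparity_to_depth(8, focal_px=400, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, a disparity map can be color-coded directly as a relative depth map even when f and B are unknown.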

Dataset:

I used pairs of rectified stereo images (from sets that were not solely evaluation sets) from the "Indoor", "Outdoor", and "Office" datasets, which were created by Dr. Stefano Mattoccia of the University of Bologna. I also used the MYamanSKalkan_Multi-Modal_Stereo_Datasets, made by the KOVAN research lab, so that I could compare my results against their ground-truth depth maps. The classic Middlebury Stereo Vision datasets were also useful.

Sample Results (Basic Implementation)

[Figure: depth map, left image, and right image for the first stereo pair]

[Figure: depth map, left image, and right image for the second stereo pair]

Implementation:

Big idea: find pixel-by-pixel correspondences using the sum of absolute differences.

Basic Algorithm

Finding corresponding points:

In the basic implementation, I use the sum of absolute differences as my error function, inspired by the paper "Sum of Absolute Differences algorithm in stereo correspondence problem for stereo matching in computer vision application" by Hamzah et al. The function sums the absolute values of the pixel-wise differences between the left and right windows. Noticing that the image pairs in the Office dataset are very closely aligned, I felt it was only necessary to slide my evaluation window horizontally along each row (rather than also along columns).
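
The block-matching step described above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code: images are plain lists of rows of grayscale values, the left image is assumed to be the reference, and the names sad, block_match, max_disp, and half (half the window width) are hypothetical.

```python
def sad(left, right, y, x, d, half):
    """Sum of absolute differences between the window centred at (y, x) in
    `left` and the window centred at (y, x - d) in `right`."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def block_match(left, right, max_disp, half=1):
    """Return a disparity map (list of lists) the same size as `left`.

    The search runs only along the same row, as in the text. Border pixels
    where the window does not fit are left at 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half, height - half):
        for x in range(half, width - half):
            # Only consider shifts that keep the right-image window in bounds.
            candidates = range(0, min(max_disp, x - half) + 1)
            # Pick the shift with the lowest cost (smallest shift wins ties).
            disparity[y][x] = min(
                candidates, key=lambda d: sad(left, right, y, x, d, half)
            )
    return disparity
```

Each pixel is decided independently here, which is why flat, textureless regions produce many near-equal costs and a noisy result. A vectorized array implementation would be much faster than these nested loops but follows the same logic.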

Extension: images with blank regions

Idea: the closer two pixels are in intensity, the more likely they are to be at a similar depth. With this insight, and a few other computational tricks, we may be able to improve the error function so that it handles these blank regions. Inspired by the Stereo Vision demo and resources available for MATLAB, I decided to try dynamic programming to optimize the error function. As the results above show, block matching creates a noisy disparity image. However, we can reduce the noise with a smoothing constraint.
In my basic implementation, I chose the optimal disparity for each pixel based only on its own cost function.
I set up the problem as such:
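
One common way to pose a smoothness-constrained problem of this kind, sketched here as an assumption and not as the exact formulation used in this project, is to choose the disparities along a row that minimize the summed matching cost plus a penalty on disparity jumps between neighboring pixels. The names smooth_scanline, costs, and penalty are hypothetical, and the absolute-difference penalty is one choice among several.

```python
def smooth_scanline(costs, penalty):
    """Dynamic programming over one image row.

    costs[x][d] is the matching cost (e.g. SAD) of disparity d at column x.
    Returns one disparity per column, minimising
        sum_x costs[x][d_x] + penalty * |d_x - d_(x-1)|.
    """
    n_disp = len(costs[0])
    best = list(costs[0])  # best total cost of a path ending at each disparity
    back = []              # back[x - 1][d] = best predecessor of d at column x
    for x in range(1, len(costs)):
        new_best, pointers = [], []
        for d in range(n_disp):
            # Cheapest way to arrive at disparity d from the previous column.
            prev = min(range(n_disp),
                       key=lambda p: best[p] + penalty * abs(d - p))
            new_best.append(costs[x][d] + best[prev] + penalty * abs(d - prev))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final state to recover the whole row.
    d = min(range(n_disp), key=lambda p: best[p])
    path = [d]
    for pointers in reversed(back):
        d = pointers[d]
        path.append(d)
    path.reverse()
    return path


# Columns 0, 1, 3, 4 clearly prefer disparity 1; column 2 is ambiguous.
row_costs = [[5, 0, 5], [5, 0, 5], [5, 1, 0], [5, 0, 5], [5, 0, 5]]
per_pixel = smooth_scanline(row_costs, penalty=0)
smoothed = smooth_scanline(row_costs, penalty=2)
```

With a penalty of 0 this reduces to the per-pixel choice of the basic implementation. A larger penalty lets confident neighbors override an ambiguous pixel, which is the behavior wanted in blank regions.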

Extension Results

[Figure: bowling pins pair: final depth map, left image, right image]

[Figure: lunar pair: final image, left image, right image]