Depth Reconstruction from a Single Image with Deep Neural Networks

Project duration: 2021-07-26 to 2021-08-27

As part of an academic project in an MIT online summer course, I led a team that explored deep neural networks for depth reconstruction from a single image. Our goal was to build an accurate and efficient system that predicts the depth of a scene from a single input image.

Project Overview

Depth reconstruction from a single image is a challenging task in computer vision because one image carries no direct depth measurement: many different 3D scenes can project to the same 2D image, so depth has to be inferred from visual cues. Our team decided to focus on using deep neural networks to learn those cues from data.

We used the KITTI dataset, which provides stereo image pairs together with corresponding ground truth depth maps, as our training data. We preprocessed the data to extract the necessary features and fed the result into a deep neural network built with the TensorFlow framework.
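As one illustration of a preprocessing step, the sketch below decodes a depth map stored in the 16-bit PNG convention of the KITTI depth benchmark, where the pixel value divided by 256 gives depth in meters and a value of 0 marks a pixel without a measurement. Whether our pipeline used exactly this format is an assumption made for the example, and the function name `decode_kitti_depth` is hypothetical. NumPy is used here only to keep the example self-contained.

```python
import numpy as np

def decode_kitti_depth(raw_png):
    """Convert a uint16 KITTI-style depth image to meters plus a validity mask.

    Assumes the KITTI depth benchmark encoding: depth_m = value / 256,
    and value == 0 means there is no ground truth at that pixel.
    """
    raw = np.asarray(raw_png)
    valid = raw > 0                           # pixels that carry a measurement
    depth_m = raw.astype(np.float32) / 256.0  # scale raw values to meters
    return depth_m, valid

# Tiny synthetic array standing in for a PNG that was loaded as uint16.
raw = np.array([[0, 256], [2560, 5120]], dtype=np.uint16)
depth_m, valid = decode_kitti_depth(raw)
```

Keeping the validity mask next to the depth values matters because KITTI ground truth is sparse, so downstream code needs to know which pixels to ignore.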

To design the network, we used the seminal work of Eigen et al. as a starting point. They proposed a deep learning architecture that achieved state-of-the-art performance on single-image depth reconstruction at the time of its publication. We modified this architecture with the aim of improving its accuracy and efficiency.

Technical Implementation

We implemented the deep neural network in TensorFlow and trained it on Google Colab, which made it easy for the team to collaborate. The network combines convolutional layers, which extract features from the input image, with fully connected layers, which predict the depth map.
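A minimal Keras sketch of this kind of architecture is shown below: a small convolutional feature extractor followed by fully connected layers whose output is reshaped into a coarse depth map. It is not our exact model. The layer sizes, the input resolution of 96 x 320, the output resolution of 24 x 80, and the name `build_coarse_depth_net` are all assumptions chosen to keep the example small.

```python
import tensorflow as tf

def build_coarse_depth_net(input_shape=(96, 320, 3), output_hw=(24, 80)):
    """Conv feature extractor followed by dense layers that emit a coarse depth map.

    All sizes here are illustrative assumptions, not the project's settings.
    """
    out_h, out_w = output_hw
    layers = tf.keras.layers
    inputs = tf.keras.Input(shape=input_shape)
    # Convolutional stage: extract features and reduce spatial resolution.
    x = layers.Conv2D(32, 5, strides=2, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=2)(x)
    # Fully connected stage: every output pixel can see the whole image.
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(out_h * out_w)(x)  # one value per coarse output pixel
    outputs = layers.Reshape((out_h, out_w, 1))(x)
    return tf.keras.Model(inputs, outputs, name="coarse_depth_net")

model = build_coarse_depth_net()
model.compile(optimizer="adam", loss="mse")
batch_out = model(tf.zeros((2, 96, 320, 3)))  # forward pass on a dummy batch
```

Predicting at a lower resolution than the input keeps the final dense layer manageable, since its size grows with the number of output pixels.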

We experimented with different loss functions, including mean squared error and binary cross-entropy, and tuned the hyperparameters to improve performance. We also applied data augmentation, such as random flipping and rotation, to make the model more robust.
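The two ideas above can be sketched in a few lines of NumPy. The first function computes mean squared error only over pixels that have ground truth, and the second applies a random horizontal flip to an image and its depth map together so the pair stays aligned. Masking out invalid pixels is an assumption added for this example (KITTI ground truth is sparse), rotation is omitted for brevity, and the function names are hypothetical.

```python
import numpy as np

def masked_mse(pred, target, valid):
    """Mean squared error over pixels that have ground truth depth."""
    diff = (pred - target)[valid]
    return float(np.mean(diff ** 2)) if diff.size else 0.0

def random_horizontal_flip(image, depth, rng, p=0.5):
    """Flip image (H, W, C) and depth (H, W) together with probability p."""
    if rng.random() < p:
        # Reverse the width axis of both arrays so pixels stay aligned.
        return image[:, ::-1], depth[:, ::-1]
    return image, depth

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)
depth = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]], dtype=np.float32)

# p=1.0 forces the flip so the effect is easy to inspect.
flipped_image, flipped_depth = random_horizontal_flip(image, depth, rng, p=1.0)

# A constant prediction of 2 m, scored only where depth > 0.
pred = np.full_like(depth, 2.0)
loss = masked_mse(pred, depth, depth > 0)
```

Applying the same geometric transform to the input and the target is the key design point: flipping only the image would teach the network a mirrored depth layout.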

Results

After 150 epochs of training on the KITTI dataset, our model reached a training loss of 0.7759 and a validation loss of 0.8067.

We also visualized the depth maps generated by our model and compared them to the ground truth depth maps. These comparisons indicated that the model was able to predict the depth of a scene from a single image with good accuracy.

Conclusion

Our project demonstrated the potential of deep neural networks for depth reconstruction from a single image. Using the TensorFlow framework and a modified version of the architecture proposed by Eigen et al., we successfully trained a network on the KITTI dataset to predict the depth of a scene from a single image.

Our project has several potential applications, including autonomous driving, robotics, and augmented reality. We believe that the techniques and insights gained from this project will be valuable for future research in computer vision and machine learning.