While preparing my thesis proposal recently, I took the opportunity to go back through the self-supervised monocular depth estimation papers I had read and organize them in some detail. Writing a full survey seemed like too much time and effort, so I am simply sharing my notes here.
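The paper list is updated continuously.
Shameless plug: SMDE-Pytorch
An open-source, PyTorch-based toolbox for developing, training, and testing self-supervised monocular depth estimation methods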
GitHub
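I have spent the past year working on self-supervised monocular depth estimation and have tried out quite a lot of code along the way. Yet the field has never had a development toolbox that covers a wide range of methods the way MMSegmentation does. Since none existed, I built one myself!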
- For those who just want to try it out or see the results, the toolbox can estimate depth for your own images with a simple configuration and a few commands.
- For researchers, the toolbox provides pretrained models for recently popular methods along with unified evaluation code, which makes comparisons easy.
- For those who want to develop further, the toolbox makes it easy to swap out the network architecture, loss functions, and other components, so you can explore and experiment faster (coming soon, stay tuned).
Self-supervised monocular depth estimation
The goal of monocular depth estimation is to predict, from a single given image, a depth map that gives the distance between the camera and the scene point corresponding to each pixel. Self-supervised monocular depth estimation methods use a deep network to predict dense depth and need no training samples with ground-truth depth. Instead, they take consecutive frames from a video sequence or image pairs captured by a stereo camera as input, and train the depth network with image reconstruction as the objective.
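To make the image reconstruction objective concrete, here is a minimal sketch of a photometric error between a target image and its reconstruction. The function name `photometric_l1` and the choice of a plain mean absolute difference are assumptions made for illustration; they are not taken from any particular paper or from the toolbox above.

```python
import numpy as np

def photometric_l1(target, reconstructed):
    """Mean absolute per-pixel difference between two images of equal shape.

    Illustrative only: a plain L1 error is assumed here as the simplest
    possible reconstruction objective.
    """
    target = np.asarray(target, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if target.shape != reconstructed.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean(np.abs(target - reconstructed)))
```

The better the predicted depth, the closer the reconstructed image is to the target and the lower this error, which is what lets the network train without ground-truth depth.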
Depending on the form of the training samples, self-supervised monocular depth estimation methods fall roughly into two categories: methods trained on video sequences and methods trained on stereo image pairs.
Methods trained on stereo images use image pairs captured by a stereo camera as training samples. Unlike the frames of a video sequence, between which the camera pose is unknown, the two cameras of a stereo rig have a fixed relative position, so these methods only need to predict the depth map of the target image. Because the disparity of a pixel in a stereo pair is inversely proportional to scene depth, these methods can also predict a disparity map and then convert it into a depth map.
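The inverse relation between disparity and depth, and the way a disparity map lets one view be rebuilt from the other, can be sketched as follows. This assumes a rectified stereo pair with disparity measured in pixels. The function names, the parameters `focal_length_px` and `baseline_m`, and the nearest-neighbour sampling are illustrative assumptions (training code typically uses differentiable bilinear sampling instead).

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (metres): depth = f * B / d.

    eps guards against division by zero for pixels with zero disparity.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

def reconstruct_left_from_right(right, disparity):
    """Rebuild the left view by sampling the right image at column x - d.

    Assumes a rectified pair, so matching pixels lie on the same row.
    Nearest-neighbour sampling is used only to keep the sketch short.
    """
    right = np.asarray(right)
    h, w = right.shape[:2]
    # Source column for every left-image pixel, clipped to the image border.
    xs = np.arange(w)[None, :] - np.rint(disparity).astype(int)
    xs = np.clip(xs, 0, w - 1)
    ys = np.arange(h)[:, None]
    return right[ys, xs]
```

For example, with a hypothetical focal length of 720 pixels and a 0.54 m baseline, `disparity_to_depth(np.array([[10.0, 20.0]]), 720.0, 0.54)` halves the depth when the disparity doubles.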