Sources of Uncertainty in 3D Scene Reconstruction

Title Sources of Uncertainty in 3D Scene Reconstruction
Author Marcus Klasson and Riccardo Mereu and Juho Kannala and Arno Solin
Conf/Jour ECCV Workshop on Uncertainty Quantification for Computer Vision.
Year 2024
Project AaltoML/uncertainty-nerf-gs: Code release for the paper “Sources of Uncertainty in 3D Scene Reconstruction”
Paper Sources of Uncertainty in 3D Scene Reconstruction

image.png|666

Q: Can ambient lighting be modeled as a source of uncertainty?

AIR

  • Aleatoric Uncertainty
    • random effects in the observations, such as varying lighting and motion blur
  • Epistemic Uncertainty
    • lack of information about the scene, such as occluded regions (can be reduced by observing more data from new poses)
    • challenging scenes: low texture, repetitive patterns, and insufficient overlap between images
  • Confounding outliers
    • non-static scenes (passers-by, moving objects)
      • non-static elements in a scene, such as moving people or vegetation, introduce variability that is often interpreted as aleatoric noise; however, these elements can also obscure parts of the scene, acting as a source of occlusion
  • Pose Uncertainty
    • Sensitivity of the reconstruction to inaccuracies in the camera poses

Contribution:

  • identify and categorize sources of uncertainties in 3D scene reconstruction and propose methods for systematically evaluating their impact.
  • perform an empirical study using efficient NeRF and GS models from Nerfstudio [49] to compare the performance of various uncertainty estimation techniques on the sources of uncertainty.

RelatedWork:

  • Uncertainty Estimation in Deep Learning:
    • Aleatoric uncertainty is commonly modeled by having the network predict a probability distribution (mean and variance). However, this modification is insufficient to model epistemic uncertainty in the model parameters.
    • Epistemic Uncertainty
      • Bayesian deep learning methods give means to quantify epistemic uncertainty through posterior approximations to obtain the predictive distribution.
      • Ensembles estimate uncertainty by combining predictions from multiple networks trained with different weight initializations.
      • MC-Dropout estimates uncertainty by enabling dropout at test time, randomly masking weights in the network for each prediction.
      • the Laplace approximation has been shown to be a scalable and fast option to obtain predictive uncertainties from already-trained networks in a post-hoc fashion
  • Uncertainty Estimation in NeRFs and GS
    • ActiveNeRF models a Gaussian distribution over rendered RGB pixels with the goal of next-best view selection, which spurred interest in this application as well as exploring more flexible probability distributions.
    • Estimating epistemic uncertainty in few-view settings has also been studied; these approaches require significant modifications to the NeRF architecture, as they use variational inference for optimization.
    • Later works have focused on architecture-agnostic approaches, e.g., using ensembles of efficient NeRF backbones to estimate uncertainties.
    • A calibration method has been proposed for correcting uncalibrated predictions of already-trained NeRFs on novel scenes.
    • Bayes' Rays uses perturbations in a spatial grid to define a NeRF architecture-agnostic spatial uncertainty estimated with the Laplace approximation.
    • FisherRF [14] computes the Fisher information over the parameters in both NeRF- and GS-based methods to quantify the uncertainty of novel views.
    • Recently, robustness to confounding outliers and removing distractors has been studied
      • using view-specific embeddings
      • leveraging pre-trained networks
      • using robust optimization to learn what scene parts are static and dynamic
    • Other works have aimed to account for aleatoric noise, motion blur, and rolling-shutter effects by explicitly modeling them.
    • Furthermore, camera pose optimizers have been proposed to correct inaccurate camera parameters alongside optimizing the scene.

Method

In particular, for the aleatoric ones, we adapt the approach proposed in Active-NeRF [30], while, for the epistemic approaches, we use MC-Dropout [7], the Laplace approximations [6] and ensembles [19].

  • ActiveNeRF: Learning where to see with uncertainty estimation.
  • Dropout as a Bayesian approximation: Representing model uncertainty in deep learning.
  • Laplace redux-effortless Bayesian deep learning.
  • Simple and scalable predictive uncertainty estimation using deep ensembles.

The authors limit the use of MC-Dropout and LA to NeRFs since both are Bayesian deep learning methods and, thus, non-trivial to extend to GS. Both estimate uncertainty through the network weights used to predict point colors along a (continuous) NeRF ray, so they cannot be applied to 3DGS in a straightforward way; the other two, Active-NeRF and Ensembles, can be used for GS because they model uncertainty via separately trained networks or prediction heads.

Active-NeRF/GS

The color is treated as a Gaussian random variable $\mathbf{c}\sim\mathcal{N}(\mathbf{c};\bar{\mathbf{c}},\beta)$ with $\bar{\mathbf{c}}\in\mathbb{R}^3$ and $\beta\in\mathbb{R}^+$; both the mean and the variance are learned, with the variance predicted by a separate network head.

The variance is expressed as a single scalar, which ignores that the variance may differ across color channels.

$\mathbf{c}_{\text{Active-NeRF}}=\sum_{i=1}^{N_s}T_i\alpha_i\bar{\mathbf{c}}_i\quad\mathrm{and}\quad\mathrm{Var}(\mathbf{c}_{\text{Active-NeRF}})=\sum_{i=1}^{N_s}T_i^2\alpha_i^2\beta_i,$

$\mathbf{c}_{\text{Active-GS}}=\sum_{i=1}^{N_p}T_i\alpha_i\bar{\mathbf{c}}_i\quad\text{and}\quad\mathrm{Var}(\mathbf{c}_{\text{Active-GS}})=\sum_{i=1}^{N_p}T_i\alpha_i\beta_i,$
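A minimal sketch of how the per-sample means and variances could be alpha-composited along a ray, assuming the opacities $\alpha_i$, mean colors $\bar{\mathbf{c}}_i$, and variances $\beta_i$ are already available as tensors (function and variable names are illustrative, not from the paper's code release):

```python
import torch

def composite_active(alpha, c_mean, beta):
    """Alpha-composite per-sample color means and variances along one ray.

    alpha:  (N,)   per-sample opacities
    c_mean: (N, 3) per-sample mean colors c̄_i
    beta:   (N,)   per-sample color variances β_i
    """
    # Transmittance T_i = prod_{j<i} (1 - alpha_j)
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    w = T * alpha                                   # rendering weights T_i * alpha_i
    color = (w[:, None] * c_mean).sum(dim=0)        # Σ T_i α_i c̄_i
    var_nerf = (w**2 * beta).sum()                  # Σ T_i² α_i² β_i  (Active-NeRF)
    var_gs = (w * beta).sum()                       # Σ T_i α_i β_i    (Active-GS)
    return color, var_nerf, var_gs
```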

MC-Dropout NeRF

The uncertainty is estimated by applying dropout M times during inference to obtain M rendered RGB predictions, then computing the mean and variance over these M rendered colors.

The mean is simply the uniform average (expectation under uniform weights) of the M predictions.
The variance formula in the paper appears to contain a typo; it should be: $\mathrm{Var}(\mathbf{c}_{\mathrm{MC\text{-}Dropout}}) \approx\frac1M\sum_{m=1}^M\left(\mathbf{c}_{\text{NeRF}}^{(m)}\right)^2-\left(\frac1M\sum_{m=1}^M\mathbf{c}_{\text{NeRF}}^{(m)}\right)^2.$
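A minimal sketch of the MC-Dropout estimate, assuming a NeRF model with dropout layers and a hypothetical `render(ray)` method that returns one RGB prediction:

```python
import torch

def mc_dropout_render(model, ray, M=10):
    """Render a ray M times with dropout active and return mean/variance per channel."""
    model.train()  # keep dropout enabled at test time (MC-Dropout)
    with torch.no_grad():
        samples = torch.stack([model.render(ray) for _ in range(M)])  # (M, 3)
    mean = samples.mean(dim=0)
    # Var ≈ E[c²] − (E[c])², estimated over the M stochastic forward passes
    var = (samples**2).mean(dim=0) - mean**2
    return mean, var
```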

Laplace NeRF

The idea is to approximate the intractable posterior distribution over the weights with a Gaussian distribution centered around the mode of $p(\mathbf{\theta|\mathcal{D}})$, where $\mathcal{D}$ is the training data set.

  • The mean is set to a local maximum of the posterior, $\boldsymbol{\theta}^{*}=\arg\max_{\boldsymbol{\theta}}\log p(\boldsymbol{\theta}\mid\mathcal{D})$, obtained by training the network until convergence.
  • Covariance matrix: $\log h(\boldsymbol{\theta})\approx \log h(\boldsymbol{\theta}^{*})-\frac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\theta}^{*})^{\top}\mathbf{H}(\boldsymbol{\theta}-\boldsymbol{\theta}^{*})$
    • obtained by Taylor expanding $\log p(\boldsymbol{\theta}\mid\mathcal{D})$ around $\boldsymbol{\theta}^{*}$
    • $\mathbf{H}=-\nabla^{2}_{\boldsymbol{\theta}}\log h(\boldsymbol{\theta})\big|_{\boldsymbol{\theta}^{*}}$ is the Hessian matrix of the unnormalized log-posterior at $\boldsymbol{\theta}^{*}$

the Laplace posterior approximation as the Gaussian distribution $p(\boldsymbol{\theta}\mid\mathcal{D}) \approx q(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}\mid\boldsymbol{\theta}^*,\mathbf{H}^{-1})$

The per-point color mean $\hat{\mathbf{c}}$ and variance $\hat{\beta}$ are computed for every input $\mathbf{x}$ along the ray via MC sampling from the approximate posterior:
$\mathbf{c}_{\text{Laplace}}=\sum_{i=1}^{N_s}T_i\alpha_i\hat{\mathbf{c}}_i\quad\text{and}\quad\text{Var}(\mathbf{c}_{\text{Laplace}})=\sum_{i=1}^{N_s}T_i^2\alpha_i^2\hat{\beta}_i.$
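A minimal sketch of the post-hoc sampling step, assuming a diagonal Hessian approximation (so posterior stds are the inverse square roots of the Hessian diagonal) and a hypothetical `render_points` function that returns per-point colors along a ray for a given weight vector; all names are illustrative:

```python
import torch

def laplace_ray_uncertainty(theta_star, hessian_diag, render_points, alpha, S=16):
    """Diagonal Laplace approximation: MC-sample weights, then composite along the ray.

    theta_star:    flat MAP weights θ* (D,)
    hessian_diag:  diagonal of H (D,); posterior std = H^{-1/2} under the diagonal assumption
    render_points: fn(weights) -> (N, 3) per-point colors along the ray
    alpha:         (N,) per-point opacities
    """
    std = hessian_diag.clamp_min(1e-8).rsqrt()
    colors = torch.stack([render_points(theta_star + std * torch.randn_like(std))
                          for _ in range(S)])            # (S, N, 3)
    c_hat = colors.mean(dim=0)                           # per-point mean ĉ_i
    beta_hat = colors.var(dim=0).mean(dim=-1)            # per-point variance β̂_i (averaged over RGB)
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    w = T * alpha
    color = (w[:, None] * c_hat).sum(dim=0)              # Σ T_i α_i ĉ_i
    var = (w**2 * beta_hat).sum()                        # Σ T_i² α_i² β̂_i
    return color, var
```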

Ensemble NeRF/GS

Training M networks with different weight initializations converges them to different local minima.

$\mathbf{c}_{\mathrm{ens}}=\frac1M\sum_{m=1}^M\mathbf{c}_{\mathrm{NeRF/GS}}^{(m)}\quad\text{and}\quad\mathrm{Var}(\mathbf{c}_{\mathrm{ens}})\approx\frac1M\sum_{m=1}^M\left(\mathbf{c}_{\mathrm{NeRF/GS}}^{(m)}\right)^2-\left(\frac1M\sum_{m=1}^M\mathbf{c}_{\mathrm{NeRF/GS}}^{(m)}\right)^2.$
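A minimal sketch of the ensemble estimate over M independently trained models (the `render(ray)` method on each member is an assumed interface for illustration):

```python
import torch

def ensemble_render(models, ray):
    """Mean and per-channel variance of RGB predictions from M independently trained models."""
    with torch.no_grad():
        preds = torch.stack([m.render(ray) for m in models])  # (M, 3)
    mean = preds.mean(dim=0)
    # Var ≈ (1/M) Σ (c^(m))² − ((1/M) Σ c^(m))²
    var = (preds**2).mean(dim=0) - mean**2
    return mean, var
```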
