Novel View Synthesis with View-Dependent Effects from a Single Image

Our NVSVDE-Net is the first single-view NVS method to be trained in a completely self-supervised manner. It requires neither depth nor pose annotations, whereas other methods rely on given depths and/or camera poses.

Abstract

In this paper, we are the first to incorporate view-dependent effects into single-image novel view synthesis (NVS).

To this end, we propose to exploit the camera motion priors in NVS to model view-dependent appearance or effects (VDEs) as the negative disparity in the scene. By recognizing that specularities "follow" the camera motion, we infuse VDEs into the input images by aggregating input pixel colors along the negative depth region of the epipolar lines. We also propose a "relaxed volumetric rendering" approximation that allows the densities to be computed in a single pass, improving efficiency for NVS from single images. Our method learns single-image NVS from image sequences only, making it the first completely self-supervised method of its kind: it requires neither depth nor camera pose annotations.

We present extensive experimental results and show that our proposed method can learn NVS with VDEs, outperforming the SOTA single-view NVS methods on the RealEstate10k and MannequinChallenge datasets.

Proposed Single-View NVS Method

Modeling View-Dependent Effects (VDE)

We observe a strong prior in view-dependent effects (VDEs): VDEs "follow" the camera motion (no disparities).

(a) We synthesize target VDEs (black dots) induced on the input view I by re-sampling I along the negative depth region of the epipolar line (green dots). In contrast, 3D geometry does present disparities between views (blue dots). (b) VDEs present disparities relative to their reflective surfaces, in the direction opposite to the projection of the reflective surface itself. (c) and (d) The VDE disparity due to novel camera poses is proportional to the reflective surface disparity: the closer the reflective surface, the larger the VDE disparity.
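
The geometric intuition in this caption can be illustrated with a small pinhole-camera sketch. It is not the NVSVDE-Net implementation; the function name `epipolar_samples`, the intrinsics `K`, and the sideways translation `t` are all hypothetical values chosen for illustration. The sketch back-projects one input pixel at several signed depth hypotheses, applies a camera motion, and re-projects. The resulting points lie on the epipolar line, and hypotheses with negative depth land on the opposite side of the pixel from those with positive depth.

```python
import numpy as np

def epipolar_samples(pixel, depths, K, R, t):
    """Re-project one pixel, hypothesized at several signed depths,
    after a rigid camera motion (R, t). Illustrative helper only."""
    p_h = np.array([pixel[0], pixel[1], 1.0])
    ray = np.linalg.inv(K) @ p_h            # back-projected ray with z = 1
    pts = depths[:, None] * ray[None, :]    # 3D points at each signed depth
    pts_moved = pts @ R.T + t[None, :]      # apply the camera motion
    proj = pts_moved @ K.T                  # project with the intrinsics
    return proj[:, :2] / proj[:, 2:3]       # perspective division

# Assumed intrinsics and a pure sideways translation (hypothetical values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
pixel = (320.0, 240.0)

pos = epipolar_samples(pixel, np.array([2.0, 5.0]), K, R, t)
neg = epipolar_samples(pixel, np.array([-2.0, -5.0]), K, R, t)

print("positive depths:", pos[:, 0] - pixel[0])   # shifts to one side
print("negative depths:", neg[:, 0] - pixel[0])   # shifts to the other side
```

In this toy setup the horizontal shift is f * t_x / d, so flipping the sign of the depth d flips the direction of the shift, and a smaller |d| gives a larger shift. This is consistent with panels (b) to (d), under the stated assumptions.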

Overall Network Architecture

The proposed NVSVDE-Net models VDEs at the input view as the negative scene disparities under the target camera motion R_c | t_c. Novel views are estimated in two stages: first with coarse, fixed ray samples t_i, then with refined adaptive sampling distances t^*_k.
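
To make the coarse-then-refined idea concrete, the sketch below uses the standard volume-rendering weights and inverse-CDF resampling familiar from NeRF-style hierarchical sampling. This is a swapped-in, generic technique, not the relaxed volumetric rendering or the refinement actually used by NVSVDE-Net, whose details are not given here. The names `render_weights` and `refine_samples` and the toy densities are assumptions made for illustration.

```python
import numpy as np

def render_weights(sigma, t):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    delta = np.diff(t, append=t[-1] + 1e10)      # last interval treated as unbounded
    alpha = 1.0 - np.exp(-sigma * delta)         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance T_i
    return trans * alpha

def refine_samples(t, weights, n_fine):
    """Place n_fine new distances where the coarse weights are large
    (deterministic inverse-CDF sampling; generic, not the paper's method)."""
    pdf = (weights + 1e-8) / np.sum(weights + 1e-8)
    cdf = np.cumsum(pdf)
    u = (np.arange(n_fine) + 0.5) / n_fine       # evenly spaced quantiles
    return np.interp(u, cdf, t)

# Coarse stage: fixed samples t_i along one ray, with toy densities.
t_coarse = np.linspace(1.0, 10.0, 8)
sigma = np.array([0.0, 0.0, 0.0, 5.0, 0.5, 0.0, 0.0, 0.0])
w = render_weights(sigma, t_coarse)

# Refined stage: adaptive distances concentrated near the surface.
t_fine = refine_samples(t_coarse, w, n_fine=16)
print(t_fine)
```

With these toy densities nearly all of the weight falls on the fourth coarse sample, so the refined distances cluster in the interval just before it rather than being spread over the whole ray.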

Results on RealEstate10k

Our results on the RealEstate10k dataset (RE10k).

Results on the MannequinChallenge Dataset

Our results on the MannequinChallenge dataset (MC).

Extreme NVS

We trained our NVSVDE-Net to render views that are at most 16 frames apart from the single-image input. In this experiment, we render views equivalent to 40 frames apart from the input view. Despite the inherent challenges of extreme novel view synthesis, our method consistently produces realistic views, albeit with some observable artifacts, as expected of any single-view NVS framework.

BibTeX


      @misc{bello2023novel,
        title={Novel View Synthesis with View-Dependent Effects from a Single Image},
        author={Juan Luis Gonzalez Bello and Munchurl Kim},
        year={2023},
        eprint={2312.08071},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
      }