Recent advances in neural rendering have shown that, albeit slow, compact implicit models can learn a scene's geometry and view-dependent appearance from multiple views. To maintain such a small memory footprint while achieving faster inference times, recent works have adopted "sampler" networks that adaptively sample a small subset of points along each ray of an implicit neural radiance field. Although these methods achieve up to a 10x reduction in rendering time, they still suffer from considerable quality degradation compared to the vanilla NeRF.
In contrast, we propose ProNeRF, which provides an optimal trade-off between memory footprint (similar to NeRF), speed (faster than HyperReel), and quality (better than K-Planes). ProNeRF is equipped with a novel projection-aware sampling (PAS) network together with a new training strategy for ray exploration and exploitation, allowing for efficient fine-grained particle sampling.
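To make the idea concrete, below is a minimal PyTorch sketch of a projection-aware sampler: coarse candidate points along each ray are projected onto nearby reference views, the gathered colors are fed to a small MLP, and the MLP regresses a handful of refined sample depths. The layer sizes, view/sample counts, and helper names (e.g., `project_points`, `PASNet`) are our own illustrative assumptions and do not reproduce the paper's exact architecture.

```python
# Minimal, illustrative sketch of projection-aware sampling (PAS).
# Shapes, layer sizes, and helper names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_points(pts, K, w2c):
    """Project world-space points (B, N, 3) into a reference view.
    K: (3, 3) intrinsics, w2c: (3, 4) world-to-camera extrinsics."""
    pts_cam = pts @ w2c[:, :3].T + w2c[:, 3]            # (B, N, 3) camera coords
    pix = pts_cam @ K.T                                  # (B, N, 3) homogeneous pixels
    uv = pix[..., :2] / pix[..., 2:3].clamp(min=1e-6)    # (B, N, 2) pixel coords
    return uv

class PASNet(nn.Module):
    """Predicts a few refined sample depths per ray from the colors gathered by
    projecting coarse candidate points onto the reference views."""
    def __init__(self, num_views=3, num_coarse=8, num_fine=8, hidden=128):
        super().__init__()
        in_dim = num_views * num_coarse * 3 + 6          # gathered RGB + ray (o, d)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_fine))
        self.num_coarse = num_coarse

    def forward(self, rays_o, rays_d, near, far, ref_imgs, ref_K, ref_w2c):
        # rays_o, rays_d: (B, 3); near, far: scalars; ref_imgs: list of (3, H, W).
        B = rays_o.shape[0]
        # Coarse candidate depths, uniform in [near, far].
        t = torch.linspace(0, 1, self.num_coarse, device=rays_o.device)
        z_coarse = near + (far - near) * t
        pts = rays_o[:, None] + rays_d[:, None] * z_coarse[None, :, None]  # (B, C, 3)

        feats = [rays_o, rays_d]
        for img, K, w2c in zip(ref_imgs, ref_K, ref_w2c):  # per reference view
            uv = project_points(pts, K, w2c)               # (B, C, 2)
            H, W = img.shape[-2:]
            grid = torch.stack([uv[..., 0] / (W - 1), uv[..., 1] / (H - 1)], -1) * 2 - 1
            rgb = F.grid_sample(img[None], grid[None], align_corners=True)  # (1, 3, B, C)
            feats.append(rgb[0].permute(1, 2, 0).reshape(B, -1))            # (B, C*3)

        # Fine sample depths, kept inside [near, far] via a sigmoid, then sorted.
        z_fine = near + (far - near) * torch.sigmoid(self.mlp(torch.cat(feats, -1)))
        return torch.sort(z_fine, dim=-1).values           # (B, num_fine)
```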
Our ProNeRF yields state-of-the-art metrics, running 15-23x faster than NeRF with a 0.65dB higher PSNR and achieving a 0.95dB higher PSNR than the best published sampler-based method, HyperReel. Our exploration-and-exploitation training strategy allows ProNeRF to learn the full scene's color and density distributions while also learning efficient ray sampling focused on the highest-density regions. We provide extensive experimental results that support the effectiveness of our method on the widely adopted forward-facing and 360 datasets, LLFF and Blender, respectively.
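The sketch below illustrates one way such an exploration-and-exploitation schedule could be implemented: some iterations render with dense stratified depths over the full ray range (exploration), while the others render only with the few depths predicted by the sampler (exploitation). The 2:1 schedule, sample counts, and the `nerf_render`/`pas_net` callables are assumptions for illustration, not the paper's exact training procedure.

```python
# Illustrative training step alternating exploration (stratified depths covering
# [near, far], so the radiance field sees the full scene distribution) and
# exploitation (sampler-predicted depths concentrated on high-density regions).
# `nerf_render` and `pas_net` are assumed callables (pas_net as in the sketch above).
import torch

def train_step(nerf_render, pas_net, optimizer, batch, step, explore_every=2):
    rays_o, rays_d, near, far, ref_views, target_rgb = batch
    if step % explore_every == 0:
        # Exploration: dense stratified depths over the whole [near, far] range.
        n = 64
        jitter = torch.rand(rays_o.shape[0], n, device=rays_o.device)
        z = near + (far - near) * (torch.arange(n, device=rays_o.device) + jitter) / n
    else:
        # Exploitation: a handful of depths predicted by the sampler network,
        # focused around the highest-density regions along each ray.
        z = pas_net(rays_o, rays_d, near, far, *ref_views)

    pred_rgb = nerf_render(rays_o, rays_d, torch.sort(z, dim=-1).values)
    loss = ((pred_rgb - target_rgb) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```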
A conceptual illustration of our fast and high-quality projection-aware sampling of neural radiance fields (ProNeRF). The reference views are available during training and testing. The target view is drawn for illustrative purposes only.
We provide extensive experimental results on the LLFF and Blender datasets to show the effectiveness of our method in comparison with recent SOTA methods. We also present a comprehensive ablation study that supports our design choices and main contributions. More results are provided in the Supplemental Material. We evaluate the rendering quality of our method with three widely used metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). For SSIM, two common implementations are available: one from TensorFlow (used in the metrics reported by NeRF, MobileNeRF, and IBRNet) and another from scikit-image (employed in ENeRF, RSeN, and NLF). We denote the metrics from TensorFlow and scikit-image as SSIM_t and SSIM_s, respectively. Similarly, for LPIPS, we can choose between two backbone options, namely AlexNet and VGG. We present our SSIM and LPIPS results across all available choices to ensure a fair and comprehensive evaluation of our method's performance.
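For reference, the snippet below sketches how these metric variants can be computed with public implementations (tf.image.ssim for SSIM_t, scikit-image for SSIM_s, and the lpips package with AlexNet and VGG backbones). Image loading and batching are omitted, and `pred`/`gt` are assumed to be float32 RGB arrays in [0, 1] of shape (H, W, 3).

```python
# Sketch of the reported metric variants; not the evaluation script of the paper.
import numpy as np
import tensorflow as tf
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred: np.ndarray, gt: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)

    # SSIM_t: TensorFlow implementation (as used by NeRF / MobileNeRF / IBRNet).
    ssim_t = float(tf.image.ssim(tf.convert_to_tensor(gt),
                                 tf.convert_to_tensor(pred), max_val=1.0))
    # SSIM_s: scikit-image implementation (as used by ENeRF / RSeN / NLF).
    ssim_s = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)

    # LPIPS with AlexNet and VGG backbones; inputs must be NCHW tensors in [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lpips_alex = float(lpips.LPIPS(net='alex')(to_t(pred), to_t(gt)))
        lpips_vgg = float(lpips.LPIPS(net='vgg')(to_t(pred), to_t(gt)))

    return dict(psnr=psnr, ssim_t=ssim_t, ssim_s=ssim_s,
                lpips_alex=lpips_alex, lpips_vgg=lpips_vgg)
```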
Performance trade-off of neural rendering methods (memory, speed, quality) on the LLFF dataset. Our ProNeRF yields the best memory-speed-quality trade-off.
Qualitative comparisons for the LLFF dataset. Zoom in for better visualization.
This work was supported by IITP grant funded by the Korea government (MSIT) (No. RS2022-00144444, Deep Learning Based Visual Representational Learning and Rendering of Static and Dynamic Scenes).
@ARTICLE{10504815,
author={Bello, Juan Luis Gonzalez and Bui, Minh-Quan Viet and Kim, Munchurl},
journal={IEEE Access},
title={ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields},
year={2024},
volume={12},
number={},
pages={56799-56814},
keywords={Rendering (computer graphics);Three-dimensional displays;Training;Image color analysis;Geometry;Pipelines;Neural radiance field;3D reconstruction;neural radiance field;neural rendering;view synthesis;ray sampling},
doi={10.1109/ACCESS.2024.3390753}}