SplineGS:

Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park* 1     Minh-Quan Viet Bui* 1     Juan Luis Gonzalez Bello 1     Jaeho Moon 1     Jihyong Oh† 2     Munchurl Kim† 1
*Co-first authors (equal contribution)
†Co-corresponding authors
1KAIST, South Korea        2Chung-Ang University, South Korea

COLMAP-free NVS Comparisons with RoDynRF on DAVIS

We compare SplineGS with the COLMAP-free novel view synthesis method RoDynRF [27]. Each monocular sequence consists of 40-90 frames at a resolution of 480x854. We provide fixed-view-varying-time and spiral-zoom renderings and evaluate their quality with blind image quality assessment using the MUSIQ* score (higher is better).
*MUSIQ: Multi-scale Image Quality Transformer. In ICCV, 2021.

Fixed-view-varying-time.

RoDynRF (left) vs. Ours (right).

Spiral-zoom.

RoDynRF (left) vs. Ours (right).

Abstract

TL;DR: We propose SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality reconstruction and fast rendering from monocular videos. At its core is a novel Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian trajectories using cubic Hermite splines. Experiments show that SplineGS significantly outperforms state-of-the-art methods in novel view synthesis quality for dynamic scenes from monocular videos, while achieving rendering speeds thousands of times faster.

Framework Architecture


Overview of SplineGS. SplineGS leverages spline-based functions to model the deformation of dynamic 3D Gaussians with a novel Motion-Adaptive Spline (MAS) architecture. MAS is composed of sets of learnable control points based on a cubic Hermite spline function, which lets it accurately model the trajectory of each dynamic 3D Gaussian and achieve faster rendering. To avoid any preprocessing of camera parameters (i.e., to remain COLMAP-free), we adopt a two-stage optimization consisting of a warm-up stage and a main training stage.
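As a rough illustration of how a cubic Hermite spline turns a set of control points into a continuous trajectory, the following is a minimal sketch and not the SplineGS implementation. It assumes uniformly spaced control points over normalized time and finite-difference tangents; the function names (`hermite_basis`, `tangent`, `eval_trajectory`) are hypothetical, and in the actual method the control points would be learnable parameters.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def hermite_basis(s: float) -> Tuple[float, float, float, float]:
    """Cubic Hermite basis functions for a local parameter s in [0, 1]."""
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1   # weight of the segment's start point
    h10 = s3 - 2 * s2 + s       # weight of the start tangent
    h01 = -2 * s3 + 3 * s2      # weight of the segment's end point
    h11 = s3 - s2               # weight of the end tangent
    return h00, h10, h01, h11


def tangent(points: Sequence[Vec3], k: int) -> Vec3:
    """Finite-difference tangent at control point k (an assumed choice)."""
    lo, hi = max(k - 1, 0), min(k + 1, len(points) - 1)
    scale = 1.0 / (hi - lo)
    return tuple((b - a) * scale for a, b in zip(points[lo], points[hi]))


def eval_trajectory(points: Sequence[Vec3], t: float) -> Vec3:
    """Position at normalized time t in [0, 1]; needs at least 2 control points."""
    n = len(points) - 1                 # number of spline segments
    u = min(max(t, 0.0), 1.0) * n       # global spline parameter in [0, n]
    k = min(int(u), n - 1)              # index of the active segment
    s = u - k                           # local parameter within the segment
    h00, h10, h01, h11 = hermite_basis(s)
    m0, m1 = tangent(points, k), tangent(points, k + 1)
    return tuple(
        h00 * p0 + h10 * a + h01 * p1 + h11 * b
        for p0, a, p1, b in zip(points[k], m0, points[k + 1], m1)
    )


# Usage: a trajectory through three control points, queried at any time t.
control_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 1.0)]
position = eval_trajectory(control_points, 0.3)
```

Because the position at any time is a closed-form polynomial of a few control points, querying a trajectory requires no network evaluation, which is consistent with the fast rendering described above.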

Novel View Synthesis on NVIDIA

We follow the experimental settings of RoDynRF [27]. Each monocular training sequence consists of 12 frames at a resolution of 270x480.

Training Views.

Novel View Synthesis.


Novel View and Time Synthesis on NVIDIA (more challenging)

As discussed in Section 5.1 of the main paper, we propose to evaluate rendering at timesteps that are unseen during training. We follow the dataset sampling strategy in [22], which samples 24 timestamps from the NVIDIA dataset. To simulate larger motion, we exclude frames with odd time indices from the training sets. To ensure that no test timestamp is seen during training, which makes the novel view and time synthesis validation more challenging, we exclude frames with even time indices from the test sets. Each training frame has a resolution of 288x540.
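The split described above can be sketched as follows. This is an illustrative snippet only: the function name `split_by_time_parity` is hypothetical, and zero-based time indexing is an assumption, since the text does not state whether indices start at 0 or 1.

```python
from typing import List, Tuple


def split_by_time_parity(num_timestamps: int = 24) -> Tuple[List[int], List[int]]:
    """Split time indices so training and test timestamps never overlap.

    Assumes zero-based time indices (not stated in the source text).
    """
    indices = range(num_timestamps)
    train = [i for i in indices if i % 2 == 0]  # odd indices excluded from training
    test = [i for i in indices if i % 2 == 1]   # even indices excluded from testing
    return train, test


train_ids, test_ids = split_by_time_parity(24)
```

Under this assumption, consecutive training frames are two timestamps apart, which is what makes the apparent motion between them larger, and every test timestamp falls strictly between two training timestamps.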

Training Views.

Novel View and Time Synthesis.


Dynamic 3D Gaussian Trajectory Visualization on DAVIS

We visualize the rasterized 2D trajectories of the dynamic 3D Gaussians of SplineGS on the DAVIS dataset. The 3D trajectories are projected onto a fixed 2D novel view.
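Projecting a 3D trajectory onto a fixed view can be sketched with a standard pinhole camera model. This is a minimal example under stated assumptions and not the visualization code used for the results: the function names, the world-to-camera convention (`x_cam = R @ x_world + t`), and the intrinsics `fx`, `fy`, `cx`, `cy` are all illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project_point(p_world: Vec3, R: Mat3, t: Vec3,
                  fx: float, fy: float, cx: float, cy: float
                  ) -> Optional[Tuple[float, float]]:
    """Project one 3D point to pixel coordinates with a pinhole camera.

    Assumed convention: x_cam = R @ x_world + t, camera looks along +z.
    Returns None for points at or behind the camera plane.
    """
    xc = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def project_trajectory(points: Sequence[Vec3], R: Mat3, t: Vec3,
                       fx: float, fy: float, cx: float, cy: float
                       ) -> List[Tuple[float, float]]:
    """Project a sampled 3D trajectory into one fixed view, dropping invisible samples."""
    projected = (project_point(p, R, t, fx, fy, cx, cy) for p in points)
    return [uv for uv in projected if uv is not None]


# Usage: a short trajectory seen from an identity-pose camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
trajectory = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (1.0, 0.0, 2.0)]
pixels = project_trajectory(trajectory, identity, (0.0, 0.0, 0.0),
                            fx=100.0, fy=100.0, cx=50.0, cy=40.0)
```

Keeping the camera pose fixed while the time parameter varies means that any movement of the projected 2D curve comes from the Gaussian's own motion and not from camera motion.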

Toy Example of Dynamic 3D Gaussian Trajectory Editing

We visualize a toy example of the editing effects obtained by modifying a few control points. The spline-based representation potentially offers simple manipulation of motion trajectories while preserving smooth motion along the temporal axis, which can be explored in future work.
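To make the editing idea concrete, the following toy sketch moves one control point of a 1D cubic Hermite spline and checks which part of the curve changes. It is an illustration under assumptions and not the authors' editing tool: the helper `hermite_1d` is hypothetical, and it uses finite-difference tangents, under which an edit to one control point only influences the two segments on either side of it.

```python
from typing import List, Sequence


def hermite_1d(ctrl: Sequence[float], u: float) -> float:
    """Evaluate a 1D cubic Hermite spline at u in [0, len(ctrl) - 1].

    Tangents are finite differences of neighboring control points (an assumption).
    """
    n = len(ctrl) - 1
    k = min(int(u), n - 1)   # active segment
    s = u - k                # local parameter in [0, 1]

    def m(i: int) -> float:
        lo, hi = max(i - 1, 0), min(i + 1, n)
        return (ctrl[hi] - ctrl[lo]) / (hi - lo)

    s2, s3 = s * s, s * s * s
    return ((2 * s3 - 3 * s2 + 1) * ctrl[k] + (s3 - 2 * s2 + s) * m(k)
            + (-2 * s3 + 3 * s2) * ctrl[k + 1] + (s3 - s2) * m(k + 1))


original: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edited = list(original)
edited[3] += 2.0  # move a single control point

# Sample both curves and record where the edit actually changed the trajectory.
samples = [i * 0.25 for i in range(25)]  # u from 0.0 to 6.0
changed = [u for u in samples
           if abs(hermite_1d(original, u) - hermite_1d(edited, u)) > 1e-12]
```

In this sketch the curve differs only for parameters strictly between 1 and 5, and neighboring segments still share the same tangent at each control point, so the edited trajectory stays smooth (first-derivative continuous) across segment boundaries.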

BibTeX

@misc{park2024splinegsrobustmotionadaptivespline,
      title={SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video}, 
      author={Jongmin Park and Minh-Quan Viet Bui and Juan Luis Gonzalez Bello and Jaeho Moon and Jihyong Oh and Munchurl Kim},
      year={2024},
      eprint={2412.09982},
      archivePrefix={arXiv},
}