Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting

Previously titled "SplatVoxel: History-Aware Novel View Streaming without Temporal Training"

¹ETH Zurich, ²Google

NeurIPS 2025

Teaser image of SplatVoxel
(Left) We study the problem of Online Novel View Streaming from Sparse-view RGB Videos.
(Right) Per-frame reconstruction methods are prone to temporal flickering artifacts. In contrast, our history-aware model delivers high visual quality and temporal consistency, running at 15 fps with a 350 ms delay on two-view inputs at 320 × 240 resolution.

History-Aware Streaming Reconstruction w/o Temporal Training

Flicker analysis figure

Method Overview

We present a Hybrid Splat-Voxel feed-forward scene reconstruction framework. Our system is trained only on static scenes, yet generalizes at inference time to zero-shot, history-aware 4D novel view streaming.

Overview image of SplatVoxel
Our hybrid SplatVoxel model first extracts input image features with a multi-view transformer, which outputs pixel-aligned Gaussian splats with associated features for each input view. The splat features are then deposited onto a coarse-to-fine voxel grid using their decoded positions, and a secondary sparse voxel transformer processes the grid features to output the final Gaussian parameters. To merge history, we compute triangulated scene flow from the input views and perform keypoint-guided deformations of the previous splats. These deformed splats are treated identically to the input-aligned splats and deposited into the same voxel grid, merging the previous state with the current state.
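
As a rough illustration of the fuse step, the NumPy-only sketch below scatters per-splat features into a single-level voxel grid and shows how warped history splats and current splats can be deposited into the same grid before refinement. The function names (deposit_splats, fuse_with_history) and the simple per-splat flow warp are hypothetical stand-ins for the components described above, not the released implementation; the actual model uses learned transformers, a coarse-to-fine sparse grid, and keypoint-guided deformation.

import numpy as np

def deposit_splats(positions, features, grid_origin, voxel_size, grid_shape):
    # Scatter per-splat features into a voxel grid, averaging features that
    # land in the same cell (one level of the coarse-to-fine grid).
    idx = np.floor((positions - grid_origin) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, feats = idx[keep], features[keep]
    flat = np.ravel_multi_index(idx.T, grid_shape)

    n_cells, feat_dim = int(np.prod(grid_shape)), feats.shape[1]
    grid = np.zeros((n_cells, feat_dim))
    count = np.zeros(n_cells)
    np.add.at(grid, flat, feats)          # handles repeated cell indices
    np.add.at(count, flat, 1.0)
    grid[count > 0] /= count[count > 0][:, None]
    return grid.reshape(*grid_shape, feat_dim)

def fuse_with_history(cur_pos, cur_feat, hist_pos, hist_feat, flow, **grid_kwargs):
    # Warp history splats by an estimated per-splat flow (a stand-in for the
    # keypoint-guided deformation), then deposit history and current splats
    # into the same grid so a sparse voxel transformer (not shown) can refine
    # the fused state.
    warped_hist_pos = hist_pos + flow
    pos = np.concatenate([cur_pos, warped_hist_pos], axis=0)
    feat = np.concatenate([cur_feat, hist_feat], axis=0)
    return deposit_splats(pos, feat, **grid_kwargs)

# Toy usage: 1,000 current splats fused with 800 history splats on a 32^3 grid.
rng = np.random.default_rng(0)
cur_pos, cur_feat = rng.uniform(0, 1, (1000, 3)), rng.normal(size=(1000, 16))
hist_pos, hist_feat = rng.uniform(0, 1, (800, 3)), rng.normal(size=(800, 16))
flow = 0.01 * rng.normal(size=(800, 3))
grid = fuse_with_history(cur_pos, cur_feat, hist_pos, hist_feat, flow,
                         grid_origin=np.zeros(3), voxel_size=1/32,
                         grid_shape=(32, 32, 32))
print(grid.shape)  # (32, 32, 32, 16)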
SplatVoxel achieves efficient training while recovering finer reconstruction detail.
Ablation image of SplatVoxel
Comparison image of SplatVoxel

Comparison with Existing Novel View Synthesis Methods

Main table of SplatVoxel
Main comparison image of SplatVoxel

BibTeX

@misc{wang2025splatvoxel,
  title={SplatVoxel: History-Aware Novel View Streaming without Temporal Training}, 
  author={Yiming Wang and Lucy Chai and Xuan Luo and Michael Niemeyer and Manuel Lagunas and Stephen Lombardi and Siyu Tang and Tiancheng Sun},
  year={2025},
  eprint={2503.14698},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.14698}, 
}