Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting

Previously titled "SplatVoxel: History-Aware Novel View Streaming without Temporal Training"

¹ETH Zurich, ²Google

NeurIPS 2025

Teaser image of SplatVoxel
(Left) We study the problem of Online Novel View Streaming from Sparse-view RGB Videos.
(Right) Per-frame reconstruction methods are prone to temporal flickering artifacts. In contrast, our history-aware model delivers high visual quality and temporal consistency, running at 15 fps with a 350 ms delay on two-view inputs at 320 × 240 resolution.

History-Aware Streaming Reconstruction w/o Temporal Training

Flicker analysis figure

Method Overview

We present a Hybrid Splat-Voxel feed-forward scene reconstruction framework. Our system is trained only on static scenes, yet generalizes at inference time to zero-shot, history-aware 4D novel view streaming.

Overview image of SplatVoxel
Our hybrid SplatVoxel model first extracts input image features with a multi-view transformer, which outputs pixel-aligned Gaussian splats with associated features for each input view. The splat features are then deposited onto a coarse-to-fine voxel grid using their decoded positions, and a secondary sparse voxel transformer processes the grid features to output the final Gaussian parameters. To merge history, we compute triangulated scene flow from the input views and perform keypoint-guided deformations of the previous splats. These deformed splats are treated identically to the input-aligned splats and deposited into the same voxel grid, merging the previous state with the current state.
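
As a rough illustration of the fuse step, the NumPy-only sketch below scatters per-splat features into a single-level voxel grid and shows how warped history splats and current splats can be deposited into the same grid before refinement. The function names (deposit_splats, fuse_with_history) and the simple per-splat flow warp are hypothetical stand-ins for the components described above, not the released implementation; the actual model uses learned transformers, a coarse-to-fine sparse grid, and keypoint-guided deformation.

import numpy as np

def deposit_splats(positions, features, grid_origin, voxel_size, grid_shape):
    # Scatter per-splat features into a voxel grid, averaging features that
    # land in the same cell (one level of the coarse-to-fine grid).
    idx = np.floor((positions - grid_origin) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, feats = idx[keep], features[keep]
    flat = np.ravel_multi_index(idx.T, grid_shape)

    n_cells, feat_dim = int(np.prod(grid_shape)), feats.shape[1]
    grid = np.zeros((n_cells, feat_dim))
    count = np.zeros(n_cells)
    np.add.at(grid, flat, feats)          # handles repeated cell indices
    np.add.at(count, flat, 1.0)
    grid[count > 0] /= count[count > 0][:, None]
    return grid.reshape(*grid_shape, feat_dim)

def fuse_with_history(cur_pos, cur_feat, hist_pos, hist_feat, flow, **grid_kwargs):
    # Warp history splats by an estimated per-splat flow (a stand-in for the
    # keypoint-guided deformation), then deposit history and current splats
    # into the same grid so a sparse voxel transformer (not shown) can refine
    # the fused state.
    warped_hist_pos = hist_pos + flow
    pos = np.concatenate([cur_pos, warped_hist_pos], axis=0)
    feat = np.concatenate([cur_feat, hist_feat], axis=0)
    return deposit_splats(pos, feat, **grid_kwargs)

# Toy usage: 1,000 current splats fused with 800 history splats on a 32^3 grid.
rng = np.random.default_rng(0)
cur_pos, cur_feat = rng.uniform(0, 1, (1000, 3)), rng.normal(size=(1000, 16))
hist_pos, hist_feat = rng.uniform(0, 1, (800, 3)), rng.normal(size=(800, 16))
flow = 0.01 * rng.normal(size=(800, 3))
grid = fuse_with_history(cur_pos, cur_feat, hist_pos, hist_feat, flow,
                         grid_origin=np.zeros(3), voxel_size=1/32,
                         grid_shape=(32, 32, 32))
print(grid.shape)  # (32, 32, 32, 16)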
SplatVoxel achieves efficient training while recovering finer reconstruction detail.
Ablation image of SplatVoxel
Comparison image of SplatVoxel

Comparison with Existing Novel View Synthesis Methods

Main table of SplatVoxel
Main comparison image of SplatVoxel

BibTeX

@misc{wang2025splatvoxel,
  title={SplatVoxel: History-Aware Novel View Streaming without Temporal Training}, 
  author={Yiming Wang and Lucy Chai and Xuan Luo and Michael Niemeyer and Manuel Lagunas and Stephen Lombardi and Siyu Tang and Tiancheng Sun},
  year={2025},
  eprint={2503.14698},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.14698}, 
}