🌀 Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

1ByteDance   2Peking University   3CUHK   4Xiamen University   5UT Austin
* Equal contribution   † Project lead   ‡ Corresponding author

Diff4Splat is a unified framework that directly predicts dynamic 4D scene content without test-time optimization.

🧩   Abstract   🧩

We introduce Diff4Splat, a feed-forward framework for controllable 4D scene generation from a single image. Our method synergizes the powerful generative priors of video diffusion models with geometric and motion constraints learned from a large-scale 4D dataset. Given a single image, camera trajectory, and optional text prompt, our model directly predicts a complete, dynamic 4D scene represented by deformable 3D Gaussian Splats. This approach captures appearance, geometry, and motion in a single pass, eliminating the need for test-time optimization or post-hoc processing. At the core of our framework is a video latent transformer that enhances existing video diffusion models, enabling them to jointly model spatio-temporal dependencies and predict 3D Gaussian Splats over time. Supervised by objectives targeting appearance fidelity, geometric accuracy, and motion consistency, Diff4Splat achieves performance comparable to state-of-the-art optimization-based methods for dynamic scene synthesis while being significantly more efficient.

🧩   Method   🧩

The network architecture of Diff4Splat. We present a high-fidelity 4D scene generation method from single images built on four key components: video diffusion latents processed by our video latent transformer, dynamic 3DGS deformation, unified supervision with photometric, geometric, and motion losses, and progressive training for robust geometry and texture.
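The unified supervision can be illustrated with a minimal sketch. The paper names photometric, geometric, and motion objectives but does not publish their exact formulations or weights, so the loss terms and lambda values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def photometric_loss(rendered, target):
    """L1 photometric error between rendered and ground-truth frames."""
    return np.abs(rendered - target).mean()


def geometric_loss(pred_depth, gt_depth):
    """L1 depth error as a stand-in for the geometric objective."""
    return np.abs(pred_depth - gt_depth).mean()


def motion_loss(pred_flow, gt_flow):
    """Mean endpoint error between predicted and reference scene flow."""
    return np.linalg.norm(pred_flow - gt_flow, axis=-1).mean()


def unified_loss(rendered, target, pred_depth, gt_depth, pred_flow, gt_flow,
                 w_photo=1.0, w_geo=0.5, w_motion=0.5):
    """Weighted sum of the three supervision terms (weights are assumed)."""
    return (w_photo * photometric_loss(rendered, target)
            + w_geo * geometric_loss(pred_depth, gt_depth)
            + w_motion * motion_loss(pred_flow, gt_flow))
```

In practice each term would be computed from renderings of the predicted deformable Gaussians over the video frames; the sketch only shows how the objectives combine.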

🧩   Results of Diff4Splat   🧩

Qualitative comparisons. Columns: input image, Diff4Splat (feed-forward), and MoSca (test-time optimization).

🧩   Ablation Studies of Diff4Splat   🧩

Ablation of the Deformation Gaussian Field shows that removing this module results in ghosting artifacts, highlighted by the red bounding boxes.

📚   BibTeX   📚

If you find our work helpful, please consider citing:

      @article{xxx,
        title={},
        author={},
        journal={arXiv preprint arXiv:},
        year={2025}
      }