Panwang Pan | 潘攀望

I am currently employed as a Researcher and Developer at PICO Architecture Group within ByteDance Ltd. Previously, I held the position of Senior Algorithm Engineer at Alibaba Cloud, where I specialized in 3D Reconstruction and 6DoF Pose Estimation.

In 2019, I earned my Master's degree from Xiamen University, where I was enrolled in the School of Informatics.

I focus on generative models and multi-modal representation learning, particularly in the 3D domain. My research contributions have been integrated into XR devices, Aliyun Cloud AI-Box, and various commercial products.

Email  /  Google Scholar  /  Github  /  Twitter

📢 Latest News
  • [2025-02] One paper about VLM + RRHF (JarvisIR) was accepted to CVPR 2025 🎉.
  • [2025-01] 4K4DGEN was selected as an ICLR 2025 Spotlight, top 3.2% among 11,672 🎉.
  • [2025-01] Three papers about 3D/4D Generative Models (InstantSplamp & DiffSplat & 4K4DGEN) were accepted to ICLR 2025.
  • [2024-09] One paper about generalizable single-view human reconstruction (HumanSplat) was accepted to NeurIPS 2024 🎉.
  • [2024-09] One paper about VLM Distillation (MRD) was accepted to ECCV 2024 🎉.
📑 Selected Publications (Google Scholar)
    * Equal contribution, † Project leader, ‡ Corresponding author
    ICLR 2025

    DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splatting Generation

    Chenguo Lin*, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu

    [Openreview] [Paper] [Project] [Code]

    We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussians by taming large-scale text-to-image diffusion models. DiffSplat directly generates 3D Gaussians from text prompts or single-view images in 1 to 2 seconds and achieves state-of-the-art 3D reconstruction results.

    ICLR 2025 🌟 spotlight 🌟

    4K4DGEN: Panoramic 4D Generation at 4K Resolution

    Renjie Li*, Panwang Pan*, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

    [Openreview] [Paper] [Project] [Code]

    4K4DGEN achieves high-quality panorama-to-4D generation at 4K resolution for the first time, using efficient splatting techniques for real-time exploration. It leverages 2D diffusion models to generate smooth, animated 360° panoramas with global coherence, and elevates panoramic videos into 4D space while ensuring seamless spatial and temporal consistency.

    NeurIPS 2024

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Panwang Pan*, Zhou Su*, Chenguo Lin*, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu

    [Openreview] [Paper] [Project] [Code]

    We present HumanSplat, which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework.

    InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

    Chenguo Lin*, Yuchen Lin*, Panwang Pan, Xuanyang Zhang, Yadong Mu

    Under review by Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)

    InstructLayout first designs a holistic semantic graph based on user instructions. Within the graph, each node is an object endowed with semantic features such as category and appearance, and each edge represents a spatial relationship between objects. It then arranges objects in a scene or canvas by decoding spatial attributes from the informative graph prior.

    ICCV 2023 & ICLR 2025

    StegaNeRF: Embedding Invisible Information within Neural Radiance Fields / InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting

    [StegaNeRF Paper] [StegaNeRF Project] [StegaNeRF Code]

    [InstantSplamp Paper] [InstantSplamp Project] [InstantSplamp Code]


    StegaNeRF and InstantSplamp achieve reliable recovery of hidden information with minimal impact on rendering quality. These works offer a promising outlook on ownership identification in 3D representations and call for more attention and effort on related problems.

💼 Experience

    ByteDance Ltd, Beijing, China, Senior Computer Vision Algorithm Engineer, advised by Cheng Chen and Zeming Li.
    08/2022 - Present
    Alibaba Cloud, Hangzhou, China, Senior Computer Vision Algorithm Engineer
    07/2019 - 07/2022
    DevTech Compute, NVIDIA, Beijing, China, AI Developer Technology Engineer Intern, advised by Xipeng Li.
    07/2018 - 10/2018
🏆 Selected Awards

    2023, 2024: ByteStyle Award, ByteDance

    2019: Outstanding Graduates of Xiamen University

    2018: National Scholarship for Postgraduates, Ministry of Education

    2018: First Prize of GEDC, Second Prize of MCM & CPIPC

    2017: ZhongXian Huang Scholarship, Xiamen University (about 10 awards per year)

    2015: National Scholarship for Undergraduates (the highest honor scholarship in China)