Panwang Pan | 潘攀望

I am currently a Researcher and Developer at PICO, ByteDance Ltd. Previously, I was a Senior Algorithm Engineer at Alibaba Cloud, where I specialized in 3D reconstruction and 6DoF pose estimation.

I received my Master's degree from the School of Informatics, Xiamen University, in 2019.

My research focuses on generative models and multi-modal representation learning, particularly in 3D. My research contributions have been integrated into XR devices, the Aliyun Cloud AI-Box, and various other commercial products.

Email / Google Scholar / GitHub / Twitter / WeChat

📢 Latest News
  • [2025-06] We released PartCrafter, a 3D-native DiT model designed to generate 3D objects in modular parts.
  • [2025-02] One paper about VLM + RRHF (JarvisIR) was accepted to CVPR 2025 🎉
  • [2025-01] 4K4DGEN was selected as an ICLR 2025 Spotlight (top 3.2% among 11,672 submissions) 🎉
  • [2025-01] Three papers about 3D/4D Generative Models (InstantSplamp & DiffSplat & 4K4DGEN) were accepted to ICLR 2025.
  • [2024-09] One paper about generalizable single-view human reconstruction (HumanSplat) was accepted to NeurIPS 2024 🎉
  • [2024-09] One paper about VLM distillation (MRD) was accepted to ECCV 2024 🎉
📑 Selected Publications (Google Scholar)
    * Equal contribution
    Preprint 2025

    PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

    [Paper] [Project] [Code]

    PartCrafter is a structured 3D generative model that jointly generates multiple parts and objects from a single RGB image in one shot.

    Preprint 2025

    JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

    Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li, Shuicheng Yan†

    [Paper] [Project] [Code]

    JarvisArt outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities.

    CVPR 2025

    JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

    [Paper] [Project] [Code]

    JarvisIR is a VLM-powered intelligent system that dynamically schedules expert models for restoration.

    ICLR 2025

    DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splatting Generation

    Chenguo Lin*, Panwang Pan*†, Bangbang Yang, Zeming Li, Yadong Mu‡

    [Openreview] [Paper] [Project] [Code]

    DiffSplat is a novel 3D generative framework that natively generates 3D Gaussians by taming large-scale text-to-image diffusion models. It directly generates 3D Gaussians from text prompts or single-view images in 1-2 seconds and achieves state-of-the-art 3D reconstruction results.

    ICLR 2025 🌟 Spotlight 🌟

    4K4DGEN: Panoramic 4D Generation at 4K Resolution

    Renjie Li*, Panwang Pan*‡, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

    [Openreview] [Paper] [Project] [Code]

    4K4DGEN achieves high-quality panorama-to-4D generation at 4K resolution for the first time, using efficient splatting techniques to enable real-time exploration.

    NeurIPS 2024

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Panwang Pan‡, Zhou Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu‡

    [Openreview] [Paper] [Project] [Code]

    HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner.

    InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

    Chenguo Lin, Yuchen Lin, Panwang Pan†, Xuanyang Zhang, Yadong Mu

    Under review at IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)

    InstructLayout is a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 2D and 3D layout synthesis.

    Preprint 2025

    DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

    Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

    [Paper] [Project] [Code]

    DynamicVerse is a physical-scale, multimodal 4D modeling framework for real-world videos.

    ICCV 2023 & ICLR 2025

    StegaNeRF: Embedding Invisible Information within Neural Radiance Fields / InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting

    [StegaNeRF Paper] [StegaNeRF Project] [StegaNeRF Code]

    [InstantSplamp Paper] [InstantSplamp Project] [InstantSplamp Code]

    StegaNeRF and InstantSplamp achieve reliable recovery of hidden information with minimal impact on rendering quality. These works offer a promising outlook on ownership identification for 3D representations and call for more attention and effort on related problems.

    💼 Experience

    ByteDance Ltd, Beijing, China, Senior Computer Vision Algorithm Engineer, advised by Cheng Chen and Zeming Li.
    08/2022 - Present
    Alibaba Cloud, Hangzhou, China, Senior Computer Vision Algorithm Engineer
    07/2019 - 07/2022
    DevTech Compute, NVIDIA, Beijing, China, AI Developer Technology Engineer Intern, advised by Xipeng Li.
    07/2018 - 10/2018
    ๐Ÿ† Selected Awards

    2023, 2024: ByteStyle Award, ByteDance

    2019: Outstanding Graduate, Xiamen University

    2018: National Scholarship for Postgraduates, Ministry of Education

    2018: First Prize of GEDC, Second Prize of MCM & CPIPC

    2017: ZhongXian Huang Scholarship, Xiamen University (about 10 awards per year)

    2015: National Scholarship for Undergraduates (the highest honor scholarship in China)