I am currently employed as a Senior Researcher at PICO within ByteDance Ltd. Previously, I held the position of Senior Algorithm Engineer at Alibaba Cloud, where I specialized in 3D Reconstruction and 6DoF Pose Estimation.
In 2019, I earned my Master's degree from the School of Informatics at Xiamen University.
My research focuses on the intersection of generative models and multi-modal representation learning, applied primarily to the 3D domain. My work has been deployed in real-world systems, including embedded XR devices and large-scale platforms like the Aliyun Cloud AI-Box.
I welcome opportunities for coffee chats and collaborations. Please feel free to reach out!
4K4DGEN achieves high-quality Panorama-to-4D generation at a resolution of 4K for the first time using efficient splatting techniques for real-time exploration.
DiffSplat is a novel 3D generative framework that natively generates 3D Gaussians by taming large-scale text-to-image diffusion models. It generates 3D Gaussians directly from text prompts or single-view images in 1 to 2 seconds and achieves state-of-the-art 3D reconstruction results.
Under review by Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
InstructLayout is a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 2D and 3D layout synthesis.
StegaNeRF and InstantSplamp achieve reliable recovery of hidden information with minimal impact on rendering quality. These works offer a promising outlook on ownership identification in 3D representations and call for more attention and effort on related problems.
JarvisArt outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities.