Yuming Gu

I am a PhD Candidate in Computer Science at the Viterbi School of Engineering, University of Southern California, advised by Prof. Stefanos Nikolaidis. Previously, I worked at the Vision and Graphics Lab with Prof. Hao Li.

My research interests include Computer Vision, Generative Models, and Embodied Agents. In particular, I am interested in using AI-generated content for Embodied Planning and Content Creation.

I am actively seeking full-time opportunities starting in 2026. Please feel free to reach out if you think my background could be a good fit.

Email  /  Resume  /  Google Scholar  /  LinkedIn

Professional Services
Reviewer for the following conferences and journals:
CVPR, ECCV, ICCV, ACM Multimedia (MM), WACV
Experience

TikTok
San Jose, United States
Research Intern, 2025

MBZUAI
Abu Dhabi, United Arab Emirates
Visiting Student, 2024

TikTok
San Jose, United States
Research Intern, 2023

Teaching

Teaching Assistant, CSCI 585 Database Systems
Teaching Assistant, CSCI 572 Information Retrieval and Web Search Engines

Publications
ENVISION: Embodied Visual Planning via Goal-Imagery Video Diffusion
Yuming Gu, Yizhi Wang, Yining Hong, Yipeng Gao, Hao Jiang, Angtian Wang, Bo Liu, Nathaniel S. Dennler, Zhengfei Kuang, Hao Li, Gordon Wetzstein, Chongyang Ma
[Page]
Area: Embodied AI, Video Generation Model

We introduce ENVISION, a novel framework that generates physically plausible planning videos with precise instruction following, enabling direct execution on robotic systems.

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li.
CVPR, 2025 [PDF][Page][Code]
Area: Novel View Synthesis, Diffusion Model, Head

We introduce DiffPortrait360, a novel approach that generates fully consistent 360-degree head views, accommodating human, stylized, and anthropomorphic forms, including accessories such as glasses.

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo.
CVPR, 2024 (Highlight - Top 9.3%) [PDF][Page][Code]
Area: Novel View Synthesis, Diffusion Model, Face

We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait.

DisUnknown: Distilling Unknown Factors for Disentanglement Learning
Sitao Xiang, Yuming Gu, Pengda Xiang, Menglei Chai, Hao Li, Yajie Zhao, Mingming He
ICCV, 2021 [PDF] [Page] [Code]
Area: Disentanglement, Neural Features

We adopt a general setting where all factors that are hard to label or identify are encapsulated as a single unknown factor.

One-Shot Identity-Preserving Portrait Reenactment
Sitao Xiang, Yuming Gu, Pengda Xiang, Mingming He, Koki Nagano, Haiwei Chen, Hao Li
arXiv preprint, 2020 [PDF]
Area: Image Synthesis, Human Face

We present a deep learning-based framework for portrait reenactment from a single picture of a target (one-shot) and a video of a driving subject.

Protecting World Leaders Against Deep Fakes
Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, Hao Li
CVPR Workshops, 2019 [PDF]
Area: Image Synthesis, Media Forensics

We describe a forensic technique that models the facial expressions and movements that typify an individual's speaking pattern.
