Baiqiao Yin「尹柏乔」

Hi! I'm Baiqiao Yin. Most recently, I had the great fortune to work with Manling Li at Northwestern University, where we pushed the boundaries of spatial intelligence. Previously, I received my B.Eng. in Intelligent Science and Technology at SYSU, where I worked closely with Xiaodan Liang.
Right now, I'm spending my gap year as a visiting student at New York University with Saining Xie, collaborating with Yiming Li on spatial intelligence.
I was also fortunate to collaborate with Fei-Fei Li and Jiajun Wu.
⭐I am open to discussions and collaborations on spatial intelligence and am looking for PhD opportunities (Fall 2026). If you think there is anything interesting we can discuss, feel free to email me!

Email / Scholar / GitHub

profile photo

Internships

  • 2024.07 - 2024.11, Shanghai AI Lab, Embodied AI Group. Mentor: Xudong Xu. Topic: Indoor scene generation.
  • 2023.05 - 2024.04, Peking University (Shenzhen), HRI Lab. Mentor: Mengyuan Liu. Topic: Human action recognition.

📝Research

💬My research interests lie in spatial intelligence. Currently, my focus is on designing spatial intelligence agents with the following capabilities:

  1. Spatial-Semantic Fusion
  2. Spatial Mental Manipulation
  3. Spatial Consistency Perception
  4. Spatial Active Perception
  5. Dynamic Spatial Understanding

Spatial Mental Modeling from Limited Views
Baiqiao Yin*, Qineng Wang*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei
ICCV Workshop on Structural Priors for Vision (SP4V) (Best Paper) / ACM MM Workshop on Multimodal Foundation Models for Spatial Intelligence (MFMSI) (Oral), 2025
project page / arXiv

Key Takeaway: Guiding VLMs to first generate cognitive maps and then reason over them is an effective approach to approximating spatial mental modeling from limited views.

⭐Awesome Spatial Intelligence in VLM (500+ Stars)
Baiqiao Yin
GitHub Repository, 2024-Present
GitHub

A curated collection of papers, datasets, and resources on spatial intelligence in vision-language models, maintained and regularly updated with the latest research developments.

Skeleton2Point: Recognizing Skeleton-Based Actions as Point Clouds
Baiqiao Yin, Jiaying Lin, Jiajun Wen, Yue Li, Jinfu Liu, Yanfei Wang, Mengyuan Liu
IROS, 2025 (Oral)
project page / paper

Treats skeleton joints as a point cloud by incorporating the position information of skeletons into point-cloud methods, demonstrating the validity of modeling positional relationships with 3D coordinates.

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yiqiang Yan, Xiaodan Liang
arXiv, 2024
project page / arXiv

TheaterGen can interact with users to consistently generate images over multiple turns.

HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition
Jinfu Liu*, Baiqiao Yin*, Jiaying Lin, Jiajun Wen, Yue Li, Mengyuan Liu
ICME, 2024
code / paper

Combines the graph convolutional network's proficiency in handling graph-structured data with the Transformer's powerful capability for modeling global information.

LVLM-CL: Make Large Vision-Language Models Work Better Under Continual Learning Settings
Baiqiao Yin
Tech Report
paper

Devises a task-specific continual learning setting for LVLMs by classifying the instruction-tuning data used in the second fine-tuning stage into several distinct tasks.

🏆Honors and Awards

  • 2024.04: Champion of the ICME Grand Challenge Multi-Modal Video Reasoning and Analyzing Competition.
  • 2023.10: Second Prize, Intelligent Robot Fighting and Gaming Competition.
  • 2023.10: Academic Competition Scholarship.
  • 2023.10: The Third Prize Scholarship.
  • 2022.10: Academic Competition Scholarship.
  • 2022.10: The Third Prize Scholarship.