I am a PhD candidate in Computer Science at Case Western Reserve University (CWRU), where I am advised by Prof. Yu Yin.
Before that, I was a visiting student at ShanghaiTech University, supervised by Prof. Dinggang Shen. I received my M.S. in Information Science from the University of Pittsburgh (Pitt) in 2022, supervised by Prof. Yu-Ru Lin, and my B.S. in Computing and Information Science from Guangdong University of Technology, supervised by Prof. Weihua He.
I have broad research interests in Computer Vision and Vision-Language Models, with a particular focus on advancing spatial intelligence in the next generation of AI systems.
Selected Publications

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey
Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu, Yu Yin;
Website; GitHub
- Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence, and this survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

Disheng Liu, Tuo Liang, Yu Yin; GitHub
- With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. However, is using it directly without curation ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show that balancing these factors significantly improves results through a training-free curation pipeline.

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
Disheng Liu*, Yiran Qiao*, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma; *Equal contribution; Datasets
- True intelligence relies on understanding hidden causal relations, yet current AI and vision models lack benchmarks to assess this ability. We introduce Causal3D, a comprehensive 19-dataset benchmark linking structured and visual data to evaluate causal reasoning, revealing that performance drops sharply as causal complexity increases.
Work Experience
- 2023.07 - 2024.08, Research Intern, ShanghaiTech IDEA Lab, Shanghai, China.
- 2022.07 - 2023.07, Algorithm Engineer, Yinwang Intelligent Technology, Shanghai, China.
Service
Reviewer for
ICLR'26, CVPR'26, NeurIPS'26
Invited Talk
- Dec. 30, 2025, "Spatial Intelligence in Vision-Language Models: What It Is, What Works, and What's Next," ENCODE Lab Lecture Series, Westlake University.
Teaching
Teaching Assistant
- Fall 2025 - CSDS 465: Computer Vision (Instructor: Yu Yin)
- Spring 2025 - CSDS 425: Computer Networks (Instructor: An Wang)
- Fall 2024 - CSDS 425: Computer Networks (Instructor: Mark Allman)
Papers
- 2026 - Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
- 2026 - GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning
- 2026 - When "YES" Meets "BUT": Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?, IEEE TPAMI
- 2025 - Spatial Intelligence in Vision-Language Models: A Comprehensive Survey
- 2025 - Counterfactual Visual Explanation via Causally-Guided Adversarial Steering
- 2025 - BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting, CVPR 2025
- 2025 - CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
- 2025 - CLIP in Medical Imaging: A Comprehensive Survey, Medical Image Analysis
- 2023 - Prediction of COVID-19 Patients' Emergency Room Revisit Using Multi-Source Transfer Learning