I am a PhD candidate in Computer Science at Case Western Reserve University (CWRU), where I am advised by Prof. Yu Yin.

Before joining CWRU, I was a visiting student at ShanghaiTech University, supervised by Prof. Dinggang Shen. I received my M.S. in Information Science from the University of Pittsburgh (Pitt) in 2022, supervised by Prof. Yu-Ru Lin, and my B.S. in Computing and Information Science from Guangdong University of Technology, supervised by Prof. Weihua He.

I have broad research interests in Computer Vision and Vision-Language Models, with a particular focus on advancing spatial intelligence in the next generation of AI systems.

📚 Selected Publications

2025.10

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey

Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu, Yu Yin; Website; GitHub

Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence. This survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

2025.06

Balancing Fidelity and Diversity: Synthetic data could stand on the shoulder of the real in visual recognition

Disheng Liu, Tuo Liang, Yu Yin; GitHub

With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. But is using it directly, without curation, ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show that balancing the two through a training-free curation pipeline significantly improves results.

2025.03

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

Disheng Liu*, Yiran Qiao*, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma; *Equal contribution; Datasets

True intelligence relies on understanding hidden causal relations, yet benchmarks for assessing this ability in current AI and vision models are lacking. We introduce CAUSAL3D, a comprehensive benchmark of 19 datasets that links structured and visual data to evaluate causal reasoning. Our results reveal that performance drops sharply as causal complexity increases.

💻 Work Experience

2023.07 - 2024.08, Research Intern, ShanghaiTech IDEA Lab, Shanghai, China.
2022.07 - 2023.07, Algorithm Engineer, Yinwang Intelligent Technology, Shanghai, China.

📝 Service

Reviewer for

ICLR’26, CVPR’26, NeurIPS’26

Invited Talk

Dec. 30, 2025, “Spatial Intelligence in Vision-Language Models: What It Is, What Works, and What’s Next,” ENCODE Lab Lecture Series, Westlake University.

🎓 Teaching

Teaching Assistant

• Fall 2025 — CSDS 465: Computer Vision (Instructor: Yu Yin)

• Spring 2025 — CSDS 425: Computer Networks (Instructor: An Wang)

• Fall 2024 — CSDS 425: Computer Networks (Instructor: Mark Allman)

📄 Papers

2026 — Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
2026 — GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning
2026 — When ‘YES’ Meets ‘BUT’: Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?, IEEE TPAMI
2025 — Spatial Intelligence in Vision-Language Models: A Comprehensive Survey
2025 — Counterfactual Visual Explanation via Causally-Guided Adversarial Steering
2025 — BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting, CVPR 2025
2025 — CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
2025 — CLIP in Medical Imaging: A Comprehensive Survey, Medical Image Analysis
2023 — Prediction of COVID-19 Patients’ Emergency Room Revisit Using Multi-Source Transfer Learning