Publications

2025

  1. WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
    Yongan Yu, Qingchen Hu, Xianda Du, and 3 more authors
    In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025
  2. THiNK: Can Large Language Models Think-aloud?
    Yongan Yu, Mengqian Wu, Yiran Lin, and 1 more author
    Jun 2025
  3. MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
    *Zhengren Wang, *Rui Ling, *Chufan Wang, and 5 more authors
    Apr 2025
  4. CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
    Sizhe Wang, Zhengren Wang, Dongsheng Ma, and 5 more authors
    Apr 2025
  5. From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models
    Yongan Yu, Alexandre Krantz, and Nikki G Lobczowski
    In International Conference on Artificial Intelligence in Education, Apr 2025