Publications

2025

  1. THiNK: Can Large Language Models Think-aloud?
    Yongan Yu, Mengqian Wu, Yiran Lin, and 1 more author
    Jun 2025
  2. WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
    Yongan Yu, Qingchen Hu, Xianda Du, and 3 more authors
    May 2025
  3. From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models
    Yongan Yu, Alexandre Krantz, and Nikki G Lobczowski
    May 2025
  4. MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
    *Zhengren Wang, *Rui Ling, *Chufan Wang, and 5 more authors
    Apr 2025
  5. CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
    Sizhe Wang, Zhengren Wang, Dongsheng Ma, and 5 more authors
    Apr 2025