Publications

2025

  1. WeatherArchive-Bench: Benchmarking Retrieval-Augmented Reasoning for Historical Weather Archives
    Yongan Yu, Xianda Du, Qingchen Hu, and 7 more authors
    arXiv preprint arXiv:2510.05336, 2025
  2. MaintainCoder: Maintainable Code Generation under Dynamic Requirements
    Zhengren Wang, Rui Ling, Chufan Wang, and 5 more authors
    NeurIPS 2025, 2025
  3. MILO: An LLM Multi-Stage Conversational Agent for Fostering Teenagers’ Mental Resilience
    Han Bao, Yongan Yu, Bohan Wang, and 2 more authors
    In Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 2025
  4. THINK: Can Large Language Models Think-aloud?
    Yongan Yu, Mengqian Wu, Yiran Lin, and 1 more author
    arXiv preprint arXiv:2505.20184, 2025
  5. WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
    Yongan Yu, Qingchen Hu, Xianda Du, and 3 more authors
    In Findings of the Association for Computational Linguistics: ACL 2025, 2025
  6. CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
    Sizhe Wang, Zhengren Wang, Dongsheng Ma, and 5 more authors
    2025
  7. From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models
    Yongan Yu, Alexandre Krantz, and Nikki G Lobczowski
    In International Conference on Artificial Intelligence in Education, 2025