Yuyu Luo (骆昱宇)


Assistant Professor
Data Science and Analytics Thrust
Information Hub
The Hong Kong University of Science and Technology (Guangzhou)

Email: yuyuluo[at]hkust-gz.edu.cn
Office: E2-615, HKUST(GZ)

Google Scholar

Brief Biography

Yuyu directs the Data Intelligence and Analytics Lab (DIAL) at HKUST(GZ). The lab’s mission is to develop next-generation data intelligence systems through research at the intersection of DATA+AI. Its focus areas include large language models, foundation agents, AI for databases (e.g., Text-to-SQL, Table QA), and data-centric AI.

Yuyu has published over 50 papers in top-tier venues in databases and data mining (SIGMOD, VLDB, KDD, TODS) as well as artificial intelligence (ICML, NeurIPS, ICLR, ACL). He has contributed to or advised several influential DATA+AI projects, including OpenManus, Text-to-SQL Handbook, Alpha-SQL, AFlow, DeepEye, and DeepFund. All research projects are open-source and widely used by enterprises such as Huawei, State Grid, ByteDance, Ant Group, and TAL Education.


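Yuyu Luo is an Assistant Professor at The Hong Kong University of Science and Technology (Guangzhou), an Affiliated Assistant Professor at The Hong Kong University of Science and Technology, a PhD supervisor, and the head of the Data Intelligence and Analytics Lab. His research focuses on the convergence of DATA+AI, including data-centric AI (DCAI), foundation agents, data agents, and AI for databases. He leads a National Natural Science Foundation of China (NSFC) Young Scientists Fund project and a sub-project of a Ministry of Science and Technology National Key R&D Program project, among others. He has published more than 40 CCF-A papers in data management and data mining (SIGMOD/VLDB/ICDE/SIGKDD/TODS) and machine learning (ICML/NeurIPS/ICLR), and serves as Associate PC Chair/Area Chair for several top international conferences and as an Associate Editor of the IEEE Data Engineering Bulletin. He has received several best paper awards and nominations (e.g., SIGMOD 2023, CIKM 2022, DASFAA 2019) and has led or contributed to open-sourcing several DATA+AI systems (e.g., the OpenManus agent project, with 48,000+ GitHub stars). His honors include the World Artificial Intelligence Conference Yunfan Award, the Forbes China 30 Under 30 list, the Huawei Spark Award, Outstanding Doctoral Dissertation Awards from Tsinghua University and the China Computer Federation (CCF), and the Tsinghua University Top-Grade Scholarship.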


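The group recruits year-round in DATA+AI (data agents, foundation agents, data-centric AI, AI for databases, and AI4S large models for materials): PhD students (3 to 4 per year), Red Bird MPhil students, RAs, and visiting students. The group has ample funding and computing resources and collaborates closely with industry. Outstanding PhD students may be jointly supervised with academicians or recipients of the National Science Fund for Distinguished Young Scholars. PhD students working on AI4S large models for materials are jointly supervised with Academician Tong-Yi Zhang (张统一). If you are interested, please send your application materials to my email; I read every email and every application carefully.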

Preprints

  1. LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning [Code]
    Xiaotian Lin, Yanlin Qi, Yizhang Zhu, Themis Palpanas, Chengliang Chai, Nan Tang, Yuyu Luo
  2. Trainable Dynamic Mask Sparse Attention [Code]
    Jingze Shi, Yifan Wu, Bingheng Wu, Yiran Peng, Liangdong Wang, Guang Liu, Yuyu Luo
  3. Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards
    Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li
  4. nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity
    Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, Yuyu Luo
  5. AskChart: Universal Chart Understanding through Textual Enhancement
    Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo

Surveys

  1. Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
    [Awesome-Foundation-Agents]
  2. A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
    [NL2SQL Handbook]
  3. Graph Neural Networks for Databases: A Survey

Selected Publications

Underline indicates students I supervised. Unless required by industry collaborators, all source code is open-sourced on our lab’s GitHub repository.

    Year 2025
  1. AFlow: Automating Agentic Workflow Generation
    Jiayi Zhang, Jinyu Xiang, et al., Yuyu Luo, Chenglin Wu
    ICLR 2025 (Oral, top 1.8%). [Code]
  2. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
    Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, Yuyu Luo
    ICML 2025. [Code] [Slides/PPT, PDF]
  3. NL2SQL-Bugs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation
    Xinyu Liu, Shuyu Shen, Boyan Li, Nan Tang, Yuyu Luo
    KDD 2025. [Dataset and Code]
  4. Natural Language to SQL: State of the Art and Open Problems
    Yuyu Luo, Guoliang Li, Ju Fan, Nan Tang
    VLDB 2025 Tutorial.
  5. Data Imputation with Limited Data Redundancy Using Data Lakes
    Chenyu Yang, Yuyu Luo, Chuanxuan Cui, Ju Fan, Chengliang Chai, Nan Tang
    VLDB 2025.
  6. Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation
    Changlun Li, Chenyu Yang, Yuyu Luo, Ju Fan, Nan Tang
    VLDB 2025.
  7. Coreset Selection over Incomplete Data for Data-Effective and Data-Efficient Machine Learning
    Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo, Guoliang Li, Ye Yuan, Guoren Wang
    TODS 2025.
  8. EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing [Homepage]
    Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, Yuyu Luo
    COLM 2025.
  9. Data Interpreter: An LLM Agent For Data Science
    Sirui Hong, Yizhang Lin, Bang Liu, Jiayi Zhang, et al., Yuyu Luo, Chenglin Wu
    ACL 2025 Findings.
  10. DeepVIS: Bridging Natural Language and Data Visualization Through Step-wise Reasoning
    Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, Weikai Yang
    IEEE VIS 2025.
  11. ChartMark: A Structured Grammar for Chart Annotation [Homepage]
    Yiyu Chen, Yifan Wu, Shuyu Shen, Yupeng Xie, Leixian Shen, Hui Xiong, Yuyu Luo
    IEEE VIS 2025 (Short Paper).
  12. Augmenting Realistic Charts with Virtual Overlays
    Yao Shi, Boyan Li, Yuyu Luo, Lei Chen, Nan Tang
    CHI 2025.

    Year 2024
  13. The Dawn of Natural Language to SQL: Are We Fully Ready?
    Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang
    VLDB 2024. [Homepage]
  14. HAIChart: Human and AI Paired Visualization System
    Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang
    VLDB 2024. [Code]
  15. Are Large Language Models Good Statisticians?
    Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang
    NeurIPS 2024. [Dataset]
  16. VerifAI: Verified Generative AI
    Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy
    CIDR 2024.
  17. Data Playwright: Authoring Data Videos with Annotated Narration
    Leixian Shen, Haotian Li, Yun Wang, Tianqi Luo, Yuyu Luo, Huamin Qu
    TVCG 2024. [Homepage]
  18. ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
    Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo
    EMNLP 2024. [Dataset]
  19. Fast, Robust and Interpretable Participant Contribution Estimation for Federated Learning
    Yong Wang, Yuyu Luo, Kaiyu Li, Guoliang Li, Yunyan Guo, Zhuo Wang
    ICDE 2024.
  20. Mitigating Data Scarcity in Supervised Machine Learning through Reinforcement Learning Guided Data Generation
    Chengliang Chai, Kaisen Jin, Nan Tang, Ju Fan, Lianpeng Qiao, Yu-Ping Wang, Yuyu Luo, Ye Yuan, Guoren Wang
    ICDE 2024.
  21. CoInsight: Visual Storytelling for Hierarchical Tables with Connected Insights
    Guozheng Li, Runfei Li, Yunshan Feng, Yu Zhang, Yuyu Luo, Chi Harold Liu
    TVCG 2024.

    Year 2023
  22. Learned Data-aware Image Representations of Line Charts for Similarity Search
    Yuyu Luo, Yihui Zhou, Nan Tang, Guoliang Li, Chengliang Chai, Leixian Shen
    SIGMOD 2023. [Slides]
  23. GoodCore: Coreset Selection over Incomplete Data for Data-effective and Data-efficient Machine Learning
    Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, Guoliang Li
    SIGMOD 2023. (Best of SIGMOD 2023 Papers) [Slides]
  24. Demystifying Artificial Intelligence for Data Preparation
    Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo
    SIGMOD 2023. [Tutorial Slides: Part1, Part2, Part3]

    Year 2022
  25. Steerable Self-driving Data Visualization.
    Yuyu Luo, Xuedi Qin, Chengliang Chai, Nan Tang, Guoliang Li, Wenbo Li.
    IEEE TKDE 2022.
  26. Sevi: Speech-to-Visualization through Neural Machine Translation
    Jiawei Tang, Yuyu Luo, Mourad Ouzzani, Guoliang Li, Hongyang Chen.
    ACM SIGMOD 2022 (Demo Track).
  27. Data Management for Machine Learning: A Survey
    Chengliang Chai, Jiayi Wang, Yuyu Luo, Zeping Niu, Guoliang Li.
    IEEE TKDE 2022.
  28. Towards Natural Language Interfaces for Data Visualization: A Survey
    Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, Jianmin Wang.
    IEEE TVCG 2022.
  29. Selective Data Acquisition in the Wild for Model Charging
    Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li, Yuyu Luo.
    VLDB 2022.
  30. Feature Augmentation with Reinforcement Learning
    Jiabin Liu, Chengliang Chai, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang.
    ICDE 2022.
  31. RW-Tree: A Learned Workload-aware Framework for R-tree Construction
    Haowen Dong, Chengliang Chai, Yuyu Luo, Jiabin Liu, Jianhua Feng, Chaoqun Zhan.
    ICDE 2022.
  32. Interactively Discovering and Ranking Desired Tuples by Data Exploration
    Xuedi Qin, Chengliang Chai, Yuyu Luo, Tianyu Zhao, Nan Tang, Guoliang Li, Jianhua Feng, Xiang Yu, Mourad Ouzzani.
    The VLDB Journal 2022.
  33. GALVIS: Visualization Construction through Example-Powered Declarative Programming.
    Leixian Shen, Enya Shen, Zhiwei Tai, Yun Wang, Yuyu Luo, Jianmin Wang.
    CIKM 2022 (Best Demo Paper Honorable Mention).

    Year 2021
  34. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks
    Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, Xuedi Qin
    ACM SIGMOD 2021 [Project Page]
  35. Natural Language to Visualization by Neural Machine Translation
    Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, Xuedi Qin
    IEEE VIS 2021 [Code] [Poster]
  36. nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task
    Yuyu Luo, Jiawei Tang, Guoliang Li
    Workshop on NL VIZ 2021 at IEEE VIS 2021

    Year 2020
  37. DeepTrack: Monitoring and Exploring Spatio-Temporal Data – A Case of Tracking COVID-19 –
    Yuyu Luo, Wenbo Li, Guoliang Li, Nan Tang
    VLDB 2020.
  38. VisClean: Interactive Cleaning for Progressive Visualization.
    Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, Guoliang Li.
    VLDB 2020. [Video Demonstration]
  39. Interactive Cleaning for Progressive Visualization through Composite Questions.
    Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, Guoliang Li.
    IEEE ICDE 2020. [Video]
  40. Human-in-the-loop Outlier Detection
    Chengliang Chai, Lei Cao, Guoliang Li, Jian Li, Yuyu Luo, Samuel Madden.
    ACM SIGMOD 2020.
  41. Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries.
    Xuedi Qin, Chengliang Chai, Yuyu Luo, Nan Tang, Guoliang Li.
    ACM SIGMOD 2020. [Video Demonstration]
  42. DEEPEYE: A Data Science System for Monitoring and Exploring COVID-19 Data.
    Yuyu Luo, Nan Tang, Guoliang Li, Tianyu Zhao, Wenbo Li, Xiang Yu.
    IEEE Data Engineering Bulletin, 2020. (invited)
  43. CrowdChart: Crowdsourced-based Data Extraction from Visualization Chart.
    Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo.
    IEEE TKDE 2020.

    Year 2019
  44. Making Data Visualization More Efficient and Effective: A Survey.
    Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li.
    The VLDB Journal.
  45. MathGraph: A Knowledge Graph for Automatically Solving Mathematical Exercises.
    Tianyu Zhao, Yan Huang, Songfan Yang, Yuyu Luo, et al.
    DASFAA 2019. (Best Paper Award)

    Year 2018
  46. DeepEye: Towards Automatic Data Visualization.
    Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li.
    IEEE ICDE 2018. (ICDE 2018 Highly Cited Papers Top-2) [DeepEye-APIs (Python3.6)]
  47. DeepEye: Creating Good Data Visualizations by Keyword Search (Demo).
    Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li, Xinran Wang.
    ACM SIGMOD 2018. [Online Demo]

PhD Students

Selected Awards

Professional Services