Brief Biography
Yuyu directs the Data Intelligence and Analytics Lab (DIAL) at HKUST(GZ). The lab’s mission is to develop next-generation data intelligence systems through research at the intersection of DATA+AI. Its focus areas include large language models, foundation agents, AI for databases (e.g., Text-to-SQL, Table QA), and data-centric AI.
Yuyu has published over 50 papers in top-tier venues in databases and data mining (SIGMOD, VLDB, KDD, TODS) as well as artificial intelligence (ICML, NeurIPS, ICLR, ACL). He has contributed to or advised several influential DATA+AI projects, including OpenManus, Text-to-SQL Handbook, Alpha-SQL, AFlow, DeepEye, and DeepFund. All research projects are open-source and widely used by enterprises such as Huawei, State Grid, ByteDance, Ant Group, and TAL Education.
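Yuyu Luo (骆昱宇) is an Assistant Professor at The Hong Kong University of Science and Technology (Guangzhou), an Affiliated Assistant Professor at The Hong Kong University of Science and Technology, a PhD supervisor, and head of the Data Intelligence and Analytics Lab. His research interests lie in DATA+AI, including data-centric AI (DCAI), foundation agents, data agents, and AI for databases. He is the principal investigator of a National Natural Science Foundation of China Young Scientists Fund project and of a sub-project under a Ministry of Science and Technology National Key R&D Program, among others. He has published more than 40 CCF-A papers in data management and data mining (SIGMOD/VLDB/ICDE/SIGKDD/TODS) and machine learning (ICML/NeurIPS/ICLR). He also serves as Associate PC Chair/Area Chair for several top international conferences and as an Associate Editor of the IEEE Data Engineering Bulletin. He has received several best paper awards and nominations (e.g., SIGMOD 2023, CIKM 2022, DASFAA 2019) and has led or contributed to multiple open-source DATA+AI systems (e.g., the OpenManus agent project, with over 48,000 GitHub stars). His honors include the World Artificial Intelligence Conference Yunfan Award, the Forbes China 30 Under 30 list, the Huawei Spark Award, outstanding doctoral dissertation awards from Tsinghua University and the China Computer Federation (CCF), and the Tsinghua Top Grade Scholarship.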
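The group recruits year-round in DATA+AI (data agents, foundation agents, data-centric AI, AI for databases, and AI4S large models for materials): PhD students (3 to 4 per year), Red Bird MPhil students, RAs, and visiting students. The group is well funded, has ample computing resources, and collaborates closely with industry. Outstanding PhD students may be jointly supervised with academicians or recipients of the National Science Fund for Distinguished Young Scholars. PhD students working on AI4S large models for materials are co-advised with Academician Tong-Yi Zhang (张统一). If you are interested, please send your application materials to my email; every email and every set of materials will be read carefully.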
Preprints
-
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
[Code]
Xiaotian Lin, Yanlin Qi, Yizhang Zhu, Themis Palpanas, Chengliang Chai, Nan Tang, Yuyu Luo
-
Trainable Dynamic Mask Sparse Attention
[Code]
Jingze Shi, Yifan Wu, Bingheng Wu, Yiran Peng, Liangdong Wang, Guang Liu, Yuyu Luo
-
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards
Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li
-
nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity
Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, Yuyu Luo
-
AskChart: Universal Chart Understanding through Textual Enhancement
Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo
Surveys
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
[Awesome-Foundation-Agents]
-
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
[NL2SQL Handbook]
-
Graph Neural Networks for Databases: A Survey
Selected Publications
Underline indicates students I supervised. Unless industry collaborators require otherwise, all source code is open-sourced on our Lab’s GitHub repository.
Year 2025
-
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang, Jinyu Xiang, et al., Yuyu Luo, Chenglin Wu
ICLR 2025 (Oral, top 1.8%). [Code]
-
Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, Yuyu Luo
ICML 2025. [Code] [Slides/PPT, PDF]
-
NL2SQL-Bugs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation
Xinyu Liu, Shuyu Shen, Boyan Li, Nan Tang, Yuyu Luo
KDD 2025. [Dataset and Code]
-
Natural Language to SQL: State of the Art and Open Problems
Yuyu Luo, Guoliang Li, Ju Fan, Nan Tang
VLDB 2025 Tutorial.
-
Data Imputation with Limited Data Redundancy Using Data Lakes
Chenyu Yang, Yuyu Luo, Chuanxuan Cui, Ju Fan, Chengliang Chai, Nan Tang
VLDB 2025.
-
Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation
Changlun Li, Chenyu Yang, Yuyu Luo, Ju Fan, Nan Tang
VLDB 2025.
-
Coreset Selection over Incomplete Data for Data-Effective and Data-Efficient Machine Learning
Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo, Guoliang Li, Ye Yuan, Guoren Wang
TODS 2025.
-
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
[Homepage]
Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, Yuyu Luo
COLM 2025.
-
Data Interpreter: An LLM Agent For Data Science
Sirui Hong, Yizhang Lin, Bang Liu, Jiayi Zhang, et al., Yuyu Luo, Chenglin Wu
ACL 2025 Findings.
-
DeepVIS: Bridging Natural Language and Data Visualization Through Step-wise Reasoning
Zhihao Shuai, Boyan Li, Siyu Yan, Yuyu Luo, Weikai Yang
IEEE VIS 2025.
-
ChartMark: A Structured Grammar for Chart Annotation
[Homepage]
Yiyu Chen, Yifan Wu, Shuyu Shen, Yupeng Xie, Leixian Shen, Hui Xiong, Yuyu Luo
IEEE VIS 2025 (Short Paper).
-
Augmenting Realistic Charts with Virtual Overlays
Yao Shi, Boyan Li, Yuyu Luo, Lei Chen, Nan Tang
CHI 2025.
Year 2024
-
The Dawn of Natural Language to SQL: Are We Fully Ready?
Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang
VLDB 2024. [Homepage]
-
HAIChart: Human and AI Paired Visualization System
Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang
VLDB 2024. [Code]
-
Are Large Language Models Good Statisticians?
Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang
NeurIPS 2024. [Dataset]
-
VerifAI: Verified Generative AI
Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy
CIDR 2024.
-
Data Playwright: Authoring Data Videos with Annotated Narration
Leixian Shen, Haotian Li, Yun Wang, Tianqi Luo, Yuyu Luo, Huamin Qu
TVCG 2024. [Homepage]
-
ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo
EMNLP 2024. [Dataset]
-
Fast, Robust and Interpretable Participant Contribution Estimation for Federated Learning
Yong Wang, Yuyu Luo, Kaiyu Li, Guoliang Li, Yunyan Guo, Zhuo Wang
ICDE 2024.
-
Mitigating Data Scarcity in Supervised Machine Learning through Reinforcement Learning Guided Data Generation
Chengliang Chai, Kaisen Jin, Nan Tang, Ju Fan, Lianpeng Qiao, Yu-Ping Wang, Yuyu Luo, Ye Yuan, Guoren Wang
ICDE 2024.
-
CoInsight: Visual Storytelling for Hierarchical Tables with Connected Insights
Guozheng Li, Runfei Li, Yunshan Feng, Yu Zhang, Yuyu Luo, Chi Harold Liu
TVCG 2024.
Year 2023
-
Learned Data-aware Image Representations of Line Charts for Similarity Search
Yuyu Luo, Yihui Zhou, Nan Tang, Guoliang Li, Chengliang Chai, Leixian Shen
SIGMOD 2023. [Slides]
-
GoodCore: Coreset Selection over Incomplete Data for Data-effective and Data-efficient Machine Learning
Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, Guoliang Li
SIGMOD 2023. (Best of SIGMOD 2023 Papers) [Slides]
-
Demystifying Artificial Intelligence for Data Preparation
Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo
SIGMOD 2023. [Tutorial Slides: Part1, Part2, Part3]
Year 2022
-
Steerable Self-driving Data Visualization.
Yuyu Luo, Xuedi Qin, Chengliang Chai, Nan Tang, Guoliang Li, Wenbo Li.
IEEE TKDE 2022.
-
Sevi: Speech-to-Visualization through Neural Machine Translation
Jiawei Tang, Yuyu Luo, Mourad Ouzzani, Guoliang Li, Hongyang Chen.
ACM SIGMOD 2022 (Demo Track).
-
Data Management for Machine Learning: A Survey
Chengliang Chai, Jiayi Wang, Yuyu Luo, Zeping Niu, Guoliang Li.
IEEE TKDE 2022.
-
Towards Natural Language Interfaces for Data Visualization: A Survey
Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, Jianmin Wang.
IEEE TVCG 2022.
-
Selective Data Acquisition in the Wild for Model Charging
Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li, Yuyu Luo.
VLDB 2022.
-
Feature Augmentation with Reinforcement Learning
Jiabin Liu, Chengliang Chai, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang.
ICDE 2022.
-
RW-Tree: A Learned Workload-aware Framework for R-tree Construction
Haowen Dong, Chengliang Chai, Yuyu Luo, Jiabin Liu, Jianhua Feng, Chaoqun Zhan.
ICDE 2022.
-
Interactively Discovering and Ranking Desired Tuples by Data Exploration
Xuedi Qin, Chengliang Chai, Yuyu Luo, Tianyu Zhao, Nan Tang, Guoliang Li, Jianhua Feng, Xiang Yu, Mourad Ouzzani.
The VLDB Journal 2022.
-
GALVIS: Visualization Construction through Example-Powered Declarative Programming.
Leixian Shen, Enya Shen, Zhiwei Tai, Yun Wang, Yuyu Luo, Jianmin Wang.
CIKM 2022 (Best Demo Paper Honorable Mention).
Year 2021
-
Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks
Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, Xuedi Qin
ACM SIGMOD 2021
[Project Page]
-
Natural Language to Visualization by Neural Machine Translation
Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, Xuedi Qin
IEEE VIS 2021
[Code] [Poster]
-
nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task
Yuyu Luo, Jiawei Tang, Guoliang Li
Workshop on NL VIZ 2021 at IEEE VIS 2021
Year 2020
-
DeepTrack: Monitoring and Exploring Spatio-Temporal Data
– A Case of Tracking COVID-19 –
Yuyu Luo, Wenbo Li, Guoliang Li, Nan Tang
VLDB 2020.
-
VisClean: Interactive Cleaning for Progressive Visualization.
Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, Guoliang Li.
VLDB 2020.
[Video Demonstration]
-
Interactive Cleaning for Progressive Visualization through Composite Questions.
Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, Guoliang Li.
IEEE ICDE 2020.
[Video]
-
Human-in-the-loop Outlier Detection
Chengliang Chai, Lei Cao, Guoliang Li, Jian Li, Yuyu Luo, Samuel Madden.
ACM SIGMOD 2020.
-
Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries.
Xuedi Qin, Chengliang Chai, Yuyu Luo, Nan Tang, Guoliang Li.
ACM SIGMOD 2020. [Video Demonstration]
-
DEEPEYE: A Data Science System for Monitoring and Exploring COVID-19 Data.
Yuyu Luo, Nan Tang, Guoliang Li, Tianyu Zhao, Wenbo Li, Xiang Yu.
IEEE Data Engineering Bulletin, 2020. (invited)
-
CrowdChart: Crowdsourced-based Data Extraction from Visualization Chart.
Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo.
IEEE TKDE 2020.
Year 2019
-
Making Data Visualization More Efficient and Effective: A Survey.
Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li.
The VLDB Journal.
-
MathGraph: A Knowledge Graph for Automatically Solving Mathematical Exercises.
Tianyu Zhao, Yan Huang, Songfan Yang, Yuyu Luo, et al.
DASFAA 2019. (Best Paper Award)
Year 2018
-
DeepEye: Towards Automatic Data Visualization.
ICDE 2018 Highly Cited Papers Top-2
Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li.
IEEE ICDE 2018.
[DeepEye-APIs (Python3.6)]
-
DeepEye: Creating Good Data Visualizations by Keyword Search (Demo).
Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li, Xinran Wang.
ACM SIGMOD 2018.
[Online Demo]
PhD Students
2023
- Tianqi Luo (MS from Johns Hopkins University)
- Xinyu Liu (BEng from Northeastern University, China)
- Liangwei Wang (co-advised with Tsung Fugee, BEng from Sun Yat-sen University, China)
- Yao Shi (B.S. from Univ. of Elec. Sci. and Tech. of China)
2024
- Changlun Li (co-advised with Nan Tang, B.S. from CUHK)
- Jiayi Zhang (from Renmin University of China)
- Boyan Li (from HKUST(GZ))
- Shuyu Shen (from HKUST(GZ))
2025
- Xiaotian Lin (from HKUST(GZ))
- Yupeng Xie (from HKUST(GZ))
- Yizhang Zhu (from HKUST(GZ))
- Zihan Sun (from Tsinghua Univ.)
- Zhangyang Peng (from Wuhan Univ.)
Selected Awards
- 2025 - World Artificial Intelligence Conference Yunfan Award (one of 15 global recipients under the age of 30)
- 2025 - Huawei Spark Award
- 2023 - Forbes China 30 Under 30 List
- 2023 - CCF Doctoral Dissertation Nomination Award
- 2023 - Best of SIGMOD 2023 Papers
- 2023 - Distinguished Doctoral Graduate of Tsinghua University (清华优博士)
- 2023 - Distinguished Doctoral Dissertation Award of Tsinghua University (清华优秀博士学位论文)
- 2023 - Rising Star in Data Visualization, CSIG VIS
- 2022 - CIKM 2022 Best Paper Honorable Mention (Demo Track)
- 2021 - Zhejiang Lab’s International Talent Fund for Young Professionals
- 2020 - Tsinghua Top Grade Scholarship (清华大学特等奖学金)
(The highest award at Tsinghua University)
- 2020 - Zhong Shimo Scholarship (钟士模奖学金), Tsinghua University
(The highest award in the Dept. of CST)
- 2020 - China National Scholarship, Ministry of Education of China
- 2019 - Excellent Paper Award – Big Data Mining and Analytics
- 2019 - DASFAA 2019 Best Student Paper Award
Professional Services
- Session Chair: VLDB 2024
- PC Member: VLDB 2023-2025, ICDE 2024-2025, ICLR 2025
- Conference Reviewer: IEEE VIS 2021-2024, CHI 2022-2025
- Journal Reviewer: ACM Transactions on Database Systems, TVCG, ACM/IMS TDS, Data Science and Engineering
- Conference Volunteer: SIGMOD 2021/2023