I am an Assistant Professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where I lead the HKUST Automated Reasoning and Transformation of Software research group.
My main research interests are in programming languages and software engineering. I aim to improve software development by automating tasks that currently require substantial manual engineering effort. My research focuses on developing automatic techniques that analyze, manipulate, and transform software. My broader interests include computer systems and security.
Before joining HKUST in 2023, I received a PhD from the Massachusetts Institute of Technology, where I worked with Professor Martin Rinard. I also hold a master's degree from MIT and a bachelor's degree from Peking University.
Software plays a central role in numerous aspects of human society. Current software development practices involve significant developer effort in all phases of the software life cycle, including the development of new software, detection and elimination of defects and security vulnerabilities in existing software, maintenance of legacy software, and integration of existing software into more contexts. My research goal is to automate software development tasks and enhance how people create, understand, and improve software. Towards this goal, I have developed automatic techniques that analyze, manipulate, and transform programs. Ongoing and past research projects include:
Access management policies may be generated from example requests. An access management policy may be received. One or more example requests that have expected results when evaluated with respect to the access management policy may be received. Updates to the access management policy may be determined that cause the expected results to occur when a new version of the access management policy based on the updates is enforced. The new version of the access management policy may be generated based on the updates. Paper (pdf)
Software now plays a central role in numerous aspects of human society. Current software development practices involve significant developer effort in all phases of the software life cycle, including the development of new software, detection and elimination of defects and security vulnerabilities in existing software, maintenance of legacy software, and integration of existing software into more contexts, with the quality of the resulting software still leaving much to be desired. The goal of my research is to improve software quality and reduce costs by automating tasks that currently require substantial manual engineering effort.
I present a novel approach for program inference and regeneration, which takes an existing program, learns its core functionality as a black box, builds a model that captures this functionality, and uses the model to generate a new program. The new program delivers the same core functionality but is potentially augmented or transformed to eliminate defects, systematically introduce safety or security checks, or operate successfully in different environments.
This research enables the rejuvenation and retargeting of existing software and provides a powerful way for developers to express program functionality that adapts flexibly to a variety of contexts. For instance, one benefit is enabling new development methodologies that work with simple prototype implementations as specifications, then use regeneration to automatically obtain clean, efficient, and secure implementations. Another benefit is automatically improving program comprehension and producing cleaner code, making the code more transparent and the developers more productive. A third benefit is automatically extracting the human knowledge crystallized and encapsulated in legacy software systems and retargeting it to new languages and platforms, including languages and platforms that provide more powerful features.
In this thesis, I present two systems that implement this approach for database-backed programs.
We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that promotes efficient inference of correct combiners. We present experimental results that show that these combiners enable the effective parallelization of our benchmark scripts. Paper (pdf)
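As a rough illustration of the split, execute, and combine structure described above, the following Python sketch parallelizes a line-oriented command over chunks of its input. It is not KumQuat's implementation: the commands are modeled as plain Python functions (hypothetical stand-ins for `grep error` and `wc -l`), the two combiners are written by hand rather than synthesized, and all names are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Command = Callable[[str], str]         # stand-in for a Unix command: stdin text -> stdout text
Combiner = Callable[[List[str]], str]  # merges per-chunk outputs into one output stream


def split_lines(data: str, parts: int) -> List[str]:
    """Split the input stream into at most `parts` chunks on line boundaries."""
    lines = data.splitlines(keepends=True)
    if not lines:
        return [""]
    size = -(-len(lines) // parts)  # ceiling division
    return ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def run_parallel(command: Command, combine: Combiner, data: str, parts: int = 4) -> str:
    """Run `command` on each chunk in parallel, then combine the outputs in chunk order."""
    chunks = split_lines(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        outputs = list(pool.map(command, chunks))  # map preserves input order
    return combine(outputs)


# Hypothetical stand-ins for `grep error` and `wc -l`.
def grep_error(text: str) -> str:
    return "".join(line for line in text.splitlines(keepends=True) if "error" in line)


def count_lines(text: str) -> str:
    return f"{len(text.splitlines())}\n"


# Hand-written combiners (assumed here; KumQuat infers its combiners automatically).
def concat(outputs: List[str]) -> str:
    return "".join(outputs)


def add(outputs: List[str]) -> str:
    return f"{sum(int(o) for o in outputs)}\n"


if __name__ == "__main__":
    log = "error a\nok\nerror b\nok\nerror c\n"
    print(run_parallel(grep_error, concat, log, parts=2), end="")
    print(run_parallel(count_lines, add, log, parts=3), end="")
```

The choice of combiner depends on the command: a filter such as the `grep` stand-in needs concatenation, while a counter such as the `wc -l` stand-in needs addition. Pairing a command with the wrong combiner produces output that differs from the serial run, which is why the correctness of the combiner matters.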
We present Konure, a new system that uses active learning to infer models of applications that retrieve data from relational databases. Konure comprises a domain-specific language (each model is a program in this language) and an associated inference algorithm that infers models of applications whose behavior can be expressed in this language. The inference algorithm generates inputs and database configurations, runs the application, then observes the resulting database traffic and outputs to progressively refine its current model hypothesis. Because the technique works with only externally observable inputs, outputs, and database configurations, it can infer the behavior of applications written in arbitrary languages using arbitrary coding styles (as long as the behavior of the application is expressible in the domain-specific language). Konure also implements a regenerator that produces a translated Python implementation of the application that systematically includes relevant security and error checks. Paper (pdf)
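The observation step described above (choose a database configuration and an input, run the application, record its database traffic and output) can be sketched with Python's standard `sqlite3` module. This is only an illustration of black-box observation, not Konure's inference algorithm or its domain-specific language; the `users` schema, the `app` function, and the helper names are all hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, int]  # (id, name, admin flag) in the assumed `users` table


def make_db(rows: List[Row]) -> sqlite3.Connection:
    """Build an in-memory database with a chosen configuration of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, admin INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def app(conn: sqlite3.Connection, user_id: int) -> str:
    """Hypothetical application under study; the observer never inspects this code."""
    row = conn.execute(
        "SELECT name, admin FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return "not found"
    if row[1]:
        total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        return f"{row[0]} (admin of {total})"
    return row[0]


def observe(rows: List[Row], user_id: int) -> Tuple[List[str], str]:
    """Run the application once and return (observed SELECT statements, output)."""
    conn = make_db(rows)
    trace: List[str] = []
    conn.set_trace_callback(trace.append)  # records each SQL statement as it executes
    output = app(conn, user_id)
    conn.set_trace_callback(None)
    conn.close()
    selects = [sql for sql in trace if sql.lstrip().upper().startswith("SELECT")]
    return selects, output


if __name__ == "__main__":
    configurations = [
        [],                              # no matching row
        [(1, "ann", 0)],                 # matching row, not an admin
        [(1, "ann", 1), (2, "bob", 0)],  # matching row, admin
    ]
    for rows in configurations:
        queries, output = observe(rows, user_id=1)
        print(len(queries), "query(ies) ->", output)
```

Comparing the traces across the three configurations shows that the second query is issued only when the first query returns a row with a nonzero `admin` value. An inference procedure can fold this kind of externally visible conditional into its model without reading the application's source.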