Jiasi Shen

Jiasi Shen 沈嘉思

Assistant Professor
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong

I am an Assistant Professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where I lead the HKUST Automated Reasoning and Transformation of Software research group.

My main research interests are in programming languages and software engineering. I aim to improve software development by automating tasks that currently require substantial manual engineering effort. My research focuses on developing automatic techniques that analyze, manipulate, and transform software. My broader interests include computer systems and security.

Before joining HKUST in 2023, I received a PhD degree from the Massachusetts Institute of Technology, where I worked with Professor Martin Rinard. I also hold a master's degree from MIT and a bachelor's degree from Peking University.

Teaching

  • COMP2711: Discrete Mathematical Tools for Computer Science (Spring 2024)
  • COMP3021: Java Programming (Fall 2023)

Research

Software plays a central role in numerous aspects of human society. Current software development practices involve significant developer effort in all phases of the software life cycle, including the development of new software, detection and elimination of defects and security vulnerabilities in existing software, maintenance of legacy software, and integration of existing software into more contexts. My research goal is to automate software development tasks and enhance how people create, understand, and improve software. Towards this goal, I have developed automatic techniques that analyze, manipulate, and transform programs. Ongoing and past research projects include:

My PhD research pioneered a new approach, automatic software rejuvenation, which infers the behavior of an existing program, formulates it as a precise model, and uses the model to regenerate a new program. This paradigm can deliver benefits in many aspects of the software life cycle, for example by generating high-quality code from simple prototypes, improving program comprehension and producing cleaner code, and extracting human knowledge from software and retargeting it to various languages or platforms. In the future, software will serve even more critical tasks in human society, so it is important to reduce the fundamental inefficiencies in how people work with software. I aim to capture the maximal benefits from the human effort and knowledge embedded in software and enable new ways to express human creativity.

Recent Papers

  • Generating Access Management Policies from Example Requests
    Jiasi Shen, Homer Strong, Daniel George Peebles, Neha Rungta
    2022: United States Patent (Patent No.: US 11,483,353 B1)

    Access management policies may be generated from example requests. An access management policy may be received. One or more example requests that have expected results when evaluated with respect to the access management policy may be received. Updates to the access management policy may be determined that cause the expected results to occur when a new version of the access management policy based on the updates is enforced. The new version of the access management policy may be generated based on the updates.

    Paper (pdf)
  • Program Inference and Regeneration via Active Learning
    Jiasi Shen
    2022: Ph.D. Thesis, Massachusetts Institute of Technology

    Software now plays a central role in numerous aspects of human society. Current software development practices involve significant developer effort in all phases of the software life cycle, including the development of new software, detection and elimination of defects and security vulnerabilities in existing software, maintenance of legacy software, and integration of existing software into more contexts, with the quality of the resulting software still leaving much to be desired. The goal of my research is to improve software quality and reduce costs by automating tasks that currently require substantial manual engineering effort.
    I present a novel approach for program inference and regeneration, which takes an existing program, learns its core functionality as a black box, builds a model that captures this functionality, and uses the model to generate a new program. The new program delivers the same core functionality but is potentially augmented or transformed to eliminate defects, systematically introduce safety or security checks, or operate successfully in different environments.
    This research enables the rejuvenation and retargeting of existing software and provides a powerful way for developers to express program functionality that adapts flexibly to a variety of contexts. For instance, one benefit is enabling new development methodologies that work with simple prototype implementations as specifications, then use regeneration to automatically obtain clean, efficient, and secure implementations. Another benefit is automatically improving program comprehension and producing cleaner code, making the code more transparent and the developers more productive. A third benefit is automatically extracting the human knowledge crystallized and encapsulated in legacy software systems and retargeting it to new languages and platforms, including languages and platforms that provide more powerful features.
    In this thesis, I present two systems that implement this approach for database-backed programs.

    Paper (pdf)
  • POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat
    Jiasi Shen, Martin Rinard, Nikos Vasilakis
    PPoPP 2022: 27th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, poster

    We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that promotes efficient inference of correct combiners. We present experimental results that show that these combiners enable the effective parallelization of our benchmark scripts.

    Paper (pdf)
  • Supply-Chain Vulnerability Elimination via Active Learning and Regeneration
    Nikos Vasilakis, Achilles Benetopoulos, Shivam Handa, Alizee Schoen, Jiasi Shen, Martin Rinard
    CCS 2021: ACM SIGSAC Conference on Computer and Communications Security

    Software supply-chain attacks target components that are integrated into client applications. Such attacks often target widely-used components, with the attack taking place via operations (for example, file system or network accesses) that do not affect those aspects of component behavior that the client observes. We propose new active library learning and regeneration (ALR) techniques for inferring and regenerating the client-observable behavior of software components. Using increasingly sophisticated rounds of exploration, ALR generates inputs, provides these inputs to the component, and observes the resulting outputs to infer a model of the component's behavior as a program in a domain-specific language. We present Harp, an ALR system for string processing components. We apply Harp to successfully infer and regenerate string-processing components written in JavaScript and C/C++. Our results indicate that, in the majority of cases, Harp completes the regeneration in less than a minute, remains fully compatible with the original library, and delivers performance indistinguishable from the original library. We also demonstrate that Harp can eliminate vulnerabilities associated with libraries targeted in several highly visible security incidents, specifically event-stream, left-pad, and string-compare.

    Paper (pdf)
  • Active Learning for Inference and Regeneration of Applications that Access Databases
    Jiasi Shen, Martin Rinard
    TOPLAS 2021: ACM Transactions on Programming Languages and Systems

    We present Konure, a new system that uses active learning to infer models of applications that retrieve data from relational databases. Konure comprises a domain-specific language (each model is a program in this language) and associated inference algorithm that infers models of applications whose behavior can be expressed in this language. The inference algorithm generates inputs and database configurations, runs the application, then observes the resulting database traffic and outputs to progressively refine its current model hypothesis. Because the technique works with only externally observable inputs, outputs, and database configurations, it can infer the behavior of applications written in arbitrary languages using arbitrary coding styles (as long as the behavior of the application is expressible in the domain-specific language). Konure also implements a regenerator that produces a translated Python implementation of the application that systematically includes relevant security and error checks.

    Paper (pdf)
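
The split, process, and combine scheme described in the KumQuat abstract above can be illustrated with a small sketch. This is not KumQuat's implementation: KumQuat synthesizes the combine operators automatically, whereas the combiners here are written by hand, and the names `split_lines`, `run_parallel`, `count_lines`, `sort_lines`, `add`, and `merge_sorted` are hypothetical stand-ins chosen for illustration. Commands are modeled as Python functions over lists of lines rather than real Unix processes.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def split_lines(lines, parts):
    """Split a list of lines into at most `parts` contiguous chunks."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_parallel(command, combine, lines, parts=4):
    """Run `command` on each chunk in parallel, then fold the partial
    outputs left to right with the binary `combine` operator."""
    chunks = split_lines(lines, parts)
    if not chunks:
        return command([])
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(command, chunks))  # order is preserved
    result = partials[0]
    for partial in partials[1:]:
        result = combine(result, partial)
    return result


# Hypothetical stand-ins for `wc -l` and `sort`.
def count_lines(lines):
    return len(lines)


def sort_lines(lines):
    return sorted(lines)


# Hand-written combiners (assumed here; KumQuat would infer these).
def add(a, b):
    return a + b


def merge_sorted(a, b):
    return list(merge(a, b))
```

For example, `run_parallel(count_lines, add, data)` counts lines by summing per-chunk counts, and `run_parallel(sort_lines, merge_sorted, data)` sorts by merging per-chunk sorted outputs. The sketch only works when a correct combiner exists for the command, which is the condition the abstract's combiner synthesis addresses.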


Service

    Program Committee: OOPSLA 2024, ‹Programming› 2024, Onward! 2023, OOPSLA 2023 (ERC), ECOOP 2023 (ERC), Onward! 2022, OOPSLA 2022 (ERC), AISTA 2022, APLAS 2021, AISTA 2021
    Journal Reviewer: TOSEM 2023
    Organizing Committee: SPLASH 2024 (Student Research Competition Co-Chair), SPLASH 2023 (Student Research Competition Co-Chair and Posters Co-Chair), CCF ChinaSoft 2023 (Forum for Women in Software Engineering Co-Chair)
    Artifact Evaluation Committee: OOPSLA 2023, ECOOP 2023, OOPSLA 2022, PLDI 2021, CAV 2020, SAS 2019, SLE 2016, OOPSLA 2016
    Judge: SPLASH 2021 SRC