Jiasi Shen

Jiasi Shen 沈嘉思
Ph.D. Candidate, MIT EECS & CSAIL
  • [firstname][at]csail.mit.edu
  • +1-617-253-7768
  • 32 Vassar St, 32-G730, Cambridge, MA 02139
  • Blog, Google Scholar

News

  • Serving on the Onward!'22 PC
  • I accepted a faculty position in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology!
  • Serving on the AISTA'22 PC
  • KumQuat paper to appear in PPoPP'22
  • Serving on the OOPSLA'22 ERC+AEC

I am a Ph.D. student advised by Professor Martin Rinard at MIT. I received my master's degree from MIT and my bachelor's degree from Peking University. I will be joining the Hong Kong University of Science and Technology as an Assistant Professor in January 2023.

My main research interests are in programming languages and software engineering. I aim to improve software development by automating tasks that currently require substantial manual engineering effort. My research focuses on developing automatic techniques that analyze, manipulate, and transform software. My broader interests include computer systems and security.


Software plays a central role in numerous aspects of human society. Current software development practices require significant developer effort in all phases of the software life cycle, including developing new software, detecting and eliminating defects and security vulnerabilities in existing software, maintaining legacy software, and integrating existing software into new contexts. My research goal is to automate software development tasks and enhance how people create, understand, and improve software. Toward this goal, I have developed automatic techniques that analyze, manipulate, and transform programs. Ongoing and past research projects include:

My research pioneered a new approach, automatic software rejuvenation, which infers the behavior of an existing program, captures it in a precise model, and uses the model to regenerate a new program. This paradigm can deliver benefits across the software life cycle, for example by generating high-quality code from simple prototypes, improving program comprehension and producing cleaner code, and extracting human knowledge from software and retargeting it to other languages or platforms. In the future, software will perform even more critical tasks in human society, so it is important to reduce the fundamental inefficiencies in how people work with software. I aim to capture the maximal benefits from the human effort and knowledge embedded in software and to enable new ways to express human creativity.

Recent Papers

  • POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat
    Jiasi Shen, Martin Rinard, Nikos Vasilakis
    PPoPP 2022: 27th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, poster

    We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that promotes efficient inference of correct combiners. We present experimental results that show that these combiners enable the effective parallelization of our benchmark scripts.

    Paper (pdf)
  • Supply-Chain Vulnerability Elimination via Active Learning and Regeneration
    Nikos Vasilakis, Achilles Benetopoulos, Shivam Handa, Alizee Schoen, Jiasi Shen, Martin Rinard
    CCS 2021: ACM SIGSAC Conference on Computer and Communications Security

    Software supply-chain attacks target components that are integrated into client applications. Such attacks often target widely-used components, with the attack taking place via operations (for example, file system or network accesses) that do not affect those aspects of component behavior that the client observes. We propose new active library learning and regeneration (ALR) techniques for inferring and regenerating the client-observable behavior of software components. Using increasingly sophisticated rounds of exploration, ALR generates inputs, provides these inputs to the component, and observes the resulting outputs to infer a model of the component's behavior as a program in a domain-specific language. We present Harp, an ALR system for string processing components. We apply Harp to successfully infer and regenerate string-processing components written in JavaScript and C/C++. Our results indicate that, in the majority of cases, Harp completes the regeneration in less than a minute, remains fully compatible with the original library, and delivers performance indistinguishable from the original library. We also demonstrate that Harp can eliminate vulnerabilities associated with libraries targeted in several highly visible security incidents, specifically event-stream, left-pad, and string-compare.

    Paper (pdf)
  • Active Learning for Inference and Regeneration of Applications that Access Databases
    Jiasi Shen, Martin Rinard
    TOPLAS 2021: ACM Transactions on Programming Languages and Systems

    We present Konure, a new system that uses active learning to infer models of applications that retrieve data from relational databases. Konure comprises a domain-specific language (each model is a program in this language) and associated inference algorithm that infers models of applications whose behavior can be expressed in this language. The inference algorithm generates inputs and database configurations, runs the application, then observes the resulting database traffic and outputs to progressively refine its current model hypothesis. Because the technique works with only externally observable inputs, outputs, and database configurations, it can infer the behavior of applications written in arbitrary languages using arbitrary coding styles (as long as the behavior of the application is expressible in the domain-specific language). Konure also implements a regenerator that produces a translated Python implementation of the application that systematically includes relevant security and error checks.

    Paper (pdf)
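The KumQuat abstract above describes a split, execute, combine scheme in which only the combine step has to be synthesized. The following Python sketch illustrates that idea under simplifying assumptions of my own: the commands are Python functions standing in for `wc -l`, `sort`, and `grep a`, and the three-operator combiner language (`concat`, `add`, `merge`) is hypothetical, not KumQuat's actual DSL. Synthesis here simply enumerates the combiners and keeps the first one that agrees with the command on every split of a few fixed example inputs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List, Optional

Lines = List[str]

# Toy stand-ins for Unix commands (hypothetical): each maps input lines to output lines.
def wc_l(lines: Lines) -> Lines:
    return [str(len(lines))]

def sort_cmd(lines: Lines) -> Lines:
    return sorted(lines)

def grep_a(lines: Lines) -> Lines:
    return [line for line in lines if "a" in line]

# A tiny, hypothetical combiner language: each entry merges two partial outputs.
COMBINERS: Dict[str, Callable[[Lines, Lines], Lines]] = {
    "concat": lambda x, y: x + y,
    "add": lambda x, y: [str(int(x[0]) + int(y[0]))],
    "merge": lambda x, y: list(heapq.merge(x, y)),
}

# Fixed example inputs; every split point of each one is tried during synthesis.
EXAMPLES: List[Lines] = [
    ["kiwi", "apple", "fig"],
    ["banana"],
    [],
    ["date", "apple", "kiwi", "apple"],
]

def _try(combine, x: Lines, y: Lines) -> Optional[Lines]:
    try:
        return combine(x, y)
    except (ValueError, IndexError):  # combiner does not apply to these outputs
        return None

def synthesize_combiner(cmd: Callable[[Lines], Lines]) -> Optional[str]:
    """Return the first combiner that agrees with cmd on every split of every example."""
    for name, combine in COMBINERS.items():
        if all(
            _try(combine, cmd(ex[:cut]), cmd(ex[cut:])) == cmd(ex)
            for ex in EXAMPLES
            for cut in range(len(ex) + 1)
        ):
            return name
    return None

def run_parallel(cmd: Callable[[Lines], Lines], lines: Lines,
                 combiner: str, workers: int = 4) -> Lines:
    """Split the input, run cmd on each chunk concurrently, then fold the partial outputs."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)] or [[]]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(cmd, chunks))
    return reduce(COMBINERS[combiner], partials)
```

Because Python threads share the interpreter lock, this toy shows the structure of the transformation rather than a speedup; a real pipeline parallelizer would run the command copies as separate processes. When `synthesize_combiner` returns `None` (as it does for a command that reverses its input), the command would simply stay sequential.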
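The active learning and regeneration loop described in the Harp abstract (generate inputs, run the component, observe outputs, refine a DSL model) can be illustrated with a small Python sketch. Everything concrete here is an assumption for illustration: `opaque_left_pad` is a hypothetical stand-in for a left-pad-style library that also records its inputs in `LEAKED` (modeling a hidden side effect such as a network access), and the five-program `DSL` is far smaller than Harp's actual language. Each exploration round probes inputs that earlier rounds left ambiguous, and the regenerated component implements only the surviving program.

```python
from typing import Callable, Dict, List, Tuple

Args = Tuple[str, int, str]
Component = Callable[[str, int, str], str]

# Hypothetical third-party component: its client-observable behavior is left
# padding, but it also records every input it sees (a hidden side effect).
LEAKED: List[str] = []

def opaque_left_pad(s: str, n: int, ch: str) -> str:
    LEAKED.append(s)
    return ch * (n - len(s)) + s  # a negative repeat count yields ""

# Hypothetical DSL: each program maps (string, width, pad character) to a string.
DSL: Dict[str, Component] = {
    "identity": lambda s, n, ch: s,
    "truncate": lambda s, n, ch: s[:n],
    "pad_right": lambda s, n, ch: s + ch * (n - len(s)),
    "pad_left": lambda s, n, ch: ch * (n - len(s)) + s,
    "center": lambda s, n, ch: s.center(n, ch),
}

# Exploration rounds, each probing behavior the previous rounds left ambiguous.
ROUNDS: List[List[Args]] = [
    [("abc", 2, " "), ("hello", 5, " ")],  # round 1: width <= length
    [("ab", 5, " "), ("", 3, " ")],        # round 2: padding with spaces
    [("7", 3, "0"), ("xy", 6, "*")],       # round 3: other pad characters
]

def infer(component: Component) -> List[str]:
    """Keep the DSL programs that match every observed input/output pair."""
    candidates = list(DSL)
    for inputs in ROUNDS:
        observed = [(args, component(*args)) for args in inputs]
        candidates = [
            name for name in candidates
            if all(DSL[name](*args) == out for args, out in observed)
        ]
    return candidates

def regenerate(name: str) -> Component:
    """Return a replacement that implements only the inferred DSL program."""
    return DSL[name]
```

After `infer(opaque_left_pad)` narrows the candidates to `["pad_left"]`, the regenerated function produces the same outputs but no longer appends to `LEAKED`, which is the sense in which regeneration drops behavior the client never observes.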
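The Konure abstract describes inferring a model by choosing database configurations, running the application, and observing its database traffic and outputs. The Python sketch below mimics that loop for a single lookup query under assumptions of my own: the `users` schema, `legacy_app`, and the `TracingConnection` wrapper (which records queries in place of a real traffic monitor) are hypothetical, and the model language covers only one parameterized query with an empty/non-empty branch, a much smaller language than Konure's. The abstract says Konure produces a Python implementation; this sketch returns a Python closure instead of emitting source text, and adds a type check on the input as a stand-in for the security and error checks mentioned there.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

class TracingConnection:
    """Wraps a sqlite3 connection and records every (sql, params) the app issues."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.trace: List[Tuple[str, tuple]] = []

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.trace.append((sql, tuple(params)))
        return self.conn.execute(sql, params)

def make_db(rows: List[Tuple[int, str]]) -> TracingConnection:
    """Build one database configuration (hypothetical single-table schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return TracingConnection(conn)

# Hypothetical legacy application, treated as a black box by the inference code.
def legacy_app(db: TracingConnection, uid: int) -> str:
    row = db.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
    return "not found" if row is None else "hello " + row[0]

def infer_model(app: Callable[[TracingConnection, int], str], uid: int = 1) -> Dict[str, str]:
    # Configuration 1: empty table, to observe the query and the empty-result output.
    db = make_db([])
    empty_output = app(db, uid)
    if len(db.trace) != 1 or db.trace[0][1] != (uid,):
        raise ValueError("behavior falls outside this toy model language")
    sql = db.trace[0][0]
    # Configuration 2: seed the queried row with a fresh marker value, then
    # locate the marker in the output to see how results become output text.
    marker = "m4rk3r"
    db = make_db([(uid, marker)])
    full_output = app(db, uid)
    if db.trace != [(sql, (uid,))] or marker not in full_output:
        raise ValueError("behavior falls outside this toy model language")
    prefix, _, suffix = full_output.partition(marker)
    return {"sql": sql, "empty": empty_output, "prefix": prefix, "suffix": suffix}

def regenerate(model: Dict[str, str]) -> Callable[[TracingConnection, int], str]:
    """Build a fresh implementation from the model, adding an input check."""
    def app(db: TracingConnection, uid: int) -> str:
        if not isinstance(uid, int):
            raise TypeError("uid must be an int")
        row = db.execute(model["sql"], (uid,)).fetchone()
        if row is None:
            return model["empty"]
        return model["prefix"] + row[0] + model["suffix"]
    return app
```

The marker trick is one simple way to relate outputs to database contents in this toy; it is not necessarily the mechanism Konure uses.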



  • Program Committee: Onward! 2022, AISTA 2022, APLAS 2021, AISTA 2021
  • External Review Committee: OOPSLA 2022
  • Artifact Evaluation Committee: OOPSLA 2022, PLDI 2021, CAV 2020, SAS 2019, SLE 2016, OOPSLA 2016
  • Judge: SPLASH 2021 Student Research Competition (SRC)