Enhancing automatic program repair via large language models
In the modern digital era, software systems are deeply embedded in daily life, supporting crucial tasks ranging from processing financial transactions to managing transportation systems and powering healthcare tools. This pervasive reliance magnifies the cost of software bugs, which affect users worldwide and incur substantial financial losses. Developers spend an estimated 35% to 50% of their time debugging. Automated Program Repair (APR) techniques have been introduced to automate bug fixing, but they suffer from two key limitations: they typically assume the bug location is known in advance, requiring a separate vulnerability detector, and they are constrained by short input token limits, which restrict the contextual understanding crucial for effective vulnerability repair.
To overcome these challenges, our project aims to build an end-to-end file-level vulnerability repair model, leveraging cutting-edge large language models that can process a context window of up to 16,000 tokens. This capacity allows the proposed model to ingest roughly five average-sized Python files at once, making it well suited for comprehensive file-level APR. Furthermore, by applying advanced fine-tuning techniques such as instruction fine-tuning and LoRA, and by incorporating abstract syntax tree (AST) information, we aim to tailor large language models to the specific demands of APR.
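As an illustration of the kind of AST-derived structure that could accompany raw source code in a repair prompt, the sketch below uses Python's standard-library `ast` module to extract each function's name, line span, and outgoing calls from a small buggy file. This is a minimal, hypothetical example of AST incorporation, not the project's actual pipeline; the sample `divide`/`mean` functions are invented for demonstration.

```python
import ast

# A toy buggy file: divide() can raise ZeroDivisionError when b == 0.
source = """
def divide(a, b):
    return a / b

def mean(values):
    return divide(sum(values), len(values))
"""

tree = ast.parse(source)

# Collect each function's line span and the names it calls --
# structural context that can be serialized alongside the source
# to help a model localize and repair the defect.
functions = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        calls = [
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        ]
        functions[node.name] = {
            "lines": (node.lineno, node.end_lineno),
            "calls": calls,
        }

print(functions)
# e.g. mean spans lines 5-6 and calls divide, sum, and len
```

A summary like this stays compact even for whole files, so it can be prepended to the prompt without consuming much of the 16,000-token budget.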
Project Members
- Boyu Zhang