CREST | CREST - Centre for Research on Engineering Software Technologies

Enhancing Security Data Quality with Human-Centric Visualization and Machine Learning

In the rapidly evolving landscape of software security, predicting and mitigating vulnerabilities is of utmost importance. Data-driven vulnerability prediction models often rely on the quality of labeled datasets, where each code module is annotated as vulnerable or non-vulnerable. Manual labeling is the most reliable approach, given the substantial noise associated with automatic methods. However, many existing manual data labeling processes are inefficient, labor-intensive, and error-prone. This results in low-quality labeled data, particularly with the complexities of modern software systems that contain millions of lines of code or thousands of code modules.

Our project aims to develop human-centric, interactive noise detection and mitigation tools that combine artificial intelligence (AI) and machine learning with visualization techniques and automatic algorithms. These tools are designed to meet the unique requirements of developers, security analysts, and researchers facing security data quality issues. The proposed solution seeks to improve the detection and mitigation of noisy labels, thereby enhancing overall data quality, improving the efficiency and effectiveness of human labeling, and bolstering the reliability of software vulnerability prediction models.

Project Members

Kun-Ting Chen
Triet Le