Towards Combining the Cognitive Abilities of Large Language Models with the Rigor of Deductive Progam Verification

Abstract

Recent investigations hint at the ability of large language models (LLMs) to generate formal specifications for given program code. In this work, we systematically discuss and categorize different use cases and application scenarios that combine specification synthesis via LLMs with deductive program verification. We present preliminary quantitative experiments on the capabilities of LLMs to generate correct specifications. To this end, we use a prototypical integration of GPT (versions 3.5 and 4o) with the deductive program verifier KeY and the bounded model checker JJBMC. We evaluated our prototype on a set of Java programs that are partially annotated with specifications written in the Java Modeling Language (JML). We show that GPT 4o generates correct annotations in approximately half of all instances across the investigated scenarios. For the case of faulty specifications, we investigate how a feedback loop can help to improve the original answer. Finally, we present a vision of how Large Language Models may support rigorous formal verification of software systems and describe the necessary next steps in this direction.

Publication
12th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation

This work is forthcoming, links will be added once available.

Samuel Teuber
Samuel Teuber
PhD Student

Interested in formal methods for software and machine learning verification with a focus on cyber-physical systems and algorithmic fairness.