TY - THES
AB - Large Language Models (LLMs) have immense capabilities in understanding and generating language. They are becoming increasingly powerful and hold significant potential in software engineering, where their applications are being actively explored. Within software engineering, static analysis involves evaluating a program without executing its source code, enabling early detection of bugs and vulnerabilities. In this study, we focus on evaluating the performance of LLMs in static code analysis tasks, particularly call graph construction for Python, JavaScript, and Java programs. We assessed 26 LLMs, including OpenAI’s GPT series and open-source models such as LLaMA and Mistral, using both existing and newly designed micro-benchmarks. As part of this study, we introduced SWARM-CG, a comprehensive benchmarking suite for evaluating call graph construction tools across multiple programming languages, including Python, JavaScript, and Java, which facilitates cross-language comparisons and consistent analysis. Additionally, we developed SWARM-JS, a specialized micro-benchmark tailored to JavaScript call graph analysis tasks. The performance of LLMs was also systematically compared with traditional static analysis approaches, highlighting their relative strengths and limitations. The results reveal that, in Python, traditional tools such as PyCG significantly outperform LLMs. For JavaScript, the static analysis tool Jelly surpasses LLMs in soundness, while another tool, TAJS, underperforms due to its limited support for modern language features. Interestingly, LLMs achieve considerably better results in Python than in JavaScript, where several models produce weak outputs. For Java, the static analysis tool SootUp falls short due to its limited support for the dynamic language features present in the CATS benchmark used for the Java evaluations, whereas LLMs demonstrate strong performance. These findings highlight the potential of LLMs to assist with static code analysis tasks while also underscoring their current limitations in call graph construction. This study establishes a foundation for integrating LLMs into static analysis workflows and advocates further research into their optimization and broader applications.
AU - Sunil, Rose
CY - Paderborn
DO - 10.17619/UNIPB/1-2501
DP - Universität Paderborn
LA - eng
N1 - Date of submission: 14.12.2024
N1 - Universität Paderborn, master’s thesis, 2024
PB - Veröffentlichungen der Universität
PY - 2026
SP - 1 online resource (viii, 61 pages)
T2 - Heinz Nixdorf Institut (HNI)
TI - LLMs for call graph construction: benchmarking and evaluation across languages
UR - https://nbn-resolving.org/urn:nbn:de:hbz:466:2-57344
Y2 - 2026-02-19T14:44:44
ER -