LLMs for call graph construction: benchmarking and evaluation across languages / by Rose Sunil ; thesis supervisors: Prof. Dr. Eric Bodden and Jun.-Prof. Dr. Mohamed Aboubakr Mohamed Soliman. Paderborn, 2026
Contents
- Introduction
- Background
- Related Work
- Research Questions
- Micro-benchmarks
- PyCG: Call-graph Micro-Benchmark
- SWARM-CG: Swiss Army Knife of Call Graph Benchmarks
- SWARM-JS: Call-graph Micro-Benchmark
- CATS: Call-graph Assessment and Test Suite
- Methodology
- Results
- RQ1: Accuracy of LLMs in Performing Call Graph Analysis
- RQ2: LLM Accuracy in Call Graph Analysis Across Languages
- RQ3: Comparison of LLMs and Traditional Static Analysis Tools
- Discussion
- Performance of LLMs in Call Graph Analysis
- Performance Across Programming Languages
- Traditional SA Tools and LLMs
- Threats to Validity
- Conclusion
- Appendix
- Prompts
- Example Responses
- Call graph output generated by the gpt-4o model for the Python test case args/assign_return
- Call graph output generated by the phi3.5-mini-it-3.8b model for the Python test case args/assign_return
- Call graph output generated by the gpt-4o model for the JavaScript test case functions/call
- Call graph output generated by the mixtral-v0.1-it-8x7b model for the JavaScript test case functions/call
- Call graph output generated by the gpt-4o model for the Java test case VirtualCalls/vc1
- Call graph output generated by the llama-3.2-1B-Instruct model for the Java test case VirtualCalls/vc1
- Bibliography
