LLMs for call graph construction: benchmarking and evaluation across languages / by Rose Sunil ; thesis supervisors: Prof. Dr. Eric Bodden and Jun.-Prof. Dr. Mohamed Aboubakr Mohamed Soliman. Paderborn, 2026
Contents
- Introduction
- Background
- Related Work
- Research Questions
- Micro-benchmarks
- PyCG: Call-graph Micro-Benchmark
- SWARM-CG: Swiss Army Knife of Call Graph Benchmarks
- SWARM-JS: Call-graph Micro-Benchmark
- CATS: Call-graph Assessment and Test Suite
- Methodology
- Results
- RQ1: Accuracy of LLMs in Performing Call Graph Analysis
- RQ2: LLM Accuracy in Call Graph Analysis Across Languages
- RQ3: Comparison of LLMs and Traditional Static Analysis Tools
- Discussion
- Performance of LLMs in Call Graph Analysis
- Performance Across Programming Languages
- Traditional SA Tools and LLMs
- Threats to Validity
- Conclusion
- Appendix
- Prompts
- Example Responses
- Call graph output generated by the gpt-4o model for the Python test case args/assign_return
- Call graph output generated by the phi3.5-mini-it-3.8b model for the Python test case args/assign_return
- Call graph output generated by the gpt-4o model for the JavaScript test case functions/call
- Call graph output generated by the mixtral-v0.1-it-8x7b model for the JavaScript test case functions/call
- Call graph output generated by the gpt-4o model for the Java test case VirtualCalls/vc1
- Call graph output generated by the llama-3.2-1B-Instruct model for the Java test case VirtualCalls/vc1
- Bibliography
