Bibliographic record
- Title: MoRAGBench: a benchmarking framework for RAG pipelines on mobile devices / by Huzaifa Shaaban Kabakibo; first reviewer: Prof. Dr. Lin Wang, second reviewer: Prof. Dr. Marco Platzner
- Author: Huzaifa Shaaban Kabakibo
- Reviewers: Prof. Dr. Lin Wang, Prof. Dr. Marco Platzner
- Published
- Extent: 1 online resource (63 pages): illustrations, diagrams
- Thesis: Universität Paderborn, master's thesis, 2026
- Note: Date of submission: 19 March 2026
- Date of submission: 19 March 2026
- Language: English
- Document type: Master's thesis
- Keywords (GND)
- URN
- DOI
Abstract
Retrieval-Augmented Generation (RAG) has emerged as an effective approach for improving the factual grounding and contextual relevance of Large Language Models (LLMs) by combining neural generation with external knowledge retrieval. While most existing RAG systems are designed for server or cloud environments, executing complete pipelines directly on mobile devices introduces significant challenges due to limited computational resources, memory constraints, hardware heterogeneity, and immature software stacks and optimizations. Despite growing interest in on-device intelligence, there is still little systematic understanding of how individual RAG components behave and contribute to overall performance under realistic mobile conditions. This thesis presents MoRAGBench, a modular benchmarking framework for evaluating RAG pipelines on Android smartphones. The framework enables configurable experimentation across all stages of the pipeline, including document chunking, embedding generation, indexing and retrieval, augmentation, and LLM inference. To provide comprehensive analysis, MoRAGBench introduces two complementary evaluation modes: an approximate nearest neighbor benchmark that isolates retrieval performance, and an end-to-end task benchmark that measures downstream question-answering quality and overall system efficiency. Extensive experiments conducted on a modern smartphone reveal fundamental trade-offs between retrieval accuracy, latency, throughput, and memory consumption. The results show that efficient on-device RAG deployment is primarily a systems-level challenge in which performance improvement emerges from the interaction between pipeline components and hardware execution backends rather than from individual model improvements alone.
The study further highlights limitations of current mobile inference-acceleration frameworks and demonstrates the importance of balanced RAG pipeline configurations and approximate similarity search methods for achieving practical performance. By enabling systematic, reproducible evaluation of RAG systems under mobile constraints, MoRAGBench provides practical insights and a foundation for future research toward efficient, privacy-preserving, fully on-device intelligent assistants. MoRAGBench is fully open source and available at https://github.com/upb-cn/MoRAGBench.

