Schumacher, Tobias: Performance modeling and analysis in high-performance reconfigurable computing. 2011
Contents
- Abstract
- Zusammenfassung (German abstract)
- Contents
- 1 Introduction
- 2 Background and Related Work
- 2.1 Accelerated Supercomputing
- 2.2 Performance Modeling, Analysis and Optimization
- 2.2.1 Performance Analysis in High-Performance Computing
- 2.2.2 Analytical Performance Estimation for Reconfigurable HPC
- 2.2.3 Bottleneck Identification and Optimization of Reconfigurable Accelerators
- 2.2.4 Reconfigurable Hardware Characterization
- 2.3 Design Implementation, Verification and Optimization
- 2.3.1 High-Level Language (HLL) Synthesis
- 2.3.2 Visual Design Entry
- 2.3.3 Multi-Core System Generation
- 2.4 Chapter Summary
- 3 Programming, Execution and Performance Model
- 4 The IMORC Architectural Template
- 4.1 Cores, Links, and Channels
- 4.2 Network Topology and Arbitration
- 4.3 Performance Counters
- 4.4 Utility Cores
- 4.4.1 Host Interface Cores
- 4.4.2 Memory Cores
- 4.4.3 Request Generator Cores
- 4.4.4 IMORC-to-Register Interface Core
- 4.4.5 Register-to-IMORC Interface Core
- 4.4.6 Farming Cores
- 4.5 IMORC on the XtremeData XD1000
- 4.6 IMORC Infrastructure Cores and Accelerator Generation
- 4.6.1 Core Generation
- 4.6.2 Communication Infrastructure Generation
- 4.6.3 Simulation
- 4.6.4 Synthesis
- 4.6.5 Execution and Runtime Monitoring
- 4.7 Chapter Summary
- 5 Architecture Characterization
- 5.1 The IMORC Benchmarking Infrastructure
- 5.2 Performance Characterization of the XD1000
- 5.2.1 CPU <-> Host Memory Bandwidth
- 5.2.2 CPU <-> FPGA Communication Initiated by the CPU
- 5.2.3 Burst Read/Write Transfers Initiated by the FPGA
- 5.2.4 Simultaneous Access by Multiple Cores with a Common Access Scheme (Read or Write)
- 5.2.5 Contention Benchmark with Multiple Simultaneous Reads and Writes
- 5.3 Chapter Summary
- 6 Experimental Evaluation
- 6.1 Cube Cut
- 6.1.1 The Cube Cut Algorithm
- 6.1.2 Design and Implementation
- 6.1.3 Architecture Mapping, Implementation and Performance Evaluation
- 6.2 A Compositing Accelerator for a Parallel Rendering Framework
- 6.3 K-th Nearest Neighbor Thinning
- 6.3.1 Application Model
- 6.3.2 IMORC KNN Cores
- 6.3.3 Architecture Generation
- 6.3.4 Numeric Evaluation
- 6.4 Chapter Summary
- 7 Conclusion and Outlook
- Acronyms
- List of Figures
- List of Tables
- Author's Publications
- Bibliography
