Abstract
Modern many-core Graphics Processing Units (GPUs) are extensively employed in general-purpose computing (GPGPU), offering remarkable execution speedups for inherently data-parallel workloads. Compared with graphics computing, GPGPU computing has more stringent reliability requirements. Thus, accurate reliability assessment of GPU hardware structures is important for making informed decisions about error protection. In this paper we focus on microarchitecture-level reliability assessment for GPU architectures. The paper makes the following contributions. First, it presents a comprehensive fault injection framework that targets key hardware structures of GPU architectures such as the register file, the shared memory, the SIMT stack and the instruction buffer, which together occupy a large part of a modern GPU's silicon area. Second, it reports our reliability assessment findings for the target structures when the GPU executes a diverse set of twelve GPGPU applications. Third, it discusses remarkable differences in the fault injection results when the applications are simulated in NVIDIA's virtual GPU instruction set (PTX) vs. the actual instruction set (SASS). Finally, it discusses how the framework can be employed either by architects in the early stages of the design phase or by programmers to enhance a GPU application’s error resilience.
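As a rough illustration of what a fault injection experiment on a register file measures, the following Python sketch flips one bit in a toy register array and compares the result against a fault-free ("golden") run. This is not the framework's implementation: the single-bit-flip fault model, the two outcome classes, and every name here (`flip_bit`, `run_kernel`, `inject_and_classify`, `campaign`) are assumptions made for illustration, and `run_kernel` is a hypothetical stand-in for a real GPGPU kernel.

```python
import random

WORD_BITS = 32          # assumed register width
WORD_MASK = 0xFFFFFFFF  # keeps values within 32 bits


def flip_bit(word, bit):
    """Return `word` with one bit inverted (assumed single-bit fault model)."""
    return (word ^ (1 << bit)) & WORD_MASK


def run_kernel(regs):
    """Hypothetical toy workload: only registers 0 and 1 reach the output."""
    return (regs[0] + regs[1]) & WORD_MASK


def inject_and_classify(regs, reg_index, bit):
    """Flip one bit in one register and compare against the golden output."""
    golden = run_kernel(regs)
    faulty = list(regs)  # copy so the caller's registers stay intact
    faulty[reg_index] = flip_bit(faulty[reg_index], bit)
    # "masked": the fault did not change the output.
    # "sdc": silent data corruption, the output differs from the golden run.
    return "masked" if run_kernel(faulty) == golden else "sdc"


def campaign(regs, trials, seed=0):
    """Run `trials` injections at uniformly random (register, bit) sites."""
    rng = random.Random(seed)
    counts = {"masked": 0, "sdc": 0}
    for _ in range(trials):
        reg_index = rng.randrange(len(regs))
        bit = rng.randrange(WORD_BITS)
        counts[inject_and_classify(regs, reg_index, bit)] += 1
    return counts
```

In this toy setup a flip in register 0 or 1 always corrupts the output, while a flip in any unused register is masked, so the fraction of "sdc" outcomes in a campaign approximates the share of register bits the workload actually depends on. A microarchitecture-level framework would inject into simulated hardware structures during real application runs and would typically distinguish more outcome classes (for example crashes and timeouts).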
Details
- BIBTEX:
@INPROCEEDINGS{7482077, author={S. Tselonis and D. Gizopoulos}, booktitle={2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)}, title={GUFI: A framework for GPUs reliability assessment}, year={2016}, pages={90-100}, keywords={computer architecture;graphics processing units;instruction sets;GPGPU;GUFI framework;NVIDIA GPU instruction set;general purpose graphics processing unit;microarchitecture-level reliability assessment;Computer architecture;Graphics processing units;Hardware;Kernel;Microarchitecture;Registers;Reliability;GPGPU;fault injection;microarchitecture simulators;reliability assessment}, doi={10.1109/ISPASS.2016.7482077}, month={April},}
- DOI: 10.1109/ISPASS.2016.7482077
- KEYWORDS: computer architecture;graphics processing units;instruction sets;GPGPU;GUFI framework;NVIDIA GPU instruction set;general purpose graphics processing unit;microarchitecture-level reliability assessment;Computer architecture;Graphics processing units;Hardware;Kernel;Microarchitecture;Registers;Reliability;GPGPU;fault injection;microarchitecture simulators;reliability assessment