Statistical Fault Injection-Based AVF Analysis of a GPU Architecture

N. Farazmand, R. Ubal, D. Kaeli, Department of Electrical and Computer Engineering, Northeastern University

Abstract

The ever-increasing application of Graphics Processing Units (GPUs) for non-graphics general-purpose computing (GPGPU) raises new challenges not found in traditional graphics processing. Reliable computing using an unreliable GPU is one such challenge. To ensure an adequate reliability level for GPGPU computing while avoiding significant impact on performance and hardware size, careful analysis of the GPU hardware is essential. In this paper, we provide novel insight into the Architectural Vulnerability Factor (AVF) of GPU hardware structures, which are either absent from a CPU architecture or have different design properties than structures present on CPU architectures. Using statistical fault injection to inject faults into register files (REG), local memory (MEM), and the active mask stack (AMS), we show that the AMS, a GPU-specific structure, is highly vulnerable, with 40% AVF-util, mandating protection against faults. We also show that the AVF/AVF-util for a GPU register file and local memory are 6%/15% and 1%/3% on average, respectively, lower than their typical values in CPUs.
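As a rough illustration of the general idea behind statistical fault injection, the sketch below flips one randomly chosen bit in a toy register file per trial and reports the fraction of trials whose output differs from a fault-free run. It is not the paper's framework: `toy_kernel`, `estimate_avf`, the register count, and the trial count are all hypothetical, and the sketch does not model AVF-util, the local memory, or the active mask stack.

```python
import random

def toy_kernel(regs):
    # Hypothetical workload: only the first two registers affect the output.
    return (regs[0] + regs[1]) & 0xFFFFFFFF

def estimate_avf(num_regs=8, bits=32, trials=10_000, seed=0):
    """Estimate AVF as the fraction of single-bit flips that corrupt the output."""
    rng = random.Random(seed)
    golden_regs = [rng.getrandbits(bits) for _ in range(num_regs)]
    golden_out = toy_kernel(golden_regs)       # fault-free reference result

    failures = 0
    for _ in range(trials):
        regs = list(golden_regs)               # fresh copy for each trial
        reg = rng.randrange(num_regs)          # uniformly random register
        bit = rng.randrange(bits)              # uniformly random bit position
        regs[reg] ^= 1 << bit                  # inject a single bit flip
        if toy_kernel(regs) != golden_out:     # did the fault reach the output?
            failures += 1
    return failures / trials

if __name__ == "__main__":
    print(f"estimated AVF: {estimate_avf():.3f}")
```

In this toy setup only two of the eight registers are ever read, so the estimate should settle near 0.25. A real study would inject faults into a simulated GPU running actual kernels and would choose the sample size to meet a target confidence interval.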
