Understanding Error Propagation in GPGPU Applications
Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher and Pradip Bose, International Conference for High-Performance Computing, Storage and Networking (SC), 2016. Abstract: GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics…
Modeling Input Dependent Error Propagation in Programs
Guanpeng Li and Karthik Pattabiraman, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. Abstract: Transient hardware faults are increasing in computer systems due to shrinking feature sizes. Traditional methods mitigate such faults through hardware duplication, which…
One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors
Behrooz Sangchoolie, Karthik Pattabiraman, Johan Karlsson (IFIP-2017). Abstract: Recent studies have shown that technology and voltage scaling are expected to increase the likelihood that particle-induced soft errors manifest as multiple-bit errors. This raises concerns about the validity of using single…
Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications
Abstract: Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high performance and energy efficiency. Recently, they have been…
Modeling Soft-Error Propagation in Programs
Abstract: As technology scales to smaller feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy hungry, and…
[gem5-users] Adding CommMonitor between CPU and L1d-cache
I want to add a CommMonitor between the CPU and the L1 d-cache in SE mode to trace all the memory operation requests in the system. I am running in x86 SE mode. I added the following lines in the…
A Cache Error Propagation Model
Abstract: Cache memory is a small, fast memory system that holds frequently used data. With increasing processor speed, designers follow aggressive design practices in the design of cache memories. Such design practices increase the probability of fault occurrence and…
High Precision Fault Injections on the Instruction Cache of ARMv7-M Architectures
Abstract: Hardware and software of secured embedded systems are prone to physical attacks. In particular, fault injection attacks revealed vulnerabilities on the data and the control flow, allowing an attacker to break implementations of cryptographic or secured algorithms. While many…
Injecting Errors for Fun and Profit
INJECTING E-CACHE ERRORS ON THE ULTRASPARC-II. “Handling errors is just attention to detail. Injecting errors is rocket science.” (me) While the hardware engineers were working on determining the cause of the e-cache parity errors and then working on a…
SST Simulator
Structural Simulation Toolkit: ISCA 2015 Tutorial (13th June 2015, Portland, OR). The Structural Simulation Toolkit is a parallel discrete-event simulation framework that allows many different components to connect together in a unified framework. The toolkit provides support for…