BY JON CAMERON CALHOUN DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2017 Abstract As high-performance computing (HPC) continues…

Konstantinos Parasyris∗, Georgios Tziantzoulis†, Christos Antonopoulos‡, Nikolaos Bellas§. ∗‡§Dept. of Electrical and Computer Eng., University Of Thessaly, Volos, Greece; ∗‡§I.RE.TE.TH., Centre for Research and Technology, Hellas, Volos, Greece; †Computer Science Dept., Northwestern University, Chicago, U.S.A. E-mail: ∗koparasy,‡cda,§nbellas@inf.uth.gr, †georgiostziantzioulis2011@u.nortwestern.edu Abstract— Dependable…

Konstantinos Parasyris Submitted to the Department of Computer & Communication Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer & Communication Engineering at the University Of Thessaly February 2013 Reliable computing under unreliable…

SEPTEMBER 2015 CHUAN ZHANG B.S., BEIJING INSTITUTE OF TECHNOLOGY M.S.M.E., UNIVERSITY OF MASSACHUSETTS AMHERST Directed by: Professor Israel Koren Traditional fault tolerant techniques such as hardware or time redundancy incur high overhead and are inefficient for checking arithmetic operations. Our…

N. Farazmand, R. Ubal, D. Kaeli, Department of Electrical and Computer Engineering, Northeastern University Abstract— The ever-increasing application of Graphics Processing Units (GPUs) for non-graphics general purpose computing (GPGPU) raises new challenges not found in traditional graphics processing. Reliable computing using…

Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher and Pradip Bose, International Conference for High-Performance Computing, Storage and Networking (SC), 2016. Abstract— GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics…

Guanpeng Li and Karthik Pattabiraman, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. Abstract: Transient hardware faults are increasing in computer systems due to shrinking feature sizes. Traditional methods to mitigate such faults are through hardware duplication, which…

Behrooz Sangchoolie*, Karthik Pattabiraman+, Johan Karlsson* (IFIP-2017) Abstract— Recent studies have shown that technology and voltage scaling are expected to increase the likelihood that particle-induced soft errors manifest as multiple-bit errors. This raises concerns about the validity of using single…

ABSTRACT Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been…