Transient hardware faults simulation in GEM5 – Study of the behavior of multithreaded applications under faults

Konstantinos Parasyris

Submitted to the Department of Computer & Communication Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer & Communication Engineering at the University of Thessaly

February 2013

Reliable computing under unreliable circumstances is the next challenge the computing community must overcome. To achieve this, we need a thorough analysis of how hardware faults manifest as errors in architectural components and how such errors affect application behavior. In this direction, the first contribution of my diploma thesis is the extension of an existing fault injection tool, originally created in an earlier thesis, with new concepts. The new framework uses the GEM5 cycle-accurate simulator to enable fault injection. The tool provides a variety of fault injection methods; it is not limited to models covering radiation-induced or timing-induced faults, and it is easily extensible to support future fault models. Extensive experimentation showed that our GEM5-based fault injection mechanism was very effective in emulating the behavior of faults in modern high-performance processors running complex workloads. An additional contribution of my thesis is the experimental analysis of two applications: blackscholes and fluidanimate. We observed that tolerance to injected faults was highly dependent on the spatial location of the faults (e.g. registers, program counter, IF unit) and on the specific portion of the code affected. To accelerate data gathering and increase simulation speed, we made extensive use of a checkpoint mechanism called DMTCP (Distributed MultiThreaded CheckPointing), while the whole procedure was automated to execute on a distributed
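As a rough illustration of what a transient fault at a given spatial location means, the sketch below flips a single bit in one entry of a toy register file. It is not code from the thesis tool or from GEM5: the names `flip_bit` and `inject_transient_fault`, the 32-bit register width, and the 8-entry register file are all assumptions made for this example.

```python
from __future__ import annotations

import random

REGISTER_WIDTH = 32  # assumed register width in bits (illustrative only)


def flip_bit(value: int, bit: int, width: int = REGISTER_WIDTH) -> int:
    """Return `value` with the given bit inverted, truncated to `width` bits."""
    if not 0 <= bit < width:
        raise ValueError(f"bit index {bit} outside 0..{width - 1}")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_transient_fault(registers: list[int], rng: random.Random) -> tuple[int, int]:
    """Flip one randomly chosen bit in one randomly chosen register, in place.

    Returns (register_index, bit_index) so the fault location can be logged.
    """
    reg = rng.randrange(len(registers))
    bit = rng.randrange(REGISTER_WIDTH)
    registers[reg] = flip_bit(registers[reg], bit)
    return reg, bit


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so an experiment can be repeated
    regs = [0] * 8             # hypothetical 8-entry register file
    reg, bit = inject_transient_fault(regs, rng)
    print(f"flipped bit {bit} of register {reg}: value is now {regs[reg]:#010x}")
```

Recording the returned location alongside the application's outcome is one simple way to relate fault tolerance to where the fault landed, which is the kind of dependence the abstract reports.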

You can find the thesis here