Integrated electronic systems are increasingly used in a wide range of applications and environments, from mobile devices to safety-critical products. This wide adoption is mainly due to miniaturization, accompanied by the increasing computing power of semiconductor devices. However, this trend brings many complex challenges, one of which is the reliability of electronic systems; several research efforts are currently aimed at improving semiconductor reliability. Manufacturing processes, component aging, and environmental stress may cause permanent internal defects and damage during the lifetime of a device; on the other hand, the environment in which these devices operate can introduce soft errors (i.e., errors that do not damage the device but corrupt data during the computation) in their internal circuitry, thus compromising the correct behavior of the whole system. Consequently, in order to guarantee product quality and consumer satisfaction, it is necessary to discover faults as soon as possible, both in the manufacturing process and during the device's lifetime; moreover, it is equally important to equip electronic systems with fault-tolerance mechanisms that ensure correct operation in every condition. Besides the reliability requirements, modern electronic systems also require increasing computational power to satisfy customers' needs. To meet this demand, different powerful computational devices have been designed and developed over the last two decades. They are mainly based on architectures that allow multiple computations to be executed in parallel.
Among these, Very Long Instruction Word (VLIW) processors are a particular type of multicore and reconfigurable processor; they have been developed to perform several operations in parallel, with the scheduling of the operations delegated entirely to the compiler. VLIWs are suitable for systems that require high computational performance while maintaining reduced power consumption. Another interesting type of multicore computational unit is the General Purpose Graphics Processing Unit (GPGPU): its very high computational power, combined with low cost, reduced power consumption, and flexible development platforms, is pushing its adoption not only for graphical applications but also in the High Performance Computing (HPC) market and in embedded devices. Moreover, GPGPUs are increasingly used in some safety-critical embedded domains, such as automotive, avionics, space, and biomedical. The main feature that VLIWs and GPGPUs have in common is that they can be used in a System-on-Chip (SoC) as computational co-processors: in a typical SoC, the main Central Processing Unit (CPU) is in charge of delegating data-intensive operations to these architectures and supervising their execution, which lowers the workload of the CPU itself. As an example, in the NASA labs, VLIWs have been evaluated to efficiently perform image analysis on board a Mars rover for future space missions, while the main CPU of the system remains available to perform other real-time control operations. Similarly, Advanced Driver Assistance Systems (ADASs), which are increasingly common in cars, use GPGPUs or GPGPU-like devices to analyze images (or radar signals) coming from external cameras and sensors in order to detect possible obstacles that require the automatic intervention of the braking system. In this PhD thesis, several new techniques have been developed with the common goal of improving the reliability characteristics of multicore processing units.
More specifically, for VLIW processors, new test and diagnostic methods have been studied and implemented in order to detect permanent faults; they are mainly based on the Software-Based Self-Test (SBST) technique. The final goal is to reduce the time required to test a generic VLIW processor and to efficiently localize the faulty module. In addition, the present dissertation focuses on the effects introduced by soft errors in GPGPU devices; this work has been carried out through several neutron radiation tests. Based on these analyses, new techniques aimed at enhancing the fault tolerance of GPGPU applications have been proposed. As an industrial case study, the validation of a programmable multicore timing co-processor module (i.e., the Generic Timer Module manufactured by Bosch) used in today's automotive Electronic Control Units (ECUs) has been designed and implemented. In particular, an FPGA-based validation platform has been developed; one of its main features is the ability to efficiently verify the behavior of the module under test, thus ensuring a correct implementation of the software running on it. This work has been done in collaboration with General Motors Powertrain Europe (Torino, Italy).

New Test and Fault Tolerance Techniques for Reliability Characterization of Parallel and Reconfigurable Processors / Sabena, Davide. - (2015).

New Test and Fault Tolerance Techniques for Reliability Characterization of Parallel and Reconfigurable Processors

SABENA, DAVIDE
2015


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2593389