

## D5.5 – Preliminary EDA tool-suite (draft)

#### Authors<sup>1</sup>

A. Savino (POLITO), S. Di Carlo (POLITO), A. Vallero (POLITO), G. Politano (POLITO)

Version 1.0 - 19/03/2016

| Lead contractor: Politecnico di Torino                                                                                                      |
|---------------------------------------------------------------------------------------------------------------------------------------------|
| Contact person:                                                                                                                             |
| Alessandro Savino<br>Control and Computer Engineering Dep.<br>Politecnico di Torino, C.so Duca degli Abruzzi, 24<br>I-10129 Torino TO Italy |
| E-mail: alessandro.savino@polito.it                                                                                                         |
|                                                                                                                                             |
| Involved Partners <sup>2</sup> : POLITO<br>Work package: WP5<br>Affected tasks: T5.6                                                        |

**Nature of deliverable**<sup>3</sup>: R / P / D / O

**Dissemination level**<sup>4</sup>: PU / PP / RE / CO

<sup>1</sup> Authors listed here only identify persons that contributed to the writing of the document.

<sup>2</sup> List of partners that contributed to the activities described in this deliverable.

<sup>3</sup> R: Report, P: Prototype, D: Demonstrator, O: Other

<sup>4</sup> **PU**: Public, **PP**: Restricted to other programme participants (including the Commission services), **RE**: Restricted to a group specified by the consortium (including the Commission services), **CO**: Confidential, only for members of the consortium (including the Commission services)

# COPYRIGHT

© COPYRIGHT CLERECO Consortium consisting of:

- Politecnico di Torino (Italy) Short name: POLITO
- National and Kapodistrian University of Athens (Greece) Short name: UoA
- Centre National de la Recherche Scientifique Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (France) Short name: CNRS
- Intel Corporation Iberia S.A. (Spain) Short name: INTEL
- Thales SA (France) Short name: THALES
- Yogitech s.p.a. (Italy) Short name: YOGITECH
- ABB (Norway and Sweden) Short name: ABB
- Universitat Politècnica de Catalunya (Spain) Short name: UPC

#### CONFIDENTIALITY NOTE

THIS DOCUMENT MAY NOT BE COPIED, REPRODUCED, OR MODIFIED IN WHOLE OR IN PART FOR ANY PURPOSE WITHOUT WRITTEN PERMISSION FROM THE CLERECO CONSORTIUM. IN ADDITION TO SUCH WRITTEN PERMISSION TO COPY, REPRODUCE, OR MODIFY THIS DOCUMENT IN WHOLE OR IN PART, AN ACKNOWLEDGMENT OF THE AUTHORS OF THE DOCUMENT AND ALL APPLICABLE PORTIONS OF THE COPYRIGHT NOTICE MUST BE CLEARLY REFERENCED.

ALL RIGHTS RESERVED.

## INDEX

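- COPYRIGHT
- INDEX
- Scope of the document
- 1\. Introduction
- 2\. RELTech Tools
  - 2.1. Soft Error Rate Technology Analyzer
- 3\. RelHW Tools
  - 3.1. MaFIN - Microarchitecture Level Fault Injector for x86 Intel/AMD CPUs
  - 3.2. GeFIN - Microarchitecture Level Fault Injector for ARM, Intel and AMD CPUs
  - 3.3. GuFI - Microarchitecture Level Reliability Evaluation of NVIDIA GPUs
  - 3.4. SIFI - Microarchitecture Level Reliability Evaluation of AMD Southern Islands GPGPUs
  - 3.5. MASkIt - Soft Error Rate Predictor for Combination Circuits
  - 3.6. NANDA - A tool for the reliability analysis of NAND Flash based SSDs
- 4\. RELSw Tools
  - 4.1. LIFILL - A LLVM-based software fault injector
  - 4.2. LICFI - A Full features C-Based Fault Injector
  - 4.3. ALIVE - A LLVM-based Lifetime Variable Analysis
  - 4.4. BalTA - Bayesian Instruction Trace Analyzer for x86 Software
- 5\. RELSys Tools
  - 5.1. SyRA - A full System Reliability Analyzer
  - 5.2. ReDO - A full System Reliability Design Optimizer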

## Scope of the document

This document is the main outcome of task T5.6 "**Preliminary EDA tool-suite**", elaborated in the Description of Work (DoW) of the CLERECO project under Work Package 5 (WP5).

This document gives a commercial view of all developed tools, updated as of the date of submission. This is a preliminary version of the deliverable; a consolidated document will be produced at the end of the project.

## 1. Introduction

The CLERECO EDA tool-suite offers tools, models and technologies that cover the four main design dimensions. Figure 1 provides a high-level view of the set of available tools.



Figure 1: CLERECO EDA tool-suite

The tools are organized into four clusters, with each cluster identified by a dedicated logo:

- **RELTech Tools:** offer a set of predictive models to analyze the impact of future technology nodes on specific basic design blocks. Predictive models for technologies are very valuable for OEM and Tier 1 system designers, who usually lack access to fab technology data.
- **RelHW Tools:** offer a set of tools for the reliability analysis of different hardware architectures. These tools cover the analysis of all major hardware structures of a complex digital system (i.e., microprocessors, accelerators, memories, interconnections and custom IP cores). A very wide set of ICT players, from Tier 2 technology providers up to OEM system integrators in both the embedded systems and HPC domains, are potentially interested in these tools.
- **RELSw Tools:** offer the capability to analyze software fault masking in isolation from the hardware architecture. Both static and dynamic analysis of the software are supported. These tools have key value for OEM system designers that exploit software fault tolerance solutions to enhance the reliability of their systems.
- **RELSys Tools:** are the core of the CLERECO design methodology. They integrate information from the other tools into a high-level system model and provide tools to perform early reliability evaluation at system level, as well as tools to perform design space exploration in order to optimize the target system under the given reliability constraints. This cluster is specifically devoted to OEM system designers that need to evaluate the reliability of their products.

This document provides a commercial overview of each developed tool. The same descriptions are public on the CLERECO website. This preliminary version of the deliverable only includes the description of those tools that have reached a high level of maturity. A set of minor tools is not yet included: we are currently deciding whether to develop them further as stand-alone tools or to integrate them as functionalities of the already completed tools.

## 2. RELTech Tools

## 2.1. Soft Error Rate Technology Analyzer



## 3. RelHW Tools

## 3.1. MaFIN - Microarchitecture Level Fault Injector for x86 Intel/AMD CPUs



## 3.2. GeFIN - Microarchitecture Level Fault Injector for ARM, Intel and AMD CPUs





#### Microarchitecture Level Fault Injector for ARM, Intel and AMD CPUs

March 2016

#### **Product overview**

GeFIN is a complete microarchitecture level reliability evaluation framework for high performance and embedded computing systems. It is based on state-of-the-art statistical fault injection or ACE analysis and is built on the Gem5 full-system simulator, providing accurate results for the entire CPU and all its components.
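Statistical fault injection relies on injecting a random sample of faults rather than every possible one. The sketch below uses the finite-population sample-size formula commonly applied in the fault-injection literature; the source does not state which formula GeFIN uses, so this is an assumption, and the function name and parameters are hypothetical.

```python
import math


def fault_sample_size(population: int, margin: float = 0.01,
                      z: float = 2.576, p: float = 0.5) -> int:
    """Number of faults to inject for a given error margin and confidence.

    population: total number of possible faults (e.g. bits x cycles)
    margin:     desired error margin (0.01 means 1%)
    z:          normal quantile for the confidence level (2.576 is about 99%)
    p:          assumed proportion of failing runs; 0.5 is the worst case
    """
    if population <= 0:
        raise ValueError("population must be positive")
    n = population / (1 + margin**2 * (population - 1) / (z**2 * p * (1 - p)))
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical fault space of 10^12 (bit, cycle) pairs.
    print(fault_sample_size(10**12))
```

Under these assumptions the required sample stays in the order of tens of thousands of injections even for a very large fault space, which is what makes a sampled campaign tractable.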

#### Supported Architectures

- ARMv7, ARMv8, x86, Alpha
- Comes with ARM Cortex-A15, Cortex-A9 and Intel Haswell presets
- Most commercial embedded and high performance microarchitectures

#### Extensions & Tools

- Fully automated interface
  - Benchmark profiling and checkpointing
  - Fault-injection campaign
  - Result classification
- Extension with x86 translation caches
- Graphical web interface
  - Live status monitoring
  - Early result classification

#### Target Components

- All fields of caches (L1 data and instruction, L2, L3)
- Prefetchers of L1 data, L1 instruction, L2
- Load/Store Queue (all data fields)
- Instruction Queue (all data fields)
- ROB (active list)
- Rename map
- TLB (instruction and data)
- Branch Predictors, RAS, BTB
- Main memory

#### Supported Fault Models

- Transient
- Intermittent
- Permanent

Any multiple combination of model, component, entry and cycle.
#### Measurements

- AVF/FIT, HVF
- Fault effect classification:
  1. Masked
  2. Silent Data Corruption (SDC)
  3. Crash
  4. Assert
  5. Timeout
  6. DUE

Flexible, user-extensible parser.

Measurements in any unmodified workload.
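To show how the classified outcomes can be turned into the reported figures, here is a minimal sketch that estimates AVF as the fraction of injections that were not masked and derives a FIT value from it. This is the usual fault-injection estimate rather than a description of the tool's internals; `raw_fit_per_bit` is a hypothetical technology parameter and the function names are illustrative.

```python
from collections import Counter

OUTCOMES = ("Masked", "SDC", "Crash", "Assert", "Timeout", "DUE")


def estimate_avf(outcomes: list) -> float:
    """Fraction of injected faults whose effect was not masked."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome classes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no injection results")
    return 1.0 - counts["Masked"] / total


def estimate_fit(avf: float, raw_fit_per_bit: float, num_bits: int) -> float:
    """Derate the raw failure rate of a component by its AVF."""
    return avf * raw_fit_per_bit * num_bits


if __name__ == "__main__":
    # Hypothetical campaign: 100 injections into a 32 KiB structure.
    runs = ["Masked"] * 90 + ["SDC"] * 6 + ["Crash"] * 4
    avf = estimate_avf(runs)
    print(avf, estimate_fit(avf, raw_fit_per_bit=1e-3, num_bits=32 * 1024 * 8))
```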



- Computer Architecture Lab

#### Acceleration with efficient driven simulation

Intelligent acceleration features:

- Workload analysis: initial analysis that effectively drives fault injection only to the crucial parts; introduces a novel grouping technique.
- Simulation speedup: runtime simulation speedup with several acceleration techniques.
  - Up to 1000x faster compared to baseline fault injection.



**Contact Us** 


## 3.3. GuFI - Microarchitecture Level Reliability Evaluation of NVIDIA GPUs



#### Microarchitecture Level Reliability Evaluation of NVIDIA GPUs

#### **Product overview**

GuFI is a tool for comprehensive reliability assessment of NVIDIA GPU architectures. It is built on top of the state-of-the-art microarchitectural simulator GPGPU-Sim. It reports the vulnerability of many on-chip hardware components based on Fault Injection (FI) or Architecturally Correct Execution (ACE) analysis.

#### Supported Architectures

- G80 (Quadro FX 5600)
- GT200 (Quadro FX 5800)
- Fermi (GeForce GTX 480, Tesla C2050)

#### Extensions & Tools

Fully automated tools for:

- Fault Injection
  1. running the golden run
  2. fault mask generation
  3. actual fault injection in GPGPU-Sim
  4. fault classification (configurable parser according to user needs)
- ACE Analysis
- Both methodologies can be applied to:
  - the whole CUDA application: comprehensive reliability evaluation of a hardware component for an application
  - a specific kernel invocation: reliability evaluation of a hardware component for a given invocation of a CUDA kernel
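The fault-injection steps listed above can be sketched as follows. This is only an illustrative outline under stated assumptions: the fault mask is taken to be a random (entry, bit, cycle) triple, a run is compared against the golden output to classify it, and the actual injection inside GPGPU-Sim is not shown. Function names and the mask layout are hypothetical and do not reflect the tool's real file formats.

```python
import random


def generate_fault_masks(count: int, entries: int, bits_per_entry: int,
                         total_cycles: int, seed: int = 0) -> list:
    """Step 2: draw random fault locations for one target component."""
    rng = random.Random(seed)
    return [
        {
            "entry": rng.randrange(entries),
            "bit": rng.randrange(bits_per_entry),
            "cycle": rng.randrange(total_cycles),
        }
        for _ in range(count)
    ]


def classify(golden_output, run_output, error_detected: bool) -> str:
    """Step 4: compare a faulty run against the golden run (step 1).

    A run that reports an error or produces no output is assumed to be a
    Detectable Unrecoverable Error (DUE).
    """
    if error_detected or run_output is None:
        return "DUE"
    if run_output == golden_output:
        return "Masked"
    return "SDC"


if __name__ == "__main__":
    # Hypothetical register file: 1024 entries of 32 bits, 50000 cycles.
    for mask in generate_fault_masks(3, 1024, 32, 50_000):
        print(mask)
    print(classify([1, 2, 3], [1, 2, 4], error_detected=False))
```

Fixing the seed keeps a campaign reproducible, which helps when a run has to be repeated for debugging.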

#### Target Components

- General Purpose Register file
- Shared Memory
- Single Instruction Multiple Thread (SIMT) Stacks
- Valid bit of Instruction buffer entries

#### Supported Fault Models

- Transient
- Intermittent
- Permanent

Any multiple combination of model, component, entry and cycle.

#### Measurements

- Architectural Vulnerability Factor (AVF)
- AVF of utilized resources (AVF util)
- Failures In Time (FIT)
- Mean Instructions to Failure (MITF)
- Fault effect classification:
  1. Masked
  2. Detectable Unrecoverable Error (DUE)
  3. Silent Data Corruption (SDC)
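The metrics listed above are related by definitions that are standard in the reliability literature. The source does not spell them out, so the relations below are an assumption about how the reported figures connect; $\mathrm{FIT}_{\mathrm{raw}}$ (raw failure rate of the component), $\mathrm{IPC}$ (instructions per cycle) and $f$ (clock frequency) are symbols introduced here for illustration.

$$
\mathrm{FIT} = \mathrm{AVF} \times \mathrm{FIT}_{\mathrm{raw}}, \qquad
\mathrm{MTTF} = \frac{10^{9}}{\mathrm{FIT}}\ \text{hours}, \qquad
\mathrm{MITF} = \mathrm{IPC} \times f \times \mathrm{MTTF}
$$

In the last relation MTTF has to be expressed in seconds when $f$ is given in Hz.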



"Microarchitecture Level Reliability Evaluation of NVIDIA GPU Architectures based on Fault Injection or ACE-Analysis"

- Computer Architecture Lab

#### Ways to use GuFI

GuFI is a useful tool for both architects (early in the design phase) and programmers:

- Architects may evaluate the reliability of various GPU models.
  - Hardware-based protection techniques may be incorporated and also evaluated in terms of performance and reliability.
- Programmers can break the vulnerability of an entire application down to the vulnerability of its kernels.
  - Adding software-based error protection only to the most vulnerable kernel of an application can deliver remarkable improvements in its error resilience with a low loss in performance.
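The per-kernel breakdown can be illustrated with a small sketch. It assumes that the AVF of a component over the whole application is the cycle-weighted average of its per-kernel AVFs, and it ranks kernels by their contribution to that average; the kernel names, cycle counts and AVF values are hypothetical and not taken from any measurement.

```python
def application_avf(kernels: list) -> float:
    """Cycle-weighted average of per-kernel AVFs.

    kernels: list of (name, cycles, avf) tuples for one hardware component.
    """
    total_cycles = sum(cycles for _, cycles, _ in kernels)
    if total_cycles == 0:
        raise ValueError("no executed cycles")
    return sum(cycles * avf for _, cycles, avf in kernels) / total_cycles


def most_vulnerable_kernel(kernels: list) -> str:
    """Kernel that contributes most to the application-level AVF."""
    return max(kernels, key=lambda k: k[1] * k[2])[0]


if __name__ == "__main__":
    # Hypothetical profile of a three-kernel CUDA application.
    profile = [("init", 100, 0.02), ("compute", 800, 0.30), ("reduce", 100, 0.10)]
    print(application_avf(profile), most_vulnerable_kernel(profile))
```

Ranking by contribution rather than by raw AVF avoids protecting a kernel that is highly vulnerable but runs for a negligible share of the execution time.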





**Contact Us**

Dimitris Gizopoulos Phone: +30 210 727 5145, Fax: +30 210 727 5214 Email: dgizop AT di DOT uoa DOT gr

Version 1.0 - 25/03/2016

## 3.4. SIFI - Microarchitecture Level Reliability Evaluation of AMD Southern Islands GPGPUs



Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Stefano Di Carlo Phone: +39 011 0907080 Fax: +39 011 0907099 Email: stefano.dicarlo@polito.it

## 3.5. MASkIt - Soft Error Rate Predictor for Combination Circuits

## 3.6. NANDA - A tool for the reliability analysis of NAND Flash based SSDs



## 4. RELSw Tools

## 4.1. LIFILL - A LLVM-based software fault injector



## 4.2. LICFI - A Full features C-Based Fault Injector



**Giorgio Di Natale** Phone: +33 467 41 85 01 Email: giorgio.dinatale@lirmm.fr


## 4.3. ALIVE - A LLVM-based Lifetime Variable Analysis





Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier



**Contact Us** LIRMM - CNRS / Université Montpellier UMR 5506 - CC 477, 161 rue Ada, 34095 Montpellier Cedex 5 France

**Giorgio Di Natale** Phone: +33 467 41 85 01 Email: giorgio.dinatale@lirmm.fr

## 4.4. BalTA - Bayesian Instruction Trace Analyzer for x86 Software



## 5. RELSys Tools

## 5.1. SyRA - A full System Reliability Analyzer



## 5.2. ReDO - A full System Reliability Design Optimizer



Stefano Di Carlo Phone: +39 011 0907080 Fax: +39 011 0907099 Email: stefano.dicarlo@polito.it