

## SEVENTH FRAMEWORK PROGRAMME

## D5.5 – Preliminary EDA tool-suite (draft)

Project Number: FP7-611404

#### Authors<sup>1</sup>

A. Savino (POLITO), S. Di Carlo (POLITO), A. Vallero (POLITO), G. Politano (POLITO)

Version 1.0 – 19/03/2016

Lead contractor: Politecnico di Torino

Contact person:
Alessandro Savino
Control and Computer Engineering Dep.
Politecnico di Torino, C.so Duca degli Abruzzi, 24
I-10129 Torino TO Italy
E-mail: alessandro.savino@polito.it

Involved Partners<sup>2</sup>: POLITO

Work package: WP5

Affected tasks: T5.6

| Nature of deliverable <sup>3</sup> | R  | P  | D  | O  |
|------------------------------------|----|----|----|----|
| Dissemination level <sup>4</sup>   | PU | PP | RE | CO |


<sup>1</sup> Authors listed here only identify persons that contributed to the writing of the document.

<sup>2</sup> List of partners that contributed to the activities described in this deliverable.

<sup>3</sup> R: Report, P: Prototype, D: Demonstrator, O: Other

<sup>4</sup> **PU**: Public; **PP**: Restricted to other programme participants (including the Commission services); **RE**: Restricted to a group specified by the consortium (including the Commission services); **CO**: Confidential, only for members of the consortium (including the Commission services)

## **COPYRIGHT**

#### © COPYRIGHT CLERECO Consortium consisting of:

- Politecnico di Torino (Italy) Short name: POLITO
- National and Kapodistrian University of Athens (Greece) Short name: UoA
- Centre National de la Recherche Scientifique Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (France) - Short name: CNRS
- Intel Corporation Iberia S.A. (Spain) Short name: INTEL
- Thales SA (France) Short name: THALES
- Yogitech s.p.a. (Italy) Short name: YOGITECH
- ABB (Norway and Sweden) Short name: ABB
- Universitat Politècnica de Catalunya (Spain) Short name: UPC

#### **CONFIDENTIALITY NOTE**

THIS DOCUMENT MAY NOT BE COPIED, REPRODUCED, OR MODIFIED IN WHOLE OR IN PART FOR ANY PURPOSE WITHOUT WRITTEN PERMISSION FROM THE CLERECO CONSORTIUM. IN ADDITION TO SUCH WRITTEN PERMISSION TO COPY, REPRODUCE, OR MODIFY THIS DOCUMENT IN WHOLE OR PART, AN ACKNOWLEDGMENT OF THE AUTHORS OF THE DOCUMENT AND ALL APPLICABLE PORTIONS OF THE COPYRIGHT NOTICE MUST BE CLEARLY REFERENCED

ALL RIGHTS RESERVED.

## **INDEX**

| Section               |
|-----------------------|
| COPYRIGHT             |
| INDEX                 |
| Scope of the document |
| 1. Introduction       |
| 2. RELTech Tools      |
| 3. RelHW Tools        |
| 4. RELSw Tools        |
| 5. RELSys Tools       |

## Scope of the document

This document is the main outcome of task T5.6 "**Preliminary EDA tool-suite**", elaborated in the Description of Work (DoW) of the CLERECO project under Work Package 5 (WP5).

This document gives a commercial view of all the tools developed up to the date of its submission. This is a preliminary version of the deliverable; a consolidated document will be produced at the end of the project.

## 1. Introduction

The CLERECO EDA tool-suite offers tools, models and technologies that cover the four main design dimensions. Figure 1 provides a high-level view of the set of available tools.



Figure 1: CLERECO EDA tool-suite

The tools are organized into four clusters, with each cluster identified by a dedicated logo:

- **RELTech Tools:** offers a set of predictive models to analyze the impact of future technology nodes on specific basic design blocks. Predictive models for technologies are very valuable for OEM and Tier 1 system designers, who usually lack access to fab technology data.
- **RelHW Tools:** offers a set of tools for the reliability analysis of different hardware architectures. These tools cover the analysis of all major hardware structures of a complex digital system (i.e., microprocessors, accelerators, memories, interconnections and custom IP cores). A wide range of ICT players, from Tier 2 technology providers up to OEM system integrators, in both the embedded systems and HPC domains, are potentially interested in these tools.
- **RelSW Tools:** offers the capability to analyze software fault masking in isolation from the hardware architecture. Both static and dynamic analysis of the software are supported. These tools are of key value for OEM system designers who exploit software fault tolerance solutions to enhance the reliability of their systems.
- **RelSys Tools:** is the core of the CLERECO design methodology. It integrates information from the other tools into a high-level system model and provides tools to perform early reliability evaluation at system level, as well as tools to perform design space exploration in order to optimize the target system under the given reliability constraints. This cluster is specifically devoted to OEM system designers who need to evaluate the reliability of their products.

This document provides a commercial overview of each developed tool. The same descriptions are publicly available on the CLERECO website. This preliminary version of the deliverable only includes the description of the tools that have reached a high level of maturity. A set of minor tools is not yet included: we are currently deciding whether to improve them as stand-alone tools or to integrate them as functionalities of the already completed tools.

## 2. RELTech Tools

## 2.1. Soft Error Rate Technology Analyzer



Soft Error Rate Technology Analyzer

March 2016

#### **Product Overview**

SERTA (SER Technology Analyzer) allows a fast characterization of the raw failure rates of current and future technologies for a variety of components, such as memories (i.e., SRAMs) and the most common logic gates (i.e., NAND, NOR, NOT). It also provides a sensitivity analysis with respect to operating conditions such as temperature, voltage and location.

#### Supported Architectures

Any technology with a SPICE description

#### Target Components

- Memories
- Logic Gates

#### Extensions & Tools

- Compliant with the Hazucha and Svensson model
- Fully parameterized analysis
- Fair comparison across technologies

#### Supported Fault Models

- ✓ Soft Errors

#### Measurements

- SER values across several environmental conditions

"Technology reliability is not only about the **present**, it is about the **future** too"

- ARCO Research Group

#### Extra Features

The tool also allows a fair comparison of these technologies and components, since it uses the same methodology to compute their SER.
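SERTA is described as compliant with the Hazucha and Svensson model, which relates the SER of a circuit node to the particle flux, the sensitive area, the critical charge and the charge collection efficiency. The following is a minimal Python sketch of that relation, not SERTA's implementation: the fitting constant `k`, the helper names and the first-order approximation Qcrit ≈ C_node × Vdd (used only to mimic voltage sensitivity) are assumptions made for illustration.

```python
import math

def hazucha_svensson_ser(flux, area_cm2, q_crit_fc, q_s_fc, k=1.0):
    """SER = k * F * A * exp(-Qcrit / Qs), the Hazucha-Svensson form.

    flux:      particle flux, assumed in particles/(cm^2 * h)
    area_cm2:  sensitive area of the node in cm^2
    q_crit_fc: critical charge in fC
    q_s_fc:    charge collection efficiency in fC
    k:         technology fitting constant (user-supplied assumption)
    Returns upsets per hour under the unit assumptions above.
    """
    if q_s_fc <= 0:
        raise ValueError("charge collection efficiency must be positive")
    return k * flux * area_cm2 * math.exp(-q_crit_fc / q_s_fc)

def q_crit_from_vdd(node_cap_ff, vdd_v):
    """Hypothetical first-order approximation: Qcrit ~ C_node * Vdd (fF * V = fC)."""
    return node_cap_ff * vdd_v

def to_fit(upsets_per_hour):
    """FIT = failures per 10^9 device-hours."""
    return upsets_per_hour * 1e9

# Sensitivity to supply voltage: same node, two operating points (illustrative numbers).
for vdd in (0.8, 1.2):
    ser = hazucha_svensson_ser(flux=13.0, area_cm2=1e-8,
                               q_crit_fc=q_crit_from_vdd(1.5, vdd), q_s_fc=0.5)
    print(f"Vdd={vdd} V -> {to_fit(ser):.3e} FIT")
```

Lowering Vdd lowers the critical charge and therefore raises the SER; the location dependency enters through `flux`, which varies with altitude and latitude.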







Universitat Politècnica de Catalunya, Dep. of Computer Architecture Campus Nord UPC, Cr. Jordi Girona 1-3, 08034 Barcelona (ES)

## 3. RelHW Tools

## 3.1. MaFIN - Microarchitecture Level Fault Injector for x86 Intel/AMD CPUs



## 3.2. GeFIN - Microarchitecture Level Fault Injector for ARM, Intel and AMD CPUs





Grant Agreement FP7-611404



March 2016

#### **Product overview**

GeFIN is a complete microarchitecture-level reliability evaluation framework for high-performance and embedded computing systems. It is based on state-of-the-art statistical fault injection and ACE analysis built on the gem5 full-system simulator, providing accurate results for the entire CPU and all its components.
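Statistical fault injection keeps a campaign tractable by injecting only a sample of all possible (bit, cycle) fault sites. GeFIN's exact sampling settings are not given here; as an illustration, the sketch below uses the sample-size formula commonly adopted in statistical fault injection (Leveugle et al., DATE 2009), assuming single-bit transient faults and the conservative estimate p = 0.5. The function names and the cache example are hypothetical.

```python
import math
from statistics import NormalDist

def fault_population(n_bits, n_cycles):
    """Initial population for single-bit transient faults: every (bit, cycle) pair."""
    return n_bits * n_cycles

def sfi_sample_size(population, margin=0.01, confidence=0.99, p=0.5):
    """Injections needed for a given error margin and confidence level.

    p is the (unknown) proportion of failing faults; p = 0.5 maximizes the
    sample size and is therefore the conservative choice.
    """
    t = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided cut-off, ~2.576 for 99%
    n = population / (1 + margin ** 2 * (population - 1) / (t ** 2 * p * (1 - p)))
    return math.ceil(n)

# Hypothetical target: a 32 KiB L1 data cache observed over 10^6 cycles.
N = fault_population(32 * 1024 * 8, 10 ** 6)
print(sfi_sample_size(N))                    # 99% confidence, 1% margin
print(sfi_sample_size(N, confidence=0.95))   # fewer injections at 95%
```

For very large populations the result saturates at roughly t²·p(1−p)/e², which is why a few thousand injections can characterize a structure with billions of possible fault sites.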

#### Supported Architectures

- ARMv7, ARMv8, x86, Alpha
- Comes with ARM Cortex-A15, Cortex-A9 and Intel Haswell presets
- Most commercial embedded and high-performance microarchitectures

#### Extensions & Tools

- Fully automated interface
  - Benchmark profiling and checkpointing
  - Fault-injection campaign
  - Result classification
- Extension with x86 Translation caches
- Graphical web interface
  - Live status monitoring
  - Early result classification

#### Target Components

- Physical Register File (Int, FP, CC)
- All fields of caches (L1 data and instruction, L2, L3)
- Prefetchers of L1 data, L1 instruction, L2
- Load/Store Queue (all data fields)
- Instruction Queue (all data fields)
- ROB (active list)
- Rename map
- TLB (instruction and data)
- Branch Predictors, RAS, BTB
- Main memory

#### Supported Fault Models

- **Transient**
- Intermittent
- **Permanent**

Any multiple combination of model, component, entry and cycle.

#### Measurements

- AVF/FIT, HVF
- Fault effect classification:
  1. Masked
  2. Silent Data Corruption (SDC)
  3. Crash
  4. Assert
  5. Timeout
- Flexible, user-extensible parser
- Measurements on any unmodified workload

"100 to 1000 times faster microarchitecture level reliability assessments for Intel/AMD x86 and ARM processors"

- Computer Architecture Lab

#### Acceleration with efficient driven simulation

Intelligent acceleration features:

- **Workload analysis:** an initial analysis effectively drives fault injection only to the crucial parts, using a novel grouping technique.
- **Simulation speedup:** runtime simulation speedup through several acceleration techniques.
- Up to 1000x faster than baseline fault injection.







## 3.3. GuFI - Microarchitecture Level Reliability Evaluation of NVIDIA GPUs







#### Microarchitecture Level Reliability Evaluation of NVIDIA GPUs

March 2016

#### **Product overview**

GUFI is a tool for comprehensive reliability assessments of NVIDIA GPU architectures. It is built on top of GPGPU-Sim, a state-of-the-art microarchitectural simulator. It reports the vulnerability of many on-chip hardware components based on Fault Injection (FI) or Architecturally Correct Execution (ACE) analysis.

#### Supported Architectures

- G80 (Quadro FX 5600)
- GT200 (Quadro FX 5800)
- Fermi (GeForce GTX 480, Tesla C2050)

#### **Extensions & Tools**

Fully automated tools for:

- Fault Injection
  - 1. running the golden run
  - 2. fault mask generation
  - 3. actual fault injection in GPGPU-Sim
  - 4. fault classification (Configurable parser according to user needs)
- ACE Analysis
- Both methodologies can be applied to:
  - the whole CUDA application: comprehensive reliability evaluation of a hardware component for an application
  - a specific kernel invocation: reliability evaluation of a hardware component for a given invocation of a CUDA kernel
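The fault mask generation step can be pictured as drawing random fault sites whose injection cycle falls within the length of the golden run. The following Python sketch is only illustrative: GUFI's actual mask format is not described in this document, so the `FaultSite` fields and the `generate_fault_masks` helper are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class FaultSite:
    component: str   # e.g. "register_file" (hypothetical label)
    entry: int       # entry index within the component (e.g. a register)
    bit: int         # bit position within the entry
    cycle: int       # injection cycle, drawn from the golden run length
    model: str       # "transient", "intermittent" or "permanent"

def generate_fault_masks(component, entries, bits, golden_cycles, n_masks,
                         model="transient", seed=0):
    """Draw n_masks fault sites uniformly over (entry, bit, cycle)."""
    rng = random.Random(seed)  # seeded so that a campaign can be reproduced
    return [
        FaultSite(component, rng.randrange(entries), rng.randrange(bits),
                  rng.randrange(golden_cycles), model)
        for _ in range(n_masks)
    ]

# Example: 5 masks for a 64-entry, 32-bit register file; the golden run is
# assumed to last 10,000 cycles.
for site in generate_fault_masks("register_file", 64, 32, 10_000, 5):
    print(site)
```

Multiple-fault scenarios (any combination of model, component, entry and cycle) could be obtained by grouping several such sites into one mask.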

#### Target Components

- General Purpose Register file
- Shared Memory
- Single Instruction Multiple Thread (SIMT) Stacks
- Valid bit of Instruction buffer entries

#### Supported Fault Models

- Transient
- Intermittent
- Permanent

Any multiple combination of model, component, entry and cycle.

#### Measurements

- Architectural Vulnerability Factor (AVF)
- AVF of utilized resources (AVF util)
- Failures In Time (FIT)
- Mean Instructions to Failure (MITF)
- Fault effect classification:
  1. Masked
  2. Detectable Unrecoverable Error (DUE)
  3. Silent Data Corruption (SDC)
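The metrics listed above are related to each other. A common way to derive them from a fault-injection campaign is sketched below in Python. It assumes that both DUE and SDC outcomes count as vulnerable, that the raw per-bit FIT rate is supplied by the user (for example by a technology-level tool), and it uses the MITF definition commonly found in the literature (instructions per second divided by failures per second). These are assumptions, not a description of GUFI's internals, and the numbers are illustrative.

```python
from collections import Counter

HOURS_TO_SECONDS = 3600.0

def avf_from_outcomes(outcomes):
    """AVF estimate: fraction of injections that were not masked (DUE + SDC)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no injections")
    return (counts["DUE"] + counts["SDC"]) / total

def component_fit(avf, raw_fit_per_bit, n_bits):
    """Derated FIT of a component: AVF x raw FIT per bit x number of bits."""
    return avf * raw_fit_per_bit * n_bits

def mitf(ipc, freq_hz, fit):
    """Mean Instructions To Failure = instructions per second / failures per second."""
    failures_per_second = fit / (1e9 * HOURS_TO_SECONDS)  # FIT = failures per 1e9 hours
    return ipc * freq_hz / failures_per_second

outcomes = ["Masked"] * 7 + ["SDC"] * 2 + ["DUE"]
avf = avf_from_outcomes(outcomes)                  # 0.3
fit = component_fit(avf, raw_fit_per_bit=1e-4, n_bits=32 * 1024 * 8)
print(avf, fit, mitf(ipc=1.5, freq_hz=1.4e9, fit=fit))
```

MITF rewards designs that trade a slightly higher AVF for much higher throughput, which AVF or FIT alone cannot capture.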



"Microarchitecture Level Reliability Evaluation of NVIDIA GPU Architectures based on Fault Injection or ACE-Analysis"

- Computer Architecture Lab

#### Ways to use GUFI

GUFI is a useful tool both for architects (early in the design phase) and for programmers.

- Architects may evaluate the reliability of various GPU models.
  - Hardware based protection techniques may be incorporated and also evaluated in terms of performance and reliability.
- Programmers can break the vulnerability of an entire application down to the vulnerability of its kernels.
  - Adding software-based error protection only to the most vulnerable kernel of an application can remarkably improve its error resilience with a low performance loss.







Dimitris Gizopoulos Phone: +30 210 727 5145, Fax: +30 210 727 5214 Email: dgizop AT di DOT uoa DOT gr

## 3.4. SIFI - Microarchitecture Level Reliability Evaluation of AMD Southern Islands GPGPUs









Microarchitecture Level Reliability Evaluation of AMD Southern Islands GPGPUs

March 2016

#### **Product Overview**

SIFI is a tool for comprehensive reliability assessments of AMD Southern Islands GPGPU architectures. It is built on top of Multi2Sim, a state-of-the-art microarchitectural simulator. It can analyze the architectural vulnerability of many on-chip hardware components by Fault Injection (FI) and Architecturally Correct Execution (ACE) analysis.

"SIFI is tailored to hardware architects and programmers"

- Testgroup (Polito)

#### Supported Architectures

AMD Southern Islands GPGPU Architectures

#### Extensions & Tools

- Fully automated tools for:
  - Fault injection
  - ACE Analysis
- Fully customizable GPGPU architectures for design exploration
- Reliability-related vulnerable code analysis

#### Target Components

- General Purpose Vector Register File
- General Purpose Scalar Register File
- Special Registers of the Scalar Register File
- Local Memory

#### Supported Fault Models

- **Transient**
- Intermittent
- **Permanent**

#### How To Use SIFI

- Fault injection campaigns can run in two different modes:
  1. by component
  2. by specific internal resources of each component

#### Measurements

- Architectural Vulnerability Factor (AVF)
- AVF of utilized resources (AVF Util)
- Failure In Time (FIT)
- Mean Instruction To Failure (MITF)
- Fault effect classification:
  1. Masked
  2. Detectable Unrecoverable Error (DUE)
  3. Silent Data Corruption (SDC)
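The two campaign modes and the Masked/DUE/SDC classification can be sketched as follows. This Python fragment is a hypothetical illustration, not SIFI's code: it assumes that each run reports its output plus crash, timeout and detection flags, and it groups results either per component or per (component, internal resource), mirroring the two modes described above. The component and resource labels are made up.

```python
from collections import Counter, defaultdict

def classify(golden_output, faulty_output, crashed=False, timed_out=False, detected=False):
    """Hypothetical outcome rule for one injection run."""
    if crashed or timed_out or detected:
        return "DUE"       # the error became visible: crash, hang or explicit detection
    return "Masked" if faulty_output == golden_output else "SDC"

def summarize(results, mode="component"):
    """results: iterable of (component, resource, outcome).
    mode="component": one row per component; mode="resource": one row per (component, resource)."""
    if mode not in ("component", "resource"):
        raise ValueError("mode must be 'component' or 'resource'")
    table = defaultdict(Counter)
    for component, resource, outcome in results:
        key = component if mode == "component" else (component, resource)
        table[key][outcome] += 1
    return {key: dict(counts) for key, counts in table.items()}

def avf(counts):
    """AVF estimate of one row: non-masked runs over all runs."""
    total = sum(counts.values())
    return (counts.get("DUE", 0) + counts.get("SDC", 0)) / total

golden = [1, 2, 3]
results = [
    ("VectorRF", "v0", classify(golden, [1, 2, 3])),
    ("VectorRF", "v1", classify(golden, [1, 2, 4])),
    ("ScalarRF", "s0", classify(golden, None, crashed=True)),
]
for key, counts in summarize(results).items():
    print(key, counts, "AVF =", avf(counts))
```

Switching `mode` to `"resource"` reuses the same runs to expose which internal resource of a component dominates its vulnerability.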



POLITECNICO **DI TORINO** 

Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

## 3.5. MASkIt - Soft Error Rate Predictor for Combinational Circuits





UPC

Universitat Politècnica de Catalunya, Dep. of Computer Architecture Campus Nord UPC, Cr. Jordi Girona 1-3, 08034 Barcelona (ES)

> Martí Anglada Phone: +34-934016988 Email: manglada@ac.upc.edu

## 3.6. NANDA - A tool for the reliability analysis of NAND Flash based SSDs



A tool for the reliability analysis of NAND Flash based SSDs.

March 2016

#### **Product Overview**

NAND Analyzer is a product for the design analysis of NAND Flash based Solid State Drives (SSDs). It includes models for:

- Flash memory error rate prediction based on the workload
- Wear-out analysis
- ECC scheme analysis

#### Extensions & Tools

- Workload-based characterization
- Different ISPP programming algorithms
- Different ECC configurations
- Support for the YAFFS2 file system
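ECC scheme analysis often reduces to asking how often a codeword collects more bit errors than the code can correct. The sketch below assumes independent bit errors at a given raw bit error rate (which, in a full model, would come from the workload and wear-out analysis) and a code that corrects up to `t` errors per codeword. It is a textbook binomial model written in Python, not NANDA's internal model, and the parameter values are illustrative.

```python
import math

def _log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1-p)^(n-k), computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def uncorrectable_prob(raw_ber, codeword_bits, t_correctable):
    """P(more than t bit errors in a codeword), assuming independent bit errors."""
    if not 0.0 <= raw_ber <= 1.0:
        raise ValueError("raw_ber must be in [0, 1]")
    if raw_ber == 0.0 or t_correctable >= codeword_bits:
        return 0.0
    if raw_ber == 1.0:
        return 1.0
    tail = sum(math.exp(_log_binom_pmf(codeword_bits, k, raw_ber))
               for k in range(t_correctable + 1, codeword_bits + 1))
    return min(tail, 1.0)

# Compare ECC strengths for a hypothetical 1 KiB codeword (parity bits ignored).
for t in (4, 8, 16):
    print(t, uncorrectable_prob(raw_ber=1e-4, codeword_bits=1024 * 8, t_correctable=t))
```

Summing the tail directly, rather than computing one minus the correctable probability, keeps very small failure probabilities from being rounded to zero.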

#### Target Components

- SLC NAND Flash Memories
- MLC NAND Flash Memories

#### Supported Fault Models

- ✓ Intermittent
- ✓ Permanent

#### How To Use NAND Analyzer

- Configure your SSD characteristics
- Explore different design dimensions

"Evaluate error-rate and **lifetime** of your SSD storage system"

- Testgroup (Polito)

#### Measurements

- Bit Error Rates
- Timing
- Power consumption
- Full statistics report







POLITECNICO DI TORINO

Contact Us

Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Stefano Di Carlo Phone: +39 011 0907080 Fax: +39 011 0907099 Email: stefano.dicarlo@polito.it

## 4. RELSw Tools

## 4.1. LIFILL - An LLVM-based software fault injector



#### An LLVM-based software fault injector

March 2016

#### **Product Overview**

LIFILL (LIrmm Fault Injection LLVM-based) injects faults into both the data and the instructions of LLVM code. The LLVM code is modified by applying mutations that implement the effect of the fault on the targeted variable or instruction.

"We provide you with the **passcode** to the **reliability** of any software you develop"

- LIRMM (CNRS)

#### Supported Architectures

**Any** language provided with an LLVM compiler.

#### Extensions & Tools

- Fully hardware-independent
- Control over the fault location and its effects

#### **Target Components**

- Any data (variables, vectors, etc.)
- Any standard LLVM instruction.

#### Supported Fault Models

CLERECO developed Software Fault Models (SFM):

- ✓ Wrong Data
- ✓ Instruction Replacement

#### Measurements

- Masking probability
- Fault Silent Violation (FSV)
- Crashed
- Detected Faults

#### System Requirements:

- OS: Linux
- Tools: clang/llvm
- RAM: 4GB





Contact Us
LIRMM - CNRS / Université Montpellier
UMR 5506 - CC 477,
161 rue Ada, 34095 Montpellier Cedex 5 France

Giorgio Di Natale Phone: +33 467 41 85 01 Email: giorgio.dinatale@lirmm.fr

## 4.2. LICFI - A Full-Featured C-Based Fault Injector





#### A Full-Featured C-Based Fault Injector

March 2016

#### **Product Overview**

LICFI (LIrmm C-Based Fault Injector) randomly injects faults into both the data and the instructions of a program written in C. Injections are performed randomly and dynamically while the program is running.

"The only feasible way to prove your C program is **reliable** is testing it, quickly"

- LIRMM (CNRS)

#### Supported Architectures

The tool supports all C language programs.

#### Extensions & Tools

- Hardware independent
- Instrumentation of the original source code, which offers efficient observability of the software components
- Executes on the final executable file
- Easy fault injection mechanism
- Multi-threaded implementation

#### **Target Components**

- Any data (variables, vectors, etc.)
- Any **standard** C instruction

#### Supported Fault Models

CLERECO developed Software Fault Models (SFM):

- ✓ Wrong Data
- ✓ Instruction Replacement
#### Measurements

- Masking probability
- Fault Silent Violation (FSV)
- Crashed
- **Detected Faults**
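The outcome classes listed above can be illustrated with a small, self-contained campaign. LICFI instruments real C code; this Python sketch only mimics the flow of injecting random single-bit Wrong Data faults while a toy program runs and classifying each run. The toy program, its parity check and the variable names `idx` and `total` are hypothetical.

```python
import random

class FaultDetected(Exception):
    """Raised by the toy program's own software check (a parity test)."""

def target_program(data, fault=None):
    """Toy target: sums `data`, then verifies the parity of the result.

    fault: optional (step, var, bit) triple; flips `bit` of variable `var`
    ("idx" or "total") at loop iteration `step`, while the program runs.
    """
    total, parity = 0, 0
    for step in range(len(data)):
        idx = step
        if fault is not None and fault[0] == step and fault[1] == "idx":
            idx ^= 1 << fault[2]             # Wrong Data on the loop index
        total += data[idx]                   # an out-of-range idx raises IndexError
        parity ^= data[step] & 1             # expected parity of the fault-free sum
        if fault is not None and fault[0] == step and fault[1] == "total":
            total ^= 1 << fault[2]           # Wrong Data on the accumulator
    if (total & 1) != parity:
        raise FaultDetected("parity mismatch")
    return total

def classify_run(data, golden, fault):
    """Map one faulty run to Masked / FSV / Crashed / Detected."""
    try:
        result = target_program(data, fault)
    except FaultDetected:
        return "Detected"
    except Exception:
        return "Crashed"
    return "Masked" if result == golden else "FSV"

def campaign(data, n_faults, bits=8, seed=0):
    """Randomly inject n_faults single-bit faults and count the outcomes."""
    rng = random.Random(seed)
    golden = target_program(data)
    counts = {"Masked": 0, "FSV": 0, "Crashed": 0, "Detected": 0}
    for _ in range(n_faults):
        fault = (rng.randrange(len(data)), rng.choice(("idx", "total")), rng.randrange(bits))
        counts[classify_run(data, golden, fault)] += 1
    return counts

counts = campaign([3, 3, 5, 7], n_faults=1000)
print(counts, "masking probability:", counts["Masked"] / 1000)
```

A flip in `idx` that lands on an element with the same value is masked, one that leaves the array crashes, and a flip in the low bit of `total` is caught by the parity check; higher-bit flips in `total` escape silently.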

#### **Key Concepts**

Instrumentation of the original code allows a selective analysis of the code.

#### System Requirements:

- OS: Linux
- Tools: clang/llvm
- Libraries: pthread
- RAM: 4GB





Giorgio Di Natale Phone: +33 467 41 85 01 Email: giorgio.dinatale@lirmm.fr

## 4.3. ALIVE - An LLVM-based Lifetime Variable Analysis



### An LLVM-based Lifetime Variable Analysis

March 2016

#### **Product Overview**

ALIVE evaluates the effect of faults in all the variables of a generic software program by analyzing each variable's lifetime and its propagation to the program output.

"Before asking if a single fault will impact your system, ask if it will be seen at all"

- LIRMM (CNRS)

#### Supported Architectures

The tool supports **all** programming languages included in the **LLVM** set of compilers.

#### Extensions & Tools

- Very fast evaluation: a single run is enough to provide effective results
- Time accurate
- Accounts for all possible masking effects
- Supports software error protection strategies

#### **Target Components**

- Single variables
- Basic structures (e.g., vectors and matrices)
- Advanced structures (e.g., unions, multi-type containers)

#### Supported Fault Models

CLERECO developed Software Fault Models (SFM):

- ✓ Wrong Data
- ✓ Instruction Replacement

#### Measurements

- Masking probability
- Fault Silent Violation (FSV)
- Crashed
- Detected Faults

#### **Key Concepts**

#### Variable Lifetime analysis:

- A variable is alive from a write to its last read before the next write
  - A fault in an alive variable can influence the program execution
  - A fault in a dead variable is masked (it will be either overwritten or never used again)
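The lifetime rule above can be computed from a single access trace. The following Python sketch is a minimal illustration under stated assumptions, not ALIVE's LLVM-based implementation: the `(time, var, kind)` trace format is hypothetical, a read with no preceding write is assumed to see an initial value set at time 0, and faults are assumed to strike at a uniformly random time.

```python
from collections import defaultdict

def alive_intervals(trace):
    """Compute lifetime intervals from a time-ordered access trace.

    trace: iterable of (time, var, kind) with kind "W" (write) or "R" (read).
    Returns {var: [(start, end), ...]}: a fault hitting `var` at time t with
    start <= t < end corrupts a value that is read later; outside these
    intervals the variable is dead and the fault is masked.
    """
    intervals = defaultdict(list)
    last_write = {}   # var -> time of the most recent write
    last_read = {}    # var -> time of the last read since that write

    def close(var):
        # Assumption: a read with no preceding write sees an initial value set at t=0.
        if var in last_read:
            intervals[var].append((last_write.get(var, 0), last_read.pop(var)))

    for time, var, kind in trace:
        if kind == "W":
            close(var)              # the previous value is dead from here on
            last_write[var] = time
        elif kind == "R":
            last_read[var] = time
        else:
            raise ValueError(f"unknown access kind {kind!r}")
    for var in list(last_read):     # close the intervals still open at the end of the trace
        close(var)
    return dict(intervals)

def masking_probability(intervals, var, total_time):
    """Probability that a fault at a uniformly random time in [0, total_time) is masked."""
    alive = sum(end - start for start, end in intervals.get(var, []))
    return 1.0 - alive / total_time

trace = [(0, "x", "W"), (2, "x", "R"), (5, "x", "R"),
         (6, "x", "W"),                  # overwritten before any read: dead
         (9, "x", "W"), (10, "x", "R")]
iv = alive_intervals(trace)
print(iv, masking_probability(iv, "x", total_time=12))   # x alive 6 of 12 time units
```

A single pass over the trace is enough, which is consistent with the single-run evaluation claimed for the tool.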





Contact Us LIRMM - CNRS / Université Montpellier UMR 5506 - CC 477, 161 rue Ada, 34095 Montpellier Cedex 5 France

> **Giorgio Di Natale** Phone: +33 467 41 85 01 Email: giorgio.dinatale@lirmm.fr

## 4.4. BaITA - Bayesian Instruction Trace Analyzer for x86 Software









Bayesian Instruction Trace Analyzer for x86 Software

March 2016

#### **Product Overview**

BaITA is a reliability analyzer of software instruction traces based on Bayesian networks. It provides a very fast analysis of any software for the x86 Instruction Set Architecture (ISA) by exploring real execution traces, without the need for the original sources.

"The **only** way to prove your running software is really reliable"

- Testgroup (Polito)

#### Supported Architectures

The tool is able to parse:

- x86 standard instructions
- **AMD** extensions
- **SSE1** & **SSE2** extensions
- **MMX** instructions

#### Extensions & Tools

- Fully automated analysis:
  - Data propagation
  - Control flow generation
- Fully customizable internal parser
- Multi-thread analysis capability
- Reliability model for further investigation provided as output

#### Target Components

- System Registers
  - ES, SS, DS, CS, ...
  - EIP, EDI, ...
- General Purpose Registers
  - EAX, EBX, ...
  - r1x, r2x, ...
- Floating Point Registers
- MMX registers
- **All** addressable Memory Locations

#### Supported Fault Models

- Transient
- Intermittent
- Permanent

#### Measurements

- AVF/FIT
- Single target error probability
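The data propagation step can be illustrated on a parsed trace. The Python sketch below is a deliberately simplified stand-in for BaITA's Bayesian engine: each executed instruction is reduced to a hypothetical `(dst, srcs)` pair, and the probability that a destination is corrupted is combined from its sources with a noisy-OR rule, which assumes that sources are independent. The optional per-instruction propagation probabilities are also an assumption.

```python
def propagate(trace, fault_loc, fault_index, prop_prob=None):
    """Probability that each location is corrupted at the end of the trace.

    trace:       list of (dst, srcs) pairs, one per executed instruction
    fault_loc:   register/memory location hit just before trace[fault_index]
    prop_prob:   optional {instruction_index: probability that a corrupted
                 source corrupts the destination}; defaults to 1.0
    Uses a noisy-OR combination, i.e. sources are treated as independent.
    """
    p = {fault_loc: 1.0}
    for i in range(fault_index, len(trace)):
        dst, srcs = trace[i]
        q = 1.0 if prop_prob is None else prop_prob.get(i, 1.0)
        p_clean = 1.0
        for src in srcs:                       # uses source values before dst is overwritten
            p_clean *= 1.0 - q * p.get(src, 0.0)
        p_dst = 1.0 - p_clean
        if p_dst > 0.0:
            p[dst] = p_dst
        else:
            p.pop(dst, None)                   # dst overwritten with clean data: fault masked there
    return p

# eax <- ebx ; ecx <- eax + edx ; ebx <- constant
trace = [("eax", ("ebx",)), ("ecx", ("eax", "edx")), ("ebx", ())]
print(propagate(trace, "ebx", 0))                    # fault in ebx before the first instruction
print(propagate(trace, "ebx", 0, prop_prob={1: 0.5}))
print(propagate(trace, "ebx", 2))                    # ebx is overwritten: fault masked
```

A full Bayesian network would also capture correlations between sources and control-flow effects, which this sketch ignores.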

#### Extra Features

- Cross-Platform Implementation
- Easy compilation using **CMake**
- Fully customizable parser
- Extendible Target component description
- Compatible with CLERECO MaFIN and GeFIN tools

#### System Requirements

- OS: Linux, OS X 10.8 or later
- Libraries: SMILE
- RAM: 4GB
- Tools: CMake, Bison, Flex



POLITECNICO DI TORINO

Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Stefano Di Carlo Phone: +39 011 0907080 Fax: +39 011 0907099 Email: stefano.dicarlo@polito.it

## 5. RELSys Tools

## 5.1. SyRA - A full System Reliability Analyzer



#### **Product Overview**

SyRA automates the reliability analysis of complex electronic systems by means of component-based statistical reliability models. SyRA lets designers model the target system in terms of components (technology, hardware and software) and, resorting to the CLERECO Bayesian reliability engine, efficiently analyze how faults and errors propagate through components, accounting for complex interactions among them that simpler statistical models do not capture.

"You don't need to know that your system is reliable, you need to prove it!"

- Testgroup (Polito)

#### Supported Architectures

- Microprocessor architectures supported through other CLERECO tools (ARM Cortex-A9, ARM Cortex-A15, x86\_64)
- Single/Multicore architectures
- Single/Multithread applications

#### Target Components

- All hardware components and subcomponents
- All functions of the OS and the software

#### Extensions & Tools

- Full system stack analyzed (from technology to the application software)
- Detailed hardware and software description
- Monte Carlo simulation to account for uncertainty on the reliability parameters of the single components
- Very fast analysis for early design exploration
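The Monte Carlo treatment of parameter uncertainty can be sketched in a few lines. The Python fragment below is not the CLERECO Bayesian engine: it assumes that each component has an error probability known only within a range (sampled uniformly) and an influence probability on the system output, and that component errors are independent and combine as a noisy-OR. All component values are hypothetical.

```python
import random
from statistics import mean, quantiles

def system_failure_prob(components):
    """components: list of (error probability, influence probability on the system output).
    Assumes component errors are independent and combine as a noisy-OR."""
    p_ok = 1.0
    for p_err, influence in components:
        p_ok *= 1.0 - p_err * influence
    return 1.0 - p_ok

def monte_carlo(spec, n_samples=10_000, seed=0):
    """spec: list of (p_low, p_high, influence). Each error probability is drawn from
    Uniform(p_low, p_high) to represent uncertainty on that component's parameters."""
    rng = random.Random(seed)
    samples = [
        system_failure_prob([(rng.uniform(lo, hi), infl) for lo, hi, infl in spec])
        for _ in range(n_samples)
    ]
    cuts = quantiles(samples, n=20)      # 19 cut points: 5%, 10%, ..., 95%
    return {"mean": mean(samples), "p05": cuts[0], "p95": cuts[-1]}

# Hypothetical three-component system: processor, memory, application software.
spec = [(1e-4, 5e-4, 0.30), (2e-4, 8e-4, 0.10), (1e-5, 1e-4, 0.90)]
print(monte_carlo(spec))
```

Reporting a 5% to 95% band instead of a single number makes it visible when the uncertainty on one component dominates the system-level estimate.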

#### Supported Fault Models

- √ Transient
- **Permanent**

#### Measurements

- AVF/FIT
- Influence Probability

#### **Key Features**

- The model is highly parameterized: any factor that can potentially affect the reliability of the system (e.g., environmental factors such as location and temperature) can be included by simply adding new variables to the model.
- Full GUI available

#### System Requirements

- OS: Linux, OS X 10.8 or later
- Libraries: SMILE, QT, Boost
- RAM: 4GB



POLITECNICO DI TORINO

## 5.2. ReDO - A full System Reliability Design Optimizer



#### A full System Reliability Design Optimizer

March 2016

#### **Product Overview**

ReDO improves the reliability of your system design by selecting the best combination of components (technology, hardware and software) to meet your design constraints. ReDO lets you explore hundreds of design alternatives automatically. It features an advanced optimization algorithm inspired by the Extremal Optimization evolutionary strategy, and it is based on the CLERECO Bayesian reliability engine.
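The Extremal Optimization idea can be sketched generically. The following Python fragment implements the classic τ-EO loop (rank components by local fitness, pick a bad one with probability proportional to rank^-τ, replace its choice unconditionally, remember the best design seen) on a toy objective. It is not ReDO's algorithm: the objective (independent component failures plus a weighted cost), the per-component fitness and the value of `tau` are all assumptions chosen for illustration.

```python
import random

def objective(design, alternatives, weight):
    """Lower is better: system failure probability (independent components) + weight * total cost."""
    p_ok, cost = 1.0, 0.0
    for slot, choice in design.items():
        p_fail, c = alternatives[slot][choice]
        p_ok *= 1.0 - p_fail
        cost += c
    return (1.0 - p_ok) + weight * cost

def local_badness(slot, design, alternatives, weight):
    """Per-component fitness used for ranking: higher means a worse current choice."""
    p_fail, c = alternatives[slot][design[slot]]
    return p_fail + weight * c

def tau_eo(alternatives, weight=0.01, tau=1.4, iterations=500, seed=0):
    rng = random.Random(seed)
    slots = list(alternatives)
    design = {s: rng.randrange(len(alternatives[s])) for s in slots}
    best, best_val = dict(design), objective(design, alternatives, weight)
    rank_weights = [k ** -tau for k in range(1, len(slots) + 1)]   # P(rank k) ~ k^-tau
    for _ in range(iterations):
        ranked = sorted(slots, key=lambda s: local_badness(s, design, alternatives, weight),
                        reverse=True)                              # worst component first
        victim = rng.choices(ranked, weights=rank_weights, k=1)[0]
        design[victim] = rng.randrange(len(alternatives[victim]))  # unconditional move
        val = objective(design, alternatives, weight)
        if val < best_val:                                         # EO only remembers the best
            best, best_val = dict(design), val
    return best, best_val

# Hypothetical alternatives per component: (failure probability, cost).
alternatives = {
    "cpu": [(0.10, 1.0), (0.01, 3.0)],
    "mem": [(0.05, 1.0), (0.001, 2.0)],
    "sw":  [(0.02, 0.0), (0.005, 0.5)],
}
print(tau_eo(alternatives))
```

Unlike greedy search, EO never refuses a move, so it keeps exploring; the power-law choice biases those moves toward the components that currently hurt the objective most.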

"There is only one way to optimization: be sure you are getting only the **best** of all."

- Testgroup (Polito)

#### Supported Architectures

- Microprocessor architectures supported through other CLERECO tools (ARM Cortex-A9, ARM Cortex-A15, x86\_64)
- Single/Multicore architectures
- Single/Multithread applications

#### Extensions & Tools

- Full system stack optimized (from technology to the application software)
- Very fast design exploration
- Full design exploration logged
- Multi-objective optimization functions
- Maximum optimization time definable by the user based on early stop conditions

#### Target Components

Full optimization for:

- All hardware components and subcomponents
- All functions of the OS and the software

#### Optimization Parameters:

- Reliability
- **Time**
- Area
- **Power Consumption**
- ... any parameter that can be described and evaluated.

#### Measurements

- AVF/FIT
- Percentage of improvement
- Influence Probability

#### Key Features

- Full design optimization with support for user-defined objective functions
- Full GUI available

#### System Requirements

- OS: Linux, OS X 10.8 or later
- Libraries: SMILE, QT, Boost
- RAM: 4GB





POLITECNICO DI TORINO

Contact Us

Stefano Di Carlo Phone: +39 011 0907080 Fax: +39 011 0907099 Email: stefano.dicarlo@polito.it