Some pictures collected at the CLERECO stand during DATE 2014 in Dresden, Germany.
Foutris, N.; Gizopoulos, D.; Chatzidimitriou, A.; Kalamatianos, J.; Sridharan, V., in Proceedings of the 10th IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE 2014), Stanford, CA, USA, April 1-2, 2014.
This WP includes all activities related to the coordination of both technical and administrative work among the CLERECO project partners, and between the partners and the EC. Internal and external information flows will be defined, implemented, and used for this purpose.
Leader: POLITO
Participants: UoA, CNRS, INTEL, THALES, YOGITECH, ABB
Disseminating the CLERECO results to selected groups of interested parties and exploiting the project results at the industrial and academic levels are key objectives of the CLERECO project, pursued through the tasks of this WP. Given the importance of these activities, a detailed dissemination plan and an exploitation strategy will be developed and continuously updated to reflect the results obtained in the research WPs.
Leader: INTEL
Participants: POLITO, UoA, CNRS, THALES, YOGITECH, ABB
WP6 is responsible for validating the CLERECO early reliability estimation methodology and for demonstrating the project results, exploiting the deliverables and methods produced in WP3, WP4, and WP5.
Two main activities will be carried out within this WP. Although most of the WP6 workload falls at the end of the project, the evaluation methodology will be defined during the first half of the project. The first activity will focus on automating all algorithms and methods defined within WP6 for the CLERECO reliability analysis in a prototype Electronic Design Automation (EDA) tool suite. This task is essential for applying the CLERECO methods to real test cases. The second activity will focus on defining realistic use cases on which the CLERECO concepts can be efficiently demonstrated and validated. This second activity will be steered by the CLERECO industrial partners, who will cooperate in defining relevant application examples.
The reliability of the selected use cases will be analyzed with the developed EDA tool suite at different design stages, considering different sets of available information and thus reproducing realistic situations typical of a product design cycle. Reliability results obtained with the CLERECO method will be constantly compared with reliability measures obtained through traditional, extensive (and inherently costly and time-consuming) fault-injection campaigns, as well as laser and electromagnetic (EM) injection experiments. This will enable the CLERECO partners to assess the accuracy of the CLERECO estimates.
Finally, the CLERECO design optimization heuristics will be used to show how the project results can help designers optimize their systems and achieve better performance.
Leader: THALES
Participants: POLITO, UoA, CNRS, INTEL, YOGITECH, ABB
WP5 contains the core activities of this project. Descriptions of the target systems and their parameters will be integrated into a comprehensive statistical model that will be used to estimate the reliability metrics defined in WP2, iteratively across the different design stages of the system. Alongside reliability assessment, WP5 includes research on algorithms that give designers effective tools for reliability-related decision making, which will in turn enable the design of reliable systems with improved cost-related characteristics (area, energy/power, and performance) and reduced time to market (TTM).
The leading concepts that will be pursued are:
WP5 also plays a key role in harmonizing the research activities of this project. WP2, WP3, WP4, and WP5 are closely related and require an intensive exchange of information to achieve their goals. This information must be properly represented and standardized to guarantee easy and reliable circulation and integration among tasks; WP5 is in charge of this through a dedicated task.
Finally, due to the complexity of the activities performed in this WP, preliminary and intermediate results must be validated continuously. Research activities within WP5 will therefore alternate between developing solutions and performing preliminary validation on simple cases.
Leader: POLITO
Participants: UoA, CNRS, INTEL, THALES, YOGITECH, ABB
Similarly to WP3, WP4 iteratively breaks down the software stack into its basic components (from high-level application software modules down to the instruction set architecture level), which will be characterized from the reliability standpoint.
To enable early reliability estimation, software analysis must be possible at early system design stages, even when the target platform is not yet defined. To meet this requirement, WP4 aims to define metrics and models that abstract the behavior of the software independently of the specific hardware architecture of the system.
Several activities will be addressed:
Once all these issues are covered, we will analyze each level of software components: system software, selected drivers, and applications. This analysis will be the foundation for building a set of characterized software modules to be used in WP5 and WP6.
Finally, WP4 will also produce a preliminary library of characterized modules that will be used for the validation and demonstration activities of this project. As in WP3, building a fully comprehensive library of components is beyond the scope of this project; we will only show the path for analyzing future use cases.
Leader: CNRS
Participants: POLITO, UoA, INTEL, THALES, YOGITECH, ABB
In this WP, hardware systems are iteratively broken down into their basic components, which will be characterized from the reliability standpoint. By characterization we mean the computation of specific parameters and measures that may impact the overall system reliability (e.g., area, error masking probability, resource utilization, timing constraints). This approach follows the concept of modular design and IP reuse, which supports the fast TTM that is a common challenge for both high-performance computing (HPC) and embedded systems (ES) applications.
One of the key aspects of CLERECO is its cross-layer approach to reliability evaluation, in which all system elements, from the raw technologies up to the software layers, are carefully considered with respect to their impact on system reliability. Hardware architectures will be analyzed in terms of their components (CPUs, memories, accelerators, peripherals, interconnects, etc.) at different levels of detail, always maintaining a connection between the components and the related user-available instructions. The reliability-related behavior of hardware components will be evaluated in isolation at different stages of the system design cycle: from very early specification stages, through high-level and more detailed design stages, down to the prototyping phases of the design flow. The reliability evaluation will thus be an iterative process that provides different levels of detail while moving from the conceptual design phases, through all intermediate design phases, down to the post-silicon design validation phase.
The following sets of hardware components will be studied:
During the course of the project, new hardware components may emerge or prevail in different market segments. The CLERECO research work is flexible enough to consider the characteristics of new components even in later stages, when the reliability evaluation framework has already been set up to some extent. The impact of emerging components on overall system reliability will therefore be considered as a matter of course.
For each hardware component, a set of parameters relevant to system reliability will be extracted. In particular, the following information will be either estimated or measured, depending on the level of abstraction of each phase of the design cycle:
Both ASIC and FPGA implementations of different cores will be analyzed in this WP to consider all design alternatives available when working on real computing systems.
Finally, WP3 will also implement a preliminary library of characterized modules that will be used for the validation and demonstration activities of this project in WP6. Building a fully comprehensive library of components is beyond the scope of this project; instead, we will show the path for analyzing future use cases, focusing on instruments that allow fast technology transfer from the research domain to real cases.
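To make the idea of component characterization more concrete, here is a minimal Python sketch, assuming each component is reduced to just two parameters: a raw fault rate in FIT (failures per 10^9 device-hours) and an error masking probability. It further assumes a simple series model in which component failures are independent and every unmasked fault in any component becomes a system-level error, so effective FIT rates add. The class and function names and all numbers are hypothetical and do not represent the CLERECO statistical model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ComponentCharacterization:
    """Hypothetical per-component reliability parameters (illustrative only)."""
    name: str
    raw_fit: float              # raw fault rate, failures per 1e9 device-hours
    masking_probability: float  # probability that a fault is masked, in [0, 1]

    def effective_fit(self) -> float:
        # Only unmasked faults propagate to a visible error.
        if not 0.0 <= self.masking_probability <= 1.0:
            raise ValueError("masking_probability must be in [0, 1]")
        return self.raw_fit * (1.0 - self.masking_probability)


def system_fit(components: Iterable[ComponentCharacterization]) -> float:
    # Series assumption: independent components, and any unmasked error is a
    # system error, so the per-component effective FIT rates simply add.
    return sum(c.effective_fit() for c in components)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the code.
    parts = [
        ComponentCharacterization("cpu_core", raw_fit=200.0, masking_probability=0.75),
        ComponentCharacterization("l2_cache", raw_fit=500.0, masking_probability=0.9),
        ComponentCharacterization("interconnect", raw_fit=40.0, masking_probability=0.5),
    ]
    for p in parts:
        print(f"{p.name}: {p.effective_fit():.1f} FIT")
    print(f"system: {system_fit(parts):.1f} FIT")
```

Under these assumptions the system total is just the sum of the unmasked per-component rates. A richer model could, for instance, weight components by utilization or track separate masking probabilities for different error outcomes.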
Leader: UoA
Participants: POLITO, CNRS, INTEL, THALES, YOGITECH, ABB
WP1 analyzes the failure mechanisms that will be relevant in the future technologies likely to be employed across the computing continuum (scaled bulk CMOS, III-V/Ge, FinFETs, spin logic, etc.) and works on identifying and characterizing the main sources of failure. This WP also sets the reliability requirements for the different segments of the computing continuum, such as ES and HPC.
The starting point of WP1 activities consists of studying the defects and reliability failure mechanisms anticipated in future computing systems (driven by technology and architecture specifications). This involves selecting a set of use cases ranging from very specific ES applications to general-purpose HPC systems. Among the wide range of possible use cases, we will target those in which the architectural and technological solutions of the ES and HPC segments are rapidly converging, for instance multi-/many-core processors in the 22/16 nm FinFET technology nodes. The identified sources of failure will be characterized and used in the reliability estimation methodology developed in CLERECO.
System reliability is influenced by several parameters, which must all be carefully considered in the development of an accurate reliability evaluation framework. To take this into account, WP1 also aims at identifying the different operating modes of the system (e.g., voltage and frequency levels), and the different operating conditions (e.g., temperature, electronic noise, etc.).
The results of the project will eventually be captured in a series of reliability metrics required across the heterogeneous segments of the computing continuum market. Some will be the well-known silent data corruption (SDC) and detected unrecoverable error (DUE) FIT rates, but we expect to define additional metrics capturing other reliability aspects. Some will derive from safety standards, while others will derive from more implicit requirements (such as user experience, or the FIT rate of errors that affect performance but not correctness). WP1 is also in charge of determining the acceptable estimation error for the different design phases (from early abstract design phases up to final RTL).
Leader: INTEL
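The idea of an acceptable estimation error per design phase can be sketched in Python. This is a minimal illustration under stated assumptions: accuracy is measured as the relative error of an early estimate against a reference FIT rate (for example, one obtained from a fault-injection campaign), and each design phase has a maximum acceptable relative error. The phase names, tolerance values, and FIT numbers below are hypothetical placeholders, not values defined by CLERECO.

```python
from enum import Enum
from typing import Mapping


class DesignPhase(Enum):
    # Hypothetical phase names, from early abstract design down to final RTL.
    SPECIFICATION = "specification"
    HIGH_LEVEL = "high_level"
    RTL = "rtl"


def relative_error(estimate: float, reference: float) -> float:
    """Relative error of an early estimate against a reference measurement,
    e.g. a FIT rate obtained from a fault-injection campaign."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return abs(estimate - reference) / reference


def within_tolerance(estimate: float, reference: float, phase: DesignPhase,
                     tolerances: Mapping[DesignPhase, float]) -> bool:
    """True if the estimate error is acceptable for the given design phase.
    `tolerances` maps each phase to a maximum relative error."""
    return relative_error(estimate, reference) <= tolerances[phase]


if __name__ == "__main__":
    # Placeholder tolerances for illustration only.
    tolerances = {
        DesignPhase.SPECIFICATION: 0.5,
        DesignPhase.HIGH_LEVEL: 0.25,
        DesignPhase.RTL: 0.1,
    }
    reference_sdc_fit = 120.0  # hypothetical fault-injection result
    for phase, estimate in [(DesignPhase.SPECIFICATION, 160.0),
                            (DesignPhase.HIGH_LEVEL, 160.0),
                            (DesignPhase.RTL, 126.0)]:
        err = relative_error(estimate, reference_sdc_fit)
        ok = within_tolerance(estimate, reference_sdc_fit, phase, tolerances)
        print(f"{phase.value}: error {err:.0%}, acceptable={ok}")
```

Passing the tolerances in as a mapping keeps the check independent of any particular threshold values, which in this sketch are left for the methodology to determine.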
Participants: POLITO, UoA, CNRS, THALES, YOGITECH, ABB