WP2: Common and domain-specific sources of failure and unreliability

WP1 analyzes the failure mechanisms that will be relevant in future technologies likely to be employed in the computing continuum (scaled bulk CMOS, III-V Ge, FinFETs, spin logic, etc.), and identifies and characterizes the main sources of failure. This WP also sets the reliability requirements for the different segments of the computing continuum, such as embedded systems (ES) and high-performance computing (HPC).

The starting point of WP1 activities consists of studying the defects and reliability failure mechanisms anticipated in future computing systems as a consequence of technology and architectural specifications. This study involves selecting a set of use cases ranging from very specific ES applications to general-purpose HPC systems. Among the wide range of possible use cases, we will target those in which architectural and technological solutions across the ES and HPC segments are rapidly converging, for instance multi/many-core designs in the 22/16nm FinFET technology node. The identified sources of failure will be characterized and used in the reliability estimation methodology developed in CLERECO.

System reliability is influenced by several parameters, all of which must be carefully considered when developing an accurate reliability evaluation framework. To take this into account, WP1 also aims at identifying the different operating modes of the system (e.g., voltage and frequency levels) and the different operating conditions (e.g., temperature and electronic noise).

The results of the project will eventually be captured in a series of reliability metrics required across the heterogeneous segments of the computing continuum market. Some will be the well-known SDC/DUE FIT rates, but we expect to define additional metrics capturing other reliability aspects. Some will derive from safety standards, while others will derive from more implicit requirements (such as user experience, or a FIT rate for failures that affect performance but not correctness). WP1 is also in charge of determining the acceptable estimation error for the different design phases (from early abstract design phases up to final RTL).
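
As an illustration of the metrics discussed above, the following minimal Python sketch shows how FIT rates are commonly handled. It relies on the standard definition of one FIT as one failure per 10^9 device-hours. The function names, the additive combination of SDC and DUE rates (which assumes independent, constant failure rates), and the relative-error check are assumptions made for this example and are not part of the CLERECO methodology.

```python
# Illustrative sketch only: all names below are hypothetical helpers.

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_mttf_hours(fit):
    """Convert a FIT rate to mean time to failure, in hours."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_FIT_UNIT / fit


def total_fit(sdc_fit, due_fit):
    """Combine SDC and DUE FIT rates.

    Assumption: the two failure classes are independent and have
    constant rates, so their FIT rates simply add.
    """
    return sdc_fit + due_fit


def relative_error(estimate, reference):
    """Relative error of an estimated FIT rate against a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) / abs(reference)


def within_error_budget(estimate, reference, max_relative_error):
    """Check an estimate against an acceptable-error threshold.

    The threshold itself (for example, one value per design phase)
    is a hypothetical input chosen by the user of this sketch.
    """
    return relative_error(estimate, reference) <= max_relative_error
```

For example, `fit_to_mttf_hours(total_fit(100, 400))` converts a combined rate of 500 FIT into the corresponding mean time to failure in hours, and `within_error_budget` can be called with a looser threshold for early abstract design phases and a tighter one for final RTL.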


WP Leader: INTEL


