# **<u>Thesis Defense</u> Comprehensive Resiliency Evaluation for Dependable Embedded Systems**

#### Yohan Ko

#### <u>Committee</u>

- Prof. Kyoungwoo Lee
- Prof. Bernd Burgstaller
- Prof. Yo-Sub Han
- Prof. Hyunok Oh

Dependable Computing Lab., Dept. of Computer Science, Yonsei University





- Thesis reminder
- Comments from the previous presentation
- Response to comments
- Conclusion





- Thesis reminder
  - How to quantify the resiliency of a processor
  - How to quantify the effectiveness of protection techniques
- Comments from the previous presentation
- Response to comments
- Conclusion



# Soft errors?

- Charge-carrying particles induce soft errors
  - Alpha particles
  - Neutrons
  - Cosmic rays
- Soft error rate
  - More than 1 bit in a chip
  - Increases exponentially with technology scaling and near-threshold computing





[ASPLOS 2010] Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke. Shoestring: probabilistic soft error reliability on the cheap. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2010.


# How to quantify the resiliency of a processor



- Hardware configuration
  - Issue width, ROB size, IQ size, LSQ size
- Software configuration
  - Compiler (gcc, LLVM)
  - Optimization options
  - Algorithm
- System configuration
  - ISAs (ARM, X86, POWER, SPARC)
  - Number of cores

Metrics:

- Performance: runtime (*cycle*)
- Vulnerability (*bit* $\times$ *cycle*)

Good for design space exploration in terms of performance and resiliency at the early design phase

# How to quantify the effectiveness of protection techniques

- Design guidelines for resilient and efficient parity protected writeback L1 caches
- To do this, we have extended gemV-tool to gemV-cache



- Design questions:
  - When to check for parity
    - At Reads
    - At Writes
    - At both Reads and Writes
  - Granularity of status bits
    - Block level
    - Word level

[DAC 2015] **Yohan Ko**, Reiley Jeyapaul, Youngbin Kim, Kyoungwoo Lee, and Aviral Shrivastava. Guidelines to design parity protected write-back L1 data cache. Design Automation Conference (DAC). 2015.



#### When should parity be checked?



### At what granularity must we implement status bits?





- Thesis reminder
- Comments from the previous presentation
  - Error modeling
  - Strength of gemV-tool
  - Use of gemV-tool
- Response to comments
- Conclusion



#### 1st comment: Need for concrete error modeling





#### 2nd comment: What can gemV-tool do?





- Thesis reminder
- Comments from the previous presentation
- Response to comments
  - Our soft error model
  - Strength of our gemV-tool
  - Use of gemV-tool to choose protection techniques
- Conclusion












[Figure: causes of system failure in a computer system (soft error, hard error, hacking, software bug) and a "cure-all protection"]



# Our soft error model

1. The occurrence of soft errors is proportional to the chip size of microarchitectural components [TACO 2013]
2. External charge usually induces single-bit soft errors, not multiple-bit soft errors [TECS 2016]
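The two assumptions above can be sketched as a simple fault-injection routine. This is only an illustration, not the method used in the cited papers: the function name, the component names, and the bit counts in the usage note are hypothetical. A component is picked with probability proportional to its size in bits, and exactly one bit in it is flipped.

```python
import random
from typing import Dict, Tuple


def inject_single_bit_error(component_bits: Dict[str, int],
                            rng: random.Random) -> Tuple[str, int]:
    """Pick the location of one single-bit soft error.

    component_bits maps a component name to its size in bits (hypothetical
    values). Larger components are proportionally more likely to be hit.
    """
    names = list(component_bits)
    weights = [component_bits[name] for name in names]
    # Assumption 1: occurrence is proportional to component size.
    component = rng.choices(names, weights=weights, k=1)[0]
    # Assumption 2: only one bit is affected per error.
    bit = rng.randrange(component_bits[component])
    return component, bit
```

For example, `inject_single_bit_error({"register_file": 1024, "l1_data_cache": 262144}, random.Random(0))` returns a component name and a bit index inside that component. The sizes here are made up for the example.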



[TACO 2013] Jongwon Lee, Yohan Ko, Kyoungwoo Lee, Jonghee M. Youn, and Yunheung Paek. Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures. ACM Transactions on Architecture and Code Optimization (TACO). 9, 4, Article 48. January 2013.

[TECS 2016] **Yohan Ko**, Jihoon Kang, Jongwon Lee, Yongjoo Kim, Joonhyun Kim, Hwisoo So, Kyoungwoo Lee, and Yunheung Paek. Software-based selective validation techniques for robust CGRAs against soft errors. ACM Transactions on Embedded Computing Systems (TECS). 15, 1, Article 20. January 2016.



### What is vulnerability?



[MICRO 2003] Shubhendu S. Mukherjee, Christopher Weaver, Joel Emer, Steven K. Reinhardt, and Todd Austin. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor. International Symposium on Microarchitecture (MICRO). 2003.



### Vulnerability modeling at the architectural level

- 1: ADD r1, r2, r3
- 2: SUB r5, r1, r4
- 3: STORE r2, r6
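A minimal sketch of how register vulnerability could be counted for the three instructions above. Several things here are assumptions, not taken from the slides: the first operand of ADD and SUB is the destination, STORE reads both of its operands, instruction *n* executes at cycle *n*, live-in registers are treated as written at cycle 0, and registers are 32 bits wide. A register counts as vulnerable from its previous access until each read; the time before a write is not vulnerable because the old value is overwritten.

```python
REGISTER_BITS = 32  # assumed register width

# (cycle, destination registers, source registers); operand roles are assumed.
PROGRAM = [
    (1, ["r1"], ["r2", "r3"]),  # ADD r1, r2, r3
    (2, ["r5"], ["r1", "r4"]),  # SUB r5, r1, r4
    (3, [], ["r2", "r6"]),      # STORE r2, r6
]


def register_vulnerability(program, bits=REGISTER_BITS):
    """Return vulnerability in bit x cycle for every register in the program."""
    last_access = {}        # cycle of the most recent read or write
    vulnerable_cycles = {}  # accumulated vulnerable cycles per register
    for cycle, writes, reads in program:
        for reg in reads:
            # The value had to stay correct since its previous access.
            start = last_access.get(reg, 0)
            vulnerable_cycles[reg] = vulnerable_cycles.get(reg, 0) + (cycle - start)
            last_access[reg] = cycle
        for reg in writes:
            # A write starts a new interval; time before it is not vulnerable.
            vulnerable_cycles.setdefault(reg, 0)
            last_access[reg] = cycle
    return {reg: cycles * bits for reg, cycles in vulnerable_cycles.items()}
```

Under these assumptions, `r1` is vulnerable for one cycle (written at cycle 1, read at cycle 2), and `r5` is never vulnerable because it is written but not read within the sequence.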



# What makes our gemV-tool better?



### **Outcome from gemV**

- What is the probability that a single-bit soft error in a computer system results in system failure?
  - Architectural vulnerability factor
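As a rough sketch, the architectural vulnerability factor of a structure can be read as the fraction of its bit-cycles that are vulnerable, in the spirit of [MICRO 2003]. The function name and the numbers in the usage note are illustrative assumptions, not gemV output.

```python
def architectural_vulnerability_factor(vulnerable_bit_cycles: float,
                                       total_bits: int,
                                       total_cycles: int) -> float:
    """Fraction of all bit-cycles of a structure that are vulnerable."""
    if total_bits <= 0 or total_cycles <= 0:
        raise ValueError("total_bits and total_cycles must be positive")
    # Every bit of the structure exists for every cycle of the run.
    return vulnerable_bit_cycles / (total_bits * total_cycles)
```

For instance, a hypothetical 192-bit structure observed for 3 cycles with 320 vulnerable bit-cycles would give `architectural_vulnerability_factor(320, 192, 3)`, which is 320 / 576.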

