
Self-testing and Self-repairing Processors

Self-testing and Self-repairing Processors. Mario Schölzel Computer Engineering Group at Brandenburg University of Technology Cottbus, Germany. Outline. Introduction Self-Test Coarse-Grained On-Line Test BIST and SBST Fine-Grained Hybrid Test Self-Repair On-Line Self-Repair


Presentation Transcript


  1. Self-testing and Self-repairing Processors Mario Schölzel Computer Engineering Group at Brandenburg University of Technology Cottbus, Germany

  2. Introduction Self Test On-Line BIST SBST Hybrid Self Repair On-Line SW-Based Fine-Grained Conclusion Outline • Introduction • Self-Test • Coarse-Grained On-Line Test • BIST and SBST • Fine-Grained Hybrid Test • Self-Repair • On-Line Self-Repair • Software-Based Self-Repair (Coarse-Grained) • Fine-Grained Software-Based Self-Repair

  3. Introduction

  4. What is the Presentation About? [Diagram: test and repair/fault handling can each take place post-production or in the field, and off-chip (with external equipment) or on-chip; in the field, diagnosis must be done on-chip. Circuit types shown: memories, programmable processors (statically or dynamically scheduled programs), ASICs, FPGAs.]

  5. Processor Model (VLIW) [Figure: four-slot VLIW pipeline. The program memory feeds the fetch stage (FE, slots 1 to 4, FE-Reg 1 to 4), followed by decode (DE, DE-Reg 1 to 4), execute (EX, register file and EU 1 to 4) and write-back (WB, WB-Reg 1 to 4).]

  6. Program Execution [Figure: the VLIW pipeline executing one instruction word. Operations Op A to Op D are fetched from program memory into slots 1 to 4 (FE-Reg 1 to 4) and decoded (DE-Reg 1 to 4); each EU reads its operands opL and opR from the register file and delivers a result to WB-Reg 1 to 4.]

  7. Why Self-Test and Self-Repair In-The-Field?
     • Nano-scaled transistors
       • are more susceptible to process variations,
       • are more susceptible to voltage drops,
       • experience higher stress in the field due to higher current density.
     • Consequence: some transistors will be "out of specification"
       • immediately after manufacturing,
       • very soon after manufacturing (early-life failures),
       • after some time of heavy usage (wear-out): NBTI, HCI, gate-oxide breakdown, metal migration.

  8. Classification of Faults that Occur In-The-Field
     • Temporary faults: appear only for a very short time and disappear without any external intervention.
       • Transient faults: triggered by an external event.
       • Intermittent faults: triggered by a certain state of the system.
     • Permanent faults: do not disappear without a repair (deterministically reproducible).

  9. Self-Test and Self-Repair Scenarios [Figure: four scenarios for an embedded processor: handling of a transient or permanent fault, handling of permanent faults only, and two variants of handling transient and permanent faults. Labels: at start-up, permanent faults are addressed by self-test capability (off-line self-test) and self-repair capability (off-line self-repair); while running the desired application, transient/permanent faults are addressed by fault tolerance.]

  10. Structure of the Presentation [Diagram: Self-testing and Self-repairing Processors divide into Self-Test (and diagnosis) and Self-Repair (requires diagnosis), each On-Line or Off-Line. Further labels: BIST, SBST, Hybrid; SW-based, HW-based; Coarse-Grained, Fine-Grained.]

  11. Self-Test and Diagnosis

  12. Structure of the Presentation [Diagram repeated from slide 10; the labels "Self-Test (and diagnosis)" and "On-Line" appear twice in the transcript, presumably because they are highlighted.]

  13. On-Line Test
      • On-line test: a system performs a self-test while it executes an application.
      • The goal is to handle transient faults.
      • Very often done by means of fault tolerance, i.e. redundancy in:
        • data (error-correcting codes),
        • hardware (n-modular redundancy, watchdog, …),
        • time (checkpointing and/or concurrent computing).

  14. Diagnostic On-Line Test for VLIWs
      • Compiler support:
        • Each operation of the application is duplicated.
        • Original and duplicate are scheduled on different execution units.
        • The result of an original operation is not used before the duplicated operation has been executed (ensures simple roll-back).
      • Hardware support:
        • Results of original and duplicated operations are compared in hardware.
        • Recovery mode: failed operations are re-executed in order to obtain a third result for majority voting.

  15. Modified Architecture [Figure: the VLIW pipeline (program memory, FE, DE, EX, WB) extended by rebinding logic between the FE and DE registers, and by a temporary register file and FDCL 1 to 4 between EU 1 to 4 and WB-Reg 1 to 4.]

  16. Task of the Rebinding Logic [Figure: the rebinding logic sits between FE-Reg 1 to 4 and DE-Reg 1 to 4; the program consists of the operations Op A to Op H. In normal operation mode, Op A to Op D pass from the FE registers to the DE registers. In recovery mode, the figure shows operations Op A to Op D together with inserted NOPs at the rebinding logic.]

  17. Task of the FDCL
      • Compiler support: each operation is duplicated.
      • New instruction format, e.g. "+ R3 R6 R0 10 T7 2", with three added fields:
        • mode: original/duplicated,
        • register in the temporary register file,
        • unit that executes the duplicated/original operation.
      • For original operations (e.g. "+ R3 R6 R0 10 T7 2" on EU 1): Result1 is stored in the temporary register file.
      • For duplicated operations (e.g. "+ R3 R6 R0 11 T7 1" on EU 2): the FDCL compares Result2 with Result1 from the temporary register file.

  18. Fault Localization and Recovery
      • Comparing detects a mismatch (Result2 != Result1).
      • The rebinding logic re-executes the operation ("+ R3 R6 R0 11 T7 1") on a third unit, EU 3, which yields Result3.
      • Result3 is compared with Result1 from the temporary register file, and the fault state is set:
        • Result3 == Result1: fault in EU 2,
        • Result3 != Result1: fault in EU 1.
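The duplicate, compare and re-execute scheme of slides 14 to 18 can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware described in the talk: the names `checked_add`, `healthy` and `stuck_bit0` are hypothetical, each execution unit is modeled as a plain function, and the sketch additionally checks that the third result agrees with one of the first two (the slide only distinguishes Result3 == Result1 from Result3 != Result1).

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model: every execution unit (EU) is a function computing a + b.
ExecUnit = Callable[[int, int], int]


def healthy(a: int, b: int) -> int:
    """Fault-free adder."""
    return a + b


def stuck_bit0(a: int, b: int) -> int:
    """Adder with a permanent fault: bit 0 of the result is stuck at 1."""
    return (a + b) | 1


def checked_add(units: Dict[int, ExecUnit], orig: int, dup: int, spare: int,
                a: int, b: int) -> Tuple[int, Optional[int]]:
    """Run the original and the duplicated operation on different units.

    Returns (result, faulty_unit); faulty_unit is None if both results match.
    """
    r1 = units[orig](a, b)   # original operation
    r2 = units[dup](a, b)    # duplicated operation
    if r1 == r2:
        return r1, None
    # Recovery mode: re-execute on a third unit and take a majority vote.
    r3 = units[spare](a, b)
    if r3 == r1:
        return r1, dup       # the duplicate's unit delivered the wrong value
    if r3 == r2:
        return r2, orig      # the original's unit delivered the wrong value
    raise RuntimeError("no majority: more than one unit disagrees")
```

With `units = {1: healthy, 2: stuck_bit0, 3: healthy}`, the call `checked_add(units, 1, 2, 3, 4, 6)` localizes the fault in unit 2, whereas operands whose sum is odd do not activate this particular fault and pass unnoticed.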

  19. Summary – On-Line Test
      • Advantages:
        • Detection of transient and permanent faults at run-time.
        • Recovery from faults by re-execution and majority voting takes only 1 to 2 clock cycles.
        • Coarse-grained diagnosis.
      • Disadvantages:
        • Overhead in performance and/or hardware.
        • The capability of detecting all faults is lost after a fault has occurred.
        • Diagnosis cannot handle multiple faults (and is always very expensive with an on-line test).

  20. Off-Line Test
      • Off-line test: a system performs a self-test during startup, before it starts to execute an application.
      • The goal is
        • to detect permanent faults,
        • to have better diagnostic capability for permanent faults with lower hardware overhead.
      • On-chip implementations may be based on:
        • Built-In Self-Test (BIST),
        • Software-Based Self-Test (SBST).

  21. Structure of the Presentation [Diagram repeated from slide 10.]

  22. Built-In Self-Test (BIST) [Figure: on the chip, a BIST controller drives a test-pattern generator (e.g. a linear feedback shift register, LFSR). The test patterns are applied to the circuit under test (combinational logic with scan chains), and the test responses are compacted by an output response analyzer (e.g. a MISR) into compacted test responses.]
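To make the BIST data flow concrete, the following Python sketch models a 4-bit LFSR as test-pattern generator and a MISR as output response analyzer. It is a toy model under stated assumptions: the width, the tap mask `TAPS`, and the circuits `good_circuit` and `faulty_circuit` are hypothetical and not taken from the talk, and scan chains are not modeled (patterns are applied in parallel).

```python
from typing import Callable, Iterator

WIDTH = 4
MASK = (1 << WIDTH) - 1
TAPS = 0b1100  # hypothetical tap mask; for 4 bits it yields all 15 non-zero states


def lfsr_step(state: int) -> int:
    """Shift left by one and feed back the parity of the tapped bits."""
    feedback = bin(state & TAPS).count("1") & 1
    return ((state << 1) | feedback) & MASK


def test_patterns(seed: int, count: int) -> Iterator[int]:
    """Pseudo-random test patterns produced by the LFSR."""
    state = seed
    for _ in range(count):
        yield state
        state = lfsr_step(state)


def signature(circuit: Callable[[int], int], seed: int = 1, count: int = 15) -> int:
    """Compact all test responses into one MISR signature."""
    sig = 0
    for pattern in test_patterns(seed, count):
        # MISR: advance the register, then XOR in the parallel response.
        sig = lfsr_step(sig) ^ (circuit(pattern) & MASK)
    return sig


def good_circuit(x: int) -> int:
    """Hypothetical circuit under test: binary to Gray code."""
    return (x ^ (x >> 1)) & MASK


def faulty_circuit(x: int) -> int:
    """Same circuit with output bit 2 stuck at 1."""
    return good_circuit(x) | 0b0100
```

A BIST controller would compare `signature(faulty_circuit)` against the stored golden value `signature(good_circuit)`. Because only the compacted signature is observed, the failing pattern cannot be read back directly, which is why diagnosis with BIST is difficult.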

  23. Summary BIST
      • BIST is based on the scan-test infrastructure of the manufacturing test:
        • scan chains in the circuit,
        • external test equipment is replaced by the BIST controller.
      • Diagnosis is difficult due to the output compaction (information gets lost).
      • BIST destroys the internal state of the circuit.
      • Advantages:
        • High coverage for static faults
        • Well supported by CAD tools
        • Applicable to many circuit types (processors, ASICs, …)
      • Disadvantages:
        • Overhead in chip area
        • Reduced testability of dynamic faults
        • Overtesting
        • High power consumption
        • Slow clock rate

  24. Structure of the Presentation [Diagram repeated from slide 10, with a footnote mark on SBST.] *Content taken from an internal presentation of Tobias Koal

  25. Software-Based Self-Test (SBST) [Figure: the CPU exchanges instructions and data with the RAM/ROM, which holds the application and the SBST software; test responses are handled as data.] Relation to BIST: the SBST software creates certain (test) patterns in the flip-flops (scan chains) of the circuit.

  26. SBST Classification [Tree: SBST with structural information / without structural information; ATPG based, open loop, feedback based.]
      • Test software is created by expert knowledge (e.g. structure of the processor, already known test software, …).
      • The exact fault coverage is not known.
      [Figure: an expert writes the test software, e.g. "ld r2,#5; add r2,r3; …".]

  27. SBST Classification [Tree: SBST with structural information / without structural information; ATPG based, open loop, feedback based.]
      • Test programs are created randomly (e.g. by genetic algorithms).
      • Fault coverage is determined by simulation of these programs (feedback).
      • High fault coverage can be achieved.
      • Very time consuming.
      [Figure: test program generation produces randomly generated test programs (e.g. "ld r2,#5; add r2,r3; …"); fault coverage simulation feeds back into the generation.]
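The feedback loop of slide 27 can be illustrated with a small Python sketch. It uses plain random search rather than a genetic algorithm, and everything in it is an assumption made for illustration: the three-operation instruction set `OPS`, the fault model (a single output bit of a 4-bit result stuck at 0 or 1) and the function names.

```python
import random
from typing import List, Optional, Tuple

WIDTH = 4
MASK = (1 << WIDTH) - 1

# Hypothetical instruction set of the unit under test.
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}

Fault = Tuple[int, int]        # (output bit, stuck-at value)
Instr = Tuple[str, int, int]   # (operation, left operand, right operand)

FAULTS: List[Fault] = [(bit, value) for bit in range(WIDTH) for value in (0, 1)]


def run(program: List[Instr], fault: Optional[Fault] = None) -> List[int]:
    """Simulate a test program, optionally with one injected stuck-at fault."""
    responses = []
    for op, a, b in program:
        result = OPS[op](a, b)
        if fault is not None:
            bit, value = fault
            result = result | (1 << bit) if value else result & ~(1 << bit)
        responses.append(result)
    return responses


def coverage(program: List[Instr]) -> float:
    """Fraction of modeled faults that change at least one test response."""
    golden = run(program)
    detected = sum(run(program, fault) != golden for fault in FAULTS)
    return detected / len(FAULTS)


def random_program(rng: random.Random, length: int) -> List[Instr]:
    names = sorted(OPS)
    return [(rng.choice(names), rng.randrange(MASK + 1), rng.randrange(MASK + 1))
            for _ in range(length)]


def generate(rng: random.Random, candidates: int = 50, length: int = 4):
    """Feedback loop: simulate every candidate and keep the best one."""
    best = max((random_program(rng, length) for _ in range(candidates)), key=coverage)
    return best, coverage(best)
```

Each candidate costs one fault simulation per modeled fault, which hints at why the approach is described as very time consuming for a real processor with a realistic fault list.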

  28. SBST Classification [Tree: SBST with structural information / without structural information; ATPG based, open loop, feedback based.]
      • Test patterns are created by an ATPG tool for a certain combinational block (e.g. the ALU).
      • The test patterns are mapped to software templates.
      • High fault coverage is guaranteed by the ATPG tool.
      • But not all test patterns can be applied by using software routines.
      [Figure: a test pattern generator, controlled by constraints (e.g. jumps are limited to certain addresses), produces test patterns such as 11011101010000010011101101 and 11011101110000010011101101, which are mapped to a test program, e.g. "ld r2,#7; ld r3,#0; add r2,r3; …".]

  29. Summary
      • Advantages of SBST:
        • At-speed testing
        • No hardware overhead
        • Provides good diagnostic capabilities
      • Problems of SBST:
        • The processor is tested only with valid data and operations (lower fault coverage than BIST).
        • Test pattern generation is difficult, because the test pattern generator must be controlled by constraints.
        • Not all test patterns can be generated in the pipeline registers by an SBST (dynamic tests in particular will be difficult).
      • Problems of BIST:
        • On-chip diagnosis is difficult because of its complexity.
        • Shifting test patterns in and out requires a lot of time.
        • Detection of multiple faults is difficult.
      • Goal: fine-grained diagnostic self-test in the field with high fault coverage.
      • Solution: combining BIST and SBST for statically scheduled data paths.

  30. Structure of the Presentation [Diagram repeated from slide 10.]

  31. Hybrid Approach
      • Each stage uses only components that were tested in a previous stage.
      • BIST:
        • Checks basic functionalities of the processor.
        • No diagnostic features required.
      • Test pattern based SBST:
        • Loads test patterns directly into the pipeline registers.
        • Coarse-grained diagnostic capability by concurrent checking (slots of the VLIW).
      • SBST:
        • Fine-grained diagnosis is possible (read ports of the register file).

  32. Architecture for the Hybrid Self-Test [Figure legend: tested by BIST; tested by test pattern based SBST; tested by SBST; infrastructure of the test pattern based SBST.]

  33. Controlling the Test Pattern Based SBST Stage [Figure annotations: switches into the test mode; switches back into normal mode.] Picture taken from a presentation of Markus Ulbricht

  34. Infrastructure for the Test Pattern Based SBST Stage Pictures taken from a presentation of Markus Ulbricht

  35. Summary Hybrid Self-Test
      • Already available resources of the processor are used for the test.
      • Diagnostic capability costs about 5% hardware overhead compared to a BIST without diagnostic capability.
      • Loading test patterns in parallel reduces test time by 18% compared to the BIST (data path only 600 clock cycles).
      • Jumps are suppressed in the test mode, so there are no constraints on the test patterns.
        • This simplifies the test pattern generation.
      • The diagnostic capability of each test method (BIST / SBST) is well employed.

  36. Summary Off-Line Test
      • Advantages of the off-line test:
        • Low hardware overhead
        • Fine-grained diagnosis is possible
      • Disadvantages of the off-line test:
        • No detection of transient faults
        • Detection of permanent faults only during start-up
        • Transient faults may be detected as permanent faults

  37. Self-Repair

  38. Structure of the Presentation [Diagram as on slide 10.]

  39. Fault Tolerant Architecture [Figure: same architecture as on slide 15: the VLIW pipeline with rebinding logic between the FE and DE registers, and a temporary register file and FDCL 1 to 4 between EU 1 to 4 and WB-Reg 1 to 4.]

  40. Handling of Permanent Faults. Original operation executes on faulty unit: [Figure: the original operation and the duplicated operation pass from the DE registers through the rebinding logic to EU 1 and EU 2, producing Result1 and Result2; the temporary register file, FDCL 1 and FDCL 2 with their fault states, and the WB registers are shown.]

  41. Handling of Permanent Faults. Original operation executes on faulty unit: [Figure: as on slide 40, with Result1 additionally shown at the FDCL comparison.] If both results are equal, a transient error has been discovered before.

  42. Synthesis Results [Figure: pipeline structures (FE, DE, EX, WB) of the synthesized variants VLIW4, FT VLIW4late, FT VLIW4early and VLIW2 TMR, showing the position of the rebinding logic in the fault-tolerant variants and a voting stage.]

  43. Reliability Results

  44. Reliability Results

  45. Summary – On-Line Self-Repair
      • HW/SW-based method for
        • the detection of transient faults by concurrent execution, and
        • recovery from faults with localization of permanent faults by re-execution and majority vote (only 1 to 2 clock cycles).
      • A detected permanent fault is masked without delay.
      • The hardware for concurrent checking is also used for handling permanent faults.
      • Applicable to statically scheduled data paths with medium- to large-scale EUs.
      • The capability of detecting a second fault is lost after the first permanent fault has occurred.
      • Solution: repair permanent faults off-line!

  46. Self-Repair Off-Line Software-Based

  47. Structure of the Presentation [Diagram as on slide 10.]

  48. Idea of the Software-Based Self-Repair [Figure: the same program (operations add, sub, mul, shl, cmp, xor, inc, jmp and nop) scheduled once for a faultless data path and once for a faulty data path. Both data paths have four slots with fetch register, decode register, register file, EU 1 to 4 and write-back register; in the schedule for the faulty data path, operations are rearranged and NOPs are inserted.]

  49. Basic System Architecture
      • A permanent fault is detected and repaired during the startup of the system (on-chip but off-line):
        • At start-up, a diagnostic self-test is performed by a self-test routine.
        • A repair routine is executed in order to reorder the operations of the application.
      • Problem: access to the program memory.
      [Figure labels: Application, Program Memory, Very Long Instruction Word (VLIW) Core, Data Memory, Software-Based Self-Test and Repair-Routine.]
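What "reordering the operations of the application" could look like is sketched below in Python. This is a deliberately simplified model, not the repair routine from the talk: instruction words are lists of operation names, any operation is assumed to run on any slot, data dependencies within a word are ignored, and jump targets are not adjusted when a word has to be split. The function names are hypothetical.

```python
from typing import List

NOP = "nop"
Word = List[str]  # one VLIW instruction word, one operation name per slot


def repair_word(word: Word, faulty: int) -> List[Word]:
    """Return one or two words that leave slot `faulty` unused."""
    if word[faulty] == NOP:
        return [list(word)]
    fixed = list(word)
    # First choice: move the operation into a free slot of the same word.
    for slot, op in enumerate(fixed):
        if slot != faulty and op == NOP:
            fixed[slot], fixed[faulty] = fixed[faulty], NOP
            return [fixed]
    # No free slot: move the operation into an additional word.
    spill = [NOP] * len(word)
    target = 0 if faulty != 0 else 1  # any slot other than the faulty one
    spill[target] = fixed[faulty]
    fixed[faulty] = NOP
    return [fixed, spill]


def repair_program(program: List[Word], faulty: int) -> List[Word]:
    """Rebind all operations of a program away from one faulty slot."""
    repaired: List[Word] = []
    for word in program:
        repaired.extend(repair_word(word, faulty))
    return repaired
```

For example, with slot 0 faulty, the word `["add", "nop", "shl", "cmp"]` becomes `["nop", "add", "shl", "cmp"]`, while a fully occupied word is split into two words, which costs one extra cycle. Since the routine rewrites instruction words, it needs read and write access to the program memory, which is the problem named on the slide.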

  50. Infrastructure for Moving Instructions
      [Figure: an arbiter connects the VLIW core to the program memory and the data memory. Labels: instruction (operation 1, operation 2), program memory bus, data memory bus, pMemAddr, dMemAddr, dMemData, dMemCtrl; example values 0x0056, 0xffff, 0x0010, read; the instruction delivered to the VLIW core is a NOP.]
      • The arbiter makes up about 3% of the transistor count of the VLIW core.
      • Example program:
          Ldc 0xFFFF -> R0
          Ldc 0x0056 -> R2
          Ldc 0x0010 -> R3
          Load [R0] -> R1     // Initialize read mode in arbiter
          Store R3 -> [R2]    // Read instruction from 0x10 and
                              // save it in dmem at address 0x56
          Nop                 // Keep the VLIW core synchronous
          Nop                 // with the program in the program memory
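The effect of the example program on slide 50 can be modeled with a small Python class. This is a behavioral sketch built on assumptions: the slide only shows that a load from 0xFFFF initializes the read mode and that the following store copies the instruction at program address 0x10 to data address 0x56. That the read mode ends after one transfer, and the names `Arbiter`, `load` and `store`, are hypothetical.

```python
from typing import Dict


class Arbiter:
    """Toy model of an arbiter between the core and both memories."""

    CTRL_ADDR = 0xFFFF  # address used on the slide to initialize read mode

    def __init__(self, pmem: Dict[int, int], dmem: Dict[int, int]) -> None:
        self.pmem = pmem
        self.dmem = dmem
        self.read_mode = False

    def load(self, addr: int) -> int:
        """Data-memory load issued by the core."""
        if addr == self.CTRL_ADDR:
            self.read_mode = True   # the next store is a program-memory read
            return 0
        return self.dmem.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        """Data-memory store issued by the core."""
        if self.read_mode:
            # `value` is a program-memory address; copy that instruction word
            # into data memory at `addr`.
            self.dmem[addr] = self.pmem[value]
            self.read_mode = False  # assumption: one transfer per activation
        else:
            self.dmem[addr] = value
```

Mirroring the listing, `arb.load(0xFFFF)` followed by `arb.store(0x0056, 0x0010)` leaves the instruction word from program address 0x10 in data memory at 0x56, where a repair routine could inspect it with ordinary loads.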
