
Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration

Ronald F. DeMara and Kening Zhang, University of Central Florida. 1 July 2005.

Presentation Transcript


  1. Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration. Ronald F. DeMara and Kening Zhang, University of Central Florida. 1 July 2005

  2. Fault-Handling Techniques for SRAM-based FPGAs • Reprogrammable Device Failure Characteristics • Duration: Transient: SEU; Permanent: SEL, Oxide Breakdown, Electron Migration, LPD • Target: Device Configuration; Processing Datapath • Approach: BIST; Evolutionary; Repetitive Readback [Wells00]; TMR (conventional spatial redundancy); STARS [Abramovici01]; CED [McCluskey04]; Sussex [Vigander01]; CRR [DeMara05] • Methods: Supplementary Testbench; Duplex Output Comparison; Duplex Output Comparison • Detection: (not addressed); Cartesian Intersection • Isolation: (not addressed); Bitwise Comparison; Majority Vote; unnecessary; Fast Run-time Location; Worst-case Clock Period Dilation • Diagnosis: unnecessary; unnecessary; Population-based GA using Extrinsic Fitness Evaluation; Evolutionary Algorithm using Intrinsic Fitness Evaluation • Recovery: Replicate in Spare Resource; Select Spare Resource; Invert Bit Value; Ignore Discrepancy

  3. Previous Work • Strategies: 1) Evolve redundancy into design before anticipated failure 2) Redesign after detection of failure 3) Combine desirable aspects of both strategies 1) + 2) … • Detection Characteristics of FPGA Fault-Handling Schemes

  4. CRR Arrangement in SRAM FPGA • Configurations in Population • C = C_L ∪ C_R • C_L = subset of left-half configurations • C_R = subset of right-half configurations • |C_L| = |C_R| = |C|/2 • Discrepancy Operator • Baseline Discrepancy Operator is a dyadic operator with binary output: WTA: (Equivalence); RS: (Hamming Distance) • Z(C_i) is the FPGA data throughput output of configuration C_i • Each half-configuration evaluates the discrepancy using an embedded checker (XNOR gate) within each individual • Any fault in the checker lowers that individual’s fitness, so that the individual is no longer preferred and eventually undergoes repair
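The two discrepancy variants named on this slide (equivalence and Hamming distance) can be sketched as follows. This is a minimal illustration, not the authors' circuit: the function names are hypothetical, outputs are modelled as unsigned integers, and the convention that 1 means "discrepancy observed" is an assumption.

```python
def discrepancy_equivalence(z_left: int, z_right: int) -> int:
    """Binary discrepancy (assumed convention): 0 if both half-configuration
    outputs agree, 1 if they differ in any bit."""
    return 0 if z_left == z_right else 1


def discrepancy_hamming(z_left: int, z_right: int) -> int:
    """Graded discrepancy: number of output bit positions that differ."""
    return bin(z_left ^ z_right).count("1")


# Two hypothetical 4-bit outputs that differ in two bit positions.
left_output, right_output = 0b1010, 0b0110
binary_flag = discrepancy_equivalence(left_output, right_output)
bit_distance = discrepancy_hamming(left_output, right_output)
```

The binary form only says that the two halves disagree; the Hamming form also says by how many output bits.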

  5. Terminology and Characteristics • Pristine Pool: C_P. For any C_i ∈ C, C_i is a member of C_P at generation G if and only if [condition not preserved] • Suspect Pool: C_S. For any C_i ∈ C, C_i is a member of C_S at generation G if and only if at least one of [condition not preserved] • Under Repair Pool: C_U. For any C_i ∈ C, C_i is a member of C_U at generation G if and only if [condition not preserved] • Refurbished Pool: C_R. After a Genetic Operator is applied, the newly generated individual is a member of C_R at generation G if and only if [condition not preserved] • E_D is the Discrepancy Count of C_i and E_C is the Correctness Count of C_i • Length of Fitness Evaluation Window: E_W = E_D + E_C (written W on other slides) • Fitness Metric: f(C_i) = E_C / E_W
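The fitness metric on this slide is a ratio of correct evaluations to total evaluations in the window. A minimal sketch, assuming a hypothetical `HalfConfiguration` record that simply counts outcomes (the class and method names are illustrative, not from the presentation):

```python
from dataclasses import dataclass


@dataclass
class HalfConfiguration:
    """Hypothetical bookkeeping for one individual C_i."""
    correct: int = 0      # E_C: evaluations that agreed with the other half
    discrepant: int = 0   # E_D: evaluations that disagreed

    def record(self, discrepancy: bool) -> None:
        """Update the counts after one pairwise comparison."""
        if discrepancy:
            self.discrepant += 1
        else:
            self.correct += 1

    @property
    def window(self) -> int:
        """E_W = E_D + E_C, the number of evaluations seen so far."""
        return self.correct + self.discrepant

    def fitness(self) -> float:
        """f(C_i) = E_C / E_W; an unevaluated individual is treated as fit
        (an assumption made here to avoid dividing by zero)."""
        return self.correct / self.window if self.window else 1.0


individual = HalfConfiguration()
for step in range(600):
    individual.record(discrepancy=(step < 6))  # 6 discrepancies in 600 evaluations
```

With 6 discrepancies in a 600-evaluation window the fitness is 594/600 = 0.99.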

  6. Sketch of CRR Approach • Premise: Recovery Complexity << Design Complexity • Initialization • Population P of functionally-identical yet physically-distinct configurations • Partition P into sub-populations that use supersets of physically-distinct resources, e.g. size |P|/2 to designate physical FPGA left-half or right-half resource utilization • Fitness Assessment • Discrepancy Operator is some function of bitwise agreement between each half’s output • Four Fitness States defined for Configurations as {C_P, C_S, C_U, C_R} with transitions, respectively: Pristine → Suspect → Under Repair → Refurbished • Fitness Evaluation Window W determines comparison interval • Regeneration • Genetic Operators used to recover from fault based on Reintroduction Rate • Operators only applied once, then offspring returned to “service” without concern about increasing fitness • Fitness assessment via pairwise discrepancy (temporal voting vs. spatial voting)

  7. State Transitions during Lifetime of the ith Half-Configuration • Configuration Health States

  8. Procedural Flow under Competitive Runtime Reconfiguration • Integrates all fault handling stages using EC strategy • Detects faults by the occurrence of discrepancy • Isolates faults by accumulation of discrepancies • Failure-specific refurbishment using Genetic Operators: Intra-Module-Crossover, Inter-Module-Crossover, Intra-Module-Mutation • Realizes online device refurbishment • Refurbished online without additional function or resource test vectors • Repair during the normal data throughput process

  9. Selection Process

  10. Fitness Adjustment Procedure

  11. Fitness Evaluation Window • Fitness Evaluation Window: W • denotes the number of iterations used to evaluate fitness before the state of an individual is determined • Determination of W for 3x3 multiplier • 6 input pins articulating 2^6 = 64 possible inputs • W should be selected so that all possible inputs appear • More formally, • Let rand(X) return some x_i ∈ X at random • Seek W : ∪_{i=1}^{W} [ rand(X) ] = X with high probability • x_K = number of distinct orderings of K inputs all showing in D trials • if D is constant, can calculate P_{K>1} successively • probability P_K of K inputs showing after D trials is the ratio x_K / K^D
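The probability that all K equiprobable inputs appear within D random trials can also be computed in closed form by inclusion-exclusion instead of the successive calculation the slide mentions. The sketch below swaps in that technique; the function names are hypothetical, and inputs are assumed independent and uniformly distributed.

```python
from math import comb


def coverage_probability(k: int, d: int) -> float:
    """P(all k equiprobable inputs appear at least once in d independent trials).

    Inclusion-exclusion over the set of inputs that never appear:
    sum_{j=0..k} (-1)^j * C(k, j) * ((k - j) / k)^d
    """
    return sum((-1) ** j * comb(k, j) * ((k - j) / k) ** d for j in range(k + 1))


def smallest_window(k: int, target: float) -> int:
    """Smallest number of trials whose coverage probability reaches the target."""
    d = k  # fewer than k trials can never show all k inputs
    while coverage_probability(k, d) < target:
        d += 1
    return d


# 3x3 multiplier: 64 possible inputs, window of 600 evaluations.
p_600 = coverage_probability(64, 600)
```

For K = 64 and D = 600 this evaluates to roughly 0.995, in line with the 99.5% coverage quoted for W = 600 on slide 13.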

  12. W Determination When K=64:

  13. Impact of Fault on Viable Individuals • Existence of Positive Test Vector • Input I_p comprises an articulating test iff the discrepancy between C_i(I_p) and C_j(I_p), j ≠ i, equals 1 • So if a discrepancy is detected then some I_p exists which manifests the fault • Minimal Case when I_p is Unique • I_p is unique if the fault is observable under exactly one input pattern • Probability Mass Function for Encountering Minimal Case I_p • Consider W = 600 yielding 99.5% coverage for a module with input space |X| = 64 • The number of input occurrences, 0 ≤ i ≤ 600, that randomly encounter I_p to identify the fault is governed by the probability mass function: p.m.f. = [formula not preserved]
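The formula itself did not survive in the transcript. If the inputs are assumed independent and equiprobable, the number of times a unique I_p appears in W = 600 evaluations follows a binomial distribution with p = 1/64. The sketch below works under that assumption only and is not taken from the slide; `encounter_pmf` is a hypothetical name.

```python
from math import comb


def encounter_pmf(i: int, window: int = 600, input_space: int = 64) -> float:
    """P(exactly i of `window` uniformly random inputs equal the unique I_p).

    Assumes independent, equiprobable inputs, so the count is
    Binomial(window, 1 / input_space).
    """
    p = 1.0 / input_space
    return comb(window, i) * p ** i * (1.0 - p) ** (window - i)


# Chance that the fault is never articulated in one window, and the mean count.
p_missed = encounter_pmf(0)
mean_encounters = sum(i * encounter_pmf(i) for i in range(601))
```

Under these assumptions the fault goes unarticulated in a window with probability (63/64)^600, on the order of 1e-4, and I_p is encountered 600/64 = 9.375 times on average.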

  14. Integer Multiplier Case Study • 3-bit x 3-bit unsigned multiplier automated design: • Building blocks • Half-Adder: 18 templates created • Full-Adder: 24 templates created • Parallel-And: 1 template created • Randomly select templates for instantiation in modules • GA parameters • Population size: 20 individuals • Crossover rate: 5% • Mutation rate: up to 80% per bit • GA operators • External-Module-Crossover • Internal-Module-Crossover • Internal-Module-Mutation • Experimental Evaluation: Xilinx Virtex II Pro on Avnet PCI board • Experiments Demonstrate … • Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness • Elimination of additional test vectors • Temporal Assessment process

  15. Template Fault Coverage (figures: Half-Adder Template A, Half-Adder Template B) • Template A • Gate3 is an AND gate • Will lose correctness if a Stuck-At-Zero fault occurs in the second input line of Gate3 • Template B • Gate3 is a NOT gate and only uses the first input line • Will work correctly even if the second input line is stuck at Zero or One

  16. Regeneration Performance • Parameters: Difference (vs. Hamming Distance) • Evaluation Window: E_W = 600 • Suspect Threshold: S = 1 - 6/600 = 99% • Repair Threshold: R = 1 - 4/600 = 99.3% • Re-introduction rate: r = 0.1 • Repairs evolved in-situ, in real-time, without additional test vectors, while allowing the device to remain partially online.
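Both thresholds on this slide have the form 1 minus a tolerated discrepancy count divided by the window length. A small sketch of that arithmetic, with a hypothetical helper name; how the thresholds drive state transitions is not specified here and is left out.

```python
def fitness_threshold(tolerated_discrepancies: int, window: int) -> float:
    """Threshold of the form 1 - tolerated_discrepancies / window."""
    return 1.0 - tolerated_discrepancies / window


suspect_threshold = fitness_threshold(6, 600)  # slide value: 99%
repair_threshold = fitness_threshold(4, 600)   # slide value: 99.3%
```

The second value is 0.99333..., which the slide rounds to 99.3%.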

  17. Discrepancy Mirror • Mechanism for Checking-the-Checker (“golden element” problem) • Makes checker part of configuration that competes for correctness [DeMara PDPTA-05] Fault Coverage

  18. Discrepancy Mirror Circuit Fault Coverage

  19. Influence of LUT utilization (plots: Perpetually Articulating Inputs with Equiprobable Distribution; Intermittently Articulating Inputs with Equiprobable Distribution) • Expected number of pairings grows sub-linearly in the number of resources • Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources • At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6 • At 90% utilization, a mean of 258 pairings is required to isolate the faulty resource

  20. Future Work: Development Board to Self-Contained FPGA • Qualitative Analysis of CRR model • Number of iterations and completeness of regeneration repair • Percentage of time the device remains online despite physical resource fault (availability) • Hardware Resource Management • Optimization of hardware profile for Xilinx Virtex II Pro • Field Testing on SRAM-based FPGA in a Cubesat mission

  21. Backup Slides • On following pages …

  22. Isolation: Block Dueling • Algorithm based on group testing methods • Successive intersection to assess health of resources • Each configuration k has a binary Usage Matrix U_k[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n • m, n are the numbers of rows and columns of resources in the device • Elements U_k[i,j] = 1 are resources used in k • History Matrix H[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n, initially all zero, exists in which: • entries represent the fitness of resources (i, j) • information regarding the fitness of resources over time is stored • A discrepant output will lead to an increase in the value of H[i,j], ∀ U_k[i,j] = 1, k ∈ S • All elements of H corresponding to resources used by a discrepant configuration will be incremented by one • At any point in time, H[i,j] will be a record of the outcomes of competitions • Successive intersections are performed until |S| = 1
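The history-matrix update described on this slide can be sketched in a few lines. The 3x3 device, the two usage matrices, and the function names below are hypothetical values chosen for illustration; only the update rule (increment H wherever a discrepant configuration uses a resource) comes from the slide.

```python
def record_discrepancy(history, usage):
    """Increment H[i][j] for every resource used by a discrepant configuration."""
    for i, row in enumerate(usage):
        for j, used in enumerate(row):
            if used:
                history[i][j] += 1


def most_implicated(history):
    """Return the 1-based (row, column) positions with the highest count."""
    peak = max(max(row) for row in history)
    return [(i + 1, j + 1)
            for i, row in enumerate(history)
            for j, count in enumerate(row)
            if count == peak]


# Hypothetical 3x3 device; both discrepant configurations use resource (3,3).
history = [[0] * 3 for _ in range(3)]
usage_1 = [[1, 1, 0],
           [0, 0, 0],
           [0, 0, 1]]
usage_2 = [[0, 0, 0],
           [0, 1, 1],
           [0, 0, 1]]
for usage in (usage_1, usage_2):
    record_discrepancy(history, usage)
suspects = most_implicated(history)
```

After both discrepant configurations are recorded, only the resource they share reaches a count of 2, so the suspect list narrows to a single position.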

  23. Dueling Example • H[i,j] changes after C1 and C2 are loaded • U1 and U2 are the corresponding Usage Matrices • (3,3) is identified as the faulty resource (figure: H[i,j] at t = 0 and at t = 2, usage matrices U1 and U2, fitness of configuration k)

  24. Isolation of a single faulty individual with 1-out-of-64 impact • Outliers are identified after W iterations have elapsed • E.V. = (1/64)*600 = 9.375 discrepancies from the minimum-impact faulty individual • Isolated individual’s f differs from the average DV by 3σ after 1 or more observation intervals of length W

  25. Isolation of a single faulty L individual with 10-out-of-64 impact • Compare with 1-out-of-64 fault impact • E.V. of (10/64)*600 = 93.75 discrepancies for the faulty configuration • One isolation will be complete approx. once in every 93.75/5 ≈ 19 Observation Intervals • Fault Isolation demonstrated in 100% of cases
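The expected-value figures on slides 24 and 25 follow from the fraction of inputs that articulate the fault multiplied by the window length. A minimal check, assuming equiprobable inputs; the helper name is hypothetical.

```python
def expected_discrepancies(faulty_inputs: int, input_space: int, window: int) -> float:
    """Expected discrepancy count when `faulty_inputs` of `input_space`
    equiprobable inputs articulate the fault over `window` evaluations."""
    return faulty_inputs * window / input_space


ev_minimal = expected_discrepancies(1, 64, 600)    # 1-out-of-64 impact
ev_larger = expected_discrepancies(10, 64, 600)    # 10-out-of-64 impact
```

These evaluate to 9.375 and 93.75, matching the two slides.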

  26. Isolation of 8 faulty individuals L4&R4 with 1-out-of-64 impact • Expected isolations do not occur approximately 40% of the time • Average discrepancy value of the population is higher • Outlier isolation is difficult • Multiple faulty individuals; discrepancies scattered

  27. Online Dueling Evaluation • Objective • Isolate faults by successive intersection between sets of FPGA resources used by configurations • Analyze complexity of Isolation process • Variables • Total resources available • Measured in number of LUTs • Number of Competing Configurations • Number of initial “Seed” designs in CRR process • Degree of Articulation • Some inputs may not manifest faults, even if faulty resource used by individual • Resource Utilization Factor • Percentage of FPGA resources required by target application/design • Number of Iterations for Isolation • Measure of complexity and time involved in isolating fault

  28. Isolation of Faulty Resource at the FPGA resource (LUT) granularity • 50625 LUTs comparable to LUTs on a Xilinx Virtex II Pro FPGA • Xilinx Virtex II Pro has approximately • 67 columns, 78 rows • 4 slices per CLB • 2 LUTs per slice

  29. Isolation of Faulty Resource: Effect of Articulation • No direct, uniform relation between % Articulation and Number of Isolations! • Performance best when Articulation (%) = 50% ± 10% • Each successive intersection provides maximal information • Greatest number of resources are intersected out of “suspect” pool.

  30. For further info … EH Website: http://cal.ucf.edu

  31. Fast Reconfiguration for Autonomously Reprogrammable Logic • Motivation • Dynamic reconfiguration required by application • Exploit architectural & performance improvements fully • Reconfiguration delay – a major performance barrier • Previous Work • Methodology • Multilayer Runtime Reconfiguration Architecture (MRRA) • Spatial Management • Prototype Development • Loosely-Coupled solution • Timing Analysis • System-On-Chip solution

  32. Reconfiguration Demand during CRR • For a complete repair • Approximately 2,000 generations may be required • For each generation, up to 100 evaluations may be needed • Yielding the Cumulative Number of Reconfigurations (CNR) up to [formula not preserved] • For each reconfiguration task [formula not preserved] • Therefore, the total delay [formula not preserved] • Even if the reconfiguration delay alone is assumed to be on the order of tens or hundreds of milliseconds ⇒ Ltot >= 5.5 hours
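The missing formulas can be approximated by simple multiplication. In the sketch below the 100 ms delay per reconfiguration is an assumption picked from the "tens or hundreds of milliseconds" range because it reproduces the quoted 5.5-hour bound; the function names are hypothetical.

```python
def cumulative_reconfigurations(generations: int, evaluations_per_generation: int) -> int:
    """Upper bound on reconfigurations: one per evaluation in every generation."""
    return generations * evaluations_per_generation


def total_delay_hours(reconfigurations: int, delay_seconds: float) -> float:
    """Total time spent reconfiguring, in hours."""
    return reconfigurations * delay_seconds / 3600.0


cnr = cumulative_reconfigurations(2000, 100)
delay_hours = total_delay_hours(cnr, delay_seconds=0.1)  # assumed 100 ms each
```

With these figures the upper bound is 200,000 reconfigurations and about 5.56 hours of reconfiguration delay.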

  33. Previous Work - Tool Level

  34. Previous Work - Algorithm Level • compression method • temporal method • spatial method

  35. Multilayer Runtime Reconfiguration Architecture (MRRA) • Develop MRRA fast reconfiguration paradigm for the CRR approach • Validate with real hardware platform along with detailed performance analysis • First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration • Extend existing theories on reconfiguration

  36. Loosely Coupled Solution • The Virtex-II Pro is mounted on a development board, which is interfaced with a workstation running Xilinx EDK and ISE. The entire system operates on a 32-bit basis

  37. Result Assessment • Establish full functional framework of both prototypes • Communication overhead, throughput and overall speed-up analysis • Communication overhead for SOC solution is decreased to microsecond or sub-microsecond order vs. millisecond order for the Loosely Coupled solution • Up to 5-fold speedup is expected compared to the Loosely Coupled solution • Translation Complexity Analysis • The quantity of information that needs to be translated to generate the reconfiguration bitstream • Simplification from file level to bit level is expected • Storage Complexity Analysis • The memory space required for the run-time algorithms • Decreased memory requirement is expected due to the translation complexity improvement

  38. Project Milestones • SW Schedule: • HW Schedule:

  39. Publications • Accepted Manuscripts • R. F. DeMara and K. Zhang, “Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration,” to appear in NASA/DoD Conference on Evolvable Hardware (EH’05), Washington D.C., U.S.A., June 29 – July 1, 2005. • H. Tan and R. F. DeMara, “A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management,” to appear in International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005. • R. F. DeMara and C. A. Sharma, “Self-Checking Fault Detection using Discrepancy Mirrors,” to appear in International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005. • Submitted Manuscripts • R. F. DeMara and K. Zhang, “Populational Fault Tolerance Analysis Under CRR Approach,” submitted to International Conference on Evolvable Systems (ICES’05), Barcelona, Sept. 12 – 14, 2005. • R. F. DeMara and C. A. Sharma, “FPGA Fault Isolation and Refurbishment using Iterative Pairing,” submitted to IFIP VLSI-SOC Conference, Perth, W. Australia, October 17 – 19, 2005. • Manuscripts In Preparation • R. F. DeMara and K. Zhang, “Autonomous Fault Occlusion through Competitive Runtime Reconfiguration,” submission planned to IEEE Transactions on Evolutionary Computation. • R. F. DeMara and C. A. Sharma, “Multilayer Dynamic Reconfiguration Supporting Heterogeneous FPGA Resource Management,” submission planned to IEEE Design and Test of Computers. • Field Testing • Implementation of CRR on board an SRAM-based FPGA in a Cubesat mission

  40. EHW Environments • Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques • EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process • Two modes of Evolvable Hardware: • Extrinsic Evolution: Genetic Algorithm with simulation in the loop on a software model; device “design-time” refinement (Done? Build it) • Intrinsic Evolution: Genetic Algorithm with hardware in the loop; device “run-time” refinement; new approach to Autonomous Repair of failed devices • Application: Stardust Satellite • >100 FPGAs onboard • hostile environment: radiation, thermal stress • How to achieve reliability to avoid mission failure?

  41. Genetic Algorithms (GAs) • Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics) (flowchart: start → population of candidate solutions → evaluate fitness of individuals using the fitness function → goal reached? → selection of parents → crossover → mutation → offspring → replacement)

  42. Genetic Mechanisms • Guided trial-and-error search techniques using principles of Darwinian evolution • iterative selection, “survival of the fittest” • genetic operators -- mutation, crossover, … • implementor must define fitness function • GAs frequently use strings of 1s and 0s to represent candidate solutions • if 100101 is better than 010001 it will have more chance to breed and influence future population • GAs “cast a net” over entire solution space to find regions of high fitness • Can invoke Elitism Operator (E=1, E=2 …) • guarantees monotonically non-decreasing fitness of best individual over all generations
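A compact bit-string GA with elitism illustrates the mechanisms listed on this slide. It is a generic sketch, not the CRR implementation: tournament selection, the mutation rate, the seed, and the bit-counting fitness function are all assumptions chosen for illustration; only the population size of 20 echoes the case study.

```python
import random


def evolve(fitness, length=6, pop_size=20, generations=30,
           elite=1, mutation_rate=0.05, seed=0):
    """Run a simple GA and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        # Elitism: the top `elite` individuals survive unchanged.
        next_generation = [individual[:] for individual in ranked[:elite]]
        while len(next_generation) < pop_size:
            # Binary tournament selection of two parents.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # Single-point crossover followed by per-bit mutation.
            cut = rng.randint(1, length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_generation.append(child)
        population = next_generation
    return best_history


# Illustrative fitness: number of 1 bits in the string.
history = evolve(fitness=sum)
```

Because the elite individual is copied forward unmodified, the recorded best fitness can stay flat but never drops from one generation to the next.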

  43. GA Success Stories • Commercial Applications: • Nextel: frequency allocation for cellular phone networks -- $15M predicted savings in NY market • Pratt & Whitney: turbine engine design -- engineer: 8 weeks; GA: 2 days w/3x improvement • International Truck: production scheduling improved by 90% in 5 plants • NASA: superior Jupiter trajectory optimization, antennas, FPGAs • Koza: 25 instances showing human-competitive performance such as analog circuit design, amplifiers, filters

  44. Representing Candidate Solutions • An individual can be represented using discrete values (binary, integer, or any other system with a discrete set of values) • Example of Binary DNA Encoding: Individual (Chromosome), GENE

  45. Genetic Operators • selection • reproduction • recombination (crossover) • mutation (figure: population at generation t transformed into population at generation t+1)

  46. Crossover Operator • Population: parents 1 1 1 1 1 1 1 and 0 0 0 0 0 0 0, each cut at the same point • offspring 1 1 1 0 0 0 0 and 0 0 0 1 1 1 1
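The crossover shown on this slide corresponds to a single-point exchange with the cut after the third bit, which is what the offspring patterns imply. A minimal sketch with a hypothetical function name:

```python
def single_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bit lists at position `cut`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 <= cut <= len(parent_a):
        raise ValueError("cut must lie within the chromosome")
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


ones = [1, 1, 1, 1, 1, 1, 1]
zeros = [0, 0, 0, 0, 0, 0, 0]
offspring_a, offspring_b = single_point_crossover(ones, zeros, cut=3)
```

With the cut after the third bit, the two offspring are 1110000 and 0001111, as on the slide.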
