ELEC 516 VLSI System Design and Design Automation Spring 2010 Lecture 10 – Design for Testability

Reading Assignment: Kang, CMOS Digital Integrated Circuits: Analysis and Design, Chapter 15

Testing your prototype • Testing is time-consuming and test equipment is very expensive. • Test cost contributes greatly to the cost of the system (20-30% of the chip cost). • You must think about test during the design; otherwise you may end up with an untestable chip. • Test functionality as well as performance. • If you don’t test it, it won’t work! [Figure: Prototype, Specification?]

Introduction • Testing is important, probably as important as the design process. • Testing a chip to make sure it is fully functional is highly complex and time-consuming. • The cost of chip debugging is much higher than that of board-level debugging, which is in turn much higher than that of system-level debugging. • In a production environment, many chips must be tested within a short time for timely delivery to customers. • Therefore design for testability becomes very critical.

Testing Classification • Diagnostic test • Used in chip/board level debugging • Defect localization • “go/no go” or production test • Used in chip production • Parametric test • Voltage and current test, instead of logic test • Check other parameters such as noise margin (NM), threshold voltage (Vt), delay time (tp) and temperature (T).

Chip Debugging • Design errors or fabrication defect? • Micro-probing the die • E-beam • Single-die repair

Testing is Expensive • A VLSI tester costs several million dollars (US). • Volume manufacturing requires a large number of testers, plus maintenance. • Design companies often cannot afford this, so a rental model is commonly used, with rent charged by usage time. • Tester time costs are in $/sec. • Test cost contributes 20-30% to total chip cost.

Types of Testing

Manufacturing defects • Errors can occur at different stages in the lifetime of a chip. • During manufacturing: misalignment, dust and other particles, “stacking faults”, defects in dielectric, mask scratches, thickness variation. These cause layer-to-layer shorts, discontinuous wires (opens), and circuit sensitivities (Vth, Lchannel). Found during wafer probe of test structures. • During packaging: defects from scratching in handling, damage during bonding, misalignment (always check the wire bonding), and other defects undetected during wafer probe. Found during test of packaged parts. • During mounting: defects from damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use), noise problems, susceptibility to latch-up. Found during testing/mounting on board. • Long term: defects that appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by the customer.

Testers for volume manufacturing • Each pin on the chip has an associated, separate set of circuitry that drives or observes it, and which typically can either: • drive the pin to one data value per cycle • or observe the value of the pin at a particular point in a clock cycle. • Timing of input transitions and sampling is controlled by high-resolution timing generators. • The device under test (DUT) is mounted on the test head.

Test Strategy • Testing with a tester is carried out in several steps: • Supply a set of test vectors that specify an input or output value for every pin on every cycle. • The tester loads the program into the pin cards. • Run the program and report any miscompares between an output value and the expected value.

Testers for volume manufacturing [Figure: tester data flow; labels: Specification, Behavioral model, Force/Compare, Test patterns, I/O vectors, Design, Cycle, Memory, error, Vcompare]

How many test vectors do we need? • For exhaustive test: for a digital circuit with 25 inputs and 50 state bits, 2^75 cycles are required. Assuming 1 µs/cycle, the test time is > 10^9 years. • Exhaustive test is impractical and unnecessary. • We only need to verify that no faults are present, which may take far fewer vectors. • In fact, many vectors can test the same fault. • Combinational circuit with n inputs: 2^n input vectors are required to test it exhaustively. • Sequential circuit with n inputs and m state bits: 2^(n+m) input vectors are required to test it exhaustively.

Fault Types and Models • Testing goal: to detect faults in fabrication and design, and failures due to stressful operating conditions and reliability problems • Test process: • Apply a test vector to the device under test (DUT) as its stimulus • Compare the measured outputs with the expected correct responses to determine correctness • Difficulty: only the system input and output pins are accessible • Another difficulty: generating correct test vectors that detect all modeled faults and design errors • Test pattern generation, whether manual or automatic (ATPG), becomes a difficult task.

Defect causes • Physical defects: • Defects in silicon substrate • Photolithographic defects • Mask contamination and scratches • Process variations and abnormalities • Oxide defects • Physical defects -> electrical faults • Shorts (bridging faults) or Opens • Transistor stuck-on, stuck-open • Resistive shorts and opens • Excessive change in threshold voltage and current • Electrical faults -> logical faults • Logical stuck-at-0 or stuck-at-1 • Slow transition (delay fault) • AND-bridging, OR-bridging

Fault models • Traditional models, first developed for board-level tests, assume that a node gets “stuck” at “0” or “1”, presumably by shorting to GND or Vdd. • If the output is faulty, the entire gate is “stuck”. • There are also cases that correspond to a transistor being stuck-on or stuck-off. • Example: F = (A+B)’. What about Fx (F with a stuck-off fault)?

Fault Models • Most popular: the “stuck-at” model • Sa0 (output stuck at 0), Sa1 (input stuck at 1) • Covers almost all other occurring faults, such as opens and shorts [Figure: gate network with nodes w, x, y, z and defects 1, 2, 3. Defects 1, 3: x sa1; defect 2: y sa0 or x sa0; defect 3: z sa1]

Another example

Stuck-at fault • Single stuck-at fault models are used frequently • Complexity of test generation is greatly reduced • The single stuck-at fault model is independent of technology and design style • Single stuck-at tests cover a large percentage of multiple stuck-at faults • Single stuck-at tests cover a large percentage of unmodeled physical defects

Delay fault • Causes timing failures at the target speed • Reasons for delay faults: • Improper estimation of on-chip interconnect delay and other timing considerations • Excessive variation in the fabrication process -> variations in circuit delay and clock skew • Opens in metal lines connecting parallel transistors • Aging effects such as hot-carrier-induced delay increase • Detecting delay faults is even more subtle than detecting functional faults in steady state.

Problem with stuck-at model: CMOS open fault • Sequential effect: needs two vectors to ensure detection • Behavior of the faulty gate (inputs x, y; output z; z(n-1) is the previous output value):
x  y  z
0  x  1
1  1  0
1  0  z(n-1)
• Other options: • Use stuck-open or stuck-short models • This requires fault simulation and analysis at the switch or transistor level, which is very expensive
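The two-vector requirement can be illustrated with a small behavioral sketch. It assumes the gate on the slide is a two-input NAND whose pull-up transistor driven by y is open, so that for x=1, y=0 the output floats and keeps its previous value; the function names are hypothetical.

```python
def good_nand(x, y):
    """Fault-free two-input NAND."""
    return 0 if (x and y) else 1


def nand_with_open_pullup(x, y, z_prev):
    """NAND whose pull-up driven by y is stuck open (assumed fault model)."""
    if x == 1 and y == 1:
        return 0       # pull-down network conducts
    if x == 0:
        return 1       # the healthy pull-up driven by x charges the output
    return z_prev      # x=1, y=0: no path to Vdd or GND, output floats


def detected(vectors, z_init):
    """Apply vectors in order; True if the faulty output ever differs."""
    z = z_init
    for x, y in vectors:
        z = nand_with_open_pullup(x, y, z)
        if z != good_nand(x, y):
            return True
    return False


# A single vector can miss the fault if the node happens to hold a 1.
print(detected([(1, 0)], z_init=1))          # False
# Initialising the output to 0 first exposes it.
print(detected([(1, 1), (1, 0)], z_init=1))  # True
```

The first vector of the pair exists only to set the output to a known 0; the second vector is the one that actually exposes the open.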

Problem with stuck-at model: CMOS short fault • Causes a short circuit between Vdd and GND for A = C = 0 and D = 1 • Possible approach: • Supply current measurement (IDDQ) • Not applicable for gigascale integration [Figure: CMOS gate with inputs A, B, C, D and a short fault]

Design for Testability • Exhaustive test is impossible or impractical. • We need to find meaningful vectors to test for possible faults. • This is not easy because of limited I/O and increased complexity. • Key concepts: controllability and observability • Combinational function with n inputs: 2^n input vectors are required to test it exhaustively. • Sequential function with n inputs and m state bits: 2^(n+m) input vectors are required to test it exhaustively.

Controllability and Observability • Controllability: a measure of how easily the controller (test engineer) can establish a specific signal value at each node by setting values at the circuit inputs • Observability: a measure of how easily the controller (test engineer) can determine the signal value at any logic node by controlling the circuit’s primary inputs and observing its primary outputs • The degree of controllability and observability (testability) can be measured with respect to whether the test vectors are generated deterministically or randomly.

Path Sensitization • Step 1: Sensitize the circuit: find input values that produce a value on the faulty node that is different from the value forced by the fault. For our S-A-1 fault example, we want the output of the OR gate to be 0. • Is this always possible? What would it mean if no such input values exist? • Is the set of sensitizing input values unique? • What’s left to do?

Error Propagation • Step 2: Fault propagation: select a path that propagates the faulty value to an observed output (Z in our example). • Step 3: Backtracking: find a set of input values that enables the selected path. • Is this always possible? • What would it mean if no such input values exist?
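For a very small circuit, the three steps can be replaced by a brute-force search that simply compares the good and faulty circuits on every input. The circuit used here, Z = (A OR B) AND C with the OR output stuck at 1, is a hypothetical stand-in chosen to resemble the example; the slide's actual figure is not reproduced in this transcript.

```python
from itertools import product


def circuit(a, b, c, or_stuck_at=None):
    """Z = (A OR B) AND C; optionally force the OR output to a stuck value."""
    n = a | b
    if or_stuck_at is not None:
        n = or_stuck_at            # inject the stuck-at fault on the OR output
    return n & c


def tests_for_or_sa1():
    """All input vectors whose output differs between good and faulty circuit."""
    return [
        (a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if circuit(a, b, c) != circuit(a, b, c, or_stuck_at=1)
    ]


print(tests_for_or_sa1())  # [(0, 0, 1)]
```

The single result matches the reasoning of the steps: A = B = 0 sensitizes the fault (OR output should be 0), and C = 1 enables the path through the AND gate to Z. An empty list would mean the fault is undetectable.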

Testability • Example of a non-testable error: • For x=1 we need both a and =1 • Whatever the value of C, one of the three outputs is 1: problem! • Two possible propagations of “1” • Problem: fault propagation

Controllability and Observability • Fault test pattern generation: • Fault sensitization: an input vector that sensitizes the fault • Fault propagation: the conditions that propagate the fault to an output so that it can be observed • Line 7 cannot be tested at the primary output, so this circuit is not fully testable. • Reason: reconvergent fanout of line 7

Controllability and Observability • Circuits with poor controllability • Circuits with feedback • Decoders and clock generators • Circuits with poor observability • Sequential circuits with long feedback loops • Circuits with reconvergent fanouts • Redundant nodes • Embedded memories such as RAM, ROM, PLA • Use self-test for circuits with poor observability

Generating and Validating Test-vectors • Automatic test-pattern generation (ATPG) • For given fault, determine excitation vector (called test vector) that will propagate error to primary (observable) output • Majority of available tools: combinational networks only • Sequential ATPG available from academic research • Fault simulation • Determine fault coverage of proposed test-vector set • Simulate correct network in parallel with faulty networks • Both require adequate models of faults in CMOS IC

ATPG Process • Flow: Fault Selection -> Fault Observe Point Assessment -> Fault Excitation -> Vector Generation -> Fault Simulation -> Fault Dropping

ATG for fanout-free combinational circuits • 2 steps • Activate (excite) the fault from the primary inputs • For signal l with a stuck-at-v fault, set the primary input values such that signal l equals v’ • This is called the justification problem: find an assignment of PI values that results in a desired value on a specified signal in the circuit • Propagate the resulting error to a primary output • Composite logic values (v/vf), where v and vf are the values of the same signal in N and Nf, the fault-free and faulty circuits respectively • The composite logic values 1/0 (denoted by D) and 0/1 (denoted by D’) represent errors • We have this logic behavior: • D+D’=1, D.D’=0, D+D=D.D=D, D’+D’=D’.D’=D’, D+0=D, D’+0=D’, ...
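The composite-value identities follow directly from evaluating the fault-free and faulty values side by side. The sketch below represents each composite value as a pair (value in N, value in Nf); the names `c_and`, `c_or` and `c_not` are chosen here for illustration, and the unknown value x is left out to keep it short.

```python
# Composite values as (fault-free value, faulty value) pairs.
ZERO, ONE = (0, 0), (1, 1)
D, DBAR = (1, 0), (0, 1)      # D = 1/0, D' = 0/1


def c_and(p, q):
    """AND applied separately to the fault-free and faulty components."""
    return (p[0] & q[0], p[1] & q[1])


def c_or(p, q):
    """OR applied separately to the fault-free and faulty components."""
    return (p[0] | q[0], p[1] | q[1])


def c_not(p):
    """Inversion of both components; turns D into D' and back."""
    return (1 - p[0], 1 - p[1])


# The identities listed on the slide:
assert c_or(D, DBAR) == ONE and c_and(D, DBAR) == ZERO
assert c_or(D, D) == D and c_and(D, D) == D
assert c_or(DBAR, DBAR) == DBAR and c_and(DBAR, DBAR) == DBAR
assert c_or(D, ZERO) == D and c_or(DBAR, ZERO) == DBAR
print("D-algebra identities hold")
```

The same pairs also show when an error is blocked: `c_and(D, ZERO)` is `ZERO`, so an AND gate only passes D when its other inputs are 1.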

Test generation for the fault l stuck-at-v in a fanout-free circuit
begin
    set all values to x
    Justify(l, v’)
    if v = 0 then Propagate(l, D)
    else Propagate(l, D’)
end

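Example: stuck-at-0 [Figure: fanout-free circuit with lines a to j, shown first with the stuck-at-0 fault marked and then annotated with the justified values (0, 1, x) and the propagated error D]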

Circuits with Fanout • Two basic goals: fault activation and error propagation • Fanout – several ways to propagate an error to PO • Fundamental difficulty • Reconvergent fanout – the resulting line justification problems are no longer independent

Example [Figure: circuit with lines a to e, gates G1 to G6, fanout branches f1 and f2, and an s-a-1 fault] • The only vector that can test the fault is 111x0.

Another Example [Figure: circuit with lines a, b, c, d, e, f, h, k, l, m, n, o, p, q, r, s and an s-a-1 fault]

Fault Simulation • Apply a set of vectors to a structural (netlist) description of a design and determine how many and which faults are detected out of the total set of modeled faults. • Concurrent fault simulation • Applies the vectors to many copies of the netlist at the same time. • Each copy contains one or more faults. • Each of these simulations is run concurrently with a good-circuit simulation. • If a difference is observed at a legal observation point between the good circuit and any faulty circuit simulation, the fault is listed as detected.
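Fault coverage for a vector set can be sketched with a much simpler serial fault simulator, which re-simulates the circuit once per fault instead of running the copies concurrently. The circuit, z = (a OR b) AND c with stuck-at faults on every net, and all names below are assumptions made for illustration.

```python
NETS = ("a", "b", "c", "n", "z")   # n is the internal OR output


def simulate(vec, fault=None):
    """Evaluate z = (a OR b) AND c; fault is (net, stuck_value) or None."""
    def force(net, value):
        # Override the net's value when it carries the injected fault.
        return fault[1] if fault and fault[0] == net else value

    a = force("a", vec[0])
    b = force("b", vec[1])
    c = force("c", vec[2])
    n = force("n", a | b)
    return force("z", n & c)


def fault_coverage(vectors):
    """Return (coverage, detected faults) for single stuck-at faults."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = set()
    for vec in vectors:
        good = simulate(vec)
        for f in faults:
            # Fault dropping: skip faults that are already detected.
            if f not in detected and simulate(vec, f) != good:
                detected.add(f)
    return len(detected) / len(faults), detected


coverage, _ = fault_coverage([(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 0)])
print(coverage)  # 1.0
```

Four of the eight possible vectors are enough to detect all ten single stuck-at faults of this small circuit, which illustrates why exhaustive testing is unnecessary.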

What can we do to increase testability? • Increase observability: • Add more pins (can be a problem) • Add a small “probe” bus and selectively enable different values onto it • Compress a sequence of values (for example, the value of a bus over many clock cycles) into a small number of bits for later read-out • Increase controllability: • Use multiplexers to isolate sub-modules and select sources of test data as inputs • Provide easy setup of internal states

Test approaches • Scan-based testing • Built-in self test

Scan-based technique • Minimize the use of additional I/O pins for testing • Use scan registers with both shift and parallel load capabilities. • Storage cells in registers act as observation points, control points or both. • Reduce testing of a sequential circuit to that of a combinational circuit

Scan: the idea • Two modes of operation: normal mode, and scan mode, in which all registers are chained into one long shift register that can be loaded and read out serially. [Figure: scan-based structure, with registers and combinational logic blocks chained between Scan-in and Scan-out]

Scan-based structure

Scan-path register [Figure: register cell with signals in, out, scanin, scanout, scan, keep, load and clock phases φ1, φ2]

Scan-based Test: operation [Figure: four latches with inputs In0 to In3, outputs Out0 to Out3, Test control signals, scanin and scanout, clocked by φ1 and φ2] • Sequence per test pattern: N cycles scan-in, 1 cycle evaluation, N cycles scan-out • Testing time per test pattern increases due to the shifting time in a long register.
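The cycle count per pattern can be made concrete with a small behavioral model of a scan chain. This is a sketch under simplifying assumptions: a single chain, one bit shifted per cycle, and a hypothetical `invert` function standing in for the combinational logic between the registers.

```python
def scan_test(pattern, comb_logic):
    """Shift a pattern in, evaluate once, shift the response out."""
    n = len(pattern)
    regs = [0] * n
    cycles = 0

    # Scan-in: N cycles, last register's bit is shifted in first.
    for bit in reversed(pattern):
        regs = [bit] + regs[:-1]
        cycles += 1

    # Evaluation: one cycle in normal mode captures the logic outputs.
    regs = list(comb_logic(regs))
    cycles += 1

    # Scan-out: N cycles, the last register appears at scanout first.
    shifted_out = []
    for _ in range(n):
        shifted_out.append(regs[-1])
        regs = [0] + regs[:-1]
        cycles += 1

    return shifted_out[::-1], cycles


def invert(bits):
    """Hypothetical combinational block: bitwise inversion."""
    return [1 - b for b in bits]


response, cycles = scan_test([1, 0, 1, 1], invert)
print(response, cycles)  # [0, 1, 0, 0] 9
```

For a chain of N registers the model spends 2N + 1 cycles per pattern, so the shifting time dominates as the chain grows.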

Scan-path Testing [Figure: datapath with inputs in0 and in1, an adder (+), a comparator (>), output out, and registers chained from scanin to scanout]

JTAG: Boundary Scan • Used for testing PCBs and multichip modules carrying multiple pins • Shift registers are placed in each chip close to the I/O pins in order to form a chain around the board for testing [Figure: PCB with scan path]

Built-In Self Test: the idea • Problem: the scan-based approach is very useful for testing combinational logic, but can be impractical for testing memory blocks because of the number of separate test values required to get adequate fault coverage. • Solution: use on-chip circuitry to generate test data and check the results. This can be run at every power-on to verify correct operation. • Generate pseudo-random data for most circuits using, e.g., a linear feedback shift register (LFSR). • For pseudo-random input data, compute some output values and compare them against the expected value (the “signature”) at the end of the test.

Built-in self test • Parts of the circuit are used to test the circuit itself. • Essential circuit modules: • Pseudo random pattern generator (PRPG) • Output response analyzer (ORA)

PRPG using LFSR • LFSR: linear feedback shift register • State sequence:
Q0 Q1 Q2
1  0  0
1  1  0
1  1  1
0  1  1
1  0  1
0  1  0
0  0  1
1  0  0
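A 3-bit LFSR of this kind can be sketched in a few lines. The feedback used here, Q0 XOR Q2 fed back into Q0 while Q0 shifts into Q1 and Q1 into Q2, is an assumption inferred from the state sequence listed above, since the slide's circuit diagram is not reproduced in this transcript; the seed 1 0 0 is likewise taken from the first listed state.

```python
def lfsr_states(seed=(1, 0, 0), steps=8):
    """Return successive (Q0, Q1, Q2) states of a 3-bit LFSR."""
    q0, q1, q2 = seed
    states = []
    for _ in range(steps):
        states.append((q0, q1, q2))
        # Assumed feedback: new Q0 = Q0 XOR Q2; the rest is a plain shift.
        q0, q1, q2 = q0 ^ q2, q0, q1
    return states


for state in lfsr_states():
    print(*state)
```

With this feedback the register visits all seven non-zero states before repeating, which is the maximum for three bits; the all-zero state is never reached and would lock the register if used as a seed.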
