
EE800 Circuit Elements in Digital Computations (Review)

EE800 Circuit Elements in Digital Computations (Review). Professor S. Ko, Electrical and Computer Engineering, University of Saskatchewan, Spring 2010. To begin with: combinational logic vs. sequential logic; Moore machine (output depends on current state only) vs. Mealy machine (output depends on current state and input).


Presentation Transcript


  1. EE800 Circuit Elements in Digital Computations (Review). Professor S. Ko, Electrical and Computer Engineering, University of Saskatchewan, Spring 2010. Summary, EE800

  2. To begin with
     • Combinational logic vs. sequential logic
     • Moore machine (output depends on current state only) vs. Mealy machine (output depends on current state + input)
     • Latch vs. flip-flop
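
The Moore/Mealy distinction on slide 2 can be made concrete with a small sketch. The "11" sequence detector below is a hypothetical example that does not appear in the slides; the function names and state encodings are assumptions chosen for illustration.

```python
def mealy_detect_11(bits):
    """Mealy machine: the output is a function of (current state, current input)."""
    state = 0  # 0 = previous bit was 0 (or start), 1 = previous bit was 1
    out = []
    for b in bits:
        out.append(1 if (state == 1 and b == 1) else 0)
        state = b
    return out


def moore_detect_11(bits):
    """Moore machine: the output is a function of the current state only."""
    state = 0  # 0 = no 1 seen, 1 = one 1 seen, 2 = "11" seen
    out = []
    for b in bits:
        out.append(1 if state == 2 else 0)  # output read before the transition
        state = 0 if b == 0 else min(state + 1, 2)
    return out


if __name__ == "__main__":
    bits = [0, 1, 1, 1, 0]
    print(mealy_detect_11(bits))  # [0, 0, 1, 1, 0]
    print(moore_detect_11(bits))  # [0, 0, 0, 1, 1]
```

In this sketch the Moore output lags the Mealy output by one step, because the Moore machine must first enter a state that encodes the detection.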

  3. Performance and Cost
     • Amdahl's Law: the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

     $$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

     Ex. A new CPU is 10 times faster than the original processor. The CPU is busy 40% of the time and waiting 60% of the time.
     Sol. $\text{Fraction}_{\text{enhanced}} = 0.4$, $\text{Speedup}_{\text{enhanced}} = 10$, so $\text{Speedup}_{\text{overall}} = 1 / \{0.6 + (0.4/10)\} = 1 / 0.64 \approx 1.56$
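
The worked example on slide 3 can be checked numerically. This is a minimal sketch; the function and parameter names are assumptions that mirror the symbols in the formula.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # Slide example: CPU busy 40% of the time, new CPU 10x faster.
    print(round(amdahl_speedup(0.4, 10), 2))  # 1.56
```

Even with an arbitrarily large `speedup_enhanced`, the result approaches 1 / 0.6, which is the limit the law describes.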

  4. Performance and Cost

     $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

     CPU performance is equally dependent on three characteristics: clock cycle (or rate), clock cycles per instruction, and instruction count.
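
A short sketch of the CPU time equation on slide 4. The function name and the workload numbers are hypothetical and serve only to show that the three factors multiply.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cycles_per_instruction * seconds_per_cycle


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI of 2, 1 GHz clock.
    base = cpu_time(1_000_000_000, 2.0, 1e9)
    # Halving any one factor halves the CPU time.
    fewer_instructions = cpu_time(500_000_000, 2.0, 1e9)
    lower_cpi = cpu_time(1_000_000_000, 1.0, 1e9)
    faster_clock = cpu_time(1_000_000_000, 2.0, 2e9)
    print(base, fewer_instructions, lower_cpi, faster_clock)
```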

  5. More..
     • RISC vs. CISC
     • Gustafson's law
     • Amdahl's law
     • Big Endian vs. Little Endian
     • Moore's law

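  6. AND/OR Expression & Realization. [Figure: Karnaugh map with columns xy and rows zw, each ordered 00, 01, 11, 10, and a two-level AND/OR gate network producing f from product terms over x, y, z, w and their complements.]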

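  7. AND/XOR Expression & Realization. [Figure: Karnaugh map with columns xy and rows zw, each ordered 00, 01, 11, 10, and an AND/XOR gate network producing f from product terms over x, y, z, w and their complements.]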

  8. More..
     • 5- and 6-variable K-maps
     • Quine-McCluskey (Q-M) algorithm
     • Two-level minimization
     • Multi-level minimization
     • Technology mapping (Shannon's theorem, Davio theorem)

  9. Simple Adders
     S = x ⊕ y ⊕ cin
     Cout = y cin + x y + x cin
     [Figure: binary half-adder (HA) and full-adder (FA).]
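
The half-adder and full-adder equations on slide 9 translate directly into bitwise operations. A minimal sketch, with function names chosen here for illustration:

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def full_adder(x, y, cin):
    """Return (S, Cout) with S = x XOR y XOR cin and Cout = y.cin + x.y + x.cin."""
    s = x ^ y ^ cin
    cout = (y & cin) | (x & y) | (x & cin)
    return s, cout


if __name__ == "__main__":
    # Print the full-adder truth table.
    for x in (0, 1):
        for y in (0, 1):
            for cin in (0, 1):
                print(x, y, cin, full_adder(x, y, cin))
```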

  10. Ripple-Carry Adder: Slow But Simple. [Figure: ripple-carry binary adder with 32-bit inputs and output; the critical path is marked.]

  11. Carry Propagation: 0111 + 0001 = 1000 (the carry ripples through the three low-order bits)
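
Slides 10 and 11 can be illustrated together with a bit-serial model of a ripple-carry adder. This is a sketch in software, not a hardware description; the function name and the `width` parameter are assumptions.

```python
def ripple_carry_add(a, b, width=32, cin=0):
    """Add two unsigned integers one bit at a time, as a chain of full adders.

    Returns (sum modulo 2**width, carry out of the top bit).
    """
    carry = cin
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                          # full-adder sum bit
        carry = (x & y) | (x & carry) | (y & carry)  # carry into the next stage
        result |= s << i
    return result, carry


if __name__ == "__main__":
    # Slide 11 example: 0111 + 0001 = 1000.
    total, cout = ripple_carry_add(0b0111, 0b0001, width=4)
    print(format(total, "04b"), cout)  # 1000 0
```

Each stage must wait for the carry from the stage below it, which is why the delay of this design grows linearly with the word width.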

  12. Carry lookahead adder
      Multiplication: Booth algorithm
      Division: restoring, non-restoring, SRT..
      Comparator
      Fixed point vs. floating point
      etc.
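
The slide only names the carry lookahead adder. As a hedged illustration of the idea, the sketch below computes every carry directly from generate (g) and propagate (p) signals and the carry-in, rather than waiting for the previous stage. The function name and the 4-bit default width are assumptions.

```python
def cla_add(a, b, width=4, c0=0):
    """Carry-lookahead addition: each carry is expanded in terms of g, p and c0.

    c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]c0
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [c0]
    for i in range(width):
        c = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # sum bit = p XOR carry-in
    return total, carries[width]


if __name__ == "__main__":
    print(cla_add(0b0111, 0b0001))  # (8, 0)
```

In hardware the inner loop corresponds to a wider gate per carry, trading area for a shorter critical path than the ripple-carry design.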

  13. Objectives
      • IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
        • DFP number formats
        • DFP number encoding
        • DFP arithmetic operations
        • DFP rounding modes
        • DFP exception handling
      • Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)
        • DFP adder/subtractor
        • DFP multiplier
        • DFP divider
        • DFP transcendental function computation
      EE800, U of S

  14. DFP Add/Sub Data Flow

  15. Architecture of DFP Multiplier

  16. DFP Division Data Flow
      • Unpack the decimal floating-point numbers
      • Check for zeros and infinity
      • Subtract exponents
      • Divide mantissas
      • Normalize and detect overflow and underflow
      • Perform rounding
      • Replace sign
      • Pack
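
The data flow above can be sketched in software under strong simplifying assumptions. Operands are modeled as hypothetical `(sign, coefficient, exponent)` triples meaning (-1)^sign × coefficient × 10^exponent, not as the packed IEEE 754-2008 encodings. The default precision of 7 digits matches decimal32, rounding is round-half-even only, and infinity, NaN, and overflow/underflow detection are left out. This is not the architecture from the lectures.

```python
def dfp_divide(x, y, precision=7):
    """Divide two (sign, coefficient, exponent) triples; returns a triple."""
    sx, cx, ex = x                       # unpack
    sy, cy, ey = y
    sign = sx ^ sy                       # sign of the result
    if cy == 0:                          # check for zeros
        raise ZeroDivisionError("division by zero (hardware would signal an exception)")
    if cx == 0:
        return (sign, 0, ex - ey)
    exponent = ex - ey                   # subtract exponents
    # Scale the dividend so the integer quotient has at least precision + 1 digits.
    shift = max(0, precision + 1 + len(str(cy)) - len(str(cx)))
    q, r = divmod(cx * 10**shift, cy)    # divide coefficients
    exponent -= shift
    extra = len(str(q)) - precision      # normalize to `precision` digits
    q, low = divmod(q, 10**extra)
    exponent += extra
    half = 10**extra // 2
    inexact = (low != 0) or (r != 0)
    # Round half to even, using the dropped digits and the division remainder.
    if low > half or (low == half and (r > 0 or q % 2 == 1)):
        q += 1
        if q == 10**precision:           # rounding carried into a new digit
            q //= 10
            exponent += 1
    if not inexact:
        # Exact result: drop trailing zeros, moving toward the exponent ex - ey.
        while q % 10 == 0 and exponent < ex - ey:
            q //= 10
            exponent += 1
    return (sign, q, exponent)           # pack


if __name__ == "__main__":
    print(dfp_divide((0, 1, 0), (0, 4, 0)))  # (0, 25, -2), i.e. 0.25
    print(dfp_divide((0, 2, 0), (0, 3, 0)))  # (0, 6666667, -7)
```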

  17. Architecture: Decimal Log Converter

  18. Architecture: Decimal Antilog Converter

  19. Memory Hierarchy
      • Principle of locality + smaller hardware is faster + make the common case faster + CPU-memory performance gap
      • 4 memory hierarchy questions
        • Where can a block be placed in the upper level?
        • How is a block found if it is in the upper level?
        • Which block should be replaced on a miss?
        • What happens on a write?
      • Reducing miss rate (the three classes of misses):
        • Compulsory: the first access to a block cannot be in the cache
        • Capacity: the cache cannot contain all the blocks needed during execution of a program
        • Conflict: in set-associative or direct-mapped caches, a block may be discarded and later retrieved if too many blocks map to its set
      • Performance = f (hit time, miss rate, miss penalty)
        • Danger of concentrating on just one when evaluating performance
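
The slide leaves f unspecified. One common choice, assumed here, is the average memory access time, AMAT = hit time + miss rate × miss penalty. The two cache configurations below are hypothetical numbers picked to show why optimizing a single term can mislead.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical caches, times in clock cycles.
    small_fast = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
    # Lower miss rate, but the larger cache takes longer to hit.
    large_slow = amat(hit_time=2, miss_rate=0.045, miss_penalty=100)
    print(small_fast, large_slow)
```

In this made-up comparison the cache with the lower miss rate has the higher average access time, because its hit time doubled.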

  20. Memory Hierarchy: Cache Optimization Summary
      (MR = miss rate, MP = miss penalty, HT = hit time; + improves the factor, – hurts it)

      Technique                           MR   MP   HT   Complexity
      Miss rate
        Larger Block Size                 +    –         0
        Higher Associativity              +         –    1
        Victim Caches                     +    +         2
        Pseudo-Associative Caches         +              2
        HW Prefetching of Instr/Data      +              2
        Compiler Controlled Prefetching   +              3
        Compiler Reduce Misses            +              0
      Miss penalty
        Priority to Read Misses                +         1
        Subblock Placement                     +         1
        Early Restart & Critical Word 1st      +         2
        Non-Blocking Caches                    +         3
        2nd Level Caches                       +         2
      Hit time
        Small & Simple Caches             –         +    0
        Avoiding Address Translation                +    2
        Pipelining Writes                           +    1

  21. Multiprocessors: An Example Snoopy Protocol
      • Invalidation protocol, write-back cache
      • Each block of memory is in one state:
        • Clean in all caches and up-to-date in memory (Shared)
        • or Dirty in exactly one cache (Exclusive)
        • or Not in any caches
      • Each cache block is in one state (track these):
        • Shared: block can be read
        • or Exclusive: cache has the only copy, it is writable, and dirty
        • or Invalid: block contains no data
      • Read misses: cause all caches to snoop the bus
      • Writes to a clean line are treated as misses
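
The three cache-block states above can be sketched as a small simulation of one memory block shared by several caches. The class and method names are hypothetical, and bus arbitration, data transfer, and replacement are not modeled.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"


class SnoopyBlock:
    """Coherence state of ONE memory block across n caches on a shared bus."""

    def __init__(self, n_caches):
        self.state = [INVALID] * n_caches
        self.write_backs = 0

    def _snoop(self, requester, is_write):
        # Every other cache observes the miss placed on the bus.
        for c, s in enumerate(self.state):
            if c == requester:
                continue
            if s == EXCLUSIVE:
                self.write_backs += 1  # dirty copy is written back
                self.state[c] = INVALID if is_write else SHARED
            elif s == SHARED and is_write:
                self.state[c] = INVALID  # invalidation protocol

    def read(self, cache):
        if self.state[cache] == INVALID:  # read miss
            self._snoop(cache, is_write=False)
            self.state[cache] = SHARED

    def write(self, cache):
        # A write to an Invalid or clean (Shared) line is treated as a miss.
        if self.state[cache] != EXCLUSIVE:
            self._snoop(cache, is_write=True)
            self.state[cache] = EXCLUSIVE


if __name__ == "__main__":
    blk = SnoopyBlock(3)
    blk.read(0)
    blk.read(1)
    blk.write(1)
    blk.read(2)
    print(blk.state, blk.write_backs)  # ['Invalid', 'Shared', 'Shared'] 1
```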

  22. Multiprocessors: A Snoopy Cache Coherence Protocol. [Figure: four processors (P), each with a cache (C), connected by a bus to memory.] Finite-state control mechanism for a bus-based snoopy cache coherence protocol with write-back caches.

  23. Multiprocessors: Directories to Guide Data Access. [Figure: distributed shared-memory multiprocessor with a cache, directory, and memory module associated with each processor.]

  24. Multiprocessors: Directory-Based Cache Coherence. [Figure: states and transitions for a directory entry in a directory-based cache coherence protocol (c is the requesting cache).]

  25. Multiprocessors: Snooping vs. Directory
      • Snooping
        • Useful for smaller systems
        • Send all requests for data to all processors
        • Processors snoop to see if they have a copy and respond accordingly
        • Requires broadcast, since caching information is at processors
        • Works well with a bus (natural broadcast medium)
        • But scaling is limited by cache miss and write traffic saturating the bus
        • Dominates for small-scale machines (most of the market)
      • Directory-based schemes
        • Scalable multiprocessor solution
        • Keep track of what is being shared in a directory
        • Distributed memory → distributed directory (avoids bottlenecks)
        • Send point-to-point requests to processors
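
The contrast between broadcast and point-to-point requests can be sketched with a minimal directory entry for one block. The state names, message names, and class below are assumptions for illustration and do not reproduce the state diagram from the slides.

```python
class DirectoryEntry:
    """Tracks which caches hold one memory block."""

    def __init__(self):
        self.state = "Uncached"
        self.sharers = set()

    def read_miss(self, c):
        """Cache c requests a readable copy; returns the messages the directory sends."""
        msgs = []
        if self.state == "Exclusive":
            owner = next(iter(self.sharers))
            msgs.append(("fetch", owner))  # owner supplies the dirty block
        self.sharers.add(c)
        self.state = "Shared"
        return msgs

    def write_miss(self, c):
        """Cache c requests a writable copy; only current holders are contacted."""
        kind = "fetch_invalidate" if self.state == "Exclusive" else "invalidate"
        msgs = [(kind, s) for s in sorted(self.sharers) if s != c]
        self.sharers = {c}
        self.state = "Exclusive"
        return msgs


if __name__ == "__main__":
    entry = DirectoryEntry()
    entry.read_miss(0)
    entry.read_miss(2)
    msgs = entry.write_miss(1)
    print(msgs)  # [('invalidate', 0), ('invalidate', 2)]
    # A snooping bus with 16 processors would broadcast the same write miss
    # to all 15 other caches; the directory contacts only the 2 sharers.
    print(len(msgs), 16 - 1)
```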
