Concurrent and Distributed Applications - Testing and Bug Patterns


Presentation Transcript


  1. Concurrent and Distributed Applications - Testing and Bug Patterns Eitan Farchi

  2. Outline • ConTest • Abstract categories of concurrent bugs • Sub-categories of concurrent bugs • Testing: how to reveal concurrent bugs Contest

  3. Why is Concurrent Testing Hard? • Concurrency introduces non-determinism • Multiple executions of the same test may have different interleavings (and different results!) • An interleaving is the relative execution order of the program threads • Re-executing a test on a single stand-alone processor is not useful • Debugging affects the timing • No useful coverage measures for the interleaving space • Result: Most bugs are found in system tests, stress tests, or by the customer

  4. Re-executing on a Stand-alone Processor is Not Useful

  5. How Does ConTest Find Bugs? • We instrument every concurrent event • Concurrent events are events whose order determines the result of the program • At every concurrent event, a random decision is made whether to cause a context switch • For example, by using a sleep statement • Philosophy: • Modify the program so that it is more likely to exhibit bugs (without introducing new bugs) • Minimize impact on the testing process (under-the-hood technology) • Re-use existing tests • Utilize idle computer time
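The noise-injection idea on this slide can be sketched roughly as follows. This is not ConTest's implementation: the class `Noise`, the method `maybeSwitch`, and the seed, probability, and delay parameters are all hypothetical names chosen for illustration. An instrumented program would call `maybeSwitch()` just before each concurrent event.

```java
import java.util.Random;

// Hypothetical noise-injection helper; an illustration, not ConTest's code.
class Noise {
    private final Random random;
    private final double probability;  // chance of injecting noise at an event
    private final int maxSleepMillis;  // upper bound on the injected delay

    Noise(long seed, double probability, int maxSleepMillis) {
        this.random = new Random(seed);
        this.probability = probability;
        this.maxSleepMillis = maxSleepMillis;
    }

    // Call just before a concurrent event (shared-memory access or
    // synchronization operation). Returns true if noise was injected.
    boolean maybeSwitch() {
        boolean inject;
        int delay;
        synchronized (random) {
            // Draw both values together so a fixed seed gives a fixed sequence.
            inject = random.nextDouble() < probability;
            delay = random.nextInt(maxSleepMillis + 1);
        }
        if (!inject) {
            return false;
        }
        try {
            if (delay == 0) {
                Thread.yield();       // weakest noise: only hint at a switch
            } else {
                Thread.sleep(delay);  // stronger noise: give up the CPU
            }
        } catch (InterruptedException e) {
            // Restore the interrupt so the program under test still sees it.
            Thread.currentThread().interrupt();
        }
        return true;
    }

    public static void main(String[] args) {
        Noise noise = new Noise(1L, 0.5, 2);
        int injected = 0;
        for (int i = 0; i < 20; i++) {
            if (noise.maybeSwitch()) {
                injected++;
            }
        }
        System.out.println("noise injected at " + injected + " of 20 events");
    }
}
```

The delay is drawn outside the sleep so that threads do not queue up behind one another while noise is being applied.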

  6. ConTest • Finds concurrency bugs in industrial components • Correctness issues solved • E.g., the interrupt and sleep pair • Bugs found by ConTest are real bugs; a formal proof exists • Several different noise primitives (sleep, yield, synchronize) • Noise strength is randomly chosen around the average • Randomize on the noise primitive • Bug pattern and coverage-based heuristics • Replay increases the probability that a concurrency bug is reproduced once found

  7. How much is 1+10+100+1000+10000?

  8. Coverage Results Without ConTest

  9. Coverage Results With ConTest

  10. Bug found by ConTest in an internet search algorithm • The internet search algorithm is about 1000 lines of code • Bug description: • if (connection != null) connection.setStopFlag(); • connection is checked and found to be non-null • The thread loses the CPU • connection is set to null before the thread regains the CPU • If this happens before connection.setStopFlag(); is executed, an exception is thrown
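The check-then-act race described on this slide can be illustrated with a small sketch. The classes `Connection` and `SearchTask` and the methods `disconnect`, `stopRacy`, and `stopSafe` are hypothetical stand-ins; only the expression `connection.setStopFlag()` comes from the slide. Reading the shared field once into a local variable is one possible fix, not necessarily the one applied to the original code.

```java
// Hypothetical stand-in for the connection class mentioned on the slide.
class Connection {
    private volatile boolean stopFlag;

    void setStopFlag() { stopFlag = true; }

    boolean isStopped() { return stopFlag; }
}

class SearchTask {
    // Shared field that another thread may set to null at any time.
    private volatile Connection connection = new Connection();

    void disconnect() { connection = null; }

    // Racy version, as on the slide: the field is read twice, so it can
    // become null between the check and the call.
    void stopRacy() {
        if (connection != null) {
            connection.setStopFlag();
        }
    }

    // One possible fix: read the field once and work on the local copy.
    // Returns true if a connection was present and flagged.
    boolean stopSafe() {
        Connection c = connection;
        if (c != null) {
            c.setStopFlag();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SearchTask task = new SearchTask();
        System.out.println(task.stopSafe());  // connection present
        task.disconnect();
        System.out.println(task.stopSafe());  // connection gone, no exception
    }
}
```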

  11. A servlet server bug • A high availability servlet server has a hash table that is kept synchronized using a cluster • The high availability servlet has a primary and a backup node • The primary node caches the hash table • Objects are trees: deep copy is obtained using an auxiliary table

  12. A servlet server bug (continued) • thread1: • A node becomes primary • Start deep copying of hash table to CACHE table using the auxiliary table • thread2: • A cache miss on object A occurs • Start deep copying of object A from hash table to CACHE table using auxiliary table • thread1: • Complete deep copying of hash table to the CACHE table. Auxiliary table is discarded • thread2: • An attempt is made to access the auxiliary table

  13. Outline • ConTest • Abstract categories of concurrent bugs • Sub-categories of concurrent bugs • Testing: how to reveal concurrent bugs

  14. Categorizing concurrent bugs • A concurrent event is a synchronization event or a memory access event • An interleaving is a sequence of concurrent events that occurred in a specific execution of P (Java replay, Choi and Srinivasan, 1998) • For a given program P • I(P) is the space of all possible interleavings of P • C(P) is the space of all correct interleavings of P

  15. Categorizing concurrent bugs (continued) • I(P) - C(P) is the set of erroneous interleavings • Objective: characterize the gap between I(P) and C(P) • (Figure: the sets I(P) and C(P))

  16. Example • Given a two-byte global variable X • Two threads execute (X = 257) in parallel with (X = 0) • The programmer’s intended results are 0 or 257. Thus, C(P) contains the two orderings in which one assignment completes before the other starts: • X = 257, then X = 0 (result 0) • X = 0, then X = 257 (result 257)

  17. Example (continued) • The programmer ignored the non-atomicity of the assignment operation • A possible interleaving in I(P) - C(P), listed in execution order (X[0] is the least significant byte): • X[0] = 0 (from X = 0) • X[0] = 1 (from X = 257) • X[1] = 1 (from X = 257) • X[1] = 0 (from X = 0) • Result is 1
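The erroneous interleaving from this example can be replayed step by step in a single thread, which makes the arithmetic easy to check. The class `TornWrite` and the method `replay` are hypothetical names; the byte array stands in for the two-byte variable X. Note that Java itself guarantees atomic writes for 16-bit and 32-bit primitives and permits this kind of tearing only for non-volatile `long` and `double`, so the sketch simulates the byte-wise stores rather than racing real threads.

```java
class TornWrite {
    // Replays the four byte-wise stores in the order given on the slide.
    // x[0] is the least significant byte, x[1] the most significant byte.
    static int replay() {
        int[] x = new int[2];
        x[0] = 0;  // X = 0 stores its low byte
        x[0] = 1;  // X = 257 stores its low byte (257 = 0x0101)
        x[1] = 1;  // X = 257 stores its high byte
        x[1] = 0;  // X = 0 stores its high byte
        return (x[1] << 8) | x[0];
    }

    public static void main(String[] args) {
        // Neither 0 nor 257: the low byte of one write, the high byte of the other.
        System.out.println(replay());  // prints 1
    }
}
```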

  18. Outline • Introduction • Abstract categories of concurrent bugs • Explain why the I(P) - C(P) gap is created • Sub-categories of concurrent bugs, or bug tales • Testing: how to reveal concurrent bugs

  19. The gap is created when there is • A weak reality principle: the programmer incorrectly assumes that a code segment is protected • As in the first example • Denial: the programmer incorrectly assumes that an interleaving is impossible • Fork/join design pattern when the join is “implemented” using sleep() • Blocking: the programmer incorrectly assumes that a code segment will never block (i.e., wait for an event indefinitely) • A server assumes that incoming messages will arrive, but they never do
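For the denial example, a join "implemented" with sleep() only makes the bad interleaving unlikely, whereas `Thread.join()` rules it out. The following is a minimal sketch under assumed names (`ForkJoinDemo`, `computeWithJoin`); the summing workload is invented for illustration.

```java
class ForkJoinDemo {
    // Forks one worker and waits for it with join() rather than sleep().
    static int computeWithJoin() {
        int[] result = new int[1];
        Thread worker = new Thread(() -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) {
                sum += i;
            }
            result[0] = sum;
        });
        worker.start();
        try {
            // A Thread.sleep(...) here would be the "denial" bug: nothing
            // guarantees the worker has finished when the sleep returns.
            // join() waits for termination and makes the worker's writes
            // visible to this thread.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(computeWithJoin());  // prints 5050
    }
}
```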

  20. Weak reality bug tales • An operation is assumed to be atomic but is actually not • Source code operations often seem to the inexperienced programmer to be atomic when they are not • Example: x++
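The x++ example hides a read, an add, and a write. The sketch below first replays one bad interleaving of two increments deterministically, then shows one common remedy, `java.util.concurrent.atomic.AtomicInteger`. The class and method names (`IncrementDemo`, `lostUpdate`, `atomicIncrement`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    // Replays one erroneous interleaving of two x++ operations.
    static int lostUpdate() {
        int x = 0;
        int t1 = x;  // thread 1 reads 0
        int t2 = x;  // thread 2 reads 0
        x = t1 + 1;  // thread 1 writes 1
        x = t2 + 1;  // thread 2 writes 1, overwriting thread 1's update
        return x;
    }

    // Two real threads incrementing a shared counter atomically.
    static int atomicIncrement(int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();  // atomic read-modify-write
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(lostUpdate());            // prints 1, not 2
        System.out.println(atomicIncrement(10000));  // prints 20000
    }
}
```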

  21. Weak reality bug tales (continued) • Two-stage access: • We are given two tables • To change a record in the second table, the first table is queried and then the second • Each table is protected by a separate lock • Sequence:
      lock [ first query: key1 -> key2 ]
      window: the tables can be changed here
      lock [ second query: key2 -> record to be changed ]
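A minimal sketch of the two-stage access pattern, assuming two in-memory maps stand in for the tables. All names (`TwoTables`, `updateRacy`, `updateSafe`, `lock1`, `lock2`) are hypothetical. Holding the first lock across both queries is one possible way to close the window; it only works if every writer acquires the locks in the same order, as `put` does here.

```java
import java.util.HashMap;
import java.util.Map;

class TwoTables {
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();
    private final Map<String, String> first = new HashMap<>();   // key1 -> key2
    private final Map<String, String> second = new HashMap<>();  // key2 -> record

    void put(String key1, String key2, String record) {
        synchronized (lock1) {
            synchronized (lock2) {
                first.put(key1, key2);
                second.put(key2, record);
            }
        }
    }

    // Racy version: each query is locked, but not the pair of queries.
    boolean updateRacy(String key1, String newRecord) {
        String key2;
        synchronized (lock1) {
            key2 = first.get(key1);
        }
        // Window: another thread may change either table here.
        synchronized (lock2) {
            if (key2 == null || !second.containsKey(key2)) {
                return false;
            }
            second.put(key2, newRecord);
            return true;
        }
    }

    // One possible fix: keep lock1 held while the second table is updated.
    boolean updateSafe(String key1, String newRecord) {
        synchronized (lock1) {
            String key2 = first.get(key1);
            if (key2 == null) {
                return false;
            }
            synchronized (lock2) {
                if (!second.containsKey(key2)) {
                    return false;
                }
                second.put(key2, newRecord);
                return true;
            }
        }
    }

    String record(String key1) {
        synchronized (lock1) {
            synchronized (lock2) {
                String key2 = first.get(key1);
                return key2 == null ? null : second.get(key2);
            }
        }
    }

    public static void main(String[] args) {
        TwoTables tables = new TwoTables();
        tables.put("a", "k", "old");
        tables.updateSafe("a", "new");
        System.out.println(tables.record("a"));  // prints new
    }
}
```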

  22. Weak reality bug tales (continued) • Wrong lock or no lock • Protection of thread one does not apply to thread two • There is an access protocol that is not followed due to: • A new team member • An attempt to improve performance • Example:
      Thread 1: synchronized (o) { x++; }
      Thread 2: x++;

  23. Weak reality bug tales (continued) • Double-checked locking • At object initialization time, the thread local copy of object fields is initialized but not written to the heap • Result: heap view is partially initialized while reference is not null • Source code level is misleading (looks atomic) • Bug pattern is well-documented on the internet
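For reference, here is a sketch of the double-checked locking idiom in the form that is safe under the Java memory model from Java 5 onward, where the shared reference is declared `volatile`. Without `volatile`, another thread may observe a non-null reference to a partially initialized object, which is the bug the slide describes. The classes `Config` and `LazyHolder` and the value 42 are hypothetical.

```java
// Hypothetical object that is expensive to create.
class Config {
    final int value;

    Config(int value) { this.value = value; }
}

class LazyHolder {
    // volatile is essential: it prevents other threads from seeing the
    // reference before the object's fields have been written.
    private static volatile Config instance;

    static Config get() {
        Config local = instance;          // first check, without the lock
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = instance;         // second check, under the lock
                if (local == null) {
                    local = new Config(42);
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());  // prints true
        System.out.println(get().value);     // prints 42
    }
}
```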

  24. Denial bug tales • One example is adding sleep() statements to ensure that only the correct interleavings occur • As in the fork/join example, partial, inconsistent results are used at the join stage • Losing notify: the notify is “lost” because it occurs before the thread executes the wait() primitive • The gap was created because the programmer didn’t think the notify would occur before the wait • Example (Thread 1 runs first):
      Thread 1: synchronized (o) { o.notifyAll(); }
      Thread 2: synchronized (o) { o.wait(); }
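A common way to avoid the lost notify is to record the event in a flag guarded by the same lock and to wait in a loop on that flag. The sketch below assumes hypothetical names (`Signal`, `send`, `await`) and adds a timeout purely so the example cannot hang.

```java
class Signal {
    private final Object lock = new Object();
    private boolean signalled;  // guarded by lock

    void send() {
        synchronized (lock) {
            signalled = true;   // remember the event, so it cannot be lost
            lock.notifyAll();
        }
    }

    // Returns true if the signal was observed before the timeout elapsed.
    boolean await(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            // Loop on the condition: this covers a notify that happened
            // before the wait, as well as spurious wakeups.
            while (!signalled) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Signal signal = new Signal();
        signal.send();                           // notify happens first
        System.out.println(signal.await(1000));  // prints true
    }
}
```

With a bare `o.wait()` and no flag, the same call order would leave the waiting thread blocked.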

  25. Blocking bug tales • Blocking critical section • In the design of a critical section protocol we assume that the thread executing the critical section will eventually exit • This assumption might be broken if the code is written by a third party or a different group • The tale of the orphaned thread • A single master thread drives actions of other threads • Messages are put on the queue by the master thread and processed by the worker threads • Abnormal termination of the master thread results in the remaining threads being orphaned • The system often blocks

  26. Halt-one-thread heuristic • Originally motivated by the notify bug pattern but has wider applicability • Increases the chance of lost notifies • A thread is randomly chosen to halt • Other threads record their progress in a global flag • The halted thread periodically checks if any thread is advancing • If none is advancing, the halted thread advances; otherwise the flag indicating that other threads are advancing is reset
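The polling loop behind this heuristic might look roughly like the sketch below. It is an illustration built only from the slide's description, not ConTest's code: the names `HaltOneThread`, `reportProgress`, and `haltUntilOthersStall`, the polling interval, and the `maxRounds` bound are all assumptions. The random choice of which thread to halt is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class HaltOneThread {
    // Global flag set by the non-halted threads whenever they make progress.
    private final AtomicBoolean othersAdvanced = new AtomicBoolean(false);

    // Called by every non-halted thread at each instrumented event.
    void reportProgress() {
        othersAdvanced.set(true);
    }

    // Called by the thread chosen to halt. Returns the number of polling
    // rounds it stayed halted. maxRounds is an assumed safety bound.
    int haltUntilOthersStall(long pollMillis, int maxRounds) {
        int rounds = 0;
        while (rounds < maxRounds) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            rounds++;
            // Read and reset the flag in one atomic step.
            boolean advanced = othersAdvanced.getAndSet(false);
            if (!advanced) {
                break;  // nobody else is advancing, so this thread resumes
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        HaltOneThread heuristic = new HaltOneThread();
        heuristic.reportProgress();
        System.out.println(heuristic.haltUntilOthersStall(1, 10));  // prints 2
    }
}
```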

  27. Summary • The interleaving space can be used to categorize concurrent bugs • In addition to the known deadlock bug, there are three types of concurrent bugs • Weak reality: non-protected code assumed to be protected • Denial: interleaving assumed to never occur • Blocking: blocking code assumed to be non-blocking • ConTest uses bug patterns and coverage-based heuristics to find bugs
