
COMS W3156: Software Engineering, Fall 2001



Presentation Transcript


  1. COMS W3156:Software Engineering, Fall 2001 Lecture #2: The Open Class Janak J Parekh janak@cs.columbia.edu

  2. Important terminology (I) • NEW: Different colors from previous version. • ALL NEW: Software is not compatible with previous version. • UNMATCHED: Almost as good as the competition. • ADVANCED DESIGN: Upper management doesn't understand it. • NO MAINTENANCE: Impossible to fix.

  3. Important terminology (II) • BREAKTHROUGH: It finally booted on the first try. • DESIGN SIMPLICITY: Developed on a shoestring budget. • UPGRADED: Did not work the first time. • UPGRADED AND IMPROVED: Did not work the second time.

  4. Some leftover points from last class • Plagiarism: I was being cute last time – you will get into trouble if you are caught. • Books: They’re available from Papyrus, 114th and Broadway • Office hours: Sorry about this week… • Questionnaire: finally done, see http://softe.cs.columbia.edu • C/C++ students, talk to me

  5. Next class – course “begins” • Read chapters 1 and 4 of Schach, if you have the book • The first one should be a breeze (introduction); the fourth isn’t that bad (teams) • We will also start discussing the project in detail in the next class • Recitations will begin next week

  6. Why Software Engineering? • We started discussing this last class • Mythical Man-Month: start reading it when you get a chance; we’ll go over it later • In the meantime, let’s discuss some case studies of how software engineering (or lack thereof) changed certain operations

  7. Success/Failure: Mars Rover (I) • http://catless.ncl.ac.uk/Risks/19.49.html#subj1 • To the public, it was said in 1997 that “software glitches” and “too many things trying to be done at once” were the cause of the Pathfinder’s failures • In reality, “priority inversion” was at fault

  8. Success/Failure: Mars Rover (II) • There were three main threads, scheduled preemptively • Information bus data-moving: high priority, frequent • Meteorological data-gathering: low priority, occasional • Communications task: medium priority, occasional • Occasionally, the communications task would be scheduled while an information bus operation was blocked, since the bus thread was waiting on the meteorological data-gathering task

  9. Success/Failure: Mars Rover (III) • The communications task would prevent the meteorological data work from being done, since it ran at higher priority • A watchdog reset would then occur, since the info bus appeared “dead”, restarting the entire system • The low-priority meteorological task upended the system: “priority inversion”
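The scheduling pattern above can be sketched as a toy simulation. This is a hedged illustration only: the task names, priority values, and step counts are invented for the sketch, not taken from the Pathfinder's actual VxWorks configuration. Without priority inheritance, the medium-priority communications task monopolizes the CPU while the high-priority bus task stays blocked:

```python
# Toy fixed-priority scheduler illustrating priority inversion.
# (Illustrative only: names and priorities are invented.)

def run(steps=10):
    """Return which task runs at each scheduler step."""
    # The low-priority meteo task holds the mutex the high-priority
    # bus task needs, so bus is blocked and cannot be scheduled.
    ready = {"comm": 2, "meteo": 1}   # runnable tasks -> priority
    trace = []
    for _ in range(steps):
        # The scheduler always picks the highest-priority runnable
        # task: comm (2) beats meteo (1), so meteo never runs and
        # never gets a chance to release the mutex.
        trace.append(max(ready, key=ready.get))
    return trace

trace = run()
print(trace)  # ['comm', 'comm', ...] -- the bus task starves
```

In the real system, the watchdog noticed the bus had made no progress and reset the spacecraft.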

  10. Success/Failure: Mars Rover (IV) • Good news • They had left debugging mode on • The Rover was running VxWorks, a small real-time OS that has tracing capabilities • They managed to trace the source of the problem • Lastly, VxWorks supports priority inheritance: a lower-priority task holding a resource temporarily inherits the priority of a higher-priority task blocked on that resource • As a consequence, they were able to upload a small change that solved the crash
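Priority inheritance can be sketched the same way (again a toy model with invented task names and step counts, not the real VxWorks scheduler): while the low-priority meteorological task holds a mutex that the high-priority bus task is blocked on, it temporarily runs at the bus task's priority, so the medium-priority communications task can no longer preempt it:

```python
# Toy scheduler trace with priority inheritance enabled.
# (Illustrative only: names, priorities, and timings are invented.)

def run_with_inheritance(steps=6):
    """Return which task runs at each scheduler step."""
    meteo_work_left = 3              # steps until meteo releases the mutex
    trace = []
    for _ in range(steps):
        if meteo_work_left > 0:
            # meteo holds a mutex the priority-3 bus task is blocked on,
            # so it temporarily runs at priority 3 and outranks comm (2)
            trace.append("meteo")
            meteo_work_left -= 1
        else:
            trace.append("bus")      # mutex released; bus runs again
    return trace

print(run_with_inheritance())
# ['meteo', 'meteo', 'meteo', 'bus', 'bus', 'bus']
```

The low-priority task finishes quickly instead of starving behind the medium-priority one, and the high-priority bus task resumes before the watchdog can fire.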

  11. Lessons: Mars Rover • Black box testing would have been impossible – they had to see interrupts, etc. • Therefore, leaving the debugging facilities on afterwards was a big win here • Designing for maintenance • The fact that the data bus task ran frequently and was short guaranteed nothing

  12. Failure: Therac-25 (I) • http://sunnyday.mit.edu/papers/therac.pdf – don’t read it if you are squeamish • The Therac-25 was a linear accelerator released in 1982 that treated cancer by delivering limited doses of radiation • This new model was software-controlled as opposed to hardware-controlled; previous units had used software merely for convenience

  13. Failure: Therac-25 (II) • Controlled by a PDP-11 computer; safety was handled in software • In case of error, the software was designed to prevent harmful effects • However, when a software error occurred, the operator saw only a cryptic code: “MALFUNCTION xx”, where 1 < xx < 64

  14. Failure: Therac-25 (III) • Operators became insensitive to the errors; they happened often, and operators were told it was impossible to overdose a patient • However, from 1985 to 1987, six people received massive overdoses of radiation; several of them died

  15. Failure: Therac-25 (IV) • Main cause: • A race condition that occurred when the operator entered data quickly, then hit the UP arrow key to correct it – shared values weren’t reset properly • AECL (the company) never noticed quick data entry – their testers didn’t use the machine on a daily basis • Apparently the problem existed in previous units too, but those had a hardware interlock mechanism that prevented it; here, they trusted the software and removed the hardware interlock
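The lost-update flavor of this race can be sketched as follows. This is a hypothetical simplification: the real bug involved shared mode/position variables and a multi-second magnet-setting phase, and the class and field names here are invented. The treatment task snapshots the operator's entries once, so an edit made during the long setup window is silently ignored:

```python
import threading

class Console:
    """Toy model of a console whose treatment task reads settings once.
    (Invented names; not the actual Therac-25 software.)"""
    def __init__(self):
        self.mode = "xray"                  # operator's current entry
        self.beam_mode = None               # what will actually fire
        self.setup_started = threading.Event()
        self.edit_done = threading.Event()

    def start_treatment(self):
        snapshot = self.mode          # parameters are read once, up front
        self.setup_started.set()      # long "hardware setup" phase begins
        self.edit_done.wait()         # ...during which the operator can edit
        self.beam_mode = snapshot     # the edit is silently lost

console = Console()
t = threading.Thread(target=console.start_treatment)
t.start()
console.setup_started.wait()
console.mode = "electron"             # operator's quick correction
console.edit_done.set()
t.join()
print(console.beam_mode)              # "xray" -- the correction never took effect
```

The Events here only make the unlucky interleaving deterministic for the demo; on the real machine, the same window opened whenever the operator typed faster than the setup phase completed.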

  16. Lessons from Therac-25 (I) • Overconfidence in software, especially for embedded systems • Reliability != safety • No defensive design, bizarre error messages • They just “bugfixed”, didn’t look for root causes • Complacency

  17. Lessons from Therac-25 (II) • Improper software engineering practices • Most testing, in reality, was done on a simulator and on the complete assembled unit; little if any unit-level software testing • They claimed 2700 hours of testing; it was really 2700 hours “of use” • Overly complex, poorly organized design • Blind software reuse

  18. Is there a “successful” way? • Hard to say – software engineering is an imprecise field • There’s always “room to improve” • Nevertheless, there are many examples of million-dollar savings from initial investments that seemed large but were quickly offset by the cost savings • See the book
