An Approach to Software Testing of Machine Learning Applications

Chris Murphy, Gail Kaiser, Marta Arias (Columbia University)

Introduction • We are investigating the quality assurance of Machine Learning (ML) applications • Currently we are concerned with a real-world application for potential future use in predicting electrical device failures • Machine Learning applications belong to a class of programs for which there is “no reliable oracle” • These are also known as “non-testable programs” and could fall into Davis and Weyuker’s class of “programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known.”

Introduction • We have developed an approach to creating test cases for Machine Learning applications: • Analyze the problem domain and real-world data sets • Analyze the algorithm as it is defined • Analyze an implementation’s runtime options • Our approach was designed for MartiRank and then generalized to other ranking algorithms such as Support Vector Machines (SVM)

Overview • Machine Learning Background • Testing Approach and Framework • Findings and Results • Evaluation and Observations • Future Work

Machine Learning Fundamentals • Data sets consist of a number of examples, each of which has attributes and a label • In the first phase (“training”), a model is generated that attempts to generalize how attributes relate to the label • In the second phase, the model is applied to a previously-unseen data set (“testing” data) with unknown labels to produce a classification (or, in our case, a ranking) • This can be used for validation or for prediction

MartiRank and SVM • MartiRank was specifically designed for the device failure application • Seeks to find the combination of segmenting and sorting the data that produces the best result • SVM is typically a classification algorithm • Seeks to find a hyperplane that separates examples from different classes • Different “kernels” use different approaches • SVM-Light has a ranking mode based on the distance from the hyperplane
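
The idea of ranking by distance from a hyperplane can be illustrated with a minimal sketch. It assumes a linear decision boundary w.x + b = 0 and uses made-up values for `w`, `b`, and the examples; it is not SVM-Light's code and says nothing about how SVM-Light computes its model.

```python
import math

def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def rank_by_distance(examples, w, b):
    """Return example indices, largest signed distance first."""
    return sorted(range(len(examples)),
                  key=lambda i: signed_distance(examples[i], w, b),
                  reverse=True)

# Hypothetical model and data, chosen only for illustration.
w = [3.0, 4.0]
b = -5.0
examples = [[1.0, 1.0], [3.0, 4.0], [0.0, 0.0]]
ranking = rank_by_distance(examples, w, b)
print(ranking)
```

Examples far on the positive side of the hyperplane come first, and examples on the negative side come last.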

Related Work • There has been much research into applying Machine Learning techniques to software testing, but not the other way around • Reusable real-world data sets and Machine Learning frameworks are available for checking how well a Machine Learning algorithm predicts, but not for testing its correctness

Analyzing the Problem Domain • Consider properties of the real-world data sets • Data set size: number of attributes and examples • Range of values: attributes and labels • Precision of floating-point numbers • Categorical data: how alphanumeric attributes are addressed • Also, repeating or missing data values
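
The properties listed on this slide could be collected from a data set with a small profiling helper. The sketch below is an assumption about how one might do that, not the authors' tooling; `profile_data_set` and the `(attributes, label)` tuple layout are hypothetical, and missing values are represented as `None`.

```python
def profile_data_set(examples):
    """Summarize a data set given as a list of (attributes, label) pairs."""
    values = [v for attrs, _ in examples for v in attrs]
    numeric = [v for v in values if isinstance(v, (int, float))]
    present = [v for v in values if v is not None]
    return {
        "num_examples": len(examples),
        "num_attributes": len(examples[0][0]) if examples else 0,
        "min_value": min(numeric) if numeric else None,
        "max_value": max(numeric) if numeric else None,
        "missing_count": sum(v is None for v in values),
        "categorical_count": sum(isinstance(v, str) for v in values),
        # True if any non-missing value occurs more than once anywhere.
        "has_repeats": len(set(present)) < len(present),
        "labels": sorted({label for _, label in examples}),
    }

# Hypothetical sample with numeric, categorical, and missing values.
sample = [
    ([1.5, "A", None], 1),
    ([2.0, "B", 3.0], 0),
    ([1.5, "A", 0.25], 1),
]
profile = profile_data_set(sample)
print(profile)
```

A profile like this makes it easier to decide which traits the generated test data sets need to reproduce.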

Analyzing the Algorithm • Look for imprecisions in the specification, not necessarily bugs in the implementation • How to handle missing attribute values • How to handle negative labels • Consider how to construct a data set that could cause a “predictable” ranking
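
One way to build a data set with a “predictable” ranking is to make a single attribute separate the labels perfectly and fill the remaining attributes with noise. The sketch below assumes 0/1 labels and a ranker that should place positive examples first; the function names and value ranges are hypothetical, and whether a given algorithm actually finds this ranking is exactly what the test would check.

```python
import random

def make_predictable_data_set(num_examples, num_attributes, seed=0):
    """Build (attributes, label) pairs where attribute 0 separates the labels."""
    rng = random.Random(seed)
    examples = []
    for i in range(num_examples):
        label = 1 if i % 2 == 0 else 0
        # Positives get a key in [0.6, 1.0], negatives in [0.0, 0.4],
        # so sorting on attribute 0 puts every positive above every negative.
        key = rng.uniform(0.6, 1.0) if label else rng.uniform(0.0, 0.4)
        noise = [rng.random() for _ in range(num_attributes - 1)]
        examples.append(([key] + noise, label))
    rng.shuffle(examples)  # hide the construction order from the ranker
    return examples

def labels_when_sorted_by(examples, attribute_index):
    """Labels in the order produced by sorting on one attribute, descending."""
    ordered = sorted(examples, key=lambda ex: ex[0][attribute_index], reverse=True)
    return [label for _, label in ordered]

data = make_predictable_data_set(10, 4)
print(labels_when_sorted_by(data, 0))
```

Because the expected ordering is known by construction, it can stand in for the missing oracle on this particular input.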

Analyzing the Runtime Options • Determine how the implementation may manipulate the input data • Permuting the input order • Reading the input in “chunks” • Consider configuration parameters • For example, disable anything probabilistic • Need to ensure that results are deterministic and repeatable
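
A check for sensitivity to input order can be sketched as follows. It assumes a ranker that takes a list of `(example_id, score)` pairs and returns the ids in ranked order; both toy rankers and the helper name are hypothetical stand-ins for a real implementation.

```python
import random

def is_permutation_invariant(rank_fn, examples, trials=20, seed=0):
    """Return True if rank_fn gives the same ranking for reordered input."""
    rng = random.Random(seed)  # fixed seed keeps the check itself repeatable
    baseline = rank_fn(examples)
    candidates = [examples[::-1]]  # always try the reversed order
    for _ in range(trials):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        candidates.append(shuffled)
    return all(rank_fn(candidate) == baseline for candidate in candidates)

def rank_with_tie_break(examples):
    """Sort by score descending, breaking ties by example id."""
    return [eid for eid, _ in sorted(examples, key=lambda e: (-e[1], e[0]))]

def rank_without_tie_break(examples):
    """Sort by score only, so tied examples keep their input order."""
    return [eid for eid, _ in sorted(examples, key=lambda e: -e[1])]

examples = [("a", 0.9), ("b", 0.5), ("c", 0.5), ("d", 0.1)]
print(is_permutation_invariant(rank_with_tie_break, examples))
print(is_permutation_invariant(rank_without_tie_break, examples))
```

The second ranker fails the check because the tie between "b" and "c" is resolved by whichever example happens to appear first in the input.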

Equivalence Classes • Data sizes of different orders of magnitude • Repeating vs. non-repeating attribute values • Missing vs. non-missing attribute values • Categorical vs. non-categorical data • 0/1 labels vs. non-negative integer labels • Predictable vs. non-predictable data sets • Used a data set generator to parameterize test case selection criteria
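
A parameterized generator along these lines might look like the sketch below. The slides do not describe the authors' generator, so every name, parameter, and value range here is an assumption, and only some of the equivalence classes (size, repeating values, missing values, label type) are covered.

```python
import itertools
import random

def generate_data_set(num_examples, num_attributes, repeating=False,
                      missing=False, binary_labels=True, seed=0):
    """Return a list of (attributes, label) pairs for one equivalence class."""
    rng = random.Random(seed)
    columns = []
    for _ in range(num_attributes):
        if repeating:
            # Only three possible values, so repeats occur once there are
            # more than three examples.
            column = [rng.randrange(3) for _ in range(num_examples)]
        else:
            # Sampling without replacement keeps values distinct per column.
            column = rng.sample(range(num_examples * 10), num_examples)
        columns.append(column)
    data = []
    for i in range(num_examples):
        attributes = [column[i] for column in columns]
        if missing and i % 5 == 0:
            attributes[i % num_attributes] = None  # None marks a missing value
        label = rng.randint(0, 1) if binary_labels else rng.randint(0, 10)
        data.append((attributes, label))
    return data

def enumerate_test_configurations():
    """Yield one configuration per combination of equivalence classes."""
    sizes = [10, 100, 1000]
    flags = [False, True]
    for size, repeating, missing, binary in itertools.product(
            sizes, flags, flags, flags):
        yield {"num_examples": size, "repeating": repeating,
               "missing": missing, "binary_labels": binary}

configurations = list(enumerate_test_configurations())
first = generate_data_set(num_attributes=3, **configurations[0])
print(len(configurations), len(first))
```

Enumerating the cross product keeps test case selection explicit: each generated data set can be traced back to the combination of classes it represents.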

Testing MartiRank • Produced a core dump on data sets with a large number of attributes (over 200) • Implementation does not correctly handle negative labels • Does not use a “stable” sorting algorithm
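
Why sort stability matters for repeated values can be shown with a small sketch. It compares Python's built-in `sorted`, which is guaranteed stable, with a textbook selection sort, which is not; neither is MartiRank's sorting code, and the helper names are hypothetical.

```python
def builtin_sort(records, key):
    """Python's built-in sort, which is stable."""
    return sorted(records, key=key)

def selection_sort(records, key):
    """Swap-based selection sort; can reorder records with equal keys."""
    items = list(records)
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=lambda j: key(items[j]))
        items[i], items[smallest] = items[smallest], items[i]
    return items

def preserves_tie_order(sort_fn, records, key):
    """Return True if records with equal keys keep their input order."""
    indexed = list(enumerate(records))
    result = sort_fn(indexed, lambda pair: key(pair[1]))
    for (i, a), (j, b) in zip(result, result[1:]):
        if key(a) == key(b) and i > j:
            return False
    return True

# Two examples share the attribute value 2, so their order is a tie.
records = [(2, "a"), (2, "b"), (1, "c")]
by_value = lambda record: record[0]
print(preserves_tie_order(builtin_sort, records, by_value))
print(preserves_tie_order(selection_sort, records, by_value))
```

With an unstable sort, examples that share an attribute value can end up in a different relative order, which changes the final ranking even though the input data is the same.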

Regression Testing of MartiRank • Creation of a suite of testing data allowed us to use it for regression testing • Discovered that refactoring had introduced a bug into an important calculation
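
A regression check over a suite of test data can be sketched as below. The two toy rankers and the floor-division bug are invented for illustration and are not the calculation or the bug found in MartiRank; the baseline is simply the output recorded from the earlier version.

```python
def rank_original(data):
    """Rank example indices by mean attribute value, highest first."""
    scores = [sum(attrs) / len(attrs) for attrs in data]
    return sorted(range(len(data)), key=lambda i: (-scores[i], i))

def rank_refactored(data):
    """Hypothetical refactoring that uses floor division by mistake."""
    scores = [sum(attrs) // len(attrs) for attrs in data]
    return sorted(range(len(data)), key=lambda i: (-scores[i], i))

def run_regression_suite(rank_fn, suite, baseline):
    """Return the names of data sets whose ranking differs from the baseline."""
    return [name for name, data in suite.items()
            if rank_fn(data) != baseline[name]]

suite = {
    "distinct_means": [[1.0, 2.0], [1.0, 2.5], [3.0, 3.0]],
    "whole_number_means": [[2.0, 2.0], [4.0, 4.0]],
}
# Record the baseline once, from the version believed to be correct.
baseline = {name: rank_original(data) for name, data in suite.items()}
failures = run_regression_suite(rank_refactored, suite, baseline)
print(failures)
```

Only the data set with fractional means exposes the difference, which shows why the suite benefits from covering several kinds of data.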

Testing Multiple Implementations of MartiRank • We had three implementations developed by three different coders • Can be used as “pseudo-oracles” for each other • Used to discover a bug in the way one implementation was handling missing values
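
Using several implementations as pseudo-oracles amounts to running them on the same input and flagging any that disagree. The sketch below assumes three hypothetical rankers, one of which treats a missing value (`None`) as zero; it does not reproduce the actual MartiRank implementations or the bug that was found.

```python
from collections import Counter

def mean_ignoring_missing(values):
    """Average the values that are present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else 0.0

def mean_missing_as_zero(values):
    """Average all positions, counting a missing value as 0."""
    return sum(v if v is not None else 0.0 for v in values) / len(values)

def make_ranker(score_fn):
    """Build a ranker that orders example indices by score, highest first."""
    def rank(data):
        return sorted(range(len(data)), key=lambda i: (-score_fn(data[i]), i))
    return rank

def disagreeing_implementations(implementations, data):
    """Return names of implementations whose output differs from the majority."""
    outputs = {name: tuple(fn(data)) for name, fn in implementations.items()}
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    return sorted(name for name, out in outputs.items() if out != majority)

implementations = {
    "impl_a": make_ranker(mean_ignoring_missing),
    "impl_b": make_ranker(mean_ignoring_missing),
    "impl_c": make_ranker(mean_missing_as_zero),
}
data = [[4.0, None], [3.0, 3.0], [1.0, 1.0]]
print(disagreeing_implementations(implementations, data))
```

A disagreement only shows that at least one implementation deviates; the majority output is not guaranteed to be correct, so each flagged case still needs to be checked against the algorithm's definition.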

Applying Approach to SVM-Light • Permuting the input data led to different models • Caused by “chunking” data for use by an approximating variant of the optimization algorithm • Introducing noise into a data set in some cases caused SVM-Light not to find the “predictable” ranking • Different kernels also produced different results on data sets with “predictable” rankings

Evaluation and Observations • Testing approach revealed bugs and imprecision in the implementations, as well as discrepancies from the stated algorithms • Inspection of the algorithms led to the creation of “predictable” data sets • What is “predictable” for one algorithm may not lead to a “predictable” ranking in another • An algorithm’s failure to address specific data set traits can lead to incorrect results (and/or inconsistent results across implementations) • The approach can be generalized to other Machine Learning ranking algorithms, as well as to classification algorithms

Limitations and Future Work • Test suite adequacy for coverage not addressed • Can also include mutation testing for effectiveness of data sets • Should investigate creating large data sets that correlate to real-world data • Could also consider non-deterministic Machine Learning algorithms

Questions?
