Reduced-Parameter Modeling (RPM) for Cost Estimation Models Zhihao Chen zhihaoch@cse.usc.edu
Reduced-Parameter Modeling (RPM) What Is RPM? Why Is It Useful? How Does It Work? When Should You Not Use It?
What is RPM? • A machine learning technique for determining a minimum-essential set of cost model parameters • Using an organization’s particular project data points • Assuming that the organization’s project data points will be representative of its future projects
Why Is It Useful? • Simplifies cost model usage and data collection • Often improves estimation accuracy • Eliminates highly-correlated, weak-dispersion, or noisy-data parameters • Identifies organization’s most important cost drivers for productivity improvement
Organizations Have Different Data Distributions [Figure: correlation analysis of the NASA Project02 data set, 22 projects] [Figure: correlation analysis of the COCOMO81 data set, 63 projects]
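The correlation analysis behind these figures can be sketched as follows. This is a minimal illustration with synthetic data: the parameter names, the injected correlation, and the 0.8 threshold are all assumptions for demonstration, not values from the actual NASA or COCOMO81 data sets.

```python
import numpy as np

# Hypothetical cost-driver matrix: rows = projects, columns = parameters.
# Names and values are illustrative, not the real NASA/COCOMO81 fields.
params = ["RELY", "DATA", "CPLX", "TIME", "STOR"]
rng = np.random.default_rng(0)
data = rng.normal(size=(22, len(params)))
# Make TIME track CPLX so one highly correlated pair exists.
data[:, 3] = data[:, 2] * 0.9 + rng.normal(scale=0.1, size=22)

corr = np.corrcoef(data, rowvar=False)

# Flag parameter pairs whose correlation exceeds a threshold;
# one member of each flagged pair is a candidate for removal.
for i in range(len(params)):
    for j in range(i + 1, len(params)):
        if abs(corr[i, j]) > 0.8:
            print(f"{params[i]} and {params[j]} are highly correlated: {corr[i, j]:.2f}")
```

Because a pair of highly correlated parameters carries largely redundant information, dropping one of them simplifies the model with little loss of accuracy.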
Under-sampling: A Case Study for CPLX in NASA60 • Is software complexity a useful cost driver in this domain? • In the NASA60 data set, CPLX is usually rated "high" • The parameter therefore carries little information • Consider dropping the parameter • If the even-higher-complexity projects are the most important ones to NASA, instead redefine the complexity ratings for those highly complex NASA systems
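The under-sampling check above amounts to inspecting a parameter's value distribution. A minimal sketch, with made-up ratings mirroring the NASA60 observation (the counts and the 90% threshold are assumptions for illustration):

```python
from collections import Counter

# Illustrative CPLX ratings across a data set where the parameter is
# almost always "high" (counts are invented, not the real NASA60 data).
cplx = ["high"] * 55 + ["very_high"] * 3 + ["nominal"] * 2

counts = Counter(cplx)
rating, freq = counts.most_common(1)[0]
most_common_share = freq / len(cplx)

# If one rating covers nearly all projects, the parameter carries
# little information and is a candidate for dropping.
if most_common_share > 0.9:
    print(f"CPLX is '{rating}' in {most_common_share:.0%} of projects; consider dropping it")
```

A parameter that takes the same value in nearly every project cannot help discriminate between cheap and expensive projects, which is why low dispersion marks it as a removal candidate.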
How Does It Work – Technically? • Organization collects critical mass of similar project data • RPM tool starts with Size, tests which additional parameter produces most accurate estimates • By calibrating many times to random data subsets, testing on holdout data points • RPM tool continues to add next best parameters until accuracy starts to decrease • This produces best RPM for the data set
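The greedy procedure described above can be sketched as a forward feature-selection wrapper. The data, parameter names, model form (a linear fit on synthetic values), split sizes, and trial counts below are all assumptions for illustration; the actual RPM tool calibrates a cost model rather than a plain least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for an organization's project data: log-size plus
# four candidate cost drivers, and a log-effort target that depends on
# Size and P1 only (all names and coefficients are invented).
n = 60
size = rng.normal(3.0, 1.0, n)
drivers = rng.normal(size=(n, 4))
effort = 1.1 * size + 0.8 * drivers[:, 0] + rng.normal(scale=0.3, size=n)
X_all = np.column_stack([size, drivers])
names = ["Size", "P1", "P2", "P3", "P4"]

def holdout_error(cols, trials=30):
    """Mean absolute error on random holdout points, averaged over trials."""
    errs = []
    for _ in range(trials):
        idx = rng.permutation(n)
        train, test = idx[:45], idx[45:]
        X = X_all[:, cols]
        A = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A, effort[train], rcond=None)
        pred = np.column_stack([X[test], np.ones(len(test))]) @ coef
        errs.append(np.mean(np.abs(pred - effort[test])))
    return np.mean(errs)

# Start with Size; repeatedly add the parameter that most reduces
# holdout error; stop as soon as no addition improves accuracy.
selected = [0]
best = holdout_error(selected)
while True:
    candidates = [c for c in range(X_all.shape[1]) if c not in selected]
    if not candidates:
        break
    scores = {c: holdout_error(selected + [c]) for c in candidates}
    c_best = min(scores, key=scores.get)
    if scores[c_best] >= best:
        break
    selected.append(c_best)
    best = scores[c_best]

print("Reduced parameter set:", [names[c] for c in selected])
```

On this synthetic data the wrapper keeps Size and P1, the only parameters that actually influence effort, and stops once the remaining noise parameters fail to improve holdout accuracy.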
Real and Large Industry Data • Research is supported by CSE and NASA/JPL • Two datasets are public and available from the PROMISE Software Engineering Repository - http://promise.site.uottawa.ca/ • 63 projects in COCOMO81/software cost estimation • 60 projects in NASA/software cost estimation • Two datasets from the COCOMO II database • 161 projects in COCOMO II 2000 • 119 projects in COCOMO II 2004 • More data are coming • 30 more projects from JPL • The techniques can be applied, and the basic results generalized, to any model
When Should You Not Use It? • Do not drop parameters that are important: in many domains, expert business users hold more knowledge in their heads than is available in historical databases • Do not drop parameters you might still need: a user may need some of the dropped parameters to make a business decision
Published Results Some results have been published recently on the use of data mining and machine learning techniques to analyze cost estimation models and data • Chen, Menzies, Port, and Boehm. "Finding the Right Data for Software Cost Modeling", IEEE Software, 11/2005 • Menzies, Port, Chen, and Hihn. "Specialization and Extrapolation of Software Cost Models", ASE 2005, 11/2005, Long Beach, California • Menzies, Port, Chen, Hihn, and Stukes. "Validation Methods for Calibrating Software Effort Models", ICSE 2005, 05/2005, St. Louis, Missouri • Yang, Chen, Valerdi, and Boehm. "Effect of Schedule Compression on Project Effort", ISPA 2005, 06/2005, Denver, Colorado • Chen, Menzies, Port, and Boehm. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", PROMISE 2005, 05/2005, St. Louis, Missouri • Menzies, Chen, Port, and Hihn. "Simple Software Cost Analysis: Safe or Unsafe?", PROMISE 2005, 05/2005, St. Louis, Missouri All papers are available from http://www.ssei.org/chen/papers/papers.html