MODELS & DATA

MODELS & DATA • A Four-Box Model of a DSS / BI System • Implicit vs Explicit Models • Typologies of Models • Types of Data • The Model-Data Interdependency • Is Quality Data Worth It? • A Predictive Model for Evaluating Pricing Policies

A FOUR-BOX MODEL OF A DSS/BI SYSTEM
[Diagram labels: MANAGER; USER INTERFACE; DECISION MODELS; STATISTICAL ANALYSIS; DATA BASE; ENVIRONMENT]

ANALYSIS OF DATA
Standard statistical packages for:
- Time series analysis
- Moving averages
- Exponential smoothing
- Seasonal adjustments
- Trend curves
- Regression analysis, etc.
Most frequently used operations are simple:
- Segregating data into groups
- Aggregating data
- Making comparisons
- Taking ratios
- Picking out exceptions
- Ranking, Plotting, Making tables, etc.
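
The simple operations listed on this slide can be sketched in a few lines of Python. This is only an illustration: the records, the column meanings (region, product, sales, budget), the function name `summarize` and the 0.75 exception threshold are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical records: (region, product, sales, budget)
records = [
    ("East", "A", 120.0, 100.0),
    ("East", "B", 80.0, 100.0),
    ("West", "A", 150.0, 120.0),
    ("West", "B", 40.0, 90.0),
]


def summarize(rows, threshold=0.75):
    """Group, aggregate, take ratios, rank, and pick out exceptions."""
    # Segregate into groups and aggregate sales and budget by region.
    totals = defaultdict(lambda: [0.0, 0.0])
    for region, _product, sales, budget in rows:
        totals[region][0] += sales
        totals[region][1] += budget
    # Take ratios (sales / budget) so regions can be compared.
    ratios = {region: s / b for region, (s, b) in totals.items()}
    # Rank regions by ratio, best first.
    ranking = sorted(ratios, key=ratios.get, reverse=True)
    # Pick out exceptions: individual rows whose ratio falls below the threshold.
    exceptions = [(r, p) for r, p, s, b in rows if s / b < threshold]
    return ratios, ranking, exceptions


ratios, ranking, exceptions = summarize(records)
```

With the sample records above, the East region ranks first and the only exception flagged is product B in the West region.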

MODELS AND DATA
- "Models provide a framework for identifying what data should be collected and how it should be processed once obtained"
- "Good data are vital ... but data for data's sake is a worthless luxury"
Sources named on the slide: John D. C. Little; David Montgomery & Glen Urban

What is a Model?
"Whenever a manager (or anybody else) looks at data, he or she has a preconceived idea of how the world works and therefore of what is interesting or worthwhile in the data. We shall call such ideas models."
- John D. C. Little
Models provide the means for converting data into actionable information...

WHAT IS A MODEL?
- A model is the decision-maker's perception of how something works
- All decisions are based on some kind of model
[Diagram labels: PROBLEM; DATA; ANALYSIS; MODEL; ACTIONABLE INFORMATION]

IMPLICIT vs EXPLICIT MODELS
Implicit Models (or Mental Models)
- Models carried in people's heads
Explicit Models
- Prose Models
- Flow Models
- Mathematical Models
Key Issues
- Why do managers use implicit models?
- What are the benefits of explicating an implicit model?
- What problems are encountered when explicating an implicit model?

A Typology of Models: What is the Purpose?
- Descriptive Models: describe how something works
- Predictive Models: provide “what if” information
- Normative Models: prescribe the “best” solution to the problem

A Typology of Models: How is the Real World Represented?
- How is the model formulated? Linear vs. Non-linear Models
- How is time handled? Static vs. Dynamic Models
- How is risk handled? Deterministic vs. Stochastic Models
- At what level of detail? Micro vs. Macro Models

A Typology of Models: How is the Model Analyzed?
- Optimization Models: determine the “best” values for the decision variables in the model
- Simulation Models: evaluate the consequences of alternative decisions

"SATISFICING" vs. "OPTIMIZING" IN DECISION-MAKING
Optimizing: search for the best solution using an optimizing model
Problems:
- Model may not fit the problem
- More data needed
- More time and cost
- Higher intellectual cost
versus
Satisficing: choose a solution that is good enough using the manager's rules of thumb or heuristics
Benefits:
- Saves time and cost
- Easy to implement

TYPES OF DATA
1. SECONDARY DATA - Readily available data
2. PRIMARY DATA - Data generated for the problem at hand
3. JUDGEMENTAL DATA - Data based on experience, knowledge and judgement

ITERATIVE PROCESS OF BUILDING MODELS
1. Define the Problem to be Addressed by the Model
2. List Relevant Factors - Do not worry about Data
3. Select the Most Critical Factors
4. Link the Selected Factors
5. Obtain the Required Data
6. Develop the System
7. Validate the Output from the System
8. Sensitivity Analysis of the Output from the System

RIMMS: A Model-Based System For Efficient Routing & Scheduling
- Whirlpool: Schedules service calls of all technicians from a single site in Knoxville, Tennessee
- Oakwood Medical Labs, Detroit: Arranges the 800 stops of 26 drivers each day to pick up blood samples from, and drop off time-sensitive results to, 1000 clinics and hospitals
- Sleepy’s, a mattress chain in Bethpage, N.Y.: Promises quicker home delivery than its competition
- Homemakers, a furniture superstore in Des Moines, Iowa: Offers a two-hour window on next-day home delivery. Previously, “it would take two days to prepare the schedules and, even though we used to give a 4-hour delivery window, maybe we made it on time or maybe not.”
Source: Wall Street Journal, Apr 2, 1998

Biggest Strength: Good Data
- Uses detailed street maps and other data affecting schedules, e.g., toll gates and posted speed limits
- Users add data on scheduled stops, pickups and individual customer time-demands
- Model calculates the best way to manage a day’s deliveries and pick-ups
- Users can incorporate soft data on other relevant factors, for example:
  - courier pick-ups take several minutes longer than drop-offs, a devilish problem that can throw off schedules
  - how a storm the previous night can slow driving speeds

"BAD" vs "GOOD" MODELS
Models that are simply wrong
- e.g. linear model of sales to advertising
Models that are too big
- require too much data
- "larger" is not always "better"
What is a "good" model?
- easy to understand
- complete on important issues
- just enough detail for operational accuracy
- judicious use of all types of data

EVALUATING MODELS
- What are the objectives of the model?
- What is the scope of the model?
- What data will be used?
- How was the model validated?
- How sensitive is the output to:
  - data inputs
  - model structure
  - analysis techniques
- What significant factors have been excluded?

The Model-Data Interdependency
[Diagram: the MODEL specifies data requirements; the MODEL is in turn constrained by available DATA]
The “Chicken or Egg” Question -- An Approach
- Build the simplest model
- Use judgmental data if necessary
- Test sensitivity of the information
- Get better data
- Or, improve the model

An Example: Forecasting Sales
Time Series Models (e.g., Moving Averages, Exponential Smoothing)
- Data readily available
- Straightforward models
- BUT ... they ignore what causes sales
Regression Models
- Better because they link sales to “explanatory” variables
- However ...
  - Which variables? Cost of data?
  - What type of relationship?
  - Accuracy of projections of the explanatory variables?
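
A minimal sketch of the two time series methods named on this slide, assuming a plain list of past sales figures. The function names and the choice of simple (single) exponential smoothing initialised at the first observation are illustrative assumptions, not the deck's own implementation. Note that neither function takes any input other than past sales, which is exactly the slide's objection: they ignore what causes sales.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    The smoothed level starts at the first observation (an assumption);
    alpha near 1 weights recent data heavily, alpha near 0 smooths more.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level
```

For example, `moving_average_forecast([10, 20, 30, 40], window=2)` averages the last two observations, 30 and 40.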

[Figure: Trends in Rx Sales vs Symptoms]

[Figure: Our Promotions vs Comp. Promotions]

[Figure: Actual vs Predicted Rx Sales]
Rx Sales = 527 + 0.13 * Symptoms + 74 * (Our Prom / Comp Prom)
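
The fitted equation on this slide can be evaluated directly. The sketch below only wraps the slide's coefficients in a function; the function and argument names are hypothetical, and the slides do not state the units of Rx Sales, Symptoms or the promotion measures, so any inputs are purely illustrative.

```python
def predicted_rx_sales(symptoms, our_promotion, competitor_promotion):
    """Evaluate Rx Sales = 527 + 0.13*Symptoms + 74*(Our Prom / Comp Prom).

    Coefficients are taken from the slide; units are not given in the source.
    """
    if competitor_promotion <= 0:
        raise ValueError("competitor_promotion must be positive")
    promotion_ratio = our_promotion / competitor_promotion
    return 527 + 0.13 * symptoms + 74 * promotion_ratio
```

Because promotion enters as a ratio, matching a competitor's promotion level contributes a fixed 74 to predicted sales, and doubling it relative to the competitor contributes 148.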

A Data Warehouse is Not Enough Because ...
... Managers Ask for Analysis, Not Retrieval
“Sometimes retrieval questions come up of course, but most often the answers to important questions require non-trivial manipulation of stored data. Knowing this tells us much about the kind of software required. For example, a database management system is not enough.”
- John Little (1979)
“Data” has to be converted into “Information” that triggers managerial action. The conversion process is critical to get value from the data warehouse.

Models Help in Data Conversion
- A framework for identifying what data should be collected and how it should be processed
- Avoids the “completeness” trap in building a data warehouse
A “good” model is...
- simple
- complete on important issues
- detailed just enough for operational accuracy
- judicious in its use of hard and soft data

Better Models Require . . .
- . . . More Data
- . . . More Time to Develop
- . . . And, Cost More
Not just $ but the Intellectual Cost
“People tend to reject what they do not understand. The manager carries responsibility for outcomes. We should not be surprised if he prefers a simple analysis that he can grasp, even though it may have qualitative structure, broad assumptions, and only a little relevant data, to a complex model whose assumptions may be partially hidden or couched in jargon and whose parameters may be the result of obscure statistical manipulations.”
- John Little (1970)

Value vs Cost? How to Assess Cost-Effectiveness of Data: A Pragmatic Approach
1. Design a Prototype scaled to the barest minimum
2. Collect data for the Prototype (lowest data cost)
3. Develop Prototype using real data
4. Users evaluate benefits of system
   - “Go”: Full-blown System
   - “No Go”: Stop

Case Example: A Consumer Packaged Goods Company
- System Objective: To evaluate sales impact of trade promotions
- Data Problem: Serious gaps in operational data
  - Available data on promotions: how much was spent; when the bills were paid
  - Missing key data: when the promotions were run, needed to correlate with sales data
- Issue: Data problem is solvable in principle. But... is it worth the effort and cost?

The Low-Cost Prototype: To Assess Value of Data
- Model limited to the core variables: sales, promotion expenditures and dates, margins
- Detailed data needed for useful information:
  - by packs for each brand and by markets
  - weekly data for capturing sales fluctuations
  - two years of data to compare pre- with post-deal sales levels
- Cost of data: manual effort to extract dates of promotions from logbooks
- Barest-minimum Prototype:
  - 2 brands, a major brand and a new brand
  - 8 markets (out of 50): 3 large, 3 medium and 2 small
- Results:
  - Demonstrated the value of collecting the missing data and building an integrated database
  - Led to the development of a promotion-event calendar system

Gaps in Operational Data: A Perennial Problem -- Why?
- Because of the narrow focus of operational systems
- Operational systems are an important source of data for decision support
- Design of operational systems must incorporate data requirements of management support systems
An Example: When implementing new Human Resource Information Systems (e.g., PeopleSoft), are the data requirements of human resource management considered? For evaluating hiring sources? Career development? Etc.

The Product Pricing Problem
- Critical Problem for ALL Enterprises: Private Sector and Public Sector
- Predicting Customer Response is Difficult
  - Past behavior is of limited value
  - Competitors’ reactions to “our” price are unpredictable
- Even More Difficult in the Public Sector
  - Bottom-line impact is not enough
  - Must consider: Who is affected? How?

Price and Demand Relationships Are Complex
- Highly non-linear
- Exhibit “threshold effects”
- Delayed response
- Price is only one factor -- other decision variables (e.g., distribution, promotion) interact with price to affect demand
- External factors, about which we have imperfect information, impact pricing decisions

The Transit Pricing Problem
Current Fare Structure
- Essentially a “flat” fare
- Insensitive to distance traveled
Inequities of Present Fare Structure
- Favors long trips at the expense of short ones
- Long-distance riders: mostly suburban commuters with relatively high incomes
- Short-distance riders: mostly urban residents traveling off-peak for discretionary purposes
- Thus, distance inequities often imply social inequities

Why Consider Distance-Based Fares?
Drawback of Flat Fares: long-distance riders are subsidized by short-distance riders
- e.g., with a 25 cent Flat Fare:
  - Rider #1 travels 1 mile and pays 25 cents per mile
  - Rider #2 travels 5 miles and pays 5 cents per mile
Potential of Distance-Based Fares to:
- Even out the fare per mile paid by all riders
- Reduce inequities in fare per mile
- Increase revenue
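
The fare-per-mile arithmetic can be checked with a short sketch. The flat fare of 25 cents and the two riders come from this slide; the distance-based structure (a 10-cent base plus 5 cents per mile) is the proposal used in the deck's later worked example and is applied here only for comparison. Function names are hypothetical.

```python
def fare_per_mile(fare_cents, miles):
    """Fare paid per mile traveled, in cents."""
    return fare_cents / miles


def distance_based_fare(miles, base_cents=10, per_mile_cents=5):
    """Fare in cents under a base-plus-increment policy (assumed parameters)."""
    return base_cents + per_mile_cents * miles


for miles in (1, 5):
    flat = fare_per_mile(25, miles)
    distance = fare_per_mile(distance_based_fare(miles), miles)
    print(f"{miles} mile(s): flat {flat:.0f} c/mile, distance-based {distance:.0f} c/mile")
```

Under the flat fare the short-trip rider pays five times as much per mile as the long-trip rider (25 vs 5 cents). Under the assumed distance-based policy the gap narrows to 15 vs 7 cents per mile: the inequity is reduced rather than eliminated, because the base fare is still spread over fewer miles on short trips.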

Macro Models for Demand Forecasting: The Conventional Tool
- Operate on aggregate data
- Relate a measure of travel demand to a set of explanatory (“independent”) variables
- Measures of travel demand: # of passengers or # of trips
- Explanatory variables: demographic variables (e.g., median income), trip characteristics (e.g., peak/off-peak), and decision variables (e.g., fares)

Macro Models versus Micro Models
Macro Models are useless for evaluating who is affected by a change in transit fares. For example:
- Would a price increase hurt inner city residents more or less than suburban commuters?
- Would loss in patronage be greater off-peak than peak?
- Would a lower fare benefit work trips? Shopping trips?
A Micro Model at the level of the individual rider is needed to handle the variety of ridership characteristics such as age, income, place of residence, time and purpose of travel, etc.

Micro-Simulation Model
- The Micro Model focuses on the behavior of the individual rider: how is his/her transit usage affected by a fare change?
- The “what if” forecasts for the individual riders are then aggregated by age, income, purpose of trip, etc. to show what groups of riders would be affected by the fare change.

Gist of the Micro Model for Transit Pricing
- Travel demand of a rider would change in a manner governed by the fare elasticity appropriate to that rider.
- Forecast transit usage and revenue for individual riders in the sample survey.
- Weight the individual rider’s figures by an expansion factor to project the results to the population.
- Aggregate the weighted figures by the desired ridership categories to assess the revenue and equity effects.

Merits of the Micro-Simulation Approach
- Model is complete on important factors that affect demand: income, age, purpose of trip, time of travel, etc. are all represented in the individual riders in the sample (the “Micro” approach)
- The “what if” demand for a new fare policy is determined through the fare elasticity appropriate for that rider (the “Simulation” approach)

Merits of the Micro-Simulation Approach
- The micro-simulation results can be subsequently aggregated by any desired rider characteristic for the equity analysis
- Model is easy to understand. This is critical, since users will not risk basing pricing decisions on a model they do not understand, and even more so when a multiplicity of parties is involved, as in transit pricing

Design of the Transit Pricing Model
- Conventional wisdom: “the bigger the better”
- Problem: The more elaborate the model, the more data needed to set up the model
For the model to be useful, it should be:
- Simple enough for transit managers to readily understand, but not simplistic
- Complete on important issues for a valid assessment of the impact of new fare policies
- Independent of historical data for calibration
- Able to generate outputs that the user finds easy to interpret

What is the Model?
Forecast Usage for Rider #1 = Present Usage of Rider #1 x (1 + Elasticity of Rider #1 x Fare Change Ratio)
where the Fare Change Ratio is the proportional change in the fare paid by that rider.
- The above equation adjusts the current demand through a ratio based on the fare elasticity that is appropriate for that rider
- Micro-simulation is better than a macro regression model in an important way: the model is robust because reasonable values for the elasticity will not yield unreasonable values for forecast demand
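
In symbols, reading the slide's description as a proportional adjustment of current demand (an interpretation; the notation below is introduced here and is not from the original slides), the forecast for rider i can be written as:

```latex
q_i^{\text{new}} \;=\; q_i \left( 1 + E_i \,\frac{f_i^{\text{new}} - f_i}{f_i} \right)
```

Here q_i is the rider's present usage, E_i the fare elasticity assigned to that rider, f_i the fare currently paid and f_i^new the fare under the proposed policy. One way to see the robustness claim: for a negative elasticity, the forecast stays non-negative as long as the proportional fare increase does not exceed 1/|E_i|, so with E_i = -0.25 the fare would have to more than quintuple before the formula produced a negative usage.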

An Example
- Individual X uses the travel system 5 times per week, paying a flat fare of 25¢ and traveling a distance of 5 miles per trip
- Proposed distance-based fare policy: a base fare of 10¢ and a 5¢ increment per mile
- New fare for this rider: 10¢ + (5 miles x 5¢) = 35¢ per trip
- % change in fare paid by this rider = (10¢ / 25¢) x 100 = 40%
- % change in frequency of ridership = (% change in fare paid) x E
- E = “fare elasticity of demand” = % change in demand for a 1% change in fare
  e.g., an E value of -0.25 implies that a 1% increase in fare will reduce demand by 0.25%
- Hence, for the 40% increase in fare paid by this rider under the new policy, the predicted reduction in demand is 40% x 0.25 = 10%
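
The worked example can be reproduced step by step. The figures (5 trips per week, 25¢ flat fare, 5 miles, 10¢ base plus 5¢ per mile, E = -0.25) come from the slide; the function names are hypothetical, and the last step, applying the percentage change to weekly trips and computing weekly revenue, follows the deck's later description of how the model works.

```python
def pct_fare_change(old_fare, new_fare):
    """Percentage change in the fare paid by one rider."""
    return 100 * (new_fare - old_fare) / old_fare


def forecast_trips(current_trips, pct_change_in_fare, elasticity):
    """Apply the elasticity-driven % change in demand to current weekly trips."""
    pct_change_in_demand = pct_change_in_fare * elasticity
    return current_trips * (1 + pct_change_in_demand / 100)


# Rider X from the slide (fares in cents).
old_fare = 25
new_fare = 10 + 5 * 5                                 # base fare + 5 cents x 5 miles = 35
fare_change = pct_fare_change(old_fare, new_fare)     # 40% increase
new_trips = forecast_trips(5, fare_change, -0.25)     # 10% fewer trips, about 4.5 per week
old_revenue = 5 * old_fare                            # 125 cents per week
new_revenue = new_trips * new_fare                    # about 157.5 cents per week
```

For this rider the 10% drop in trips is more than offset by the 40% higher fare, so predicted weekly revenue rises even though ridership falls.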

Key Features of the Model
- Different fare elasticities can be applied to individual riders, thus making the model complete on important factors that affect travel demand
- Calibration of the model involves the estimation of only one parameter: fare elasticity
- To simplify the calibration, segment the sample of riders into groups that are expected to have the same elasticity
- Since fare elasticity has a clear operational meaning, it is feasible for the transit managers to judgmentally segment the market and estimate fare elasticities for each segment

Decision Calculus Concept
- A model-based set of procedures for processing data and judgments to assist a manager in decision making
- Enables more policy alternatives to be examined than if the manager relied on judgment alone
- Uses sensitivity analysis to test the robustness of the conclusions with regard to the soft data inputs used in the analysis
- Key element of this concept is its approach to calibration: use the manager’s judgment, especially when available data are either inadequate or dirty

How the Model Works
For an individual rider in the sample survey:
- The model calculates the % change in frequency of ridership for the proposed fare change based on the elasticity appropriate for that rider
- The model applies this % change to current weekly frequency of ridership to obtain predicted new frequency with the proposed policy
- The model calculates the fare paid per trip under the new policy and the predicted weekly revenue for the individual rider

How the Model Works
- Predicted ridership and revenue figures for each rider are expanded by suitable factors to project the sample to the ridership population
- The expanded ridership and revenue figures are then aggregated according to income, age, etc.
- Computer output includes % changes in ridership and revenue to facilitate “before” and “after” comparisons
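
The steps on the two "How the Model Works" slides can be sketched as a small micro-simulation. This is a sketch under stated assumptions, not the original system: the `Rider` fields, the two-record survey, the segment labels, the expansion factors and the clamp that keeps forecast trips non-negative are all hypothetical. The fare figures (25¢ flat, 10¢ base plus 5¢ per mile) and the elasticity of -0.25 are borrowed from the deck's worked example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rider:
    segment: str           # reporting category, e.g. an income or age group
    trips_per_week: float  # current weekly frequency from the survey
    miles_per_trip: float
    elasticity: float      # judgmental fare elasticity for this rider's segment
    expansion: float       # population riders represented by this survey record


def simulate_fare_policy(riders, old_fare, base_fare, per_mile):
    """Forecast each rider, expand to the population, aggregate by segment."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in riders:
        new_fare = base_fare + per_mile * r.miles_per_trip
        pct_fare_change = 100 * (new_fare - old_fare) / old_fare
        pct_trip_change = pct_fare_change * r.elasticity
        # Assumption: clamp at zero so extreme fare increases cannot give negative trips.
        new_trips = max(0.0, r.trips_per_week * (1 + pct_trip_change / 100))
        seg = totals[r.segment]
        seg["trips_before"] += r.expansion * r.trips_per_week
        seg["trips_after"] += r.expansion * new_trips
        seg["revenue_before"] += r.expansion * r.trips_per_week * old_fare
        seg["revenue_after"] += r.expansion * new_trips * new_fare
    report = {}
    for name, seg in totals.items():
        # "Before" and "after" comparison; assumes each segment has nonzero current trips.
        report[name] = {
            **seg,
            "pct_trips": 100 * (seg["trips_after"] - seg["trips_before"]) / seg["trips_before"],
            "pct_revenue": 100 * (seg["revenue_after"] - seg["revenue_before"]) / seg["revenue_before"],
        }
    return report


# Hypothetical two-record survey: a long-trip and a short-trip rider.
survey = [
    Rider("suburban", trips_per_week=5, miles_per_trip=5, elasticity=-0.25, expansion=100),
    Rider("urban", trips_per_week=5, miles_per_trip=1, elasticity=-0.25, expansion=200),
]
report = simulate_fare_policy(survey, old_fare=25, base_fare=10, per_mile=5)
```

With this toy survey, the suburban segment's ridership falls by about 10% while its revenue rises by about 26%, and the urban segment's ridership rises by about 10% while its revenue falls by about 34%. The per-segment breakdown is what a macro model cannot provide.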

Why the Model Works
- Crux of the model: fare elasticity, which can be judgmentally estimated by managers using historical estimates, if available, as a first cut
- Since all riders in the population do not react in the same way to fare changes, the population should first be subdivided into segments whose members are expected to be fairly similar in their responses to fare changes
- Since elasticity estimates are soft, sensitivity analysis has to be done using multiple elasticity values to select a fare policy that performs in a satisficing manner across the range of estimates used
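
The sensitivity analysis described on this slide can be sketched as follows: evaluate each candidate fare policy under several elasticity values and keep the policies that remain good enough under all of them. Everything specific here is an assumption for illustration: the toy sample, the three candidate policies, the revenue target and the use of a single elasticity for all riders (the deck itself recommends segment-specific elasticities).

```python
def weekly_revenue(riders, fare_fn, old_fare, elasticity):
    """Population weekly revenue (cents) for one policy and one elasticity value."""
    total = 0.0
    for trips, miles, expansion in riders:
        new_fare = fare_fn(miles)
        fare_change = (new_fare - old_fare) / old_fare
        # Assumption: forecast trips are clamped at zero.
        new_trips = max(0.0, trips * (1 + elasticity * fare_change))
        total += expansion * new_trips * new_fare
    return total


def satisficing_policies(riders, policies, old_fare, elasticities, target):
    """Policies whose revenue meets the target for every elasticity tried."""
    acceptable = []
    for name, fare_fn in policies.items():
        revenues = [weekly_revenue(riders, fare_fn, old_fare, e) for e in elasticities]
        if min(revenues) >= target:
            acceptable.append(name)
    return acceptable


# Hypothetical sample: (trips per week, miles per trip, expansion factor).
sample = [(5, 5, 100), (5, 1, 200)]
candidates = {
    "flat_25": lambda miles: 25,
    "base10_plus5": lambda miles: 10 + 5 * miles,
    "base20_plus5": lambda miles: 20 + 5 * miles,
}
chosen = satisficing_policies(
    sample, candidates, old_fare=25,
    elasticities=(-0.1, -0.25, -0.5), target=38000,
)
```

Using the worst case over the elasticity range as the acceptance test is one simple reading of "performs in a satisficing manner"; other criteria, such as also requiring a limit on ridership loss in each segment, could be added in the same way.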
