Investigation of Treatment of Influential Values

Mary H. Mulry and Roxanne M. Feldpausch

Outline • Current practices • Methods investigated • Results • Next steps

Influential Observation An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000).

The Data - U.S. Monthly Retail Trade Survey • Collects sales and inventories • Monthly survey of about 12,500 retail businesses with paid employees • Sample selected every 5 years • Sample is stratified based on industry and sales • Quarterly sample of births • Deaths are removed

The Data • Analysis done at the published NAICS level • Hidiroglou-Berthelot algorithm was run on the data before looking for influential values • Horvitz-Thompson estimator

Causes of Influential Units • One time or rare event • Erroneous measure of size • Change in the make-up of the unit • Seasonal Businesses

Current Practices • Analysts review an effect listing of micro-level data and investigate units that may be influential • When the analyst determines that a correctly reporting unit may be influential, the case is referred to a statistician

Current Practices • One-time influential value: imputation • Recurring influential value: weight adjustment based on the principles of representativeness • Moving the unit to a different industry when the nature of the business changes

Goals • To improve upon current methodology by making it more objective and rigorous • To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total

Assumptions • Influential observations occur infrequently, but are problematic when they appear. • The influential observation is true, although unusual. It is not the result of a reporting or coding error.

Strategy • Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value

Evaluation Criteria • Number of influential observations detected, including the number of true and false detections made • Estimate of bias • Impact on month-to-month change

Notation • The weighted estimate of the total is $\hat{Y} = \sum_{i=1}^{n} w_i Y_i$, where • Yi is the sales for the i-th business in a survey sample of size n • wi is the sample weight for the i-th unit • Xi is the previous month’s sales for the i-th business
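A minimal sketch of the notation in use, assuming the total is estimated as the weighted sum of the Yi (the usual Horvitz-Thompson form). The function names and the four-business data set are hypothetical and serve only to show how one unit's weighted contribution can dominate the estimate.

```python
def ht_total(values, weights):
    """Weighted estimate of the total: the sum of w_i * Y_i."""
    return sum(w * y for w, y in zip(weights, values))


def contribution_shares(values, weights):
    """Share of the estimated total contributed by each unit."""
    total = ht_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]


# Hypothetical data: four businesses with equal weights; the last one
# reports sales far above the others.
sales = [10.0, 12.0, 8.0, 200.0]
weights = [5.0, 5.0, 5.0, 5.0]

total = ht_total(sales, weights)
shares = contribution_shares(sales, weights)
print(total, [round(s, 3) for s in shares])
```

In this made-up example the last unit accounts for most of the estimated total, which is the situation the slides describe as influential.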

Methods Examined • Weight trimming • Reverse calibration • Winsorization • Generalized M-estimation

Weight Trimming • Does not identify influential units • Adjusts the weight of the observation

Weight Trimming • Truncate the weight of the influential observation • Adjust the weights of the non-influential observations to account for the remainder of the truncated weight • Sum of the new weights is the same as the sum of the original weights • (Potter 1990)

Weight Trimming Notes • Calculations were done within sample stratum. • Choice of correction factor could be investigated. We arbitrarily chose ci=wi/3.
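The trimming steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper name `trim_weights` is hypothetical, the removed weight is spread over the other units in proportion to their weights (the slides do not say how the remainder is distributed), and the cap of one third of the original weight only loosely mirrors the ci = wi/3 choice, whose exact role is not spelled out.

```python
def trim_weights(weights, influential, cap):
    """Truncate the weights at the flagged positions to `cap` and spread the
    removed weight over the remaining units in proportion to their weights,
    so that the sum of the weights is unchanged."""
    flagged = set(influential)
    excess = sum(max(weights[i] - cap, 0.0) for i in flagged)
    rest = sum(w for i, w in enumerate(weights) if i not in flagged)
    if rest == 0:
        raise ValueError("no non-influential units to absorb the trimmed weight")
    factor = 1.0 + excess / rest
    return [
        min(w, cap) if i in flagged else w * factor
        for i, w in enumerate(weights)
    ]


# Hypothetical stratum of four units; unit 3 was specified as influential.
stratum_weights = [30.0, 30.0, 30.0, 30.0]
cap = stratum_weights[3] / 3  # assumed reading of c_i = w_i / 3
trimmed = trim_weights(stratum_weights, influential=[3], cap=cap)
print(trimmed, sum(trimmed))
```

Because the method does not identify influential units, the flagged positions have to be supplied by the caller, as in the example.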

Reverse Calibration • Does not identify influential units • Adjusts the value of the observation

Reverse Calibration • Use a robust estimation method to estimate the total • Modify the influential observations to achieve that total • (Chambers and Ren 2004)
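A simplified sketch of the idea for a single specified influential unit: given a robust estimate of the total, solve for the value of that unit that makes the weighted sum reproduce it. The function names are hypothetical, and the robust total used here (sum of weights times the unweighted median) is only a crude stand-in chosen for brevity; it is not the estimator used by Chambers and Ren or by the authors.

```python
import statistics


def crude_robust_total(values, weights):
    """Stand-in robust total (an assumption): sum of weights times the median."""
    return sum(weights) * statistics.median(values)


def reverse_calibrate(values, weights, target_total, j):
    """Return a copy of `values` in which unit j is modified so that the
    weighted total equals `target_total`."""
    others = sum(
        w * y for i, (w, y) in enumerate(zip(weights, values)) if i != j
    )
    adjusted = list(values)
    adjusted[j] = (target_total - others) / weights[j]
    return adjusted


# Hypothetical data: unit 3 was specified as influential.
sales = [10.0, 12.0, 8.0, 200.0]
weights = [5.0, 5.0, 5.0, 5.0]

target = crude_robust_total(sales, weights)
adjusted = reverse_calibrate(sales, weights, target, j=3)
print(target, adjusted)
```

With several influential units the adjustment is no longer determined by one equation, so a real implementation needs a rule for sharing it among them.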

Winsorization • Identifies influential units • Adjusts the value of the observation

Winsorization • Type I • Type II
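The formulas for the two types did not survive in this transcript. The sketch below uses the one-sided definitions commonly given in the winsorization literature, which is an assumption about what the slide showed: Type I replaces a value above the cutoff K with K, while Type II replaces it with Yi/wi + (1 - 1/wi)K, so the unit represents itself and K stands in for the units it represents. The function names are hypothetical.

```python
def winsorize_type1(y, k):
    """Type I: values above the cutoff k are replaced by k."""
    return min(y, k)


def winsorize_type2(y, w, k):
    """Type II: values above the cutoff k are replaced by
    y / w + (1 - 1 / w) * k, where w is the unit's sample weight."""
    if y <= k:
        return y
    return y / w + (1.0 - 1.0 / w) * k


# Hypothetical unit: sales of 200 with weight 5, cutoff K = 50.
y, w, k = 200.0, 5.0, 50.0
type1_value = winsorize_type1(y, k)
type2_value = winsorize_type2(y, w, k)
print(type1_value, type2_value)
```

Type II pulls the value less far toward K than Type I, and a unit with weight 1 is left unchanged.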

Winsorization – Defining K • Define a separate Kh for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994) • Define a separate Ki for each observation in a manner that minimizes the MSE (Clarke 1995)

Winsorization – Defining K • Use unweighted data to define Kh for each stratum, where Kh = mh + 2sh • Use weighted data to define Kh for each stratum, where Kh = mh + 2sh and mh and sh are based on the weighted data
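A rough sketch of the unweighted cutoff, assuming mh and sh denote the stratum mean and sample standard deviation; the slides do not define them, so that reading is an assumption, as are the function name and the data.

```python
import statistics
from collections import defaultdict


def stratum_cutoffs(values, strata):
    """Compute K_h = m_h + 2 * s_h per stratum, taking m_h as the mean and
    s_h as the sample standard deviation (an assumed reading)."""
    by_stratum = defaultdict(list)
    for y, h in zip(values, strata):
        by_stratum[h].append(y)
    return {
        h: statistics.mean(ys) + 2 * statistics.stdev(ys)
        for h, ys in by_stratum.items()
    }


# Hypothetical data: two strata, each with at least two units
# (statistics.stdev needs two or more values).
sales = [10.0, 12.0, 8.0, 200.0, 1.0, 2.0, 3.0]
strata = ["A", "A", "A", "A", "B", "B", "B"]

cutoffs = stratum_cutoffs(sales, strata)
print(cutoffs)
```

The weighted variant would replace the mean and standard deviation with their weighted counterparts; the choice of weighted formula is left open here.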

Winsorization-Our Implementation • Used a robust regression in SAS to estimate the parameters needed in the calculations

M-estimation • M-estimators are robust estimators that come from a generalization of maximum likelihood estimation
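To make the idea concrete, here is a minimal sketch of a plain Huber M-estimate of location computed by iteratively reweighted averaging. It is a generic textbook illustration, not the weighted generalized M-estimation examined in the study; the function name, the tuning constant c = 1.345, and the MAD-based scale are conventional choices assumed for the example.

```python
import statistics


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location. Returns the estimate and the final
    robustness weights (1 for typical values, below 1 for outlying ones)."""
    mu = statistics.median(values)
    mad = statistics.median([abs(y - mu) for y in values])
    scale = mad / 0.6745  # MAD rescaled to be comparable to a standard deviation
    if scale == 0:
        return mu, [1.0] * len(values)

    def robustness_weights(center):
        bound = c * scale
        return [
            1.0 if abs(y - center) <= bound else bound / abs(y - center)
            for y in values
        ]

    for _ in range(max_iter):
        rw = robustness_weights(mu)
        new_mu = sum(r * y for r, y in zip(rw, values)) / sum(rw)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, robustness_weights(mu)


# Hypothetical data with one unusually large value.
sales = [10.0, 12.0, 8.0, 11.0, 9.0, 200.0]
estimate, rw = huber_location(sales)
print(round(estimate, 3), [round(r, 3) for r in rw])
```

The robustness weights show which observations the estimator treats as outlying, which is how an M-estimation approach can both flag influential units and reduce their effect.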

M-estimation • Identifies influential units • Adjusts either the weight or the value of the influential observation

M-estimation • Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)

Results

Number of Outliers Detected *Method does not detect outliers; one outlier was specified

Replacement Values (in Millions) *Weight trimming adjusts the other 18 weights in the stratum **Winsor wgt +2s identified 3 other values

Total Sales for the Industry

Chosen for Further Study • Winsorization by each observation • M-estimation by observation • M-estimation by weight

Contact Information Mary.H.Mulry@census.gov Roxanne.Feldpausch@census.gov
