
Editing And Imputation For Manufacturing Statistics At Statistics Canada

Editing And Imputation For Manufacturing Statistics At Statistics Canada. Marie Brodeur Director General, Industry Statistics Branch Santiago, Chile March 15 to 17, 2011. Outline Of The Presentation. Overview of the Manufacturing Program Centralized Process Surveys


Presentation Transcript


  1. Editing And Imputation For Manufacturing Statistics At Statistics Canada • Marie Brodeur, Director General, Industry Statistics Branch • Santiago, Chile, March 15 to 17, 2011

  2. Outline Of The Presentation • Overview of the Manufacturing Program • Centralized Process • Surveys • Overview of the UES Survey Process • Post Collection Processing Inputs & Tools • Use of Tax Data • The many phases of UES Post Collection Process • Managing the UES Post Collection Process

  3. Statistics Canada

  4. Statistics Canada: Business and Trade Statistics (organization chart) • Industry Statistics • Economy-wide Statistics • Agriculture, Technology and Transportation Statistics • Manufacturing and Energy • Consumer Prices • Agriculture • Distributive Trades • International Trade • Small Business And Special Surveys • Service Industries • Producer Prices • Science, Innovation And Electronic Information • Enterprise Statistics • Investment and Capital Stock • Transportation • Enterprise Statistics

  5. Manufacturing Distribution Of Sales

  6. Who Are Manufacturers? • Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products • Includes assembly of the component parts of manufactured goods, blending of materials, and finishing of manufactured products by dyeing, heat treating, plating and similar operations • Transformation of own materials or those owned by others • Service outputs: custom work, repair and maintenance • Product outputs: finished goods, intermediate goods

  7. Manufacturing Program At Statistics Canada (STC) • Monthly Survey of Manufacturing (MSM) • Annual Survey of Manufactures and Logging (ASML) • Series of sub-annual commodity surveys

  8. General Characteristics Of The MSM • Monthly indicator of manufacturing activity • Last redesign in 1999 • Designed to be a reliable indicator for both trends and levels • Establishment survey (n = 10,500) • Stratified by province, NAICS and size

  9. MSM Concepts • Sales: goods of own manufacture • Inventories: raw materials, goods-in-process, finished products • Orders: new orders, unfilled orders • Goods purchased for resale (revenue and inventory): these data are collected but not released • Sales is the main concept; exceptionally, production is used for some industries (aerospace and shipbuilding)

  10. Frame And Coverage

  11. MSM Sampling Plan (diagram) • Take-All • Take-Some • Survey • Tax replaced • Take-None

  12. MSM Sampling Plan: Use Of Tax • Background • The Goods and Services Tax (GST) is the federal value-added tax • GST is collected by the Canada Revenue Agency (CRA) • The CRA provides tax data to Statistics Canada • Information received includes the Business Number, revenue, tax remitted and input tax credit

  13. Use Of Tax Data • Who is replaced? Single establishment enterprises; 50% of sampled data is replaced with GST data; chronic refusals • Who is not replaced? Very large single enterprise establishments; complex units (i.e. multiple establishments), as found in the GST database

  14. What Is The Annual Survey Of Manufactures And Logging (ASML)? • Measures the contribution of manufacturing industries to economic activity in Canada • In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH) • Key input to SNA Input-Output tables • Survey collects data on: what commodities are produced (Make matrix); where commodities are destined (provincial I/O tables); what commodities and primary inputs are used in production (Use matrix)

  15. Survey Coverage • ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES) • Same as MSM • Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113 • Estimates produced for 261 NAICS6 level industries • Estimates produced for the 10 provinces and 3 territories

  16. Content: Commodity Variables • Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial (5) • Sales or output variables are valued at producer or FOB factory gate prices, as required by SNA • Commodities consumed (inputs) and produced (outputs), both goods and services • Collect commodity values and quantities (for selected goods) • Services produced and consumed are collected as expense items and classified based on the COA

  17. Types Of Administrative (Tax) Data • From the Canada Revenue Agency (CRA) • Agreement between the two agencies • T1 (unincorporated businesses) • T2 (incorporated businesses) • T4 (pay slips) • GST (Goods and Services Tax) • PD7 (payroll deduction accounts)

  18. Editing And Imputation For Manufacturing Surveys

  19. Why A Centralized Process? • Best Practices • Standardization of Processes • Cross Survey Comparisons • Enterprise Centric Processing/Coherence Analysis • Efficient use of Resources • Transportable Knowledge Across Survey Programs

  20. Challenges Of A Centralized Process • Remain Centralized • Distribute processing • Priority Setting • Communication and Coordination

  21. UES Post-Collection Processing (flow diagram) • “Clean” Records • Tax Data • Central Data Store • Pre-Grooming • USTART • Edit & Imputation • Subject Matter Review & Correction Tool • Allocation / Estimation

  22. Collection • Collection period: February to early October • Collection processing system: Blaise • Blaise can be seen as a Collection Control Center • Blaise has many functions: call scheduler, transaction history files, audit trail files, and more

  23. Blaise: Variables • Questionnaire number • Mail-out date • Number of calls • Length of the call • Number of contact attempts • Response code • And more

  24. Blaise: Bonuses Over The Years • Blaise Transaction History (BTH) files: collection data analysis; produced a paper on the best time to call; produced a paper on the maximum number of attempts • Audit trail files: find outliers; find difficult-to-answer questions

  25. Collection • Pre-contact (Dec-Jan): mostly for Business Register (BR) births; verification of contact information (name, address, …); by phone (in a few cases, a letter or a fact sheet is sent) • Mail-out of questionnaires (Jan-March): 2 or 3 mail-out dates • Follow-up in case of non-response for some units (begins about a month after mail-out): phone call, remail or fax • Mail-back of questionnaires • Verification of received questionnaires / edits: is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases)

  26. Collection • Coding of questionnaires (about 20 response codes): response, non-response, out-of-scope, … • Imaging / data capture (CADI, Computer Assisted Data Input)

  27. Centralized Collection (flow diagram) • Pre-Contact (17K Businesses) • Score Function • Mailout (38K CEs) • Edit / Verification (BLAISE) • Receipt (75% target) • “Clean” Records • Capture / Imaging • Delinquent Follow-Up

  28. UES: Data Collection / Score Function • Introduced in 2002, the UES score function is the main tool used at the collection stage to set the follow-up priority of about 23,000 Collection Entities (CEs) each year • Reduces collection costs yet retains data quality • Similar to the collection goal of obtaining a high weighted coverage response rate • PRIORITY 1: extensive follow-up for the larger-revenue CEs in cases of non-response • PRIORITY 0: minimum follow-up for the smaller CEs in cases of non-response
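
The slide does not give the score function itself. Purely as an illustration, the Python sketch below assumes a simple rule: rank CEs by revenue and give priority 1 to the largest ones until a target share of total revenue is covered. The name `assign_follow_up_priority` and the 80% `target_coverage` default are hypothetical and are not taken from the UES.

```python
from typing import Dict


def assign_follow_up_priority(revenue_by_ce: Dict[str, float],
                              target_coverage: float = 0.8) -> Dict[str, int]:
    """Assumed rule: priority 1 for the largest CEs until the target
    share of total revenue is covered, priority 0 for the rest."""
    total = sum(revenue_by_ce.values())
    priorities: Dict[str, int] = {}
    covered = 0.0
    # Walk through the CEs from largest to smallest revenue.
    ranked = sorted(revenue_by_ce.items(), key=lambda kv: kv[1], reverse=True)
    for ce_id, revenue in ranked:
        if total > 0 and covered / total < target_coverage:
            priorities[ce_id] = 1  # extensive follow-up
        else:
            priorities[ce_id] = 0  # minimum follow-up
        covered += revenue
    return priorities


example = assign_follow_up_priority(
    {"CE1": 700.0, "CE2": 200.0, "CE3": 60.0, "CE4": 40.0}
)
```

With these made-up revenues, the two largest CEs already cover 90% of revenue, so only they receive extensive follow-up.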

  29. Chart Of Accounts: link, bridge, concordance between collection and dissemination (diagram) • Concepts shown: Operating Surplus, Value added, Shipments, Outputs, Inputs, GDP, EBIT, Sales, Gross profit, Operating revenue, Cost of sales, Expenses

  30. Expected Benefits Of A Chart Of Accounts • Standardization in business data collection • Higher survey response • Increase in quality of data • Comparison of data from various sources • Increased efficiency in using administrative data

  31. Links To Chart Of Accounts • Chart of Accounts • Establishment • Enterprise • Legal entity

  32. UES: Use Of Tax Data • Validation (comparison) • Verify dubious collected data against the equivalent tax data record • Imputation • One of the methods used for non-response • Estimation • Below take-none • Direct Data Replacement • Update Business Register • Allocation of survey data (use tax revenues, salaries and expenses)

  33. Centralized Processing Systems And Databases • Develop centralized systems • Move away from stand-alone • Single point of access for security • Integrated Questionnaire Metadata System • Edit and imputation • Allocation and Estimation • Data Warehouse

  34. Enterprise Portfolio Managers • Top 350 enterprises in Canada • Status • Platinum, Gold, Silver, Bronze • Personal visits • Enterprise Profiling • Coordination of mail-out and collection • Enterprise/ Establishment coherence • Holistic Response Management • Strategic Response Unit • Escalation Process / Statistics Act

  35. Review and Correction (Post-Capture) • Done via an application which is a micro-editing tool • Opportunity to perform edits and to manually correct data before the automated edit and imputation process • Opportunity to gain an understanding of the quality of data coming in from the field

  36. What Is Generally Done By SMOs During This Process? • Ensure that industry codes are valid and response codes are correct • Ensure that equivalent survey cells have consistent data • Enter data for records that came in after the collection cut-off date • Review high-impact outliers in terms of profit, average salary, etc. • Check comments made by respondents and collection staff

  37. Why Is This Process Necessary? • Reviewing and correcting records will increase the number and quality of donors for the automated edit and imputation (E&I) stage. This will improve the quality of data coming out of E&I. • Need to assess the quality of collected data • Determine if there are problems with the questionnaire • Inability of respondent to provide a given data point • Determine if there is enough data for E&I

  38. What Should Not Be Done During This Process? Do not plug data for non-response records. They will be imputed during the automated E&I.

  39. What Is E & I? • Editing: verify that parts add up to total; ensure that there are no missing values where parts add up to total; there must be consistency between related variables • Imputation: changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules (in practice, reported data will rarely be changed); impute for missing data or partially responded data; impute entire records in the case of total non-response
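
As a rough illustration of the editing rules above, the Python sketch below checks one sum-of-parts group for missing values and for parts that do not add up to the total. The function name `failed_edits`, the variable names in the example record and the `tolerance` parameter are assumptions made for this sketch, not part of the UES systems.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def failed_edits(record: Record, parts: List[str], total: str,
                 tolerance: float = 0.5) -> List[str]:
    """Return the edit rules that a record fails for one sum-of-parts group."""
    failures: List[str] = []
    values = [record.get(name) for name in parts + [total]]
    if any(value is None for value in values):
        # No missing values are allowed where parts add up to a total.
        failures.append("missing value in a sum-of-parts group")
    elif abs(sum(record[name] for name in parts) - record[total]) > tolerance:
        # Parts must add up to the total (within a small rounding tolerance).
        failures.append("parts do not add up to total")
    return failures


clean = {"sales": 80.0, "other_revenue": 20.0, "total_revenue": 100.0}
inconsistent = {"sales": 80.0, "other_revenue": 30.0, "total_revenue": 100.0}
partial = {"sales": 80.0, "other_revenue": None, "total_revenue": 100.0}
```

A record with an empty failure list would be a candidate donor; the other two would be passed on to imputation.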

  40. Why Is E&I Necessary? • To produce a complete and consistent data file that accounts for all sampled units • Both units that did not respond to the survey and units that did not provide a complete response must be imputed • Correct erroneous responses

  41. E&I Terminology • Data Group: groupings (defined by SM) of records that will be kept together for imputation purposes • These groupings are based on multiple dimensions: industry (NAICS), geography (province) • The data groups used for a specific survey will depend on: the initial sample design (number of units sampled and the level of stratification used); the number of records that respond to the survey (a minimum of 5 or 10 records is required in a data group) • May be changed during production if there are not enough donors
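
The minimum-donor rule can be sketched as follows. This Python example assumes that a NAICS-by-province cell with too few respondents is collapsed to an industry-only group; the slide only says that groups may be changed when donors are scarce, so the collapsing rule, the name `choose_data_groups` and the default of 5 are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (NAICS code, province)


def choose_data_groups(respondents: List[Cell],
                       min_donors: int = 5) -> Dict[Cell, Tuple[str, ...]]:
    """Map each (NAICS, province) cell to the data group used for imputation."""
    counts = Counter(respondents)
    groups: Dict[Cell, Tuple[str, ...]] = {}
    for (naics, province), n_respondents in counts.items():
        if n_respondents >= min_donors:
            groups[(naics, province)] = (naics, province)
        else:
            # Assumed fallback: drop the province dimension to widen the donor pool.
            groups[(naics, province)] = (naics,)
    return groups


respondents = [("311", "ON")] * 6 + [("311", "PE")] * 2
groups = choose_data_groups(respondents)
```

Here the Ontario cell keeps its own group, while the Prince Edward Island cell, with only two respondents, borrows donors from the whole industry.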

  42. E&I Terminology (continued) • Edit Group: grouping of variables within a record that will be processed together in an imputation method • Generally, edit groups may be defined as follows for most surveys: revenue and expense sections; employment section; and provincial distribution of goods/services sold • Allows for a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool
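
To show why edit groups enlarge the donor pool, the Python sketch below lists the sections for which a record could still act as a donor. The contents of `EDIT_GROUPS` and the variable names are hypothetical placeholders, and "clean" is simplified here to "no missing values".

```python
from typing import Dict, List, Optional, Set

# Hypothetical edit groups; real surveys define their own variable lists.
EDIT_GROUPS: Dict[str, List[str]] = {
    "revenue_expense": ["total_operating_revenue", "total_operating_expenses"],
    "employment": ["employees", "salaries"],
}


def usable_edit_groups(record: Dict[str, Optional[float]]) -> Set[str]:
    """Return the edit groups in which every variable of the record is present."""
    return {
        group
        for group, variables in EDIT_GROUPS.items()
        if all(record.get(name) is not None for name in variables)
    }


partial_record = {
    "total_operating_revenue": 500.0,
    "total_operating_expenses": 420.0,
    "employees": None,
    "salaries": None,
}
donor_sections = usable_edit_groups(partial_record)
```

The record above has a blank employment section but can still donate its revenue and expense values.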

  43. E&I Terminology (continued) • Key variables: total operating revenue, total operating expenses, salaries, cost of goods sold

  44. The Stages Of The E&I System • Pre-processing • BANFF E & I System • Post-Processing • Allocation

  45. Preprocessing • Deterministic edits • Conditional edits (if A then B) • Sum of Parts (SOP) • Assign 100% to percentage totals • Impute reporting period • Donor outlier detection
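
A minimal Python sketch of three of these pre-processing steps follows: assigning 100% to a percentage total, one conditional edit, and a sum-of-parts derivation. The specific conditional rule (a zero total implies zero parts), the function name `apply_deterministic_edits` and the variable names are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def apply_deterministic_edits(record: Record, parts: List[str],
                              total: str, pct_total: str) -> Record:
    rec = dict(record)  # work on a copy so the reported record is preserved
    # Assign 100% to a missing percentage total.
    if rec.get(pct_total) is None:
        rec[pct_total] = 100.0
    # Conditional edit (hypothetical "if A then B"): if the total is zero,
    # every missing part is set to zero.
    if rec.get(total) == 0:
        for name in parts:
            if rec.get(name) is None:
                rec[name] = 0.0
    missing = [name for name in parts if rec.get(name) is None]
    if not missing and rec.get(total) is None:
        # Sum of parts: derive a missing total from complete parts.
        rec[total] = sum(rec[name] for name in parts)
    elif len(missing) == 1 and rec.get(total) is not None:
        # Sum of parts: derive a single missing part from the total.
        known = sum(rec[name] for name in parts if name != missing[0])
        rec[missing[0]] = rec[total] - known
    return rec


edited = apply_deterministic_edits(
    {"sales": 80.0, "other": None, "total": 100.0, "pct_total": None},
    parts=["sales", "other"], total="total", pct_total="pct_total",
)
```

Values derived this way follow directly from the reported data, which is why such edits can run before any donor or trend imputation.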

  46. BANFF E & I System • Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses) • Impute for other missing variables: apply historical trend; apply current year trend; use donor (for partial imputation) • Select a donor for massive imputation for total non-response
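
The slide does not say how a donor is chosen. As one possible reading, the Python sketch below picks the donor whose value on a matching variable is closest to the recipient's and copies over only the recipient's missing fields. It is a simplified nearest-neighbour illustration, not the BANFF implementation; `impute_from_donor` and the variable names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def impute_from_donor(recipient: Record, donors: List[Record],
                      match_var: str, target_vars: List[str]) -> Record:
    """Fill the recipient's missing target variables from the nearest clean donor."""
    needed = [match_var] + target_vars
    # Only records with complete data on all needed variables can donate.
    eligible = [d for d in donors if all(d.get(v) is not None for v in needed)]
    result = dict(recipient)
    if not eligible or recipient.get(match_var) is None:
        return result  # nothing can be imputed from donors
    donor = min(eligible, key=lambda d: abs(d[match_var] - recipient[match_var]))
    for name in target_vars:
        if result.get(name) is None:  # reported values are never overwritten
            result[name] = donor[name]
    return result


donors = [
    {"revenue": 100.0, "salaries": 30.0},
    {"revenue": 500.0, "salaries": 150.0},
    {"revenue": 480.0, "salaries": None},  # not clean, so not eligible
]
imputed = impute_from_donor({"revenue": 470.0, "salaries": None},
                            donors, "revenue", ["salaries"])
```

In practice the donor pool would be restricted to the recipient's data group, and donor values might be scaled rather than copied as they are here.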

  47. BANFF Algorithms • DIFTREND: historical trend imputation • CURRATIO: current ratio imputation • PREVALUE: value from the previous period for the same unit is imputed • PREAUX: historical value of a proxy variable for the same unit • CURAUX: current value of a proxy variable for the same unit
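
The one-line descriptions above can be made concrete with simple formulas. The Python sketch below is one plausible reading of those descriptions and is not taken from the BANFF documentation; the function names and the exact form of the trend and ratio adjustments are assumptions.

```python
from statistics import mean
from typing import Sequence


def historical_trend(prev_value: float, aux_mean_current: float,
                     aux_mean_previous: float) -> float:
    """Previous value of the unit, moved by the trend in a proxy variable's average."""
    return prev_value * aux_mean_current / aux_mean_previous


def current_ratio(aux_value: float, respondent_values: Sequence[float],
                  respondent_aux: Sequence[float]) -> float:
    """Unit's current proxy value times the respondents' average ratio."""
    return aux_value * mean(respondent_values) / mean(respondent_aux)


def previous_value(prev_value: float) -> float:
    """Carry the unit's own value forward from the previous period."""
    return prev_value


def previous_aux(prev_aux_value: float) -> float:
    """Use the historical value of a proxy variable for the same unit."""
    return prev_aux_value


def current_aux(aux_value: float) -> float:
    """Use the current value of a proxy variable (for example, tax revenue)."""
    return aux_value


trend_estimate = historical_trend(100.0, aux_mean_current=110.0, aux_mean_previous=100.0)
ratio_estimate = current_ratio(50.0, respondent_values=[20.0, 40.0], respondent_aux=[10.0, 20.0])
```

The trend and ratio methods need clean respondents in the data group to compute averages, while the last three need only information about the unit itself.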

  48. Post-Processing • Prorate components to ensure that they sum exactly to totals • Perform a number of consistency checks to ensure that micro-data are valid • Assign customer location (percentage cells) • Massive imputation (donor selected during processing but applied in the post-processor)
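
Prorating can be sketched as scaling each component by the ratio of the total to the current sum of components. The Python example below assumes whole-number values and puts any rounding residual on the largest component so that the sum is exact; that residual rule and the name `prorate` are choices made for this sketch, not something the slide specifies.

```python
from typing import Dict


def prorate(parts: Dict[str, int], total: int) -> Dict[str, int]:
    """Scale components so that they sum exactly to the total."""
    current = sum(parts.values())
    if current == 0:
        return dict(parts)  # nothing to scale
    scaled = {name: round(value * total / current) for name, value in parts.items()}
    # Rounding can leave a small residual; assign it to the largest component.
    residual = total - sum(scaled.values())
    largest = max(scaled, key=scaled.get)
    scaled[largest] += residual
    return scaled


prorated = prorate({"materials": 30, "energy": 50, "other": 10}, total=100)
```

After prorating, the record passes the sum-of-parts edit by construction.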

  49. Allocation - Definition & Purpose • Definition: allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame • Purpose: to provide fully-processed micro data on a fiscal year basis for establishments or locations in-sample for the UES; to determine the distribution of value added by province
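
One simple way to picture allocation is a proportional split of a Collection Entity's value across its establishments, using an allocation variable such as tax revenue or salaries as weights. The Python sketch below assumes this proportional rule, with an equal split when all weights are zero; both the rule and the name `allocate` are illustrative assumptions.

```python
from typing import Dict


def allocate(ce_value: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Distribute a Collection Entity value across establishments by weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Assumed fallback: split equally when no allocation information exists.
        return {name: ce_value / len(weights) for name in weights}
    return {name: ce_value * weight / total_weight for name, weight in weights.items()}


# Hypothetical establishments with tax revenue used as the allocation variable.
allocated = allocate(1000.0, {"EST_ON": 300.0, "EST_QC": 100.0})
```

Summing the allocated values by the province of each establishment would then give a provincial distribution.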

  50. Sample Survey Allocation
