Replicating Results: Procedures and Pitfalls


Presentation Transcript


  1. Replicating Results: Procedures and Pitfalls
  June 1, 2005

  2. The JMCB Data Storage and Evaluation Project
  • Project summary:
  • Part 1: in July 1982, the JMCB began requesting programs and data from authors.
  • Part 2: attempted replication of published results, based on the submissions.
  • The results of Part 2 are reviewed in Dewald, Thursby, and Anderson, “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project,” The American Economic Review, September 1986.

  3. The JMCB Data Storage and Evaluation Project / Dewald et al.
  • The paper focuses on Part 2:
  • how authors responded to the request
  • the quality of the data that was submitted
  • the actual success (or lack thereof) of the replication efforts

  4. The JMCB Data Storage and Evaluation Project / Dewald et al.
  • Three groups:
  • Group 1: papers submitted and published prior to 1982; these authors did not know upon submission that they would subsequently be asked for programs/data.
  • Group 2: authors whose papers were accepted for publication beginning July 1982.
  • Group 3: authors whose papers were under review beginning July 1982.

  5. Summary of Responses/Datasets Submitted (Dewald et al., p. 591)

  6. Summary of Examined Datasets (Dewald et al., pp. 591-592)

  7. “Our findings suggest that inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence.” (Dewald et al., pp. 587-588)
  “We found that the very process of authors compiling their programs and data for submission reveals to them ambiguities, errors, and oversights which otherwise would be undetected.” (Dewald et al., p. 589)

  8. Raw data to finished product
  Raw data -> Analysis data -> Runs/results -> Finished product

  9. Raw Data -> Analysis Data
  • Always keep two distinct data files: the raw data and the analysis data.
  • A program should completely re-create the analysis data from the raw data (see the sketch below).
  • NO interactive changes!! Final changes must go in a program!!
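
  A minimal sketch of such a program in Stata; the file and variable names (survey_raw.dta, age) are hypothetical:

    * cleandata.do -- re-creates the analysis data from the raw data, start to finish
    clear all
    use "raw/survey_raw.dta"

    * Every change is recorded here, never made interactively:
    replace age = . if age < 0     // negative ages are coding errors
    duplicates drop                // drop exact duplicate records

    save "data/survey_analysis.dta", replace

  Because the program runs start to finish, anyone holding the raw data can rebuild the analysis data exactly.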

  10. Raw Data -> Analysis Data
  • Document all of the following:
  • Outliers?
  • Errors?
  • Missing data?
  • Changes to the data?
  • Remember to check (a few such checks are sketched below):
  • consistency across variables
  • duplicates
  • individual records, not just summary stats
  • “smell tests”
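
  A few of these checks in Stata; the variable names (id, age, income) are illustrative:

    * Hypothetical checks, run before saving the analysis data
    assert age >= 0 & age < 120 if !missing(age)   // smell test on a bounded variable
    assert income >= 0 if !missing(income)         // consistency/sign check
    duplicates report id                           // any duplicated identifiers?
    list id age income in 1/10                     // eyeball individual records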

  11. Analysis Data -> Results
  • All results should be produced by a program.
  • Programs should use the analysis data (not the raw data).
  • Keep a “translation” of raw variable names -> analysis variable names -> publication variable names (one lightweight way to do this is sketched below).
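
  One way to keep that translation inside the program itself, using hypothetical names (v27, hh_income):

    * Raw name -> analysis name; the publication name lives in the variable label
    rename v27 hh_income
    label variable hh_income "Household income (appears in tables as 'Income')"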

  12. Analysis Data -> Results
  • Document:
  • How were variances estimated? Why?
  • What algorithms were used, and why? Were the results robust?
  • What starting values were used? Was convergence sensitive to them?
  • Did you perform diagnostics? Include them in the programs/documentation (as sketched below).
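
  For example, an estimation choice and the diagnostic behind it can be recorded in the program; the model (y on x1 and x2) is purely illustrative:

    * Baseline fit, then test for heteroskedasticity
    regress y x1 x2
    estat hettest                  // Breusch-Pagan test motivates the choice below

    * Robust (Huber-White) variance, chosen because of the diagnostic above
    regress y x1 x2, vce(robust)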

  13. Thinking ahead
  • Delete or archive old files as you go.
  • Use a meaningful directory structure (/raw, /data, /programs, /logfiles, /graphs, etc.).
  • Use relative pathnames.
  • Use meaningful variable names.
  • Use a script to run the programs sequentially.

  14. Example script to sequentially run programs

    #!/bin/csh
    # File location: /u/machine/username/project/scripts/myproj.csh
    # Author: your name
    # Date: 9/21/04
    # This script runs a do-file in Stata, which produces and saves a .dta file
    # in the data directory. Stat/Transfer converts the .dta file to .sas7bdat
    # and saves it in the data folder. The program analyze.sas is then run on
    # the new SAS data file.
    cd /u/machine/username/project/
    stata -b do programs/cleandata.do
    st data/H00x_B.dta data/H00x_B.sas7bdat
    sas programs/analyze.sas

  15. Log files
  • Your log file should tell a story to the reader.
  • As you print results to the log file, include words explaining the results.
  • Don’t output everything to the log file: use quietly and noisily in a meaningful way (see the sketch below).
  • Include not only what your code is doing, but your reasoning and thought process.
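
  A small illustration of the quietly/noisily pattern in Stata (the variable y is hypothetical):

    * Suppress the bulk of the output, but narrate the number that matters
    quietly {
        summarize y
        noisily display "Mean of y is " r(mean) " -- the baseline for Table 1"
    }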

  16. Project Clean-up
  • Create a zip file that contains everything necessary for complete replication.
  • Delete/archive unused or old files.
  • Include any referenced files in the zip.
  • When you have a final zip archive containing everything:
  • extract it into its own directory and run the script;
  • check that all the results match.

  17. When there are data restrictions…
  • Consider releasing:
  • the subset of the raw data actually used;
  • your analysis data, as opposed to the raw data;
  • (at a minimum) notes on the process from raw to analysis data, PLUS everything pertaining to the data analysis.
  • Consider “internal” and “external” versions of your log file:
  • Do this via a variable at the top of your log files: local internal = 1 … list if `internal' == 1 (a fuller sketch is below).
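
  One way the switch might look in a do-file; the variables listed (id, income) are hypothetical:

    * Set internal to 0 before producing the public (external) log file
    local internal = 1

    if `internal' {
        list id income in 1/5      // record-level detail: internal log only
    }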

  18. Ethical Issues
  • All authors are responsible for proper clean-up of the project.
  • This is extremely important whether or not you plan on releasing data and programs.
  • Motivation:
  • self-interest
  • honest research
  • the scientific method
  • allowing others to be critical of your methods/results
  • furthering your field

  19. Ethical Issues – for discussion
  • What if third-party redistribution of the data is not allowed?
  • What are some solutions for releasing data while protecting your time investment in data collection?
  • Is it unfair to ask people to release data after a huge time investment in the collection?
