1 / 29

Managing Research Data

Managing Research Data. Statlab Workshop Spring 2009 Amelia Aldao, M.S. Jeremy Green. Managing Data Outline. Data entry Cleaning data Using someone else’s data Keeping track of data From data to results. 1. Data Entry. Data Entry. Excel is the best software for data entry:

raquelc
Download Presentation

Managing Research Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Managing Research Data Statlab Workshop Spring 2009 Amelia Aldao, M.S. Jeremy Green

  2. Managing Data Outline Data entry Cleaning data Using someone else’s data Keeping track of data From data to results

  3. 1. Data Entry

  4. Data Entry Excel is the best software for data entry: Easy to use Files can be easily imported to data analysis packages

  5. Data Entry - Excel Earlier versions (before 2007) only had 256 columns per sheet You can get around this by creating several sheets, then merging them with data analysis packages Although Excel has some basic statistical functions (e.g. mean, standard deviation), they are limited, so it is best to only use it for entering data and leaving all calculations for statistical packages

  6. Data Entry - Excel - Useful Tips Each column should contain a variable. Use the first row to label the variables Given that most modern data analyses packages allow for long variables names (e.g. more than 8 characters), be specific with your variable names If you want to see a particular set of columns while entering data (say, to make sure you are entering values in the ID), go to Window-Spilt Sometimes it can be useful to sort data while entering it. To do so, go to Data - Sort and select the variables you would like to sort your data by

  7. Data Entry - Excel - Useful Tips When entering data from categorical variables you have several options: Enter the data in string format and keep it that way Enter the data in string format and convert it to numerical values later Enter values that have already been converted to numerical If you will be assigning numerical values to categorical ones, please keep track of what your codes mean. It is usually helpful to write a codebook where you keep track of all this information

  8. Data Entry - Codebook Always create a codebook that contains: Instructions for entering data Instructions for making decisions when data are ambiguous Instructions for handling missing observations Numerical codes you will use for categorical data General troubleshooting information Treat it as a working document

  9. Data Entry - Multiple Coders If you will have multiple people working on your data, you should encourage them to append the codebook with relevant information as they work on it Also, you should create a column for them to enter their initials next to each line of data they enter. This way, if any issues come up, you can always find out who entered that particular line of data Every time they open the master sheet, ask them to resave it with today’s date and their initials. This will also help you track who did what

  10. Data Entry - Missing Values The easiest thing is to leave them blank Sometimes, however, you might want to make distinctions between different kinds of missing data (e.g. non-applicable, non-available, unclear). In this case, use a different code for each (and write this on the codebook). When using codes for missing values always remember to: Use values outside of the range of your variable Define all missing values in your data analysis package

  11. Data Entry - Long vs. short format Long form: one line of data per participant • Short form: multiple lines of data per participant

  12. 2. Cleaning Data

  13. Cleaning your data In order to minimize errors while manually entering data, you can set ranges in Excel so that if a value outside the range is entered, the cell will change color. To to this go to Format - Conditional Formatting and specify the ranges for which a different format should show up. Also, you can use the data validation options. Go to Data - Validation

  14. Cleaning Data - Tips Take advantage of labeling options in software packages If possible, sort your variables so that you can easily find them Ask for descriptive statistics of all variables to check if values outside the range were entered Also, ask for skewness. Sometimes a highly skewed statistic might signal something was entered wrong

  15. Cleaning Data - Tips It is always a GREAT idea to plot your data

  16. 3. Using someone else’s data

  17. Dealing with existing data Sometimes you might download the data (e.g. ICPSR) or get it form a colleague, advisor, etc As tempting as it is to start running analysis because the data look clean, you need to make sure the data set is cleaned and organized in a way that makes sense to you

  18. Dealing with existing data To do this, always ask for: Codebook Syntaxes Outputs they have produced Papers published with that data Suggestions on how to handle that particular data if you are not very familiar with it

  19. 4. Keeping Track of Data

  20. Keeping Track of Data Sets Ever time you make changes to your data, save it with the current date Keep a document with a list of the major changes with each version A good idea is to keep a folder with the original data sets and create different subfolders as you make changes to the data set. Sometimes it is also a good idea to keep a working directory for currently active files Always make backup copies

  21. Keeping Track of Syntaxes and Outputs Save all the syntax you write Save all the output you produce and try to annotate it as much as possible Save your syntax and output with the data file name a brief description of the analyses and the current date Save syntax and output in a separate folder from your data

  22. 5. From Data to Results

  23. Variables Dependent Variable (response, outcome, criterion) Independent Variables (explanatory or predictor variables) • Treatment Variable • Covariates / Confounding Variables Categorical and Continuous Variables Remember: Types of variables we choose, determine the statistics we use

  24. Endogeneity Problem: “Independent” variables are not really independent • The “dependent” and “independent” variables are determined in equilibrium (example: effect of education on wages) • Treatment effects will be biased Modeling approaches to deal with this • Assumption-based methods • Instruments

  25. So, how do I analyze my data? Correlation • Correlation allows you to quantify relationships between variables (r, r-squared) • Regression allows prediction of dependent variable based on one or more independent variables Group differences • t-test & ANOVA • Chi-square for categorical and frequency data Significance v. effect size More Complex Model

  26. Descriptive Statistics Variables Dependent Variable(s) Independent Variable(s) Important Covariates Graphs Summary Statistics on Key Variables Number, Mean, Minimum, Maximum, Standard Deviation Cross-Tabs

  27. Putting Output into a Paper Cut and Paste Graphs Cut and Paste into Word Processing document Save as .jpeg or .tif file Tables Cut and Paste Format in Word Processing document Import into Excel, format, and then place in Word

  28. Last piece of advice…. As tempting as it is to start running analyses as soon as you get a chance, please make sure that you have cleaned your data! There is nothing worse than having to redo analysis….and finding out they have changed! Invest your time wisely. Plan ahead and be organized!

  29. Questions?

More Related