Learn how to use Stata for climate change analysis, including importing data, creating variables, and running regressions to analyze the impact of climate on outcomes like crop choices.
Basics • We will follow what Emanuele did but “import” all of it into Stata. • Focus will be on using Stata: - changing directory - importing Excel data into Stata - manipulating/editing data - saving data - creating variables - dropping variables - simple OLS regression - logit regression
Basics Climate change analysis means that we want to relate two phenomena: usually an outcome of interest (e.g. a farmer’s crop choice) and climate (e.g. temperature and precipitation). How does a change in one affect the other? Emanuele showed us what regressions are. Stata can help us run regressions. It will be the primary software we use for data analysis. We will generally provide both GUI and command-line versions of each step (but we’ll pick and choose a bit and even mix them up for convenience).
Basics Open Stata 13. Anatomy of the Stata screen: taskbar; record of results; variables (data); record of commands; commands.
Basics • Change directory: • Type: cd "c:\…\Desktop" • OR • File>Change Working Directory • This is how Stata “knows” where to look for things and where to put things. • (If you want to look up a book or re-arrange your books, you’ve got to be at the bookshelf)
Importing Data from Excel • Your teams will use an Excel sheet to enter data. • How can we import data from Excel into Stata? • Let’s import Emanuele’s Excel data into Stata (open Excel file to take a look at it): • File>Import>Excel Spreadsheet • Click on “Browse” • Locate Excel file and click “Open” • Check box “Import first row as variable names” • Click “Ok”
Take a Look: Know Your Data Type: browse There are two types of data in this world: numeric and text (non-numeric). Click on the “Farmer” variable in the Variables pane on the right. Its type is non-numeric (str2). Click on the “AnnualmeantemperatureC” variable. Its type is numeric (byte). Click on the “Valueoflandhectare” variable. Its type is numeric (int). Note: the “Farmer” variable has a slight issue with it…
Manipulating/editing Data • We can now edit individual data items. • Change farmer D’s land value from 320 to 325: • Using GUI: • Data>Data Editor>Data Editor (Edit) • Find farmer D • Click cell with her/his land value • Change to 325: type in the field at the top which contains the current value of 320 • Press “Enter” • Close data editor window • Can also access data editor by clicking Data Editor (Edit) icon in task bar
Manipulating/editing Data (continued) • Using command line: • Type: replace Valueoflandhectare = 330 if Farmer == "D" • It didn’t work! (0 values changed) Why? • Issue with the “Farmer” variable: it has extra space characters floating around: • Go to editor window (click Data Editor (Edit) icon in task bar) • Notice the trailing blank space for Farmer A (and B and C and…) • Because of that trailing space, the stored value does not exactly equal "D", so the condition never matches. • Stata has lots of neat commands for precisely these kinds of common data issues. • We will use trim, which removes leading and trailing blank spaces. • Type: replace Farmer = trim(Farmer) (8 real changes made) • Check this has worked: go to editor window (click Data Editor (Edit) icon in task bar) • Type: replace Valueoflandhectare = 325 if Farmer == "D"… it works this time!
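The trailing-space problem can be reproduced on a small made-up dataset. The sketch below is only an illustration: the three farmers and their land values are hypothetical, and the trailing blank is added deliberately so that the failed match and the fix can both be seen.

```stata
* Hypothetical three-farmer dataset; values are made up for illustration
clear
input str2 Farmer int Valueoflandhectare
"A" 250
"D" 320
"E" 410
end

* Deliberately add a trailing blank to mimic the imported Excel data
replace Farmer = Farmer + " "

* No match: the stored value has a trailing blank, so it is not equal to "D"
count if Farmer == "D"

* Strip leading and trailing blanks, then the comparison matches
replace Farmer = trim(Farmer)
count if Farmer == "D"
replace Valueoflandhectare = 325 if Farmer == "D"
list
```

Comparing the two count results before and after trim is a quick way to confirm that stray blanks were the cause.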
Save Data • Using GUI • File>Save (give it a name: “Stata Training.dta”) • Using command line: • Type: save "Stata Training.dta" • In a full file path, the first part is the folder name and the last part is the actual file name (Stata files use the extension .dta). • Notice: we didn’t need to type a long file path, because we changed our working directory earlier.
Creating Variables Create a new variable: annual mean temperature in degrees Fahrenheit. Type: gen AnnualmeantemperatureF = AnnualmeantemperatureC * 9/5 + 32 Label the variable (very important!). Type: label variable AnnualmeantemperatureF "Annual mean temperature (°F)"
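As a quick check of the conversion formula, the sketch below applies it to three made-up Celsius values whose Fahrenheit equivalents are well known (0, 20 and 100 °C). The data are hypothetical; only the gen and label commands come from the slide.

```stata
* Hypothetical temperatures chosen because their conversions are easy to verify
clear
input byte AnnualmeantemperatureC
0
20
100
end

* Same commands as on the slide
gen AnnualmeantemperatureF = AnnualmeantemperatureC * 9/5 + 32
label variable AnnualmeantemperatureF "Annual mean temperature (°F)"

* Inspect values and confirm that the label is attached
list
describe
```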
Drop Variables Type: drop [insert variable name] Annual mean temperature in F – not important… Type: drop AnnualmeantemperatureF We can remove everything at once too, type: clear
Do Files • Do files are incredibly important. • Contain all your commands and produce results you want. • General rule: keep adding commands as you go. • Let’s do that for what we’ve done so far: • Click on New Do-file Editor icon in task bar • Blank Do-file is ready for editing • Let’s add in the commands we’ve already run – pick these from the Review pane on the left: • Right-click on a command • Select Copy • Paste into Do-file editor
Do Files (continued) Should look like:
cd "C:\Users\agha.ali.akram\Desktop\Stata Training"
import excel "Stata Training Data.xlsx", sheet("Data") firstrow
replace Farmer = trim(Farmer)
replace Valueoflandhectare = 325 if Farmer == "D"
gen AnnualmeantemperatureF = AnnualmeantemperatureC * 9/5 + 32
label variable AnnualmeantemperatureF "Annual mean temperature (°F)"
save "Stata Training.dta"
Add comments to your Do-file by putting “//” before a sentence (or “*” at the start of a line). Save your Do-file.
Do Files (continued) Run your Do-file Everything works… but it didn’t save. This is because that file name already exists. Type: “, replace” at the end of the save statement in the Do-file Select the save line in the Do-file (double click and highlight the line) and press the “Execute (do)” icon in the Do-file editor. Save your Do-file! (Tip: Keep saving your work – important to keep doing this)
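The behaviour of save with and without the replace option can be seen in isolation. This is a minimal sketch, not part of the training files: it uses the auto dataset that ships with Stata, and "example_copy.dta" is a hypothetical file name written to the current working directory.

```stata
* Minimal sketch; "example_copy.dta" is a hypothetical file name
sysuse auto, clear                    // example dataset shipped with Stata
save "example_copy.dta", replace      // creates the file (or overwrites an old copy)

* Without replace, Stata refuses to overwrite an existing file.
* capture noisily shows the error message but lets the do-file continue.
capture noisily save "example_copy.dta"
display "Return code from save without replace: " _rc

* With replace, the existing file is overwritten
save "example_copy.dta", replace
```

A nonzero return code on the middle save is the same refusal that stops the Do-file when it is run a second time.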
Regressions Type: reg Valueoflandhectare AnnualmeantemperatureC We can add conditions too. Type: reg Valueoflandhectare AnnualmeantemperatureC if Valueoflandhectare < 600 This regresses land values on temperature for all farmers whose land value is less than 600.
Storing Regression Results We might want to record our results somewhere so we can look them up without having to re-run all this Stata code. Type: reg Valueoflandhectare AnnualmeantemperatureC Then, Type: outreg2 using "Stata Training Results.doc" (outreg2 is a user-written command; if it is not installed yet, type: ssc install outreg2) Add this to the Do-file (insert it before the save command).
Regressions Recall: we fit a quadratic form (“concave”). Let’s look at our data. Type: graph twoway scatter Valueoflandhectare AnnualmeantemperatureC Looks concave…
Creating Variables (continued) Create a new variable: annual mean temperature squared. Type: gen AnnualmeantemperatureC_SQ = AnnualmeantemperatureC * AnnualmeantemperatureC Label the variable: Type: label variable AnnualmeantemperatureC_SQ "Annual mean temperature squared"
Regressions Type: reg Valueoflandhectare AnnualmeantemperatureC AnnualmeantemperatureC_SQ This adds the “concave” functional form that Emanuele introduced earlier. Got a great fit to the data!
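One way to check that a fitted quadratic really is concave is to look at the sign of the squared term and, if it is negative, compute the temperature at which the fitted curve peaks. The sketch below does this on simulated data, not the training data: the variable names temp, temp_sq and value, the coefficients, and the noise level are all assumptions chosen for illustration, and the peak formula is the standard vertex of a parabola.

```stata
* Simulated (hypothetical) data: land value is concave in temperature
clear
set obs 50
set seed 12345
gen temp    = 5 + 25 * runiform()                 // temperatures between 5 and 30
gen temp_sq = temp * temp
gen value   = 100 + 40 * temp - temp_sq + rnormal(0, 5)

* Quadratic regression, same form as on the slide
reg value temp temp_sq

* A negative coefficient on the squared term means the fitted curve is concave.
* The fitted curve peaks where the slope is zero: -b1 / (2 * b2)
scalar turning = -_b[temp] / (2 * _b[temp_sq])
display "Coefficient on squared term: " _b[temp_sq]
display "Temperature at fitted peak:  " turning
```

In this simulation the data were generated with a peak at 20, so the estimate should land near that value. With the real training data the same two display lines apply after the regression above, using AnnualmeantemperatureC and AnnualmeantemperatureC_SQ in place of the hypothetical names.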
Let’s Take this to a “Big” Dataset • A subset of agricultural data from China. • Type: clear • Let’s import this data – same as before: • File>Import>Excel Spreadsheet • Click on “Browse” • Locate Excel file and click “Open” (“China Data.xls”) • Check box “Import first row as variable names” • Click “Ok”
Let’s Take this to a “Big” Dataset I’ve sabotaged this data! Can you find which variable is problematic? Hint: Look at the data types… are they all numeric as we’d like? Variable “clay” is a non-numeric string variable. Fix this, type: destring clay, replace
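The effect of destring can be seen on a tiny made-up example. The values below are hypothetical and stand in for the imported "clay" column; the point is only to show the storage type before and after the conversion.

```stata
* Hypothetical string variable holding numbers stored as text
clear
input str4 clay
"12"
"30"
"45"
end

describe clay            // storage type is a string (str4)
destring clay, replace   // convert text digits to a numeric variable
describe clay            // storage type is now numeric
summarize clay           // numeric summaries are now available
```

If a string variable contains characters that are not part of a number, destring will not convert it as-is; its ignore() and force options exist for those cases, and it is worth browsing the offending values before using either.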
Explore this Data Type:
sum
tab Wheat
tab Rice
tab Rice Wheat
Create New Variables Recall from Emanuele’s presentation that we relate our variable of interest to a climate variable along with its squared term. Let’s generate squared terms for temperature and precipitation variables. Type:
gen stemp1_SQ = stemp1 * stemp1
gen stemp2_SQ = stemp2 * stemp2
gen stemp3_SQ = stemp3 * stemp3
gen stemp4_SQ = stemp4 * stemp4
KEEP ADDING YOUR CODE TO THE DO-FILE
Run Regressions Type: reg Wheat stemp1-stemp4 stemp1_SQ-stemp4_SQ Hang on, what happened there? We used a shortcut: the “-” lets us refer to a whole group of variables that sit next to each other in the dataset (in sequence). Another important type of regression is logit. Type: logit Wheat stemp1-stemp4 stemp1_SQ-stemp4_SQ
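Because the “-” shortcut follows the order of variables in the dataset, it can silently pick up an unrelated variable that happens to sit in between. The sketch below illustrates this on made-up data; the variable named other is hypothetical and is placed between the temperature variables on purpose.

```stata
* Hypothetical dataset with an unrelated variable in the middle of the range
clear
set obs 5
gen stemp1 = 1
gen stemp2 = 2
gen other  = 0
gen stemp3 = 3
gen stemp4 = 4

* The range follows dataset order, so "other" is included here
describe stemp1-stemp4, simple

* Move the four temperature variables next to each other, then check again
order stemp1 stemp2 stemp3 stemp4
describe stemp1-stemp4, simple

* unab expands a variable list into a local macro so it can be inspected
unab temps : stemp1-stemp4
display "`temps'"
```

Running describe on a range before using it in a regression is a cheap way to confirm which variables the shortcut actually covers.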