Introduction to Stata using the Northern Ireland Household Panel Survey (NIHPS)
Katrina Lloyd (QUB), Patricia McKee (UU)
Format • 9:15 Intro to NIHPS • 9:30 Intro to Stata • 10:30 – 11:00 Coffee break • 11:00 Stata files (log / do); advantages of Stata • 12:30 Questions / examples
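NIHPS • NIHPS began in 2001 and is an extension of the British Household Panel Survey (BHPS), which began in 1991 • ISER (Institute for Social and Economic Research) at the University of Essex has overall responsibility for the survey • NISRA (Northern Ireland Statistics and Research Agency) carries out the fieldwork in NI • 6 waves of NIHPS data (2001-2006) are available from the UK Data Archive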
NIHPS • NIHPS follows a representative sample of individuals • Household-based interviewing: • all adults aged 16+ • from Wave 4, all children aged 11-15 (Youth Panel) • Its unique value is that NIHPS measures change at the individual level
NIHPS • Achieved sample (full interviews in each wave) • Wave 1 – 3,458 individuals in 1,978 households • Wave 2 – 2,692 individuals • Wave 3 – 2,414 individuals • By Wave 6 – 2,151 individuals • Attrition: the number of full interviews falls in every wave shown
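The size of the attrition can be worked out from the figures above. A minimal Stata sketch, assuming only the four wave totals quoted on this slide (Waves 4 and 5 are not given, so they are left out), expresses each wave's achieved sample as a percentage of Wave 1:

```stata
* Sketch: achieved sample sizes as quoted on the slide (Waves 4-5 not given)
clear
input wave n_indiv
1 3458
2 2692
3 2414
6 2151
end

* Percentage of the Wave 1 total still giving full interviews
gen pct_of_w1 = 100 * n_indiv / n_indiv[1]   // n_indiv[1] = Wave 1 total
format pct_of_w1 %5.1f
list, noobs
```

By Wave 6 the achieved sample is about 62% of the Wave 1 figure (2,151 / 3,458).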
Content of the NIHPS • NIHPS has 3 components: • Core component asked every year • Includes health, housing, finances • Rotating core component – every 3 years • Includes wealth, assets and debt, parenting • Variable component – once in the panel • Includes race, place of birth, age left school
NIHPS datasets • Cross-sectional files for each wave • Longitudinal files for individuals • Files are linked by common variables: • PID (unique personal identification number, the same in every wave) • wHID (household ID – changes year on year) • wPNO (person number – changes year on year) • w is the wave prefix: k, l, m, n, o, p for the years 2001-2006 respectively
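The wave-prefix convention can be illustrated with a short loop. This sketch only builds and displays names: the file name combines the wave letter with a record type from these slides (indresp), and the variables follow the wHID / wPNO pattern in lower case, as in khid and kpno used later. It does not open any files.

```stata
* Sketch: map wave number -> wave letter -> year, and build prefixed names
local waves k l m n o p
forvalues i = 1/6 {
    local w : word `i' of `waves'     // i-th wave letter
    local year = 2000 + `i'           // wave 1 = 2001, ..., wave 6 = 2006
    display "Wave `i' (`year'): file `w'indresp, household ID `w'hid, person number `w'pno"
}
```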
NIHPS record types (record – description) • wHHSAMP – household-level data for issued households • wHHRESP – household-level data for responding households • wINDSAMP – individual-level data for issued households • wINDALL – enumerated individuals' data (including children and non-respondents)
NIHPS record types (record – description) • wINDRESP – individual-level data for respondents • wEGOALT – relationship of each individual to the others in the household • wINCOME – income and payment data • wJOBHIST – information from the employment history
NIHPS additional files • wMARRIAG – one record for each reported legal marriage • wCOHABIT – one record for each cohabitation spell outside marriage • wCHILDAD – information about adopted and/or step-children • wCHILDNT – information about natural children • wCHILD – information on children and parenting styles • wYOUTH – responses to the young persons' questionnaire • wLIFEMST – information about employment status spells
NIHPS additional files for all waves • XWAVEID – information for matching individuals between waves • XWLSTEN – information on the latest known sample status of individuals • XWAVEDAT – central source of data on individuals that is fixed and only measured once in the panel, e.g. race
Stata windows • Previous commands (Review) • Results • Variables • Command
Edit preferences • Click on the Edit menu • Go down to Preferences • Select General Preferences
LOG files – record your session • Start: either click the Log icon or select File > Log > Begin • Types: .smcl = Stata formatted; .log = plain text (ASCII) file • Choices if the file already exists: view the existing file, append new to old, or overwrite with new • Closure: the log closes when you exit Stata; you can also choose to suspend / resume it
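The menu choices above have command equivalents. A sketch, assuming a hypothetical file name mysession.smcl in the current working directory:

```stata
* Hypothetical file name: mysession.smcl in the working directory
log using mysession.smcl, replace    // start; overwrite any existing file
display "this line is recorded"
log off                              // suspend logging
display "this line is not recorded"
log on                               // resume logging
log close                            // close the log

* Reopen the same file and append to it, then make a plain-text copy
log using mysession.smcl, append
display "appended later"
log close
translate mysession.smcl mysession.log, replace   // .smcl -> .log (text)
```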
Log file dialog • Choose folder • Give filename • Choose type (.smcl or .log) • Note: if a log file is on, its name appears below the Results window and above the Command window
DO files • A do-file is a text file containing commands, so they are run from the file rather than typed at the keyboard • The contents of the Review window (previous commands) can be saved into a do-file • Do-files may call other do-files, which may call other do-files, nested up to 64 deep; or a master.do can call up to 1,000 do-files one after the other
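A master do-file can be sketched as below. To keep the example self-contained it first writes two small hypothetical do-files (step1.do and master.do) to the working directory with Stata's file commands, then runs master.do, which in turn calls step1.do (one level of nesting):

```stata
* Hypothetical do-files, written here only so the example runs on its own
file open fh using "step1.do", write replace
file write fh `"display "running step1.do""' _n
file write fh "global step1_done = 1" _n
file close fh

file open fh using "master.do", write replace
file write fh "do step1.do" _n      // master.do calls step1.do
file close fh

do master.do
```

In a real project master.do would simply list the do-files to run, one `do` line each.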
Do-file Editor • Note the comments • Select the commands to run and click the icon
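The comment styles accepted in a do-file, and the /// marker that continues a command on the next line, are shown in this short sketch (the text and the macro name msg are illustrative only):

```stata
* A whole-line comment starts with *
// Two slashes also start a comment
display "hello"        // ...and can follow a command on the same line
/* A block comment
   can span several lines */
local msg = "first half, " + ///
    "second half"      // /// joins this line to the one above
display "`msg'"
```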
Built-in Variables • _pi contains the value π to machine precision • _n contains the number of the current obs. • e.g. age: 23 34 45 56 → _n: 1 2 3 4 • _N contains the total number of obs. • e.g. age: 23 34 45 56 → _N: 4 4 4 4 • Note: Stata is case sensitive, so myvar, Myvar and MYVAR are 3 distinct names
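The examples above can be reproduced on a toy dataset. This sketch types in the four ages from the slide, stores _n and _N in ordinary variables (obsno and nobs are hypothetical names), prints _pi, and shows that myvar, Myvar and MYVAR are separate variables:

```stata
* Toy data: the four ages from the slide
clear
input age
23
34
45
56
end

gen obsno = _n     // position of each observation: 1, 2, 3, 4
gen nobs  = _N     // total number of observations: 4 on every row
display _pi        // pi to machine precision

* Stata is case sensitive: three different variables
gen myvar = 1
gen Myvar = 2
gen MYVAR = 3
list
```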
Example of _n and _N use
use kindall, clear
sort khid kpno                 // sort by household, then person number within household
gen totcases = _N              // generate total number of obs
* For each household, generate the number of people in the household
bysort khid: gen totninhh = _N
* For each household, number the people within it (1, 2, ... in kpno order)
bysort khid (kpno): gen nwithinhh = _n
list pid khid kpno totninhh nwithinhh in 1/20
tab totninhh nwithinhh, missing   // crosstab, including missing values
gen totcases = _N    // generate total number of obs (5,188 on every row)
tab totcases
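bysort khid: gen totninhh = _N
tab totninhh

   totninhh |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        518        9.98        9.98
          2 |      1,238       23.86       33.85   <- 2-person households
          3 |        915       17.64       51.48
          4 |      1,176       22.67       74.15   <- 4-person households
          5 |        830       16.00       90.15
          6 |        252        4.86       95.01
          7 |        175        3.37       98.38
          8 |         56        1.08       99.46
          9 |         18        0.35       99.81
         10 |         10        0.19      100.00
------------+-----------------------------------
      Total |      5,188      100.00

Note: the frequencies count people, not households; e.g. 1,238 people live in 2-person households, i.e. 619 households.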
list pid khid kpno totninhh nwithinhh in 1/20
(first 5 of the 20 listed observations shown)

             pid       khid   kpno   totninhh   nwithinhh
  1.   118500023   11850027      1          3           1
  2.   118500058   11850027      2          3           2
  3.   118500074   11850027      3          3           3
  4.   118500317   11850043      1          1           1
  5.   118501135   11850116      1          1           1
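Because totninhh is attached to every person, tab totninhh counts people rather than households. A sketch on a made-up three-household file (hid and pno are hypothetical stand-ins for khid and kpno) shows the same bysort pattern and one way to count households, by keeping only the first person in each:

```stata
* Hypothetical mini-file: households 27, 43 and 116 with 3, 1 and 2 members
clear
input hid pno
27 1
27 2
27 3
43 1
116 1
116 2
end

bysort hid: gen hhsize = _N           // people in this household
bysort hid (pno): gen nwithin = _n    // 1st, 2nd, ... person, in pno order

tab hhsize                            // persons by household size
tab hhsize if nwithin == 1            // households by household size
```

`bysort hid (pno):` sorts on pno within hid but forms groups on hid only, so _n follows person number.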
Saved Results
summarize produces summary statistics:

. sum kage12

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      kage12 |      5188    35.46164    22.59792          0         97

summarize also saves up to 19 scalars in r( ), such as:
• r(N) – number of obs
• r(mean) – mean
• r(sum) – sum of age
• r(sd) – standard deviation
• r(p1) – 1st percentile
• r(p95) – 95th percentile
Some (e.g. the percentiles) are only available after sum kage12, detail
To list the results stored in r( ), type return list
. sum kage12, detail

                      age at 1.12.2001
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            3              0
10%            6              0       Obs                5188
25%           16              0       Sum of Wgt.        5188

50%           34                      Mean           35.46164
                        Largest       Std. Dev.      22.59792
75%           53             92
90%           68             94       Variance       510.6658
95%           75             96       Skewness       .2723639
99%           83             97       Kurtosis       2.072386
After sum kage12, detail type return list:

scalars:
                  r(N) =  5188
              r(sum_w) =  5188
               r(mean) =  35.46164225134927
                r(Var) =  510.66577343513
                 r(sd) =  22.59791524533026
           r(skewness) =  .2723638715033958
           r(kurtosis) =  2.072386222684342
                r(sum) =  183975
                r(min) =  0
                r(max) =  97
                 r(p1) =  0
                 r(p5) =  3
                r(p10) =  6
                r(p25) =  16
                r(p50) =  34
                r(p75) =  53
                r(p90) =  68
                r(p95) =  75
                r(p99) =  83
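Saved results can be used directly in later commands, for example to compute an interquartile range. This sketch runs on four toy ages rather than the NIHPS file, and iqr is a hypothetical macro name. Because the next r-class command overwrites r( ), values worth keeping are copied into a local macro straight away:

```stata
* Toy data (not NIHPS)
clear
input age
23
34
45
56
end

summarize age, detail
display "Mean age: " r(mean)
local iqr = r(p75) - r(p25)    // interquartile range from the saved results
display "IQR: `iqr'"
```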
LOCAL macros (local variables) • e.g. a local called var is referred to as `var' – the opening quote ` is the key beside 1, and the closing ' is the apostrophe key near L • Programming – loop over items/values • foreach var in ... – loops over items • the list can also be a varlist, newlist or numlist (e.g. foreach var of varlist ...) • forvalues x = ... – loops over consecutive values • the loop is executed as long as `x' is in range
Example
* Comment: set up a local macro called testvars
local testvars "khgr2r khgsex kage12"
* Start of loop – note the opening { and the closing }
* Could also use: foreach x in khgr2r khgsex kage12 {
foreach x of local testvars {
    display "the current variable is `x'"
    tab `x'        // displays frequencies
    sum `x'        // produces summary statistics
    return list    // displays all the saved results
}   // end of loop
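The foreach example above loops over variable names; forvalues and foreach ... of numlist loop over numbers instead. A minimal sketch (total and n are illustrative macro names):

```stata
* forvalues: consecutive values 1, 2, ..., 5
local total = 0
forvalues i = 1/5 {
    local total = `total' + `i'
}
display "sum of 1 to 5 = `total'"

* foreach over a numlist: any list of numbers, not only consecutive ones
foreach n of numlist 10 20 40 {
    display "value: `n'"
}
```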
Merging data files • Two kinds of merges • One-to-one • Match-merge • The result is recorded in a new variable, _merge • 1 = obs occurred ONLY in master dataset • 2 = obs occurred ONLY in using dataset • 3 = obs occurred in BOTH master and using datasets
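The _merge codes can be seen on two tiny made-up files. This sketch uses the merge 1:1 syntax of Stata 11 and later (a match-merge on pid); using_demo.dta is a hypothetical file written to the working directory:

```stata
* Hypothetical using file: pid 2 and 3
clear
input pid lage
2 41
3 52
end
save using_demo, replace

* Hypothetical master file: pid 1 and 2
clear
input pid kage
1 30
2 40
end

merge 1:1 pid using using_demo
list
tab _merge     // 1 = master only, 2 = using only, 3 = both
```

pid 1 gets _merge = 1, pid 3 gets _merge = 2, and pid 2, present in both files, gets _merge = 3.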
Example of merging
local dirdata "j:/nihps/nihps data/"    // forward slashes: a \ before ` stops macro expansion
foreach x in k l m n o p {
    use "`dirdata'`x'indall", clear
    keep pid `x'age12 `x'newhy
    sort pid
    save temp`x', replace
}
use tempk, clear
foreach x in l m n o p {
    merge 1:1 pid using temp`x', generate(mer`x')   // one _merge-style variable per wave
    sort pid
}