14a. Accessing Data Files in SPSS®

Prerequisites • Recommended modules to complete before viewing this module • 1. Introduction to the NLTS2 Training Modules • 2. NLTS2 Study Overview • 3. NLTS2 Study Design and Sampling • NLTS2 Data Sources, either • 4. Parent and Youth Surveys or • 5. School Surveys, Student Assessments, and Transcripts • NLTS2 Documentation • 10. Overview • 11. Data Dictionaries • 12. Quick References

Overview • Purpose • Open and view data files • Limiting variables • Subsetting cases • Joining/combining data files • Summary • Closing • Important information

NLTS2 restricted-use data NLTS2 data are restricted. Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data. Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.

Purpose • Learn to • Open a data file • See what is in a file (i.e., contents of the file) • “Size” a data file for a perfect fit • Reduce the number of variables • Reduce the number of cases (i.e., subset the data) • Combine information from multiple sources • Bring in data from another source or another wave • Join or combine files • Create a new file

Open and view data files • SAS® and SPSS® data are in separate folders. • SPSS data have a “.sav” extension. • Note: Data files were developed in SAS, which • Allows 28 distinct missing values vs. SPSS’s 3 distinct values or a range of values. • Has associated user-defined value label formats stored separately in a format library vs. the SPSS convention of storing value labels with the variable. • See “Notes to SPSS Users” hyperlinked from the table of contents in the data documentation for details.

Open and view data files • Files are either read from or written to. • Files have a name and a location where they are stored. • SPSS needs to know the name of the file and where to find it. • The path describes the nesting of folders. • Example: • C:\myprojects\NLTS2\Data is a path or location. • That is, the file is located on drive “C” in the folder “Data,” which is nested inside the “NLTS2” folder, which in turn is inside “myprojects.” These results cannot be replicated with the full dataset; all output in these modules was generated with a random subset of the full data.

Open and view data files • Syntax • Open file from menu • Open file command in SPSS syntax editor • Submit a “Get File” command:
    GET FILE='C:\myprojects\NLTS2\Data\n2w1tchr.sav'.
• Menu command • From menu: • File: Open: Data [select file from browser] • A window opens with a spreadsheet-type display.

Open and view data files • The open file is the “active” dataset. • Select the “Variable View” tab for details about variables. • Rows list the variables. • Columns contain descriptors and attributes about the variables and values. • “Data View” has case-by-case values for each variable. • Each row holds data for a single respondent. • Each column holds the data for a single variable.

Open and view data files: Example • Viewing files • Open the Wave 1 Teacher file. • Look at both the data view and variable view.
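One possible way to work this example from the syntax editor is sketched below. The folder path is an assumption carried over from the earlier slides; adjust it to wherever the data are stored. DISPLAY DICTIONARY writes roughly the same information shown in “Variable View” to the output Viewer.

```spss
* Open the Wave 1 Teacher file (folder path is an assumption).
GET FILE='C:\MyProjects\NLTS2\Data\n2w1tchr.sav'.
* List variable names, labels, formats, and missing values in the Viewer.
DISPLAY DICTIONARY.
```

The case-by-case values are still easiest to inspect in the “Data View” tab of the data editor.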

Limiting variables • How to reduce the number of variables in the file • Large files with many cases and many variables are unwieldy; simplify. • Fewer variables to search through • Fewer cases to process • Create files that are limited to just those variables needed for analysis. • You have the choice of drop or keep. • Which one is best? The one that requires less typing! • If you are dropping more variables than keeping, use “keep” and vice versa.

Limiting variables • Note: When making changes to a file • Use work files for temporary changes. • Work files are files that are in existence only for the duration of the program or SPSS session. • An active file is a work file unless it is saved. • Save the results to a new data file. • Usually it is best to create a new file rather than to modify the source file.

Limiting variables • In syntax editor:
    GET FILE='C:\MyProjects\NLTS2\Data\n2w1parent.sav'
      /KEEP=ID w1_DisHdr2001 w1_GendHdr2001 w1_IncomeHdr2001 w1_AgeHdr2001
            np1Weight np1HealthProb np1GroupMember np1ProblemCount np1E2a np1B2a.
    EXECUTE.
    SAVE OUTFILE='C:\MyProjects\NLTS2\Data\Par_w1_lmt_vars.sav'
      /DROP=np1E2a np1GroupMember.
• Notice that “Keep” is on the “Get File” command and “Drop” is on the “Save” command. • Notice that SPSS statements end in a period.

Limiting variables • Menu driven • Open a file. • From “File” select “Save as.” • Click on “Variables.” • In the next pop-up menu, click on “Drop All.” • In the large box with the variable list, click in each little box in the “Keep” column to select only the variables needed from this file. • Little boxes with an “x” indicate the variables are kept; blank boxes indicate the variables are dropped. • Click “Continue,” give the file a new name in the Browse window, and click “Save.” • Open the new file.

Limiting variables: Example • Limiting variables • Create a file with fewer variables. • Create a new file called “PrScores.sav” from n2w2dirassess.sav. • Keep only the following variables: • ID • ndacalc_pr • ndaPC_PR • ndasyn_pr • NDaF1_friend • na_age4 • w2_dis12 • w2_gend2 • na_grade4 • w2_incm3 • wt_na. • Open and review the new file in variable view.
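A minimal sketch of this exercise in the syntax editor might look like the following. It assumes the source file sits in the same C:\MyProjects\NLTS2\Data folder used on the earlier slides; the variable list is the one given in the exercise.

```spss
* Read only the variables needed (folder path is an assumption).
GET FILE='C:\MyProjects\NLTS2\Data\n2w2dirassess.sav'
  /KEEP=ID ndacalc_pr ndaPC_PR ndasyn_pr NDaF1_friend na_age4
        w2_dis12 w2_gend2 na_grade4 w2_incm3 wt_na.
* Save the reduced file under a new name so the source file is untouched.
SAVE OUTFILE='C:\MyProjects\NLTS2\Data\PrScores.sav'.
* Reopen the new file and list its variables to review the result.
GET FILE='C:\MyProjects\NLTS2\Data\PrScores.sav'.
DISPLAY DICTIONARY.
```

Because only 11 variables are wanted, “keep” takes less typing than “drop” here.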

Subsetting cases • How to reduce the number of cases or records in the file. • Often analysis is done on a subset; for example: • Select only youth with visual impairment. • Select only youth who are out of secondary school. • Exclude younger students. • All variables are available in the file; only cases are conditionally restricted.

Subsetting cases • Example: Limit Wave 4 Parent/Youth interview data to those who are 21 or older, excluding youth who are 19 or 20 (W4_Age2007 = 19 or 20). • Code in syntax editor to limit cases:
    USE ALL.
    COMPUTE filter_$=(W4_Age2007>20).
    VARIABLE LABELS filter_$ 'W4_Age2007>20 (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMATS filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.
• Code to select all cases:
    FILTER OFF.
    USE ALL.
    EXECUTE.

Subsetting cases • To limit cases from menu • Data: Select Cases • Click “If condition is satisfied” and “If” button. • Build logic condition. • Click “Continue” and “OK.” • To select all cases from menu • Data: Select Cases • Select “All Cases” and click “OK.” • To have the best of both worlds • Click “Paste” to save code. • Select and run code from syntax editor. • Toggle on and off as needed.

Subsetting cases: Example • Subsetting cases • Create a small data set with a subset of cases. • Open “PrScores.sav” created in previous example. • Limit cases to those classified with hearing impairment only, i.e., those with a value of “5” for “w2_dis12.” • Look at “Variable View” and “Data View” in the data editor. • Are there any visual clues that the filter is on? • Turn the filter off so all cases are selected.
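This exercise could be sketched in syntax as follows, reusing the filter pattern from the earlier slide. The folder path is an assumption, and the code relies on the exercise's statement that a value of 5 on w2_dis12 identifies hearing impairment.

```spss
* Open the file created in the previous example (path is an assumption).
GET FILE='C:\MyProjects\NLTS2\Data\PrScores.sav'.
* Flag cases with w2_dis12 equal to 5 and filter on that flag.
USE ALL.
COMPUTE filter_$=(w2_dis12 = 5).
VARIABLE LABELS filter_$ 'w2_dis12 = 5 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
* Turn the filter off so all cases are selected again.
FILTER OFF.
USE ALL.
EXECUTE.
```

While the filter is on, SPSS typically marks unselected cases in “Data View” with a diagonal line through the row number, shows “Filter On” in the status bar, and adds the filter_$ variable to the end of the variable list.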

Joining/combining data files • How to bring in data from another file • Purpose • Learn to combine or join files • Bring in data from another source • Bring in data from another wave • Learn what to watch for • Number of cases in the combined file • How cases are joined • Key variable, i.e., which variable to match on • Keyed file, i.e., which cases to keep

Joining/combining data files • Why do this? • Often it is necessary to combine information from different files to perform comparative analyses, create new variables, or measure differences over time. • For example, you may want to • Create composite variables from multiple sources. • Look at similar items at different points in time.

Joining/combining data files • Example of a composite variable from multiple sources • Create a variable for “if parent attended a parent/teacher conference” using Wave 2 teacher survey item nts2C8 and fill in with parent interview item np2E1a_d if teacher data are missing. • Example of items at different points in time • Create a variable to look at the pattern of employment between Waves 2 and 3: employed both waves, either wave, or neither wave. • Set to employed both waves if np2HasPdJob (W2) and np3HasJob (W3) are both yes. • Else set to employed in either wave if np2HasPdJob or np3HasJob is yes. • Else set to not employed if np2HasPdJob and np3HasJob are both no.
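Once the needed items are in one file, the two derived variables described above could be sketched as follows. This is an illustration under stated assumptions, not the study's own code: it assumes the active file already contains all four items, that the teacher and parent conference items share the same coding, and that the employment items are coded 1 = yes and 0 = no. The names PTConf and EmpPattern are hypothetical.

```spss
* Composite from two sources: use the teacher item, and fill in
* with the parent item where the teacher item is missing.
* PTConf is a hypothetical name; identical coding is assumed.
COMPUTE PTConf = nts2C8.
IF (MISSING(nts2C8)) PTConf = np2E1a_d.

* Employment pattern across Waves 2 and 3.
* EmpPattern is a hypothetical name; 1 = yes and 0 = no is assumed.
* Later IF commands overwrite earlier ones, so both waves wins.
IF (np2HasPdJob = 0 AND np3HasJob = 0) EmpPattern = 0.
IF (np2HasPdJob = 1 OR np3HasJob = 1) EmpPattern = 1.
IF (np2HasPdJob = 1 AND np3HasJob = 1) EmpPattern = 2.
VALUE LABELS EmpPattern 0 'Neither wave' 1 'Either wave' 2 'Both waves'.
EXECUTE.
```

Cases that cannot be classified because of missing data (for example, “no” in one wave and missing in the other) are left system-missing, so it is worth checking the actual codes in the data dictionary before relying on a sketch like this.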

Joining/combining data files • There will be missing records across files and missing items within files. • If data look like this, ask which file is the main file being analyzed.

Joining/combining data files • Data in all files must be sorted by the key variable. • The key variable matches files case by case. • Key variable is “ID.” • Files on CD should be sorted by key variable, but as you work with files they may become unsorted. • Code in syntax editor to sort data:
    SORT CASES BY ID (A).
• To sort data from menu • Data: Sort Cases • Select “Sort in ascending order” radio button. • Select “ID” and move it to the “Sort by” box by clicking the right-facing arrow. • Click “OK” or “Paste” and run code from syntax editor.

Joining/combining data files • Code in syntax editor to join files:
    MATCH FILES
      /FILE='C:\MyProjects\NLTS2\Data\n2w1parent.sav'
      /TABLE='C:\MyProjects\NLTS2\Data\n2w2paryouth.sav'
      /TABLE='C:\MyProjects\NLTS2\Data\n2w3paryouth.sav'
      /TABLE='C:\MyProjects\NLTS2\Data\n2w4paryouth.sav'
      /BY ID
      /KEEP=ID np1i_3a_7 np2HasPdJob np3HasJob np4HasJob.
    EXECUTE.
• Why “File” or “Table”? • File: All cases in the file are kept. • Table: Keeps only the cases that match those found in “File.”

Joining/combining data files • To join data using menu-driven options • From active file, go to menu item “Data” and select “Merge Files” and “Add Variables.” • Select file from browser window and click “Open.” • Note: All variables that appear in both files will automatically be moved to the “Excluded Variables” box. • Select “ID” in “Excluded Variables” box and do the following: • Click “Match cases on key variables in sorted file.” • Select “External file is keyed table” radio button. • In some versions of SPSS, this option is labeled “Non-active dataset is keyed table.” • Keyed table keeps only those cases that match those in the active file. • Click the left-facing arrow to move “ID” to the “Key Variables” box.

Joining/combining data files • To join data using menu-driven options (cont’d) • Select variables to keep or drop • Active file variables are marked with “*.” • Variables from the keyed or external file are marked with “+.” • To drop items, move variables to the “Excluded Variables” box from the “New Active Dataset” box by highlighting each variable and clicking the left-facing arrow. • If only selecting a few variables • Exclude all those marked with “+” using click/shift-click on the first and last variables. • Click left-facing arrow to move variables to the “Excluded Variables” box. • Select each variable to keep in the “Excluded Variables” box. • Click the right-facing arrow to move it to the “New Active Dataset” box. • Press “OK” to run or select “Paste” to run code from syntax editor.

Joining/combining data files • To save your work using menu-driven instructions • Select “Save as” from “File” menu and give the file a name. • If a new file has already been created with a new name using the steps outlined in the subsetting data files, select “Save” under the “File” menu. • To save your work using the syntax editor • A file can be saved using either an existing or new name. • To save a file with the name “MyNewFile”:
    SAVE OUTFILE='C:\MyProjects\NLTS2\Data\MyNewFile.sav'.

Joining/combining data files • If bringing in data from more than one file, repeat this process for each file. • Suggestion: Name files in a meaningful way, such as • By date: AnFile_29July.sav • By type of analysis: PI_CrossWave.sav • By source: PI_W123.sav • By sequence: File_5.sav.

Joining/combining data files: Example • Joining/combining data • Combine data from another file with an existing file. • Open and sort PrScores.sav by ID. • Bring in np2HasPdJob from n2w2paryouth.sav. • Bring in np3HasJob from n2w3paryouth.sav. • Save the file as PrScoresEmp.sav.
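One way this exercise might be written in syntax is sketched below. It assumes the files are in the C:\MyProjects\NLTS2\Data folder used earlier, that PrScores.sav holds the 11 variables from the limiting-variables example, and that the two parent/youth files are still sorted by ID as delivered. The asterisk in /FILE=* refers to the active dataset.

```spss
* Open and sort the main file (folder path is an assumption).
GET FILE='C:\MyProjects\NLTS2\Data\PrScores.sav'.
SORT CASES BY ID (A).
* Keep every case in the active file and look up the job items
* in the Wave 2 and Wave 3 parent/youth files by ID.
MATCH FILES
  /FILE=*
  /TABLE='C:\MyProjects\NLTS2\Data\n2w2paryouth.sav'
  /TABLE='C:\MyProjects\NLTS2\Data\n2w3paryouth.sav'
  /BY ID
  /KEEP=ID ndacalc_pr ndaPC_PR ndasyn_pr NDaF1_friend na_age4
        w2_dis12 w2_gend2 na_grade4 w2_incm3 wt_na
        np2HasPdJob np3HasJob.
EXECUTE.
* Save the combined data under a new name.
SAVE OUTFILE='C:\MyProjects\NLTS2\Data\PrScoresEmp.sav'.
```

Because PrScores.sav is named on FILE and the other two on TABLE, the combined file should have the same number of cases as PrScores.sav; cases with no match in a parent/youth file get missing values on the variable brought in from it.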

Closing • Congratulations, you have learned to • Open and view a file • Create a new file • Reduce the size of files by specifying the • Variables needed • Cases needed • Join files using a key variable • Save files with a new name.

Closing • Topics discussed in this module • Purpose • Open and view data files • Limiting variables • Subsetting cases • Joining/combining data files • Next module: • 15a. Accessing Data: Frequencies in SPSS

Important information • NLTS2 website contains reports, data tables, and other project-related information: http://nlts2.org/ • Information about obtaining the NLTS2 database and documentation can be found on the NCES website: http://nces.ed.gov/statprog/rudman/ • General information about restricted data licenses can be found on the NCES website: http://nces.ed.gov/statprog/instruct.asp • E-mail address: nlts2@sri.com