410 likes | 695 Views
SAS Workshop. SAS Data Management. Hun Myoung Park, Ph.D. University Information Technology Services Center for Statistical and Mathematical Computing. Tuesday, April 1, 2014. © 2009-2010 The Trustees of Indiana University http://www.indiana.edu/~statmath
E N D
SAS Workshop SAS Data Management Hun Myoung Park, Ph.D. University Information Technology Services Center for Statistical and Mathematical Computing Tuesday, April 1, 2014 © 2009-2010 The Trustees of Indiana University http://www.indiana.edu/~statmath statmath@indiana.edu (812) 855-4740, (317) 278-4740
SAS Data Management Outline • INPUT Statement: Input Styles • INFILE Statement • PROC IMPORT/EXPORT • Data Manipulation • IF…ELSE & DO…END Clauses • Recode Variables • LABEL and RENAME Variables • Select Observations & Variables • Working with Data Sets University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management SAS DATA SOURCES University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management DATA STEP • DATA steps read and manipulate data. • INPUT tells how SAS reads data. • DATALINES indicates data items to be read begin from the next line. • INFILE reads data from an external file. • LIBNAME sm ‘c:\temp\sas’; • DATA sm.grade0; • INPUT name $ id stat math; • DATALINES; • John 10091 89 95 • … • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management INPUT STATEMENT • There are six input styles. • List INPUT: simply list variable names • Modified list INPUT: for ill-organized data • Column INPUT: location of data items • Formatted INPUT: format of data items • Named INPUT: variable=data • Mixed INPUT: combine input styles • SAS recognizes styles automatically. • List (previous slide), column, and formatted INPUT are commonly used. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management COLUMN INPUT • Specify column locations of data items • $ follows a character variable name • Data should be aligned correctly • Some data may be skipped (id below) • DATA sm.grade1; • INPUT name $ 1-6 stat 12-14 math 15-17; • DATALINES; • John 10091 89 95 • Rachel10100 99 • … • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management FORMATTED INPUT • Specify formats of data items. • Data should be aligned correctly. • $6. means a string 6 characters long. • 3.0 or 3. means a 3 digit number without a decimal point. • DATA sm.grade2; • INPUT name $6. id 5. male 3. stat 3.0; • DATALINES; • John 10091 89 95 • Rachel10100 99 • … University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management COLUMN CONTROLS • @n moves the input pointer to the nth column. • +n shifts the input pointer by n column to the right. • You may skip some variables (e.g., id) • INPUT name $6. +5 stat 3. @15 math 3.; • DATALINES; • John 10091 89 95 • Rachel10100 99 • … University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management LINE CONTROLS • @ and @@ trailing hold input records • INPUT name $ stat math @@; • DATALINES; • John 89 95 Ron 76 100 • #n moves the input pointer to nth line, / moves the input pointer to the next line INPUT name $6. #2 stat 3. / math 3.; • DATALINES; • John • 89 • 95 University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management USING INFORMAT • INFORMAT defines a format of data entry and FORMAT for data output • COMMA and PERCENT • INPUT income COMMA9. saving PERCENT3.; • DATALINES; • 170,200 89% • 92,540 95% • MMDDYY and DOLLAR • INPUT dates MMDDYY10. sales DOLLAR12.0; • DATALINES; • 10/01/2009 $343,750 • 10/05/2009 $9,784,650 University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management INFILE STATEMENT 1 • INFILE specifies an external data file that is created in advance. • INFILE ‘c:\temp\sas\grade2.txt’; • INPUT name $6. id 5. stat math (3.); • If the data file contains missing values, add the MISSOVER option to INFILE. • The MISSOVER option treats all remaining variables without values as missing values. • INFILE ‘c:\temp\sas\grade2.txt’ MISSOVER; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management INFILE STATEMENT 2 • Alternatively, use FILENAME to make an alias of an external file. • Like LIBNAME, FILENAME makes it easy to referring a file name especially when the name and path are long. • FILENAME gFile ‘c:\temp\grade2.txt’; • DATA sm.grade2; • INFILE gFile MISSOVER; • INPUT name $6. id 5. stat math (3.); • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management INFILE STATEMENT 3 • DLM specifies a delimiter. More than one delimiter can be used. e.g., *iu* • DSD (Delimiter Sensitive Data) assumes a comma delimiter and is needed for missing values • INFILE DATALINES DLM=’,’ DSD; • INPUT name $ id stat math; • DATALINES; • John,10091,89,95 • Rachel,10100,,99 • … University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management READ ODD STRINGS • What if a string variable is long and/or contain commas? Use ampersand (&) • INFILE DATALINES DLM=‘,’ DSD; • INPUT name : $20. title & $100.; • DATALINES; • Lindblom,”Still Muddling, Not Yet Through” • Park, “Reading “”Small is Beautiful””” • How to read quotation marks in a string? Use tilde(~) • INPUT name : $20 title ~ $100.; • DATALINES; • Lindblom,”Still Muddling, Not Yet Through” University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management READ THROUGH NETWORK • SAS can read external data files through TCP/IP (http and ftp) FILENAME masil URL 'http://www.masil.org/sas/grade2.txt'; DATA sm.grade2; INFILE masil FIRSTOBS=1;INPUT name $6. id 5. stat math (3.); • URL denotes HTTP (protocol) • FIRSTOBS=1 tells SAS to read observations from the 1st row University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management VIEW A DATA SET • PROC CONTENTS displays data structures of a data set. • PROC CONTENTS DATA=sm.grade2; RUN; • PROC MEANS displays summary statistics of a data set. • PROC MEANS DATA=sm.grade2; RUN; • PROC PRINT displays values of variables in a data set. • PROC PRINT DATA=sm.grade2; • VAR stat math; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management PROC IMPORT 1 • PROC IMPORT reads external files such as CSV, Excel, dBase, and Access. • FILENAME airline URL 'http://www.indiana.edu/~statmath/stat/all/panel/airline.csv'; • PROC IMPORT DATAFILE=‘airline’ OUT=sm.airline DBMS=CSV; • GETNAMES=YES; • DATAROW=2; • DBMS specifies the format of the file. • IMPORT Wizard is more user-friendly in MS- Windows. Click File->Import Data. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management PROC IMPORT 2 • An example of reading an Excel file. • PROC IMPORT DATAFILE="c:\airline.xls“ • OUT=sm.airline2 DBMS=EXCEL2000; • SHEET=“sheet1"; • GETNAMES=YES; • SHEET specifies a worksheet with a range of data to be read in an Excel file. SHEET=“sheet1$A1:G91"; • GETNAMES=YES reads variable names from the file. • CAUTION: Excel 2007 file should be saved as 2003 format for SAS 9.13. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management PROC EXPORT • Export to a CSV file. DATA specifies a data set to be exported. • PROC EXPORT • DATA=sm.airline OUTFILE="c:\temp\airline.csv“ • DBMS=CSV REPLACE; • Export to an Excel file. REPLACE overwrites the existing data set. PROC EXPORT DATA=sm.ariline OUTFILE="c:\temp\airline2.xls“ DBMS=EXCEL2000 REPLACE; • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management DATA MANIPULATION • Data sets are manipulated in DATA steps • SET statement loads existing SAS data sets DATA sm.grade7; • SET sm.grade2; • sum1 = stat+math; • sum2 = SUM(stat, math); • mean1 = (stat+math)/2; • mean2 = MEAN(stat, math); RUN; • The result is stored into grade7 without changing anything in grade2 University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management IF…THEN CLAUSE • IF… THEN • IF mean1 GT 90 THEN DELETE; • IF sum1 >= 98 THEN grade=‘A+’; • IF (stat>90 AND math>90) THEN grade=‘A’; • IF… THEN… ELSE • IF mean2 >= 98 THEN grade=‘A+’; • ELSE grade=‘A’; • IF… THEN… ELSE IF… • IF mean2 >= 98 THEN grade=‘A+’; • ELSE IF mean2>=92 THEN grade=‘A’; • ELSE IF mean2>=88 THEN grade=‘A-’; • ELSE grade=‘B’; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management DO LOOP • DO…TO… END • DO i=1 TO 100 BY 1; random=RANNOR(7654321); OUTPUT sm.randomtest; • END; • DO num= 4, 6, 9, 10; …; END; • DO str=‘Mon’, ‘Tue’, ‘Wed’; …; END; • DO WHILE (n<=10); …; END; • DO OVER for an array variable • DO OVER arr; …; END; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management IF & DO… END • This combination executes a list of statements in a block. Imagine { and } in C and most computer languages. • IF mean2 GT 90 THEN DO; • …; • …; • END; • ELSE DO; • …; • …; • END; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management INPUT & DO… END • This combination enables you to read experiment data in an easy manner. • INPUT drug $ @@; • DO level=‘High’, ‘Medium’, ‘Low’; • INPUT symptom @@; • OUTPUT; • END; • DATALINES; • Drug1 34 55 37 • Drug2 84 86 87 • Drug3 87 86 24 University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management RECODE VARIABLES 1 • Use operators and functions • DATA sm.grade8; • SET sm.grade2; • random=RANNOR(7654321); • id=_N_; /* serial observation number */ • IF… THEN and Special Usage • IF age <= 20 THEN age2=0; • ELSE IF age<=30 THEN age2=1; • ELSE IF age<=40 THEN age2=2; • ELSE age2=3; • Age2 = (age>20)+(age>30)+(age>40); University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management RECODE VARIABLES 2 • ARRAY links between variables and corresponding arrays (temporary vars.) • Suppose trust has 1 through 5 and needs reverse recoding. • ARRAY arr trust; • DO OVER arr; • arr=6-arr; • END; • ARRAY arr(5) q1-q5; • DO i=1 TO 5; • arr(i)=6-arr(i); • END; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management MULTIPLE RESPONSE • Suppose respondents are asked to choose three items and their choices are stored in the wide form from c1, c2, c3, respectively. • For frequency analysis, data need to be rearranged in to the long form using OUTPUT. • DATA sm.patient2; • SET sm.patient; • choice=c1; OUTPUT; • choice=c2; OUTPUT; • choice=c3; OUTPUT; • RUN; • The number of observations becomes N*3 . University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management RENAME VARIABLES • Use the DATA statement in DATA Step. • “old variable= new variable” • DATA sm.grade9( RENAME=(sum1=total1) ); • SET sm.grade7; • … • RUN; • Use the RENAME statement in a DATA step. “old variable= new variable” • DATA sm.grade10; • SET sm.grade7; • RENAME sum2=total2 mean2=average2; • … • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management LABEL VARIABLES • Variable labels • SET sm.grade2; • LABEL stat=‘Statistics’ math=‘Math’; • Value labels using PROC FORMAT. PROC FORMAT; • VALUE yes_L 1=‘Yes’ 0=‘No’; • VALUE $male_L ‘M’=‘Male’ ‘F’=‘Female’; • A period in label name ($male_L.) may not be omitted in a FORMAT statement. • DATA sm.survey2; • SET sm.survey1; • FORMAT trust yes_L. male $male_L.; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management SELECT OBSERVATIONS • Use the IF clause • Keep only obs that meet the condition • IF mean2 >= 90; • IF (stat GE 80 OR math GE 80); • Delete obs that meet the condition • IF mean2 LT 80 THEN DELETE; • IF (stat<80 AND math <80) THEN DELETE; • WHERE in DATA/PROC uses a subset of a data set w/o changing the data set • WHERE mean2 GE 90; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management SELECT VARIABLES • Use either KEEP or DROP statement; You may not use both in a DATA step. • DATA sm.grade15; • SET sm.grade7; • DROP sum1 mean1; • RUN; • In a DATA statement. Do not omit =. • DATA sm.grade16 (KEEP=sum2 mean2); • SET grade7; • DATA sm.grade17 (DROP=sum1 mean1); • SET grade7; • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management APPEND DATA SETS • Use SET with a list of data sets to be appended. • DATA sm.grade20; • SET sm.grade12 sm.grade13; • You may use PROC APPEND or PROC DATASETS when the master data set is huge. • PROC APPEND BASE=sm.grade12 DATA=sm.grade13; • RUN; • BASE= specifies a master data set and DATA= a data set to be appended. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management SPLIT DATA SETS • Use DATA statement. Do not omit = after DROP and KEEP. • DATA sm.grade21 (DROP=math) sm.grade22(DROP=stat); • SET sm.grade2; • … • Use OUTPUT statement in IF… THEN… ELSE clause. • DATA … • SET sm.grade7; • IF mean2 GT 90 THEN OUTPUT sm.gradeHigh; ELSE OUTPUT sm.gradeLow; • … University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management SORT DATA SETS • Sort data sets before using BY in a DATA/PROC step (e.g., match merge) • PROC SORT DATA=sm.grade21 OUT=grade23; • BY id; • PROC SORT by default sorts in a ascending order. For a descending order, add DESCENDING before the variable name • PROC SORT DATA=sm.grade22 OUT=sm.grade24; • BY id DESCENDING mean2; • RUN; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management MERGE DATA SETS • Simple merge using MERGE. • DATA sm.grade25; • MERGE sm.grade21 sm.grade24; • Match merge using MERGE and BY. • DATA sm.grade26; • MERGE sm.grade23 sm.grade24; • BY id; • … • Be careful when data structure is different across data sets to be merged. • Can be used for updating data sets. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management UPDATE DATA SETS • UPDATE updates a data set using a transaction data set. • Data sets need to be sorted in advance. • DATA sm.surveyNew; • UPDATE sm.survey08 sm.survey10 UPDATEMODE=MISSINGCHECK; • BY id; • MISSINGCHECK prevents missing in the transaction file from replacing values in a master file. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management GET AGGREGATE DATA • PROC MEANS generates aggregate statistics into a data set. • PROC MEANS DATA=sm.airline; • VAR cost output; • CLASS airline; • OUTPUT OUT=sm.air_mean1 MEAN=costM outputM; • Variables costM and outputM are stored into a new data set air_mean1. • You may specify other statistics as well. • PROC MEANS DATA=sm.airline N SUM MEAN VAR; • VAR cost output fuel load; • CLASS airline; • OUTPUT OUT=sm.air_mean2; University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management PROC DATASETS • Display data sets in the library of interest. PROC DATASETS LIBRARY=sm; RUN; • The procedure can also manage (e.g., copy, rename, repair, delete) SAS files. University Information Technology Services Center for Statistical and Mathematical Computing
SAS Data Management RESOURCES • Burlew, Michele M. 2002. Reading External Data Files Using SAS: Examples Handbook. Cary, NC: SAS Institute. • Long, J. Scott. 2009. The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press. • http://support.sas.com/documentation/index.html • http://v9doc.sas.com/sasdoc/ • http://www.indiana.edu/~statmath/stat/sas/index.html • http://www.indiana.edu/~statmath/support/bydoc/ • http://www.masil.org/sas/datastep.html • http://www.masil.org/sas/statement.html • http://www.masil.org/sas/input.html • http://www.masil.org/sas/import.html • http://www.ats.ucla.edu/stat/sas/ • http://www.listserv.uga.edu/archives/sas-l.html University Information Technology Services Center for Statistical and Mathematical Computing