1 / 53

Alan Elliott

Alan Elliott. Dealing with Missing Values. Title "Dealing with Missing Values in SQL"; PROC SQL ; select INC_KEY,GENDER, RACE, INJTYPE, case when ISS=- 81 then . else ISS end as ISS, case when EDGCSTOTAL=- 81 then . else EDGCSTOTAL end as EDGCSTOTAL,

miles
Download Presentation

Alan Elliott

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Alan Elliott

  2. Dealing with Missing Values Title "Dealing with Missing Values in SQL"; PROCSQL; select INC_KEY,GENDER, RACE, INJTYPE, case when ISS=-81 then . else ISS end as ISS, case when EDGCSTOTAL=-81 then . else EDGCSTOTAL end as EDGCSTOTAL, case when AGE=-81 then . else AGE end as AGE from "C:\sasdata\trauma_sm"; quit; SEE SQLMISSING.SAS

  3. Partial Output Missing Values

  4. Character variables • Take INJTYPE out of initial list of variables on the SELECT statement and add the following code: end as AGE, case when INJTYPE="Burn" then "" else INJTYPE end as INJTYPE • And rerun

  5. Results WAS BURN

  6. Combined CASE / Conditional PROCSQL; select INC_KEY,GENDER,RACE, AGE,DISSTATUS, case when AGE LT 10 and DISSTATUS in ("Dead") then "CHILD DEATH" when AGE GE 10 and DISSTATUS in ("Dead") then "OTHER DEATH" ELSE "Alive" end as TYPEDEATH from "C:\sasdata\trauma_sm"; quit; • SEE SQLCASE.SAS

  7. Partial Output

  8. Order by TYPEDEATH (Descending) Modify the code to read … … end as TYPEDEATH from "C:\sasdata\trauma_sm" ORDER BY TYPEDEATH DESC;

  9. Results

  10. Summarize and Count PROCSQL; select case when AGE LT 10 and DISSTATUS in ("Dead") then "CHILD DEATH" when AGE GE 10 and DISSTATUS in ("Dead") then "OTHER DEATH" ELSE "Alive" end as TYPEDEATH, count(calculated TYPEDEATH) as COUNTDEATH from "C:\sasdata\trauma_sm" GROUP BY TYPEDEATH; quit; SEE SQLSUMMARY1.SAS (NOTE: Take calculates out of above statement, and observe error.)

  11. Results

  12. Reorder the table Change the code (at end to read) from "C:\sasdata\trauma_sm" GROUP BY TYPEDEATH order by typedeath;

  13. Results

  14. SQL Summary Functions Summary Function Description AVG, MEAN Average or mean of values COUNT, FREQ, N Aggregate number of non-missing values CSS Corrected sum of squares CV Coefficient of variation MAX Largest value MIN Smallest value NMISS Number of missing values PRT Probability of a greater absolute value of Student’s t RANGE Difference between the largest and smallest values STD Standard deviation STDERR Standard error of the mean SUM Sum of values SUMWGT Sum of the weight variable values which is 1 T Testing the hypothesis that the population mean is zero USS Uncorrected sum of squares VAR Variance

  15. Using Some Summary Functions procsql; select count(brand) as Tot_Cars, sum(minivan) as TOT_Minivans, min(CITYMPG) as MIN_MPG, max(CITYMPG) as MAX_MPG, SUM(CITYMPG)/COUNT(CITYMPG) as AVG_MPG from sasdata.cars; quit; SEE SQLSUMMARY2.SAS

  16. Add group statement from sasdata.cars group by minivan; quit;

  17. Compare Sort (Data Step) DATA SORT DATA MYDATA; INPUT @1 LAST $20. @21 FIRST $20. @45 PHONE $12.; Label LAST = 'Last Name' FIRST = 'First Name' PHONE = 'Phone Number'; DATALINES; Reingold Lucius 201-876-0987 Jones Pam 987-998-2948 Etc… ; *-------- Modify to sort by first name within last (by last first); PROCSORT; BY LAST FIRST; PROCPRINT LABEL NOOBS; TITLE 'ABC Company'; TITLE2 'Telephone Directory'; RUN; RUN this code, observe results. SEE DATASORT.SAS

  18. Results from DATA statement

  19. Sort Using SQL PROCSQL; SELECT LAST LABEL="Last Name", FIRST LABEL="First Name", PHONE LABEL "Phone Number" from MYDATA ORDER by LAST, FIRST; QUIT; SEE SQLSORT.SAS Note – variables appear in table in order selected…

  20. Results for SQL Sort Note – easier to order variable names.

  21. Appending/Concatenating Two Files • Recall from the Data Step, to append two data files you can use the code DATA NEW; SET OLD1 OLD2; RUN; (See SQLAPPEND.SAS)

  22. Results for Data Append

  23. Append Files using SQL PROCSQL; select * from old1 union select * from old2; QUIT; UNION-means concatenate the query results. It produces all the unique rows that result from both queries

  24. Results (same as Data Append)

  25. Basic SQL Operators Combine two or more queries in various ways by using the following set operators:

  26. Duplicate records • Suppose there are duplicate records. • See SQLAPPEND2.SAS DATAOLD1; INPUT SUBJ $ AGE YRS_SMOKE; datalines; 001 34 12 003 44 14 004 55 35 006 21 3 011 33 11 ; DATA OLD2; INPUT SUBJ $ AGE YRS_SMOKE MARRIED; datalines; 006 21 3 . 011 33 11 1 012 25 19 0 023 65 45 1 032 71 55 1 ; RUN; This record added This record added

  27. Union appends, keeps unique rows PROCSQL; select * from old1 union select * from old2; QUIT; One row 6 is unique. Two row 11’s are unique. (UNION keeps all unique.) This is the same code from before – only difference is the duplicated records in the data sets.

  28. Union all • To keep all rows, use UNION ALL PROCSQL; select * from old1 union all select * from old2; QUIT; Add ALL to the code and re-run.

  29. Results – Union All Both SUBJ 6 records included (even though not unique).

  30. EXCEPT • To keep only the data from the first data set that are not in the 2nd set (but all variables) use EXCEPT PROCSQL; select * from old1 except select * from old2; QUIT; Run this code and observe output.

  31. Except Output Note – Record 6 is the same, so it is not kept. Record 11 is different so it is kept.

  32. Switch Data Set order PROCSQL; select * from old2 except select * from old1; QUIT; Run this code and observe output.

  33. Output Note – (Same as before) Record 6 is the same, so it is not kept. Record 11 is different so it is kept.

  34. Except ALL datalines; 001 34 12 003 44 14 004 55 35 006 21 3 006 21 3 011 33 11 ; Suppose there was a duplicate record 006 in the first data set. Using EXCEPT, record 6 would not appear in the result because there is a record 6 in the second data set. If you want a duplicate record that is not a duplicate matched in the 2nd data set to appear in the result, use EXCEPT ALL See SQLAPPEND2a.SAS Added duplicate record

  35. Except ALL PROCSQL; select * from old1 except all select * from old2; QUIT; 6 is a duplicate record in OLD1 and there is not a duplicate 6 in OLD 2, so it appears in the results

  36. Intersect • The INTERSECT command returns only those records that occur in both data sets. • Change EXCEPT ALL to INTERSECT PROCSQL; select * from old1 intersect select * from old2; QUIT;

  37. Intersect Results Only record 6 was duplicated in both data sets.

  38. Compare Union with Outer Union • UNION- produces all unique rows from both queries. • OUTER UNION – concatenates the series results • See SQLAPPEND3.SAS

  39. Compare Output PROCSQL; select * from old1 UNION select * from old2; QUIT; PROCSQL; select * from old1 OUTER UNION select * from old2; QUIT;

  40. Results of Union

  41. Results of Outer Union Note: SQL allows you to create a data set with DUPLICATE variable names.

  42. Cartesian Join • Combines ALL rows from one file with ALL rows from another. PROCSQL; select * from old1, old2; QUIT; SEE SQLCARTESIAN_JOIN.SAS

  43. Cartesian Join Note SUBJ 1 appears 5 times

  44. Using Table Aliases select a.subj, a.age, b.subj as sub_from_b, etc… from old1 a, old2 b Table Alias allows you to distinguish variables from different tables without ambiguity. Note “a” variables prefix specifies that the variable is from the table “old1” since “old1” is labeled as the “a” table below. Table old1 is labeled as table “a” in this code.

  45. Inner Join (Using Table Alias) PROCSQL; select a.subj, a.age, b.subj, b.age, b.married as Married from old1 a, old2 b where a.subj=b.subj; QUIT; In an INNER JOIN, only observations with both key values matching are selected. SEE SQL_INNER_JOIN.SAS

  46. Inner Join Code PROCSQL; select a.subj, a.age, b.subj, b.age, b.married as Married from old1 a, old2 b where a.subj=b.subj; QUIT; Note Married is from table “b” but will be called Married in output “Where” limits the join to those that are in BOTH Tables

  47. Inner Join Results

  48. Inner Join (1-to-1 merge) Example 2 PROCSQL; select * from old1, old2 where old1.subj=old2.subect; QUIT; Use the table names as the alias. SEE SQL_INNER_JOIN2.SAS

  49. Results Note – data for both files where not in order. Only those in both tables with a matching key variable are included in the result. EXERCISE – Add the phrase “order by old1.subj” to put the table in Subject order.

  50. One to Many Merge Suppose you have data like this… you want to match building to employees. DataLOC; input BUILDING $ Location $; datalines; A1 DALLAS A2 WACO A3 HOUSTON ; RUN; DATA EMPLOYEE; input EID $ LOC $ ROOMNUMBER; datalines; 001 A2 103 003 A1 100 005 A1 1001 006 A3 12 002 A1 101.1 ; run;

More Related