R A Personalized Introduction

Debapriyo Majumdar. Data Mining – Fall 2014, Indian Statistical Institute Kolkata. August 18, 2014.



  1. R A Personalized Introduction
     • Debapriyo Majumdar
     • Data Mining – Fall 2014
     • Indian Statistical Institute Kolkata
     • August 18, 2014

  2. About “R”
     • A suite of software tools for
         • Data manipulation
         • Calculations
         • Graphical display
     • Largely based on the programming language S
     • Packages
         • About 25 standard and recommended packages are supplied
         • Many more are available for download at: http://CRAN.R-project.org
     • Free (GPL). Also BSD, MIT

  3. Basic
     • Arithmetic
         > 2+2
         [1] 4
     • Assign variables
         > x <- 2
         > y <- 5
         > z <- 2 * x + 3 * y
         > z
         [1] 19
     • The created objects are now stored in the workspace. List them
         > ls()
         [1] "x" "y" "z"
     • Also, we can remove them
         > rm(x)
         > ls()
         [1] "y" "z"

  4. Vectors
     • Creating a vector
         > x <- c(2,5,9)
         > y <- c(3,1,-1)
         > x + y
         [1] 5 6 8
     • But x * y would do an element-wise multiplication
         > x * y
         [1]  6  5 -9
     • But x + 2 would add 2 to all elements of x
         > x + 2
         [1]  4  7 11
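The `x + 2` case is an instance of R's recycling rule: the shorter operand is repeated to match the longer one. A minimal sketch reusing the vectors from this slide (the name `w` is introduced here only for illustration):

```r
x <- c(2, 5, 9)
y <- c(3, 1, -1)

# Element-wise product: 2*3, 5*1, 9*(-1)
prod_xy <- x * y
prod_xy            # 6 5 -9

# A length-1 operand is recycled across all elements of x
x + 2              # 4 7 11

# Recycling also applies to longer vectors: c(10, 20) is reused as 10, 20, 10, 20
w <- 1:4 + c(10, 20)
w                  # 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which usually signals a mistake.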

  5. Useful functions related to vectors
     • Sequence of integers from a to b
         > seq(2,9)
         [1] 2 3 4 5 6 7 8 9
     • The repeat function
         > rep(1,3)
         [1] 1 1 1
         > rep(1:3,3)
         [1] 1 2 3 1 2 3 1 2 3
     • Try the help or ? command
         > help(rep)
         > ?rep
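Both `seq()` and `rep()` take further arguments that are worth knowing early. The sketch below shows a few of them; the variable names are chosen here for illustration only.

```r
evens   <- seq(2, 9, by = 2)          # step of 2: 2 4 6 8
grid    <- seq(0, 1, length.out = 5)  # 5 equally spaced values from 0 to 1
colon   <- 2:9                        # shorthand for seq(2, 9)
whole   <- rep(1:3, times = 3)        # repeat the whole vector: 1 2 3 1 2 3 1 2 3
by_each <- rep(1:3, each = 2)         # repeat each element: 1 1 2 2 3 3

evens
grid
by_each
```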

  6. Data and Statistics – Basics
     • A lot of things out of the box
         > x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)
         > summary(x)
            Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
            0.00    2.00    3.00    3.81    5.00    8.00
     • Specifying elements or subsets (index starts at 1, not 0)
         > x[1]
         [1] 2
         > x[3:6]
         [1] 1 5 7 2
     • Excluding elements by the minus sign
         > x[-(2:4)]
         [1] 2 7 2 5 8 3 2 0 3 2 6 7 3 1 3 5 8 4
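The numbers reported by `summary()` can also be obtained one at a time, and indexing accepts logical conditions as well as positions. A short sketch on the same vector (the names `big` and `rest` are introduced here for illustration):

```r
x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)

length(x)        # 21 values
mean(x)          # 80/21, about 3.81
median(x)        # 3
range(x)         # 0 8

# Logical indexing: keep only the elements greater than 5
big <- x[x > 5]
big              # 7 8 6 7 8

# A single negative index drops one position, here the first element
rest <- x[-1]
length(rest)     # 20
```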

  7. Matrices
     • Bind columns (cbind) or rows (rbind)
         > x <- c(3,5,2); y <- c(8,2,1)
         > z <- cbind(x,y)
         > z
              x y
         [1,] 3 8
         [2,] 5 2
         [3,] 2 1
     • Or specify the entries and number of rows
         > A <- matrix(c(3,5,2,8,2,1),nrow=3)
         > B <- matrix(c(3,5,2,8,2,1),nrow=2)
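`matrix()` fills its entries column by column, which is why the same six numbers give different layouts for `nrow=3` and `nrow=2`. The sketch below makes that visible; the names `v` and `C` are introduced here for illustration.

```r
v <- c(3, 5, 2, 8, 2, 1)

A <- matrix(v, nrow = 3)    # 3 x 2, filled column by column
B <- matrix(v, nrow = 2)    # 2 x 3, same values, different shape
dim(A)                      # 3 2
dim(B)                      # 2 3

# byrow = TRUE fills row by row instead
C <- matrix(v, nrow = 2, byrow = TRUE)
C[1, ]                      # first row: 3 5 2

# Element access uses [row, column]
A[2, 1]                     # 5
t(A)                        # transpose of A, a 2 x 3 matrix
```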

  8. Matrix operations
     • Addition is usual
         > A + 2 * A
              [,1] [,2]
         [1,]    9   24
         [2,]   15    6
         [3,]    6    3
     • Multiplication: x * y is element-wise, not matrix multiplication
     • Matrix multiplication: %*%
         > A %*% B
              [,1] [,2] [,3]
         [1,]   49   70   14
         [2,]   25   26   12
         [3,]   11   12    5
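The difference between `*` and `%*%` shows up in the shapes they accept and produce. A minimal sketch using the matrices `A` and `B` from the previous slide (the names `E`, `P` and `Q` are introduced here for illustration):

```r
A <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 3)   # 3 x 2
B <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 2)   # 2 x 3

# Element-wise product needs equal shapes, so pair A with itself
E <- A * A
E[1, ]           # 9 64

# Matrix product: (3 x 2) %*% (2 x 3) gives a 3 x 3 matrix
P <- A %*% B
dim(P)           # 3 3
P[1, ]           # 49 70 14

# Reversing the order gives a 2 x 2 result instead
Q <- B %*% A
dim(Q)           # 2 2
```

Matrix multiplication is not commutative in general, so `A %*% B` and `B %*% A` differ even when both are defined.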

  9. Inverse and Covariance of matrix
     • Computes the inverse of a (square) matrix if it exists:
         > solve(X)
     • Covariance matrix
         > var(X)
         > cov(X)
     • Covariance matrix (recall)
         • X1, …, Xn are random variables, each with finite variance
         • Σ is the covariance matrix, where $\Sigma_{ij} = \operatorname{Cov}(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])]$
     • Also called var(X) = Variance of the random vector X
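The slide leaves `X` undefined, so the sketch below uses small made-up matrices (`M` and `X` are illustrative values, not data from the course). Note that `cov()` computes the sample covariance with an n - 1 divisor, an estimate of the matrix Σ recalled above, and that it treats columns as variables and rows as observations.

```r
# A small invertible matrix (values chosen only for illustration)
M <- matrix(c(2, 1, 1, 3), nrow = 2)
Minv <- solve(M)
M %*% Minv                  # numerically the 2 x 2 identity

# Data matrix: rows are observations, columns are variables
X <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 9))
S <- cov(X)                 # 2 x 2 sample covariance matrix
S

# For a matrix argument, var() gives the same result as cov()
same <- isTRUE(all.equal(var(X), S))

# Entry ["a", "b"] recomputed by hand with the n - 1 divisor
n <- nrow(X)
manual <- sum((X[, "a"] - mean(X[, "a"])) * (X[, "b"] - mean(X[, "b"]))) / (n - 1)
manual
```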

  10. Writing a function
     • A new function can be defined
         > z <- function(x,y) 3*x + 4*y
         > z(2,3)
         [1] 18
     • A function with many lines
         > z <- function(x,y) {
         +   c <- 3*x + 4*y
         +   5 * c
         + }
     • The value of the last expression is the output
     • Can write the function in a text file (e.g., prog.R) and source it
         > source("/Users/deb/…/R/xTest.R")
     • Can also define a new binary operator
         > "%LL%" <- function(x,y) { 3*x + 4*y }
         > 5 %LL% 3
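The function forms on this slide can be checked by hand. In the sketch below the intermediate variable is named `s` rather than `c`, an editorial choice to avoid masking the built-in `c()`, and `z2` is a hypothetical variant showing an explicit `return()`.

```r
# Multi-line function: the value of the last expression is returned
z <- function(x, y) {
  s <- 3 * x + 4 * y    # intermediate result
  5 * s
}
z(2, 3)                 # 5 * 18 = 90

# An explicit return() gives the same result
z2 <- function(x, y) {
  return(5 * (3 * x + 4 * y))
}
z2(2, 3)                # 90

# Custom binary operator: the name must start and end with %
"%LL%" <- function(x, y) 3 * x + 4 * y
5 %LL% 3                # 15 + 12 = 27
```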

  11. Data
     • Read an entire data frame
     • The first line of the file should have a name for each variable in the data frame
     • Each additional line of the file has as its first item a row label and the values for each variable
            Age Income.K Owns.House
         01  25        8 No
         02  33        5 No
         03  30      130 Yes
         04  45       50 Yes
         05  65        5 No
         06  75        7 Yes
         > H <- read.table("filename")
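To try the file layout described on this slide without a data file at hand, the six sample rows can be written to a temporary file first. This is a sketch under that assumption: `path` is a hypothetical stand-in for "filename".

```r
# Write the six sample rows to a temporary file (stand-in for "filename")
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Age Income.K Owns.House",
  "01 25 8 No",
  "02 33 5 No",
  "03 30 130 Yes",
  "04 45 50 Yes",
  "05 65 5 No",
  "06 75 7 Yes"
), path)

# The header line has one field fewer than the data lines, so read.table()
# takes it as the column names and uses the first column as row labels
H <- read.table(path)

names(H)       # "Age" "Income.K" "Owns.House"
nrow(H)        # 6
str(H)         # column types as inferred by read.table()
```

For files whose header has the same number of fields as the data lines, `header = TRUE` has to be passed explicitly.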

  12. Using data
     • Plot tries to figure out what kind of plot will be suitable
         > plot(H[1:2])
     • We want to label points based on some attribute
     • Let us select a subset of the data
         > H[which(H$Owns.House=='Yes'),]
            Age Income.K Owns.House
         03  30      130 Yes
         04  45       50 Yes
         06  75        7 Yes
         07  28      200 Yes
         08  35       90 Yes
         10  55      102 Yes
         …    …        … …
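Row selection can be written in several equivalent ways. The sketch below rebuilds only the first six rows of `H` inline so that it is self-contained; the names `a`, `b`, `d` and `young` are introduced here for illustration.

```r
# Small stand-in for H, built from the first six rows shown earlier
H <- data.frame(
  Age = c(25, 33, 30, 45, 65, 75),
  Income.K = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes"),
  row.names = c("01", "02", "03", "04", "05", "06")
)

# Three ways to keep the owners' rows
a <- H[which(H$Owns.House == "Yes"), ]   # positions of TRUE values
b <- H[H$Owns.House == "Yes", ]          # logical vector directly
d <- subset(H, Owns.House == "Yes")      # subset() helper
rownames(a)                              # "03" "04" "06"

# Conditions combine with & (and) and | (or)
young <- H[H$Owns.House == "Yes" & H$Age < 50, ]
rownames(young)                          # "03" "04"
```

The forms differ when the column contains missing values: `which()` and `subset()` drop rows where the condition is `NA`, while plain logical indexing returns a row of `NA`s for each of them.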

  13. Using data
     • Plot one subset with blue, another with red
         > HYes <- H[which(H$Owns.House=='Yes'),]
         > HNo <- H[which(H$Owns.House=='No'),]
         > plot(HYes[1:2], col='blue')
         > points(HNo[1:2], col='red')
     • [Plot annotation: New observation (black)]
     • Hands on in class
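A complete version of the two-colour plot might look like the sketch below. It makes several assumptions not in the slides: only the first six rows of the data are used, the axis limits are fixed from the full data so that points added later are not clipped, a legend is added, and the plot is written to a temporary PDF (`out` is a hypothetical location) so the code also runs in a non-interactive session.

```r
H <- data.frame(
  Age = c(25, 33, 30, 45, 65, 75),
  Income.K = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes")
)
HYes <- H[H$Owns.House == "Yes", ]
HNo  <- H[H$Owns.House == "No", ]

out <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(out)

# plot() sets up the axes; points() adds to the existing plot
plot(HYes[1:2], col = "blue",
     xlim = range(H$Age), ylim = range(H$Income.K))
points(HNo[1:2], col = "red")
legend("topright", legend = c("Yes", "No"),
       col = c("blue", "red"), pch = 1, title = "Owns.House")

dev.off()                           # close the device and write the file
```

In an interactive session the `pdf()` and `dev.off()` calls can be dropped to draw on the screen instead.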

  14. References
     • The R manual: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
     • A self-learn tutorial: https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf
