R: Packages & Data

theresa

Presentation Transcript


  1. R: Packages & Data

  2. Presented here are a number of ways to accomplish a task; some are redundant or may not represent the best approach. However, some “quick & dirty” commands are useful to know for when the “better” options aren’t working.

  3. R Packages • What is an R package? • A series of programs bundled together • Once installed, a copy of the package lives on the computer and doesn’t need to be reinstalled • Updating R • Must reinstall packages • May lose packages that aren’t kept updated

  4. Packages-> Install Package

  5. Choose a Mirror Site

  6. Choose Package

  7. Loading Package/Contents • To load a package • library(package name) • Contents of package • library(help = package name) • For additional documentation • http://cran.r-project.org/ • Packages -> Package Name -> Downloads: Reference Manual • Note: Some packages may mask (overwrite) functions or objects of the same name in another package; when this happens, a message appears in the log

  8. Advanced: Loading Packages • To find out what packages are already installed on a computer • installed.packages() • To check if a given package is installed • is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1]) • To install a package without clicking through windows • install.packages("Package Name") • These last two commands are particularly helpful when writing functions for other users
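The last two commands on this slide can be combined into one helper. The sketch below is only an illustration: the name `ensure_package` is hypothetical (it is not from the slides), and the install step assumes network access and a reachable CRAN mirror.

```r
# Check whether a package is installed (the is.installed idea from the slide)
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!is.installed(pkg)) {
    install.packages(pkg)  # assumes network access and a CRAN mirror
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
}

is.installed("stats")  # "stats" ships with every R installation
```

Calling `ensure_package("foreign")` at the top of a script shared with other users would avoid asking them to click through the install menus.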

  9. Functions within a Package • To get help • ?FunctionName • ??Topic of Interest • To see the source code • FunctionName (typed without parentheses) • To see an example • example(FunctionName)

  10. Getting Started: Loading Files • help(topic) • ?topic • help.search("topic") • ??topic • str() • ls() • dir() • history() • library() • library(help=) • rm() • rm(list=ls()) • example() • setwd() • source() • function

  11. Data Manipulation: Data Entry • Types of Data • Numerical, categorical, logical, factors • mode(variable) • Formats of Data • Scalar, vector/array, matrix, data frame, list • Ways to enter data • Manually • read.csv,read.table,scan • library(foreign) • library(Hmisc)
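As a rough illustration of the data types and formats listed on this slide, the sketch below enters a few values manually and inspects them with mode(). All object names are made up for the example.

```r
# Types of data, entered manually
x_num <- c(1.5, 2, 3)           # numerical
x_chr <- c("a", "b", "a")       # character (categorical as raw text)
x_lgl <- c(TRUE, FALSE, TRUE)   # logical
x_fac <- factor(x_chr)          # factor

mode(x_num)    # "numeric"
mode(x_chr)    # "character"
mode(x_lgl)    # "logical"
mode(x_fac)    # "numeric": factors are stored as integer codes
class(x_fac)   # "factor"

# Formats of data
s   <- 42                                    # scalar (a vector of length 1)
m   <- matrix(1:6, nrow = 2)                 # matrix with 2 rows, 3 columns
df  <- data.frame(id = 1:3, grp = x_fac)     # data frame
lst <- list(numbers = x_num, flags = x_lgl)  # list
```

Note that mode() reports how a factor is stored, while class() reports what it is, which is why the two differ for `x_fac`.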

  12. Importing from SAS • Option One: • In SAS • proc export DATA=file DBMS=CSV OUTFILE="destination\name.csv"; run; • In R • read.csv()

  13. read.csv() • Syntax • read.csv(file, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...) • file: the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). file can also be a complete URL. • header: a logical value indicating whether the file contains the names of the variables as its first line. read.csv defaults to TRUE; in read.table, if missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns. • sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns. • dec: the character used in the file for decimal points. • fill: logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’. • Additional options are available; see the documentation • Note: If you’re desperate to read in an unusual data type, see scan()
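A small, self-contained sketch of the read.csv() call described above. The file contents and column names are invented for the example; a temporary file stands in for a real data file.

```r
# Write a small CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,4.0,b",
             "3,,a"),          # the empty score field is read as NA
           tmp)

# Read it back with the arguments spelled out as on the slide
dat <- read.csv(tmp, header = TRUE, sep = ",", dec = ".", fill = TRUE)

str(dat)           # one row per line of the file, one column per field
is.na(dat$score)   # shows which scores are missing
```

For a file exported from SAS as in Option One, the temporary path would simply be replaced by the path of the exported CSV.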

  14. .RData • Files with the extension .RData store objects created in R. • Store using the command save(object1, object2, file = "Storage.RData") • Access later using load("Storage.RData")
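A minimal round trip with save() and load(), using made-up objects and the temporary directory so nothing is written to the working directory.

```r
# Two example objects to store
object1 <- 1:5
object2 <- data.frame(x = c("a", "b"))

# Objects are passed to save() by name, followed by the file argument
path <- file.path(tempdir(), "Storage.RData")
save(object1, object2, file = path)

# Remove them, then restore them under their original names
rm(object1, object2)
load(path)
exists("object1")   # the object is back in the workspace
```

load() restores objects under the names they were saved with, so it can silently overwrite objects of the same name in the current workspace.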

  15. Advanced: Reading Data directly from SAS or STATA • SAS Option Two: • In SAS • libname library xport "destination\name.xpt"; • data library.data; • set data; • run; • In R (use forward slashes or doubled backslashes in paths, since "\n" inside an R string is a newline) • library(Hmisc) • data <- sasxport.get("destination/name.xpt") • STATA • library(foreign) • Note: the foreign package can handle multiple file types, including SAS • data.stata <- read.dta("file.dta")

  16. Data Entry • c(…) • seq(from,to) • rep(x,times) • data.frame() • list() • matrix() • read.dta() • sasxport.get() • read.csv() • data() • data(R DataSet) • help(R DataSet) • load()

  17. Data Information • mode() • is.character() • is.numeric() • is.logical() • is.factor() • class() • is.matrix() • is.data.frame() • names() • head() • tail() • length() • dim() • nrow() • ncol() • is.na() • dimnames() • rownames() • colnames() • unique() • describe() • levels()
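A short sketch showing several of these inspection functions on a made-up data frame. describe() is left out because it comes from an add-on package rather than base R.

```r
# A small data frame with a factor column and some missing values
df <- data.frame(id  = 1:4,
                 grp = factor(c("a", "b", "a", NA)),
                 val = c(2.5, NA, 1, 4))

class(df)             # "data.frame"
is.data.frame(df)     # TRUE
dim(df)               # rows, then columns
nrow(df); ncol(df)
length(df)            # for a data frame, the number of columns
names(df)             # column names
head(df, 2)           # first two rows
tail(df, 1)           # last row
is.numeric(df$val)    # TRUE
is.factor(df$grp)     # TRUE
levels(df$grp)        # "a" "b" (NA is not a level)
unique(df$id)
is.na(df$val)         # marks the missing value
```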

  18. Data Manipulation • It is possible to access subsets of a data item using bracketed commands (e.g. x[n]) • Options include the “everything but” command (x[-n]) and multiple selections (x[1:n] or x[c(1,2,3)]) • Logical arguments can also be used (x[x > 3 & x < 5]) • Lists use a double bracketing structure (x[[n]]) • Data frame items can be called using two formats • x[["name"]] • x$name • Anything with row and column data uses a double structure to index (x[i, j])
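The indexing forms on this slide, tried on small made-up objects:

```r
x <- c(10, 2, 4, 7, 3.5)
x[2]               # the second element
x[-2]              # everything but the second element
x[1:3]             # the first three elements
x[c(1, 2, 3)]      # the same selection, written as a vector of positions
x[x > 3 & x < 5]   # logical selection: values strictly between 3 and 5

lst <- list(a = 1:3, b = "text")
lst[[1]]           # first list item, using double brackets
lst[["b"]]         # list item selected by name

df <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))
df[["value"]]      # a column, format one
df$value           # the same column, format two
df[2, "value"]     # row 2 of the value column
df[, 1]            # all rows of the first column
```

A single bracket on a list (lst[1]) returns a shorter list, while the double bracket returns the item itself.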

  19. Data Manipulation • as.numeric() • as.logical() • as.character() • as.array() • as.data.frame() • as.matrix() • factor() • ordered() • t() • reshape() • cat() • rbind() • cbind() • merge() • sort() • order() • library(reshape) • rownames()<-c() • colnames()<-c() • na.omit() • cut()
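A brief sketch of a few of these functions working together. The data frames, column names, and age breakpoints are invented for the illustration.

```r
a <- data.frame(id = c(3, 1, 2), age = c(41, 25, 67))
b <- data.frame(id = c(1, 2, 4), sex = c("F", "M", "F"))

# merge() joins on a shared column; by default only ids found in both are kept
both <- merge(a, b, by = "id")

# order() returns the row positions that sort a column
a_sorted <- a[order(a$age), ]

# cut() turns a numeric variable into a factor of intervals
a$age_group <- cut(a$age, breaks = c(0, 40, 65, Inf),
                   labels = c("young", "middle", "older"))

# rbind() stacks rows, cbind() adds columns
more <- rbind(a[, c("id", "age")], data.frame(id = 5, age = NA))
wide <- cbind(more, flag = is.na(more$age))

# na.omit() drops rows containing missing values
complete <- na.omit(more)

# Type conversion
as.numeric("3.14")
```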

  20. Character & Time Based Data • nchar() • substr() • tolower() • toupper() • chartr() • grep() • match() • %in% • pmatch() • charmatch() • sub() • strsplit() • paste() • Sys.time() • Sys.Date() • date() • as.Date • as.POSIXct()
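A few of the character and date functions in use; the strings and dates are arbitrary examples.

```r
s <- "R: Packages & Data"
nchar(s)                          # number of characters: 18
substr(s, 4, 11)                  # "Packages"
toupper(s)
sub("Data", "Dates", s)           # replaces the first match only
strsplit("a,b,c", ",")[[1]]       # "a" "b" "c"
paste("x", 1:3, sep = "_")        # "x_1" "x_2" "x_3"
grep("ack", c("Packages", "Data"))  # position of the matching element: 1
"b" %in% c("a", "b")              # TRUE
match("b", c("a", "b"))           # 2

d1 <- as.Date("2024-03-01")
d2 <- as.Date("2024-03-10")
format(d1, "%d/%m/%Y")            # "01/03/2024"
as.numeric(d2 - d1)               # difference in days: 9
Sys.Date()                        # today's date
```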

  21. Data Export • ftable() • format() • paste() • xtable() • write.table(data,"clipboard",sep="\t",col.names=NA) • write.csv() • write.foreign() • write.dta • sink() • save() • print() • save.image()
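A minimal export sketch using write.csv() and write.table() on a made-up data frame. It writes to temporary files; the "clipboard" target shown on the slide is platform-dependent and is not used here.

```r
dat <- data.frame(id = 1:3, score = c(3.5, 4, NA))

# Comma-separated export; row.names = FALSE leaves out the row-number column
csv_path <- tempfile(fileext = ".csv")
write.csv(dat, csv_path, row.names = FALSE)

# Tab-separated export, same idea as the clipboard command on the slide
tab_path <- tempfile(fileext = ".txt")
write.table(dat, tab_path, sep = "\t", row.names = FALSE)

# Read the CSV back to confirm the round trip
back <- read.csv(csv_path)
back
```

Missing values are written as the text NA and are read back as missing.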

  22. format() • Syntax • format(x, trim = FALSE, digits = NULL, nsmall = 0L, justify = c("left", "right", "centre", "none"), width = NULL, na.encode = TRUE, scientific = NA, big.mark = "", big.interval = 3L, small.mark = "", small.interval = 5L, decimal.mark = ".", zero.print = NULL, drop0trailing = FALSE, ...) • x: any R object • trim: logical; if FALSE, numbers are right-justified to a common width; if TRUE, the leading blanks for justification are suppressed. • digits: how many significant digits should be used. • justify: whether a character vector should be left-justified, right-justified, or centred. • See also • format.Date (methods for dates) • format.POSIXct (date-times)
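Some of the format() arguments described above, applied to arbitrary values:

```r
format(3.14159, digits = 3)        # "3.14"
format(2, nsmall = 2)              # "2.00": at least two decimal places
format(1234567, big.mark = ",")    # "1,234,567"

# trim controls the padding that aligns numbers to a common width
format(c(1, 10, 100))              # "  1" " 10" "100"
format(c(1, 10, 100), trim = TRUE) # "1" "10" "100"

# justify and width apply to character vectors
format("a", width = 5)                     # "a    "
format("a", width = 5, justify = "right")  # "    a"

# Dates use the format.Date method
format(as.Date("2024-03-01"), "%Y/%m/%d")  # "2024/03/01"
```

format() always returns character strings, so its output is meant for display or export rather than further arithmetic.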

  23. Extra Resources

  24. Advanced Packages to try • gtools • reshape • Journal of Statistical Computing • http://stat-computing.org/ • Journal of Statistical Software • http://www.jstatsoft.org/

  25. http://journal.r-project.org/

  26. www.rseek.org

  27. http://r-forge.r-project.org/

  28. http://www.statmethods.net/index.html
