acc522sept282006

Introduction to Exploratory Descriptive Data Analysis in S-Plus II: 

Jagdish S. Gangolly
School of Business
State University of New York at Albany

Data Manipulation: Accessing elements: 

country.data[1:2, 2:3] : Population and inflation in Austria and France
country.data[3, 1:2] : GDP and population of Germany
dimnames(country.data)[1] : Row names of country.data (the first component of its dimnames, returned as a list)
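
The indexing forms above can be tried on a small matrix. This is a minimal sketch written for R, whose syntax for these calls should carry over to S-Plus. It assumes country.data holds the same values as the country.frame listing shown on the Lists slide; the names sub1, sub2, rows and by.name are illustrative only.

```r
# Assumed values: copied from the country.frame listing later in the deck.
country.data <- matrix(
  c(227, 1534, 2365,    # gdp
    8, 58, 82,          # pop
    1.3, 1.2, 1.8),     # inflation
  nrow = 3,
  dimnames = list(c("austria", "france", "germany"),
                  c("gdp", "pop", "inflation"))
)

sub1 <- country.data[1:2, 2:3]             # pop and inflation, austria and france
sub2 <- country.data[3, 1:2]               # gdp and pop of germany (a named vector)
rows <- dimnames(country.data)[[1]]        # [[1]] gives the row names as a vector
by.name <- country.data["germany", "gdp"]  # rows and columns can also be picked by name
```

Note the difference between dimnames(x)[1], which returns a one-element list, and dimnames(x)[[1]], which returns the character vector itself.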

Data Manipulation: Matrix arithmetic I: 

Addition & Subtraction: The dimensions of the matrices must be the same, e.g., A + B or A - B
A scalar can be added to, subtracted from, multiplied by, or divided into a matrix.
Matrix multiplication: The dimensions must be compatible (the number of columns in the first matrix must equal the number of rows in the second), e.g., A %*% B
Element-wise multiplication: A * B (matrix dimensions must be the same)
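
A short sketch of these rules, written for R (the operators are the same in S-Plus). The matrices A, B and C are made-up illustrations, not data from the course.

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(6:1, nrow = 2)   # 2 x 3
C <- matrix(1:6, nrow = 3)   # 3 x 2

S <- A + B       # element-wise sum: dimensions must match
E <- A * B       # element-wise product: dimensions must match
D <- A * 10      # a scalar is applied to every element
P <- A %*% C     # matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 result

# A %*% B is not defined: A has 3 columns but B has only 2 rows.
bad <- try(A %*% B, silent = TRUE)
```

Here the compatibility rule is visible directly: A %*% C works because A has three columns and C has three rows, while A %*% B fails.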

Data Manipulation: Merging Matrices: 

Binding vectors to matrices and merging matrices: bind rows (rbind), bind columns (cbind)
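
As an illustration, a matrix can be grown one row and one column at a time. This is a sketch in R, assuming the country values shown on the Lists slide; the object names with.germany and with.inflation are hypothetical.

```r
# Start with two countries and two variables (values assumed from the deck).
country.data <- matrix(
  c(227, 1534, 8, 58),
  nrow = 2,
  dimnames = list(c("austria", "france"), c("gdp", "pop"))
)

# rbind adds a row; the argument name becomes the row name.
with.germany <- rbind(country.data, germany = c(2365, 82))

# cbind adds a column; the vector must have one value per row.
with.inflation <- cbind(with.germany, inflation = c(1.3, 1.2, 1.8))
```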

Data Manipulation: Arrays I: 

Arrays can have up to eight dimensions.
> array(1:24, c(3,4,2))
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

Data Manipulation: Arrays II: 

Useful functions for matrices: rowMeans, colMeans, rowSums, colSums, rowVars, colVars, ...
apply(data, dim, function, ...)
Example:
> x <- array(1:24, c(3,4,2))
> apply(x, 1, max)
[1] 22 23 24
> apply(x, 2, max)
[1] 15 18 21 24
> apply(x, 3, max)
[1] 12 24
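
A further sketch in R of how the dim argument of apply selects what is kept. The variable names are illustrative, and rowVars/colVars are left out because they are not part of base R.

```r
x <- array(1:24, c(3, 4, 2))

slice.sums <- apply(x, 3, sum)         # one sum per 3 x 4 slice
cell.max   <- apply(x, c(1, 2), max)   # keep rows and columns, collapse the slices

m <- x[, , 1]                          # first slice as an ordinary 3 x 4 matrix
row.means  <- rowMeans(m)              # same values as apply(m, 1, mean)
col.sums   <- colSums(m)               # same values as apply(m, 2, sum)
```

The dimensions named in the second argument are the ones that survive; everything else is handed to the function.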

Data Manipulation: Data Frames I: 

Data frames provide flexibility by allowing vectors of different types to be bound together.
Data types are preserved in data frames, so functions such as max, mean, etc. can be computed on the columns.
You can use sapply to find out the data types, e.g.,
> sapply(barley, class)
    yield   variety      year      site
"numeric" "ordered" "ordered" "ordered"

Data Manipulation: Data Frames II: 

You can check whether an object is a data frame, e.g.,
> is.data.frame(country.data)
[1] F
You can refer to individual variables in a data frame, e.g.,
> country.frame$gdp
[1]  227 1534 2365
> country.frame$pop
[1]  8 58 82
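
The contrast between the matrix and the data frame can be reproduced as follows. This is a sketch in R, assuming country.frame is built from the values shown on the Lists slide; the deck does not show how the original was constructed.

```r
# Assumed construction of country.frame from the values listed in the deck.
country.frame <- data.frame(
  gdp       = c(227, 1534, 2365),
  pop       = c(8, 58, 82),
  inflation = c(1.3, 1.2, 1.8),
  row.names = c("austria", "france", "germany")
)

country.data <- as.matrix(country.frame)   # same numbers, stored as a matrix

frame.check  <- is.data.frame(country.frame)   # TRUE
matrix.check <- is.data.frame(country.data)    # FALSE, as on the slide
types        <- sapply(country.frame, class)   # class of each column
largest.gdp  <- max(country.frame$gdp)         # column picked with $
```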

Data Manipulation: Lists: 

Lists are data structures, possibly of different kinds, pasted together, e.g.,
> Ger.lang <- c("austria", "germany", "liechtenstein", "switzerland")
> country.list <- list(country.frame, Ger.lang)
> country.list
[[1]]:
        gdp pop inflation
austria  227   8       1.3
france  1534  58       1.2
germany 2365  82       1.8

[[2]]:
[1] "austria"       "germany"       "liechtenstein" "switzerland"
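
Components of a list are pulled out with double brackets. A minimal sketch in R follows; the component names frame and Ger.lang are an addition for illustration, since the list on the slide is unnamed.

```r
country.frame <- data.frame(
  gdp       = c(227, 1534, 2365),
  pop       = c(8, 58, 82),
  inflation = c(1.3, 1.2, 1.8),
  row.names = c("austria", "france", "germany")
)
Ger.lang <- c("austria", "germany", "liechtenstein", "switzerland")

# Naming the components (hypothetical names) allows access with $ as well.
country.list <- list(frame = country.frame, Ger.lang = Ger.lang)

second    <- country.list[[2]]                        # the character vector itself
same      <- country.list$Ger.lang                    # same component, by name
still.lst <- country.list[2]                          # single bracket: a list of length 1
ger.gdp   <- country.list[[1]]["germany", "gdp"]      # index into the data frame
```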

S-Plus Graphics: 

graphsheet() : Open a graphics window. Each time you invoke this, a new graphics window is opened.
dev.off() : Close the most recently opened graphics device.
graphics.off() : Close all graphics devices.
plot(comma-separated variables, plot character)

Graphing Data: 

plot command examples:
a. plot(geyser$waiting, geyser$duration)
b. attach(geyser)
   plot(waiting, duration)
Syntax: plot(x, y, main, sub, xlab, ylab, type)

Figure Layouts: 

par() command
Example:
par(mar=c(1,1,1,1)) : Margins of 1 line of text on all four sides (use mai to give margins in inches)
par(mfrow=c(2,2)) : 4 (2 x 2) figures on a graph sheet, filled by row (use mfcol to fill by column)
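
The layout settings can be queried back after they are set. This is a sketch for R only: pdf(NULL) is an R null device used here so that nothing is drawn on screen, whereas in S-Plus one would open a window with graphsheet(). The plotted values are placeholders.

```r
pdf(NULL)                                   # R-only: null device, no file written

# Fill a 2 x 2 sheet by column, with 1-inch margins on all four sides.
old <- par(mfcol = c(2, 2), mai = c(1, 1, 1, 1))

layout.used <- par("mfcol")                 # rows and columns of the layout
margins.in  <- par("mai")                   # margins in inches

for (i in 1:4) {
  plot(1:10, main = paste("figure", i))     # placeholder data, cells filled in order
}

par(old)                                    # restore the previous settings
invisible(dev.off())                        # close the device
```

Saving the value returned by par() and passing it back afterwards is a common way to avoid leaving a changed layout behind for later plots.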

Trellis Graphics I: 

A matrix of graphs
Example:
> par(mfrow=c(2,2))    # 2 x 2 matrix of figures
> x <- 1:100/100:1
> plot(x)              # plot cell (1,1)
> plot(x, type="l")    # plot cell (1,2) line
> hist(x)              # plot cell (2,1) histogram
> boxplot(x)           # plot cell (2,2) boxplot

Trellis Graphics: Singer Data: 

(Figure: Trellis display of the singer data.)

Trellis Graphics I: 

Syntax: dependent variable ~ explanatory variable | conditioning variable, data set

Trellis Graphics II: 

Example: histogram(~height | voice.part, data=singer)
No dependent variable for a histogram
height is the explanatory variable
voice.part is the conditioning variable
The data set is singer
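
The same formula can be tried in R through the lattice package, which implements Trellis graphics and ships a copy of the singer data. This is a sketch under that assumption; in S-Plus the Trellis functions are available without loading a package. The bwplot call is an added illustration of a formula that does have a dependent variable.

```r
library(lattice)   # R package providing Trellis-style functions and the singer data

# One histogram panel of height per voice part (conditioning variable after |).
p <- histogram(~ height | voice.part, data = singer)

# Number of panels equals the number of levels of the conditioning variable.
panels <- nlevels(singer$voice.part)

# With a left-hand side: one box of height for each voice part, in a single panel.
b <- bwplot(voice.part ~ height, data = singer)

# print(p) or print(b) draws the display on the current graphics device.
```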