Shunting data

Shunting data: Copying structures

[Table: conversion methods indexed by source format (From) and destination format (To)]

METHODS OF CONVERSION • C: copying • F: function call • Q: querying • P: printing • V: views • S: subelements

C – copying • The GSL memcpy functions are used • int gsl_vector_memcpy(gsl_vector * dest, const gsl_vector * src) • This function copies the elements of the vector src into the vector dest. The two vectors must have the same length. • This function assumes that the destination to which data is copied has already been allocated.

apop_data* apop_data_copy(const apop_data * in) • Copies one apop_data structure to another; that is, all data is duplicated. • memmove(&second, &first, sizeof(datatype)) • This goes to the location of first and blindly copies what it finds to the location of second, up to the size of one datatype.

apop_system(): int apop_system(const char * fmt, ... ) • Calls system(), but with printf-style arguments. • E.g.: char filenames[] = "apop_asst.c apop_asst.o"; apop_system("ls -l %s", filenames); • Returns: the return value of the system() call. • int gsl_matrix_memcpy(gsl_matrix * dest, const gsl_matrix * src) • This function copies the elements of the matrix src into the matrix dest. The two matrices must have the same size.

F – Function call • These functions are designed to convert one format to another • There are two ways to declare the input array: • Using pointers to declare a list of pointers to pointers • Using an automatically allocated array with double subscripts • The second method is more convenient, but it allows the matrix to be declared only once.


P – printing • The output can be directed to the screen, a file, a database, or a pipe • The apop_opts.output_type setting is used; it has the following choices: • 's': print to screen (default) • 'f': print to file • 'd': store the result in a database table • 'p': write to the pipe in apop_opts.output_pipe

Q – querying • Queries can be used to get data from the database. • Four ways: • apop_query_to_float • apop_query_to_vector • apop_query_to_data • apop_query_to_matrix

int apop_query(const char * fmt, ... ) • Sends a query to the database that returns no data. • As with the apop_query_to_... functions, the query can include printf-style format specifiers, such as: apop_query("create table %s(id, name, age);", tablename);

apop_data* apop_query_to_data(const char * fmt, ... ) • Queries the database, and dumps the result into an apop_data set. • Most data will be in the matrix element of the output. Column names are appropriately placed. • double apop_query_to_float(const char * fmt, ... ) • Queries the database, and dumps the result into a single double-precision floating point number.

This calls apop_query_to_data and returns the (0,0)th element of the returned matrix. Thus, if your query returns multiple lines, you will get no warning, and the function will return the first in the list. • gsl_matrix* apop_query_to_matrix(const char * fmt, ... ) • Queries the database, and dumps the result into a matrix. • Uses apop_query_to_data and returns just the matrix part. • Returns: a gsl_matrix.

gsl_vector* apop_query_to_vector(const char * fmt, ... ) • Queries the database, and dumps the first column of the result into a gsl_vector. • Uses apop_query_to_data internally, then throws away all but the first column of the matrix. • Returns: a gsl_vector holding the first column of the returned matrix. Thus, if your query returns multiple columns, you will get no warning, and the function will return only the first.

S – subelements • Individual data items can be pulled out of the entire set. • For this, the copying methods and the functions from F above can be used.

V – views • Similar to database views. • A view can hold a subset of the original matrix. • Changes made to the original data are reflected in the view, and vice versa • The following Apophenia macros are used on a gsl_matrix: • Apop_matrix_row(m, row, v) • Apop_matrix_col(m, col, v) • Apop_submatrix(m, srow, scol, nrows, ncols, o)

Apop_matrix_col(m, col, v) • After this call, v will hold a vector view of the col-th column of m. • E.g.: Apop_matrix_col(m, 5, col_v) • This produces a gsl_vector named col_v that views column 5 of m (indices count from zero, as in GSL) • Apop_matrix_row(m, row, v) • After this call, v will hold a vector view of the row-th row of m. • E.g.: Apop_matrix_row(m, 3, row_v) • This produces a gsl_vector named row_v that views row 3 of m

Apop_submatrix(m, srow, scol, nrows, ncols, o) • Pulls a pointer to a submatrix into a gsl_matrix • Parameters: • m: the root matrix • srow: the first row (in the root matrix) of the top of the submatrix • scol: the first column (in the root matrix) of the left edge of the submatrix • nrows: number of rows in the submatrix • ncols: number of columns in the submatrix • o: the name of the gsl_matrix * that will hold the view

Example • Apop_submatrix(m, 2, 4, 6, 8, submat) • This produces a gsl_matrix * named submat whose (0,0)th element is at (2,4) in the original matrix • For data sets, the corresponding macros take row/column names: • Apop_row_t(m, "fourth_row", row_v) • Apop_col_t(m, "fifth column", col_v)

LINEAR ALGEBRA

apop_data* apop_dot(const apop_data * d1, const apop_data * d2, char form1, char form2) • A convenience function for dot products. • d1 may be a vector or a matrix, and the same for d2, so this function can do vector dot matrix, matrix dot matrix, and so on. • If d1 includes both a vector and a matrix, then the later parameters indicate which to use. • form1 and form2 are flags, one for each input, indicating what to do with it: • 't': transpose the matrix • 'v': use the vector • 0: use the matrix as it is.

E.g.: apop_dot(X, X, 't', 0) • This computes X'X, i.e. the dot product of X with itself, where the first copy of X is transposed and the second is not. • If the first element is a vector, it is always taken to be a row; if the second element is a vector, it is always taken to be a column.

int gsl_blas_ddot(const gsl_vector * x, const gsl_vector * y, double * result) • Computes the dot product of the vectors x and y and stores it in result • E.g.: double dotprod; gsl_blas_ddot(x, y, &dotprod); • The Basic Linear Algebra Subprograms (BLAS) define a set of fundamental operations on vectors and matrices which can be used to create optimized higher-level linear algebra functionality. The functions are declared in the file gsl_blas.h.

MATRIX INVERSION AND EQUATION SOLVING

gsl_matrix* apop_matrix_inverse(const gsl_matrix * in) • Inverts a matrix. The in matrix is not destroyed in the process. You may need to call apop_matrix_determinant first to check that your input is invertible, or use apop_det_and_inv to do both at once. • Parameters: in: the matrix to be inverted. • Returns: its inverse.

double apop_matrix_determinant(const gsl_matrix * in) • Finds the determinant of a matrix. The in matrix is not destroyed in the process. • See also apop_matrix_inverse, or apop_det_and_inv to do both at once. • Parameters: in: the matrix whose determinant is to be found. • Returns: the determinant.

double apop_det_and_inv(const gsl_matrix * in, gsl_matrix ** out, int calc_det, int calc_inv) • Calculates the determinant of a matrix, its inverse, or both, via LU decomposition. The in matrix is not destroyed in the process.

Parameters: • in: the matrix to be inverted or whose determinant is to be found. • out: if you want an inverse, this is where to place the matrix to be filled with the inverse. It will be allocated by the function. • calc_det: 0: do not calculate the determinant; 1: do. • calc_inv: 0: do not calculate the inverse; 1: do. • Returns: if calc_det == 1, the determinant; otherwise, zero. If calc_inv != 0, then *out is pointed to the matrix inverse.

Numbers • Besides ordinary values, floating-point numbers can take these special values: • INFINITY • -INFINITY • NAN (not a number; mainly used for missing data)

MODELS

apop_model • Similar to apop_data, it encapsulates model information in a uniform manner • It allows functions to be written that can take any model as input. • A model is an intermediary between data and parameters. From there, a model can go in three directions:

X ⇒ β: given data, estimate parameters (OLS parameters or covariance) • β ⇒ X: given parameters, generate artificial data (Monte Carlo) • (X, β) ⇒ p: given both parameters and data, estimate their likelihood or probability (Bayesian estimation)

apop_model* apop_estimate(apop_data * d, apop_model m) • Estimates the parameters of a model given data. • This function copies the input model, preps it, and calls m.estimate(d, &m). • If your model has no estimate method, then it assumes apop_maximum_likelihood(d, m), with the default MLE parameters.

Parameters: • d: the data • m: the model • Returns: a pointer to an output model, which typically matches the input model but has its parameters element filled in. • E.g.: apop_model *est = apop_estimate(data, apop_normal);

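#include <apop.h>

int main(void) {
    /* Read the text file "data" into the database table d. */
    apop_text_to_db(.text_file="data", .tabname="d");
    /* Pull the whole table into an apop_data set. */
    apop_data *data = apop_query_to_data("select * from d");
    /* Estimate an OLS model from the data and print the result. */
    apop_model *est = apop_estimate(data, apop_ols);
    apop_model_show(est);
}

OLS: Ordinary Least Squares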

Examples • Cook’s distance • Network data • MLE models • Utility maximization

Cook’s Distance • It is an estimate of how much each data point affects a regression. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: 1) to indicate data points that are particularly worth checking for validity; 2) to indicate regions of the design space where it would be good to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977.

Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance are considered to merit closer examination in the analysis.

MLE models • Maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. • When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters.

Example • One may be interested in the heights of adult female penguins, but be unable to measure the height of every single penguin in a population due to cost or time constraints. • Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. • MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model).

GRAPHICS

Graphics • Gnuplot is a free, command-driven, interactive, function and data plotting program. • Any mathematical expression accepted by C, FORTRAN, Pascal, or BASIC may be plotted. The precedence of operators is determined by the specifications of the C programming language.

plot and splot are the primary commands in Gnuplot. They plot functions and data in many ways. plot is used to plot 2-d functions and data, while splot plots 3-d surfaces and data.

Syntax • plot {[ranges]} {[function] | {"[datafile]" {datafile-modifiers}}} {axes [axes] } { [title-spec] } {with [style] } {, {definitions,} [function] ...} • where either a [function] or the name of a data file enclosed in quotes is supplied.

To plot functions simply type: • plot [function] at the gnuplot> prompt. • For example: • gnuplot> plot sin(x)/x • gnuplot> splot sin(x*y/20) • gnuplot> plot sin(x) title 'Sine Function', tan(x) title 'Tangent'

Discrete data contained in a file can be displayed by specifying the name of the data file (enclosed in quotes) on the plot or splot command line. • Data files should have the data arranged in columns of numbers. • Columns should be separated by white space only (tabs or spaces, no commas). • Lines beginning with a # character are treated as comments and are ignored by Gnuplot. • A blank line in the data file results in a break in the line connecting data points.

Customization of the axis ranges, axis labels, and plot title, as well as many other features, is specified using the set command. Specific examples of the set command follow.

Create a title: > set title "Force-Deflection Data" • Put a label on the x-axis: > set xlabel "Deflection (meters)" • Put a label on the y-axis: > set ylabel "Force (kN)" • Change the x-axis range: > set xrange [0.001:0.005] • Change the y-axis range: > set yrange [20:500] • Have Gnuplot determine ranges: > set autoscale • Move the key: > set key 0.01,100

Delete the key: > unset key • Put a label on the plot: > set label "yield point" at 0.003, 260 • Remove all labels: > unset label • Plot using log-axes: > set logscale • Plot using log-axes on y-axis: > unset logscale; set logscale y • Change the tic-marks: > set xtics (0.002,0.004,0.006,0.008) • Return to the default tics: > unset xtics; set xtics auto
