
Describing and Exploring Data


lucita

Presentation Transcript


  1. Describing and Exploring Data Initial Data Analysis

  2. Overview • Describing and Exploring data • Initial Data Analysis • Characteristics • (Some) Steps involved • Methods • Statistics • Central Tendency • Variability • Relationships • Issues

  3. Describing and Exploring Data • Once data has been collected, the raw information must be manipulated in some fashion to make it more informative. • Several options are available including plotting the data or calculating descriptive statistics.

  4. Plotting Data • Often, one of the first things one does with a set of raw data is to plot the data in some manner. • One should start with visual display of data. • Examples • Frequency and density information • Histograms, Violin plots • Bar plots • Trends over time or across groups • Line graphs • Display of interval information (error bars) • Relationships • Scatterplots • Combinations • Visual display of data allows for more rapid comprehension of distributions and relationships • Use it whenever possible
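As a minimal sketch of the "frequency information" idea on this slide, the following draws a text histogram using only the Python standard library. The function name `text_histogram`, the bin width, and the exam scores are all hypothetical and made up for illustration; a plotting library would normally be used for real work.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text row per bin, with one '#' per observation in that bin."""
    # Map each integer value to the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    edge = min(counts)
    last = max(counts)
    while edge <= last:
        label = f"{edge:>4}-{edge + bin_width - 1:<4}"
        lines.append(f"{label} {'#' * counts.get(edge, 0)}")
        edge += bin_width
    return lines


# Hypothetical exam scores, invented for this example.
scores = [52, 55, 61, 63, 64, 67, 68, 70, 71, 71, 73, 75, 78, 82, 95]
for line in text_histogram(scores, 10):
    print(line)
```

Even this crude display shows where the scores pile up and that one score sits well above the rest, which is harder to see in the raw list.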

  5. Descriptives • The other main part of the initial examination of data is obtaining descriptive statistics • Measures of Central Tendency: ‘expected’ values • Single measure estimates • Mean, Median, Mode, Trimmed Means, M-estimators • Measures of Variability: estimates of uncertainty • Standard deviation, MAD • Allow for interval estimates on any number of statistics via the standard error • Simple correlation measures among the variables under consideration • You should think of the correlation statistic as a descriptive, not inferential, statistic • Except for purely exploratory endeavors, correlations are a starting point for analysis, not an end • In fact, many of the analyses you come across use the correlation matrix as the dataset
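The measures named on this slide can be sketched with the Python standard library. This is an illustration under stated assumptions, not the presenter's code: the helper names (`trimmed_mean`, `mad`, `standard_error`, `pearson_r`), the 10% trimming proportion, the "two standard errors" rough interval, and the data are all choices made for the example.

```python
import math
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def pearson_r(x, y):
    """Pearson correlation, used here purely as a descriptive statistic."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with one extreme value (30), invented for this example.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

m = statistics.mean(hours)
se = standard_error(hours)
print("mean:", m)                                 # pulled upward by the extreme value
print("median:", statistics.median(hours))
print("10% trimmed mean:", trimmed_mean(hours, 0.10))
print("SD:", statistics.stdev(hours))
print("MAD:", mad(hours))
# Rough interval: about two standard errors either side of the mean (assumption).
print("rough interval:", (m - 2 * se, m + 2 * se))
print("r:", pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Comparing the mean with the median and trimmed mean, and the SD with the MAD, is a quick way to see how much a single extreme value is driving the summary.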

  6. Initial Data Analysis (IDA) • Also called Initial Examination of Data or Exploratory Data Analysis • Often overlooked or thought of as not all that important, but… • Much trouble can be avoided at the beginning stages; glossing over the data can lead to missed findings, or to results that cannot be replicated because they reflect bad data. • Bad data?

  7. Initial Data Analysis • IDA includes: • General descriptive and graphical output • A healthy inspection of the individual variables’ behaviors • Especially visually • Outlier analysis • Outliers in terms of the model, not the individual variables necessarily • Possible model selection or re-specification • Initial inference measures and testing assumptions of the analysis

  8. Steps of an analysis • 1. Clarify the objectives of the investigation • 2. Collect the data in the appropriate way • 3. Investigate the structure and quality of the data • 4. Carry out IDA (descriptive) • 5. Select and carry out formal statistical analysis (inferential) • 6. Compare findings with previous results, collect more data if necessary • 7. Interpret and communicate results • Be flexible in your approach, and treat each research situation uniquely

  9. Method of IDA • Data scrutiny and description • Study variables in light of how they were collected • Look for troublesome variables that may warrant special analysis later if used inferentially • Search for outliers, missing data, etc. that may result in less powerful inferential analysis • Gather summary (descriptive) statistics and graphs, presented so as not to be misleading • See if transformations or robust statistics are necessary.
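The "search for outliers, missing data" step can be sketched as a first-pass screen of a single variable. This is only an assumption-laden illustration: the function name `scrutinize`, the use of `None` to mark missing values, the modified z-score rule (0.6745 × deviation from the median ÷ MAD, flagged above 3.5, a commonly used rule of thumb), and the reaction times are all choices made for the example. It screens one variable at a time, so it does not address outliers defined relative to a model.

```python
import statistics


def scrutinize(values, threshold=3.5):
    """Count missing entries and flag values with a large modified z-score."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    outliers = []
    if mad > 0:  # with MAD == 0 the score is undefined, so flag nothing
        outliers = [
            v for v in present
            if abs(0.6745 * (v - med) / mad) > threshold
        ]
    return {
        "n": len(values),
        "n_missing": len(values) - len(present),
        "median": med,
        "mad": mad,
        "outliers": outliers,
    }


# Hypothetical reaction times in ms; None marks a missing observation.
reaction_times = [310, 295, None, 330, 305, 1250, 320, None, 300, 315]
report = scrutinize(reaction_times)
print(report)
```

A flagged value is a prompt to go back to how the data were collected, not an automatic reason to delete it.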

  10. Method of IDA • Use inferential analyses in an exploratory way • Model Formulation • Include relevant theory • Recognize important features of the data • Do the model and data go together? • Might there be new hypotheses worthy of examination? • Is further analysis even necessary?

  11. Initial Data Analysis • Problem • Although seen by most stats folk as an important part of data analysis, IDA is often underused as a source of information and as an important first step in data interpretation • “Theories looking for data” • Too much concern with inferential analysis and statistical significance • A far too typical approach seems to be to get the data, run descriptives, and at the same time or immediately afterward run the actual analysis • Then, because the results are poor, start figuring out ways to ‘fix’ them.

  12. Why the lack of emphasis on IDA? • Assumed to be the natural way that people conduct their research anyway • It isn’t if they are left to their own devices • Assumed lack of standard methods for going about it • In fact there are guidelines for how to do it • See Chatfield in related articles section • Assumed it’s too exploratory • IDA != fishing • Don’t disregard prior knowledge and theory • Risk of invalid conclusions • This would be a concern if you didn’t perform IDA

  13. Conclusion • Analysis of data takes time, and one must be prepared to exhaustively examine all aspects of the information collected • The purpose of analysis is to allow the data to tell its story, not to impose our own story on the data • An open-minded and thoughtful approach is necessary to any investigation
