
Data Visualisation

HCI 0283 Lecture 3 Quantitative Data. Data Visualisation. Basic Task. When dealing with quantitative data we are trying to select one item from many based upon the numerical values of their attributes

sulwyn

Presentation Transcript


  1. HCI 0283 Lecture 3 Quantitative Data Data Visualisation

  2. Basic Task • When dealing with quantitative data we are trying to select one item from many based upon the numerical values of their attributes • For instance, when buying a car we may consider price, fuel consumption, engine size, number of passengers… • We need to be able to present this data and allow it to be rearranged in a manner that makes the decision task easier

  3. Dimensionality • The complexity of a display is generally dependent upon the number of attributes (variables) involved • This is the dimensionality of the problem • If we have only one attribute (such as price) then we are dealing with univariate data and the problem is relatively straightforward • If we add a second attribute (such as fuel consumption) then we are dealing with bivariate data and the problem is a little more difficult • Three attributes – trivariate data – leads to even more difficulty in display, and anything more than that – hypervariate data – is a very difficult task indeed • We will consider, in turn, univariate, bivariate, trivariate and hypervariate data and how we can display it

  4. Univariate Data • Suppose we have a list of cars that are characterised by price • This is an example of a collection of univariate data • We could present this data as a table or, more effectively, as a plot of points against some scale • Which we choose may depend on how much space is available

  5. Simple Plot • The simple plot makes it easy to see the lowest and highest values, the general distribution of points, and any bunching of points • It is not easy to add labels • It is not easy to make any judgements based on aggregation, e.g. the mean price [Figure: points plotted along a Price (£K) axis running from 0 to 60]

  6. Tukey Plots • The easiest way to introduce aggregation is to use a Tukey Box Plot • This shows the 25th, 50th and 75th percentiles as a box, the 10th and 90th percentiles as lines, and anything beyond these as ‘outliers’ [Figure: box plot along a Price (£K) axis running from 0 to 60]
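The aggregation behind the box plot described above can be sketched in a few lines. This is a minimal sketch, assuming linear interpolation between ranked values (one of several percentile conventions); the function names and the car prices are hypothetical and are not taken from the lecture.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no data")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def box_plot_summary(values):
    """Box edges (25th, 75th), median (50th), line ends (10th, 90th), outliers."""
    summary = {p: percentile(values, p) for p in (10, 25, 50, 75, 90)}
    # Following the slide: anything beyond the 10th/90th percentile lines.
    outliers = [v for v in values if v < summary[10] or v > summary[90]]
    return summary, outliers


# Hypothetical car prices in £K, invented for illustration only.
prices = [5, 12, 14, 15, 18, 20, 22, 25, 30, 35, 58]
summary, outliers = box_plot_summary(prices)
print(summary)
print(outliers)
```

The five percentiles are all a renderer needs to draw the box and its lines; the outliers are drawn as individual points.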

  7. Simple Plots • It is very useful to be able to zoom in to examine particular ranges in detail BUT • Zooming should reveal more detail, not larger dots! [Figure: zoomed view of the 30 to 40 range of the price axis, with points labelled by make: Volvo, Rover, Merc, BMW, Saab]

  8. Bivariate Data • Bivariate data is often displayed using a two dimensional plot of one variable against the other • Such scatterplots allow us to easily identify global trends, trade-offs and outliers [Figure: scatterplot of Number of Bedrooms (1 to 6) against Price (£50K to £300K)]

  9. Bivariate Data • If we can control one variable and therefore group the data then we can create a display of multiple boxplots • This makes it easy to make comparisons of aggregate measures between groups • This is beginning to look a bit like a histogram…
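Grouping by the controlled variable can be sketched as follows. This is an illustrative sketch only: the house records are invented, and the median stands in for the full set of box plot percentiles that each group would need.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (bedrooms, price in £K) records, invented for illustration.
houses = [
    (2, 120), (2, 150), (2, 135),
    (3, 180), (3, 210), (3, 195),
    (4, 260), (4, 240),
]


def group_by_key(records):
    """Collect the values belonging to each value of the controlled variable."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


groups = group_by_key(houses)
# One aggregate per group; a multiple box plot draws one box per entry.
medians = {bedrooms: median(prices) for bedrooms, prices in sorted(groups.items())}
print(medians)
```

Each key of `groups` corresponds to one box in the multiple box plot, which is what makes aggregate comparisons between groups straightforward.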

  10. Multiple Boxplot

  11. Histograms • Histograms can also be used to show categorised bivariate data • Simple histograms are of limited use, but we can extend their usefulness by making them interactive • With two histograms we can show how the data in one histogram is related to the data in the other
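One way to relate two histograms is to select a bar in one and recount only the selected items in the other. The sketch below assumes fixed-width bins and uses invented items with attributes named A and B; it is not the specific interaction shown in the lecture figures.

```python
from collections import Counter

# Hypothetical items with two attributes, invented for illustration.
items = [
    {"A": 2, "B": 11},
    {"A": 4, "B": 18},
    {"A": 7, "B": 13},
    {"A": 8, "B": 27},
    {"A": 9, "B": 24},
    {"A": 13, "B": 29},
]


def histogram(records, attribute, width):
    """Count records per fixed-width bin; bin k covers [k*width, (k+1)*width)."""
    return Counter(int(record[attribute] // width) for record in records)


def select_bin(records, attribute, width, bin_number):
    """Return the records that fall in one bar of a histogram."""
    return [r for r in records if int(r[attribute] // width) == bin_number]


hist_a = histogram(items, "A", width=5)
hist_b = histogram(items, "B", width=10)

# Select the bar for A in [5, 10) and see where those items fall in B.
selected = select_bin(items, "A", width=5, bin_number=1)
highlight_b = histogram(selected, "B", width=10)
print(dict(hist_b), dict(highlight_b))
```

Drawing `highlight_b` as a shaded portion of each bar in `hist_b` shows how the selection in one histogram is distributed in the other.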

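  12. [Figure: linked histograms of Attribute A and Attribute B]

  13. [Figure: linked histograms of Attribute A and Attribute B]

  14. [Figure: linked histograms of Attribute A and Attribute B]

  15. [Figure: linked histograms of Attribute A and Attribute B]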

  16. Coffee Time!

  17. Trivariate Data • We live in a three-dimensional world, so surely a three-dimensional data display is natural, right? • Yes, if you can build a physical model, do it in virtual reality or using holographic displays  • Most people have to make do with two-dimensional monitors or sheets of paper

  18. Trivariate Data • Can we decide if A has a greater value of Price than B? • No • We could aid this by projecting each point onto each axis [Figure: three-dimensional plot of points A to D on Price, Bedrooms and Travel Time axes]

  19. Trivariate Data [Figure: points A to D projected onto two-dimensional plots formed from pairs of the Bedrooms, Price and Travel Time axes]

  20. Scatterplot Matrix [Figure: scatterplot matrix of points A to D over Price, Travel Time and Bedrooms]

  21. Trivariate Data • The scatterplot matrix contains as many scatterplots as there are pairs of parameters • For more than five parameters this becomes unworkable • If we have N objects with M parameters we have M(M−1)/2 scatterplots of N points each, so N × M(M−1)/2 points in total (N × 3 for three parameters) • Labelling this many points is a problem
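The growth in the number of plots can be checked with a short sketch that enumerates one scatterplot per pair of parameters. The function names are hypothetical; the parameter names follow the house example used on the slides.

```python
from itertools import combinations


def scatterplot_pairs(parameters):
    """One scatterplot per unordered pair of parameters."""
    return list(combinations(parameters, 2))


def total_points(n_objects, parameters):
    """Every object appears once in every scatterplot."""
    return n_objects * len(scatterplot_pairs(parameters))


pairs = scatterplot_pairs(["Price", "Bedrooms", "Travel Time"])
print(pairs)
print(total_points(4, ["Price", "Bedrooms", "Travel Time"]))

# The number of plots grows quickly with the number of parameters.
for m in range(2, 8):
    print(m, len(scatterplot_pairs(range(m))))
```

With three parameters there are three plots, so four objects give twelve points; with six parameters there are already fifteen plots.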

  22. Brushing • The scatterplot matrix is still a very useful tool if we use the brushing technique to highlight points of interest • If we select (or brush) a subset of points on one plot then the corresponding points on other plots are also highlighted • Brushing is particularly useful when dealing with hypervariate data and can be implemented in many ways
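Brushing can be implemented in many ways; the sketch below shows one simple possibility, assuming a rectangular brush and a shared identifier for each object. The data values and function names are invented for illustration.

```python
# Hypothetical houses keyed by identifier, invented for illustration.
houses = {
    "A": {"price": 100, "bedrooms": 2, "travel_time": 50},
    "B": {"price": 150, "bedrooms": 3, "travel_time": 40},
    "C": {"price": 220, "bedrooms": 4, "travel_time": 20},
    "D": {"price": 280, "bedrooms": 5, "travel_time": 30},
}


def brush(items, x_attr, y_attr, x_range, y_range):
    """Return ids of items whose (x, y) position lies inside the brushed rectangle."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return {
        item_id
        for item_id, attrs in items.items()
        if x_min <= attrs[x_attr] <= x_max and y_min <= attrs[y_attr] <= y_max
    }


def plot_points(items, x_attr, y_attr, highlighted):
    """Return (id, x, y, is_highlighted) for each point of one scatterplot."""
    return [
        (item_id, attrs[x_attr], attrs[y_attr], item_id in highlighted)
        for item_id, attrs in sorted(items.items())
    ]


# Brush a region of the Price / Travel Time plot...
brushed = brush(houses, "price", "travel_time", (120, 250), (10, 45))
# ...and the same objects are highlighted in the Travel Time / Bedrooms plot.
linked = plot_points(houses, "travel_time", "bedrooms", brushed)
print(sorted(brushed))
print(linked)
```

The key design point is that the selection is stored as a set of object identifiers rather than as screen positions, so any other plot can look up which of its points to highlight.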

  23. Brushing [Figure: brushed points highlighted across linked scatterplots of Price, Travel Time and Bedrooms]

  24. Trivariate Data • It is also difficult to interpret surfaces • Simple questions such as ‘what is the minimum value of Z?’ are difficult to answer • There are two main approaches to this • Flooding, i.e. slicing through the surface at the desired value • Rotating the surface
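Flooding can be sketched on a surface sampled as a grid of Z values: slicing at a chosen level leaves only the cells at or below that level "under water". The grid values below are invented, and the function name is hypothetical.

```python
def flood(surface, level):
    """Return (row, column) of every grid cell whose Z value is at or below the level."""
    return [
        (r, c)
        for r, row in enumerate(surface)
        for c, z in enumerate(row)
        if z <= level
    ]


# Hypothetical surface heights on a 3 x 3 grid, invented for illustration.
surface = [
    [5, 4, 6],
    [3, 1, 4],
    [6, 2, 5],
]

print(flood(surface, 2))  # cells submerged when slicing at Z = 2
print(flood(surface, 1))  # lowering the level isolates the minimum
```

Lowering the flood level until a single region remains is what makes the question "what is the minimum value of Z?" easy to answer visually.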

  25. Trivariate Data • Rotate it…

  26. Hypervariate Data • Many real-world situations require us to display more than three variables • One solution is to use parallel coordinate plots • These take all of the axes of a multidimensional space and arrange them parallel to each other • Each data point appears once on each axis
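The geometry of a parallel coordinate plot can be sketched by scaling each attribute to its own axis and joining one vertex per axis into a polyline. This is a minimal sketch assuming min-max scaling of each axis to the range 0 to 1; the house data and names are invented.

```python
# Hypothetical houses, invented for illustration.
houses = {
    "A": {"price": 100, "bedrooms": 1, "travel_time": 60},
    "B": {"price": 200, "bedrooms": 3, "travel_time": 40},
    "C": {"price": 300, "bedrooms": 2, "travel_time": 20},
    "D": {"price": 500, "bedrooms": 5, "travel_time": 0},
}


def parallel_coordinates(items, attributes):
    """Map each item to a polyline with one (axis_index, scaled_value) vertex per axis."""
    ranges = {
        a: (min(i[a] for i in items.values()), max(i[a] for i in items.values()))
        for a in attributes
    }
    polylines = {}
    for item_id, attrs in items.items():
        line = []
        for axis_index, a in enumerate(attributes):
            low, high = ranges[a]
            # Place constant attributes mid-axis to avoid dividing by zero.
            scaled = 0.5 if high == low else (attrs[a] - low) / (high - low)
            line.append((axis_index, scaled))
        polylines[item_id] = line
    return polylines


lines = parallel_coordinates(houses, ["price", "bedrooms", "travel_time"])
for item_id, line in lines.items():
    print(item_id, line)
```

Adding a further dimension only adds one more axis index and one more vertex per polyline.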

  27. Parallel Coordinate Plot [Figure: points A to D from the Price against Travel Time plot redrawn with the Price and Travel Time axes placed parallel to each other]

  28. Multivariate Data • This is easy to extend to any desired number of dimensions with each dimension being treated equally

  29. Hypervariate Data • A variation on this approach is to use a starplot • In this case the axes radiate from a common origin • This is similar to Florence Nightingale’s original ‘batwing’ plots • Multiple objects can be compared on the basis of their shapes
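The shape drawn for one object in a starplot can be sketched by spacing the axes at equal angles around the origin and placing a vertex on each axis at a distance proportional to the attribute value. The scaling by a per-attribute maximum and the example values are assumptions made for illustration.

```python
import math


def starplot_vertices(values, max_values):
    """Place one vertex per attribute on axes radiating from a common origin."""
    count = len(values)
    vertices = []
    for k, (value, maximum) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * k / count  # axes equally spaced around the origin
        radius = value / maximum         # distance along the axis, scaled to 0..1
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


# Hypothetical object with four attributes, each with a maximum of 10.
vertices = starplot_vertices([5, 10, 5, 0], [10, 10, 10, 10])
for x, y in vertices:
    print(round(x, 3), round(y, 3))
```

Joining the vertices in order gives the polygon whose shape is compared between objects.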

  30. Starplot [Figure: starplot with seven axes labelled A to G radiating from a common origin]

  31. Hypervariate Data • Mosaic plots can also be used to represent and rearrange hypervariate data • If we add gender to the eye and hair colour data in the mosaic plot example we used last week then we can extend the mosaic plot to show this

  32. [Figure: mosaic plot of eye and hair colour split by gender (Male / Female)]

  33. Hypervariate Data • The extension to four dimensions is also straightforward • In April 1912 the liner Titanic sank, killing 1490 of the 2201 people on board • The raw data on these deaths contains four variables: Gender, Survival, Class and Adult/Child

  34. Titanic Raw Data

  35. Hypervariate Data • The Scatterbox principle can also be extended to large numbers of dimensions • The resulting structure is a hyperbox • This looks like a 3D structure constructed so that all possible pairs of variables are shown plotted against each other • This is best used interactively so that the face of interest can be rotated to the front

  36. Hyperbox • Each pair of numbers represents a pair of dimensions • Each face is a bivariate display • Rotating and deforming the object allows each face to be easily viewed and interpreted [Figure: hyperbox for five dimensions with faces labelled 12, 13, 14, 15, 23, 24, 25, 34, 35 and 45]

  37. Summary • Univariate: 1 dimension, Tukey Box Plots • Bivariate: 2 dimensions, scatterplots, histograms • Trivariate: 3 dimensions, scatterplot matrix, brushing • Hypervariate: >3 dimensions, parallel coordinate plots, mosaic diagrams, starplots, hyperboxes

  38. Coming Soon… • Next lecture: Representation • Homework: Read chapter 3 of Information Visualisation (Spence) and the two papers handed out in the lecture
