Computacion inteligente

reedthomas

Presentation Transcript


  1. Computacion inteligente (Intelligent Computing): Fuzzy Clustering

  2. Agenda • Basic concepts • Types of Clustering • Types of Clusters • Distance functions • Clustering Algorithms

  3. Basic concepts

  4. Classification • Historically, objects have been classified into groups • Periodic table of the elements (chemistry) • Taxonomy (zoology, botany) • Why classify? • Understanding • Prediction • Organizational convenience, convenient summary • Summarization • Reduce the size of large data sets • These aims do not necessarily lead to the same classification, e.g. classifying by the SIZE of an object vs. by its TYPE/USE

  5. Classification • Classification divides objects into groups based on a set of values • Unlike a theory, a classification is neither true nor false, and should be judged largely on the usefulness of results • However, a classification (clustering) may be useful for suggesting a theory, which could then be tested

  6. What is clustering? • Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups • Intra-cluster distances are minimized • Inter-cluster distances are maximized
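
The two criteria on this slide can be checked numerically. The following is a minimal sketch, assuming Euclidean distance and a hypothetical toy data set with hand-assigned labels; the function name `mean_intra_inter` is illustrative and does not come from the slides.

```python
import math
from itertools import combinations


def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    intra, inter = [], []
    # Visit every unordered pair of labelled points exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (assumption)
        (intra if lp == lq else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter(points, labels)
print(f"mean intra = {intra:.3f}, mean inter = {inter:.3f}")
```

A grouping in the sense of this slide should give a small first value and a large second value; the sketch assumes at least one pair of each kind exists.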

  7. Simple example Composition of mammalian milk

  8. Simple example: composition of mammalian milk [Figure: clustering in a scatter plot with axes Proteins (%) and Fat (%)]

  9. What is clustering? • No class values denoting an a priori grouping of the data instances are given • So it is a method of data exploration: a way of looking for patterns or structure of interest in the data [Figure labels: feature space, pattern]

  10. What is clustering? • A form of unsupervised learning • You generally don’t have examples demonstrating how the data should be grouped together • For historical reasons, clustering is often considered synonymous with unsupervised learning

  11. Clustering vs. class prediction • Clustering: • No learning set, no given classes • Goal: discover the “best” classes or groupings • Class prediction: • A learning set of objects with known classes • Goal: put new objects into existing classes • Also called: supervised learning, or classification

  12. Components of Clustering Task • Pattern Representation • Number of classes and available patterns • Number, type, and scale of features available to the algorithm • Feature selection/extraction • Definition of Pattern Proximity measure • Defined on pairs of patterns • Distance measures and conceptual similarities (continued on the next slide)

  13. Components of Clustering Task • Clustering / Grouping • Data abstraction (optional) • Extraction of a simple and compact data representation • Output Assessment (optional) • How good is it? • The quality of a clustering result depends on the algorithm, the distance function, and the application.

  14. Pattern Representation • Which features do we use? • There are currently no theoretical guidelines that suggest appropriate patterns and features for a specific situation • The user generally must provide insight • Careful analysis of the available features can yield improved clustering results

  16. Pattern Representation • Example: balls of the same colour are clustered into one group [figure not included in the transcript] • Thus, clustering means grouping data, or dividing a large data set into smaller subsets whose members are similar to one another.

  17. Notion of a cluster can be ambiguous • How many clusters? [Figure panels: Two Clusters, Four Clusters, Six Clusters]

  18. Types of Clustering

  19. Types of Clusterings • Important distinction between hierarchical and partitional sets of clusters • Partitional clustering • A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset • Hierarchical clustering • A set of nested clusters organized as a hierarchical tree

  20. Partitional Clustering [Figure panels: Original Points, A Partitional Clustering]

  21. Hierarchical Clustering

  22. Other Distinctions Between Sets of Clusters • Exclusive versus non-exclusive • In non-exclusive clusterings, points may belong to multiple clusters. • Fuzzy versus non-fuzzy • In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1 • Weights must sum to 1 • Partial versus complete • In some cases, we only want to cluster some of the data
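
To make the fuzzy case concrete, the sketch below computes membership weights for one point given a set of cluster centers. It assumes the membership formula commonly used in fuzzy c-means, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), with fuzzifier m = 2; that formula, the function name `fuzzy_memberships` and the sample centers are assumptions for illustration and are not stated on this slide.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership weight of `point` in each cluster; the weights sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # Fuzzy c-means style weights: closer centers receive larger weights.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical centers; the point is three times closer to the first one.
centers = [(0.0, 0.0), (4.0, 0.0)]
weights = fuzzy_memberships((1.0, 0.0), centers)
print(weights, sum(weights))
```

Every weight lies between 0 and 1 and the weights add up to 1, which is the constraint listed on the slide. A non-fuzzy (crisp) clustering is the special case where one weight is 1 and the rest are 0.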

  23. Types of Clusters

  24. Types of Clusters • Well-separated clusters • Center-based clusters • Contiguous clusters • Density-based clusters • Property or Conceptual • Described by an Objective Function

  25. Types of Clusters: Well-Separated • Well-Separated Clusters: • A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster. [Figure: 3 well-separated clusters]
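
The well-separated definition translates directly into a check over all points. This is a minimal sketch, assuming Euclidean distance; `is_well_separated` and the sample data are hypothetical names and values chosen for illustration.

```python
import math


def is_well_separated(points, labels):
    """True if every point is closer to all points of its own cluster
    than to any point of another cluster."""
    n = len(points)
    for i in range(n):
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        other = [math.dist(points[i], points[j])
                 for j in range(n) if labels[j] != labels[i]]
        # The farthest same-cluster point must still be nearer than
        # the nearest point outside the cluster.
        if same and other and max(same) >= min(other):
            return False
    return True


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(is_well_separated(points, [0, 0, 1, 1]))  # natural grouping
print(is_well_separated(points, [0, 1, 0, 1]))  # interleaved grouping
```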

  26. Types of Clusters: Center-Based • Center-based • A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of its cluster than to the center of any other cluster • The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most “representative” point of a cluster [Figure: 4 center-based clusters]
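
The difference between the two kinds of center shows up as soon as a cluster contains an outlier. The sketch below assumes one common reading of "most representative": the medoid is the member with the smallest total distance to all other members. The function names and data are illustrative only.

```python
import math


def centroid(points):
    """Coordinate-wise average; usually not one of the data points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def medoid(points):
    """The data point with the smallest summed distance to all members
    (assumed definition of 'most representative')."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical cluster with one far-away member at x = 10.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(centroid(cluster))  # pulled towards the outlier
print(medoid(cluster))    # always an actual member of the cluster
```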

  27. Types of Clusters: Center-Based • Center-based • The centroid representation alone works well if the clusters are hyper-spherical • If clusters are elongated or have other shapes, centroids are not sufficient [Figure: 4 center-based clusters]

  28. Common ways to represent clusters • Use the centroid of each cluster to represent the cluster • Compute the radius and standard deviation of the cluster to determine its spread in each dimension • The centroid representation alone works well if the clusters are hyper-spherical • If clusters are elongated or have other shapes, centroids are not sufficient
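
A small sketch of this representation follows. It assumes that "radius" means the largest distance from the centroid to a member and that the spread is the population standard deviation per dimension; the slide fixes neither choice, and `cluster_summary` is a hypothetical name.

```python
import math
from statistics import pstdev


def cluster_summary(points):
    """Return (centroid, radius, per-dimension standard deviation)."""
    n = len(points)
    center = tuple(sum(c) / n for c in zip(*points))
    # Radius: distance from the centroid to the farthest member (assumption).
    radius = max(math.dist(p, center) for p in points)
    # Spread: population standard deviation of each coordinate.
    spread = tuple(pstdev(c) for c in zip(*points))
    return center, radius, spread


# Hypothetical elongated cluster lying along the x axis.
elongated = [(-4.0, 0.0), (-2.0, 0.0), (0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
center, radius, spread = cluster_summary(elongated)
print(center, radius, spread)
```

For this elongated cluster the spread is large in x and zero in y, information that the centroid alone does not carry.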

  29. Types of Clusters: Contiguity-Based • Contiguous Cluster (Nearest neighbor or Transitive) • A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster. [Figure: 8 contiguous clusters]

  30. Types of Clusters: Density-Based • Density-based • A cluster is a dense region of points that is separated from other high-density regions by low-density regions • Used when the clusters are irregular or intertwined, and when noise and outliers are present. [Figure: 6 density-based clusters]

  32. Types of Clusters: Conceptual Clusters • Shared Property or Conceptual Clusters • Finds clusters that share some common property or represent a particular concept. [Figure: 2 overlapping circles]

  33. Types of Clusters: Objective Function • Clusters Defined by an Objective Function • Finds clusters that minimize or maximize an objective function. • Enumerate all possible ways of dividing the points into clusters and evaluate the ‘goodness’ of each potential set of clusters by using the given objective function.
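
The enumerate-and-evaluate idea can be written out for a tiny data set. The sketch below assumes the sum of squared errors (SSE) to the cluster centroids as the objective to minimize, which is one possible objective and not one named on the slide; `sse` and `best_partition` are hypothetical names.

```python
from itertools import product


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("inf")  # reject partitions with an empty cluster
        center = [sum(x) / len(members) for x in zip(*members)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center))
                     for p in members)
    return total


def best_partition(points, k):
    """Try every assignment of points to k clusters and keep the best one."""
    return min(product(range(k), repeat=len(points)),
               key=lambda labels: sse(points, labels, k))


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
best = best_partition(points, 2)
print(best, sse(points, best, 2))
```

The search visits k^n labelings, so it is only feasible for a handful of points; that cost is why practical algorithms optimize the objective heuristically instead.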

  34. Types of Clusters: Objective Function • Clusters Defined by an Objective Function • Can have global or local objectives. • Hierarchical clustering algorithms typically have local objectives • Partitional algorithms typically have global objectives

  35. Distance functions

  36. Clustering Task • The clustering task requires introducing a distance measure D (or a measure of similarity or proximity) between sample patterns.

  37. Distance functions • The similarity measure is often more important than the clustering algorithm used • Instead of talking about similarity measures, we often equivalently refer to dissimilarity measures

  38. Quality in Clustering • A good clustering method will produce high quality clusters with • high intra-class similarity • low inter-class similarity • The quality of a clustering result depends on both the similarity measure used by the method and its implementation

  39. Distance functions • There are numerous distance functions for • Different types of data • Numeric data • Nominal data • Different specific applications • Weights should be associated with different variables based on applications and data semantics.
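
The slides go on to cover numeric data only. For nominal data, one widely used option is the simple matching dissimilarity: the fraction of attributes on which two records disagree. The sketch below is an illustration under that assumption; the function name and the sample records are hypothetical.

```python
def simple_matching_distance(a, b):
    """Fraction of attributes on which two nominal records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for u, v in zip(a, b) if u != v)
    return mismatches / len(a)


# Hypothetical records: (class, diet, habitat).
cow = ("mammal", "herbivore", "land")
whale = ("mammal", "carnivore", "sea")
print(simple_matching_distance(cow, whale))
```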

  40. Distance functions for numeric data • We denote distance with $d(x_i, x_j)$, where $x_i$ and $x_j$ are data points (vectors) • The most commonly used functions are • Euclidean distance and • Manhattan (city block) distance • They are special cases of the Minkowski distance

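  41. Metric Spaces • Metric Space: A pair (X,d) where X is a set and d is a distance function such that for all x, y, z in X: • Symmetry: $d(x, y) = d(y, x)$ • Separation: $d(x, y) = 0$ if and only if $x = y$ • Triangle inequality: $d(x, z) \le d(x, y) + d(y, z)$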
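
The three axioms (symmetry, separation, triangle inequality) can be tested on a finite sample of points. The sketch below is such a test; it can refute a candidate distance function but cannot prove that the function is a metric on the whole set. The helper names and the sample points are assumptions for illustration.

```python
import math
from itertools import product


def satisfies_metric_axioms(d, sample, tol=1e-12):
    """Check symmetry, separation and the triangle inequality on `sample`."""
    for x, y in product(sample, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:      # symmetry
            return False
        if (d(x, y) <= tol) != (x == y):      # d(x, y) = 0 iff x = y
            return False
    for x, y, z in product(sample, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
print(satisfies_metric_axioms(math.dist, sample))          # Euclidean
print(satisfies_metric_axioms(squared_euclidean, sample))  # squared Euclidean
```

Squared Euclidean distance is symmetric and separating but breaks the triangle inequality on this sample, so it is a dissimilarity measure without being a metric.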

  42. Minkowski distance, $L_p$ • $d(Q, C) = \left( \sum_{k=1}^{n} |q_k - c_k|^p \right)^{1/p}$ • $p = 1$: Manhattan (Rectilinear, City Block) • $p = 2$: Euclidean • $p = \infty$: Max (Supremum, “sup”)
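
A single function covers the three cases p = 1, p = 2 and p = ∞ listed on this slide. This is a minimal sketch; the function name `minkowski` and the sample vectors are illustrative, and the p = ∞ case is handled as the limit, which is the largest coordinate difference.

```python
import math


def minkowski(q, c, p):
    """Minkowski (L_p) distance between two equal-length vectors, p >= 1."""
    diffs = [abs(a - b) for a, b in zip(q, c)]
    if math.isinf(p):
        return max(diffs)  # limit p -> infinity: supremum distance
    return sum(d ** p for d in diffs) ** (1.0 / p)


q, c = (0.0, 0.0), (3.0, 4.0)
print(minkowski(q, c, 1))         # Manhattan
print(minkowski(q, c, 2))         # Euclidean
print(minkowski(q, c, math.inf))  # max / supremum
```

For these two points the distance shrinks as p grows, from the sum of the coordinate differences down to the largest single difference.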

  43. Euclidean distance, $L_2$ • $d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$ • Here n is the number of dimensions in the data vector.

  44. Euclidean distance • These examples of Euclidean distance match our intuition of dissimilarity pretty well… [Figure: three example pairs with $d_{euc} = 0.5846$, $d_{euc} = 1.1345$, $d_{euc} = 2.6115$]

  45. Euclidean distance • …But what about these? What might be going on with the expression profiles on the left? On the right? [Figure: two example pairs with $d_{euc} = 1.41$ and $d_{euc} = 1.22$]

  46. Weighted Euclidean distance • $d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} w_k (x_{ik} - x_{jk})^2}$, where $w_k \ge 0$ is the weight of dimension k
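
The sketch below assumes the usual form of the weighted Euclidean distance, in which each squared coordinate difference is multiplied by a non-negative weight before summing. The function name and the weights are hypothetical.

```python
import math


def weighted_euclidean(x, y, w):
    """Euclidean distance with a non-negative weight per dimension."""
    return math.sqrt(sum(wk * (a - b) ** 2 for a, b, wk in zip(x, y, w)))


x, y = (0.0, 0.0), (3.0, 4.0)
print(weighted_euclidean(x, y, (1.0, 1.0)))  # equal weights: plain Euclidean
print(weighted_euclidean(x, y, (1.0, 0.0)))  # second dimension ignored
```

Setting a weight to zero removes that variable from the comparison, which is one way to encode application knowledge about which variables matter.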

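  47. Mahalanobis distance • $d(x_i, x_j) = \sqrt{(x_i - x_j)^T \Sigma^{-1} (x_i - x_j)}$, where $\Sigma$ is the covariance matrix of the data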
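
A minimal two-dimensional sketch of the Mahalanobis distance, sqrt((x - y)^T S^-1 (x - y)), follows. It assumes the covariance matrix S is supplied as the triple (sxx, sxy, syy) and inverts the 2x2 matrix by hand; the function name, argument layout and sample values are hypothetical.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance in 2-D for covariance [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Inverse of the 2x2 matrix is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Hypothetical data whose variance is 4 along x and 1 along y.
cov = (4.0, 0.0, 1.0)
origin = (0.0, 0.0)
print(mahalanobis_2d((2.0, 0.0), origin, cov))  # along the high-variance axis
print(mahalanobis_2d((0.0, 2.0), origin, cov))  # along the low-variance axis
```

Both points are at Euclidean distance 2 from the origin, yet the one displaced along the low-variance axis is twice as far in Mahalanobis terms. With the identity matrix as covariance the function reduces to the Euclidean distance.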

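  48. More Metrics • Manhattan distance, $L_1$: $d(x_i, x_j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$ • $L_\infty$ (Chessboard): $d(x_i, x_j) = \max_{k} |x_{ik} - x_{jk}|$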
