
CLUTO A Clustering Toolkit

CLUTO: A Clustering Toolkit. By Roseline Antai. What is CLUTO? CLUTO is a software package used for clustering high-dimensional datasets and for analyzing the characteristics of the various clusters. Algorithms of CLUTO: vcluster and scluster. Major difference: input format.


Presentation Transcript


  1. CLUTO: A Clustering Toolkit. By Roseline Antai

  2. What is CLUTO? CLUTO is a software package used for clustering high-dimensional datasets and for analyzing the characteristics of the various clusters.

  3. Algorithms of CLUTO • vcluster • scluster • Major difference: input format • vcluster: takes the actual multidimensional representation of the objects to be clustered. • scluster: takes the similarity matrix (or graph) between these objects.
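To make the difference in input concrete, here is a minimal Python sketch, assuming cosine similarity as the measure. It starts from the kind of object-by-feature rows that vcluster works from and derives the kind of object-by-object similarity matrix that scluster works from. The function name and the toy data are illustrative and not part of CLUTO, and the sketch does not attempt to write CLUTO's file formats.

```python
import math

def cosine_similarity_matrix(rows):
    """Return the pairwise cosine similarity matrix for a list of feature vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # an all-zero vector has no direction; leave similarity at 0
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Hypothetical toy data: 3 objects (rows) described by 3 features (columns).
objects = [
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 4.0],
    [0.0, 3.0, 0.0],
]
similarity = cosine_similarity_matrix(objects)
```

The first two objects point in the same direction, so their similarity is 1; the third shares no features with them, so its similarity to both is 0.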

  4. Calling Sequence • vcluster [optional parameters] MatrixFile NClusters • scluster [optional parameters] GraphFile NClusters

  5. Optional Parameters • Standard specification: -paramname or -paramname=value • Three categories: • Clustering algorithm parameters • Reporting and analysis parameters • Cluster visualization parameters
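The calling sequence and the -paramname / -paramname=value convention can be sketched as a small Python helper that assembles the argument list. This is only an illustration: build_cluto_command is a hypothetical helper, and docs.mat is a made-up file name; the option names used in the example (clmethod, sim, showtree) are the ones that appear on these slides.

```python
def build_cluto_command(program, input_file, nclusters, **params):
    """Assemble: program, optional parameters, input file, number of clusters."""
    args = [program]
    for name, value in params.items():
        if value is True:
            args.append(f"-{name}")          # switch without a value, e.g. -showtree
        elif value is not False and value is not None:
            args.append(f"-{name}={value}")  # -paramname=value
    args.extend([input_file, str(nclusters)])
    return args

cmd = build_cluto_command(
    "vcluster", "docs.mat", 4,  # "docs.mat" is a hypothetical input file
    clmethod="rb", sim="cos", showtree=True,
)
print(" ".join(cmd))
```

If CLUTO is installed, a list built this way could be handed to subprocess.run; the sketch stops at building the list so that it runs without the toolkit.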

  6. Clustering algorithm parameters • Control how CLUTO computes the clustering solution. • Examples • -clmethod=string (rb, agglo, direct, graph, etc.) • -sim=string (cos, corr, dist, jacc) • -crfun=string (i1, i2, etc.) • -fulltree

  7. Reporting and Analysis Parameters • Control the amount of information that vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters. • Examples • -clustfile=string (default is MatrixFile.clustering.NClusters, or GraphFile.clustering.NClusters) • -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries or -labeltree are used)

  8. • -rlabelfile=string (stores the labels of the rows, i.e. the objects to be clustered) • -rclassfile=string (stores the class labels of the rows) • -showtree • -showfeatures (descriptive and discriminating)
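As a rough illustration of how the clustering file and a row label file fit together, the Python sketch below groups row labels by cluster. It assumes the clustering file holds one cluster number per line, in the same row order as the input, and that the label file holds one label per line; the helper name and the sample data are hypothetical, and the file contents are inlined as lists of lines so the sketch runs without CLUTO.

```python
from collections import defaultdict

def group_labels_by_cluster(cluster_lines, label_lines):
    """Pair the i-th cluster number with the i-th row label; group labels by cluster."""
    clusters = [int(line) for line in cluster_lines if line.strip()]
    labels = [line.strip() for line in label_lines if line.strip()]
    if len(clusters) != len(labels):
        raise ValueError("cluster file and label file must describe the same rows")
    groups = defaultdict(list)
    for cluster_id, label in zip(clusters, labels):
        groups[cluster_id].append(label)
    return dict(groups)

# Hypothetical contents of a clustering file and a row label file.
cluster_lines = ["0", "1", "0", "1"]
label_lines = ["doc-a", "doc-b", "doc-c", "doc-d"]
groups = group_labels_by_cluster(cluster_lines, label_lines)
```

With real files, the two lists would come from reading each file line by line.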

  9. Cluster Visualization Parameters • Simple plots of the original input matrix that show how the different objects (rows) and features (columns) are clustered together. • Examples • -plottree=string: gives a graphic representation of the entire hierarchical tree • -plotmatrix=string: shows how the rows of the original matrix are clustered together.

  10. A practical example • ../cluto/Linux/vcluster -clmethod=rb -sim=cos -fulltree -rlabelfile=Final_Results/rlabelfile -rclassfile=Final_Results/classfile -showtree -plotformat=gif -plottree=Final_Results/Images/PT-Final10d -plotmatrix=Final_Results/Images/PM-Final10d -plotclusters=Final_Results/Images/PC-Final10d -showfeatures Final_Results/FinalOutput10d-Vt.mat 4

  11. Classfile and rlabelfile • Classfile: Evo Sem Imp Imp Deo Deo Imp Imp Deo Deo Imp Deo Deo Imp Sem Deo Sem Imp Imp Evo • rlabelfile: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

  12. Plotclusters output

  13. The plot uses red to denote positive values and green to denote negative values. Bright red/green indicate large positive/negative values, whereas colors close to white indicate values close to zero.
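The colour convention described above can be sketched in Python as a mapping from a matrix value to an RGB colour. This is an illustration only: the slide does not say how CLUTO scales its colours, so the linear fade toward white and the max_abs normalisation are assumptions, and value_to_rgb is a hypothetical helper.

```python
def value_to_rgb(value, max_abs):
    """Map a value to (r, g, b): white at zero, red for positive, green for negative."""
    if max_abs <= 0:
        return (255, 255, 255)
    # Fraction of full brightness, clamped to 1.0 for values beyond max_abs.
    strength = min(abs(value) / max_abs, 1.0)
    fade = round(255 * (1.0 - strength))  # 255 means white, 0 means fully saturated
    if value > 0:
        return (255, fade, fade)  # fades from white to bright red
    if value < 0:
        return (fade, 255, fade)  # fades from white to bright green
    return (255, 255, 255)

print(value_to_rgb(0.0, 1.0))   # white
print(value_to_rgb(1.0, 1.0))   # bright red
print(value_to_rgb(-1.0, 1.0))  # bright green
```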
