
Weka: An open-source tool for data analysis and mining with machine learning

Quantitative Data Analysis Colloquium, Centenary College of Louisiana. Mark Goadrich, 4/17/2008.


Presentation Transcript


  1. Weka: An open-source tool for data analysis and mining with machine learning • Quantitative Data Analysis Colloquium • Centenary College of Louisiana • Mark Goadrich • 4/17/2008

  2. Regression lines and correlation • Find relationship between two attributes • Correlation coefficient
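
The two ideas on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch, not Weka code: the function names `pearson_r` and `regression_line` and the sample numbers are assumptions made up for the example. It computes the Pearson correlation coefficient and the least-squares line between two numeric attributes.

```python
from math import sqrt

def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def regression_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical attribute values lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
r = pearson_r(xs, ys)
slope, intercept = regression_line(xs, ys)
```

A coefficient near 1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, although a nonlinear one may still exist.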

  3. Categorization • Can we learn one category based on the others? • This search for classification lines is called machine learning

  4. Data Sets • House of Representatives Votes • Labor Relations • Iris (plant) Discrimination • Breast Cancer • Many more at http://archive.ics.uci.edu/ml/ • Table of Features • Example is a row • Features are discrete or continuous

  5. Weka Time - Explore • http://www.cs.waikato.ac.nz/ml/weka/ • Open Explorer • Open Data File • ARFF or CSV • Visualize All • Visualize Crosstabs
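
For readers who have not seen ARFF, the sketch below shows roughly what such a file looks like and how its header maps onto the "table of features" idea: each `@attribute` line names a feature and its type, and each row after `@data` is one example. The parser is a deliberately simplified assumption for illustration (it ignores quoted values, sparse rows, and other parts of the format), and the relation name and data rows are made up.

```python
SAMPLE_ARFF = """\
% Lines starting with a percent sign are comments.
@relation iris_sample
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
1.4,0.2,Iris-setosa
4.7,1.4,Iris-versicolor
"""

def parse_arff(text):
    """Parse a simple dense ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type) pairs, in column order
    rows = []        # each row is a list of raw string values
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lowered = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_arff(SAMPLE_ARFF)
```

Here `numeric` marks a continuous feature, while the braces list the allowed values of a discrete (nominal) one.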

  6. Discrete : Decision Trees • Reduce confusion (entropy) in the data by drawing recursive lines • Result is comprehensible to humans
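
"Reducing entropy" can be made concrete with a small calculation. The sketch below is an illustration rather than Weka's implementation: the function names and the toy rows are assumptions. It measures the entropy of a set of class labels and the information gained by splitting on one discrete feature. A tree learner repeatedly picks a high-scoring split and recurses on each branch; C4.5-style learners such as J48 use a normalized variant of this score, the gain ratio.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction from splitting the examples on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(group) / len(labels) * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder

# Toy data: feature 0 separates the classes perfectly, feature 1 not at all.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
gains = [information_gain(rows, labels, i) for i in range(2)]
```

On this toy data the first feature removes all of the uncertainty about the class and the second removes none, so a tree would split on the first.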

  7. Continuous : ANN and SVM • Artificial Neural Networks simulate activating and thresholding neurons • Support Vector Machines use a kernel to transform data to higher dimensions
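
Both bullets can be illustrated with one toy example. The sketch below is an assumption-laden illustration, not how Weka's Multilayer Perceptron or SMO work internally: `step_neuron` is a single thresholding unit, and `lift` is an explicit feature map standing in for what a kernel does implicitly. The made-up one-dimensional data cannot be separated by any single threshold on x, but becomes separable once each point is lifted to (x, x²).

```python
def step_neuron(inputs, weights, bias):
    """Fire (return 1) when the weighted sum of inputs plus bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def lift(x):
    """Map a 1-D point into 2-D: x -> (x, x squared)."""
    return (x, x * x)

def kernel(a, b):
    """Inner product of lift(a) and lift(b), computed without lifting."""
    return a * b + (a * b) ** 2

# Class 1 lies on both sides of class 0, so no threshold on x alone works.
xs = [-2.0, -0.5, 0.5, 2.0]
labels = [1, 0, 0, 1]

# In the lifted space the rule "x squared > 1" is a straight line.
predictions = [step_neuron(lift(x), (0.0, 1.0), -1.0) for x in xs]

# The kernel gives the same inner product as the explicit lift.
a, b = 0.5, 2.0
explicit = sum(p * q for p, q in zip(lift(a), lift(b)))
implicit = kernel(a, b)
```

The weights here were chosen by hand; a learning algorithm would fit them from the training examples.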

  8. Weka Time - Classify • Choose Algorithm • J48, Multilayer Perceptron, SMO • Validate Learning • Training set • Cross validation • Visualize output • ROC Curves • Precision-Recall Curves
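
The validation step can be sketched without Weka. The code below is a simplified illustration under stated assumptions: the function names are hypothetical, and the fold assignment is a plain round-robin with no shuffling or stratification. It splits example indices into k folds so that every example is tested exactly once by a model that never saw it in training, and it computes the precision and recall that a precision-recall curve plots as the decision threshold varies.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    folds = [list(range(start, n_examples, k)) for start in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [i for fold_no, fold in enumerate(folds)
                     if fold_no != held_out for i in fold]
        yield train_idx, test_idx

def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Six examples, three folds: each index appears in exactly one test fold.
splits = list(k_fold_indices(6, 3))

# Hypothetical true labels and classifier predictions.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1]
precision, recall = precision_recall(actual, predicted)
```

Evaluating on the training set alone tends to overstate accuracy, which is why cross validation is offered as a separate option.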

  9. Future Topics • Clustering • Number and makeup of categories unknown • Relational Data • Features are related within examples • Features are related across examples
