Weka: An open-source tool for data analysis and mining with machine learning
Quantitative Data Analysis Colloquium, Centenary College of Louisiana
Mark Goadrich, 4/17/2008
Regression lines and correlation
• Find relationship between two attributes
• Correlation coefficient
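As a minimal sketch of the two ideas on this slide, the following computes a Pearson correlation coefficient and a least-squares regression line for two attributes. The function names and the sample values are hypothetical and made up for illustration; Weka computes such statistics through its own interface.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def regression_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values for two attributes (not taken from any real data set).
attribute_a = [1.0, 2.0, 3.0, 4.0, 5.0]
attribute_b = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(attribute_a, attribute_b)
slope, intercept = regression_line(attribute_a, attribute_b)
print(f"r = {r:.3f}, line: y = {slope:.3f} * x + {intercept:.3f}")
```

A value of r near 1 or -1 suggests a strong linear relationship between the two attributes; a value near 0 suggests little linear relationship.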
Categorization
• Can we learn one category based on the others?
• This search for classification lines is called machine learning
Data Sets
• House of Representatives Votes
• Labor Relations
• Iris (plant) Discrimination
• Breast Cancer
• Many more at http://archive.ics.uci.edu/ml/
• Table of Features
• Example is a row
• Features are discrete or continuous
Weka Time - Explore
• http://www.cs.waikato.ac.nz/ml/weka/
• Open Explorer
• Open Data File
• ARFF or CSV
• Visualize All
• Visualize Crosstabs
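For readers who have not seen ARFF before, here is a rough sketch of how a small table of features could be written out in that format: a relation name, one @attribute line per feature (numeric, or a set of nominal values in braces), then one comma-separated row per example under @data. The helper name to_arff and the toy table are assumptions made up for illustration, and the sketch skips quoting, missing values, and other attribute types.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from a table of features.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" for a continuous feature or a list of allowed values
    for a discrete (nominal) feature.
    rows: one tuple per example, in the same order as attributes.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"

# Hypothetical toy table: one continuous feature, one discrete category.
arff_text = to_arff(
    "toy_weather",
    [("temperature", "numeric"), ("play", ["yes", "no"])],
    [(85, "no"), (70, "yes")],
)
print(arff_text)
```

Saving the returned text with an .arff extension should give a file the Explorer can open, provided the values contain no spaces or commas.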
Discrete: Decision Trees
• Reduce confusion (entropy) in the data by drawing recursive lines
• Result is comprehensible to humans
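The "reduce confusion" step can be sketched as follows: measure the entropy of the category labels, then measure how much it drops after splitting on one discrete feature. This is plain information gain under made-up names and data; it is an illustration of the idea, not necessarily the exact criterion any particular Weka learner applies.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction from splitting the examples on one discrete feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the groups left after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical examples: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no"]

for index, name in enumerate(["outlook", "windy"]):
    print(name, information_gain(rows, labels, index))
```

A tree learner would pick the feature with the largest gain, split on it, and repeat the same calculation inside each resulting group.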
Continuous: ANN and SVM
• Artificial Neural Networks simulate activating and thresholding neurons
• Support Vector Machines use a kernel to transform data to higher dimensions
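Both bullets can be illustrated with a toy sketch. The neuron function below fires when a weighted sum of its inputs crosses a threshold, and lift is an explicit map from one dimension to two. Note the assumptions: real SVMs normally apply a kernel function implicitly rather than computing such a map, and the names, weights, and data here are made up for illustration.

```python
def neuron(inputs, weights, bias):
    """Return 1 when the weighted sum of inputs plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def lift(x):
    """Explicit feature map from 1-D to 2-D: x -> (x, x squared)."""
    return (x, x * x)

# Hypothetical 1-D data: class 1 lies on both sides of class 0,
# so no single threshold on x alone separates the two classes.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

# In the lifted space, the line "x squared = 2.5" separates them,
# and a single thresholding neuron can represent that line.
predictions = [neuron(lift(x), weights=(0.0, 1.0), bias=-2.5) for x in points]
print(predictions == labels)
```

The point of the sketch is that data which cannot be split by a line in its original space may become separable after a transformation to more dimensions.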
Weka Time - Classify
• Choose Algorithm
• J48, Multilayer Perceptron, SMO
• Validate Learning
• Training set
• Cross validation
• Visualize output
• ROC Curves
• Precision-Recall Curves
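To make the validation step concrete, here is a small sketch of k-fold cross-validation splitting and of the precision and recall figures that underlie a precision-recall curve. The function names are hypothetical, and the split is a simple unshuffled, unstratified one chosen for brevity; it is not a description of how Weka builds its folds.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th example, starting at offset fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def precision_recall(actual, predicted, positive):
    """Precision and recall for one category treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Each example is held out for testing exactly once across the folds.
for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)

# Hypothetical true and predicted categories for four examples.
print(precision_recall(["y", "y", "n", "n"], ["y", "n", "y", "n"], "y"))
```

Evaluating only on the training set tends to overstate accuracy, which is why holding out each fold in turn gives a more honest estimate.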
Future Topics
• Clustering
• Number and makeup of categories unknown
• Relational Data
• Features are related within examples
• Features are related across examples