Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning Jian Zhang Supervised by: Karen Petrie
Background • Cancer research has become an extremely data-rich environment. • Many analysis packages are available for analyzing the data. • The raw data must first be preprocessed.
Rich data environment • Several factors are recorded for breast cancer
Raw clinical data sample • Yes-No data: yes: yes, Yes, Ye, yed, yef …; no: No, n, not …; null: don’t know, no data, waiting for lab • Positive-Negative data: Positive: +, ++, p, p++ …; Negative: -, n, neg, n--- …; Null: no data, ruined sample, waiting for lab
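The obvious manual fix is a hand-written lookup table. The sketch below is plain Python; the table entries are the Yes-No variants listed on this slide, while the canonical labels and the `UNMAPPED` marker are illustrative assumptions. It shows why the manual approach does not scale: any spelling that was not typed into the table in advance falls through.

```python
# Hypothetical hand-written mapping built from the Yes-No variants on the slide.
# The canonical labels ("yes", "no", None for null) are illustrative choices.
YES_NO_MAP = {
    "yes": "yes", "Yes": "yes", "Ye": "yes", "yed": "yes", "yef": "yes",
    "No": "no", "n": "no", "not": "no",
    "don’t know": None, "no data": None, "waiting for lab": None,
}

UNMAPPED = "UNMAPPED"  # marker for spellings the table has never seen


def clean_yes_no(raw: str):
    """Return the canonical label for a raw entry, or UNMAPPED if unknown."""
    return YES_NO_MAP.get(raw.strip(), UNMAPPED)


if __name__ == "__main__":
    for raw in ["Yes", "yed", "waiting for lab", "YES", "nope"]:
        print(repr(raw), "->", clean_yes_no(raw))
```

Every new variant (such as "YES" or "nope") needs another manual entry, which is the motivation for learning the mapping instead.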
Question: Can this cleaning process be automated?
Introduction • Decision Tree learning • Weka
Decision Tree Learning • Decision tree learning is a method for approximating discrete-valued target functions and is one of the most popular inductive inference algorithms.
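As a minimal sketch of how a decision tree could learn such a mapping, the code below implements ID3-style learning (choose the attribute with the highest information gain, then recurse) in plain Python. The features `first_char` and `has_space`, the `features` helper, and the `"null"` label are hypothetical choices made for illustration, not the features used in this project. Weka's own tree learners (for example J48, its C4.5 implementation) differ in details such as gain ratio and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a tree: leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    # "default" is used when a test value never appeared at this node.
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


def features(raw):
    """Hypothetical features extracted from a raw entry."""
    text = raw.strip()
    return {"first_char": text[:1].lower(), "has_space": " " in text}


TRAIN = [("yes", "yes"), ("Yes", "yes"), ("Ye", "yes"), ("yed", "yes"),
         ("yef", "yes"), ("No", "no"), ("n", "no"), ("not", "no"),
         ("don’t know", "null"), ("no data", "null"), ("waiting for lab", "null")]

rows = [features(raw) for raw, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = id3(rows, labels, ["first_char", "has_space"])

if __name__ == "__main__":
    for raw in ["YES", "nope", "no lab", "pending"]:
        print(repr(raw), "->", predict(tree, features(raw)))
```

On this toy data the root splits on `first_char`, and the "n" branch then splits on `has_space` to separate "no" from "no data". An entry whose first character never occurred in training, such as "pending", falls back to the root's majority label ("yes"), which is one way a tree can mishandle unknown entries.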
Weka • Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, which contains a collection of algorithms for data analysis and predictive modeling.
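Weka usually reads data in its ARFF text format (an `@relation` line, `@attribute` declarations, then `@data` rows). Assuming the same kind of hypothetical features as above (`first_char` and `has_space`, neither taken from this project), a small Python helper could export labelled raw entries as nominal attributes that a Weka tree learner can load. The relation name `raw_yes_no` is also an illustrative assumption.

```python
def arff_quote(value: str) -> str:
    """Quote an ARFF value, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def features(raw: str) -> tuple:
    """Hypothetical features: lower-cased first character, and whether there is a space."""
    text = raw.strip()
    return text[:1].lower(), "true" if " " in text else "false"


def to_arff(relation: str, examples: list) -> str:
    """examples: list of (raw value, canonical label) pairs."""
    rows = [features(raw) + (label,) for raw, label in examples]
    first_chars = sorted({r[0] for r in rows})
    classes = sorted({r[2] for r in rows})

    def nominal(values):
        # Nominal attribute declaration, e.g. {'no','yes'}
        return "{" + ",".join(arff_quote(v) for v in values) + "}"

    lines = [
        f"@relation {relation}",
        "",
        f"@attribute first_char {nominal(first_chars)}",
        "@attribute has_space {'true','false'}",
        f"@attribute class {nominal(classes)}",
        "",
        "@data",
    ]
    lines += [",".join(arff_quote(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    train = [("yes", "yes"), ("Yes", "yes"), ("n", "no"), ("no data", "null")]
    print(to_arff("raw_yes_no", train))
```

Quoting every value keeps entries such as "+" or "waiting for lab" safe; the last attribute is declared as the class so it can serve as the prediction target.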
Experiment • Data: a training dataset with 100 instances and a test dataset with 100 instances, which contains 17 values that do not appear in the training dataset • Tool: Weka
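Under the reading that the 17 values are distinct raw entries present in the test data but absent from the training data, that count could be computed with a set difference. This is a minimal Python sketch; the function name and the sample values are hypothetical.

```python
def unseen_values(train_values, test_values):
    """Return (distinct test values absent from training,
    number of test instances carrying such values)."""
    known = set(train_values)
    distinct = sorted(set(test_values) - known)
    instances = sum(1 for v in test_values if v not in known)
    return distinct, instances


if __name__ == "__main__":
    # Hypothetical raw entries, not the project's datasets.
    train = ["yes", "Yes", "n"]
    test = ["Yes", "YES", "nah", "YES"]
    print(unseen_values(train, test))
```

Reporting both numbers helps, because a few distinct unseen spellings can still cover many test instances.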
Experiment • Experiment 1: training dataset • Experiment 2: training dataset and test dataset
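The two set-ups can be sketched as follows, assuming Experiment 1 scores the model on the same data it was trained on and Experiment 2 trains on the training set and scores on the test set. The stand-in learner `fit_lookup` (memorise training pairs, otherwise predict the majority label) and the sample data are hypothetical; any learner with the same `fit(train) -> model` shape could be passed in instead.

```python
from collections import Counter


def accuracy(model, data):
    """Fraction of (raw, label) pairs the model labels correctly."""
    return sum(model(raw) == label for raw, label in data) / len(data)


def fit_lookup(train):
    """Hypothetical stand-in learner: memorise pairs, fall back to the majority label."""
    table = dict(train)
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda raw: table.get(raw, majority)


def run_experiments(fit, train, test):
    model = fit(train)
    return {
        "experiment_1_train": accuracy(model, train),  # scored on training data
        "experiment_2_test": accuracy(model, test),    # scored on held-out data
    }


if __name__ == "__main__":
    train = [("yes", "yes"), ("Yes", "yes"), ("n", "no"), ("no data", "null")]
    test = [("Yes", "yes"), ("YES", "yes"), ("nah", "no"), ("lab pending", "null")]
    print(run_experiments(fit_lookup, train, test))
```

Scoring on the training data tends to overstate performance, which is why the held-out test set in Experiment 2 is the more informative measure.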
Result • The results show that the decision tree classifies and predicts entries that already exist in the training data well, but its predictions for unknown entries are not as good as expected.
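To make the contrast between existing and unknown entries measurable, test accuracy can be reported separately for raw values that occurred in training and for those that did not. This is a minimal Python sketch; the function name, the stand-in predictor, and the data are hypothetical.

```python
def accuracy_by_novelty(predict, train, test):
    """Score test entries separately by whether their raw value occurred in training."""
    seen_raw = {raw for raw, _ in train}
    hits = {"seen": [], "unseen": []}
    for raw, label in test:
        hits["seen" if raw in seen_raw else "unseen"].append(predict(raw) == label)
    # None marks a group with no test entries.
    return {group: (sum(h) / len(h) if h else None) for group, h in hits.items()}


if __name__ == "__main__":
    train = [("yes", "yes"), ("n", "no")]
    test = [("yes", "yes"), ("n", "no"), ("nah", "no"), ("YES", "yes")]
    table = dict(train)
    # Hypothetical stand-in predictor: exact lookup, defaulting to "yes".
    print(accuracy_by_novelty(lambda raw: table.get(raw, "yes"), train, test))
    # prints {'seen': 1.0, 'unseen': 0.5}
```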
Future work • Identify and correct incorrect predictions in the process • Automate the transformation of unknown entries