A Perspective on the Data Ajit Paul Singh M.Sc. Candidate Dept. of Computing Science University of Alberta
Machine Learning • Systems that use experience to improve at a given task • Data as experience • Supervised vs. unsupervised learning • SNP focus: supervised learning
Data Assumptions • Samples are independent and identically distributed (IID) • Dealing with patients/tuples • One set: a complex distribution, more training data • Split into subsets: many simpler distributions, less training data per problem
Defining the Task • Predictive • Diagnosing members of the public • Rare class issue • Diagnosing clinic referrals • Is the training set representative of the patients who will be tested? • Subtyping cancer patients • Feature selection • Find interesting SNPs for further study
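The rare class issue and the question of whether the training set is representative can be made concrete with Bayes' rule. The function below computes the positive predictive value (PPV), which is the fraction of positive predictions that are truly positive, from a classifier's sensitivity, specificity and the disease prevalence. The 95% sensitivity/specificity and the two prevalence values are hypothetical numbers chosen only for illustration and are not taken from the talk.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Fraction of positive predictions that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence                  # P(predict +, diseased)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(predict +, healthy)
    return true_pos / (true_pos + false_pos)


# Hypothetical settings: same classifier, different populations.
for setting, prevalence in [("general public", 0.01), ("clinic referrals", 0.30)]:
    ppv = positive_predictive_value(0.95, 0.95, prevalence)
    print(f"{setting}: prevalence={prevalence:.2f}, PPV={ppv:.2f}")
```

With these assumed numbers, the same classifier has a PPV of about 0.16 on the general public and about 0.89 on clinic referrals. A model trained and evaluated on referral data can therefore look far better than it would be as a screening tool.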
Measuring Improvement • Competitors • Human experts using clinical data • Diagnostic tests (e.g. BRCA1 truncations) • Other learners using genetic markers • Benefits of Polyomx • Accuracy, Cost, Speed • Need for a baseline to compare against
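A simple baseline to compare against is a majority-class predictor, which always outputs the most common label. The sketch below uses hypothetical labels for ten test patients and a hypothetical learner's predictions. It shows how a learner that catches the one real case can still score no better in raw accuracy than the trivial baseline.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of always predicting the most common class in `labels`."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of positions where the prediction matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical test set: 9 controls, 1 case.
actual = ["control"] * 9 + ["case"]
# Hypothetical learner: finds the case but raises one false alarm.
predicted = ["control"] * 8 + ["case", "case"]

print(majority_baseline_accuracy(actual))  # 0.9
print(accuracy(predicted, actual))         # 0.9
```

Both numbers are 0.9, so in this example accuracy alone cannot separate the learner from the baseline. Per-class measures such as sensitivity and specificity are needed to show any improvement.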
Issues to Consider • Missing data • Negative control features
Types of Missing Data • Missing Completely At Random (MCAR) • Missing At Random (MAR) • Censored
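The three kinds of missing data differ in *why* a value is absent. The sketch below simulates each mechanism on hypothetical patient records. Each record has an always-observed `age` and a simulated `months` (time to recurrence). The field names, the 0.5/0.1 missingness rates and the 60-month follow-up are illustrative assumptions, not values from the talk.

```python
import random


def make_patients(n: int, rng: random.Random) -> list[dict]:
    """Hypothetical records: age always observed, months = simulated time to event."""
    return [{"age": rng.randint(20, 80), "months": rng.expovariate(1 / 36)}
            for _ in range(n)]


def mcar(patients: list[dict], p: float, rng: random.Random) -> list[dict]:
    """MCAR: each value is dropped with the same probability p,
    independent of both observed and unobserved data."""
    return [dict(pt, months=None) if rng.random() < p else dict(pt)
            for pt in patients]


def mar(patients: list[dict], rng: random.Random) -> list[dict]:
    """MAR: the chance of missingness depends only on an observed
    attribute (here age; the rates are illustrative)."""
    out = []
    for pt in patients:
        p = 0.5 if pt["age"] >= 60 else 0.1
        out.append(dict(pt, months=None) if rng.random() < p else dict(pt))
    return out


def censor(patients: list[dict], follow_up: float) -> list[dict]:
    """Censored: no event by the end of follow-up, so only a lower
    bound (follow_up) is known for that patient's time to event."""
    return [dict(pt, months=min(pt["months"], follow_up),
                 censored=pt["months"] > follow_up)
            for pt in patients]


def missing_rate(records: list[dict]) -> float:
    """Fraction of records whose months value is missing."""
    return sum(pt["months"] is None for pt in records) / len(records)


rng = random.Random(0)
patients = make_patients(1000, rng)
for name, data in [("MCAR", mcar(patients, 0.2, rng)), ("MAR", mar(patients, rng))]:
    old = [pt for pt in data if pt["age"] >= 60]
    young = [pt for pt in data if pt["age"] < 60]
    print(f"{name}: missing age>=60 {missing_rate(old):.2f}, age<60 {missing_rate(young):.2f}")
print("censored:", sum(pt["censored"] for pt in censor(patients, 60.0)))
```

Under MCAR the missing rate should be similar in both age groups. Under MAR it is clearly higher for older patients. Censored records are not missing at all, but their value is only a lower bound, so they need different handling from either kind of missingness.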
Negative Control Features • SNPs were hand-selected • Feature selection problem • Measuring relevance of selected features • Prediction problem • Ensuring the learner is robust • Add negative control features • Features that are probably irrelevant
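One way to obtain "features that are probably irrelevant" is to add randomly permuted copies of real SNP columns. Shuffling keeps each column's genotype frequencies but breaks any link to the label. A hand-selected SNP is then considered relevant only if it scores higher than every negative control. This permutation scheme, the 0/1/2 genotype coding, the mean-difference relevance score and the simulated data are all assumptions made for this sketch, not the method used in the talk.

```python
import random


def relevance(column: list[int], labels: list[int]) -> float:
    """Absolute difference in mean genotype (0/1/2 minor-allele count)
    between cases (label 1) and controls (label 0)."""
    cases = [g for g, y in zip(column, labels) if y == 1]
    controls = [g for g, y in zip(column, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def add_negative_controls(features: dict[str, list[int]], k: int,
                          rng: random.Random) -> dict[str, list[int]]:
    """Return a copy of `features` plus k permuted copies of real columns."""
    names = list(features)
    augmented = dict(features)
    for i in range(k):
        shuffled = features[names[i % len(names)]][:]  # copy before shuffling
        rng.shuffle(shuffled)  # breaks association with the label
        augmented[f"neg_control_{i}"] = shuffled
    return augmented


rng = random.Random(42)
labels = [i % 2 for i in range(200)]  # hypothetical: 100 cases, 100 controls
# Hypothetical "snp_a" is informative by construction; "snp_b" is pure noise.
snp_a = [rng.choice([1, 2, 2]) if y else rng.choice([0, 0, 1]) for y in labels]
snp_b = [rng.choice([0, 1, 2]) for _ in labels]

features = add_negative_controls({"snp_a": snp_a, "snp_b": snp_b}, k=20, rng=rng)
scores = {name: relevance(col, labels) for name, col in features.items()}
threshold = max(s for n, s in scores.items() if n.startswith("neg_control_"))
for name in ("snp_a", "snp_b"):
    verdict = "beats all controls" if scores[name] > threshold else "no better than controls"
    print(name, round(scores[name], 3), verdict)
```

The informative `snp_a` should clearly exceed the control threshold. The noise SNP `snp_b` will usually fall within the range of the controls. The same controls can also be fed to a learner to check that its accuracy does not depend on them, which addresses the robustness point.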