GECCO 2013 Industrial Competition
Computer Engineering Lab, School of Electrical and IT Engineering
Rommel Vergara

Introduction
• Machine learning algorithm used: Kernel Recursive Least Squares (KRLS)
• Used an open-source C++ library, dlib (http://dlib.net), which has a full implementation of the KRLS algorithm.
• Challenges:
  • Aperiodic and missing data samples
  • Selection of the feature set to describe the temperature and humidity values

Data Preprocessing
• All data sets contained missing data and were aperiodic in nature.
• SOLUTION:
  • Missing data: missing values were linearly interpolated between their adjacent available data points.
  • Aperiodic nature: data was standardized into 10-minute intervals (as required by the competition format).
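
The two preprocessing steps above can be sketched together as one resampling pass. This is a minimal illustration rather than the code used for the competition entry: the function name `resample_linear`, the `(timestamp, value)` tuple layout, and the assumption that the samples are sorted, have distinct timestamps, and cover the requested range are all hypothetical.

```python
from datetime import datetime, timedelta


def resample_linear(samples, start, end, step=timedelta(minutes=10)):
    """Linearly interpolate irregular (timestamp, value) samples onto a regular grid.

    Assumes `samples` is sorted by timestamp, holds at least two points with
    distinct timestamps, and covers the whole range [start, end].
    """
    grid = []
    i = 0
    t = start
    while t <= end:
        # Advance to the segment [samples[i], samples[i + 1]] that contains t.
        while i + 2 < len(samples) and samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
        grid.append((t, v0 + frac * (v1 - v0)))
        t += step
    return grid


# Irregular readings: the 00:10 and 00:20 grid points are missing.
raw = [
    (datetime(2013, 2, 12, 0, 0), 10.0),
    (datetime(2013, 2, 12, 0, 7), 17.0),
    (datetime(2013, 2, 12, 0, 30), 40.0),
]
regular = resample_linear(raw, raw[0][0], raw[-1][0])
```

Here `regular` holds four points at 00:00, 00:10, 00:20 and 00:30, with the two missing grid points filled in from the straight line between the 00:07 and 00:30 readings.
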

Data Preprocessing
• The data set was also narrowed to contain only the target weekdays: Tuesday, Wednesday and Thursday.
• This captures minor, weekday-specific changes more accurately.
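
Narrowing the data to the target weekdays could look like the sketch below. The helper name `keep_target_weekdays` and the `(timestamp, value)` layout are assumptions made for illustration; the slides do not say how the filtering was implemented.

```python
from datetime import datetime

# datetime.weekday() numbers Monday as 0, so Tue=1, Wed=2, Thu=3.
TARGET_WEEKDAYS = {1, 2, 3}


def keep_target_weekdays(samples):
    """Keep only the (timestamp, value) samples that fall on Tue, Wed or Thu."""
    return [(t, v) for t, v in samples if t.weekday() in TARGET_WEEKDAYS]


# One reading per day, Monday 18/02/2013 to Friday 22/02/2013.
week = [(datetime(2013, 2, day, 12, 0), float(day)) for day in range(18, 23)]
kept = keep_target_weekdays(week)
```

With this input, `kept` contains the readings for 19, 20 and 21 February only.
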

Data Preprocessing
• [Figure not reproduced; surviving labels: Tue, Tue, Wed, Thu]

Feature Set Selection
• The "feature set" is a selection of inputs that contribute to and explain the output.
• It is important to choose a feature set that describes the outputs being predicted.
• There are many ways to represent a feature set for a given output.
• Each feature set is represented as a column vector and fed into the KRLS algorithm.

Temperature Feature Set
• The temperature feature set chosen contained 145 values:
  • Current weather value
  • 144 temperature values: a rolling window, in 10-minute steps, over the previous and current weekday, lagged by 1 week
• REASONING:
  • Remove noise that exists in other weekdays
  • Let KRLS focus on the specific weekdays that were judged in the competition

Temperature Feature Set
• Example:
  • To predict the following data point:
    • 20/02/2013 00:00 (Wednesday)
  • The following temperature values were used:
    • 12/02/2013 00:00 (Tuesday) to 12/02/2013 23:50 (Tuesday)
  • To predict the next data point:
    • 20/02/2013 00:10 (Wednesday)
  • The following temperature values were used (10-minute rolling window):
    • 12/02/2013 00:10 (Tuesday) to 13/02/2013 00:00 (Wednesday)
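
The window in the example above can be reproduced with a little date arithmetic: the 144 samples are the 24 hours that end 10 minutes before the point exactly one week ahead of the target. The sketch below is one reading of the slides, not the original implementation. The names `lagged_window_times` and `temperature_features`, the dictionary lookup of temperatures by timestamp, and the choice to put the current weather value first in the vector are all assumptions.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)
WINDOW = 144  # 24 hours of 10-minute samples


def lagged_window_times(target):
    """Timestamps of the 144 lagged samples used to predict `target`."""
    start = target - timedelta(days=7) - WINDOW * STEP
    return [start + k * STEP for k in range(WINDOW)]


def temperature_features(target, series, current_weather):
    """Build the 145-value feature vector for one target timestamp.

    `series` maps each 10-minute grid timestamp to a temperature.
    The ordering (current weather value first) is an assumption.
    """
    return [current_weather] + [series[t] for t in lagged_window_times(target)]


first = lagged_window_times(datetime(2013, 2, 20, 0, 0))
second = lagged_window_times(datetime(2013, 2, 20, 0, 10))
```

For the two targets on the slide, `first` runs from 12/02/2013 00:00 to 12/02/2013 23:50 and `second` from 12/02/2013 00:10 to 13/02/2013 00:00, matching the example.
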

Humidity Feature Set
• The humidity feature set chosen contained 2 values:
  • Current weather value
  • Predicted KRLS temperature value
• This proved ineffective, giving an RMSE of 0.12 in the competition.
• CHALLENGE: The humidity data set was aperiodic and was observed to have discrete-like behaviour.
• IMPROVEMENT: Given a more continuous data set, I would have used the same technique as for the temperature feature set: a rolling window of previous and current weekday humidity values, lagged by 1 week.
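
For reference, the RMSE figure quoted above is assumed here to follow the standard root-mean-square-error definition; the slides do not state how the competition scaled or normalized the values before scoring. A minimal sketch, with the function name `rmse` chosen for illustration:

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the predictions track the observed series more closely, and a perfect prediction scores 0.
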

References
• Yaakov Engel, Shie Mannor, Ron Meir. "The Kernel Recursive Least Squares Algorithm" (2003).
• dlib library: http://dlib.net
• Contact:
  • Rommel Vergara (University of Sydney)
  • rommel_vergara@hotmail.com