Task 1 of PP Interpretation. 1.1 Further applications of boosting: This talk 1.2 Publication on boosting: Paper of Oliver Marchand submitted, but not yet published. Thunderstorm Prediction with Boosting: Verification and Implementation of a new Base Classifier. André Walser (MeteoSwiss)
Task 1 of PP Interpretation
1.1 Further applications of boosting: This talk
1.2 Publication on boosting: Paper of Oliver Marchand submitted, but not yet published
Thunderstorm Prediction with Boosting: Verification and Implementation of a new Base Classifier
André Walser (MeteoSwiss), Martin Kohli (ETH Zürich, Semester Thesis)
Overview
• Boosting Algorithm
• Impact of learn data
• Verification results
• Mapping to probability forecast
• New base classifier: decision tree
Supervised Learning
[Diagram: Historic Data → Learner → Rules → Classifier; New Data → Classifier → yes/no]
Learn data
• Data: COSMO-7 assimilation cycle; data for 79 SYNOP stations in Switzerland; at least one year, every hour; e.g. SI, CAPE, W, date, time
• Label: a thunderstorm "yes" if an appropriate ww-code was reported in the SYNOP or at least 3 lightning discharges were registered within 13.5 km of the station
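The labelling rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which ww-codes count as "appropriate", so the set below (WMO present-weather codes 17, 29 and 91 to 99) is an assumption, as are the function names, the great-circle distance, and the expectation that the discharge list is already restricted to the hour being labelled.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Assumption: ww-codes treated as thunderstorm reports. The slides only say
# "an appropriate ww-code"; this set is illustrative, not taken from the talk.
THUNDER_WW_CODES = frozenset({17, 29, *range(91, 100)})


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thunderstorm_label(station, ww_code, discharges,
                       radius_km=13.5, min_discharges=3):
    """Return True ("yes") if the ww-code indicates a thunderstorm, or if at
    least `min_discharges` lightning discharges lie within `radius_km`.

    station:    (lat, lon) of the SYNOP station
    ww_code:    reported ww-code, or None if no SYNOP message is available
    discharges: iterable of (lat, lon), assumed pre-filtered to the hour
    """
    if ww_code is not None and ww_code in THUNDER_WW_CODES:
        return True
    nearby = sum(
        1 for lat, lon in discharges
        if distance_km(station[0], station[1], lat, lon) <= radius_km
    )
    return nearby >= min_discharges
```

A missing SYNOP message is passed as `ww_code=None`, so the lightning criterion alone decides the label in that case.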
AdaBoost Algorithm
Input:
• Weighted learn samples
• Number of base classifiers M
Iteration:
1. determine base classifier G
2. calculate error, weights w
3. adapt the weights of misclassified samples
Output of the Learn process
• M base classifiers
• Threshold classifier:
AdaBoost Algorithm
Classifier:
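The three iteration steps can be sketched as standard discrete AdaBoost with threshold classifiers as base classifiers. This is a minimal sketch, not the implementation used in the talk: the exhaustive threshold search, the weight formula `alpha = log((1 - err) / err)`, and the normalised vote returned by `certainty` (a value in [0, 1] meant to play the role of C_TSTORM) are assumptions, and all function names are hypothetical.

```python
import math


def stump_predict(stump, row):
    """Threshold classifier: 1 if polarity * (row[feature] - threshold) > 0."""
    feature, threshold, polarity = stump
    return 1 if polarity * (row[feature] - threshold) > 0 else 0


def fit_stump(X, y, w):
    """Return (weighted error, stump) for the best threshold classifier."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidates: one below the minimum, then midpoints between neighbours.
        candidates = [values[0] - 1.0]
        candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in candidates:
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best


def adaboost_fit(X, y, M):
    """Train up to M weighted threshold classifiers on labels y in {0, 1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(M):
        err, stump = fit_stump(X, y, w)            # step 1: base classifier G
        if err >= 0.5:                             # no better than chance
            break
        err = max(err, 1e-10)                      # avoid log of infinity
        alpha = math.log((1 - err) / err)          # step 2: classifier weight
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != yi else wi
             for row, yi, wi in zip(X, y, w)]      # step 3: boost the misses
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((alpha, stump))
    return model


def certainty(model, row):
    """Assumed form of the output: weighted share of 'yes' votes in [0, 1]."""
    if not model:
        raise ValueError("empty model")
    total = sum(alpha for alpha, _ in model)
    yes = sum(alpha for alpha, s in model if stump_predict(s, row) == 1)
    return yes / total
```

No single threshold can separate a pattern such as `y = [1, 1, 0, 0, 1, 1]` on one feature, but three boosted threshold classifiers can, which is the point of combining weak base classifiers.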
Output of the Classifier: C_TSTORM
[Figure: panels for 17, 18 and 19 UTC; two panels are marked "Biased!"]
Reason: Inappropriate learn data…
• SYNOP messages contain events and non-events, but are only available every 3 hours (most messages for 6, 12, 18 UTC).
• Lightning data contain only events.
New learn data sets
• B – biased: SYNOP messages; only events from lightning data
• F – full: SYNOP messages; all missing values are considered as non-events
• AL1 – at least 1: SYNOP messages; when lightning data show at least 1 event, all non-missing values are considered as non-events
Without bias…
[Figure: panels for 17, 18 and 19 UTC]
Verification
• POD and FAR for different C_TSTORM values between 0.3 and 0.6; FAR = False Alarms / #Alarms
• Learn data: Model: COSMO-7 assimilation cycle Jun 06 – May 07; Obs: B / AL1 / F
• Verification data: Model: COSMO-7 forecasts July 06 and May/June 07; Obs: F
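For reference, the two scores can be computed from a contingency count as in the sketch below. It uses FAR as defined on the slide (false alarms divided by the number of alarms) and the usual POD definition (hits divided by the number of observed events). Treating `C_TSTORM >= threshold` as an alarm is an assumption, and the function name is hypothetical.

```python
def pod_far(certainties, observed, threshold):
    """Return (POD, FAR) when an alarm is issued for certainty >= threshold.

    certainties: classifier outputs (C_TSTORM), one per sample
    observed:    truthy where a thunderstorm was observed
    """
    hits = misses = false_alarms = 0
    for c, obs in zip(certainties, observed):
        alarm = c >= threshold
        if alarm and obs:
            hits += 1
        elif alarm:
            false_alarms += 1
        elif obs:
            misses += 1
    events = hits + misses
    alarms = hits + false_alarms
    pod = hits / events if events else float("nan")        # hits / #events
    far = false_alarms / alarms if alarms else float("nan")  # FA / #alarms
    return pod, far
```

Evaluating this for several thresholds between 0.3 and 0.6 gives one (POD, FAR) pair per threshold. Raising the threshold generally lowers POD, since fewer alarms are issued.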
Verification: earlier results
• Results reported last year for 2005: POD = 72%, FAR = 34%
• Unfortunately not realistic: verification was done with obs data B
[Figure: July 2006, ~7% events; random forecast shown for comparison]
Comparison with other system
• DWD Expert-System, period April 2006 - September 2006: POD = 0.346, FAR = 0.740
Mapping to a probability forecast
Polynomial fit in a reliability diagram: PC_TSTORM
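Mapping to a probability forecast
$$
\mathrm{PC\_TSTORM} =
\begin{cases}
0 & \text{if } x \le 0.4 \\
a x^2 + b x + c & \text{if } 0.4 < x < 0.6 \\
a \cdot 0.6^2 + b \cdot 0.6 + c & \text{if } x \ge 0.6
\end{cases}
$$
Limited resolution: The system predicts probabilities only between 0 and ~40%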
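The piecewise mapping translates directly into code. In this sketch `x` is taken to be the classifier output C_TSTORM, which the slide does not state explicitly. The coefficients in the usage note are hypothetical, since the fitted values of a, b and c are not given in the slides.

```python
def probability_from_certainty(x, a, b, c, lower=0.4, upper=0.6):
    """Map a certainty x to a probability with the piecewise quadratic:
    0 below `lower`, a*x^2 + b*x + c in between, constant above `upper`."""
    if x <= lower:
        return 0.0
    x_eff = min(x, upper)  # for x >= upper the value at `upper` is reused
    return a * x_eff ** 2 + b * x_eff + c
```

With the hypothetical coefficients a = 10, b = -8, c = 1.6 (chosen only so that the curve starts at 0 for x = 0.4 and reaches 0.4 at x = 0.6), every certainty of 0.6 or more maps to the same probability of about 40%, which is the limited resolution noted above.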
New Base Classifier: Decision Tree
[Diagram: a single threshold classifier with outputs 1 and 0]
New Base Classifier: Decision Tree
[Diagram: threshold classifier 1 splits into class 1 and class 0; each branch leads to a further threshold classifier (2 and 3), each with outputs 0 and 1]
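The structure in the diagram can be sketched as a depth-two tree built from three threshold classifiers. Which branch leads to classifier 2 and which to classifier 3 is not recoverable from the slide, so the routing below is an assumption. The class names, feature names and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Threshold classifier: 1 if sample[feature] > threshold, else 0."""
    feature: str
    threshold: float

    def __call__(self, sample):
        return 1 if sample[self.feature] > self.threshold else 0


@dataclass(frozen=True)
class DepthTwoTree:
    """The root classifier picks which of two further threshold classifiers
    produces the final 0/1 output (assumed routing)."""
    root: Threshold
    if_one: Threshold   # used when the root outputs 1
    if_zero: Threshold  # used when the root outputs 0

    def __call__(self, sample):
        branch = self.if_one if self.root(sample) == 1 else self.if_zero
        return branch(sample)


# Hypothetical thresholds, for illustration only.
tree = DepthTwoTree(
    root=Threshold("CAPE", 500.0),
    if_one=Threshold("W", 0.1),
    if_zero=Threshold("W", 0.5),
)
```

Unlike a single threshold classifier, the tree can apply a different threshold on one indicator depending on the value of another, so the same value of `W` can give a different class for high and low `CAPE`.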
Conclusions & Outlook
• Boosting
  • is a simple, efficient and effective machine learning method for model post-processing
  • is completely general
  • can employ a number of redundant indicators
  • computes a certainty of the classification, mapped to a probability forecast
• First verification results promising, extended verification required
• Benefit of decision trees?