Predictive Modeling in Data Management
Byung S. Lee
Computer Science, University of Vermont
http://www.emba.uvm.edu/~bslee/homepage/
Cost UDF Overview
• Funding: US Department of Energy.
• Title: Generating Cost Functions of User-Defined Functions.
• Phase 1: preliminary studies.
• Phase 2: core modeling techniques.
• Phase 3: applications.
Problem
• How long would this one take to run?
[Figure: a UDF and its cost UDF]
Phase 1
• Approaches:
  • Off-line training with cost data sets generated in a single batch.
  • On-line training with cost data sets generated in incremental batches (a.k.a. self-tuning).
• Models:
  • Parametric or nonparametric regression.
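A minimal sketch of the parametric case, assuming a linear cost model with a single input feature (for example, the number of time-series points a UDF scans). The class name `LinearCostModel` and the choice of feature are illustrative assumptions, not taken from the slides. Because the model keeps only running sums, one object covers both approaches above: call `add_batch` once for off-line training, or call it again for each new batch to retune the model on-line without keeping earlier data.

```python
from dataclasses import dataclass


@dataclass
class LinearCostModel:
    """Hypothetical parametric cost model: cost ~ a + b * x for one feature x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add_batch(self, xs, ys):
        # Accumulate least-squares sufficient statistics; old batches are not stored.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form ordinary least-squares solution for a single feature.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x
```

For instance, after `add_batch([1, 2, 3], [3, 5, 7])` the fitted coefficients are a = 1 and b = 2, so `predict(10)` gives 21; a later `add_batch([4], [9])` updates the same statistics in place.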
Phase 1
• UDFs:
  • Financial time series aggregate functions:
    • median(time series, start date, end date)
    • nth moving window average(time series, start date, end date, window size)
  • Keyword-based text search functions:
    • “dog AND cat”
    • “dog OR cat”
    • “dog” and “cat” within w words of each other
  • Spatial search operators:
    • range(ref_point, distance)
    • Window(lower_left_point, upper_right_point)
    • KNN(ref_point, K)
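To make "cost data set" concrete, here is a hypothetical sketch of the median UDF and of one way to turn a single call into a cost data point. It assumes the time series is a list of `(date, value)` pairs, that the date range is inclusive, and that the number of points in the range is the input feature; none of these details come from the slides, and the function names are illustrative.

```python
import statistics
import time
from datetime import date


def median_udf(series, start, end):
    """median(time series, start date, end date) over points with start <= d <= end."""
    values = [v for d, v in series if start <= d <= end]
    if not values:
        raise ValueError("no points in the date range")
    return statistics.median(values)


def cost_data_point(series, start, end):
    """Run the UDF once and return (features, elapsed wall-clock seconds).

    Using the number of points in the range as the only feature is an assumption.
    """
    n_points = sum(1 for d, _ in series if start <= d <= end)
    t0 = time.perf_counter()
    median_udf(series, start, end)
    elapsed = time.perf_counter() - t0
    return {"n_points": n_points}, elapsed
```

Calling `cost_data_point` over many date ranges yields a batch of (feature, cost) pairs of the kind the Phase 1 regression models are trained on.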
Phase 2
• Approaches:
  • On-line training with cost data points generated one at a time.
  • Limited main memory is assumed.
• Models:
  • Nonparametric techniques using multidimensional index structures.
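As one illustration of a multidimensional index structure that respects a memory limit, the sketch below keeps per-node running sums of observed costs in a quadtree over a 2-D feature space and splits a leaf only while a node budget remains. It is a hypothetical design (the names `MemoryLimitedQuadtree`, `split_threshold`, and `max_nodes` are assumptions), not the Phase 2 technique itself, and it assumes every point falls inside the given bounding box. Raw points are never stored, so memory grows with the number of nodes rather than the number of data points.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    x0: float
    y0: float
    x1: float
    y1: float
    count: int = 0        # cost data points that passed through this node
    total: float = 0.0    # sum of their costs
    children: Optional[List["Node"]] = None

    def child_for(self, x, y):
        # Children are ordered: low-x/low-y, high-x/low-y, low-x/high-y, high-x/high-y.
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]


class MemoryLimitedQuadtree:
    def __init__(self, bounds, split_threshold=8, max_nodes=1000):
        self.root = Node(*bounds)  # bounds = (x0, y0, x1, y1)
        self.split_threshold = split_threshold
        self.max_nodes = max_nodes
        self.num_nodes = 1

    def insert(self, x, y, cost):
        # Update aggregates along the root-to-leaf path.
        node = self.root
        while True:
            node.count += 1
            node.total += cost
            if node.children is None:
                break
            node = node.child_for(x, y)
        # Split the leaf once it reaches split_threshold points, if four more nodes fit.
        if node.count >= self.split_threshold and self.num_nodes + 4 <= self.max_nodes:
            mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
            node.children = [
                Node(node.x0, node.y0, mx, my),
                Node(mx, node.y0, node.x1, my),
                Node(node.x0, my, mx, node.y1),
                Node(mx, my, node.x1, node.y1),
            ]
            self.num_nodes += 4

    def predict(self, x, y):
        # Mean cost of the deepest node on the query's path that has seen any data.
        node, best = self.root, None
        while node is not None:
            if node.count > 0:
                best = node.total / node.count
            node = node.child_for(x, y) if node.children is not None else None
        return best
```

Newly created children start empty, so predictions in a freshly split region fall back to the nearest ancestor's mean until data arrives there.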
Phase 2
• Core modeling techniques:
  • Incremental edited k-nearest neighbors.
  • Memory-limited quadtrees.
• Dr. Zhen He will give a quick overview of recent Phase 2 efforts.
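The slide names incremental edited k-nearest neighbors without giving details. The sketch below assumes one simple editing rule: a new cost data point is stored only when the current model mispredicts it by more than a relative tolerance, and the oldest stored point is evicted once a fixed capacity is reached. Both the rule and the FIFO eviction are assumptions for illustration (the rule is closer in spirit to Hart's condensed nearest neighbor than to Wilson's editing), and the class name is hypothetical; this is not the method presented in the talk.

```python
import math
from collections import deque


class IncrementalEditedKNN:
    """Hypothetical sketch: kNN cost regression with an error-driven editing rule."""

    def __init__(self, k=3, tolerance=0.1, capacity=500):
        self.k = k
        self.tolerance = tolerance            # relative error at or below which a point is discarded
        self.points = deque(maxlen=capacity)  # deque drops the oldest point when full

    def predict(self, x):
        # Mean cost of the k stored points closest to x (Euclidean distance).
        if not self.points:
            return None
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)

    def observe(self, x, cost):
        """Process one cost data point; return True if it was stored."""
        guess = self.predict(x)
        if guess is not None and abs(guess - cost) <= self.tolerance * abs(cost):
            return False  # already predicted well enough, so edit it out
        self.points.append((tuple(x), cost))
        return True
```

Points are processed one at a time and memory never exceeds `capacity` stored points, matching the Phase 2 constraints of on-line training under limited main memory.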
Phase 3
• Additional core modeling techniques.
• Abstraction of the problem to “efficient adaptive predictive modeling of incremental data.”
• Applications that need:
  • Value predictions.
  • Class predictions.
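Value and class predictions can share the same nearest-neighbor search and differ only in how the neighbors are combined: averaging for a value, majority vote for a class. A minimal sketch, assuming data points are `(feature_tuple, target)` pairs; the function names and any labels such as "fast"/"slow" are hypothetical.

```python
import math
from collections import Counter


def k_nearest(data, x, k):
    # The k data points closest to x (Euclidean distance); ties keep input order.
    return sorted(data, key=lambda p: math.dist(p[0], x))[:k]


def predict_value(data, x, k=3):
    """Value prediction: mean target of the k nearest points."""
    nearest = k_nearest(data, x, k)
    return sum(t for _, t in nearest) / len(nearest)


def predict_class(data, x, k=3):
    """Class prediction: most common label among the k nearest points."""
    nearest = k_nearest(data, x, k)
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```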