Introducing Scalability into Smart Grid • Presented by Vasileios Zois • CS at USC • 09/20/2013
Smart Grid Project Services • Manage Data • Sparse Data • Heterogeneous Data • Semantic Representation • Train Prediction Models • Data Intensive Application • On Demand Procedure • Make Prediction & Update Models • Fast Access to Trained Models • Update with New Values
Steps to Scalability • Management of Data • Choose Underlying Technology • Evaluate Provided Services • Training of Models • Design Training Tools • Take Advantage of Infrastructure • Give Efficient Solutions to Training • Access & Update Training Models • Update: Change Invariants that Affect Prediction • Do it Efficiently
Managing Data • Requirements • Efficient Usage of Storage • Client Access to Data • Semantic Organization of Data • Possible Solutions • Distributed File System (HDFS) • Raw Data • Work out a Structure (XML, Ontology Schemas) • Column-Oriented NoSQL Systems (HBase, Cassandra) • Structure Offered: Column Families • Implemented Operations • Still Needs Reasoning Operations
Prediction Models • Regression Tree • Supports Features • Tree Building • Scalable Implementation: OpenPlanet • ARIMA Model • Short Term Prediction • Does Not Support Features? • On Demand Training • Small Prediction Window
Scalable Prediction • Brute Force • Efficient Use of Resources • Build a System from Scratch • Decrease Problem Size • Group Data and Pick Representatives • Clustering of Data with Similar Features • Introduce Features into ARIMA Model • Use Features to Cluster Data • Execute Model on Clustered Data • Customer → SuperCustomer
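The problem-size reduction on this slide can be sketched roughly as follows: customers with similar features are grouped, and each group is collapsed into one aggregate load series (a "SuperCustomer") on which a single model would then be trained. The bucketing rule, the mean aggregation, and the names `group_key` and `build_super_customers` are hypothetical stand-ins for a real clustering step, not the project's actual method.

```python
from collections import defaultdict

def group_key(features, step=10.0):
    # Hypothetical stand-in for clustering: bucket each feature value.
    return tuple(round(f / step) for f in features)

def build_super_customers(customers, step=10.0):
    """customers: dict id -> (features, load_series).
    Returns dict bucket -> (member ids, mean load series)."""
    groups = defaultdict(list)
    for cid, (features, series) in customers.items():
        groups[group_key(features, step)].append((cid, series))
    supers = {}
    for key, members in groups.items():
        ids = sorted(cid for cid, _ in members)
        n = len(members)
        length = len(members[0][1])  # assumes equally long series
        mean_series = [sum(s[t] for _, s in members) / n for t in range(length)]
        supers[key] = (ids, mean_series)
    return supers
```

A prediction model (for example ARIMA) would then be fitted once per SuperCustomer series instead of once per customer, which is where the reduction in training cost would come from.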
Parallel Clustering • Problem • Computationally Expensive • High Dimensional • Inevitable Parallelization • Challenges to Parallelization • Partitioning of Data to Achieve Load Balance • Reduction of the Communication Cost • Approaches • Hierarchical Clustering: PBirch • Evolutionary Strategies Clustering • Density Based Clustering: PDBSCAN • Model Based Clustering: AutoClass System
Parallel Hierarchical Clustering • PBirch • Single Program Multiple Data (SPMD) • Message Passing Interface (MPI) • Steps • Distribute Data Equally • Build Tree on Each Processor • Execute Clustering on Leaf Nodes: Parallel K-means • Results • Linear Speedup • Increased Communication Latency • http://www.cs.gsu.edu/~wkim/index_files/papers/pbirch.pdf
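The "distribute equally, work locally, then combine" pattern behind the parallel K-means step can be sketched as below. This is not PBirch itself: tree building is omitted, and the partitions are processed in a plain loop where an SPMD program would run one `local_step` per MPI rank and exchange only the partial sums. All function names are illustrative assumptions.

```python
def split_equally(points, n_parts):
    # Round-robin split so partition sizes differ by at most one.
    return [points[i::n_parts] for i in range(n_parts)]

def local_step(points, centers):
    # Per-processor work: assign each point to its nearest center and
    # accumulate per-center coordinate sums and counts.
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        k = min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts

def reduce_step(partials, centers):
    # Combine partial sums; this is the only data that would be communicated.
    dim = len(centers[0])
    new_centers = []
    for k, old in enumerate(centers):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centers.append(list(old))  # keep an empty cluster's center
        else:
            new_centers.append(
                [sum(sums[k][d] for sums, _ in partials) / total for d in range(dim)])
    return new_centers

def parallel_kmeans(points, centers, n_parts=2, iterations=10):
    parts = split_equally(points, n_parts)
    for _ in range(iterations):
        partials = [local_step(part, centers) for part in parts]  # one call per rank
        centers = reduce_step(partials, centers)
    return centers
```

Because each rank sends only k sums and k counts per iteration, the communication volume is independent of the number of points, although the per-iteration synchronization still adds latency.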
Clustering with Evolutionary Strategies • Model • Stochastic Optimization • Biological Evolution Concepts • Recombination, Mutation • Motive: Huge Range of Possible Solutions • Parallelization Techniques • Master-Slave Model • Master in Charge of Parent Solutions • Slave in Charge of Recombination and Mutation • Fits into MapReduce Model • Proposed Solution • http://www.cs.gsu.edu/~wkim/index_files/papers/clusteringwithes.pdf
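A minimal sketch of the idea, assuming a (mu + lambda) evolution strategy in which each candidate solution is a list of k cluster centers and the cost is the sum of squared distances to the nearest center. The encoding, the cost function, and the parameter values are assumptions for illustration and are not taken from the linked paper. `make_offspring` is the task a worker would run; the loop in `evolve` plays the master.

```python
import random

def cost(centers, points):
    # Sum of squared distances from each point to its nearest center.
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def make_offspring(parent_a, parent_b, sigma, rng):
    # Worker-side task: recombine two parents center by center, then mutate.
    child = []
    for ca, cb in zip(parent_a, parent_b):
        base = ca if rng.random() < 0.5 else cb
        child.append([x + rng.gauss(0.0, sigma) for x in base])
    return child

def evolve(points, k, mu=4, lam=12, sigma=0.5, generations=30, seed=0):
    rng = random.Random(seed)
    # Master-side state: mu parent solutions, each a list of k centers.
    parents = [[list(rng.choice(points)) for _ in range(k)] for _ in range(mu)]
    history = [min(cost(p, points) for p in parents)]
    for _ in range(generations):
        offspring = [make_offspring(rng.choice(parents), rng.choice(parents), sigma, rng)
                     for _ in range(lam)]
        # (mu + lambda) selection: parents compete with their offspring,
        # so the best cost never increases between generations.
        pool = parents + offspring
        pool.sort(key=lambda s: cost(s, points))
        parents = pool[:mu]
        history.append(cost(parents[0], points))
    return parents[0], history
```

Producing and scoring offspring are independent tasks, which is why they map naturally onto workers (or map tasks), while selection is the single combining step left to the master.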
Parallel Density Based Clustering • PDBSCAN • Based on Original DBSCAN Algorithm • Shared Nothing Architecture • Execution • Divide Input into Several Partitions • Concurrently Cluster Data Locally with DBSCAN • Merge Local Clusters into Global Clusters • dR*-Tree Introduced • Decreased Communication Cost – Efficient Access of Data • Distributed Data Pages • Replicated Indices on All Machines • Results • Near-Linear Speedup in the Number of Machines • http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
Parallel Model Based Clustering • AutoClass System • Bayesian Classification • Probability of an Instance Belonging to a Class • Approach • SIMD (Single Instruction, Multiple Data) • Divide Input among Processors • Update Parameters for Classification Locally • No Need for Load Balancing • Results • Good Scaling • Beyond a Certain Threshold, Communication Starts to Hinder Performance
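The combination of class-membership probabilities with local parameter updates can be illustrated with a deliberately simplified stand-in: one-dimensional Gaussian classes, a fixed standard deviation, and a single EM-style update. This is an assumption-laden sketch and not AutoClass's actual Bayesian search; the function names are hypothetical.

```python
import math

def memberships(x, classes):
    # classes: list of (weight, mean, std). Bayes' rule gives P(class | x).
    likes = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
             for w, m, s in classes]
    total = sum(likes)
    return [l / total for l in likes]

def local_statistics(chunk, classes):
    # Per-processor work: membership-weighted count and sum for each class.
    stats = [[0.0, 0.0] for _ in classes]
    for x in chunk:
        for j, r in enumerate(memberships(x, classes)):
            stats[j][0] += r
            stats[j][1] += r * x
    return stats

def combine(all_stats, classes, n_total):
    # Global reduction: new weight and mean per class (std kept fixed for brevity).
    updated = []
    for j, (_, _, s) in enumerate(classes):
        r_sum = sum(st[j][0] for st in all_stats)
        x_sum = sum(st[j][1] for st in all_stats)
        updated.append((r_sum / n_total, x_sum / r_sum, s))
    return updated
```

Every processor does the same amount of work per instance, so equal-sized chunks are already balanced; the `combine` step is the communication that eventually limits scaling as more processors are added.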
Clustering By Sorting Potential Values • Main Idea • Potential Model • Derived from Gravitational Force Model in Euclidean Space • Parameters: • Gravitational Constant • Bandwidth Distance B (Max Distance from Center of Cluster) • δ Threshold Distance (Avoids Singularity Problem) • Execution • Calculate Potential at Each Point • Sort Points According to the Calculated Potential • Choose Cluster Centers by Iterating over the Sorted Array • If Distance between Two Points in the Array > B, Create New Cluster • Results • Near Optimal Solution • http://www.sciencedirect.com/science/article/pii/S0031320312001136
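The execution steps on this slide can be sketched as follows. The exact potential formula and the assignment rule are assumptions: here each other point contributes -G / max(distance, δ), points are visited from lowest potential upward, and a point farther than B from every existing center starts a new cluster. The linked paper may define these details differently.

```python
import math

def potentials(points, G=1.0, delta=1e-3):
    # Assumed form: each other point contributes -G / max(distance, delta);
    # delta caps the contribution of near-coincident points (singularity).
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total -= G / max(math.dist(p, q), delta)
        result.append(total)
    return result

def cluster_by_potential(points, B, G=1.0, delta=1e-3):
    phi = potentials(points, G, delta)
    # Lowest potential first: these points sit in the densest regions.
    order = sorted(range(len(points)), key=lambda i: phi[i])
    centers = []                  # indices of points chosen as cluster centers
    labels = [None] * len(points)
    for i in order:
        dists = [math.dist(points[i], points[c]) for c in centers]
        if not dists or min(dists) > B:
            centers.append(i)     # too far from every center: new cluster
            labels[i] = len(centers) - 1
        else:
            labels[i] = dists.index(min(dists))
    return centers, labels
```

For example, `cluster_by_potential([(0, 0), (0.1, 0), (0.2, 0), (5, 0), (5.1, 0)], B=1.0)` picks the middle point of the left group as the first center, since it has the lowest potential, and opens a second cluster for the points near x = 5.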
Thank you for your attention! Vasilis Zois vzois@usc.edu