
Data Mining Applied to Chemistry and Chemical Engineering

Wencong Lu (陆文聪), Department of Chemistry, College of Sciences, Shanghai University, P. R. China.


Presentation Transcript


  1. Data Mining Applied to Chemistry and Chemical Engineering Wencong Lu (陆文聪) Department of Chemistry, College of Sciences, Shanghai University, P. R. China

  2. 1 Introduction 1.1 Concept • Data Mining is an analytic process designed to explore data in search of consistent patterns and/or systematic relationships

  3. between variables, and then to validate the findings by applying the detected patterns to new subsets of data.

  4. 1.2 Main Focuses (1) Materials design: how to find the best preparation conditions or the structure-property relationships of materials, in order to design experiments for preparing new materials or to predict the physico-chemical properties of unknown materials systems.

  5. (2) Molecular design: how to find the structure-activity relationships of molecules, in order to design new compounds with the expected biological activities or to predict the physico-chemical properties of unknown molecules.

  6. (3) Industrial optimization: how to find the optimal processing conditions, in order to achieve good results in industrial production.

  7. 2 Methods in MASTER (1) Optimal map recognition (OMR) The projection map with the best separability is selected according to the rate of correct classification.

  8. Fig.1 Comparison of OMR and PCA (a) Classification diagram obtained with Optimal Map Recognition (OMR) (b) Classification diagram obtained with Principal Component Analysis (PCA)
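The selection rule stated for OMR (keep the projection map whose classification is most often correct) can be illustrated with a minimal sketch. The slides do not say how MASTER builds its candidate maps or which classifier scores them, so everything here is an assumption: the candidates are simply all pairs of original features, the scorer is a nearest-centroid rule, and the function names are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Mean point of a list of equally sized tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def accuracy_in_projection(X, y, i, j):
    """Rate of correct classification in the 2-D map spanned by features i and j.

    Assumption: a nearest-centroid rule stands in for whatever classifier
    the real OMR method uses.
    """
    proj = [(row[i], row[j]) for row in X]
    cents = {c: centroid([p for p, lab in zip(proj, y) if lab == c]) for c in set(y)}

    def predict(p):
        return min(cents, key=lambda c: (p[0] - cents[c][0]) ** 2 + (p[1] - cents[c][1]) ** 2)

    hits = sum(predict(p) == lab for p, lab in zip(proj, y))
    return hits / len(y)


def best_projection(X, y):
    """Return the feature pair (i, j) whose 2-D map classifies the samples best."""
    n_features = len(X[0])
    return max(combinations(range(n_features), 2),
               key=lambda ij: accuracy_in_projection(X, y, *ij))
```

Scoring on the training samples themselves is optimistic; a cross-validated rate would be a safer criterion when the sample set is small.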

  9. (2) Hyper-polyhedron (HP) An HP model is built so that the optimal zone is expressed by a series of inequalities describing the boundaries between the two types of samples.

  10. Fig.2 Conceptual HP model
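As an illustration of an optimal zone written as a series of inequalities, the sketch below uses the simplest possible polyhedron, an axis-aligned box around the samples of the desired class. This is an assumption made for brevity: the slides do not describe how MASTER constructs its hyper-polyhedron, and a real HP model may well use oblique faces. The function names are hypothetical.

```python
def fit_box(good_samples):
    """Per-feature (low, high) bounds that enclose every sample of the desired class.

    Assumption: an axis-aligned box, i.e. inequalities of the form
    low_k <= x_k <= high_k, stands in for a general hyper-polyhedron.
    """
    dims = range(len(good_samples[0]))
    return [(min(s[k] for s in good_samples), max(s[k] for s in good_samples))
            for k in dims]


def inside(bounds, x):
    """True if x satisfies every inequality, i.e. lies in the optimal zone."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, x))


def as_inequalities(bounds, names):
    """Render the zone as human-readable inequalities, one per feature."""
    return [f"{lo} <= {name} <= {hi}" for (lo, hi), name in zip(bounds, names)]
```

A new candidate (a composition or a set of process conditions) is then accepted only if `inside` returns `True` for it.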

  11. (3) Optimal projection regression (OPR) The OPR method is a quantitative model that fuses regression with the Optimal Map Recognition (OMR) method. It uses the classification information of the data set to select the most appropriate features for regression.

  12. Fig.3 Conceptual OPR model: projection from hyperspace to 2-dimensional space (axes X1, X2)

  13. (4) Inverse projection Fig.4 Projection from 2-dimensional space to hyperspace
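A minimal sketch of the idea behind Fig.4, assuming a linear projection onto two orthonormal directions `u` and `v` around a mean point (as in PCA). The slides do not specify the projection MASTER uses, so the names and the linear form are assumptions. Note that the inverse is not unique: many hyperspace points share the same 2-D image, and this sketch returns the one lying in the plane through `mean`.

```python
def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(p * q for p, q in zip(a, b))


def project(x, mean, u, v):
    """Forward map: hyperspace point x -> 2-D map coordinates (t1, t2)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return (dot(d, u), dot(d, v))


def inverse_project(t, mean, u, v):
    """Inverse map: 2-D point t -> the hyperspace point in the plane through mean.

    Assumption: u and v are orthonormal, so projecting the result again
    gives back t.
    """
    return [mi + t[0] * ui + t[1] * vi for mi, ui, vi in zip(mean, u, v)]
```

In an optimization setting, a point picked inside the favourable region of the 2-D map is sent back through `inverse_project` to obtain candidate values of the original variables.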

  14. (5) Hierarchical projection model Fig.5 Conceptual hierarchical projection

  15. (6) Support Vector Machine Support Vector Classification:

  16. Support Vector Regression: (figure labels: support vector hypersurface, regression hyperplane, support vectors, insensitive tube)
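The insensitive tube in the figure corresponds to the ε-insensitive loss used in standard support vector regression: residuals smaller than ε cost nothing, and larger ones are penalized linearly. The sketch below shows only that loss; the function name is hypothetical and nothing here is taken from the MASTER implementation.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Standard epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)
```

Only samples lying on or outside the tube contribute to the loss, and those samples are the support vectors.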

  17. 3 Examples of Application 3.1 Applications in Materials Design (1) Optimization of high temperature superconductor A nonlinear function with 5 terms and a PRESS value of 0.128 was obtained. By using inverse projection and the OPR method, the critical temperature was raised from 116 K to 121 K.
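PRESS (predicted residual error sum of squares) is obtained by leaving each sample out in turn, refitting the model on the rest, and summing the squared errors of the left-out predictions. The slides do not give the 5-term nonlinear function, so the sketch below assumes a one-variable straight-line model purely for illustration; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def press(xs, ys):
    """Leave-one-out PRESS for the straight-line model (an assumed stand-in)."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return total
```

A smaller PRESS indicates better predictive ability; unlike the ordinary residual sum of squares, it does not automatically shrink when more terms are added to the model.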

  18. Inverse projection result of high temperature superconductor

  19. (2) Composition design of rare-earth containing phosphor By extrapolation we obtained a series of new compositions lying outside the scope of the German patents. Our experimental work confirmed that the brightness of these newly designed phosphors was higher than that claimed in the German patents.

  20. Importance of features

  21. Classification diagram using Fisher method

  22. (3) Optimization of VPTC ceramic semiconductors By using MASTER, new compositions and technological conditions were proposed for VPTC materials, and they gave much better results: the ratio of the electric resistance at 273 K to the minimum resistance was raised from 20 to 27.3.

  23. Partial Least Squares (PLS) result of VPTC ceramic semiconductors

  24. (4) Composition design of cathode materials of Ni/H battery By using Support Vector Machine (SVM), mathematical models with strong prediction ability were built, and new formulations were predicted and confirmed by experiments.

  25. Calculated vs. experimental values of C400/C0

  26. (5) Formation condition for amorphous phase of ternary fluorides By using the OMR method, inequalities were obtained and used to predict whether a new ternary fluoride could form an amorphous phase or not. The predicted results were in agreement with the experimental ones.

  27. OMR result of formation condition for amorphous phase of ternary fluorides

  28. (6) Formation condition of ternary intermetallic compounds Using 2400 known phase diagrams as the training set, the regularities governing the formation of ternary intermetallic compounds were found. A series of newly discovered ternary intermetallic compounds were “predicted” in this way with good results.

  29. OMR result of formation condition of ternary intermetallic compounds

  30. 3.2 Applications in Molecular Design (1) Molecular screening of guanidine compounds The Hyper-polyhedron (HP) and Support Vector Classification (SVC) methods were used for the computer-aided molecular screening of guanidine compounds. The predictions of HP and SVC were better than those of other methods such as PCA, KNN and FDV.

  31. (2) Structure-activity relationship of antagonists SVC was used to investigate the structure-activity relationship (SAR) of 26 antagonist compounds. Leave-one-out cross-validation showed that the prediction ability of the SVC method was better than that of other methods such as PCA, KNN and FDV.
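Leave-one-out cross-validation, as used for the 26 compounds above, holds out one sample at a time, trains on the remaining ones, and counts how often the held-out sample is classified correctly. The sketch below uses a 1-nearest-neighbour classifier as a stand-in, which is an assumption made to keep the example dependency-free; it is not the SVC model from the slides, and the function names are hypothetical.

```python
def nearest_neighbor_label(train_X, train_y, x):
    """Label of the training sample closest to x (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def loo_accuracy(X, y):
    """Fraction of samples predicted correctly when each is held out in turn."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += nearest_neighbor_label(rest_X, rest_y, X[i]) == y[i]
    return hits / len(X)
```

Replacing `nearest_neighbor_label` with any other train-and-predict routine gives the same procedure for that model, which is how several methods can be compared on one small data set.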

  32. (3) Molecular screening of triazole compounds (1) An OMR model was used for the molecular screening of new triazole compounds with probably higher antifungal activities. (2) The predictions of SVC were better than those of other methods such as PCA, KNN and FDV.

  33. (4) Structure-property relationship of azo dyestuff The Support Vector Regression (SVR) method was employed to predict the maximum absorption wavelength of 37 azo dyestuff molecules. The mean relative error is 4.22% for the training set and 4.52% for the prediction set.
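The mean relative error quoted above is assumed here to be the usual definition, the average of |predicted - measured| / |measured| expressed in percent; the slides do not spell out the formula. A minimal sketch with a hypothetical function name:

```python
def mean_relative_error(measured, predicted):
    """Mean of |predicted - measured| / |measured|, returned in percent.

    Assumption: this is the common definition; measured values must be non-zero.
    """
    ratios = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Computing it separately for the training set and for the held-out prediction set, as the slide does, shows whether the model generalizes or merely fits the training data.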

  34. 3.3 Applications in Industrial Optimization (1) Optimization of nitriding technique for crankshaft production The problem was that the surface hardness of crankshaft products in the Factory of Wuxi Diesel Engine was too low. It was found that there existed an “optimal zone” in the multidimensional feature space. After optimization, the rate of rejection decreased from 1.7% to 0.3%.

  35. (2) Springback prediction in sheet metal forming MASTER, combined with FEA software (ANSYS/LS-DYNA 5.71), was used to predict the springback in V-type sheet steel forming. The relative error of the predicted springback could be kept within 10% of the experimental values.

  36. 4 Conclusion (1) The MASTER software package is a comprehensive system comprising orthogonal design, statistical analysis, data visualization, pattern recognition, regression analysis, artificial neural networks (ANN), support vector machines (SVM) and other methods.

  37. 4 Conclusion (2) MASTER can be used to • optimize formulas and technological conditions • predict biological activities and physico-chemical properties • improve product quality and analyze faults in production processes.

  38. Thank you
