280 likes | 489 Views
Using SQL Server Data Mining. The Main Tools. SQL Server 2012 RDBMS to store data and support DMX Visual Studio 2010/SSDT Provides and IDE The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM. SSDT. It is part of VS2010
E N D
The Main Tools • SQL Server 2012 • RDBMS to store data and support DMX • Visual Studio 2010/SSDT • Provides and IDE • The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM
SSDT • It is part of VS2010 • You can either start it from SQL Server 2012 or VS 2010 • To start from SQL Server 2012 • Start all programs SQL Server 2012 SQL Server Data Tool • What you will see will be VS2010 IDE • The rest is very much the same as doing cubes
Preparing Data Source • We use a relational DBs as the data source • For example, we restore the MovieClick or CollegePlans databases from its backup or import from MS Access
Specify your Data Sources • Make sure the Solution Explorer is visible • If not, go to View Solution Explorer or CTL +W,S • Right Click on Data Sources to bring up the wizard • Skip into • Define connection • Provide • server name • Authentication mode • Then select the DB • If you cannot see the DB you are expecting, the first two steps are incorrect • If the connections are already defined, select one • Note: This is to your Relational DB
Impersonation Information • Impersonate Current User • Most security • Causing problems when delegation is needed • Impersonate Account*** • Make everyone’s life easier – my first choice, most of the cases • Impersonate Service Account • Not recommended • Inherit • May be, never tried
DSV (2) • This is where the modeling begin • Defines how you want to see the data at the data source • Here we define case table, nested case tables, and other lookup tables (also called dimension tables)
View Data • Table • Pivot Table • Chart • Pivot Chart
Creating Structure • Right click on Mining structure • Create a structure with a mode or not
Successfully deployed • To deploy Build Deploy MoviceClick
Mining Model Viewer • Mining Structure • Manages columns • Mining Models • Add new models or set model parameters • Filter the rows • Model view • Second level tabs are model dependent
More on lift Chart • Lift Chart • One line for each model • A random line • Ideal line • Lift is a measure of the effectiveness of a predictive model • You can also perform profit calculation
Other Charts – Scatter Plot • Scatter Plot • A scatter plot is generated instead of a lift chart whenever the predictable attributes has continuous values. • charts the accuracy of a model that predicts a continuous attribute, comparing the actual values versus the predicted values for each case.
Cross-Validation • It uses the training data • Divide in to n folders • Each folder uses data in other n -1 folders for training and uses the current folder data for test • You can check • How good the results of predicting are • If the results are uniform across the folders
Mining Model Prediction • Allows you to create and run DMX queries • You can use the GUI (Design View) to create queries • You can use the Query Editor to modify queries • View the result on the result view
Query Example [MCNT-Tree].[Age] = t.[Age] AND [MCNT-Tree].[Education Level] = t.[Education Level] AND [MCNT-Tree].[Gender] = t.[Gender] AND [MCNT-Tree].[Home Ownership] = t.[Home Ownership] AND [MCNT-Tree].[Internet Connection] = t.[Internet Connection] AND [MCNT-Tree].[Marital Status] = t.[Marital Status] AND [MCNT-Tree].[Movie Selector] = t.[Movie Selector] AND [MCNT-Tree].[Num Bathrooms] = t.[Num Bathrooms] AND [MCNT-Tree].[Num Cars] = t.[Num Cars] AND [MCNT-Tree].[Num Children] = t.[Num Children] AND [MCNT-Tree].[Num T Vs] = t.[Num TVs] AND [MCNT-Tree].[PPV Freq] = t.[PPV Freq] AND [MCNT-Tree].[Prerec Buying Freq] = t.[Prerec Buying Freq] AND [MCNT-Tree].[Prerec Format] = t.[Prerec Format] AND [MCNT-Tree].[Prerec Renting Freq] = t.[Prerec Renting Freq] AND [MCNT-Tree].[Prerec Viewing Freq] = t.[Prerec Viewing Freq] AND [MCNT-Tree].[Theater Freq] = t.[Theater Freq] AND [MCNT-Tree].[TV Movie Freq] = t.[TV Movie Freq] AND [MCNT-Tree].[TV Signal] = t.[TV Signal] Order by (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) SELECT t.[CustomerID], (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) as [One], (PredictProbability([MCNT-Tree].[Bedrooms], 'Two or three')) as [two] From [MCNT-Tree] PREDICTION JOIN OPENQUERY([MCNT], 'SELECT [CustomerID], [Num Bedrooms], [Age], [Education Level], [Gender], [Home Ownership], [Internet Connection], [Marital Status], [Movie Selector], [Num Bathrooms], [Num Cars], [Num Children], [Num TVs], [PPV Freq], [Prerec Buying Freq], [Prerec Format], [Prerec Renting Freq], [Prerec Viewing Freq], [Theater Freq], [TV Movie Freq], [TV Signal] FROM [dbo].[Customers] ') AS t