
MKT 700 Business Intelligence and Decision Models



Presentation Transcript


  1. MKT 700 Business Intelligence and Decision Models • Week 8: Algorithms and Customer Profiling (1)

  2. Classification and Prediction

  3. SPSS Direct Marketing

  4. SPSS Analysis

  5. Major Algorithms

  6. Euclidean Distance

  7. Euclidean Distance for Continuous Variables • Pythagorean distance → $d = \sqrt{a^2 + b^2}$ • Euclidean space → $d = \sqrt{a^2 + b^2 + c^2}$ • Euclidean distance → $d = \left[\sum_i (d_i)^2\right]^{1/2}$
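
The general formula on slide 7 can be sketched in a few lines of Python. The function name and the sample points are illustrative assumptions, not part of the deck; each d_i is taken to be the difference between the two points on dimension i.

```python
import math

def euclidean_distance(p, q):
    """Distance between two points given as equal-length sequences of numbers."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # d_i is the difference on dimension i; sum the squares, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical two-dimensional points: the 3-4-5 right triangle.
print(euclidean_distance((3.0, 4.0), (0.0, 0.0)))  # 5.0
```

The same function covers the two- and three-dimensional cases on the slide, since they are the formula with two or three terms.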

  8. Pearson’s Chi-Square

  9. Contingency Table

  10. Observed and theoretical Frequencies

  11. Chi-Square:

  12. Statistical Inference • DF: (4 columns − 1) × (2 rows − 1) = 3 • Critical values for DF = 3: 6.251 at p = .10; 7.815 at p = .05 • 3.032 (shown alongside the critical values)
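
A minimal sketch of the Pearson chi-square calculation behind slides 8 to 12, assuming a 2 × 4 table like the one implied by the degrees of freedom. The counts and function names are hypothetical; only the critical values 6.251 and 7.815 come from the slide.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    expected = expected_frequencies(observed)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical counts: rows = responded / did not respond, columns = 4 regions.
observed = [[30, 25, 20, 25],
            [70, 75, 80, 75]]
stat, df = pearson_chi_square(observed)

# Critical values for DF = 3, as listed on the slide.
CRITICAL_DF3 = {0.10: 6.251, 0.05: 7.815}
print(f"chi-square = {stat:.3f}, DF = {df}")
for alpha, critical in CRITICAL_DF3.items():
    verdict = "reject independence" if stat > critical else "cannot reject independence"
    print(f"alpha = {alpha}: {verdict}")
```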

  13. Log Likelihood Chi-Square

  14. Log Likelihood • Cluster distance on probability distributions • Applicable to both categorical and continuous variables
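
The transcript does not preserve the formula from slide 13. One common form of the log-likelihood (likelihood-ratio) chi-square is G = 2 Σ O ln(O/E), sketched below under that assumption. The function name and counts are hypothetical, and SPSS's cluster-distance version may differ in detail.

```python
import math

def g_statistic(observed):
    """Likelihood-ratio chi-square G = 2 * sum(O * ln(O / E)) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            if o > 0:  # empty cells contribute nothing to the sum
                e = row_totals[i] * col_totals[j] / n
                g += o * math.log(o / e)
    return 2.0 * g

# Hypothetical 2 x 2 table: rows = buyers / non-buyers, columns = two segments.
print(round(g_statistic([[10, 20], [20, 10]]), 3))
```

G is compared against the same chi-square critical values as the Pearson statistic, with the same degrees of freedom.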

  15. Contingency Table (Observed Frequencies)

  16. Contingency Table (Expected Frequencies)

  17. Chi-Square: p < 0.05; DF = 1; Critical value = 3.84

  18. Log Likelihood Distance & Probability

  19. ANOVA, F Statistics

  20. F-Statistics • For metric or continuous variables • Compare explained (in the model) and unexplained variances (errors)

  21. ANOVA • Group Comparisons: Are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?

  22. Variance • SS is the Sum of Squares • DF = N − 1 • VAR = SS/DF • SD = √VAR
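
The four quantities on slide 22 follow from one another, as this short sketch shows. The data values and the function name are made up for illustration.

```python
import math

def variance_summary(values):
    """Return (SS, DF, VAR, SD) for a sample, using DF = N - 1."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    df = n - 1
    var = ss / df
    return ss, df, var, math.sqrt(var)

# Hypothetical purchase counts for eight customers.
ss, df, var, sd = variance_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(f"SS = {ss}, DF = {df}, VAR = {var:.3f}, SD = {sd:.3f}")
```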

  23. One-way ANOVA

  24. MSS(Between)/MSS(Within)
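
The ratio on slide 24 can be computed directly from grouped data. This is a minimal sketch with hypothetical groups and an illustrative function name; the resulting F would be compared with a critical value from an F table for the returned degrees of freedom.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) where F = MSS(Between) / MSS(Within)."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between: how far each group mean sits from the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: how far observations sit from their own group mean (the errors).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical spend figures for three customer groups.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```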

  25. ONEWAY (Excel or SPSS)

  26. Profiling

  27. Customer Profiling • Who is likely to buy or not respond? • Who is likely to buy what product or service? • Who is in danger of lapsing?

  28. Profiling/Decision Tree • SPSS Direct Marketing → Customer Profiling • SPSS Analysis → Classification → Decision Tree • CHAID (Chi-Square Automatic Interaction Detector) • CART (Classification and Regression Trees)

  29. Use of Decision Trees • Classify observations from a binary or nominal target variable → Segmentation • Predictive response analysis from a numerical target variable → Behaviour • Decision support rules → Processing

  30. Decision Tree

  31. Example: dmdata.sav • Underlying Theory → $\chi^2$

  32. CHAID Algorithm: Selecting Variables • Example: Region (4), Gender (3, including Missing), Age (6, including Missing) • For each variable, collapse categories to maximize the chi-square test of independence, e.g. Region (N, S, E, W, *) → (WSE, N*) • Select the most significant variable • Go to the next branch … and the next level • Stop growing if the estimated $\chi^2$ < the theoretical $\chi^2$
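
The category-collapsing step can be sketched as follows. This is a simplified illustration, not SPSS's implementation: it assumes a binary target, repeatedly merges the pair of categories whose 2 × 2 chi-square is smallest, and stops once every remaining pair exceeds the DF = 1 critical value of 3.84. The counts and function names are hypothetical.

```python
from itertools import combinations

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))

def merge_categories(counts, critical=3.84):
    """counts maps category -> [responders, non_responders].
    Merge the least-different pair until all pairs differ significantly."""
    groups = {(name,): list(c) for name, c in counts.items()}
    while len(groups) > 1:
        a, b = min(combinations(groups, 2),
                   key=lambda pair: chi_square([groups[pair[0]], groups[pair[1]]]))
        if chi_square([groups[a], groups[b]]) >= critical:
            break  # every remaining pair is significantly different
        groups[a + b] = [x + y for x, y in zip(groups.pop(a), groups.pop(b))]
    return groups

# Hypothetical response counts by region; "*" stands for Missing.
regions = {"N": [40, 60], "S": [20, 80], "E": [22, 78], "W": [18, 82], "*": [38, 62]}
for merged, (yes, no) in merge_categories(regions).items():
    print("".join(merged), yes, no)
```

With these made-up counts the five regions collapse into two groups, W/S/E and N/*, mirroring the slide's example. The sketch assumes every column total is non-zero.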

  33. CART (Nominal Target) • Nominal targets: Gini (impurity reduction or entropy) • Based on the squared probability of node membership • Gini = 0 when targets are perfectly classified • Gini Index = $1 - \sum_i p_i^2$ • Example: Prob: Bus = 0.4, Car = 0.3, Train = 0.3 • Gini = 1 − (0.4^2 + 0.3^2 + 0.3^2) = 0.660
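
The slide's worked example can be checked with a one-line function. The function name is an illustrative choice.

```python
def gini_index(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p ** 2 for p in probabilities)

# Bus = 0.4, Car = 0.3, Train = 0.3, as on the slide.
print(round(gini_index([0.4, 0.3, 0.3]), 3))  # 0.66

# A perfectly classified node has one class with probability 1.
print(gini_index([1.0, 0.0, 0.0]))  # 0.0
```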

  34. CART (Metric Target) • Continuous Variables: Variance Reduction (F-test)
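
For a metric target, the usual CART criterion is the drop in the sum of squares after a split. The sketch below searches for the best threshold on a single predictor on that basis; it does not run an F-test. The data and function names are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, reduction) for the split x <= threshold that most
    reduces the sum of squares of y. Returns (None, 0.0) if nothing helps."""
    total = sum_of_squares(y)
    best = (None, 0.0)
    for threshold in sorted(set(x))[:-1]:  # the largest value leaves no right node
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        reduction = total - (sum_of_squares(left) + sum_of_squares(right))
        if reduction > best[1]:
            best = (threshold, reduction)
    return best

# Hypothetical customers: age as predictor, monthly spend as target.
age = [20, 25, 30, 45, 50, 55]
spend = [10, 12, 11, 30, 32, 31]
print(best_split(age, spend))
```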

  35. Comparative Advantages (From Wikipedia) • Simple to understand and interpret • Requires little data preparation • Able to handle both numerical and categorical data • Uses a white box model easily explained by Boolean logic • Possible to validate a model using statistical tests • Robust

  36. Where to get help? http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp

  37. Top line from Chapter 13 -1 • Analytics helps you to predict which recipients of your direct mail will buy your products, and which are not likely to buy. At $500 per thousand pieces, analytics can save you a lot of money. • Analytics is not as useful for e-mail marketing. The cost of appending data and the modeling often results in a loss, since the cost of mailing is only $6 per thousand. • Predictive models are based on previous promotions. You add demographic data (age, income, value of home, etc.) to a sample of your file and determine the differences between responders and non-responders. • Predictive modeling uses multiple regressions. It results in an algorithm—a mathematical formula that can be used to “score” any direct mailing file that has demographics appended, and predict, before you mail, which ones are going to respond. • Modeling does not always work. Sometimes what makes people buy is not based on demographics.

  38. Top line from Chapter 13 -2 • Analytics can be used to reduce unsubscribes. If you have done LTV and know the value of your subscribers, you can calculate how much analytics would save you by not mailing unwanted material to some subscribers. • Very few e-mail marketers are doing any predictive modeling today, with good reason. • Direct mail gets higher response rates than e-mail partly because the shelf life of a direct mail piece or catalog can be weeks or months. An e-mail’s shelf life is one day or less. • Modeling can be useful for cross-sales—determining what other products your customers might buy. • Next-best product analytics and churn predictive analytics can be very profitable.

  39. Top line from Chapter 13 -3 • CHAID is very useful for dividing your database into segments containing people with different interests and response rates. • Descriptive analytics is useful for advertising campaigns, but seldom useful for direct mail. • Clickstream data analysis can be very useful in planning the layout of a Web site or an e-mail. • Key performance indicators (KPIs) can help you determine the relative success of e-mail programs.
