Machine learning - Applications to economic analysis

Andrew Banks, Economic Statistics Transformation Programme, Office for National Statistics

Presentation Transcript


  1. Machine learning - Applications to economic analysis. Andrew Banks, Economic Statistics Transformation Programme, Office for National Statistics

  2. Overview

  3. Machine learning – the background • Rapid developments in computing power and in algorithms • A vast increase in open data • Open source machine learning libraries

  4. Machine learning – the challenges • Most algorithms focus on three key areas: regression, classification and clustering • Best suited to prediction, not to understanding causal relationships • Acquiring adequate data • Identifying meaningful, novel aspects in the data

  5. Machine learning – applications to statistics • Predictive models with high out-of-sample accuracy • Missing data imputation • Change and deviation detection: uncovering data records that diverge suspiciously from the pattern of their peers • Numerous classification and regression applications (images, text, etc.)
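
Change and deviation detection can be illustrated with a minimal pure-Python sketch. It assumes each record carries a peer-group label (here a hypothetical `commodity` field) and a numeric value, and flags records whose modified z-score, computed from the group median and median absolute deviation (MAD), exceeds 3.5. The field names, the data and the threshold are illustrative assumptions, not details from the presentation.

```python
from statistics import median


def flag_deviations(records, group_key, value_key, threshold=3.5):
    """Flag records whose value diverges strongly from the rest of their peer group.

    Uses the modified z-score 0.6745 * (x - median) / MAD within each group,
    which is less distorted by the outliers themselves than a mean/std z-score.
    """
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)

    flagged = []
    for members in groups.values():
        values = [m[value_key] for m in members]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread in this group: skip instead of dividing by zero
        for m in members:
            score = 0.6745 * (m[value_key] - med) / mad
            if abs(score) > threshold:
                flagged.append((m, score))
    return flagged


# Hypothetical records: export values grouped by commodity.
records = [
    {"commodity": "A", "value": 10.0},
    {"commodity": "A", "value": 11.0},
    {"commodity": "A", "value": 9.5},
    {"commodity": "A", "value": 10.5},
    {"commodity": "A", "value": 60.0},
    {"commodity": "B", "value": 100.0},
    {"commodity": "B", "value": 102.0},
    {"commodity": "B", "value": 98.0},
    {"commodity": "B", "value": 101.0},
]
flagged = flag_deviations(records, "commodity", "value")
```

With these made-up values only the 60.0 record in group A is flagged; comparing each record with its own peer group, rather than with the whole dataset, keeps group B's much larger values from looking anomalous.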

  6. Practical example – Trade data • Country data: size, area and income (GDP); distance from the UK; World Development Indicators (e.g. airports per sq km) • Commodity data: world import demand; average tariff; commodity description • Country characteristics: distance from the UK (gravity model); size of country (area, GDP, GDP per capita); other characteristics (language, continent) • Commodity information: conventional rate of duty; world trade in those commodities; key words/features [Figure: gravity model of UK trade in goods, natural log scale]
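
The gravity model referred to here is usually estimated in log-linear form. As a hedged illustration using the three country variables listed on the OLS slides, and assuming each enters in natural logs (the presentation does not give its exact specification), UK exports $X_j$ to country $j$ would be modelled as:

```latex
\ln X_j = \beta_0 + \beta_1 \ln D_j + \beta_2 \ln G_j + \beta_3 \ln S_j + \varepsilon_j
```

where $D_j$ is the distance from the UK, $G_j$ is GDP per capita, $S_j$ is a measure of the country's size (for example its area) and $\varepsilon_j$ is an error term. In the gravity framework one would expect $\beta_1 < 0$, since trade falls with distance.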

  7. Overview of neural networks • Various inputs • A set of weights • An ‘activation function’ that determines whether a neuron ‘fires’ • A single-layer ‘perceptron’ is the simplest neural network (shown right)
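
A single-layer perceptron of the kind described here can be sketched in plain Python. This is the classic textbook version, with a step activation and the perceptron learning rule; the learning rate, the number of epochs and the logical-AND example are illustrative choices, not details from the slide.

```python
def step(z):
    """Threshold activation: the neuron 'fires' (returns 1) when z >= 0."""
    return 1 if z >= 0 else 0


def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule: after each mistake, shift the weights
    towards (or away from) the misclassified input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - perceptron_output(x, weights, bias)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy example: learn logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [perceptron_output(x, weights, bias) for x in samples]
```

Because AND is linearly separable, the learning rule settles on weights that reproduce all four labels; a problem that is not linearly separable (such as XOR) needs at least one hidden layer.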

  8. ‘Deep learning’ • More than one hidden layer = deep neural network • Feed-forward or feedback (recurrent) connections • Mainly used for image classification and in large recommendation models • Uses backpropagation and gradient descent to arrive at a locally optimal solution
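
The backpropagation and gradient-descent loop mentioned here can be sketched with NumPy for a network with one hidden layer of logistic units and a linear output, trained on mean squared error. The layer size, learning rate, iteration count and the sine-curve data are arbitrary illustrative assumptions; a deeper network would stack more hidden layers, but its backward pass repeats the same chain-rule step layer by layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=8, lr=0.1, n_iter=2000, seed=0):
    """Fit a one-hidden-layer network by full-batch gradient descent on the MSE.

    The loss surface is non-convex, so a different seed can end in a
    different local optimum.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(n_iter):
        # Forward pass: logistic hidden layer, linear output for regression.
        H = sigmoid(X @ W1 + b1)
        y_hat = H @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule from the loss back to each parameter.
        d_out = 2.0 * err / n                       # dLoss/dy_hat
        grad_W2 = H.T @ d_out
        grad_b2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * H * (1.0 - H)   # sigmoid'(z) = s * (1 - s)
        grad_W1 = X.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        # Gradient-descent step.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


# Hypothetical non-linear target.
X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
y = np.sin(2.0 * X).ravel()
params, losses = train_mlp(X, y)
```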

  9. Advantages • OLS is a single weighted sum of the features, so it cannot capture non-linearities unless they are specified in advance • Polynomial regression could control for this, but you would have to make assumptions about the model structure • Neural networks can model non-linearities automatically • They handle continuous and categorical variables together easily • Far better out-of-sample prediction accuracy!
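
The contrast between plain OLS and polynomial regression can be made concrete with a small NumPy sketch on made-up data with a U-shaped relationship. The data, noise level and choice of a squared term are assumptions for illustration; the point is that the polynomial fit only works because the analyst decided in advance to include `x ** 2`.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ intercept + X @ beta."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 201)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)  # hypothetical U-shaped relationship

# Plain OLS: a single weighted sum of the raw feature misses the curvature.
X_lin = x.reshape(-1, 1)
r2_linear = r_squared(y, ols_predict(ols_fit(X_lin, y), X_lin))

# Polynomial regression: works, but only because we chose to add x**2 ourselves.
X_poly = np.column_stack([x, x ** 2])
r2_poly = r_squared(y, ols_predict(ols_fit(X_poly, y), X_poly))
```

Here `r2_linear` is close to zero while `r2_poly` is high; a neural network would have to discover the curvature from the data instead of being told its form.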

  10. Disadvantages • A network of units connected by weighted links is difficult for humans to interpret • There can be more than one locally optimal solution • Requires tuning various parameters, such as the number of hidden neurons, the number of layers and the number of iterations to solve • More sensitive to how features are scaled
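
Sensitivity to feature scaling is usually handled by standardising each feature before training. The sketch below assumes z-score standardisation (one common choice among several) and uses hypothetical distance and GDP-per-capita values; the key detail is that the means and standard deviations come from the training data only and are then reused on the test data.

```python
import numpy as np


def fit_standardizer(X_train):
    """Column means and standard deviations, estimated on the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: leave unscaled rather than divide by zero
    return mean, std


def standardize(X, mean, std):
    """Rescale each column to zero mean and unit variance using training statistics."""
    return (X - mean) / std


# Hypothetical features on very different scales: distance (km), GDP per capita.
X_train = np.array([[1000.0, 20000.0], [5000.0, 40000.0], [9000.0, 60000.0]])
X_test = np.array([[3000.0, 30000.0]])

mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_test = standardize(X_test, mean, std)  # reuse training statistics: no test-set leakage
```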

  11. Practical example – Trade data

  12. Validation – using a train/test split (90% / 10%) [Figure panels: training dataset; test dataset; axis: Log_distance]
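
A random train/test split like the one on this slide can be sketched with NumPy. The function name, seed and row count are illustrative assumptions; the same helper covers the 50% split used later in the presentation.

```python
import numpy as np


def train_test_split_indices(n, test_share=0.1, seed=0):
    """Randomly partition row indices 0..n-1 into a training set and a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)              # shuffled copy of 0..n-1
    n_test = int(round(n * test_share))
    return order[n_test:], order[:n_test]  # (train indices, test indices)


train_idx, test_idx = train_test_split_indices(200, test_share=0.1)    # 90% / 10%
half_train, half_test = train_test_split_indices(200, test_share=0.5)  # 50% / 50%
```

In scikit-learn, `train_test_split(X, y, test_size=0.1, random_state=0)` from `sklearn.model_selection` does the same job directly on arrays.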

  13. OLS, training dataset, R^2 = 0.307 • Variables used: distance from the UK, GDP per capita and size of country [Figure panels: training dataset; filled values on training dataset; axis: Log_distance]

  14. OLS, test dataset, R^2 = 0.172 • Variables used: distance from the UK, GDP per capita and size of country [Figure panels: test dataset; filled values on test dataset; axis: Log_distance]
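
The R^2 values on these slides are presumably the usual coefficient of determination. Assuming so, the test-set figure compares the model's squared errors on the held-out observations with those of simply predicting the test-set mean $\bar{y}_{\text{test}}$:

```latex
R^2_{\text{test}} = 1 - \frac{\sum_{i \in \text{test}} \left(y_i - \hat{y}_i\right)^2}{\sum_{i \in \text{test}} \left(y_i - \bar{y}_{\text{test}}\right)^2}
```

Because the model is not fitted to the test observations, this value can be well below the training R^2 (here 0.172 against 0.307) and can even be negative.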

  15. Machine learning libraries – Python • TensorFlow: maintained by Google; open source; wider breadth of tools • Scikit-learn: machine learning Python library • Network used: 100 hidden layers; logistic activation function
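
As a hedged sketch, a network matching this description can be set up with scikit-learn's `MLPRegressor` using `activation="logistic"`. We assume "100 hidden layers" means one hidden layer of 100 neurons, `hidden_layer_sizes=(100,)` (also scikit-learn's default); 100 separate layers would instead need a 100-element tuple. The synthetic data, the `StandardScaler` step, `max_iter` and `random_state` are illustrative choices, not taken from the presentation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: three log-scale features (e.g. distance, GDP per
# capita, country size) and a log export value. Replace with the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 1.0 - 0.8 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * np.tanh(X[:, 2]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = make_pipeline(
    StandardScaler(),  # neural networks are sensitive to feature scaling
    MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                 max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R^2 on the held-out 10%
```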

  16. DNN, training dataset, R^2 = 0.993 • Variables: distance from the UK, GDP per capita and country size • Network used: 100 hidden layers; logistic activation function [Figure panels: training dataset; filled values on training dataset; axis: Log_distance]

  17. DNN, test dataset, R^2 = 0.994 • Variables: distance from the UK, GDP per capita and country size • Network used: 100 hidden layers; logistic activation function [Figure panels: test dataset; filled values on test dataset; axis: Log_distance]

  18. Practical example – larger-scale dataset: full country-by-commodity dataset (280,000 samples) [Figure: scatter plot of UK exports, country by commodity (x-axis = world import demand for the commodity, Log_world_im; y-axis = export value, Log_value; colour = country)]

  19. Practical example – larger-scale dataset: full country-by-commodity dataset (280,000 samples) • Example commodities: parrots, parakeets, macaws and cockatoos; throat pastilles and cough drops; wooden furniture for bedrooms (excl. seats); brass wind instruments • Commodities × all countries = 290,000 combinations • New variables: world import demand for the commodity; tariff on exporting a specific commodity to a particular country
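
Building the country-by-commodity dataset amounts to taking every (country, commodity) pair and attaching country features, commodity features and the pair-specific tariff. The minimal sketch below uses hypothetical country codes, feature names and numbers; pairs with no known tariff are left as `None` rather than guessed.

```python
from itertools import product

# Hypothetical inputs: per-country features, per-commodity features, and
# tariffs keyed by (country, commodity). Names and numbers are illustrative only.
countries = {
    "A": {"log_distance": 6.2, "log_gdp_per_capita": 10.1},
    "B": {"log_distance": 8.9, "log_gdp_per_capita": 8.4},
}
commodities = {
    "parrots": {"log_world_import_demand": 3.1},
    "brass_instruments": {"log_world_import_demand": 4.7},
    "throat_pastilles": {"log_world_import_demand": 5.0},
}
tariffs = {("A", "parrots"): 0.0, ("B", "brass_instruments"): 0.05}


def build_panel(countries, commodities, tariffs, default_tariff=None):
    """One row per (country, commodity) pair, combining both sets of features."""
    rows = []
    for country, commodity in product(countries, commodities):
        row = {"country": country, "commodity": commodity}
        row.update(countries[country])
        row.update(commodities[commodity])
        row["tariff"] = tariffs.get((country, commodity), default_tariff)
        rows.append(row)
    return rows


panel = build_panel(countries, commodities, tariffs)
```

With 2 countries and 3 commodities this yields 6 rows; the same Cartesian product over all commodities and all countries produces the combinations counted on the slide.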

  20. OLS, test dataset, R^2 = 0.170 (50% train/test split) • Variables used: distance from the UK; GDP per capita; size of country; import demand for the commodity; tariff rate on the commodity [Figure: filled values on test dataset; axis: Log_predicted value]

  21. DNN, test dataset, R^2 = 0.991 (50% train/test split) • Variables used: distance from the UK; GDP per capita; size of country; import demand for the commodity; tariff rate on the commodity [Figure: filled values on test dataset; axis: Log_predicted value]

  22. Outliers from Deep Neural Network
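
The slide does not say how its outliers were identified. One common approach, assumed here rather than taken from the presentation, is to rank records by the gap between the actual value and the network's prediction. The sketch below uses hypothetical labels and log values.

```python
import numpy as np


def largest_residuals(y_true, y_pred, labels, n=5):
    """Return the n records whose actual value is furthest from the model's prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred                  # positive: actual exceeds prediction
    order = np.argsort(-np.abs(resid))[:n]   # largest absolute residual first
    return [(labels[i], float(resid[i])) for i in order]


# Hypothetical log export values and predictions for four country/commodity pairs.
y_true = [5.0, 6.0, 7.5, 3.0]
y_pred = [5.1, 5.9, 4.5, 3.2]
labels = ["A/parrots", "B/parrots", "A/brass", "B/brass"]
top = largest_residuals(y_true, y_pred, labels, n=2)
```

When the model fits the data as closely as these R^2 values suggest, the records it misses by most are natural candidates for checking, whether as data errors or as genuinely unusual trade flows.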

  23. Machine learning and causality • Coefficients (weights) on specific neurons are difficult to interpret • In addition, individual neurons may not behave in any way one might expect • However, new tools are being developed to enforce constraints on neural networks (e.g. monotonicity conditions) [Figure: estimated values of trade with a fake set of countries (x-axis = predicted value of exports (£); y-axis = distance from the UK (natural log))]
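
The "fake set of countries" idea can be turned into a simple post-hoc check: hold all features fixed except distance, sweep distance over a grid, and test whether predicted exports ever rise. This is a diagnostic sketch, not one of the constraint-enforcing tools the slide mentions; the two toy models, feature names and grid are hypothetical.

```python
import math


def check_monotone_decreasing(predict, base_features, feature, grid, tol=0.0):
    """Check that predictions never rise as `feature` increases, others held fixed.

    Each grid value defines a synthetic ("fake") country; `predict` is any
    callable mapping a feature dict to a predicted value.
    """
    preds = []
    for value in sorted(grid):
        fake_country = dict(base_features, **{feature: value})
        preds.append(predict(fake_country))
    ok = all(later <= earlier + tol for earlier, later in zip(preds, preds[1:]))
    return ok, preds


base = {"log_distance": 7.0, "log_gdp_per_capita": 9.5}
grid = [6.0 + 0.5 * i for i in range(7)]  # log distances 6.0 to 9.0


def smooth_model(f):
    # Hypothetical gravity-style predictor: exports fall with distance.
    return 2.0 - 1.0 * f["log_distance"] + 0.5 * f["log_gdp_per_capita"]


def wiggly_model(f):
    # Hypothetical predictor with a spurious oscillation in distance.
    return -f["log_distance"] + 0.8 * math.sin(3.0 * f["log_distance"])


ok_smooth, _ = check_monotone_decreasing(smooth_model, base, "log_distance", grid)
ok_wiggly, _ = check_monotone_decreasing(wiggly_model, base, "log_distance", grid)
```

The smooth model passes, while the wiggly one fails because its predictions rise between some grid points, the kind of behaviour that would be hard to defend economically.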

  24. Machine learning and causality • While DNNs have excellent predictive power, this can partly be a consequence of capturing spurious relationships, so causal relationships cannot be inferred from the results • The machine learning community is also investigating how such results can be used to infer causal relationships

  25. Conclusion
