
Theory of Differentiation in Statistics




  1. Theory of Differentiation in Statistics Mohammed Nasser Department of Statistics

  2. Relation between Statistics and Differentiation

  3. Relation between Statistics and Differentiation

  4. Relation between Statistics and Differentiation

  5. Monotone Function A function f(x) is monotone increasing (non-decreasing, or strictly increasing) or monotone decreasing (non-increasing, or strictly decreasing).

  6. Increasing/Decreasing Test If f'(x) > 0 for all x in an interval, then f is increasing on that interval; if f'(x) < 0 for all x in an interval, then f is decreasing on it.
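
The test above can be checked numerically; a minimal sketch in Python (the function name `is_increasing_on` and the central-difference step h are illustrative choices, not from the slides):

```python
import math

def is_increasing_on(f, a, b, n=1000, h=1e-6):
    """Check the increasing test f'(x) > 0 on a grid over (a, b),
    using a central-difference estimate of f'."""
    xs = [a + (b - a) * i / n for i in range(1, n)]
    return all((f(x + h) - f(x - h)) / (2 * h) > 0 for x in xs)

# exp is strictly increasing on R; exp(-x) is strictly decreasing.
print(is_increasing_on(math.exp, -5, 5))                # True
print(is_increasing_on(lambda x: math.exp(-x), -5, 5))  # False
```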

  7. Example of a Monotone Increasing Function For instance f(x) = x^3: f'(x) = 3x^2 >= 0, vanishing only at x = 0, so f is strictly increasing on R.

  8. Maximum/Minimum On an interval [a, b]: is there any sufficient condition that guarantees the existence of a global max, a global min, or both?

  9. Some Results to Mention If a function is continuous and its domain is compact, the function attains its extrema. This is a very general result: it holds on any compact topological space, not only on compact subsets of Rn. Any convex (concave) function attains its global min (max). Some functions have a global min (max) without satisfying any of the above conditions. Calculation of an extremum: first prove that an extremum exists, then locate it.

  10. What Does f' Say about f? Fermat's Theorem: if f has a local maximum or minimum at c, and if f'(c) exists, then f'(c) = 0. The converse is not true (e.g. f(x) = x^3 at c = 0).

  11. Concavity • If f''(x) > 0 for all x in (a, b), then the graph of f is convex (concave upward) on (a, b). • If f''(x) < 0 for all x in (a, b), then the graph of f is concave (concave downward) on (a, b). • If f''(c) = 0 and f'' changes sign at c, then f has a point of inflection at c.

  12. Maximum/Minimum Let f(x) be a differentiable function on an interval I. • f has a maximum at c if f'(c) = 0 and f''(c) < 0. • f has a minimum at c if f'(c) = 0 and f''(c) > 0. • If f'(x) < 0 for all x in an interval, then f is maximum at the first end point of the interval if the left side is closed, and minimum at the last end point if the right side is closed. • If f'(x) > 0 for all x in an interval, then f is minimum at the first end point of the interval if the left side is closed, and maximum at the last end point if the right side is closed.

  13. Normal Distribution The probability density function is given as f(x) = (1/(σ√(2π))) exp(−(x − μ)^2/(2σ^2)), −∞ < x < ∞. • f is continuous on R • f(x) >= 0 • f is differentiable on R • The graph is concave between the points of inflection μ − σ and μ + σ, and convex outside them.

  14. Normal Distribution Now take logs on both sides: log f(x) = −log(σ√(2π)) − (x − μ)^2/(2σ^2). Setting the first derivative equal to zero: d/dx log f(x) = −(x − μ)/σ^2 = 0, which gives x = μ.

  15. Normal Distribution Therefore f is maximum at x = μ, with maximum value f(μ) = 1/(σ√(2π)).

  16. Normal Distribution Setting the 2nd derivative equal to zero gives (x − μ)^2 = σ^2, i.e. x = μ ± σ. Therefore f has points of inflection at x = μ − σ and x = μ + σ.
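
Both facts (maximum at μ, inflection at μ ± σ) can be verified numerically; a small sketch, with μ and σ picked arbitrarily:

```python
import math

def npdf(x, mu=0.0, sigma=1.0):
    """Normal probability density function."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def d2(f, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

mu, sigma = 2.0, 1.5
f = lambda x: npdf(x, mu, sigma)
# maximum at x = mu
print(f(mu) > f(mu + 0.1) and f(mu) > f(mu - 0.1))       # True
# f'' changes sign at mu + sigma (point of inflection)
print(d2(f, mu + sigma - 0.1) < 0 < d2(f, mu + sigma + 0.1))  # True
```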

  17. Logistic Distribution The distribution function is defined as F(x) = 1/(1 + e^(−x)), −∞ < x < ∞; its graph is convex for x < 0 and concave for x > 0.

  18. Logistic Distribution Taking the first derivative with respect to x: F'(x) = e^(−x)/(1 + e^(−x))^2 > 0, therefore F is strictly increasing. Taking the 2nd derivative and setting it equal to zero: F''(x) = F'(x)(1 − 2F(x)) = 0 at x = 0, therefore F has a point of inflection at x = 0.
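
A numerical check of both claims (F strictly increasing, inflection at 0), assuming the standard logistic F(x) = 1/(1 + e^(−x)):

```python
import math

F = lambda x: 1.0 / (1.0 + math.exp(-x))  # standard logistic distribution function

h = 1e-4
d1 = lambda x: (F(x + h) - F(x - h)) / (2 * h)          # F'
d2 = lambda x: (F(x + h) - 2 * F(x) + F(x - h)) / h ** 2  # F''

# F is strictly increasing: F'(x) > 0 everywhere on the grid
print(all(d1(x) > 0 for x in range(-5, 6)))   # True
# point of inflection at x = 0: F'' changes sign there
print(d2(-0.5) > 0 > d2(0.5))                 # True
```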

  19. Logistic Distribution Now we note that F has no maximum and no minimum: 0 < F(x) < 1 for all x, with F(x) → 0 as x → −∞ and F(x) → 1 as x → ∞, and neither limit is attained. Since F''(x) > 0 for x < 0 and F''(x) < 0 for x > 0, F is convex on (−∞, 0) and concave on (0, ∞).

  20. Variance of a Function of a Poisson Variate Using Taylor's Theorem We know that if X ~ Poisson(λ), then E(X) = Var(X) = λ. We are interested in finding the variance of a smooth function g(X).

  21. Variance of a Function of a Poisson Variate Using Taylor's Theorem The first-order Taylor expansion about λ is g(X) ≈ g(λ) + (X − λ)g'(λ). Therefore the variance of g(X) is approximately Var(g(X)) ≈ [g'(λ)]^2 Var(X) = [g'(λ)]^2 λ.
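
The approximation can be tested by simulation. The sketch below uses g(x) = √x as an illustrative choice (the slides do not specify g), for which [g'(λ)]^2 λ = 1/4 for every λ; the Poisson sampler is Knuth's classical method:

```python
import math, random

random.seed(0)

def rpois(lam):
    """Poisson sampler (Knuth's method); adequate for moderate lambda."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam = 50.0                                  # large lambda: first-order Taylor is accurate
g, g_prime = math.sqrt, lambda x: 0.5 / math.sqrt(x)

sample = [g(rpois(lam)) for _ in range(20_000)]
mean = sum(sample) / len(sample)
var = sum((s - mean) ** 2 for s in sample) / (len(sample) - 1)

delta_var = g_prime(lam) ** 2 * lam         # = 1/4 for g = sqrt, any lambda
print(round(var, 2), delta_var)             # empirical variance is close to 0.25
```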

  22. Risk Functional Risk functional: RL,P(g) = E(x,y)~P L(y, g(x)). Population regression functional/classifier: g* = argmin over g of RL,P(g). • P is chosen by nature, L is chosen by the scientist • Both RL,P(g*) and g* are unknown • From a sample D, we select gD by a learning method

  23. Empirical Risk Minimization Empirical risk functional: RL,D(g) = (1/n) Σ L(yi, g(xi)). • Problems of empirical risk minimization: minimizing over too rich a class of functions can drive the empirical risk to zero while generalizing poorly (overfitting).
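
A minimal sketch of the empirical risk functional itself (the loss, the predictor, and the toy sample are illustrative):

```python
def empirical_risk(loss, g, data):
    """R_emp(g) = (1/n) * sum of L(y_i, g(x_i)) over the sample."""
    return sum(loss(y, g(x)) for x, y in data) / len(data)

squared = lambda y, t: (y - t) ** 2                   # quadratic loss L(y, t)
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2)]           # toy sample (x_i, y_i)
print(round(empirical_risk(squared, lambda x: x, data), 6))  # 0.02
```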

  24. What Can We Do? • We can restrict the set of functions over which we minimize the empirical risk functional • We can modify the criterion to be minimized (e.g. by adding a penalty for 'complicated' functions). We can also combine the two. Regularization; Structural Risk Minimization.

  25. Regularized Error Function In linear regression, we minimize the error function (1/2) Σn (yn − f(xn))^2 + (λ/2)||w||^2. Replace the quadratic error function by the ε-insensitive error function. An example of an ε-insensitive error function: |y − f(x)|ε = max(0, |y − f(x)| − ε), which is zero inside the tube |y − f(x)| <= ε and linear outside it.
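
The ε-insensitive error function is a one-liner; values inside the tube incur no penalty:

```python
def eps_insensitive(y, t, eps=0.5):
    """|y - t|_eps = max(0, |y - t| - eps): zero inside the tube, linear outside."""
    return max(0.0, abs(y - t) - eps)

print(eps_insensitive(1.0, 1.3))   # inside the tube -> 0.0
print(eps_insensitive(1.0, 2.0))   # outside the tube -> 0.5
```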

  26. Linear SVR: Derivation Meaning of equation 3

  27. Linear SVR: Derivation The objective trades off complexity (the ||w||^2 term) against the sum of the errors; Case I and Case II in the original figures contrast "tube" width with complexity.

  28. Linear SVR: Derivation • The role of C: C controls the trade-off between the two terms. When C is big, errors outside the tube are penalized heavily and the fit follows the data closely; when C is small, the complexity term dominates and the solution is flatter. Case I and Case II again contrast "tube" width with complexity.

  29. Linear SVR: Derivation Minimize (1/2)||w||^2 + C Σn (ξn + ξn*) Subject to: yn − ⟨w, xn⟩ − b <= ε + ξn, ⟨w, xn⟩ + b − yn <= ε + ξn*, ξn >= 0, ξn* >= 0.

  30. Lagrangian Minimize the Lagrangian L of the primal problem, with dual variables αn, αn*, μn, μn* >= 0. Stationarity in w gives w = Σn (αn − αn*) xn, so f(x) = ⟨w, x⟩ + b = Σn (αn − αn*)⟨xn, x⟩ + b.

  31. Dual Form of Lagrangian Maximize: −(1/2) Σn Σm (αn − αn*)(αm − αm*)⟨xn, xm⟩ − ε Σn (αn + αn*) + Σn yn (αn − αn*), subject to 0 <= αn, αn* <= C and Σn (αn − αn*) = 0. Prediction can be made using f(x) = Σn (αn − αn*)⟨xn, x⟩ + b.

  32. How to determine b? The Karush-Kuhn-Tucker (KKT) conditions imply, at the optimal solution, αn(ε + ξn − yn + ⟨w, xn⟩ + b) = 0 and αn*(ε + ξn* + yn − ⟨w, xn⟩ − b) = 0. For any point with 0 < αn < C we have ξn = 0, hence b = yn − ⟨w, xn⟩ − ε. Support vectors are points that lie on the boundary of the tube or outside it. These equations imply many important things.
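
As an illustration of the primal problem on slides 25–32, the sketch below fits a linear SVR to toy 1-D data by subgradient descent on (1/2)w^2 + C Σ|y − wx − b|ε, rather than by solving the dual QP; all constants (ε, C, the learning rate, the data) are illustrative:

```python
import random

random.seed(1)
# toy 1-D data around y = 2x + 1 with small noise
data = [(x / 10.0, 2 * (x / 10.0) + 1 + random.uniform(-0.05, 0.05))
        for x in range(-20, 21)]

eps, C, lr = 0.1, 10.0, 0.01
w, b = 0.0, 0.0
for _ in range(2000):
    gw, gb = w, 0.0                      # gradient of the (1/2)w^2 term
    for x, y in data:
        r = y - (w * x + b)
        if r > eps:                      # point above the tube: push prediction up
            gw += -C * x; gb += -C
        elif r < -eps:                   # point below the tube: push prediction down
            gw += C * x; gb += C
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

print(round(w, 1), round(b, 1))          # close to the true slope 2 and intercept 1
```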

  33. Important Interpretations

  34. Support Vectors: The Sparsity of the SV Expansion The KKT conditions also give αn αn* = 0, and every point strictly inside the tube has αn = αn* = 0; only the support vectors enter the expansion of w.

  35. Dual Form of Lagrangian (Nonlinear Case) Maximize the same dual with the inner product ⟨xn, xm⟩ replaced by a kernel k(xn, xm). Prediction can be made using f(x) = Σn (αn − αn*) k(xn, x) + b.
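
A kernel-substitution sketch: an RBF kernel and its Gram matrix over a toy sample (the kernel choice and γ are illustrative):

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# replacing <x_n, x_m> by k(x_n, x_m) turns the linear dual into the nonlinear one
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = [[rbf(u, v) for v in X] for u in X]   # Gram matrix: symmetric, ones on the diagonal
print(K[0][0], round(K[0][1], 4))         # 1.0 0.3679
```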

  36. Non-linear SVR: derivation Subject to:

  37. Non-linear SVR: Derivation Subject to the same constraints, a saddle point of L has to be found: minimize with respect to the primal variables (w, b, ξn, ξn*), maximize with respect to the dual variables (αn, αn*, μn, μn*).

  38. Non-linear SVR: derivation ...

  39. What is Differentiation? f, a nonlinear function from U, a Banach space, to V, another Banach space. Differentiation is nothing but local linearization: in differentiation we approximate a nonlinear function locally by a (continuous) linear function.

  40. Fréchet Derivative Definition 1 (the familiar derivative on Rn) can easily be generalized to Banach-space-valued functions f: B1 → B2, where df(x) is a linear map. It can be shown that a linear map between infinite-dimensional spaces is not always continuous.

  41. Fréchet Derivative We have just mentioned that Fréchet recognized that Definition 1 could easily be generalized to normed spaces in the following way: df(x) ∈ L(B1, B2), where L(B1, B2) is the set of all continuous linear maps between B1 and B2, and ||f(x + h) − f(x) − df(x)(h)|| / ||h|| → 0 as ||h|| → 0. (2) If we write Rem(x + h) for the remainder of f at x + h: Rem(x + h) = f(x + h) − f(x) − df(x)(h).

  42. S-Derivative Then (2) becomes ||Rem(x + h)|| / ||h|| → 0 as ||h|| → 0. Soon the definition was generalized (S-differentiation) to general topological vector spaces in such a way that: (i) a particular case of the definition becomes equivalent to the previous definition when the domain of f is a normed space; (ii) the Gâteaux derivative remains the weakest derivative among all types of S-differentiation.

  43. S-Derivatives Definition 2 Let S be a collection of subsets of B1, and let t ∈ R. Then f is S-differentiable at x with derivative df(x) if [f(x + th) − f(x)]/t → df(x)(h) as t → 0, uniformly for h in each S ∈ S. Definition 3 When S = all singletons of B1, f is called Gâteaux differentiable with Gâteaux derivative df(x). When S = all compact subsets of B1, f is called Hadamard or compactly differentiable with Hadamard or compact derivative. When S = all bounded subsets of B1, f is called Fréchet or boundedly differentiable with Fréchet or bounded derivative.
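
The Gâteaux derivative is exactly the directional derivative of t ↦ f(x + th) at t = 0, which is easy to approximate numerically; the function f and the directions below are illustrative:

```python
def gateaux(f, x, h, t=1e-6):
    """Directional (Gateaux) derivative df(x)(h) ~ [f(x + t h) - f(x)] / t."""
    xt = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(xt) - f(x)) / t

f = lambda v: v[0] ** 2 + 3 * v[1]        # f(x, y) = x^2 + 3y on R^2
print(round(gateaux(f, [1.0, 0.0], [1.0, 0.0]), 4))  # direction e1 at (1,0) -> 2.0
print(round(gateaux(f, [1.0, 0.0], [0.0, 1.0]), 4))  # direction e2 at (1,0) -> 3.0
```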

  44. Equivalent Definitions of the Fréchet Derivative (a) For each bounded set B in B1, [f(x + th) − f(x)]/t → df(x)(h) as t → 0 in R, uniformly in h ∈ B. (b) For each bounded sequence {hn} and each real sequence tn → 0, [f(x + tn hn) − f(x)]/tn − df(x)(hn) → 0.

  45. Three further equivalent forms, (c), (d) and (e), were given on this slide, with (d) and (e) holding uniformly in their respective index sets. Statisticians generally use the last form, or some slight modification of it.

  46. Relations among Usual Forms of the Definitions • The set of Gâteaux differentiable functions at x contains the set of Hadamard differentiable functions at x, which contains the set of Fréchet differentiable functions at x. • In applications, to find a Fréchet or Hadamard derivative one generally first determines the form of the derivative by computing the Gâteaux derivative df(x)(h) for a collection of directions h that span B1. This reduces to computing the ordinary derivative (with respect to t ∈ R) of the mapping t ↦ f(x + th), which is closely related to the influence function, one of the central concepts in robust statistics. • It can easily be shown that when B1 = R with the usual norm, the three derivatives coincide, and when B1 is a finite-dimensional Banach space, the Fréchet and Hadamard derivatives are equal; these two then coincide with the familiar total derivative.
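
The connection to the influence function can be sketched numerically: take T to be the mean functional, represent F as a discrete (empirical) distribution, and compute the Gâteaux derivative of T at F in the direction of a point mass at x. For the mean, IF(x) = x − T(F); the sample below is illustrative:

```python
def T(weighted_points):
    """Mean functional T(F) = sum of w_i * x_i for a discrete distribution F."""
    return sum(w * x for w, x in weighted_points)

sample = [1.0, 2.0, 3.0, 6.0]
F = [(1 / len(sample), x) for x in sample]     # empirical distribution, mean = 3.0

def influence(T, F, x, t=1e-6):
    """IF(x) ~ [T((1 - t) F + t delta_x) - T(F)] / t  (Gateaux derivative at F)."""
    mixed = [(w * (1 - t), xi) for w, xi in F] + [(t, x)]
    return (T(mixed) - T(F)) / t

print(round(influence(T, F, 10.0), 4))   # IF(10) = 10 - 3 = 7.0
```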

  47. Properties of Fréchet derivative • Hadamard diff. implies continuity but Gâteaux does not. • Hadamard diff. satisfies chain rule but Gâteaux does not. • Meaningful Mean Value Theorem, Inverse Function Theorem, Taylor’s Theorem and Implicit Function Theorem have been proved for Fréchet derivative

  48. Lebesgue Counting

  49. Mathematical Foundations of Robust Statistics Continuity of the statistical functional T: if d1(F, G) < δ then d2(T(F), T(G)) < ε. Differentiability gives the von Mises expansion T(G) ≈ T(F) + dT(F)(G − F), i.e. (T(G) − T(F)) ≈ ∫ IF(x; T, F) d(G − F)(x).
