From Data to Differential Equations

Jim Ramsay, McGill University. With inspirations from Paul Speckman and Chong Gu.

The themes • Differential equations are powerful tools for modeling data. • We have new methods for estimating differential equations directly from data. • Some examples are offered, drawn from chemical engineering and medicine.

Differential Equations as Models • DIFE’s make explicit the relation between one or more derivatives and the function itself. • An example is the harmonic motion equation: $D^2 x(t) = -\omega^2 x(t)$.

Why Differential Equations? • The behavior of a derivative is often of more interest than the function itself, especially over short and medium time periods. • How rapidly a system responds rather than its level of response is often what matters. • Velocity and acceleration can reflect energy exchange within a system. Recall equations like f = ma and e = mc².

Natural scientists often provide theory to biologists and engineers in the form of DIFE’s. • Many fields such as pharmacokinetics and industrial process control routinely use DIFE’s as models. • Especially for input/output systems, and for systems with two or more functional variables mutually influencing each other. • DIFE’s arise when feedback systems must be developed to control the behavior of systems.

The solutions to an mth order linear homogeneous DIFE form an m-dimensional function space, and thus the equation can model variation over replications as well as average behavior. • A DIFE requires that derivatives behave smoothly, since they are linked to the function itself. • Nonlinear DIFE’s can provide compact and elegant models for systems exhibiting exceedingly complex behavior.

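The Rössler Equations This nearly linear system exhibits chaotic behavior that would be virtually impossible to model without using a DIFE: $Dx = -y - z$, $Dy = x + ay$, $Dz = b + z(x - c)$. The only nonlinear term is the product $zx$ in the third equation.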
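A minimal sketch of that chaotic behavior, assuming the standard form of the Rössler system (Dx = -y - z, Dy = x + ay, Dz = b + z(x - c)). The parameter values a = b = 0.2, c = 5.7, the starting point, and the hand-written Runge-Kutta integrator are illustrative assumptions, not taken from the slides. Two trajectories that start a distance 1e-6 apart are integrated and their separation is tracked.

```python
import numpy as np

def rossler(state, a=0.2, b=0.2, c=5.7):
    """Right-hand side of the Rossler system (parameter values are assumptions)."""
    x, y, z = state
    return np.array([-y - z, x + a * y, b + z * (x - c)])

def rk4(f, state0, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration of D(state) = f(state)."""
    path = np.empty((n_steps + 1, len(state0)))
    path[0] = state0
    for i in range(n_steps):
        s = path[i]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        path[i + 1] = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return path

# Two starting points that differ by 1e-6 in the first coordinate.
path_a = rk4(rossler, np.array([1.0, 1.0, 0.0]), dt=0.01, n_steps=20000)
path_b = rk4(rossler, np.array([1.0 + 1e-6, 1.0, 0.0]), dt=0.01, n_steps=20000)

# Distance between the two trajectories at every time step.
separation = np.linalg.norm(path_a - path_b, axis=1)
print("initial separation:", separation[0])
print("largest separation:", separation.max())
```

The trajectories stay bounded while the gap between them grows by orders of magnitude, which is the sensitivity to initial conditions that makes such data hard to describe without the equations themselves.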

Stochastic DIFE’s We can introduce randomness into DIFE’s in many ways: • Random coefficient functions. • Random forcing functions. • Random initial, boundary, and other constraints. • Time unfolding at a random rate.

Deliverables • If we can model data on functions or functional input/output systems, we will have a modeling tool that greatly extends the power and scope of existing nonparametric curve-fitting techniques. • We may also get better estimates of functional parameters and their derivatives.

A simple input/output system • We begin by looking at a first order DIFE for a single output function x(t) and a single input function u(t) (SISO). • But our goal is the linking of multiple inputs to multiple outputs (MIMO) by linear or nonlinear systems of arbitrary order m.

The first order linear DIFE is $Dx(t) = -\beta(t)x(t) + \alpha(t)u(t)$. • u(t) is often called the forcing function, and is an exogenous functional independent variable. • Dx(t) = -β(t)x(t) is called the homogeneous part of the equation. • α(t) and β(t) are the coefficient functions that define the DIFE. • The system is linear in these coefficient functions, and in the input u(t) and output x(t).

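In this simple case, an analytic solution of Dx(t) = -β(t)x(t) + α(t)u(t) is possible: $x(t) = e^{-B(t)}\left[x(0) + \int_0^t e^{B(s)}\alpha(s)u(s)\,ds\right]$, where $B(t) = \int_0^t \beta(s)\,ds$. However, it is necessary to use numerical methods to find the solution to most DIFE’s.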

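A simpler constant coefficient example We can see more clearly what happens when • the coefficients α and β are constants, • α = 1, x0 = 0, and • u(t) is a step function, stepping from 0 to 1 at time t1. The solution of Dx(t) = -βx(t) + αu(t) is then $x(t) = 0$ for $t < t_1$ and $x(t) = (\alpha/\beta)\left[1 - e^{-\beta(t - t_1)}\right]$ for $t \ge t_1$.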

Constant α/β is the gain in the system. • Constant β controls the responsivity of the system to a change in input.
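The roles of the gain and of β can be checked numerically. The sketch below assumes the first order form Dx(t) = -βx(t) + αu(t) with x(0) = 0 and α = 1 as on the slide; the values β = 2 and t1 = 1, the time range, and the forward Euler scheme are illustrative assumptions. It compares the numerical solution with the closed form (α/β)[1 - exp(-β(t - t1))] for t ≥ t1.

```python
import numpy as np

alpha, beta, t1 = 1.0, 2.0, 1.0      # beta and t1 are illustrative assumptions

def u(t):
    """Step input: 0 before t1, 1 from t1 onward."""
    return 1.0 if t >= t1 else 0.0

dt = 0.001
t = np.arange(0.0, 5.0 + dt / 2, dt)
x = np.zeros_like(t)                 # x(0) = 0
for i in range(len(t) - 1):
    # Forward Euler step for Dx = -beta * x + alpha * u
    x[i + 1] = x[i] + dt * (-beta * x[i] + alpha * u(t[i]))

# Closed-form step response for constant coefficients.
x_exact = np.where(t >= t1, (alpha / beta) * (1.0 - np.exp(-beta * (t - t1))), 0.0)

max_err = np.max(np.abs(x - x_exact))
final_level = x[-1]
print("max |Euler - exact|:", max_err)
print("final level:", final_level, "gain alpha/beta:", alpha / beta)
```

The output settles at α/β, and a larger β makes it get there faster, since the time constant of the exponential approach is 1/β.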

A Real Example: Lupus treatment • Lupus is an incurable auto-immune disease that mainly afflicts women. • It flares unpredictably, inflicting wide damage with severe symptoms. • The treatment is prednisone, an immune system suppressant used in transplants. • But prednisone has serious short- and long-term side effects, and exposure to it must be controlled.

How to Estimate a Differential Equation from Raw Data • A previous method, principal differential analysis, first smoothed the data to get functions x(t) and u(t), and then estimated the coefficient functions defining the DIFE. • This two-stage procedure is inelegant and probably inefficient. Going directly from data to DIFE would be better.

Profile Least Squares • The idea is to replace the function fitting the raw data, x(t), by the equations defining the fit to the data conditional on the DIFE. • Then we optimize the fit with respect to only the unknown parameters defining the DIFE itself. • The fit x(t) is defined as a by-product of the process, but does not itself require additional parameters.

This profiling process is often used in nonlinear least squares problems where some parameters are easily solved for given other parameters. • There we express the conditional estimates of these easy-to-estimate parameters as functions of the unknown hard-to-estimate parameters, and optimize only with respect to the hard parameters. • This saves both computational time and degrees of freedom. • An alternative strategy is to integrate over the easy parameters, and optimize with respect to the hard ones; this is the M-step in the EM algorithm.

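The DIFE as a linear differential operator We can re-express the first order DIFE as a linear differential operator: $Lx(t) = Dx(t) + \beta(t)x(t) - \alpha(t)u(t) = 0$. More compactly, suppressing “(t)”, and making explicit the dependency of L on α and β, $L_{\alpha\beta}x = Dx + \beta x - \alpha u = 0$.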

Smoothing data with the operator L If we know the differential equation, then the differential operator L defines a data smoother (Heckman and Ramsay, 2000). The fitting criterion is: $\sum_{i=1}^{N} [y_i - x(t_i)]^2 + \lambda \int [L_{\alpha\beta}x(t)]^2\,dt$. The larger λ is, the more the fitting function x(t) is forced to be a solution of the differential equation $L_{\alpha\beta}x(t) = 0$.

Let x(t) be expanded in terms of a set of K basis functions φk(t), $x(t) = \sum_{k=1}^{K} c_k\phi_k(t)$. • Let the N by K matrix Z contain the values of these basis functions at time points ti, and • Let y be the vector of raw data.

Then the smooth values have the expression Zc, where c is the vector of coefficients. • But these coefficients are easy parameters to estimate given operator Lαβ: the fitting criterion is quadratic in c, so the minimizing c has a closed-form expression. • We therefore remove parameter vector c by replacing it with that expression.
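A sketch of where such a closed form comes from, assuming the fitting criterion is the residual sum of squares plus λ times the integral of the squared value of the operator applied to x. The symbols $H$, $R$, $s$ and $F$ are introduced here for illustration only and do not come from the slides.

```latex
\begin{aligned}
x(t) &= c'\phi(t), \qquad H\phi(t) = D\phi(t) + \beta(t)\phi(t), \\
F(c) &= \|y - Zc\|^2 + \lambda \int \left[c'H\phi(t) - \alpha(t)u(t)\right]^2 dt \\
     &= \|y - Zc\|^2 + \lambda\left(c'Rc - 2c's\right) + \text{const}, \\
R &= \int H\phi(t)\,H\phi(t)'\,dt, \qquad s = \int H\phi(t)\,\alpha(t)u(t)\,dt, \\
\hat{c}(\alpha,\beta) &= \left(Z'Z + \lambda R\right)^{-1}\left(Z'y + \lambda s\right).
\end{aligned}
```

Setting the gradient of $F$ to zero gives $(Z'Z + \lambda R)c = Z'y + \lambda s$, which is the last line. With no forcing function, $s = 0$ and the expression reduces to the familiar penalized least squares solution.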

How to estimate L L is a function of weight coefficients α(t) and β(t). If these have basis function expansions with coefficient vectors a and b, then we can optimize the profiled error sum of squares, SSE(a,b), with respect to a and b.
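The profiling idea can be sketched in a few lines for the simplest case. Everything specific here is an assumption made for illustration: a homogeneous first order equation Dx = -βx with constant β, a small monomial basis in place of B-splines, a fixed λ, simulated data, and a grid search in place of a proper optimizer. For each candidate β the coefficients c are replaced by their conditional minimizer, and only the resulting error sum of squares is compared across β.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6                                   # number of basis functions (assumption)

def phi(t):
    """Monomial basis 1, t, ..., t^(K-1); shape (len(t), K)."""
    return np.vander(t, K, increasing=True)

def dphi(t):
    """First derivatives of the monomial basis; shape (len(t), K)."""
    out = np.zeros((t.size, K))
    for k in range(1, K):
        out[:, k] = k * t ** (k - 1)
    return out

# Simulated data from Dx = -beta * x with beta = 3 (no forcing function).
beta_true, sigma = 3.0, 0.05
t_obs = np.linspace(0.0, 1.0, 101)
y = 2.0 * np.exp(-beta_true * t_obs) + sigma * rng.standard_normal(t_obs.size)

# Quadrature grid and trapezoid weights for the integral of (L x)^2.
t_q = np.linspace(0.0, 1.0, 501)
w_q = np.full(t_q.size, t_q[1] - t_q[0])
w_q[[0, -1]] *= 0.5

Z = phi(t_obs)
Phi_q, dPhi_q = phi(t_q), dphi(t_q)

def profiled_sse(beta, lam):
    """Error sum of squares after replacing c by its minimizer given beta."""
    L_phi = dPhi_q + beta * Phi_q               # L applied to each basis function
    R = (L_phi * w_q[:, None]).T @ L_phi        # penalty matrix by quadrature
    c = np.linalg.solve(Z.T @ Z + lam * R, Z.T @ y)
    resid = y - Z @ c
    return resid @ resid, c

lam = 1e3
beta_grid = np.arange(0.0, 6.0001, 0.05)
sse = np.array([profiled_sse(b, lam)[0] for b in beta_grid])
beta_hat = beta_grid[np.argmin(sse)]
print("estimated beta:", beta_hat)
```

Note that c never appears as a free parameter in the outer search. The fitted curve for the chosen β is still available as a by-product, from the second return value of `profiled_sse`.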

It is also a simple matter to: • constrain some coefficient functions to be zero or a constant. • force some coefficient functions to be smooth, employing specific linear differential operators to smooth them towards specific target spaces. We do this by appending penalties to SSE(a,b), such as a multiple of $\int [M\beta(t)]^2\,dt$, where M is a linear differential operator for penalizing the roughness of β.

And more … This approach is easily generalizable to: • DIFE’s and differential operators of any order. • Multiple inputs uj(t) and outputs xi(t). • Replicated functional data. • Nonlinear DIFE’s and operators.

Adaptive smoothing We can also use this approach to have the level of smoothing vary. We modify the differential operator as follows: The exponent function κ(t) plays the role of a log λ that varies with t.

Choosing the smoothing parameter λ is always a delicate matter. • The right value of λ will be rather large if the data can be well-modeled by a low-order DIFE. • But it should not be so large as to smooth away additional functional variation that may be important. • Estimating λ by generalized cross-validation seems to work reasonably well, at least for providing a tentative value to be explored further.
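A minimal sketch of generalized cross-validation for this kind of smoother, using the usual definition GCV(λ) = n · SSE / (n - df)², where df is the trace of the matrix that maps the data to the fitted values. The setting is an assumption made for illustration: a known first order operator L = D + β with β = 3, a monomial basis, simulated data, and a coarse grid of λ values.

```python
import numpy as np

rng = np.random.default_rng(3)
K, beta, sigma = 6, 3.0, 0.05            # illustrative values (assumptions)

t_obs = np.linspace(0.0, 1.0, 101)
y = 2.0 * np.exp(-beta * t_obs) + sigma * rng.standard_normal(t_obs.size)

def phi(t):
    """Monomial basis 1, t, ..., t^(K-1)."""
    return np.vander(t, K, increasing=True)

def dphi(t):
    """First derivatives of the monomial basis."""
    out = np.zeros((t.size, K))
    for k in range(1, K):
        out[:, k] = k * t ** (k - 1)
    return out

# Penalty matrix for L = D + beta, by trapezoid quadrature.
t_q = np.linspace(0.0, 1.0, 501)
w_q = np.full(t_q.size, t_q[1] - t_q[0])
w_q[[0, -1]] *= 0.5
L_phi = dphi(t_q) + beta * phi(t_q)
R = (L_phi * w_q[:, None]).T @ L_phi
Z = phi(t_obs)

def gcv(lam):
    """Return (GCV score, effective degrees of freedom) for a given lambda."""
    n = y.size
    S = Z @ np.linalg.solve(Z.T @ Z + lam * R, Z.T)   # maps y to fitted values
    resid = y - S @ y
    df = np.trace(S)
    return n * (resid @ resid) / (n - df) ** 2, df

lams = 10.0 ** np.arange(-4, 7)
scores = np.array([gcv(lam)[0] for lam in lams])
lam_best = lams[np.argmin(scores)]
print("lambda with smallest GCV on the grid:", lam_best)
print("df at lambda = 0:", gcv(0.0)[1], " df at lambda = 1e3:", gcv(1e3)[1])
```

With no penalty the degrees of freedom equal the number of basis functions. As λ grows they fall towards the dimension of the solution space of the DIFE, here one, which is why a large λ is appropriate only when a low-order equation really does describe the data.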

A First Example • The first example simulates replicated data where the true curves are a set of tilted sinusoids. • The operator L is of order 4 with constant coefficients. • How precisely can we estimate these coefficients? • How accurately can we estimate the curves and first two derivatives?

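For replications i=1,…,N and time values j=1,…,n, let $y_{ij} = c_{i1} + c_{i2}t_j + c_{i3}\sin(6\pi t_j) + c_{i4}\cos(6\pi t_j) + \epsilon_{ij}$, where the cik’s and the εij’s are N(0,1); and t = 0(0.01)1. The functional variation satisfies the differential equation $D^4x(t) = -\beta_0(t)x(t) - \beta_1(t)Dx(t) - \beta_2(t)D^2x(t) - \beta_3(t)D^3x(t)$, where β0(t) = β1(t) = β3(t) = 0 and β2(t) = (6π)² = 355.3.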
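The simulation design can be sketched as follows, assuming the tilted sinusoid form implied by the stated coefficients: each curve is a combination of 1, t, sin(6πt) and cos(6πt), which are exactly the functions annihilated by D⁴ + (6π)²D². The random seed and variable names are illustrative assumptions. The last lines confirm, using analytic derivatives, that the noise-free curves satisfy the equation.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20                                   # replications, as in the slides
t = np.linspace(0.0, 1.0, 101)           # t = 0(0.01)1
omega = 6.0 * np.pi

# Basis of the solution space: 1, t, sin(6 pi t), cos(6 pi t).
basis = np.column_stack([np.ones_like(t), t, np.sin(omega * t), np.cos(omega * t)])

c = rng.standard_normal((N, 4))                      # c_ik ~ N(0, 1)
x_true = c @ basis.T                                 # noise-free curves, shape (N, 101)
y = x_true + rng.standard_normal(x_true.shape)       # add eps_ij ~ N(0, 1)

# Analytic derivatives: the constant and linear terms vanish after two derivatives.
periodic = c[:, 2:] @ basis[:, 2:].T                 # c_i3 sin + c_i4 cos
d2x = -omega**2 * periodic
d4x = omega**4 * periodic

beta2 = omega**2
residual = d4x + beta2 * d2x                         # D^4 x + beta_2 D^2 x
print("beta_2 =", round(beta2, 1))
print("largest |D^4 x + beta_2 D^2 x|:", np.abs(residual).max())
```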

For simulated data with N = 20 replications and constant bases for β0(t), …, β3(t), we get • L = D⁴: best results are for λ = 10⁻¹⁰ and the RIMSE’s for derivatives 0, 1 and 2 are 0.32, 9.3 and 315.6, respectively. • L estimated: best results are for λ = 10⁻⁵ and the RIMSE’s are 0.18, 2.8, and 49.3, respectively, giving precision ratios of 1.8, 3.3 and 6.4, respectively. • β2 was estimated as 353.6 whereas the true value was 355.3. • β3 was 0.1, with true value 0.0.

In addition to better curve estimates and much better derivative estimates, note that the derivative RMSE’s do not go wild at the end points, which is usually a serious problem with polynomial spline smoothing. • This is because the DIFE ties the derivatives to the function values, and the function values are tamed at the end points by the data.

A decaying harmonic with a forcing function Data from a second order equation defining harmonic behavior with decay, forced by a step function, is generated by • β0 = 4.04, β1 = 0.4, α = -2.0. • u(t) = 0, t < 2π, u(t) = 1, t ≥ 2π. • Adding noise with std. dev. 0.2.
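A sketch of how such data could be generated, assuming the second order form D²x = -β0 x - β1 Dx + αu with the coefficient values from the slide. The initial conditions x(0) = 1, Dx(0) = 0, the time range, the sampling grid, and the Runge-Kutta integrator are assumptions made for illustration. Before the step the solution is checked against the closed form exp(-0.2t)[cos(2t) + 0.1 sin(2t)], which follows from the characteristic roots -0.2 ± 2i.

```python
import numpy as np

beta0, beta1, alpha = 4.04, 0.4, -2.0    # coefficient values from the slide
t_step = 2.0 * np.pi                     # the input steps from 0 to 1 here

def u(t):
    return 1.0 if t >= t_step else 0.0

def rhs(t, s):
    """First order system for (x, Dx) equivalent to the second order DIFE."""
    x, dx = s
    return np.array([dx, -beta0 * x - beta1 * dx + alpha * u(t)])

def rk4(f, s0, t):
    """Classical Runge-Kutta integration on the time grid t."""
    out = np.empty((t.size, s0.size))
    out[0] = s0
    for i in range(t.size - 1):
        h, ti, s = t[i + 1] - t[i], t[i], out[i]
        k1 = f(ti, s)
        k2 = f(ti + 0.5 * h, s + 0.5 * h * k1)
        k3 = f(ti + 0.5 * h, s + 0.5 * h * k2)
        k4 = f(ti + h, s + h * k3)
        out[i + 1] = s + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return out

t = np.linspace(0.0, t_step + 40.0, 9001)
x = rk4(rhs, np.array([1.0, 0.0]), t)[:, 0]

# Noisy observations with standard deviation 0.2, as on the slide.
rng = np.random.default_rng(2)
y = x + 0.2 * rng.standard_normal(t.size)

# Closed form before the step, for the assumed initial conditions.
before = t < t_step
x_exact = np.exp(-0.2 * t[before]) * (np.cos(2.0 * t[before]) + 0.1 * np.sin(2.0 * t[before]))
print("max error before the step:", np.abs(x[before] - x_exact).max())
print("level long after the step:", x[-1], " alpha/beta0:", alpha / beta0)
```

After the step the oscillation decays around the new level α/β0, the second order analogue of the gain in the first order example.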

With only one replication, using minimum generalized cross-validation to choose λ, the results estimated for 100 trials are:

An oil refinery example • The single input is “reflux flow” and the output is “tray 47” level in a distillation column. • There were 194 sampling points. • 30 B-spline basis functions were used to fit the output, and a step function was used to model the input.

After some experimentation with first and second order models, and with constant and varying coefficient models, the clear conclusion seems to be the constant coefficient model: The standard errors of β and α in this model, as estimated by parametric bootstrapping, were 0.0004 and 0.0023, respectively. The delta method yielded 0.0004 and 0.0024, respectively. Pretty much the same.

Monotone smoothing • Some constrained functions can be expressed as DIFE’s. • A smooth strictly monotone function can be expressed as the second order DIFE $D^2x(t) = -\beta_1(t)Dx(t)$.

We can monotonically smooth data by estimating the second order DIFE directly. • We constrain β0(t) = 0, and give β1(t) enough flexibility to smooth the data. • In the following artificial example, the smoothing parameter was chosen by generalized cross-validation. β1(t) was expanded in terms of 13 B-splines.
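Why this equation produces monotone functions can be seen by solving it. Assuming the form D²x = -β1(t)Dx (the sign convention is an assumption, chosen to match the first order equation), the derivative is Dx(t) = Dx(0) exp(-∫β1), which never changes sign whatever β1 is. The sketch below uses an arbitrary illustrative β1, a hypothetical helper `cumtrapz` written here for the cumulative integrals, and starting values x(0) = 0, Dx(0) = 1.

```python
import numpy as np

def cumtrapz(f, t):
    """Cumulative trapezoid integral of f over t, starting at zero."""
    out = np.zeros_like(f)
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
    return out

t = np.linspace(0.0, 1.0, 2001)
beta1 = 3.0 * np.sin(4.0 * np.pi * t)    # arbitrary coefficient function (assumption)

dx = np.exp(-cumtrapz(beta1, t))         # Dx(t) with Dx(0) = 1, positive everywhere
x = cumtrapz(dx, t)                      # x(t) with x(0) = 0

# Check the DIFE numerically at interior points: D^2 x + beta1 * Dx should be near 0.
d2x = np.gradient(dx, t)
max_resid = np.max(np.abs(d2x[1:-1] + beta1[1:-1] * dx[1:-1]))

print("strictly increasing:", bool(np.all(np.diff(x) > 0)))
print("largest |D^2 x + beta1 Dx| at interior points:", max_resid)
```

Because β1 is unconstrained, it can be given a flexible basis expansion and estimated freely, and the fitted x is monotone by construction.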

Analyzing the Lupus data • Weight function β(t) defining an order 1 DIFE for symptoms estimated with and without prednisone dose as a forcing function. • Weight expanded using B-splines with knots at every observation time. • Weight α(t) for prednisone is constant.

The forced DIFE for lupus

The data fit

Adding the forcing function halved the LS fitting criterion being minimized. • We see that the fit improves where the dose is used to control the symptoms, but not where it is not used. • These results are only suggestive, and much more needs to be done. • We want to model treatment and symptom as mutually influencing each other. This requires a system of two differential equations.

Summary • We can estimate differential equations directly from noisy data with little bias and good precision. • This gives us a lot more modeling power, especially for fitting input/output functional data. • Estimates of derivatives can be much better, relative to smoothing methods. • Functions with special properties such as monotonicity can be fit by estimating the DIFE that defines them.
