Clustering Functional Data: Methods and Applications • Catherine Sugar, University of Southern California, sugar@usc.edu • This is joint work with Gareth James of USC • UCLA, May 1st, 2006
Clustering and Functional Data • Cluster Analysis: The art of finding groups in data. Points in the same cluster should be as similar as possible, and points in different clusters should be widely separated. • Functional Data: Observations for a subject consist of curves or trajectories rather than finite-dimensional vectors. Examples: growth curves, longitudinal measurements of clinical status, technology evolution, spectra.
Outline • Traditional approaches to clustering curves and problems with sparse data • A new approach using basis functions and a mixture model • Applications of our approach in medicine and business • Tools, extensions, and model selection issues
Functional Examples • [Figure: Spinal Bone Mineral Density Data] • [Figure: Technology Evolution Curves]
Functional Examples: Membranous Nephropathy Data
Traditional Approaches To Functional Clustering • Regularization: • Form a grid of equally spaced time points. • Evaluate each curve at the grid points, giving a finite-dimensional representation of each curve. • Apply a standard finite-dimensional clustering method, possibly with a regularization constraint. • Filtering: • Fit a smooth curve to each subject using a finite set of basis functions. • Perform clustering on the basis coefficients.
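The filtering approach can be sketched in a few lines. This is only an illustration under stated assumptions: a polynomial basis stands in for whatever smooth basis one would actually use, the clustering step is a plain k-means written out by hand, and the toy curves and the names `fit_basis_coefficients` and `kmeans` are hypothetical.

```python
import numpy as np

def fit_basis_coefficients(times, values, degree=1):
    """Least-squares coefficients of one curve on a polynomial basis.

    Assumption: the polynomial basis is a stand-in for a spline basis.
    """
    design = np.vander(np.asarray(times, dtype=float), degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return coef

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm, started from the first k points for determinism."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Toy data (made up): six curves on a common grid, alternating rising and falling.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 8)
curves = [(1.0 if i % 2 == 0 else -1.0) * grid + 0.01 * rng.standard_normal(grid.size)
          for i in range(6)]

# Step 1: one coefficient vector per subject. Step 2: cluster the coefficients.
coefs = np.array([fit_basis_coefficients(grid, y) for y in curves])
labels, centers = kmeans(coefs, k=2)
```

Note that step 1 needs enough observations per subject to fit every curve separately, which is exactly what sparse data do not provide.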
Problems With the Traditional Approaches • Regularization: • Cannot be easily applied when curves are measured at different or unevenly spaced time points or when the data are too sparse • Even when it can be used, the resulting data vectors are high-dimensional and auto-correlated • Filtering: • Measurements may be too sparse to fit a curve for each subject • Requires fitting many parameters • If subjects are measured at different time points, the basis coefficients will not have a common covariance
Our Model • Let $g_i(t)$, $Y_i(t)$ and $\epsilon_i(t)$ respectively be the true value, observed value and error for the $i$th curve at time $t$, i.e. $Y_i(t) = g_i(t) + \epsilon_i(t)$. • We represent $g_i(t)$ using a natural cubic spline basis: $g_i(t) = s(t)^T \eta_i$, where $s(t)$ is a spline basis vector and $\eta_i$ is the vector of spline coefficients. • The coefficients are treated as random effects with $\eta_i = \mu_{z_i} + \gamma_i$, $\gamma_i \sim N(0, \Gamma)$, where $z_i$ denotes cluster membership.
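Our Model • Our model becomes $Y_i = S_i(\mu_{z_i} + \gamma_i) + \epsilon_i$, where $Y_i$ is the vector of observations for curve $i$, $S_i$ is the spline basis matrix evaluated at that curve's observed time points, $\mu_{z_i}$ is the mean coefficient vector of the curve's cluster, $\gamma_i$ is the random effect and $\epsilon_i$ is the error vector. • We fit this model using the observed time points and an EM algorithm.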
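To make the model concrete, here is a minimal simulation sketch. It writes each subject's observations as a basis matrix, evaluated at that subject's own time points, times a cluster mean plus a random effect, plus noise. The assumptions are loud ones: a polynomial basis replaces the natural cubic spline basis, the errors are taken as independent Gaussian with a common variance, and the names `basis_matrix` and `simulate_subject` and all parameter values are hypothetical.

```python
import numpy as np

def basis_matrix(times, q=3):
    """Rows are s(t)^T. Assumption: polynomial basis instead of natural cubic splines."""
    return np.vander(np.asarray(times, dtype=float), q, increasing=True)

def simulate_subject(rng, cluster_means, gamma_cov, sigma, n_obs):
    """Draw one sparsely and irregularly observed curve from the mixture model."""
    z = int(rng.integers(len(cluster_means)))             # cluster membership
    times = np.sort(rng.uniform(0.0, 1.0, size=n_obs))    # subject-specific time points
    S = basis_matrix(times, q=len(cluster_means[z]))
    eta = rng.multivariate_normal(cluster_means[z], gamma_cov)  # mean + random effect
    y = S @ eta + sigma * rng.standard_normal(n_obs)      # add measurement error
    return z, times, y

rng = np.random.default_rng(1)
cluster_means = [np.array([0.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.5])]  # made-up values
gamma_cov = 0.05 * np.eye(3)
subjects = [simulate_subject(rng, cluster_means, gamma_cov, sigma=0.1, n_obs=4)
            for _ in range(10)]
```

Each subject here has only four observations at different times, so fitting a separate curve per subject would be unstable, while the basis matrix for each subject is still well defined.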
Model Applications I: Low Dimensional Representations • One can plot functional data but it is hard to assess relative “distances” between curves. • We use the basis coefficients to project data into a low-dimensional space where it can be plotted as points. • Projecting causes no information loss in terms of cluster assignment. • The projections are exact analogues of the discriminants used in LDA.
Model Applications I: Low Dimensional Representations: Bone Data
Model Applications I: Low Dimensional Representations: Technology Data
Model Applications I: Low Dimensional Representations: Nephrology Data
Model Applications II: Dimensions of Separation • It is useful to know which dimensions do the best job of separating the clusters. • This depends on a combination of the distance between clusters and the within-cluster covariance, and is equivalent to identifying which dimensions determine cluster assignment. • The optimal weights for cluster assignment are given by an extension of the classical discriminant function for a Gaussian mixture.
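As a point of reference, the classical two-class discriminant direction for Gaussians with a common covariance can be computed as below. This is the textbook formula, not the functional extension referred to on the slide, and the function name and numbers are hypothetical.

```python
import numpy as np

def discriminant_weights(mean_a, mean_b, within_cov):
    """Classical LDA direction: within_cov^{-1} (mean_a - mean_b)."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    return np.linalg.solve(np.asarray(within_cov, dtype=float), diff)

# Made-up example: the means differ equally in both dimensions, but the second
# dimension is far noisier within clusters, so it receives far less weight.
weights = discriminant_weights([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 100.0]])
```

The example shows why separation depends on both the distance between cluster means and the within-cluster covariance: a large mean difference in a noisy dimension contributes little.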
Model Applications II: Dimensions of Separation • [Figure: Correlation and Covariance] • [Figure: Discriminating Functions]
Model Applications III: Prediction and Confidence Intervals • Another advantage of our method is that it provides accurate predictions for missing portions of $g(t)$. • Natural estimate: • The prediction with minimum mean squared error is the conditional expectation of $g(t)$ given the observed data, $E[g(t) \mid Y]$. • CIs and PIs: a two-step procedure. First find the set of clusters most likely to contain $g(t)$, then create intervals conditional on cluster membership.
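A minimal sketch of the minimum mean squared error prediction follows, assuming the model parameters are already known or estimated. For a Gaussian mixture with a common random-effect covariance and independent errors of common variance (both assumptions here), the conditional expectation of the coefficients is a posterior-probability-weighted average of the per-cluster conditional means. The function name `predict_coefficients` and the numbers are hypothetical.

```python
import numpy as np

def predict_coefficients(y, S, cluster_means, gamma_cov, sigma2, priors):
    """Return cluster posteriors and E[eta | y] for the model y = S(mu_k + gamma) + eps.

    Assumptions: gamma ~ N(0, gamma_cov) for every cluster, eps ~ N(0, sigma2 * I).
    """
    y = np.asarray(y, dtype=float)
    S = np.asarray(S, dtype=float)
    gamma_cov = np.asarray(gamma_cov, dtype=float)
    # Marginal covariance of y given the cluster; identical across clusters here.
    V_inv = np.linalg.inv(S @ gamma_cov @ S.T + sigma2 * np.eye(len(y)))
    log_post, cond_means = [], []
    for mu, prior in zip(cluster_means, priors):
        mu = np.asarray(mu, dtype=float)
        resid = y - S @ mu
        # Log posterior up to a constant shared by all clusters.
        log_post.append(np.log(prior) - 0.5 * resid @ V_inv @ resid)
        # Conditional mean of the coefficients given y and this cluster.
        cond_means.append(mu + gamma_cov @ S.T @ V_inv @ resid)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    eta_hat = (post[:, None] * np.array(cond_means)).sum(axis=0)
    return post, eta_hat

# Made-up example: one constant basis function observed twice, two clusters.
S = np.array([[1.0], [1.0]])
post, eta_hat = predict_coefficients([9.0, 10.0], S, [[0.0], [10.0]],
                                     [[1.0]], 1.0, [0.5, 0.5])
```

The predicted curve at any time point is then the basis vector at that time multiplied by `eta_hat`, which is how a fragment of a curve can be extended beyond its observed range.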
Model Applications III: Prediction on Technology Data • [Figure: Optical bit density] • [Figure: Magnetic storage bit density]
HDD 3.5 in. storage capacity • Black = Functional Clustering • Red = Linear Gompertz • Green = Mansfield-Blackman • Cyan = Weibull • Orange = Bass • Blue = S-curve
A Comparison With Standard Approaches • We took the first 10 years as training data and tried to predict the following 5 years using several approaches. • Here we report the MSE on the held-out data as a percentage of the MSE obtained with a traditional S-curve (logistic curve).
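The reported quantity can be computed as in the sketch below. The function names are hypothetical and the numbers are invented purely to show the arithmetic; they are not results from the talk.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between held-out values and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def relative_mse_percent(actual, predicted, baseline_predicted):
    """MSE of a method as a percentage of the baseline (e.g. S-curve) MSE."""
    return 100.0 * mean_squared_error(actual, predicted) / mean_squared_error(
        actual, baseline_predicted)

# Invented held-out values and predictions, for illustration only.
held_out = [1.0, 2.0, 3.0]
method_prediction = [1.0, 2.0, 4.0]
baseline_prediction = [2.0, 3.0, 5.0]
relative = relative_mse_percent(held_out, method_prediction, baseline_prediction)
```

A value below 100 means the method predicts the held-out years better than the baseline.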
Advantages of Our Model • Borrows strength from all curve fragments to simultaneously estimate mixture parameters and requires fitting fewer parameters. • Allows one to make more accurate predictions into the future based on only a few observations. • Flexible. Can be used effectively when data are sparse, irregularly spaced or sampled at different time points • Automatically puts the correct weights on estimated basis coefficients • Can be easily extended to include multiple functional and finite dimensional covariates.
Extensions I: Multiple Functional Covariates • Just as finite-dimensional clustering algorithms can incorporate multiple covariates, one should be able to use multiple functional variables. • We can do this by creating a block diagonal spline basis matrix using the entries for the $p$ individual curves: $S_i = \mathrm{diag}(S_{i1}, \dots, S_{ip})$. • More care must be taken with the error structure but the same basic model and fitting procedure apply.
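A block diagonal basis matrix of this kind can be assembled as follows. This is a sketch: the function name `block_diagonal` and the placeholder blocks are hypothetical, and each block is assumed to be the basis matrix of one functional variable evaluated at its own observation times.

```python
import numpy as np

def block_diagonal(blocks):
    """Stack per-variable basis matrices along the diagonal, zeros elsewhere."""
    blocks = [np.atleast_2d(np.asarray(b, dtype=float)) for b in blocks]
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((n_rows, n_cols))
    row = col = 0
    for b in blocks:
        out[row:row + b.shape[0], col:col + b.shape[1]] = b
        row += b.shape[0]
        col += b.shape[1]
    return out

# Placeholder blocks: variable 1 observed twice, variable 2 observed three times,
# each with two basis functions.
S_combined = block_diagonal([np.ones((2, 2)), np.ones((3, 2))])
```

The stacked observation vector for a subject then has one segment per functional variable, and the coefficient vector has one segment per variable as well.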
Extensions II: Finite Dimensional Covariates • It is just as easy to add finite-dimensional covariates to the model. • Let $X_i$ be the vector of finite-dimensional covariates. • We replace the spline basis matrix, $S_i$, by the identity matrix, $I$. • The model can be fit just as before. • Note that this provides a way of doing high-dimensional standard clustering problems with missing data: just delete the corresponding rows of the identity matrix.
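The missing-data device can be illustrated directly. In this sketch, missing entries are assumed to be coded as NaN, and the function name is hypothetical.

```python
import numpy as np

def identity_rows_for_observed(x):
    """Return the identity matrix with rows of missing entries deleted, plus the observed values."""
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    return np.eye(x.size)[observed], x[observed]

# Made-up covariate vector with the second entry missing.
S_i, x_observed = identity_rows_for_observed([1.0, np.nan, 3.0])
```

The reduced matrix plays the role of the basis matrix for that subject, so subjects with different missing entries simply have different row subsets of the identity.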
Extensions III: Dimension Reduction • Reducing dimensions ahead of time (e.g. by PCA) may be risky. • The example below shows a case where the dimensions that explain most of the variability are not the ones determining cluster separation. Our method (right) does a superior job.
References: • Bachrach, L. et al. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black, and Caucasian youth: a longitudinal study. Journal of Clinical Endocrinology & Metabolism 84, 4702-4712. • Banfield, J. and Raftery, A. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803-821. • James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly sampled curves. JRSSB 63, 533-550. • James, G., Hastie, T. and Sugar, C. (2000). Principal component models for sparsely sampled functional data. Biometrika 87, 587-602. • James, G. and Sugar, C. (2003). Clustering for sparsely sampled functional data. JASA 98, 397-408. • Sugar, C. and James, G. (2003). Finding the number of clusters in a data set: an information theoretic approach. JASA 98, 750-763.
Model Selection Issues: • Choosing the spline basis and number and placement of knots • Choosing the dimension of the mean space • Choosing the covariance structure for the clusters • Choosing the number of clusters
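How Many Clusters: • Raftery et al. suggest using approximate Bayes factors in the finite-dimensional setting. • We propose an approach based on rate distortion theory from electrical engineering. • Distortion is the average Mahalanobis distance, per dimension, between each observation and its closest cluster center, minimized over the $k$ centers: $d_k = \frac{1}{q} \min_{c_1, \dots, c_k} E[(X - c_X)^T \Gamma^{-1} (X - c_X)]$, where $c_X$ is the center closest to $X$ and $\Gamma$ is the within-cluster covariance. • Plot distortion as a function of $k$, the number of clusters. • Rate distortion theory suggests the form of the resulting “distortion curve”.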
How Many Clusters: Basic Results • If the data are generated from a single cluster in $q$ dimensions then asymptotically the distortion curve can be made linear, specifically $d_k^{-q/2} \propto k$. • When there is an unknown number, $K$, of clusters, the inverse distortion plot will be straight both before and after $K$ and, subject to certain conditions, will experience its maximum jump at $K$.
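The idea of picking the number of clusters at the largest jump in the transformed distortion curve can be sketched as below. The assumptions are explicit: the data are one-dimensional ($q = 1$), distortion is plain mean squared distance to the nearest center (identity covariance), the transformation power is $q/2$ as in Sugar and James (2003), the transformed distortion at zero clusters is taken to be zero, and k-means is a simple deterministic implementation. All function names and the toy data are hypothetical.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm in one dimension with a deterministic quantile start."""
    data = sorted(values)
    n = len(data)
    centers = [data[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def distortion(values, centers):
    """Mean squared distance from each point to its closest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in values) / len(values)

def jump_method(values, max_k, q=1):
    """Return the k with the largest jump in d_k^(-q/2), together with all jumps."""
    power = q / 2.0
    transformed = [0.0]  # convention: transformed distortion is 0 at k = 0
    for k in range(1, max_k + 1):
        transformed.append(distortion(values, kmeans_1d(values, k)) ** (-power))
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, max_k + 1)]
    best_k = 1 + max(range(max_k), key=lambda i: jumps[i])
    return best_k, jumps

# Toy data (made up): three tight, well-separated groups.
toy = [-0.1, 0.0, 0.1, 9.9, 10.0, 10.1, 19.9, 20.0, 20.1]
best_k, jumps = jump_method(toy, max_k=5)
```

On data this well separated, the transformed distortion barely moves for one and two clusters and then rises sharply once every group has its own center.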
How Many Clusters: Examples • The figures below show a transformed distortion curve when there is (a) a single component and (b) six components in the generating mixture distribution.
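A General Functional Model • Let $g(t)$ be the curve for a randomly chosen individual. (We will assume $g(t)$ follows a Gaussian process.) • If $g(t)$ is in the $k$th cluster we write $g(t) \mid k \sim GP(\mu_k(t), \omega_k(t, t'))$, with mean function $\mu_k(t)$ and covariance function $\omega_k(t, t')$. • If $Y$ is the vector of observed values at times $t_1, \dots, t_n$ and errors are assumed independent then $Y \mid k \sim N(\mu_k, \Omega_k + \sigma^2 I)$, where $\mu_k$ and $\Omega_k$ are $\mu_k(t)$ and $\omega_k(t, t')$ evaluated at the observed times.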
A General Functional Model • Regularization and filtering can be viewed as approaches to fitting the general functional clustering model. • The regularization approach estimates the cluster mean function $\mu_k(t)$ and covariance function $\omega_k(t, t')$ on a fine grid of time points, with structural constraints on $\omega_k(t, t')$. • The filtering approach assumes $g(t) = s(t)^T \eta$, where $s(t)$ is a vector of basis functions and $\eta$ is the vector of coefficients. The $\eta$'s are estimated separately for each individual and then clustered.
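Our Model • It is further useful to parameterize the cluster mean coefficients as $\mu_k = \lambda_0 + \Lambda \alpha_k$, where $\lambda_0$ and $\alpha_k$ are respectively $q$- and $h$-dimensional vectors and $\Lambda$ is a $q \times h$ matrix. • If $h < K-1$ then we are assuming the cluster mean coefficients lie in a restricted subspace. • Our model becomes $Y_i = S_i(\lambda_0 + \Lambda \alpha_{z_i} + \gamma_i) + \epsilon_i$.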
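Fitting Our Model Via EM • Fitting our model involves estimating $\lambda_0$, $\Lambda$, $\alpha_k$, $\Gamma$, $\sigma^2$ and the cluster probabilities $\pi_k$. • We can do this by maximizing either the classification likelihood or the clustering likelihood, noting that conditional on class membership $Y_i \sim N(S_i(\lambda_0 + \Lambda \alpha_{z_i}), \Sigma_i)$ with $\Sigma_i = \sigma^2 I + S_i \Gamma S_i^T$. • We can use an iterative procedure such as EM to obtain the estimates.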
Model Selection III: Alternate Covariance Structures • So far we have assumed a common covariance matrix, $\Gamma$, for all clusters. • Raftery et al. suggest a class of covariance structures in their model-based clustering methods for finite-dimensional data, based on the eigenvalue decomposition $\Sigma_k = \lambda_k D_k A_k D_k^T$ (volume $\lambda_k$, orientation $D_k$, shape $A_k$). • James and Hastie suggest regularization or rank reduction in their papers on functional PCA and LDA.