6. Other issues
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
How many components to use?
• Use the ‘unfolding trick’, i.e. look at the rank of each mode.
  • does not have a strict statistical basis, but generally works well!
• Use the core-consistency diagnostic (PARAFAC).
  • also seems to work well in practice
• Split-half analysis.
• Does the algorithm converge without problems?
• Use full cross-validation.
  • the N-way Toolbox now has a routine for this (it can be slow!)
• Look at loadings and residuals.
• Use chemical knowledge.
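The ‘unfolding trick’ can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the array is synthetic and noiseless, the helper names `unfold` and `mode_singular_values` are hypothetical, and the threshold `1e-8` is an arbitrary choice. With real, noisy data one would inspect the decay of the singular values rather than count them against a fixed threshold.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a multi-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_singular_values(X):
    """Singular values of each mode's unfolding, largest first."""
    return [np.linalg.svd(unfold(X, m), compute_uv=False) for m in range(X.ndim)]

# Synthetic example (an assumption, not data from the slides):
# a noiseless three-component trilinear array of size 10 x 8 x 6.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
B = rng.normal(size=(8, 3))
C = rng.normal(size=(6, 3))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

svals = mode_singular_values(X)
# Count the singular values that lie clearly above numerical noise.
ranks = [int(np.sum(s > 1e-8 * s[0])) for s in svals]
print(ranks)
```

Each entry of `ranks` is the apparent rank of one mode; for this noiseless array all three modes agree on the number of components used to build it.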
Preprocessing: centering (1)
• We are often interested in the differences between objects, not in their absolute values.
  • building calibration models: differences between samples
• Mean-centering removes offsets from the data
  • removes constant background effects
  • can help to linearize data
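Preprocessing: centering (2)
• When performing a calibration, it is most common to remove the mean value from each column:
  • Two-way X (object × variable): $x_{ij}^{\mathrm{cent}} = x_{ij} - \bar{x}_{j}$
  • Three-way X (object × primary variable × secondary variable): $x_{ijk}^{\mathrm{cent}} = x_{ijk} - \bar{x}_{jk}$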
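A minimal NumPy sketch of column centering for both cases follows. The arrays are random placeholders, and the names `X2`, `X3`, `X2c` and `X3c` are hypothetical. For the three-way array, "column" is taken to mean the vector of all objects for one (primary, secondary) variable pair, which is an assumption about the slide's intent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-way array: objects in rows, variables in columns.
X2 = rng.normal(loc=5.0, size=(12, 4))
# Subtract the mean of each column (mean over objects).
X2c = X2 - X2.mean(axis=0, keepdims=True)

# Three-way array: object x primary variable x secondary variable.
X3 = rng.normal(loc=5.0, size=(12, 4, 3))
# Subtract, for every (j, k) pair, the mean over the object mode.
X3c = X3 - X3.mean(axis=0, keepdims=True)

print(np.abs(X2c.mean(axis=0)).max(), np.abs(X3c.mean(axis=0)).max())
```

Both printed values are zero up to rounding error, confirming that the offsets along the object mode have been removed.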
Preprocessing: scaling (1)
• Sometimes we want to analyse variables measured in different units
  • chemical engineering: temperatures, pressures, flow rates
  • QSAR: ionization constants, Hammett constants, dipole moments
• These variables should be scaled so that each has an equal chance to appear in the model.
Preprocessing: scaling (2)
• For two-way arrays (object × variable), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’):
  • Two-way X (object × variable): $x_{ij}^{\mathrm{auto}} = (x_{ij} - \bar{x}_{j}) / s_{j}$
  • Three-way X (object × primary variable × secondary variable), with column means $\bar{x}_{jk}$
• Autoscaling can destroy multilinear structure!
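Preprocessing: scaling (3)
• Slab scaling maintains the multilinear structure!
  • Three-way X (object × process variable × time): each slab Xj is scaled as a whole
• Double slab scaling may also be useful, but it is ITERATIVE
  • Three-way X (object × process variable 1 × process variable 2): slabs Xj and Xk are both scaled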
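The difference between slab scaling and column-wise autoscaling can be checked numerically. The sketch below makes several assumptions: the data are a synthetic, noiseless two-component trilinear array; slab scaling is taken to mean one root-mean-square scale factor per process variable; and autoscaling is taken to mean centering and scaling every (variable, time) column separately. The helper `unfold` and all variable names are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a multi-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

rng = np.random.default_rng(2)
I, J, K, R = 15, 4, 6, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)  # object x process variable x time

# Slab scaling: one scale factor per process variable j (the whole slab X_j).
s_j = np.sqrt(np.mean(X**2, axis=(0, 2), keepdims=True))  # shape (1, J, 1)
X_slab = X / s_j

# Column-wise autoscaling: a separate mean and standard deviation per (j, k).
Xc = X - X.mean(axis=0, keepdims=True)
X_auto = Xc / Xc.std(axis=0, ddof=1, keepdims=True)

# Rank of the variable-mode unfolding (J x I*K) before and after scaling.
rank_raw = np.linalg.matrix_rank(unfold(X, 1), tol=1e-8)
rank_slab = np.linalg.matrix_rank(unfold(X_slab, 1), tol=1e-8)
rank_auto = np.linalg.matrix_rank(unfold(X_auto, 1), tol=1e-8)
print(rank_raw, rank_slab, rank_auto)
```

Slab scaling only rescales the rows of the loading matrix for the scaled mode, so the rank of the unfolding stays at two. Column-wise autoscaling applies a different factor to every (j, k) combination, which in general cannot be absorbed into the loadings, and the rank of the same unfolding rises.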
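Tucker models
• Tucker1: $\mathbf{X} = \mathbf{A}\mathbf{G} + \mathbf{E}$
  • Tucker1 = PCA (on the unfolded array)
• Tucker2: $\mathbf{X} = \mathbf{G}(\mathbf{B} \otimes \mathbf{A})^{T} + \mathbf{E}$
  • G (I × R2 × R3)
  • very rarely used
• Tucker3: $\mathbf{X} = \mathbf{A}\mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{T} + \mathbf{E}$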
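The unfolded Tucker3 expression can be verified against the element-wise definition of the model. The sketch below assumes one particular unfolding convention (the second-mode index runs fastest along the columns), which is what makes the Kronecker product appear in the order C ⊗ B; other conventions reverse the order. Dimensions, seeds and variable names are arbitrary choices, and the noise term E is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 5, 4, 3        # array dimensions
R1, R2, R3 = 2, 3, 2     # number of components in each mode

A = rng.normal(size=(I, R1))
B = rng.normal(size=(J, R2))
C = rng.normal(size=(K, R3))
G = rng.normal(size=(R1, R2, R3))   # core array

# Element-wise Tucker3 model: x_ijk = sum_pqr a_ip b_jq c_kr g_pqr
X = np.einsum("ip,jq,kr,pqr->ijk", A, B, C, G)

# Unfold X to I x (K*J) and G to R1 x (R3*R2), second-mode index fastest.
X_unf = X.transpose(0, 2, 1).reshape(I, K * J)
G_unf = G.transpose(0, 2, 1).reshape(R1, R3 * R2)

# Matrix form of the same model: X = A G (C kron B)^T
X_model = A @ G_unf @ np.kron(C, B).T

print(np.allclose(X_unf, X_model))
```

The two constructions agree, which shows that the matrix equation is just a compact way of writing the triple sum over the core elements.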
PARAFAC2
[Figure: three-way array with modes object (I), wavelength (J) and time (K), illustrating time shifts in the profiles.]
• In PARAFAC2, only the matrix product $\mathbf{X}_i \mathbf{X}_i^{T}$ (J × J) is modelled.
• It works if the correlation structures in the objects are the same.
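One way to see why modelling the cross-product helps with time shifts: reordering the time points of an object leaves its wavelength-by-wavelength cross-product unchanged. The sketch below uses a circular shift (`np.roll`) on random data, which is an idealization chosen so that the invariance is exact; real retention-time shifts preserve the cross-product only approximately. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
J, K = 5, 40                       # wavelengths x time points
X_i = rng.normal(size=(J, K))      # data slab for one object

# Circularly shift the time axis by 7 points (idealized time shift).
X_shifted = np.roll(X_i, shift=7, axis=1)

# Cross-products over time: both are J x J.
P_original = X_i @ X_i.T
P_shifted = X_shifted @ X_shifted.T

print(np.allclose(P_original, P_shifted))
```

The slabs themselves differ, but their cross-products coincide, so a model fitted to the cross-products is insensitive to this kind of shift.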
Missing data
• Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data are missing: X = [X* X#], where X* is known and X# is missing.
  • 0. Initialize X#
  • 1. Estimate model (maximization)
  • 2. Replace missing values with model values (expectation)
  • 3. Repeat steps 1 and 2 until convergence
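The steps above can be sketched for the simplest case, a PCA-type model fitted by truncated SVD. This is an illustration under assumptions, not the routine of any particular toolbox: the function name `em_pca_impute`, the column-mean initialization, the stopping rule and the synthetic rank-2 data are all choices made here.

```python
import numpy as np

def em_pca_impute(X, n_components, n_iter=500, tol=1e-10):
    """Fill NaNs in X by alternating a truncated-SVD fit with re-imputation."""
    missing = np.isnan(X)
    # Step 0: initialise the missing entries with the known column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # Step 1: estimate the model from the completed data.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 2: replace the missing entries with the model values.
        updated = np.where(missing, model, X)
        # Step 3: repeat until the imputed values stop changing.
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol * np.sum(filled ** 2):
            break
    return filled

# Synthetic rank-2 data with roughly 10 % of the entries removed.
rng = np.random.default_rng(5)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
mask = rng.random(truth.shape) < 0.1
X = truth.copy()
X[mask] = np.nan

X_hat = em_pca_impute(X, n_components=2)
initial = np.where(mask, np.nanmean(X, axis=0), X)
err_initial = np.sum((initial - truth) ** 2)
err_em = np.sum((X_hat - truth) ** 2)
print(err_initial, err_em)
```

The known entries are never altered; only the missing ones are updated, and on this low-rank example the iterated estimates end up closer to the true values than the column-mean starting guess. The same loop applies to PARAFAC or Tucker models by swapping the SVD step for the corresponding fitting routine.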
Thank you very much for your attention!