
6. Other issues



Presentation Transcript


  1. 6. Other issues. Quimiometria Teórica e Aplicada (Theoretical and Applied Chemometrics), Instituto de Química - UNICAMP

  2. How many components to use?
     • Use the ‘unfolding trick’, i.e. look at the rank of each mode.
       • Does not have a strict statistical basis, but generally works well!
     • Use the core-consistency diagnostic (PARAFAC).
       • Also seems to work well in practice.
     • Split-half analysis.
     • Does the algorithm converge without problems?
     • Use full cross-validation.
       • The N-way Toolbox now has a routine for this; it can be slow!
     • Look at loadings and residuals.
     • Use chemical knowledge.
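As a rough illustration of the ‘unfolding trick’, the sketch below unfolds a three-way array along each mode and prints the leading singular values of each unfolding; a sharp drop after the first few values hints at how many components that mode needs. The helper names (`unfold`, `mode_singular_values`), the simulated two-component data and the noise level are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array: rows index `mode`, columns the other two modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_singular_values(X):
    """Singular values of each mode's unfolding (largest first)."""
    return [np.linalg.svd(unfold(X, m), compute_uv=False) for m in range(X.ndim)]

# Hypothetical data: a noisy two-component trilinear array (10 x 8 x 6).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 2))
B = rng.standard_normal((8, 2))
C = rng.standard_normal((6, 2))
X = np.einsum('ir,jr,kr->ijk', A, B, C) + 1e-3 * rng.standard_normal((10, 8, 6))

for mode, s in enumerate(mode_singular_values(X)):
    # Expect two large values followed by values at the noise level.
    print(f"mode {mode}: {np.round(s[:4], 3)}")
```

This only inspects pseudo-rank, so, as the slide notes, it has no strict statistical basis and is best combined with the other checks in the list.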

  3. Preprocessing: centering (1)
     • We are often interested in the differences between objects, not in their absolute values.
       • building calibration models: differences between samples
     • Mean-centering removes offsets from the data.
       • removes constant background effects
       • can help to linearize data, i.e. [equation not captured in the transcript]

  4. Preprocessing: centering (2)
     • When performing a calibration, it is most common to remove the mean value from each column.
     [Figure: two-way array X (object × variable); three-way array X (object × primary variable × secondary variable), labelled x_jk]
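A minimal sketch of column centering for both array types follows. It assumes that, in the three-way case, a "column" is the set of values over all objects for one (primary, secondary) variable pair, which is what the x_jk label in the figure suggests; the function name and the random data are hypothetical.

```python
import numpy as np

def center_across_objects(X):
    """Subtract the mean over the object mode (axis 0) from every column."""
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)

# Two-way case: 5 objects x 3 variables, each variable with its own offset.
X2 = rng.standard_normal((5, 3)) + np.array([10.0, -2.0, 0.5])
X2c = center_across_objects(X2)

# Three-way case: 5 objects x 3 primary x 4 secondary variables.
# Each (j, k) column gets its own mean over the objects removed.
X3 = rng.standard_normal((5, 3, 4)) + 7.0
X3c = center_across_objects(X3)

print(np.abs(X2c.mean(axis=0)).max())  # numerically zero
print(np.abs(X3c.mean(axis=0)).max())  # numerically zero
```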

  5. Preprocessing: scaling (1)
     • Sometimes we want to analyse variables measured in different units:
       • chemical engineering: temperatures, pressures, flow rates
       • QSAR: ionization constants, Hammett constants, dipole moments
     • These variables should be scaled so that each has an equal chance to appear in the model.

  6. Preprocessing: scaling (2)
     • For two-way arrays (objects × variables), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’).
     • Autoscaling can destroy multilinear structure!
     [Figure: two-way array X (object × variable); three-way array X (object × primary variable × secondary variable), labelled x_jk]

  7. Preprocessing: scaling (3)
     • Slab scaling maintains the multilinear structure!
     • Double slab scaling may also be useful, but it is iterative.
     [Figure: slab scaling of X (object × process variable × time), slabs X_j; double slab scaling of X (object × process variable 1 × process variable 2), slabs X_j and X_k]
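The contrast between autoscaling and slab scaling can be checked numerically. The sketch below builds a noise-free two-component trilinear array, applies both scalings, and compares the numerical rank of each mode's unfolding. The helper names, array sizes and the root-mean-square scale factor used for the slabs are assumptions for illustration; other slab scale factors are possible.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array: rows index `mode`, columns the other two modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_ranks(X, tol=1e-8):
    """Numerical rank of each mode's unfolding."""
    return [int(np.linalg.matrix_rank(unfold(X, m), tol=tol)) for m in range(3)]

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))
A -= A.mean(axis=0)                      # centered object scores
B = rng.standard_normal((4, 2))
C = rng.standard_normal((5, 2))
X = np.einsum('ir,jr,kr->ijk', A, B, C)  # object x variable x time, two components

# Autoscaling: every (j, k) column is divided by its own standard deviation.
X_auto = X / X.std(axis=0, ddof=1, keepdims=True)

# Slab scaling: one scale factor per slab X_j (all entries sharing index j).
slab_rms = np.sqrt((X ** 2).mean(axis=(0, 2), keepdims=True))
X_slab = X / slab_rms

print("original  :", mode_ranks(X))
print("autoscaled:", mode_ranks(X_auto))
print("slab      :", mode_ranks(X_slab))
```

Slab scaling only rescales the rows of one loading matrix, so the two-component structure survives in every mode. Column-wise autoscaling applies a different factor to each (j, k) pair, which generally cannot be absorbed into the loadings and so raises the rank in the variable modes.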

  8. Tucker models
     • Tucker1: $X = AG + E$
       • Tucker1 = PCA
     • Tucker2: $X = G(B \otimes A)^T + E$
       • $G$ ($I \times R_2 \times R_3$)
       • very rarely used
     • Tucker3: $X = AG(C \otimes B)^T + E$
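To make the Tucker3 matrix form concrete, the sketch below builds a small array from its element-wise definition and checks that the unfolded array equals the product of A, the unfolded core and the transposed Kronecker product of C and B. It assumes the usual unfolding convention (X is I × JK with the second-mode index running fastest) and omits the residual E; the dimensions and random factors are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 5, 4, 3          # array dimensions
R1, R2, R3 = 2, 3, 2       # number of components per mode

A = rng.standard_normal((I, R1))
B = rng.standard_normal((J, R2))
C = rng.standard_normal((K, R3))
G = rng.standard_normal((R1, R2, R3))  # core array

# Element-wise definition: x_ijk = sum over p, q, r of a_ip * b_jq * c_kr * g_pqr
X = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

def unfold_mode1(T):
    """First-mode unfolding with the second index running fastest."""
    return T.transpose(0, 2, 1).reshape(T.shape[0], -1)

# Matrix form: X (I x JK) = A G (C kron B)^T, with G unfolded to R1 x R2R3.
X_unfolded = A @ unfold_mode1(G) @ np.kron(C, B).T

print(np.allclose(unfold_mode1(X), X_unfolded))
```

The order of the Kronecker factors is tied to the unfolding: with the third-mode index running fastest instead, the product would be written with B and C swapped.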

  9. PARAFAC2
     • In PARAFAC2, only the matrix product $X_i X_i^T$ ($J \times J$) is modelled.
     • It works if the correlation structures in the objects are the same.
     [Figure: three-way array with modes object (I), wavelength (J) and time (K), showing time shifts between objects]
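A small numerical illustration of why modelling the cross-product helps with time shifts: if all time profiles of an object are shifted together and the peaks stay inside the measurement window, the wavelength-by-time matrix changes but its J × J cross-product does not. The Gaussian peak shapes, the spectra and the shift of 7 time points are invented for this sketch.

```python
import numpy as np

J, K = 4, 60                      # wavelengths, time points
t = np.arange(K)
rng = np.random.default_rng(4)
S = rng.random((J, 2))            # hypothetical spectra of two components

def elution_profiles(shift):
    """Two Gaussian peaks, both displaced by `shift` time points (K x 2)."""
    centers = np.array([20.0, 30.0]) + shift
    return np.exp(-0.5 * ((t[:, None] - centers) / 2.0) ** 2)

X1 = S @ elution_profiles(0).T    # object 1 (J x K)
X2 = S @ elution_profiles(7).T    # object 2: same components, shifted in time

print(np.allclose(X1, X2))                # the raw matrices differ
print(np.allclose(X1 @ X1.T, X2 @ X2.T))  # the J x J cross-products agree
```

If the two peaks were shifted by different amounts, their overlap would change and the cross-products would no longer match, which is the situation the slide's condition on equal correlation structures rules out.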

  10. Missing data
     • Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data are missing: X = [X* X#], where X* is known and X# is missing.
       0. Initialize X#.
       1. Estimate the model (maximization).
       2. Replace missing values with model values (expectation).
       3. Repeat steps 1 and 2 until convergence.
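The four steps can be sketched for the simplest of the listed models, PCA, using a truncated SVD without centering as the model. The function name `em_pca_impute`, the column-mean initialization, the stopping rule and the test matrix are all assumptions for this example; the N-way Toolbox and other packages may handle each step differently.

```python
import numpy as np

def em_pca_impute(X, missing, n_components, tol=1e-10, max_iter=2000):
    """Fill entries flagged in `missing` by iterating a rank-limited SVD model."""
    X = np.array(X, dtype=float)
    # Step 0: initialize the missing part with the column means of the known data.
    col_means = np.nanmean(np.where(missing, np.nan, X), axis=0)
    X[missing] = np.broadcast_to(col_means, X.shape)[missing]
    for _ in range(max_iter):
        # Step 1: estimate the model on the completed data.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 2: replace only the missing values with model values.
        change = np.max(np.abs(model[missing] - X[missing]))
        X[missing] = model[missing]
        # Step 3: repeat until the imputed values stop changing.
        if change < tol:
            break
    return X

# Hypothetical example: a rank-2 matrix with ten entries removed.
rng = np.random.default_rng(5)
X_true = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 10))
missing = np.zeros(X_true.shape, dtype=bool)
missing[np.arange(0, 40, 4), np.arange(10)] = True

X_filled = em_pca_impute(np.where(missing, 0.0, X_true), missing, n_components=2)
print(np.max(np.abs(X_filled - X_true)))
```

Known entries are never overwritten, so the fitted model is driven only by the measured data. The same loop applies to PARAFAC or Tucker models by swapping the SVD step for the corresponding fitting routine.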

  11. Thank you very much for your attention!
