PRS - Pattern Recognition Systems

05.09.09

PRS as
MIX Møhlenpris PB 24
5006 Bergen
NORWAY
Ph: +47 47339830

New features in Sirius 8.0

Some new features in Sirius 8.0 are:

----------------------------

Mixture Design

Although Sirius is not primarily an experimental design package, the design part of Sirius is continuously being improved. A limited version of Mixture Design is implemented in version 8.0.

This option makes it easier to create an appropriate design for mixture experiments.

The new option available in version 8.0 includes Simplex Lattice Design, Simplex Centroid Design, Non-Simplex Design and Screening Mixture Design.
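To illustrate what the first two design types contain, the sketch below generates their candidate blends in plain Python. It is a minimal sketch, not the Sirius implementation: the function names `simplex_lattice` and `simplex_centroid` are hypothetical, and proportions are returned as exact fractions that sum to 1. A {q, m} simplex lattice uses proportions 0, 1/m, ..., 1 for each of q components; a simplex centroid design contains the equal-proportion blend of every non-empty subset of the q components (2^q - 1 blends).

```python
from fractions import Fraction
from itertools import combinations, product


def simplex_lattice(q, m):
    """{q, m} simplex-lattice design: every blend of q components whose
    proportions are multiples of 1/m and sum to 1."""
    return [tuple(Fraction(c, m) for c in combo)
            for combo in product(range(m + 1), repeat=q)
            if sum(combo) == m]


def simplex_centroid(q):
    """Simplex-centroid design: for every non-empty subset of the q
    components, the blend with equal proportions of the components in it."""
    points = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            points.append(tuple(Fraction(1, size) if i in subset else Fraction(0)
                                for i in range(q)))
    return points
```

For three components, `simplex_lattice(3, 2)` gives six blends (the three pure components and the three 50:50 binary blends), and `simplex_centroid(3)` gives seven, ending with the 1/3:1/3:1/3 overall centroid.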

----------------------------

Target Projection (TP) - an OPLS equivalent

Target projection (TP) has been introduced to facilitate interpretation of latent variable regression models. Orthogonal partial least squares (OPLS) regression was introduced as an alternative method for the same purpose.

Target projection (TP) and orthogonal partial least squares (OPLS) can both be described as a rotation of the components extracted by standard partial least squares (PLS, PLS-DA) regression. For the same number of components, OPLS and X-orthogonal target projection (XOTP) have been shown to provide score and loading vectors for the predictive component that are identical except for a scaling factor. Furthermore, it has been shown that the TP approach can be extended to embrace systematic variation in X unrelated to the response.

Sirius 8.0 includes a completely new implementation of the Target Projection approach. The implementation focuses on graphical presentation of the TP results, including the Selectivity Ratio plot and the DIVA plot.

Selectivity Ratio (SR)

From Eq. 6 we can calculate the explained variance $v_{\mathrm{expl}}$ and the residual variance $v_{\mathrm{res}}$ for the target projection. From these we can define a selectivity ratio $\mathrm{SR}_i$ for each spectral variable $i$:

$$\mathrm{SR}_i = \frac{v_{\mathrm{expl},i}}{v_{\mathrm{res},i}}, \qquad i = 1, 2, 3, \ldots$$

The selectivity ratio can be displayed like a spectrum, and a high value means that the spectral variable has a strong ability to discriminate controls from impacted samples. The selectivity ratio can thus be used quantitatively to detect biomarker candidates. The user chooses the boundary (a threshold SR value) between spectral regions with marker candidates and less interesting regions. A low boundary increases the risk of selecting false candidates, while a high boundary increases the risk of losing potential markers.
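A minimal NumPy sketch of the target projection and the selectivity ratio, under stated assumptions: X is a column-centred data matrix, `b` is the regression vector of an already fitted PLS or PLS-DA model, and the TP decomposition is taken as X = t p^T + E with TP weights w = b / ||b||. The function names are hypothetical and this is not the Sirius code. Explained and residual variance are computed as column sums of squares; dividing both by the number of samples would not change the ratio.

```python
import numpy as np


def target_projection(X, b):
    """Project column-centred X onto the normalised regression vector b.
    Returns TP scores t, TP loadings p and residuals E, with X = outer(t, p) + E."""
    w_tp = b / np.linalg.norm(b)          # target-projection weights
    t = X @ w_tp                          # TP scores
    p = X.T @ t / (t @ t)                 # TP loadings (least-squares fit of X on t)
    E = X - np.outer(t, p)                # variation not captured by the TP component
    return t, p, E


def selectivity_ratio(X, b):
    """SR_i = explained / residual sum of squares for each variable (column) i.
    Returns (sr, v_expl, v_res)."""
    t, p, E = target_projection(X, b)
    v_expl = np.sum(np.outer(t, p) ** 2, axis=0)   # explained part per variable
    v_res = np.sum(E ** 2, axis=0)                 # residual part per variable
    return v_expl / v_res, v_expl, v_res
```

Because the TP loadings are least-squares estimates, each variable's explained and residual sums of squares add up to its total sum of squares, which is a convenient sanity check on real data.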

The DIVA (DIscriminating VAriable) plot is closely related to PLS-DA and the Target Projection method.

The nonparametric DIVA test is designed to relate the selectivity ratio (SR) of a variable to its discriminatory ability, quantified as the probability of correct classification.

From the nonparametric DIVA test we can obtain probability-based boundaries for the SR plot. This provides a quantitative display for assessing the discriminatory ability of all regions in a complex variable profile. Furthermore, we can take advantage of the fact that the sign of a variable's regression coefficient shows whether the variable increases or decreases between two groups of samples on the TP component.

----------------------------

Multiple Linear Regression

A linear regression model that contains more than one predictor variable is called a multiple linear regression model. The following model is a multiple linear regression model with two predictor variables, $X_1$ and $X_2$:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

The model is linear because it is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$.
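The parameters are usually estimated by ordinary least squares. A minimal NumPy sketch, assuming the predictors are held in an array `X` (one column per predictor) and the response in `y`; `fit_mlr` is a hypothetical name:

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    A = np.column_stack([np.ones(len(X)), X])     # column of ones carries the intercept b0
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimises ||A @ beta - y||^2
    return beta
```

For noise-free data generated as y = 1.5 + 2 x1 - 0.5 x2, the returned vector recovers the three parameters.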

In modern analysis, datasets (e.g., spectroscopic data) often have a large number of variables, and more sophisticated methods are needed. Multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) are methods that support the analysis of such data.

Three types of regression are available in Sirius 8.0: Multiple Linear Regression (MLR), Principal Component Regression (PCR) and Partial Least Squares Regression (PLS). MLR is considered a reverse regression method, placing all weight on the Y data when regressing. Placing the weight on the Y data means that the prediction error is minimized. PCR, on the other hand, is considered a forward regression method, placing all the weight on the X data and hence minimizing the calibration error. PLS uses both X and Y data equally.
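To show how PLS uses both blocks, here is a minimal PLS1 (single response) sketch based on the NIPALS algorithm. It is an illustration, not the Sirius implementation, and `pls1_nipals` is a hypothetical name. Each weight vector is the direction in the (deflated) X space with maximal covariance with y; the coefficients are then expressed in the original variables as b = W (P^T W)^-1 q.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 regression via NIPALS. Returns (intercept, coefficient vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # work on centred copies
    p = X.shape[1]
    W = np.zeros((p, n_components))          # X weights
    P = np.zeros((p, n_components))          # X loadings
    q = np.zeros(n_components)               # y loadings
    for a in range(n_components):
        w = Xk.T @ yk                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yk @ t / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])       # deflate X
        yk = yk - q[a] * t                   # deflate y
    b = W @ np.linalg.solve(P.T @ W, q)      # coefficients for the original variables
    return y_mean - x_mean @ b, b
```

When there are more samples than variables and all components are used, the PLS coefficients coincide with the MLR (least-squares) solution; keeping fewer components is what distinguishes PLS from MLR.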

Many papers and discussions have compared the three methods, and different conclusions have been drawn.

However, the rank of a spectral (or other) data matrix is limited by the number of samples, whereas the number of variables can be very large. Furthermore, high-dimensional spectral data are highly correlated and usually noisy. Methods like PCR and PLS are therefore often more suitable for analysing such data.

One of the problems with MLR is that the number of regression coefficients to be estimated grows with every spectral wavelength included in the regression model. This means that the number of calibration samples with known property/concentration values must also grow as more wavelengths are included: with fewer samples than wavelengths, the matrix that MLR has to invert is singular, and the coefficients are not uniquely determined.
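A quick NumPy check of this limit, using random numbers as a stand-in for spectra (the sizes are illustrative assumptions): with 10 samples and 50 wavelengths, the 50 x 50 matrix X^T X that MLR inverts has rank at most 10, so it cannot be inverted.

```python
import numpy as np

n_samples, n_wavelengths = 10, 50                   # fewer calibration samples than wavelengths
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_wavelengths))     # stand-in for 10 spectra

XtX = X.T @ X                                       # matrix MLR would have to invert
rank = np.linalg.matrix_rank(XtX)                   # can never exceed n_samples
print(f"rank of X^T X: {rank} (matrix size {n_wavelengths} x {n_wavelengths})")
```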

Another problem with MLR is that, for spectral data that exhibit only subtle variations with the typical process variation, the matrix inversion step is poorly conditioned. A poorly conditioned system leads to large errors in the computed regression coefficient matrix B and, as a result, poor prediction accuracy. Models built from a poorly conditioned calibration matrix are extremely unreliable when predicting samples whose spectra differ from those in the calibration set.
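The effect can be reproduced with two nearly identical synthetic "wavelengths" (the data and the noise level 1e-4 are illustrative assumptions). The condition number of X^T X becomes very large: only the sum of the two MLR coefficients is well determined, while how it is split between them is driven by noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
base = rng.normal(size=n)
# Two highly correlated "wavelengths": the second differs from the first only by tiny noise
X = np.column_stack([base, base + 1e-4 * rng.normal(size=n)])
y = base + 0.01 * rng.normal(size=n)

XtX = X.T @ X
cond = np.linalg.cond(XtX)                  # large value = poorly conditioned inversion
b_mlr = np.linalg.solve(XtX, X.T @ y)       # MLR coefficients (no intercept, for brevity)
print(f"condition number: {cond:.1e}, coefficients: {b_mlr}, sum: {b_mlr.sum():.3f}")
```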

PLS and PCR typically have lower prediction error than MLR on such data because they are less prone to the overfitting characteristic of MLR. Because they use fewer degrees of freedom (less flexibility) and base their factors on covariance (PLS) or variance (PCR), they do not model the very small variations in the data that make a model fit the calibration data better but are generally not predictive for new data.
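A minimal PCR sketch makes the "fewer degrees of freedom" point concrete: y is regressed only on the first k principal component scores of the centred X, so the coefficient vector is confined to a k-dimensional subspace. This is an illustration assuming an SVD-based PCA, not the Sirius implementation; `pcr_fit` is a hypothetical name.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression on the first k components.
    Returns (intercept, coefficient vector in the original X space)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    Vt = np.linalg.svd(Xc, full_matrices=False)[2]
    V = Vt[:k].T                                   # loadings of the k retained components
    T = Xc @ V                                     # scores
    gamma = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    b = V @ gamma                                  # back to the original variables
    return y_mean - x_mean @ b, b
```

With k equal to the number of (linearly independent) variables, PCR reproduces the MLR solution; with small k, the model ignores the low-variance directions that MLR would use to fit noise.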

----------------------------

External Model Validation

Model validation means checking the quality of the model, that is, checking how well the model will perform on new data (data not included in the modelling).

A regression model is usually made to make predictions in the future. Validating the model estimates the uncertainty of such future predictions. If the uncertainty is reasonably low, the model can be considered valid.

The same argument applies to a descriptive multivariate analysis such as PCA: If you want to extrapolate the correlations observed in your data table to future, similar data, you should check whether they still apply for new data.

In Sirius 8.0 a variant of double cross validation is implemented. The method repeatedly splits the data into test sets and validation sets, and the average prediction error is calculated.

From the analysis the optimal number of components (size of model) can be estimated.
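A minimal sketch of the idea of repeated double (nested) splitting, assuming PCR as the model to keep the example self-contained: in each repeat an outer test set is held out, the remaining samples are split again into calibration and validation sets to choose the number of components, and the chosen model is then scored on the untouched test set. The function names, split fraction and number of repeats are hypothetical choices; the exact variant used in Sirius is not described here.

```python
import numpy as np


def pcr_fit(X, y, k):
    """PCR with k components; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    V = np.linalg.svd(Xc, full_matrices=False)[2][:k].T   # first k loading vectors
    gamma = np.linalg.lstsq(Xc @ V, y - y_mean, rcond=None)[0]
    b = V @ gamma
    return y_mean - x_mean @ b, b


def rmsep(X, y, b0, b):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((y - (b0 + X @ b)) ** 2)))


def repeated_double_split(X, y, max_k, n_repeats=20, frac=0.25, seed=0):
    """Returns the number of components chosen in each repeat and the mean test RMSEP."""
    rng = np.random.default_rng(seed)
    n = len(y)
    chosen, test_errors = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        n_test = round(frac * n)
        test, rest = idx[:n_test], idx[n_test:]          # outer split
        n_val = round(frac * len(rest))
        val, cal = rest[:n_val], rest[n_val:]            # inner split
        val_err = [rmsep(X[val], y[val], *pcr_fit(X[cal], y[cal], k))
                   for k in range(1, max_k + 1)]
        k_best = int(np.argmin(val_err)) + 1
        b0, b = pcr_fit(X[rest], y[rest], k_best)        # refit on all non-test samples
        chosen.append(k_best)
        test_errors.append(rmsep(X[test], y[test], b0, b))
    return chosen, float(np.mean(test_errors))
```

The spread of the chosen component numbers across repeats indicates how stable the model size is, while the averaged test error estimates performance on new data.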

For PLS-DA, an additional validation algorithm is implemented: response permutation.

Response permutation is a technique for checking the robustness of a PLS-DA model. The dependent variable vector (Y-vector) is randomly shuffled, and a new model is developed using the original independent variable matrix. The process is repeated several times. The resulting models are expected to have generally low R² and Q² values.

If the models developed from the data set with randomised responses have significantly lower R² and Q² than the original model, this is strong evidence that the proposed model is well founded and not just the result of chance correlation.
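A minimal sketch of response permutation in NumPy. For brevity it assumes a one-component PLS model and compares only R² (computing Q² would also require cross-validation inside each permuted model). The helper names are hypothetical, and the p-value formula (counting permuted R² values at least as large as the original) is one common convention, not necessarily the one used in Sirius.

```python
import numpy as np


def r2_one_component_pls(X, y):
    """R^2 of a one-component PLS model fitted to centred X and y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)                       # PLS weight vector
    t = Xc @ w                                   # scores
    y_hat = t * (t @ yc) / (t @ t)               # regress y on the scores
    return 1.0 - np.sum((yc - y_hat) ** 2) / np.sum(yc ** 2)


def response_permutation(X, y, n_perm=50, seed=0):
    """Compare R^2 of the original model with R^2 of models fitted to shuffled y."""
    rng = np.random.default_rng(seed)
    r2_orig = r2_one_component_pls(X, y)
    r2_perm = np.array([r2_one_component_pls(X, rng.permutation(y))
                        for _ in range(n_perm)])
    p_value = (1 + np.sum(r2_perm >= r2_orig)) / (1 + n_perm)
    return r2_orig, r2_perm, p_value
```

A model whose R² clearly exceeds every permuted R² (and so has a small p-value) is unlikely to be the result of chance correlation.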

----------------------------

New Effective Alignment algorithms

An important assumption when performing multivariate data analysis is that each variable represents the same quantity in all samples. Peak alignment can therefore be an important task for data from many instrumental measurement techniques, such as GC, NMR and MALDI.

In addition to the well-known COW (Correlation Optimized Warping) method, two new methods are implemented in Sirius 8.0.

  1. PAFFT correlation method (Peak alignment by Fast Fourier Transform)
  2. RAFFT correlation method (Recursive alignment by Fast Fourier Transform)

These methods are fast and well suited for handling large datasets.
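The core step of FFT-based alignment is finding the shift that maximises the cross-correlation between a signal and a reference; the FFT computes the correlation for all shifts at once, which is what makes these methods fast. A minimal sketch of that step for a whole spectrum, assuming a circular shift; as we understand them, PAFFT applies this kind of shift to individual segments and RAFFT applies it recursively to ever smaller segments, which this sketch does not attempt. The function names are hypothetical.

```python
import numpy as np


def fft_shift_estimate(reference, signal):
    """Signed lag (in points) that best aligns `signal` to `reference`, taken as the
    maximum of their circular cross-correlation computed with the FFT."""
    n = len(reference)
    xcorr = np.fft.ifft(np.fft.fft(reference) * np.conj(np.fft.fft(signal))).real
    lag = int(np.argmax(xcorr))
    return lag if lag <= n // 2 else lag - n      # map to a signed shift


def align(reference, signal):
    """Shift `signal` (circularly, for simplicity) by the estimated lag."""
    return np.roll(signal, fft_shift_estimate(reference, signal))
```

For a Gaussian peak moved 7 points to the right, `fft_shift_estimate` returns -7 and `align` moves the peak back onto the reference.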

----------------------------

New Baseline Correction options

As all spectroscopists have observed, spectrometers do not always collect data with an ideal baseline. Due to a variety of problems (detector drift, changing environmental conditions such as temperature, spectrometer purge, sampling accessories, etc.), the baseline of a given spectrum is not always where it should be. Beer's Law assumes that the absorption of light at a given wavelength is due entirely to the absorptivity of the constituents in the sample; it does not account for "spectrometer error" or "sampling error". Therefore, in order to calculate concentrations accurately, it is necessary to remove the baseline effect introduced by the spectrometer.

As with most random variations in spectral data, most chemometric models can compensate for these effects by adding extra factors or, if the variations are truly random, ignore them altogether. However, as with all pre-processing methods, a more robust model usually results when the known interferences in the data are removed first.

Spectroscopists use a number of methods to remove baseline effects from the spectra they collect, some of which are sufficiently automated to be used as a pre-processing step. The following baseline correction methods are available in Sirius 8.0:

  • Simple Offset Correction
  • 2-point Baseline Correction
  • Restrained Moving Average
  • Multi-point Baseline Correction
  • Differentiation
  • Polyfit
  • LIMPIC
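Three of the simpler methods in the list can be sketched in a few lines of NumPy. These are minimal illustrations under stated assumptions, not the Sirius implementations: simple offset correction is assumed here to subtract the spectrum minimum, 2-point correction subtracts the straight line through two user-chosen points, and the polynomial is fitted only to user-chosen peak-free points. Restrained Moving Average, multi-point correction, differentiation and LIMPIC are not covered.

```python
import numpy as np


def offset_correction(spectrum):
    """Simple offset correction (assumed variant): shift so the lowest point is zero."""
    return spectrum - spectrum.min()


def two_point_correction(x, spectrum, i, j):
    """Subtract the straight line through the spectrum values at indices i and j."""
    slope = (spectrum[j] - spectrum[i]) / (x[j] - x[i])
    baseline = spectrum[i] + slope * (x - x[i])
    return spectrum - baseline


def polyfit_correction(x, spectrum, baseline_idx, degree=2):
    """Fit a polynomial to peak-free points (baseline_idx) and subtract it everywhere."""
    coeffs = np.polyfit(x[baseline_idx], spectrum[baseline_idx], degree)
    return spectrum - np.polyval(coeffs, x)
```

The choice of anchor points (for 2-point correction) or peak-free regions (for the polynomial fit) is the user's responsibility; including peak regions in the fit would subtract part of the signal.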

----------------------------

Colouring Templates for objects and variables

Graphics and colouring of objects and variables are extremely important in multivariate analysis.

It is now possible to save name/colour/symbol schemes for later use, and one dataset can use several such schemes.

In Sirius 8.0 it is possible to save (and load) Colour Templates. These can later be activated in various plots.

----------------------------

New Import Options

Additional import options have been added.

----------------------------

New features in the Data editor

  • Support for PLS-DA
  • Covariance analysis
  • Generate new objects by averaging existing objects
  • Ternary plot
  • And more.

----------------------------

New useful features for supporting PLS-DA

PLS-DA is an increasingly popular multivariate analysis technique.

  • In Sirius 8.0 a special option has been implemented for adding response variables suitable for PLS-DA analysis.

  • Target Projection with DIVA plot for biomarker identification

  • Permutation test for model validation
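In PLS-DA, class membership is commonly coded as a block of dummy response variables: one column per class, with 1 for samples belonging to that class and 0 otherwise. A minimal sketch of that coding, assuming the class labels are given as a Python sequence; `dummy_responses` is a hypothetical helper, not the Sirius option itself.

```python
import numpy as np


def dummy_responses(labels):
    """Return (sorted class names, 0/1 response matrix with one column per class)."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return classes, Y
```

For example, the labels `["control", "impacted", "control"]` give the columns control and impacted, with rows [1, 0], [0, 1] and [1, 0].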

----------------------------

New modelling options

The following options are available in Sirius 8.0

|  | Available methods | Description |
|---|---|---|
| Graphical Univariate Analysis | Editor, Graphics, Univariate Statistics, T-tests | Use these options to get an overview of the available data |
| Explorative Analysis | PCA, CA, MOP, MVP, Fuzzy Clustering, Membership | Use these multivariate methods to explore the data for groupings, outliers, etc. |
| Classification/Discrimination | PCA, CA, MOP, MVP | Use these options to build classification models |
| Response Modelling | PLS, PCR, TP, OS-2 | Use these options to build prediction models |
| Experimental Design | Factorial Design, Central Composite Design, D-Optimal Design, Plackett-Burman Design, Mixture Design | Use these options to perform an experimental design study |

New options in version 8.0 are:

  • Target Projection supporting Biomarker identification and Variable Selection has been implemented.

  • External Validation supporting a variant of double cross validation and response permutation has been implemented.

  • New graphics: SR plot, DIVA plot, S plot

----------------------------

Additional

  • Improved Variable selection in regression

  • General Improvements in layout and methods

  • New Tutorials supporting Target Projection

----------------------------