Analysis of 1H NMR spectra in metabolomics studies often requires multivariate pattern recognition techniques to extract meaningful results. Targeted profiling offers the ability to analyze spectra based directly on the identity and quantities of individual compounds. Profiles created using targeted profiling in Chenomx NMR Suite can be used as input in statistical software packages such as Umetrics SIMCA-P. Performing PCA on Chenomx targeted profiles yields information-rich results that allow rapid biological interpretation, since group separation can be directly correlated to variations in specific metabolite concentrations.

Metabolomics researchers are faced with two key challenges when analyzing their NMR data. First, they must determine whether there are meaningful statistical patterns in their data. Then, they must translate these patterns into biologically relevant information in the form of metabolite identity and quantity.

In this note we demonstrate how to resolve both of these challenges using Chenomx NMR Suite and Umetrics SIMCA-P. In this approach, Chenomx NMR Suite extracts targeted profiling information from the NMR spectra (see [10]), and Umetrics SIMCA-P performs statistical analysis of this data.

The process of analyzing this data will be illustrated using a set of 46 NMR spectra of rat brain extracts. Samples were collected from the hippocampus and the frontal, occipital and temporal cortices. The goal is to determine whether it is possible to differentiate the four regions based on their metabolic profiles and, if so, which metabolites are important in differentiating the regions.

With the concentration data exported from Profiler, and normalized as desired, you can now start the statistical analysis of the data using SIMCA-P. The version used in this application note is Umetrics SIMCA-P 11.0.

You must first create a new SIMCA-P project. The first step in creating a new project is defining the data source, which will be the concentration set exported in "Exporting Results from Profiler" and normalized in "Data Normalization".


With the targeted profiling data imported into SIMCA-P, you can begin analysis of the data. The most common statistical models used with metabolomic datasets are Principal Components Analysis (PCA) and Partial Least Squares for Discriminant Analysis (PLS-DA).

One goal in this area of research is to use data obtained from your samples to build an appropriate classification model and, if possible, to determine the factors that lead to differences among the samples. PCA and PLS-DA are often used for this purpose, as these techniques determine orthogonal latent variables that describe the input data and classify the data based on these variables. The primary advantage of using targeted profiling as an input to PCA or PLS-DA is that the resulting variables are combinations of measured metabolite concentrations, which makes them easier to interpret as factors in the underlying classification model.

PCA is a data reduction technique that may be used to reduce the dimensionality of a multi-dimensional dataset while retaining the characteristics of the dataset that contribute most to its variance.
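The idea can be sketched in a few lines of Python. This is a minimal illustration of PCA via singular value decomposition, not SIMCA-P's implementation; the function name `pca_scores_loadings`, the choice of unit-variance scaling, and the random placeholder matrix are all assumptions made for the example (NumPy is assumed to be installed).

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Return (scores, loadings, explained variance ratio) for a samples x variables matrix."""
    X = np.asarray(X, dtype=float)
    # Mean-center and scale each variable to unit variance (an assumed, common default).
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = centered / std
    # The right singular vectors are the loadings; U * S gives the scores.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic placeholder data (6 samples x 4 variables), NOT the rat brain dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
scores, loadings, explained = pca_scores_loadings(X_demo, n_components=2)
```

Each row of `scores` corresponds to one sample, and each row of `loadings` to one input variable, which in a targeted profile would be one metabolite concentration.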

PLS-DA is a regression extension of PCA that takes advantage of class information to attempt to maximize the separation between groups of observations.
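To show how class information enters the model, the following sketch computes only the first PLS-DA component: the classes are dummy-encoded, and the weight vector is chosen to maximize the covariance between the projected data and the class matrix. This is a simplified illustration under stated assumptions, not the algorithm SIMCA-P runs; the function name `plsda_first_component` and the numeric values are hypothetical, and later components, scaling options and cross-validation are omitted.

```python
import numpy as np

def plsda_first_component(X, labels):
    """Return (scores t, weights w, class list) for the first PLS-DA component."""
    X = np.asarray(X, dtype=float)
    classes = sorted(set(labels))
    # Dummy (one-hot) response matrix: one column per class.
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # The first weight vector is the dominant left singular vector of X'Y.
    U, S, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = U[:, 0]
    t = Xc @ w  # scores of each sample on the first component
    return t, w, classes

# Synthetic values for illustration only: variable 0 differs between the two classes.
labels_demo = ["fcx", "fcx", "fcx", "hip", "hip", "hip"]
X_demo = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0],
          [3.0, 5.1], [3.1, 4.2], [2.9, 5.9]]
t, w, classes = plsda_first_component(X_demo, labels_demo)
```

In this toy case the weight on variable 0 dominates `w`, and the two classes fall on opposite sides of zero along `t`. The sign of a component is arbitrary, so only the relative positions are meaningful.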


In this dataset, you can use the available class information, specifically, the brain region that a particular sample came from, to establish a PLS-DA model. You can extract this information from the spectrum file names. First, edit the workset by clicking on "Workset > Edit > 1", and clicking on the "Observations" tab. Select "Primary ID" from the "Class from obs ID:" drop-down. Click on "Set", enter a start position of 1 and a length of 3, then click "OK". This will use the regions indicated at the start of each file name to define classes for the PLS-DA model (fcx = frontal cortex, hip = hippocampus, ocx = occipital cortex, tcx = temporal cortex). If you do not see the "Observations" tab, you may need to switch to Advanced Mode by clicking the appropriate button in the bottom left corner of the workset wizard.
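The effect of the start position and length settings can be mimicked outside SIMCA-P with a short Python sketch. The helper `class_from_obs_id` and the example file names are hypothetical; only the three-letter region codes come from this note.

```python
# Region codes as listed in this note.
REGION_NAMES = {
    "fcx": "frontal cortex",
    "hip": "hippocampus",
    "ocx": "occipital cortex",
    "tcx": "temporal cortex",
}

def class_from_obs_id(obs_id, start=1, length=3):
    """Extract a class code using a 1-based start position and a length."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return obs_id[begin:begin + length]

# Hypothetical observation IDs; the real spectrum file names may differ.
example_ids = ["fcx_01", "hip_07", "ocx_03", "tcx_12"]
example_classes = [class_from_obs_id(obs_id) for obs_id in example_ids]
example_regions = [REGION_NAMES[code] for code in example_classes]
```

A start position of 1 and a length of 3 therefore selects the first three characters of each ID, which is why the region code must appear at the beginning of each file name.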


Once the data is filtered and class values are identified, you can generate a PLS-DA model. Click "Analysis > Change Model Type" and select "PLS Discriminant Analysis". To fit the model to the data, select "Analysis > Autofit", which will extract the components needed to properly fit the model.

When the fit of the PLS-DA model is complete, you can use a 3D plot to visualize the results. To generate a 3D plot of the PLS-DA scores, start a new 3D scatter plot from the Plot/List menu, and select the Variables and Scores data type, then select t[1] for the X-Axis, t[2] for the Y-Axis, and t[3] for the Series. Figure 4 shows the resulting PLS-DA scores plot. Each data point represents a particular sample, and is automatically color-coded according to its class. This plot shows a very good separation between the various regions of the brain.


Having determined that there is a clear pattern in the data, it is natural to ask which metabolites are responsible for separating the various regions of the brain. In other words, what is the biological significance of the data? The loadings plot can provide an answer to this question by describing the weighting coefficients for each metabolite.


Interpreting both the scores and loadings plots will be simpler with a 2D projection of the data. To generate a 2D plot of the PLS-DA scores (Figure 5), start a new scatter plot from the Plot/List menu, select the Variables and Scores data type, and select t[1] for the X-Axis and t[2] for the Series.

Examining the loadings plot for PC 1 and PC 2 (Figure 6) will reveal which metabolites are important in separating the brain regions. To generate a 2D plot of the PLS-DA loadings, start a new scatter plot from the Plot/List menu, select the Observations and Loadings data type, and select p[1] for the X-Axis and p[2] for the Series. There is a direct geometric link between the scores plot and the loadings plot. The metabolites responsible for "pulling" samples to, for example, the lower-left quadrant of the scores plot can be found in the lower-left quadrant of the loadings plot. In this dataset, you can see that alanine, fumarate, choline, inosine, lactate and myo-inositol are significant in characterizing samples from the hippocampus (red circles), and that aspartate, hypoxanthine, acetate, O-phosphocholine, glycine and 4-aminobutyrate are significant in characterizing samples from the frontal cortex (black squares).
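The quadrant matching described above can also be expressed as a small Python sketch, which may be useful if scores and loadings are exported for further work. It assumes the values are available as plain (t[1], t[2]) and (p[1], p[2]) pairs; the function names, the placeholder metabolite names and every number are made up for illustration and are not results from this dataset.

```python
import math

def quadrant(x, y):
    """Identify a quadrant by the signs of the two coordinates."""
    return (x >= 0, y >= 0)

def metabolites_in_quadrant_of(class_scores, loadings):
    """List variables whose loadings share a quadrant with the class centroid."""
    # Centroid of the class in the scores plot.
    cx = sum(t1 for t1, _ in class_scores) / len(class_scores)
    cy = sum(t2 for _, t2 in class_scores) / len(class_scores)
    target = quadrant(cx, cy)
    # Keep variables in the same quadrant, ranked by distance from the origin.
    hits = [(name, math.hypot(p1, p2))
            for name, (p1, p2) in loadings.items()
            if quadrant(p1, p2) == target]
    return [name for name, _ in sorted(hits, key=lambda h: h[1], reverse=True)]

# Placeholder scores for one class and placeholder loadings (illustrative only).
demo_scores = [(-2.0, -1.5), (-1.8, -2.2)]
demo_loadings = {
    "metabolite_A": (-0.4, -0.3),
    "metabolite_B": (0.5, 0.2),
    "metabolite_C": (-0.1, -0.6),
    "metabolite_D": (-0.2, 0.4),
}
demo_hits = metabolites_in_quadrant_of(demo_scores, demo_loadings)
```

Ranking by distance from the origin is one simple choice, since variables far from the origin of the loadings plot contribute more strongly to the two components shown.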