In this note we present a rapid, efficient method for identifying metabolites in biofluid NMR spectra using targeted profiling. Conventional techniques for identifying and quantifying metabolites in such spectra are labor-intensive and error-prone, as positions and linewidths of peaks can vary widely with changes in pH and other solution matrix effects. The length of time and level of operator skill needed to analyze large numbers of these complex spectra (Figure 1) are significant barriers to the widespread application of NMR in metabolomics.
Variability in spectra due to changes in peak position and linewidth is often handled by a form of data reduction known as "spectral binning" or "spectral bucketing" [1] [2]. The spectrum is divided into K regions, or bins, and the total integrated area within each bin is used in further analysis. The bins can be of fixed width or of variable width, placed either by manual inspection for the best placement or by automated algorithms for high throughput [3]. Spectral binning rests on the premise that considering regions of the spectrum, rather than individual data points, compensates for minor differences in peak position and linewidth for the same compound across the N samples in a study.
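Fixed-width binning amounts to summing the intensities that fall in each bin and multiplying by the point spacing. The Python/NumPy sketch below assumes a uniformly spaced ppm axis; the function name `bin_spectrum` and its default 0.04 ppm width and 0 to 10 ppm range are illustrative choices, not taken from any particular software package.

```python
import numpy as np

def bin_spectrum(ppm, intensity, width=0.04, start=0.0, stop=10.0):
    """Integrate a 1D spectrum into fixed-width bins (hypothetical helper).

    Assumes a uniformly spaced ppm axis, ascending or descending. The width is
    adjusted slightly so that an integer number of bins tiles [start, stop].
    Returns (bin_centers, bin_areas); areas use the rectangle rule.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = max(1, int(round((stop - start) / width)))
    edges = np.linspace(start, stop, n_bins + 1)
    dx = abs(ppm[1] - ppm[0])                       # point spacing in ppm
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums * dx

# Descending ppm axis, as NMR spectra are usually stored; flat placeholder data.
ppm = np.linspace(10.0, 0.0, 10001)
spectrum = np.ones_like(ppm)
centers, areas = bin_spectrum(ppm, spectrum, width=0.05, start=0.0, stop=10.0)
```

Each of the K bins becomes one input variable for pattern recognition, so a study of N spectra yields an N x K matrix of bin areas.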
Figure 1. Analyzing complex NMR spectra like this 800 MHz spectrum of a brain tissue extract can be a daunting task.
In practice, sensitivity to sample conditions, along with baseline and other spectral distortions that can arise during an experiment, means that bin integrals do not necessarily reflect true changes in spectral area [4] [5]. Because pattern recognition techniques such as principal component analysis (PCA) operate on linear combinations of the input bins, artifacts in the bin integrals compromise the analysis. If the metabolites of interest vary subtly, differences in peak position and linewidth between samples, together with instrumental and spectral artifacts, may mask significant changes in their concentrations. For regression-type experiments, statistical tools such as orthogonal signal correction can be applied; these have been shown to remove components of the data that are not relevant to the analysis, including the effects of physiological and instrumental variation [6].
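The effect of a small peak shift on bin integrals can be illustrated with synthetic data. The sketch below integrates a single Lorentzian line into 0.04 ppm bins before and after a hypothetical 0.015 ppm shift across a bin boundary; the line position, width and bin edges are arbitrary assumptions chosen for illustration. The total area barely changes, but most of it moves into the neighboring bin, so an input to PCA would show a decrease in one variable and an increase in another even though the amount of compound is unchanged.

```python
import numpy as np

def lorentzian(x, center, hwhm, height=1.0):
    """Lorentzian line with peak height `height` at `center`."""
    return height * hwhm**2 / ((x - center) ** 2 + hwhm**2)

def bin_areas(ppm, intensity, edges):
    """Rectangle-rule area of `intensity` inside each bin defined by `edges`."""
    dx = abs(ppm[1] - ppm[0])
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    return sums * dx

ppm = np.linspace(1.0, 1.7, 7001)                  # 0.0001 ppm spacing
edges = np.array([1.26, 1.30, 1.34, 1.38, 1.42])   # four 0.04 ppm bins

before = lorentzian(ppm, center=1.330, hwhm=0.003)
after = lorentzian(ppm, center=1.345, hwhm=0.003)  # same line after a pH-induced shift

b = bin_areas(ppm, before, edges)
a = bin_areas(ppm, after, edges)
print(b.round(4))
print(a.round(4))
```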
In spite of these advances in spectral pre-processing algorithms and in pattern recognition methods for binned data, spectral binning still provides little information about individual metabolites and their concentrations in the sample. Yet any statistical treatment of NMR spectra in metabolomics rests on the idea that metabolites are the actual variables of interest. The ideal statistical treatment would work directly with the concentrations of all metabolites in the samples, since these values represent the underlying physical model that generates the observed NMR spectrum. Such a treatment would allow compounds to be analyzed either jointly, for total characterization of the mixture, or selectively, for targeted characterizations such as pathway modeling. There would still be N samples, but each of the K variables would be the concentration of a single metabolite rather than the area of a spectral region.
Targeted profiling is an approach to data reduction that compares the spectrum with NMR spectral signatures of individual metabolites stored in a reference database. It reduces spectral data to quantified metabolites, which can then be used as input variables for pattern recognition tools such as PCA or projection to latent structures (PLS). Compared with spectral binning, this reduces the dimensionality of the problem: once all protons of a compound are assigned, every spectral region correlated with that compound collapses into a single variable, its concentration. A variety of approaches to targeted profiling have recently been developed for both in vivo and in vitro NMR [4] [7] [8]. The method is most useful when investigating predefined compounds of interest.
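A simplified way to see how targeted profiling turns a spectrum into K concentrations is to model the spectrum as a weighted sum of reference signatures and solve for the weights. The sketch below is a stand-in, not Chenomx's algorithm: it uses ordinary least squares with fixed line positions, whereas the fitting described in this note is interactive and lets clusters move. The peak positions are approximate, multiplets are collapsed to single lines, and the "mixture" is synthetic.

```python
import numpy as np

def lorentzian(x, center, hwhm=0.003):
    """Unit-height Lorentzian line."""
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

ppm = np.linspace(0.5, 4.5, 8001)

# Hypothetical reference library: (approximate position in ppm, proton count) per line.
library = {
    "lactate": [(1.33, 3.0), (4.11, 1.0)],
    "alanine": [(1.48, 3.0), (3.78, 1.0)],
    "creatine": [(3.03, 3.0), (3.93, 2.0)],
}
names = list(library)

# One column per compound: its signature at unit concentration.
R = np.column_stack([
    sum(n_h * lorentzian(ppm, c) for c, n_h in library[name]) for name in names
])

true_conc = np.array([2.0, 0.5, 1.2])              # synthetic concentrations
rng = np.random.default_rng(0)
mixture = R @ true_conc + rng.normal(scale=0.01, size=ppm.size)

# Least-squares weights of each signature = estimated relative concentrations.
est, *_ = np.linalg.lstsq(R, mixture, rcond=None)
profile = dict(zip(names, est.round(3)))
print(profile)
```

The result has one value per compound, so the data passed on to PCA or PLS has three columns here instead of thousands of data points or hundreds of bins.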
Chenomx NMR Suite uses targeted profiling to reduce analysis time, combining advanced analysis tools with a compound library of more than 230 common metabolites. Targeted profiling can be applied to NMR spectra of virtually any complex mixture, including urine, blood serum, saliva and various cell extracts. Data reduction methods such as spectral binning are not necessary; in fact, they reduce the quality of analysis possible with targeted profiling by abstracting away key details of the spectrum.
Before performing a targeted-profiling analysis in Chenomx NMR Suite, the chemical shape indicator (CSI) in the sample must be accurately defined and characterized in the Processor module, because the accuracy of the analysis depends on it. Once the CSI is defined, the spectrum can be moved to the Profiler module to begin analysis.
A major strength of targeted profiling is the ability to analyze for particular compounds selectively. Specific compounds can be fit independently of the rest of the metabolite library, providing results for only those compounds. The targeted results can then be exported for further analysis using statistical or other data analysis software, without the need to completely characterize the entire spectrum.
Selecting compound signatures to fit to an experimental spectrum can be approached in two ways: A set of compounds can be determined in advance, and their signatures selectively fit to the spectrum, or peaks and clusters can be selected out of the spectrum and correlated to individual compound signatures.
Figure 3. The frequency filter lists, in the compound table, the compounds with signals near the selected location in the spectrum.
Predetermining compounds to be fit allows the pursuit of precisely defined studies, explicitly targeting a desired subset of compounds. This approach could be used to study a particular metabolic pathway, to monitor the known metabolites of a prescription medication, or simply to focus a study on the most commonly detected metabolites. Text search and category filters (Figure 2) help to find the desired compounds so that their signatures can be displayed and fit.
Selecting compounds based on peaks and clusters seen in the spectrum permits a more open-ended approach, treating the analysis of the spectrum as a discovery process. Such an analysis could be used to discover metabolic implications for a new medication, to establish novel biomarkers for disease states, or to prepare detailed metabolic profiles to be used, for example, in longitudinal research studies. The frequency filter (Figure 3), accessed by right-clicking on the spectrum, can provide a helpful shortlist of compounds that may appear in the current region of interest.
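The idea behind a frequency filter can be sketched as a lookup: given a position selected in the spectrum, list the library compounds that have a cluster within some tolerance of it, nearest first. This is a hypothetical illustration of the concept, not Chenomx's implementation; the mini-library, its approximate cluster positions and the 0.02 ppm tolerance are all assumptions.

```python
# Hypothetical mini-library: compound -> approximate cluster centers (ppm).
LIBRARY = {
    "alanine": [1.48, 3.78],
    "creatine": [3.03, 3.93],
    "creatinine": [3.04, 4.05],
    "lactate": [1.33, 4.11],
}

def frequency_filter(library, selected_ppm, tolerance=0.02):
    """Return compounds with any cluster within `tolerance` ppm of `selected_ppm`,
    ordered from nearest to farthest."""
    hits = []
    for name, centers in library.items():
        distance = min(abs(c - selected_ppm) for c in centers)
        if distance <= tolerance:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

print(frequency_filter(LIBRARY, 3.04))
```

Overlapping candidates such as creatine and creatinine both appear in the shortlist, which is why the visual comparison described next is still needed to decide between them.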
Figure 4. Visual inspection is often enough to confirm or reject the presence of a compound in a sample.
The metabolite libraries supplied with Chenomx NMR Suite contain dynamic compound signatures that can be individually fine-tuned to provide the best possible fit to an experimental spectrum. Once selected, a compound is fit through a simple yet powerful click-and-drag interface. A compound signature initially appears in the spectrum display as a series of dots indicating cluster positions. Dragging one of these dots upward increases the height of all of the clusters in the compound signature, allowing a direct visual comparison with the corresponding peaks in the experimental spectrum (Figure 4). This is often enough to determine whether or not the compound is a likely component in the sample.
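The drag-to-resize behavior can be modeled as a single height factor shared by every cluster of a signature. The sketch below renders a lactate-like signature as Lorentzian lines; the data layout, the `render_signature` name, the linewidth and the approximate line positions (a 3H doublet near 1.33 ppm and a 1H quartet near 4.11 ppm, with 0.0115 ppm corresponding to about 6.9 Hz at 600 MHz) are assumptions for illustration, not the format of the Chenomx library.

```python
import numpy as np

def lorentzian(x, center, hwhm):
    """Unit-height Lorentzian line."""
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

def render_signature(ppm, clusters, height, hwhm=0.002):
    """Render a compound signature on a ppm grid (hypothetical model).

    clusters: list of clusters, each a list of (line_ppm, relative_intensity).
    height:   one scale factor applied to every cluster, so raising one
              cluster raises all of them together.
    """
    spectrum = np.zeros_like(ppm, dtype=float)
    for cluster in clusters:
        for line_ppm, rel in cluster:
            spectrum += height * rel * lorentzian(ppm, line_ppm, hwhm)
    return spectrum

J = 0.0115  # approximate coupling in ppm at 600 MHz
doublet = [(1.33 - J / 2, 1.5), (1.33 + J / 2, 1.5)]            # 3H, 1:1
quartet = [(4.11 + k * J, w) for k, w in
           [(-1.5, 0.125), (-0.5, 0.375), (0.5, 0.375), (1.5, 0.125)]]  # 1H, 1:3:3:1

ppm = np.linspace(0.5, 4.5, 40001)
low = render_signature(ppm, [doublet, quartet], height=1.0)
high = render_signature(ppm, [doublet, quartet], height=2.5)
```

Because relative intensities within the signature are fixed, comparing one cluster against the experimental spectrum implicitly checks all the others, which is what makes the visual confirmation described above effective.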
Figure 5. The compound table displays the compounds selected by the various filters, as well as results of the fitting process, including concentrations.
Concentrations of compounds are calculated based on comparison to the CSI that has been defined for the sample. As the height of the signature changes, the resulting calculated concentration updates dynamically, and is reflected in the compound table (Figure 5). In many cases, the initial locations of the clusters in a compound signature are not completely consistent with their actual locations in the spectrum. Most often, such discrepancies are due to differences in sample pH, or more generally to a variety of other solution matrix effects. To help compensate for these variations, each cluster within a compound signature can be moved independently of the other clusters. The cluster navigator tool (Figure 6) provides easy access to the compound signature's individual clusters. The basic fitting process can thus be simply described as the combination of setting signature heights and cluster positions.
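Quantitation against an internal reference of known concentration follows a standard relation: the area per proton of the compound divided by the area per proton of the reference, times the reference concentration. The sketch below implements that textbook relation; it is assumed here, not documented in this note, that Chenomx's comparison to the CSI is equivalent. The example values (0.5 mM DSS with its 9-proton trimethylsilyl singlet, a 3-proton methyl signal, areas in arbitrary units) are illustrative.

```python
def concentration_from_csi(area, n_protons, csi_area, csi_protons, csi_conc_mm):
    """Concentration (mM) of a compound from the area of one of its signals,
    relative to an internal reference of known concentration."""
    if n_protons <= 0 or csi_protons <= 0 or csi_area <= 0:
        raise ValueError("proton counts and CSI area must be positive")
    return csi_conc_mm * (area / n_protons) / (csi_area / csi_protons)

# Hypothetical: 0.5 mM DSS (9 protons, area 9.0) and a 3-proton methyl signal of area 6.0.
print(concentration_from_csi(area=6.0, n_protons=3, csi_area=9.0, csi_protons=9, csi_conc_mm=0.5))  # 1.0
```

Since a signature's area scales linearly with its height, the computed concentration also scales linearly with height, which is why it can update continuously while a signature is dragged.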
Figure 6. The cluster navigator helps to select specific clusters of the displayed compound. In this example, clusters 0 and 1 appear in the current display, and cluster 0 is selected.
As each cluster is fit, the sum line (red by default) updates to reflect the changes. The sum line should be brought as close as possible to the experimental spectrum through changes to individual compounds. This is especially important in crowded regions, where several compounds may contribute to the overall fit at a given location (Figure 7), so fitting such a region properly may require adjusting several compounds. When the fit is complete, the compound table lists all identified compounds and their concentrations.
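The sum line can be thought of as the sum of all fitted signatures, and agreement with the spectrum as the sum of squared residuals. The sketch below builds a synthetic crowded region in which compound B's cluster sits 0.006 ppm away from its library position, then moves that one cluster independently of compound A, as described earlier for pH-induced shifts. Everything here (one Lorentzian line per cluster, the positions, heights and the grid search) is a hypothetical illustration rather than the software's fitting procedure.

```python
import numpy as np

def lorentzian(x, center, hwhm=0.003):
    """Unit-height Lorentzian line."""
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

def signature(ppm, centers, height, shifts=None):
    """Hypothetical one-line-per-cluster signature; each cluster can be shifted
    independently while sharing a single height."""
    shifts = shifts if shifts is not None else [0.0] * len(centers)
    return height * sum(lorentzian(ppm, c + s) for c, s in zip(centers, shifts))

ppm = np.linspace(3.0, 3.1, 2001)

# Synthetic "experimental" region: compound A at 3.040, compound B actually at 3.046.
experimental = signature(ppm, [3.040], 1.0) + signature(ppm, [3.046], 0.8)

def misfit(shift_b):
    """Sum of squared residuals between the spectrum and the sum line when
    compound B's cluster (library position 3.040) is moved by `shift_b` ppm."""
    sum_line = signature(ppm, [3.040], 1.0) + signature(ppm, [3.040], 0.8, [shift_b])
    residual = experimental - sum_line
    return float(np.sum(residual**2))

best = min(np.linspace(-0.01, 0.01, 21), key=misfit)
print(round(float(best), 4), misfit(0.0), misfit(best))
```

With both compounds contributing to the same region, the sum line only matches the spectrum once B's cluster is placed correctly; adjusting A alone could not remove the residual.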
Figure 7. Strong overlap can yield a sum line (red) that is quite different from the contributing compounds (blue and green).
After all the desired compounds have been fit, further information is often needed to place the identified compounds in context. Detailed information about a particular compound can be accessed by double-clicking its entry in the compound table. This opens the reference panel along the right side of the window, which contains structures, alternate names, CAS Registry Numbers, and links to external databases, including KEGG Ligand, ChEBI and PubChem.
For further analysis, results from single or multiple spectra may be exported as a comma-delimited text file (.csv) for import into statistical analysis software such as Umetrics, or as an XML MetaProfile (.xmp). The MetaProfile can be used as a starting point for analyzing similar spectra in the same dataset, or imported into any software capable of handling XML-encoded data.
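Once exported, a concentration table can be loaded with standard tools before statistical analysis. The sketch below uses Python's `csv` module and assumes a hypothetical layout (a header row of compound names after a sample column, one row per spectrum, blank cells for compounds not fit); the actual column layout of a Chenomx export is not described in this note and should be checked before use. The values are made up for illustration.

```python
import csv
import io

def read_profile_csv(text):
    """Parse a concentration table into (compounds, {sample: {compound: value}}).

    Hypothetical layout: "Sample,<compound>,<compound>,..." header, then one row
    per spectrum with concentrations in mM; blank cells become None.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    compounds = [name.strip() for name in header[1:]]
    table = {}
    for row in reader:
        if not row:                       # skip blank lines
            continue
        sample, cells = row[0].strip(), row[1:]
        table[sample] = {
            name: (float(cell) if cell.strip() else None)
            for name, cell in zip(compounds, cells)
        }
    return compounds, table

# Made-up values, for illustration only.
example = "Sample,Alanine,Creatine,Lactate\nsample_A,0.5,1.0,2.0\nsample_B,0.4,,1.5\n"
compounds, table = read_profile_csv(example)
print(table["sample_B"])
```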
Targeted profiling in Chenomx NMR Suite is a versatile technique that can serve both as a primary research tool and as a component of a larger workflow. Existing spectra can be imported directly, or processed in one of several major NMR processing packages and imported afterward. PCA can be applied to the concentration data instead of the raw spectrum, so that the principal components are expressed in terms of specific compounds rather than spectral regions that must later be identified. With targeted profiling, identification of compounds is an integral part of the analysis rather than a time-consuming post-processing step.
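Applying PCA to an N x K concentration matrix means each loading refers directly to a named compound. The sketch below performs PCA by singular value decomposition of the mean-centered matrix on synthetic data in which only lactate differs between two groups; the data and compound names are invented for illustration, and dedicated statistical software would normally be used for real studies.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centered N x K matrix (N samples, K compounds).

    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T            # K x n_components
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

compounds = ["alanine", "creatine", "lactate"]
rng = np.random.default_rng(1)
group = np.repeat([0.0, 1.0], 5)              # two synthetic groups of five samples
X = np.column_stack([
    0.4 + 0.02 * rng.standard_normal(10),
    1.1 + 0.02 * rng.standard_normal(10),
    2.0 + 1.5 * group + 0.02 * rng.standard_normal(10),
])

scores, loadings, ratio = pca(X)
top = compounds[int(np.argmax(np.abs(loadings[:, 0])))]
print(top, ratio.round(3))
```

Only mean-centering is applied here; autoscaling (dividing each column by its standard deviation) is a common alternative that gives low-concentration metabolites equal weight and would change the result.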
Chenomx would like to thank Brent McGrath and Dr. P. Silverstone in the Department of Psychiatry at the University of Alberta for providing data for this application note.
[1] "Application of pattern recognition methods to the analysis and classification of toxicological data derived from proton nuclear magnetic resonance spectroscopy of urine." Molecular Pharmacology, 1991, 629-642.
[2] "Pattern-recognition classification of the site of nephrotoxicity based on metabolic data derived from proton nuclear magnetic resonance spectra of urine." Molecular Pharmacology, 1994, 199-211.
[4] "Curve-fitting method for direct quantitation of compounds in complex biological mixtures using H-1 NMR: Application in metabonomic toxicology studies." Analytical Chemistry, 2005, 4556-4562.
[5] "NMR of biofluids and pattern recognition: Assessing the impact of NMR parameters on the principal component analysis of urine from rat and mouse." Journal of Pharmaceutical and Biomedical Analysis, 2001, 463-476.
[6] "Application of orthogonal signal correction to minimise the effects of physical and biological variation in high resolution 1H NMR spectra of biofluids." Analyst, 2002, 1283-1288.