Rank-deficient spectral FIA data
Nørgaard & Ridder (1994) have investigated a problem of measuring samples with three different analytes on a flow injection analysis (FIA) system where a pH-gradient is imposed. The data are interesting from a data analytical point of view, especially as an illustration of closure or rank-deficiency and the use of constraints.
Get the data
The data are available in zipped MATLAB 4.2 format. Download the data and type load data in MATLAB. If you use the data, we would appreciate it if you reported the results to us, as a courtesy in view of the work involved in producing and preparing the data. You may also want to cite the data by referring to
Nørgaard L, Ridder C, Rank annihilation factor analysis applied to flow injection analysis with photodiode-array detection. Chemometrics and Intelligent Laboratory Systems 23:107, 1994
The data have also been described in
- Bro, R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. 1998. Ph.D. Thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK).
- Bro R, Sidiropoulos ND, Least squares algorithms under unimodality and non-negativity constraints. Journal of Chemometrics 12:223, 1998
- Kiers HAL, Smilde AK, Constrained three-mode factor analysis as a tool for parameter estimation with second-order instrumental data. Journal of Chemometrics, 1998, 12, 125-147.
- Bro R, Harshman RA, Sidiropoulos N, Rank-deficient models for multi-way data, Journal of Chemometrics, Submitted.
The basic setup of the FIA system is shown in Figure 1a. A carrier stream containing a Britton-Robinson buffer of pH 4.5 is continuously injected into the system with a flow of 0.375 mL/min. A six-port valve simultaneously injects 77 µL of sample and 770 µL of reagent (Britton-Robinson buffer, pH 11.4) into the system, and the absorbance is detected by a diode-array detector (HP 8452A) from 250 to 450 nm in 2 nm intervals. An absorption spectrum is recorded every second, 89 times during one injection. By using both a carrier and a reagent (Figure 1b), a pH gradient from pH 4.5 to 11.4 is induced over the sample plug.
The three analytes present in the samples are 2-, 3-, and 4-hydroxy-benzaldehyde (HBA). All three analytes have different absorption spectra depending on whether they are in their acidic or basic form. Twelve samples of different constitution (Table 1) are measured. Thus the data set is a 12 (samples) × 100 (wavelengths) × 89 (times) array. The time mode of dimension 89 is also a pH profile due to the pH-gradient.
Table 1. The concentrations of the three analytes in the 12 samples.
For each sample a landscape is obtained showing the spectra for all times, or conversely the time profiles for all wavelengths (see below).
It is characteristic of FIA that there is no physical separation of the sample. All analytes have the same dilution profile due to dispersion, i.e., all analytes will have identically shaped total time profiles. This total profile is shown at the bottom left of the figure above. It maintains its shape at all wavelengths, for all samples, and for all analytes. The total profile is the profile actually detected by the photometer (the manifest profile) and is the sum of the profiles of the protonated and deprotonated forms. Due to the pH gradient, and depending on the pKa of a given analyte, an analyte will be present with different amounts of its acidic and basic form at different times, and hence will have different acidic and basic profiles in the sample plug. In the figure above these profiles are shown for one analyte. The first part of the sample plug, i.e., the earliest measurements of a sample, is dominated by deprotonated analytes, while the end of the sample plug is dominated by protonated analytes.
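The split of one total profile into an acidic and a basic profile can be sketched numerically. The following is a minimal illustration only: the Gaussian dispersion shape, the linear pH gradient running from basic to acidic over the plug, and the pKa value are all hypothetical choices made here, not values taken from the experiment. Only the number of time points (89) and the pH range (4.5 to 11.4) come from the description above.

```python
import math

def acid_fraction(ph, pka):
    """Fraction of an analyte in its protonated (acidic) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

K = 89  # number of time points per injection (from the text)

# Hypothetical Gaussian-shaped dispersion profile shared by all analytes.
total = [math.exp(-((k - 44) / 15.0) ** 2) for k in range(K)]

# Hypothetical linear pH gradient: basic at the front of the plug,
# acidic at the end, so the earliest points are dominated by the base form.
ph = [11.4 - (11.4 - 4.5) * k / (K - 1) for k in range(K)]

PKA = 8.0  # hypothetical pKa, chosen only for illustration

# Acidic and basic time profiles of one analyte.
acid = [t * acid_fraction(p, PKA) for t, p in zip(total, ph)]
base = [t * (1.0 - acid_fraction(p, PKA)) for t, p in zip(total, ph)]
```

Whatever pKa is chosen, the acidic and basic profiles add up to the same total profile at every time point. This is the closure that makes the data rank-deficient.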
In order to specify a mathematical model for the array of FIA data, initially ignore the time domain and consider only one specific time, i.e., one specific pH. An I × J matrix called $X_k$ is obtained, where I is the number of samples (12), J is the number of wavelengths (100), and k indicates the specific pH/time selected.
There are three analytes with three corresponding concentration profiles, and there are six spectra: an acidic and a basic one for each analyte. A standard bilinear model would be an obvious decomposition method for this matrix, but it is not very descriptive in this case. In the sample mode, a three-dimensional decomposition is preferable, as there are only three different analytes. However, each analyte exists in two forms (acid/base), so there are six different spectra to be resolved, which requires a six-dimensional decomposition in the spectral mode. To accommodate these seemingly conflicting requirements, a more general model can be used instead:
$$X_k = A H B^T, \tag{1}$$
where A is an I × 3 matrix whose columns are vectors describing the variations in the sample domain (ideally the concentrations in Table 1), B is a J × 6 matrix describing the variations in the spectral domain (ideally the pure spectra), and H is a 3 × 6 matrix which defines the interactions between the columns of A and B. In this case it is known how the analyte concentrations relate to the spectra, as the acidic and basic spectrum of, e.g., 2HBA only relate to the concentration of 2HBA. Therefore H reads
$$H = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}$$
The matrix H ensures that the contribution of the first analyte to the model is given by the sum of $a_1 b_1^T$ and $a_1 b_2^T$, and similarly for the other two analytes. By using only ones and zeros, any information in H about the relative size of the interactions is removed; this information is represented in B. The H matrix is reserved for coding the interaction structure of the model.
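The role of H can be checked on a toy example. The sketch below assumes that H has the block structure implied by the text (each analyte paired with its own acidic and basic spectrum). The matrices A and B contain made-up numbers with hypothetical sizes (2 samples, 4 wavelengths) rather than the real 12 and 100. Plain Python lists are used to keep the example free of dependencies.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Interaction matrix: analyte f interacts only with spectra 2f and 2f+1
# (zero-based), i.e. its own acidic and basic spectrum.
H = [[1, 1, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 1, 1]]

# Hypothetical toy data.
A = [[1, 0, 2],
     [0, 3, 1]]             # I x 3 concentrations
B = [[1, 0, 2, 1, 0, 1],
     [0, 1, 1, 0, 2, 0],
     [2, 1, 0, 1, 1, 0],
     [1, 1, 0, 2, 0, 3]]    # J x 6 spectra (acid, base for each analyte)

# Matrix form of the model for one time point: X = A H B^T.
X = matmul(matmul(A, H), transpose(B))

# The same model written as a sum over analytes of a_f (b_acid + b_base)^T.
X_sum = [[sum(A[i][f] * (B[j][2 * f] + B[j][2 * f + 1]) for f in range(3))
          for j in range(4)]
         for i in range(2)]
```

Both constructions give the same matrix, which shows that H only encodes which spectra belong to which analyte.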
So far, only a single time/pH has been considered. To represent the entire data set, the model must be generalized into a multi-way form. For each time the data can be represented by the model above, except that it must be adjusted so that the changes in relative concentration (acidic and basic fraction) can be represented as well. The relative concentration of each of the six acidic and basic species can be represented by a 6 × 1 vector at each time. The relative concentrations at all K times are held in the K × 6 matrix C. To use the generic model at the kth time it is thus necessary to scale the contribution from each species by its corresponding relative concentration. The six weights from the kth row of C are placed in a 6 × 6 diagonal matrix $D_k$ so that the sth diagonal element gives the relative amount of the sth species. The model can then be written
or, in other words, as
$$X_k = A H D_k B^T, \quad k = 1, \ldots, K. \tag{4}$$
Note how the use of distinct H and C ($D_k$) matrices allows the qualitative and quantitative relationships between A and B to be expressed separately. The interaction matrix H, which is globally defined, gives the interaction structure; it shows exactly which factors in A interact with which factors in B. In contrast, the C matrix gives the interaction magnitudes. For every k, the kth row of C (the diagonal of $D_k$) shows to what extent each interaction is present at the given k. The distinction between qualitative and quantitative aspects is especially important, since knowledge of the exact pattern of interactions is not always available. Leaving H free, rather than fixing it as is done here, makes it possible to explore the type of interaction. This can be helpful for rank-deficient problems in general. The matrix C also has a straightforward interpretation, as each column of C is the estimated FIAgram or time profile of the given analyte in its acidic or basic form. Note that the model above bears some resemblance to the PARAFAC model
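The multi-way model and the resulting rank deficiency can be illustrated with a small synthetic array. Everything numeric below is hypothetical: 4 samples, 6 "wavelengths", 5 time points, a made-up total profile and made-up acid fractions. Exact fractions are used so that the ranks can be computed without rounding issues.

```python
from fractions import Fraction

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))]
            for i in range(len(v))]

def rank(M):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

H = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]        # hypothetical, 4 x 3
B = [[1 if g <= j else 0 for g in range(6)] for j in range(6)]  # full rank

K = 5
total = [1, 2, 3, 2, 1]  # hypothetical shared dispersion profile
acid_frac = [            # hypothetical acid fraction per analyte and time
    [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)],
    [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(1)],
    [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(1), Fraction(1)],
]

# K x 6 matrix C: columns are (acid, base) time profiles of each analyte.
C = []
for k in range(K):
    row = []
    for f in range(3):
        acid = total[k] * acid_frac[f][k]
        row += [acid, total[k] - acid]
    C.append(row)

# Frontal slices X_k = A H D_k B^T, with D_k holding the kth row of C.
X = [matmul(matmul(matmul(A, H), diag(C[k])), transpose(B)) for k in range(K)]

# Sample-mode unfolding: the K slices placed side by side (I x JK).
unfolded = [sum((X[k][i] for k in range(K)), []) for i in range(len(A))]
```

In this toy array the sample-mode unfolding has rank 3 although six spectra are present, and C has rank 4 rather than 6, because the acid and base profiles of every analyte sum to the same total profile.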
$$X_k = A D_k B^T, \tag{6}$$
but differs mainly by the introduction of the matrix H, which enables the interactions between factors in different modes. It also enables A and B/C to have different column dimensions.
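The difference between the two models is perhaps easiest to see element by element. The index names f and g and the lower-case element symbols below are notation introduced here for illustration; they are not taken from the original description.

$$
\begin{aligned}
\text{FIA model:}\quad x_{ijk} &= \sum_{f=1}^{3}\sum_{g=1}^{6} a_{if}\, h_{fg}\, c_{kg}\, b_{jg},\\
\text{PARAFAC:}\quad x_{ijk} &= \sum_{f=1}^{F} a_{if}\, c_{kf}\, b_{jf}.
\end{aligned}
$$

If H is an F × F identity matrix, the double sum collapses to the single sum, so PARAFAC can be seen as the special case in which each column of A interacts with exactly one column of B and C.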
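The PARATUCK2 model, in which $D_k^A$ and $D_k^B$ are diagonal matrices holding the kth rows of two separate third-mode loading matrices, is given by
$$X_k = A D_k^{A} H D_k^{B} B^T, \tag{7}$$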
while the FIA model is
$$X_k = A H D_k B^T. \tag{8}$$
This FIA model can be fitted as what has been called a restricted PARATUCK2 model (which is now more generally referred to as a PARALIND model; see the references above). The matrix H remains fixed during the analysis to ensure that every analyte interacts with only two spectra/profiles, namely its acidic and its basic counterpart.
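As a rough sketch of what fitting involves, assume an ordinary least-squares criterion and an alternating scheme. Both are assumptions made here for illustration and need not match the algorithms used in the cited papers, which also impose constraints such as non-negativity and unimodality. The symbol $Z_k$ is likewise introduced only for this sketch. With H fixed, the loss is

$$
\min_{A,\,B,\,C}\ \sum_{k=1}^{K} \left\| X_k - A H D_k B^T \right\|_F^2 .
$$

For fixed B and C, writing $Z_k = H D_k B^T$ (a 3 × J matrix) makes the model linear in A, and the unconstrained update is

$$
A = \Big(\sum_{k=1}^{K} X_k Z_k^T\Big)\Big(\sum_{k=1}^{K} Z_k Z_k^T\Big)^{-1},
$$

provided the 3 × 3 matrix being inverted is non-singular. Writing $\tilde{A} = A H$ also shows why the model is a PARAFAC model with linearly dependent loadings: $\tilde{A}$ has six columns, but they come in three identical pairs.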