Typical Input and Output Files

Recommendation: Because MemExp generates several files, it's convenient to place each raw data file in its own subdirectory before analyzing it with MemExp.
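The layout recommended above can be set up with a few lines of Python. This is only a sketch: the function name `stage_data_files` and the subdirectory names `dir1`, `dir2`, ... are assumptions modeled on the `dir1` used in the test case, not something MemExp requires.

```python
import shutil
from pathlib import Path


def stage_data_files(data_files, root):
    """Copy each raw data file into its own subdirectory under `root`.

    The names dir1, dir2, ... are an illustrative choice; any scheme that
    gives every data file a separate directory serves the same purpose.
    """
    root = Path(root)
    staged = []
    for index, source in enumerate(data_files, start=1):
        subdir = root / f"dir{index}"
        subdir.mkdir(parents=True, exist_ok=True)
        target = subdir / Path(source).name
        shutil.copy2(source, target)  # the original file is left in place
        staged.append(target)
    return staged
```

Each returned path can then be passed to MemExp as the data file argument, so that all output for that data set lands in its own subdirectory.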

MemExp produces textual and graphical summaries of each numerical inversion. Let's look in detail at the output generated for the first test case that is downloaded with the MemExp software. It results from the following command:

memexp-6.0.exe simple dir1/data.001 1 none 1. 1. 2 40 1 5.0D-4 0

Here, the data in file data.001 stored in subdirectory dir1 are to be analyzed. All MemExp output will be written to this subdirectory. This simple analysis calls for three calculations in turn: one preliminary MEM inversion, an estimation of the data's standard errors based on the appropriate MEM fit, and a final series of fits to the data by both continuous and discrete kinetic descriptions. NOTE: When MemExp is run on different platforms, small differences in round-off may propagate to produce slightly different output. Therefore, your results for the test case may not be exactly the same as the output shown here.

1. Preliminary MEM Fit with Uniform Errors or Errors Estimated from Smoothed Data

The progress of the calculation is written to the file data.001.out. Information is plotted in the PostScript file data.001.out.ps every time a MEM distribution is written to disk (up to a maximum of PLOTS). The calculation proceeds from top to bottom in these graphical summaries. Each row of plots characterizes the MEM distribution at one point along the calculation. On the left, the data (black), fit (colored), and baseline (dotted, if nonzero) are plotted for the last half of the time interval spanned by the measurement. This plot is useful in evaluating the computed baseline. In the middle column, the data, fit, and baseline are plotted over the entire temporal range. More importantly, the residuals (colored) and the autocorrelation of the residuals (black) are plotted. The correlation length of the residuals $\tau_c$ is given. On the right, the continuous g (colored solid) and h (colored dashed) rate distributions are plotted. If F is derived from f during the MEM calculation, F is plotted (black dotted) (see data.001_e1.out.ps below). This plot on the right is labeled by the file extension, C, the normalization , and the two quantities that measure convergence to the maximum-entropy solution, Ratio (r) and Test (T). Note that Test is a good measure of convergence when only the Lagrange multiplier $\lambda$ is used (equation 3). Ratio is a good measure whether or not the Lagrange multiplier $\alpha$ is also used (equation 3).
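To make the middle-column plots concrete, the following sketch computes a normalized autocorrelation of a list of residuals. It is not MemExp's own code. Subtracting the mean, normalizing by the lag-0 value, and taking the correlation length as the first lag at which the autocorrelation drops below 1/e are all assumptions made for illustration; the definition of $\tau_c$ used by MemExp is not stated here.

```python
import math


def residual_autocorrelation(residuals, max_lag=None):
    """Normalized autocorrelation of the residuals for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    variance = sum(c * c for c in centered)
    if max_lag is None:
        max_lag = n - 1
    if variance == 0.0:
        # Constant residuals carry no correlation information.
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def correlation_length(autocorr, threshold=math.exp(-1)):
    """First lag at which the autocorrelation falls below `threshold`.

    Returns None if it never does. The 1/e threshold is an assumption.
    """
    for lag, value in enumerate(autocorr):
        if value < threshold:
            return lag
    return None
```

Residuals that alternate in sign from point to point decorrelate within one lag, whereas residuals that drift slowly give a longer correlation length, which is the pattern to look for in the plotted autocorrelation.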

In each graphical summary, the fit recommended by MemExp is marked by an asterisk (*) on the plot's right-hand side. Because no automated selection criteria will choose optimally in all cases, the user of MemExp should compare these recommended fits to those that immediately precede and follow them in the graphical summary. The MemExp selection criteria performed very well for realistic simulated data [1].

2. Error Estimation

Following this preliminary MEM inversion, the recommended fit is analyzed to determine error estimates according to parameters that are assumed by MemExp and written to the file data.001.ana. These errors are assigned to the kinetics by writing a new file named in the MemExp command, dir1/data.001_e1. The errors are also plotted in the file data.001_e1.sigma.ps. Here, the root-mean-square deviations (rmsd) of the recommended MEM fit from the data are plotted (squares), along with the smoothed error estimates (dashed) derived from them. Also plotted at the top of the page (solid) are the residuals in this smoothing process, i.e., the smoothed estimates minus the unsmoothed rmsd values.

3. Distributed and Discrete Fits with Time-dependent Errors

Next, data.001_e1 (original kinetics with MEM-derived standard errors) is numerically inverted. The parameters written to the file data.001.mem call for a hybrid distributed/discrete fitting of the data (MAXEXP 0). The MEM calculation is summarized in data.001_e1.out and in data.001_e1.out.ps.

Here, IBIGF = 0. Whenever IBIGF is nonzero, the prior model F derived from f is plotted in black. From that point on, that new F would define maximum entropy according to equation 4.

In addition, the fits by discrete exponentials are summarized in data.001_e1.exp.ps. Up to PLOTS discrete fits can be plotted in this PostScript summary. On the right, the discrete exponentials (colored vertical lines, dashed for negative amplitudes) are plotted. Also plotted in the right column of data.001_e1.exp.ps is the continuous distribution from which the initial fit parameter values were derived (black solid).

Once again, the errors estimated from this second MEM inversion are appended to a data file named in the MemExp command line, dir1/data.001_e2. These errors are plotted in data.001_e2.sigma.ps. Convergence of the error estimates can be checked by comparing dir1/data.001_e1 and dir1/data.001_e2.
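One simple way to quantify this comparison is the largest relative change between the two sets of error estimates. The sketch below assumes the error values have already been read from the two files into equal-length lists; how they are laid out inside the files is not described here, so reading is left out. The function name `max_relative_change` is hypothetical.

```python
def max_relative_change(errors_first, errors_second):
    """Largest relative change |second - first| / |first| between two
    equal-length sequences of error estimates.

    Points whose first estimate is exactly zero are skipped.
    """
    if len(errors_first) != len(errors_second):
        raise ValueError("both error sequences must have the same length")
    changes = [
        abs(second - first) / abs(first)
        for first, second in zip(errors_first, errors_second)
        if first != 0.0
    ]
    return max(changes) if changes else 0.0
```

A small value suggests that the second pass changed the error estimates very little. What counts as small enough is a judgment left to the user.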

The .out Files

The .out files produced by MemExp begin with the echoing of input parameters. During inversions, these lines begin with 'RDPARS', the name of the subroutine that reads the parameters. (During analyses of images, these lines begin with 'ANALYZ'.) Should a MemExp run terminate prematurely due to incorrect input data, refer to the last few lines of successfully input data and a sample .def file to correct the mistake in the input data. If all inversion parameters have been read successfully, the .out file contains the line, 'Exiting routine RDPARS.'

The progress of the MEM calculation is recorded every NPRINT steps. Upon writing the MEM distributions to files, peaks in the distribution are characterized. The lines in the .out files beginning with 'LT_i:' and 'A_i:' report the mean log lifetime and the area estimated for each peak. The isolation/resolution of each peak is also characterized by two ratios: the intensity of the peak maximum divided by the intensity at the minimum on either side of the maximum. The smaller of these two ratios is reported as 'r_i'. The total area of the distribution is given by 'A_t'. When MAXEXP 0, discrete fits are performed with parameters initialized based on peaks in the MEM distribution. These fits are summarized by reporting the initial and final parameter values: 'Initial LT_i', 'Initial A_i', 'Final LT_i', etc.
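The ratio 'r_i' described above can be illustrated on a distribution sampled on a grid. This is a sketch rather than MemExp's implementation. It assumes that "the minimum on either side" means the lowest intensity between the peak and the neighboring peak (or the edge of the grid), and it detects only strict local maxima.

```python
def peak_isolation_ratios(intensities):
    """Return a list of (peak_index, r_i) for each strict interior maximum.

    r_i is the smaller of two ratios: the peak intensity divided by the
    minimum intensity found to its left, and the same to its right.
    Flat-topped peaks are not handled in this sketch.
    """
    n = len(intensities)
    peaks = [
        i for i in range(1, n - 1)
        if intensities[i] > intensities[i - 1]
        and intensities[i] > intensities[i + 1]
    ]
    results = []
    for k, peak in enumerate(peaks):
        # Search for each minimum only as far as the neighboring peak
        # or the edge of the grid (an assumption, see above).
        left_bound = peaks[k - 1] if k > 0 else 0
        right_bound = peaks[k + 1] if k + 1 < len(peaks) else n - 1
        left_min = min(intensities[left_bound:peak + 1])
        right_min = min(intensities[peak:right_bound + 1])
        ratios = [
            intensities[peak] / m if m > 0 else float("inf")
            for m in (left_min, right_min)
        ]
        results.append((peak, min(ratios)))
    return results
```

A large r_i indicates a well-isolated peak, while a value close to 1 indicates a peak that is barely resolved from a neighbor.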

MemExp automatically recommends one discrete and one distributed fit. See the lines that begin 'CHOOSX' and 'CHOOSE', respectively.
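Because the lines of interest in a .out file are identified by how they begin, they can be pulled out with a short script. The sketch below relies only on the markers quoted in this section ('RDPARS', 'CHOOSX', 'CHOOSE', and 'Exiting routine RDPARS.'). The helper names are hypothetical, and the content that follows each marker is not interpreted.

```python
def lines_with_prefix(out_text, prefixes):
    """Return the lines of a .out file that begin with any given prefix.

    Leading whitespace is ignored when matching (an assumption about
    how the lines are indented).
    """
    wanted = tuple(prefixes)
    return [
        line for line in out_text.splitlines()
        if line.lstrip().startswith(wanted)
    ]


def parameters_read_ok(out_text):
    """True if the .out text reports that all inversion parameters were read."""
    return "Exiting routine RDPARS." in out_text
```

For example, after reading data.001_e1.out into a string `text`, `lines_with_prefix(text, ["CHOOSX", "CHOOSE"])` would list the lines that report the recommended discrete and distributed fits.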

Image Files and Others

Distributions obtained by the MEM are written to files with names derived from the data file being inverted. For example, one MEM image stored during the numerical inversion of file data.001 might be named data.001.FLT.a.110. This would be the image obtained after 110 evaluations of the function Q to be optimized by the MEM (see above). The FLT indicates that the file was generated by MemExp, i.e., that it contains $f(\log \tau)$ values. The a specifies that the optional Lagrange multiplier $\alpha$ was used; an l would indicate that only the Lagrange multiplier $\lambda$ was used. When data are analyzed with both the g and h distributions (NDIST ), all the g parameters are written to the file, then all the h parameters, and finally the baseline parameters. For example, if a linear baseline is used, the final four lines in the output file are the positive constant coefficient, the negative constant coefficient, the positive linear coefficient, and the negative linear coefficient, respectively. To facilitate subsequent plotting of the results when NDIST , the g and h distributions are written separately to data.001.FLT.a.110_pos and data.001.FLT.a.110_neg, respectively.

The $\log \tau$ values and amplitudes obtained in fits by discrete exponentials are written to similar files, named data.001.FLT.a.N, where N is the number of exponentials in the fit. The associated baseline parameters are written to files named data.001.FLT.a.N_bln.
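The naming scheme described above can be decoded mechanically, which is handy when sorting many output files. The following sketch covers only the patterns mentioned in this section (the a or l flag, the trailing integer, and the optional _pos, _neg, and _bln suffixes). The function name `parse_output_name` and the dictionary keys are hypothetical.

```python
import re

# <data file>.FLT.<a|l>.<integer>[_pos|_neg|_bln]
_OUTPUT_NAME = re.compile(
    r"^(?P<data>.+)\.FLT\.(?P<flag>[al])\.(?P<count>\d+)"
    r"(?:_(?P<part>pos|neg|bln))?$"
)


def parse_output_name(name):
    """Split a MemExp output file name into its components.

    Returns None if the name does not follow the pattern. `count` is the
    trailing integer: the number of evaluations of Q for a MEM image, or
    the number of exponentials for a discrete fit.
    """
    match = _OUTPUT_NAME.match(name)
    if match is None:
        return None
    return {
        "data_file": match["data"],
        "multiplier": "alpha" if match["flag"] == "a" else "lambda",
        "count": int(match["count"]),
        "part": match["part"],  # "pos", "neg", "bln", or None
    }
```

The name alone does not tell a MEM image from a discrete fit, so the meaning of `count` has to come from context.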

Recommendation

The residuals and the autocorrelation function of the residuals plotted in the MEM and NLS/ML PostScript summaries should be inspected visually. These plots are very helpful in evaluating the 'optimal' distributed and discrete fits recommended automatically by MemExp. Inspection of the PostScript file helps strike the compromise desired between two goals: a lifetime distribution possessing a minimal number of local maxima that produces: i) residuals that are acceptably uncorrelated and uniform in magnitude throughout the entire temporal range of the data and ii) a value of C of about 1.0 or somewhat less. Simply choosing the final f distribution stored during the inversion will generally result in over-fitting the data; the residuals may look good but at the expense of unwarranted structure in f and an unnecessarily large number of exponentials used in the discrete fit.


Steinbach 2020-01-21