The Maximum Entropy Method of Data Analysis


I. Read this Overview First
II. The Numerical Inversion Problem
III. Two Sample MEM Inversions
IV. Motivating the MEM with Monkeys
V. Motivating the MEM with Kangaroos
VI. MEM Algorithms
VII. MEM Algorithms (cont.)
VIII. Lifetime Distributions
IX. Fourier Analysis
X. Deconvolution
XI. Positron Emission Tomography
XII. Summary & Comments
Rooted in information theory, the mem seeks to extract as much information from a measurement as is justified by the data's signal-to-noise ratio. It has been used successfully in a variety of fields, including radio astronomy, NMR, fluorescence, and ligand rebinding. The method is by no means restricted to conventional imaging problems, but the word `image' will be used throughout this discussion to refer to any function to be extracted from the data.

This discussion is confined to linear numerical inversions only because they are relatively easy to handle computationally. For data that are linearly related to the desired `image,' the mem solution is unique; there is no `multiple minimum' problem. Application of the mem to nonlinear inversions is generally more complicated but may be worth the effort.

We begin by defining the numerical inversion problem to be addressed. Several examples are listed in which a data set, D, can be interpreted in terms of an `image,' f. The desired image may be of any dimension, e.g., a one-dimensional Laplace inversion of reaction kinetics (Steinbach et al., 1992) or a two-dimensional deconvolution of an astronomical image. Mathematically, we need to invert a matrix equation relating the image to the data. The difficulty comes from our sampling of the data, which is both limited and noisy. Consequently, many images are consistent with the measured data, giving the same value of the goodness-of-fit statistic, chi-square. The maximum entropy method selects one image from this `feasible set' of images that describe the data equally well. Two sample mem inversions are given. The first is a traditional enhancement of a blurred image. In the second example, the `image' to be recovered is the distribution of activation enthalpies and entropies that describes the kinetics of ligand rebinding measured at multiple temperatures (Steinbach, 1996).
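To make the `feasible set' concrete, here is a minimal Python sketch, assuming a hypothetical one-dimensional Gaussian blurring kernel and noise level. The data are modeled as D = R f + noise, and two visibly different trial images fit the noisy data with essentially the same chi-square.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 1D problem: the data are a blurred, noisy view of the image.
    n = 64
    x = np.arange(n)
    true_image = np.exp(-0.5 * ((x - 32) / 4.0) ** 2)           # one smooth peak

    # Response matrix R: each data point is a Gaussian-weighted average of the image.
    R = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 3.0) ** 2)
    R /= R.sum(axis=1, keepdims=True)

    sigma = 0.01                                                # noise level
    data = R @ true_image + rng.normal(0.0, sigma, n)

    def chi_square(f):
        # Goodness of fit of a trial image f to the measured data.
        return np.sum(((R @ f - data) / sigma) ** 2)

    # Two different trial images that describe the data about equally well:
    smooth_trial = true_image
    wiggly_trial = true_image + 0.05 * np.sin(2.5 * x)          # structure the blurred data cannot rule out

    print(chi_square(smooth_trial), chi_square(wiggly_trial))   # comparable chi-square values

The high-frequency wiggles are almost entirely removed by the blur, so the data alone cannot distinguish the two trial images; the mem must supply the criterion for choosing between them.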

In conventional uses of the mem, the most uniform image consistent with the data is chosen. To motivate the mem choice, the traditional zoo of monkeys and kangaroos is invoked in illustrative arguments. In the spirit of the lab full of monkeys that randomly type great works of literature, we consider a monkey randomly generating images by throwing identical balls (intensity) into identical boxes (pixels). The entropy, S, of a given image is defined as the natural logarithm of the number of ways the monkey could generate it. Not surprisingly, for identical boxes, the image with maximum entropy is uniform (flat). To favor some boxes over others (e.g., to incorporate features in the image based on previous measurements), the definition of S is generalized in terms of a default image F (Skilling, 1989). The image with maximum entropy is then f = F. An automated bootstrapping of F can lead to reduced artifacts in particularly difficult cases, e.g., when broadly distributed and sharp features overlap (Steinbach, 2002).
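For reference, a common form of the entropy relative to a default image F (the generalization discussed by Skilling, 1989) is S = sum_i [ f_i - F_i - f_i ln(f_i/F_i) ], which attains its maximum, S = 0, at f = F. A minimal Python sketch of this definition:

    import numpy as np

    def entropy(f, F):
        # S = sum_i [ f_i - F_i - f_i * ln(f_i / F_i) ]
        # S <= 0 for positive images, with equality only when f = F,
        # so the unconstrained maximum of S is the default image itself.
        f = np.asarray(f, dtype=float)
        F = np.asarray(F, dtype=float)
        return np.sum(f - F - f * np.log(f / F))

    F = np.full(10, 0.1)                  # flat default image
    f = F.copy()
    f[3] *= 2.0                           # add a feature the data may or may not demand
    print(entropy(F, F), entropy(f, F))   # 0.0 versus a negative value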

Next, the `kangaroo problem' illustrates that maximizing S imposes no correlations on the image for which we have no evidence in the available data. This `maximally noncommittal' image may not be correct in the sense that future experimental data may call for additional correlations in the image. But in the absence of such evidence, the mem chooses the most uniform image (relative to the default image F). In this sense, the mem choice is the most conservative image that describes the data at a given value of chi-square. As the data become less noisy, the image becomes more featured.

Following this summary of the mem philosophy, two mem algorithms are outlined. The first is only of historical interest. The second (Cornwell and Evans, 1985) was used to obtain all results presented here. We seek to maximize the entropy of the image, S, subject to the constraint that chi-square be sufficiently small. In some cases, a second constraint is applied to fix the total intensity (normalization) of the image. The function to be maximized, Q, thus contains either two or three terms. A Lagrange multiplier is introduced for each constraint: lambda for the chi-square constraint and alpha for the normalization constraint (when used).
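The sketch below is not the Cornwell and Evans algorithm itself; it is a minimal gradient-ascent illustration of maximizing Q = S - lambda*chi-square for a toy one-dimensional problem, with lambda held fixed (in practice the multiplier is adjusted so that chi-square reaches its target value). The problem setup, step size, and iteration count are all hypothetical choices made for brevity.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy linear inversion: data = R @ f_true + noise (all choices hypothetical).
    n = 32
    x = np.arange(n)
    R = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)
    R /= R.sum(axis=1, keepdims=True)
    f_true = 0.05 + np.exp(-0.5 * ((x - 16) / 2.5) ** 2)
    sigma = 0.02
    data = R @ f_true + rng.normal(0.0, sigma, n)

    F = np.full(n, f_true.mean())   # flat default image
    lam = 0.05                      # Lagrange multiplier, held fixed here for simplicity;
                                    # real algorithms adjust it until chi-square hits its target

    f = F.copy()
    step = 1.0e-4
    for _ in range(20000):
        dS = -np.log(f / F)                                      # gradient of S = sum(f - F - f*ln(f/F))
        dchi2 = 2.0 * R.T @ ((R @ f - data) / sigma**2)          # gradient of chi-square
        f = np.clip(f + step * (dS - lam * dchi2), 1e-8, None)   # ascend Q = S - lam*chi2, keep f > 0

    print("chi-square per data point:", np.sum(((R @ f - data) / sigma) ** 2) / n)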

Next, a few useful algorithmic extensions are presented. For example, improved images are often recovered if the default image F is created by smoothing the image f obtained (with a flat F) at an intermediate value of chi-square, and the calculation is then restarted with this new F. In this way, low-frequency features in the data are incorporated into the definition of S. For some problems, it is useful to embed a mem inversion into a parameter optimization scheme so that parameters, P, other than the image intensities can be varied.
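A minimal sketch of this bootstrapping step, assuming a hypothetical Gaussian smoothing of the intermediate image (the function name and smoothing width are illustrative, not part of the mem software described here):

    import numpy as np

    def bootstrap_default(f_intermediate, width=3.0):
        # Build a new default image F by Gaussian-smoothing an intermediate MEM image,
        # so that low-frequency structure found at an intermediate chi-square is carried
        # into the definition of S.  The smoothing width is a hypothetical choice.
        f = np.asarray(f_intermediate, dtype=float)
        x = np.arange(len(f))
        kernel = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
        kernel /= kernel.sum(axis=1, keepdims=True)
        return kernel @ f

    # Usage: run the mem with a flat F to an intermediate chi-square, then restart with
    # F = bootstrap_default(f_intermediate) and iterate to the final chi-square target.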

Application of the mem to Fourier analysis illustrates how the method effectively `leaves noise behind' in data space upon transforming to image space. The fast Fourier transform (FFT) of a time-dependent signal brings all the noise to frequency space, such that the inverse FFT back to time space recovers the data exactly (within roundoff errors). That is, no information is lost in an FFT. The mem approach to Fourier analysis is indeed coded using FFTs, but the function recovered by the inverse transformation back to time space is no longer the original data; it reproduces the data only to within some reasonable value of chi-square. Thus, the mem `backs off' the data so as to provide a minimally structured spectrum of frequencies. In the example shown, note that as chi-square is lowered the mem reproduces the one true peak better but also includes more of the high-frequency noise.
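The contrast can be sketched in Python with a made-up noisy cosine signal: the FFT round trip is exact, while a candidate spectrum need only reproduce the data to within a reasonable chi-square. Thresholding the spectrum below stands in, purely for illustration, for the mem's minimally structured choice; it is not the mem algorithm.

    import numpy as np

    rng = np.random.default_rng(2)

    # Noisy time-domain signal with a single true frequency (hypothetical example).
    n = 256
    t = np.arange(n)
    sigma = 0.5
    data = np.cos(2 * np.pi * 10 * t / n) + rng.normal(0.0, sigma, n)

    # The FFT round trip is exact: all of the noise travels to frequency space and back.
    spectrum = np.fft.fft(data)
    print(np.allclose(np.fft.ifft(spectrum).real, data))         # True, to roundoff

    def chi_square(candidate_spectrum):
        # Misfit between the measured data and the inverse transform of a candidate spectrum.
        model = np.fft.ifft(candidate_spectrum).real
        return np.sum(((model - data) / sigma) ** 2)

    # A spectrum that keeps only the dominant frequency still fits the data acceptably.
    truncated = np.where(np.abs(spectrum) > 0.5 * np.abs(spectrum).max(), spectrum, 0.0)
    print(chi_square(spectrum) / n, chi_square(truncated) / n)   # ~0 versus ~1 per data point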

Two more applications are presented: deconvolution and positron emission tomography. We conclude with a brief summary.

For more information or to discuss the application of the mem software demonstrated here to your data, feel free to contact Dr. Peter J. Steinbach, Center for Molecular Modeling, Center for Information Technology, National Institutes of Health (steinbac@mail.nih.gov).

Selected References

Steinbach, P.J. 2012. Filtering artifacts from lifetime distributions when maximizing entropy using a bootstrapped model. Anal. Biochem. 427:102-105.

Steinbach, P.J. 2002. Inferring lifetime distributions from kinetics by maximizing entropy using a bootstrapped model. J. Chem. Inf. Comput. Sci. 42:1476-1478.

Steinbach, P.J., R. Ionescu, and C.R. Matthews. 2002. Analysis of kinetics using a hybrid maximum-entropy/nonlinear-least-squares method: Application to protein folding. Biophys. J. 82:2244-2255.

Cornwell, T.J., and K.F. Evans. 1985. A simple maximum entropy deconvolution algorithm. Astron. Astrophys. 143:77-83.

Skilling, J. 1989. Classic maximum entropy. In: Maximum Entropy and Bayesian Methods. J. Skilling, editor. Kluwer Academic, Norwell, MA. 45-52.

Skilling, J., and R.K. Bryan. 1984. Maximum entropy image reconstruction: general algorithm. Mon. Notices R. Astron. Soc. 211:111-124.

Steinbach, P.J. 1996. Two-dimensional distributions of activation enthalpy and entropy from kinetics by the maximum entropy method. Biophys. J. 70:1521-1528. (See the second of two sample mem inversions.)

Steinbach, P.J., K. Chu, H. Frauenfelder, J.B. Johnson, D.C. Lamb, G.U. Nienhaus, T.B. Sauke, and R.D. Young. 1992. Determination of rate distributions from kinetic experiments. Biophys. J. 61:235-245.