## Publications available online

English, T., and G. W. Greenwood.
Intelligent
Design and Evolutionary Computation

© 2008 Springer. Personal use of this material is permitted.
However, permission to reprint/republish this material for advertising
or promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works must be obtained from Springer.

Excerpts from
Chapter 1
of

*Design by Evolution: Advances in Evolutionary Design*, P. F. Hingston, L. C. Barone, and Z. Michalewicz (eds.). English, T.
No More Lunch: Analysis of
Sequential Search

© 2004 IEEE. Personal use of this material is permitted.
However, permission to reprint/republish this material for advertising
or promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works must be obtained from the IEEE.

**Note: This invokes results proved in “On the Structure of Sequential Search: Beyond ‘No Free Lunch’.”**

**Abstract**—Sequential search algorithms of the type predicated in conservation theorems are studied in their own right. With representation of functions as strings, the sets of test functions and search results are identical. This allows sequential search algorithms to be treated as operators on distributions on functions. Certain distributions, referred to as block uniform, are fixed points for all algorithms. Sequential search preserves the identity of the nearest fixed point and the Kullback-Leibler distance to that point. In practice, distributions of test functions are not block uniform, and conservation properties hold to a degree that depends upon distance to the nearest fixed point. Randomized sequential search is also analyzed. Here the search operator generally moves the distribution closer to the nearest fixed point. A random walk transforms any distribution into the nearest fixed point.

Draft paper submitted for review:
4th European Conference on Evolutionary Computation in Combinatorial
Optimization (EvoCOP2004). Revised paper to appear in conference
proceedings.

**Note: This paper proves some properties of sequential search assumed in “No More Lunch: Analysis of Sequential Search.” I recommend reading the latter paper first.**

**Abstract**—In deterministic, exhaustive, sequential search the algorithm permutes a test function to obtain the search result. The mapping from test functions to search results is a one-to-one correspondence. There is a partition of the set of functions into maximal blocks of functions that are permutations of one another. This permits reasoning about search to be reduced to reasoning about search within blocks of functions. It also leads to the definition of the block uniform distribution, in which functions within the same block have the same probability. It is established that the Kullback-Leibler distance to the nearest block uniform distribution is identical for the distributions of test functions and search results. The distance is zero if and only if all search algorithms have identically distributed search results. That is, a block uniform distribution of test functions is necessary and sufficient for “no free lunch.”

English, T. M.
Introduction to
‘No Free Lunch’ in Optimization and Learning

Slides for a
tutorial presented at the 2001 Congress on Evolutionary Computation
(CEC2001), Seoul, March 27, 2001.

© 2000 Springer-Verlag Berlin Heidelberg New York

(published in the Lecture Notes in Computer Science series)

(published in the Lecture Notes in Computer Science series)

**Abstract**—Three theoretical perspectives upon conservation of performance in function optimization are outlined. In terms of statistical information, performance is conserved when the distribution of functions is such that all domain subsets of a given size have identically distributed random values. In terms of algorithmic information, performance is conserved because almost all functions are algorithmically random. Also, an optimizer’s algorithmic complexity bounds the information gained or lost in its processing of the test function. The practical consequences of these theoretical results are explored, with emphasis upon the results from algorithmic information theory, which are new.

English, T. M.
Ranking Models for
Combination

© 2000 The Society of Photo-Optical Instrumentation Engineers

**Abstract**—A considerable amount of research has addressed the methods and objectives of model combination. Very little attention has been given to the question of how to obtain a good collection of models for combination. Here a rationale for inductive inference of multiple models of time series is developed in terms of algorithmic information theory. A model-based Kolmogorov sufficient statistic is described, and is utilized in a recursive scheme for ranking models in a population. Ranks are assigned in such a way that the

*n*top-ranked models are considered to be the best subset of

*n*models to use in combination. The ranking scheme is appropriate for use in the selection operation of an evolutionary computation. The treatment is primarily theoretical, but several practical issues in ranking and model combination are addressed.

© 2000 IEEE. Personal use of this material is permitted.
However, permission to reprint/republish this material for advertising
or promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works must be obtained from the IEEE.

**Abstract**—Elementary results in algorithmic information theory are invoked to show that almost all finite functions are highly random. That is, the shortest program generating a given function description is rarely much shorter than the description. It is also shown that the length of a program for learning or optimization poses a bound on the algorithmic information it supplies about any description. For highly random descriptions, success in guessing values is essentially accidental, but learning accuracy can be high in some cases if the program is long. Optimizers, on the other hand, are graded according to the goodness of values in partial functions they sample. In a highly random function, good values are as common and evenly dispersed as bad values, and random sampling of points is very efficient.

Preprint of position paper appearing in

*Integrating Multiple Learned Models*(*IMLM*-96).**Abstract**—The objective is to predict on the basis of recent observations the behavior of a nonlinear dynamical system. An evolutionary algorithm searches a large space of model topologies and precision hyperparameters, fitting limited-precision models to older observations and assessing how well the models generalize to more recent observations. In each generation, topology-precision pairs are ranked according to an inductive criterion favoring validation errors that are not only small but low in redundancy with those of higher-ranked pairs. The fitted models associated with the highest ranks of a generation are combined in a nonlinear fashion. Their predictions are treated collectively as the state vector of the modeled system, and memory of errors in similar states of the past is used to correct and combine predictions in the present. Model combinations of different generations are themselves similarly combined. Empirical results are given for populations of recurrent neural networks modeling chaotic systems.

**Emended reference list**

English, T. M.
Evaluation of Evolutionary and
Genetic Optimizers: No Free Lunch [Emended and Amplified]

© 1996 MIT Press

**Note: The paper correctly identifies exchangeabiilty of values of the random objective function as a necessary and sufficient condition for “no free lunch.” However, the exposition incorrectly focuses on “conservation of information,” which is really just statistical independence of the sample selection process and the random values in the sample.****Abstract**—The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average. The present work generalizes the result to an uncountable set of distributions. The focus is upon the conservation of information as an optimizer evaluates points. It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions. Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer. Empirical studies are best regarded as attempts to infer the prior information optimizers have about distributions–i.e., to determine which tools are good for which tasks.