1.2. Theorical Foundations of Genetic Algorithms#
1.2.1. Schemas and the Two Armed Bandit Problem#
Holland’s schema analysis revealed that genetic algorithms implicitly estimate the average fitness of schemas, leading to “implicit parallelism” in the search process.
Selection in genetic algorithms focuses the search on subsets of the search space with estimated above-average fitness, while crossover combines high-fitness “building blocks” to create strings of higher fitness.
Mutation acts as an insurance policy to maintain genetic diversity in the population. Adaptation, according to Holland, involves a balance between exploration (testing new schemas) and exploitation (using and propagating known schemas), crucial for facing unpredictable environments.
Holland’s original genetic algorithm was proposed as an “adaptive plan” to achieve this balance.
Holland’s schema analysis demonstrated that genetic algorithms achieve a near-optimal balance between exploration and exploitation.
Holland’s theory of schemas initially assumed binary strings and single-point crossover but has since been extended to different representations and crossover operators.
1.2.1.1. The Two-Armed Bandit Problem#
The Two-Armed Bandit problem serves as a model to illustrate the tradeoff between exploration and exploitation, commonly studied in statistical decision theory and adaptive control.
In this scenario, a gambler is given N coins to play a slot machine with two arms, each with different mean payoff rates and variances.
The gambler’s goal is to maximize total payoff over N trials by allocating samples to each arm, without prior knowledge of which arm has a higher payoff rate.
The performance criterion is “on-line,” meaning that the payoff at every trial contributes to the final evaluation of performance.
Holland’s solution to the Two-Armed Bandit problem suggests that, as more information is gained through sampling, the optimal strategy is to exponentially increase the probability of sampling the better-seeming arm relative to the worse-seeming arm.
This solution can be applied to schema sampling in genetic algorithms, where schemas are analogous to arms in a multi-armed slot machine. The observed “payoff” of a schema is its observed average fitness, implicitly tracked by the genetic algorithm.
Holland’s claim, supported by the Schema Theorem, is that a near-optimal strategy for sampling schemas arises implicitly in the genetic algorithm, leading to the maximization of on-line performance.
1.2.1.2. Sketch of a Solution#
The Two-Armed Bandit problem illustrates the challenge of maximizing expected profits over N trials by allocating samples to two arms without prior knowledge of their payoff rates.
The gambler aims to minimize expected losses, defined as trials on which the true better arm is not sampled.
Holland’s solution involves calculating the losses
L(N - n, n)
overN
trials, wheren
is the number of trials allocated to the arm with the lower observed payoff.
The probability
q
represents the likelihood that the observed worse arm is actually the better arm, given the allocation of trials.
The solution entails finding
n*
that minimizesL(N - n, n)
by taking its derivative with respect ton
and solving forn
.
Holland approximated
q
using the difference in the sums of payoffs for the two arms and corrected it using the theory of large deviations.The optimal allocation of trials
n*
to the observed second-best arm is approximated by an equationinvolving constants
c1
,c2
, andc3
.As
n*
increases, the optimal allocation of trials to the observed better arm should increase exponentially with the number of trials to the observed worse arm.