In artificial intelligence, alpha-beta pruning is an algorithm for identifying and pruning branches of a game tree that cannot affect the final minimax decision, which makes it central to game-playing programs. Alpha-beta pruning is used in IBM’s Deep Blue chess player, the Chinook checkers player, and the Stockfish chess engine.

The core idea is that we can stop searching a node’s remaining branches once we’ve found a value past a certain threshold:

  • For example, suppose we’re on a max layer. Sibling nodes have already evaluated to 14, 12, and 8. The parent node (a min node) will pick the minimum of these, which is currently 8 (see the sketch after this list).
    • If any child of the current max node has a value of at least 8, we can stop searching that max node.
    • The max node always picks the maximum of its children, so its value will be at least 8.
    • But the min parent always picks the minimum, so it already prefers the 8 it has; it will never pick this branch, and the remaining children are irrelevant.
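
A minimal sketch of this cutoff in Python; the function name and the child values below simply replay the example above and are not from any standard library:

```python
def max_value_with_cutoff(children, beta):
    """Evaluate a MAX node, stopping early once a child reaches beta.

    beta is the best (lowest) value the MIN parent already has; here
    beta = 8, the minimum of the previously explored siblings 14, 12, 8.
    """
    v = float("-inf")
    for child in children:
        v = max(v, child)
        if v >= beta:   # the MIN parent will never choose this node
            return v    # prune: the remaining children are irrelevant
    return v

# The MIN parent has already seen siblings worth 14, 12 and 8, so beta = 8.
# As soon as the current MAX node finds a child worth >= 8, we stop.
print(max_value_with_cutoff([5, 9, 2, 100], beta=min(14, 12, 8)))  # stops at 9
```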

To formalise:

  • At a max node n, we define:
    • α is the highest value of n’s branches examined so far. It changes as branches of n are examined.
    • β is the lowest value found so far by the parent node (a min node). It is fixed while we examine the children of n.
  • At a min node n, we define:
    • α is the highest value found so far by the parent node (a max node), i.e. over the previously explored siblings of n. Fixed.
    • β is the lowest value of n’s branches examined so far. Changes as branches of n are examined.

i.e., α is always associated with a max node, and β is always associated with a min node.

Analysis and extensions

The effectiveness (how many nodes are pruned) of alpha-beta pruning is highly dependent on the order in which the nodes are visited. We want to visit the best branch first, i.e., for MIN nodes, the best pruning occurs if the optimal (lowest-value) move for MIN is explored first. This establishes the node’s minimax value early, so cutoffs trigger sooner and more nodes are pruned. In the worst case, no pruning happens and the time complexity is O(b^m). With perfect ordering, we get a complexity of O(b^(m/2)).

Two problems: we don’t know which branch has the highest or lowest value without doing all the work. And for many real games (for chess, b ≈ 35 with games running to roughly 100 plies), the tree is far too large for basic alpha-beta pruning to search in full.
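
A quick back-of-the-envelope sketch of the difference; b ≈ 35 is the usual textbook estimate for chess, and the depth of 10 plies is an arbitrary illustration:

```python
# Rough node counts for plain minimax vs perfectly ordered alpha-beta.
# b = branching factor, m = search depth in plies; chess is roughly b ~ 35.
b, m = 35, 10

minimax_nodes = b ** m              # O(b^m): every node is expanded
alpha_beta_nodes = b ** (m // 2)    # O(b^(m/2)): perfect move ordering

print(f"minimax:    ~{minimax_nodes:.2e} nodes")
print(f"alpha-beta: ~{alpha_beta_nodes:.2e} nodes")
# For the same node budget, alpha-beta can search roughly twice as deep.
```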

One strategy is to limit the depth of our search. We apply an evaluation function (basically a heuristic) at the depth limit. It’s defined as a function assigning an estimate of utility values to non-terminal states. This represents the “goodness” of a game position. Many evaluation functions can be represented as a weighted sum of features, i.e., Eval(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s) = Σᵢ wᵢfᵢ(s).
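
As a sketch, a weighted-sum evaluation of this form might look like the following; the chess-like features and their weights are made up for illustration:

```python
# Illustrative weighted-sum evaluation: Eval(s) = sum_i w_i * f_i(s).
# The feature names and weights below are invented for the example.
WEIGHTS = {
    "material_balance": 1.0,   # own piece value minus opponent's
    "mobility": 0.1,           # number of legal moves available
    "king_safety": 0.5,        # heuristic score for king shelter
}

def evaluate(features):
    """Estimate the utility of a non-terminal state from its feature values."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

print(evaluate({"material_balance": 3, "mobility": 12, "king_safety": -2}))  # 3.2
```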

Deep Blue (the chess engine) used about 6000 features in its evaluation function. It also used an opening book with 4000 positions and an endgame database covering 6-piece endgames and all ≤5-piece endgames. AlphaGo used neural networks to provide an estimate of the value. It also used a randomised search (Monte Carlo tree search) to sample which children to explore. Running this search many times gave an estimate of which move is best.

Implementation

In the value function, we take the state, α, and β:

  • If the state is a terminal state, return the state’s utility.
  • If the next agent is MAX: return max-value(state, α, β)
  • If the next agent is MIN: return min-value(state, α, β)

Max-value function:

  • Initialise v = −∞.
  • For each successor of the state:
    • v = max(v, value(successor, α, β)).
    • If v ≥ β, return v.
    • α = max(α, v).
  • Return v.
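
A self-contained Python sketch of this pseudocode, including the symmetric min-value function. The tree representation (leaves are plain numbers, internal nodes are lists of children) and the explicit agent argument are assumptions made for the example:

```python
import math

def value(state, agent, alpha, beta):
    """Dispatch on whose turn it is; leaves are plain numbers (utilities)."""
    if isinstance(state, (int, float)):      # terminal state
        return state
    if agent == "MAX":
        return max_value(state, alpha, beta)
    return min_value(state, alpha, beta)

def max_value(state, alpha, beta):
    v = -math.inf
    for successor in state:
        v = max(v, value(successor, "MIN", alpha, beta))
        if v >= beta:            # MIN parent already has something <= beta
            return v             # prune the remaining successors
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    v = math.inf
    for successor in state:
        v = min(v, value(successor, "MAX", alpha, beta))
        if v <= alpha:           # MAX parent already has something >= alpha
            return v             # prune the remaining successors
        beta = min(beta, v)
    return v

# The example tree from above: the MIN root's first three children are MAX
# nodes worth 14, 12 and 8. Inside the last child, the search stops as soon
# as a value >= 8 (here 9) is found, so the leaf 2 is never examined.
tree = [[14], [12], [8], [5, 9, 2]]
print(value(tree, "MIN", -math.inf, math.inf))   # 8
```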