Documentation

Efficiently Updatable Neural Network

NNUE is designed to compute the forward propagation of a neural network efficiently. The concept is based on the fact that a single move involves at most three piece differences on the board. This allows the network state to be updated using only those piece differences, rather than performing a full computation from the input vector.

The "accumulator" is a positional state that stores transformed features. It "accumulates" all the differentials from the initial state (empty board).

Network Architecture

Each subnetwork consists of one feature transformer and three dense layers, two of which are followed by clipped ReLU activations. There are a total of 8 subnetworks that make up the entire network (either Big or Small), with each subnetwork (bucket) selected based on the number of pieces on the board, as sketched below. Refer to the sections below for details on each layer.
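
The bucket index follows directly from the total piece count: with 2 to 32 pieces on the board, the formula below yields indices 0 through 7. A minimal sketch, mirroring the (pos.count<ALL_PIECES>() - 1) / 4 selection used in Stockfish's network evaluation (select_bucket is an illustrative name):

    // 8 buckets indexed by piece count:
    // 2 pieces (bare kings) -> bucket 0, ..., 29-32 pieces -> bucket 7.
    constexpr int LayerStacks = 8;

    int select_bucket(int pieceCount) {
        return (pieceCount - 1) / 4;
    }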

SmallNet

SmallNet is a replacement for the former HCE (handcrafted evaluation), often referred to as "classical evaluation". It has the same network architecture as the Big network, except that its L1 size is smaller. When one side is winning or losing by a significant material margin, a less precise evaluation is assumed to be sufficient for such positions.

Determining Network Types

Stockfish first evaluates a position only with pure material scores to determine which network type to use in NNUE.

In evaluate.cpp:

// Returns a static, purely materialistic evaluation of the position
// from the point of view of the given color.
int Eval::simple_eval(const Position& pos, Color c) {
    return PawnValue * (pos.count<PAWN>(c) - pos.count<PAWN>(~c))
         + (pos.non_pawn_material(c) - pos.non_pawn_material(~c));
}

// Use SmallNet only when the material imbalance is large.
bool Eval::use_smallnet(const Position& pos) {
    int simpleEval = simple_eval(pos, pos.side_to_move());
    return std::abs(simpleEval) > 962;
}

Even when a position has been evaluated with SmallNet, it is subsequently re-evaluated with BigNet if the evaluation and the PSQT term have opposing signs, or if the score itself is small.

    // Re-evaluate the position when higher eval accuracy is worth the time spent
    if (smallNet && (nnue * psqt < 0 || std::abs(nnue) < 227))
    {
        std::tie(psqt, positional) = networks.big.evaluate(pos, &caches.big);
        nnue                       = (125 * psqt + 131 * positional) / 128;
        smallNet                   = false;
    }

Version Update History

Refer to the nnue-pytorch NNUE documentation to view visualized structures of each architecture.

  • SFNNv9 (current): Increased L1 size to 3072. commit

  • SFNNv8: Increased L1 size to 2560. commit

  • SFNNv7: Increased L1 size to 2048. commit

  • SFNNv6: Increased L1 size to 1536. commit

  • SFNNv5: Introduced squared clipped ReLU layer, output mixed within L1. commit

  • SFNNv4: commit

    • Reduced the input size by half and doubled the output size of L1.

    • Split the L1 activation output.

  • SFNNv3 (tentative): Changed feature set from HalfKAv2 to HalfKAv2_hm. commit

  • SFNNv2 (tentative): commit

    • Changed feature set from HalfKP to HalfKAv2.

    • Increased L1 size to 512x2.

  • Introduction of NNUE: commit

Feature Transformer

Accumulator Update Strategy

A position is sometimes not evaluated in NNUE, such as when a matching TT (transposition table) entry is found. When an evaluation is needed later, Stockfish searches for the last computed accumulator and determines which is faster: updating incrementally from it, or constructing the accumulator from the cached entry. The decision is based on the update cost and the refresh cost, which are feature-specific values; see the Feature Sets section for more information.
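
A hedged sketch of that decision, assuming a simple per-ply state layout (the names are illustrative, not Stockfish's actual identifiers):

    // Hypothetical per-ply state: whether its accumulator has been computed,
    // and how many feature changes separate it from the previous ply.
    struct PlyState {
        bool computed;
        int  changedFeatures;
    };

    // Walk back to the last computed accumulator, summing per-ply update
    // costs; refresh from the cache instead if that total is larger.
    bool should_refresh(const PlyState* states, int current, int refreshCost) {
        int updateCost = 0;
        int ply        = current;
        while (ply > 0 && !states[ply].computed)
            updateCost += states[ply--].changedFeatures;
        return updateCost > refreshCost;
    }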

Normal Position Evaluation

When doing an incremental update, Stockfish updates one or two accumulators: if the previous state has already been computed, only the current position needs to be updated; otherwise, the accumulators of both the current position and the position following the last computed one are updated.

When the update cost is greater than the refresh cost, or if the king has moved, the accumulator is instead rebuilt from the corresponding cache entry.
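
The resulting control flow, as a rough sketch (the helpers are hypothetical stand-ins for the real cache and update machinery):

    #include <cstdio>

    // Hypothetical stand-ins for the real machinery.
    static void refresh_from_cache()        { std::puts("rebuild from cache entry"); }
    static void incremental_update(int ply) { std::printf("update ply offset %d\n", ply); }

    void make_accumulator_current(bool prevComputed, bool kingMoved,
                                  int updateCost, int refreshCost) {
        // A king move, or a too-expensive update chain, forces a rebuild
        // from the corresponding accumulator cache entry.
        if (kingMoved || updateCost > refreshCost) {
            refresh_from_cache();
            return;
        }
        if (!prevComputed)
            incremental_update(1);  // the position following the last computed one
        incremental_update(0);      // then the current position
    }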

Figure 1. Update accumulators

Common Parent Position Hint

If it is expected that some child nodes will be evaluated later, Stockfish preemptively computes the accumulator of the parent position, so that each child evaluation only needs a small incremental update. This is done in move-excluded searches (singular extensions), at TT-hit PV nodes, and after a failed ProbCut search.
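
A rough sketch of the idea (ensure_accumulator_computed is a hypothetical stand-in; the exact hint API has varied across Stockfish versions):

    #include <cstdio>

    // Hypothetical stand-in: compute and cache the accumulator for the
    // current position if it is not available yet.
    static void ensure_accumulator_computed() { std::puts("parent accumulator ready"); }

    void maybe_hint(bool ttHitPvNode, bool excludedMoveSearch, bool probCutFailed) {
        // Pay the transformation cost once at the parent, so that each child
        // evaluation needs only a one-move incremental update.
        if (ttHitPvNode || excludedMoveSearch || probCutFailed)
            ensure_accumulator_computed();
    }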

Figure 2. Hint common parent position

Incremental Update

WIP

Refresh Cache and Update

WIP

Feature Sets

Ordered from latest to oldest.

HalfKAv2_hm

WIP

Layers

Affine Layer

Sparse Input Optimization

Clipped ReLU Layer

Squared Clipped ReLU Layer
