Struct accept_any_index

Struct Documentation

struct accept_any_index

Builds a custom evaluator (see CacheManager::custom_evaluator_type) that evaluates a subtree in batches over a contracted index, to bound the peak memory of intermediates that carry that index.

For each node it is consulted on, the returned evaluator chooses a batch axis K via batch_axis(node, accept), declining if there is none. It then asks the backend to partition K into contiguous element-range batches of about target_batch_size elements each (Result::mode_batches). If that yields at most one batch, it declines, which leaves small or unselected indices to the standard scheme.
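A minimal sketch of the partition-and-decline rule, assuming a plain uniform split of an axis of `extent` elements. The names `ElementRange`, `uniform_batches` and `plan_batches` are hypothetical; the backend's actual Result::mode_batches may place boundaries differently (e.g. on tile boundaries).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical: half-open element range [first, last) of one batch.
struct ElementRange {
  std::size_t first;
  std::size_t last;
};

// Split an axis of `extent` elements into contiguous batches of `target`
// elements each; the last batch takes whatever remains.
std::vector<ElementRange> uniform_batches(std::size_t extent, std::size_t target) {
  assert(target > 0);
  std::vector<ElementRange> batches;
  for (std::size_t first = 0; first < extent; first += target)
    batches.push_back({first, std::min(first + target, extent)});
  return batches;
}

// Decline rule: only a partition with more than one batch is worth streaming;
// std::nullopt means "leave this node to the standard scheme".
std::optional<std::vector<ElementRange>> plan_batches(std::size_t extent,
                                                      std::size_t target) {
  auto batches = uniform_batches(extent, target);
  if (batches.size() <= 1) return std::nullopt;
  return batches;
}
```

For example, `plan_batches(10, 4)` yields the ranges [0,4), [4,8) and [8,10), while `plan_batches(4, 4)` declines because the whole axis already fits in one batch.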

Otherwise, it builds every compatible persistent final in the same batch passes. The group consists of the trigger node plus every key of cache that is registered as persistent, is not yet alive, and batches over an axis with an identical realized partition.

Per batch, each group member is evaluated by the standard scheme on a shared, registered scratch cache (see detail::make_batched_scratch), with every leaf that carries the member’s batch axis sliced to the batch’s element range. Sub-intermediates repeated within a member (canonically equal siblings) or shared between members are therefore evaluated once per batch, exactly as the real cache would share them. The per-member partials are summed across batches. This is exact because sum_K = sum_{batches} sum_{K in batch}, and it never materializes the full batch-axis extent of any intermediate at once.

Completed members are stored into cache (canonical-phase convention); the trigger’s result is returned for evaluate() to cache as usual. Members nested inside other members are evaluated in earlier passes and are then either seeded into the outer pass, if they are slice-free with respect to the outer batch axis, or re-derived, sliced, in the outer pass. Considering a group candidate costs one leaf evaluation (the mode_batches probe); with an unregistered (empty) cache, the group is just the trigger.
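The exactness argument, sum_K = sum_{batches} sum_{K in batch}, can be checked on a toy dense contraction C(i,j) = sum_K A(i,K) B(K,j). This is only an illustration with a hypothetical `Matrix` type, not the library's tensor backend: each pass reads just the K-slice [k0, k1) of both operands and accumulates its partial into C.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix, used only for this illustration.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One batch: C(i,j) += sum_{K in [k0, k1)} A(i,K) * B(K,j).
void accumulate_slice(const Matrix& A, const Matrix& B, std::size_t k0,
                      std::size_t k1, Matrix& C) {
  for (std::size_t i = 0; i < A.rows; ++i)
    for (std::size_t j = 0; j < B.cols; ++j)
      for (std::size_t k = k0; k < k1; ++k) C(i, j) += A(i, k) * B(k, j);
}

// Batched contraction over K: the per-batch partials are summed into C,
// which reproduces the full contraction exactly.
Matrix contract_batched(const Matrix& A, const Matrix& B, std::size_t batch) {
  assert(A.cols == B.rows && batch > 0);
  Matrix C(A.rows, B.cols);
  for (std::size_t k0 = 0; k0 < A.cols; k0 += batch)
    accumulate_slice(A, B, k0, std::min(k0 + batch, A.cols), C);
  return C;
}
```

Calling `contract_batched(A, B, A.cols)` is the unbatched case (a single pass); any smaller batch size gives the same result, while each pass touches only one K-slice.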

Why a group of trees rather than the trigger alone: sub-intermediates are shared between separately intercepted finals, and a scratch scoped to one final cannot see the other consumers. Concretely, in DF-based PNO-CCSD the half-transformed DF factor gC = g.C (g is the 3-index DF factor carrying the aux index K; C holds the PNO coefficients) feeds both canonically equal gCC children of the particle-particle-ladder intermediate W = gCC.gCC, as well as the triply transformed final gCCC. Unbatched, the real cache builds gC once and serves all three uses (its keys are canonical; max_life = 3). Batching each final in isolation rebuilds gC n_batches times per final. The shared scratch of a single pass deduplicates W’s two gCC children within each batch, but sharing gC with gCCC across finals is restored only by streaming both finals over the same batch partition in the same passes. That brings gC back to one evaluation per batch: work parity with the unbatched path, at sliced rather than full intermediate peak memory.
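The evaluation counts in this argument can be reproduced with a toy model. In the sketch below, string keys stand in for canonical hashes, a `std::set` stands in for the per-pass scratch cache, and the tree shape (W from two gCC, gCC and gCCC both consuming gC) is a simplification of the PNO-CCSD example; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy DAG (hypothetical): node key -> children. Equal keys mean equal subtrees.
using Tree = std::map<std::string, std::vector<std::string>>;
using Counts = std::map<std::string, std::size_t>;

// Evaluate `node` against a per-pass scratch, counting actual computations.
void evaluate(const Tree& tree, const std::string& node,
              std::set<std::string>& scratch, Counts& computed) {
  if (scratch.count(node) != 0) return;  // already built in this batch pass
  if (auto it = tree.find(node); it != tree.end())
    for (const auto& child : it->second) evaluate(tree, child, scratch, computed);
  ++computed[node];
  scratch.insert(node);
}

// Each final streamed on its own: a fresh scratch per (final, batch).
Counts stream_isolated(const Tree& tree, const std::vector<std::string>& finals,
                       std::size_t n_batches) {
  Counts computed;
  for (const auto& f : finals)
    for (std::size_t b = 0; b < n_batches; ++b) {
      std::set<std::string> scratch;
      evaluate(tree, f, scratch, computed);
    }
  return computed;
}

// The whole group streamed together: one scratch per batch, shared by all finals.
Counts stream_grouped(const Tree& tree, const std::vector<std::string>& finals,
                      std::size_t n_batches) {
  Counts computed;
  for (std::size_t b = 0; b < n_batches; ++b) {
    std::set<std::string> scratch;
    for (const auto& f : finals) evaluate(tree, f, scratch, computed);
  }
  return computed;
}
```

With finals {W, gCCC} and 4 batches, the isolated strategy computes gC 8 times (once per batch per final, with W's two gCC children already deduplicated), whereas the grouped strategy computes it 4 times, once per batch.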

Param le:

the leaf evaluator, captured by the returned evaluator.

Param target_batch_size:

the desired size of each batch, in elements (a user knob; no memory model is assumed). The parameter is backend-neutral: the backend decides the realized boundaries. For example, a tiled backend rounds batch boundaries to tile boundaries, so realized batches are uneven and each covers at least this many elements where possible.
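One plausible way a tiled backend could honor this knob is sketched below, assuming a greedy policy (not necessarily what any real backend does): consecutive tiles are merged until a batch holds at least `target` elements, and a short remainder is folded into the previous batch so that every batch meets the target whenever the axis is long enough. `tile_aligned_batches` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed greedy policy: merge consecutive tiles until a batch holds at least
// `target` elements. Returns half-open element ranges whose boundaries always
// coincide with tile boundaries.
std::vector<std::pair<std::size_t, std::size_t>>
tile_aligned_batches(const std::vector<std::size_t>& tile_sizes, std::size_t target) {
  std::vector<std::pair<std::size_t, std::size_t>> batches;
  std::size_t first = 0, last = 0;
  for (std::size_t t : tile_sizes) {
    last += t;
    if (last - first >= target) {  // this batch is big enough: close it
      batches.emplace_back(first, last);
      first = last;
    }
  }
  if (last > first) {
    if (batches.empty())
      batches.emplace_back(first, last);  // whole axis is below the target
    else
      batches.back().second = last;  // fold a short remainder into the last batch
  }
  return batches;
}
```

With tile sizes {3, 3, 3, 3, 2} and a target of 5 elements, this yields the uneven batches [0,6) and [6,14); a single resulting batch would make the evaluator decline.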

Param accept:

predicate selecting which contracted indices may be batched (e.g. only those in the auxiliary/RI IndexSpace). Defaults to accepting any index.

Param make_scope_guard:

factory, called with the batch count, returning an RAII object held for the duration of the batched partial contractions; a backend may use it to relax block-sparse screening (scaled by the batch count) so per-batch screening does not drop small contributions that are significant once summed over the full batch axis. Defaults to a no-op (make_no_scope_guard).
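A sketch of what such a factory might return, assuming a hypothetical global block-screening threshold (`screening_threshold`) in the backend. The choice to divide the threshold by the batch count is an assumption: a contribution just below the threshold in each of n batches can add up to about n times the threshold overall, so screening each batch n times more strictly keeps the total dropped weight comparable to the unbatched case. The guard restores the previous value on destruction (RAII).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical backend knob: blocks whose norm falls below this are dropped.
inline double screening_threshold = 1e-10;

// While alive, per-batch screening uses threshold / n_batches; the previous
// threshold is restored when the guard goes out of scope.
class RelaxedScreening {
 public:
  explicit RelaxedScreening(std::size_t n_batches) : saved_(screening_threshold) {
    assert(n_batches > 0);
    screening_threshold = saved_ / static_cast<double>(n_batches);
  }
  ~RelaxedScreening() { screening_threshold = saved_; }
  RelaxedScreening(const RelaxedScreening&) = delete;
  RelaxedScreening& operator=(const RelaxedScreening&) = delete;

 private:
  double saved_;
};

// Factory with the shape described above: called with the batch count.
// Returning a prvalue relies on C++17 guaranteed copy elision.
inline RelaxedScreening make_relaxed_screening(std::size_t n_batches) {
  return RelaxedScreening(n_batches);
}
```

Usage: `auto guard = make_relaxed_screening(n_batches);` held in a scope that encloses the batched partial contractions.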

Param is_volatile:

predicate flagging a volatile leaf node (e.g. an amplitude tensor); the evaluator declines to batch any node whose subtree contains such a leaf, so only persistent (build-once) subtrees are streamed. Defaults to never_volatile (no persistence gate). It uses the same classification as the eval cache’s volatility predicate. It is kept last so that the prior 4-argument form (…, accept, make_scope_guard) still compiles unchanged.
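The persistence gate amounts to a recursive "any volatile leaf in the subtree" test. The sketch below uses a hypothetical toy `Node` type rather than the library's expression tree, and `never_volatile_sketch` only mimics the role of the never_volatile default.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Toy expression node (hypothetical): leaves have no children.
struct Node {
  std::string label;
  std::vector<std::shared_ptr<Node>> children;
  bool is_leaf() const { return children.empty(); }
};

using VolatilePred = std::function<bool(const Node&)>;

// Stand-in for the default: no leaf is volatile, so nothing is gated.
inline bool never_volatile_sketch(const Node&) { return false; }

// True if any leaf under `n` is volatile; the batched evaluator would then
// decline `n` and leave it to the standard scheme.
bool contains_volatile_leaf(const Node& n, const VolatilePred& is_volatile) {
  if (n.is_leaf()) return is_volatile(n);
  for (const auto& c : n.children)
    if (contains_volatile_leaf(*c, is_volatile)) return true;
  return false;
}
```

With a predicate that flags an amplitude leaf such as `t2`, a DF-factor subtree like gC = g.C is still eligible for batching, while any product that consumes `t2` is declined.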

Public Functions

inline bool operator()(Index const&) const noexcept
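A sketch of the default predicate and of a restricted alternative, assuming that accept_any_index::operator() simply returns true (as its name suggests). The `Index` struct below is a hypothetical stand-in that carries only a space label, and `accept_aux_index` is a hypothetical example of limiting batching to the auxiliary/RI space.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the library's Index: only the space label matters here.
struct Index {
  std::string space;  // e.g. "occ", "virt", "aux"
};

// Assumed behavior of the default: every contracted index may be batched.
struct accept_any_index {
  inline bool operator()(Index const&) const noexcept { return true; }
};

// Hypothetical alternative: batch only over auxiliary/RI indices.
struct accept_aux_index {
  bool operator()(Index const& idx) const noexcept { return idx.space == "aux"; }
};
```

Either object can be passed as the accept argument; a stateless function object keeps the default cheap to copy into the returned evaluator.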