Evaluation Trace Analysis¶
When the evaluator is built with SEQUANT_EVAL_TRACE, it emits one
log line per operation. A trace line is one of three kinds, identified
by its first token:
Eval | <mode> | <time> | [left=L | right=R |] result=X | alloc=A | hw=H | <label>
Cache | <mode> | key=K | life=c/m | alive=N | entry=E | total=T | <label>
Term | Begin | <expr>
Term | End | <expr>
The trace may be embedded in a larger log; the parsers in this guide ignore non-trace lines.
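Since the trace can be interleaved with ordinary log output, a reader only needs the first |-separated token to decide whether a line belongs to the trace. A minimal sketch (trace_kind is a hypothetical helper, not part of sequant_trace):

```python
from typing import Optional

def trace_kind(line: str) -> Optional[str]:
    """Return 'Eval', 'Cache', or 'Term' for a trace line, None otherwise."""
    token = line.split("|", 1)[0].strip()
    return token if token in {"Eval", "Cache", "Term"} else None

print(trace_kind("Eval | Tensor | 1291 | result=640 | alloc=640 | hw=640 | t(a_3,i_2)"))  # Eval
print(trace_kind("iter  1  energy = ..."))  # None
```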
Setup¶
Save this helper next to your trace file. It uses only the Python 3 standard library.
The sample trace below is from this file, produced by an mpqc4 CCSDT calculation
on a He chain. The same file holds the surrounding mpqc output; the
parsers skip non-trace lines automatically.
Note
The parsers don’t know about iterations. If your trace spans more than one CC iteration (or any other repeated context), aggregates like peak memory or total time blend across them — slice the row lists by line range or by term first. The sample here was capped at a single iteration to keep the examples clean.
Throughout the rest of this page, assume:
import sequant_trace as st
eval_rows, cache_rows, term_rows = st.parse("mpqc_out.txt")
parse walks the file once and returns three lists. Each list holds
NamedTuple rows (EvalRow, CacheRow, TermRow), so columns
are accessed by attribute (r.mode, r.time_ns, …). To consume one
stream at a time, use the generators st.trace_eval(path) and
st.trace_cache(path), or the list-returning st.trace_term(path).
Reading evaluation steps¶
Each EvalRow has the fields:
mode
    one of Tensor, Constant, Variable, Permute, MultByPhase, Sum, Product, SumInplace, Symmetrize, Antisymmetrize.
time_ns
    wall time in nanoseconds.
result
    bytes of the buffer this op produces (for SumInplace, the accumulator after the add).
alloc
    bytes of fresh allocation by this op: 0 for SumInplace; equal to result for every other mode.
hw
    live working-set bytes at the moment of the op: cache contents + result + each operand not aliased to a cache entry.
left/right
    operand bytes, set only for Sum and Product.
label
    the operation's textual label.
line
    1-indexed source line in the trace file.
extras
    a dict capturing any key=value fields the parser didn't recognize. Empty today; populated automatically if the evaluator starts logging additional columns.
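Unpacking one Eval line by hand clarifies how the trace columns map to these fields. This is a simplified sketch following the line format at the top of the page, not the sequant_trace implementation (the real parser also fills line and extras):

```python
from typing import NamedTuple, Optional

class EvalRowSketch(NamedTuple):  # simplified stand-in for st.EvalRow
    mode: str
    time_ns: int
    result: int
    alloc: int
    hw: int
    left: Optional[int]
    right: Optional[int]
    label: str

def parse_eval(line: str) -> EvalRowSketch:
    # Eval | <mode> | <time> | [left=L | right=R |] result=X | alloc=A | hw=H | <label>
    parts = [p.strip() for p in line.split("|")]
    kv = dict(p.split("=", 1) for p in parts[3:-1])   # the key=value columns
    return EvalRowSketch(parts[1], int(parts[2]),
                         int(kv["result"]), int(kv["alloc"]), int(kv["hw"]),
                         int(kv["left"]) if "left" in kv else None,
                         int(kv["right"]) if "right" in kv else None,
                         parts[-1])

r = parse_eval("Eval | Product | 68917 | left=4992 | right=640 | "
               "result=640 | alloc=640 | hw=6272 | I(a_1,a_2) * t(a_2,i_1) -> I(a_1,i_1)")
print(r.mode, r.left, r.right, r.hw)  # Product 4992 640 6272
```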
Distinct modes in this trace:
sorted({r.mode for r in eval_rows})
['Constant', 'Permute', 'Product', 'SumInplace', 'Symmetrize', 'Tensor']
First few entries:
st.print_table(eval_rows, max_rows=5)
mode | time_ns | result | alloc | hw | left | right | label | line
--------+-----------+-----------+-----------+-----------+-----------+-------+-----------------------------------------------+-----
Tensor | 412107750 | 163879224 | 163879224 | 163879224 | None | None | g(i_2,a_1,a_2,a_3) | 346
Tensor | 1291 | 640 | 640 | 640 | None | None | t(a_3,i_2) | 347
Product | 3307083 | 4992 | 4992 | 163884856 | 163879224 | 640 | g(i_2,a_1,a_2,a_3) * t(a_3,i_2) -> I(a_1,a_2) | 348
Tensor | 750 | 640 | 640 | 640 | None | None | t(a_2,i_1) | 349
Product | 68917 | 640 | 640 | 6272 | 4992 | 640 | I(a_1,a_2) * t(a_2,i_1) -> I(a_1,i_1) | 350
left and right are populated only for Sum and Product,
the only ops whose operand sizes can differ from the result. For other
modes those fields are None (omitted from the trace), not zero, so
a logged 0 B always means an empty buffer.
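This distinction matters when aggregating: filter by mode rather than coercing None to 0, so a missing operand column never masquerades as an empty buffer. A self-contained sketch with synthetic rows standing in for eval_rows:

```python
from typing import NamedTuple, Optional

class Row(NamedTuple):                 # minimal stand-in for EvalRow
    mode: str
    left: Optional[int]
    right: Optional[int]

rows = [Row("Tensor", None, None),     # leaf loads carry no operand columns
        Row("Product", 4992, 640),
        Row("Sum", 640, 640)]

# Total operand traffic over the ops that actually have operands.
operand_bytes = sum(r.left + r.right
                    for r in rows if r.mode in ("Sum", "Product"))
print(operand_bytes)  # 6912
```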
Top five time-consuming steps:
top_t = sorted(eval_rows, key=lambda r: r.time_ns, reverse=True)[:5]
st.print_table(top_t,
columns=["mode","time_ns","result","alloc","hw","label","line"])
mode | time_ns | result | alloc | hw | label | line
-------+-------------+------------+------------+------------+--------------------+-----
Tensor | 10352466209 | 3100868920 | 3100868920 | 3100889240 | g(a_1,a_2,a_3,a_4) | 688
Tensor | 412107750 | 163879224 | 163879224 | 163879224 | g(i_2,a_1,a_2,a_3) | 346
Tensor | 263996250 | 163879224 | 163879224 | 163900616 | g(a_1,a_2,i_1,a_3) | 797
Tensor | 237571833 | 163879224 | 163879224 | 163900616 | g(i_3,a_2,a_3,a_4) | 806
Tensor | 225673167 | 163879224 | 163879224 | 163924632 | g(a_3,a_2,i_3,a_4) | 1829
Top five by working-set peak. hw is the most useful memory column
for “what was the worst moment of this run”; result answers “what
is the biggest tensor produced” and alloc answers “what allocated
the most fresh memory”:
top_h = sorted(eval_rows, key=lambda r: r.hw, reverse=True)[:5]
st.print_table(top_h,
columns=["mode","time_ns","result","alloc","hw","label","line"])
mode | time_ns | result | alloc | hw | label | line
--------+----------+------------+------------+------------+-------------------------------------------------------------------------------+-----
Product | 72411125 | 1258096 | 1258096 | 3103424296 | g(a_1,a_2,a_4,a_5) * t(a_3,a_4,a_5,i_3,i_1,i_2) -> I(a_3,a_2,a_1,i_2,i_1,i_3) | 1532
Product | 50984709 | 314800 | 314800 | 3101229768 | g(a_2,a_3,a_4,a_5) * t(a_5,i_3) -> I(a_2,a_3,i_3,a_4) | 1840
Product | 54562167 | 314800 | 314800 | 3101205752 | g(a_1,a_2,a_3,a_4) * t(a_3,i_1) -> I(a_2,a_1,i_1,a_4) | 790
Product | 52750125 | 18864 | 18864 | 3100926968 | g(a_1,a_2,a_3,a_4) * t(a_3,a_4,i_1,i_2) -> I(a_2,a_1,i_2,i_1) | 690
Tensor | 13875 | 3100868920 | 3100868920 | 3100914328 | g(a_2,a_3,a_4,a_5) | 1838
Total time spent in the evaluator:
total_ns = sum(r.time_ns for r in eval_rows)
print(f"{total_ns / 1e9:.6f} s")
20.070343 s
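The grand total is often less interesting than where the time goes. Grouping time_ns by mode gives that breakdown; the sketch below runs on synthetic rows, but the same Counter loop works on eval_rows:

```python
from collections import Counter
from typing import NamedTuple

class Row(NamedTuple):                 # minimal stand-in for EvalRow
    mode: str
    time_ns: int

rows = [Row("Tensor", 412107750), Row("Tensor", 1291),
        Row("Product", 3307083), Row("Product", 68917)]

by_mode = Counter()
for r in rows:
    by_mode[r.mode] += r.time_ns

for mode, ns in by_mode.most_common():  # largest first
    print(f"{mode:8s}{ns / 1e9:10.6f} s")
```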
Reading cache interactions¶
Each CacheRow has the fields:
mode
    one of Store, Access, Release.
key
    hash identifying the cached intermediate.
life_curr/life_max
    remaining and total number of accesses for this entry.
alive
    number of entries in the cache after this interaction.
entry
    bytes occupied by this entry (0 for a release).
total
    bytes occupied by all live entries combined.
label
    label of the node touching the cache.
line
    source line in the trace file.
extras
    dict for any unrecognized key=value fields (empty today).
First few entries:
st.print_table(cache_rows, max_rows=5)
mode | key | life_curr | life_max | alive | entry | total | label | line
--------+----------------------+-----------+----------+-------+-------+-------+-----------------------------------------------+-----
Store | 13613060045528791584 | 2 | 3 | 2 | 640 | 640 | g(i_2,i_3,a_2,a_3) * t(a_3,i_2) -> I(i_3,a_2) | 397
Access | 13613060045528791584 | 1 | 3 | 1 | 640 | 640 | g(i_2,i_3,a_2,a_3) * t(a_3,i_2) -> I(i_3,a_2) | 413
Store | 12860816093799665187 | 2 | 3 | 3 | 640 | 1280 | g(i_2,i_3,a_2,a_3) * t(a_3,i_3) -> I(i_2,a_2) | 425
Access | 12860816093799665187 | 1 | 3 | 2 | 640 | 1280 | g(i_2,i_3,a_2,a_3) * t(a_2,i_2) -> I(i_3,a_3) | 434
Release | 13613060045528791584 | 0 | 3 | 1 | 0 | 640 | g(i_2,i_3,a_2,a_3) * t(a_3,i_2) -> I(i_3,a_2) | 445
The first row stores an intermediate with three allowed accesses; the
Store event itself counts as an access, so life is 2/3. The
total column tracks the rolling cache footprint — sum of every live
entry — and the entry column the size of the single entry this row
refers to. Release rows show entry=0 because the entry is gone
by the time the row is recorded.
Peak cache footprint over the whole trace:
peak = max(cache_rows, key=lambda r: r.total)
st.print_table([peak],
columns=["mode","key","alive","entry","total","line"])
mode | key | alive | entry | total | line
------+-------------------+-------+-------+-------+-----
Store | 47978033300314424 | 11 | 4720 | 87856 | 2026
Reading terms¶
trace_term matches each Term | End with the most recent
unclosed Term | Begin and returns one TermRow per pair:
begin_line
    source line of the Term | Begin.
end_line
    source line of the Term | End.
expr
    expression text shared by the matched pair.
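The "most recent unclosed Begin" rule is an ordinary stack match, which is what lets nested terms resolve correctly. A hypothetical minimal reimplementation of the pairing (not the sequant_trace source):

```python
def match_terms(events):
    """events: (line, kind, expr) tuples in file order, kind in {'Begin', 'End'}.
    Returns one (begin_line, end_line, expr) per matched pair, in closing order."""
    stack, pairs = [], []
    for line, kind, expr in events:
        if kind == "Begin":
            stack.append((line, expr))
        else:                          # End closes the most recent unclosed Begin
            begin_line, begin_expr = stack.pop()
            pairs.append((begin_line, line, begin_expr))
    return pairs

# Hypothetical nested terms: B opens inside A, so B must close first.
print(match_terms([(345, "Begin", "A"), (355, "Begin", "B"),
                   (364, "End", "B"), (373, "End", "A")]))
# [(355, 364, 'B'), (345, 373, 'A')]
```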
First few:
st.print_table(term_rows, max_rows=5)
begin_line | end_line | expr
-----------+----------+--------------------------------------------------------------------
345 | 354 | -1 (g{i_2,a_1;a_2,a_3}:N-C-S * t{a_3;i_2}:N-C-S) * t{a_2;i_1}:N-C-S
355 | 364 | -1 (f{i_2;a_2}:N-C-S * t{a_2;i_1}:N-C-S) * t{a_1;i_2}:N-C-S
366 | 373 | 2 g{i_2,a_1;a_2,i_1}:N-C-S * t{a_2;i_2}:N-C-S
375 | 382 | -2 g{i_2,i_3;a_2,a_3}:N-C-S * t{a_1,a_2,a_3;i_3,i_2,i_1}:N-C-S
384 | 391 | 2 g{i_2,i_3;a_2,a_3}:N-C-S * t{a_1,a_2,a_3;i_1,i_2,i_3}:N-C-S
The line numbers cross-reference into eval_rows and cache_rows.
For example, every evaluation step inside the first term:
first = term_rows[0]
inside = [r for r in eval_rows
if first.begin_line <= r.line <= first.end_line]
st.print_table(inside,
columns=["mode","time_ns","result","alloc","hw","label","line"])
mode | time_ns | result | alloc | hw | label | line
---------+-----------+-----------+-----------+-----------+-----------------------------------------------+-----
Tensor | 412107750 | 163879224 | 163879224 | 163879224 | g(i_2,a_1,a_2,a_3) | 346
Tensor | 1291 | 640 | 640 | 640 | t(a_3,i_2) | 347
Product | 3307083 | 4992 | 4992 | 163884856 | g(i_2,a_1,a_2,a_3) * t(a_3,i_2) -> I(a_1,a_2) | 348
Tensor | 750 | 640 | 640 | 640 | t(a_2,i_1) | 349
Product | 68917 | 640 | 640 | 6272 | I(a_1,a_2) * t(a_2,i_1) -> I(a_1,i_1) | 350
Constant | 541 | 8 | 8 | 8 | -1 | 351
Product | 31625 | 640 | 640 | 1288 | I(a_1,i_1) * -1 -> I(a_1,i_1) | 352
Permute | 11625 | 640 | 640 | 1280 | I(a_1,i_1) | 353
The reverse lookup also works. Earlier we found that the peak cache footprint was reached at line 2026; the term responsible is:
owner = next(t for t in term_rows
if t.begin_line <= peak.line <= t.end_line)
st.print_table([owner], columns=["begin_line","end_line","expr"])
begin_line | end_line | expr
-----------+----------+--------------------------------------------------------------------------------------------------------
2022 | 2032 | 6 ((g{i_4,i_5;a_4,a_5}:N-C-S * t{a_5;i_2}:N-C-S) * t{a_1,a_4;i_1,i_3}:N-C-S) * t{a_2,a_3;i_5,i_4}:N-C-S