Evaluation Trace Analysis Using Nushell¶
Follow the official instructions to install Nushell.
Save this file
in the same directory
where the trace output file is present. We will call this directory
‘working directory’.
This nu script exports the functions: trace eval
, trace cache
,
and trace term
.
If you are using a different shell (eg. bash, zsh, etc.), launch the
nushell from the working directory. Make sure the executable nu
is
on your path.
nu
Then import the functions from the script.
use sequant.nu *
Now we analyze the trace from this file
in steps.
This trace was generated by mpqc4
CCSD calculation for He16
chain using
cc-pVTZ
basis. The trace is for the very first iteration. The mpqc output file without
the SeQuant evaluation trace is here
.
A trace line produced by the evaluator is one of the three kinds: an evaluation, a cache interaction, and a term indicator when the evaluation of a term starts or ends.
Analysis of evaluation steps¶
let eval = trace eval trace.txt
The $eval
variable is a table that contains trace from evaluation
steps.
$eval | columns
╭───┬────────╮
│ 0 │ tag0 │
│ 1 │ tag1 │
│ 2 │ time │
│ 3 │ memory │
│ 4 │ annot │
│ 5 │ line │
╰───┴────────╯
Let’s look at the unique entries of the tag1
column:
$eval | get tag1 | uniq
╭───┬────────────╮
│ 0 │ Tensor │
│ 1 │ Permute │
│ 2 │ Product │
│ 3 │ Constant │
│ 4 │ SumInplace │
│ 5 │ Symmetrize │
╰───┴────────────╯
Now let’s look at the first few entries. Note that the last column line
is the line number
in the output file (trace.txt
in our case).
$eval | first 10
╭───┬──────┬────────────┬───────────────────┬─────────┬───────────────────────────────────────┬──────╮
│ # │ tag0 │ tag1 │ time │ memory │ annot │ line │
├───┼──────┼────────────┼───────────────────┼─────────┼───────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor │ 6ms 3µs 295ns │ 36.7 kB │ f(a_1,i_1) │ 2 │
│ 1 │ Eval │ Permute │ 195µs 598ns │ 73.5 kB │ f(a_1,i_1) │ 3 │
│ 2 │ Eval │ Tensor │ 119ms 317µs 395ns │ 10.7 kB │ f(i_2,i_1) │ 6 │
│ 3 │ Eval │ Tensor │ 1µs 613ns │ 36.7 kB │ t(a_1,i_2) │ 7 │
│ 4 │ Eval │ Product │ 6ms 789µs 630ns │ 84.2 kB │ f(i_2,i_1) * t(a_1,i_2) -> I(a_1,i_1) │ 8 │
│ 5 │ Eval │ Constant │ 3µs 296ns │ 8 B │ {{{-1}}} │ 9 │
│ 6 │ Eval │ Product │ 199µs 284ns │ 73.5 kB │ I(a_1,i_1) * {{{-1}}} -> I(a_1,i_1) │ 10 │
│ 7 │ Eval │ Permute │ 755µs 948ns │ 73.5 kB │ I(a_1,i_1) │ 11 │
│ 8 │ Eval │ SumInplace │ 3ms 32µs 326ns │ 73.5 kB │ I(a_1,i_1) │ 13 │
│ 9 │ Eval │ Tensor │ 6ms 265µs 660ns │ 36.7 kB │ f(i_2,a_2) │ 15 │
╰───┴──────┴────────────┴───────────────────┴─────────┴───────────────────────────────────────┴──────╯
Listing top five time-consuming steps:
$eval | sort-by time --reverse | first 5
╭───┬──────┬─────────┬────────────────────────┬─────────┬───────────────────────────────────────────────────────────────┬──────╮
│ # │ tag0 │ tag1 │ time │ memory │ annot │ line │
├───┼──────┼─────────┼────────────────────────┼─────────┼───────────────────────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor │ 2sec 732ms 827µs 727ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4) │ 260 │
│ 1 │ Eval │ Product │ 1sec 302ms 417µs 788ns │ 15.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,a_4,i_1,i_2) -> I(a_2,a_1,i_2,i_1) │ 262 │
│ 2 │ Eval │ Product │ 1sec 95ms 839µs 314ns │ 16.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,i_1) -> I(a_2,a_1,i_1,a_4) │ 382 │
│ 3 │ Eval │ Tensor │ 688ms 998µs 813ns │ 88.7 MB │ g(i_2,a_1,i_1,a_2) │ 24 │
│ 4 │ Eval │ Tensor │ 280ms 151µs 443ns │ 1.1 GB │ g(i_2,a_1,a_2,a_3) │ 51 │
╰───┴──────┴─────────┴────────────────────────┴─────────┴───────────────────────────────────────────────────────────────┴──────╯
Listing top five memory-consuming steps:
$eval | sort-by memory --reverse | first 5
╭───┬──────┬─────────┬────────────────────────┬─────────┬───────────────────────────────────────────────────────────────┬──────╮
│ # │ tag0 │ tag1 │ time │ memory │ annot │ line │
├───┼──────┼─────────┼────────────────────────┼─────────┼───────────────────────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Product │ 1sec 95ms 839µs 314ns │ 16.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,i_1) -> I(a_2,a_1,i_1,a_4) │ 382 │
│ 1 │ Eval │ Product │ 1sec 302ms 417µs 788ns │ 15.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,a_4,i_1,i_2) -> I(a_2,a_1,i_2,i_1) │ 262 │
│ 2 │ Eval │ Tensor │ 25µs 840ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4) │ 380 │
│ 3 │ Eval │ Tensor │ 2sec 732ms 827µs 727ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4) │ 260 │
│ 4 │ Eval │ Product │ 188ms 365µs 620ns │ 1.2 GB │ g(i_3,a_1,a_3,a_4) * t(a_3,a_4,i_2,i_1) -> I(i_3,a_1,i_2,i_1) │ 466 │
╰───┴──────┴─────────┴────────────────────────┴─────────┴───────────────────────────────────────────────────────────────┴──────╯
The sum total of the time column:
$eval | get time | math sum
16sec 693ms 188µs 413ns
Analysis of cache interaction¶
Parsing the cache trace:
let cache = trace cache trace.txt
Let’s take a look at the first few lines:
$cache | first 5
╭───┬───────┬─────────┬──────────────────────┬──────┬────────┬─────────┬──────╮
│ # │ tag0 │ tag1 │ key │ life │ nalive │ memory │ line │
├───┼───────┼─────────┼──────────────────────┼──────┼────────┼─────────┼──────┤
│ 0 │ Cache │ Store │ 6443766742777296727 │ 2/3 │ 2 │ 36.7 kB │ 95 │
│ 1 │ Cache │ Store │ 14406268377898944426 │ 1/2 │ 3 │ 6.9 MB │ 116 │
│ 2 │ Cache │ Store │ 4354995230928331310 │ 2/3 │ 5 │ 7.0 MB │ 137 │
│ 3 │ Cache │ Access │ 4354995230928331310 │ 1/3 │ 4 │ 7.0 MB │ 155 │
│ 4 │ Cache │ Release │ 14406268377898944426 │ 0/2 │ 3 │ 73.5 kB │ 195 │
╰───┴───────┴─────────┴──────────────────────┴──────┴────────┴─────────┴──────╯
An entry is identified by the key
in the cache. From the table we
see that the first cache interaction was to store an intermediate. Turns out the
intermediate is allowed only three accesses before it is released from the cache.
Since we stored, and immediately accessed it as part of the evaluation step, we see the
life of this intermediate is 2/3
. Because no other entries have been added to the
cache at this point the total number of alive entries is 2
as seen from the
nalive
column. The memory used by the cache which is the sum of the alive entries
is given in the memory
column.
Next, another intermediate is stored in the cache. Since there are two alive entries
already, this brings the nalive
to three. The cache memory reaches to 6.9 MB,
implying the intermediate that was just stored is ~6.8 MB in size. We can see from
the last row the same intermediate is released.
We can see what was the peak memory use by the cache:
$cache | sort-by memory --reverse | take 1
╭───┬───────┬───────┬─────────────────────┬──────┬────────┬──────────┬──────╮
│ # │ tag0 │ tag1 │ key │ life │ nalive │ memory │ line │
├───┼───────┼───────┼─────────────────────┼──────┼────────┼──────────┼──────┤
│ 0 │ Cache │ Store │ 1502144429761737779 │ 1/2 │ 5 │ 267.4 MB │ 575 │
╰───┴───────┴───────┴─────────────────────┴──────┴────────┴──────────┴──────╯
Analysis of the terms¶
We can see when the evaluation of a term began and ended. Let’s start by parsing the terms related trace.
let term = trace term trace.txt
Let’s take a look at the first few entries:
$term | first 5
╭───┬──────┬───────┬─────┬────────────────────────────────────────────────╮
│ # │ tag0 │ begin │ end │ expr │
├───┼──────┼───────┼─────┼────────────────────────────────────────────────┤
│ 0 │ Term │ 1 │ 4 │ f{a_1;i_1}:N-C-S │
│ 1 │ Term │ 5 │ 12 │ -1 f{i_2;i_1}:N-C-S * t{a_1;i_2}:N-C-S │
│ 2 │ Term │ 14 │ 21 │ 2 f{i_2;a_2}:N-C-S * t{a_1,a_2;i_1,i_2}:N-C-S │
│ 3 │ Term │ 23 │ 30 │ -1 g{i_2,a_1;i_1,a_2}:N-C-S * t{a_2;i_2}:N-C-S │
│ 4 │ Term │ 32 │ 39 │ 2 g{i_2,a_1;a_2,i_1}:N-C-S * t{a_2;i_2}:N-C-S │
╰───┴──────┴───────┴─────┴────────────────────────────────────────────────╯
From this table we can identify which term did the particular evaluation step,
or the cache interaction step belonged to. The expr
column has the term printed out
as a string that can be parsed by SeQuant. For example, the second last entry
tells that the trace produced between the line numbers 23 and 30 belongs to
the term -1 g{i_2,a_1;i_1,a_2}:N-C-S * t{a_2;i_2}:N-C-S
.
The evaluation steps that belong to this term are:
$eval | where line in 23..30
╭───┬──────┬──────────┬───────────────────┬─────────┬───────────────────────────────────────────────┬──────╮
│ # │ tag0 │ tag1 │ time │ memory │ annot │ line │
├───┼──────┼──────────┼───────────────────┼─────────┼───────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor │ 688ms 998µs 813ns │ 88.7 MB │ g(i_2,a_1,i_1,a_2) │ 24 │
│ 1 │ Eval │ Tensor │ 2µs 424ns │ 36.7 kB │ t(a_2,i_2) │ 25 │
│ 2 │ Eval │ Product │ 31ms 669µs 619ns │ 88.8 MB │ g(i_2,a_1,i_1,a_2) * t(a_2,i_2) -> I(a_1,i_1) │ 26 │
│ 3 │ Eval │ Constant │ 1µs 553ns │ 8 B │ {{{-1}}} │ 27 │
│ 4 │ Eval │ Product │ 181µs 701ns │ 73.5 kB │ I(a_1,i_1) * {{{-1}}} -> I(a_1,i_1) │ 28 │
│ 5 │ Eval │ Permute │ 1ms 18µs 213ns │ 73.5 kB │ I(a_1,i_1) │ 29 │
╰───┴──────┴──────────┴───────────────────┴─────────┴───────────────────────────────────────────────┴──────╯
We can also go in reverse. We saw from the result of $cache | first 5
that an
intermediate of ~6.8 MB size is stored to the cache at line number 116. We can find
out what term produced this intermediate.
$term | where begin < 116 | where end > 116
╭───┬──────┬───────┬─────┬──────────────────────────────────────────────────────────────────────────╮
│ # │ tag0 │ begin │ end │ expr │
├───┼──────┼───────┼─────┼──────────────────────────────────────────────────────────────────────────┤
│ 0 │ Term │ 112 │ 120 │ (g{i_2,i_3;a_2,a_3}:N-C-S * t{a_2;i_1}:N-C-S) * t{a_1,a_3;i_3,i_2}:N-C-S │
╰───┴──────┴───────┴─────┴──────────────────────────────────────────────────────────────────────────╯