Evaluation Trace Analysis Using Nushell

Follow the official instructions to install Nushell.

Save this file in the same directory where the trace output file is present. We will call this directory ‘working directory’.

This nu script exports the functions: trace eval, trace cache, and trace term.

If you are using a different shell (eg. bash, zsh, etc.), launch the nushell from the working directory. Make sure the executable nu is on your path.

nu

Then import the functions from the script.

use sequant.nu *

Now we analyze the trace from this file in steps. This trace was generated by mpqc4 CCSD calculation for He16 chain using cc-pVTZ basis. The trace is for the very first iteration. The mpqc output file without the SeQuant evaluation trace is here.

A trace line produced by the evaluator is one of the three kinds: an evaluation, a cache interaction, and a term indicator when the evaluation of a term starts or ends.

Analysis of evaluation steps

let eval = trace eval trace.txt

The $eval variable is a table that contains trace from evaluation steps.

$eval | columns
╭───┬────────╮
│ 0 │ tag0   │
│ 1 │ tag1   │
│ 2 │ time   │
│ 3 │ memory │
│ 4 │ annot  │
│ 5 │ line   │
╰───┴────────╯

Let’s look at the unique entries of the tag1 column:

$eval | get tag1 | uniq
╭───┬────────────╮
│ 0 │ Tensor     │
│ 1 │ Permute    │
│ 2 │ Product    │
│ 3 │ Constant   │
│ 4 │ SumInplace │
│ 5 │ Symmetrize │
╰───┴────────────╯

Now let’s look at the first few entries. Note that the last column line is the line number in the output file (trace.txt in our case).

$eval | first 10
╭───┬──────┬────────────┬───────────────────┬─────────┬───────────────────────────────────────┬──────╮
│ # │ tag0 │    tag1    │       time        │ memory  │                 annot                 │ line │
├───┼──────┼────────────┼───────────────────┼─────────┼───────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor     │     6ms 3µs 295ns │ 36.7 kB │ f(a_1,i_1)                            │    2 │
│ 1 │ Eval │ Permute    │       195µs 598ns │ 73.5 kB │ f(a_1,i_1)                            │    3 │
│ 2 │ Eval │ Tensor     │ 119ms 317µs 395ns │ 10.7 kB │ f(i_2,i_1)                            │    6 │
│ 3 │ Eval │ Tensor     │         1µs 613ns │ 36.7 kB │ t(a_1,i_2)                            │    7 │
│ 4 │ Eval │ Product    │   6ms 789µs 630ns │ 84.2 kB │ f(i_2,i_1) * t(a_1,i_2) -> I(a_1,i_1) │    8 │
│ 5 │ Eval │ Constant   │         3µs 296ns │     8 B │ {{{-1}}}                              │    9 │
│ 6 │ Eval │ Product    │       199µs 284ns │ 73.5 kB │ I(a_1,i_1) * {{{-1}}} -> I(a_1,i_1)   │   10 │
│ 7 │ Eval │ Permute    │       755µs 948ns │ 73.5 kB │ I(a_1,i_1)                            │   11 │
│ 8 │ Eval │ SumInplace │    3ms 32µs 326ns │ 73.5 kB │ I(a_1,i_1)                            │   13 │
│ 9 │ Eval │ Tensor     │   6ms 265µs 660ns │ 36.7 kB │ f(i_2,a_2)                            │   15 │
╰───┴──────┴────────────┴───────────────────┴─────────┴───────────────────────────────────────┴──────╯

Listing top five time-consuming steps:

$eval | sort-by time --reverse | first 5
╭───┬──────┬─────────┬────────────────────────┬─────────┬───────────────────────────────────────────────────────────────┬──────╮
│ # │ tag0 │  tag1   │          time          │ memory  │                             annot                             │ line │
├───┼──────┼─────────┼────────────────────────┼─────────┼───────────────────────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor  │ 2sec 732ms 827µs 727ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4)                                            │  260 │
│ 1 │ Eval │ Product │ 1sec 302ms 417µs 788ns │ 15.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,a_4,i_1,i_2) -> I(a_2,a_1,i_2,i_1) │  262 │
│ 2 │ Eval │ Product │  1sec 95ms 839µs 314ns │ 16.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,i_1) -> I(a_2,a_1,i_1,a_4)         │  382 │
│ 3 │ Eval │ Tensor  │      688ms 998µs 813ns │ 88.7 MB │ g(i_2,a_1,i_1,a_2)                                            │   24 │
│ 4 │ Eval │ Tensor  │      280ms 151µs 443ns │  1.1 GB │ g(i_2,a_1,a_2,a_3)                                            │   51 │
╰───┴──────┴─────────┴────────────────────────┴─────────┴───────────────────────────────────────────────────────────────┴──────╯

Listing top five memory-consuming steps:

$eval | sort-by memory --reverse | first 5
╭───┬──────┬─────────┬────────────────────────┬─────────┬───────────────────────────────────────────────────────────────┬──────╮
│ # │ tag0 │  tag1   │          time          │ memory  │                             annot                             │ line │
├───┼──────┼─────────┼────────────────────────┼─────────┼───────────────────────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Product │  1sec 95ms 839µs 314ns │ 16.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,i_1) -> I(a_2,a_1,i_1,a_4)         │  382 │
│ 1 │ Eval │ Product │ 1sec 302ms 417µs 788ns │ 15.1 GB │ g(a_1,a_2,a_3,a_4) * t(a_3,a_4,i_1,i_2) -> I(a_2,a_1,i_2,i_1) │  262 │
│ 2 │ Eval │ Tensor  │             25µs 840ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4)                                            │  380 │
│ 3 │ Eval │ Tensor  │ 2sec 732ms 827µs 727ns │ 14.9 GB │ g(a_1,a_2,a_3,a_4)                                            │  260 │
│ 4 │ Eval │ Product │      188ms 365µs 620ns │  1.2 GB │ g(i_3,a_1,a_3,a_4) * t(a_3,a_4,i_2,i_1) -> I(i_3,a_1,i_2,i_1) │  466 │
╰───┴──────┴─────────┴────────────────────────┴─────────┴───────────────────────────────────────────────────────────────┴──────╯

The sum total of the time column:

$eval | get time | math sum
16sec 693ms 188µs 413ns

Analysis of cache interaction

Parsing the cache trace:

let cache = trace cache trace.txt

Let’s take a look at the first few lines:

$cache | first 5
╭───┬───────┬─────────┬──────────────────────┬──────┬────────┬─────────┬──────╮
│ # │ tag0  │  tag1   │         key          │ life │ nalive │ memory  │ line │
├───┼───────┼─────────┼──────────────────────┼──────┼────────┼─────────┼──────┤
│ 0 │ Cache │ Store   │ 6443766742777296727  │ 2/3  │      2 │ 36.7 kB │   95 │
│ 1 │ Cache │ Store   │ 14406268377898944426 │ 1/2  │      3 │  6.9 MB │  116 │
│ 2 │ Cache │ Store   │ 4354995230928331310  │ 2/3  │      5 │  7.0 MB │  137 │
│ 3 │ Cache │ Access  │ 4354995230928331310  │ 1/3  │      4 │  7.0 MB │  155 │
│ 4 │ Cache │ Release │ 14406268377898944426 │ 0/2  │      3 │ 73.5 kB │  195 │
╰───┴───────┴─────────┴──────────────────────┴──────┴────────┴─────────┴──────╯

An entry is identified by the key in the cache. From the table we see that the first cache interaction was to store an intermediate. Turns out the intermediate is allowed only three accesses before it is released from the cache. Since we stored, and immediately accessed it as part of the evaluation step, we see the life of this intermediate is 2/3. Because no other entries have been added to the cache at this point the total number of alive entries is 2 as seen from the nalive column. The memory used by the cache which is the sum of the alive entries is given in the memory column.

Next, another intermediate is stored in the cache. Since there are two alive entries already, this brings the nalive to three. The cache memory reaches to 6.9 MB, implying the intermediate that was just stored is ~6.8 MB in size. We can see from the last row the same intermediate is released.

We can see what was the peak memory use by the cache:

$cache | sort-by memory --reverse | take 1
╭───┬───────┬───────┬─────────────────────┬──────┬────────┬──────────┬──────╮
│ # │ tag0  │ tag1  │         key         │ life │ nalive │  memory  │ line │
├───┼───────┼───────┼─────────────────────┼──────┼────────┼──────────┼──────┤
│ 0 │ Cache │ Store │ 1502144429761737779 │ 1/2  │      5 │ 267.4 MB │  575 │
╰───┴───────┴───────┴─────────────────────┴──────┴────────┴──────────┴──────╯

Analysis of the terms

We can see when the evaluation of a term began and ended. Let’s start by parsing the terms related trace.

let term = trace term trace.txt

Let’s take a look at the first few entries:

$term | first 5
╭───┬──────┬───────┬─────┬────────────────────────────────────────────────╮
│ # │ tag0 │ begin │ end │                      expr                      │
├───┼──────┼───────┼─────┼────────────────────────────────────────────────┤
│ 0 │ Term │     1 │   4 │ f{a_1;i_1}:N-C-S                               │
│ 1 │ Term │     5 │  12 │ -1 f{i_2;i_1}:N-C-S * t{a_1;i_2}:N-C-S         │
│ 2 │ Term │    14 │  21 │ 2 f{i_2;a_2}:N-C-S * t{a_1,a_2;i_1,i_2}:N-C-S  │
│ 3 │ Term │    23 │  30 │ -1 g{i_2,a_1;i_1,a_2}:N-C-S * t{a_2;i_2}:N-C-S │
│ 4 │ Term │    32 │  39 │ 2 g{i_2,a_1;a_2,i_1}:N-C-S * t{a_2;i_2}:N-C-S  │
╰───┴──────┴───────┴─────┴────────────────────────────────────────────────╯

From this table we can identify which term did the particular evaluation step, or the cache interaction step belonged to. The expr column has the term printed out as a string that can be parsed by SeQuant. For example, the second last entry tells that the trace produced between the line numbers 23 and 30 belongs to the term -1 g{i_2,a_1;i_1,a_2}:N-C-S * t{a_2;i_2}:N-C-S.

The evaluation steps that belong to this term are:

$eval | where line in 23..30
╭───┬──────┬──────────┬───────────────────┬─────────┬───────────────────────────────────────────────┬──────╮
│ # │ tag0 │   tag1   │       time        │ memory  │                     annot                     │ line │
├───┼──────┼──────────┼───────────────────┼─────────┼───────────────────────────────────────────────┼──────┤
│ 0 │ Eval │ Tensor   │ 688ms 998µs 813ns │ 88.7 MB │ g(i_2,a_1,i_1,a_2)                            │   24 │
│ 1 │ Eval │ Tensor   │         2µs 424ns │ 36.7 kB │ t(a_2,i_2)                                    │   25 │
│ 2 │ Eval │ Product  │  31ms 669µs 619ns │ 88.8 MB │ g(i_2,a_1,i_1,a_2) * t(a_2,i_2) -> I(a_1,i_1) │   26 │
│ 3 │ Eval │ Constant │         1µs 553ns │     8 B │ {{{-1}}}                                      │   27 │
│ 4 │ Eval │ Product  │       181µs 701ns │ 73.5 kB │ I(a_1,i_1) * {{{-1}}} -> I(a_1,i_1)           │   28 │
│ 5 │ Eval │ Permute  │    1ms 18µs 213ns │ 73.5 kB │ I(a_1,i_1)                                    │   29 │
╰───┴──────┴──────────┴───────────────────┴─────────┴───────────────────────────────────────────────┴──────╯

We can also go in reverse. We saw from the result of $cache | first 5 that an intermediate of ~6.8 MB size is stored to the cache at line number 116. We can find out what term produced this intermediate.

$term | where begin < 116 | where end > 116
╭───┬──────┬───────┬─────┬──────────────────────────────────────────────────────────────────────────╮
│ # │ tag0 │ begin │ end │                                   expr                                   │
├───┼──────┼───────┼─────┼──────────────────────────────────────────────────────────────────────────┤
│ 0 │ Term │   112 │ 120 │ (g{i_2,i_3;a_2,a_3}:N-C-S * t{a_2;i_1}:N-C-S) * t{a_1,a_3;i_3,i_2}:N-C-S │
╰───┴──────┴───────┴─────┴──────────────────────────────────────────────────────────────────────────╯