The most powerful way to use MPQC is to provide input in an object-oriented text format that can be understood by the KeyVal class. MPQC version 3 and older use a custom text format for input data. MPQC4 abandoned the old format in favor of these popular industry-standard formats used for data exchange:
- JSON,
- XML.
Another useful format that can be used is the INFO format understood by the Boost.PropertyTree library. It is similar to JSON, but is less verbose and supports comments and file inclusions.
The KeyVal class provides a way for a C++ program to convert such JSON, XML, or INFO input to C++ primitive data (booleans, integers, reals, strings) and user-defined objects. For example, the following JSON,
{ "type" : "Atom", "element" : "C", "r" : [0.0, 1.0, -2.0], "isotope" : "13" }
can be used to construct an object of the following C++ class representing an atom:
KeyVal is more than just a structured text importer: it is a general-purpose component for representing "keyword=value" associations in a flexible form, and it can be used in any C++ program that needs such functionality. KeyVal objects can be created from JSON-, XML-, or INFO-formatted text, or purely programmatically. For brevity, only the programmatic and JSON-based methods will be illustrated here.
- Assignment
- Constructing KeyVal
- Keyword Grouping and Paths
- Simple Object Construction
- Polymorphic Object Construction
- Array Specification
- Value Substitution
- The DescribedClass class
- Forced linkage of DescribedClass objects
- Deprecated keywords
Assignment
As an example of the use of KeyVal, consider the following JSON input:
{ "x_coordinate" : 1.0, "y_coordinate" : 2.0, "x_coordinate" : 3.0 }
Two assignments will be made. The keyword x_coordinate will be associated with the value 1.0 and the keyword y_coordinate will be associated with the value 2.0. The third assignment in the above input (the second x_coordinate) will have no effect since x_coordinate is already assigned.
- Note
- The data specified by the third assignment is still kept internally by KeyVal, but cannot be accessed by the standard API. Use KeyVal::tree or KeyVal::top_tree to access the Boost.PropertyTree object and extract this data.
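The first-assignment-wins rule can be illustrated with a small standalone sketch. It does not use MPQC: the `Assignments` alias and the `assign_first_wins` helper are hypothetical stand-ins, and unlike KeyVal this sketch simply drops the shadowed entry instead of retaining it in an underlying tree.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a keyword lookup table (not the MPQC API):
// maps a keyword to the first value assigned to it.
using Assignments = std::map<std::string, double>;

// Applies keyword/value pairs in input order. A repeated keyword is ignored,
// mirroring the "first assignment wins" rule described above.
inline Assignments assign_first_wins(
    const std::vector<std::pair<std::string, double>>& input) {
  Assignments result;
  for (const auto& kv : input) {
    result.emplace(kv.first, kv.second);  // no effect if the key already exists
  }
  assert(result.size() <= input.size());
  return result;
}
```

Feeding it the three assignments from the JSON above yields two entries, with x_coordinate bound to the first value given.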
Keyword Grouping and Paths
Let's imagine that we have a program which needs to read in the characteristics of animals. There are lots of animals, so it might be nice to categorize them by their family. Here is a sample JSON input file for such a program:
{ "reptile": { "trex": { "legs": 2, "extinct": true }, "python": { "legs": 0, "extinct": false } }, "bird": { "bald eagle": { "species": "Haliaeetus leucocephalus", "flys": true, "extinct": false } } }
This sample illustrates the use of keyword:value assignments and the keyword grouping operators { and }. The hierarchy of keyword/value pairs forms a tree. Direct access to a location in this tree is possible by arranging keywords into composite keywords, or paths. The data in this example can be accessed using these paths:
reptile:trex:legs
reptile:trex:extinct
reptile:python:legs
reptile:python:extinct
bird:bald eagle:species
bird:bald eagle:flys
bird:bald eagle:extinct
The KeyVal::separator character, ':', occurring in these paths breaks them into individual keywords, or path segments. The sole purpose of grouping is to allow persons writing input files to organize the input into easy-to-read sections (JSON refers to such sections as objects, not to be confused with C++ objects). In the above example there are two main sections, the reptile section and the bird section. The reptile section takes the form
"reptile" : { "keyword1": value1, "keyword2": value2, ... }
Each of the keywords found in the reptile section has the reptile: prefix attached to its path. Within each of these sections further keyword groupings can be used, as many and as deeply nested as the user wants.
- Note
- Keywords ("bald eagle") as well as string values ("Haliaeetus leucocephalus") can contain whitespace characters. The former was not possible in the old KeyVal input format.
Keyword grouping is also useful when you need many different programs to read from the same input file. Each program can be assigned its own unique section.
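How a path such as reptile:trex:legs selects a location in the keyword tree can be sketched with the standard library alone. This is not the KeyVal implementation: `Node`, `split_path`, `assign_value`, and `find_node` are hypothetical names chosen for illustration, and values are kept as plain strings.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node (not an MPQC type): a leaf uses `value`,
// a keyword group uses `children`.
struct Node {
  std::string value;
  std::map<std::string, std::shared_ptr<Node>> children;
};

// Splits "reptile:trex:legs" into {"reptile", "trex", "legs"}.
inline std::vector<std::string> split_path(const std::string& path,
                                           char separator = ':') {
  std::vector<std::string> segments;
  std::stringstream stream(path);
  std::string segment;
  while (std::getline(stream, segment, separator)) segments.push_back(segment);
  return segments;
}

// Stores a value at `path`, creating intermediate groups as needed.
inline void assign_value(Node& root, const std::string& path,
                         const std::string& value) {
  assert(!path.empty());
  Node* node = &root;
  for (const auto& segment : split_path(path)) {
    auto& child = node->children[segment];
    if (!child) child = std::make_shared<Node>();
    node = child.get();
  }
  node->value = value;
}

// Walks the tree one segment at a time; returns nullptr if a segment is missing.
inline const Node* find_node(const Node& root, const std::string& path) {
  const Node* node = &root;
  for (const auto& segment : split_path(path)) {
    auto it = node->children.find(segment);
    if (it == node->children.end()) return nullptr;
    node = it->second.get();
  }
  return node;
}
```

Because only the separator is special, a segment such as "bald eagle" may contain whitespace without any extra handling.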
Constructing KeyVal
KeyVal objects can be created directly from JSON text input, or programmatically. If the JSON text shown in Example 1 is stored in the text file "example1.json", then a KeyVal object is constructed and used as follows:
Of course, any basic_istream can be used (e.g. std::istringstream) and not just std::ifstream.
To test whether a keyword refers to a group of keywords or to a value, use KeyVal::count, which returns a std::optional<size_t>; it contains a value if its argument refers to a keyword group:
KeyVal::count can also be used to examine other aggregates like arrays (see Array Specification ).
It is often convenient to construct KeyVal objects programmatically. This is particularly useful for creating objects of classes whose constructors take a KeyVal (see The DescribedClass class). A KeyVal object corresponding to the JSON input in Example 1 can be created as follows:
It is of course possible to erase the KeyVal entries:
So far so good. But why all this complexity? To understand that, let's consider how KeyVal helps to construct C++ objects, not just simple data.
Simple Object Construction
Consider the following class representing Birds:
To construct Bird from KeyVal objects we need to add a constructor:
Then, given a KeyVal object constructed from JSON Example 1 or as in Example 2, we can construct a Bird object representing a bald eagle as follows:
Note the use of the KeyVal::keyval method to construct a KeyVal object representing the subtree whose root is at "bird:bald eagle" of the base KeyVal object. The ability to refer to keyword sections in an existing KeyVal is crucial for constructing hierarchies of C++ objects easily.
- Note
- KeyVal objects have reference semantics, hence a copy of kv will refer to the same tree as kv and share kv's mutable internal data, such as the internal object registry used by KeyVal::object (see below). The same can be said about the new KeyVal object returned by kv.keyval(). To create a deep copy of a KeyVal object use the KeyVal::clone method (note that the DescribedClass object registry is not copied, only the PropertyTree object).
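The difference between a plain copy (shared tree) and a deep copy can be sketched with a shared pointer. `Handle` below is a hypothetical class, not mpqc::KeyVal; it only models the sharing behavior described in the note and has no object registry.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical handle type (not mpqc::KeyVal): plain copies share one tree,
// clone() produces an independent deep copy.
class Handle {
 public:
  using Tree = std::map<std::string, std::string>;

  Handle() : tree_(std::make_shared<Tree>()) {}

  void assign(const std::string& key, const std::string& value) {
    (*tree_)[key] = value;
  }

  bool exists(const std::string& key) const { return tree_->count(key) != 0; }

  // Deep copy: the returned handle owns its own tree.
  Handle clone() const {
    assert(tree_ != nullptr);
    Handle copy;
    copy.tree_ = std::make_shared<Tree>(*tree_);
    return copy;
  }

 private:
  std::shared_ptr<Tree> tree_;  // shared by all plain copies of this handle
};
```

An assignment made through one handle is visible through every plain copy of it, but not through a handle obtained earlier with clone().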
Polymorphic Object Construction
JSON in Example 1 specifies three animals: 2 reptiles and 1 bird. Reptiles and birds are both animals, thus they have much in common, e.g. they can both be extinct. Although possible to express in other ways, this fact is often represented using inheritance:
This allows one to write a function that takes as its lone argument a reference or a pointer to an object of class Animal.
Now we would like the user to be able to give a list of Animal objects and have the program count how many extinct animals are given, as well as print out their attributes like scientific names, etc. For each object in the input the user should be able to specify the exact kind of Animal it represents: Bird or Reptile.
For example, in a chemistry context, the MPQC program needs to be able to perform geometry optimization given a Wavefunction object whose type is specified by the user (Hartree-Fock, CCSD, etc.). The input parser code could read in a type string corresponding to the Wavefunction type and then test it in a long series of if statements against the known types. This unfortunately means that whenever a new Wavefunction is implemented the input parser code must be modified. This problem is solved by the KeyVal library.
Consider the following variant of Example 1 JSON:
{ "trex": { "type": "Reptile", "legs": 2, "extinct": true }, "python": { "type": "Reptile", "legs": 0, "extinct": false }, "bald eagle": { "type": "Bird", "species": "Haliaeetus leucocephalus", "flys": true, "extinct": false } }
Note that we eliminated the reptile and bird sections so that all animals appear at the same level.
Skipping for now the modifications of the C++ classes necessary, see how easy it is to read in these animals as objects of Animal class:
KeyVal::object<T> returns std::shared_ptr<T>, hence the pointer dereferencing. Also, the actual types of trex, python, and bald_eagle are Reptile, Reptile, and Bird, so we can access their full properties:
Now let's see the actual implementation of Animal, Reptile, and Bird classes:
A few comments:
- Animal class is derived from DescribedClass to support polymorphic construction using KeyVal::object (see The DescribedClass class ).
- Smart pointers to objects created using the KeyVal::object method are stored in the registry associated with that KeyVal's top tree. They will not be destroyed until all KeyVal objects referring to that top tree are destroyed as well. This behavior can be bypassed by calling KeyVal::object with the second parameter set to true; the lifetime of the object must then be managed by the user.
- If keyword type is not given, calling KeyVal::object<T> where class T is abstract or not registered will throw an exception of type KeyVal::bad_input. Setting the third parameter of KeyVal::object to true will in such a case cause KeyVal::object<T> to return a null pointer.
- Every class that we want to construct from KeyVal polymorphically needs to be associated with a globally unique identifier (GUID) string (typically, the GUID is the class name without the namespace). This is done using the MPQC_CLASS_EXPORT_KEY() macro. MPQC_CLASS_EXPORT_KEY(Reptile) registers GUID "Reptile" with class Reptile. Then a keyword group that includes keyword type set to "Reptile" can be used to construct an object of class Reptile. Note that the MPQC_CLASS_EXPORT_KEY() statements must be placed in the global namespace.
  - For a class T that lives in a namespace N it is usually desired to use "T" instead of "N::T" for its GUID; this can be done by using the MPQC_CLASS_EXPORT_KEY2() macro that accepts the string GUID and the class as its two arguments.
- Note
- Animal could also be an abstract class; no additional complications arise.
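The idea of replacing a chain of if statements with a lookup keyed by the type keyword can be sketched independently of MPQC. Everything below is a hypothetical illustration: `Group` stands in for a keyword group, and `registry`, `register_type`, and `make_animal` are invented names that only mimic the roles of DescribedClass registration and KeyVal::object. The actual MPQC machinery is macro-based and is not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a keyword group (not mpqc::KeyVal).
using Group = std::map<std::string, std::string>;

struct Animal {
  explicit Animal(const Group& g) : extinct(g.at("extinct") == "true") {}
  virtual ~Animal() = default;
  virtual std::string kind() const = 0;
  bool extinct;
};

struct Reptile : Animal {
  explicit Reptile(const Group& g) : Animal(g), legs(std::stoi(g.at("legs"))) {}
  std::string kind() const override { return "Reptile"; }
  int legs;
};

struct Bird : Animal {
  explicit Bird(const Group& g) : Animal(g), species(g.at("species")) {}
  std::string kind() const override { return "Bird"; }
  std::string species;
};

using Factory = std::function<std::shared_ptr<Animal>(const Group&)>;

// Registry mapping a type identifier to a constructor wrapper.
inline std::map<std::string, Factory>& registry() {
  static std::map<std::string, Factory> instance;
  return instance;
}

// Associates a string identifier with class T; returns false if already taken.
template <typename T>
bool register_type(const std::string& guid) {
  return registry()
      .emplace(guid, [](const Group& g) -> std::shared_ptr<Animal> {
        return std::make_shared<T>(g);
      })
      .second;
}

// Registration runs during static initialization, before main().
static const bool reptile_registered = register_type<Reptile>("Reptile");
static const bool bird_registered = register_type<Bird>("Bird");

// Builds the object named by the "type" keyword; no if-chain over known types.
inline std::shared_ptr<Animal> make_animal(const Group& g) {
  auto type = g.find("type");
  if (type == g.end()) throw std::invalid_argument("missing \"type\" keyword");
  auto factory = registry().find(type->second);
  if (factory == registry().end())
    throw std::invalid_argument("unregistered type: " + type->second);
  std::shared_ptr<Animal> result = factory->second(g);
  assert(result != nullptr);
  return result;
}
```

Adding a new Animal subclass in this sketch requires one more register_type line next to the class, while make_animal stays unchanged.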
It is convenient to be able to refer to existing objects of classes derived from DescribedClass in a KeyVal constructed programmatically. This example demonstrates this capability:
Note that if the type of the object to be constructed is known at compile time, it is not necessary to provide the type keyword as long as KeyVal::object<T> is called with the exact type rather than the base class type:
Array Specification
Input for an array can be specified in several forms. In JSON there is standard support for arrays; alternatively, an array can be specified as a keyword group to make it possible to refer to elements of such arrays. Programmatic manipulation of KeyVal can assign and read arrays stored in standard sequence containers such as std::vector, std::array, std::list, and (assign-only) std::initializer_list.
JSON Standard Array Syntax
Consider how the attributes of animals in JSON Example 3 can be specified using JSON arrays:
{ "names": [ "trex", "python", "bald eagle" ], "legs": [ 2, 0, 2 ], "extinct": [ true, false, false ] }
Keyword names holds an array of 3 strings, keyword legs holds an array of 3 integers, etc. The following C++ code can access the data in the corresponding KeyVal:
KeyVal::count can be used to count the number of elements before reading the data:
It is also possible to specify an array of objects:
can be parsed as follows:
It is not yet possible to read in polymorphic objects this way.
- Note
- Elements of an array specified in standard syntax all have the same (empty) key, i.e. elements of an array corresponding to path path are located at path path: (note the trailing separator). This scheme for assigning keys is used by Boost.PropertyTree when reading JSON arrays. It makes it impossible to refer to individual elements since they all share a path, i.e. KeyVal::value<T>("path:") returns only the first element of the array at path path.
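Why identical empty keys prevent addressing individual elements can be seen in a small model of an ordered child list that permits duplicate keys. The `Children` alias and both helpers are hypothetical and use only the standard library; they imitate the storage scheme described in the note, not the Boost.PropertyTree or KeyVal interfaces.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ordered child list (not a Boost or MPQC type):
// (key, value) pairs in input order, duplicate keys allowed.
using Children = std::vector<std::pair<std::string, std::string>>;

// Lookup by key can only ever reach the first matching child.
inline const std::string* first_with_key(const Children& children,
                                         const std::string& key) {
  for (const auto& child : children) {
    if (child.first == key) return &child.second;
  }
  return nullptr;
}

// Reading the whole array works by iterating in order and ignoring the keys.
inline std::vector<std::string> all_values(const Children& children) {
  std::vector<std::string> values;
  values.reserve(children.size());
  for (const auto& child : children) values.push_back(child.second);
  assert(values.size() == children.size());
  return values;
}
```

With three elements stored under the empty key, a lookup by key returns the first one, while iteration recovers all three.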
JSON Extended Array Syntax
The extended array syntax specifies arrays as a group of keywords "0", "1", etc. For example, the arrays in JSON Example 3 can be specified in the extended syntax as
{ "names": { "0":"trex", "1":"python", "2":"bald eagle" }, "legs": { "0":2, "1":0, "2":2 }, "extinct": { "0":true, "1":false, "2":false } }
Although this syntax is more verbose than the standard JSON array syntax, it can be used to specify arrays of objects and to refer to the elements of an array (see Value Substitution).
- Note
- Another advantage of the extended syntax is that by ensuring that all paths are unique and do not involve empty keywords it is compatible with XML.
Programmatic Handling of Arrays
The following C++ snippet demonstrates how to assign array values to keywords, and read the array values:
Value Substitution
Another powerful feature of KeyVal is value substitution, which allows multiple paths in a KeyVal to refer to the same value/object. This is accomplished by setting a keyword's value to a string that starts with $ followed by the (relative or absolute) path to the keyword whose value is to be used. The value substitution feature is best illustrated by a simple JSON example:
{ "types": { "0":"Reptile", "1":"Reptile", "2":"Bird" }, "extinct": { "0":true, "1":false, "2":"$1" }, "trex": { "type": "$:types:0", "extinct": "$..:extinct:0" }, "anothertrex": "$trex", "bald eagle": { "type": "$:types:2", "extinct": "$..:extinct:2" } }
is (almost) the same as the following JSON:
{ "types": { "0":"Reptile", "1":"Reptile", "2":"Bird" }, "extinct": { "0":true, "1":false, "2":false }, "trex": { "type": "Reptile", "extinct": true }, "anothertrex": { "type": "Reptile", "extinct": true }, "bald eagle": { "type": "Bird", "extinct": false } }
The only difference between the examples with and without value substitution is that in the first case constructing Reptile objects corresponding to keywords trex and anothertrex would produce pointers to the same object, whereas in the second case the two pointers would refer to two different Reptile objects. This feature is implemented by tracking all smart pointers produced by KeyVal::object. The object pointer registry is destroyed when the main KeyVal and all of its subobjects produced with KeyVal::keyval are destroyed. Therefore it is good practice to destroy KeyVal objects as soon as they are no longer needed. N.B. the object registry can be bypassed by calling KeyVal::object with the second argument set to true.
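The path arithmetic behind $ references can be sketched as follows. The resolution rules used here are inferred from the example in this section and are an assumption: a leading separator means an absolute path, anything else is relative to the group enclosing the keyword, and a ".." segment moves one group up. The `Flat` alias and the helper functions are hypothetical, handle leaf values only (not references to whole groups such as "$trex"), and do not model the shared-object registry.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flattened input (not mpqc::KeyVal): full path -> raw value.
using Flat = std::map<std::string, std::string>;

inline std::vector<std::string> split_path(const std::string& path) {
  std::vector<std::string> segments;
  std::stringstream stream(path);
  std::string segment;
  while (std::getline(stream, segment, ':')) segments.push_back(segment);
  return segments;
}

inline std::string join_path(const std::vector<std::string>& segments) {
  std::string path;
  for (std::size_t i = 0; i < segments.size(); ++i) {
    if (i != 0) path += ':';
    path += segments[i];
  }
  return path;
}

// Converts the reference `ref` (the text after '$') found at keyword
// `key_path` into an absolute path. The rules are assumed, see the lead-in.
inline std::string resolve_reference(const std::string& key_path,
                                     const std::string& ref) {
  std::vector<std::string> segments;
  std::string relative = ref;
  if (!ref.empty() && ref[0] == ':') {
    relative = ref.substr(1);  // absolute: start from the root
  } else {
    segments = split_path(key_path);
    if (!segments.empty()) segments.pop_back();  // start from enclosing group
  }
  for (const auto& segment : split_path(relative)) {
    if (segment == "..") {
      if (!segments.empty()) segments.pop_back();
    } else {
      segments.push_back(segment);
    }
  }
  return join_path(segments);
}

// Follows chains of references until a plain value is reached.
inline std::string resolved_value(const Flat& input, std::string path) {
  for (std::size_t hops = 0; hops <= input.size(); ++hops) {
    const std::string& raw = input.at(path);  // throws if the path is unknown
    if (raw.empty() || raw[0] != '$') return raw;
    assert(raw.size() > 1);  // a bare "$" carries no path
    path = resolve_reference(path, raw.substr(1));
  }
  throw std::runtime_error("cyclic value substitution");
}
```

Under these assumed rules, a reference "..:extinct:0" found at trex:extinct resolves to extinct:0, and chained references are followed until a literal value appears.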
Value substitution can be used also with the programmatic KeyVal construction:
Value substitution can also be used when constructing polymorphic objects programmatically:
The DescribedClass class
To support polymorphic object construction (see Polymorphic Object Construction) the base class must be (publicly) derived from DescribedClass. This potentially adds the overhead of a vtable (DescribedClass has a virtual destructor) and can complicate the lifetime management of such objects because their smart pointers will survive in KeyVal object's registry at least through the lifetime of the owning KeyVal object.
To be able to construct a class T from KeyVal objects do this:
- make DescribedClass a public base of T, or of any of its bases, and
- register T with DescribedClass. The latter can be achieved in a number of ways, but the easiest is to add one of the following statements to a source file in the global scope:
  - if you want to use class name T as the type identifier in KeyVal input: MPQC_CLASS_EXPORT_KEY(T)
  - if you want to use any other key Key as the type identifier in KeyVal input: MPQC_CLASS_EXPORT_KEY2(Key, T)
  It is easiest to add these statements to the .cpp file that defines T, but any other .cpp file will work.
- Note
- Both MPQC_CLASS_EXPORT_KEY() and MPQC_CLASS_EXPORT_KEY2() create a Global Unique ID (GUID) for the class using a mechanism similar to that of the Boost.Serialization library (see the Boost docs for more details). However, using these macros does not make the type usable with Boost.Serialization; for that you should use the corresponding BOOST_CLASS_EXPORT() and BOOST_CLASS_EXPORT2() macros (in practice you are likely to need BOOST_CLASS_EXPORT_KEY() or BOOST_CLASS_EXPORT_KEY2(), and BOOST_CLASS_EXPORT_IMPLEMENT(); read the docs carefully, as it is very easy to introduce subtle bugs by misusing the macros).
- As of Boost 1.62.0, the Boost.Serialization macros BOOST_CLASS_EXPORT_KEY(), BOOST_CLASS_EXPORT_KEY2() and BOOST_CLASS_EXPORT_IMPLEMENT() are not variadic and should not work for template classes with 2 or more template arguments. MPQC provides macros MPQC_BOOST_CLASS_EXPORT_KEY2() and MPQC_BOOST_CLASS_EXPORT_IMPLEMENT() that should be used in their place for classes with 2 or more template arguments.
- If T is a template class, you must register each instantiation of this class that you want to construct from KeyVal.
Forced linkage of DescribedClass objects
Executables that initialize DescribedClass-derived objects using the KeyVal library need a mechanism for forcing the linkage of the corresponding libraries. For example, the MPQC executable knows that the user will want to construct and use an object of a Wavefunction class, but does not know until runtime the GUID of the Wavefunction class that the user specified in the input file. The easiest way to force the linkage of all Wavefunction classes that the executable wants to be able to use is to explicitly instantiate an object of every Wavefunction class. But this would create a massive compile dependency between all template classes derived from Wavefunction: changing any of them will force recompilation of all of them when recompiling the main executable. Class ForceLink provides a workaround to this problem. To force linkage of template class T into an executable the user needs to do the following:
- add the following code to the executable:
    #include "mpqc/util/keyval/forcelink.h"
    template <typename U> class T;  // note the forward declaration! no need to include the definition of T
    // define 1 ForceLink object per each instantiation of T
    mpqc::detail::ForceLink<T<double>> link_T_double;
    mpqc::detail::ForceLink<T<int>> link_T_int;
- instantiate and export instances of class T in a source file of the library that defines T:
    #include "mpqc/util/keyval/forcelink.h"
    MPQC_FORCELINK_KEYVAL_CTOR(T<double>);
    MPQC_FORCELINK_KEYVAL_CTOR(T<int>);
N.B. For convenience, we provide macros MPQC_CLASS_EXPORT() and MPQC_CLASS_EXPORT2() that combine the DescribedClass registration macros MPQC_CLASS_EXPORT_KEY() and MPQC_CLASS_EXPORT_KEY2(), respectively, with the MPQC_FORCELINK_KEYVAL_CTOR() macro.
Deprecated keywords
It is useful to be able to deprecate some keywords in order to evolve the input format smoothly. For example, we may want to rename an input keyword, but do not want to obsolete the old keyword until some future software release. By deprecating the keyword we can nudge users to start using the new keyword. For example, if we want to rename keyword "extinct" in JSON Example 1 to "is_extinct", we need to modify the KeyVal constructor of class Bird (C++ Example 4) to initialize field extinct as
This code will:
- query keyword is_extinct; if not found,
- query deprecated keyword extinct; if not found,
- initialize field extinct to false.
If the deprecated keyword is read (not just queried), a message will be written by default to std::cerr. It is possible to change the default to throwing an exception (specifically, KeyVal::bad_input) by calling KeyVal::throw_if_deprecated_path(true).
To deprecate a keyword without specifying a replacement the primary keyword can be empty:
This code will:
- query deprecated keyword extinct; if not found,
- initialize field extinct to false.
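The lookup order described in this section (primary keyword, then deprecated keyword with a warning, then default) can be sketched with the standard library. `value_or_deprecated` is a hypothetical helper written for illustration; it is not a KeyVal member function, and the warning text is invented.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical keyword group holding boolean values (not mpqc::KeyVal).
using Group = std::map<std::string, bool>;

// Looks up `key`; if absent (or empty), falls back to `deprecated_key` and
// emits a warning; if that is absent too, returns `default_value`.
inline bool value_or_deprecated(const Group& group, const std::string& key,
                                const std::string& deprecated_key,
                                bool default_value,
                                std::ostream& warn = std::cerr) {
  assert(!deprecated_key.empty());
  if (!key.empty()) {
    auto primary = group.find(key);
    if (primary != group.end()) return primary->second;
  }
  auto deprecated = group.find(deprecated_key);
  if (deprecated != group.end()) {
    warn << "warning: keyword \"" << deprecated_key << "\" is deprecated";
    if (!key.empty()) warn << "; use \"" << key << "\" instead";
    warn << '\n';
    return deprecated->second;
  }
  return default_value;
}
```

Passing an empty primary keyword corresponds to deprecating a keyword without naming a replacement; the warning stream is a parameter so that the message can be captured instead of going to std::cerr.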