`bit::vector` — String Encodings

We have methods to encode a bit-vector in various string formats.

std::string
to_string(std::string_view pre  = "",
          std::string_view post = "",
          std::string_view sep  = "",
          char off='0', char on='1') const;           // 1

std::string
to_bit_order(char off='0', char on='1') const;        // 2

std::string
to_pretty_string(char off='0', char on='1') const;    // 3

std::string
to_hex() const;                                       // 4

1: Returns a binary-string representation using the given characters for set and unset elements.
2: Returns a binary-string representation in bit-order using the given characters for set and unset elements.
3: Returns a formatted representation e.g. [1 1 0 1 0 1].
4: Returns a hex-string representation.

Method Arguments

Argument	Description
`pre`	Prefix for the return string — not used for hex-strings or bit-ordered strings.
`post`	Postfix for the return string — not used for hex-strings or bit-ordered strings.
`sep`	Separator between elements — not used for hex-strings or bit-ordered strings.
`on`, `off`	The characters used for set and unset elements — not used for hex-strings.

By default, v.to_string() returns something like 100101, but by setting the pre, post, and sep parameters, one can get [1, 0, 0, 1, 0, 1].

For that same bit-vector, v.to_bit_order() returns 101001, i.e., the low-order element is on the right and the high-order element is on the left.
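
To make the two layouts concrete, here is a minimal standalone sketch that reproduces the strings quoted above. It uses a plain std::vector<bool> rather than bit::vector, and the helper names format_bits and format_bit_order are hypothetical; this is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: elements in vector order (v[0] first), wrapped in
// `pre`/`post` and separated by `sep`.
std::string format_bits(const std::vector<bool>& v, std::string_view pre = "",
                        std::string_view post = "", std::string_view sep = "",
                        char off = '0', char on = '1')
{
    assert(off != on); // otherwise the string could not be decoded again
    std::string out(pre);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out += sep;
        out += v[i] ? on : off;
    }
    out += post;
    return out;
}

// Hypothetical helper: the same elements in bit-order (v[0] last), undecorated.
std::string format_bit_order(const std::vector<bool>& v, char off = '0', char on = '1')
{
    assert(off != on);
    std::string out;
    for (std::size_t i = v.size(); i-- > 0;) out += v[i] ? on : off;
    return out;
}
```

With the elements 1, 0, 0, 1, 0, 1, format_bits(v) gives 100101, format_bits(v, "[", "]", ", ") gives [1, 0, 0, 1, 0, 1], and format_bit_order(v) gives 101001.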

Character Encodings

There are two principal ways we can encode a bit-vector as a string:

Binary String Encodings

The straightforward character encoding for a bit-vector is a binary string containing just 0’s and 1’s, e.g., “10101”. Each character in a binary string represents a single element in the bit-vector.

By default, we encode bit-vectors to binary strings in vector order \(v_0 v_1 \cdots v_{n-1}\). However, methods that read or write binary strings typically have an extra boolean argument, bit_order. This argument always defaults to false, but if present and set to true, then the binary string will encode the bit-vector in bit-order where the least significant bit v₀ is on the right, so \(v_{n-1} \cdots v_1 v_0\). Hex-strings ignore the bit_order parameter.
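
The effect of a bit_order flag when reading a binary string can be sketched as follows. This is an illustration under stated assumptions, not the library's parser: the names element_index and parse_binary are hypothetical, and the result is a plain std::vector<bool>.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Maps a character position in an n-character binary string to the index of
// the element it encodes. In bit-order the last character is element 0.
std::size_t element_index(std::size_t pos, std::size_t n, bool bit_order)
{
    assert(pos < n);
    return bit_order ? n - 1 - pos : pos;
}

// Hypothetical parser: returns std::nullopt if any character is not '0' or '1'.
std::optional<std::vector<bool>> parse_binary(std::string_view s, bool bit_order = false)
{
    std::vector<bool> v(s.size());
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        if (s[pos] != '0' && s[pos] != '1') return std::nullopt;
        v[element_index(pos, s.size(), bit_order)] = (s[pos] == '1');
    }
    return v;
}
```

Under this sketch, parse_binary("100101") and parse_binary("101001", true) produce the same elements.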

Hex String Encodings

The other supported encoding for bit-vectors is a compact hex-type string containing just the 16 hex characters 0123456789ABCDEF, for example, “3ED02”. Hex strings may carry an optional prefix, “0x” or “0X”, e.g., “0x3ED02”.

Hex strings are not affected by a bit_order argument — we ignore that argument.

Each hex character naturally translates to four elements in a bit::vector. The hex string 0x0 is equivalent to the binary string 0000, and so on, up to string 0xF, which is the same as the binary 1111.

The hex pair 0x0F will be interpreted in the vector as 00001111. Of course, this is the advantage of hex. It is a more compact format that occupies a quarter of the space needed to write out the equivalent binary string.
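
The character-to-four-elements mapping can be sketched in a few lines. The helpers hex_value and expand_hex are hypothetical names, the input is assumed to hold only upper-case hex characters with no prefix or suffix, and the code is an illustration rather than the library's own.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Value of one upper-case hex character (assumed valid by the caller).
unsigned hex_value(char c)
{
    assert((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F'));
    return c <= '9' ? static_cast<unsigned>(c - '0') : static_cast<unsigned>(c - 'A' + 10);
}

// Expands each hex character into four binary characters, most significant bit first.
std::string expand_hex(std::string_view digits)
{
    std::string out;
    for (char c : digits) {
        const unsigned value = hex_value(c);
        for (unsigned b = 4; b-- > 0;) out += ((value >> b) & 1u) ? '1' : '0';
    }
    return out;
}
```

Here expand_hex("0F") yields 00001111, four binary characters for each hex character.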

However, what happens if you want to encode a vector whose size is not a multiple of 4? We handle that by allowing the final character in the string to have a base that is not 16. To accomplish that, we allow for an optional suffix, which must be one of _2, _4, or _8. If present, the suffix gives the base for just the preceding character in the otherwise hex-based string. If there is no suffix, the final character is assumed to be hex like all the others.

So the string 0x1 (no suffix, so the last character is the default hex base 16) is equivalent to 0001. On the other hand, the string 0x1_8 (the last character is base 8) is equivalent to 001. Similarly, the string 0x1_4 (the last character is base 4) is equivalent to 01, and finally, the string 0x1_2 (the last character is base 2) is equivalent to 1.

In the string 0x3ED01_8, the first four characters, 3, E, D, and 0, are interpreted as hex values, and each will consume four slots in the vector. However, that final 1_8 is parsed as an octal 1, which takes up three slots 001. Therefore, this vector has size 19 (i.e., 4*4 + 3).

If the suffix is present, the final character must fit inside the base given by that suffix. The string 0x3_8 is OK, but trying to parse 0x3_2 will result in a std::nullopt return value because the final character is neither 0 nor 1, which are the only valid options for something that is supposed to be base 2.
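
The prefix, suffix, and validity rules described above can be collected into one small decoder. This is a sketch under stated assumptions, not the library's parser: hex_to_binary is a hypothetical name, the result is a binary string in vector order rather than a bit::vector, and only upper-case hex characters are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical decoder: expands a hex-string such as "0x3ED01_8" into the
// equivalent binary string, or returns std::nullopt if it is malformed.
std::optional<std::string> hex_to_binary(std::string_view s)
{
    // Strip the optional "0x" or "0X" prefix.
    if (s.size() >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) s.remove_prefix(2);

    // Strip the optional suffix and note how many bits the final character carries.
    unsigned last_width = 4;
    if (s.size() >= 2 && s[s.size() - 2] == '_') {
        switch (s.back()) {
            case '2': last_width = 1; break;
            case '4': last_width = 2; break;
            case '8': last_width = 3; break;
            default: return std::nullopt;
        }
        s.remove_suffix(2);
        if (s.empty()) return std::nullopt; // a suffix needs a character to apply to
    }

    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        unsigned value = 0;
        if (c >= '0' && c <= '9') value = static_cast<unsigned>(c - '0');
        else if (c >= 'A' && c <= 'F') value = static_cast<unsigned>(c - 'A' + 10);
        else return std::nullopt;

        const unsigned width = (i + 1 == s.size()) ? last_width : 4;
        if (value >= (1u << width)) return std::nullopt; // e.g. '3' does not fit in base 2
        for (unsigned b = width; b-- > 0;) out += ((value >> b) & 1u) ? '1' : '0';
    }

    // Size rule: four elements per character, except possibly the last one.
    assert(s.empty() || out.size() == 4 * (s.size() - 1) + last_width);
    return out;
}
```

With this sketch, "0x3ED01_8" decodes to a 19-character binary string, "0x3_8" decodes to 011, and "0x3_2" is rejected.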

Example: To Binary

#include <bit/bit.h>
int main()
{
    bit::vector v(16, [](size_t k) { return (k + 1) % 2; });           // 1
    std::cout << "v:    " << v.to_string() << '\n';                    // 2
    std::cout << "v:    " << v.to_bit_order() << '\n';                 // 3
    std::cout << "v:    " << v.to_bit_order('.', '-') << '\n';         // 4
    std::cout << "v:    " << v.to_pretty_string() << '\n';             // 5
    std::cout << "v:    " << v.to_string("{", "}", ", ") << '\n';      // 6
}

1: v has all the even elements set to 1.
2: Printing v in vector_order using the default 0’s and 1’s for the element values. v₀ is on the left.
3: Printing v in bit_order using the default 0’s and 1’s for the element values. v₀ is on the right.
4: Printing v in bit_order using dots and dashes for the element values. v₀ is on the right.
5: Printing v in a more formatted, element-by-element style.
6: Printing v in a custom formatted style.

Output

v:    1010101010101010
v:    0101010101010101
v:    .-.-.-.-.-.-.-.-
v:    [1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0]
v:    {1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}

Example: To Hex

#include <bit/bit.h>
int main()
{
    auto v5 = bit::vector<>::ones(5);
    auto v6 = bit::vector<>::ones(6);
    auto v7 = bit::vector<>::ones(7);
    auto v8 = bit::vector<>::ones(8);
    auto v9 = bit::vector<>::ones(9);
    std::cout << "v5: " << v5.to_string() << "\t hex: " << v5.to_hex() << '\n';
    std::cout << "v6: " << v6.to_string() << "\t hex: " << v6.to_hex() << '\n';
    std::cout << "v7: " << v7.to_string() << "\t hex: " << v7.to_hex() << '\n';
    std::cout << "v8: " << v8.to_string() << "\t hex: " << v8.to_hex() << '\n';
    std::cout << "v9: " << v9.to_string() << "\t hex: " << v9.to_hex() << '\n';
}

Output

v5: 11111        hex: 0xF1_2
v6: 111111       hex: 0xF3_4
v7: 1111111      hex: 0xF7_8
v8: 11111111     hex: 0xFF
v9: 111111111    hex: 0xFF1_2
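
The output above is consistent with an encoder that walks the binary string four characters at a time and appends a suffix only when the final chunk is short. The following is a minimal sketch of that idea, not the library's to_hex. The name binary_to_hex is hypothetical, and returning a bare "0x" for an empty input is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical encoder: converts a binary string in vector order into a
// hex-string with a "0x" prefix and, if needed, a _2, _4, or _8 suffix.
std::string binary_to_hex(std::string_view bits)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    std::string out = "0x";
    for (std::size_t i = 0; i < bits.size(); i += 4) {
        const std::string_view chunk = bits.substr(i, 4); // the final chunk may be shorter
        unsigned value = 0;
        for (char c : chunk) {
            assert(c == '0' || c == '1');
            value = 2 * value + (c == '1' ? 1u : 0u);
        }
        out += digits[value];
        // A short final chunk is written in a smaller base, flagged by the suffix.
        if (chunk.size() == 3) out += "_8";
        if (chunk.size() == 2) out += "_4";
        if (chunk.size() == 1) out += "_2";
    }
    return out;
}
```

For example, binary_to_hex("11111") gives 0xF1_2 and binary_to_hex("11111111") gives 0xFF, matching the v5 and v8 lines above.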

Example: From Hex

#include <bit/bit.h>
int main()
{
    auto v5 = bit::vector<>::random(5);    // 1
    auto v6 = bit::vector<>::random(6);
    auto v7 = bit::vector<>::random(7);
    auto v8 = bit::vector<>::random(8);
    auto v9 = bit::vector<>::random(9);

    auto u5 = bit::vector<>::from(v5.to_hex());    // 2
    auto u6 = bit::vector<>::from(v6.to_hex());
    auto u7 = bit::vector<>::from(v7.to_hex());
    auto u8 = bit::vector<>::from(v8.to_hex());
    auto u9 = bit::vector<>::from(v9.to_hex());

    std::cout << "v5 " << v5 << "\t\t u5 " << *u5 << (v5 == *u5 ? "\t match " : "\t FAIL") << '\n';
    std::cout << "v6 " << v6 << "\t u6 " << *u6 << (v6 == *u6 ? "\t match " : "\t FAIL") << '\n';
    std::cout << "v7 " << v7 << "\t u7 " << *u7 << (v7 == *u7 ? "\t match " : "\t FAIL") << '\n';
    std::cout << "v8 " << v8 << "\t u8 " << *u8 << (v8 == *u8 ? "\t match " : "\t FAIL") << '\n';
    std::cout << "v9 " << v9 << "\t u9 " << *u9 << (v9 == *u9 ? "\t match " : "\t FAIL") << '\n';
}

1: Set up some bit-vectors of various lengths with random 50-50 fills.
2: Convert the bit-vectors to hex-strings and use those to construct bit-vectors. Check that the two sets of vectors match.

Output (varies from run to run)

v5 [0 0 1 1 0]           u5 [0 0 1 1 0]          match
v6 [1 0 1 1 1 0]         u6 [1 0 1 1 1 0]        match
v7 [0 1 1 0 0 1 1]       u7 [0 1 1 0 0 1 1]      match
v8 [1 1 1 1 1 0 0 0]     u8 [1 1 1 1 1 0 0 0]    match
v9 [0 0 0 0 0 0 0 0 1]   u9 [0 0 0 0 0 0 0 0 1]  match
