The xso::partition
Class
Introduction
The xso::partition
class partitions a single stream of random numbers into a collection of non-overlapping sub-streams. This is primarily useful for setting up parallel computations.
Declaration
The partition
class is in the xso
namespace and is declared as follows:
namespace xso {
template<typename RNG> class partition;
}
Template Parameter
The template parameter RNG
should satisfy the requirements of a State
type. In particular, it should have a class method characteristic_coefficients
that works.
Instance Methods
The xso::partition
class has just a couple of instance methods:
Construction
(const State& gen, std::size_t n_partitions); partition
Here, gen is a parent generator or State, and we want to partition its random state stream into n_partitions
non-overlapping sub-streams. Each of those will be owned
by a copy of gen
, seeded with the correct starting state for the sub-stream.
The constructor computes an appropriate jump size \(J\) that depends on the n_partitions
argument. It then uses the State
characteristic coefficients to compute an appropriate jump polynomial for jumps of size \(J\).
The next()
Method
(); State next
This returns a new State
that is a copy of the parent gen
but seeded at the start of the next partition. The returned RNG
will be the same as gen
, with its state jumped along by some multiple of \(J\) steps
Example
The xso::partition
class is best illustrated with a sketch example.
Example: Setting up for parallel processing
#include <xoshiro.h>
int main()
{
1std::size_t n_cores = available_cores(...);
2::rng gen;
xso3::partition partition(gen, n_cores);
xsofor(std::size_t job = 0; job < n_cores; ++job) {
4(partition.next(), ...);
spawn_job}
...
}
- 1
- Some method that returns the number of compute cores we have available.
- 2
- Create a randomly seeded parent generator.
- 3
-
This constructs a partition of the parent state stream into
n_cores
non-overlapping pieces. - 4
- And we’re off spawning an independent job for each core, handing each its own sub-stream generator to work with.
In the sketch, we first determine how many cores are available for the simulation. That may be a fixed number, or some function determines the number on the fly. For the sake of this example, suppose that n_cores
comes back as 100.
We want to partition the random number stream from gen
into 100 pieces and hand each core its own sub-stream to work with. So, we set up an xso::partition
object to do just that.
The loop spawns jobs where each one is given its own random number generator. The generators come from calling the partition.next()
method. The xso::partition
class ensures that these generators are identical to the original `gen, except that they are seeded appropriately far along to the next sub-stream.
The parent generator has 256 bits of state, so its random state stream has \(2^{256}\) slots. This means that all the sub-streams still have a vast number of random deviates to use up before there is any chance of getting overlaps.
Implementation Note
In the example above, we use a random number generator xso::rng, which has 256 state bits. Those generators will step through an enormous \(2^{256}\) states before repeating the cycle.
We have 100 cores available for parallel computation. Ideally, we want to partition the random number stream from gen
into 100 pieces and hand each core its own non-overlapping sub-stream of size: \[
\frac{2^{256}}{100}.
\] However, that number will overflow any normal 32-bit or 64-bit integer! Instead, the partition class will create \(N =2^p\) sub-streams, where \(N\) is a power of two close to, but larger than 100. In this case, that power will be 7 as \(2^7 = 128 \ge 100\).
The number of created non-overlapping partitions is 128 instead of 100.
We only use 100 of the 128 as we spawn independent jobs, so each partition is therefore slightly smaller than the ideal, and of course, we are not using any of the final 28 partitions at all. However, each of the sub-streams still has some \(2^{256 - 7} = 2^{249}\) available random numbers. That is vastly more than a job on any present-day computer will ever consume, so the “wastage” is completely insignificant.
This class requires the State to have its pre-computed* characteristic coefficients available. This is true for all our type aliased recommended generators.
|