Bernoulli and beyond with alea
Someone in the nim-science chat today asked about if there is a library with a categorical distribution.
I got curious on how to implement this with alea, so that I can learn a bit how alea works.
Alea
In alea first we need to initialize a random number generator to sample from the distributions:
import
random / xorshift
import
alea
import
math, sugar, strutils
var rng = wrap(initXorshift128Plus(1337))
We will carry around this random number generator and use it for every operation with random variables.
It represents the probability space $\Omega$ in the definition
of the random variable $X$:
$$X: \Omega \to E$$
Where $E$ (a measurable space), will be represented by a generic type $T$.
Note that I was not able to use the rng suggested by alea, I got some gcc error due to the usage of urandom.
This is the code that did not work for me:
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
I decide to pick another rng (and actually remove the call from urandom that was causing the error) from the documentation
of nim-random. I probably should file an issue.
Bernoulli distribution
Bernoulli distribution is already available.
Every distribution comes with a sample
proc which has as first argument the random number generator.
alea
also provides a take
template to create a sequence with n
samples.
let
p = 0.9
b = bernoulli(p)
echo rng.sample b
block:
let s = take(rng, b, 10)
echo s
1.0
@[1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0]
We can also compute the mean ($p$), variance ($p(1-p)$) and standard deviation ($\sqrt(p (1 - p))$)
echo rng.mean b
echo rng.variance b
echo p * (1 - p)
echo rng.stddev b
echo sqrt(p * (1 - p))
0.9
0.08999999999999998
0.08999999999999998
0.3
0.3
Choice distribution
A categorical distribution that is already implemented in alea is the one where the categories have equal probability:
let dice = choice(@[1, 2, 3, 4, 5, 6])
block:
let s = take(rng, dice, 10)
echo s
@[4, 4, 6, 1, 6, 5, 5, 5, 1, 3]
Note that the mean
will not work for dice
since it is implemented only for RandomVar[float]
.
In order to have a mean I could do like this:
let fdice = choice(@[1.0, 2, 3, 4, 5, 6])
echo rng.mean fdice
3.5
Categorical distribution
Finally, to define a custom distribution
I just need to define:
- an object that contains the parameters of the distribution
- a constructor to add initial validation
- a
sample
proc
Here is the generic object for a categorical distribution (on float numbers):
type
Categorical* = object
p*: seq[float]
e*: seq[float]
This implementation is a bit fragile, since p
and e
should always have same length,
and I will be able to ensure that only through the constructor:
proc categorical*(s: seq[tuple[e, p: float]]): Categorical =
var tot = 0.0
for (e, p) in s:
assert p >= 0 and p <= 1
result.e.add e
result.p.add p
tot += p
assert tot > 0
if tot != 1.0:
echo "probabilities do not add to 1.0, they will be normalized to sum to 1"
let w = 1.0 / tot
for i in 0 .. result.p.high:
result.p[i] = result.p[i] * w
and we can already use this to create a biased dice, for example one that has double chance to output an odd number than an even number:
let oddDice = categorical(@[(1.0, 0.2), (2.0, 0.1), (3.0, 0.2), (4.0, 0.1),
(5.0, 0.2), (6.0, 0.1)])
probabilities do not add to 1.0, they will be normalized to sum to 1
Now to close the loop I need to implement the sample
proc:
proc sample*(rng: var Random; c: Categorical): float =
var tot = 0.0
let r = rng.random()
for i in 0 .. c.e.high - 1:
tot += c.p[i]
if r <= tot:
return c.e[i]
return c.e[^1]
Let's test this:
block:
let s = take(rng, oddDice, 20)
echo s
echo rng.mean(oddDice)
@[1.0, 1.0, 1.0, 3.0, 5.0, 1.0, 1.0, 6.0, 5.0, 3.0, 5.0, 4.0, 3.0, 3.0, 3.0, 5.0, 4.0, 5.0, 1.0, 1.0]
3.34143
And the output seems reasonable (I could probably overload the mean and try to use it to compute an exact mean.
Conclusions
It was rather straightforward to implement a categorical distribution in alea.
We do get limited functionality by doing this, basically the ability to sample and compute some moments.
I did not use them, but in alea there is the possibilty of composing operations
on top of random variables
and conditioning between them.
Something which is probably missing in alea and it could be nice to add is the possibility of
computing probability density function and cumulative distribution functions.
For reference here are the apis of other scientific languages:
import nimib
nbInit
nbDoc.useLatex
nbText: """# Bernoulli and beyond with [alea](https://github.com/andreaferretti/alea)
Someone in the nim-science chat today asked about if there is a library with a [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution).
I got curious on how to implement this with [alea](https://github.com/andreaferretti/alea), so that I can learn a bit how alea works.
## Alea
In alea first we need to initialize a random number generator to sample from the distributions:
"""
nbCode:
import random/xorshift
import alea
import math, sugar, strutils
var rng = wrap(initXorshift128Plus(1337))
nbText: """
We will carry around this random number generator and use it for every operation with random variables.
It represents the probability space $\Omega$ in the [definition](https://en.wikipedia.org/wiki/Random_variable)
of the random variable $X$:
$$X: \Omega \to E$$
Where $E$ (a measurable space), will be represented by a generic type $T$.
Note that I was not able to use the rng suggested by alea, I got some gcc error due to the usage of urandom.
This is the code that did not work for me:
```nim
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
```
I decide to pick another rng (and actually remove the call from urandom that was causing the error) from the documentation
of [nim-random](https://github.com/oprypin/nim-random). I probably should file an issue.
"""
nbText: """### Bernoulli distribution
Bernoulli distribution is already available.
Every distribution comes with a `sample` proc which has as first argument the random number generator.
`alea` also provides a `take` template to create a sequence with `n` samples.
"""
nbCode:
let
p = 0.9
b = bernoulli(p)
echo rng.sample b
block:
let s = take(rng, b, 10)
echo s
nbText: r"We can also compute the mean ($p$), variance ($p(1-p)$) and standard deviation ($\sqrt(p (1 - p))$)"
nbCode:
echo rng.mean b
echo rng.variance b
echo p*(1 - p)
echo rng.stddev b
echo sqrt(p*(1 - p))
nbText: """### Choice distribution
A categorical distribution that is already implemented in alea is the one where the categories have equal probability:
"""
nbCode:
let dice = choice(@[1, 2, 3, 4, 5, 6])
block:
let s = take(rng, dice, 10)
echo s
nbText: """Note that the `mean` will not work for `dice` since it is implemented only for `RandomVar[float]`.
In order to have a mean I could do like this:
"""
nbCode:
let fdice = choice(@[1.0, 2, 3, 4, 5, 6])
echo rng.mean fdice
nbText: """### Categorical distribution
Finally, to [define a custom distribution](https://github.com/andreaferretti/alea#defining-custom-distributions)
I just need to define:
1. an object that contains the parameters of the distribution
2. a constructor to add initial validation
3. a `sample` proc
Here is the generic object for a categorical distribution (on float numbers):
"""
nbCode:
type
Categorical* = object
p*: seq[float]
e*: seq[float]
nbText: """This implementation is a bit fragile, since `p` and `e` should always have same length,
and I will be able to ensure that only through the constructor:
"""
nbCode:
proc categorical*(s: seq[tuple[e, p: float]]): Categorical =
var tot = 0.0
for (e, p) in s:
assert p >= 0 and p <= 1
result.e.add e
result.p.add p
tot += p
assert tot > 0
if tot != 1.0:
echo "probabilities do not add to 1.0, they will be normalized to sum to 1"
let w = 1.0 / tot
for i in 0 .. result.p.high:
result.p[i] = result.p[i] * w
nbText: """and we can already use this to create a biased dice, for example one that has double chance to output an odd number than an even number:"""
nbCode:
let oddDice = categorical(@[(1.0, 0.2), (2.0, 0.1), (3.0, 0.2), (4.0, 0.1), (5.0, 0.2), (6.0, 0.1)])
nbText: "Now to close the loop I need to implement the `sample` proc:"
nbCode:
proc sample*(rng: var Random, c: Categorical): float =
var tot = 0.0
let r = rng.random()
for i in 0 .. c.e.high - 1:
tot += c.p[i]
if r <= tot:
return c.e[i]
return c.e[^1]
nbText: "Let's test this:"
nbCode:
block:
let s = take(rng, oddDice, 20)
echo s
echo rng.mean(oddDice)
nbText: "And the output seems reasonable (I could probably overload the mean and try to use it to compute an exact mean."
nbText: """
## Conclusions
It was rather straightforward to implement a categorical distribution in alea.
We do get limited functionality by doing this, basically the ability to sample and compute some moments.
I did not use them, but in alea there is the possibilty of [composing operations](https://github.com/andreaferretti/alea#operations-on-random-variables)
on top of random variables
and [conditioning](https://github.com/andreaferretti/alea#conditioning-random-variables) between them.
Something which is probably missing in alea and it could be nice to add is the possibility of
computing probability density function and cumulative distribution functions.
For reference here are the apis of other scientific languages:
- Python: <https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html>
- R: <https://cran.r-project.org/web/views/Distributions.html>
- Julia: <https://github.com/JuliaStats/Distributions.jl>
"""
nbSave