# sf.apps.train.Stochastic

class Stochastic(h, vgbs)

Bases: object

Stochastic cost function given by averaging over samples from a trainable GBS distribution.

A stochastic optimization problem is defined with respect to a function $$h(\bar{n})$$ that assigns a cost to an input sample $$\bar{n}$$. The cost function is the average of $$h(\bar{n})$$ over samples generated from a parametrized distribution $$P_{\theta}(\bar{n})$$:

$C (\theta) = \sum_{\bar{n}} h(\bar{n}) P_{\theta}(\bar{n})$

The cost function $$C (\theta)$$ can then be optimized by varying the parameters $$\theta$$ of the distribution $$P_{\theta}(\bar{n})$$.

In this setting, $$P_{\theta}(\bar{n})$$ is the variational GBS distribution and is specified in Stochastic by an instance of VGBS.
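The averaging step can be illustrated without a GBS device. The following is a minimal sketch and not Strawberry Fields code: `sample_toy` is a hypothetical stand-in for sampling from $$P_{\theta}(\bar{n})$$ in which mode $$k$$ clicks independently with an assumed probability `theta[k]`, and `estimate_cost` averages $$h(\bar{n})$$ over the drawn samples.

```python
import numpy as np


def h(sample):
    """Energy-like cost: odd-numbered modes (counting from one) lower it, even-numbered raise it."""
    return sum(sample[i] * (-1) ** (i + 1) for i in range(len(sample)))


def sample_toy(theta, n_samples, rng):
    """Hypothetical stand-in for a GBS sampler: mode k clicks with probability theta[k]."""
    return (rng.random((n_samples, len(theta))) < theta).astype(int)


def estimate_cost(h, samples):
    """Monte Carlo estimate of C(theta): the mean of h over the samples."""
    return float(np.mean([h(s) for s in samples]))


rng = np.random.default_rng(0)
theta = np.array([0.9, 0.1, 0.9, 0.1])  # assumed click probabilities, not VGBS parameters
samples = sample_toy(theta, 20000, rng)
cost_estimate = estimate_cost(h, samples)

# For independent clicks the exact average is sum_k theta[k] * (-1) ** (k + 1).
exact = float(sum(theta[k] * (-1) ** (k + 1) for k in range(len(theta))))
print(cost_estimate, exact)
```

In this toy setting the estimate approaches the exact average as the number of samples grows, which is the same role `n_samples` plays in the methods of this class.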

Example usage:

In the example below, the function $$h(\bar{n})$$ can be viewed as an energy. Clicks in odd-numbered modes (counting from one) decrease the total energy, while clicks in even-numbered modes increase it.

>>> import numpy as np
>>> from strawberryfields.apps import train
>>> from strawberryfields.apps.train import Stochastic
>>> embedding = train.embed.Exp(4)
>>> A = np.ones((4, 4))
>>> vgbs = train.VGBS(A, 3, embedding, threshold=True)
>>> h = lambda x: sum([x[i] * (-1) ** (i + 1) for i in range(4)])
>>> cost = Stochastic(h, vgbs)
>>> params = np.array([0.05, 0.1, 0.02, 0.01])
>>> cost.evaluate(params, 100)
0.03005489236683591
>>> cost.grad(params, 100)
array([ 0.10880756, -0.1247146 ,  0.12426481, -0.13783342])

Parameters
• h (callable) – a function that assigns a cost to an input sample

• vgbs (train.VGBS) – the trainable GBS distribution, which must be an instance of VGBS

Methods

• evaluate(params, n_samples) – Evaluates the cost function.

• grad(params, n_samples) – Evaluates the gradient of the cost function.

• h_reparametrized(sample, params) – Include trainable parameters in the $$h(\bar{n})$$ function to allow sampling from the initial adjacency matrix.

evaluate(params, n_samples)

Evaluates the cost function.

The cost function can be evaluated by finding its average over samples generated from the VGBS system using the trainable parameters $$\theta$$:

$C (\theta) = \sum_{\bar{n}} h(\bar{n}) P_{\theta}(\bar{n})$

Alternatively, the cost function can be evaluated as a different average, taken over samples from the GBS distribution of the input adjacency matrix to the VGBS system:

$C (\theta) = \sum_{\bar{n}} h(\bar{n}, \theta) P(\bar{n})$

where $$h(\bar{n}, \theta)$$ is given in h_reparametrized() and now contains the trainable parameters, and $$P(\bar{n})$$ is the GBS distribution given by the input adjacency matrix. The advantage of this alternative approach is that samples do not need to be regenerated each time the parameters, and hence the adjacency matrix, are updated. A fixed set of samples can be used instead.

The second approach above is utilized in Stochastic to speed up evaluation of the cost function and its gradient. This is done by approximating the cost function using a single fixed set of samples. The samples can be pre-loaded into the VGBS class or generated once upon the first call of either Stochastic.evaluate() or Stochastic.grad().
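The equivalence of the two averages can be checked on a small toy model. The sketch below is not the library implementation: it assumes weights of the form $$w_{k} = e^{-\theta_{k}}$$, replaces GBS with an arbitrary base distribution `base_probs` over binary click patterns, and uses a toy normalization $$Z(\theta)$$ in place of the determinant ratio from h_reparametrized(). All function names are hypothetical.

```python
import itertools

import numpy as np

m = 3
rng = np.random.default_rng(1)

# All binary click patterns and an arbitrary fixed base distribution P(n) over them.
patterns = np.array(list(itertools.product([0, 1], repeat=m)))
base_probs = rng.random(len(patterns))
base_probs /= base_probs.sum()


def h(sample):
    return sum(sample[i] * (-1) ** (i + 1) for i in range(len(sample)))


def tilt(theta):
    """Factor prod_k w_k^{n_k} for every pattern, assuming w_k = exp(-theta_k)."""
    w = np.exp(-np.asarray(theta, dtype=float))
    return np.prod(w ** patterns, axis=1)


def h_reparam_toy(theta):
    """Toy h(n, theta) for every pattern: h(n) * prod_k w_k^{n_k} / Z(theta)."""
    factor = tilt(theta)
    z = np.sum(base_probs * factor)  # toy normalization, stands in for the determinant ratio
    return np.array([h(n) for n in patterns]) * factor / z


def exact_cost(theta):
    """C(theta) = sum_n h(n) P_theta(n), with P_theta(n) proportional to P(n) prod_k w_k^{n_k}."""
    p_theta = base_probs * tilt(theta)
    p_theta /= p_theta.sum()
    return float(np.sum(np.array([h(n) for n in patterns]) * p_theta))


# One fixed set of samples from the base distribution, reused for every theta.
fixed_idx = rng.choice(len(patterns), size=50000, p=base_probs)


def estimate_cost(theta):
    """Average of the toy h(n, theta) over the fixed samples."""
    return float(np.mean(h_reparam_toy(theta)[fixed_idx]))


for theta in ([0.0, 0.0, 0.0], [0.3, -0.2, 0.1]):
    print(theta, estimate_cost(theta), exact_cost(theta))
```

The samples are drawn once, and only the reweighting inside `h_reparam_toy` changes with `theta`, which mirrors the reason given above for preferring the second approach.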

Example usage:

>>> cost.evaluate(params, 100)
0.03005489236683591

Parameters
• params (array) – the trainable parameters $$\theta$$

• n_samples (int) – the number of GBS samples used to average the cost function

Returns

the value of the stochastic cost function

Return type

float

grad(params, n_samples)

Evaluates the gradient of the cost function.

As shown in this paper, the gradient can be evaluated by finding an average over samples generated from the input adjacency matrix to the VGBS system:

$\partial_{\theta} C (\theta) = \sum_{\bar{n}} h(\bar{n}, \theta) P(\bar{n}) \sum_{k=1}^{m} (n_k - \langle n_{k} \rangle) \partial_{\theta} \log w_{k}$

where $$h(\bar{n}, \theta)$$ is given in h_reparametrized(), $$P(\bar{n})$$ is the GBS distribution given by the input adjacency matrix, $$n_{k}$$ is the number of photons in mode $$k$$, $$\langle n_{k} \rangle$$ is its mean, and $$w_{k}$$ are the weights in the VGBS system.

This method approximates the gradient using a fixed set of samples from the initial adjacency matrix. The samples can be pre-loaded into the VGBS class or generated once upon the first call of Stochastic.evaluate() or Stochastic.grad().
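The structure of the gradient formula can be verified numerically on a toy model. This is a sketch under stated assumptions rather than the library's code: weights are assumed to be $$w_{k} = e^{-\theta_{k}}$$ (so $$\partial_{\theta_{j}} \log w_{k} = -\delta_{jk}$$), GBS is replaced by an arbitrary base distribution over binary click patterns, and the sum over $$\bar{n}$$ is carried out exactly instead of being estimated from samples. The names `toy_cost`, `toy_grad` and `finite_difference` are hypothetical.

```python
import itertools

import numpy as np

m = 3
rng = np.random.default_rng(2)

# Binary click patterns, an arbitrary base distribution P(n), and h(n) for each pattern.
patterns = np.array(list(itertools.product([0, 1], repeat=m)))
base_probs = rng.random(len(patterns))
base_probs /= base_probs.sum()
h_values = np.array([sum(n[i] * (-1) ** (i + 1) for i in range(m)) for n in patterns])


def tilted_probs(theta):
    """P_theta(n) proportional to P(n) prod_k w_k^{n_k}, assuming w_k = exp(-theta_k)."""
    w = np.exp(-np.asarray(theta, dtype=float))
    p = base_probs * np.prod(w ** patterns, axis=1)
    return p / p.sum()


def toy_cost(theta):
    """C(theta) = sum_n h(n) P_theta(n)."""
    return float(h_values @ tilted_probs(theta))


def toy_grad(theta):
    """sum_n h(n, theta) P(n) sum_k (n_k - <n_k>) d log w_k / d theta_j."""
    p_theta = tilted_probs(theta)      # in this toy model h(n, theta) P(n) = h(n) P_theta(n)
    mean_n = p_theta @ patterns        # <n_k> under P_theta
    dlogw = -np.eye(m)                 # dlogw[j, k] = d log w_k / d theta_j = -delta_jk
    return (h_values * p_theta) @ (patterns - mean_n) @ dlogw.T


def finite_difference(theta, eps=1e-6):
    """Central finite-difference gradient of toy_cost, used only as a cross-check."""
    theta = np.asarray(theta, dtype=float)
    out = np.zeros(m)
    for j in range(m):
        step = np.zeros(m)
        step[j] = eps
        out[j] = (toy_cost(theta + step) - toy_cost(theta - step)) / (2 * eps)
    return out


theta = np.array([0.05, 0.1, 0.02])
print(toy_grad(theta))
print(finite_difference(theta))
```

In practice the exact sum over patterns would be replaced by an average over the fixed set of samples, as described above.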

Example usage:

>>> cost.grad(params, 100)
array([ 0.10880756, -0.1247146 ,  0.12426481, -0.13783342])

Parameters
• params (array) – the trainable parameters $$\theta$$

• n_samples (int) – the number of GBS samples used in the gradient estimation

Returns

the gradient of the stochastic cost function with respect to the trainable parameters

Return type

array

h_reparametrized(sample, params)

Include trainable parameters in the $$h(\bar{n})$$ function to allow sampling from the initial adjacency matrix.

The reparametrized function can be written in terms of $$h(\bar{n})$$ as:

$h(\bar{n}, \theta) = h(\bar{n}) \sqrt{\frac{\det (\mathbb{I} - A(\theta)^{2})} {\det (\mathbb{I} - A^{2})}} \prod_{k=1}^{m}w_{k}^{n_{k}},$

where $$w_{k}$$ is the $$\theta$$-dependent weight on the $$k$$-th mode in the VGBS system and $$n_{k}$$ is the number of photons in mode $$k$$.
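The formula can be written out directly in NumPy. The sketch below is illustrative only and makes several assumptions that are not stated on this page: the weighted matrix is taken to be $$A(\theta) = W^{1/2} A W^{1/2}$$ with $$W = \mathrm{diag}(w_{1}, \ldots, w_{m})$$, the weights are taken to be $$w_{k} = e^{-\theta_{k}}$$, and $$A$$ is assumed to be already scaled so that its eigenvalues lie strictly between -1 and 1. The function `h_reparametrized_sketch` is hypothetical and is not the method of this class.

```python
import numpy as np


def h_reparametrized_sketch(h, sample, A, weights):
    """h(n) * sqrt(det(I - A(theta)^2) / det(I - A^2)) * prod_k w_k^{n_k}."""
    sample = np.asarray(sample)
    weights = np.asarray(weights, dtype=float)
    identity = np.eye(len(A))
    # Assumption: A(theta) = W^{1/2} A W^{1/2} with W = diag(weights).
    sqrt_w = np.sqrt(weights)
    A_theta = sqrt_w[:, None] * A * sqrt_w[None, :]
    ratio = np.linalg.det(identity - A_theta @ A_theta) / np.linalg.det(identity - A @ A)
    return float(h(sample) * np.sqrt(ratio) * np.prod(weights ** sample))


def h(sample):
    return sum(sample[i] * (-1) ** (i + 1) for i in range(len(sample)))


A = 0.2 * np.ones((4, 4))                   # assumed pre-scaled: largest eigenvalue is 0.8
params = np.array([0.05, 0.1, 0.02, 0.01])
weights = np.exp(-params)                   # assumed weight form w_k = exp(-theta_k)

print(h_reparametrized_sketch(h, [1, 0, 0, 0], A, weights))
# With all weights equal to one, the reparametrized value reduces to h(n).
print(h_reparametrized_sketch(h, [1, 0, 0, 0], A, np.ones(4)))
```

With unit weights both the determinant ratio and the product of weights equal one, so the function returns $$h(\bar{n})$$ unchanged, which is a quick sanity check on any implementation of this formula.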

Example usage:

>>> sample = [1, 1, 0, 0]
>>> cost.h_reparametrized(sample, params)
-1.6688383062813434

Parameters
• sample (array) – the GBS sample $$\bar{n}$$, given as the number of photons (or clicks) in each mode

• params (array) – the trainable parameters $$\theta$$

Returns

the value of the reparametrized function $$h(\bar{n}, \theta)$$ for the given sample and trainable parameters

Return type

float