sf.apps.train.Stochastic
class Stochastic(h, vgbs)
Bases: object
Stochastic cost function given by averaging over samples from a trainable GBS distribution.
A stochastic optimization problem is defined with respect to a function \(h(\bar{n})\) that assigns a cost to an input sample \(\bar{n}\). The cost function is the average of \(h(\bar{n})\) over samples generated from a parametrized distribution \(P_{\theta}(\bar{n})\):
\[C (\theta) = \sum_{\bar{n}} h(\bar{n}) P_{\theta}(\bar{n})\]
The cost function \(C (\theta)\) can then be optimized by varying \(P_{\theta}(\bar{n})\).
In this setting, \(P_{\theta}(\bar{n})\) is the variational GBS distribution and is specified in Stochastic by an instance of VGBS.
Example usage:
The function \(h(\bar{n})\) can be viewed as an energy. Clicks in odd-numbered modes decrease the total energy, while clicks in even-numbered modes increase it.
>>> import numpy as np
>>> from strawberryfields.apps import train
>>> from strawberryfields.apps.train import Stochastic
>>> embedding = train.embed.Exp(4)
>>> A = np.ones((4, 4))
>>> vgbs = train.VGBS(A, 3, embedding, threshold=True)
>>> h = lambda x: sum([x[i] * (-1) ** (i + 1) for i in range(4)])
>>> cost = Stochastic(h, vgbs)
>>> params = np.array([0.05, 0.1, 0.02, 0.01])
>>> cost.evaluate(params, 100)
0.03005489236683591
>>> cost.grad(params, 100)
array([ 0.10880756, -0.1247146 , 0.12426481, -0.13783342])
- Parameters
h (callable) – a function that assigns a cost to an input sample
vgbs (train.VGBS) – the trainable GBS distribution, which must be an instance of VGBS
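The definition of \(C(\theta)\) as an average of \(h(\bar{n})\) can be illustrated without a GBS simulator. The sketch below is only a toy: it replaces \(P_{\theta}(\bar{n})\) with a hypothetical distribution in which each mode clicks independently (the names `toy_prob`, `exact_cost` and `sampled_cost` are invented for this illustration and are not part of the library), then compares the exact sum over all samples with a Monte Carlo average.

```python
import itertools
import random


def h(sample):
    # Same energy-like cost as in the class example: clicks in odd-numbered
    # modes (1, 3, ...) lower the cost, clicks in even-numbered modes raise it.
    return sum(n * (-1) ** (i + 1) for i, n in enumerate(sample))


def toy_prob(sample, click_probs):
    # Hypothetical stand-in for P_theta: each mode clicks independently.
    # This is NOT a GBS distribution, only something small enough to enumerate.
    p = 1.0
    for n, q in zip(sample, click_probs):
        p *= q if n == 1 else 1.0 - q
    return p


def exact_cost(click_probs):
    # C = sum over all samples of h(n) * P(n)
    all_samples = itertools.product([0, 1], repeat=len(click_probs))
    return sum(h(s) * toy_prob(s, click_probs) for s in all_samples)


def sampled_cost(click_probs, n_samples, rng):
    # Monte Carlo estimate: average h over samples drawn from the toy distribution
    total = 0.0
    for _ in range(n_samples):
        sample = [1 if rng.random() < q else 0 for q in click_probs]
        total += h(sample)
    return total / n_samples


click_probs = [0.5, 0.2, 0.4, 0.1]
exact = exact_cost(click_probs)
estimate = sampled_cost(click_probs, 20000, random.Random(0))
print(exact, estimate)
```

The sampled value approaches the exact sum as the number of samples grows, which is the sense in which the cost is stochastic.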
Methods
evaluate(params, n_samples) – Evaluates the cost function.
grad(params, n_samples) – Evaluates the gradient of the cost function.
h_reparametrized(sample, params) – Include trainable parameters in the \(h(\bar{n})\) function to allow sampling from the initial adjacency matrix.
evaluate(params, n_samples)
Evaluates the cost function.
The cost function can be evaluated by finding its average over samples generated from the VGBS system using the trainable parameters \(\theta\):
\[C (\theta) = \sum_{\bar{n}} h(\bar{n}) P_{\theta}(\bar{n})\]
Alternatively, the cost function can be evaluated by finding a different average over samples from the input adjacency matrix to the VGBS system:
\[C (\theta) = \sum_{\bar{n}} h(\bar{n}, \theta) P(\bar{n})\]
where \(h(\bar{n}, \theta)\) is given in h_reparametrized() and now contains the trainable parameters, and \(P(\bar{n})\) is the GBS distribution defined by the input adjacency matrix. The advantage of this alternative approach is that we do not need to keep regenerating samples for an updated adjacency matrix and can instead use a fixed set of samples.
The second approach above is used in Stochastic to speed up evaluation of the cost function and its gradient. This is done by approximating the cost function using a single fixed set of samples. The samples can be pre-loaded into the VGBS class or generated once upon the first call of either Stochastic.evaluate() or Stochastic.grad().
Example usage:
>>> cost.evaluate(params, 100)
0.03005489236683591
- Parameters
params (array) – the trainable parameters \(\theta\)
n_samples (int) – the number of GBS samples used to average the cost function
- Returns
the value of the stochastic cost function
- Return type
float
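The benefit of the second form of \(C(\theta)\) is that one set of samples can serve many parameter values. The sketch below shows the idea on a toy model, not on GBS: samples are drawn once from a fixed base distribution, and \(h\) is multiplied by the probability ratio between the parametrized and base distributions, which plays the role that \(h(\bar{n}, \theta)\) plays here. The independent-click model and the names `prob`, `h_reweighted` and `cost_from_fixed_samples` are assumptions made for the illustration.

```python
import random


def h(sample):
    # Energy-like cost: odd-numbered modes lower it, even-numbered modes raise it
    return sum(n * (-1) ** (i + 1) for i, n in enumerate(sample))


def prob(sample, click_probs):
    # Hypothetical toy distribution with independent clicks (not GBS)
    p = 1.0
    for n, q in zip(sample, click_probs):
        p *= q if n == 1 else 1.0 - q
    return p


def h_reweighted(sample, q_theta, q_base):
    # Toy analogue of h(n, theta): h(n) times the ratio P_theta(n) / P(n)
    return h(sample) * prob(sample, q_theta) / prob(sample, q_base)


# Draw the samples ONCE from the base distribution P
rng = random.Random(1)
q_base = [0.5, 0.5, 0.5, 0.5]
fixed_samples = [
    [1 if rng.random() < q else 0 for q in q_base] for _ in range(20000)
]


def cost_from_fixed_samples(q_theta):
    # Average of the reweighted cost over the same fixed samples
    values = [h_reweighted(s, q_theta, q_base) for s in fixed_samples]
    return sum(values) / len(values)


# Two different parameter settings, evaluated without drawing new samples
cost_a = cost_from_fixed_samples([0.6, 0.4, 0.5, 0.3])  # exact value: -0.4
cost_b = cost_from_fixed_samples([0.4, 0.6, 0.3, 0.5])  # exact value: 0.4
print(cost_a, cost_b)
```

The trade-off is the usual one for reweighting: the estimate becomes noisier as the parametrized distribution moves away from the one the samples were drawn from.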
grad(params, n_samples)
Evaluates the gradient of the cost function.
As shown in this paper, the gradient can be evaluated by finding an average over samples generated from the input adjacency matrix to the VGBS system:
\[\partial_{\theta} C (\theta) = \sum_{\bar{n}} h(\bar{n}, \theta) P(\bar{n}) \sum_{k=1}^{m} (n_k - \langle n_{k} \rangle) \partial_{\theta} \log w_{k}\]
where \(h(\bar{n}, \theta)\) is given in h_reparametrized(), \(P(\bar{n})\) is the GBS distribution defined by the input adjacency matrix, \(n_{k}\) is the number of photons in mode \(k\), and \(w_{k}\) are the weights in the VGBS system.
This method approximates the gradient using a fixed set of samples from the initial adjacency matrix. The samples can be pre-loaded into the VGBS class or generated once upon the first call of Stochastic.evaluate() or Stochastic.grad().
Example usage:
>>> cost.grad(params, 100)
array([ 0.10880756, -0.1247146 , 0.12426481, -0.13783342])
- Parameters
params (array) – the trainable parameters \(\theta\)
n_samples (int) – the number of GBS samples used in the gradient estimation
- Returns
the gradient vector
- Return type
array
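The structure of the gradient formula can be checked numerically on a small toy model. The sketch below is not the library's implementation: it assumes a distribution of the form \(P_{\theta}(\bar{n}) \propto P(\bar{n}) \prod_k w_k^{n_k}\) over three binary modes, assumes exponential weights \(w_k = \exp(-\theta_k)\), and takes \(\langle n_k \rangle\) to be the mean under the \(\theta\)-dependent distribution. Under those assumptions the formula is evaluated exactly by enumeration and compared with a finite-difference gradient.

```python
import itertools

import numpy as np

m = 3
states = np.array(list(itertools.product([0, 1], repeat=m)))  # every sample n
base_p = np.full(len(states), 1.0 / len(states))              # toy P(n), uniform
signs = np.array([(-1) ** (k + 1) for k in range(m)])
h_vals = states @ signs                                        # h(n) for every sample


def tilted(theta):
    # Toy P_theta(n), proportional to P(n) * prod_k w_k^{n_k}
    w = np.exp(-theta)  # assumed exponential weights
    unnorm = base_p * np.prod(w ** states, axis=1)
    return unnorm / unnorm.sum()


def cost(theta):
    # C(theta) = sum_n h(n) P_theta(n)
    return float(h_vals @ tilted(theta))


def grad(theta):
    p_theta = tilted(theta)
    h_reparam = h_vals * p_theta / base_p   # toy analogue of h(n, theta)
    mean_n = p_theta @ states               # <n_k> under P_theta
    dlogw = -np.eye(m)                      # dlogw[j, k] = d log w_k / d theta_j
    inner = (states - mean_n) @ dlogw.T     # sum_k (n_k - <n_k>) d log w_k
    return base_p @ (h_reparam[:, None] * inner)


def finite_difference(theta, eps=1e-6):
    # Central differences of the exact cost, used only as a cross-check
    out = np.zeros(m)
    for j in range(m):
        step = np.zeros(m)
        step[j] = eps
        out[j] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return out


theta = np.array([0.05, 0.1, 0.02])
g = grad(theta)
g_fd = finite_difference(theta)
print(g, g_fd)
```

In practice the sum over all samples is replaced by an average over a finite set of samples, so the result fluctuates from one sample set to another.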
h_reparametrized(sample, params)
Include trainable parameters in the \(h(\bar{n})\) function to allow sampling from the initial adjacency matrix.
The reparametrized function can be written in terms of \(h(\bar{n})\) as:
\[h(\bar{n}, \theta) = h(\bar{n}) \sqrt{\frac{\det (\mathbb{I} - A(\theta)^{2})} {\det (\mathbb{I} - A^{2})}} \prod_{k=1}^{m}w_{k}^{n_{k}},\]
where \(w_{k}\) is the \(\theta\)-dependent weight on the \(k\)-th mode in the VGBS system and \(n_{k}\) is the number of photons in mode \(k\).
Example usage:
>>> sample = [1, 1, 0, 0]
>>> cost.h_reparametrized(sample, params)
-1.6688383062813434
- Parameters
sample (array) – the sample
params (array) – the trainable parameters \(\theta\)
- Returns
the cost function with respect to a given sample and set of trainable parameters
- Return type
float
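The formula for \(h(\bar{n}, \theta)\) can be evaluated directly with NumPy once \(A\) and the weights are known. The sketch below is a hedged illustration, not the method's source: it assumes \(A(\theta) = W A W\) with \(W = \mathrm{diag}(\sqrt{w_k})\), assumes exponential weights \(w_k = \exp(-\theta_k)\), and uses a hypothetical matrix \(A\) that is already scaled so its eigenvalues lie inside \((-1, 1)\). The function name `h_reparametrized_sketch` is invented here, and any rescaling of the input matrix that the library performs is not reproduced.

```python
import numpy as np


def h(sample):
    # Energy-like cost: odd-numbered modes lower it, even-numbered modes raise it
    return sum(n * (-1) ** (i + 1) for i, n in enumerate(sample))


def h_reparametrized_sketch(h_func, sample, A, w):
    sample = np.asarray(sample)
    w = np.asarray(w, dtype=float)
    # Assumed parametrization: A(theta) = W A W with W = diag(sqrt(w))
    W = np.diag(np.sqrt(w))
    A_theta = W @ A @ W
    identity = np.eye(len(A))
    # Ratio of determinants det(I - A(theta)^2) / det(I - A^2)
    det_ratio = np.linalg.det(identity - A_theta @ A_theta) / np.linalg.det(
        identity - A @ A
    )
    # h(n) * sqrt(det ratio) * prod_k w_k^{n_k}
    return h_func(sample) * np.sqrt(det_ratio) * np.prod(w ** sample)


# Hypothetical 4-mode matrix, scaled so that I - A^2 is positive definite
A = 0.2 * np.ones((4, 4))
theta = np.array([0.05, 0.1, 0.02, 0.01])
w = np.exp(-theta)  # assumed exponential weights
sample = [1, 0, 1, 0]
value = h_reparametrized_sketch(h, sample, A, w)
print(value)
```

When every weight equals one, \(A(\theta) = A\) and the function reduces to \(h(\bar{n})\), which is a convenient sanity check.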