Quantum neural network

“Neural networks are not black boxes. They are a big pile of linear algebra.” - Randall Munroe, xkcd

Machine learning offers a wide range of models for tasks such as classification, regression, and clustering. Neural networks are among the most successful of these models, having experienced a resurgence in use over the past decade due to improvements in computational power and advanced software libraries. The typical structure of a neural network consists of a series of interacting layers that perform transformations on data passing through the network. An archetypal neural network structure is the feedforward neural network, visualized by the following example:


../_images/neural_network.svg


Here, the neural network depth is determined by the number of layers, while the maximum width is given by the layer with the greatest number of neurons. The network begins with an input layer of real-valued neurons, which feed forward onto a series of one or more hidden layers. Following the notation of [35], if the \(n\) neurons at one layer are given by the vector \(\mathbf{x} \in \mathbb{R}^{n}\), the \(m\) neurons of the next layer take the values

\[\mathcal{L}(\mathbf{x}) = \varphi (W \mathbf{x} + \mathbf{b}),\]

where

  • \(W \in \mathbb{R}^{m \times n}\) is a matrix,
  • \(\mathbf{b} \in \mathbb{R}^{m}\) is a vector, and
  • \(\varphi\) is a nonlinear function (also known as the activation function).

The matrix multiplication \(W \mathbf{x}\) is a linear transformation on \(\mathbf{x}\), while \(W \mathbf{x} + \mathbf{b}\) is an affine transformation. In principle, any nonlinear function can be chosen for \(\varphi\), but the choice is usually made from a standard set of activations that includes the rectified linear unit (ReLU) and the sigmoid function, applied elementwise to each neuron. Finally, the output layer enacts an affine transformation on the last hidden layer, followed by an activation function that may be linear (including the identity) or a different nonlinear function such as softmax (for classification).
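
For concreteness, a minimal NumPy sketch of such a fully connected layer is shown below; the sigmoid activation and the layer sizes are illustrative choices rather than anything prescribed by the text:

import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation."""
    return 1 / (1 + np.exp(-z))

def dense_layer(x, W, b, phi=sigmoid):
    """Classical fully connected layer: phi(W x + b)."""
    return phi(W @ x + b)

# example: a layer mapping 3 input neurons to 2 output neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # weight matrix in R^(2x3)
b = rng.normal(size=2)       # bias vector in R^2
x = rng.normal(size=3)       # input neurons

print(dense_layer(x, W, b))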

Layers in the feedforward neural network above are called fully connected as every neuron in a given hidden layer or output layer can be connected to all neurons in the previous layer through the matrix \(W\). Over time, specialized versions of layers have been developed to focus on different problems. For example, convolutional layers have a restricted form of connectivity and are suited to machine learning with images. We focus here on fully connected layers as the most general type.

Neural networks are trained using variations of the gradient descent algorithm, applied to a cost function that quantifies how well the outputs of the neural network match the training data. The gradient of the cost function can be calculated using automatic differentiation, given knowledge of the feedforward network structure.
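
As an illustration, the following sketch performs a single gradient descent step on a mean squared error cost for the sigmoid layer above, with the gradient written out by hand rather than obtained through automatic differentiation; the cost function, data, and learning rate are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step(x, y, W, b, lr=0.1):
    """One gradient descent step on the cost 0.5 * ||sigmoid(W x + b) - y||^2."""
    z = W @ x + b
    a = sigmoid(z)
    # chain rule: dC/dz = (a - y) * sigma'(z), with sigma'(z) = a * (1 - a)
    delta = (a - y) * a * (1 - a)
    grad_W = np.outer(delta, x)  # dC/dW
    grad_b = delta               # dC/db
    return W - lr * grad_W, b - lr * grad_b

# one update on a single training example
rng = np.random.default_rng(1)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)
x, y = rng.normal(size=3), np.array([0.0, 1.0])
W, b = gradient_step(x, y, W, b)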

Quantum neural networks aim to encode neural networks into a quantum system, with the intention of benefiting from quantum information processing. There have been numerous attempts to define a quantum neural network, each with varying advantages and disadvantages. The quantum neural network detailed below, following the work of [35], has a CV architecture and is realized using standard CV gates from Strawberry Fields. One advantage of this CV architecture is that it naturally accommodates the continuous nature of neural networks. Additionally, the CV model can apply nonlinear transformations directly using the phase space picture, a task that qubit-based models struggle with, often relying on measurement postselection with a nonzero probability of failure.

CV implementation

A CV quantum neural network layer can be defined as

\[\mathcal{L} := \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1},\]

where

  • \(\mathcal{U}_{k}=U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})\) is an \(N\)-mode interferometer,
  • \(\mathcal{D}=\otimes_{i=1}^{N}D(\alpha_{i})\) applies a single-mode displacement gate (Dgate) to each mode, with complex displacement \(\alpha_{i} \in \mathbb{C}\),
  • \(\mathcal{S}=\otimes_{i=1}^{N}S(r_{i})\) applies a single-mode squeezing gate (Sgate) to each mode, with squeezing parameter \(r_{i} \in \mathbb{R}\), and
  • \(\Phi=\otimes_{i=1}^{N}\Phi(\lambda_{i})\) applies a non-Gaussian gate to each mode, with parameter \(\lambda_{i} \in \mathbb{R}\).

Note

Any non-Gaussian gate, such as the cubic phase gate (Vgate), is a valid choice, but we recommend the Kerr gate (Kgate) for simulations in Strawberry Fields. The Kerr gate is more numerically accurate because it is diagonal in the Fock basis.
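
As a quick illustration, either gate can be applied mode by mode in Blackbird code; the parameter value below is a placeholder:

import strawberryfields as sf
from strawberryfields.ops import Kgate, Vgate

prog = sf.Program(1)

with prog.context as q:
    Kgate(0.1) | q[0]    # Kerr gate: diagonal in the Fock basis (recommended)
    # Vgate(0.1) | q[0]  # cubic phase gate: an alternative non-Gaussian choice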

The layer is shown below as a circuit:


../_images/layer.svg


These layers can then be composed to form a quantum neural network. The width of the network can also be varied between layers [35].

Reproducing classical neural networks

Let’s see how the quantum layer can embed the transformation \(\mathcal{L}(\mathbf{x}) = \varphi (W \mathbf{x} + \mathbf{b})\) of a classical neural network layer. Suppose \(N\)-dimensional data is encoded in position eigenstates so that

\[\mathbf{x} \Leftrightarrow \ket{\mathbf{x}} := \ket{x_{1}} \otimes \ldots \otimes \ket{x_{N}}.\]

We want to perform the transformation

\[\ket{\mathbf{x}} \Rightarrow \ket{\varphi (W \mathbf{x} + \mathbf{b})}.\]

It turns out that the quantum circuit above can do precisely this! Consider first the affine transformation \(W \mathbf{x} + \mathbf{b}\). Leveraging the singular value decomposition, we can always write \(W = O_{2} \Sigma O_{1}\) with \(O_{k}\) orthogonal matrices and \(\Sigma\) a positive diagonal matrix. These orthogonal transformations can be carried out using interferometers without access to phase, i.e., with \(\boldsymbol{\phi}_{k} = 0\):

\[U_{k}(\boldsymbol{\theta}_{k},\mathbf{0})\ket{\mathbf{x}} = \ket{O_{k} \mathbf{x}}.\]

On the other hand, the diagonal matrix \(\Sigma = {\rm diag}\left(\{c_{i}\}_{i=1}^{N}\right)\) can be achieved through squeezing:

\[\otimes_{i=1}^{N}S(r_{i})\ket{\mathbf{x}} \propto \ket{\Sigma \mathbf{x}},\]

with \(r_{i} = \log (c_{i})\). Finally, the addition of a bias vector \(\mathbf{b}\) is done using position displacement gates:

\[\otimes_{i=1}^{N}D(\alpha_{i})\ket{\mathbf{x}} = \ket{\mathbf{x} + \mathbf{b}},\]

with \(\mathbf{b} = \{\alpha_{i}\}_{i=1}^{N}\) and \(\alpha_{i} \in \mathbb{R}\). Putting this all together, we see that the operation \(\mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}\) with phaseless interferometers and position displacement performs the transformation \(\ket{\mathbf{x}} \Rightarrow \ket{W \mathbf{x} + \mathbf{b}}\) on position eigenstates.
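
The decomposition used in this argument is easy to verify numerically. The following sketch, using plain NumPy and an arbitrary example matrix, splits \(W\) into two orthogonal matrices and a positive diagonal matrix, and recovers the squeezing parameters \(r_{i} = \log(c_{i})\):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))  # an arbitrary real weight matrix

# singular value decomposition: W = O2 @ Sigma @ O1
O2, c, O1 = np.linalg.svd(W)
Sigma = np.diag(c)

assert np.allclose(W, O2 @ Sigma @ O1)    # W = O2 Sigma O1
assert np.allclose(O1 @ O1.T, np.eye(3))  # O1 (and O2) are orthogonal
r = np.log(c)                             # squeezing parameters r_i = log(c_i)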

Warning

The TensorFlow backend is the natural simulator for quantum neural networks in Strawberry Fields, but it cannot natively accommodate position eigenstates, which require infinite squeezing. To simulate position eigenstates in this backend, the best approach is to use a displaced squeezed state (prepare_displaced_squeezed_state) with a high squeezing value r. However, to avoid significant numerical error, it is important to make sure that all initial states have negligible amplitude for Fock states \(\ket{n}\) with \(n\geq \texttt{cutoff_dim}\), where \(\texttt{cutoff_dim}\) is the cutoff dimension of the simulation.
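
A minimal sketch of this kind of check is shown below. For simplicity it uses the Fock backend and approximates a position eigenstate by squeezing and then position-displacing the vacuum with gates, rather than using a direct state preparation; the squeezing value, displacement, and cutoff are illustrative:

import strawberryfields as sf
from strawberryfields.ops import Sgate, Xgate

cutoff = 25
prog = sf.Program(1)

with prog.context as q:
    Sgate(2.0) | q[0]  # finite squeezing in place of infinite squeezing
    Xgate(1.0) | q[0]  # position displacement

eng = sf.Engine("fock", backend_options={"cutoff_dim": cutoff})
state = eng.run(prog).state

# a trace noticeably below 1 signals amplitude on Fock states beyond the cutoff
print("trace of truncated state:", state.trace())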

Finally, the nonlinear function \(\varphi\) can be achieved through a restricted type of non-Gaussian gates \(\otimes_{i=1}^{N}\Phi(\lambda_{i})\) acting on each mode (see [35] for more details), resulting in the transformation

\[\otimes_{i=1}^{N}\Phi(\lambda_{i})\ket{\mathbf{x}} = \ket{\varphi(\mathbf{x})}.\]

The operation \(\mathcal{L} = \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}\) with phaseless interferometers, position displacements, and restricted non-Gaussian gates can hence be seen as enacting a classical neural network layer \(\ket{\mathbf{x}} \Rightarrow \ket{\varphi(W \mathbf{x} + \mathbf{b})}\) on position eigenstates.

Extending to quantum neural networks

In fact, CV quantum neural network layers can be made more expressive than their classical counterparts. We can do this by lifting the above restrictions on \(\mathcal{L}\), i.e.:

  • Using arbitrary interferometers \(U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})\) with access to phase and general displacement gates (i.e., not necessarily position displacement). This allows \(\mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}\) to represent a general Gaussian operation.
  • Using arbitrary non-Gaussian gates \(\Phi(\lambda_{i})\), such as the Kerr gate.
  • Encoding data outside of the position eigenbasis, for example using instead the Fock basis.

Indeed, the gates in a single layer form a universal gate set, making the CV quantum neural network a model for universal quantum computing: a sufficient number of layers can carry out any quantum algorithm implementable on a CV quantum computer.

CV quantum neural networks can be trained both through classical simulation and directly on quantum hardware. Strawberry Fields relies on classical simulation to evaluate cost functions of the CV quantum neural network and the resulting gradients with respect to the parameters of each layer. However, this becomes intractable as the network depth and width increase. Ultimately, direct evaluation on hardware will likely be necessary for large-scale networks; an approach for hardware-based training is mapped out in [36]. The PennyLane library provides tools for training hybrid quantum-classical machine learning models, using both simulators and real-world quantum hardware.

Example CV quantum neural network layers for one to four modes are shown below:


../_images/layer_1mode.svg

One mode layer


../_images/layer_2mode.svg

Two mode layer


../_images/layer_3mode.svg

Three mode layer


../_images/layer_4mode.svg

Four mode layer


Here, the multimode linear interferometers \(U_{1}\) and \(U_{2}\) have been decomposed into two-mode beamsplitters (BSgate) and single-mode rotation gates (Rgate) using the Clements decomposition [6]. The Kerr gate is used as the non-Gaussian gate.

Blackbird code

The first step to writing a CV quantum neural network layer in Blackbird code is to define a function for the two interferometers:

# imports used by the functions defined below
import tensorflow as tf
from strawberryfields.ops import BSgate, Dgate, Kgate, Rgate, Sgate


def interferometer(theta, phi, rphi, q):
    """Parameterised interferometer acting on N qumodes

    Args:
        theta (list): list of length N(N-1)/2 real parameters
        phi (list): list of length N(N-1)/2 real parameters
        rphi (list): list of length N-1 real parameters
        q (list): list of qumodes the interferometer is to be applied to
    """
    N = len(q)

    if N == 1:
        # the interferometer is a single rotation
        Rgate(rphi[0]) | q[0]
        return

    n = 0  # keep track of free parameters

    # Apply the rectangular beamsplitter array
    # The array depth is N
    for l in range(N):
        for k, (q1, q2) in enumerate(zip(q[:-1], q[1:])):
            # skip even or odd pairs depending on layer
            if (l + k) % 2 != 1:
                BSgate(theta[n], phi[n]) | (q1, q2)
                n += 1

    # apply the final local phase shifts to all modes except the last one
    for i in range(max(1, N - 1)):
        Rgate(rphi[i]) | q[i]

Warning

The Interferometer class in Strawberry Fields does not reproduce the functionality above. Instead, Interferometer applies a given input unitary matrix according to the Clements decomposition.
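
For comparison, a minimal sketch of using the Interferometer class with an explicit unitary is shown below; the random unitary generated with SciPy is purely for illustration:

import strawberryfields as sf
from strawberryfields.ops import Interferometer
from scipy.stats import unitary_group

U = unitary_group.rvs(4)  # a random 4x4 unitary matrix
prog = sf.Program(4)

with prog.context as q:
    # decomposes U into beamsplitters and rotations via the Clements scheme
    Interferometer(U) | q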

Using the above interferometer function, an \(N\) mode CV quantum neural network layer is given by the function:

def layer(q):
    """CV quantum neural network layer acting on N modes

    Args:
        q (list): list of qumodes the layer is to be applied to
    """
    N = len(q)
    BS_variable_number = int(N * (N - 1) / 2)
    R_variable_number = max(1, N - 1)

    # interferometer parameters, created as TensorFlow variables
    # (tf.random_normal is the TF 1.x API; in TF 2.x this is tf.random.normal)
    theta_variables_1 = tf.Variable(tf.random_normal(shape=[BS_variable_number]))
    phi_variables_1 = tf.Variable(tf.random_normal(shape=[BS_variable_number]))
    rphi_variables_1 = tf.Variable(tf.random_normal(shape=[R_variable_number]))

    theta_variables_2 = tf.Variable(tf.random_normal(shape=[BS_variable_number]))
    phi_variables_2 = tf.Variable(tf.random_normal(shape=[BS_variable_number]))
    rphi_variables_2 = tf.Variable(tf.random_normal(shape=[R_variable_number]))

    # squeezing, displacement magnitude, and Kerr parameters start close to zero
    s_variables = tf.Variable(tf.random_normal(shape=[N], stddev=0.0001))
    d_variables_r = tf.Variable(tf.random_normal(shape=[N], stddev=0.0001))
    d_variables_phi = tf.Variable(tf.random_normal(shape=[N]))
    k_variables = tf.Variable(tf.random_normal(shape=[N], stddev=0.0001))

    # begin layer
    interferometer(theta_variables_1, phi_variables_1, rphi_variables_1, q)

    for i in range(N):
        Sgate(s_variables[i]) | q[i]

    interferometer(theta_variables_2, phi_variables_2, rphi_variables_2, q)

    for i in range(N):
        Dgate(d_variables_r[i], d_variables_phi[i]) | q[i]
        Kgate(k_variables[i]) | q[i]

The variables fed into the gates of the layer are defined as TensorFlow variables. Multiple layers can then be joined into a network using:

with prog.context as q:
    for _ in range(layers):
        layer(q)

Note

A fully functional Strawberry Fields simulation containing the above Blackbird code for state preparation is included at examples/quantum_neural_network.py.

Applications of CV quantum neural networks to state learning and gate synthesis can be found in the Strawberry Fields gallery.