Pre-generated GBS datasets

Technical details are available in the API documentation: sf.apps.data

Strawberry Fields provides datasets of pre-generated GBS samples for a set of encoded problems: graphs for graph optimization and machine learning, and molecules for calculating vibronic spectra.

Graphs

For dense subgraph and maximum clique identification, we provide:

Figures: the Planted and TACE-AS graphs.

Planted()

A random 30-node graph with a dense 10-node subgraph planted inside [44].

TaceAs()

Binding interaction graph for the TACE-AS complex [40].
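To make the idea of a planted subgraph concrete, the following is a minimal sketch of how such a graph could be built. It is not the procedure used to generate the Planted dataset: the function name planted_graph, the edge probabilities and the choice of placing the dense subgraph on the first 10 nodes are all assumptions made for illustration.

```python
import random


def planted_graph(n_nodes=30, n_planted=10, p_background=0.2, p_planted=0.9, seed=0):
    """Return a symmetric 0/1 adjacency matrix with a dense planted subgraph.

    The first ``n_planted`` nodes form the planted subgraph. All default
    values are illustrative assumptions, not the dataset's parameters.
    """
    rng = random.Random(seed)
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Edges inside the planted block are drawn with a higher probability.
            both_planted = i < n_planted and j < n_planted
            p = p_planted if both_planted else p_background
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected graph: keep the matrix symmetric
    return adj


adj = planted_graph()
planted_edges = sum(adj[i][j] for i in range(10) for j in range(i + 1, 10))
print(len(adj), planted_edges)  # number of nodes, edges inside the planted block
```

A dense-subgraph search run on such a graph would ideally recover the first 10 nodes, which is the kind of task the Planted dataset is intended for.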

For graph similarity, we provide:

Figures: the MUTAG_0, MUTAG_1, MUTAG_2 and MUTAG_3 graphs.

Mutag0()

First graph of the MUTAG dataset.

Mutag1()

Second graph of the MUTAG dataset.

Mutag2()

Third graph of the MUTAG dataset.

Mutag3()

Fourth graph of the MUTAG dataset.

Molecules

GBS data for formic acid at zero temperature has been generated using the vibronic module and its vibronic() function. The GBS samples can be used to recover the vibronic spectrum of the molecule.

Formic()

Zero temperature formic acid.

Dataset

The Dataset class provides the base functionality from which all datasets inherit.

Each dataset contains a variety of metadata relevant to the sampling:

  • n_mean: theoretical mean number of photons in the GBS device

  • threshold: flag to indicate whether samples are generated with threshold detection or with photon-number-resolving detectors

  • n_samples: total number of samples in the dataset

  • modes: number of modes in the GBS device or, equivalently, number of nodes in the graph

  • data: the raw data, accessible as a SciPy CSR sparse array

Graph and molecule datasets also contain some specific data, such as the graph adjacency matrix or the input molecular information.

Note that datasets are simulated without photon loss.
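The difference between the two detection models named by the threshold flag can be illustrated with a short sketch. A photon-number-resolving detector reports how many photons arrive in each mode, while a threshold detector only reports whether at least one photon arrived. The helper to_threshold and the sample values below are hypothetical and are not part of the Dataset class.

```python
def to_threshold(sample):
    """Map photon-number-resolved counts to threshold clicks (0 or 1 per mode)."""
    return [1 if photons > 0 else 0 for photons in sample]


pnr_sample = [0, 2, 1, 0, 3]  # made-up photon counts in five modes
click_sample = to_threshold(pnr_sample)

print(sum(pnr_sample))    # total photons: 6
print(click_sample)       # [0, 1, 1, 0, 1]
print(sum(click_sample))  # total clicks: 3
```

Because several photons in one mode register as a single click, the number of clicks in a threshold sample never exceeds the number of photons in the corresponding photon-number-resolved sample.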

Loading data

We use the Planted class as an example to show how to interact with the datasets. Datasets can be loaded by running:

>>> from strawberryfields.apps.data import Planted
>>> data = Planted()

Simply use indexing and slicing to access samples from the dataset:

>>> sample_3 = data[3]
>>> samples = data[:10]

Datasets also contain metadata relevant to the GBS setup:

>>> data.n_mean
8
>>> len(data)
50000

The number of photons or clicks in each sample is available using the Dataset.counts() method:

>>> data.counts()
[2, 0, 8, 11, ... , 6]

For example, we see that the data[3] sample has 11 clicks.
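The per-sample count is the sum of the entries of each sample. The sketch below reproduces that calculation on a small made-up list of threshold samples and then uses it to keep only samples with a given number of clicks. Both functions are hypothetical stand-ins written in plain Python, not the implementation of Dataset.counts().

```python
def counts(samples):
    """Total number of photons (or clicks) in each sample."""
    return [sum(sample) for sample in samples]


def with_clicks(samples, n_clicks):
    """Keep only the samples whose total count equals ``n_clicks``."""
    return [sample for sample in samples if sum(sample) == n_clicks]


# Four made-up threshold samples on a four-mode device.
toy_samples = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]

print(counts(toy_samples))          # [2, 0, 3, 2]
print(with_clicks(toy_samples, 2))  # [[1, 0, 1, 0], [0, 1, 1, 0]]
```

Filtering by click number in this way is a common first step when only samples of a fixed size are of interest, for example when looking for subgraphs with a given number of nodes.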