GBS datasets
Technical details are available in the API documentation: sf.apps.data
Strawberry Fields contains datasets of pre-generated GBS samples for encoded problems. These include graphs for graph optimization and machine learning problems, and molecules for calculating vibronic spectra.
Graphs
For dense subgraph and maximum clique identification, we provide:
Planted: A random 30-node graph containing a dense 10-node subgraph planted inside [5].
TACE-AS: Binding interaction graph for the TACE-AS complex [9].
Random graph created using the p-hat generator of [4].
For graph similarity, we provide:
MUTAG_0: First graph of the MUTAG dataset.
MUTAG_1: Second graph of the MUTAG dataset.
MUTAG_2: Third graph of the MUTAG dataset.
MUTAG_3: Fourth graph of the MUTAG dataset.
Additionally, pre-calculated feature vectors of the following graph datasets are provided:

Exactly calculated feature vectors of the 188 graphs in the MUTAG dataset.
Exactly calculated feature vectors of 1100 randomly chosen molecules from the QM9 dataset.
Monte Carlo-estimated feature vectors of 1100 randomly chosen molecules from the QM9 dataset.
Molecules
Using the vibronic module and sample() function, GBS data has been generated for formic acid at zero temperature. The GBS samples can be used to recover the vibronic spectrum of the molecule.

Zero-temperature formic acid.
Vibrational dynamics of the water molecule.
Vibrational dynamics of the pyrrole molecule.
Dataset
The SampleDataset class provides the base functionality from which all datasets inherit.
Each dataset contains a variety of metadata relevant to the sampling:
n_mean: theoretical mean number of photons in the GBS device
threshold: flag to indicate whether samples are generated with threshold detection or with photon-number-resolving detectors
n_samples: total number of samples in the dataset
modes: number of modes in the GBS device or, equivalently, number of nodes in the graph
data: the raw data accessible as a SciPy csr sparse array
Graph and molecule datasets also contain some specific data, such as the graph adjacency matrix or the input molecular information.
Note that datasets are simulated without photon loss.
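To make the threshold flag and the mode-to-node correspondence concrete, here is a minimal pure-Python sketch. The helper names to_threshold and clicked_modes, and the sample itself, are made up for illustration and are not part of Strawberry Fields. The sketch assumes the usual meaning of threshold detection: a mode registers a click if it detects one or more photons.

```python
# Toy illustration only: these helpers are hypothetical and are not
# part of Strawberry Fields.

def to_threshold(sample):
    """Map photon numbers to clicks: 1 if a mode saw any photon, else 0."""
    return [1 if n > 0 else 0 for n in sample]


def clicked_modes(sample):
    """Return the indices of modes (graph nodes) that registered a detection."""
    return [mode for mode, n in enumerate(sample) if n > 0]


# A made-up photon-number-resolved sample from a 6-mode device.
pnr_sample = [0, 2, 1, 0, 0, 3]

threshold_sample = to_threshold(pnr_sample)
nodes = clicked_modes(pnr_sample)

print(threshold_sample)       # [0, 1, 1, 0, 0, 1]
print(nodes)                  # [1, 2, 5]
print(sum(pnr_sample))        # 6 photons in total
print(sum(threshold_sample))  # 3 clicks in total
```

For a graph dataset, the clicked modes of a sample can be read as a selection of nodes, which is why the number of modes equals the number of nodes in the graph.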
Loading data
We use the Planted class as an example to show how to interact with the datasets. Datasets can be loaded by running:
>>> from strawberryfields.apps.data import Planted
>>> data = Planted()
Simply use indexing and slicing to access samples from the dataset:
>>> sample_3 = data[3]
>>> samples = data[:10]
Datasets also contain metadata relevant to the GBS setup:
>>> data.n_mean
8
>>> len(data)
50000
The number of photons or clicks in each sample is available using the counts() method:
>>> data.counts()
[2, 0, 8, 11, ... , 6]
For example, we see that the data[3] sample has 11 clicks.
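The per-sample counts can be pictured as the sum of each sample's entries, which also makes it easy to keep only samples with a given number of clicks. The sketch below uses a small made-up list of threshold samples in place of a real dataset. The functions count_clicks and keep_by_count are hypothetical helpers written for this example, not the library's implementation.

```python
# Toy stand-in for a dataset: made-up threshold samples, not real GBS data.
toy_samples = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
]


def count_clicks(samples):
    """Return the total number of clicks (or photons) in each sample."""
    return [sum(sample) for sample in samples]


def keep_by_count(samples, min_count, max_count):
    """Keep only samples whose total count lies in [min_count, max_count]."""
    return [s for s in samples if min_count <= sum(s) <= max_count]


toy_counts = count_clicks(toy_samples)
print(toy_counts)     # [2, 0, 4, 2]
print(toy_counts[2])  # 4, the count of toy_samples[2]

# Keep only the samples with exactly two clicks.
two_click_samples = keep_by_count(toy_samples, 2, 2)
print(two_click_samples)  # [[1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
```

Indexing the list of counts with the same index as a sample gives that sample's count, in the same way that entry 3 of data.counts() corresponds to data[3].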