Quantum operations
The 1Q and 2Q gates are implemented with pairs of 515-nm laser beams whose frequency difference matches the approximately 8.04-GHz qubit splitting. The 1Q gates, \(U_{1\mathrm{Q}}(\theta,\phi)=\mathrm{e}^{-\mathrm{i}\theta(\cos\phi X+\sin\phi Y)/2}\), are implemented with co-propagating laser beams, which improves the phase stability of the Raman interaction and minimizes sensitivity to the thermal motion of the ions. 1Q Z-rotations, \(R_Z(\theta)=\mathrm{e}^{-\mathrm{i}Z\theta/2}\), are implemented in software by phase tracking, with the accumulated phase applied to the next scheduled 1Q gate. The 2Q gates are implemented with beams intersecting the quantum logic zones at 90° to each other, such that the difference k-vector is parallel to the crystal axis (Fig. 2e). The 2Q gate protocol is based on the Mølmer–Sørensen interaction, using wrapper pulses to remove optical phase sensitivity21,77, and yields a native 2Q gate \(R_{ZZ}(\theta)=\mathrm{e}^{-\mathrm{i}ZZ\theta/2}\). The gate angle θ is specified by the user and is set by adjusting the detuning and duration of the gate. Gate infidelities have been shown to decrease for smaller angles22, but here we benchmark only the perfect entangler RZZ(π/2).
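The software phase tracking of Z-rotations rests on the identity \(U_{1\mathrm{Q}}(\theta,\phi)R_Z(a)=R_Z(a)U_{1\mathrm{Q}}(\theta,\phi-a)\): a Z-rotation can be pushed past a 1Q gate by shifting that gate's phase, and a leftover Z-rotation before a Z-basis readout (or next to the diagonal RZZ gate) has no observable effect. The numpy sketch below checks this numerically. The helper names (`u1q`, `rz`, `rzz`, `track_phases`) are illustrative and are not the Helios control software.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def u1q(theta, phi):
    """U_1Q(theta, phi) = exp(-i theta (cos(phi) X + sin(phi) Y) / 2)."""
    n_sigma = np.cos(phi) * X + np.sin(phi) * Y
    # (n.sigma)^2 = I, so the matrix exponential has this closed form.
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * n_sigma


def rz(theta):
    """R_Z(theta) = exp(-i Z theta / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def rzz(theta):
    """R_ZZ(theta) = exp(-i Z⊗Z theta / 2), diagonal in the computational basis."""
    zz = np.array([1.0, -1.0, -1.0, 1.0])
    return np.diag(np.exp(-1j * theta * zz / 2))


def track_phases(ops):
    """Apply ('rz', a) virtually and ('u1q', theta, phi) physically.

    Every Z rotation is pushed to the end of the sequence by shifting the
    phase of each later 1Q gate by the accumulated frame angle.
    Returns the physically applied unitary and the leftover frame angle.
    """
    frame = 0.0
    physical = I2.copy()
    for op in ops:
        if op[0] == "rz":
            frame += op[1]
        else:
            _, theta, phi = op
            physical = u1q(theta, phi - frame) @ physical
    return physical, frame


# Usage: compare with applying every gate, including the Z rotations, directly.
ops = [("rz", 0.3), ("u1q", 1.1, 0.4), ("rz", -0.7), ("u1q", 0.5, 2.0)]
ideal = I2.copy()
for op in ops:
    gate = rz(op[1]) if op[0] == "rz" else u1q(op[1], op[2])
    ideal = gate @ ideal
physical, frame = track_phases(ops)
# The leftover R_Z(frame) only adds phases, so a Z-basis readout cannot see it.
```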
SPAM is achieved in 137Ba+ with a combination of lasers at 493 nm, 614 nm, 650 nm and 1,762 nm, with state preparation accomplished by narrow-band optical pumping9,78. The 1,762-nm laser is locked to a narrow-linewidth cavity to enable high-fidelity mapping pulses between the S1/2 ground state and the D5/2 state (Extended Data Fig. 1). The standard measurement protocol first maps the |F = 1, m_F = 0⟩ qubit state to the D5/2 manifold with several π pulses to different levels in D5/2. The 493-nm and 650-nm lasers are then turned on to induce fluorescence from all S1/2 states. The 1,762-nm laser is also used to protect neighbouring qubits from measurement crosstalk errors (Extended Data Fig. 1b) and enables a ternary (three-outcome) measurement that detects leakage population (Extended Data Fig. 1c) without the use of ancillas or 2Q gates79,80,81.
The QCCD architecture relies on mid-circuit recooling of ions, achieved here by sympathetic cooling with 171Yb+ ions co-trapped with the 137Ba+ qubit ions. The 171Yb+ ion is chosen for its similar mass to 137Ba+ and for its established, straightforward methods for qubit control and state measurement82. The cooling is performed with lasers tuned near the S1/2 to P1/2 transition of 171Yb+ at 369 nm.
To load ions into the QCCD, we photoionize both species from cold atomic beams produced by an atomic source similar to ref. 22, based on a neutral atom magneto-optical trap83,84. Other hardware details, including implementation of all quantum operations, are described in the Supplementary Information.
The Helios runtime software
Many of the Guppy13 programs for the applications discussed in the section ‘Benchmarking’ use the features outlined in the section ‘Real-time compilation of sorting and gates’. Moreover, quantum error correction programs can dynamically allocate and de-allocate virtual ancilla qubits without tracking the physical mappings of those ancillas or the precise control flow of the quantum error correction program. Furthermore, any programming language that compiles to QIR85, such as Q#86, Qiskit87, OpenQASM 2.0/3.0 (refs. 88,89), Cirq90 and CUDA-Q91, can use QIR adaptive profile features to implement these control flow constructs for programs executing on Helios.
An example of high-level operations enabled by the Helios runtime is the ‘gate streaming’ used in ref. 37. In the Guppy program executed on Helios for this work, a section of the program performs a remote procedure call to a classical server that is separate from the control system but communicates with it through a networking interface92. The classical server transmits the measurement basis for each qubit to the control system. If a qubit needs no change of measurement basis, the runtime receives no 1Q gate to apply before measurement. If a whole row of BY or YB crystals on the top or bottom legs needs no basis change, the Helios runtime performs no extraneous transport to address these qubits. This reduces the overall shot time and therefore the latency, which is critical in that application. Efficient gate streaming would be impossible without the real-time identification of qubits provided by the runtime.
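The row-skipping decision described above can be sketched as a set computation: only rows containing at least one qubit that needs a basis change are visited. The crystal layout, the `(leg, row)` identifiers and the `"X"`/`"Y"`/`"Z"` basis labels are hypothetical and do not reflect the actual Helios runtime interface.

```python
def rows_needing_transport(row_of_qubit, basis_of_qubit):
    """Return the set of crystal rows that must be brought in for basis changes.

    row_of_qubit: dict qubit -> (leg, row), e.g. ("top", 3)   [hypothetical layout]
    basis_of_qubit: dict qubit -> "X", "Y" or "Z"; "Z" means the computational
        basis, so no 1Q gate (and hence no transport) is needed for that qubit.
    """
    return {row_of_qubit[q] for q, basis in basis_of_qubit.items() if basis != "Z"}


layout = {0: ("top", 0), 1: ("top", 0), 2: ("bottom", 1), 3: ("bottom", 1)}
rows = rows_needing_transport(layout, {0: "Z", 1: "Z", 2: "X", 3: "Z"})
# Only ("bottom", 1) needs a visit; the ("top", 0) row is skipped entirely.
```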
As mentioned in the section ‘Real-time compilation of sorting and gates’, the Helios runtime has four main responsibilities for programs executing on Helios. Responsibility (1) is performed by modelling the physical QPU state as the program runs and determining efficient mappings from virtual qubits to physical qubits. Regardless of the state of the trap when a qubit allocation request is made, a simple algorithm identifies the unallocated qubit closest to the quantum operation zone. If an unallocated qubit is in the quantum operation zone, it is used. Otherwise, the unallocated qubit in the storage ring closest to the junction is allocated. If no allocatable qubits are in the storage ring or quantum operation zone, all qubits are ‘flushed’ back into the storage ring and the unallocated qubit closest to the junction is then allocated.
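The three-step allocation rule can be sketched as follows. The `ToyTrapModel` class, its regions and the order in which flushed qubits re-enter the ring are hypothetical simplifications for illustration; the real runtime tracks a much richer model of the QPU state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrapModel:
    """Hypothetical model of where physical qubits sit when an allocation is requested.

    op_zone: qubits currently in the quantum operation zone.
    ring: qubits in the storage ring, ordered from closest to farthest from the junction.
    elsewhere: qubits in any other region (for example, the cache or legs).
    """
    op_zone: list
    ring: list
    elsewhere: list = field(default_factory=list)
    allocated: set = field(default_factory=set)

    def _take_first_free(self, candidates):
        for q in candidates:
            if q not in self.allocated:
                self.allocated.add(q)
                return q
        return None

    def flush(self):
        """Move every qubit back into the ring (toy assumption: they re-enter nearest the junction)."""
        self.ring = self.elsewhere + self.op_zone + self.ring
        self.op_zone, self.elsewhere = [], []

    def allocate(self):
        q = self._take_first_free(self.op_zone)    # 1. free qubit already in the operation zone
        if q is None:
            q = self._take_first_free(self.ring)   # 2. free ring qubit closest to the junction
        if q is None:
            self.flush()                           # 3. flush everything, then retry from the ring
            q = self._take_first_free(self.ring)
        if q is None:
            raise RuntimeError("no unallocated physical qubits")
        return q

    def free(self, q):
        """De-allocate a physical qubit so that it can be reused."""
        self.allocated.discard(q)


# Usage: qubit 7 is only reachable after a flush.
m = ToyTrapModel(op_zone=[5], ring=[1, 2], elsewhere=[7])
```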
Responsibilities (2) and (3) are performed by identifying which quantum logic operations can be done in parallel and storing them in sets contained in a data structure we refer to as a ‘slice’. Sequences of slices are accumulated into another data structure that drives the sorting needed to execute the quantum logic operations within each slice. Responsibility (4) is performed by an O(n) traversal over the ring storage to determine which two pairs in a slice have qubits closest to the cache. The runtime then assigns one pair to move to the top leg and the other to the bottom leg. The algorithm then determines the smallest number of rotations needed to move the two pairs into BYYB crystals in both legs, as visualized in Fig. 2. This process repeats until either enough pairs have been moved into the cache to fill a batch or no more pairs need to be sorted. Finally, the runtime dispatches the calculated sort as a queue of commands to lower-level control system software that performs the transport operations and parallelized cooling, as outlined in the section ‘QCCD operation’. After all of the quantum logic operations in a given slice have been executed through repetitions of this sorting procedure, transport is generated to return the qubits to the ring storage, and the sorting algorithm repeats for subsequent slices. For unconditional programs, whose execution does not depend on mid-program measurement results, these responsibilities are computed ahead of the physical execution of the operations on the quantum processor and thus add no overhead to the run time of a program. However, when mid-program measurements determine future quantum operations, submillisecond-scale latency can be added while the above responsibilities are computed for the next round of quantum operations, during which the qubit state is still live.
Sorting a single batch of qubits more efficiently on the basis of feed-forward quantum operations can save transport time on the scale of several milliseconds, and much larger savings are possible for programs with early-exit conditions.
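Responsibilities (2) to (4) can be illustrated with a toy sketch. Two choices here are our assumptions rather than the Helios algorithm: slices are built by greedy as-soon-as-possible layering of 2Q gates (1Q gates are ignored), and a pair's distance to the cache is scored by its farther qubit. Rotations into BYYB crystals and batching are omitted. `heapq.nsmallest` with k = 2 runs in O(n) time, matching the single traversal described above.

```python
import heapq


def build_slices(gates):
    """Greedy as-soon-as-possible layering of 2Q gates into slices.

    gates: list of (q1, q2) virtual-qubit pairs in program order.
    Each gate goes into the earliest slice after the last slice that touches
    either of its qubits, so every slice holds disjoint pairs and program
    order is respected for gates that share a qubit.
    """
    last = {}  # qubit -> index of the last slice that uses it
    slices = []
    for a, b in gates:
        k = max(last.get(a, -1), last.get(b, -1)) + 1
        if k == len(slices):
            slices.append([])
        slices[k].append((a, b))
        last[a] = last[b] = k
    return slices


def pick_pairs_for_legs(slice_pairs, distance_to_cache):
    """Choose the two pairs of a slice to move toward the top and bottom legs.

    Toy metric (assumption): a pair's cost is the ring distance of its farther
    qubit from the cache.
    """
    def cost(pair):
        return max(distance_to_cache[pair[0]], distance_to_cache[pair[1]])

    best = heapq.nsmallest(2, slice_pairs, key=cost)
    legs = {"top": best[0]} if best else {}
    if len(best) > 1:
        legs["bottom"] = best[1]
    return legs


# Usage: two slices, then the leg assignment for the first slice.
slices = build_slices([(0, 1), (2, 3), (1, 2), (4, 5)])
dist = {0: 5, 1: 6, 2: 1, 3: 2, 4: 9, 5: 0}
legs = pick_pairs_for_legs(slices[0], dist)
```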
Component-level benchmarks
SPAM
It is difficult to differentiate state preparation errors from measurement errors93, although from detailed modelling of 137Ba+ qubits, we expect state preparation errors to be the largest contributor9.
We measure SPAM errors by preparing 16 qubits in the eight operation zones in the |0⟩ or |1⟩ state and measuring each qubit. For any given shot, the state preparations are randomized among the qubits, but we ensure that each qubit is prepared in each state for the same total number of shots. We run two experiments: a standard measurement, which ideally differentiates |0⟩ from |1⟩ but falsely returns |1⟩ if the qubit has leaked, and a ternary measurement, shown in Extended Data Fig. 1c, which ideally differentiates |0⟩, |1⟩ and leaked states. For both experiments, we take 4,000 shots per state preparation.
For the standard measurement, we measure errors of 5.2(9) × 10−4 and 1.4(5) × 10−4 when preparing |0⟩ and |1⟩, respectively. Because this measurement protocol mistakenly detects leaked states as |1⟩, the reported error for preparing and measuring |1⟩ does not capture all errors9. For the ternary measurement, we find an average leakage probability of 8(3) × 10−4 and, in the absence of leakage, SPAM errors of 1.0(1) × 10−3 and 1.3(1) × 10−3 for |0⟩ and |1⟩, respectively. Although the ternary measurement reveals more information because it can detect leakage, it also has a larger SPAM error owing to the larger number of shelving pulses involved. The SPAM errors reported in Fig. 4a,b,i are averaged over the two state preparations. We deliberately trade off SPAM fidelity against MCMR crosstalk by reducing laser powers and detection times. Because SPAM is performed much less frequently than gating, it contributes less to the overall circuit error than its magnitude in Fig. 4a,b,i might suggest.
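The state-averaged SPAM error and a simple binomial uncertainty can be computed as below, using the standard-measurement values as an example. The shot count is an assumption: we take the 4,000 shots per preparation to be per qubit and pooled over all 16 qubits. Under that assumption the binomial standard errors come out at about 0.9 × 10−4 and 0.5 × 10−4, comparable to the quoted uncertainties, but the authors' actual error analysis may differ.

```python
import math


def binomial_se(p, n):
    """One-sigma binomial standard error of an error rate p estimated from n shots."""
    return math.sqrt(p * (1 - p) / n)


eps0, eps1 = 5.2e-4, 1.4e-4   # standard-measurement errors for |0> and |1>
avg_spam = (eps0 + eps1) / 2  # average over the two state preparations

# Assumption: 4,000 shots per preparation per qubit, pooled over all 16 qubits.
n_shots = 16 * 4000
se0, se1 = binomial_se(eps0, n_shots), binomial_se(eps1, n_shots)
```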
1Q gates
1Q gate errors are mainly caused by spontaneous emission during the Raman gate, laser phase and intensity noise and finite qubit coherence. Notably, spontaneous emission causes leakage outside the computational subspace. We quantify 1Q gate errors by Clifford RB50 (Supplementary Information).
We follow the methods in ref. 94 to account for leakage in the 1Q infidelity estimate. The ternary measurement allows us to measure the leakage population at the end of every circuit without the use of ancilla qubits (as done in ref. 22). We estimate the leakage rate per 1Q Clifford, rL, from the rate at which the measured leakage population increases with sequence length. The probability of observing the expected computational state decays exponentially owing to non-leakage errors as \(p(l)=A(1-r)^{l}+1/2\) for sequence length l, in which A and r are fit parameters. The reported 1Q error is the Clifford average infidelity \(\epsilon_{\mathrm{avg},1\mathrm{Q}}=r/2+r_{\mathrm{L}}\) (ref. 94).
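The estimate above can be sketched with numpy. As simplifying assumptions, the exponential decay is fitted by a log-linear least-squares fit (a weighted nonlinear fit is usually preferable for noisy data), and rL is taken as the slope of a straight-line fit to the leaked population. The synthetic, noiseless inputs only check that known parameters are recovered; they are not data from this work.

```python
import numpy as np


def fit_rb_with_leakage(lengths, p_success, p_leaked, asymptote=0.5):
    """Return (epsilon_avg, r, r_L) for 1QRB data with leakage (sketch).

    p_success is modelled as A (1 - r)^l + asymptote and fitted in log space;
    the leakage rate r_L is the slope of a straight line through p_leaked.
    """
    lengths = np.asarray(lengths, dtype=float)
    log_decay = np.log(np.asarray(p_success, dtype=float) - asymptote)
    slope, _ = np.polyfit(lengths, log_decay, 1)  # log A + l log(1 - r)
    r = 1 - np.exp(slope)
    r_leak, _ = np.polyfit(lengths, np.asarray(p_leaked, dtype=float), 1)
    return r / 2 + r_leak, r, r_leak


# Usage with synthetic, noiseless data generated from known parameters.
ls = np.array([0, 500, 1000, 2000, 4000])
r_true, r_leak_true = 4e-5, 1e-5
p_succ = 0.45 * (1 - r_true) ** ls + 0.5
p_leak = r_leak_true * ls
eps_1q, r_fit, r_leak_fit = fit_rb_with_leakage(ls, p_succ, p_leak)
```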
Extended Data Fig. 2 shows the success probability and the leaked population as a function of l for all 16 qubits in the eight operation zones. We obtain a zone-averaged 1Q error of 2.5(1) × 10−5, which includes a leakage rate of 1.12(6) × 10−5. The error bars represent a one-sigma confidence interval obtained from bootstrapping95. The leakage rates and infidelities for each individual qubit are given in the Supplementary Information. The measured error can be compared with a prediction of 2.6(6) × 10−5 from physical error models that account for measured laser intensity noise, calculated spontaneous emission and measured memory error.
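A generic nonparametric bootstrap for a one-sigma uncertainty looks like the sketch below. What is resampled in this work (shots, circuits or qubits) is not specified here, so the resampling unit and the synthetic pass/fail data are assumptions for illustration only.

```python
import numpy as np


def bootstrap_se(samples, statistic, n_boot=2000, seed=0):
    """One-sigma uncertainty of statistic(samples) from a nonparametric bootstrap."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    # Resample with replacement and recompute the statistic each time.
    stats = [statistic(samples[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return float(np.std(stats, ddof=1))


# Usage with synthetic pass/fail outcomes (not data from this work).
outcomes = np.random.default_rng(1).random(10_000) < 0.3
se = bootstrap_se(outcomes, np.mean)
# For a mean of Bernoulli trials this approaches sqrt(p (1 - p) / n).
```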
Finally, we ran a statistical hypothesis test for correlated errors in the simultaneous 1QRB data. An error channel on several subsystems is correlated if it cannot be factored into a tensor product of individual error channels on each subsystem, and such correlated errors are a signature of crosstalk. We found no evidence of correlated errors at the 95% confidence level (Supplementary Information).
2Q gates
Errors in the RZZ(θ) gates are caused by spontaneous emission from the Raman lasers and experimental imperfections including laser phase and intensity noise at the position of the ion, thermal motion of the ions, voltage noise on the electrodes and imprecise calibrations of the gate parameters. We validate the performance of the maximally entangling RZZ(π/2) gate (referred to as the 2Q gate) using both Clifford 2QRB and CB. Further details of each implementation are in the Supplementary Information.
We again follow the methods in ref. 94 to account for leakage in the 2QRB infidelity estimate. The leaked population versus sequence length is used to extract a leakage rate per Clifford, which is rescaled into a leakage rate per 2Q gate, rL,2Q, using the fact that there are on average 1.5 2Q gates per 2Q Clifford. We fit the success probability of the remaining population to the decay model \(p(l)=A(1-r)^{l}+1/4\) for sequence length l, in which A and r are fit parameters. The average infidelity of the non-leakage error component per Clifford is 3r/4, which is rescaled into an average infidelity per 2Q gate of r/2. The average infidelity per 2Q gate (including leakage) is then \(\epsilon_{\mathrm{avg},2\mathrm{Q}}=r/2+r_{\mathrm{L},2\mathrm{Q}}\). We note that our rescaling of the error per Clifford into an error per 2Q gate neglects the errors from 1Q gates and memory errors during the 2QRB sequence, which we estimate to contribute 1.2(2) × 10−4 per 2Q gate.
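The rescaling from per-Clifford to per-gate quantities is simple arithmetic. The sketch below encodes it under the same approximation as the text (1Q-gate and memory errors within each Clifford are ignored); the function name and the example inputs are illustrative.

```python
def rb2q_error_per_gate(r, r_leak_per_clifford, gates_per_clifford=1.5):
    """Average error per 2Q gate from 2QRB fit parameters (leakage included).

    r: decay parameter of p(l) = A (1 - r)^l + 1/4.
    r_leak_per_clifford: slope of the leaked population versus sequence length.
    Like the text, this ignores 1Q-gate and memory errors inside each Clifford.
    """
    infid_per_clifford = 3 * r / 4                              # (d - 1) r / d with d = 4
    infid_per_gate = infid_per_clifford / gates_per_clifford   # equals r / 2
    leak_per_gate = r_leak_per_clifford / gates_per_clifford   # r_L,2Q
    return infid_per_gate + leak_per_gate


# Usage with illustrative fit parameters: r / 2 = 5e-4 plus r_L,2Q = 2e-4.
eps_2q = rb2q_error_per_gate(1e-3, 3e-4)
```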
The experimental 2QRB data are shown in Extended Data Fig. 3. We obtain a zone-averaged 2Q infidelity of ϵavg,2Q = 7.9(2) × 10−4, which includes a leakage rate of rL,2Q = 2.4(1) × 10−4. The leakage rates and infidelities for each individual qubit pair are given in the Supplementary Information. The leakage errors arise both from spontaneous emission, which we measure to be 1.0(2) × 10−4 in agreement with the model in ref. 96, and from leakage memory error (discussed in the section ‘Memory errors’). In total, we expect leakage to contribute 1.7(2) × 10−4 of the error.
Our measured value of 7.9(2) × 10−4 can be compared with a total expected error per 2Q gate of 3.5(4) × 10−4, which we predict from an error budget consisting of spontaneous emission errors, memory error and 1Q pulse errors, plus other characterized experimental sources of noise, such as laser phase and intensity noise, thermal motion of the ions and imprecise calibrations. The discrepancy between the measured and predicted 2Q errors could be explained by several factors, including higher leakage error in the operation zones owing to finite extinction of the resonant detection beams, non-thermal motional distributions, crosstalk or other unaccounted-for effects.
Just as with the 1QRB data, we performed a statistical test for the presence of correlated errors in the 2QRB data and found no notable evidence of correlated errors between different qubit pairs (Supplementary Information).
We also perform 2QCB (ref. 53) to estimate a partial Pauli error model for the 2Q gate in each operation zone; the experimental and theoretical details are provided in the Supplementary Information. Extended Data Fig. 4 shows the expectation value decays and estimated Pauli error channels for each qubit pair. We find a zone-averaged infidelity of 8.1(2) × 10−4, which includes a leakage rate of 1.14(4) × 10−4. The error channel is dominated by IZ and ZI errors, which modelling suggests are caused by laser phase noise, spontaneous emission and electrode voltage noise. We note that our estimate of the leakage rate per 2Q gate from 2QCB is about a factor of two smaller than the estimate from 2QRB.
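For a Pauli error channel on n qubits, the process infidelity is the total probability of non-identity Pauli errors, and the average infidelity is d/(d + 1) times that, with d = 2^n (4/5 for two qubits, the same factor that appears in Eq. (1)). The sketch below applies this standard relation; the Pauli probabilities are hypothetical, and leakage is not part of the Pauli model.

```python
def pauli_channel_infidelities(pauli_probs, n_qubits=2):
    """Process and average infidelity of a Pauli channel.

    pauli_probs: dict mapping non-identity Pauli labels (e.g. "IZ") to error probabilities.
    """
    d = 2 ** n_qubits
    process_infidelity = sum(pauli_probs.values())  # 1 - p_II
    average_infidelity = d * process_infidelity / (d + 1)
    return process_infidelity, average_infidelity


# Usage with hypothetical probabilities dominated by IZ and ZI errors.
probs = {"IZ": 3e-4, "ZI": 3e-4, "ZZ": 1e-4, "XX": 1e-4}
proc, avg = pauli_channel_infidelities(probs)
```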
Memory errors
Qubits that are not being gated incur memory errors owing to magnetic field inhomogeneities, and the impact of these errors depends heavily on the circuit structure and its specific transport schedule. As a figure of merit, we define the depth-n memory error to be the average infidelity per qubit after randomly pairing all qubits, performing the transport and cooling operations that would be required to apply 2Q gates on all pairs (but no actual gate operations) and repeating this process n times.
We measure memory error with a variant of 1QRB that interleaves random transport between 1Q Clifford gates, referred to as transport-1QRB22,52. Our method here differs from ref. 22 in that we partition the 98 qubits into groups in which the qubits in each group have a random 1Q Clifford operation applied after every k rounds of depth-1 transport operations on all qubits (Supplementary Information). The qubits in the different groups will have a different amount of transport and idle time between Clifford operations, which allows us to extract how memory errors scale with the number of depth-1 transport operations for random circuits.
We run transport-1QRB circuits on the 98 qubits in four groups of 25 or 24 qubits, with k ∈ {1, 2, 4, 8} transport operations between Cliffords. Furthermore, we use the ternary measurement to extract any leakage errors during transport. Extended Data Fig. 5a,b shows the measured decay in transport-1QRB for computational and ternary measurements, respectively. The decay curves are clustered into four groups determined by k. By fitting the decay curves and accounting for the leakage rate using the same procedure as in the section ‘1Q gates’, we obtain the Clifford infidelity for each qubit.
Extended Data Fig. 5c shows the Clifford infidelity as a function of the number of depth-1 transport operations, averaged over all qubits in the corresponding group. The expected scaling of memory error with delay time varies depending on the timescale of the noise sources97. For this reason, we fit the memory error versus l to a quadratic function \(a+bl+cl^{2}\), in which b and c capture the linear memory error rate (from fast noise) and the quadratic memory error parameter (from slow noise), respectively52.
From the fit to the data, we infer a linear memory error rate of 5(1) × 10−4 and a quadratic memory error parameter of 7(2) × 10−5. We find that the leakage error scales linearly with the number of transport operations, with a rate of 4.0(2) × 10−4, and accounts for nearly all of the linear memory error. The coherent error expected from a typical magnetic-field drift of approximately 10 μG between calibrations (performed roughly every 5 s) is 3 × 10−5 in a depth-1 circuit. The remaining coherent error may be explained by imperfections in the phase-tracking routine or other unaccounted-for sources of noise.
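The quadratic fit of Clifford infidelity versus the number l of depth-1 transport operations can be sketched with `numpy.polyfit`. The group-averaged inputs below are synthetic: they are generated from the quoted b and c together with a hypothetical offset a, so the fit should recover them exactly. Dropping the offset a (which absorbs the bare 1Q Clifford error) leaves the memory contribution b l + c l².

```python
import numpy as np


def fit_memory_model(n_transport, clifford_infidelity):
    """Least-squares fit of infidelity = a + b*l + c*l**2; returns (a, b, c)."""
    c, b, a = np.polyfit(n_transport, clifford_infidelity, 2)  # highest power first
    return a, b, c


def memory_error(l, b, c):
    """Memory contribution after l depth-1 transport operations (offset a dropped)."""
    return b * l + c * l ** 2


# Usage with synthetic group averages for k in {1, 2, 4, 8}.
ks = np.array([1, 2, 4, 8])
a_true, b_true, c_true = 3e-5, 5e-4, 7e-5  # a_true is hypothetical
infid = a_true + b_true * ks + c_true * ks ** 2
a_fit, b_fit, c_fit = fit_memory_model(ks, infid)
```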
MCMR crosstalk
We measure MCMR crosstalk errors by preparing 16 qubits in the eight operation zones, while the remaining 82 qubits are stored in the ring. A single (‘target’) qubit in each operation zone is measured and reset repeatedly, while the other 90 (‘spectator’) qubits are prepared in the |0⟩ or |1⟩ state. Crosstalk errors on spectator qubits result from absorbing stray measurement or reset light. The resulting spontaneous emission can cause bit-flip, leakage or dephasing errors. Using the ternary measurement at the end allows us to differentiate bit-flip rates from leakage rates and so obtain a more detailed picture of the crosstalk error channel. We find a crosstalk error per MCMR of 1.3(1) × 10−5, with crosstalk in individual operation zones reported in Fig. 4a. Further details are provided in the Supplementary Information.
System-level benchmarks
Random Clifford circuits with mid-circuit measurements
Reference 98 introduced circuits with random Clifford layers as a scalable system-level benchmark called binary randomized benchmarking. An extension allowing for MCMRs was given in ref. 49, called quantum instrument randomized benchmarking. Our circuits are constructed similarly to those in ref. 49, with a few small modifications (Supplementary Information).
In our implementation, a length l circuit on N qubits with nm MCMRs per layer consists of the following for each layer:
- A distinct uniformly random 1Q Clifford is applied to each qubit.
- The N qubits are uniformly randomly paired into \(\lfloor N/2\rfloor\) qubit pairs and the 2Q gate RZZ(π/2) is applied to each pair, with Pauli twirling applied to the 2Q gates.
- A uniformly random subset of nm qubits is sampled and, for each qubit, a 1Q Clifford is applied to prepare a measurement in a particular Pauli basis, followed by an MCMR operation.
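One circuit layer of the recipe above can be generated as a gate list, as in the sketch below. The operation labels, the indexing of the 24 single-qubit Cliffords and the choice of basis labels are hypothetical, and Pauli twirling of the 2Q gates is omitted for brevity.

```python
import random


def random_layer(n_qubits, n_meas, rng):
    """One layer of the random Clifford + MCMR benchmark as a gate list (sketch)."""
    # Random 1Q Clifford on every qubit (index into the 24-element group).
    ops = [("clifford1q", q, rng.randrange(24)) for q in range(n_qubits)]
    # Uniformly random pairing into floor(N/2) pairs, each receiving RZZ(pi/2).
    order = list(range(n_qubits))
    rng.shuffle(order)
    for i in range(0, n_qubits - 1, 2):
        ops.append(("rzz_pi_2", order[i], order[i + 1]))
    # Random subset of n_meas qubits: basis-change Clifford, then MCMR.
    for q in rng.sample(range(n_qubits), n_meas):
        ops.append(("basis_change", q, rng.choice("XYZ")))
        ops.append(("mcmr", q))
    return ops


# Usage: one layer on 98 qubits with 16 MCMRs.
rng = random.Random(1)
layer = random_layer(98, 16, rng)
```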
To classically verify correct circuit outputs, we track a random initial stabilizer through the circuit (Supplementary Information). The parity of the evolved stabilizer defines a success or failure trial. For fidelity estimation, the average success probability is rescaled into a quantity called the polarization98, defined as \(y_{\mathrm{pol}}=2p_{\mathrm{succ}}-1\). A polarization of 1 corresponds to perfect success, whereas a polarization of 0 corresponds to 50% success, or random guessing. A plot of ypol(l, nm) versus l for different values of nm is shown in Fig. 4e. Let F(nm) be the process fidelity per circuit layer as a function of nm. We estimate F(nm) by fitting the polarization to an exponential decay model. Figure 4f shows a plot of F(nm) versus nm. We note that the layer fidelity increases slightly (with overlapping error bars) as nm increases from 8 to 16. This is because a batch of 16 measurements in the operation zones uses the protected measure scheme (explained in Fig. 1b), which protects against MCMR crosstalk in the operation zones.
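The polarization rescaling and an exponential fit can be sketched as follows, assuming the decay model \(y_{\mathrm{pol}}(l)=BF^{l}\) and a log-linear least-squares fit; the exact fitting model and procedure used for Fig. 4f may differ. The success probabilities are synthetic values generated with B = 0.9 and F = 0.95.

```python
import numpy as np


def polarization(p_success):
    """y_pol = 2 p_succ - 1: 1 for perfect success, 0 for random guessing."""
    return 2 * np.asarray(p_success, dtype=float) - 1


def fit_layer_fidelity(lengths, p_success):
    """Fit y_pol(l) = B * F**l by least squares on log(y_pol); returns (B, F)."""
    y = np.log(polarization(p_success))
    slope, intercept = np.polyfit(np.asarray(lengths, dtype=float), y, 1)
    return float(np.exp(intercept)), float(np.exp(slope))


# Usage with synthetic success probabilities generated with B = 0.9, F = 0.95.
ls = np.array([1, 5, 10, 20])
p_succ = (0.9 * 0.95 ** ls + 1) / 2
B_fit, F_fit = fit_layer_fidelity(ls, p_succ)
```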
To see whether the results are consistent with our component benchmarks, we first compute an effective 2Q gate error ϵeff,2Q from the nm = 0 data, using
$$F(n_{\mathrm{m}}=0)=\left(1-\frac{5}{4}\epsilon_{\mathrm{eff},2\mathrm{Q}}\right)^{\lfloor N/2\rfloor},\tag{1}$$
in which the factor 5/4 comes from the conversion between process and average fidelity99. The effective 2Q gate error includes errors from 2Q gates, 1Q gates and memory errors and can be thought of as the infidelity of a 2Q depolarizing channel that would best fit the data in the absence of all other errors. We find ϵeff,2Q = 1.7(2) × 10−3, whereas an accounting of 2Q and memory errors according to
$$\epsilon_{\mathrm{eff},2\mathrm{Q}}=\frac{4}{5}\left(\frac{5}{4}\epsilon_{\mathrm{avg},2\mathrm{Q}}+2\left(\frac{3}{2}\right)\epsilon_{\mathrm{mem}}\right)\tag{2}$$
predicts 2.2(1) × 10−3 (Supplementary Information). We attribute the effective 2Q error being smaller than the component-level prediction to improvements in the gate and memory errors between the times at which the component and random Clifford circuit benchmarks were run.
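Equations (1) and (2) can be evaluated directly, as in the sketch below. As an assumption of ours, the depth-1 memory error in Eq. (2) is taken as b + c = 5.7 × 10−4 from the transport-1QRB quadratic fit; the authors' own accounting is in the Supplementary Information. With this choice, Eq. (2) gives about 2.16 × 10−3, close to the quoted 2.2(1) × 10−3.

```python
def eps_eff_from_layer_fidelity(f0, n_qubits):
    """Invert Eq. (1): F(n_m = 0) = (1 - 5 eps / 4) ** floor(N / 2)."""
    return 4 / 5 * (1 - f0 ** (1 / (n_qubits // 2)))


def eps_eff_from_components(eps_avg_2q, eps_mem):
    """Eq. (2): convert to process infidelity, add two qubits' memory error, convert back."""
    return 4 / 5 * (5 / 4 * eps_avg_2q + 2 * (3 / 2) * eps_mem)


# 2QRB error from the text; eps_mem = b + c is our assumption (see lead-in).
pred = eps_eff_from_components(7.9e-4, 5e-4 + 7e-5)
# Round trip: the layer fidelity implied by eps_eff = 1.7e-3 on 98 qubits.
f0 = (1 - 5 * 1.7e-3 / 4) ** (98 // 2)
```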
We next compute an effective MCMR error ϵeff,MCMR by best-fitting the F(nm) versus nm data to the heuristic formula
$$F(n_{\mathrm{m}})=\left(1-\frac{5}{4}\epsilon_{\mathrm{eff},2\mathrm{Q}}\right)^{\lfloor N/2\rfloor}\left(1-\frac{3}{2}\epsilon_{\mathrm{eff},\mathrm{MCMR}}\right)^{n_{\mathrm{m}}}\tag{3}$$
together with our computed value of ϵeff,2Q. We find ϵeff,MCMR = 2.4(5) × 10−3. By comparison, adding the component-level SPAM, MCMR crosstalk and memory errors, we predict an effective MCMR error of 2.5(1) × 10−3 (Supplementary Information). We conclude that the data from our random Clifford circuits with MCMRs are consistent with our measured component-level errors. We remark that our method of comparison is heuristic and that a rigorous methodology for comparing component-level and system-level benchmarking performance remains an open problem.
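With ϵeff,2Q held fixed, Eq. (3) leaves a single free parameter. The sketch below fits it by least squares in log space, where the model is linear in nm with a known intercept; this log-space choice is our assumption and may differ from the authors' fit. The layer fidelities are synthetic, generated from Eq. (3) with ϵeff,2Q = 1.7 × 10−3 and ϵeff,MCMR = 2.4 × 10−3.

```python
import numpy as np


def fit_eps_mcmr(n_m, fidelity, eps_eff_2q, n_qubits):
    """Fit Eq. (3) for eps_eff_MCMR with eps_eff_2Q held fixed (log-space least squares)."""
    n_m = np.asarray(n_m, dtype=float)
    log_base = (n_qubits // 2) * np.log(1 - 5 * eps_eff_2q / 4)
    resid = np.log(np.asarray(fidelity, dtype=float)) - log_base  # = n_m * log(1 - 3 eps / 2)
    slope = np.dot(n_m, resid) / np.dot(n_m, n_m)                 # least squares through the origin
    return float(2 / 3 * (1 - np.exp(slope)))


# Usage with synthetic layer fidelities generated from Eq. (3).
n_m = np.array([0, 4, 8, 16])
F = (1 - 5 * 1.7e-3 / 4) ** 49 * (1 - 3 * 2.4e-3 / 2) ** n_m
eps_mcmr = fit_eps_mcmr(n_m, F, 1.7e-3, 98)
```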
RCS mirror benchmarking
RCS is a system-level benchmark assessing how effectively a quantum computer can generate computationally complex quantum states15. Like binary randomized benchmarking, RCS examines the extent to which quantum circuits obtain the performance expected from component-level benchmarks. At the same time, because the classical difficulty of sampling from the outputs of random quantum circuits has been extremely well studied over the past decade100, RCS provides a well-vetted benchmark for the computational power of a quantum computer.
By making use of the arbitrary connectivity of the Helios quantum computer, we consider RCS with circuit geometries constructed from colourings of random regular graphs14: a layer depth-l random circuit is constructed by interleaving l layers of 2Q RZZ(π/2) gates (each layer containing N/2 2Q gates) with l + 1 layers of Haar-random 1Q gates (each layer containing N 1Q gates). Although the fidelity of such circuits can in principle be inferred by running them and performing cross-entropy benchmarking101, evaluating the cross-entropy requires exact simulation of the circuits in question, which is infeasible except at small depth or qubit number. To estimate the expected state fidelity in RCS (and therefore the anticipated performance in cross-entropy benchmarking), we follow the strategy of refs. 14,102,103,104,105 and infer the fidelity of a layer depth-l circuit from the return probability FMB of a ‘mirrored’ layer depth-l/2 circuit, in which the second (mirrored) half of the circuit uses randomized compiling to prevent unintended cancellation of coherent errors. The randomness for randomized compiling is sampled in real time at the start of each shot, and the corresponding random 1Q gates are compiled on the fly into the existing Haar-random 1Q gates, resulting in only one physical 1Q gate per qubit per layer. Following ref. 14, we also initialize each mirrored circuit in a random computational basis state to prevent unequal SPAM errors between the two basis states from biasing the fidelity estimate. At each depth, we execute between 1,000 and 2,500 shots spread evenly across 100 random circuit connectivities. In addition to the mirrored random circuits used to assess RCS performance, we also directly sampled the output of a single (unmirrored) random circuit of depth d = 26. That circuit is included in ref. 106, along with 2,500 bitstrings sampled from Helios.
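The structure of a mirror circuit can be illustrated with a small noiseless statevector simulation: start from a random computational basis state, apply a forward half of Haar-random 1Q layers interleaved with RZZ(π/2) layers, apply its exact inverse, and read off the return probability (1 in the ideal case). Two simplifications are assumptions of this sketch: each 2Q layer uses an independent random perfect matching rather than a colouring of a random regular graph, and randomized compiling is omitted.

```python
import numpy as np


def haar_1q(rng):
    """Haar-random 2x2 unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases


def apply_1q(psi, u, k, n):
    """Apply a 2x2 unitary u to qubit k (qubit 0 is the most significant bit)."""
    psi = psi.reshape((2,) * n)
    psi = np.moveaxis(np.tensordot(u, psi, axes=([1], [k])), 0, k)
    return psi.reshape(-1)


def apply_rzz(psi, a, b, n, theta=np.pi / 2):
    """Apply exp(-i theta Z_a Z_b / 2), which is diagonal in the computational basis."""
    idx = np.arange(2 ** n)
    za = 1 - 2 * ((idx >> (n - 1 - a)) & 1)
    zb = 1 - 2 * ((idx >> (n - 1 - b)) & 1)
    return psi * np.exp(-1j * theta * za * zb / 2)


def random_matching(n, rng):
    order = rng.permutation(n)
    return [(int(order[i]), int(order[i + 1])) for i in range(0, n - 1, 2)]


def mirror_return_probability(n, half_depth, rng):
    x = int(rng.integers(2 ** n))  # random computational basis start
    psi = np.zeros(2 ** n, dtype=complex)
    psi[x] = 1.0
    ones = [[haar_1q(rng) for _ in range(n)] for _ in range(half_depth + 1)]
    pairs = [random_matching(n, rng) for _ in range(half_depth)]
    # Forward half: a 1Q layer, then alternating 2Q and 1Q layers.
    for t in range(half_depth + 1):
        if t > 0:
            for a, b in pairs[t - 1]:
                psi = apply_rzz(psi, a, b, n)
        for k in range(n):
            psi = apply_1q(psi, ones[t][k], k, n)
    # Mirrored half: inverse gates in reverse order.
    for t in reversed(range(half_depth + 1)):
        for k in range(n):
            psi = apply_1q(psi, ones[t][k].conj().T, k, n)
        if t > 0:
            for a, b in pairs[t - 1]:
                psi = apply_rzz(psi, a, b, n, theta=-np.pi / 2)
    return float(abs(psi[x]) ** 2)


# Usage: an ideal 6-qubit mirror circuit with 3 RZZ layers in each half.
p_return = mirror_return_probability(6, 3, np.random.default_rng(7))
```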
The fidelity of RCS as a function of depth inferred in this manner is reported in Fig. 4g. We perform a least-squares best fit to the gate-counting model from ref. 14,
$$F_{\mathrm{GC}}(l)=(1-p_{\mathrm{SPAM}})^{N}\left(1-\frac{5}{4}\epsilon_{\mathrm{eff},2\mathrm{Q}}\right)^{\frac{N}{2}(l-\delta)}.\tag{4}$$
Here N = 98, δ = 1.12 is a correction to effective circuit layer depth from boundary effects in mirror circuits14, pSPAM is the effective SPAM error and ϵeff,2Q is the effective average 2Q error rate, which includes effects from 1Q, 2Q and memory errors as in the previous section. From the fit, we estimate pSPAM = 5.3(51) × 10−4 and ϵeff,2Q = 2.00(6) × 10−3. This effective 2Q error is also consistent with the estimate obtained from random Clifford circuits as well as component benchmarks reported in Fig. 4i.
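The gate-counting model of Eq. (4) is straightforward to evaluate at any depth once its parameters are fixed. In the sketch below, the defaults are the fitted values quoted above (N = 98, δ = 1.12, pSPAM = 5.3 × 10−4, ϵeff,2Q = 2.00 × 10−3); the function name is illustrative.

```python
def f_gc(l, n=98, delta=1.12, p_spam=5.3e-4, eps_eff_2q=2.00e-3):
    """Gate-counting fidelity model of Eq. (4), evaluated at layer depth l."""
    spam_term = (1 - p_spam) ** n
    gate_term = (1 - 5 * eps_eff_2q / 4) ** (n / 2 * (l - delta))
    return spam_term * gate_term


# Usage: model fidelities at a few depths.
model = {l: f_gc(l) for l in (8, 16, 26)}
```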
The task of sampling from the output distribution induced by running forward (unmirrored) circuits is well defined for either quantum or classical computers. In either case, the quality of samples can be judged by statistical tests, with the linear cross-entropy test being a widely used standard15. As mentioned above, the linear cross-entropy score of the quantum data is expected to agree closely (for the circuits run here) with the overall circuit fidelity estimated from mirror benchmarking results at comparable depth14. For the high scores achievable with Helios (no lower than about 3.5%, reached at depth 26), there is no known classical strategy to score well on the linear cross-entropy test without performing (nearly) exact simulation of the circuits in question14, the most efficient strategy for which is tensor-network contraction.
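The linear cross-entropy score is \(F_{\mathrm{XEB}}=2^{N}\langle p_{\mathrm{ideal}}(x_i)\rangle-1\), averaged over sampled bitstrings x_i. The sketch below computes it for a small example in which a Haar-random state (a normalized complex Gaussian vector) stands in for the output of an ideal random circuit. Sampling from the ideal distribution scores close to 1, whereas a uniform sampler scores 0 in expectation. The qubit number and shot count are arbitrary illustrative choices.

```python
import numpy as np


def linear_xeb(ideal_probs, sample_indices, n_qubits):
    """Linear cross-entropy score F_XEB = 2^N * mean(p_ideal(x_i)) - 1."""
    return 2 ** n_qubits * float(np.mean(ideal_probs[sample_indices])) - 1


def expected_linear_xeb(ideal_probs, sampler_probs, n_qubits):
    """Expected score when bitstrings are drawn from sampler_probs."""
    return 2 ** n_qubits * float(np.dot(ideal_probs, sampler_probs)) - 1


n = 10
rng = np.random.default_rng(3)
amps = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)  # Haar-random stand-in state
p_ideal = np.abs(amps) ** 2
p_ideal /= p_ideal.sum()
uniform = np.full(2 ** n, 1 / 2 ** n)
samples = rng.choice(2 ** n, size=5000, p=p_ideal)
score = linear_xeb(p_ideal, samples, n)
```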
The reported costs in Fig. 4h are for optimized tensor-network contraction assuming so-called ‘embarrassing parallelization’ (by means of slicing) into independent computations with various amounts of available memory (corresponding to cotengra contraction widths of \(\mathcal{W}=30\), 49 and 54), and were obtained using the (sliced) simulated annealing built into cotengra107. We note that the contraction-cost optimization performed here is only approximate, and the costs could certainly be improved modestly by giving the optimization heuristics more computational power. However, we do not expect such improvements to change the overall conclusion that Helios can produce states at high global fidelity for which the (classical) sampling cost is vastly beyond the capabilities of existing supercomputers.
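To give a sense of what the contraction widths mean in memory, the sketch below converts a width W into the size of the largest intermediate tensor, taking the width (as cotengra does) to be log2 of that tensor's number of entries. The 8 bytes per entry assumes complex64 storage; complex128 would double every figure.

```python
def largest_tensor_bytes(width, bytes_per_entry=8):
    """Size in bytes of the largest intermediate tensor for contraction width W.

    W = log2(number of entries in the largest intermediate tensor);
    8 bytes per entry assumes complex64 (complex128 would double it).
    """
    return 2 ** width * bytes_per_entry


GiB, PiB = 2 ** 30, 2 ** 50
sizes = {w: largest_tensor_bytes(w) for w in (30, 49, 54)}
# W = 30 -> 8 GiB; W = 49 -> 4 PiB; W = 54 -> 128 PiB
```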

