Chapter 5 Classically-verifiable quantum advantage from a computational Bell test

5.1 Introduction

The development of large-scale programmable quantum hardware has opened the door to testing a fundamental question in the theory of computation: can quantum computers outperform classical ones for certain tasks? This idea, termed quantum computational advantage, has motivated the design of novel algorithms and protocols to demonstrate advantage with minimal quantum resources such as qubit number and gate depth [AA11, FH19, BMS16, LBR17, HM17, TER18, BIS+18, BFN+19, AC17, NRK+18]. Such protocols are naturally characterized along two axes: the computational speedup and the ease of verification. The former distinguishes whether a quantum algorithm exhibits a polynomial or super-polynomial speedup over the best known classical one. The latter classifies whether the correctness of the quantum computation is efficiently verifiable by a classical computer. Along these axes lie three broad paths to demonstrating advantage: 1) sampling from entangled quantum many-body wavefunctions, 2) solving a deterministic problem, e.g. prime factorization, via a quantum algorithm, and 3) proving quantumness through interactive protocols.

Sampling-based protocols directly rely on the classical hardness of simulating quantum mechanics [AA11, BMS16, BIS+18, BFN+19, AC17, NRK+18]. The “computational task” is to prepare and measure a generic complex many-body wavefunction with little structure. As such, these protocols typically require minimal resources and can be implemented on near-term quantum devices [AAB+19, ZWD+20]. The correctness of the sampling results, however, is exponentially difficult to verify. This has an important consequence: in the regime beyond the capability of classical computers, the sampling results cannot be explicitly checked, and quantum computational advantage can only be inferred (e.g. extrapolated from simpler circuits).

Algorithms in the second class of protocols are naturally broken down by whether they exhibit polynomial or super-polynomial speed-ups. In the case of polynomial speed-ups, there exist notable examples that are provably faster than any possible classical algorithm [BGK18, BGK+20]. However, polynomial speed-ups are tremendously challenging to demonstrate in practice, due to the slow growth of the separation between classical and quantum run-times ¹¹ 1 They also have some other caveats: a provable speedup of $O (1)$ quantum complexity over $O (n)$ classical complexity is promising, but just reading the input may require $O (n)$ time, hiding the computational speedup in practice.. Accordingly, the most attractive algorithms for demonstrating advantage tend to be those with a super-polynomial speed-up, including Abelian hidden subgroup problems such as factoring and discrete logarithms [SHO97]. The challenge is that for all known protocols of this type, the quantum circuits required to demonstrate advantage are well beyond the capabilities of near-term experiments.

The final class of protocols demonstrates quantum advantage through an interactive proof [BCM+21, BKV+20, ABE+17, WAT99, KW00, KM03, FV15, MFI+20]. At a high level, this type of protocol involves multiple rounds of communication between the classical verifier and the quantum prover; the prover must give self-consistent responses despite not knowing what the verifier will ask next. This requirement of self-consistency rules out a broad range of classical cheating strategies and can imbue “hardness” into questions that would otherwise be easy to answer. To this end, interactive protocols expand the space of computational problems that can be used to demonstrate quantum advantage; from a more pragmatic perspective, this can enable the realization of efficiently verifiable quantum advantage on near-term quantum hardware.

Figure 5.1: Schematic representation of the interactive quantum advantage protocol. In the first round of interaction, the classical verifier (right) selects a specific function from a trapdoor claw-free family and the quantum prover (left) evaluates it over a superposition of inputs. The goal of the second round is to condense the information contained in the prover’s superposition state onto a single ancilla qubit. The final round of interaction effectively performs a Bell inequality measurement, whose outcome is cryptographically protected.

Recently, a beautiful interactive protocol was introduced that can operate both as a test for quantum advantage and as a generator of certifiable quantum randomness [BCM+21]. The core of the protocol is a two-to-one function, $f$ , built on the computational problem known as learning with errors (LWE) [REG09]. The demonstration of advantage leverages two important properties of the function: first, it is claw-free, meaning that it is computationally hard to find a pair of inputs $(x_{0}, x_{1})$ such that $f (x_{0}) = f (x_{1})$ . ²² 2 “Claw-free” is often used to refer to a pair of functions $f_{0}, f_{1}$ such that for appropriate $x_{0}, x_{1}$ we have $f_{0} (x_{0}) = f_{1} (x_{1})$ . Here, we use the slightly more general idea of a single 2-to-1 function $f$ for which it is hard to find $x_{0}, x_{1}$ such that $f (x_{0}) = f (x_{1})$ . This is a special case of a “collision-resistant function,” which could potentially be many-to-one. We also note that a claw-free pair of functions can be converted into a single claw-free function by defining $f (b | | x) = f_{b} (x)$ , where $| |$ denotes concatenation.. Second, there exists a trapdoor: given some secret data $t$ , it becomes possible to efficiently invert $f$ and reveal the pair of inputs mapping to any output. (See Section 5.7.5 for an overview of trapdoor claw-free functions). However, to fully protect against cheating provers, the protocol requires a stronger version of the claw-free property called the adaptive hardcore bit, namely, that for any input $x_{0}$ (which may be chosen by the prover), it is computationally hard to find even a single bit of information about $x_{1}$ ³³ 3 To be precise, it is hard to find both $x_{0}$ and the parity of any subset of the bits of $x_{1}$ .. The need for an adaptive hardcore bit within this protocol severely restricts the class of functions that can operate as verifiable tests of quantum advantage.

Here, we propose and analyze a novel interactive quantum advantage protocol that removes the need for an adaptive hardcore bit, with essentially zero overhead in the quantum circuit and no extra cryptographic assumptions. We present four main results. First, we demonstrate how an idea from tests of Bell’s inequality can serve the same cryptographic purpose as the adaptive hardcore bit [BEL64]. In essence, our interactive protocol is a variant of the CHSH (Clauser, Horne, Shimony, Holt) game [CHS+69] in which one player is replaced by a cryptographic construction. Normally, in CHSH, two quantum parties are asked to produce correlations that would be impossible for classical devices to produce. If space-like separation is enforced to rule out communication between the two parties, then the correlations constitute a proof of quantumness. In our case, the space-like separation is replaced by the computational hardness of a cryptographic problem. In particular, the quantum prover holds a qubit whose state depends on the cryptographic secret in the same way that the state of one CHSH player’s qubit depends on the secret measurement basis of the other player. An alternative interpretation, from the perspective of Bell’s theorem, is that the protocol can be thought of as a “single-detector Bell test”—the cryptographic task generates the same single-qubit state as would be produced by entangling a second qubit and measuring it with another detector. As in the CHSH game, a quantum device can pass the verifier’s test with probability $\sim 85 %$ , but a classical device can only succeed with probability at most $75 %$ . This finite gap in success probabilities is precisely what enables a verifiable test of quantum advantage.

Second, by removing the need for an adaptive hardcore bit, our protocol accepts a broader landscape of functions for interactive tests of quantum advantage (see Table 5.1 and Methods). We populate this list with two new constructions. The first is based on the decisional Diffie-Hellman problem (DDH) [DH76, PW08, FGK+10]; the second utilizes the function $f_{N} (x) = x^{2} mod N$ with $N$ the product of two primes, which forms the backbone of the Rabin cryptosystem [RAB79, GMR88]. On the one hand, DDH is appealing because the elliptic-curve version of the problem is particularly hard for classical computers [MIL86, KOB87, BAR16]. On the other hand, $x^{2} mod N$ can be implemented significantly more efficiently, while its hardness is equivalent to factoring. We hope that these two constructions will provide a foundation for the search for more TCFs with desirable properties (small key size and efficient quantum circuits).

Third, we describe two innovations that facilitate our protocol’s use in practice: a way to significantly reduce overhead arising from the reversibility requirement of quantum circuits, and a scheme for increasing noisy devices’ probability of passing the test. Normally, quantum implementations of classical functions like the TCFs used in this protocol have some overhead, due to the need to make the circuit reversible in order to be consistent with unitarity [BEN89, LS90, AKN98, BIC+04, KTR14]. Our protocol exhibits the surprising property that it permits a measurement scheme to discard so-called “garbage bits” that arise during the computation, allowing classical circuits to be converted into quantum ones with essentially zero overhead. In the case of a noisy quantum device, the protocol also enables an inherent post-selection scheme for detecting and removing certain types of quantum errors. With this scheme it is possible for quantum devices to trade off low quantum fidelities for an increase in the overall runtime, while still passing the cryptographic test. We note that these constructions are likely applicable to other TCF-based quantum cryptography protocols as well, and thus may be of independent interest for tasks such as certifiable quantum random number generation.

Finally, focusing on the TCF $x^{2} mod N$ , we provide explicit quantum circuits—both asymptotically optimal (requiring only $O (n log n)$ gates and $O (n)$ qubits), as well as those aimed for near-term quantum devices. We show that a verifiable test of quantum advantage can be achieved with $\sim 10^{3}$ qubits and a gate depth $\sim 10^{5}$ (see Methods). We also co-design a specific implementation of $x^{2} mod N$ optimized for a programmable Rydberg-based quantum computing platform. The native physical interaction corresponding to the Rydberg blockade mechanism enables the direct implementation of multi-qubit-controlled arbitrary phase rotations without the need to decompose such gates into universal two-qubit operations [SAF16, LKS+19, GKG+19, MCS+20, BL20]. Access to such a native gate immediately reduces the gate depth for achieving quantum advantage by an order of magnitude.

Problem

Trap

door

Claw-

free

Adaptive

hard-core

bit

Asymptotic

complexity

(gate count)

LWE [BCM+21]

✓

n^{2} {log}^{2} n

x^{2} mod N

✓

✗

n log n

Ring-LWE [BKV+20]

✓

✗

n {log}^{2} n

Diffie-Hellman

✓

✗

n^{3} {log}^{2} n

Shor’s alg.

—

n^{2} log n

Table 5.1: Cryptographic constructions for interactive quantum advantage protocols.

n

represents the number of bits in the function’s input string. Big-

O

notation is implied an d factors of

log log n

and smaller are dropped. For references and derivations of the circuit complexities, see Section 5.7.6.

5.2 Background and Related Work

The use of trapdoor claw-free functions for quantum cryptographic tasks was pioneered in two recent breakthrough protocols: (i) giving classical homomorphic encryption for quantum circuits [MAH20] and (ii) for generating cryptographically certifiable quantum randomness from an untrusted black-box device [BCM+21]; this latter work also introduced the notion of an adaptive hardcore bit and serves as an efficiently verifiable test of quantum advantage. Remarkably, the scheme was further extended to allow a classical server to cryptographically verify the correctness of arbitrary quantum computations [MAH18]; it has also been applied to remote state preparation with implications for secure delegated computation [GV19].

Recently, an improvement to the practicality of TCF-based proofs of quantumness was provided in the random oracle model (ROM)—a model of computation in which both the quantum prover and classical verifier can query a third-party “oracle,” which returns a random (but consistent) output for each input. In that work, the authors provide a protocol that both removes the need for the adaptive hardcore bit, and also reduces the interaction to a single round [BKV+20]. Because the security of the protocol is proven in the ROM, implementing this protocol in practice requires applying the random oracle heuristic, in which the random oracle is replaced by a cryptographic hash function, but the hardness of classically defeating the protocol is taken to still hold ⁴⁴ 4 Replacing the random oracle with a hash function is termed a heuristic rather than an assumption because the security of this procedure generally holds in practice but is not provable—in fact, there exist constructions that are provably secure in the random oracle model but trivially insecure when instantiated with a hash function [CGH04].. Only contrived cryptographic schemes have ever been broken by attacking the random oracle heuristic [CGH04, KM15], so it seems to be effective in practice and the ROM protocol has significant potential for use as a practical tool for benchmarking untrusted quantum servers. On the other hand, for a robust experimental test of the foundational complexity theoretic claims of quantum computing—that quantum mechanics allows for algorithms that are superpolynomially faster than classical Turing machines—we desire the complexity-theoretic backing of the speedup to be as strong as possible (i.e. provable in the “standard model” of computation [AC17]), which is the goal pursued in the present work. With that said, we emphasize that the various optimizations described below—including the TCF families based on DDH and $x^{2} mod N$ , as well as the schemes for postselection and discarding garbage bits—can be applied to the ROM protocol as well.

Lastly, we also note two recent works which demonstrate that any TCF-based proof of quantumness, including the present work, can be implemented in constant quantum circuit depth (at the cost of more qubits) [LG22, HL21].

5.3 Interactive Protocol for Quantum Advantage

Our full protocol is shown diagrammatically in Figure 5.1. It consists of three rounds of interaction between the prover and verifier (with a “round” being a challenge from the verifier, followed by a response from the prover). The first round generates a multi-qubit superposition over two bit strings that would be cryptographically hard to compute classically. The second round maps this superposition onto the state of one ancilla qubit, retaining enough information to ensure that the resulting single-qubit state is also hard to compute classically. The third round takes this single qubit as input to a CHSH-type measurement, allowing the prover to generate a bit of data that is correlated with the cryptographic secret in a way that would not be possible classically. Having described the intuition behind the protocol, we now lay out each round in detail.

5.3.1 Description of the protocol

The goal of the first round is to generate a superposition over two colliding inputs to the trapdoor claw-free function (TCF). It begins with the verifier choosing an instance $f_{i}$ of the TCF along with the associated trapdoor data $t$ ; $f_{i}$ is sent to the prover. As an example, in the case of $x^{2} mod N$ , the “index” $i$ is the modulus $N$ , and the trapdoor data is its factorization, $p, q$ . The prover now initializes two registers of qubits, which we denote as the $x$ and $y$ registers. On these registers, they compute the entangled superposition $| ψ ⟩ = \sum_{x} | x ⟩_{x} | f_{i} (x) ⟩_{y}$ , over all $x$ in the domain of $f_{i}$ . The prover then measures the $y$ register in the standard basis, collapsing the state to ${(| x_{0} ⟩ + | x_{1} ⟩)}_{x} | y ⟩_{y}$ , with $y = f (x_{0}) = f (x_{1})$ . The measured bitstring $y$ is then sent to the verifier, who uses the secret trapdoor to compute $x_{0}$ and $x_{1}$ in full.

At this point, the verifier randomly chooses to either request a projective measurement of the $x$ register, ending the protocol, or to continue with the second and third rounds. In the former case, the prover communicates the result of that measurement, yielding either $x_{0}$ or $x_{1}$ , and the verifier checks that indeed $f (x) = y$ . In the latter case, the protocol proceeds with the final two rounds.

The second round of interaction converts the many-qubit superposition $| ψ ⟩ = | x_{0} ⟩_{x} + | x_{1} ⟩_{x}$ into a single-qubit state ${| 0 ⟩_{b}, | 1 ⟩_{b}, | + ⟩_{b}, | - ⟩_{b}}$ on an ancilla qubit $b$ . The final state of $b$ depends on the values of both $x_{0}$ and $x_{1}$ . The round begins with the verifier choosing a random bitstring $r$ of the same length as $x_{0}$ and $x_{1}$ , and sending it to the prover. Using a series of CNOT gates from the $x$ register to $b$ , the prover computes the state $| r \cdot x_{0} ⟩_{b} | x_{0} ⟩_{x} + | r \cdot x_{1} ⟩_{b} | x_{1} ⟩_{x}$ , where $r \cdot x$ denotes the binary inner product. Finally, the prover measures the $x$ register in the Hadamard basis, storing the result as a bitstring $d$ which is sent to the verifier. This measurement disentangles $x$ from $b$ without collapsing $b$ ’s superposition. At the end of the second round, the prover’s state is $(- 1)^{d \cdot x_{0}} | r \cdot x_{0} ⟩_{b} + (- 1)^{d \cdot x_{1}} | r \cdot x_{1} ⟩_{b}$ , which is one of ${| 0 ⟩, | 1 ⟩, | + ⟩, | - ⟩}$ . Crucially, it is cryptographically hard to predict whether this state is one of ${| 0 ⟩, | 1 ⟩}$ or ${| + ⟩, | - ⟩}$ .

The final round of our protocol can be understood in analogy to the CHSH game [CHS+69]. While the prover cannot extract the polarization axis from their single qubit (echoing the no-signaling property of CHSH), they can make a measurement that is correlated with it. This measurement outcome ultimately constitutes the proof of quantumness. In particular, the verifier requests a measurement in an intermediate basis, rotated from the $Z$ axis around $Y$ , by either $θ = π / 4$ or $- π / 4$ . Because the measurement basis is never perpendicular to the state, there will always be one outcome that is more likely than the other (specifically, with probability ${cos}^{2} (π / 8) \approx 0.85$ ). The verifier returns $A c c e p t$ if this “more likely” outcome is the one received.

In the next section, we prove that a quantum device can cause the verifier to $A c c e p t$ with substantially higher probability than any classical prover. A full test of quantum advantage would consist of running the protocol many times, until it can be established with high statistical confidence that the device has exceeded the classical probability bound.

5.3.2 Completeness and soundness

We now prove completeness (the noise-free quantum success probability) and soundness (an upper bound on the classical success probability). Recall that after the first round of the protocol, the verifier chooses to either request a standard basis measurement of the first register, or to continue with the second and third rounds. In the proofs below, we analyze the prover’s success probability across these two cases separately. We denote the probability that the verifier will accept the prover’s string $x$ in the first case as $p_{x}$ , and the probability that the verifier will accept the single-qubit measurement result in the second case as $p_{C H S H}$ .

Perfect quantum prover (completeness)

Theorem 2.

An error-free quantum device honestly following the interactive protocol will cause the verifier to return $A c c e p t$ with $p_{x} = 1$ and $p_{C H S H} = {cos}^{2} (π / 8) \approx 0.85$ .

Proof.

If the verifier chooses to request a projective measurement of $x$ after the first round, an honest quantum prover succeeds with probability $p_{x} = 1$ by inspection.

If the verifier chooses to instead perform the rest of the protocol, the prover will hold one of ${| 0 ⟩, | 1 ⟩, | + ⟩, | - ⟩}$ after round 2. In either measurement basis the verifier may request in round 3, there will be one outcome that occurs with probability ${cos}^{2} (π / 8)$ , which is by construction the one the verifier accepts. Thus, an honest quantum prover has $p_{C H S H} = {cos}^{2} (π / 8) \approx 0.85$ . ∎

Classical prover (soundness)

Theorem 3.

Assume the function family used in the interactive protocol is claw-free. Then, $p_{x}$ and $p_{C H S H}$ for any classical prover must obey the relation

p_{x} + 4 p_{C H S H} - 4 < ϵ (n)

(5.1)

where $ϵ$ is a negligible function of $n$ , the length of the function family’s input strings.

Proof.

We prove by contradiction. Assume that there exists a classical machine $A$ for which $p_{x} + 4 p_{C H S H} - 4 \geq μ (n)$ , for a non-negligible function $μ$ . We show that there exists another algorithm $B$ that uses $A$ as a subroutine to find a pair of colliding inputs to the claw-free function, a contradiction.

Given a claw-free function instance $f_{i}$ , $B$ acts as a simulated verifier for $A$ . $B$ begins by supplying $f_{i}$ to $A$ , after which $A$ returns a value $y$ , completing the first round of interaction. $B$ now chooses to request the projective measurement of the $x$ register, and stores the result as $x_{0}$ . Letting $p_{x_{0}}$ be the probability that $x_{0}$ is a valid preimage, by definition of $p_{x}$ we have $p_{x_{0}} = p_{x}$ .

Next, $B$ rewinds the execution of $A$ , to its state before $x_{0}$ was requested. Crucially, rewinding is possible because $A$ is a classical algorithm. $B$ now proceeds by running $A$ through the second and third rounds of the protocol for many different values of the bitstring $r$ (Fig. 1), rewinding each time.

We now show that, for $r$ selected uniformly at random, $B$ can extract the value of the inner product $r \cdot x_{1}$ with probability $p_{r \cdot x_{1}} \geq 1 - 2 (1 - p_{C H S H})$ . $B$ begins by sending $r$ to $A$ , and receiving the bitstring $d$ . $B$ then requests the measurement result in both the $θ = π / 4$ and $θ = - π / 4$ bases, by rewinding in between. Supposing that both the received values are “correct” (i.e. would be accepted by the real verifier), they uniquely determine the single-qubit state $| ψ ⟩ \in {| 0 ⟩, | 1 ⟩, | + ⟩, | - ⟩}$ that would be held by an honest quantum prover. This state reveals whether $r \cdot x_{0} = r \cdot x_{1}$ , and because $B$ already holds $x_{0}$ , $B$ can compute $r \cdot x_{1}$ . We may define the probability (taken over all randomness except the choice of $θ$ ) that the prover returns an accepting value in the cases $θ = π / 4$ and $θ = - π / 4$ as $p_{π / 4}$ and $p_{- π / 4}$ respectively. Then, via union bound, the probability that both are indeed correct is $p_{r \cdot x_{1}} \geq 1 - (1 - p_{π / 4}) - (1 - p_{- π / 4})$ . Considering that $p_{C H S H} = (p_{π / 4} + p_{- π / 4}) / 2$ , we have $p_{r \cdot x_{1}} \geq 1 - 2 (1 - p_{C H S H})$ .

Now, we show that extracting $r \cdot x_{1}$ in this way allows $x_{1}$ to be determined in full even in the presence of noise, by rewinding many times and querying for specific (correlated) choices of $r$ . In particular, the above construction is a noisy oracle to the encoding of $x_{1}$ under the Hadamard code. By the Goldreich-Levin theorem [GL89], list decoding applied to such an oracle will generate a polynomial-length list of candidates for $x_{1}$ . If the noise rate of the oracle is noticeably less than $1 / 2$ , $x_{1}$ will be contained in that list; $B$ can iterate through the candidates until it finds one for which $f (x_{1}) = y$ .

By Lemma 1 in the Methods, for a particular iteration of the protocol, the probability that list decoding succeeds is bounded by $p_{x_{1}} > 2 p_{r \cdot x_{1}} - 1 - 2 μ^{'} (n)$ , for a noticeable function $μ^{'} (n)$ of our choice ⁵⁵ 5 The oracle’s noise rate is not simply $p_{r \cdot x_{1}}$ : that is the probability that any single value $r \cdot x_{1}$ is correct, but all of the queries to the oracle are correlated (they are for the same iteration of the protocol, and thus the same value of $y$ ).. Setting $μ^{'} (n) = μ (n) / 4$ and combining with the previous result yields $p_{x_{1}} > 1 - 4 (1 - p_{C H S H}) - μ (n) / 2$ .

Finally, via union bound, the probability that $B$ returns a claw is

P_{B} \geq 1 - (1 - p_{x_{0}}) - (1 - p_{x_{1}}) > p_{x} + 4 p_{C H S H} - 4 - μ (n) / 2

and via the assumption that $p_{x} + 4 p_{C H S H} - 4 > μ (n)$ we have

P_{B} > μ (n) / 2

a contradiction. ∎

If we let $p_{x} = 1$ , the bound requires that $p_{C H S H} < 3 / 4 + ϵ (n)$ for a classical device, while $p_{C H S H} \approx 0.85$ for a quantum device, matching the classical and quantum success probabilities of the CHSH game. In Section 5.7.7, we provide an example of a classical algorithm saturating the bound with $p_{x} = 1$ and $p_{C H S H} = 3 / 4$ .

5.3.3 Variations on the protocol

In this section we describe two variations on the protocol, the goal of both of which is to remove the need for the “preimage” test (Step 6a of Fig. 5.1). The main benefit of doing so is that it simplifies and improves the classical bound, to simply $p \leq 3 / 4 + ϵ (n)$ , where $p$ now is the overall probability that the prover succeeds (equivalent to $p_{C H S H}$ in the normal protocol, because $p_{x}$ no longer exists). A secondary benefit is that it slightly simplifies the experimental implementation by making the protocol less complicated.

The idea is simple: in Step 6b of Fig. 5.1, instead of choosing a single random bitstring $r$ , the verifier chooses two, $r_{0}$ and $r_{1}$ . Then, in Step 7b, instead of using the single $r$ for both $x_{0}$ and $x_{1}$ , the prover instead computes $| r_{0} \cdot x_{0} ⟩_{b} | x_{0} ⟩_{x} + | r_{1} \cdot x_{1} ⟩_{b} | x_{1} ⟩_{x}$ —a different inner product for each of the preimages. Applying the proof of Theorem 3 to this scheme, the responses of the classical machine $A$ can be used to reconstruct whether $r_{0} \cdot x_{0} = r_{1} \cdot x_{1}$ (where originally we reconstructed simply whether $r \cdot x_{0} = r \cdot x_{1}$ ). The key insight is that the truth value of this new equality is equal to $(r_{0} | | r_{1}) \cdot (x_{0} | | x_{1})$ , where $| |$ denote concatenation. This fact can be used to construct a noisy oracle for the inner product of $x_{0} | | x_{1}$ with arbitrary strings, to which the Goldreich-Levin theorem can be applied to find $x_{0} | | x_{1}$ , fully revealing both $x_{0}$ and $x_{1}$ . (This should be compared to the original proof, which could only decode $x_{0} \oplus x_{1}$ via the Goldreich-Levin theorem, and thus required the preimage test to supply $x_{0}$ or $x_{1}$ and thus reveal the claw). Since $x_{0}$ and $x_{1}$ can both be reconstructed from only the CHSH portion of the protocol, the “preimage” test is not necessary for classical hardness and can be removed.⁶⁶ 6 This variant of the protocol was published in [BGK+23].

The downside of this variation of the protocol is that the prover needs to somehow be able to distinguish $x_{0}$ from $x_{1}$ , so that the appropriate inner product can be taken with each. For many TCFs, such as the one based on LWE [BCM+21] and the DDH-based TCF we present in this chapter, this is not a problem—there is an extra qubit in the preimages which is in the state $| 0 ⟩$ for $x_{0}$ and $| 1 ⟩$ for $x_{1}$ . However for $x^{2} mod N$ , it is not so straightforward. Via the Jacobi symbol it is technically possible to distinguish the two preimages, because it is a fact of $x^{2} mod N$ that one preimage will have Jacobi symbol $+ 1$ and the other $- 1$ . However actually computing the Jacobi symbol is very expensive, much moreso than computing $x^{2} mod N$ itself, defeating our goal of having an efficient implementation! Another somewhat less expensive strategy is to switch to the pair of functions $f_{0} (x) = x^{2} mod N$ and $f_{1} (x) = 4 x^{2} mod N$ , with their domain defined as the set of quadratic residues less than $N$ (instead of the set of integers $[0, N / 2]$ that were used before). By splitting into two functions we get the desired “marker” qubit distinguishing the two preimages, but we run into the problem of generating a uniform superposition of quadratic residues modulo $N$ . To our knowledge the best way to generate such a superposition is to start with the set of all integers less than $N$ , and square them. Then, another square must be taken to actually implement the TCF. So, it seems that using this TCF would require a quantum circuit twice as large as the original protocol using $x^{2} mod N$ —a tradeoff that is probably not worth it for the extra simplicity of removing the preimage test. That being said, if a function other than $x^{2} mod N$ is used which does have the extra qubit, this variation is almost certainly the right choice.

We also note that we learned via personal correspondence with Eitan Porat, Zvika Brakerski, and Thomas Vidick that they found that the original protocol is actually classically hard without the preimage test. The intuitive idea is that we can learn more information from the “measurement results” than just whether $r \cdot x_{0} = r \cdot x_{1}$ . In particular, when that equality holds, we also get access to the value of $r \cdot x_{0}$ (and $r \cdot x_{1}$ , since they are equal). With this extra information it is possible to use a more complicated scheme based on the Goldreich-Levin theorem to decode $x_{0}$ and $x_{1}$ in full, proving the hardness of passing just the CHSH portion directly from the hardness of finding claws. However, apparently the classical bound is not quite as powerful due to the more complicated decoding process.

5.3.4 Robustness: Error mitigation via postselection

The existence of a finite gap between the classical and quantum success probabilities implies that our protocol can tolerate a certain amount of noise. A direct implementation of our interactive protocol on a noisy quantum device would require an overall fidelity of $\sim 83$ % in order to exceed the classical bound ⁷⁷ 7 This number comes from solving the classical bound (Equation 5.1) for circuit fidelity $F$ , with $p_{x} = F$ and $p_{C H S H} = \frac{1}{2} + F / 2$ .. To allow devices with lower fidelities to demonstrate quantum advantage, our protocol allows for a natural tradeoff between fidelity and runtime, such that the classical bound can, in principle, be exceeded with only a small [e.g. $1 / p o l y (n)$ ] amount of coherence in the quantum device ⁸⁸ 8 This is true even if the coherence is exponentially small in $n$ . Of course, with arbitrarily low coherence the runtime may become excessively large such that quantum advantage cannot be demonstrated—the point is that regardless of runtime, the classical probability bound can be exceeded with a device that has arbitrarily low circuit fidelity..

Figure 5.2: Performance of our post-selection scheme. Redundancy is added to the function

x^{2} mod N

by mapping it to

(3^{a} x)^{2} mod 3^{2 a} N

. Numerical simulations are performed on a quantum circuit implementing the Karatsuba algorithm for

a = {0, 1, 2, 3}

(see Section 5.7.9). (a) “Quantumness” measured in terms of the classical bound from Eqn. 5.1 as a function of the total circuit fidelity. With

a = 3

, even a quantum device with only 1% circuit fidelity can demonstrate quantum advantage. (b) Depicts the increased runtime associated the post-selection scheme, which arises from a combination of slightly larger circuit sizes and the need to re-run the circuit multiple times. The latter is by far the dominant effect. Dashed lines are a theory prediction with no fit parameters; points are the result of numerical simulations at

n = 512

bits and error bars depict

2 σ

uncertainty.

The key idea is based upon postselection. For most TCFs, there are many bitstrings of the correct length that are not valid outputs of $f$ . Thus, if the prover detects such a $y$ value in step 3 (Fig. 1), they can simply discard it and try again ⁹⁹ 9 This scheme will only remove errors in the first round of the protocol, but fortunately, one expects the overwhelming majority of the quantum computation, and thus also the majority of errors, to occur in that round.. In principle, the verifier can even use their trapdoor data to silently detect and discard iterations of the protocol with invalid $y$ ¹⁰¹⁰ 10 This procedure does not leak data to a classical cheater, because the verifier does not communicate which runs were discarded. Furthermore, it does not affect the soundness of Theorem 3, because the machine $B$ in that theorem’s proof can simply iterate until it encounters a valid $y$ .. Since $y$ is a function of $x_{0}$ and $x_{1}$ , one might hope that this postselection scheme also rejects states where $x_{0}$ or $x_{1}$ has become corrupt. Although this may not always be the case, we demonstrate numerically that this assumption holds for a specific implementation of $x^{2} mod N$ in the following subsection. One could also compute a classical checksum of $x_{0}$ and $x_{1}$ before and after the main circuit to ensure that they have not changed during its execution. Assuming that such bit-flip errors are indeed rejected, the possibility remains of an error in the phase between $| x_{0} ⟩$ and $| x_{1} ⟩$ . In Section 5.7.9, we demonstrate that a prover holding the correct bitstrings but with an error in the phase can still saturate the classical bound; if the prover can avoid phase errors even a small fraction of the time, they will push past the classical threshold.

Numerical analysis of the postselection scheme for $x^{2} mod N$

Focusing on the function $f (x) = x^{2} mod N$ , we now explicitly analyze the effectiveness of the postselection scheme. Let $m$ be the length of the outputs of this function. In this case, approximately $1 / 4$ of the bitstrings of length $m$ are valid outputs, so one would naively expect to reject about $3 / 4$ of corrupted bitstrings. By introducing additional redundancy into the outputs of $f$ and thus increasing $m$ , one can further decrease the probability that a corrupted $y$ will incorrectly be accepted. As an example, let us consider mapping $x^{2} mod N$ to the function $(k x)^{2} mod k^{2} N$ for some integer $k$ . This is particularly convenient because the prover can validate $y$ by simply checking whether it is a multiple of $k^{2}$ . Moreover, the mapping adds only $log k$ bits to the size of the problem, while rejecting a fraction $1 - 1 / k^{2}$ of corrupted bitstrings.

We perform extensive numerical simulations demonstrating that postselection allows for quantum advantage to be achieved using noisy devices with low circuit fidelities (Fig. 2). We simulate quantum circuits for $(k x)^{2} mod k^{2} N$ at a problem size of $n = 512$ bits. Assuming a uniform gate fidelity across the circuit, we analyze the success rate of a quantum prover for $k = 3^{a}$ and $a = {0, 1, 2, 3}$ . For these simulations we use our implementation of the Karatsuba algorithm (see Section 5.5.1) because it is the most efficient in terms of gate count and depth. The choice of $k = 3^{a}$ , and details of the simulation, are explained in Section 5.7.9.

For $a = 0$ , the circuit implements our original function $x^{2} mod N$ , where in the absence of postselection, an overall circuit fidelity of $F \sim 0.83$ is required to achieve quantum advantage. As depicted in Fig. 5.2(a), even for $a = 0$ , our postselection scheme improves the advantage threshold down to $F \sim 0.51$ . For $a = 2$ , circuit fidelities with $F ≳ 0.1$ remain well above the quantum advantage threshold, while for $a = 3$ the required circuit fidelity drops below $1 %$ .

However, there is a tradeoff. In particular, one expects the overall runtime to increase for two reasons: (i) there will be a slight increase in the circuit size for $a > 0$ and (ii) one may need to re-run the quantum circuit many times until a valid $y$ is measured. Somewhat remarkably, a runtime overhead of only $4.7$ x already enables quantum advantage to be achieved with an overall circuit fidelity of $10 %$ [Fig. 5.2(b)]. Crucially, this increase in runtime is overwhelmingly due to re-running the quantum circuit and does not imply the need for longer experimental coherence times.

5.3.5 Efficient quantum evaluation of irreversible classical circuits

The central computational step in our interactive protocol (i.e. step 2, Fig. 5.1) is for the prover to apply a unitary of the form:

U_{f_{i}} \sum x | x ⟩_{x} | 0^{\otimes m} ⟩_{y} = \sum x | x ⟩_{x} | f_{i} (x) ⟩_{y},

(5.2)

where $f_{i} (x)$ is a classical function and $m$ is the length of the output register. This type of unitary operation is ubiquitous across quantum algorithms, and a common strategy for its implementation is to convert the gates of a classical circuit into quantum gates. Generically, this process induces substantial overhead in both time and space complexity owing to the need to make the circuit reversible to preserve unitarity [BEN89, LS90]. This reversibility is often achieved by using an additional register, $g$ , of so-called “garbage bits” and implementing: $U_{f_{i}}^{'} \sum_{x} | x ⟩_{x} | 0^{\otimes m} ⟩_{y} | 0^{\otimes l} ⟩_{g} = \sum_{x} | x ⟩_{x} | f_{i} (x) ⟩_{y} | g_{i} (x) ⟩_{g}$ . For each gate in the classical circuit, enough garbage bits are added to make the operation injective. In general, to maintain coherence, these bits cannot be discarded but must be “uncomputed” later, adding significant complexity to the circuits.

A particularly appealing feature of our protocol is the existence of a measurement scheme to discard garbage bits, allowing for the direct mapping of classical to quantum circuits with no overhead. Specifically, we envision the prover measuring the qubits of the $g$ register in the Hadamard basis and storing the results as a bitstring $h$ , yielding the state,

| ψ ⟩ = \sum x (- 1)^{h \cdot g_{i} (x)} | x ⟩_{x} | f_{i} (x) ⟩_{y} .

(5.3)

The prover has avoided the need to do any uncomputation of the garbage bits, at the expense of introducing phase flips onto some elements of the superposition. These phase flips do not affect the protocol, so long as the verifier can determine them. While classically computing $h \cdot g_{i} (x)$ is efficient for any $x$ , computing it for all terms in the superposition is infeasible for the verifier. However, our protocol provides a natural way around this. The verifier can wait until the prover has collapsed the superposition onto $x_{0}$ and $x_{1}$ , before evaluating $g_{i} (x)$ only on those two inputs ¹¹¹¹ 11 This is true because $g_{i} (x)$ is the result of adding extra output bits to the gates of a classical circuit, which is efficient to evaluate on any input..

Crucially, the prover can measure away garbage qubits as soon as they would be discarded classically, instead of waiting until the computation has completed. If these qubits are then reused, the quantum circuit will use no more space than the classical one. This feature allows for significant improvements in both gate depth and qubit number for practical implementations of the protocol (see last rows of Table I in Methods). We note that performing many individual measurements on a subset of the qubits is difficult on some experimental systems, which may make this technique challenging to use in practice. However, recent hardware advances have demonstrated these “intermediate measurements” in practice with high fidelity, for example by spatially shuttling trapped ions [ZKL+21, RBL+21]. We thus expect that the capability to perform partial measurements will not be a barrier in the near term. This issue can also be mitigated somewhat by collecting ancilla qubits and measuring them in batches rather than one-by-one, allowing for a direct trade-off between ancilla usage and the number of partial measurements.

5.4 The search for alternative trapdoor claw-free functions

Before moving on to proposals for the physical implementation of this protocol, I would like to briefly summarize some of my unsuccessful efforts to find new constructions for trapdoor claw-free functions, in hope that it can be helpful for anyone trying to do so in the future. Broadly, the goal is to come up with a TCF that can be implemented in as small a quantum circuit as possible—primarily in terms of number of qubits and number of gates. Other potentially important statistics include circuit depth (parallelism) and spatial locality of the gates.

We will focus on the $x^{2} mod N$ -based TCF in the later sections of this chapter because it seems to strike the best balance in achieving the goals above, but it is not perfect because the modulus $N$ needs to be quite large for the problem to be classically hard—which has negative consequences for both the qubit and gate counts. For example, considering just qubit count for the moment, if we desire the security of a $1024$ -bit modulus, there is a hard lower bound of $1024$ qubits required to implement the circuit (and in practice, the circuit will probably require a considerable amount more than that). This should be compared to the fact that in the average case, circuits of fewer than 100 qubits with sufficient depth are infeasible to classically simulate—so there is a large gap between the hardness of simulation and the hardness of the cryptography. Ideally, we would make that gap as small as possible. The DDH-based TCF also proposed in this chapter has the potential to improve the gap considerably: when implemented using elliptic curve cryptography, the group elements can be as small as a couple hundred bits long and the hardness assumption remains secure. Unfortunately, the gate count required to implement that TCF is dramatically worse than for $x^{2} mod N$ , and that is why we do not focus our efforts on building circuits for it.

Given these considerations, I expended a considerable effort in looking for other cryptographic assumptions that could be used to build a trapdoor claw-free function. Coming up with new, more efficient TCFs directly from the ground up is a daunting pursuit: finding ways to make public-key cryptography more efficient is of central concern for classical cryptography, so it has been a subject of intense research for years. So instead of trying to break new ground there, a more modest goal is to take other existing schemes for public-key cryptography which do not have the precise structure of a TCF, and build TCFs out of them.

In my efforts to do so, one promising candidate seemed to be the Learning Parity with Noise (LPN) problem, which has found use for classical cryptography in devices with very limited computational power such as RFID cards. The structure of the LPN problem is similar to that of LWE, but the linear algebra takes place over the field $F_{2}$ of binary numbers instead of integers modulo some large $q$ . [PIE12] To be explicit, consider a binary matrix $A \in {0, 1}^{m \times n}$ , with, say, $m = 2 n$ . For a secret string $s \in {0, 1}^{n}$ and “error” vector $e \in {0, 1}^{m}$ , consider the “noisy” image of $s$ defined as $y = A s + e$ . The LPN hardness assumption states that for appropriate setting of the problem parameters, given only $y$ and $A$ it is computationally hard (even for a quantum computer) to recover $s$ unless $A$ has some special structure.¹²¹² 12 When I first learned about LPN I got extremely interested in exploring the classical hardness of the problem. I wrote the first (to my knowledge) GPU-accelerated solver for it, and ended up breaking the world record for the largest instance that had been solved. After about a year I was unseated by another GPU-based implementation. The competition can be found here: https://decodingchallenge.org/syndrome, I encourage the reader to try their hand at it! Obviously this is the case if the noise vector $e$ is overwhelming; the problem is interesting because this seems to hold even when $e$ is quite sparse (most entries are zero). One can see the potential here for simplicity of implementation: performing the linear algebra requires only addition and multiplication of numbers in $F_{2}$ , which corresponds simply to XOR and AND gates. This is dramatically less complicated than the addition and multiplication circuits for integers modulo some large value $q$ , which are required to implement LWE.

The challenge is to figure out how to build a TCF out of this hardness assumption. Considering the similarity of the LWE and LPN problems, an obvious idea is to follow the structure of the LWE TCF, and define two functions roughly as

	$f_{0} (x) =$	$A x$		(5.4)
	$f_{1} (x) =$	$A x + y$		(5.5)

Using the definition of $y$ , we see that $f_{1} (x) = A (x + s) + e$ , and thus that for a pair $(x_{0}, x_{1})$ where $x_{0} = x_{1} + s$ , we have $f_{0} (x_{0}) = f_{1} (x_{1}) + e$ —that is, it is almost a claw, aside from the error vector $e$ (which has most entries set to zero). But for the protocol to work, we need an exact collision, rather than an approximate one. In LWE, this is done by adding extra error $e^{'}$ to the output of both $f_{0}$ and $f_{1}$ , to “smear out” the values. If the distribution of $e^{'}$ is sufficiently wider than the distribution of $e$ , then $e$ disappears into the noise and the probability distributions have good overlap, yielding collisions. Unfortunately, despite considerable effort, it does not seem that it is possible to do the same trick with LPN. The problem stems from the same reason that LPN seemed promising: the linear algebra is over $F_{2}$ instead of $F_{q}$ . Intuitively, because each value can only be 0 or 1, there is simply no “room” to have a wider probability distribution for the elements of an extra noise vector $e^{'}$ . (In fact, the LWE TCF requires $q$ to be very large precisely for this reason). Perhaps there is some other scheme to create exact collisions from these near-collisions in LPN, like rounding the outputs somehow, but I was never able to find one.

Looking at the problem more broadly, it actually seems very unlikely that it is possible to create perfect collisions in this way, because it turns out doing so would break the assumption of post-quantum hardness of LPN, which is widely believed to hold. The reason is because this pair of functions could be used as an oracle for Simon’s algorithm, which would allow a quantum device to very efficiently find $s$ . [SIM97] The only hope seems to be the fact that Simon’s algorithm requires the functions to perfectly collide all but an exponentially small fraction of the time, so perhaps if the collisions are not perfect, the LPN assumption would not be broken. However, even broadening the search to look for such “noisy” TCFs based on LPN has yet to yield any useful constructions. One last idea is that maybe there is a way to use LPN in an entirely different manner to create a TCF—but for that, it’s not even clear where to start.

5.5 Quantum circuits for trapdoor claw-free functions

Figure 5.3: Quantum circuits implementing step 2 of our interactive protocol for

f (x) = x^{2} mod N

n

is the length of the input register, and

m = n + O (1)

is the length of the output register. (a) Depicts a quantum circuit optimized for qubit number. The circuit shown computes the

k^{t h}

bit of

w = x^{2} / N

and should be iterated for

k

. This iteration should begin at the least significant bit to ensure that the final phase rotation can be estimated classically. Note that the only entangling operations necessary for the circuit are doubly-controlled gates, which can be natively implemented using the Rydberg blockade (see Section 5.5.3). (b) Depicts a quantum circuit optimized for gate number. By combining gates of equal phase, one can reduce the overall circuit complexity to

O (n^{2} log n)

gates. We note that neither circuit requires use of the “garbage bit” procedure described in Section 5.3.5; this design choice reduces measurement complexity. If desired, that procedure could be applied to the counter register of circuit (b) in place of the controlled-decrement operation.

As just discussed, while all of the trapdoor, claw-free functions listed in Table 5.1 can be utilized within our interactive protocol, each has its own set of advantages and disadvantages. For example, the TCF based on the Diffie-Hellman problem (described in the Methods) already enables a demonstration of quantum advantage at a key size of 160 bits (with a hardness equivalent to 1024 bit integer factorization [BAR16]); however, building a circuit for this TCF requires a quantum implementation of Euclid’s algorithm, which is challenging [HJN+20]. Thus, we focus on designing quantum circuits implementing Rabin’s function, $x^{2} mod N$ .

5.5.1 Quantum circuits for $x^{2} mod N$

In Chapter 7 we present what to our knowledge are the most highly optimized circuits known for $x^{2} mod N$ . Here, we present four more basic circuits, that exhibit the range of possible implementations of $x^{2} mod N$ and provide a good comparison for the optimizations in that chapter. For the circuits presented here, implementations in Python using the Cirq library are included as supplementary files ¹³¹³ 13 Code is available at https://github.com/GregDMeyer/quantum-advantage and is archived on Zenodo [MEY22]. The first two are quantum implementations of classical circuits for the Karatsuba and “schoolbook” classical integer multiplication algorithms, where we leverage the reversibility optimizations described in Section 5.3.5 (see Section 5.7.8 for details of their implementation). The latter pair, which we call the “phase circuits” and describe below, are intrinsically quantum algorithms that use Ising interactions to directly compute $x^{2} mod N$ in the phase. Using those circuits, we propose a near-term demonstration of our interactive protocol on a Rydberg-based quantum computer [LKS+19, BL20]; crucially, the so-called “Rydberg blockade” interaction natively realizes multi-qubit controlled phase rotations, from which the entire circuits shown in Figure 3 are built (up to single qubit rotations). A comparison of approximate gate counts for each of the four circuits can be seen in Table I in the Methods. Of the circuits explored here, the Karatsuba algorithm is the most efficient in total gates and circuit depth, while the phase circuits are most efficient in terms of qubit usage and measurement complexity. Chapter 7 manages to combine the benefits of both, yielding circuits with gate counts better than the Karatsuba circuits here and qubit usage and measurement complexity comparable to the phase circuits.

5.5.2 Phase circuits

We now describe the two circuits, amenable to near-term quantum devices, that utilize quantum phase estimation to implement the function $f (x) = x^{2} mod N$ . The intuition behind our approach is as follows: we will compute $x^{2} / N$ in the phase and transfer it to an output register via an inverse quantum Fourier transform [DRA00, BEA03]; the modulo operation occurs automatically as the phase wraps around the unit circle, avoiding the need for a separate reduction step.

In order to implement $\sum_{x} | x ⟩_{x} | x^{2} mod N ⟩_{y}$ , we design a circuit to compute:

(I \otimes I Q F T) {~ U}_{w_{N}} (I \otimes H^{\otimes m}) | x ⟩ | 0^{\otimes m} ⟩ = | x ⟩ | w ⟩

(5.6)

where $H$ is a Hadamard gate, $I Q F T$ represents an inverse quantum Fourier transform, $w \equiv x^{2} / N = 0. w_{1} w_{2} \dots w_{m}$ is an $m$ -bit binary fraction ¹⁴¹⁴ 14 We must take $m > n + O (1)$ to sufficiently resolve the value $x^{2} mod N$ in post-processing, and ${~ U}_{w_{N}}$ is the diagonal unitary,

{~ U}_{w_{N}} | x ⟩ | z ⟩ = exp (2 π i \frac{x^{2}}{N} z) | x ⟩ | z ⟩ .

(5.7)

The simplest circuit to implement ${~ U}_{w_{N}}$ simply decomposese $x$ and $z$ in binary, and performs a digit-by-digit multiplication using the schoolbook algorithm:

exp (2 π i \frac{x^{2}}{N} z) = \prod i, j, k exp (2 π i \frac{2^{i + j + k}}{N} x_{i} x_{j} z_{k}),

(5.8)

With this, one immediately finds that ${~ U}_{w_{N}}$ is equivalent to applying a series of controlled-controlled-phase rotation gates of angle,

ϕ_{i j k} = \frac{2 π 2^{i + j + k}}{N} (mod 2 π) .

(5.9)

Here, the control qubits are $i, j$ in the $x$ register, while the target qubit is $k$ in the $y$ register. Crucially, the value of this phase for any $i, j, k$ can be computed classically when the circuit is compiled.

Figure 5.3 shows two explicit circuits to implement ${~ U}_{w_{N}}$ , one optimizing for qubit count, and the other optimizing for gate count. The first circuit [Fig. 5.3(a)] takes advantage of the fact that the output register is measured immediately after it is computed; this allows one to replace the $m$ output qubits with a single qubit that is measured and reused $m$ times. Moreover, by replacing groups of doubly-controlled gates with a Toffoli and a series of singly-controlled gates, one ultimately arrives at an implementation, which requires $n^{3} / 2 + O (n^{2})$ gates, but only $n + O (1)$ qubits. We note that this does require individual measurement and re-use of qubits, which has been a challenge for experiments; recent experiments however have demonstrated this capability [ZKL+21, RBL+21].

The second circuit [Fig. 5.3(b)], which optimizes for gate count, leverages the fact that $ϕ_{i j k}$ (Eqn. 5.9) only depends on $i + j + k$ , allowing one to combine gates with a common sum. In this case, one can define $ℓ = i + j$ and then, for each value of $ℓ$ , simply “count” the number of values of $i, j$ for which both control qubits are 1. By then performing controlled gates off of the qubits of the counter register, one can reduce the total gate complexity by a factor of $n / log n$ , leading to a implementation with $2 n^{2} log n + O (n^{2})$ gates.

Figure 5.4: Physical implementation in a Rydberg atom quantum computer. (a) Schematic illustration of a three dimensional array of neutral atoms with Rydberg blockade interactions. The blockade radius can be significantly larger than the inter-atom spacing, enabling multi-qubit entangling operations. (b) As an example, Rydberg atoms can be trapped in an optical tweezer array. The presence of an atom in a Rydberg excited state (red) shifts the energy levels of nearby atoms (blue), preventing the driving field (yellow arrow) from exciting them to their Rydberg state,

| r ⟩

. (c) A single qubit phase rotation can be implemented by an off-resonant Rabi oscillation between one of the qubit states, e.g.,

| 1 ⟩

, and the Rydberg excited state. This imprints a tunable, geometric phase

ϕ

, which is determined by the detuning

Δ

and Rabi frequency

Ω

. (d) Multi-qubit controlled-phase rotations are implemented via a sequence of

π

-pulses between the

| 0 ⟩ \leftrightarrow | r ⟩

transition of control atoms (yellow) and off-resonant Rabi oscillations on the target atoms (orange).

5.5.3 Experimental implementation

Motivated by recent advances in the creation and control of many-body entanglement in programmable quantum systems [ZPH+17, AAB+19, SSW+21, EWL+21], we propose an experimental implementation of our interactive protocol based upon neutral atoms coupled to Rydberg states [BL20]. We envision a three dimensional system of either alkali or alkaline-earth atoms trapped in an optical lattice or optical tweezer array [Fig. 5.4(a)] [WZC+15, WKW+16, KWG+18]. To be specific, we consider $^{87} R b$ with an effective qubit degree of freedom encoded in hyperfine states: $| 0 ⟩ = | F = 1, m_{F} = 0 ⟩$ and $| 1 ⟩ = | F = 2, m_{F} = 0 ⟩$ . Gates between atoms are mediated by coupling to a highly-excited Rydberg state $| r ⟩$ , whose large polarizability leads to strong van der Waals interactions. This microscopic interaction enables the so-called Rydberg “blockade” mechanism—when a single atom is driven to its Rydberg state, all other atoms within a blockade radius, $R_{b}$ , become off-resonant from the drive, thereby suppressing their excitation [Fig. 5.4(a,b)] [SAF16].

Somewhat remarkably, this blockade interaction enables the native implementation of all multi-qubit-controlled phase gates depicted in the circuits in Figure 5.3. In particular, consider the goal of applying a $C^{k} R_{ϕ}^{ℓ}$ gate; this gate applies phase rotations, ${ϕ_{1}, ϕ_{2}, \dots, ϕ_{ℓ}}$ , to target qubits ${j_{1}, j_{2}, \dots j_{ℓ}}$ if all $k$ control qubits ${i_{1}, i_{2}, \dots i_{k}}$ are in the $| 1 ⟩$ state [Fig. 5.4(d)]. Experimentally, this can be implemented as follows: (i) sequentially apply (in any order) resonant $π$ -pulses on the $| 0 ⟩ \leftrightarrow | r ⟩$ transition for the $k$ desired control atoms, (ii) off-resonantly drive the $| 1 ⟩ \leftrightarrow | r ⟩$ transition of each target atom with detuning $Δ$ and Rabi frequency $Ω$ for a time duration $T = 2 π / (Ω^{2} + Δ^{2})^{1 / 2}$ [Fig. 5.4(c)], (iii) sequentially apply [in the opposite order as in (i)] resonant $- π$ -pulses (i.e. $π$ -pulses with the opposite phase) to the $k$ control atoms to bring them back to their original state. The intuition for why this experimental sequence implements the $C^{k} R_{ϕ}^{ℓ}$ gate is straightforward. The first step creates a blockade if any of the control qubits are in the $| 0 ⟩$ state, while the second step imprints a phase, $ϕ = π (1 - Δ / \sqrt{Δ^{2} + Ω^{2}})$ , on the $| 1 ⟩$ state, only in the absence of a blockade. Note that tuning the values of $ϕ_{i}$ for each of the target qubits simply corresponds to adjusting the detuning and Rabi frequency of the off-resonant drive in the second step [Fig. 5.4(c,d)].

Demonstrations of our protocol can already be implemented in current generation Rydberg experiments, where a number of essential features have recently been shown, including: 1) the coherent manipulation of individual qubits trapped in a 3D tweezer array [WZC+15, WKW+16], 2) the deterministic loading of atoms in a 3D optical lattice [KWG+18], and 3) fast entangling gate operations with fidelities, $F \geq 0.974$ [LKS+19, GKG+19, MCS+20]. In order to estimate the number of entangling gates achievable within decoherence time scales, let us imagine choosing a Rydberg state with a principal quantum number $n \approx 70$ . This yields a strong van der Waals interaction, $V (\to r) = C_{6} / r^{6}$ , with a $C_{6}$ coefficient $\sim (2 π) 880$ GHz $\cdot μ$ m⁶ [LWN+12]. Combined with a coherent driving field of Rabi frequency $Ω \sim (2 π) 1 - 10$ MHz, the van der Waals interaction can lead to a blockade radius of up to, $R_{b} = {(C_{6} / Ω)}_{6}^{1 / 6} \sim 10 μ$ m. Within this radius, one can arrange $\sim 10^{2}$ all-to-all interacting qubits, assuming an atom-to-atom spacing of approximately, $a_{0} \approx 2 μ$ m ¹⁵¹⁵ 15 We note that this spacing is ultimately limited by a combination of the optical diffraction limit and the orbital size of $n \approx 70$ Rydberg states.. In current experiments, the decoherence associated with the Rydberg transition is typically limited by a combination of inhomogeneous Doppler shifts and laser phase/intensity noise, leading to $1 / T_{2} \sim 10 - 100$ kHz [dBL+18, LKS+19, LSF+21b]. Taking everything together, one should be able to perform $\sim 10^{3}$ entangling gates before decoherence occurs (this is comparable to the number of two-qubit entangling gates possible in other state-of-the-art platforms [AAB+19, SBT+18]). While this falls short of enabling an immediate full-scale demonstration of classically verifiable quantum advantage, we hasten to emphasize that the ability to directly perform multi-qubit entangling operations significantly reduces the cost of implementing our interactive protocol. For example, the standard decomposition of a Toffoli gate uses 6 CNOT gates and 7 $T$ and $T^{†}$ gates, with a gate depth of 12 [NC11, SM09, BBC+95]; an equivalent three qubit gate can be performed in a single step via the Rydberg blockade mechanism.

5.6 Conclusion and outlook

The interplay between classical and quantum complexities ultimately determines the threshold for any quantum advantage scheme. Here, we have proposed a novel interactive protocol for classically verifiable quantum advantage based upon trapdoor claw-free functions; in addition to proposing two new TCFs [Table 5.1], we also provide explicit quantum circuits that leverage the microscopic interactions present in a Rydberg-based quantum computer. Our work allows near-term quantum devices to move one step closer toward a loophole-free demonstration of quantum advantage and also has opened the door to a number of promising future directions.

First, the proof of soundness contained in this chapter only applies to classical adversaries. Since the work in this chapter was originally published, a work by several colleagues and myself has extended the cryptographic proofs to the quantum case. In particular, we show that when the protocol from this work is instantiated with a quantum secure TCF like the one based off of LWE, it can be used to certify certain facts about the inner workings of the quantum device, with implications for quantum cryptographic applications such as certifiable random number generation or even the verification of arbitrary computations. [BGK+23] Second, our work has motivated the search for new trapdoor claw-free functions, as discussed in Section 5.4. At least one new construction has been discovered since this work was published; ideally more will be found as the search continues. [AMR22] More broadly, one could also attempt to build modified protocols, which simplify either the requirements on the cryptographic function or the interactions; interestingly, recent work has demonstrated that using random oracles can remove the need for interactions in a TCF-based proof of quantumness [BKV+20], or even remove the need for a TCF entirely! [YZ22] Finally, while we have focused our experimental discussions on Rydberg atoms, a number of other platforms also exhibit features that facilitate the protocol’s implementation. For example, both trapped ions and cavity-QED systems can allow all-to-all connectivity, while superconducting qubits can be engineered to have biased noise [PSG+20]. This latter feature would allow noise to be concentrated into error modes detectable by our proposed post-selection scheme.

5.7 Additional proofs and data

5.7.1 List decoding lemma

In this section we prove a bound on the probability that list decoding will succeed for a particular value of $y$ , given an oracle’s noise rate over all values of $y$ . Recall that by the Goldreich-Levin theorem [GL89], list decoding of the Hadamard code is possible if the noise rate is noticeably less than $1 / 2$ .

Lemma 1.

Consider a binary-valued function over two inputs $g : Y \times {0, 1}^{n} \to {0, 1}$ , and a noisy oracle $G$ to that function. Assuming some distribution of values $y \in Y$ and $r \in {0, 1}^{n}$ , define $ϵ \equiv {Pr}_{y, r} [G (y, r) \neq g (y, r)]$ as the “noise rate” of the oracle. Now define the conditional noise rate for a particular $y \in Y$ as

ϵ_{y} \equiv Pr r [G (y, r) \neq g (y, r)]

(5.10)

Then, the probability that $ϵ_{y}$ is less than $1 / 2 - μ (n)$ for any positive function $μ$ , over randomly selected $y$ , is

p_{g o o d} \equiv Pr y [ϵ_{y} < 1 / 2 - μ (n)] \geq 1 - 2 ϵ - 2 μ (n) .

(5.11)

Proof.

Let $S \subseteq Y$ be the set of $y$ values for which $ϵ_{y} < 1 / 2 - μ (n)$ . Then by definition we have

ϵ = p_{g o o d} \cdot ϵ_{y \in S} + (1 - p_{g o o d}) \cdot ϵ_{y \notin S}

(5.12)

Noting that we must have $ϵ_{y} \geq 1 / 2 - μ (n)$ for $y \notin S$ by definition, we may minimize the right hand side of Equation 5.12, yielding the bound

ϵ > p_{g o o d} \cdot 0 + (1 - p_{g o o d}) \cdot (1 / 2 - μ (n))

(5.13)

Rearranging this expression we arrive at

p_{g o o d} > 1 - 2 ϵ - 2 μ (n)

which is what we desired to show. ∎

5.7.2 Trapdoor claw-free function constructions

Here we present two trapdoor claw-free function families (TCFs) for use in the protocol of this paper. These families are defined by three algorithms: $G e n$ , a probabilistic algorithm which selects an index $i$ specifying one function in the family and outputs the corresponding trapdoor data $t$ ; $f_{i}$ , the definition of the function itself; and $T$ , a trapdoor algorithm which efficiently inverts $f_{i}$ for any $i$ , given the corresponding trapdoor data $t$ . Here we provide the definitions of the function families; proofs of their cryptographic properties are included in the supplementary information. In these definitions we use a security parameter $λ$ following the notation of cryptographic literature; $λ$ is informally equivalent to the “problem size” $n$ defined in the main text as the length of the TCF input string.

TCF from Rabin’s function $x^{2} mod N$

“Rabin’s function” $f_{N} (x) = x^{2} mod N$ , with $N$ the product of two primes, was first used in the context of public-key cryptography and digital signatures [RAB79, GMR88]. We use it to define the trapdoor claw-free function family $F_{R a b i n}$ , as follows.

Function generation

$G e n (1^{λ})$

Randomly choose two prime numbers $p$ and $q$ of length $λ / 2$ bits, with $p mod 4 \equiv q mod 4 \equiv 3 mod 4$ ¹⁶¹⁶ 16 In practice, $p$ and $q$ must be selected with some care such that Fermat factorization and Pollard’s $p - 1$ algorithm [POL74] cannot be used to efficiently factor $N$ classically. Selecting $p$ and $q$ in the same manner as for RSA encryption would be effective [RSA78]..
Return $N = p q$ as the function index, and the tuple $(p, q)$ as the trapdoor data.

Function definition

$f_{N} : [N / 2] \to [N]$ is defined as

f_{N} (x) = x^{2} mod N

(5.14)

The domain is restricted to $[N / 2]$ to remove extra trivial collisions of the form $(x, - x)$ .

Trapdoor

The trapdoor algorithm is the same as the decryption algorithm of the Rabin cryptosystem [RAB79]. On input $y$ and key $(p, q)$ , the Rabin decryption algorithm returns four integers $(x_{0}, x_{1}, - x_{0}, - x_{1})$ in the range [0, N). $x_{0}$ and $x_{1}$ can then be selected by choosing the two values that are smaller than $N / 2$ . See proof in supplementary information for an overview of the algorithm.

TCF from Decisional Diffie-Hellman

We now present a trapdoor claw-free function family $F_{D D H}$ based on the decisional Diffie-Hellman problem (DDH). DDH is defined for a multiplicative group $G$ ; informally, the DDH assumption states that for a group generator $g$ and two integers $a$ and $b$ , given $g$ , $g^{a}$ , and $g^{b}$ it is computationally hard to distinguish $g^{a b}$ from a random group element. We expand on a known DDH-based trapdoor one-way function construction [PW08, FGK+10], adding the claw-free property to construct a TCF.

Function generation

$G e n (1^{λ})$

Choose a group $G$ of order $q \sim O (2^{λ})$ , and a generator $g$ for that group.
For dimension $k > {log}_{2} q$ choose a random invertible matrix $M \in Z_{q}^{k \times k}$ .
Compute $g^{M} = (g^{M_{i j}}) \in G^{k \times k}$ (element-wise exponentiation).
Choose a secret vector $s \in {0, 1}^{k}$ ; compute the vector $g^{M s}$ (where $M s$ is the matrix-vector product, and again the exponentiation is element-wise).
Publish the pair $(g^{M}, g^{M s})$ , retain $(g, M, s)$ as the trapdoor data.

Function definition

Let $d$ be a power of two with $d \sim O (k^{2})$ . We define the function $f_{i}$ as $f_{i} (b | | x) := f_{i, b} (x)$ , where $| |$ denotes concatenation, for a pair of functions $f_{i, b} : Z_{d}^{k} \to G^{k}$ :

	$f_{i, 0} (x)$	$= g^{M x}$		(5.15)
	$f_{i, 1} (x)$	$= g^{M x} g^{M s} = g^{M (x + s)}$		(5.16)

Trapdoor

The algorithm takes as input the trapdoor data $(g, M, s)$ and a value $y = g^{{M x}_{0}} = g^{M (x_{1} + s)}$ , and returns the claw $(x_{0}, x_{1})$ .

$T ((g, M, s), y)$

Compute $M^{- 1}$ using $M$ .

Compute $g^{M^{- 1} M x_{0}} = g^{x_{0}}$ .

Take the discrete logarithm of each element of $g^{x_{0}}$ , yielding $x_{0}$ . Crucially, this is possible because the elements of $x$ are in $Z_{d}$ and $d = p o l y (n)$ , so the discrete logarithm can be computed in polynomial time by brute force.

Compute $x_{1} = x_{0} - s$

Return $(x_{0}, x_{1})$

5.7.3 Table of circuit sizes

A comparison of the resource requirements for computing $x^{2} mod N$ , for various problem sizes and circuit designs, is presented in Table 5.2. These counts are generated in the “abstract circuit” model, in which error correction, qubit routing, and other practical considerations are not included. For schoolbook and Karatsuba circuits, circuits are decomposed into a Clifford+ $T$ gate set. For the “phase” circuits, we allow controlled arbitrary phase rotations, as we expect these circuits to be appropriate for hardware (physical) qubits where these gates are native. Accordingly, we do not provide $T$ gate counts for those circuits.

Circuit	Qubits	Gates ( $C C R_{ϕ}$ / Toffoli allowed)	Gates (Clifford + $T$ )	$T$ Gates	Depth	Qubit measmts.
$n = 128$ (takes seconds on a desktop [44])
Qubit-optimized phase	128	$1.1 \times 10^{6}$	—	—	$1.1 \times 10^{6}$	128
Gate-optimized phase	264	$4.3 \times 10^{5}$	—	—	$6.3 \times 10^{4}$	0
Schoolbook	515	$1.4 \times 10^{5}$	$9.1 \times 10^{5}$	$3.9 \times 10^{5}$	$1.9 \times 10^{4}$	$3.5 \times 10^{4}$
Karatsuba	942	$1.3 \times 10^{5}$	$7.7 \times 10^{5}$	$3.3 \times 10^{5}$	$2.0 \times 10^{3}$	$3.4 \times 10^{4}$
$n = 400$ (takes hours on a desktop [44])
Qubit-optimized phase	400	$3.3 \times 10^{7}^{*}$	—	—	$3.3 \times 10^{7}^{*}$	400
Gate-optimized phase	812	$4.2 \times 10^{6}^{*}$	—	—	$6.2 \times 10^{5}^{*}$	0
Schoolbook	1603	$1.3 \times 10^{6}$	$8.7 \times 10^{6}$	$3.6 \times 10^{6}$	$5.9 \times 10^{4}$	$3.3 \times 10^{5}$
Karatsuba	3051	$8.8 \times 10^{5}$	$5.4 \times 10^{6}$	$2.3 \times 10^{6}$	$5.3 \times 10^{4}$	$2.4 \times 10^{5}$
$n = 829$ (record for factoring [ZIM20])
Qubit-optimized phase	829	$3.0 \times 10^{8}^{*}$	—	—	$2.9 \times 10^{8}^{*}$	829
Gate-optimized phase	1671	$1.8 \times 10^{7}^{*}$	—	—	$2.6 \times 10^{6}^{*}$	0
Schoolbook	3319	$5.6 \times 10^{6}$	$3.8 \times 10^{7}$	$1.6 \times 10^{7}$	$1.2 \times 10^{5}^{*}$	$1.4 \times 10^{6}$
Karatsuba	5522	$3.0 \times 10^{6}$	$1.8 \times 10^{7}$	$7.7 \times 10^{6}$	$1.1 \times 10^{5}^{*}$	$8.0 \times 10^{5}$
$n = 1024$ (exceeds factoring record)
Qubit-optimized phase	1024	$5.6 \times 10^{8}^{*}$	—	—	$5.5 \times 10^{8}^{*}$	1024
Gate-optimized phase	2061	$2.7 \times 10^{7}^{*}$	—	—	$4.0 \times 10^{6}^{*}$	0
Schoolbook	4097	$8.3 \times 10^{6}$	$5.7 \times 10^{7}$	$2.4 \times 10^{7}$	$1.5 \times 10^{5}^{*}$	$2.1 \times 10^{6}$
Karatsuba	6801	$4.3 \times 10^{6}$	$2.6 \times 10^{7}$	$1.1 \times 10^{7}$	$1.4 \times 10^{5}^{*}$	$1.1 \times 10^{6}$
Other algs. at $n = 1024$
Rev. schoolbook ^†	8192	—	$6.4 \times 10^{8}$	$2.2 \times 10^{8}$	$1.1 \times 10^{8}$	0
Rev. Karatsuba ^†	12544	—	$5.7 \times 10^{8}$	$1.9 \times 10^{8}$	$2.4 \times 10^{7}$	0
Shor’s alg. ^‡	3100	—	—	$1.9 \times 10^{9}^{*}$	—	—

Table 5.2: Circuit sizes for various values of

n = log N

. Values may vary for different

N

of the same length. “Qubit-optimized phase” and “gate-optimized phase” refer to the circuits given in Figure 3(a) and 3(b) of the main text, respectively. “Qubit measmts.” refers to the number of times qubits are measured and then reused during execution of the circuit. See Chapter 7 for alternative circuit constructions to the ones presented here. ^*From analytic estimate rather than building explicit circuit. ^†Reversible circuits constructed using Q# implementation of Ref. [GID19a], and scaled to include Montgomery reduction. ^‡Estimate from [GE21].

5.7.4 Cryptographic proofs of TCF properties

Here we prove the cryptographic properties of the trapdoor claw-free functions (TCFs) presented in the Methods section of the main text. We base our definitions on the Noisy Trapdoor Claw-free Function family (NTCF) definition given in Definition 3.1 of Ref. [BCM+21], with certain modifications such as removing the adaptive hardcore bit requirement and the “noisy” nature of the functions.

We emphasize that in the definitions below, we define security only against classical attackers. Both the $x^{2} mod N$ and DDH constructions could be trivially defeated by a quantum adversary via Shor’s algorithm; since the purpose of the protocol in this paper is to demonstrate quantum capability, this type of adversary is allowed.

We also note that the TCF definition allows the 2-to-1 property to be “imperfect”—that is, we allow the fraction of pre-images which have a colliding pair to be less than 1. In the protocol, the verifier may simply discard any runs in which the prover supplied an output $y$ value that is not part of a claw, that is, does not have two corresponding inputs. This will not affect the prover’s ability to pass the classical threshold (since these runs are counted neither for or against the prover); it will only possibly affect the number of iterations of the protocol required to exceed the classical bound with the desired statistical significance. In the definition below we require the fraction of “good” inputs be at least a constant (which we set to 0.9); in principle the fraction could be as low as $1 / p o l y (λ)$ without interfering with the protocol’s effectiveness.

TCF definition

We use the following definition of a Trapdoor Claw-free Function family:

Definition 1.

Let $λ$ be a security parameter, $I$ a set of function indices, and $X_{i}$ and $Y_{i}$ finite sets for each $i \in I$ . A family of functions

F = {f_{i} : X_{i} \to Y_{i}}_{i \in I}

is called a trapdoor claw free (TCF) family if the following conditions hold:

Efficient Function Generation. There exists an efficient probabilistic algorithm $G e n$ which generates a key $i \in I$ and the associated trapdoor data $t_{i}$ :

$(i, t_{i}) \leftarrow G e n (1^{λ})$
Trapdoor Injective Pair. For all indices $i \in I$ , the following conditions hold:
1. Injective pair: Consider the set $R_{i}$ of all tuples $(x_{0}, x_{1})$ such that $f_{i} (x_{0}) = f_{i} (x_{1})$ . Let $X_{i}^{'} \subseteq X_{i}$ be the set of values $x$ which appear in the elements of $R_{i}$ . For all $x \in X_{i}^{'}$ , $x$ appears in exactly one element of $R_{i}$ ; furthermore, there exists a value $λ_{c}$ such that for all $λ > λ_{c}$ , $| X_{i}^{'} | / | X_{i} | > 0.9$ .
2. Trapdoor: There exists an efficient deterministic algorithm $T$ such that for all $y \in Y_{i}$ and $(x_{0}, x_{1})$ such that $f_{i} (x_{0}) = f_{i} (x_{1}) = y$ , $T (t_{i}, y) = (x_{0}, x_{1})$ .
Claw-free. For any non-uniform probabilistic polynomial time (nu-PPT) classical Turing machine $A$ , there exists a negligible function $ϵ (\cdot)$ such that

$Pr [f_{i} (x_{0}) = f_{i} (x_{1}) \land x_{0} \neq x_{1} | (x_{0}, x_{1}) \leftarrow A (i)] < ϵ (λ)$

where the probability is over both choice of $i$ and the random coins of $A$ .
Efficient Superposition. There exists an efficient quantum circuit that on input a key $i$ prepares the state

$\frac{1}{\sqrt{| X_{i} |}} \sum x \in X_{i} | x ⟩ | f_{i} (x) ⟩$

Proof of $x^{2} mod N$ Tcf

In this section we prove that the function family $F_{R a b i n}$ (defined in Methods) is a TCF by demonstrating each of the properties of Definition 1. Most of the properties follow directly from properties of the Rabin cryptosystem [RAB79]; we reproduce several of the arguments here for completeness.

Theorem 4.

The function family $F_{R a b i n}$ is trapdoor claw-free, under the assumption of hardness of integer factorization.

Proof.

We demonstrate each of the properties of Definition 1:

Efficient Function Generation. Sampling large primes to generate $p, q$ and $N$ is efficient [RAB79].
Trapdoor Injective Pair.
1. Injective pair: By definition of the function, $Y_{i}$ is the set of quadratic residues modulo $N$ . For any $y \in Y_{i}$ , consider the two values $a < p / 2$ and $b < q / 2$ such that $a^{2} \equiv y mod p$ and $b^{2} \equiv y mod q$ . These values exist because $y$ is a quadratic residue modulo $p q$ , therefore it is also a quadratic residue modulo $p$ and $q$ . Define $c \equiv 1 mod p \equiv 0 mod q$ and $d \equiv 0 mod p \equiv 1 mod q$ . The following four values $x$ in the range $[0, N)$ have $x^{2} \equiv y mod N$ : $a c + b d, a c - b d, - a c + b d, - a c - b d$ . Exactly two of these values are in the domain $[N / 2]$ of the TCF, and constitute the injective pair; moreover, these two values will be unique as long as $a, b \neq 0$ . Thus we may define the set $X_{i}^{'} = {x \in [N / 2] | x \equiv̸ 0 mod p \land x \equiv̸ 0 mod q}$ . There exist exactly $((p - 1) + (q - 1)) / 2$ multiples of $p$ or $q$ in the set of integers $X_{i} = [N / 2]$ , thus $| X_{i}^{'} | / | X_{i} | = 1 - ((p - 1) + (q - 1)) / N$ . Recall that $p, q$ are defined to have length $λ / 2$ ; if we let $λ_{c} = 12$ , then $p, q > 2^{5} = 32$ . Since $1 - (31 + 31) / 32^{2} > 0.9$ and $| X_{i}^{'} | / | X_{i} |$ increases monotonically with $λ$ , we have $| X_{i}^{'} | / | X_{i} | > 0.9$ for all $λ > λ_{c}$ .
2. Trapdoor: Because $p$ and $q$ were selected to have $p \equiv q \equiv 3 mod 4$ , $a$ and $b$ in the expressions above can always be computed as $a = y^{(p + 1) / 4} mod p$ and $b = y^{(q + 1) / 4} mod q$ , and then the preimages can be computed as defined above.
Claw-free. We show that knowledge of a claw $x_{0}, x_{1}$ can be used directly to factor $N$ . Writing the claw as $(a c + b d, a c - b d)$ using the values $a, b, c, d$ from above, we have $x_{0} + x_{1} = 2 a c$ . Because $c = 0 mod q$ , $gcd (x_{0} + x_{1}, N) = q$ can be efficiently computed, which then also yields $p = N / q$ . Thus, an algorithm that could be used efficiently to find claws could be equally used to efficiently factor $N$ , which we assume to be hard.
Efficient Superposition. The set of preimages $X_{i}$ is the set of integers $[N / 2]$ . A uniform superposition $\sum_{x \in X_{i}} | x ⟩$ may be computed by generating a uniform superposition of all bitstrings of length $n$ (via Hadamard gate on every qubit), and then evaluating a comparator circuit that generates the state $\sum | x ⟩ | x < N / 2 ⟩$ where $| x < N / 2 ⟩$ is a bit on an ancilla. If this ancilla is then measured and the result is $| 1 ⟩$ , the state is collapsed onto the superposition $\sum_{x \in X_{i}} | x ⟩$ (if the result is $| 0 ⟩$ the process should simply be repeated). Then a multiplication circuit to an empty register may be executed to generate the desired state $\sum_{x \in X} | x ⟩ | x^{2} mod N ⟩$ .

∎

Proof of Decisional Diffie-Hellman TCF

We now prove that $F_{D D H}$ (defined in Methods) forms a trapdoor claw-free function family.

Theorem 5.

The function family $F_{D D H}$ is trapdoor claw-free, under the assumption of hardness of the decisional Diffie-Hellman problem for the group $G$ .

Proof.

We demonstrate each of the properties of Definition 1:

Efficient Function Generation. Each step of $G e n$ is efficient by inspection.
Trapdoor Injective Pair.
1. Injective pair: First we note that the matrix $M$ is chosen to be invertible, thus $f_{0}$ and $f_{1}$ are one-to-one. Therefore for all $x_{0} \in X_{i}$ , at most one other preimage $x_{1} \in X_{i}$ has $f_{i} (x_{0}) = f_{i} (x_{1})$ . Furthermore, since colliding pairs have the structure $(0 | | x_{0}^{'}), (1 | | x_{1}^{'})$ with $x_{0}^{'} = x_{1}^{'} + s$ and $s \in {0, 1}^{k}$ , the only preimages that will not form part of a colliding pair are those where $x_{0}^{'}$ has a zero element at an index where $s$ is nonzero, or $x_{1}^{'}$ has an element equal to $d - 1$ where $s$ is nonzero (the vector element will be outside of the range of vector elements for the other vector). Thus $| X_{k}^{'} | / | X_{k} | > {(1 - \frac{1}{d})}^{k}$ . Since $d \sim O (k^{2})$ and $k \sim O (λ)$ , we have ${lim}_{λ \to \infty} | X_{k}^{'} | / | X_{k} | = 1$ with $| X_{k}^{'} | / | X_{k} |$ monotonically increasing. Therefore, there exists a value $λ_{c}$ such that $| X_{k}^{'} | / | X_{k} | > 0.9$ for all $λ > λ_{c}$ . (We note that if we set $k = λ$ and $d = k^{2}$ , then $λ_{c} = 10$ .)
2. Trapdoor: The steps of the algorithm $T$ are efficient by inspection. Crucially, the discrete logarithm of each vector element is possible by brute force, because the elements of $x_{0}$ only take values up to polynomial in $λ$ .
Claw-free. An algorithm which could efficiently compute a claw $(0 | | x_{0}^{'}, 1 | | x_{1}^{'})$ could then trivially compute the secret vector $s = x_{0}^{'} - x_{1}^{'}$ . For any matrix $M^{'}$ , the existence of an algorithm to uniquely determine $s$ from $(g^{M^{'}}, g^{M^{'} s})$ would directly imply an algorithm for determining whether $M^{'}$ has full rank. But DDH implies it is computationally hard to determine whether a matrix $M^{'}$ is invertible given $g^{M^{'}}$ [PW08, FGK+10]. Therefore DDH implies the claw-free property.
Efficient Superposition. Because $d$ is a power of two, a superposition of all possible preimages $x$ can be computed by applying Hadamard gates to every qubit in a register all initialized to $| 0 ⟩$ . The function $f$ can then be computed by a quantum circuit implementing a classical algorithm for the group operation of $G$ .

∎

5.7.5 Overview of Trapdoor Claw-free Functions

In this section, we provide a brief overview of the cryptographic concepts upon which this work relies.

Foundational to the field of cryptography is the idea of a one-way function. Informally, this type of function is easy to compute, but hard to invert. Here, “easy” means that the function can be evaluated in time polynomial in the length of the input. By “hard” we mean that the cost of the best algorithm to invert the function is superpolynomial in the length of the input. In practice, for a given one-way function we desire that there exists a particular problem size (input length) for which the function can be evaluated fast enough that it is not overly costly to use, but for which inversion would be infeasible for even an adversary with large (but realistic) computing power. One way functions can be used directly to construct many useful cryptographic schemes, including pseudorandom number generators, private-key encryption, and secure digital signatures.

In this work, we rely on a specific type of one-way function called a trapdoor claw-free function (TCF). This class of functions has two additional features.

First, it has a trapdoor. This means that while the function is hard to invert in general, with the knowedge of some secret data (the trapdoor key) inversion becomes easy. This secret data should be easy to generate when the function is chosen (from a large family of similar functions), but should be hard to find given just the description of the function itself. For example, in this work we describe the function $x^{2} mod N$ , with $N$ the product of two primes. The trapdoor is the factorization of $N$ . It is easy to generate this function along with the trapdoor, by simply selecting two primes and multiplying them together. However, under the assumption of hardness of integer factorization, given only the function description (namely the value $N$ ) it is computationally hard to find the trapdoor (the factors $p$ and $q$ ).

The second additional feature of a TCF is that it is claw-free. This means that the function is two-to-one (has two inputs that map to each output), but it is computationally hard to find two such colliding inputs without the trapdoor. Note that if it were possible to invert the function it would be trivial to find a collision (by picking an input, computing the function to get the output corresponding to it, and then inverting the function to find the second input mapping to that output). However the claw-free property is a bit stronger than the hardness of inversion: there exist some two-to-one functions which are one-way but not claw-free.

Importantly, in this work we only require that breaking the claw-free property is hard classically—indeed, the claw-free property of the DDH and $x^{2} mod N$ TCFs described here can be fully broken by quantum computers. However, perhaps surprisingly, we do not require that breaking the claw-free property is easy for a quantum machine. In fact, the claw-free property of the LWE and Ring-LWE based TCFs remains secure even against quantum attacks. This corresponds to a very powerful property of the protocol in this paper, and other related protocols: that a quantum computer can pass the test without actually being able to find a claw. This subtle distinction stems from the fact that the quantum prover generates a superposition over two inputs that collide. No measurement of such a state can yield both superposed values classically in full, but the test is designed to not require both values—just the results of an appropriate measurement of the superposition. A classical cheater, on the other hand, still cannot pass the test because the idea of a superposition does not exist classically.

5.7.6 Explanation of circuit complexities

Here we describe each of the asymptotic circuit complexities listed in Table I of the main text. For these estimates we drop factors of $log log n$ or less. In all cases, we assume integer multiplication can be performed in time $O (n log n)$ using the Schonhage-Strassen algorithm.

We emphasize that the value of $n$ necessary to achieve classical hardness in practice varies widely among these functions, and also that the asymptotic complexities here may not be applicable at practical values of $n$ .

LWE [BCM+21, REG09] The LWE cost is dominated by multiplying an $O (n log n) \times n$ matrix of integers by a length $n$ vector. The integers are of length $log n$ , so each multiplication is expected to take approximately $O (log n)$ time. Thus, the evaluation of the entire function requires $O (n^{2} {log}^{2} n)$ operations.

$x^{2} mod N$ [RAB79] The function can be computed in time $O (n log n)$ using Schonhage-Strassen multiplication algorithm and Montgomery reduction for the modulus.

Ring-LWE [BKV+20, LPR13, dRV+15, RVM+14] Ring-LWE is dominated by the cost of multiplying one polynomial by $log n$ other polynomials. Through Number Theoretic Transform techniques similar to the Schonhage-Strassen algorithm, each polynomial multiplication can be performed in time $O (n log n)$ , so the total runtime is $O (n {log}^{2} n)$ . We note that integer multiplication and polynomial multiplication can be mapped onto each other, so the runtimes for $x^{2} mod N$ and Ring-LWE scale identically except for the fact that Ring-LWE requires $log n$ multiplications instead of $O (1)$ .

Diffie-Hellman [DH76, PW08, FGK+10] The Diffie-Hellman based construction defined in Methods requires performing multiplication of a $k \times k$ matrix by a vector, with $k \sim O (n)$ . However, the “addition” operation for the matrix-vector multiply is the group operation of $G$ ; we expect this operation to have complexity at least $O (n log n)$ (for e.g. integer multiplication). The exponentiation operations have exponent at most $d \sim O (k^{2})$ , so can be performed in $O (log n)$ group operations. So, for each of the $k^{2}$ matrix elements one must perform an operation of complexity $O (n {log}^{2} n)$ , yielding a total complexity of $O (n^{3} {log}^{2} n)$ .

Shor’s Algorithm [SHO97] Allowing for the use of Schonhage-Strassen integer multiplication, Shor’s algorithm requires $O (n^{2} log n log log n)$ gates [ZAL98].

5.7.7 Optimal classical algorithm

Here we provide an example of a classical algorithm that saturates the probability bound of Theorem 2 of the main text. It has $p_{x} = 1$ and $p_{C H S H} = \frac{3}{4}$ .

For a TCF $f : X \to Y$ , consider a classical prover that simply picks some value $x_{0} \in X$ , and then computes $y$ as $f (x_{0})$ , without ever having knowledge of $x_{1}$ . If the verifier requests a projective $x$ measurement, they always return $x_{0}$ , causing the verifier to accept with $p_{x} = 1$ . In the other case (performing rounds 2 and 3 of the protocol), upon receiving $r$ they compute $b_{0} = x_{0} \cdot r$ . The cheating prover now simply assumes that $x_{0} \cdot r = x_{1} \cdot r$ , and thus that the correct single-qubit state that would be held by a quantum prover is $| b_{0} ⟩$ , and returns measurement outcomes accordingly. With probability $\frac{1}{2}$ , $| b_{0} ⟩$ is in fact the correct single-qubit state; in this case they can always cause the verifier to accept. On the other hand, if $x_{0} \cdot r \neq x_{1} \cdot r$ , the correct state is either $| + ⟩$ or $| - ⟩$ . With probability $\frac{1}{2}$ , the measurement outcome reported by the cheating prover will happen to be correct for this state too. Overall, this cheating prover will have $p_{C H S H} = (1 + \frac{1}{2}) / 2 = \frac{3}{4}$ .

Thus we see $p_{x} + 4 p_{C H S H} - 4 = 1 + 4 \cdot \frac{3}{4} - 4 = 0$ which saturates the bound.

5.7.8 Quantum circuits for Karatsuba and schoolbook multiplication

Classically, multiplication of large integers is generally performed using recursive algorithms such as Schonhage-Strassen [SS71] and Karatsuba which have complexity as low as $O (n log n)$ . In the quantum setting, the need to store garbage bits at each level of recursion has limited their usefulness [KPF06, PRM18]. There does exist a reversible construction of Karatsuba multiplication that uses a linear number of qubits [GID19a], but due to overhead required for its implementation it does not begin to outperform schoolbook multiplication until the problem size reaches tens of thousands of bits.

Leveraging the irreversibility described in Section IID of the main text, we are able use these recursive algorithms directly, without needing to maintain garbage bits for later uncomputation. We implement both the $O (n^{1.58})$ Karatsuba multiplication algorithm and the simple $O (n^{2})$ “schoolbook” algorithm. Due to efficiencies gained from discarding garbage bits, we find that the Karatsuba algorithm already begins to outcompete schoolbook multiplication at problem sizes of under 100 bits. Thus Karatsuba seems to be the best candidate for “full-scale” tests of quantum advantage at problem sizes of $n \sim 500 - 1000$ bits. We also note that the Schonhage-Strassen algorithm scales even better than Karatsuba as $O (n log n log log n)$ . However, even in classical applications it has too much overhead to be useful at these problem sizes. We leave its potential quantum implementation to a future work.

The multiplication algorithms just described do not include the modulo $N$ operation, it must be performed in a separate step. We implement the modulo using only two classical-quantum multiplications and one addition via Montgomery reduction [MON85]. Montgomery reduction does introduce a constant $R^{'}$ into the product, but this factor can be removed in classical post-processing after $y = x^{2} R^{'} mod N$ is measured.

Finally, we note that at the implementation level, optimizing classical circuits for modular integer multiplication has received significant study in the context of performing cryptography on embedded devices and FPGAs [JIW16, MD16, YWL+16]. Mapping such optimized circuits into the quantum context may be a promising avenue for further research.

5.7.9 Details of post-selection scheme

In this section we describe several details of the post-selection scheme proposed in Section IIC of the main text.

Quantum prover with no phase coherence saturates the classical bound

Consider the two states $| ψ_{\pm} ⟩ = (| x_{0} ⟩ \pm | x_{1} ⟩)_{x} | y ⟩_{y}$ for some claw $(x_{0}, x_{1})$ with $y = f_{k} (x_{0}) = f_{k} (x_{1})$ . Note that $| ψ_{+} ⟩$ is the state that would be held by a noise-free prover. Suppose a noisy quantum prover is capable of generating the mixed state

ρ_{δ} = (1 / 2 + δ) | ψ_{+} ⟩ ⟨ ψ_{+} | + (1 / 2 - δ) | ψ_{-} ⟩ ⟨ ψ_{-} | .

(5.17)

In words, they are able to generate a state that is a superposition of the correct bitstrings, but with the correct phase only $1 / 2 + δ$ fraction of the time. Here we show that such a prover can exceed the classical threshold of Theorem 2 of the main text, whenever $δ > 0$ . We proceed by examining this prover’s behavior during the protocol.

First, we note that if the verifier requests a projective $x$ measurement after Round 1 of the protocol, this prover will always succeed—they simply measure the $x$ register as instructed, and the phase is not relevant. Thus, using the notation of Theorem 2, $p_{x} = 1$ . With this value set, to exceed the bound we must achieve $p_{C H S H} > 3 / 4$ . Naively performing the rest of the protocol as described in the main text does not exceed the bound when $δ$ is small. However, the noisy prover can exceed the bound if they adjust the angle of their measurements in the third round of protocol (but preserve the sign of the measurement requested by the prover). We now demonstrate how.

Define $| ϕ ⟩$ as the “correct” single-qubit state at the end of Round 2—one of ${| 0 ⟩, | 1 ⟩, | + ⟩, | - ⟩}$ . Let $f_{↕}$ be the probability that our noisy prover holds the correct state when $| ϕ ⟩ \in {| 0 ⟩, | 1 ⟩}$ , and $f_{\leftrightarrow}$ the corresponding probability when $| ϕ ⟩ \in {| 0 ⟩, | 1 ⟩}$ . In the first case, the potential phase error of our prover does not affect the single-qubit state, so $f_{↕} = 1$ . In the other case, the state is only correct when the phase is correct, so $f_{\leftrightarrow} = 1 / 2 + δ$ . We see that our prover will hold the correct single-qubit state with probability greater than $3 / 4$ . But, if they naively measure in the prescribed off-diagonal basis $θ \in {π / 4, - π / 4}$ from the verifier, for small $δ$ their success probability will be less than $3 / 4$ . This can be rectified by adjusting the rotation angle of the measurement basis.

Letting $\pm θ^{'}$ define the pair of measurement angles used by the prover in step 3 of the protocol (nominally $θ^{'} = | θ | = π / 4$ ), we can now express the prover’s success probability $p_{C H S H}$ as

p_{m} = \frac{1}{2} [{cos}^{2} (\frac{θ^{'}}{2}) f_{↕} + {cos}^{2} (\frac{θ^{'}}{2} - \frac{π}{4}) f_{\leftrightarrow} + {sin}^{2} (\frac{θ^{'}}{2}) (1 - f_{↕}) + {sin}^{2} (\frac{θ^{'}}{2} - \frac{π}{4}) (1 - f_{\leftrightarrow})]

(5.18)

If the prover measures with $θ^{'} = π / 4$ as prescribed in the protocol, the success rate will be $p_{C H S H} \approx 0.68 + O (δ) < 3 / 4$ . However, if they instead adjust their measurement angle to $θ = δ$ , they instead achieve $p_{C H S H} = 3 / 4 + 3 δ^{2} / 8 - O (δ^{3})$ , which exceeds the classical bound (provided that $δ$ is large enough to be noticeable).

In practice, both $f_{↕}$ and $f_{\leftrightarrow}$ are likely to be less than one; the optimal measurement angle can be determined as

θ_{o p t}^{'} = {tan}^{- 1} (\frac{2 f_{\leftrightarrow} - 1}{2 f_{↕} - 1})

(5.19)

which is the result of optimizing Equation 5.18 over $θ^{'}$ . In a real experiment, it would be most effective to empirically determine $f_{↕}$ and $f_{\leftrightarrow}$ and then use Equation 5.19 to determine the optimal measurement angle.

Details of simulation and error model

We now describe the details of the numerical simulation that was used to generate Figure 2 of the main text. For several values of the overall circuit fidelity $F$ , we established a per-gate fidelity as $f = F^{1 / N_{g}}$ where $N_{g}$ is the number of gates in the $x^{2} mod N$ circuit. We then generated a new circuit to compute the function $(3^{a} x)^{2} mod 3^{2 a} N$ for various values of $a$ (see next subsection for an explanation of the choice $k = 3^{a}$ ). For each gate in the new circuit, with probability $1 - f$ we added a Pauli “error” operator randomly chosen from ${X, Y, Z}$ to one of the qubits to which the gate was being applied.

For the simulation, we randomly chose two primes $p$ and $q$ that multiplied to yield an integer $N$ of length $512$ bits. We then randomly chose a large set of colliding preimage pairs, and simulated the circuit separately for each such preimage (which is classically efficient, since the circuits only consist of $X$ , CNOT, and Toffoli gates). The relative phase between each pair of preimages (due to error gates) was tracked explicitly during the simulation. Finally, the expected success rate of the prover was determined by analyzing the correctness of the bitstrings and their relative phase at the end of the circuit.

The primes $p$ and $q$ used to generate Figure 2 of the main text are (in base 10):


Ψp = 113287732919697174280284729511923238986362403955638184856698528941220766063369
Ψq = 98359967382337110635377957241353362183812709461386334819166502848512740692727

Choice of $k = 3^{a}$ to improve postselection for $x^{2} mod N$

In the previous subsection, we map the TCF $f_{N} = x^{2} mod N$ to the function $f_{N}^{'} = (k x)^{2} mod k^{2} N$ . To achieve this at the implementation level, we may use essentially the same circuit for modular multiplication; the only new requirement is to efficiently generate a superposition of multiples $k x$ in the $x$ register. We generate this superposition by starting with a uniform superposition over values $x$ and then multiplying by $k$ .

Normally, quantum multiplication circuits (like those we use to evaluate $x^{2} mod N$ ) perform an out-of-place multiplication, where the result is stored in a new register. In this case, however, it is preferable to do the multiplication “in-place,” where the result is stored in the input register itself—this way the $y$ value is computed directly from the input register and thus is more likely to reflect errors that may occur in the input.

In general, performing in-place multiplication is complicated, particularly on a quantum register, because the input is being modified as it is being consumed (not to mention concerns about reversibility). However, multiplication by small constants is much simpler to implement. By setting $k$ to a power of three, we are able to implement the in-place multiplication by performing a sequence of in-place multiplications by $3$ , which can each be performed quite efficiently (see implementation in the attached Cirq code ¹⁷¹⁷ 17 Code is available at https://github.com/GregDMeyer/quantum-advantage and is archived on Zenodo [MEY22]).

Theory prediction of Figure 2 of the main text

For the dashed “theory prediction” lines of Figure 2 of the main text, we predicted the success probabilities under two assumptions (which the numerical experiments are intended to test). First, among noisy runs where at least one bit flip error occurs, the output bitstring is approximately uniformly distributed. Second, we assume that with at least one phase flip error, the probability that the phase is correct in the final state is $1 / 2$ .

Under these assumptions, we compute the predicted success rates $p_{x}$ and $p_{C H S H}$ as follows:

For a given overall fidelity $F$ of the original $x^{2} mod N$ circuit containing $N_{g}$ gates, compute a per-gate fidelity $f = F^{1 / N_{g}}$ . Then compute the expected overall fidelity $F^{'}$ of running the slightly larger $(k x)^{2} mod k^{2} N$ containing $N_{g}^{'}$ gates as $f^{N_{g}^{'}}$ .
Using $F^{'}$ and the given error model (see “Details of simulation and error model” section above), compute three disjoint probabilities: that no errors occur, that only phase errors occur, or that at least one bit flip error (and possibly also phase errors) occurs.
Compute the probability that the output will pass postselection, which includes both cases with no bit flip errors and those that are corrupted but happen to pass postselection by chance.
Normalizing to only those runs that pass postselection, compute $p_{x}$ and $p_{C H S H}$ :
1. $p_{x}$ is computed as the probability that no bit flip errors occurred (among those runs that pass postselection). This is a lower bound (that seems intuitively tight); it assumes a negligible probability that the measured pair $(x, y)$ still has $y = f (x)$ despite bit flip errors.
2. $p_{C H S H}$ is computed by finding the probability that no errors occurred that would affect the single-qubit state at the end of round 2. When the correct single-qubit state should be polarized along $Z$ , this is taken to be the probability that no bit flip errors occurred (phase errors are allowed since they will not affect this state); when the correct state should be polarized along $X$ , it is taken as the probability that no errors at all have occurred. In these “no-error” cases, we compute the verifier’s probability of accepting by applying the adjusted measurement basis described in the first sub-section above, “Quantum prover with no phase coherence saturates the classical bound”. Finally, for the case that there was an error that could affect the single-qubit state, the probability that the verifier receives a correct measurement outcome is taken to be $1 / 2$ (the single-qubit state is taken to be maximally mixed).
Compute the measure of “quantumness” from $p_{x}$ and $p_{C H S H}$ .
Compute the estimate runtime by multiplying the increase in quantum circuit size by the expected number of iterations required to pass postselection (which is computed from the analysis above).

Chapter 5 Classically-verifiable quantum advantage from a computational Bell test

5.1 Introduction

5.2 Background and Related Work

5.3 Interactive Protocol for Quantum Advantage

5.3.1 Description of the protocol

5.3.2 Completeness and soundness

Perfect quantum prover (completeness)

Theorem 2.

Proof.

Classical prover (soundness)

Theorem 3.

Proof.

5.3.3 Variations on the protocol

5.3.4 Robustness: Error mitigation via postselection

Numerical analysis of the postselection scheme for x2modN

5.3.5 Efficient quantum evaluation of irreversible classical circuits

5.4 The search for alternative trapdoor claw-free functions

5.5 Quantum circuits for trapdoor claw-free functions

5.5.1 Quantum circuits for x2modN

5.5.2 Phase circuits

5.5.3 Experimental implementation

5.6 Conclusion and outlook

5.7 Additional proofs and data

5.7.1 List decoding lemma

Lemma 1.

Proof.

5.7.2 Trapdoor claw-free function constructions

TCF from Rabin’s function x2modN

TCF from Decisional Diffie-Hellman

5.7.3 Table of circuit sizes

5.7.4 Cryptographic proofs of TCF properties

TCF definition

Definition 1.

Proof of x2modN Tcf

Theorem 4.

Proof.

Proof of Decisional Diffie-Hellman TCF

Theorem 5.

Proof.

5.7.5 Overview of Trapdoor Claw-free Functions

5.7.6 Explanation of circuit complexities

5.7.7 Optimal classical algorithm

5.7.8 Quantum circuits for Karatsuba and schoolbook multiplication

5.7.9 Details of post-selection scheme

Quantum prover with no phase coherence saturates the classical bound

Details of simulation and error model

Choice of k=3a to improve postselection for x2modN

Theory prediction of Figure 2 of the main text

Numerical analysis of the postselection scheme for $x^{2} mod N$

5.5.1 Quantum circuits for $x^{2} mod N$

TCF from Rabin’s function $x^{2} mod N$

Proof of $x^{2} mod N$ Tcf

Choice of $k = 3^{a}$ to improve postselection for $x^{2} mod N$