Qiskit AI-powered transpiler introduction

Usage estimate: 5 minutes on IBM Heron (NOTE: This is an estimate only. Your runtime may vary.)

Learning outcomes

After going through this tutorial, users should understand:

How to use the AI-powered transpiler (generate_ai_pass_manager) as a drop-in replacement for the standard transpiler
How the AI-powered transpiler compares to the default transpiler in terms of two-qubit depth, gate count, and transpilation time
How to use mirror circuits to evaluate transpilation quality through hardware execution

Prerequisites

We suggest that users are familiar with the following topics before going through this tutorial:

Background

The Qiskit AI-powered transpiler introduces machine-learning-based transpilation passes that can produce shorter, more hardware-efficient circuits than traditional heuristic methods such as SABRE. Shorter circuits accumulate less noise, which directly improves result quality on real quantum hardware.

In this tutorial we compare two transpilation strategies:

Strategy	API
Default	`generate_preset_pass_manager(optimization_level=3, ...)`
AI	`generate_ai_pass_manager(optimization_level=1, ai_optimization_level=3, ...)`

We measure three metrics for each strategy: two-qubit gate depth, total gate count, and transpilation runtime.
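
To make the first two metrics concrete, the sketch below computes them for a toy gate list in plain Python. It only illustrates the idea and is not Qiskit's implementation; the `two_qubit_depth` helper and the gate list are hypothetical.

```python
def two_qubit_depth(gates, num_qubits):
    """Count layers of two-qubit gates; single-qubit gates add no depth."""
    level = [0] * num_qubits  # two-qubit layer reached so far on each qubit
    for qubits in gates:
        if len(qubits) == 2:
            new_level = max(level[q] for q in qubits) + 1
            for q in qubits:
                level[q] = new_level
    return max(level, default=0)


# Hypothetical 4-qubit circuit: each tuple lists the qubits one gate acts on.
gates = [(0,), (0, 1), (2, 3), (1,), (1, 2), (3,)]

gate_count = len(gates)
depth_2q = two_qubit_depth(gates, num_qubits=4)
print(f"gate count = {gate_count}, two-qubit depth = {depth_2q}")
```

The gates on (0, 1) and (2, 3) act on disjoint qubits, so they share one layer. The gate on (1, 2) must wait for both, which gives a two-qubit depth of 2 even though the circuit contains six gates.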

AI-powered transpiler benchmarks

In benchmarking tests, the AI-powered transpiler consistently produced shallower, higher-quality circuits than the standard Qiskit transpiler. The baseline in these tests was Qiskit's default pass manager strategy, configured with generate_preset_pass_manager. While this default strategy is often effective, it can struggle with larger or more complex circuits. By contrast, AI-powered passes achieved an average 24% reduction in two-qubit gate counts and a 36% reduction in circuit depth for large circuits (100+ qubits) when transpiling to the heavy-hex topology of IBM Quantum® hardware. For more information on these benchmarks, refer to this blog.

This tutorial explores the key benefits of AI passes and how they compare to traditional methods.

Requirements

Before starting this tutorial, be sure you have the following installed:

Qiskit SDK v2.0 or later, with visualization support
Qiskit Runtime (pip install qiskit-ibm-runtime) v0.22 or later
Qiskit IBM Transpiler with AI local mode (pip install 'qiskit-ibm-transpiler[ai-local-mode]')
Qiskit Aer (pip install qiskit-aer)
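
One way to confirm that these packages are present is to query their installed versions with the standard library. This is a minimal sketch: the distribution names are taken from the pip commands above, the `installed_versions` helper is hypothetical, and the check does not verify that the `ai-local-mode` extra was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands listed above.
REQUIRED = ["qiskit", "qiskit-ibm-runtime", "qiskit-ibm-transpiler", "qiskit-aer"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:<24}{ver or 'not installed'}")
```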

Setup

from qiskit import QuantumCircuit
from qiskit.circuit.random import random_circuit
from qiskit.transpiler import generate_preset_pass_manager
from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2
from qiskit_ibm_transpiler import generate_ai_pass_manager
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error
import matplotlib.pyplot as plt
from statistics import mean, stdev
import time
import logging

seed = 42


def transpile_with_metrics(pass_manager, circuit):
    """Transpile a circuit and return the result along with key metrics."""
    # perf_counter is a monotonic, high-resolution clock suited to timing
    start = time.perf_counter()
    qc_out = pass_manager.run(circuit)
    elapsed = time.perf_counter() - start

    depth_2q = qc_out.depth(lambda x: x.operation.num_qubits == 2)
    gate_count = qc_out.size()

    return qc_out, {
        "depth_2q": depth_2q,
        "gate_count": gate_count,
        "time_s": round(elapsed, 3),
    }


def remap_to_contiguous(tqc):
    """Remap a transpiled circuit to use contiguous qubit indices.

    Transpiled circuits target specific physical qubits (e.g., qubit 45, 67)
    on a large backend. This remaps them to 0, 1, 2, ... so Aer only
    simulates the active qubits."""
    active = sorted(
        {tqc.find_bit(q).index for inst in tqc.data for q in inst.qubits}
    )
    qubit_map = {old: new for new, old in enumerate(active)}
    new_qc = QuantumCircuit(len(active))
    for inst in tqc.data:
        old_indices = [tqc.find_bit(q).index for q in inst.qubits]
        new_qc.append(inst.operation, [qubit_map[i] for i in old_indices])
    return new_qc


def build_mirror_circuit(tqc, simulate=True):
    """Build a mirror circuit: U followed by U-dagger, with measurements.

    The expected output is always |0...0>, so measuring the survival
    probability reveals how much noise each transpilation strategy adds.

    Args:
        tqc: A transpiled circuit.
        simulate: If True (default), remap to contiguous qubits so Aer
            only simulates the active qubits. If False, keep the full
            physical layout for hardware execution."""
    if simulate:
        tqc = remap_to_contiguous(tqc)
    mirror = tqc.compose(tqc.inverse())
    mirror.measure_all()
    return mirror


def print_summary(results):
    """Print a summary of each metric as mean +/- stdev across all circuits,
    along with the mean percentage improvement of AI over Default."""
    metrics = [
        ("Depth 2Q", "Depth 2Q (Default)", "Depth 2Q (AI)"),
        ("Gate Count", "Gate Count (Default)", "Gate Count (AI)"),
        ("Time (s)", "Time (Default)", "Time (AI)"),
    ]
    header = (
        f"{'Metric':<12}{'Default (mean +/- std)':>24}"
        f"{'AI (mean +/- std)':>22}{'AI % improvement':>22}"
    )
    print(header)
    print("-" * len(header))
    for label, col_def, col_ai in metrics:
        defaults = [r[col_def] for r in results]
        ais = [r[col_ai] for r in results]
        pct = [(d - a) / d * 100 for d, a in zip(defaults, ais)]
        default_str = f"{mean(defaults):.1f} +/- {stdev(defaults):.1f}"
        ai_str = f"{mean(ais):.1f} +/- {stdev(ais):.1f}"
        pct_str = f"{mean(pct):+.1f}% +/- {stdev(pct):.1f}%"
        print(f"{label:<12}{default_str:>24}{ai_str:>22}{pct_str:>22}")


def plot_metrics_and_pct(results, title_prefix):
    """Plot metric comparisons and percentage improvement of AI over Default."""
    qubits = [r["Qubits"] for r in results]
    metrics = [
        ("Depth 2Q (Default)", "Depth 2Q (AI)", "Two-Qubit Depth"),
        ("Gate Count (Default)", "Gate Count (AI)", "Gate Count"),
        ("Time (Default)", "Time (AI)", "Transpilation Time"),
    ]

    # Row 1: raw metric comparison
    fig, axs = plt.subplots(1, 3, figsize=(21, 5))
    fig.suptitle(
        f"{title_prefix}: Metric Comparison",
        fontsize=15,
        fontweight="bold",
        y=1.02,
    )
    for ax, (col_def, col_ai, label) in zip(axs, metrics):
        ax.plot(qubits, [r[col_def] for r in results], "o-", label="Default")
        ax.plot(qubits, [r[col_ai] for r in results], "s-", label="AI")
        ax.set_title(label)
        ax.set_xlabel("Number of Qubits")
        ax.set_ylabel(label)
        ax.legend()
    plt.tight_layout()
    plt.show()

    # Row 2: percentage improvement
    fig, axs = plt.subplots(1, 3, figsize=(21, 5))
    fig.suptitle(
        f"{title_prefix}: % Improvement of AI over Default",
        fontsize=15,
        fontweight="bold",
        y=1.02,
    )
    for ax, (col_def, col_ai, label) in zip(axs, metrics):
        pct = [(r[col_def] - r[col_ai]) / r[col_def] * 100 for r in results]
        ax.axhline(
            0, color="#1f77b4", linewidth=2, label="Default (baseline)"
        )
        ax.plot(qubits, pct, "s-", color="#ff7f0e", label="AI")
        ax.fill_between(qubits, 0, pct, alpha=0.15, color="#ff7f0e")
        ax.set_title(label)
        ax.set_xlabel("Number of Qubits")
        ax.set_ylabel("% Improvement")
        ax.legend()
    plt.tight_layout()
    plt.show()


# Suppress verbose AI-powered transpiler logs
logging.getLogger(
    "qiskit_ibm_transpiler.wrappers.ai_local_synthesis"
).setLevel(logging.WARNING)
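
The relabeling performed by remap_to_contiguous can be seen on plain integers. The sketch below is a simplified illustration that works on tuples of qubit indices rather than on a QuantumCircuit; the `contiguous_map` helper and the qubit numbers are hypothetical.

```python
def contiguous_map(gates):
    """Map the qubit indices that appear in `gates` onto 0, 1, 2, ..."""
    active = sorted({q for qubits in gates for q in qubits})
    return {old: new for new, old in enumerate(active)}


# Hypothetical physical qubits used by a transpiled circuit on a large device.
gates = [(45, 67), (67,), (12, 45)]

qubit_map = contiguous_map(gates)
remapped = [tuple(qubit_map[q] for q in qubits) for qubits in gates]
print(qubit_map)  # {12: 0, 45: 1, 67: 2}
print(remapped)   # [(1, 2), (2,), (0, 1)]
```

Only three qubits are touched, so a simulator needs to track three qubits rather than every qubit on the device.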

Small-scale simulator example

Step 1: Map classical inputs to a quantum problem

We generate 20 random circuits with depth 4, where the number of qubits ranges from 6 to 25. These circuits will serve as our test cases for comparing transpilation strategies.

num_circuits_sim = 20
depth_sim = 4
qubit_range_sim = list(range(6, 26))

circuits_sim = [
    # Use only two-qubit gates, since routing them is what tests how well the transpiler optimizes the circuit.
    random_circuit(
        num_qubits=n,
        depth=depth_sim,
        max_operands=2,
        num_operand_distribution={2: 1},
        seed=seed + i,
    )
    for i, n in enumerate(qubit_range_sim)
]

print(
    f"Created {len(circuits_sim)} circuits with qubit counts: {qubit_range_sim}"
)
circuits_sim[0].draw(output="mpl", fold=-1)

Output:

Created 20 circuits with qubit counts: [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

Step 2: Optimize problem for quantum hardware execution

We build the default (SABRE) pass manager for the chosen backend. Both transpilation strategies target the backend's full coupling map. Local simulation later stays tractable because the simulation step uses remap_to_contiguous to relabel each transpiled circuit onto only its active qubits, so Aer simulates just those qubits instead of the entire device.

service = QiskitRuntimeService()
backend = service.least_busy(
    min_num_qubits=100, operational=True, simulator=False
)


pm_default_sim = generate_preset_pass_manager(
    optimization_level=3,
    backend=backend,
    seed_transpiler=seed,
)

results_sim = []

for i, qc in enumerate(circuits_sim):
    n = qubit_range_sim[i]

    qc_default, m_default = transpile_with_metrics(pm_default_sim, qc)

    # Create a fresh AI pass manager each iteration to avoid stale layout state
    pm_ai = generate_ai_pass_manager(
        optimization_level=1,
        ai_optimization_level=3,
        backend=backend,
    )
    qc_ai, m_ai = transpile_with_metrics(pm_ai, qc)

    results_sim.append(
        {
            "Qubits": n,
            "Depth 2Q (Default)": m_default["depth_2q"],
            "Depth 2Q (AI)": m_ai["depth_2q"],
            "Gate Count (Default)": m_default["gate_count"],
            "Gate Count (AI)": m_ai["gate_count"],
            "Time (Default)": m_default["time_s"],
            "Time (AI)": m_ai["time_s"],
        }
    )

print_summary(results_sim)

Output:

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

Metric        Default (mean +/- std)     AI (mean +/- std)      AI % improvement
--------------------------------------------------------------------------------
Depth 2Q               33.0 +/- 12.9          26.4 +/- 8.0      +15.8% +/- 17.6%
Gate Count           522.0 +/- 266.0       560.5 +/- 279.1        -9.0% +/- 9.0%
Time (s)                 0.0 +/- 0.0           0.2 +/- 0.1    -893.6% +/- 362.9%

The summary table shows the mean and standard deviation of each metric across all 20 circuits, along with the average percentage improvement of the AI-powered transpiler over the default. Positive values indicate the AI-powered transpiler produced better results; negative values indicate the default was better.

For this small-scale example, the AI-powered transpiler achieves roughly 16% lower two-qubit depth on average, but at the cost of roughly 9% higher gate count. This highlights a key trade-off when choosing between the two strategies: the AI-powered transpiler prioritizes depth reduction (fewer sequential layers of two-qubit gates), while the default transpiler (SABRE) prioritizes minimizing total gate count (fewer SWAP insertions). Depending on your application, one metric may matter more than the other.
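
Note that the "AI % improvement" column is the mean of the per-circuit percentages, which is not the same as the percentage computed from the two column means. For example, the mean depths 33.0 and 26.4 alone would give (33.0 - 26.4) / 33.0 = 20.0%, while the table reports +15.8%. The sketch below shows the difference on made-up numbers; the depth values are hypothetical and are not the tutorial's data.

```python
from statistics import mean

# Hypothetical two-qubit depths for three circuits (not the tutorial's data).
default_depths = [10, 20, 60]
ai_depths = [10, 15, 36]

# Per-circuit improvement; positive means the AI result is lower (better).
pct = [(d - a) / d * 100 for d, a in zip(default_depths, ai_depths)]

mean_of_pct = mean(pct)
pct_of_means = (
    (mean(default_depths) - mean(ai_depths)) / mean(default_depths) * 100
)
ai_wins = sum(a < d for d, a in zip(default_depths, ai_depths))

print(f"mean of per-circuit improvements: {mean_of_pct:.1f}%")
print(f"improvement of the means:         {pct_of_means:.1f}%")
print(f"AI strictly better on {ai_wins} of {len(pct)} circuits")
```

Large circuits dominate the improvement of the means, while the mean of per-circuit percentages weights every circuit equally. The win count gives a third view that is insensitive to the size of each difference.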

plot_metrics_and_pct(results_sim, "Small-Scale Random Circuits")

Output:

Two-qubit depth: The AI-powered transpiler generally produces circuits with lower two-qubit depth. Depth is one of the primary metrics the AI routing model is trained to optimize, and the improvement is visible across most circuit sizes, though SABRE does match or beat it on individual circuits.

Gate count: The results are closely matched at this scale, with SABRE holding a slight edge overall. SABRE's routing heuristic is designed to minimize the number of inserted SWAP gates, which directly reduces gate count. At small circuit sizes, the difference is modest.

Transpilation time: SABRE's runtime is nearly constant across the qubit counts tested here, helped by core routing logic that is highly optimized (largely implemented in Rust). The AI-powered transpiler takes noticeably longer and its runtime grows with circuit size, though the absolute times (about 0.2 s on average in the table above) remain reasonable for interactive use.
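
Single timing measurements can be noisy, especially for calls that finish in a few milliseconds. A minimal sketch of a more robust measurement, assuming the call can be repeated safely, takes the median over several runs; the `time_call` helper is hypothetical and the workload is a stand-in for `pass_manager.run(circuit)`.

```python
import time
from statistics import median


def time_call(func, *args, repeats=5):
    """Return the median wall-clock time of `func(*args)` over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return median(samples)


# Stand-in workload; in the tutorial this would be `pass_manager.run(circuit)`.
elapsed = time_call(sorted, list(range(100_000, 0, -1)))
print(f"median time: {elapsed:.4f} s")
```

The median is less sensitive than the mean to a single slow run, for example a first call that includes one-time setup costs.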

Step 3: Execute using Qiskit primitives

To evaluate the impact of transpilation on circuit fidelity, build mirror circuits from the 10-qubit case and run them on the Aer simulator with a simple noise model. The expected output of a mirror circuit is always the all-zeros bitstring, so the probability of measuring $|0\rangle^{\otimes n}$ demonstrates how well each transpilation strategy preserves fidelity.

# Use the 10-qubit circuit (index where qubits == 10)
idx_10q = qubit_range_sim.index(10)

qc_10q = circuits_sim[idx_10q]
qc_default_10q, _ = transpile_with_metrics(pm_default_sim, qc_10q)

pm_ai = generate_ai_pass_manager(
    optimization_level=1,
    ai_optimization_level=3,
    backend=backend,
)
qc_ai_10q, _ = transpile_with_metrics(pm_ai, qc_10q)

tqc_methods = {
    "Default": qc_default_10q,
    "AI": qc_ai_10q,
}

print(
    f"Default: depth {qc_default_10q.depth()}, gates {qc_default_10q.size()}"
)
print(f"AI:      depth {qc_ai_10q.depth()}, gates {qc_ai_10q.size()}")

Output:

Default: depth 84, gates 280
AI:      depth 91, gates 343

# Build a simple depolarizing noise model
noise_model = NoiseModel()
noise_model.add_all_qubit_quantum_error(
    depolarizing_error(0.001, 1),
    ["sx", "x", "rz"],  # ~0.1% per 1Q gate
)
noise_model.add_all_qubit_quantum_error(
    depolarizing_error(0.01, 2),
    ["cx", "ecr"],  # ~1% per 2Q gate
)

aer_sim = AerSimulator(noise_model=noise_model)

shots = 10000
survival_probs = {}

for method, tqc in tqc_methods.items():
    mirror = build_mirror_circuit(tqc, simulate=True)

    sampler = SamplerV2(mode=aer_sim)
    job = sampler.run([mirror], shots=shots)
    counts = job.result()[0].data.meas.get_counts()

    all_zeros = "0" * mirror.num_qubits
    survival = counts.get(all_zeros, 0) / shots
    survival_probs[method] = survival
    print(
        f"{method:8s}  P(|00...0>) = {survival:.4f}  ({counts.get(all_zeros, 0)}/{shots})"
    )

Output:

Default   P(|00...0>) = 0.8460  (8460/10000)
AI        P(|00...0>) = 0.8121  (8121/10000)

We ran both mirror circuits through the Aer simulator with a simple depolarizing noise model. The survival probability, defined as the fraction of shots that return the all-zeros bitstring, quantifies how much noise each transpilation strategy introduces.
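
Because the survival probability is estimated from a finite number of shots, it carries sampling uncertainty. The sketch below attaches a binomial standard error to the two values reported above; the `survival_with_error` helper is hypothetical, and the estimate covers shot noise only.

```python
from math import sqrt


def survival_with_error(zero_count, shots):
    """Return P(all zeros) and its binomial standard error."""
    p = zero_count / shots
    return p, sqrt(p * (1 - p) / shots)


# All-zeros counts and shot number reported in the output above.
zero_counts = {"Default": 8460, "AI": 8121}
shots = 10_000

for method, n in zero_counts.items():
    p, err = survival_with_error(n, shots)
    print(f"{method:8s} P(|0...0>) = {p:.4f} +/- {err:.4f}")
```

With 10,000 shots the standard error is roughly 0.004 for both methods, so the gap of about 0.034 between them is larger than shot noise alone would explain.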

Step 4: Post-process and return result in desired classical format

We extract the probability of measuring the all-zeros bitstring from both runs. A higher survival probability indicates better fidelity, meaning the transpilation introduced less noise. The plot below shows the complement, 1 - P(|0...0>), so that a lower bar indicates better fidelity and small differences in error are easier to see.

# Plot 1 - P(|0...0>), the probability of an erroneous (non-zero) outcome.
# A lower bar means the transpilation introduced less noise.
error_probs = {method: 1 - p for method, p in survival_probs.items()}

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(
    error_probs.keys(),
    error_probs.values(),
    color=["steelblue", "coral"],
)
ax.set_ylabel("1 - P(|0...0>)")
ax.set_title("Mirror Circuit Error (10-qubit, Aer Simulator)")
ax.set_ylim(0, 1)
plt.tight_layout()
plt.show()

Output:

For this particular 10-qubit instance, the default transpiler produced a circuit that is both shallower and smaller (depth 84 and 280 gates, versus depth 91 and 343 gates for the AI-powered transpiler), so its higher fidelity is expected. Per-circuit results vary: as the summary table above shows, the AI-powered transpiler's advantage is in lower two-qubit depth on average, not on every individual circuit. Which strategy yields higher fidelity depends on the magnitude of the difference in each metric, the noise characteristics of the hardware, and the structure of the circuit. Under a uniform depolarizing noise model, total gate count often has a more direct impact on accumulated error than depth alone.
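
The role of gate count under uniform gate noise can be seen in a back-of-the-envelope estimate. The sketch below assumes every gate fails independently with a fixed probability, using the same 0.1% and 1% rates as the noise model above. It is a crude model, not what Aer computes: it ignores depth entirely and tends to overestimate the error, because not every depolarizing event changes the measured bitstring. The `estimated_survival` helper and the gate counts are hypothetical.

```python
def estimated_survival(n_1q, n_2q, p_1q=0.001, p_2q=0.01):
    """Crude estimate: each gate independently succeeds with probability 1 - p."""
    return (1 - p_1q) ** n_1q * (1 - p_2q) ** n_2q


# Hypothetical mirror circuits (gate counts are illustrative, not measured).
more_gates = estimated_survival(n_1q=500, n_2q=60)
fewer_gates = estimated_survival(n_1q=400, n_2q=50)

print(f"more gates:  {more_gates:.3f}")
print(f"fewer gates: {fewer_gates:.3f}")
```

In this model only the number of gates enters, so a circuit with fewer gates always scores higher regardless of how those gates are arranged in layers.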

Large-scale hardware example

Steps 1-4

The large-scale example combines the previous steps into a single workflow, which is then run on real quantum hardware.

The code below generates 25 random circuits with depth 8, where the number of qubits ranges from 26 to 50. These circuits are then transpiled with both strategies and the same metrics are collected. Then we build mirror circuits from the 26-qubit case and submit them to the real backend.

# -------------------------Step 1-------------------------
num_circuits_hw = 25
depth_hw = 8
qubit_range_hw = list(range(26, 51))

circuits_hw = [
    # Use only two-qubit gates, since routing them is what tests how well the transpiler optimizes the circuit.
    random_circuit(
        num_qubits=n,
        depth=depth_hw,
        max_operands=2,
        num_operand_distribution={2: 1},
        seed=seed + i,
    )
    for i, n in enumerate(qubit_range_hw)
]

print(
    f"Created {len(circuits_hw)} circuits with qubit counts: {qubit_range_hw}"
)

Output:

Created 25 circuits with qubit counts: [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]

# -------------------------Step 2-------------------------
pm_default = generate_preset_pass_manager(
    optimization_level=3,
    backend=backend,
    seed_transpiler=seed,
)

results_hw = []

for i, qc in enumerate(circuits_hw):
    n = qubit_range_hw[i]

    qc_default, m_default = transpile_with_metrics(pm_default, qc)

    # Create a fresh AI pass manager each iteration to avoid stale layout state
    pm_ai = generate_ai_pass_manager(
        optimization_level=1,
        ai_optimization_level=3,
        backend=backend,
    )
    qc_ai, m_ai = transpile_with_metrics(pm_ai, qc)

    results_hw.append(
        {
            "Qubits": n,
            "Depth 2Q (Default)": m_default["depth_2q"],
            "Depth 2Q (AI)": m_ai["depth_2q"],
            "Gate Count (Default)": m_default["gate_count"],
            "Gate Count (AI)": m_ai["gate_count"],
            "Time (Default)": m_default["time_s"],
            "Time (AI)": m_ai["time_s"],
        }
    )

print_summary(results_hw)

Output:

Metric        Default (mean +/- std)     AI (mean +/- std)      AI % improvement
--------------------------------------------------------------------------------
Depth 2Q              217.4 +/- 50.4        191.0 +/- 35.6      +10.9% +/- 10.7%
Gate Count         4513.3 +/- 1394.3     5227.1 +/- 1536.4       -16.4% +/- 5.8%
Time (s)                 0.1 +/- 0.0           3.5 +/- 1.5   -3588.2% +/- 643.6%

plot_metrics_and_pct(results_hw, "Large-Scale Random Circuits")

Output:

# -------------------------Step 3-------------------------
# Build mirror circuits from the 26-qubit case
idx_26q = qubit_range_hw.index(26)

qc_26q = circuits_hw[idx_26q]
qc_default_26q, _ = transpile_with_metrics(pm_default, qc_26q)

pm_ai = generate_ai_pass_manager(
    optimization_level=1,
    ai_optimization_level=3,
    backend=backend,
)
qc_ai_26q, _ = transpile_with_metrics(pm_ai, qc_26q)

mirror_default_hw = build_mirror_circuit(qc_default_26q, simulate=False)
mirror_ai_hw = build_mirror_circuit(qc_ai_26q, simulate=False)

# Re-transpile to basis gates (the inverse can introduce gates like sxdg)
pm_basis = generate_preset_pass_manager(
    optimization_level=0,
    backend=backend,
)
mirror_default_hw = pm_basis.run(mirror_default_hw)
mirror_ai_hw = pm_basis.run(mirror_ai_hw)

print(
    f"Mirror circuit (Default): depth {mirror_default_hw.depth()}, gates {mirror_default_hw.size()}"
)
print(
    f"Mirror circuit (AI):      depth {mirror_ai_hw.depth()}, gates {mirror_ai_hw.size()}"
)

# Submit to real hardware
sampler_hw = SamplerV2(mode=backend)
sampler_hw.options.environment.job_tags = ["TUT_AITI"]

shots_hw = 500000
job_hw = sampler_hw.run([mirror_default_hw, mirror_ai_hw], shots=shots_hw)
print(f"Job submitted: {job_hw.job_id()}")

Output:

Mirror circuit (Default): depth 1577, gates 9672
Mirror circuit (AI):      depth 1235, gates 11092
Job submitted: d8gt7vm6983c73dqbg0g

# -------------------------Step 4-------------------------
result_hw = job_hw.result()

survival_probs_hw = {}
for i, method in enumerate(["Default", "AI"]):
    counts = result_hw[i].data.meas.get_counts()
    mirror = [mirror_default_hw, mirror_ai_hw][i]
    all_zeros = "0" * mirror.num_qubits
    survival = counts.get(all_zeros, 0) / shots_hw
    survival_probs_hw[method] = survival
    print(
        f"{method:8s}  P(|00...0>) = {survival:.4f}  ({counts.get(all_zeros, 0)}/{shots_hw})"
    )

# Plot 1 - P(|0...0>), the probability of an erroneous (non-zero) outcome.
# A lower bar means the transpilation introduced less noise.
error_probs_hw = {method: 1 - p for method, p in survival_probs_hw.items()}

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(
    error_probs_hw.keys(),
    error_probs_hw.values(),
    color=["steelblue", "coral"],
)
ax.set_ylabel("1 - P(|0...0>)")
ax.set_title(f"Mirror Circuit Error (26-qubit, {backend.name})")
ax.set_ylim(0, 1)
plt.tight_layout()
plt.show()

Output:

Default   P(|00...0>) = 0.0005  (239/500000)
AI        P(|00...0>) = 0.0050  (2516/500000)

Analysis of results

The large-scale results reinforce the trends observed in the small-scale example, now at a more demanding scale.

Two-qubit depth: The AI-powered transpiler continues to deliver lower two-qubit depth on average across this range of circuit sizes. Depth optimization is one of the primary objectives the AI routing model is trained on. The absolute gap is larger than in the small-scale example (about 26 layers on average, versus about 7), although the mean relative improvement is smaller (about 11%, versus about 16%).

Gate count: The default transpiler (SABRE) consistently produces circuits with fewer gates across all circuit sizes in this range. SABRE's heuristic is specifically designed to minimize gate count, and at this scale the advantage is clear and uniform.

Transpilation time: The gap in transpilation time widens at larger scales. SABRE stays near 0.1 s per circuit, while the AI-powered transpiler averages about 3.5 s and its runtime grows more steeply with circuit size. Even so, the AI-powered transpiler runtime remains practical for most workflows.

Mirror circuit fidelity: Both methods produce survival probabilities of 0.5% or less at this scale (about 0.05% for Default and 0.5% for AI), leaving little usable signal. With total gate counts around 10,000 and circuit depths exceeding 1,000, the noise accumulated across the mirror circuit overwhelms most of the signal. This highlights a key limitation of the mirror circuit approach: while it is simple and requires no classical simulation, it does not scale well to large or deep circuits, where both methods are pushed close to the noise floor.
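
One way to quantify this scaling problem is to ask how many shots are needed to estimate a survival probability p to a given relative precision. Setting the binomial standard error sqrt(p(1 - p) / N) equal to a fraction r of p gives N = (1 - p) / (p r^2). The sketch below evaluates this for a survival probability typical of the small-scale simulation and for the two hardware values reported above; the `shots_for_relative_error` helper is hypothetical and the estimate covers shot noise only.

```python
def shots_for_relative_error(p, rel_err):
    """Shots needed so the binomial standard error of p equals rel_err * p."""
    return round((1 - p) / (p * rel_err**2))


for p in (0.8, 0.005, 0.0005):
    n = shots_for_relative_error(p, rel_err=0.05)
    print(f"p = {p:<7} needs about {n:,} shots for 5% relative error")
```

The required shot count grows roughly as 1 / p, so each tenfold drop in survival probability costs about ten times more shots for the same relative precision.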

While these results underscore the AI-powered transpiler's effectiveness, it is important to note its limitations. The AI synthesis method is currently only available for certain coupling maps, which may restrict its broader applicability. This constraint should be considered when evaluating its usage in different scenarios.
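
Given this constraint, one defensive pattern is to fall back to the default pass manager when the AI-powered one cannot handle a target. The sketch below is a generic illustration: it assumes that an unsupported configuration surfaces as an exception from `run`, which is an assumption and not documented behavior. The `run_with_fallback` helper and the two stand-in classes are hypothetical.

```python
def run_with_fallback(primary, fallback, circuit):
    """Try `primary.run(circuit)`; on any exception use `fallback.run(circuit)`.

    Returns the result and the name of the strategy that produced it."""
    try:
        return primary.run(circuit), "primary"
    except Exception as exc:  # assumption: unsupported targets raise here
        print(f"Primary pass manager failed ({exc!r}); using fallback.")
        return fallback.run(circuit), "fallback"


# Stand-in objects so the sketch runs without a backend; both are hypothetical.
class FailingPassManager:
    def run(self, circuit):
        raise ValueError("coupling map not supported")


class EchoPassManager:
    def run(self, circuit):
        return circuit


result, used = run_with_fallback(
    FailingPassManager(), EchoPassManager(), "circuit"
)
print(used)
```

In the tutorial's setting, `primary` would be the object returned by generate_ai_pass_manager and `fallback` the one returned by generate_preset_pass_manager, since both expose a `run` method.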

Next steps

Recommendations

If you found this work interesting, you might be interested in the following material:
