Open quantum datasets, published where builders work.

Neura Parse publishes its quantum training and evaluation data on Hugging Face: one umbrella quantum-computing dataset and sixteen deep-dive verticals, from fault tolerance and compilation to sensing and post-quantum security. Every set ships the same schema, so fine-tuning, benchmarking, and continued pretraining draw from one consistent corpus.

huggingface.co/Neura-parseCC-BY-4.0EN · Parquet · train/test

Open the Hugging Face org GitHub org

Public datasets

Deep-dive verticals

Record styles / schema

CC-BY-4.0

License · all sets

001Corpus model

One umbrella, sixteen verticals, one schema.

The umbrella dataset covers the whole field at survey depth. Each vertical then expands one domain to research depth: derivations in the theory sets, runnable simulations in the hardware sets, executable pipelines in the software sets. Because every record follows the same schema, the corpus composes: train on the umbrella, specialize on a vertical, evaluate on held-out test splits.

Instruction / response

Supervised fine-tuning (SFT) of assistants and copilots

Open Q&A

Free-form evaluation and retrieval-grounded answering

Multiple choice

Deterministic scoring and regression benchmarks

Runnable code tasks

Code-generation training and execution-checked evaluation

Concepts + pretraining text

Continued pretraining and encyclopedic grounding

Start hereUmbrella dataset

DS-00DS-00Quantum Computing (umbrella)Neura-parse/quantum-computingThe general multi-format dataset spanning theory and hardware: qubits, gates, and algorithms through QPUs, error correction, quantum software, and quantum machine learning. Instruction/response pairs, open and multiple-choice Q&A, runnable code tasks, encyclopedic concepts, and pretraining-style text under one schema.quantum-computingqiskitquantum-machine-learningsynthetic

002Dataset ledger

Sixteen verticals, field by field.

All sets: EN · Parquet · train + test · CC-BY-4.0

Foundations & theory

Proof-oriented verticals on what quantum computation is and where advantage comes from.

DS-01DS-01Quantum Information & Complexity TheoryNeura-parse/quantum-information-and-complexity-theoryChannels, entropies, entanglement measures, distinguishability, and capacities united with complexity classes and the structure of quantum advantage.quantum-channelsentropybqpqma DS-02DS-02QML Theory: Trainability & GeneralizationNeura-parse/quantum-machine-learning-theoryWhy parameterized circuits train or don't (barren plateaus), what they represent, when they provably beat classical models, and classical shadows for learning from quantum data.barren-plateausgeneralizationclassical-shadows DS-03DS-03Advanced Quantum AlgorithmsNeura-parse/advanced-quantum-algorithmsThe fault-tolerant canon with full derivations plus the modern QSVT/block-encoding toolkit through Hamiltonian simulation, amplitude estimation, and quantum linear systems.qsvtblock-encodinghamiltonian-simulationgrover

Machine learning × quantum

Both directions: quantum models that learn from data, and classical ML that makes quantum computers work.

DS-04DS-04QML Models: Encodings, Kernels & QNNsNeura-parse/quantum-machine-learning-modelsCode-first coverage of feature maps, variational classifiers, quantum kernels and QSVMs, through quantum GANs, Born machines, QCNNs, quantum RL, and quantum transformers.qmlquantum-kernelsquantum-ganpennylane DS-05DS-05AI for QuantumNeura-parse/ai-for-quantumNeural and transformer QEC decoders, RL pulse and calibration control, neural-network quantum states, ML tomography, learned circuit optimization, and LLM/agentic quantum software engineering.neural-decodersalphaqubitllm-for-quantum

Hardware, error correction & fault tolerance

From device physics to the physical-to-logical resource pipeline, simulated in code.

DS-06DS-06Quantum Hardware Device PhysicsNeura-parse/quantum-hardware-device-physicsHow qubits are built, controlled, and scaled across superconducting, trapped-ion, neutral-atom, and spin modalities, with runnable QuTiP/scqubits simulations.superconducting-qubitstrapped-ionsquantum-control DS-07DS-07Fault-Tolerant QC: Codes, Decoders & Magic StatesNeura-parse/fault-tolerant-quantum-computingA Stim-backed vertical on QEC code families, decoders, fault-tolerant gate constructions, and full physical-to-logical resource estimation.surface-codeqldpcmagic-state-distillationstim DS-08DS-08Error Mitigation, Characterization & BenchmarkingNeura-parse/quantum-error-mitigation-and-benchmarkingTrustworthy answers from noisy hardware: ZNE, PEC, dynamical decoupling, tomography, and benchmarking with runnable Mitiq, pyGSTi, and Qiskit Experiments pipelines.zero-noise-extrapolationrandomized-benchmarkingmitiq DS-09DS-09Bosonic, CV & Photonic Quantum ComputingNeura-parse/bosonic-photonic-quantum-computingCat, GKP, and binomial codes, Gaussian and measurement-based photonic architectures, and fusion-based approaches with runnable CV simulations.gkp-codecat-qubitsphotonicmeasurement-based DS-10DS-10Topological Quantum ComputingNeura-parse/topological-quantum-computingAnyons and topological order, non-abelian braiding and fusion, Majorana zero modes, Fibonacci versus Ising anyons, and the toric code as a Z2 topological phase.anyonsmajorana-zero-modestoric-codebraiding

Algorithms in practice: software, simulation & optimization

The compilation stack, quantum simulation of matter, and the honest advantage question.

DS-11DS-11Quantum Compilation & ProgrammingNeura-parse/quantum-compilation-and-programmingTurning circuits and unitaries into device-executable programs: synthesis, transpilation, layout and routing (SABRE, VF2), ZX-calculus, OpenQASM 3, and QIR.transpilationcircuit-synthesisopenqasm3zx-calculus DS-12DS-12Quantum Simulation of Chemistry & MaterialsNeura-parse/quantum-simulation-chemistry-materialsElectronic structure, fermion-to-qubit encodings, Hamiltonian factorizations, VQE/QPE and real-time dynamics, built with Qiskit Nature, OpenFermion, PennyLane-QChem, and PySCF.vqeelectronic-structureopenfermionpyscf DS-13DS-13Quantum Optimization, Annealing & FinanceNeura-parse/quantum-optimizationQAOA theory and variants, adiabatic and annealing methods, QUBO/Ising encodings, amplitude-estimation Monte Carlo for finance, and the rigorous where-does-quantum-win question.qaoaqubo-isingquantum-annealingquantum-finance

Networks, sensing & security

Distributed quantum systems, precision measurement, and the quantum-safe boundary.

DS-14DS-14Quantum Networking, Repeaters & Distributed QCNeura-parse/quantum-networking-and-distributedEntanglement distribution and distillation, repeaters, quantum-internet protocol stacks, memories and transduction, protocol- and simulation-backed with NetSquid and SeQUeNCe.quantum-internetrepeatersdistributed-qc DS-15DS-15Quantum Sensing & MetrologyNeura-parse/quantum-sensing-and-metrologyQuantum Fisher information and the Cramér-Rao bound, squeezing from the standard quantum limit toward the Heisenberg limit, realized in atomic clocks, NV magnetometers, and interferometry.fisher-informationheisenberg-limitnv-centers DS-16DS-16Quantum Cryptography & Post-Quantum SecurityNeura-parse/quantum-cryptography-and-post-quantum-securityQKD protocol families from BB84 through MDI/TF/CV-QKD, device-independent protocols, security proofs, quantum hacking countermeasures, and the NIST PQC suite (ML-KEM, ML-DSA, SLH-DSA, HQC).qkdml-kemcrypto-agilitycertified-randomness

003Data flows

From dataset to model to evidence.

The corpus is built for three flows. Each one ends in something reviewable, because a model you cannot evaluate is a liability: every dataset ships a held-out test split, and in our own stack the experiments that consume these sets are recorded as QFlow evidence.

Supervised fine-tuning

Instruction/response and code-task records tune assistants and copilots on quantum domains — from Qiskit-era programming through QEC and compilation. Train on the umbrella, specialize on a vertical.

Evaluation & benchmarking

Held-out test splits with open and multiple-choice Q&A give deterministic scoring for regression tests: measure a base model, measure it after tuning, keep the delta as evidence.

Continued pretraining & grounding

Encyclopedic concepts and pretraining-style text extend a base model's domain knowledge, and double as retrieval corpora for RAG systems that must answer quantum questions with citations.

The corpus is curated from the same research practice as the QANTIS, qmesh, and QMANN lines, and it feeds the assistants and evaluation harnesses we build with NowFlow and QFlow. Publishing it under CC-BY-4.0 is deliberate: the quantum talent pipeline is a shared problem, and open, schema-consistent training data is our contribution to it.

004Quickstart

Two lines to first batch.

Every dataset loads through the standard datasets library with a train and test split. Attribution under CC-BY-4.0: credit Neura Parse Ltd and link the dataset.

Format

Parquet

Splits

train / test

Language

English

License

CC-BY-4.0

load_dataset · Neura-parse

pip install datasets

from datasets import load_dataset

# The umbrella corpus — survey depth across the field
ds = load_dataset("Neura-parse/quantum-computing")

# A deep-dive vertical — research depth on one domain
ft = load_dataset("Neura-parse/fault-tolerant-quantum-computing")

print(ds["train"][0])   # one schema across all 17 sets
print(ft["test"].num_rows)

Use the corpus

Train on it, benchmark with it, build on it.

Seventeen open datasets under one schema. If you are building quantum tooling, assistants, or evaluation pipelines on top of them, we want to hear about it.

huggingface.co/Neura-parse Talk to the team