Open quantum datasets, published where builders work.
Neura Parse publishes its quantum training and evaluation data on Hugging Face: one umbrella quantum-computing dataset and sixteen deep-dive verticals, from fault tolerance and compilation to sensing and post-quantum security. Every set ships the same schema, so fine-tuning, benchmarking, and continued pretraining draw from one consistent corpus.
17
Public datasets
16
Deep-dive verticals
5
Record styles / schema
CC-BY-4.0
License · all sets
One umbrella, sixteen verticals, one schema.
The umbrella dataset covers the whole field at survey depth. Each vertical then expands one domain to research depth: derivations in the theory sets, runnable simulations in the hardware sets, executable pipelines in the software sets. Because every record follows the same schema, the corpus composes: train on the umbrella, specialize on a vertical, evaluate on held-out test splits.
Instruction / response
Supervised fine-tuning (SFT) of assistants and copilots
Open Q&A
Free-form evaluation and retrieval-grounded answering
Multiple choice
Deterministic scoring and regression benchmarks
Runnable code tasks
Code-generation training and execution-checked evaluation
Concepts + pretraining text
Continued pretraining and encyclopedic grounding
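As an illustration of the first record style, here is a minimal sketch of how an instruction/response record might be rendered into a single SFT training string. The column names `instruction` and `response`, the helper `to_sft_text`, and the prompt template are all assumptions made for this example; check the dataset card for the actual schema before relying on them.

```python
def to_sft_text(record, instruction_key="instruction", response_key="response"):
    """Render one instruction/response record as a single training string.

    The key names are hypothetical defaults, not the published schema.
    """
    instruction = record[instruction_key].strip()
    response = record[response_key].strip()
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


# Hypothetical record used only to exercise the helper.
example = {
    "instruction": "State the no-cloning theorem in one sentence.",
    "response": "No quantum operation can copy an arbitrary unknown quantum state.",
}
sft_text = to_sft_text(example)
print(sft_text)
```

Because every set is described as sharing one schema, a helper of this shape could be applied unchanged to the umbrella set and to any vertical, for example through the `map` method of a loaded split.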
Sixteen verticals, field by field.
Foundations & theory
Proof-oriented verticals on what quantum computation is and where advantage comes from.
Machine learning × quantum
Both directions: quantum models that learn from data, and classical ML that makes quantum computers work.
Hardware, error correction & fault tolerance
From device physics to the physical-to-logical resource pipeline, simulated in code.
Algorithms in practice: software, simulation & optimization
The compilation stack, quantum simulation of matter, and the honest advantage question.
Networks, sensing & security
Distributed quantum systems, precision measurement, and the quantum-safe boundary.
From dataset to model to evidence.
The corpus is built for three flows. Each one ends in something reviewable, because a model you cannot evaluate is a liability: every dataset ships a held-out test split, and in our own stack the experiments that consume these sets are recorded as QFlow evidence.
01
Supervised fine-tuning
Instruction/response and code-task records tune assistants and copilots on quantum domains, from Qiskit-era programming through quantum error correction (QEC) and compilation. Train on the umbrella, specialize on a vertical.
02
Evaluation & benchmarking
Held-out test splits carry both open and multiple-choice Q&A. Multiple choice gives deterministic scoring for regression tests, and open Q&A supports free-form evaluation: measure a base model, measure it after tuning, keep the delta as evidence.
03
Continued pretraining & grounding
Encyclopedic concepts and pretraining-style text extend a base model's domain knowledge, and double as retrieval corpora for RAG systems that must answer quantum questions with citations.
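The evaluation flow (02) can be sketched as a small scoring routine: compute multiple-choice accuracy for a base model and for the tuned model on the same held-out items, then keep the difference. The helper name `accuracy`, the letter-style answer format, and the toy answers and predictions below are assumptions for illustration only. They are not taken from the datasets and are not measured results.

```python
def accuracy(predictions, answers):
    """Exact-match accuracy over multiple-choice labels (case-insensitive)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one item to score")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Placeholder data: gold labels and two sets of model outputs.
answers = ["A", "C", "B", "D"]
base_preds = ["A", "B", "B", "A"]    # hypothetical base-model outputs
tuned_preds = ["A", "C", "B", "A"]   # hypothetical tuned-model outputs

base_acc = accuracy(base_preds, answers)
tuned_acc = accuracy(tuned_preds, answers)
delta = tuned_acc - base_acc

print(f"base={base_acc:.2f} tuned={tuned_acc:.2f} delta={delta:+.2f}")
```

Exact-match scoring is what makes the comparison deterministic: the same predictions always give the same number, so the delta can be stored and re-checked later.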
The corpus is curated from the same research practice as the QANTIS, qmesh, and QMANN lines, and it feeds the assistants and evaluation harnesses we build with NowFlow and QFlow. Publishing it under CC-BY-4.0 is deliberate: the quantum talent pipeline is a shared problem, and open, schema-consistent training data is our contribution to it.
Two lines to first batch.
Every dataset loads through the Hugging Face datasets library and ships a train and a test split. Attribution under CC-BY-4.0: credit Neura Parse Ltd and link to the dataset.
Format
Parquet
Splits
train / test
Language
English
License
CC-BY-4.0
pip install datasets
from datasets import load_dataset
# The umbrella corpus — survey depth across the field
ds = load_dataset("Neura-parse/quantum-computing")
# A deep-dive vertical — research depth on one domain
ft = load_dataset("Neura-parse/fault-tolerant-quantum-computing")
print(ds["train"][0]) # one schema across all 17 sets
print(ft["test"].num_rows)
Train on it, benchmark with it, build on it.
Seventeen open datasets under one schema. If you are building quantum tooling, assistants, or evaluation pipelines on top of them, we want to hear about it.