Skip to content
000OPEN DATASETS

Open quantum datasets, published where builders work.

Neura Parse publishes its quantum training and evaluation data on Hugging Face: one umbrella quantum-computing dataset and sixteen deep-dive verticals, from fault tolerance and compilation to sensing and post-quantum security. Every set ships the same schema, so fine-tuning, benchmarking, and continued pretraining draw from one consistent corpus.

huggingface.co/Neura-parseCC-BY-4.0EN · Parquet · train/test

Public datasets

Deep-dive verticals

Record styles / schema

License · all sets

001Corpus model

The umbrella dataset covers the whole field at survey depth. Each vertical then expands one domain to research depth: derivations in the theory sets, runnable simulations in the hardware sets, executable pipelines in the software sets. Because every record follows the same schema, the corpus composes: train on the umbrella, specialize on a vertical, evaluate on held-out test splits.

Instruction / response

Supervised fine-tuning (SFT) of assistants and copilots

Open Q&A

Free-form evaluation and retrieval-grounded answering

Multiple choice

Deterministic scoring and regression benchmarks

Runnable code tasks

Code-generation training and execution-checked evaluation

Concepts + pretraining text

Continued pretraining and encyclopedic grounding

002Dataset ledger
All sets: EN · Parquet · train + test · CC-BY-4.0

From device physics to the physical-to-logical resource pipeline, simulated in code.

003Data flows

The corpus is built for three flows. Each one ends in something reviewable, because a model you cannot evaluate is a liability: every dataset ships a held-out test split, and in our own stack the experiments that consume these sets are recorded as QFlow evidence.

01

Instruction/response and code-task records tune assistants and copilots on quantum domains — from Qiskit-era programming through QEC and compilation. Train on the umbrella, specialize on a vertical.

02

Held-out test splits with open and multiple-choice Q&A give deterministic scoring for regression tests: measure a base model, measure it after tuning, keep the delta as evidence.

03

Encyclopedic concepts and pretraining-style text extend a base model's domain knowledge, and double as retrieval corpora for RAG systems that must answer quantum questions with citations.

The corpus is curated from the same research practice as the QANTIS, qmesh, and QMANN lines, and it feeds the assistants and evaluation harnesses we build with NowFlow and QFlow. Publishing it under CC-BY-4.0 is deliberate: the quantum talent pipeline is a shared problem, and open, schema-consistent training data is our contribution to it.

004Quickstart

Every dataset loads through the standard datasets library with a train and test split. Attribution under CC-BY-4.0: credit Neura Parse Ltd and link the dataset.

Format

Parquet

Splits

train / test

Language

English

License

CC-BY-4.0

load_dataset · Neura-parse
pip install datasets

from datasets import load_dataset

# The umbrella corpus — survey depth across the field
ds = load_dataset("Neura-parse/quantum-computing")

# A deep-dive vertical — research depth on one domain
ft = load_dataset("Neura-parse/fault-tolerant-quantum-computing")

print(ds["train"][0])   # one schema across all 17 sets
print(ft["test"].num_rows)
Use the corpus

Seventeen open datasets under one schema. If you are building quantum tooling, assistants, or evaluation pipelines on top of them, we want to hear about it.