Albert Armea 673ddcdbe8 weights: ship the low-latency hop128 model (checkpoint + per-instrument ONNX, Git LFS)
The Track-1 low-latency variant (history 014): same 4-instrument net trained at a
128-sample hop (375 Hz), halving the algorithmic latency floor (21.3 -> 10.7 ms) at
identical onset timing and comparable note-F1 (violin 0.515 vs 0.500, piano 0.648 vs
0.656), for ~2x compute. Commercial-safe (CC0/CC-BY training data only), like
continuo_stream.

Adds (LFS): continuo_stream_hop128.pt (hop stamped), continuo_stream_hop128.onnx
(block-streaming, all instruments), and continuo_stream_hop128_perlayer_{violin,
eguitar,piano,aguitar}.onnx (per-layer cached, instrument baked). All N_EMIT=1, ORT
validated; per-layer == offline torch at hop128 (3.3e-6). README documents the
DAWMODEL_HOP=128 load requirement (the hop is stamped in the checkpoint too).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 22:28:08 +00:00

Continuo — representation, data generators, and audio→Continuo model

An audio-transcription research stack built around Continuo, a continuous, microtonal, technique-aware performance representation. Five packages implement the specs in docs/ai-history/:

| package | role | README |
|---|---|---|
| continuo/ | the representation: Envelope/Note/Stream/Document, instrument profiles, .cont serialization, validation | continuo/README |
| tier1gen/ | Tier-1 data: Faust physical-model synthesis (DawDreamer) → exact continuous technique-channel labels | tier1gen/README |
| tier2gen/ | Tier-2 data: real sample libraries (VSCO-2 CE) via sfizz keyswitches → real timbre + discrete articulation labels | tier2gen/README |
| model/ | low-latency streaming audio→Continuo model (front end → causal TCN+GRU → multi-task heads), training, decoder | model/README |
| augment/ | curve-aware (audio_fn, label_fn) augmentation + license-filtered asset registry | augment/README |

Performance layer only — no symbolic/notation layer (by design). CLAUDE.md is the working guide; the docs/ai-history/00{3,5,7} entries are the implementation logs (decisions, deviations, results) — the authoritative record.

Repository layout

continuo/ tier1gen/ tier2gen/ model/ augment/   # the five packages (+ per-dir README)
setup/        # idempotent env scripts (run in order; see below)
tests/        # test_continuo.py
docs/ai-history/  # specs (000,001,002,004,006) + implementation logs (003,005,007,009)
/mnt/train/   # NOT in git: datasets, asset banks, sfizz build, VSCO-2, runs/<name>

Environment

Target box: Ubuntu 26.04, AMD Ryzen 9950X3D, RTX 5090 (Blackwell/sm_120), /mnt/train fast storage. The original briefs targeted 22.04/24.04 + Python 3.11; see docs/ai-history/2026-06-13 003 … for the 26.04 adaptations.

Idempotent setup scripts, run in order as needed:

bash setup/setup_env.sh             # base: uv + Python 3.12, torch-cu128, DawDreamer, Faust, deps
bash setup/setup_tier2.sh           # builds sfizz_render from source, fetches VSCO-2 CE (Tier 2)
bash setup/setup_aug.sh             # pyroomacoustics + generates the CC0 procedural asset bank
bash setup/download_ccby_assets.sh  # optional: real CC-BY rooms/noise (HOMULA-RIR, DEMAND)
source .venv/bin/activate

Key choices: Python 3.12 (newest with DawDreamer + Blackwell-PyTorch wheels; 26.04 ships only 3.14), provisioned with uv; PyTorch from the cu128 index; DawDreamer installs as a plain wheel. See docs/ai-history/003 for the full 24.04→26.04 adaptation record.

Generate data

Tier 1 — physical-modelling synthesis (Faust via DawDreamer); supplies continuous technique-channel labels:

python -m tier1gen.cli --profile violin --count 4000 \
    --out /mnt/train/violin_tier1 --seed-base 0 --workers 22

Deterministic per seed; writes (wav, .cont) pairs + a dataset_manifest.json with QA status. A second profile (--profile guitar) works by adding a profile module only — no core changes (acceptance #5).
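
A minimal sketch of what "deterministic per seed" can look like, assuming each sample derives a private RNG from seed base + index. The seeding rule, function name, and parameter ranges are hypothetical and are not taken from tier1gen:

```python
import random

def sample_params(seed_base: int, index: int) -> dict:
    """Draw synthetic note parameters from an RNG private to this sample."""
    rng = random.Random(seed_base + index)  # hypothetical seeding rule
    return {
        "midi_pitch": rng.uniform(55.0, 100.0),  # continuous (microtonal) pitch
        "duration_s": rng.uniform(0.2, 2.0),
        "vibrato_hz": rng.uniform(4.0, 7.0),
    }

# The same seed yields the same sample regardless of which worker renders it.
first = sample_params(seed_base=0, index=17)
again = sample_params(seed_base=0, index=17)
other = sample_params(seed_base=0, index=18)
```

Because no state is shared between samples, the result does not depend on worker count or scheduling order.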

Tier 2 — real recorded sample libraries (VSCO-2 CE, CC0) rendered through sfizz with articulation keyswitches; supplies real timbre + exact discrete articulation labels (continuous technique channels masked). Run bash setup/setup_tier2.sh first (builds sfizz_render, fetches VSCO-2):

python -m tier2gen.cli --library vsco2_solo_violin --count 4000 \
    --out /mnt/train/vsco2_tier2_solo --seed-base 0 --workers 20

Pitch/dynamics are extracted from the rendered audio (provenance=analysis); mode/articulation come from the keyswitch (provenance=synthetic). A second library (--library vsco2_violin_section) works by adding a YAML map only (acceptance #6). See history entry 005.

Combined Tier 1 + Tier 2 training

Pass multiple dataset dirs; the dataloader masks each .cont's declared masked_outputs (Tier 2 masks technique_channels), and the articulation head spans both tiers (8 classes):

python -m model.train --data /mnt/train/violin_tier1 /mnt/train/vsco2_tier2_solo \
    --out /mnt/train/runs/combined_v3 --epochs 30 --batch-size 16 --workers 14

Tier 1 metrics are preserved while Tier 2 adds near-perfect articulation/mode and dynamics on real timbre; per-tier results in history entry 005 §7.
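
The output masking can be sketched as a loss that skips any head listed in a sample's masked_outputs. This is a pure-Python illustration: the head names, the squared-error terms, and the masked_loss function are assumptions, and the actual dataloader and loss in model/ may differ.

```python
from typing import Dict, List, Set

def masked_loss(pred: Dict[str, List[float]],
                target: Dict[str, List[float]],
                masked_outputs: Set[str]) -> float:
    """Mean squared error averaged over the heads that are supervised."""
    per_head = []
    for head, y_hat in pred.items():
        if head in masked_outputs:
            continue  # no label for this head in this sample: contributes nothing
        y = target[head]
        per_head.append(sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y))
    return sum(per_head) / len(per_head) if per_head else 0.0

pred = {"dynamics": [0.5, 0.7], "technique_channels": [0.2, 0.9]}
target = {"dynamics": [0.5, 0.5], "technique_channels": [0.0, 0.0]}

# A Tier-1 sample supervises every head; a Tier-2 sample masks technique_channels.
tier1_loss = masked_loss(pred, target, masked_outputs=set())
tier2_loss = masked_loss(pred, target, masked_outputs={"technique_channels"})
```

With this shape, a Tier-2 sample never pushes the technique-channel head toward labels it does not have.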

Curve-aware augmentation (CC0)

augment/ transforms (audio, .cont) pairs as paired (audio_fn, label_fn) ops before rasterization (noise/RIR/EQ/codec, gain/compression, time-stretch, pitch-shift, stem-mixing, SpecAugment), keeping labels analytically in sync. The Group-A asset pool (room IRs + noise) is procedurally generated by augment.genassets, so it is CC0 by construction and commercial-safe. Run setup/setup_aug.sh once; training then augments the train split only and reports a seeded robustness val beside the clean val:

python -m model.train --data /mnt/train/violin_tier1 /mnt/train/vsco2_tier2_solo \
    --out /mnt/train/runs/combined_aug --epochs 30 --batch-size 16 --workers 20

Augmentation collapses the clean→robust generalization gap (e.g. articulation gap 0.30→0.06). Real CC-BY corpora (HOMULA-RIR rooms + DEMAND noise, commercial-OK with attribution) can be added alongside the CC0 bank via setup/download_ccby_assets.sh; on a real-conditions robustness val the ordering is no-aug < procedural-CC0 < CC0+CC-BY on every metric. See history entry 007.
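
A paired op can be sketched as two functions that share one parameter: one transforms the audio, the other applies the analytically equivalent change to the labels. The example uses a gain op and assumes a label with a dynamics curve in dB and note onsets in seconds. Both field names are hypothetical and are not the .cont schema.

```python
from typing import Callable, Dict, List, Tuple

Audio = List[float]
Label = Dict[str, List[float]]
PairedOp = Tuple[Callable[[Audio], Audio], Callable[[Label], Label]]

def gain_op(gain_db: float) -> PairedOp:
    """Build an (audio_fn, label_fn) pair for a fixed gain in dB."""
    scale = 10.0 ** (gain_db / 20.0)

    def audio_fn(x: Audio) -> Audio:
        return [s * scale for s in x]

    def label_fn(lab: Label) -> Label:
        out = dict(lab)
        # Gain shifts the dynamics curve by the same dB; timing is untouched.
        out["dynamics_db"] = [d + gain_db for d in lab["dynamics_db"]]
        return out

    return audio_fn, label_fn

audio = [0.0, 0.5, -0.5, 0.25]
label = {"dynamics_db": [-20.0, -18.0], "onsets_s": [0.10, 0.85]}

audio_fn, label_fn = gain_op(-6.0)
aug_audio, aug_label = audio_fn(audio), label_fn(label)
```

A time-stretch op would follow the same pattern, with label_fn rescaling the time axis instead of the dynamics values.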

Multiple instruments

The model supports violin, electric guitar, and piano via an instrument-id FiLM embedding and union heads masked per-channel (a violin sample trains only its bow channels, a piano only its pedals). Data: Tier-1 (Faust pm.elecGuitar) + Tier-2 electric guitar (Karoryfer Emilyguitar, CC-BY) and piano (VSCO-2 Upright Piano, CC0). Train across all datasets at once; evaluate per instrument:

python -m model.train --data /mnt/train/violin_tier1 /mnt/train/vsco2_tier2_solo \
    /mnt/train/eguitar_tier1 /mnt/train/eguitar_tier2 /mnt/train/piano_tier2 \
    --out /mnt/train/runs/multi_crepe16 --epochs 28 --batch-size 16 --workers 22
python -m model.eval_per_instrument --ckpt /mnt/train/runs/multi_crepe16/best.pt --data <dirs...>
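
FiLM conditioning scales and shifts each feature by instrument-specific parameters, and a per-instrument channel mask decides which union-head outputs are trained. The sketch below is pure Python with made-up values; the channel names, parameter tables, and function names are illustrative assumptions, not the model's actual layout.

```python
from typing import Dict, List

# Hypothetical per-instrument FiLM parameters (learned in a real model).
FILM: Dict[str, Dict[str, List[float]]] = {
    "violin": {"gamma": [1.0, 0.5, 2.0], "beta": [0.0, 0.1, -0.2]},
    "piano":  {"gamma": [0.8, 1.2, 1.0], "beta": [0.3, 0.0, 0.0]},
}

# Union of technique channels across instruments, with per-instrument masks.
CHANNELS = ["bow_pressure", "bow_position", "sustain_pedal"]
CHANNEL_MASK = {
    "violin": [1, 1, 0],  # trains only the bow channels
    "piano":  [0, 0, 1],  # trains only the pedal channel
}

def film(features: List[float], instrument: str) -> List[float]:
    """Feature-wise linear modulation: gamma * x + beta."""
    p = FILM[instrument]
    return [g * x + b for g, x, b in zip(p["gamma"], features, p["beta"])]

def masked_channel_loss(pred: List[float], target: List[float],
                        instrument: str) -> float:
    """Squared error summed over the channels this instrument supervises."""
    mask = CHANNEL_MASK[instrument]
    return sum(m * (p - t) ** 2 for m, p, t in zip(mask, pred, target))

h = film([1.0, 2.0, 3.0], "violin")
loss = masked_channel_loss([0.5, 0.5, 0.5], [0.0, 0.5, 9.0], "violin")
```

The large pedal error in the example contributes nothing for a violin sample, which is the point of masking per channel.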

The grid spans C2 to C8 (360 bins) with a dual-resolution front end (~32 ms). Adding an instrument takes a Continuo profile plus a Tier-1 profile and/or a Tier-2 library YAML. Pitch uses a CREPE-style head (model/crepe.py); the synthetic guitar's pathological pitch is masked so guitar pitch is learned from real Emily samples. Per-instrument pitch accuracy: violin 0.79, electric guitar 0.52 (was 0.35 with the old harmonic-stack head), piano 0.55. Guitar and piano still trail bowed violin and cost latency (~48 ms). See history 008 (plan), 009 (results), 010 (CREPE + fixes).
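
As a worked example of the grid arithmetic: C2 to C8 is six octaves (72 semitones), so 360 bins would give 5 bins per semitone, or 20 cents per bin, if the grid is uniform. The uniform spacing, the A4 = 440 Hz reference, and the helper names below are assumptions for illustration.

```python
import math

MIDI_C2 = 36      # C2 in MIDI numbering (A4 = 69 = 440 Hz)
N_BINS = 360      # from the text
SEMITONES = 72    # C2 up to C8 is six octaves
BINS_PER_SEMITONE = N_BINS // SEMITONES  # assumed uniform grid: 5 bins (20 cents)

def bin_to_hz(b: float) -> float:
    """Centre frequency of a (possibly fractional) grid bin."""
    midi = MIDI_C2 + b / BINS_PER_SEMITONE
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

def hz_to_bin(f: float) -> float:
    """Fractional grid position of a frequency in Hz."""
    midi = 69.0 + 12.0 * math.log2(f / 440.0)
    return (midi - MIDI_C2) * BINS_PER_SEMITONE

a4_bin = hz_to_bin(440.0)  # fractional bins keep microtonal detail
c2_hz = bin_to_hz(0)
```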

Train + evaluate

python -m model.train --data /mnt/train/violin_tier1 \
    --out /mnt/train/runs/violin_v1 --epochs 30 --batch-size 16 --workers 12

Writes history.json, best.pt/last.pt, and final_metrics.json (val + test, split by generator seed). Streaming decode of frame outputs into Continuo notes is in model/decoder.py.
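
As a rough illustration of streaming decode, the sketch below turns per-frame onset and voicing probabilities into note events with simple thresholds, one frame at a time. It is not the algorithm in model/decoder.py; the thresholds, the frame tuple layout, and the Note fields are assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass
class Note:
    start_frame: int
    end_frame: int     # exclusive
    pitch_midi: float  # mean of the frame pitches, so it stays continuous

def decode(frames: Iterable[Tuple[float, float, float]],
           onset_thr: float = 0.5, voiced_thr: float = 0.5) -> List[Note]:
    """Causal decode of (onset_prob, voiced_prob, pitch_midi) frames."""
    notes: List[Note] = []
    start: Optional[int] = None
    pitches: List[float] = []
    i = -1
    for i, (onset, voiced, pitch) in enumerate(frames):
        if start is not None and (voiced < voiced_thr or onset >= onset_thr):
            # Close the open note on silence or on a fresh onset.
            notes.append(Note(start, i, sum(pitches) / len(pitches)))
            start, pitches = None, []
        if start is None and onset >= onset_thr and voiced >= voiced_thr:
            start = i
        if start is not None:
            pitches.append(pitch)
    if start is not None:
        # Flush a note that is still sounding when the stream ends.
        notes.append(Note(start, i + 1, sum(pitches) / len(pitches)))
    return notes

frames = [(0.9, 0.9, 69.0), (0.1, 0.9, 69.2), (0.1, 0.2, 0.0),
          (0.8, 0.9, 71.0), (0.1, 0.9, 71.4)]
notes = decode(frames)
```

Each decision uses only the current and past frames, so the loop adds no lookahead beyond what the model already has.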

The front end is a swappable seam (model/frontend.py). The default is a learned short-STFT front end (ModelConfig.frontend="stft", stft_win=512, lookahead=3) → ~26.7 ms total algorithmic latency, the best onset/dynamics point of the window×lookahead sweep. The analytic HCQT ("hcqt", ~244 ms) and other points (--stft-win/--lookahead) are available. At matched accuracy the STFT cuts front-end latency ~15× — see history entry 003 §8.
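
The ~26.7 ms figure is consistent with one simple accounting: one analysis window plus the lookahead frames. The sketch assumes a 48 kHz sample rate, a default hop of 256 samples, and latency = window + lookahead × hop. These values and the formula are assumptions for illustration, not taken from model/frontend.py.

```python
def algorithmic_latency_ms(win: int, lookahead: int, hop: int, sr: int) -> float:
    """Window length plus lookahead frames, expressed in milliseconds."""
    return 1000.0 * (win + lookahead * hop) / sr

SR = 48_000  # assumed sample rate in Hz
HOP = 256    # assumed default hop in samples

default_ms = algorithmic_latency_ms(win=512, lookahead=3, hop=HOP, sr=SR)
frame_rate_hz = SR / HOP
```

Under these assumptions the window contributes about 10.7 ms and the three lookahead frames about 16 ms.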

Tests

python tests/test_continuo.py    # envelope lowering, .cont round-trip, validation