Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial effort from creators. While recent generative models achieve high fidelity and adhere to text prompts, they lack the specific control needed for such a task. Existing symbolic-to-audio research often focuses on single, tonal instruments, leaving the challenge of polyphonic, percussive drum synthesis unaddressed. We address this gap by introducing "Break-the-Beat!", a model capable of rendering drum MIDI with the timbre of a reference audio clip. It is built by fine-tuning a pre-trained text-to-audio model with our proposed content encoder and an effective hybrid conditioning mechanism. To enable this, we construct a new dataset of paired target-reference drum audio from existing drum audio datasets. Experiments demonstrate that our model generates high-quality drum audio that follows high-resolution drum MIDI, achieving strong performance across metrics of audio quality, rhythmic alignment, and beat continuity. This offers producers a new, controllable tool for creative production.
Our model can synthesize drum audio following the original MIDI pattern. Listen to the same pattern rendered with five different drum kit styles.
| Info | Kit 1: Speed Metal | Kit 2: Alternative (METAL) | Kit 3: Live Fusion | Kit 4: Funk Rock | Kit 5: Ele-Drum |
|---|---|---|---|---|---|
| Genre: Rock, BPM: 86 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Funk Rock, BPM: 92 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Neworleans Funk, BPM: 93 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Funk, BPM: 95 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Jazz Fusion, BPM: 96 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Jazz Funk, BPM: 116 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock, BPM: 118 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Funk Fast, BPM: 125 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Punk, BPM: 128 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Funk Purdieshuffle, BPM: 130 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock Halftime, BPM: 140 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock Halftime, BPM: 140 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock, BPM: 145 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |

| Info | Kit 1: Speed Metal | Kit 2: Alternative (METAL) | Kit 3: Live Fusion | Kit 4: Funk Rock | Kit 5: Ele-Drum |
|---|---|---|---|---|---|
| Genre: Rock, BPM: 75 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Hiphop, BPM: 92 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock, BPM: 95 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Rock, BPM: 105 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Funk, BPM: 117 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
| Genre: Gospel, BPM: 120 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | |
| Genre: Punk, BPM: 144 | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source | Ground Truth, Synthesized, Source |
Our model can synthesize drum audio from rhythm-only MIDI, without a detailed arrangement.
| Info | Ground Truth | Synthesized |
|---|---|---|
| Genre: Rock, BPM: 90, Type: Beat | | |
| Genre: Latin-Brazilian-Baiao, BPM: 95, Type: Fill | | |
| Genre: Jazz-Fusion, BPM: 96, Type: Fill | | |
| Genre: Hiphop, BPM: 100, Type: Fill | | |
| Genre: Rock, BPM: 105, Type: Beat | | |
| Genre: Country, BPM: 114, Type: Fill | | |
| Genre: Rock, BPM: 118, Type: Beat | | |
| Genre: Gospel, BPM: 120, Type: Beat | | |
| Genre: Gospel, BPM: 120, Type: Fill | | |
| Genre: Funk-Fast, BPM: 125, Type: Beat | | |
| Genre: Punk, BPM: 128, Type: Fill | | |
| Genre: Latin-Chacarera, BPM: 157, Type: Beat | | |
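
As a rough illustration of what "rhythm-only" input could look like, the sketch below collapses a list of drum notes into onset and velocity pairs and discards which drum was hit. This is an assumption made for illustration, not the representation used by the model: the `DrumNote` type and the `to_rhythm_only` helper are hypothetical names, and keeping the loudest hit per onset is just one possible reduction.

```python
from typing import NamedTuple


class DrumNote(NamedTuple):
    onset: float   # onset time in seconds
    pitch: int     # General MIDI percussion key, e.g. 36 = bass drum
    velocity: int  # MIDI velocity, 1..127


def to_rhythm_only(notes):
    """Drop the drum identity and keep one (onset, velocity) pair per onset.

    Hypothetical reduction: simultaneous hits are merged by keeping the
    loudest velocity. Onsets are matched by exact float equality, which is
    fine for this toy example but would need a tolerance on real data.
    """
    loudest = {}
    for note in notes:
        loudest[note.onset] = max(loudest.get(note.onset, 0), note.velocity)
    return sorted(loudest.items())


pattern = [
    DrumNote(0.0, 36, 100),  # bass drum
    DrumNote(0.0, 42, 80),   # closed hi-hat on the same onset
    DrumNote(0.5, 38, 110),  # snare
]
rhythm = to_rhythm_only(pattern)
```

Here the kick and hi-hat that share an onset become a single rhythmic event, so only the timing and accent of the pattern remain.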
We demonstrate the model's ability to synthesize drum audio at variable durations, enabling precise duration adaptation while preserving timbral fidelity and rhythmic structure.
| Info | 1.0x duration (Ground Truth) | 0.5x duration | 1.5x duration | 2.0x duration |
|---|---|---|---|---|
| Genre: Rock, BPM: 75, Type: Beat | | | | |
| Genre: Afrobeat, BPM: 90, Type: Beat | | | | |
| Genre: Funk, BPM: 95, Type: Fill | | | | |
| Genre: Hiphop, BPM: 100, Type: Fill | | | | |
| Genre: Jazz-Funk, BPM: 116, Type: Fill | | | | |
| Genre: Latin-Samba, BPM: 116, Type: Fill | | | | |
| Genre: Jazz, BPM: 120, Type: Beat | | | | |
| Genre: Funk-Fast, BPM: 125, Type: Fill | | | | |
| Genre: Rock, BPM: 135, Type: Fill | | | | |
| Genre: Rock, BPM: 145, Type: Fill | | | | |
| Genre: Jazz, BPM: 158, Type: Beat | | | | |
We compare drum audio synthesized from identical arrangement MIDI by models trained at different temporal resolutions.
| Info | Resolution: 16th Note | Resolution: 32nd Note | Resolution: 64th Note | Ground Truth |
|---|---|---|---|---|
| Genre: Rock, BPM: 75, Type: Beat | | | | |
| Genre: Funk-Rock, BPM: 92, Type: Fill | | | | |
| Genre: Hiphop, BPM: 92, Type: Beat | | | | |
| Genre: Rock, BPM: 95, Type: Beat | | | | |
| Genre: Jazz-Fusion, BPM: 96, Type: Fill | | | | |
| Genre: Rock, BPM: 105, Type: Beat | | | | |
| Genre: Latin-Samba, BPM: 116, Type: Beat | | | | |
| Genre: Rock, BPM: 118, Type: Fill | | | | |
| Genre: Punk, BPM: 144, Type: Beat | | | | |
| Genre: Rock, BPM: 145, Type: Fill | | | | |
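
To give a feel for why temporal resolution matters, the sketch below snaps onset times to a 16th-, 32nd-, or 64th-note grid at a given tempo. It is a minimal illustration under stated assumptions, not the model's content encoder: `STEPS_PER_BEAT`, `step_seconds`, and `quantize_onsets` are hypothetical names, and a quarter-note beat is assumed.

```python
# Grid steps per quarter note for the three resolutions in the table above.
STEPS_PER_BEAT = {"16th": 4, "32nd": 8, "64th": 16}


def step_seconds(bpm, resolution):
    """Length of one grid step in seconds, assuming a quarter-note beat."""
    return 60.0 / bpm / STEPS_PER_BEAT[resolution]


def quantize_onsets(onsets_sec, bpm, resolution):
    """Snap each onset time (in seconds) to its nearest grid step index."""
    step = step_seconds(bpm, resolution)
    return [round(t / step) for t in onsets_sec]


# Two hits 40 ms apart at 120 BPM, e.g. the two strokes of a flam.
flam = [0.0, 0.04]
coarse = quantize_onsets(flam, bpm=120, resolution="16th")
fine = quantize_onsets(flam, bpm=120, resolution="64th")
```

At 120 BPM a 16th-note step lasts 0.125 s, so both hits land on the same grid index, while the 64th-note grid (about 0.031 s per step) keeps them apart. Fast fills and ornaments are where the coarser grids would be expected to lose detail.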