Notice: If you have trouble with the interactive MIDI players on this page, consider using Chrome, Edge, or Firefox. You can drag the piano roll with a mouse or touch device to change the playback position.
Music is sequential data with many kinds of long-term relations, including repetition, retrograde, sequences, call-and-response, and more. Modeling such (potentially long-term) relations is crucial for both music analysis and generation tasks.
Currently, attentive models such as Transformers are a popular way to capture long-term relations in a sequence. Their core mechanism is element-wise attention, which is powerful at capturing element-wise similarity but lacks the inductive bias to compare sequences against sequences directly; capturing sequence-level similarity therefore requires stacking multiple layers.
In this paper, we present the sequential attention module, which directly models sequence-level relations in music. In this module, the keys and queries are no longer tokens but sequences.
Attention module | Element-wise attention | Sequential attention (this paper) |
---|---|---|
Type of keys | Token | Sequence |
Type of queries | Token | Sequence |
Weighting method | Dot(key, query) | FFN(LSTM(Concat(key, query))) |
To compute sequence-wise similarity, we first stack the key and query sequences together and feed them into a uni-directional Long Short-Term Memory (LSTM) layer. The LSTM output, together with the corresponding key token, is then used to determine both the matching score of the two sequences and a prediction for the next token of the query sequence.
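As a concrete illustration, here is a minimal PyTorch sketch of this matching step. The class name `SequenceMatcher`, all dimensions, and the choice to feed in the key token that followed the matched key span (`next_key_token`) are our own assumptions for illustration; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class SequenceMatcher(nn.Module):
    """Score a (key, query) sequence pair and predict the query's next token.
    A sketch of FFN(LSTM(Concat(key, query))) from the table above."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Uni-directional LSTM reads the stepwise-stacked key/query embeddings.
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.score_head = nn.Linear(hidden_dim + embed_dim, 1)
        self.token_head = nn.Linear(hidden_dim + embed_dim, vocab_size)

    def forward(self, key, query, next_key_token):
        # key, query: (batch, seq_len) token ids of equal length.
        # next_key_token: (batch,) the token that followed the key span.
        pair = torch.cat([self.embed(key), self.embed(query)], dim=-1)
        out, _ = self.lstm(pair)                   # (batch, seq_len, hidden_dim)
        feat = torch.cat([out[:, -1], self.embed(next_key_token)], dim=-1)
        score = self.score_head(feat).squeeze(-1)  # unnormalized matching score
        logits = self.token_head(feat)             # distribution over query's next token
        return score, logits
```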
Fig. 1 provides an illustrative scenario with two sequences of notes. The first sequence is `C4 D4 E4 C4 G4`, and the second is `A3 B3 C4 A3 ?`, where the question mark denotes an unknown token we want to predict. Notice that these two sequences likely form a tonal sequence relation, and we can use this information to predict that the unknown token is likely `E4`. If the module is well trained, it will discover similar relations and use them to improve prediction accuracy.
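To make the tonal-sequence reasoning concrete, the following toy snippet (our own illustration, not part of the model) transposes the first phrase down a diatonic third in C major and recovers the second phrase, including the predicted `E4`:

```python
# Diatonic (tonal) transposition within C major: shift each note by a fixed
# number of scale steps rather than a fixed number of semitones.
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

def shift_diatonic(note, steps, scale=C_MAJOR):
    name, octave = note[:-1], int(note[-1])
    idx = scale.index(name) + octave * 7 + steps   # position on the diatonic staff
    return f"{scale[idx % 7]}{idx // 7}"

phrase = ["C4", "D4", "E4", "C4", "G4"]
print([shift_diatonic(n, -2) for n in phrase])     # ['A3', 'B3', 'C4', 'A3', 'E4']
```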
We can use this module in an attentive language model (Fig. 2). To predict the next token of a partial sequence, we regard its suffix as the query sequence and its substrings as the key sequences. Some key sequences do not match the query well and thus provide little useful information for prediction, so we use the normalized matching scores as the attention weights: a higher score indicates a more important relation between the key and query sequences. A weighted-average layer then aggregates the per-key predictions into the final prediction for the next token.
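Continuing the sketch above, the aggregation could look as follows. The function `predict_next` and the assumption that every key sequence has the same length as the query suffix are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

def predict_next(matcher, keys, next_key_tokens, query):
    """Aggregate per-key predictions, weighted by normalized matching scores.

    keys: list of (1, L) key substrings, each the same length as the query suffix.
    query: (1, L) suffix of the partial sequence.
    """
    scores, preds = [], []
    for key, nxt in zip(keys, next_key_tokens):
        s, logits = matcher(key, query, nxt)        # s: (1,), logits: (1, vocab)
        scores.append(s)
        preds.append(F.softmax(logits, dim=-1))
    # Normalized matching scores act as attention weights over the keys.
    weights = F.softmax(torch.stack(scores), dim=0)    # (num_keys, 1)
    preds = torch.stack(preds)                         # (num_keys, 1, vocab)
    # Weighted average of the per-key next-token distributions.
    return (weights.unsqueeze(-1) * preds).sum(dim=0)  # (1, vocab)
```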
For the task of conditional sequence generation (e.g., melody generation given a chord sequence), we propose a conditional version of the sequential attention module (Fig. 3). This version also considers the relations among the condition sequences. Since future conditions are revealed in advance, the module adds a backward LSTM to capture the relations of future conditions.
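A rough sketch of this conditional variant is shown below, again under assumed names and dimensions; in particular, how the backward summary of future conditions combines with the forward state is our guess, not necessarily the paper's design:

```python
import torch
import torch.nn as nn

class ConditionalSequenceMatcher(nn.Module):
    """Conditional variant: matching also sees the condition (chord) tokens,
    and a backward LSTM summarizes the already-revealed future conditions."""

    def __init__(self, vocab_size, cond_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, embed_dim)
        self.cond_embed = nn.Embedding(cond_size, embed_dim)
        # Forward LSTM over stacked (key, query, condition) embeddings.
        self.fwd = nn.LSTM(3 * embed_dim, hidden_dim, batch_first=True)
        # Backward LSTM over future conditions, which are fully revealed.
        self.bwd = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.score_head = nn.Linear(2 * hidden_dim, 1)
        self.token_head = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, key, query, cond, future_cond):
        # key, query, cond: (batch, L); future_cond: (batch, F).
        x = torch.cat([self.tok_embed(key), self.tok_embed(query),
                       self.cond_embed(cond)], dim=-1)
        fwd_out, _ = self.fwd(x)
        # Run the backward LSTM over the time-reversed future conditions.
        rev = torch.flip(self.cond_embed(future_cond), dims=[1])
        bwd_out, _ = self.bwd(rev)
        feat = torch.cat([fwd_out[:, -1], bwd_out[:, -1]], dim=-1)
        return self.score_head(feat).squeeze(-1), self.token_head(feat)
```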
Our experiments show that the model outperforms a 3-layer Transformer with relative positional encoding on the next-token prediction task. We also designed case-study examples to show what kinds of relations the module can capture. Notice that the top two predictions in case (2) both form valid tonal sequences (in the keys of C major and F major, respectively).
Even though the model was not designed with music generation in mind, we ran some music generation experiments with it. In Fig. 6, we use the conditional self-attentive language model to generate a melody given the chords and some initial melody notes. Below we show another example where the model generates the melody from the beginning given only the chords. Notice that the generated melody contains short-term and long-term repetitions, which occur mostly in the right places (i.e., the melody repeats where the chord sequence repeats).
More generated MIDI samples are available here.
Thanks to Google Creative Lab for the MIDI player.