-
Cluster and Separate: a GNN Approach to Voice and Staff Prediction for Score Engraving
Authors:
Francesco Foscarin,
Emmanouil Karystinaios,
Eita Nakamura,
Gerhard Widmer
Abstract:
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves. This is a fundamental part of the larger task of music score engraving (or score typesetting), which aims to produce readable musical scores for human performers. We focus on piano music and support homophonic voices, i.e., voices that can contain cho…
▽ More
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves. This is a fundamental part of the larger task of music score engraving (or score typesetting), which aims to produce readable musical scores for human performers. We focus on piano music and support homophonic voices, i.e., voices that can contain chords, and cross-staff voices, which are notably difficult tasks that have often been overlooked in previous research. We propose an end-to-end system based on graph neural networks that clusters notes that belong to the same chord and connects them with edges if they are part of a voice. Our results show clear and consistent improvements over a previous approach on two datasets of different styles. To aid the qualitative analysis of our results, we support the export in symbolic music formats and provide a direct visualization of our outputs graph over the musical score. All code and pre-trained models are available at https://github.com/CPJKU/piano_svsep
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training
Authors:
Ryoto Ishizuka,
Ryo Nishikimi,
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the $\textit{tatum}$ level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the $\textit{frame}$ lev…
▽ More
This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the $\textit{tatum}$ level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the $\textit{frame}$ level. The major problem with such frame-to-frame DNNs, however, is that the estimated onset times do not often conform with the typical tatum-level patterns appearing in symbolic drum scores because the long-term musically meaningful structures of those patterns are difficult to learn at the frame level. To solve this problem, we propose a regularized training method for a frame-to-tatum DNN. In the proposed method, a tatum-level probabilistic language model (gated recurrent unit (GRU) network or repetition-aware bi-gram model) is trained from an extensive collection of drum scores. Given that the musical naturalness of tatum-level onset times can be evaluated by the language model, the frame-to-tatum DNN is trained with a regularizer based on the pretrained language model. The experimental results demonstrate the effectiveness of the proposed regularized training method.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription
Authors:
Kentaro Shibata,
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
We present an automatic piano transcription system that converts polyphonic audio recordings into musical scores. This has been a long-standing problem of music information processing, and recent studies have made remarkable progress in the two main component techniques: multipitch detection and rhythm quantization. Given this situation, we study a method integrating deep-neural-network-based mult…
▽ More
We present an automatic piano transcription system that converts polyphonic audio recordings into musical scores. This has been a long-standing problem of music information processing, and recent studies have made remarkable progress in the two main component techniques: multipitch detection and rhythm quantization. Given this situation, we study a method integrating deep-neural-network-based multipitch detection and statistical-model-based rhythm quantization. In the first part, we conducted systematic evaluations and found that while the present method achieved high transcription accuracies at the note level, some global characteristics of music, such as tempo scale, metre (time signature), and bar line positions, were often incorrectly estimated. In the second part, we formulated non-local statistics of pitch and rhythmic contents that are derived from musical knowledge and studied their effects in inferring those global characteristics. We found that these statistics are markedly effective for improving the transcription results and that their optimal combination includes statistics obtained from separated hand parts. The integrated method had an overall transcription error rate of 7.1% and a downbeat F-measure of 85.6% on a dataset of popular piano music, and the generated transcriptions can be partially used for music performance and assisting human transcribers, thus demonstrating the potential for practical applications.
△ Less
Submitted 3 April, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Latent Chord Labels and Features
Authors:
Yiming Wu,
Tristan Carsault,
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
This paper describes a statistically-principled semi-supervised method of automatic chord estimation (ACE) that can make effective use of music signals regardless of the availability of chord annotations. The typical approach to ACE is to train a deep classification model (neural chord estimator) in a supervised manner by using only annotated music signals. In this discriminative approach, prior k…
▽ More
This paper describes a statistically-principled semi-supervised method of automatic chord estimation (ACE) that can make effective use of music signals regardless of the availability of chord annotations. The typical approach to ACE is to train a deep classification model (neural chord estimator) in a supervised manner by using only annotated music signals. In this discriminative approach, prior knowledge about chord label sequences (model output) has scarcely been taken into account. In contrast, we propose a unified generative and discriminative approach in the framework of amortized variational inference. More specifically, we formulate a deep generative model that represents the generative process of chroma vectors (observed variables) from discrete labels and continuous features (latent variables), which are assumed to follow a Markov model favoring self-transitions and a standard Gaussian distribution, respectively. Given chroma vectors as observed data, the posterior distributions of the latent labels and features are computed approximately by using deep classification and recognition models, respectively. These three models form a variational autoencoder and can be trained jointly in a semi-supervised manner. The experimental results show that the regularization of the classification model based on the Markov prior of chord labels and the generative model of chroma vectors improved the performance of ACE even under the supervised condition. The semi-supervised learning using additional non-annotated data can further improve the performance.
△ Less
Submitted 8 September, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network
Authors:
Tristan Carsault,
Andrew McLeod,
Philippe Esling,
Jérôme Nika,
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulat…
▽ More
This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels. Specifically, the input and ground truth labels are merged into increasingly large temporal bags, on which we train a family of encoder-decoder networks for each temporal scale. In a second step, we use these pre-trained encoder bottleneck features at each scale in order to train a final encoder-decoder network. Furthermore, we rely on different reductions of the initial chord alphabet into three adapted chord alphabets. We perform evaluations against several state-of-the-art models and show that our multi-scale architecture outperforms existing methods in terms of accuracy and perplexity, while requiring relatively few parameters. We analyze musical properties of the results, showing the influence of downbeat position within the analysis window on accuracy, and evaluate errors using a musically-informed distance metric.
△ Less
Submitted 12 November, 2019;
originally announced November 2019.
-
Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions
Authors:
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
Most work on musical score models (a.k.a. musical language models) for music transcription has focused on describing the local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetiti…
▽ More
Most work on musical score models (a.k.a. musical language models) for music transcription has focused on describing the local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly using the sparse transition probabilities of notes or note patterns. This enables us to construct piece-specific models for unseen scores with an unfixed repetitive structure and to derive tractable inference algorithms. Moreover, to describe approximate repetitions, we explicitly incorporate a process for modifying the repeated notes/note patterns. We apply these models as prior musical score models for rhythm transcription, where piece-specific score models are inferred from performed MIDI data by Bayesian learning, in contrast to the conventional supervised construction of score models. Evaluations using the vocal melodies of popular music showed that the Bayesian models improved the transcription accuracy for most of the tested model types, indicating the universal efficacy of the proposed approach. Moreover, we found an effective data representation for modelling rhythms that maximizes the transcription accuracy and computational efficiency.
△ Less
Submitted 16 February, 2021; v1 submitted 18 August, 2019;
originally announced August 2019.
-
A portable potentiometric electronic tongue leveraging smartphone and cloud platforms
Authors:
Patrick W. Ruch,
Rui Hu,
Luca Capua,
Yuksel Temiz,
Stephan Paredes,
Antonio Lopez Marin,
Jorge Barroso Carmona,
Aaron Cox,
Eiji Nakamura,
Keiji Matsumoto
Abstract:
Electronic tongues based on potentiometry offer the prospect of rapid and continuous chemical fingerprinting for portable and remote systems. The present contribution presents a technology platform including a miniaturized electronic tongue based on electropolymerized ion-sensitive films, microcontroller-based data acquisition, a smartphone interface and cloud computing back-end for data storage a…
▽ More
Electronic tongues based on potentiometry offer the prospect of rapid and continuous chemical fingerprinting for portable and remote systems. The present contribution presents a technology platform including a miniaturized electronic tongue based on electropolymerized ion-sensitive films, microcontroller-based data acquisition, a smartphone interface and cloud computing back-end for data storage and deployment of machine learning models. The sensor array records a series of differential voltages without use of a true reference electrode and the resulting time-series potentiometry data is used to train supervised machine learning algorithms. For trained systems, inferencing tasks such as the classification of liquids are realized within less than 1 minute including data acquisition at the edge and inference using the cloud-deployed machine learning model. Preliminary demonstration of the complete electronic tongue technology stack is reported for the classification of beverages and mineral water.
△ Less
Submitted 15 July, 2019;
originally announced July 2019.
-
A Topic-Agnostic Approach for Identifying Fake News Pages
Authors:
Sonia Castelo,
Thais Almeida,
Anas Elghafari,
Aécio Santos,
Kien Pham,
Eduardo Nakamura,
Juliana Freire
Abstract:
Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how they are propagated, and how to counter their effect, it is necessary to first identify them. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approach…
▽ More
Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how they are propagated, and how to counter their effect, it is necessary to first identify them. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approaches comes from the dynamic nature of news: as new political events are covered, topics and discourse constantly change and thus, a classifier trained using content from articles published at a given time is likely to become ineffective in the future. To address this challenge, we propose a topic-agnostic (TAG) classification strategy that uses linguistic and web-markup features to identify fake news pages. We report experimental results using multiple data sets which show that our approach attains high accuracy in the identification of fake news, even as topics evolve over time.
△ Less
Submitted 2 May, 2019;
originally announced May 2019.
-
Statistical Learning and Estimation of Piano Fingering
Authors:
Eita Nakamura,
Yasuyuki Saito,
Kazuyoshi Yoshii
Abstract:
Automatic estimation of piano fingering is important for understanding the computational process of music performance and applicable to performance assistance and education systems. While a natural way to formulate the quality of fingerings is to construct models of the constraints/costs of performance, it is generally difficult to find appropriate parameter values for these models. Here we study…
▽ More
Automatic estimation of piano fingering is important for understanding the computational process of music performance and applicable to performance assistance and education systems. While a natural way to formulate the quality of fingerings is to construct models of the constraints/costs of performance, it is generally difficult to find appropriate parameter values for these models. Here we study an alternative data-driven approach based on statistical modeling in which the appropriateness of a given fingering is described by probabilities. Specifically, we construct two types of hidden Markov models (HMMs) and their higher-order extensions. We also study deep neural network (DNN)-based methods for comparison. Using a newly released dataset of fingering annotations, we conduct systematic evaluations of these models as well as a representative constraint-based method. We find that the methods based on high-order HMMs outperform the other methods in terms of estimation accuracies. We also quantitatively study individual difference of fingering and propose evaluation measures that can be used with multiple ground truth data. We conclude that the HMM-based methods are currently state of the art and generate acceptable fingerings in most parts and that they have certain limitations such as ignorance of phrase boundaries and interdependence of the two hands.
△ Less
Submitted 1 January, 2020; v1 submitted 23 April, 2019;
originally announced April 2019.
-
Statistical Piano Reduction Controlling Performance Difficulty
Authors:
Eita Nakamura,
Kazuyoshi Yoshii
Abstract:
We present a statistical-modelling method for piano reduction, i.e. converting an ensemble score into piano scores, that can control performance difficulty. While previous studies have focused on describing the condition for playable piano scores, it depends on player's skill and can change continuously with the tempo. We thus computationally quantify performance difficulty as well as musical fide…
▽ More
We present a statistical-modelling method for piano reduction, i.e. converting an ensemble score into piano scores, that can control performance difficulty. While previous studies have focused on describing the condition for playable piano scores, it depends on player's skill and can change continuously with the tempo. We thus computationally quantify performance difficulty as well as musical fidelity to the original score, and formulate the problem as optimization of musical fidelity under constraints on difficulty values. First, performance difficulty measures are developed by means of probabilistic generative models for piano scores and the relation to the rate of performance errors is studied. Second, to describe musical fidelity, we construct a probabilistic model integrating a prior piano-score model and a model representing how ensemble scores are likely to be edited. An iterative optimization algorithm for piano reduction is developed based on statistical inference of the model. We confirm the effect of the iterative procedure; we find that subjective difficulty and musical fidelity monotonically increase with controlled difficulty values; and we show that incorporating sequential dependence of pitches and fingering motion in the piano-score model improves the quality of reduction scores in high-difficulty cases.
△ Less
Submitted 25 October, 2018; v1 submitted 15 August, 2018;
originally announced August 2018.
-
Generative Statistical Models with Self-Emergent Grammar of Chord Sequences
Authors:
Hiroaki Tsushima,
Eita Nakamura,
Katsutoshi Itoyama,
Kazuyoshi Yoshii
Abstract:
Generative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g. in C major key, between G and G7 and between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols and their unsupervised learning techniques for…
▽ More
Generative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g. in C major key, between G and G7 and between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols and their unsupervised learning techniques for inducing the latent grammar from data. Surprisingly, we find that these models often outperform conventional Markov models in predictive power, and the self-emergent categories often correspond to traditional harmonic functions. This implies the need for chord categories in harmony models from the informatics perspective.
△ Less
Submitted 2 March, 2018; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Note Value Recognition for Piano Transcription Using Markov Random Fields
Authors:
Eita Nakamura,
Kazuyoshi Yoshii,
Simon Dixon
Abstract:
This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete…
▽ More
This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete musical scores. Based on observations that the pitch context and onset score times are influential on the configuration of note values, we construct a context-tree model that provides prior distributions of note values using these features and combine it with a performance model in the framework of Markov random fields. Evaluation results show that our method reduces the average error rate by around 40 percent compared to existing/simple methods. We also confirmed that, in our model, the score model plays a more important role than the performance model, and it automatically captures the voice structure by unsupervised learning.
△ Less
Submitted 7 July, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
-
Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices
Authors:
Eita Nakamura,
Kazuyoshi Yoshii,
Shigeki Sagayama
Abstract:
In a recent conference paper, we have reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods that could not properly describe the nature of multiple voices as in polyrhythmic scores or in the phenomenon of loose synchrony between v…
▽ More
In a recent conference paper, we have reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods that could not properly describe the nature of multiple voices as in polyrhythmic scores or in the phenomenon of loose synchrony between voices. In this paper we present a complete description of the proposed model and develop an inference technique, which is valid for any merged-output HMMs for which output probabilities depend on past events. We also examine the influence of the architecture and parameters of the method in terms of accuracies of rhythm transcription and voice separation and perform comparative evaluations with six other algorithms. Using MIDI recordings of classical piano pieces, we found that the proposed model outperformed other methods by more than 12 points in the accuracy for polyrhythmic performances and performed almost as good as the best one for non-polyrhythmic performances. This reveals the state-of-the-art methods of rhythm transcription for the first time in the literature. Publicly available source codes are also provided for future comparisons.
△ Less
Submitted 28 January, 2017;
originally announced January 2017.
-
Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
Authors:
Tomohiko Nakamura,
Eita Nakamura,
Shigeki Sagayama
Abstract:
This paper discusses real-time alignment of audio signals of music performance to the corresponding score (a.k.a. score following) which can handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are often made. Simple…
▽ More
This paper discusses real-time alignment of audio signals of music performance to the corresponding score (a.k.a. score following) which can handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are often made. Simple extensions of the algorithms previously proposed in the literature are not applicable in these situations for scores of practical length due to the problem of large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms with an assumption that the prior probability distributions of score positions before and after repeats/skips are independent from each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10000 notes) on a modern laptop and their tracking ability to the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and extension for polyphonic signals are also discussed.
△ Less
Submitted 24 December, 2015;
originally announced December 2015.
-
A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments
Authors:
Eita Nakamura,
Nobutaka Ono,
Shigeki Sagayama,
Kenji Watanabe
Abstract:
We study indeterminacies in realization of ornaments and how they can be incorporated in a stochastic performance model applicable for music information processing such as score-performance matching. We point out the importance of temporal information, and propose a hidden Markov model which describes it explicitly and represents ornaments with several state types. Following a review of the indete…
▽ More
We study indeterminacies in realization of ornaments and how they can be incorporated in a stochastic performance model applicable for music information processing such as score-performance matching. We point out the importance of temporal information, and propose a hidden Markov model which describes it explicitly and represents ornaments with several state types. Following a review of the indeterminacies, they are carefully incorporated into the model through its topology and parameters, and the state construction for quite general polyphonic scores is explained in detail. By analyzing piano performance data, we find significant overlaps in inter-onset-interval distributions of chordal notes, ornaments, and inter-chord events, and the data is used to determine details of the model. The model is applied for score following and offline score-performance matching, yielding highly accurate matching for performances with many ornaments and relatively frequent errors, repeats, and skips.
△ Less
Submitted 2 August, 2016; v1 submitted 8 April, 2014;
originally announced April 2014.
-
Outer-Product Hidden Markov Model and Polyphonic MIDI Score Following
Authors:
Eita Nakamura,
Tomohiko Nakamura,
Yasuyuki Saito,
Nobutaka Ono,
Shigeki Sagayama
Abstract:
We present a polyphonic MIDI score-following algorithm capable of following performances with arbitrary repeats and skips, based on a probabilistic model of musical performances. It is attractive in practical applications of score following to handle repeats and skips which may be made arbitrarily during performances, but the algorithms previously described in the literature cannot be applied to s…
▽ More
We present a polyphonic MIDI score-following algorithm capable of following performances with arbitrary repeats and skips, based on a probabilistic model of musical performances. It is attractive in practical applications of score following to handle repeats and skips which may be made arbitrarily during performances, but the algorithms previously described in the literature cannot be applied to scores of practical length due to problems with large computational complexity. We propose a new type of hidden Markov model (HMM) as a performance model which can describe arbitrary repeats and skips including performer tendencies on distributed score positions before and after them, and derive an efficient score-following algorithm that reduces computational complexity without pruning. A theoretical discussion on how much such information on performer tendencies improves the score-following results is given. The proposed score-following algorithm also admits performance mistakes and is demonstrated to be effective in practical situations by carrying out evaluations with human performances. The proposed HMM is potentially valuable for other topics in information processing and we also provide a detailed description of inference algorithms.
△ Less
Submitted 8 April, 2014;
originally announced April 2014.