doi 10.1145/3678299.3678321

Tonal Cognition in Sonification: Exploring the Needs of Practitioners in Sonic Interaction Design

Authors: Minsik Choi, Josh Andres, Charles Patrick Martin

Abstract: Research into tonal music examines the structural relationships among sounds and how they align with our auditory perception. The exploration of integrating tonal cognition into sonic interaction design, particularly for practitioners lacking extensive musical knowledge, and developing an accessible software tool, remains limited. We report on a study of designers to understand the sound creation practices of industry experts and explore how infusing tonal music principles into a sound design tool can better support their craft and enhance the sonic experiences they create. Our study collected qualitative data through semi-structured individual and focus group interviews with six participants. We developed a low-fidelity prototype sound design tool that involves practical methods of functional harmony and interaction design discussed in focus groups. We identified four themes through reflexive thematic analysis: decision-making, domain knowledge and terminology, collaboration, and contexts in sound creation. Finally, we discussed design considerations for an accessible sonic interaction design tool that aligns auditory experience more closely with tonal cognition.

Submitted 30 August, 2024; originally announced August 2024.

Comments: To be published in: Proceedings of the 19th Audio Mostly Conference: A Conference on Explorations in Sonic Cultures, Milan, Italy, 2024

arXiv:2405.15338

SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

Abstract: We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the generated outputs, resulting in coherent and high-fidelity performance. Our experiments demonstrate that SoundLoCD outperforms the baseline with greatly reduced computational resources. A comprehensive ablation study further validates the contribution of each component within SoundLoCD. Demo page: https://XinleiNIU.github.io/demo-SoundLoCD/.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.15637

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

Authors: Xinlei Niu, Jing Zhang, Charles Patrick Martin

Abstract: We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder and optimises style text embeddings to align with the speaker style information through contrastive learning in parallel. Therefore, HybridVC can be efficiently trained under limited computational resources. Our experiments demonstrate HybridVC's superior training efficiency and its capability for advanced multi-modal voice style conversion. This underscores its potential for widespread applications such as user-defined personalised voice in various social media platforms. A comprehensive ablation study further validates the effectiveness of our method.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2306.02568

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

Abstract: We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. At last, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.

Submitted 25 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2024

arXiv:2210.09291

Embodying the Glitch: Perspectives on Generative AI in Dance Practice

Authors: Benedikte Wallace, Charles P. Martin

Abstract: What role does the break from realism play in the potential for generative artificial intelligence as a creative tool? Through exploration of glitch, we examine the prospective value of these artefacts in creative practice. This paper describes findings from an exploration of AI-generated "mistakes" when using movement produced by a generative deep learning model as an inspiration source in dance composition.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2012.02404

doi 10.5281/zenodo.1302543

Composing an Ensemble Standstill Work for Myo and Bela

Authors: Charles Patrick Martin, Alexander Refsum Jensenius, Jim Torresen

Abstract: This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce Myo-to-Bela and Myo-to-OSC software bridges that assist with prototyping compositions using the Myo controller.

Submitted 4 December, 2020; originally announced December 2020.

ACM Class: H.5.5

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2018, pp. 196-197

arXiv:2012.02322

A Laptop Ensemble Performance System using Recurrent Neural Networks

Authors: Rohan Proctor, Charles Patrick Martin

Abstract: The popularity of applying machine learning techniques in musical domains has created an inherent availability of freely accessible pre-trained neural network (NN) models ready for use in creative applications. This work outlines the implementation of one such application in the form of an assistance tool designed for live improvisational performances by laptop ensembles. The primary intention was to leverage off-the-shelf pre-trained NN models as a basis for assisting individual performers either as musical novices looking to engage with more experienced performers or as a tool to expand musical possibilities through new forms of creative expression. The system expands upon a variety of ideas found in different research areas including new interfaces for musical expression, generative music and group performance to produce a networked performance solution served via a web-browser interface. The final implementation of the system offers performers a mixture of high and low-level controls to influence the shape of sequences of notes output by locally run NN models in real time, also allowing performers to define their level of engagement with the assisting generative models. Two test performances were played, with the system shown to feasibly support four performers over a four minute piece while producing musically cohesive and engaging music. Iterations on the design of the system exposed technical constraints on the use of a JavaScript environment for generative models in a live music context, largely derived from inescapable processing overheads.

Submitted 3 December, 2020; originally announced December 2020.

ACM Class: H.5.5; H.5.3

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020, pp. 43-48

arXiv:2012.02311

Sonic Sculpture: Activating Engagement with Head-Mounted Augmented Reality

Authors: Charles Patrick Martin, Zeruo Liu, Yichen Wang, Wennan He, Henry Gardner

Abstract: This work examines how head-mounted AR can be used to build an interactive sonic landscape to engage with a public sculpture. We describe a sonic artwork, "Listening To Listening", that has been designed to accompany a real-world sculpture with two prototype interaction schemes. Our artwork is created for the HoloLens platform so that users can have an individual experience in a mixed reality context. Personal head-mounted AR systems have recently become available and practical for integration into public art projects, however research into sonic sculpture works has yet to account for the affordances of current portable and mainstream AR systems. In this work, we take advantage of the HoloLens' spatial awareness to build sonic spaces that have a precise spatial relationship to a given sculpture and where the sculpture itself is modelled in the augmented scene as an "invisible hologram". We describe the artistic rationale for our artwork, the design of the two interaction schemes, and the technical and usability feedback that we have obtained from demonstrations during iterative development.

Submitted 3 December, 2020; originally announced December 2020.

ACM Class: H.5.5; H.5.1

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020, pp. 48-52

arXiv:2011.13453

Towards Movement Generation with Audio Features

Authors: Benedikte Wallace, Charles P. Martin, Jim Torresen, Kristian Nymoen

Abstract: Sound and movement are closely coupled, particularly in dance. Certain audio features have been found to affect the way we move to music. Is this relationship between sound and movement something which can be modelled using machine learning? This work presents initial experiments wherein high-level audio features calculated from a set of music pieces are included in a movement generation model trained on motion capture recordings of improvised dance. Our results indicate that the model learns to generate realistic dance movements which vary depending on the audio features.

Submitted 26 November, 2020; originally announced November 2020.

arXiv:2003.13254

Environmental Adaptation of Robot Morphology and Control through Real-world Evolution

Authors: Tønnes F. Nygaard, Charles P. Martin, David Howard, Jim Torresen, Kyrre Glette

Abstract: Robots operating in the real world will experience a range of different environments and tasks. It is essential for the robot to have the ability to adapt to its surroundings to work efficiently in changing conditions. Evolutionary robotics aims to solve this by optimizing both the control and body (morphology) of a robot, allowing adaptation to internal, as well as external factors. Most work in this field has been done in physics simulators, which are relatively simple and not able to replicate the richness of interactions found in the real world. Solutions that rely on the complex interplay between control, body, and environment are therefore rarely found. In this paper, we rely solely on real-world evaluations and apply evolutionary search to yield combinations of morphology and control for our mechanically self-reconfiguring quadruped robot. We evolve solutions on two distinct physical surfaces and analyze the results in terms of both control and morphology. We then transition to two previously unseen surfaces to demonstrate the generality of our method. We find that the evolutionary search finds high-performing and diverse morphology-controller configurations by adapting both control and body to the different properties of the physical environments. We additionally find that morphology and control vary with statistical significance between the environments. Moreover, we observe that our method allows for morphology and control parameters to transfer to previously-unseen terrains, demonstrating the generality of our approach.

Submitted 20 October, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:1905.05626

Lessons Learned from Real-World Experiments with DyRET: the Dynamic Robot for Embodied Testing

Authors: Tønnes F. Nygaard, Jørgen Nordmoen, Charles P. Martin, Kyrre Glette

Abstract: Robots are used in more and more complex environments, and are expected to be able to adapt to changes and unknown situations. The easiest and quickest way to adapt is to change the control system of the robot, but for increasingly complex environments one should also change the body of the robot -- its morphology -- to better fit the task at hand. The theory of Embodied Cognition states that control is not the only source of cognition, and the body, environment, interaction between these and the mind all contribute as cognitive resources. Taking advantage of these concepts could lead to improved adaptivity, robustness, and versatility, however, executing these concepts on real-world robots puts additional requirements on the hardware and has several challenges when compared to learning just control. In contrast to the majority of work in Evolutionary Robotics, Eiben argues for real-world experiments in his 'Grand Challenges for Evolutionary Robotics'. This requires robust hardware platforms that are capable of repeated experiments which should at the same time be flexible when unforeseen demands arise. In this paper, we introduce our unique robot platform with self-adaptive morphology. We discuss the challenges we have faced when designing it, and the lessons learned from real-world testing and learning.

Submitted 14 May, 2019; originally announced May 2019.

Comments: Accepted to the Learning Legged Locomotion Workshop @ ICRA 2019

arXiv:1904.05009

An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks

Authors: Charles P Martin, Jim Torresen

Abstract: This paper is about creating digital musical instruments where a predictive neural network model is integrated into the interactive system. Rather than predicting symbolic music (e.g., MIDI notes), we suggest that predicting future control data from the user and precise temporal information can lead to new and interesting interactive possibilities. We propose that a mixture density recurrent neural network (MDRNN) is an appropriate model for this task. The predictions can be used to fill-in control data when the user stops performing, or as a kind of filter on the user's own input. We present an interactive MDRNN prediction server that allows rapid prototyping of new NIMEs featuring predictive musical interaction by recording datasets, training MDRNN models, and experimenting with interaction modes. We illustrate our system with several example NIMEs applying this idea. Our evaluation shows that real-time predictive interaction is viable even on single-board computers and that small models are appropriate for small datasets.

Submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted for presentation at the International Conference on New Interfaces for Musical Expression (NIME), June 2019

arXiv:1902.04403

Evolving Robots on Easy Mode: Towards a Variable Complexity Controller for Quadrupeds

Authors: Tønnes Frostad Nygaard, Charles Patrick Martin, Jim Torresen, Kyrre Glette

Abstract: The complexity of a legged robot's environment or task can inform how specialised its gait must be to ensure success. Evolving specialised robotic gaits demands many evaluations - acceptable for computer simulations, but not for physical robots. For some tasks, a more general gait, with lower optimization costs, could be satisfactory. In this paper, we introduce a new type of gait controller where complexity can be set by a single parameter, using a dynamic genotype-phenotype mapping. Low controller complexity leads to conservative gaits, while higher complexity allows more sophistication and high performance for demanding tasks, at the cost of optimization effort. We investigate the new controller on a virtual robot in simulations and do preliminary testing on a real-world robot. We show that having variable complexity allows us to adapt to different optimization budgets. With a high evaluation budget in simulation, a complex controller performs best. Moreover, real-world evolution with a limited evaluation budget indicates that a lower gait complexity is preferable for a relatively simple environment.

Submitted 12 February, 2019; originally announced February 2019.

Comments: Accepted to EvoApplications19

arXiv:1902.00680

doi 10.1162/comj_a_00536

Data Driven Analysis of Tiny Touchscreen Performance with MicroJam

Authors: Charles P Martin, Jim Torresen

Abstract: The widespread adoption of mobile devices, such as smartphones and tablets, has made touchscreens a common interface for musical performance. New mobile musical instruments have been designed that embrace collaborative creation and that explore the affordances of mobile devices, as well as their constraints. While these have been investigated from design and user experience perspectives, there is little examination of the performers' musical outputs. In this work, we introduce a constrained touchscreen performance app, MicroJam, designed to enable collaboration between performers, and engage in a novel data-driven analysis of more than 1600 performances using the app. MicroJam constrains performances to five seconds, and emphasises frequent and casual music making through a social media-inspired interface. Performers collaborate by replying to performances, adding new musical layers that are played back at the same time. Our analysis shows that users tend to focus on the centre and diagonals of the touchscreen area, and tend to swirl or swipe rather than tap. We also observe that while long swipes dominate the visual appearance of performances, the majority of interactions are short with limited expressive possibilities. Our findings are summarised into a set of design recommendations for MicroJam and other touchscreen apps for social musical interaction.

Submitted 2 February, 2019; originally announced February 2019.

Journal ref: Computer Music Journal, 43(4), 41-57 (2020)

arXiv:1901.07859

How do Mixture Density RNNs Predict the Future?

Authors: Kai Olav Ellefsen, Charles Patrick Martin, Jim Torresen

Abstract: Gaining a better understanding of how and what machine learning systems learn is important to increase confidence in their decisions and catalyze further research. In this paper, we analyze the predictions made by a specific type of recurrent neural network, mixture density RNNs (MD-RNNs). These networks learn to model predictions as a combination of multiple Gaussian distributions, making them particularly interesting for problems where a sequence of inputs may lead to several distinct future possibilities. An example is learning internal models of an environment, where different events may or may not occur, but where the average over different events is not meaningful. By analyzing the predictions made by trained MD-RNNs, we find that their different Gaussian components have two complementary roles: 1) Separately modeling different stochastic events and 2) Separately modeling scenarios governed by different rules. These findings increase our understanding of what is learned by predictive MD-RNNs, and open up new research directions for further understanding how we can benefit from their self-organizing model decomposition.

Submitted 23 January, 2019; originally announced January 2019.

arXiv:1805.03388

Real-World Evolution Adapts Robot Morphology and Control to Hardware Limitations

Authors: Tønnes F. Nygaard, Charles P. Martin, Eivind Samuelsen, Jim Torresen, Kyrre Glette

Abstract: For robots to handle the numerous factors that can affect them in the real world, they must adapt to changes and unexpected events. Evolutionary robotics tries to solve some of these issues by automatically optimizing a robot for a specific environment. Most of the research in this field, however, uses simplified representations of the robotic system in software simulations. The large gap between performance in simulation and the real world makes it challenging to transfer the resulting robots to the real world. In this paper, we apply real world multi-objective evolutionary optimization to optimize both control and morphology of a four-legged mammal-inspired robot. We change the supply voltage of the system, reducing the available torque and speed of all joints, and study how this affects both the fitness, as well as the morphology and control of the solutions. In addition to demonstrating that this real-world evolutionary scheme for morphology and control is indeed feasible with relatively few evaluations, we show that evolution under the different hardware limitations results in comparable performance for low and moderate speeds, and that the search achieves this by adapting both the control and the morphology of the robot.

Submitted 9 May, 2018; originally announced May 2018.

Comments: Accepted to the 2018 Genetic and Evolutionary Computation Conference (GECCO)

arXiv:1805.02965

Exploring Mechanically Self-Reconfiguring Robots for Autonomous Design

Authors: Tønnes F. Nygaard, Charles P. Martin, Jim Torresen, Kyrre Glette

Abstract: Evolutionary robotics has aimed to optimize robot control and morphology to produce better and more robust robots. Most previous research only addresses optimization of control, and does this only in simulation. We have developed a four-legged mammal-inspired robot that features a self-reconfiguring morphology. In this paper, we discuss the possibilities opened up by being able to efficiently do experiments on a changing morphology in the real world. We discuss present challenges for such a platform and potential experimental designs that could unlock new discoveries. Finally, we place our robot in its context within general developments in the field of evolutionary robotics, and consider what advances the future might hold.

Submitted 8 May, 2018; originally announced May 2018.

Comments: Accepted to the 2018 ICRA Workshop on Autonomous Robot Design

arXiv:1803.05629

Self-Modifying Morphology Experiments with DyRET: Dynamic Robot for Embodied Testing

Authors: Tønnes F. Nygaard, Charles P. Martin, Jim Torresen, Kyrre Glette

Abstract: If robots are to become ubiquitous, they will need to be able to adapt to complex and dynamic environments. Robots that can adapt their bodies while deployed might be flexible and robust enough to meet this challenge. Previous work on dynamic robot morphology has focused on simulation, combining simple modules, or switching between locomotion modes. Here, we present an alternative approach: a self-reconfigurable morphology that allows a single four-legged robot to actively adapt the length of its legs to different environments. We report the design of our robot, as well as the results of a study that verifies the performance impact of self-reconfiguration. This study compares three different control and morphology pairs under different levels of servo supply voltage in the lab. We also performed preliminary tests in different uncontrolled outdoor environments to see if changes to the external environment supports our findings in the lab. Our results show better performance with an adaptable body, lending evidence to the value of self-reconfiguration for quadruped robots.

Submitted 23 July, 2019; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: Accepted to ICRA19. Corrections to table II, July 2019

arXiv:1801.10492

Deep Predictive Models in Interactive Music

Authors: Charles P. Martin, Kai Olav Ellefsen, Jim Torresen

Abstract: Musical performance requires prediction to operate instruments, to perform in groups and to improvise. In this paper, we investigate how a number of digital musical instruments (DMIs), including two of our own, have applied predictive machine learning models that assist users by predicting unknown states of musical processes. We characterise these predictions as focussed within a musical instrument, at the level of individual performers, and between members of an ensemble. These models can connect to existing frameworks for DMI design and have parallels in the cognitive predictions of human musicians. We discuss how recent advances in deep learning highlight the role of prediction in DMIs, by allowing data-driven predictive models with a long memory of past states. The systems we review are used to motivate musical use-cases where prediction is a necessary component, and to highlight a number of challenges for DMI designers seeking to apply deep predictive models in interactive music systems of the future.

Submitted 19 December, 2018; v1 submitted 31 January, 2018; originally announced January 2018.

arXiv:1711.10746

doi 10.1007/978-3-319-77583-8_11

RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction

Authors: Charles P. Martin, Jim Torresen

Abstract: RoboJam is a machine-learning system for generating music that assists users of a touchscreen music app by performing responses to their short improvisations. This system uses a recurrent artificial neural network to generate sequences of touchscreen interactions and absolute timings, rather than high-level musical notes. To accomplish this, RoboJam's network uses a mixture density layer to predict appropriate touch interaction locations in space and time. In this paper, we describe the design and implementation of RoboJam's network and how it has been integrated into a touchscreen music app. A preliminary evaluation analyses the system in terms of training, musical generation and user interaction.

Submitted 29 November, 2017; originally announced November 2017.

Journal ref: Computational Intelligence in Music, Sound, Art and Design. EvoMUSART 2018. Lecture Notes in Computer Science, vol 10783

Showing 1–20 of 20 results for author: Martin, C P