Search | arXiv e-print repository

Weakly-Supervised Semantic Segmentation of Circular-Scan, Synthetic-Aperture-Sonar Imagery

Authors: Isaac J. Sledge, Dominic M. Byrne, Jonathan L. King, Steven H. Ostertag, Denton L. Woods, James L. Prater, Jermaine L. Kennedy, Timothy M. Marston, Jose C. Principe

Abstract: We propose a weakly-supervised framework for the semantic segmentation of circular-scan synthetic-aperture-sonar (CSAS) imagery. The first part of our framework is trained in a supervised manner, on image-level labels, to uncover a set of semi-sparse, spatially-discriminative regions in each image. The classification uncertainty of each region is then evaluated. Those areas with the lowest uncertainties are then chosen to be weakly labeled segmentation seeds, at the pixel level, for the second part of the framework. Each of the seed extents is progressively resized according to an unsupervised, information-theoretic loss with structured-prediction regularizers. This reshaping process uses multi-scale, adaptively-weighted features to delineate class-specific transitions in local image content. Content-addressable memories are inserted at various parts of our framework so that it can leverage features from previously seen images to improve segmentation performance for related images. We evaluate our weakly-supervised framework using real-world CSAS imagery that contains over ten seafloor classes and ten target classes. We show that our framework performs comparably to nine fully-supervised deep networks. Our framework also outperforms eleven of the best weakly-supervised deep networks. We achieve state-of-the-art performance when pre-training on natural imagery. The average absolute performance gap to the next-best weakly-supervised network is well over ten percent for both natural imagery and sonar imagery. This gap is found to be statistically significant.

Submitted 20 January, 2024; originally announced January 2024.

Comments: Submitted to the IEEE Journal of Oceanic Engineering

arXiv:2312.06467 [pdf, other]

Aligning brain functions boosts the decoding of visual semantics in novel subjects

Authors: Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Hubert Banville, Jean-Rémi King

Abstract: Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-subject variability in brain characteristics has limited most studies to training models on one subject at a time. Consequently, this approach hampers the training of deep learning models, which typically requires very large datasets. Here, we propose to boost brain decoding by aligning brain responses to videos and static images across subjects. Compared to the anatomically-aligned baseline, our method improves out-of-subject decoding performance by up to 75%. Moreover, it also outperforms classical single-subject approaches when fewer than 100 minutes of data is available for the tested subject. Furthermore, we propose a new multi-subject alignment method, which obtains results comparable to those of classical single-subject approaches while improving out-of-subject generalization. Finally, we show that this method aligns neural representations in accordance with brain anatomy. Overall, this study lays the foundations for leveraging extensive neuroimaging datasets and enhancing the decoding of individuals with a limited amount of brain recordings.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2310.19812 [pdf, other]

Brain decoding: toward real-time reconstruction of visual perception

Authors: Yohann Benchetrit, Hubert Banville, Jean-Rémi King

Abstract: In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution ($\approx$0.5 Hz) and thus fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution ($\approx$5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end and iii) a pretrained image generator. Our results are threefold: First, our MEG decoder shows a 7X improvement of image-retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that high-level visual features can be decoded from MEG signals, although the same approach applied to 7T fMRI also recovers better low-level features. Overall, these results, while preliminary, provide an important step towards the decoding -- in real-time -- of the visual processes continuously unfolding within the human brain.

Submitted 14 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: 25 pages, 13 figures, updated and reformatted version following acceptance at ICLR 2024

arXiv:2305.03391 [pdf, other]

Compressing audio CNNs with graph centrality based filter pruning

Authors: James A King, Arshdeep Singh, Mark D. Plumbley

Abstract: Convolutional neural networks (CNNs) are commonplace in high-performing solutions to many real-world problems, such as audio classification. CNNs have many parameters and filters, with some having a larger impact on the performance than others. This means that networks may contain many unnecessary filters, increasing a CNN's computation and memory requirements while providing limited performance benefits. To make CNNs more efficient, we propose a pruning framework that eliminates filters with the highest "commonality". We measure this commonality using the graph-theoretic concept of "centrality". We hypothesise that a filter with a high centrality should be eliminated as it represents commonality and can be replaced by other filters without affecting the performance of a network much. An experimental evaluation of the proposed framework is performed on acoustic scene classification and audio tagging. On the DCASE 2021 Task 1A baseline network, our proposed method reduces computations per inference by 71% with 50% fewer parameters at less than a two percentage point drop in accuracy compared to the original network. For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computations per inference by 24% with 41% fewer parameters at a slight improvement in performance.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2210.10203 [pdf, other]

From Model-Based to Model-Free: Learning Building Control for Demand Response

Authors: David Biagioni, Xiangyu Zhang, Christiane Adcock, Michael Sinner, Peter Graf, Jennifer King

Abstract: Grid-interactive building control is a challenging and important problem for reducing carbon emissions, increasing energy efficiency, and supporting the electric power grid. Currently, researchers and practitioners are confronted with a choice of control strategies ranging from model-free (purely data-driven) to model-based (directly incorporating physical knowledge) to hybrid methods that combine data and models. In this work, we identify state-of-the-art methods that span this methodological spectrum and evaluate their performance for multi-zone building HVAC control in the context of three demand response programs. We demonstrate, in this context, that hybrid methods offer many benefits over both purely model-free and model-based methods as long as certain requirements are met. In particular, hybrid controllers are relatively sample efficient, fast online, and highly accurate so long as the test case falls within the distribution of training data. Like all data-driven methods, hybrid controllers are still subject to generalization errors when applied to out-of-sample scenarios. Key takeaways for control strategies are summarized and the developed software framework is open-sourced.

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2208.12266 [pdf, other]

doi 10.1038/s42256-023-00714-5

Decoding speech perception from non-invasive brain recordings

Authors: Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King

Abstract: Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from the non-invasive recordings of a large cohort of healthy individuals. To evaluate this approach, we curate and integrate four public datasets, encompassing 175 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to short stories and isolated sentences. The results show that our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities on average across participants, and more than 80% in the very best participants - a performance that allows the decoding of words and phrases absent from the training set. The comparison of our model to a variety of baselines highlights the importance of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across multiple participants. Finally, the analysis of the decoder's predictions suggests that they primarily depend on lexical and contextual semantic representations. Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.

Submitted 5 October, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: updated version following publication in Nature Machine Intelligence (2023)

arXiv:2208.11488 [pdf]

MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing

Authors: Laura Gwilliams, Graham Flick, Alec Marantz, Liina Pylkkanen, David Poeppel, Jean-Remi King

Abstract: The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally-resolved brain responses to speech. We provide the Python code to replicate several validation analyses of the MEG evoked related fields such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available in keeping with best practices in transparent and reproducible research.

Submitted 26 July, 2022; originally announced August 2022.

Comments: 11 pages, 4 figures

arXiv:2208.01555 [pdf, other]

Low-complexity CNNs for Acoustic Scene Classification

Authors: Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang, Mark D. Plumbley

Abstract: This technical report describes the SurreyAudioTeam22's submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules: (a) the ASC framework should have a maximum of 128K parameters, and (b) there should be a maximum of 30 million multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.

Submitted 2 August, 2022; originally announced August 2022.

Comments: Technical Report DCASE 2022 TASK 1. arXiv admin note: substantial text overlap with arXiv:2207.11529

arXiv:2207.07429 [pdf, other]

Continual Learning For On-Device Environmental Sound Classification

Authors: Yang Xiao, Xubo Liu, James King, Arshdeep Singh, Eng Siong Chng, Mark D. Plumbley, Wenwu Wang

Abstract: Continuously learning new classes without catastrophic forgetting is a challenging problem for on-device environmental sound classification given the restrictions on computation resources (e.g., model size, running memory). To address this issue, we propose a simple and efficient continual learning method. Our method selects the historical data for the training by measuring the per-sample classification uncertainty. Specifically, we measure the uncertainty by observing how the classification probability of data fluctuates against the parallel perturbations added to the classifier embedding. In this way, the computation cost can be significantly reduced compared with adding perturbation to the raw data. Experimental results on the DCASE 2019 Task 1 and ESC-50 datasets show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating our method can efficiently and incrementally learn new classes without the catastrophic forgetting problem for on-device environmental sound classification.

Submitted 18 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: The first two authors contributed equally, 5 pages one figure, submitted to DCASE2022 Workshop

arXiv:2202.08082 [pdf, other]

Formulating Beurling LASSO for Source Separation via Proximal Gradient Iteration

Authors: Sören Schulze, Emily J. King

Abstract: Beurling LASSO generalizes the LASSO problem to finite Radon measures regularized via their total variation. Despite its theoretical appeal, this space is hard to parametrize, which poses an algorithmic challenge. We propose a formulation of continuous convolutional source separation with Beurling LASSO that avoids the explicit computation of the measures and instead employs the duality transform of the proximal mapping.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2112.14719 [pdf, ps, other]

Sets of Low Correlation Sequences from Cyclotomy

Authors: Jonathan M. Castello, Daniel J. Katz, Jacob M. King, Alain Olavarrieta

Abstract: Low correlation (finite length) sequences are used in communications and remote sensing. One seeks codebooks of sequences in which each sequence has low aperiodic autocorrelation at all nonzero shifts, and each pair of distinct sequences has low aperiodic crosscorrelation at all shifts. An overall criterion of codebook quality is the demerit factor, which normalizes all sequences to unit Euclidean norm, sums the squared magnitudes of all the correlations between every pair of sequences in the codebook (including sequences with themselves to cover autocorrelations), and divides by the square of the number of sequences in the codebook. This demerit factor is expected to be $1+1/N-1/(\ell N)$ for a codebook of $N$ randomly selected binary sequences of length $\ell$, but we want demerit factors much closer to the absolute minimum value of $1$. For each $N$ such that there is an $N\times N$ Hadamard matrix, we use cyclotomy to construct an infinite family of codebooks of binary sequences, in which each codebook has $N-1$ sequences of length $p$, where $p$ runs through the primes with $N\mid p-1$. As $p$ tends to infinity, the demerit factor of the codebooks tends to $1+1/(6(N-1))$, and the maximum magnitude of the undesirable correlations (crosscorrelations between distinct sequences and off-peak autocorrelations) is less than a small constant times $\sqrt{p}\log(p)$. This construction also generalizes to nonbinary sequences.

Submitted 29 December, 2021; originally announced December 2021.

Comments: 52 pages
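
Illustrative note on the entry above: the codebook demerit factor, as worded in that abstract (normalize each sequence to unit Euclidean norm, sum the squared magnitudes of the aperiodic correlations over every ordered pair of sequences and every shift, then divide by the square of the codebook size), can be sketched in a few lines of Python. This is a minimal sketch based only on the abstract's wording, not the authors' code; the function name `demerit_factor` and the use of NumPy's `np.correlate` are assumptions made for illustration.

```python
import numpy as np

def demerit_factor(codebook):
    """Codebook demerit factor, following the verbal definition in the abstract.

    `codebook` is a list of equal- or unequal-length real/complex sequences.
    (Hypothetical helper; name and interface are assumptions.)
    """
    # Normalize every sequence to unit Euclidean norm.
    seqs = [np.asarray(s, dtype=complex) for s in codebook]
    seqs = [s / np.linalg.norm(s) for s in seqs]

    total = 0.0
    # Every ordered pair, including a sequence with itself (autocorrelation).
    for f in seqs:
        for g in seqs:
            # Aperiodic correlation at all shifts; np.correlate conjugates g.
            corr = np.correlate(f, g, mode="full")
            total += float(np.sum(np.abs(corr) ** 2))

    # Divide by the square of the number of sequences.
    return total / len(seqs) ** 2

if __name__ == "__main__":
    barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    print(demerit_factor([barker13]))
```

For a codebook containing only the length-13 Barker sequence, the sum is the squared peak (169) plus twelve unit sidelobes, so the value should come out as 181/169 after normalization; larger codebooks add the crosscorrelation terms in the same way.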

arXiv:2111.05969 [pdf, other]

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

Authors: David Biagioni, Xiangyu Zhang, Dylan Wald, Deepthi Vaidhynathan, Rohit Chintala, Jennifer King, Ahmed S. Zamzam

Abstract: We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL). Although many frameworks exist for training multi-agent RL (MARL) policies, none can rapidly prototype and develop the environments themselves, especially in the context of heterogeneous (composite, multi-device) power systems where power flow solutions are required to define grid-level variables and costs. PowerGridworld is an open-source software package that helps to fill this gap. To highlight PowerGridworld's key features, we present two case studies and demonstrate learning MARL policies using both OpenAI's multi-agent deep deterministic policy gradient (MADDPG) and RLLib's proximal policy optimization (PPO) algorithms. In both cases, at least some subset of agents incorporates elements of the power flow solution at each time step as part of their reward (negative cost) structures.

Submitted 10 November, 2021; originally announced November 2021.

arXiv:2109.14994 [pdf, other]

An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution

Authors: James King, Ramon Viñas Torné, Alexander Campbell, Pietro Liò

Abstract: There have been several successful deep learning models that perform audio super-resolution. Many of these approaches involve using preprocessed feature extraction which requires a lot of domain-specific signal processing knowledge to implement. Convolutional Neural Networks (CNNs) improved upon this framework by automatically learning filters. An example of a convolutional approach is AudioUNet, which takes inspiration from novel methods of upsampling images. Our paper compares the pre-upsampling AudioUNet to a new generative model that upsamples the signal before using deep learning to transform it into a more believable signal. Based on the EDSR network for image super-resolution, the newly proposed model outperforms UNet with a 20% increase in log spectral distance and a mean opinion score of 4.06 compared to 3.82 for the two times upsampling case. AudioEDSR also has 87% fewer parameters than AudioUNet. How incorporating AudioUNet into a Wasserstein GAN (with gradient penalty) (WGAN-GP) structure can affect training is also explored. Finally, the effects artifacting has on the current state of the art are analysed and solutions to this problem are proposed. The methods used in this paper have broad applications to telephony, audio recognition and audio generation tasks.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2107.04235 [pdf, other]

Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients

Authors: Sören Schulze, Johannes Leuschner, Emily J. King

Abstract: We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.

Submitted 9 August, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

arXiv:2103.01032 [pdf, other]

Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech

Authors: Juliette Millet, Jean-Remi King

Abstract: Our ability to comprehend speech remains, to date, unrivaled by deep learning models. This feat could result from the brain's ability to fine-tune generic sound representations for speech-specific processes. To test this hypothesis, we compare i) five types of deep neural networks to ii) human brain responses elicited by spoken sentences and recorded in 102 Dutch subjects using functional Magnetic Resonance Imaging (fMRI). Each network was either trained on an acoustic scene classification task, trained on a speech-to-text task (based on Bengali, English, or Dutch), or not trained. The similarity between each model and the brain is assessed by correlating their respective activations after an optimal linear projection. The differences in brain-similarity across networks revealed three main results. First, speech representations in the brain can be accounted for by random deep networks. Second, learning to classify acoustic scenes leads deep nets to increase their brain similarity. Third, learning to process phonetically-related speech inputs (i.e., Dutch vs English) leads deep nets to reach higher levels of brain-similarity than learning to process phonetically-distant speech inputs (i.e., Dutch vs Bengali). Together, these results suggest that the human brain fine-tunes its heavily-trained auditory hierarchy to learn to process speech.

Submitted 25 February, 2021; originally announced March 2021.

Comments: 10 pages, 3 figures

arXiv:2010.10354 [pdf, ps, other]

Time-domain Representation of Passband Scattering Parameters

Authors: Justin B. King

Abstract: This paper presents a simple and accurate method for the inclusion of linear, time-invariant (LTI) networks, described by RF frequency-domain data, within equivalent baseband time-domain simulations. The time-domain representation is formulated as an equivalent baseband discrete-time impulse response, which may be convolved with the equivalent baseband form of the input signal, to obtain the corresponding equivalent baseband output. This allows networks which are most accurately described in the frequency domain, such as frequency-dispersive transmission lines, to be efficiently included as part of a transient time-domain simulation.

Submitted 14 October, 2020; originally announced October 2020.

Comments: Accepted for publication at the Asia-Pacific Microwave Conference 2020, Hong Kong, China

arXiv:2006.07598 [pdf, other]

doi 10.1088/1742-6596/1618/2/022025

Expert Elicitation on Wind Farm Control

Authors: J. W. van Wingerden, P. A. Fleming, T. Göçmen, I. Eguinoa, B. M. Doekemeijer, K. Dykes, M. Lawson, E. Simley, J. King, D. Astrain, M. Iribas, C. L. Bottasso, J. Meyers, S. Raach, K. Kölle, G. Giebel

Abstract: Wind farm control is an active and growing field of research in which the control actions of individual turbines in a farm are coordinated, accounting for inter-turbine aerodynamic interaction, to improve the overall performance of the wind farm and to reduce costs. The primary objectives of wind farm control include increasing power production, reducing turbine loads, and providing electricity grid support services. Additional objectives include improving reliability or reducing external impacts to the environment and communities. In 2019, a European research project (FarmConners) was started with the main goal of providing an overview of the state-of-the-art in wind farm control, identifying consensus of research findings, data sets, and best practices, providing a summary of the main research challenges, and establishing a roadmap on how to address these challenges. Complementary to the FarmConners project, an IEA Wind Topical Expert Meeting (TEM) and two rounds of surveys among experts were performed. From these events we can clearly identify an interest in more public validation campaigns. Additionally, a deeper understanding of the mechanical loads and the uncertainties concerning the effectiveness of wind farm control are considered two major research gaps.

Submitted 16 June, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:2001.05412 [pdf, other]

doi 10.1109/LPT.2021.3052649

Intensity-Modulated Fiber-Optic Voltage Sensors for Power Distribution Systems

Authors: Joseph M. Lukens, Nicholas Lagakos, Victor Kaybulkin, Christopher J. Vizas, Daniel J. King

Abstract: We design, test, and analyze fiber-optic voltage sensors based on optical reflection from a piezoelectric transducer. By controlling the physical dimensions of the device, we can tune the frequency of its natural resonance to achieve a desired sensitivity and bandwidth combination. In this work, we fully characterize sensors designed with a 2 kHz characteristic resonance, experimentally verifying a readily usable frequency range from approximately 10 Hz to 3 kHz. Spectral noise measurements indicate detectable voltage levels down to 300 mV rms at 60 Hz, along with a full-scale dynamic range of 60 dB, limited currently by the readout electronics, not the inherent performance of the transducer in the sensor. Additionally, we demonstrate a digital signal processing approach to equalize the measured frequency response, enabling accurate retrieval of short-pulse inputs. Our results suggest the value and applicability of intensity-modulated fiber-optic voltage sensors for measuring both steady-state waveforms and broadband transients which, coupled with the straightforward and compact design of the sensors, should make them effective tools in electric grid monitoring.

Submitted 15 January, 2020; originally announced January 2020.

arXiv:1911.03019 [pdf, other]

Learning-Accelerated ADMM for Distributed Optimal Power Flow

Authors: David Biagioni, Peter Graf, Xiangyu Zhang, Ahmed Zamzam, Kyri Baker, Jennifer King

Abstract: We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions. Using previous observations of ADMM trajectories for a given system under varying load, the method trains a recurrent neural network (RNN) to predict the converged values of dual and consensus variables. Given a new realization of system load, a small number of initial ADMM iterations is taken as input to infer the converged values and directly inject them into the iteration. We empirically demonstrate that the online injection of these values into the ADMM iteration accelerates convergence by a significant factor for partitioned 14-, 118- and 2848-bus test systems under differing load scenarios. The proposed method has several advantages: it maintains the security of private decision variables inherent in consensus ADMM; inference is fast and so may be used in online settings; RNN-generated predictions can dramatically improve time to convergence but, by construction, can never result in infeasible ADMM subproblems; it can be easily integrated into existing software implementations. While we focus on the ADMM formulation of distributed DC-OPF in this paper, the ideas presented are naturally extended to other distributed optimization problems.

Submitted 15 September, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

arXiv:1809.06534 [pdf]

doi 10.1038/s41597-019-0027-4

Multi-channel EEG recordings during a sustained-attention driving task

Authors: Zehong Cao, Chun-Hsiang Chuang, Jung-Kai King, Chin-Teng Lin

Abstract: We described driver behaviour and brain dynamics acquired from a 90-minute sustained-attention task in an immersive driving simulator. The data include 62 copies of 32 channel electroencephalography (EEG) data for 27 subjects that drove on a four lane highway and were asked to keep the car cruising in the centre of the lane. Lane departure events were randomly induced to make the car drift from the original cruising lane towards the left or right lane. A complete trial includes events with deviation onset, response onset, and response offset. The next trial, in which the subject has to drive back to the original cruising lane, occurs from 5 to 10 seconds after finishing the current trial. We hope that this dataset will lead to the development of novel neural processing assays that can be used to index brain cortical dynamics and detect driving fatigue and drowsiness. This publicly available dataset is beneficial to the neuroscientific and brain computer interface communities.

Submitted 18 September, 2018; originally announced September 2018.

Comments: This manuscript is submitting to Nature: Scientific Data

Journal ref: Scientific Data (volume 6, Article number: 19) (2019)

arXiv:1806.00273 [pdf, other]

doi 10.1186/s13636-020-00190-4

Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings

Authors: Sören Schulze, Emily J. King

Abstract: We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm on various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly models inharmonicity, its presence can also still impede performance of the sparse pursuit algorithm. In general, due to its pitch-invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary that is constructed for one recording can be applied to a different recording with similar instruments without additional training.

Submitted 1 February, 2021; v1 submitted 1 June, 2018; originally announced June 2018.

Journal ref: J. Audio Speech Music Proc. (2021) 2021:6

arXiv:1402.5468 [pdf]

Uncertainty Principle in Control Theory, Part I: Analysis of Performance Limitations

Authors: Ji King

Abstract: This paper investigates performance limitations and tradeoffs in the control design for linear time-invariant systems. It is shown that control specifications in time domain and in frequency domain are always mutually exclusive determined by uncertainty relations. The uncertainty principle from quantum mechanics and harmonic analysis therefore embeds itself inherently in control theory. The relations among transient specifications, system bandwidth and control energy are obtained within the framework of uncertainty principle. If the control system is provided with a large bandwidth or great control energy, then it can ensure transient specifications as good as it can be. Such a control system could be approximated by prolate spheroidal wave functions. The obtained results are also applicable to filter design due to the duality of filtering and control.

Submitted 21 February, 2014; originally announced February 2014.

Comments: 20 pages, 6 figures

MSC Class: 93Axx; 93Cxx ACM Class: F.2.3; I.2.8

Showing 1–22 of 22 results for author: King, J