Search | arXiv e-print repository

doi 10.1007/978-3-031-42293-5_8

Quantifying Device Usefulness -- How Useful is an Obsolete Device?

Authors: Craig Goodwin, Sandra Woolley, Ed de Quincey, Tim Collins

Abstract: Obsolete devices add to the rising levels of electronic waste, a major environmental concern, and a contributing factor to climate change. In recent years, device manufacturers have established environmental commitments and launched initiatives such as supporting the recycling of obsolete devices by making more ways available for consumers to safely dispose of their old devices. However, little support is available for individuals who want to continue using legacy or 'end-of-life' devices and few studies have explored the usefulness of these older devices, the barriers to their continued use and the associated user experiences. With a human-computer interaction lens, this paper reflects on device usefulness as a function of utility and usability, and on the barriers to continued device use and app installation. Additionally, the paper contributes insights from a sequel study that extends prior work evaluating app functionality of a 'vintage' Apple device with new empirical data on app downloadability and functionality for the same device when newly classified as 'obsolete'. A total of 230 apps, comprising the top 10 free App Store apps for each of 23 categories, were assessed for downloadability and functionality on an Apple iPad Mini tablet. Although only 20 apps (8.7%) could be downloaded directly onto the newly obsolete device, 143 apps (62.2%) could be downloaded with the use of a different non-legacy device. Of these 163 downloadable apps, 131 apps (comprising 57% of all 230 apps and 80.4% of the downloadable apps) successfully installed, opened, and functioned. This was a decrease of only 4.3% in functional apps (of the 230 total apps) compared to the performance of the device when previously classified as 'vintage'.
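
The percentages quoted in this abstract can be re-derived from the reported counts. The short Python sketch below is only an arithmetic check; the variable names and the `pct` helper are illustrative and do not come from the paper.

```python
# Counts reported in the abstract; names are illustrative, not the authors'.
TOTAL_APPS = 230     # top 10 free apps in each of 23 categories
direct = 20          # downloadable directly onto the obsolete device
via_other = 143      # downloadable with the help of a non-legacy device
functional = 131     # installed, opened, and functioned

downloadable = direct + via_other


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(downloadable)                    # total downloadable apps
print(pct(direct, TOTAL_APPS))         # share downloadable directly
print(pct(via_other, TOTAL_APPS))      # share downloadable via another device
print(pct(functional, TOTAL_APPS))     # functional share of all apps
print(pct(functional, downloadable))   # functional share of downloadable apps
```

The computed values agree with the figures in the abstract after rounding to one decimal place.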

Submitted 15 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures, 1 table

Journal ref: Human-Computer Interaction - INTERACT 2023. Lecture Notes in Computer Science, vol 14145. Springer, Cham (2023)

arXiv:2310.12646 [pdf, other]

TRUSTED: The Paired 3D Transabdominal Ultrasound and CT Human Data for Kidney Segmentation and Registration Research

Authors: William Ndzimbong, Cyril Fourniol, Loic Themyr, Nicolas Thome, Yvonne Keeza, Beniot Sauer, Pierre-Thierry Piechaud, Arnaud Mejean, Jacques Marescaux, Daniel George, Didier Mutter, Alexandre Hostettler, Toby Collins

Abstract: Inter-modal image registration (IMIR) and image segmentation with abdominal Ultrasound (US) data have many important clinical applications, including image-guided surgery, automatic organ measurement and robotic navigation. However, research is severely limited by the lack of public datasets. We propose TRUSTED (the Tridimensional Renal Ultra Sound TomodEnsitometrie Dataset), comprising paired transabdominal 3DUS and CT kidney images from 48 human patients (96 kidneys), including segmentation, and anatomical landmark annotations by two experienced radiographers. Inter-rater segmentation agreement was over 94 (Dice score), and gold-standard segmentations were generated using the STAPLE algorithm. Seven anatomical landmarks were annotated, important for IMIR systems development and evaluation. To validate the dataset's utility, 5 competitive Deep Learning models for automatic kidney segmentation were benchmarked, yielding average DICE scores from 83.2% to 89.1% for CT, and 61.9% to 79.4% for US images. Three IMIR methods were benchmarked, and Coherent Point Drift performed best with an average Target Registration Error of 4.53mm. The TRUSTED dataset may be used freely by researchers to develop and validate new segmentation and IMIR methods.
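
For readers unfamiliar with the Dice score used above for inter-rater agreement and model benchmarking, the following is a minimal Python sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name, the toy masks, and the empty-mask convention are assumptions made for illustration; this is not code from the TRUSTED dataset.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap of two binary masks given as equal-length 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Sum of the two foreground sizes.
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


# Toy flattened masks from two hypothetical raters.
rater_1 = [0, 1, 1, 1, 0, 0]
rater_2 = [0, 1, 1, 0, 1, 0]
print(dice_score(rater_1, rater_2))
```

A score of 1 means identical masks and 0 means no overlap; multiplying by 100 gives the percentage form used in the abstract.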

Submitted 19 October, 2023; originally announced October 2023.

Comments: Alexandre Hostettler, and Toby Collins share last authorship

arXiv:2307.08007 [pdf, other]

NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks

Authors: Adrián Barahona-Ríos, Tom Collins

Abstract: Controllable neural audio synthesis of sound effects is a challenging task due to the potential scarcity and spectro-temporal variance of the data. Differentiable digital signal processing (DDSP) synthesisers have been successfully employed to model and control musical and harmonic signals using relatively limited data and computational resources. Here we propose NoiseBandNet, an architecture capable of synthesising and controlling sound effects by filtering white noise through a filterbank, thus going further than previous systems that make assumptions about the harmonic nature of sounds. We evaluate our approach via a series of experiments, modelling footsteps, thunderstorm, pottery, knocking, and metal sound effects. Comparing NoiseBandNet audio reconstruction capabilities to four variants of the DDSP-filtered noise synthesiser, NoiseBandNet scores higher in nine out of ten evaluation categories, establishing a flexible DDSP method for generating time-varying, inharmonic sound effects of arbitrary length with both good time and frequency resolution. Finally, we introduce some potential creative uses of NoiseBandNet, by generating variations, performing loudness transfer, and by training it on user-defined control curves.

Submitted 16 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2307.03718 [pdf, other]

Frontier AI Regulation: Managing Emerging Risks to Public Safety

Authors: Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

Abstract: Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model's capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.

Submitted 7 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Update July 11th: - Added missing footnote back in. - Adjusted author order (mistakenly non-alphabetical among the first 6 authors) and adjusted affiliations (Jess Whittlestone's affiliation was mistagged and Gillian Hadfield had SRI added to her affiliations) Updated September 4th: Various typos

arXiv:2304.12840 [pdf, other]

doi 10.1177/23998083231209073

Spatiotemporal gender differences in urban vibrancy

Authors: Thomas Collins, Riccardo Di Clemente, Mario Gutiérrez-Roig, Federico Botta

Abstract: Urban vibrancy is the dynamic activity of humans in urban locations. It can vary with urban features and the opportunities for human interactions, but it might also differ according to the underlying social conditions of city inhabitants across and within social surroundings. Such heterogeneity in how different demographic groups may experience cities has the potential to cause gender segregation because of differences in the preferences of inhabitants, their accessibility and opportunities, and large-scale mobility behaviours. However, traditional studies have failed to capture fully a high-frequency understanding of how urban vibrancy is linked to urban features, how this might differ for different genders, and how this might affect segregation in cities. Our results show that (1) there are differences between males and females in terms of urban vibrancy, (2) the differences relate to 'Points of Interest' as well as transportation networks, and (3) that there are both positive and negative 'spatial spillovers' existing across each city. To do this, we use a quantitative approach using Call Detail Record data--taking advantage of the near-ubiquitous use of mobile phones--to gain high-frequency observations of spatial behaviours across the seven most prominent cities of Italy. We use a spatial model comparison approach of the direct and 'spillover' effects from urban features on male-female differences. Our results increase our understanding of inequality in cities and how we can make future cities fairer.

Submitted 11 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 20 pages, 10 figures: 1 figure and 1 table in main, 7 figures in supplementary material, 3 tables in supplementary material. Submitted to Environment and Planning B: Urban Analytics and City Science special issue on Spatial Inequalities and Cities

Journal ref: Environment and Planning B: Urban Analytics and City Science (2023)

arXiv:2303.08956 [pdf]

Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases

Authors: Emma Bluemke, Tantum Collins, Ben Garfinkel, Andrew Trask

Abstract: The development of privacy-enhancing technologies has made immense progress in reducing trade-offs between privacy and performance in data exchange and analysis. Similar tools for structured transparency could be useful for AI governance by offering capabilities such as external scrutiny, auditing, and source verification. It is useful to view these different AI governance objectives as a system of information flows in order to avoid partial solutions and significant gaps in governance, as there may be significant overlap in the software stacks needed for the AI governance use cases mentioned in this text. When viewing the system as a whole, the importance of interoperability between these different AI governance solutions becomes clear. Therefore, it is imminently important to look at these problems in AI governance as a system, before these standards, auditing procedures, software, and norms settle into place.

Submitted 20 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2012.08347

arXiv:2303.04244 [pdf, other]

A Light-Weight Contrastive Approach for Aligning Human Pose Sequences

Authors: Robert T. Collins

Abstract: We present a simple unsupervised method for learning an encoder mapping short 3D pose sequences into embedding vectors suitable for sequence-to-sequence alignment by dynamic time warping. Training samples consist of temporal windows of frames containing 3D body points such as mocap markers or skeleton joints. A light-weight, 3-layer encoder is trained using a contrastive loss function that encourages embedding vectors of augmented sample pairs to have cosine similarity 1, and similarity 0 with all other samples in a minibatch. When multiple scripted training sequences are available, temporal alignments inferred from an initial round of training are harvested to extract additional, cross-performance match pairs for a second phase of training to refine the encoder. In addition to being simple, the proposed method is fast to train, making it easy to adapt to new data using different marker sets or skeletal joint layouts. Experimental results illustrate ease of use, transferability, and utility of the learned embeddings for comparing and analyzing human behavior sequences.
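
To make the alignment step mentioned above concrete, here is a minimal Python sketch of textbook dynamic time warping over two sequences of embedding vectors, using cosine distance (1 minus cosine similarity) as the local cost. It assumes non-zero embedding vectors and the usual three-way step pattern; the function names and toy embeddings are hypothetical, and this is not the author's implementation.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dtw_align(seq_a, seq_b):
    """Return (total cost, warping path) aligning two embedding sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i items of A with the first j of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy embeddings: the second sequence lingers on the first pose for two frames.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
total, path = dtw_align(seq_a, seq_b)
print(total, path)
```

In the toy example the warping path matches the first frame of `seq_a` to both repeated frames of `seq_b`, which is the kind of temporal stretching DTW is meant to absorb.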

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2212.07890 [pdf, other]

Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

Authors: Loic Themyr, Clement Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler

Abstract: Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens, which unlike previous methods can model interactions between all image regions, and extracts powerful representations during training. Extensive experiments show that GLAM-Swin or GLAM-Swin-UNet exhibit substantially better performance than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Winter Conference on Applications of Computer Vision (WACV 2023)

MSC Class: 68T45

arXiv:2210.05313 [pdf, other]

Memory transformers for full context and high-resolution 3D Medical Segmentation

Authors: Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler

Abstract: Transformer models achieve state-of-the-art results for image segmentation. However, achieving long-range attention, necessary to capture global context, with high-resolution 3D images is a fundamental challenge. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens to indirectly model full range interactions while scaling well in both memory and computational costs. FINE introduces memory tokens at two levels: the first one allows full interaction between voxels within local image regions (patches), the second one allows full interactions between all regions of the 3D volume. Combined, they allow full attention over high resolution images, e.g. 512 x 512 x 256 voxels and above. Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines, highlighting the superiority of our full attention mechanism compared to recent transformer baselines, e.g. CoTr and nnFormer.

Submitted 11 October, 2022; originally announced October 2022.

MSC Class: 68T45

arXiv:2206.11443 [pdf, other]

Image-based Stability Quantification

Authors: Jesse Scott, John Challis, Robert T. Collins, Yanxi Liu

Abstract: Quantitative evaluation of human stability using foot pressure/force measurement hardware and motion capture (mocap) technology is expensive, time consuming, and restricted to the laboratory. We propose a novel image-based method to estimate three key components for stability computation: Center of Mass (CoM), Base of Support (BoS), and Center of Pressure (CoP). Furthermore, we quantitatively validate our image-based methods for computing two classic stability measures, CoMtoCoP and CoMtoBoS distances, against values generated directly from laboratory-based sensor output (ground truth) using a publicly available, multi-modality (mocap, foot pressure, two-view videos), ten-subject human motion dataset. Using Leave One Subject Out (LOSO) cross-validation, experimental results show: 1) our image-based CoM estimation method (CoMNet) consistently outperforms state-of-the-art inertial sensor-based CoM estimation techniques; 2) stability computed by our image-based method combined with insole foot pressure sensor data produces consistent, strong, and statistically significant correlation with ground truth stability measures (CoMtoCoP r = 0.79 p < 0.001, CoMtoBoS r = 0.75 p < 0.001); 3) our fully image-based estimation of stability produces consistent, positive, and statistically significant correlation on the two stability metrics (CoMtoCoP r = 0.31 p < 0.001, CoMtoBoS r = 0.22 p < 0.043). Our study provides promising quantitative evidence for the feasibility of image-based stability evaluation in natural environments.

Submitted 2 November, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

arXiv:2110.07311 [pdf, other]

SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs

Authors: Adrián Barahona-Ríos, Tom Collins

Abstract: Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train the model on the various layers that comprise a single sound effect. A listening study comparing our model to real recordings and to digital signal processing procedural audio models in terms of sound plausibility and variation revealed that SpecSinGAN is more plausible and varied than the procedural audio models considered, when using multi-channel spectrograms. Sound examples can be found at the project website: https://www.adrianbarahonarios.com/specsingan/

Submitted 5 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2012.08630 [pdf, other]

Open Problems in Cooperative AI

Authors: Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel

Abstract: Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability to cooperate. Since machines powered by artificial intelligence are playing an ever greater role in our lives, it will be important to equip them with the capabilities necessary to cooperate and to foster cooperation. We see an opportunity for the field of artificial intelligence to explicitly focus effort on this class of problems, which we term Cooperative AI. The objective of this research would be to study the many aspects of the problems of cooperation and to innovate in AI to contribute to solving these problems. Central goals include building machine agents with the capabilities needed for cooperation, building tools to foster cooperation in populations of (machine and/or human) agents, and otherwise conducting AI research for insight relevant to problems of cooperation. This research integrates ongoing work on multi-agent systems, game theory and social choice, human-machine interaction and alignment, natural-language processing, and the construction of social tools and platforms. However, Cooperative AI is not the union of these existing areas, but rather an independent bet about the productivity of specific kinds of conversations that involve these and other areas. We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2012.08347 [pdf]

Beyond Privacy Trade-offs with Structured Transparency

Authors: Andrew Trask, Emma Bluemke, Teddy Collins, Ben Garfinkel, Eric Drexler, Claudia Ghezzou Cuervas-Mons, Iason Gabriel, Allan Dafoe, William Isaac

Abstract: Successful collaboration involves sharing information. However, parties may disagree on how the information they need to share should be used. We argue that many of these concerns reduce to 'the copy problem': once a bit of information is copied and shared, the sender can no longer control how the recipient uses it. From the perspective of each collaborator, this presents a dilemma that can inhibit collaboration. The copy problem is often amplified by three related problems which we term the bundling, edit, and recursive enforcement problems. We find that while the copy problem is not solvable, aspects of these amplifying problems have been addressed in a variety of disconnected fields. We observe that combining these efforts could improve the governability of information flows and thereby incentivise collaboration. We propose a five-part framework which groups these efforts into specific capabilities and offers a foundation for their integration into an overarching vision we call "structured transparency". We conclude by surveying an array of use-cases that illustrate the structured transparency principles and their related capabilities.

Submitted 12 March, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2011.02284 [pdf, other]

Surgical Data Science -- from Concepts toward Clinical Translation

Authors: Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya, Keno März, Toby Collins, Anand Malpani, Johannes Fallert, Hubertus Feussner, Stamatia Giannarou, Pietro Mascagni, Hirenkumar Nakawala, Adrian Park, Carla Pugh, Danail Stoyanov, Swaroop S. Vedula, Kevin Cleary, Gabor Fichtinger, Germain Forestier, Bernard Gibaud, Teodor Grantcharov, Makoto Hashizume, Doreen Heckmann-Nötzel, Hannes G. Kenngott, Ron Kikinis, Lars Mündermann, et al. (25 additional authors not shown)

Abstract: Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.

Submitted 30 July, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

arXiv:2001.00657 [pdf, other]

From Kinematics To Dynamics: Estimating Center of Pressure and Base of Support from Video Frames of Human Motion

Authors: Jesse Scott, Christopher Funk, Bharadwaj Ravichandran, John H. Challis, Robert T. Collins, Yanxi Liu

Abstract: To gain an understanding of the relation between a given human pose image and the corresponding physical foot pressure of the human subject, we propose and validate two end-to-end deep learning architectures, PressNet and PressNet-Simple, to regress foot pressure heatmaps (dynamics) from 2D human pose (kinematics) derived from a video frame. A unique video and foot pressure data set of 813,050 synchronized pairs, composed of 5-minute long choreographed Taiji movement sequences of 6 subjects, is collected and used for leaving-one-subject-out cross validation. Our initial experimental results demonstrate reliable and repeatable foot pressure prediction from a single image, setting the first baseline for such a complex cross modality mapping problem in computer vision. Furthermore, we compute and quantitatively validate the Center of Pressure (CoP) and Base of Support (BoS) from predicted foot pressure distribution, obtaining key components in pose stability analysis from images with potential applications in kinesiology, medicine, sports and robotics.
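
As a rough illustration of how a Center of Pressure can be read off a foot pressure heatmap, the Python sketch below computes the pressure-weighted centroid of a 2D grid, which is the standard biomechanical definition of CoP. The grid layout, the function name, and the choice to return `None` for an unloaded map are assumptions for this example and are not taken from PressNet.

```python
def center_of_pressure(pressure_map):
    """Pressure-weighted centroid (row, col) of a 2D grid of non-negative values."""
    total = 0.0
    row_moment = 0.0
    col_moment = 0.0
    for r, row in enumerate(pressure_map):
        for c, p in enumerate(row):
            total += p
            row_moment += r * p   # first moment along rows
            col_moment += c * p   # first moment along columns
    if total == 0:
        return None  # no load on the sensor grid, so CoP is undefined
    return row_moment / total, col_moment / total


# Toy 3x3 heatmap with equal pressure on two neighbouring cells.
heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0],
]
print(center_of_pressure(heatmap))
```

The result is in grid-index units; converting it to physical coordinates would require the sensor cell size, which the abstract does not state.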

Submitted 2 January, 2020; originally announced January 2020.

arXiv:1811.12607 [pdf, other]

Learning Dynamics from Kinematics: Estimating 2D Foot Pressure Maps from Video Frames

Authors: Christopher Funk, Savinay Nagendra, Jesse Scott, Bharadwaj Ravichandran, John H. Challis, Robert T. Collins, Yanxi Liu

Abstract: Pose stability analysis is the key to understanding locomotion and control of body equilibrium, with applications in numerous fields such as kinesiology, medicine, and robotics. In biomechanics, Center of Pressure (CoP) is used in studies of human postural control and gait. We propose and validate a novel approach to learn CoP from pose of a human body to aid stability analysis. More specifically, we propose an end-to-end deep learning architecture to regress foot pressure heatmaps, and hence the CoP locations, from 2D human pose derived from video. We have collected a set of long (5min +) choreographed Taiji (Tai Chi) sequences of multiple subjects with synchronized foot pressure and video data. The derived human pose data and corresponding foot pressure maps are used jointly in training a convolutional neural network with residual architecture, named PressNET. Cross-subject validation results show promising performance of PressNET, significantly outperforming the baseline method of K-Nearest Neighbors. Furthermore, we demonstrate that our computation of center of pressure (CoP) from PressNET is not only significantly more accurate than those obtained from the baseline approach but also meets the expectations of corresponding lab-based measurements of stability studies in kinesiology.

Submitted 28 May, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

arXiv:1811.07791 [pdf, other]

Deep Shape-from-Template: Wide-Baseline, Dense and Fast Registration and Deformable Reconstruction from a Single Image

Authors: David Fuentes-Jimenez, David Casillas-Perez, Daniel Pizarro, Toby Collins, Adrien Bartoli

Abstract: We present Deep Shape-from-Template (DeepSfT), a novel Deep Neural Network (DNN) method for solving real-time automatic registration and 3D reconstruction of a deformable object viewed in a single monocular image. DeepSfT advances the state-of-the-art in various aspects. Compared to existing DNN SfT methods, it is the first fully convolutional real-time approach that handles an arbitrary object geometry, topology and surface representation. It also does not require ground truth registration with real data and scales well to very complex object models with large numbers of elements. Compared to previous non-DNN SfT methods, it does not involve numerical optimization at run-time, and is a dense, wide-baseline solution that does not demand, and does not suffer from, feature-based matching. It is able to process a single image with significant deformation and viewpoint changes, and handles well the core challenges of occlusions, weak texture and blur. DeepSfT is based on residual encoder-decoder structures and refining blocks. It is trained end-to-end with a novel combination of supervised learning from simulated renderings of the object model and semi-supervised automatic fine-tuning using real data captured with a standard RGB-D camera. The cameras used for fine-tuning and run-time can be different, making DeepSfT practical for real-world use. We show that DeepSfT significantly outperforms state-of-the-art wide-baseline approaches for non-trivial templates, with quantitative and qualitative evaluation.

Submitted 27 February, 2021; v1 submitted 19 November, 2018; originally announced November 2018.

arXiv:1705.09107 [pdf, other]

SLAM based Quasi Dense Reconstruction For Minimally Invasive Surgery Scenes

Authors: Nader Mahmoud, Alexandre Hostettler, Toby Collins, Luc Soler, Christophe Doignon, J. M. M. Montiel

Abstract: Recovering surgical scene structure in laparoscopic surgery is a crucial step for surgical guidance and augmented reality applications. In this paper, a quasi-dense reconstruction algorithm for surgical scenes is proposed. It is based on a state-of-the-art SLAM system and exploits the initial exploration phase that is typically performed by the surgeon at the beginning of the surgery. We show how to convert the sparse SLAM map to a quasi-dense scene reconstruction, using pairs of keyframe images and correlation-based featureless patch matching. We have validated the approach with a live porcine experiment using Computed Tomography as ground truth, yielding a Root Mean Squared Error of 4.9 mm.

Submitted 25 May, 2017; originally announced May 2017.

Comments: ICRA 2017 workshop C4 Surgical Robots: Compliant, Continuum, Cognitive, and Collaborative

Showing 1–18 of 18 results for author: Collins, T