Search | arXiv e-print repository

arXiv:2407.11988 [pdf, other]

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Authors: Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin

Abstract: The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results that show existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref

Submitted 5 June, 2024; originally announced July 2024.

Comments: Short Paper, ACL 2024

arXiv:2405.09153 [pdf, other]

Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

arXiv:2405.01770 [pdf, other]

Bike network planning in limited urban space

Authors: Nina Wiedemann, Christian Nöbel, Henry Martin, Lukas Ballo, Martin Raubal

Abstract: The lack of cycling infrastructure in urban environments hinders the adoption of cycling as a viable mode for commuting, despite the evident benefits of (e-)bikes as sustainable, efficient, and health-promoting transportation modes. Bike network planning is a tedious process, relying on heuristic computational methods that frequently overlook the broader implications of introducing new cycling infrastructure, in particular the necessity to repurpose car lanes. In this work, we call for optimizing the trade-off between bike and car networks, effectively pushing for Pareto optimality. This shift in perspective gives rise to a novel linear programming formulation towards optimal bike network allocation. Our experiments, conducted using both real-world and synthetic data, attest to the effectiveness and superiority of this optimization approach compared to heuristic methods. In particular, the framework provides stakeholders with a range of lane reallocation scenarios, illustrating potential bike network enhancements and their implications for car infrastructure. Crucially, our approach is adaptable to various bikeability and car accessibility evaluation criteria, making our tool a highly flexible and scalable resource for urban planning. This paper presents an advanced decision-support framework that can significantly aid urban planners in making informed decisions on cycling infrastructure development.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.08949 [pdf, other]

Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Authors: Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy

Abstract: Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space.

Submitted 13 April, 2024; originally announced April 2024.

Comments: To appear at LREC-COLING 2024

arXiv:2404.08656 [pdf, other]

Linear Cross-document Event Coreference Resolution with X-AMR

Authors: Shafiuddin Rehan Ahmed, George Arthur Baker, Evi Judge, Michael Regan, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: Event Coreference Resolution (ECR) as a pairwise mention classification task is expensive both for automated systems and manual annotations. The task's quadratic difficulty is exacerbated when using Large Language Models (LLMs), making prompt engineering for ECR prohibitively costly. In this work, we propose a graphical representation of events, X-AMR, anchored around individual mentions using a cross-document version of Abstract Meaning Representation. We then linearize the ECR with a novel multi-hop coreference algorithm over the event graphs. The event graphs simplify ECR, making it a) LLM cost-effective, b) compositional and interpretable, and c) easily annotated. For a fair assessment, we first enrich an existing ECR benchmark dataset with these event graphs using an annotator-friendly tool we introduce. Then, we employ GPT-4, the newest LLM by OpenAI, for these annotations. Finally, using the ECR algorithm, we assess GPT-4 against humans and analyze its limitations. Through this research, we aim to advance the state-of-the-art for efficient ECR and shed light on the potential shortcomings of current LLMs at this task. Code and annotations: https://github.com/ahmeshaf/gpt_coref

Submitted 24 March, 2024; originally announced April 2024.

Comments: LREC-COLING 2024 main conference

arXiv:2403.15407 [pdf, other]

X-AMR Annotation Tool

Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting an existing event corpus, highlighting its advantages when integrated with GPT-4. Code and annotations: https://github.com/ahmeshaf/gpt_coref

Submitted 29 February, 2024; originally announced March 2024.

Comments: EACL 2024 System Demonstration

arXiv:2312.00359 [pdf, other]

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Authors: Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

Abstract: Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing. We implement TempBalance on CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.

Submitted 1 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023 Spotlight, first two authors contributed equally

arXiv:2311.10928 [pdf, other]

CAMRA: Copilot for AMR Annotation

Authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: In this paper, we introduce CAMRA (Copilot for AMR Annotations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu

Submitted 20 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 System Demonstration

arXiv:2306.05434 [pdf, other]

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, Michael Regan, Adam Pollins, Nikhil Krishnaswamy, James H. Martin

Abstract: Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests likely coreferring event pairs only. We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric Recall-Annotation effort trade-off metric, we compare the results of various underlying models and datasets. We finally present a method for obtaining 97% recall while substantially reducing the workload required by a fully manual annotation process. Code and data can be found at https://github.com/ahmeshaf/model_in_coref

Submitted 6 June, 2023; originally announced June 2023.

Comments: The 17th Linguistic Annotation Workshop, 2023 (LAW-XVII) short paper. 10 pages, 6 figures, 1 table

arXiv:2305.05672 [pdf, other]

$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, James H. Martin, Nikhil Krishnaswamy

Abstract: Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic operations needed. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref

Submitted 9 May, 2023; originally announced May 2023.

Comments: Findings of the Association for Computational Linguistics, ACL 2023. 13 pages, 7 figures, 6 tables

arXiv:2303.07758 [pdf, other]

Traffic4cast at NeurIPS 2022 -- Predict Dynamics along Graph Edges from Sparse Node Data: Whole City Traffic and ETA from Stationary Vehicle Detectors

Authors: Moritz Neun, Christian Eichenberger, Henry Martin, Markus Spanring, Rahul Siripurapu, Daniel Springer, Leyan Deng, Chenwang Wu, Defu Lian, Min Zhou, Martin Lumiste, Andrei Ilie, Xinhua Wu, Cheng Lyu, Qing-Long Lu, Vishal Mahajan, Yichao Lu, Jiezhang Li, Junjun Li, Yue-Jiao Gong, Florian Grötschla, Joël Mathys, Ye Wei, He Haitao, Hui Fang, et al. (5 additional authors not shown)

Abstract: The global trends of urbanization and increased personal mobility force us to rethink the way we live and use urban space. The Traffic4cast competition series tackles this problem in a data-driven way, advancing the latest methods in machine learning for modeling complex spatial systems over time. In this edition, our dynamic road graph data combine information from road maps, $10^{12}$ probe data points, and stationary vehicle detectors in three cities over the span of two years. While stationary vehicle detectors are the most accurate way to capture traffic volume, they are only available in a few locations. Traffic4cast 2022 explores models that have the ability to generalize loosely related temporal vertex data on just a few nodes to predict dynamic future traffic states on the edges of the entire road graph. In the core challenge, participants are invited to predict the likelihoods of three congestion classes derived from the speed levels in the GPS data for the entire road graph in three cities 15 min into the future. We only provide vehicle count data from spatially sparse stationary vehicle detectors in these three cities as model input for this task. The data are aggregated in 15 min time bins for one hour prior to the prediction time. For the extended challenge, participants are tasked to predict the average travel times on super-segments 15 min into the future - super-segments are longer sequences of road segments in the graph. The competition results provide an important advance in the prediction of complex city-wide traffic states just from publicly available sparse vehicle data and without the need for large amounts of real-time floating vehicle data.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Pre-print under review, submitted to Proceedings of Machine Learning Research

arXiv:2302.12944 [pdf, other]

Dependency Dialogue Acts -- Annotation Scheme and Case Study

Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.

Submitted 24 February, 2023; originally announced February 2023.

Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

arXiv:2302.08761 [pdf, other]

doi 10.1109/TITS.2023.3291737

Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities

Authors: Moritz Neun, Christian Eichenberger, Yanan Xin, Cheng Fu, Nina Wiedemann, Henry Martin, Martin Tomko, Lukas Ambühl, Luca Hermes, Michael Kopp

Abstract: Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for collection periods ranging between 108 and 361 days in 2019-2021 and covering more than 1500 square kilometers per metropolitan area. MeTS-10 features traffic speed information at all street levels from main arterials to local streets for Antwerp, Bangkok, Barcelona, Berlin, Chicago, Istanbul, London, Madrid, Melbourne and Moscow. The dataset leverages the industrial-scale floating vehicle Traffic4cast data with speeds and vehicle counts provided in a privacy-preserving spatio-temporal aggregation. We detail the efficient matching approach mapping the data to the OpenStreetMap road graph. We evaluate the dataset by comparing it with publicly available stationary vehicle detector data (for Berlin, London, and Madrid) and the Uber traffic speed dataset (for Barcelona, Berlin, and London). The comparison highlights the differences across datasets in spatio-temporal coverage and variations in the reported traffic caused by the binning method. MeTS-10 enables novel, city-wide analysis of mobility and traffic patterns for ten major world cities, overcoming current limitations of spatially sparse vehicle detector data. The large spatial and temporal coverage offers an opportunity for joining the MeTS-10 with other datasets, such as traffic surveys in traffic planning studies or vehicle detector data in traffic control settings.

Submitted 31 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS), DOI: https://doi.org/10.1109/TITS.2023.3291737

Journal ref: IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2023

arXiv:2302.00551 [pdf, other]

doi 10.1007/978-3-031-14862-0_1

Triggering Conditions Analysis and Use Case for Validation of ADAS/ADS Functions

Authors: Víctor J. Expósito Jiménez, Helmut Martin, Christian Schwarzl, Georg Macher, Eugen Brenner

Abstract: Safety in the automotive domain is a well-known topic, which has been in constant development in the past years. The complexity of new systems that add more advanced components in each function has opened new trends that have to be covered from the safety perspective. In this case, not only specifications and requirements have to be covered but also scenarios, which cover all relevant information of the vehicle environment. Many of them are still not sufficiently defined or considered. In this context, Safety of the Intended Functionality (SOTIF) emerges to ensure the safety of the system when it might fail because of technological shortcomings or misuse by users. An identification of the plausible insufficiencies of ADAS/ADS functions has to be done to discover the potential triggering conditions that can lead to these unknown scenarios, which might cause hazardous behaviour. The main goal of this publication is the definition of a use case to identify these triggering conditions that have been applied to the collision avoidance function implemented in our self-developed mobile Hardware-in-Loop (HiL) platform.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2302.00437 [pdf]

doi 10.1007/978-3-031-14862-0_14

State of the Art Study of the Safety Argumentation Frameworks for Automated Driving System Safety

Authors: Ilona Cieslik, Víctor J. Expósito Jiménez, Helmut Martin, Heiko Scharke, Hannes Schneider

Abstract: The automotive industry is experiencing a transition from assisted to highly automated driving. New concepts for the validation of Automated Driving Systems (ADS) include, among others, a shift from a "technology based" approach to a "scenario based" assessment. The safety validation and type approval process of ADS are seen as the biggest challenges for the automotive industry today. Having in mind a variety of existing white papers, standardization activities and regulatory approaches, manufacturers still struggle with selecting the best practices that keep aligned with their Safety Management System and Safety Culture. A step forward would be to implement a harmonized global safety assurance scheme that is compliant with relevant regulations, laws, standards, and reflects local rules. Today many communities (regulatory bodies, local authorities, industrial stake-holders) work on proof-of-concept frameworks for the Safety Argumentation as an answer to this problem. Unfortunately, there is still no consensus on one definitive methodology and a set of safety metrics to measure ADS safety. An objective of this summary report is to facilitate a comprehensive review and analysis of the literature concerning available methods and approaches for vehicle safety, engineering frameworks, processes of scenario-based evaluation and vendor- and technology-neutral Safety Argumentation approaches and tools.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2301.00280 [pdf]

RECOMED: A Comprehensive Pharmaceutical Recommendation System

Authors: Mariam Zomorodi, Ismail Ghodsollahee, Jennifer H. Martin, Nicholas J. Talley, Vahid Salari, Pawel Plawiak, Kazem Rahimi, U. Rajendra Acharya

Abstract: A comprehensive pharmaceutical recommendation system was designed based on patient and drug features extracted from Drugs.com and Druglib.com. First, data from these databases were combined, and a dataset of patients and drug information was built. Secondly, the patients and drugs were clustered, and then the recommendation was performed using different ratings provided by patients, and importantly by the knowledge obtained from patients and drug specifications, and considering drug interactions. To the best of our knowledge, we are the first group to consider patients' conditions and history in the proposed approach for selecting a specific medicine appropriate for that particular user. Our approach applies artificial intelligence (AI) models for the implementation. Sentiment analysis using natural language processing approaches is employed in pre-processing along with neural network-based methods and recommender system algorithms for modeling the system. In our work, patients' conditions and drug features are used for making two models based on matrix factorization. Then we used drug interaction to filter drugs with severe or mild interactions with other drugs. We developed a deep learning model for recommending drugs by using data from 2304 patients as a training set, and then we used data from 660 patients as our validation set. After that, we used knowledge from critical information about drugs and combined the outcome of the model into a knowledge-based system with the rules obtained from constraints on taking medicine.

Submitted 21 August, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 39 pages, 14 figures, 13 tables

arXiv:2212.10283 [pdf, other]

Interpretable models for extrapolation in scientific machine learning

Authors: Eric S. Muckley, James E. Saal, Bryce Meredig, Christopher S. Roper, John H. Martin

Abstract: Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings (e.g. random k-fold cross-validation) but suffer from poor extrapolation performance, portability, and human interpretability, which limits their potential for facilitating novel scientific insight. Here we examine the trade-off between model performance and interpretability across a broad range of science and engineering problems with an emphasis on materials science datasets. We compare the performance of black box random forest and neural network machine learning algorithms to that of single-feature linear regressions which are fitted using interpretable input features discovered by a simple random search algorithm. For interpolation problems, the average prediction errors of linear regressions were twice as high as those of black box models. Remarkably, when prediction tasks required extrapolation, linear models yielded average error only 5% higher than that of black box models, and outperformed black box models in roughly 40% of the tested prediction tasks, which suggests that they may be desirable over complex algorithms in many extrapolation problems because of their superior interpretability, computational overhead, and ease of use.
The results challenge the common assumption that extrapolative models for scientific machine learning are constrained by an inherent trade-off between performance and interpretability.

Submitted 16 December, 2022; originally announced December 2022.

Comments: DISTRIBUTION STATEMENT A (Approved for Public Release, Distribution Unlimited)

arXiv:2210.04095 [pdf, other]

doi 10.1145/3557915.3560996

How do you go where? Improving next location prediction by learning travel mode information using transformers

Authors: Ye Hong, Henry Martin, Martin Raubal

Abstract: Predicting the next visited location of an individual is a key problem in human mobility analysis, as it is required for the personalization and optimization of sustainable transport options. Here, we propose a transformer decoder-based neural network to predict the next location an individual will visit based on historical locations, time, and travel modes, which are behaviour dimensions often overlooked in previous work. In particular, the prediction of the next travel mode is designed as an auxiliary task to help guide the network's learning. For evaluation, we apply this approach to two large-scale and long-term GPS tracking datasets involving more than 600 individuals. Our experiments show that the proposed method significantly outperforms other state-of-the-art next location prediction methods by a large margin (8.05% and 5.60% relative increase in F1-score for the two datasets, respectively). We conduct an extensive ablation study that quantifies the influence of considering temporal features, travel mode information, and the auxiliary task on the prediction results. Moreover, we experimentally determine the performance upper bound when including the next mode prediction in our model. Finally, our analysis indicates that the performance of location prediction varies significantly with the chosen next travel mode by the individual. These results show potential for a more systematic consideration of additional dimensions of travel behaviour in human mobility prediction tasks.
The source code of our model and experiments is available at https://github.com/mie-lab/location-mode-prediction.

Submitted 27 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: updated main figure, 10 pages, camera ready SIGSPATIAL '22

arXiv:2204.09652 [pdf, other]

The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves

Authors: Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H. Martin, Tamara Sumner

Abstract: Transcripts of teaching episodes can be effective tools to understand discourse patterns in classroom instruction. According to most educational experts, sustained classroom discourse is a critical component of equitable, engaging, and rich learning environments for students. This paper describes the TalkMoves dataset, composed of 567 human-annotated K-12 mathematics lesson transcripts (including entire lessons or portions of lessons) derived from video recordings. The set of transcripts primarily includes in-person lessons with whole-class discussions and/or small group work, as well as some online lessons. All of the transcripts are human-transcribed, segmented by the speaker (teacher or student), and annotated at the sentence level for ten discursive moves based on accountable talk theory. In addition, the transcripts include utterance-level information in the form of dialogue act labels based on the Switchboard Dialog Act Corpus. The dataset can be used by educators, policymakers, and researchers to understand the nature of teacher and student discourse in K-12 math classrooms. Portions of this dataset have been used to develop the TalkMoves application, which provides teachers with automated, immediate, and actionable feedback about their mathematics instruction.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 9 pages, 2 figures, Accepted for a Poster + Demo presentation at the 13th International Conference on Language Resources and Evaluation 2022

arXiv:2203.17070 [pdf, other]

Traffic4cast at NeurIPS 2021 -- Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes

Authors: Christian Eichenberger, Moritz Neun, Henry Martin, Pedro Herruzo, Markus Spanring, Yichao Lu, Sungbin Choi, Vsevolod Konyakhin, Nina Lukashina, Aleksei Shpilman, Nina Wiedemann, Martin Raubal, Bo Wang, Hai L. Vu, Reza Mohajerpoor, Chen Cai, Inhi Kim, Luca Hermes, Andrew Melnik, Riza Velioglu, Markus Vieth, Malte Schilling, Alabi Bojesomo, Hasan Al Marzouqi, Panos Liatsis, et al. (12 additional authors not shown)

Abstract: The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future on simply aggregated GPS probe data in time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world thus introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe data. Winning solutions captured traffic dynamics sufficiently well to even cope with these complex domain shifts. Surprisingly, this seemed to require only the previous 1h traffic dynamic history and static road graph as input.

Submitted 1 April, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Pre-print under review, submitted to Proceedings of Machine Learning Research

arXiv:2203.00337 [pdf, other]

doi 10.1109/TRO.2022.3207095

MIRRAX: A Reconfigurable Robot for Limited Access Environments

Authors: Wei Cheah, Keir Groves, Horatio Martin, Harriet Peel, Simon Watson, Ognjen Marjanovic, Barry Lennox

Abstract: The development of mobile robot platforms for inspection has gained traction in recent years with the rapid advancement in hardware and software. However, conventional mobile robots are unable to address the challenge of operating in extreme environments where the robot is required to traverse narrow gaps in highly cluttered areas with restricted access. This paper presents MIRRAX, a robot that has been designed to meet these challenges with the capability of re-configuring itself to both access restricted environments through narrow ports and navigate through tightly spaced obstacles. Controllers for the robot are detailed, along with an analysis on the controllability of the robot given the use of Mecanum wheels in a variable configuration. Characterisation on the robot's performance identified suitable configurations for operating in narrow environments. The minimum lateral footprint width achievable for stable configuration (<2° roll) was 0.19 m. Experimental validation of the robot's controllability shows good agreement with the theoretical analysis. A further series of experiments shows the feasibility of the robot in addressing the challenges above: the capability to reconfigure itself for restricted entry through ports as small as 150 mm diameter, and navigating through cluttered environments. The paper also presents results from a deployment in a Magnox facility at the Sellafield nuclear site in the UK - the first robot to ever do so, for remote inspection and mapping.

Submitted 3 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 12 pages, Accepted for IEEE Transactions on Robotics

arXiv:2202.02842 [pdf, other]

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

Authors: Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

Abstract: Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error instead of the generalization gap; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks, exhibiting stronger correlations than other, more popular metrics. To further examine these metrics, we extend prior formulations relying on power law (PL) spectral distributions to exponential (EXP) and exponentially-truncated power law (E-TPL) families.

Submitted 4 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the 29th ACM SIGKDD international conference on knowledge discovery and data mining (2023)

arXiv:2112.12582 [pdf]

Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs

Authors: Lauren M. Sanders, Jason H. Yang, Ryan T. Scott, Amina Ann Qutub, Hector Garcia Martin, Daniel C. Berrios, Jaden J. A. Hastings, Jon Rask, Graham Mackintosh, Adrienne L. Hoarfrost, Stuart Chalk, John Kalantari, Kia Khezeli, Erik L. Antonsen, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Guillermo M. Delgado-Aparicio, Benjamin S. Glicksberg, Casey S. Greene, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, et al. (31 additional authors not shown)

Abstract: Space biology research aims to understand fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabilize the ecosystem of plants, crops, microbes, animals, and humans for sustained multi-planetary life. To advance these aims, the field leverages experiments, platforms, data, and model organisms from both spaceborne and ground-analog studies. As research is extended beyond low Earth orbit, experiments and platforms must be maximally autonomous, light, agile, and intelligent to expedite knowledge discovery. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration on artificial intelligence, machine learning, and modeling applications which offer key solutions toward these space biology challenges. In the next decade, the synthesis of artificial intelligence into the field of space biology will deepen the biological understanding of spaceflight effects, facilitate predictive modeling and analytics, support maximally autonomous and reproducible experiments, and efficiently manage spaceborne data and metadata, all with the goal to enable life to thrive in deep space.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 28 pages, 4 figures

arXiv:2112.12554 [pdf]

Beyond Low Earth Orbit: Biomonitoring, Artificial Intelligence, and Precision Space Health

Authors: Ryan T. Scott, Erik L. Antonsen, Lauren M. Sanders, Jaden J. A. Hastings, Seung-min Park, Graham Mackintosh, Robert J. Reynolds, Adrienne L. Hoarfrost, Aenor Sawyer, Casey S. Greene, Benjamin S. Glicksberg, Corey A. Theriot, Daniel C. Berrios, Jack Miller, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Stuart Chalk, Guillermo M. Delgado-Aparicio, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, Katelyn J. Jarvis, et al. (31 additional authors not shown)

Abstract: Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth-independence, rather than Earth-reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address these needs. We propose an appropriately autonomous and intelligent Precision Space Health system that will monitor, aggregate, and assess biomedical statuses; analyze and predict personalized adverse health outcomes; adapt and respond to newly accumulated data; and provide preventive, actionable, and timely insights to individual deep space crew members and iterative decision support to their crew medical officer. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration, on future applications of artificial intelligence in space biology and health. In the next decade, biomonitoring technology, biomarker science, spacecraft hardware, intelligent software, and streamlined data management must mature and be woven together into a Precision Space Health system to enable humanity to thrive in deep space.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 31 pages, 4 figures

arXiv:2111.13786 [pdf, other]

Learning from learning machines: a new generation of AI technology to meet the needs of science

Authors: Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, et al. (11 additional authors not shown)

Abstract: We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2109.14737 [pdf, other]

Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook

Authors: Morten Goodwin, Kim Tallaksen Halvorsen, Lei Jiao, Kristian Muri Knausgård, Angela Helen Martin, Marta Moyano, Rebekah A. Oomen, Jeppe Have Rasmussen, Tonje Knutsen Sørdalen, Susanna Huneide Thorbjørnsen

Abstract: The deep learning revolution is touching all scientific disciplines and corners of our lives as a means of harnessing the power of big data. Marine ecology is no exception. These new methods provide analysis of data from sensors, cameras, and acoustic recorders, even in real time, in ways that are reproducible and rapid. Off-the-shelf algorithms can find, count, and classify species from digital images or video and detect cryptic patterns in noisy data. Using these opportunities requires collaboration across ecological and data science disciplines, which can be challenging to initiate. To facilitate these collaborations and promote the use of deep learning towards ecosystem-based management of the sea, this paper aims to bridge the gap between marine ecologists and computer scientists. We provide insight into popular deep learning approaches for ecological data analysis in plain language, focusing on the techniques of supervised learning with deep neural networks, and illustrate challenges and opportunities through established and emerging applications of deep learning to marine ecology. We use established and future-looking case studies on plankton, fishes, marine mammals, pollution, and nutrient cycling that involve object detection, classification, tracking, and segmentation of visualized data. We conclude with a broad outlook of the field's opportunities and challenges, including potential technological advances and issues with managing complex data sets.

Submitted 29 September, 2021; originally announced September 2021.

Comments: 44 pages, 4 figures

arXiv:2106.00734 [pdf, other]

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics

Authors: Charles H. Martin, Michael W. Mahoney

Abstract: To better understand good generalization performance in state-of-the-art neural network (NN) models, and in particular the success of the ALPHAHAT metric based on Heavy-Tailed Self-Regularization (HT-SR) theory, we analyze a corpus of models that was made publicly-available for a contest to predict the generalization accuracy of NNs. These models include a wide range of qualities and were trained with a range of architectures and regularization hyperparameters. We break ALPHAHAT into its two subcomponent metrics: a scale-based metric; and a shape-based metric. We identify what amounts to a Simpson's paradox: where "scale" metrics (from traditional statistical learning theory) perform well in aggregate, but can perform poorly on subpartitions of the data of a given depth, when regularization hyperparameters are varied; and where "shape" metrics (from HT-SR theory) perform well on each subpartition of the data, when hyperparameters are varied for models of a given depth, but can perform poorly overall when models with varying depths are aggregated. Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied; the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality; and the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of NN models.
Our results also clarify further why the ALPHAHAT metric from HT-SR theory works so well at predicting generalization across a broad range of CV and NLP models.

Submitted 8 February, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: 21 pages; 9 figures; 6 tables

arXiv:2105.07949 [pdf, other]

Using Transformers to Provide Teachers with Personalized Feedback on their Classroom Discourse: The TalkMoves Application

Authors: Abhijit Suresh, Jennifer Jacobs, Vivian Lai, Chenhao Tan, Wayne Ward, James H. Martin, Tamara Sumner

Abstract: TalkMoves is an innovative application designed to support K-12 mathematics teachers to reflect on, and continuously improve their instructional practices. This application combines state-of-the-art natural language processing capabilities with automated speech recognition to automatically analyze classroom recordings and provide teachers with personalized feedback on their use of specific types of discourse aimed at broadening and deepening classroom conversations about mathematics. These specific discourse strategies are referred to as "talk moves" within the mathematics education community and prior research has documented the ways in which systematic use of these discourse strategies can positively impact student engagement and learning. In this article, we describe the TalkMoves application's cloud-based infrastructure for managing and processing classroom recordings, and its interface for providing teachers with feedback on their use of talk moves during individual teaching episodes. We present the series of model architectures we developed, and the studies we conducted, to develop our best-performing, transformer-based model (F1 = 79.3%). We also discuss several technical challenges that need to be addressed when working with real-world speech and language data from noisy K-12 classrooms.

Submitted 29 April, 2021; originally announced May 2021.

Comments: Presented at the AAAI 2021 Spring Symposium on Artificial Intelligence for K-12 Education

arXiv:2103.06170 [pdf, other]

Full-Resilient Memory-Optimum Multi-Party Non-Interactive Key Exchange

Authors: Majid Salimi, Hamid Mala, Honorio Martin, Pedro Peris-Lopez

Abstract: Multi-Party Non-Interactive Key Exchange (MP-NIKE) is a fundamental cryptographic primitive in which users register into a key generation centre and receive a public/private key pair each. After that, any subset of these users can compute a shared key without any interaction. Nowadays, IoT devices suffer from a high number and large size of messages exchanged in the Key Management Protocol (KMP). To overcome this, an MP-NIKE scheme can eliminate the airtime and latency of messages transferred between IoT devices. MP-NIKE schemes can be realized by using multilinear maps. There are several attempts for constructing multilinear maps based on indistinguishable obfuscation, lattices and the Chinese Remainder Theorem (CRT). Nevertheless, these schemes are inefficient in terms of computation cost and memory overhead. Besides, several attacks have been recently reported against CRT-based and lattice-based multilinear maps. There is only one modular exponentiation-based MP-NIKE scheme in the literature which has been claimed to be both secure and efficient. In this article, we present an attack on this scheme based on the Euclidean algorithm, in which two colluding users can obtain the shared key of any arbitrary subgroup of users. We also propose an efficient and secure MP-NIKE scheme. We show how our proposal is secure in the random oracle model assuming the hardness of the root extraction modulo a composite number.

Submitted 10 March, 2021; originally announced March 2021.

Comments: 14 pages, 3 figures

ACM Class: E.3

arXiv:2102.09600

Within-Document Event Coreference with BERT-Based Contextualized Representations

Authors: Shafiuddin Rehan Ahmed, James H. Martin

Abstract: Event coreference continues to be a challenging problem in information extraction. With the absence of any external knowledge bases for events, coreference becomes a clustering task that relies on effective representations of the context in which event mentions appear. Recent advances in contextualized language representations have proven successful in many tasks; however, their use in event linking has been limited. Here we present a three part approach that (1) uses representations derived from a pretrained BERT model to (2) train a neural classifier to (3) drive a simple clustering algorithm to create coreference chains. We achieve state of the art results with this model on two standard datasets for the within-document event coreference task and establish a new standard on a third newer dataset.

Submitted 6 April, 2024; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: No longer useful work

arXiv:2009.11726 [pdf]

doi 10.1109/ICCVE45908.2019.8965234

Evaluation of an indoor localization system for a mobile robot

Authors: Victor J. Exposito Jimenez, Christian Schwarzl, Helmut Martin

Abstract: Although indoor localization has been a widely researched topic, the results obtained may not meet the requirements of some domains. Most approaches are not able to precisely localize a fast moving object even with a complex installation, which makes their implementation in the automated driving domain complicated. In this publication, common technologies were analyzed and a commercial product, called Marvelmind Indoor GPS, was chosen for our use case in which both ultrasound and radio frequency communications are used. The evaluation is first carried out on small indoor scenarios with static and moving objects. Further tests were done on wider areas, where the system is integrated within our Robotics Operating System (ROS)-based self-developed 'Smart PhysIcal Demonstration and evaluation Robot (SPIDER)' and the results of these outdoor tests are compared with the localization obtained from the GPS installed on the robot. Finally, the next steps to improve the results in further developments are discussed.

Submitted 24 September, 2020; originally announced September 2020.

Journal ref: 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE)

arXiv:2002.06716 [pdf, other]

doi 10.1038/s41467-021-24025-8

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data

Authors: Charles H. Martin, Tongsu Peng, Michael W. Mahoney

Abstract: In many applications, one works with neural network models trained by someone else. For such pretrained models, one may not have access to training data or test data. Moreover, one may not know details about the model, e.g., the specifics of the training data, the loss function, the hyperparameter values, etc. Given one or many pretrained models, it is a challenge to say anything about the expected performance or quality of the models. Here, we address this challenge by providing a detailed meta-analysis of hundreds of publicly-available pretrained models. We examine norm based capacity control metrics as well as power law based metrics from the recently-developed Theory of Heavy-Tailed Self Regularization. We find that norm based metrics correlate well with reported test accuracies for well-trained models, but that they often cannot distinguish well-trained versus poorly-trained models. We also find that power law based metrics can do much better -- quantitatively better at discriminating among series of well-trained models with a given architecture; and qualitatively better at discriminating well-trained versus poorly-trained models. These methods can be used to identify when a pretrained neural network has problems that cannot be detected simply by examining training/test accuracies.

Submitted 2 June, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: 35 pages, 8 tables, 17 figures. To appear in Nature Communications

arXiv:1910.13824 [pdf, other]

doi 10.3929/ethz-b-000388707

Traffic4cast-Traffic Map Movie Forecasting -- Team MIE-Lab

Authors: Henry Martin, Ye Hong, Dominik Bucher, Christian Rupprecht, René Buffat

Abstract: The goal of the IARAI competition traffic4cast was to predict the city-wide traffic status within a 15-minute time window, based on information from the previous hour. The traffic status was given as multi-channel images (one pixel roughly corresponds to 100x100 meters), where one channel indicated the traffic volume, another one the average speed of vehicles, and a third one their rough heading. As part of our work on the competition, we evaluated many different network architectures, analyzed the statistical properties of the given data in detail, and thought about how to transform the problem to be able to take additional spatio-temporal context-information into account, such as the street network, the positions of traffic lights, or the weather. This document summarizes our efforts that led to our best submission, and gives some insights about which other approaches we evaluated, and why they did not work as well as imagined.

Submitted 21 November, 2019; v1 submitted 27 October, 2019; originally announced October 2019.

arXiv:1906.03018 [pdf, other]

Learning Software Configuration Spaces: A Systematic Literature Review

Authors: Juliana Alves Pereira, Hugo Martin, Mathieu Acher, Jean-Marc Jézéquel, Goetz Botterweck, Anthony Ventresque

Abstract: Most modern software systems (operating systems like Linux or Android, Web browsers like Firefox or Chrome, video encoders like ffmpeg, x264 or VLC, mobile and cloud applications, etc.) are highly-configurable. Hundreds of configuration options, features, or plugins can be combined, each potentially with distinct functionality and effects on execution time, security, energy consumption, etc. Due to the combinatorial explosion and the cost of executing software, it quickly becomes impossible to exhaustively explore the whole configuration space. Hence, numerous works have investigated the idea of learning it from a small sample of configurations' measurements. The pattern "sampling, measuring, learning" has emerged in the literature, with several practical interests for both software developers and end-users of configurable systems. In this survey, we report on the different application objectives (e.g., performance prediction, configuration optimization, constraint mining), use-cases, targeted software systems and application domains. We review the various strategies employed to gather a representative and cost-effective sample. We describe automated software techniques used to measure functional and non-functional properties of configurations. We classify machine learning algorithms and how they relate to the pursued application. Finally, we also describe how researchers evaluate the quality of the learning process.
The findings from this systematic review show that the potential application objective is important; there are a vast number of case studies reported in the literature, drawn from several domains and software systems. Yet, the huge variant space of configurable systems is still challenging and calls for further investigation of the synergies between artificial intelligence and software engineering.

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1901.08278 [pdf, other]

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

Authors: Charles H. Martin, Michael W. Mahoney

Abstract: Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mechanistic Universality (HT-MU), meaning that the correlations in the layer weight matrices can be fit to a power law (PL) with exponents that lie in common Universality classes from Heavy-Tailed Random Matrix Theory (HT-RMT). From this, we develop a Universal capacity control metric that is a weighted average of PL exponents. Rather than considering small toy NNs, we examine over 50 different, large-scale pre-trained DNNs, ranging over 15 different architectures, trained on ImageNet, each of which has been reported to have different test accuracies. We show that this new capacity metric correlates very well with the reported test accuracies of these DNNs, looking across each architecture (VGG16/.../VGG19, ResNet10/.../ResNet152, etc.). We also show how to approximate the metric by the more familiar Product Norm capacity measure, as the average of the log Frobenius norm of the layer weight matrices.
Our approach requires no changes to the underlying DNN or its loss function, it does not require us to train a model (although it could be used to monitor training), and it does not even require access to the ImageNet data.

Submitted 26 January, 2020; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: Updated as will appear in SDM20

arXiv:1901.08276 [pdf, other]

Traditional and Heavy-Tailed Self Regularization in Neural Network Models

Authors: Charles H. Martin, Michael W. Mahoney

Abstract: Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size.

Submitted 24 January, 2019; originally announced January 2019.

Comments: Very abridged version of arXiv:1810.01075

arXiv:1810.01075 [pdf, other]

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

Authors: Charles H. Martin, Michael W. Mahoney

Abstract: Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of Self-Regularization. The empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. These phases can be observed during the training process as well as in the final learned DNNs. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This results from correlations arising at all size scales, which arise implicitly due to the training process itself. This implicit Self-Regularization can depend strongly on the many knobs of the training process.
By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that, all else being equal, DNN optimization with larger batch sizes leads to less-well implicitly-regularized models, and it provides an explanation for the generalization gap phenomena.

Submitted 2 October, 2018; originally announced October 2018.

Comments: 59 pages, 31 figures

arXiv:1710.09553 [pdf, other]

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

Authors: Charles H. Martin, Michael W. Mahoney

Abstract: We describe an approach to understand the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years to revisit old ideas in the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model, whose behavior is controlled by two control parameters, one describing an effective amount of data, or load, on the network (that decreases when noise is added to the input), and one with an effective temperature interpretation (that increases when algorithms are early stopped). Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently-observed empirical results regarding the inability of deep neural networks not to overfit training data, discontinuous learning and sharp transitions in the generalization properties of learning algorithms, etc.

Submitted 17 February, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 31 pages; added brief discussion of recent papers that use/extend these ideas

arXiv:1608.05045 [pdf, other]

Large Angle based Skeleton Extraction for 3D Animation

Authors: Hugo Martin, Raphael Fernandez, Yong Khoo

Abstract: In this paper, we present a solution for arbitrary 3D character deformation by investigating rotation angle of decomposition and preserving the mesh topology structure. In computer graphics, skeleton extraction and skeleton-driven animation are an active area that attracts increasing interest from researchers. Accuracy is critical for realistic animation and related applications. There have been extensive studies on skeleton based 3D deformation. However, large angle rotation of different body parts has been relatively less addressed by state-of-the-art methods, which often yield unsatisfactory results. Besides 3D animation problems, we also note that for many 3D skeleton detection or tracking applications from video or depth streams, large angle rotation is a critical factor in the regression accuracy and robustness. We introduce a distortion metric function to quantify the surface curviness before and after deformation, which is a major clue for large angle rotation detection. The intensive experimental results show that our method is suitable for 3D modeling, animation, and skeleton based tracking applications.

Submitted 17 August, 2016; originally announced August 2016.

arXiv:1603.02028 [pdf, other]

Adaptive Visualisation System for Construction Building Information Models Using Saliency

Authors: Hugo Martin, Sylvain Chevallier, Eric Monacelli

Abstract: Building Information Modeling (BIM) is a recent construction process based on a 3D model, containing every component related to the building achievement. Architects, structure engineers, method engineers, and other participants in the building process work on this model through the design-to-construction cycle. The high complexity and the large amount of information included in these models raise several issues, delaying its wide adoption in the industrial world. One of the most important is the visualization: professionals have difficulty finding the relevant information for their job. Current solutions suffer from two limitations: the BIM model information is processed manually and insignificant information is simply hidden, leading to inconsistencies in the building model. This paper describes a system relying on an ontological representation of the building information to automatically label the building elements. Depending on the user's department, the visualization is modified according to these labels by automatically adjusting the colors and image properties based on a saliency model. The proposed saliency model incorporates several adaptations to fit the specificities of architectural images.

Submitted 7 March, 2016; originally announced March 2016.

Comments: 10 pages, 5 figures, to be submitted

arXiv:1108.5405 [pdf, other]

Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm

Authors: H. Jose Antonio Martin

Abstract: Many practical problems in almost all scientific and technological disciplines have been classified as computationally hard (NP-hard or even NP-complete). In life sciences, combinatorial optimization problems frequently arise in molecular biology, e.g., genome sequencing; global alignment of multiple genomes; identifying siblings or discovery of dysregulated pathways. In almost all of these problems, there is the need for proving a hypothesis about a certain property of an object that can be present only when it adopts some particular admissible structure (an NP-certificate) or be absent (no admissible structure); however, none of the standard approaches can discard the hypothesis when no solution can be found, since none can provide a proof that there is no admissible structure. This article presents an algorithm that introduces a novel type of solution method to "efficiently" solve the graph 3-coloring problem; an NP-complete problem. The proposed method provides certificates (proofs) in both cases: present or absent, so it is possible to accept or reject the hypothesis on the basis of a rigorous proof. It provides exact solutions and is polynomial-time (i.e., efficient), however parametric. The only requirement is sufficient computational power, which is controlled by the parameter $α\in\mathbb{N}$. Nevertheless, here it is proved that the probability of requiring a value of $α>k$ to obtain a solution for a random graph decreases exponentially: $P(α>k) \leq 2^{-(k+1)}$, making tractable almost all problem instances.
Thorough experimental analyses were performed. The algorithm was tested on random graphs, planar graphs and 4-regular planar graphs. The obtained experimental results are in accordance with the theoretical expected results.

Submitted 22 September, 2012; v1 submitted 26 August, 2011; originally announced August 2011.

Comments: Working paper

ACM Class: F.2.0

Showing 1–41 of 41 results for author: Martin, H