Search | arXiv e-print repository

Sapiens: Foundation for Human Vision Models

Authors: Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, Shunsuke Saito

Abstract: We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. We observe that, given the same computational budget, self-supervised pretraining on a curated dataset of human images significantly boosts the performance for a diverse set of human-centric tasks. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also brings scalability -- model performance across tasks improves as we scale the number of parameters from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks. We achieve significant improvements over the prior state-of-the-art on Humans-5K (pose) by 7.6 mAP, Humans-2K (part-seg) by 17.1 mIoU, Hi4D (depth) by 22.4% relative RMSE, and THuman2 (normal) by 53.5% relative angular error. Project page: https://about.meta.com/realitylabs/codecavatars/sapiens.

Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: ECCV 2024 (Oral)

arXiv:2407.07753 [pdf, other]

Quantum CSS Duadic and Triadic Codes: New Insights and Properties

Authors: Reza Dastbasteh, Olatz Sanz Larrarte, Josu Etxezarreta Martinez, Antonio deMarti iOlius, Javier Oliva del Moral, Pedro Crespo Bofill

Abstract: In this study, we investigate the construction of quantum CSS duadic codes with dimensions greater than one. We introduce a method for extending smaller splittings of quantum duadic codes to create larger, potentially degenerate quantum duadic codes. Furthermore, we present a technique for computing or bounding the minimum distances of quantum codes constructed through this approach. Additionally, we introduce quantum CSS triadic codes, a family of quantum codes with a rate of at least $\frac{1}{3}$.

Submitted 10 July, 2024; originally announced July 2024.

MSC Class: 94B05; 94B15

arXiv:2406.13264 [pdf, other]

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

Authors: Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Re

Abstract: Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task - full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today - simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g. recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge towards finer-grained validation of workflow completion (F1 < 0.3). We hope WONDERBREAD encourages the development of more "human-centered" AI tooling for enterprise applications and furthers the exploration of multimodal FMs for the broader universe of BPM tasks. We publish our dataset and experiments here: https://github.com/HazyResearch/wonderbread

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.00166 [pdf, other]

On complexity of colloid cellular automata

Authors: Andrew Adamatzky, Nic Roberts, Raphael Fortulan, Noushin Raeisi Kheirabadi, Panagiotis Mougkogiannis, Michail-Antisthenis Tsompanas, Genaro J. Martinez, Georgios Ch. Sirakoulis, Alessandro Chiolerio

Abstract: The colloid cellular automata do not imitate the physical structure of colloids but are governed by logical functions derived from the colloids. We analyse the space-time complexity of Boolean circuits derived from the electrical responses of colloids: ZnO (zinc oxide, an inorganic compound also known as calamine or zinc white, which naturally occurs as the mineral zincite), proteinoids (microspheres and crystals of thermal abiotic proteins), and combinations thereof to electrical stimulation. To extract Boolean circuits from colloids, we send all possible configurations of two-, four-, and eight-bit binary strings, encoded as electrical potential values, to the colloids, record their responses, and thereby infer the Boolean functions they implement. We map the discovered functions onto the cell-state transition rules of cellular automata (arrays of binary state machines that update their states synchronously according to the same rule) -- the colloid cellular automata. We then analyse the phenomenology of the space-time configurations of the automata and evaluate their complexity using measures such as compressibility, Shannon entropy, Simpson diversity, and expressivity. A hierarchy of phenomenological and measurable space-time complexity is constructed.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20204 [pdf, other]

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Authors: Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao

Abstract: Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model to achieve the state-of-the-art performance on both text-image and text-text retrieval tasks.

Submitted 26 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 4 pages, MFM-EAI@ICML2024

MSC Class: 68T50; ACM Class: I.2.7

arXiv:2405.18350 [pdf, other]

doi 10.1109/ACCESS.2019.2937505

A System for Automatic English Text Expansion

Authors: Silvia García Méndez, Milagros Fernández Gavilanes, Enrique Costa Montenegro, Jonathan Juncal Martínez, Francisco Javier González Castaño, Ehud Reiter

Abstract: We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, "automatic" means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.

Submitted 28 May, 2024; originally announced May 2024.

Journal ref: (2019) IEEE Access, 7, 123320-123333

arXiv:2405.09546 [pdf, other]

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: https://behavior-vision-suite.github.io/

Submitted 15 May, 2024; originally announced May 2024.

Comments: CVPR 2024 (Highlight). Project website: https://behavior-vision-suite.github.io/

arXiv:2403.14291 [pdf, other]

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

Authors: Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan C. SanMiguel, Jose M. Martínez

Abstract: Diffusion models represent a new paradigm in text-to-image generation. Beyond generating high-quality images from text prompts, models such as Stable Diffusion have been successfully extended to the joint generation of semantic segmentation pseudo-masks. However, current extensions primarily rely on extracting attentions linked to prompt words used for image synthesis. This approach limits the generation of segmentation masks derived from word tokens not contained in the text prompt. In this work, we introduce Open-Vocabulary Attention Maps (OVAM), a training-free method for text-to-image diffusion models that enables the generation of attention maps for any word. In addition, we propose a lightweight optimization process based on OVAM for finding tokens that generate accurate attention maps for an object class with a single annotation. We evaluate these tokens within existing state-of-the-art Stable Diffusion extensions. The best-performing model improves its mIoU from 52.1 to 86.6 for the synthetic images' pseudo-masks, demonstrating that our optimized tokens are an efficient way to improve the performance of existing methods without architectural changes or retraining.

Submitted 21 March, 2024; originally announced March 2024.

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2403.13014 [pdf]

General Line Coordinates in 3D

Authors: Joshua Martinez, Boris Kovalerchuk

Abstract: Interpretable interactive visual pattern discovery in lossless 3D visualization is a promising way to advance machine learning. It enables end users who are not data scientists to take control of the model development process as a self-service. It is conducted in 3D General Line Coordinates (GLC) visualization space, which preserves all n-D information in 3D. This paper presents a system which combines three types of GLC: Shifted Paired Coordinates (SPC), Shifted Tripled Coordinates (STC), and General Line Coordinates-Linear (GLC-L) for interactive visual pattern discovery. A transition from 2-D visualization to 3-D visualization allows for a more distinct visual pattern than in 2-D and it also allows for finding the best data viewing positions, which are not available in 2-D. It enables in-depth visual analysis of various class-specific data subsets comprehensible for end users in the original interpretable attributes. Controlling model overgeneralization by end users is an additional benefit of this approach.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 8 pages, 25 figures

arXiv:2402.17016 [pdf, other]

Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

Authors: Isabelle Mohr, Markus Krimmel, Saba Sturua, Mohammad Kalim Akram, Andreas Koukounas, Michael Günther, Georgios Mastrapas, Vinit Ravishankar, Joan Fontanals Martínez, Feng Wang, Qi Liu, Ziniu Yu, Jie Fu, Saahil Ognawala, Susana Guzman, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

Abstract: We introduce a novel suite of state-of-the-art bilingual text embedding models that are designed to support English and another target language. These models are capable of processing lengthy text inputs with up to 8192 tokens, making them highly versatile for a range of natural language processing tasks such as text retrieval, clustering, and semantic textual similarity (STS) calculations. By focusing on bilingual models and introducing a unique multi-task learning objective, we have significantly improved the model performance on STS tasks, which outperforms the capabilities of existing multilingual models in both target language understanding and cross-lingual evaluation tasks. Moreover, our bilingual models are more efficient, requiring fewer parameters and less memory due to their smaller vocabulary needs. Furthermore, we have expanded the Massive Text Embedding Benchmark (MTEB) to include benchmarks for German and Spanish embedding models. This integration aims to stimulate further research and advancement in text embedding technologies for these languages.

Submitted 26 February, 2024; originally announced February 2024.

MSC Class: 68T50; ACM Class: I.2.7

arXiv:2402.05435 [pdf, other]

GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study

Authors: Christopher J. Lynch, Erik Jensen, Madison H. Munro, Virginia Zamponi, Joseph Martinez, Kevin O'Brien, Brandon Feldhaus, Katherine Smith, Ann Marie Reinhold, Ross Gore

Abstract: Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.

Submitted 12 July, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 29 pages, 24 figures

ACM Class: I.2.7; I.6.4

arXiv:2401.03780 [pdf, other]

Cybersecurity in Critical Infrastructures: A Post-Quantum Cryptography Perspective

Authors: Javier Oliva del Moral, Antonio deMarti iOlius, Gerard Vidal, Pedro M. Crespo, Josu Etxezarreta Martinez

Abstract: The machinery of industrial environments was connected to the Internet years ago with the scope of increasing their performance. However, this change made such environments vulnerable against cyber-attacks that can compromise their correct functioning resulting in economic or social problems. Moreover, implementing cryptosystems in the communications between operational technology (OT) devices is a more challenging task than for information technology (IT) environments since the OT networks are generally composed of legacy elements, characterized by low-computational capabilities. Consequently, implementing cryptosystems in industrial communication networks faces a trade-off between the security of the communications and the amortization of the industrial infrastructure. Critical Infrastructure (CI) refers to the industries which provide key resources for the daily social and economical development, e.g. electricity. Furthermore, a new threat to cybersecurity has arisen with the theoretical proposal of quantum computers, due to their potential ability of breaking state-of-the-art cryptography protocols, such as RSA or ECC. Many global agents have become aware that transitioning their secure communications to a quantum secure paradigm is a priority that should be established before the arrival of fault-tolerance. In this paper, we aim to describe the problematic of implementing post-quantum cryptography (PQC) to CI environments. For doing so, we describe the requirements for these scenarios and how they differ against IT. We also introduce classical cryptography and how quantum computers pose a threat to such security protocols. Furthermore, we introduce state-of-the-art proposals of PQC protocols and present their characteristics. We conclude by discussing the problematic of integrating PQC in industrial environments.

Submitted 11 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 27 pages, 7 figures, 10 tables

arXiv:2401.03307 [pdf, other]

Modeling Processes of Neighborhood Change

Authors: J. Carlos Martínez Mori, Zhanzhan Zhao

Abstract: An urban planner might design the spatial layout of transportation amenities so as to improve accessibility for underserved communities -- a fairness objective. However, implementing such a design might trigger processes of neighborhood change that change who benefits from these amenities in the long term. If so, has the planner really achieved their fairness objective? Can algorithmic decision-making anticipate second order effects? In this paper, we take a step in this direction by formulating processes of neighborhood change as instances of no-regret dynamics; a collective learning process in which a set of strategic agents rapidly reach a state of approximate equilibrium. We mathematize concepts of neighborhood change to model the incentive structures impacting individual dwelling-site decision-making. Our model accounts for affordability, access to relevant transit amenities, community ties, and site upkeep. We showcase our model with computational experiments that provide semi-quantitative insights on the spatial economics of neighborhood change, particularly on the influence of residential zoning policy and the placement of transit amenities.

Submitted 9 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

MSC Class: 91D10; 91A80; 90B06

arXiv:2312.15849 [pdf]

FODT: Fast, Online, Distributed and Temporary Failure Recovery Approach for MEC

Authors: Xin Yuan, Ning Li, Zhaoxin Zhang, Quan Chen, Jose Fernan Martinez

Abstract: Mobile edge computing (MEC) can reduce the latency of cloud computing successfully. However, the edge server may fail due to hardware or software issues. When the edge server failure happens, the users who offload tasks to this server will be affected. How to recover the services for these affected users quickly and effectively is challenging. Moreover, considering that the server failure is continuous and temporary, and the failed server can be repaired, the previous works cannot handle this problem effectively. Therefore, in this paper, we propose the fast, online, distributed, and temporary failure recovery algorithm (FODT) for MEC. In FODT, when edge server failure happens, only the affected access points (APs) recalculate their user-server allocation strategies and the other APs do not change their strategies. For the affected APs, the strategies before server failure are reused to reduce complexity and latency. When the failed server is repaired, the influenced APs reuse the strategies before server failure to offload tasks to this server. Based on this approach, the FODT can achieve better performance than previous works. To the best of our knowledge, the FODT is the first failure recovery algorithm, and when compared with previous research, it has higher failure recovery efficiency and lower complexity with acceptable approximate ratio.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 12 pages, 7 figures

arXiv:2312.06504 [pdf, ps, other]

An infinite class of quantum codes derived from duadic constacyclic codes

Authors: Reza Dastbasteh, Josu Etxezarreta Martinez, Andrew Nemec, Antonio deMarti iOlius, Pedro Crespo Bofill

Abstract: We present a family of quantum stabilizer codes using the structure of duadic constacyclic codes over $\mathbb{F}_4$. Within this family, quantum codes can possess varying dimensions, and their minimum distances are lower bounded by a square root bound. For each fixed dimension, this allows us to construct an infinite sequence of binary quantum codes with a growing minimum distance. Additionally, we prove that this family of quantum codes includes an infinite subclass of degenerate codes. We also introduce a technique for extending splittings of duadic constacyclic codes, providing new insights into the minimum distance and minimum odd-like weight of specific duadic constacyclic codes. Finally, we provide numerical examples of some quantum codes with short lengths within this family.

Submitted 27 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 31 pages, 2 tables

MSC Class: 94B05; 94B15

arXiv:2312.04836 [pdf, other]

Thermodynamic Computing System for AI Applications

Authors: Denis Melanson, Mohammad Abu Khater, Maxwell Aifer, Kaelan Donatella, Max Hunter Gordon, Thomas Ahle, Gavin Crooks, Antonio J. Martinez, Faris Sbahi, Patrick J. Coles

Abstract: Recent breakthroughs in artificial intelligence (AI) algorithms have highlighted the need for novel computing hardware in order to truly unlock the potential for AI. Physics-based hardware, such as thermodynamic computing, has the potential to provide a fast, low-power means to accelerate AI primitives, especially generative AI and probabilistic AI. In this work, we present the first continuous-variable thermodynamic computer, which we call the stochastic processing unit (SPU). Our SPU is composed of RLC circuits, as unit cells, on a printed circuit board, with 8 unit cells that are all-to-all coupled via switched capacitances. It can be used for either sampling or linear algebra primitives, and we demonstrate Gaussian sampling and matrix inversion on our hardware. The latter represents the first thermodynamic linear algebra experiment. We also illustrate the applicability of the SPU to uncertainty quantification for neural network classification. We envision that this hardware, when scaled up in size, will have significant impact on accelerating various probabilistic AI applications.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 26 pages, 22 figures

arXiv:2312.03799 [pdf, other]

Low-power, Continuous Remote Behavioral Localization with Event Cameras

Authors: Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego

Abstract: Researchers in natural science need reliable methods for quantifying animal behavior. Recently, numerous computer vision methods emerged to automate the process. However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage. Event cameras offer unique advantages for battery-dependent remote monitoring due to their low power consumption and high dynamic range capabilities. We use this novel sensor to quantify a behavior in Chinstrap penguins called ecstatic display. We formulate the problem as a temporal action detection task, determining the start and end times of the behavior. For this purpose, we recorded a colony of breeding penguins in Antarctica for several weeks and labeled event data on 16 nests. The developed method consists of a generator of candidate time intervals (proposals) and a classifier of the actions within them. The experiments show that the event cameras' natural response to motion is effective for continuous behavior monitoring and detection, reaching a mean average precision (mAP) of 58% (which increases to 63% in good weather conditions). The results also demonstrate the robustness against various lighting conditions contained in the challenging dataset. The low-power capabilities of the event camera allow it to record significantly longer than with a conventional camera. This work pioneers the use of event cameras for remote wildlife observation, opening new interdisciplinary opportunities. https://tub-rip.github.io/eventpenguins/

Submitted 19 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 13 pages, 8 figures, 12 tables, Project page: https://tub-rip.github.io/eventpenguins/

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 2024

arXiv:2311.02986 [pdf, other]

Hacking Cryptographic Protocols with Advanced Variational Quantum Attacks

Authors: Borja Aizpurua, Pablo Bermejo, Josu Etxezarreta Martinez, Roman Orus

Abstract: Here we introduce an improved approach to Variational Quantum Attack Algorithms (VQAA) on cryptographic protocols. Our methods provide robust quantum attacks to well-known cryptographic algorithms, more efficiently and with remarkably fewer qubits than previous approaches. We implement simulations of our attacks for symmetric-key protocols such as S-DES, S-AES and Blowfish. For instance, we show how our attack allows a classical simulation of a small 8-qubit quantum computer to find the secret key of one 32-bit Blowfish instance with 24 times fewer iterations than a brute-force attack. Our work also shows improvements in attack success rates for lightweight ciphers such as S-DES and S-AES. Further applications beyond symmetric-key cryptography are also discussed, including asymmetric-key protocols and hash functions. In addition, we also comment on potential future improvements of our methods. Our results bring us one step closer to assessing the vulnerability of large-size classical cryptographic protocols with Noisy Intermediate-Scale Quantum (NISQ) devices, and set the stage for future research in quantum cybersecurity.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 12 pages, 8 figures

arXiv:2310.20093 [pdf, other]

Evaluating Neural Language Models as Cognitive Models of Language Acquisition

Authors: Héctor Javier Vázquez Martínez, Annika Lea Heuser, Charles Yang, Jordan Kodner

Abstract: The success of neural language models (LMs) on many technological tasks has brought about their potential relevance as scientific theories of language despite some clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that the template-based benchmarks lack the structural diversity commonly found in the theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We advocate for the use of the readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.

Submitted 30 October, 2023; originally announced October 2023.

Comments: To appear in the GenBench 2023 workshop proceedings, the first workshop on (benchmarking) generalisation in NLP. GenBench 2023 will be held at EMNLP 2023 on December 6, 2023

arXiv:2310.07803 [pdf]

doi 10.1515/humor-2023-0032

A general mechanism of humor: reformulating the semantic overlap

Authors: Javier Martínez

Abstract: This article proposes a cognitive mechanism of humour of general applicability, not restricted to verbal communication. It is indebted to Raskin's concept of script overlap, and conforms to the incongruity-resolution theoretical framework, but it is built on the notion of constraint, an abstract correspondence between sets of data. Under this view, script overlap is an outcome of a more abstractly described phenomenon, constraint overlap. The important concept of the overlooked argument is introduced to characterise the two overlapping constraints -- overt and covert. Their inputs and outputs are not directly encoded in utterances, but implicated by them, and their overlap results in another overlap at the level of the communicated utterances, that the incongruity reveals. Our hypothesis assumes as a given that the evocation of such constraints is a cognitive effect of the inferential process by which a hearer interprets utterances. We base this assumption on Hofstadter's theory of analogy-making as the essence of human thought. By substituting "stimuli" of any kind for "utterances" in this model, we obtain a mechanism as easily applicable to non-verbal communication -- slapstick, cartoons -- and we propose it describes the necessary and sufficient conditions for a communicative act in any modality to carry humour.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 24 pages, 8 figures

ACM Class: I.2.m; J.5

Journal ref: HUMOR: International Journal of Humor Research, vol.36, no.4, 2023, pp. 529-565

arXiv:2310.06572 [pdf, other]

doi 10.1016/j.engappai.2024.107876

Deep Learning reconstruction with uncertainty estimation for $γ$ photon interaction in fast scintillator detectors

Authors: Geoffrey Daniel, Mohamed Bahi Yahiaoui, Claude Comtat, Sebastien Jan, Olga Kochebina, Jean-Marc Martinez, Viktoriya Sergeyeva, Viatcheslav Sharyy, Chi-Hsun Sung, Dominique Yvon

Abstract: This article presents a physics-informed deep learning method for the quantitative estimation of the spatial coordinates of gamma interactions within a monolithic scintillator, with a focus on Positron Emission Tomography (PET) imaging. A Density Neural Network approach is designed to estimate the 2-dimensional gamma photon interaction coordinates in a fast lead tungstate (PbWO4) monolithic scintillator detector. We introduce a custom loss function to estimate the inherent uncertainties associated with the reconstruction process and to incorporate the physical constraints of the detector. This unique combination allows for more robust and reliable position estimations, and the obtained results demonstrate the effectiveness of the proposed approach and highlight the significant benefits of the uncertainty estimation. We discuss its potential impact on improving PET imaging quality and show how the results can be used to improve the exploitation of the model, to bring benefits to the application, and to evaluate the validity of the given prediction and the associated uncertainties. Importantly, our proposed methodology extends beyond this specific use case, as it can be generalized to other applications beyond PET imaging.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Submitted to Artificial Intelligence

Journal ref: Engineering Applications of Artificial Intelligence, Volume 131, 2024, 107876

arXiv:2309.12265 [pdf, ps, other]

Cost-sharing in Parking Games

Authors: Jennifer Elder, Pamela E. Harris, Jan Kretschmann, J. Carlos Martínez Mori

Abstract: In this paper, we study the total displacement statistic of parking functions from the perspective of cooperative game theory. We introduce parking games, which are coalitional cost-sharing games in characteristic function form derived from the total displacement statistic. We show that parking games are supermodular cost-sharing games, indicating that cooperation is difficult (i.e., their core is empty). Next, we study their Shapley value, which formalizes a notion of "fair" cost-sharing and amounts to charging each car for its expected marginal displacement under a random arrival order. Our main contribution is a polynomial-time algorithm to compute the Shapley value of parking games, in contrast with known hardness results on computing the Shapley value of arbitrary games. The algorithm leverages the permutation-invariance of total displacement, combinatorial enumeration, and dynamic programming. We conclude with open questions around an alternative solution concept for supermodular cost-sharing games and connections to other areas in combinatorics.

Submitted 2 September, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 14 pages

MSC Class: 05A05; 91A12; 91A46

arXiv:2305.13452 [pdf, other]

Measuring and Modeling Physical Intrinsic Motivation

Authors: Julio Martinez, Felix Binder, Haoliang Wang, Nick Haber, Judith Fan, Daniel L. K. Yamins

Abstract: Humans are interactive agents driven to seek out situations with interesting physical dynamics. Here we formalize the functional form of physical intrinsic motivation. We first collect ratings of how interesting humans find a variety of physics scenarios. We then model human interestingness responses by implementing various hypotheses of intrinsic motivation, ranging from models that rely on simple scene features to models that depend on forward physics prediction. We find that the single best predictor of human responses is adversarial reward, a model derived from physical prediction loss. We also find that simple scene feature models do not generalize their prediction of human responses across all scenarios. Finally, linearly combining the adversarial model with the number of collisions in a scene leads to the greatest improvement in predictivity of human responses, suggesting humans are driven towards scenarios that result in high information gain and physical activity.

Submitted 7 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 6 pages, 5 figures, accepted to CogSci 2023 with full paper publication in the proceedings

arXiv:2302.13961 [pdf, other]

Soft labelling for semantic segmentation: Bringing coherence to label down-sampling

Authors: Roberto Alcover-Couso, Marcos Escudero-Vinolo, Juan C. SanMiguel, Jose M. Martinez

Abstract: In semantic segmentation, training data down-sampling is commonly performed due to limited resources, the need to adapt image size to the model input, or to improve data augmentation. This down-sampling typically employs different strategies for the image data and the annotated labels. Such discrepancy leads to mismatches between the down-sampled color and label images. Hence, the training performance significantly decreases as the down-sampling factor increases. In this paper, we bring together the down-sampling strategies for the image data and the training labels. To that aim, we propose a novel framework for label down-sampling via soft-labeling that better conserves label information after down-sampling. The framework thus fully aligns soft-labels with the image data, keeping the distribution of the sampled pixels. This proposal also produces reliable annotations for under-represented semantic classes. Altogether, it allows training competitive models at lower resolutions. Experiments show that the proposal outperforms other down-sampling strategies. Moreover, state-of-the-art performance is achieved for reference benchmarks while employing significantly fewer computational resources than leading approaches. This proposal enables competitive research for semantic segmentation under resource constraints.

Submitted 19 February, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.06584 [pdf, other]

Thermodynamic AI and the fluctuation frontier

Authors: Patrick J. Coles, Collin Szczepanski, Denis Melanson, Kaelan Donatella, Antonio J. Martinez, Faris Sbahi

Abstract: Many Artificial Intelligence (AI) algorithms are inspired by physics and employ stochastic fluctuations. We connect these physics-inspired AI algorithms by unifying them under a single mathematical framework that we call Thermodynamic AI. Seemingly disparate algorithmic classes can be described by this framework, for example, (1) Generative diffusion models, (2) Bayesian neural networks, (3) Monte Carlo sampling and (4) Simulated annealing. Such Thermodynamic AI algorithms are currently run on digital hardware, ultimately limiting their scalability and overall potential. Stochastic fluctuations naturally occur in physical thermodynamic systems, and such fluctuations can be viewed as a computational resource. Hence, we propose a novel computing paradigm, where software and hardware become inseparable. Our algorithmic unification allows us to identify a single full-stack paradigm, involving Thermodynamic AI hardware, that could accelerate such algorithms. We contrast Thermodynamic AI hardware with quantum computing where noise is a roadblock rather than a resource. Thermodynamic AI hardware can be viewed as a novel form of computing, since it uses a novel fundamental building block. We identify stochastic bits (s-bits) and stochastic modes (s-modes) as the respective building blocks for discrete and continuous Thermodynamic AI hardware. In addition to these stochastic units, Thermodynamic AI hardware employs a Maxwell's demon device that guides the system to produce non-trivial states. We provide a few simple physical architectures for building these devices and we develop a formalism for programming the hardware via gate sequences. We hope to stimulate discussion around this new computing paradigm. Beyond acceleration, we believe it will impact the design of both hardware and algorithms, while also deepening our understanding of the connection between physics and intelligence.

Submitted 13 June, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: 47 pages, 18 figures, Updated authors

arXiv:2301.10132 [pdf, other]

doi 10.1103/PhysRevA.108.032602

The superadditivity effects of quantum capacity decrease with the dimension for qudit depolarizing channels

Authors: Josu Etxezarreta Martinez, Antonio deMarti iOlius, Pedro M. Crespo

Abstract: Quantum channel capacity is a fundamental quantity for understanding how well quantum information can be transmitted or corrected when subjected to noise. However, it is generally not known how to compute such quantities, since the quantum channel coherent information is not additive for all channels, implying that it must be maximized over an unbounded number of channel uses. This leads to the phenomenon known as superadditivity, which refers to the fact that the regularized coherent information of $n$ channel uses exceeds one-shot coherent information. In this article, we study how the gain in quantum capacity of qudit depolarizing channels relates to the dimension of the systems considered. We make use of an argument based on the no-cloning bound in order to prove that the possible superadditive effects decrease as a function of the dimension for such family of channels. In addition, we prove that the capacity of the qudit depolarizing channel coincides with the coherent information when $d\rightarrow\infty$. We also discuss the private classical capacity and obtain similar results. We conclude that when high dimensional qudits experiencing depolarizing noise are considered, the coherent information of the channel is not only an achievable rate but essentially the maximum possible rate for any quantum block code.

Submitted 31 August, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 10 pages, 2 figures

Journal ref: Phys. Rev. A 108, 032602 (2023)

arXiv:2211.15538 [pdf, other]

Graph Convolutional Network for Multi-Target Multi-Camera Vehicle Tracking

Authors: Elena Luna, Juan Carlos San Miguel, José María Martínez, Marcos Escudero-Viñolo

Abstract: This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras providing a global solution, and it is also robust to large camera unsynchronizations. Furthermore, we design a new loss function to deal with class imbalance. Our proposal outperforms the related work, showing better generalization and without requiring ad-hoc manual annotations or thresholds, unlike compared approaches.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2210.09184 [pdf, other]

Packed-Ensembles for Efficient Uncertainty Estimation

Authors: Olivier Laurent, Adrien Lafage, Enzo Tartaglione, Geoffrey Daniel, Jean-Marc Martinez, Andrei Bursuc, Gianni Franchi

Abstract: Deep Ensembles (DE) are a prominent approach for achieving excellent performance on key metrics such as accuracy, calibration, uncertainty estimation, and out-of-distribution detection. However, hardware limitations of real-world systems constrain to smaller ensembles and lower-capacity networks, significantly deteriorating their performance and properties. We introduce Packed-Ensembles (PE), a strategy to design and train lightweight structured ensembles by carefully modulating the dimension of their encoding space. We leverage grouped convolutions to parallelize the ensemble into a single shared backbone and forward pass to improve training and inference speeds. PE is designed to operate within the memory limits of a standard neural network. Our extensive research indicates that PE accurately preserves the properties of DE, such as diversity, and performs equally well in terms of accuracy, calibration, out-of-distribution detection, and robustness to distribution shift. We make our code available at https://github.com/ENSTA-U2IS/torch-uncertainty.

Submitted 27 April, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: Published as a conference paper at ICLR 2023 (notable 25%)

arXiv:2207.13991 [pdf]

CoNet: Borderless and decentralized server cooperation in edge computing

Authors: Ning Li, Xin Yuan, Zhaoxin Zhang, Jose Fernan Martinez

Abstract: In edge computing (EC), by offloading tasks to an edge server or the remote cloud, the system performance can be improved greatly. However, since the traffic distribution in EC is heterogeneous and dynamic, it is difficult for an individual edge server to provide satisfactory computation service anytime and anywhere. This issue motivated researchers to study the cooperation between edge servers. The previous server cooperation algorithms have disadvantages since the cooperated region is limited to one hop. However, the performance of EC can be improved further by releasing the restriction on the cooperation region. Even though some works have extended the cooperated region to multiple hops, they fail to support task offloading, which is one of the core issues of edge computing. Therefore, we propose a new decentralized and borderless server cooperation algorithm for edge computing which takes task offloading strategy into account, named CoNet. In CoNet, the cooperation region is not limited. Each server forms its own basic cooperation unit (BCU) and calculates its announced capability based on the BCU. The server's capability, the processing delay, and the task and calculation result forwarding delay are considered during the calculation. The task division strategy is based on the real capability of the host-server and the announced capability of the cooperation-servers. This cooperation process is recursive and will be terminated once the terminal condition is satisfied. The simulation results demonstrate the advantages of CoNet over previous works.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2207.01323 [pdf, other]

Computer vision application for improved product traceability in the granite manufacturing industry

Authors: Xurxo Rigueira, Javier Martinez, Maria Araujo, Antonio Recaman

Abstract: The traceability of granite blocks consists in identifying each block with a finite number of color bands which represent a numerical code. This code has to be read several times throughout the manufacturing process, but its accuracy is subject to human errors, causing faults in the traceability system. A computer vision system is presented to address this problem through color detection and the decryption of the associated code. The system developed makes use of color space transformations, and several thresholds for the isolation of the colors. Computer vision methods are implemented, along with contour detection procedures for color identification. Lastly, the analysis of geometrical features is used to decrypt the color code captured. The proposed algorithm is trained on a set of 109 pictures taken in different environmental conditions and validated on a set of 21 images. The outcome shows promising results with an accuracy rate of 75.00% in the validation process. Therefore, the application presented can help employees reduce the number of mistakes on product tracking.

Submitted 4 July, 2022; originally announced July 2022.

MSC Class: 65D19 ACM Class: I.4

arXiv:2206.04663 [pdf, other]

Provably efficient variational generative modeling of quantum many-body systems via quantum-probabilistic information geometry

Authors: Faris M. Sbahi, Antonio J. Martinez, Sahil Patel, Dmitri Saberi, Jae Hyeon Yoo, Geoffrey Roeder, Guillaume Verdon

Abstract: The dual tasks of quantum Hamiltonian learning and quantum Gibbs sampling are relevant to many important problems in physics and chemistry. In the low temperature regime, algorithms for these tasks often suffer from intractabilities, for example from poor sample- or time-complexity. With the aim of addressing such intractabilities, we introduce a generalization of quantum natural gradient descent to parameterized mixed states, as well as provide a robust first-order approximating algorithm, Quantum-Probabilistic Mirror Descent. We prove data sample efficiency for the dual tasks using tools from information geometry and quantum metrology, thus generalizing the seminal result of classical Fisher efficiency to a variational quantum algorithm for the first time. Our approaches extend previously sample-efficient techniques to allow for flexibility in model choice, including to spectrally-decomposed models like Quantum Hamiltonian-Based Models, which may circumvent intractable time complexities. Our first-order algorithm is derived using a novel quantum generalization of the classical mirror descent duality. Both results require a special choice of metric, namely, the Bogoliubov-Kubo-Mori metric. To test our proposed algorithms numerically, we compare their performance to existing baselines on the task of quantum Gibbs sampling for the transverse field Ising model. Finally, we propose an initialization strategy leveraging geometric locality for the modelling of sequences of states such as those arising from quantum-stochastic processes. We demonstrate its effectiveness empirically for both real and imaginary time evolution while defining a broader class of potential applications.

Submitted 9 June, 2022; originally announced June 2022.

Comments: 24 + 49 pages, 5 + 4 figures

arXiv:2204.12918 [pdf, other]

We're Not Gonna Break It! Consistency-Preserving Operators for Efficient Product Line Configuration

Authors: Jose-Miguel Horcas, Daniel Strüber, Alexandru Burdusel, Jabier Martinez, Steffen Zschaler

Abstract: When configuring a software product line, finding a good trade-off between multiple orthogonal quality concerns is a challenging multi-objective optimisation problem. State-of-the-art solutions based on search-based techniques create invalid configurations in intermediate steps, requiring additional repair actions that reduce the efficiency of the search. In this work, we introduce consistency-preserving configuration operators (CPCOs)--genetic operators that maintain valid configurations throughout the entire search. CPCOs bundle coherent sets of changes: the activation or deactivation of a particular feature together with other (de)activations that are needed to preserve validity. In our evaluation, our instantiation of the IBEA algorithm with CPCOs outperforms two state-of-the-art tools for optimal product line configuration in terms of both speed and solution quality. The improvements are especially pronounced in large product lines with thousands of features.

Submitted 27 April, 2022; originally announced April 2022.

Comments: Accepted for publication in IEEE Transactions on Software Engineering (TSE). 16 pages, 10 figures; includes an appendix with 8 additional pages and 4 additional figures

arXiv:2204.10476 [pdf]

doi 10.1016/j.jbi.2007.01.001

Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions

Authors: Xin Li, Hsinchun Chen, Zan Huang, Hua Su, Jesse D. Martinez

Abstract: Gene/protein interactions provide critical information for a thorough understanding of cellular processes. Recently, considerable interest and effort have been focused on the construction and analysis of genome-wide gene networks. The large body of biomedical literature is an important source of gene/protein interaction information. Recent advances in text mining tools have made it possible to automatically extract such documented interactions from free-text literature. In this paper, we propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on the gene/protein interactions extracted from biomedical literature repositories using text mining tools. Our proposed framework consists of analyses of the network topology, network topology-gene function relationship, and temporal network evolution to distill valuable information embedded in the gene functional interactions in literature. We demonstrate the application of the proposed framework using a testbed of P53-related PubMed abstracts, which shows that literature-based P53 networks exhibit small-world and scale-free properties. We also found that high degree genes in the literature-based networks have a high probability of appearing in the manually curated database and genes in the same pathway tend to form local clusters in our literature-based networks. Temporal analysis showed that genes interacting with many other genes tend to be involved in a large number of newly discovered interactions.

Submitted 21 April, 2022; originally announced April 2022.

Journal ref: Journal of biomedical informatics, 2007

arXiv:2202.07127 [pdf, other]

Computing with Modular Robots

Authors: Genaro J. Martinez, Andrew Adamatzky, Ricardo Q. Figueroa, Eric Schweikardt, Dmitry A. Zaitsev, Ivan Zelinka, Luz N. Oliva-Moreno

Abstract: Propagating patterns are used to transfer and process information in chemical and physical prototypes of unconventional computing devices. Logical values are represented by fronts of traveling diffusive, trigger or phase waves. We apply this concept of pattern based computation to develop experimental prototypes of computing circuits implemented in small modular robots. In the experimental prototypes the modular robots Cubelets are concatenated into channels and junctions. The structures developed by Cubelets propagate signals in parallel and asynchronously. The approach is illustrated with a working circuit of a one-bit full adder. Complementarily, a formalization of these constructions is developed across Sleptsov nets. Finally, a perspective to swarm dynamics is discussed.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 33 pages, 23 figures, 5 tables

Journal ref: International Journal of Unconventional Computing, 17(1-2), 31-60, 2022

arXiv:2202.03212 [pdf, other]

Introducing explainable supervised machine learning into interactive feedback loops for statistical production system

Authors: Carlos Mougan, George Kanellos, Johannes Micheler, Jose Martinez, Thomas Gottron

Abstract: Statistical production systems cover multiple steps from the collection, aggregation, and integration of data to tasks like data quality assurance and dissemination. While the context of data quality assurance is one of the most promising fields for applying machine learning, the lack of curated and labeled training data is often a limiting factor. The statistical production system for the Centralised Securities Database features an interactive feedback loop between data collected by the European Central Bank and data quality assurance performed by data quality managers at National Central Banks. The quality assurance feedback loop is based on a set of rule-based checks for raising exceptions, upon which the user either confirms the data or corrects an actual error. In this paper we use the information received from this feedback loop to optimize the exceptions presented to the National Central Banks thereby improving the quality of exceptions generated and the time consumed on the system by the users authenticating those exceptions. For this approach we make use of explainable supervised machine learning to (a) identify the types of exceptions and (b) to prioritize which exceptions are more likely to require an intervention or correction by the NCBs. Furthermore, we provide an explainable AI taxonomy aiming to identify the different explainable AI needs that arose during the project.

Submitted 18 February, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: Irving Fisher Committee (IFC) - Bank of Italy workshop on Data science in central banking: Applications and tools. arXiv admin note: text overlap with arXiv:2107.08045

arXiv:2201.10985 [pdf, other]

Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data

Authors: Alexander Quevedo, Abraham Sánchez, Raul Nancláres, Diana P. Montoya, Juan Pacho, Jorge Martínez, E. Ulises Moya-Sánchez

Abstract: The understanding of global climate change, agriculture resilience, and deforestation control relies on the timely observations of the Land Use and Land Cover Change (LULCC). Recently, some deep learning (DL) methods have been adapted to make an automatic classification of Land Cover (LC) for global and homogeneous data. However, most of these DL models cannot be applied effectively to real-world data, i.e., a large number of classes, multi-seasonal data, diverse climate regions, high imbalance label dataset, and low-spatial resolution. In this work, we present our novel lightweight (only 89k parameters) Convolution Neural Network (ConvNet) to make LC classification and analysis to handle these problems for the Jalisco region. In contrast to the global approaches, the regional data provide the context-specificity that is required for policymakers to plan the land use and management, conservation areas, or ecosystem services. In this work, we combine three real-world open data sources to obtain 13 channels. Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar; as a result, the test accuracy increases from 73 % to 83 %. We hope that this research helps other regional groups with limited data sources or computational resources to attain the United Nations Sustainable Development Goal (SDG) concerning Life on Land.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 12 pages

arXiv:2201.06311 [pdf, other]

Graph Neural Networks for Cross-Camera Data Association

Authors: Elena Luna, Juan C. SanMiguel, José M. Martínez, Pablo Carballeira

Abstract: Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, 3D pose estimation, etc. This association task is typically stated as a bipartite graph matching problem and often solved by applying minimum-cost flow techniques, which may be computationally inefficient with large data. Furthermore, cameras are usually treated by pairs, obtaining local solutions, rather than finding a global solution at once. Another key issue is that of the affinity measurement: the widespread usage of non-learnable pre-defined distances, such as the Euclidean and Cosine ones. This paper proposes an efficient approach for cross-camera data association focused on a global solution, instead of processing cameras by pairs. To avoid the usage of fixed distances, we leverage the connectivity of Graph Neural Networks, previously unused in this scope, using a Message Passing Network to jointly learn features and similarity. We validate the proposal for pedestrian multi-view association, showing results over the EPFL multi-camera pedestrian dataset. Our approach considerably outperforms the literature data association techniques, without requiring to be trained in the same scenario in which it is tested. Our code is available at http://www-vpu.eps.uam.es/publications/gnn_cca.

Submitted 17 January, 2022; originally announced January 2022.

arXiv:2201.03074 [pdf, other]

A Survey of Passive Sensing in the Workplace

Authors: Subigya Nepal, Gonzalo J. Martinez, Arvind Pillai, Koustuv Saha, Shayan Mirjafari, Vedant Das Swain, Xuhai Xu, Pino G. Audia, Munmun De Choudhury, Anind K. Dey, Aaron Striegel, Andrew T. Campbell

Abstract: As emerging technologies increasingly integrate into all facets of our lives, the workplace stands at the forefront of potential transformative changes. A notable development in this realm is the advent of passive sensing technology, designed to enhance both cognitive and physical capabilities by monitoring human behavior. This paper reviews current research on the application of passive sensing technology in the workplace, focusing on its impact on employee wellbeing and productivity. Additionally, we explore unresolved issues and outline prospective pathways for the incorporation of passive sensing in future workplaces.

Submitted 30 March, 2024; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: Added references and other minor revisions. Also updated to include relevant works published after 2022

ACM Class: H.5.0

arXiv:2110.06013 [pdf, ps, other]

doi 10.1142/9789811235740_0009

On Wave-Based Majority Gates with Cellular Automata

Authors: Genaro J. Martinez, Andrew Adamatzky, Shigeru Ninagawa, Kenichi Morita

Abstract: We demonstrate a discrete implementation of a wave-based majority gate in a chaotic Life-like cellular automaton. The gate functions via controlling of patterns' propagation into stationary channels. The gate presented is realisable in many living and non-living substrates that show wave-like activity of its space-time dynamics or pattern propagation. In the gate a symmetric pattern represents a binary value 0 while a non-symmetric pattern represents a binary value 1. Origination of the patterns and their symmetry type are encoded by the particle reactions at the beginning of computation. The patterns propagate in channels of the gate and compete for the space at the intersection of the channels. We implement 3-inputs majority gates using a W topology showing additional implementations of 5-inputs majority gates and one tree (cascade) majority gate.

Submitted 9 October, 2021; originally announced October 2021.

Comments: 18 pages, 12 figures, 2 tables. https://www.worldscientific.com/doi/abs/10.1142/9789811235740_0009

Journal ref: Handbook of Unconventional Computing, Volume 2: Implementations, chapter 9, pp. 271-288, (2021)
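
The Boolean behaviour of the majority gates named in the entry above can be sketched in a few lines of Python. This is only an illustration of the truth function, not the cellular-automaton construction from the chapter; the function names `majority` and `cascade_majority`, and the particular cascade wiring, are assumptions made for this example.

```python
def majority(*bits):
    """Return 1 if more than half of the input bits are 1, else 0.

    Requires an odd number of inputs so that a tie cannot occur.
    """
    if len(bits) % 2 == 0:
        raise ValueError("a majority gate needs an odd number of inputs")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("inputs must be 0 or 1")
    return int(sum(bits) > len(bits) // 2)


def cascade_majority(a, b, c, d, e):
    """Hypothetical two-level tree: the output of one 3-input gate
    feeds a second 3-input gate together with two fresh inputs."""
    return majority(majority(a, b, c), d, e)


if __name__ == "__main__":
    # Print the truth table of the 3-input gate.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, "->", majority(a, b, c))
```

Note that the cascade is not equivalent to a single 5-input gate: for inputs 1, 1, 0, 0, 0 the 5-input gate yields 0, and the cascade also yields 0, but for 1, 1, 0, 1, 0 the cascade yields 1 while a 5-input gate would also yield 1 only because three inputs are set. The two agree here but differ in general, for example on 0, 0, 1, 1, 1 versus 1, 1, 1, 0, 0.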

arXiv:2109.10549 [pdf, other]

doi 10.3233/FI-222107

On the $2$-domination number of cylinders with small cycles

Authors: E. M. Garzón, J. A. Martínez, J. J. Moreno, M. L. Puertas

Abstract: Domination-type parameters are difficult to manage in Cartesian product graphs and there is usually no general relationship between the parameter in both factors and in the product graph. This is the situation of the domination number, the Roman domination number or the $2$-domination number, among others. Contrary to what happens with the domination number and the Roman domination number, the $2$-domination number remains unknown in cylinders, that is, the Cartesian product of a cycle and a path and in this paper, we will compute this parameter in the cylinders with small cycles. We will develop two algorithms involving the $(\min,+)$ matrix product that will allow us to compute the desired values of $\gamma_2(C_n\Box P_m)$, with $3\leq n\leq 15$ and $m\geq 2$. We will also pose a conjecture about the general formulae for the $2$-domination number in this graph class.

Submitted 14 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: 15 pages, 1 figure

ACM Class: G.2.2; F.2.2

Journal ref: Fundamenta Informaticae, Volume 185, Issue 2 (May 6, 2022) fi:8516
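
For readers unfamiliar with the $(\min,+)$ matrix product mentioned in the entry above, a minimal Python sketch of the operation follows. It shows only the product itself, $C_{ij} = \min_k (A_{ik} + B_{kj})$, and makes no claim about how the paper's two algorithms use it; the function names `min_plus_product` and `min_plus_power` and the sample matrix are assumptions made for this example.

```python
import math

INF = math.inf  # plays the role of "no connection" in the (min,+) semiring


def min_plus_product(A, B):
    """Return C with C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions must match")
    cols = len(B[0])
    return [
        [min(A[i][k] + B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]


def min_plus_power(A, exponent):
    """Multiply a square matrix by itself `exponent` times (exponent >= 1)."""
    if exponent < 1:
        raise ValueError("exponent must be at least 1")
    result = A
    for _ in range(exponent - 1):
        result = min_plus_product(result, A)
    return result


if __name__ == "__main__":
    # Hypothetical 3x3 weight matrix, used only to exercise the functions.
    A = [
        [0, 3, INF],
        [INF, 0, 1],
        [2, INF, 0],
    ]
    for row in min_plus_product(A, A):
        print(row)
```

In this semiring, addition is replaced by `min` and multiplication by `+`, so repeated products accumulate the cheapest total cost over chains of intermediate indices.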

arXiv:2107.14700 [pdf]

Seeing poverty from space, how much can it be tuned?

Authors: Tomas Sako, Arturo Jr M. Martinez

Abstract: Since the United Nations launched the Sustainable Development Goals (SDG) in 2015, numerous universities, NGOs and other organizations have attempted to develop tools for monitoring worldwide progress in achieving them. Led by advancements in the fields of earth observation techniques, data sciences and the emergence of artificial intelligence, a number of research teams have developed innovative tools for highlighting areas of vulnerability and tracking the implementation of SDG targets. In this paper we demonstrate that individuals with no organizational affiliation and equipped only with common hardware, publicly available datasets and cloud-based computing services can participate in the improvement of machine-learning-based approaches to predicting local poverty levels in a given agro-ecological environment. The approach builds upon several pioneering efforts over the last five years related to mapping poverty by deep learning to process satellite imagery and "ground-truth" data from the field to link features with incidence of poverty in a particular context. The approach employs new methods for object identification in order to optimize the modeled results and achieve significantly high accuracy. A key goal of the project was to intentionally keep costs as low as possible - by using freely available resources - so that citizen scientists, students and organizations could replicate the method in other areas of interest.
Moreover, for simplicity, the input data used were derived from just a handful of sources (involving only earth observation and population headcounts). The results of the project could therefore certainly be strengthened further through the integration of proprietary data from social networks, mobile phone providers, and other sources.

Submitted 30 July, 2021; originally announced July 2021.

Comments: 19 pages

arXiv:2104.05711 [pdf, other]

doi 10.1038/s41467-022-28810-x

The world-wide waste web

Authors: Johann H. Martínez, Sergi Romero, José J. Ramasco, Ernesto Estrada

Abstract: Countries globally trade with tons of waste materials every year, some of which are highly hazardous. This trade admits a network representation of the world-wide waste web, with countries as vertices and flows as directed weighted edges. Here we investigate the main properties of this network by tracking 108 categories of wastes interchanged in the period 2001-2019. Although, most of the hazardous waste was traded between developed nations, a disproportionate asymmetry existed in the flow from developed to developing countries. Using a dynamical model, we simulate how waste stress propagates through the network and affects the countries. We identify 28 countries with low Environmental Performance Index that are at high risk of waste congestion. Therefore, they are at threat of improper handling and disposal of hazardous waste. We find evidence of pollution by heavy metals, by volatile organic compounds and/or by persistent organic pollutants, which are used as chemical fingerprints, due to the improper handling of waste in several of these countries.

Submitted 14 March, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Nat Commun (2022). Main manuscript, and supplementary information. Total of 15 figures and 58 pages

arXiv:2104.02920 [pdf, ps, other]

Visualization of the Computation Process of a Universal Register Machine

Authors: Shigeru Ninagawa, Genaro J. Martinez

Abstract: Universal register machine, a formal model of computation, can be emulated on the array of the Game of Life, a two-dimensional cellular automaton. We perform spectral analysis on the computation dynamical process of the universal register machine on the Game of Life. The array is divided into small sectors and the power spectrum is calculated from the evolution in each sector. The power spectrum can be classified into four categories by its shape; null, white noise, sharp peaks, and power law. By representing the shape of power spectrum by a mark, we can visualize the activity of the sector during the computation process. For example, the track of pulse moving between components of the universal register machine and the position of frequently modified registers can be identified. This method can expose the functional difference in each region of computing machine.

Submitted 22 May, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2102.11228 [pdf, ps, other]

doi 10.1109/IGARSS47720.2021.9554465

Subspace-Based Feature Fusion From Hyperspectral And Multispectral Image For Land Cover Classification

Authors: Juan Ramírez, Héctor Vargas, José Ignacio Martínez, Henry Arguello

Abstract: In remote sensing, hyperspectral (HS) and multispectral (MS) image fusion have emerged as a synthesis tool to improve the data set resolution. However, conventional image fusion methods typically degrade the performance of the land cover classification. In this paper, a feature fusion method from HS and MS images for pixel-based classification is proposed. More precisely, the proposed method first extracts spatial features from the MS image using morphological profiles. Then, the feature fusion model assumes that both the extracted morphological profiles and the HS image can be described as a feature matrix lying in different subspaces. An algorithm based on combining alternating optimization (AO) and the alternating direction method of multipliers (ADMM) is developed to solve efficiently the feature fusion problem. Finally, extensive simulations were run to evaluate the performance of the proposed feature fusion approach for two data sets. In general, the proposed approach exhibits a competitive performance compared to other feature extraction methods.

Submitted 3 April, 2022; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: 4 pages, 2 figures, 1 table, and 2 algorithms. Submitted to the International Geoscience and Remote Sensing Symposium (2021)

arXiv:2102.04091 [pdf, other]

Online Clustering-based Multi-Camera Vehicle Tracking in Scenarios with overlapping FOVs

Authors: Elena Luna, Juan C. SanMiguel, Jose M. Martínez, Marcos Escudero-Viñolo

Abstract: Multi-Target Multi-Camera (MTMC) vehicle tracking is an essential task of visual traffic monitoring, one of the main research fields of Intelligent Transportation Systems. Several offline approaches have been proposed to address this task; however, they are not compatible with real-world applications due to their high latency and post-processing requirements. In this paper, we present a new low-latency online approach for MTMC tracking in scenarios with partially overlapping fields of view (FOVs), such as road intersections. Firstly, the proposed approach detects vehicles at each camera. Then, the detections are merged between cameras by applying cross-camera clustering based on appearance and location. Lastly, the clusters containing different detections of the same vehicle are temporally associated to compute the tracks on a frame-by-frame basis. The experiments show promising low-latency results while addressing real-world challenges such as the a priori unknown and time-varying number of targets and the continuous state estimation of them without performing any post-processing of the trajectories.

Submitted 8 February, 2021; originally announced February 2021.

Comments: 10 pages

arXiv:2101.06720 [pdf, other]

Deep Multi-Task Learning for Joint Localization, Perception, and Prediction

Authors: John Phillips, Julieta Martinez, Ioan Andrei Bârsan, Sergio Casas, Abbas Sadat, Raquel Urtasun

Abstract: Over the last few years, we have witnessed tremendous progress on many subtasks of autonomous driving, including perception, motion forecasting, and motion planning. However, these systems often assume that the car is accurately localized against a high-definition map. In this paper we question this assumption, and investigate the issues that arise in state-of-the-art autonomy stacks under localization error. Based on our observations, we design a system that jointly performs perception, prediction, and localization. Our architecture is able to reuse computation between both tasks, and is thus able to correct localization errors efficiently. We show experiments on a large-scale autonomy dataset, demonstrating the efficiency and accuracy of our proposed approach.

Submitted 10 April, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

Comments: CVPR 21

arXiv:2012.12437 [pdf, other]

doi 10.1109/IROS45743.2020.9340924

Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars

Authors: Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei Bârsan, Shenlong Wang, Gellért Máttyus, Raquel Urtasun

Abstract: We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles. Towards this goal, we introduce Pit30M, a new image and LiDAR dataset with over 30 million frames, which is 10 to 100 times larger than those used in previous work. Pit30M is captured under diverse conditions (i.e., season, weather, time of the day, traffic), and provides accurate localization ground truth. We also automatically annotate our dataset with historical weather and astronomical data, as well as with image and LiDAR semantic segmentation as a proxy measure for occlusion. We benchmark multiple existing methods for image and LiDAR retrieval and, in the process, introduce a simple, yet effective convolutional network-based LiDAR retrieval method that is competitive with the state of the art. Our work provides, for the first time, a benchmark for sub-metre retrieval-based localization at city scale. The dataset, its Python SDK, as well as more information about the sensors, calibration, and metadata, are available on the project website: https://pit30m.github.io/

Submitted 30 April, 2024; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: Published at IROS 2020

arXiv:2012.10942 [pdf, other]

Learning to Localize Through Compressed Binary Maps

Authors: Xinkai Wei, Ioan Andrei Bârsan, Shenlong Wang, Julieta Martinez, Raquel Urtasun

Abstract: One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps. In this paper we propose to learn to compress the map representation such that it is optimal for the localization task. As a consequence, higher compression rates can be achieved without loss of localization accuracy when compared to standard coding schemes that optimize for reconstruction, thus ignoring the end task. Our experiments show that it is possible to learn a task-specific compression which reduces storage requirements by two orders of magnitude over general-purpose codecs such as WebP without sacrificing performance.

Submitted 20 December, 2020; originally announced December 2020.

Comments: 18 pages, 12 figures, 6 tables; Presented at CVPR 2019

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10316-10324

arXiv:2011.09952 [pdf, other]

On the Request-Trip-Vehicle Assignment Problem

Authors: J. Carlos Martínez Mori, Samitha Samaranayake

Abstract: The request-trip-vehicle assignment problem is at the heart of a popular decomposition strategy for online vehicle routing. In this framework, assignments are done in batches in order to exploit any shareability among vehicles and incoming travel requests. We study a natural ILP formulation and its LP relaxation. Our main result is an LP-based randomized rounding algorithm that, whenever the instance is feasible, leverages mild assumptions to return an assignment whose: i) expected cost is at most that of an optimal solution, and ii) expected fraction of unassigned requests is at most $1/e$. If trip-vehicle assignment costs are $\alpha$-approximate, we pay an additional factor of $\alpha$ in the expected cost. We can relax the feasibility requirement by considering the penalty version of the problem, in which a penalty is paid for each unassigned request. We find that, whenever a request is repeatedly unassigned after a number of rounds, with high probability it is so in accordance with the sequence of LP solutions and not because of a rounding error. We additionally introduce a deterministic rounding heuristic inspired by our randomized technique. Our computational experiments show that our rounding algorithms achieve a performance similar to that of the ILP at a reduced computation time, far improving on our theoretical guarantee. The reason for this is that, although the assignment problem is hard in theory, the natural LP relaxation tends to be very tight in practice.

Submitted 31 July, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

Comments: SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21)

arXiv:2011.06949 [pdf, other]

Learning language variations in news corpora through differential embeddings

Authors: Carlos Selmo, Julian F. Martinez, Mariano G. Beiró, J. Ignacio Alvarez-Hamelin

Abstract: There is an increasing interest in the NLP community in capturing variations in the usage of language, either through time (i.e., semantic drift), across regions (as dialects or variants) or in different social contexts (i.e., professional or media technolects). Several successful dynamical embeddings have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.

Submitted 13 November, 2020; originally announced November 2020.

Showing 1–50 of 102 results for author: Martínez, J