Showing 1–50 of 53 results for author: Wilson, S

Searching in archive cs.
  1. A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms

    Authors: Armin Mokhtarian, Jianye Xu, Patrick Scheffe, Maximilian Kloock, Simon Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, Johannes Betz, Sean Wilson, Spring Berman, Liam Paull, Amanda Prorok, Bassam Alrifaee

    Abstract: Connected and automated vehicles and robot swarms hold transformative potential for enhancing safety, efficiency, and sustainability in the transportation and manufacturing sectors. Extensive testing and validation of these technologies is crucial for their deployment in the real world. While simulations are essential for initial testing, they often have limitations in capturing the complex dynami…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 16 pages, 11 figures, 1 table. This work has been submitted to the IEEE Robotics & Automation Magazine for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2407.14779  [pdf, other]

    cs.CY cs.AI cs.HC

    Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach

    Authors: Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Shomir Wilson, Aylin Caliskan

    Abstract: Our research investigates the impact of Generative Artificial Intelligence (GAI) models, specifically text-to-image generators (T2Is), on the representation of non-Western cultures, with a focus on Indian contexts. Despite the transformative potential of T2Is in content creation, concerns have arisen regarding biases that may lead to misrepresentations and marginalizations. Through a community-cen…

    Submitted 3 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: This is the pre-peer reviewed version, which has been accepted at the 7th AAAI ACM Conference on AI, Ethics, and Society, Oct. 21, 2024, California, USA

  3. arXiv:2407.04903  [pdf, other]

    cs.CL cs.AI cs.CV

    MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

    Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks pr…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/Leezekun/MMSci

  4. arXiv:2407.01817  [pdf, other]

    cs.CL cs.CY cs.HC

    Race and Privacy in Broadcast Police Communications

    Authors: Pranav Narayanan Venkit, Christopher Graziul, Miranda Ardith Goodman, Samantha Nicole Kenny, Shomir Wilson

    Abstract: Radios are essential for the operations of modern police departments, and they function as both a collaborative communication technology and a sociotechnical system. However, little prior research has examined their usage or their connections to individual privacy and the role of race in policing, two growing topics of concern in the US. As a case study, we examine the Chicago Police Department's…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in the 27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW '24)

  5. arXiv:2404.07461  [pdf, other]

    cs.CL cs.AI

    "Confidently Nonsensical?": A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

    Authors: Pranav Narayanan Venkit, Tatiana Chakravorti, Vipul Gupta, Heidi Biggs, Mukund Srinath, Koustava Goswami, Sarah Rajtmajer, Shomir Wilson

    Abstract: We investigate how hallucination in large language models (LLM) is characterized in peer-reviewed literature using a critical examination of 103 publications across NLP research. Through a comprehensive review of sociological and technological literature, we identify a lack of agreement with the term 'hallucination.' Additionally, we conduct a survey with 171 practitioners from the field of NLP an…

    Submitted 10 April, 2024; originally announced April 2024.

  6. arXiv:2403.11593  [pdf, other]

    cs.CV

    End-to-end multi-modal product matching in fashion e-commerce

    Authors: Sándor Tóth, Stephen Wilson, Alexia Tsoukara, Enric Moreu, Anton Masalovich, Lars Roemheld

    Abstract: Product matching, the task of identifying different representations of the same product for better discoverability, curation, and pricing, is a key capability for online marketplace and e-commerce companies. We present a robust multi-modal product matching system in an industry setting, where large datasets, data distribution shifts and unseen domains pose challenges. We compare different approach…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages, submitted to SIGKDD

  7. arXiv:2402.11006  [pdf, other]

    cs.CR cs.LG

    Automated Detection and Analysis of Data Practices Using A Real-World Corpus

    Authors: Mukund Srinath, Pranav Venkit, Maria Badillo, Florian Schaub, C. Lee Giles, Shomir Wilson

    Abstract: Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy ex…

    Submitted 16 February, 2024; originally announced February 2024.

  8. arXiv:2401.01089  [pdf, other]

    cs.CL cs.AI cs.CE

    Quokka: An Open-source Large Language Model ChatBot for Material Science

    Authors: Xianjun Yang, Stephen D. Wilson, Linda Petzold

    Abstract: This paper presents the development of a specialized chatbot for materials science, leveraging the Llama-2 language model, and continuing pre-training on the expansive research articles in the materials science domain from the S2ORC dataset. The methodology involves an initial pretraining phase on over one million domain-specific papers, followed by an instruction-tuning process to refine the chat…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Work in progress

  9. arXiv:2312.14978  [pdf]

    cs.IR cs.AI cs.LG cs.NE

    On Quantifying Sentiments of Financial News -- Are We Doing the Right Things?

    Authors: Gourab Nath, Arav Sood, Aanchal Khanna, Savi Wilson, Karan Manot, Sree Kavya Durbaka

    Abstract: Typical investors start off the day by going through the daily news to get an intuition about the performance of the market. The speculations based on the tone of the news ultimately shape their responses towards the market. Today, computers are being trained to compute the news sentiment so that it can be used as a variable to predict stock market movements and returns. Some researchers have even…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: submitted to the 56th Annual Convention of ORSI and 10th International Conference on Business Analytics and Intelligence held at the Indian Institute of Science (IISc) during 18-20 December 2023

    ACM Class: I.2.7

  10. arXiv:2310.12318  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis

    Authors: Pranav Narayanan Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca J. Passonneau, Shomir Wilson

    Abstract: We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological li…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted and will appear at the EMNLP 2023 Main Conference

  11. arXiv:2310.08866  [pdf, other]

    cs.LG cs.AI

    Adaptivity and Modularity for Efficient Generalization Over Task Complexity

    Authors: Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

    Abstract: Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval previously introduced by Zhang et a…

    Submitted 13 October, 2023; originally announced October 2023.

  12. arXiv:2310.08687  [pdf, other]

    cs.HC cs.CR

    Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants

    Authors: Yuanyuan Feng, Abhilasha Ravichander, Yaxing Yao, Shikun Zhang, Rex Chen, Shomir Wilson, Norman Sadeh

    Abstract: Understanding and managing data privacy in the digital world can be challenging for sighted users, let alone blind and low-vision (BLV) users. There is limited research on how BLV users, who have special accessibility needs, navigate data privacy, and how potential privacy tools could assist them. We conducted an in-depth qualitative study with 21 US BLV participants to understand their data priva…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: This research paper is accepted by USENIX Security '24

  13. arXiv:2308.12539  [pdf, other]

    cs.CL cs.AI cs.LG

    CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

    Authors: Vipul Gupta, Pranav Narayanan Venkit, Hugo Laurençon, Shomir Wilson, Rebecca J. Passonneau

    Abstract: As language models (LMs) become increasingly powerful and widely used, it is important to quantify them for sociodemographic bias with potential for harm. Prior measures of bias are sensitive to perturbations in the templates designed to compare performance across social groups, due to factors such as low diversity or limited number of templates. Also, most previous work considers only one NLP tas…

    Submitted 7 August, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

  14. arXiv:2308.04346  [pdf, other]

    cs.CL cs.CY

    Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles

    Authors: Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao 'Kenneth' Huang, Shomir Wilson

    Abstract: We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to i…

    Submitted 8 August, 2023; originally announced August 2023.

  15. arXiv:2307.09209  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

    Authors: Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson

    Abstract: We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world s…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: TrustNLP at ACL 2023

    Journal ref: Proceedings at The Third Workshop on Trustworthy Natural Language Processing collocated at the 61st Annual Meeting Of The Association For Computational Linguistics. 2023

  16. arXiv:2307.03386  [pdf, other]

    cs.SE

    ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

    Authors: Jaydeb Saker, Sayma Sultana, Steven R. Wilson, Amiangshu Bosu

    Abstract: Background: The existence of toxic conversations in open-source platforms can degrade relationships among software developers and may negatively impact software product quality. To help mitigate this, some initial work has been done to detect toxic comments in the Software Engineering (SE) domain. Aims: Since automatically classifying an entire text as toxic or non-toxic does not help human modera…

    Submitted 7 July, 2023; originally announced July 2023.

    Journal ref: The 17th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2023

  17. arXiv:2306.08158  [pdf, other]

    cs.CL cs.AI cs.LG

    Sociodemographic Bias in Language Models: A Survey and Forward Path

    Authors: Vipul Gupta, Pranav Narayanan Venkit, Shomir Wilson, Rebecca J. Passonneau

    Abstract: Sociodemographic bias in language models (LMs) has the potential for harm when deployed in real-world settings. This paper presents a comprehensive survey of the past decade of research on sociodemographic bias in LMs, organized into a typology that facilitates examining the different aims: types of bias, quantifying bias, and debiasing techniques. We track the evolution of the latter two question…

    Submitted 13 August, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 23 pages, 3 figures

  18. Range Limited Coverage Control using Air-Ground Multi-Robot Teams

    Authors: Max Rudolph, Sean Wilson, Magnus Egerstedt

    Abstract: In this paper, we investigate how heterogeneous multi-robot systems with different sensing capabilities can observe a domain with an a priori unknown density function. Common coverage control techniques are targeted towards homogeneous teams of robots and do not consider what happens when the sensing capabilities of the robots are vastly different. This work proposes an extension to Lloyd's algorit…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Published at 2021 IEEE International Conference on Robotics and Automation (ICRA)

  19. arXiv:2302.05597  [pdf, other]

    cs.CL cs.AI

    MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures

    Authors: Xianjun Yang, Stephen Wilson, Linda Petzold

    Abstract: In this paper, we present a novel approach to knowledge extraction and retrieval using Natural Language Processing (NLP) techniques for material science. Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entit…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Work in Progress

  20. arXiv:2302.02463  [pdf, other]

    cs.CL cs.AI

    Nationality Bias in Text Generation

    Authors: Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao 'Kenneth' Huang, Shomir Wilson

    Abstract: Little attention is placed on analyzing nationality bias in language models, especially when nationality is highly used as a factor in increasing the performance of social NLP models. This paper examines how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms. We generate stories using GPT-2 for various nationalities and use sensitivity analysis to…

    Submitted 14 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Paper accepted in the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL2023)

  21. arXiv:2210.12401  [pdf, other]

    cs.CL

    PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text

    Authors: Xianjun Yang, Ya Zhuo, Julia Zuo, Xinlu Zhang, Stephen Wilson, Linda Petzold

    Abstract: Scientific action graphs extraction from materials synthesis procedures is important for reproducible research, machine automation, and material prediction. But the lack of annotated data has hindered progress in this field. We demonstrate an effort to annotate Polycrystalline Materials Synthesis Procedures (PcMSP) from 305 open access scientific articles for the construction of synthesis action g…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  22. arXiv:2209.13595  [pdf, other]

    cs.CY cs.CL

    What Are You Anxious About? Examining Subjects of Anxiety during the COVID-19 Pandemic

    Authors: Lucia L. Chen, Steven R. Wilson, Sophie Lohmann, Daniela V. Negraia

    Abstract: COVID-19 poses disproportionate mental health consequences to the public during different phases of the pandemic. We use a computational approach to capture the specific aspects that trigger an online community's anxiety about the pandemic and investigate how these aspects change over time. First, we identified nine subjects of anxiety (SOAs) in a sample of Reddit posts (N=86) from r/COVID19_su…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: This paper is accepted at the 17th International Conference on Web and Social Media (ICWSM) 2023

  23. Effects of Online Self-Disclosure on Social Feedback During the COVID-19 Pandemic

    Authors: Jooyoung Lee, Sarah Rajtmajer, Eesha Srivatsavaya, Shomir Wilson

    Abstract: We investigate relationships between online self-disclosure and received social feedback during the COVID-19 crisis. We crawl a total of 2,399 posts and 29,851 associated comments from the r/COVID19_support subreddit and manually extract fine-grained personal information categories and types of social support sought from each post. We develop a BERT-based ensemble classifier to automatically ident…

    Submitted 21 September, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted to ACM Transactions on Social Computing

  24. arXiv:2208.13930  [pdf, other]

    cs.CV

    SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection

    Authors: Samuel Wilson, Tobias Fischer, Feras Dayoub, Dimity Miller, Niko Sünderhauf

    Abstract: We address the problem of out-of-distribution (OOD) detection for the task of object detection. We show that residual convolutional layers with batch normalisation produce Sensitivity-Aware FEatures (SAFE) that are consistently powerful for distinguishing in-distribution from out-of-distribution detections. We extract SAFE vectors for every detected object, and train a multilayer perceptron on the…

    Submitted 22 August, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Journal ref: IEEE International Conference on Computer Vision 2023

  25. arXiv:2208.10898  [pdf, other]

    cs.CL cs.SI

    Don't Take it Personally: Analyzing Gender and Age Differences in Ratings of Online Humor

    Authors: J. A. Meaney, Steven R. Wilson, Luis Chiruzzo, Walid Magdy

    Abstract: Computational humor detection systems rarely model the subjectivity of humor responses, or consider alternative reactions to humor - namely offense. We analyzed a large dataset of humor and offense ratings by male and female annotators of different age groups. We find that women link these two concepts more strongly than men, and they tend to give lower humor ratings and higher offense scores. We…

    Submitted 23 August, 2022; originally announced August 2022.

  26. arXiv:2206.14169  [pdf, other]

    cs.CL

    Creation and Analysis of an International Corpus of Privacy Laws

    Authors: Sonu Gupta, Ellen Poplavska, Nora O'Toole, Siddhant Arora, Thomas Norton, Norman Sadeh, Shomir Wilson

    Abstract: The landscape of privacy laws and regulations around the world is complex and ever-changing. National and super-national laws, agreements, decrees, and other government-issued rules form a patchwork that companies must follow to operate internationally. To examine the status and evolution of this patchwork, we introduce the Government Privacy Instructions Corpus, or GPI Corpus, of 1,043 privacy la…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 14 pages, 7 figures, 7 tables

  27. arXiv:2202.00879  [pdf, ps, other]

    cs.SI cs.AI cs.CL cs.CR cs.CY

    Automated Detection of Doxing on Twitter

    Authors: Younes Karimi, Anna Squicciarini, Shomir Wilson

    Abstract: Doxing refers to the practice of disclosing sensitive personal information about a person without their consent. This form of cyberbullying is an unpleasant and sometimes dangerous phenomenon for online social networks. Although prior work exists on automated identification of other types of cyberbullying, a need exists for methods capable of detecting doxing on Twitter specifically. We propose an…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 24 pages, 1 figure. Accepted in the 25th ACM Conference on Computer-Supported Cooperative Work and Social Computing (ACM CSCW 2022)

    MSC Class: 68T01; 68T50; 91F20 ACM Class: H.4.3; I.7.0; J.4; K.4.2

  28. arXiv:2112.05341  [pdf, other]

    cs.CV cs.AI

    Hyperdimensional Feature Fusion for Out-Of-Distribution Detection

    Authors: Samuel Wilson, Tobias Fischer, Niko Sünderhauf, Feras Dayoub

    Abstract: We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing work that performs OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly apply…

    Submitted 29 August, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted to WACV2023

  29. arXiv:2111.15592  [pdf, other]

    cs.CV cs.LG

    MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale

    Authors: Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, Katherine McDonough

    Abstract: We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital). This library transforms the way historians can use maps by turning extensive, homogeneous map sets into searchable primary sources. MapReader allows users with little or no computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divid…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: 13 pages, 9 figures

  30. arXiv:2111.13259  [pdf, other]

    cs.CL cs.AI

    Identification of Bias Against People with Disabilities in Sentiment Analysis and Toxicity Detection Models

    Authors: Pranav Narayanan Venkit, Shomir Wilson

    Abstract: Sociodemographic biases are a common problem for natural language processing, affecting the fairness and integrity of its applications. Within sentiment analysis, these biases may undermine sentiment predictions for texts that mention personal attributes that unbiased human readers would consider neutral. Such discrimination can have great consequences in the applications of sentiment analysis bot…

    Submitted 25 November, 2021; originally announced November 2021.

  31. arXiv:2110.02411  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Voice Aging with Audio-Visual Style Transfer

    Authors: Justin Wilson, Sunyeong Park, Seunghye J. Wilson, Ming C. Lin

    Abstract: Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform one's appearance to look younger/older. Identity is maintained by conditioning these generative networks on a learned vector representation of the source content. In this work, we apply a similar approach to age a speaker's voice, referred to as voice aging. We first analyze the classifi…

    Submitted 5 October, 2021; originally announced October 2021.

  32. arXiv:2109.13863  [pdf, other]

    cs.LG cs.AI

    A First-Occupancy Representation for Reinforcement Learning

    Authors: Ted Moskovitz, Spencer R. Wilson, Maneesh Sahani

    Abstract: Both animals and artificial agents benefit from state representations that support rapid transfer of learning across tasks and which enable them to efficiently traverse their environments to reach rewarding states. The successor representation (SR), which measures the expected cumulative, discounted state occupancy under a fixed policy, enables efficient transfer to different reward structures in…

    Submitted 6 November, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

  33. arXiv:2104.07592  [pdf, other]

    cs.RO eess.SY

    Data-Driven Robust Barrier Functions for Safe, Long-Term Operation

    Authors: Yousef Emam, Paul Glotfelter, Sean Wilson, Gennaro Notomista, Magnus Egerstedt

    Abstract: Applications that require multi-robot systems to operate independently for extended periods of time in unknown or unstructured environments face a broad set of challenges, such as hardware degradation, changing weather patterns, or unfamiliar terrain. To operate effectively under these changing conditions, algorithms developed for long-term autonomy applications require a stronger focus on robustn…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Submitted to IEEE Transactions on Robotics (T-RO) as a regular paper. arXiv admin note: text overlap with arXiv:1909.02966

  34. arXiv:2103.07833  [pdf, other]

    cs.CL

    A 'Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source

    Authors: Pranav Venkit, Zeba Karishma, Chi-Yang Hsu, Rahul Katiki, Kenneth Huang, Shomir Wilson, Patrick Dudas

    Abstract: We widely use emojis in social networking to heighten, mitigate or negate the sentiment of the text. Emoji suggestions already exist in many cross-platform applications but an emoji is predicted solely based on a few prominent words instead of understanding the subject and substance of the text. Through this paper, we showcase the importance of using Twitter features to help the model understand the…

    Submitted 13 March, 2021; originally announced March 2021.

  35. arXiv:2008.10769  [pdf, other]

    cs.LG stat.ML

    Variable selection for Gaussian process regression through a sparse projection

    Authors: Chiwoo Park, David J. Borth, Nicholas S. Wilson, Chad N. Hunter

    Abstract: This paper presents a new variable selection approach integrated with Gaussian process (GP) regression. We consider a sparse projection of input variables and a general stationary covariance model that depends on the Euclidean distance between the projected features. The sparse projection matrix is considered as an unknown parameter. We propose a forward stagewise approach with embedded gradient d…

    Submitted 24 August, 2020; originally announced August 2020.

  36. arXiv:2005.07655  [pdf, other]

    cs.CL cs.SI

    Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity

    Authors: Steven R. Wilson, Walid Magdy, Barbara McGillivray, Gareth Tyson

    Abstract: As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this researc…

    Submitted 18 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: Accepted at The Web Science Conference 2020

  37. Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies

    Authors: Mukund Srinath, Shomir Wilson, C. Lee Giles

    Abstract: Organisations disclose their privacy practices by posting privacy policies on their website. Even though users often care about their digital privacy, they often don't read privacy policies since they require a significant investment in time and effort. Although natural language processing can help in privacy policy understanding, there has been a lack of large scale privacy policy corpora that co…

    Submitted 30 March, 2024; v1 submitted 23 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021

  38. arXiv:2004.02347  [pdf, other]

    cs.MA eess.SY

    A Receding Horizon Scheduling Approach for Search & Rescue Scenarios

    Authors: Yousef Emam, Sean Wilson, Mathias Hakenberg, Ulrich Munz, Magnus Egerstedt

    Abstract: Many applications involving complex multi-task problems such as disaster relief, logistics and manufacturing necessitate the deployment and coordination of heterogeneous multi-agent systems due to the sheer number of tasks that must be executed simultaneously. A fundamental requirement for the successful coordination of such systems is leveraging the specialization of each agent within the team. T…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: Accepted to IFAC World Congress 2020

  39. arXiv:2001.04639  [pdf, other]

    cs.LG stat.ME stat.ML

    Robust Gaussian Process Regression with a Bias Model

    Authors: Chiwoo Park, David J. Borth, Nicholas S. Wilson, Chad N. Hunter, Fritz J. Friedersdorf

    Abstract: This paper presents a new approach to a robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy tail distribution, such as the Laplace distribution and Student-t distribution. However, the use of a non-Gaussian likelihood would incur the need for a computationally expensive Bayesian approxima…

    Submitted 14 January, 2020; originally announced January 2020.

    MSC Class: 62G08

  40. arXiv:1911.00841  [pdf, other]

    cs.CL

    Question Answering for Privacy Policies: Combining Computational and Legal Perspectives

    Authors: Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh

    Abstract: Privacy policies are long and complex documents that are difficult for users to read and understand, and yet, they have legal effects on how user data is collected, managed and used. Ideally, we would like to empower users to inform themselves about issues that matter to them, and enable them to selectively explore those issues. We present PrivacyQA, a corpus consisting of 1750 questions about the…

    Submitted 3 November, 2019; originally announced November 2019.

    Comments: EMNLP 2019

  41. arXiv:1910.10579  [pdf, other]

    cs.NE cs.AI cs.CV cs.LG

    Autoencoding with a Classifier System

    Authors: Richard J. Preen, Stewart W. Wilson, Larry Bull

    Abstract: Autoencoders are data-specific compression algorithms learned automatically from examples. The predominant approach has been to construct single large global models that cover the domain. However, training and evaluating models of increasing size comes at the price of additional time and computational cost. Conditional computation, sparsity, and model pruning techniques can reduce these costs whil…

    Submitted 12 May, 2021; v1 submitted 23 October, 2019; originally announced October 2019.

    Journal ref: IEEE Transactions on Evolutionary Computation (2021)

  42. arXiv:1907.08540  [pdf, other]

    cs.CL

    Predicting Human Activities from User-Generated Content

    Authors: Steven R. Wilson, Rada Mihalcea

    Abstract: The activities we do are linked to our interests, personality, political preferences, and decisions we make about the future. In this paper, we explore the task of predicting human activities from user-generated content. We collect a dataset containing instances of social media users writing about a range of everyday activities. We then use a state-of-the-art sentence embedding framework tailored…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: ACL 2019

  43. Multi-Label Transfer Learning for Multi-Relational Semantic Similarity

    Authors: Li Zhang, Steven R. Wilson, Rada Mihalcea

    Abstract: Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the loss…

    Submitted 10 April, 2019; v1 submitted 31 May, 2018; originally announced May 2018.

    Comments: Accepted to *SEM 2019

    Journal ref: Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (SEM 2019) (2019) 44-50

  44. arXiv:1804.07835  [pdf, other]

    cs.CL

    Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity

    Authors: Li Zhang, Steven R. Wilson, Rada Mihalcea

    Abstract: Sentence encoders, which produce sentence embeddings using neural networks, are typically evaluated by how well they transfer to downstream tasks. This includes semantic similarity, an important task in natural language understanding. Although there has been much work dedicated to building sentence encoders, the accompanying transfer learning techniques have received relatively little attention. I…

    Submitted 31 October, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

  45. arXiv:1612.06685  [pdf, other]

    cs.CL

    Stateology: State-Level Interactive Charting of Language, Feelings, and Values

    Authors: Konstantinos Pappas, Steven Wilson, Rada Mihalcea

    Abstract: People's personality and motivations are manifest in their everyday language usage. With the emergence of social media, ample examples of such usage are procurable. In this paper, we aim to analyze the vocabulary used by close to 200,000 Blogger users in the U.S. with the purpose of geographically portraying various demographic, linguistic, and psychological dimensions at the state level. We give…

    Submitted 20 December, 2016; originally announced December 2016.

    Comments: 5 pages, 5 figures

  46. arXiv:1611.02360   

    cs.CL

    Cruciform: Solving Crosswords with Natural Language Processing

    Authors: Dragomir Radev, Rui Zhang, Steve Wilson, Derek Van Assche, Henrique Spyra Gubert, Alisa Krivokapic, MeiXing Dong, Chongruo Wu, Spruce Bondera, Luke Brandl, Jeremy Dohmann

    Abstract: Crossword puzzles are popular word games that require not only a large vocabulary, but also a broad knowledge of topics. Answering each clue is a natural language task on its own as many clues contain nuances, puns, or counter-intuitive word definitions. Additionally, it can be extremely difficult to ascertain definitive answers without the constraints of the crossword grid itself. This task is ch…

    Submitted 23 November, 2016; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: based on feedback, we have determined that the paper needs more work

  47. arXiv:1510.08172  [pdf, other]

    cs.IT

    Spectrally and Energy Efficient OFDM (SEE-OFDM) for Intensity Modulated Optical Wireless Systems

    Authors: Emily Lam, Sarah Kate Wilson, Hany Elgala, Thomas D. C. Little

    Abstract: Spectrally and energy efficient orthogonal frequency division multiplexing (SEE-OFDM) is an optical OFDM technique based on combining multiple asymmetrically clipped optical OFDM (ACO-OFDM) signals into one OFDM signal. By summing different components together, SEE-OFDM can achieve the same spectral efficiency as DC-biased optical OFDM (DCO-OFDM) without an energy-inefficient DC-bias. This paper i…

    Submitted 27 October, 2015; originally announced October 2015.

    Comments: 26 pages, 13 figures

  48. arXiv:1510.00109  [pdf, other]

    cs.MA eess.SY math.OC

    Confinement Control of Double Integrators using Partially Periodic Leader Trajectories

    Authors: Karthik Elamvazhuthi, Sean Wilson, Spring Berman

    Abstract: We consider a multi-agent confinement control problem in which a single leader has a purely repulsive effect on follower agents with double-integrator dynamics. By decomposing the leader's control inputs into periodic and aperiodic components, we show that the leader can be driven so as to guarantee confinement of the followers about a time-dependent trajectory in the plane. We use tools from aver…

    Submitted 24 March, 2016; v1 submitted 1 October, 2015; originally announced October 2015.

    Comments: To appear in the Proceedings of the 2016 American Control Conference (Minor corrections and additional comments on the case with consensus type inter-follower interaction)

  49. arXiv:1311.0558  [pdf, ps, other]

    math.CO cs.CG cs.DM

    A Quantitative Steinitz Theorem for Plane Triangulations

    Authors: Igor Pak, Stedman Wilson

    Abstract: We give a new proof of Steinitz's classical theorem in the case of plane triangulations, which allows us to obtain a new general bound on the grid size of the simplicial polytope realizing a given triangulation, subexponential in a number of special cases. Formally, we prove that every plane triangulation $G$ with $n$ vertices can be embedded in $\mathbb{R}^2$ in such a way that it is the vertic…

    Submitted 3 November, 2013; originally announced November 2013.

    Comments: 25 pages, 6 postscript figures

    MSC Class: 05C62 (Primary); 52B10; 68R10 (Secondary)

  50. Joint Access Point Selection and Power Allocation for Uplink Wireless Networks

    Authors: Mingyi Hong, Alfredo Garcia, Jorge Barrera, Stephen G. Wilson

    Abstract: We consider the distributed uplink resource allocation problem in a multi-carrier wireless network with multiple access points (APs). Each mobile user can optimize its own transmission rate by selecting a suitable AP and by controlling its transmit power. Our objective is to devise suitable algorithms by which mobile users can jointly perform these tasks in a distributed manner. Our approach relie…

    Submitted 18 July, 2012; originally announced July 2012.

    Comments: Revised and Resubmitted to IEEE Transactions on Signal Processing