Showing 1–33 of 33 results for author: Howe, B

  1. arXiv:2408.01966  [pdf, other]

    cs.CL cs.AI cs.CY

    ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

    Authors: Robert Wolfe, Alexis Hiniker, Bill Howe

    Abstract: This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target…

    Submitted 27 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at Artificial Intelligence, Ethics, and Society 2024

  2. arXiv:2408.01961  [pdf, ps, other]

    cs.CY cs.AI cs.CL cs.HC cs.LG

    Representation Bias of Adolescents in AI: A Bilingual, Bicultural Study

    Authors: Robert Wolfe, Aayushi Dangol, Bill Howe, Alexis Hiniker

    Abstract: Popular and news media often portray teenagers with sensationalism, as both a risk to society and at risk from society. As AI begins to absorb some of the epistemic functions of traditional media, we study how teenagers in two countries speaking two languages: 1) are depicted by AI, and 2) how they would prefer to be depicted. Specifically, we study the biases about teenagers learned by static wor…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at Artificial Intelligence, Ethics, and Society 2024

  3. arXiv:2408.01959  [pdf, other]

    cs.CV cs.AI cs.CL cs.CY cs.LG

    Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI

    Authors: Robert Wolfe, Aayushi Dangol, Alexis Hiniker, Bill Howe

    Abstract: Multimodal AI models capable of associating images and text hold promise for numerous domains, ranging from automated image captioning to accessibility applications for blind and low-vision users. However, uncertainty about bias has in some cases limited their adoption and availability. In the present work, we study 43 CLIP vision-language models to determine whether they learn human-like facial i…

    Submitted 27 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at Artificial Intelligence, Ethics, and Society 2024

  4. arXiv:2408.00932  [pdf, other]

    cs.CV cs.CL

    Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)

    Authors: Bin Han, Yiwei Yang, Anat Caspi, Bill Howe

    Abstract: Equitable urban transportation applications require high-fidelity digital representations of the built environment: not just streets and sidewalks, but bike lanes, marked and unmarked crossings, curb ramps and cuts, obstructions, traffic signals, signage, street markings, potholes, and more. Direct inspections and manual annotations are prohibitively expensive at scale. Conventional machine learni…

    Submitted 1 August, 2024; originally announced August 2024.

  5. arXiv:2407.18418  [pdf, other]

    cs.CL

    Know Your Limits: A Survey of Abstention in Large Language Models

    Authors: Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang

    Abstract: Abstention, the refusal of large language models (LLMs) to provide an answer, is increasingly recognized for its potential to mitigate hallucinations and enhance safety in LLM systems. In this survey, we introduce a framework to examine abstention from three perspectives: the query, the model, and human values. We organize the literature on abstention methods, benchmarks, and evaluation metrics us…

    Submitted 8 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: preprint

  6. arXiv:2407.16875  [pdf, other]

    cs.CV

    PathwayBench: Assessing Routability of Pedestrian Pathway Networks Inferred from Multi-City Imagery

    Authors: Yuxiang Zhang, Bill Howe, Sachin Mehta, Nicholas-J Bolten, Anat Caspi

    Abstract: Applications to support pedestrian mobility in urban areas require a complete, and routable graph representation of the built environment. Globally available information, including aerial imagery provides a scalable source for constructing these path networks, but the associated learning problem is challenging: Relative to road network pathways, pedestrian network pathways are narrower, more frequ…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2303.02323

  7. arXiv:2405.16820  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings

    Authors: Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, Bill Howe

    Abstract: The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-wei…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

  8. arXiv:2404.12452  [pdf, other]

    cs.CL

    Characterizing LLM Abstention Behavior in Science QA with Context Perturbations

    Authors: Bingbing Wen, Bill Howe, Lucy Lu Wang

    Abstract: The correct model response in the face of uncertainty is to abstain from answering a question so as not to mislead the user. In this work, we study the ability of LLMs to abstain from answering context-dependent science questions when provided insufficient or incorrect context. We probe model sensitivity in several settings: removing gold context, replacing gold context with irrelevant context, an…

    Submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2312.13503  [pdf, other]

    cs.CV cs.AI

    InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

    Authors: Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang

    Abstract: In this paper, we build a visual dialogue dataset, named InfoVisDial, which provides rich informative answers in each round even with external knowledge related to the visual content. Different from existing datasets where the answer is compact and short, InfoVisDial contains long free-form answers with rich information in each round of dialogue. For effective data collection, the key idea is to b…

    Submitted 20 December, 2023; originally announced December 2023.

  10. arXiv:2310.00740  [pdf, other]

    cs.CV cs.CY cs.LG

    Top-down Green-ups: Satellite Sensing and Deep Models to Predict Buffelgrass Phenology

    Authors: Lucas Rosenblatt, Bin Han, Erin Posthumus, Theresa Crimmins, Bill Howe

    Abstract: An invasive species of grass known as "buffelgrass" contributes to severe wildfires and biodiversity loss in the Southwest United States. We tackle the problem of predicting buffelgrass "green-ups" (i.e. readiness for herbicidal treatment). To make our predictions, we explore temporal, visual and multi-modal models that combine satellite sensing and deep learning. We find that all of our neural-ba…

    Submitted 1 October, 2023; originally announced October 2023.

  11. arXiv:2306.07292  [pdf, other]

    cs.LG cs.AI cs.CR

    SARN: Structurally-Aware Recurrent Network for Spatio-Temporal Disaggregation

    Authors: Bin Han, Bill Howe

    Abstract: Open data is frequently released spatially aggregated, usually to comply with privacy policies. But coarse, heterogeneous aggregations complicate learning and integration for downstream AI/ML systems. In this work, we consider models to disaggregate spatio-temporal data from a low-resolution, irregular partition (e.g., census tract) to a high-resolution, irregular partition (e.g., city block). We…

    Submitted 1 August, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

  12. arXiv:2301.04233  [pdf, other]

    cs.CV cs.AI cs.CY

    Adapting to Skew: Imputing Spatiotemporal Urban Data with 3D Partial Convolutions and Biased Masking

    Authors: Bin Han, Bill Howe

    Abstract: We adapt image inpainting techniques to impute large, irregular missing regions in urban settings characterized by sparsity, variance in both space and time, and anomalous events. Missing regions in urban data can be caused by sensor or software failures, data quality issues, interference from weather events, incomplete data collection, or varying data use regulations; any missing data can render…

    Submitted 10 January, 2023; originally announced January 2023.

  13. arXiv:2212.11261  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV cs.LG

    Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

    Authors: Robert Wolfe, Yiwei Yang, Bill Howe, Aylin Caliskan

    Abstract: Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology qua…

    Submitted 15 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 12 pages, 4 figures, 2 tables

    Journal ref: ACM FAccT 2023

  14. arXiv:2208.12700  [pdf, other]

    cs.CR cs.CY

    Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

    Authors: Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, Bill Howe, Julia Stoyanovich

    Abstract: Differential privacy (DP) data synthesizers support public release of sensitive information, offering theoretical guarantees for privacy but limited evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, accuracy of trained classifiers, or performance over a query workload. The ability for these results t…

    Submitted 31 May, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: Preprint. 14 pages

  15. arXiv:2002.07951  [pdf, other]

    cs.DB cs.PL

    SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra

    Authors: Yisu Remy Wang, Shana Hutchison, Jonathan Leang, Bill Howe, Dan Suciu

    Abstract: Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions can be rewritten into more efficient forms, by taking advantage of input properties such as sparsity, as well as program properties such as common subexpressions and fusible operators. The complex interaction among these properties' impact on the execution cost poses a challenge to optimizing compilers. Exist…

    Submitted 22 December, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  16. arXiv:1908.07924  [pdf, other]

    cs.DB cs.LG

    Data Management for Causal Algorithmic Fairness

    Authors: Babak Salimi, Bill Howe, Dan Suciu

    Abstract: Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflects discrimination, suggesting a data management problem. In this paper, we first make a distinction between associational and causal definitions of fairness in the literature and argue that the concept of fairness requires c…

    Submitted 30 September, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: text overlap with arXiv:1902.08283

  17. arXiv:1908.07465  [pdf, other]

    cs.DL cs.LG stat.ML

    Delineating Knowledge Domains in the Scientific Literature Using Visual Information

    Authors: Sean Yang, Po-shen Lee, Jevin D. West, Bill Howe

    Abstract: Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender system…

    Submitted 12 August, 2019; originally announced August 2019.

  18. arXiv:1907.03827  [pdf, other]

    cs.CY cs.AI cs.LG stat.AP

    FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems

    Authors: An Yan, Bill Howe

    Abstract: Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities. Spatiotemporal demand prediction models for these new mobility regimes must therefore consider fairness as a first-class design requirement. We present FairST, a fairness-aware model for predicting demand for new mobility s…

    Submitted 21 June, 2019; originally announced July 2019.

  19. arXiv:1905.01351  [pdf, ps, other]

    cs.DB

    In Defense of Synthetic Data

    Authors: Luke Rodriguez, Bill Howe

    Abstract: Synthetic datasets have long been thought of as second-rate, to be used only when "real" data collected directly from the real world is unavailable. But this perspective assumes that raw data is clean, unbiased, and trustworthy, which it rarely is. Moreover, the benefits of synthetic data for privacy and for bias correction are becoming increasingly important in any domain that works with people.…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: Discussion paper at FATES on the Web 2019

  20. arXiv:1902.08283  [pdf, other]

    cs.DB cs.AI

    Capuchin: Causal Database Repair for Algorithmic Fairness

    Authors: Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu

    Abstract: Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based d…

    Submitted 1 October, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

    Journal ref: Proceedings of the 2019 International Conference on Management of Data. ACM, 2019

  21. arXiv:1901.01860  [pdf, other]

    cs.LG stat.ML

    JECL: Joint Embedding and Cluster Learning for Image-Text Pairs

    Authors: Sean T. Yang, Kuan-Hao Huang, Bill Howe

    Abstract: We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce, but free-text descriptions are common. JECL trains by minim…

    Submitted 16 October, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: ICPR2020

  22. arXiv:1808.08355  [pdf, other]

    cs.DB

    Database-Agnostic Workload Management

    Authors: Shrainik Jain, Jiaqi Yan, Thierry Cruanes, Bill Howe

    Abstract: We present a system to support generalized SQL workload analysis and management for multi-tenant and multi-database platforms. Workload analysis applications are becoming more sophisticated to support database administration, model user behavior, audit security, and route queries, but the methods rely on specialized feature engineering, and therefore must be carefully implemented and reimplemented…

    Submitted 25 August, 2018; originally announced August 2018.

  23. arXiv:1808.07603  [pdf, other]

    cs.DB

    Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

    Authors: Luke Rodriguez, Bill Howe

    Abstract: Techniques to deliver privacy-preserving synthetic datasets take a sensitive dataset as input and produce a similar dataset as output while maintaining differential privacy. These approaches have the potential to improve data sharing and reuse, but they must be accessible to non-experts and tolerant of realistic data. Existing approaches make an implicit assumption that the active domain of the da…

    Submitted 22 August, 2018; originally announced August 2018.

    Comments: Submitted to TDPD18

  24. MobilityMirror: Bias-Adjusted Transportation Datasets

    Authors: Luke Rodriguez, Babak Salimi, Haoyue Ping, Julia Stoyanovich, Bill Howe

    Abstract: We describe customized synthetic datasets for publishing mobility data. Private companies are providing new transportation modalities, and their data is of high value for integrative transportation research, policy enforcement, and public accountability. However, these companies are disincentivized from sharing data not only to protect the privacy of individuals (drivers and/or passengers), but al…

    Submitted 24 January, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

    Comments: Presented at BIDU 2018 workshop and published in Springer Communications in Computer and Information Science vol 926

    Journal ref: Big Social Data and Urban Computing. BiDU 2018. Communications in Computer and Information Science, vol 926. Springer, Cham

  25. arXiv:1804.07890  [pdf, other]

    cs.CY cs.DB cs.HC

    A Nutritional Label for Rankings

    Authors: Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, HV Jagadish, Gerome Miklau

    Abstract: Algorithmic decisions often result in scoring and ranking individuals to determine credit worthiness, qualifications for college admissions and employment, and compatibility as dating partners. While automatic and seemingly objective, ranking algorithms can discriminate against individuals and protected groups, and exhibit low diversity. Furthermore, ranked results are often unstable --- small cha…

    Submitted 21 April, 2018; originally announced April 2018.

    Comments: 4 pages, SIGMOD demo, 3 figures, ACM SIGMOD 2018

    MSC Class: 68U01; 68P01 ACM Class: H.2, H.2.8, K.4.1

  26. arXiv:1801.05613  [pdf, other]

    cs.DB cs.CL

    Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics

    Authors: Shrainik Jain, Bill Howe, Jiaqi Yan, Thierry Cruanes

    Abstract: We consider methods for learning vector representations of SQL queries to support generalized workload analytics tasks, including workload summarization for index selection and predicting queries that will trigger memory errors. We consider vector representations of both raw SQL text and optimized query plans, and evaluate these methods on synthetic and real SQL workloads. We find that general alg…

    Submitted 2 February, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

  27. arXiv:1710.08874  [pdf, other]

    cs.CY

    Synthetic Data for Social Good

    Authors: Bill Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, Matt Gee

    Abstract: Data for good implies unfettered access to data. But data owners must be conservative about how, when, and why they share data or risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data sharing agreements can help prevent privacy violations, but require a level of specificity that is premature during preliminary discussions, and can take over a year…

    Submitted 24 October, 2017; originally announced October 2017.

    Comments: Presented at the Data For Good Exchange 2017

  28. arXiv:1709.08600  [pdf, other]

    cs.CL cs.LG

    EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation

    Authors: Maxim Grechkin, Hoifung Poon, Bill Howe

    Abstract: Many real-world applications require automated data annotation, such as identifying tissue origins based on gene expressions and classifying images into semantic categories. Annotation classes are often numerous and subject to changes over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of da…

    Submitted 1 July, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

  29. LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation

    Authors: Dylan Hutchison, Bill Howe, Dan Suciu

    Abstract: Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose Lara, a lean algebra of three operators, that expresses RA and LA as we…

    Submitted 13 April, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: 10 pages, to appear in the BeyondMR workshop at the 2017 ACM SIGMOD conference

  30. From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database

    Authors: Dylan Hutchison, Jeremy Kepner, Vijay Gadepally, Bill Howe

    Abstract: Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building…

    Submitted 11 August, 2016; v1 submitted 22 June, 2016; originally announced June 2016.

    Comments: 9 pages, to appear in 2016 IEEE High Performance Extreme Computing Conference (HPEC)

  31. arXiv:1605.04951  [pdf, other]

    cs.SI cs.CV cs.DL cs.IR

    Viziometrics: Analyzing Visual Information in the Scientific Literature

    Authors: Po-shen Lee, Jevin D. West, Bill Howe

    Abstract: Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this paper, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types…

    Submitted 27 May, 2016; v1 submitted 16 May, 2016; originally announced May 2016.

  32. arXiv:1604.03607  [pdf, other]

    cs.DB cs.PL

    Lara: A Key-Value Algebra underlying Arrays and Relations

    Authors: Dylan Hutchison, Bill Howe, Dan Suciu

    Abstract: Data processing systems roughly group into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned onto multiple families. Discovering ways to execute computation among multiple available systems, let alone discovering an optimal execution plan, is challenging…

    Submitted 12 April, 2016; originally announced April 2016.

    Comments: Working draft

  33. Astronomy in the Cloud: Using MapReduce for Image Coaddition

    Authors: Keith Wiley, Andrew Connolly, Jeff Gardner, Simon Krughof, Magdalena Balazinska, Bill Howe, YongChul Kwon, YingYi Bu

    Abstract: In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computation challenges such as anomaly detection and classification, and moving object tracking. Since such studies benefit from the highest quality data, methods such as image coaddition (stacking) will be a…

    Submitted 5 October, 2010; originally announced October 2010.

    Comments: 31 pages, 11 figures, 2 tables