-
Market or Markets? Investigating Google Search's Market Shares Under Horizontal and Vertical Segmentation
Authors:
Desheng Hu,
Muhammad Abu Bakar Aziz,
Jeffrey Gleason,
Alice Koeninger,
Nikolas Guggenberger,
Ronald E. Robertson,
Christo Wilson
Abstract:
Is Google Search a monopoly with gatekeeping power? Regulators from the US, UK, and Europe have argued that it is based on the assumption that Google Search dominates the market for horizontal (a.k.a. "general") web search. Google disputes this, claiming that competition extends to all vertical (a.k.a. "specialized") search engines, and that under this market definition it does not have monopoly p…
▽ More
Is Google Search a monopoly with gatekeeping power? Regulators from the US, UK, and Europe have argued that it is based on the assumption that Google Search dominates the market for horizontal (a.k.a. "general") web search. Google disputes this, claiming that competition extends to all vertical (a.k.a. "specialized") search engines, and that under this market definition it does not have monopoly power. In this study we present the first analysis of Google Search's market share under both horizontal and vertical segmentation of online search. We leverage observational trace data collected from a panel of US residents that includes their web browsing history and copies of the Google Search Engine Result Pages they were shown. We observe that Google Search receives 71.8% of participants' queries when compared to other horizontal search engines, and that participants' search sessions begin at Google greater than 50% of the time in 24 out of 30 vertical market segments (which comprise almost all of our participants' searches). Our results inform the consequential and ongoing debates about the market power of Google Search and the conceptualization of online markets in general.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Authors:
Neeraj Varshney,
Himanshu Gupta,
Eric Robertson,
Bing Liu,
Chitta Baral
Abstract:
State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research…
▽ More
State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Optical convolutional neural network with atomic nonlinearity
Authors:
Mingwei Yang,
Elizabeth Robertson,
Luisa Esguerra,
Kurt Busch,
Janik Wolters
Abstract:
Due to their high degree of parallelism, fast processing speeds and low power consumption, analog optical functional elements offer interesting routes for realizing neuro-morphic computer hardware. For instance, convolutional neural networks lend themselves to analog optical implementations by exploiting the Fourier-transform characteristics of suitable designed optical setups. However, the effici…
▽ More
Due to their high degree of parallelism, fast processing speeds and low power consumption, analog optical functional elements offer interesting routes for realizing neuro-morphic computer hardware. For instance, convolutional neural networks lend themselves to analog optical implementations by exploiting the Fourier-transform characteristics of suitable designed optical setups. However, the efficient implementation of optical nonlinearities for such neural networks still represents challenges. In this work, we report on the realization and characterization of a three-layer optical convolutional neural network where the linear part is based on a 4f-imaging system and the optical nonlinearity is realized via the absorption profile of a cesium atomic vapor cell. This system classifies the handwritten digital dataset MNIST with 83.96% accuracy, which agrees well with corresponding simulations. Our results thus demonstrate the viability of utilizing atomic nonlinearities in neural network architectures with low power consumption.
△ Less
Submitted 24 January, 2023;
originally announced January 2023.
-
Human Activity Recognition in an Open World
Authors:
Derek S. Prijatelj,
Samuel Grieggs,
Jin Huang,
Dawei Du,
Ameya Shringi,
Christopher Funk,
Adam Kaufman,
Eric Robertson,
Walter J. Scheirer
Abstract:
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-…
▽ More
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant resulting in nuisance novelty, such as never before seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty, and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol as an algorithm for reproducing experiments using the KOWL-718 benchmark will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as be extended as further updates to Kinetics are released.
△ Less
Submitted 22 December, 2022;
originally announced December 2022.
-
Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities
Authors:
Nianzu Ma,
Sahisnu Mazumder,
Alexander Politowicz,
Bing Liu,
Eric Robertson,
Scott Grigsby
Abstract:
Much of the existing work on text novelty detection has been studied at the topic level, i.e., identifying whether the topic of a document or a sentence is novel or not. Little work has been done at the fine-grained semantic level (or contextual level). For example, given that we know Elon Musk is the CEO of a technology company, the sentence "Elon Musk acted in the sitcom The Big Bang Theory" is…
▽ More
Much of the existing work on text novelty detection has been studied at the topic level, i.e., identifying whether the topic of a document or a sentence is novel or not. Little work has been done at the fine-grained semantic level (or contextual level). For example, given that we know Elon Musk is the CEO of a technology company, the sentence "Elon Musk acted in the sitcom The Big Bang Theory" is novel and surprising because normally a CEO would not be an actor. Existing topic-based novelty detection methods work poorly on this problem because they do not perform semantic reasoning involving relations between named entities in the text and their background knowledge. This paper proposes an effective model (called PAT-SND) to solve the problem, which can also characterize the novelty. An annotated dataset is also created. Evaluation shows that PAT-SND outperforms 10 baselines by large margins.
△ Less
Submitted 31 October, 2022;
originally announced October 2022.
-
Subscriptions and external links help drive resentful users to alternative and extremist YouTube videos
Authors:
Annie Y. Chen,
Brendan Nyhan,
Jason Reifler,
Ronald E. Robertson,
Christo Wilson
Abstract:
Do online platforms facilitate the consumption of potentially harmful content? Using paired behavioral and survey data provided by participants recruited from a representative sample in 2020 (n=1,181), we show that exposure to alternative and extremist channel videos on YouTube is heavily concentrated among a small group of people with high prior levels of gender and racial resentment. These viewe…
▽ More
Do online platforms facilitate the consumption of potentially harmful content? Using paired behavioral and survey data provided by participants recruited from a representative sample in 2020 (n=1,181), we show that exposure to alternative and extremist channel videos on YouTube is heavily concentrated among a small group of people with high prior levels of gender and racial resentment. These viewers often subscribe to these channels (prompting recommendations to their videos) and follow external links to them. In contrast, non-subscribers rarely see or follow recommendations to videos from these channels. Our findings suggest YouTube's algorithms were not sending people down "rabbit holes" during our observation window in 2020, possibly due to changes that the company made to its recommender system in 2019. However, the platform continues to play a key role in facilitating exposure to content from alternative and extremist channels among dedicated audiences.
△ Less
Submitted 2 April, 2023; v1 submitted 22 April, 2022;
originally announced April 2022.
-
AI Autonomy : Self-Initiated Open-World Continual Learning and Adaptation
Authors:
Bing Liu,
Sahisnu Mazumder,
Eric Robertson,
Scott Grigsby
Abstract:
As more and more AI agents are used in practice, it is time to think about how to make these agents fully autonomous so that they can (1) learn by themselves continually in a self-motivated and self-initiated manner rather than being retrained offline periodically on the initiation of human engineers and (2) accommodate or adapt to unexpected or novel circumstances. As the real-world is an open en…
▽ More
As more and more AI agents are used in practice, it is time to think about how to make these agents fully autonomous so that they can (1) learn by themselves continually in a self-motivated and self-initiated manner rather than being retrained offline periodically on the initiation of human engineers and (2) accommodate or adapt to unexpected or novel circumstances. As the real-world is an open environment that is full of unknowns or novelties, the capabilities of detecting novelties, characterizing them, accommodating/adapting to them, gathering ground-truth training data and incrementally learning the unknowns/novelties become critical in making the AI agent more and more knowledgeable, powerful and self-sustainable over time. The key challenge here is how to automate the process so that it is carried out continually on the agent's own initiative and through its own interactions with humans, other agents and the environment just like human on-the-job learning. This paper proposes a framework (called SOLA) for this learning paradigm to promote the research of building autonomous and continual learning enabled AI agents. To show feasibility, an implemented agent is also described.
△ Less
Submitted 19 April, 2023; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Googling for Abortion: Search Engine Mediation of Abortion Accessibility in the United States
Authors:
Yelena Mejova,
Tatiana Gracyk,
Ronald E. Robertson
Abstract:
Among the myriad barriers to abortion access, crisis pregnancy centers (CPCs) pose an additional difficulty by targeting women with unexpected or "crisis" pregnancies in order to dissuade them from the procedure. Web search engines may prove to be another barrier, being in a powerful position to direct their users to health information, and above all, health services. In this study we ask, to what…
▽ More
Among the myriad barriers to abortion access, crisis pregnancy centers (CPCs) pose an additional difficulty by targeting women with unexpected or "crisis" pregnancies in order to dissuade them from the procedure. Web search engines may prove to be another barrier, being in a powerful position to direct their users to health information, and above all, health services. In this study we ask, to what degree does Google Search provide quality responses to users searching for an abortion provider, specifically in terms of directing them to abortion clinics (ACs) or CPCs. To answer this question, we considered the scenario of a woman searching for abortion services online, and conducted 10 abortion-related queries from 467 locations across the United States once a week for 14 weeks. Overall, among Google's location results that feature businesses alongside a map, 79.4% were ACs, and 6.9% were CPCs. When an AC was returned, it was the closest known AC location 86.9% of the time. However, when a CPC appeared in a result set, it was the closest one to the search location 75.9% of the time. Examining correlates of AC results, we found that fewer AC results were returned for searches from poorer and rural areas, and those with TRAP laws governing AC facility and clinician requirements. We also observed that Google's performance on our queries significantly improved following a major algorithm update. These results have important implications concerning health access quality and equity, both for individual users and public health policy.
△ Less
Submitted 23 February, 2022;
originally announced February 2022.
-
Engagement Outweighs Exposure to Partisan and Unreliable News within Google Search
Authors:
Ronald E. Robertson,
Jon Green,
Damian J. Ruck,
Katherine Ognyanova,
Christo Wilson,
David Lazer
Abstract:
If popular online platforms systematically expose their users to partisan and unreliable news, they could potentially contribute to societal issues like rising political polarization. This concern is central to the echo chamber and filter bubble debates, which critique the roles that user choice and algorithmic curation play in guiding users to different online information sources. These roles can…
▽ More
If popular online platforms systematically expose their users to partisan and unreliable news, they could potentially contribute to societal issues like rising political polarization. This concern is central to the echo chamber and filter bubble debates, which critique the roles that user choice and algorithmic curation play in guiding users to different online information sources. These roles can be measured in terms of exposure, the URLs seen while using an online platform, and engagement, the URLs selected while on that platform or browsing the web more generally. However, due to the challenges of obtaining ecologically valid exposure data--what real users saw during their regular platform use--studies in this vein often only examine engagement data, or estimate exposure via simulated behavior or inference. Despite their centrality to the contemporary information ecosystem, few such studies have focused on web search, and even fewer have examined both exposure and engagement on any platform. To address these gaps, we conducted a two-wave study pairing surveys with ecologically valid measures of exposure and engagement on Google Search during the 2018 and 2020 US elections. We found that participants' partisan identification had a small and inconsistent relationship with the amount of partisan and unreliable news they were exposed to on Google Search, a more consistent relationship with the search results they chose to follow, and the most consistent relationship with their overall engagement. That is, compared to the news sources our participants were exposed to on Google Search, we found more identity-congruent and unreliable news sources in their engagement choices, both within Google Search and overall. These results suggest that exposure and engagement with partisan or unreliable news on Google Search are not primarily driven by algorithmic curation, but by users' own choices.
△ Less
Submitted 28 September, 2022; v1 submitted 31 December, 2021;
originally announced January 2022.
-
Self-Initiated Open World Learning for Autonomous AI Agents
Authors:
Bing Liu,
Eric Robertson,
Scott Grigsby,
Sahisnu Mazumder
Abstract:
As more and more AI agents are used in practice, it is time to think about how to make these agents fully autonomous so that they can learn by themselves in a self-motivated and self-supervised manner rather than being retrained periodically on the initiation of human engineers using expanded training data. As the real-world is an open environment with unknowns or novelties, detecting novelties or…
▽ More
As more and more AI agents are used in practice, it is time to think about how to make these agents fully autonomous so that they can learn by themselves in a self-motivated and self-supervised manner rather than being retrained periodically on the initiation of human engineers using expanded training data. As the real-world is an open environment with unknowns or novelties, detecting novelties or unknowns, characterizing them, accommodating or adapting to them, gathering ground-truth training data, and incrementally learning the unknowns/novelties are critical to making the agent more and more knowledgeable and powerful over time. The key challenge is how to automate the process so that it is carried out on the agent's own initiative and through its own interactions with humans and the environment. Since an AI agent usually has a performance task, characterizing each novelty becomes critical and necessary so that the agent can formulate an appropriate response to adapt its behavior to accommodate the novelty and to learn from it to improve the agent's adaptation capability and task performance. The process goes continually without termination. This paper proposes a theoretic framework for this learning paradigm to promote the research of building Self-initiated Open world Learning (SOL) agents. An example SOL agent is also described.
△ Less
Submitted 28 February, 2024; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP
Authors:
Sepideh Esmaeilpour,
Bing Liu,
Eric Robertson,
Lei Shu
Abstract:
In an out-of-distribution (OOD) detection problem, samples of known classes(also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper stu…
▽ More
In an out-of-distribution (OOD) detection problem, samples of known classes(also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution(OOD) detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.
△ Less
Submitted 22 March, 2022; v1 submitted 6 September, 2021;
originally announced September 2021.
-
Handwriting Recognition with Novelty
Authors:
Derek S. Prijatelj,
Samuel Grieggs,
Futoshi Yumoto,
Eric Robertson,
Walter J. Scheirer
Abstract:
This paper introduces an agent-centric approach to handle novelty in the visual recognition domain of handwriting recognition (HWR). An ideal transcription agent would rival or surpass human perception, being able to recognize known and new characters in an image, and detect any stylistic changes that may occur within or across documents. A key confound is the presence of novelty, which has contin…
▽ More
This paper introduces an agent-centric approach to handle novelty in the visual recognition domain of handwriting recognition (HWR). An ideal transcription agent would rival or surpass human perception, being able to recognize known and new characters in an image, and detect any stylistic changes that may occur within or across documents. A key confound is the presence of novelty, which has continued to stymie even the best machine learning-based algorithms for these tasks. In handwritten documents, novelty can be a change in writer, character attributes, writing attributes, or overall document appearance, among other things. Instead of looking at each aspect independently, we suggest that an integrated agent that can process known characters and novelties simultaneously is a better strategy. This paper formalizes the domain of handwriting recognition with novelty, describes a baseline agent, introduces an evaluation protocol with benchmark data, and provides experimentation to set the state-of-the-art. Results show feasibility for the agent-centric approach, but more work is needed to approach human-levels of reading ability, giving the HWR community a formal basis to build upon as they solve this challenging problem.
△ Less
Submitted 17 May, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Explainable Deep Learning for Uncovering Actionable Scientific Insights for Materials Discovery and Design
Authors:
Shusen Liu,
Bhavya Kailkhura,
Jize Zhang,
Anna M. Hiszpanski,
Emily Robertson,
Donald Loveland,
T. Yong-Jin Han
Abstract:
The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from deep neural networks due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learning m…
▽ More
The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from deep neural networks due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learning models by injecting domain-specific actionable attributes as tunable "knobs" in the analysis pipeline. By incorporating the domain knowledge in a generative modeling framework, we are not only able to better understand the behavior of these black-box models, but also provide scientists with actionable insights that can potentially lead to fundamental discoveries.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
Actionable Attribution Maps for Scientific Machine Learning
Authors:
Shusen Liu,
Bhavya Kailkhura,
Jize Zhang,
Anna M. Hiszpanski,
Emily Robertson,
Donald Loveland,
T. Yong-Jin Han
Abstract:
The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from the deep neural network due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learnin…
▽ More
The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from the deep neural network due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learning models by injecting domain-specific actionable concepts as tunable ``knobs'' in the analysis pipeline. By incorporating the domain knowledge with generative modeling, we are not only able to better understand the behavior of these black-box models, but also provide scientists with actionable insights that can potentially lead to fundamental discoveries.
△ Less
Submitted 30 June, 2020;
originally announced June 2020.
-
Quantifying the Impact of User Attention on Fair Group Representation in Ranked Lists
Authors:
Piotr Sapiezynski,
Wesley Zeng,
Ronald E. Robertson,
Alan Mislove,
Christo Wilson
Abstract:
In this work, we introduce a novel metric for auditing group fairness in ranked lists. Our approach offers two benefits compared to the state of the art. First, we offer a blueprint for modeling of user attention. Rather than assuming a logarithmic loss in importance as a function of the rank, we can account for varying user behaviors through parametrization. For example, we expect a user to see m…
▽ More
In this work, we introduce a novel metric for auditing group fairness in ranked lists. Our approach offers two benefits compared to the state of the art. First, we offer a blueprint for modeling of user attention. Rather than assuming a logarithmic loss in importance as a function of the rank, we can account for varying user behaviors through parametrization. For example, we expect a user to see more items during a viewing of a social media feed than when they inspect the results list of a single web search query. Second, we allow non-binary protected attributes to enable investigating inherently continuous attributes (\eg political alignment on the liberal to conservative spectrum) as well as to facilitate measurements across aggregated sets of search results, rather than separately for each result list. By combining these two elements into our metric, we are able to better address the human factors inherent in this problem. We measure the whole sociotechnical system, consisting of a ranking algorithm and individuals using it, instead of exclusively focusing on the ranking algorithm. Finally, we use our metric to perform three simulated fairness audits. We show that determining fairness of a ranked output necessitates knowledge (or a model) of the end-users of the particular service. Depending on their attention distribution function, a fixed ranking of results can appear biased both in favor and against a protected group.
△ Less
Submitted 13 May, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.