-
Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
Authors:
Joshua Ashkinaze,
Ruijia Guan,
Laura Kurek,
Eytan Adar,
Ceren Budak,
Eric Gilbert
Abstract:
Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
Submitted 4 July, 2024;
originally announced July 2024.
-
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Authors:
Yoonjoo Lee,
Kihoon Son,
Tae Soo Kim,
Jisu Kim,
John Joon Young Chung,
Eytan Adar,
Juho Kim
Abstract:
As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or alternatives. However, it is not obvious how the user will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistencies. Based on these categories, we conducted a study (N=252) in which participants were given one or more LLM-generated passages to an information-seeking question. We found that inconsistency within multiple LLM-generated outputs lowered the participants' perceived AI capacity, while also increasing their comprehension of the given information. Specifically, we observed that this positive effect of inconsistencies was most significant for participants who read two passages, compared to those who read three. Based on these findings, we present design implications that, instead of regarding LLM output inconsistencies as a drawback, we can reveal the potential inconsistencies to transparently indicate the limitations of these models and promote critical LLM usage.
Submitted 9 May, 2024;
originally announced May 2024.
-
Feminist Interaction Techniques: Deterring Non-Consensual Screenshots with Interaction Techniques
Authors:
Li Qiwei,
Francesca Lameiro,
Shefali Patel,
Cristi-Isaula-Reyes,
Eytan Adar,
Eric Gilbert,
Sarita Schoenebeck
Abstract:
Non-consensual Intimate Media (NCIM) refers to the distribution of sexual or intimate content without consent. NCIM is common and causes significant emotional, financial, and reputational harm. We developed Hands-Off, an interaction technique for messaging applications that deters non-consensual screenshots. Hands-Off requires recipients to perform a hand gesture in the air, above the device, to unlock media -- which makes simultaneous screenshotting difficult. A lab study shows that Hands-Off gestures are easy to perform and reduce non-consensual screenshots by 67 percent. We conclude by generalizing this approach and introduce the idea of Feminist Interaction Techniques (FIT), interaction techniques that encode feminist values and speak to societal problems, and reflect on FIT's opportunities and limitations.
Submitted 29 April, 2024;
originally announced April 2024.
-
Imagine a dragon made of seaweed: How images enhance learning in Wikipedia
Authors:
Anita Silva,
Maria Tracy,
Katharina Reinecke,
Eytan Adar,
Miriam Redi
Abstract:
Though images are ubiquitous across Wikipedia, it is not obvious that the image choices optimally support learning. When well selected, images can enhance learning by dual coding, complementing, or supporting articles. When chosen poorly, images can mislead, distract, and confuse. We developed a large dataset containing 470 questions & answers to 94 Wikipedia articles with images on a wide range of topics. Through an online experiment (n=704), we determined whether the images displayed alongside the text of the article are effective in helping readers understand and learn. For certain tasks, such as learning to identify targets visually (e.g., "which of these pictures is a gujia?"), article images significantly improve accuracy. Images did not significantly improve general knowledge questions (e.g., "where are gujia from?"). Most interestingly, only some images helped with visual knowledge questions (e.g., "what shape is a gujia?"). Using our findings, we reflect on the implications for editors and tools to support image selection.
Submitted 12 March, 2024;
originally announced March 2024.
-
Authors' Values and Attitudes Towards AI-bridged Scalable Personalization of Creative Language Arts
Authors:
Taewook Kim,
Hyomin Han,
Eytan Adar,
Matthew Kay,
John Joon Young Chung
Abstract:
Generative AI has the potential to create a new form of interactive media: AI-bridged creative language arts (CLA), which bridge the author and audience by personalizing the author's vision to the audience's context and taste at scale. However, it is unclear what the authors' values and attitudes would be regarding AI-bridged CLA. To identify these values and attitudes, we conducted an interview study with 18 authors across eight genres (e.g., poetry, comics) by presenting speculative but realistic AI-bridged CLA scenarios. We identified three benefits derived from the dynamics between author, artifact, and audience: those that 1) authors get from the process, 2) audiences get from the artifact, and 3) authors get from the audience. We found how AI-bridged CLA would either promote or reduce these benefits, along with authors' concerns. We hope our investigation hints at how AI can provide intriguing experiences to CLA audiences while promoting authors' values.
Submitted 1 March, 2024;
originally announced March 2024.
-
PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
Authors:
John Joon Young Chung,
Eytan Adar
Abstract:
While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language to mix prompts that express challenging concepts. Just as we iteratively tune colors through layered placements of paint on a physical canvas, PromptPaint similarly allows users to apply different prompts to different canvas areas and times of the generative process. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models. With PromptPaint we provide insight into future steerable generative tools.
Submitted 9 August, 2023;
originally announced August 2023.
-
viz2viz: Prompt-driven stylized visualization generation using a diffusion model
Authors:
Jiaqi Wu,
John Joon Young Chung,
Eytan Adar
Abstract:
Creating stylized visualization requires going beyond the limited, abstract, geometric marks produced by most tools. Rather, the designer builds stylized idioms where the marks are both transformed (e.g., photographs of candles instead of bars) and also synthesized into a 'scene' that pushes the boundaries of traditional visualizations. To support this, we introduce viz2viz, a system for transforming visualizations with a textual prompt to a stylized form. The system follows a high-level recipe that leverages various generative methods to produce new visualizations that retain the properties of the original dataset. While the base recipe is consistent across many visualization types, we demonstrate how it can be specifically adapted to the creation of different visualization types (bar charts, area charts, pie charts, and network visualizations). Our approach introduces techniques for using different prompts for different marks (i.e., each bar can be something completely different) while still retaining image "coherence." We conclude with an evaluation of the approach and discussion on extensions and limitations.
Submitted 4 April, 2023;
originally announced April 2023.
-
Roboviz: A Game-Centered Project for Information Visualization Education
Authors:
Eytan Adar,
Elsie Lee-Robbins
Abstract:
Due to their pedagogical advantages, large final projects in information visualization courses have become standard practice. Students take on a client--real or simulated--a dataset, and a vague set of goals to create a complete visualization or visual analytics product. Unfortunately, many projects suffer from ambiguous goals, over- or under-constrained client expectations, and data constraints that have students spending their time on non-visualization problems (e.g., data cleaning). These are important skills, but are often secondary course objectives, and unforeseen problems can seriously hinder students. We created an alternative for our information visualization course: Roboviz, a real-time game for students to play by building a visualization-focused interface. By designing the game mechanics around four different data types, the project allows students to create a wide array of interactive visualizations. Student teams play against their classmates with the objective of collecting the most (good) robots. The flexibility of the strategies encourages variability, a range of approaches, and solving wicked design constraints. We describe the construction of this game and report on student projects over two years. We further show how the game mechanics can be extended or adapted to other game-based projects.
Submitted 8 August, 2022;
originally announced August 2022.
-
Affective Learning Objectives for Communicative Visualizations
Authors:
Elsie Lee-Robbins,
Eytan Adar
Abstract:
When designing communicative visualizations, we often focus on goals that seek to convey patterns, relations, or comparisons (cognitive learning objectives). We pay less attention to affective intents--those that seek to influence or leverage the audience's opinions, attitudes, or values in some way. Affective objectives may range in outcomes from making the viewer care about the subject, strengthening a stance on an opinion, or leading them to take further action. Because such goals are often considered a violation of perceived 'neutrality' or are 'political,' designers may resist or be unable to describe these intents, let alone formalize them as learning objectives. While there are notable exceptions--such as advocacy visualizations or persuasive cartography--we find that visualization designers rarely acknowledge or formalize affective objectives. Through interviews with visualization designers, we expand on prior work on using learning objectives as a framework for describing and assessing communicative intent. Specifically, we extend and revise the framework to include a set of affective learning objectives. This structured taxonomy can help designers identify and declare their goals and compare and assess designs in a more principled way. Additionally, the taxonomy can enable external critique and analysis of visualizations. We illustrate the use of the taxonomy with a critical analysis of an affective visualization.
Submitted 8 August, 2022;
originally announced August 2022.
-
Human-AI Guidelines in Practice: Leaky Abstractions as an Enabler in Collaborative Software Teams
Authors:
Hariharan Subramonyam,
Jane Im,
Colleen Seifert,
Eytan Adar
Abstract:
In conventional software development, user experience (UX) designers and engineers collaborate through separation of concerns (SoC): designers create human interface specifications, and engineers build to those specifications. However, we argue that Human-AI systems thwart SoC because human needs must shape the design of the AI interface, the underlying AI sub-components, and training data. How do designers and engineers currently collaborate on AI and UX design? To find out, we interviewed 21 industry professionals (UX researchers, AI engineers, data scientists, and managers) across 14 organizations about their collaborative work practices and associated challenges. We find that hidden information encapsulated by SoC challenges collaboration across design and engineering concerns. Practitioners describe inventing ad-hoc representations exposing low-level design and implementation details (which we characterize as leaky abstractions) to "puncture" SoC and share information across expertise boundaries. We identify how leaky abstractions are employed to collaborate at the AI-UX boundary and formalize a process of creating and using leaky abstractions.
Submitted 4 July, 2022;
originally announced July 2022.
-
Sensible AI: Re-imagining Interpretability and Explainability using Sensemaking Theory
Authors:
Harmanpreet Kaur,
Eytan Adar,
Eric Gilbert,
Cliff Lampe
Abstract:
Understanding how ML models work is a prerequisite for responsibly designing, deploying, and using ML-based systems. With interpretability approaches, ML can now offer explanations for its outputs to aid human understanding. Though these approaches rely on guidelines for how humans explain things to each other, they ultimately solve for improving the artifact -- an explanation. In this paper, we propose an alternate framework for interpretability grounded in Weick's sensemaking theory, which focuses on who the explanation is intended for. Recent work has advocated for the importance of understanding stakeholders' needs -- we build on this by providing concrete properties (e.g., identity, social context, environmental cues, etc.) that shape human understanding. We use an application of sensemaking in organizations as a template for discussing design guidelines for Sensible AI, AI that factors in the nuances of human cognition when trying to explain itself.
Submitted 10 May, 2022;
originally announced May 2022.
-
Visualizing Uncertainty in Probabilistic Graphs with Network Hypothetical Outcome Plots (NetHOPs)
Authors:
Dongping Zhang,
Eytan Adar,
Jessica Hullman
Abstract:
Probabilistic graphs are challenging to visualize using the traditional node-link diagram. Encoding edge probability using visual variables like width or fuzziness makes it difficult for users of static network visualizations to estimate network statistics like densities, isolates, path lengths, or clustering under uncertainty. We introduce Network Hypothetical Outcome Plots (NetHOPs), a visualization technique that animates a sequence of network realizations sampled from a network distribution defined by probabilistic edges. NetHOPs employ an aggregation and anchoring algorithm used in dynamic and longitudinal graph drawing to parameterize layout stability for uncertainty estimation. We present a community matching algorithm to enable visualizing the uncertainty of cluster membership and community occurrence. We describe the results of a study in which 51 network experts used NetHOPs to complete a set of common visual analysis tasks and reported how they perceived network structures and properties subject to uncertainty. Participants' estimates fell, on average, within 11% of the ground truth statistics, suggesting NetHOPs can be a reasonable approach for enabling network analysts to reason about multiple properties under uncertainty. Participants appeared to articulate the distribution of network statistics slightly more accurately when they could manipulate the layout anchoring and the animation speed. Based on these findings, we synthesize design recommendations for developing and using animated visualizations for probabilistic networks.
Submitted 22 August, 2021;
originally announced August 2021.
-
Learning Objectives, Insights, and Assessments: How Specification Formats Impact Design
Authors:
Elsie Lee-Robbins,
Shiqing He,
Eytan Adar
Abstract:
Despite the ubiquity of communicative visualizations, specifying communicative intent during design is ad hoc. Whether we are selecting from a set of visualizations, commissioning someone to produce them, or creating them ourselves, an effective way of specifying intent can help guide this process. Ideally, we would have a concise and shared specification language. In previous work, we have argued that communicative intents can be viewed as a learning/assessment problem (i.e., what should the reader learn and what test should they do well on). Learning-based specification formats are linked (e.g., assessments are derived from objectives) but some may more effectively specify communicative intent. Through a large-scale experiment, we studied three specification types: learning objectives, insights, and assessments. Participants, guided by one of these specifications, rated their preferences for a set of visualization designs. Then, we evaluated the set of visualization designs to assess which specification led participants to prefer the most effective visualizations. We find that while all specification types have benefits over no-specification, each format has its own advantages. Our results show that learning objective-based specifications helped participants the most in visualization selection. We also identify situations in which specifications may be insufficient and assessments are vital.
Submitted 6 August, 2021;
originally announced August 2021.
-
Towards A Process Model for Co-Creating AI Experiences
Authors:
Hariharan Subramonyam,
Colleen Seifert,
Eytan Adar
Abstract:
Thinking of technology as a design material is appealing. It encourages designers to explore the material's properties to understand its capabilities and limitations, a prerequisite to generative design thinking. However, as a material, AI resists this approach because its properties emerge as part of the design process itself. Therefore, designers and AI engineers must collaborate in new ways to create both the material and its application experience. We investigate the co-creation process through a design study with 10 pairs of designers and engineers. We find that design 'probes' with user data are a useful tool in defining AI materials. Through data probes, designers construct designerly representations of the envisioned AI experience (AIX) to identify desirable AI characteristics. Data probes facilitate divergent thinking, material testing, and design validation. Based on our findings, we propose a process model for co-creating AIX and offer design considerations for incorporating data probes in design tools.
Submitted 6 May, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Communicative Visualizations as a Learning Problem
Authors:
Eytan Adar,
Elsie Lee
Abstract:
Significant research has provided robust task and evaluation languages for the analysis of exploratory visualizations. Unfortunately, these taxonomies fail when applied to communicative visualizations. Instead, designers often resort to evaluating communicative visualizations from the cognitive efficiency perspective: "can the recipient accurately decode my message/insight?" However, designers are unlikely to be satisfied if the message went 'in one ear and out the other.' The consequence of this inconsistency is that it is difficult to design or select between competing options in a principled way. The problem we address is the fundamental mismatch between how designers want to describe their intent, and the language they have. We argue that visualization designers can address this limitation through a learning lens: that the recipient is a student and the designer a teacher. By using learning objectives, designers can better define, assess, and compare communicative visualizations. We illustrate how the learning-based approach provides a framework for understanding a wide array of communicative goals. To understand how the framework can be applied (and its limitations), we surveyed and interviewed members of the Data Visualization Society using their own visualizations as a probe. Through this study we identified the broad range of objectives in communicative visualizations and the prevalence of certain objective types.
Submitted 15 September, 2020;
originally announced September 2020.
-
LAMVI-2: A Visual Tool for Comparing and Tuning Word Embedding Models
Authors:
Xin Rong,
Joshua Luckson,
Eytan Adar
Abstract:
Tuning machine learning models, particularly deep learning architectures, is a complex process. Automated hyperparameter tuning algorithms often depend on specific optimization metrics. However, in many situations, a developer trades one metric against another: accuracy versus overfitting, precision versus recall, model size versus accuracy, etc. With deep learning, not only are the model's representations opaque, but the model's behavior when parameter "knobs" are changed may also be unpredictable. Thus, picking the "best" model often requires time-consuming model comparison. In this work, we introduce LAMVI-2, a visual analytics system to support a developer in comparing hyperparameter settings and outcomes. By focusing on word-embedding models ("deep learning for text") we integrate views to compare both high-level statistics and internal model behaviors (e.g., comparing word 'distances'). We demonstrate how developers can work with LAMVI-2 to more quickly and accurately narrow down an appropriate and effective application-specific model.
Submitted 22 October, 2018;
originally announced October 2018.
-
Extracting Inter-community Conflicts in Reddit
Authors:
Srayan Datta,
Eytan Adar
Abstract:
Anti-social behaviors in social media can happen both at user and community levels. While a great deal of attention is on the individual as an 'aggressor,' the banning of entire Reddit subcommunities (i.e., subreddits) demonstrates that this is a multi-layer concern. Existing research on inter-community conflict has largely focused on specific subcommunities or ideological opponents. However, antagonistic behaviors may be more pervasive and integrate into the broader network. In this work, we study the landscape of conflicts among subreddits by deriving higher-level (community) behaviors from the way individuals are sanctioned and rewarded. By constructing a conflict network, we characterize different patterns in subreddit-to-subreddit conflicts as well as communities of 'co-targeted' subreddits. By analyzing the dynamics of these interactions, we also observe that the conflict focus shifts over time.
Submitted 13 August, 2018;
originally announced August 2018.
-
Learning Word Relatedness over Time
Authors:
Guy D. Rosin,
Eytan Adar,
Kira Radinsky
Abstract:
Search systems are often focused on providing relevant results for the "now", assuming both corpora and user needs that focus on the present. However, many corpora today reflect significant longitudinal collections ranging from 20 years of the Web to hundreds of years of digitized newspapers and books. Understanding the temporal intent of the user and retrieving the most relevant historical content has become a significant challenge. Common search features, such as query expansion, leverage the relationship between terms but cannot function well across all times when relationships vary temporally. In this work, we introduce a temporal relationship model that is extracted from longitudinal data collections. The model supports the task of identifying, given two words, when they relate to each other. We present an algorithmic framework for this task and show its application for the task of query expansion, achieving high gain.
Submitted 30 July, 2017; v1 submitted 25 July, 2017;
originally announced July 2017.
-
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Authors:
Matthew Burgess,
Eytan Adar,
Michael Cafarella
Abstract:
Many real networks that are inferred or collected from data are incomplete due to missing edges. Missing edges can be inherent to the dataset (Facebook friend links will never be complete) or the result of sampling (one may only have access to a portion of the data). The consequence is that downstream analyses that consume the network will often yield less accurate results than if the edges were complete. Community detection algorithms, in particular, often suffer when critical intra-community edges are missing. We propose a novel consensus clustering algorithm to enhance community detection on incomplete networks. Our framework utilizes existing community detection algorithms that process networks imputed by our link-prediction-based algorithm. The framework then merges their multiple outputs into a final consensus output. On average, our method boosts the performance of existing algorithms by 7% on artificial data and 17% on ego networks collected from Facebook.
Submitted 4 June, 2015;
originally announced June 2015.
-
Information Evolution in Social Networks
Authors:
Lada A. Adamic,
Thomas M. Lento,
Eytan Adar,
Pauline C. Ng
Abstract:
Social networks readily transmit information, albeit with less than perfect fidelity. We present a large-scale measurement of this imperfect information copying mechanism by examining the dissemination and evolution of thousands of memes, collectively replicated hundreds of millions of times in the online social network Facebook. The information undergoes an evolutionary process that exhibits several regularities. A meme's mutation rate characterizes the population distribution of its variants, in accordance with the Yule process. Variants further apart in the diffusion cascade have greater edit distance, as would be expected in an iterative, imperfect replication process. Some text sequences can confer a replicative advantage; these sequences are abundant and transfer "laterally" between different memes. Subpopulations of the social network can preferentially transmit a specific variant of a meme if the variant matches their beliefs or culture. Understanding the mechanism driving change in diffusing information has important implications for how we interpret and harness the information that reaches us through our social networks.
Submitted 27 February, 2014;
originally announced February 2014.
-
Tycoon: an Implementation of a Distributed, Market-based Resource Allocation System
Authors:
Kevin Lai,
Lars Rasmusson,
Eytan Adar,
Stephen Sorkin,
Li Zhang,
Bernardo A. Huberman
Abstract:
Distributed clusters like the Grid and PlanetLab enable the same statistical multiplexing efficiency gains for computing as the Internet provides for networking. One major challenge is allocating resources in an economically efficient and low-latency way. A common solution is proportional share, where users each get resources in proportion to their pre-defined weight. However, this does not allow users to differentiate the value of their jobs. This leads to economic inefficiency. In contrast, systems that require reservations impose a high latency (typically minutes to hours) to acquire resources.
We present Tycoon, a market-based distributed resource allocation system based on proportional share. The key advantages of Tycoon are that it allows users to differentiate the value of their jobs, its resource acquisition latency is limited only by communication delays, and it imposes no manual bidding overhead on users. We present experimental results using a prototype implementation of our design.
Submitted 8 December, 2004;
originally announced December 2004.