Search | arXiv e-print repository

Lost in Magnitudes: Exploring the Design Space for Visualizing Data with Large Value Ranges

Authors: Katerina Batziakoudi, Florent Cabric, Stéphanie Rey, Jean-Daniel Fekete

Abstract: We explore the design space for the static visualization of datasets with quantitative attributes that vary over multiple orders of magnitude (we call these attributes Orders of Magnitude Values, or OMVs) and provide design guidelines and recommendations on effective visual encodings for OMVs. Current charts rely on linear or logarithmic scales to visualize values, which limits how well simple tasks can be performed on OMVs. In particular, linear scales prevent the reading of smaller magnitudes and their comparison, while logarithmic scales are challenging for the general public to understand. Our design space leverages the approach of dividing OMVs into two parts, mantissa and exponent, in a way similar to scientific notation. This separation allows each part to be visually encoded. For our exploration, we use four datasets, each with two attributes: an OMV, divided into mantissa and exponent, and a second attribute that is nominal, ordinal, temporal, or quantitative. We start from the original design space described by the Grammar of Graphics and systematically generate all possible visualizations for these datasets, employing different marks and visual channels. We refine this design space by enforcing integrity constraints from the visualization and graphical perception literature. Through a qualitative assessment of all viable combinations, we discuss the most effective visualizations for OMVs, focusing on channel and task effectiveness.
The article's main contributions are 1) the presentation of the design space of OMVs, 2) the generation of a large number of OMV visualizations, among which some are novel and effective, 3) the refined definition of a scale that we call E+M for OMVs, and 4) guidelines and recommendations for designing effective OMV visualizations. These efforts aim to enrich visualization systems to better support data with OMVs and guide future research.

Submitted 23 April, 2024; originally announced April 2024.
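As a rough illustration of the mantissa/exponent split described in the abstract above (the same decomposition scientific notation uses), the following Python sketch decomposes a positive value into a mantissa in [1, 10) and an integer exponent. The helper name `split_omv` is hypothetical and this is not the authors' E+M implementation; it assumes strictly positive inputs and base 10.

```python
import math

def split_omv(value: float) -> tuple[float, int]:
    """Split a positive value into (mantissa, exponent), 1 <= mantissa < 10,
    so that value == mantissa * 10**exponent (up to floating-point error).

    Hypothetical helper for illustration; assumes value > 0 and base 10."""
    if value <= 0:
        raise ValueError("this sketch only handles strictly positive values")
    exponent = math.floor(math.log10(value))
    mantissa = value / 10 ** exponent
    # Guard against floating-point rounding pushing the mantissa
    # to 10.0 or just below 1.0.
    if mantissa >= 10:
        mantissa /= 10
        exponent += 1
    elif mantissa < 1:
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

# Made-up values spanning several orders of magnitude.
for v in [3.2, 450.0, 78_000.0, 0.0061]:
    m, e = split_omv(v)
    print(f"{v:>10} -> mantissa {m:.2f}, exponent {e}")
```

Each part can then be mapped to its own visual channel (for example, exponent to color and mantissa to length), which is the kind of combination the design space enumerates.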

arXiv:2210.06562

doi 10.1109/TVCG.2022.3231230

Scalability in Visualization

Authors: Gaëlle Richer, Alexis Pister, Moataz Abdelaal, Jean-Daniel Fekete, Michael Sedlmair, Daniel Weiskopf

Abstract: We introduce a conceptual model for scalability designed for visualization research. With this model, we systematically analyze over 120 visualization publications from 1990-2020 to characterize the different notions of scalability in these works. While many papers have addressed scalability issues, our survey identifies a lack of consistency in the use of the term in the visualization research community. We address this issue by introducing a consistent terminology meant to help visualization researchers better characterize the scalability aspects in their research. It also provides multiple methods for supporting the claim that a work is "scalable". Our model is centered around an effort function with inputs and outputs. The inputs are the problem size and resources, whereas the outputs are the actual efforts, for instance, in terms of computational run time or visual clutter. We select representative examples to illustrate different approaches and facets of what scalability can mean in visualization literature. Finally, targeting the diverse crowd of visualization researchers without a scalability tradition, we provide a set of recommendations for how scalability can be presented in a clear and consistent way to improve fair comparison between visualization techniques and systems and foster reproducibility.

Submitted 14 December, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

ACM Class: H.5.2

arXiv:2209.13498

Characterizing Uncertainty in the Visual Text Analysis Pipeline

Authors: Pantea Haghighatkhah, Mennatallah El-Assady, Jean-Daniel Fekete, Narges Mahyar, Carita Paradis, Vasiliki Simaki, Bettina Speckmann

Abstract: Current visual text analysis approaches rely on sophisticated processing pipelines. Each step of such a pipeline potentially amplifies any uncertainties from the previous step. To ensure the comprehensibility and interoperability of the results, it is of paramount importance to clearly communicate the uncertainty not only of the output but also within the pipeline. In this paper, we characterize the sources of uncertainty along the visual text analysis pipeline. Within its three phases of labeling, modeling, and analysis, we identify six sources, discuss the type of uncertainty they create, and describe how they propagate.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.11534

doi 10.1109/BELIV57783.2022.00008

An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper

Authors: Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, Narges Mahyar

Abstract: Appropriate evaluation and experimental design are fundamental for empirical sciences, particularly in data-driven fields. Due to the successes in computational modeling of languages, for instance, research outcomes are having an increasingly immediate impact on end users. As the gap in adoption by end users decreases, the need increases to ensure that tools and models developed by the research communities and practitioners are reliable, trustworthy, and supportive of the users in their goals. In this position paper, we focus on the issues of evaluating visual text analytics approaches. We take an interdisciplinary perspective from the visualization and natural language processing communities, as we argue that the design and validation of visual text analytics include concerns beyond computational or visual/interactive methods on their own. We identify four key groups of challenges for evaluating visual text analytics approaches (data ambiguity, experimental design, user trust, and "big picture" concerns) and provide suggestions for research opportunities from an interdisciplinary perspective.

Submitted 20 December, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Published in Proceedings of the 2022 IEEE Workshop on Evaluation and Beyond - Methodological Approaches to Visualization (BELIV '22). ACM 2012 CCS: Human-centered computing, Visualization, Visualization design and evaluation methods

arXiv:2005.08101

doi 10.1177/1473871621991539

The Missing Path: Analysing Incompleteness in Knowledge Graphs

Authors: Marie Destandau, Jean-Daniel Fekete

Abstract: Knowledge Graphs (KG) allow merging and connecting heterogeneous data despite their differences; they are incomplete by design. Yet, KG data producers need to ensure the best level of completeness, as far as possible. The difficulty is that they have no means to distinguish cases where incomplete entities could and should be fixed. We present a new visualisation tool, The Missing Path, to support them in identifying coherent subsets of entities that can be repaired. It relies on a map, grouping entities according to their incomplete profile. The map is coordinated with histograms and stacked charts to support interactive exploration and analysis; the summary of a subset can be compared with that of the full collection to reveal its distinctive features. We conduct an iterative design process and evaluation with 9 Wikidata contributors. Participants gain insights and find various strategies to identify coherent subsets to be fixed.

Submitted 13 January, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: 10 pages, 11 figures

arXiv:2005.02972

doi 10.1109/TVCG.2020.3030347

Integrating Prior Knowledge in Mixed Initiative Social Network Clustering

Authors: Alexis Pister, Paolo Buono, Jean-Daniel Fekete, Catherine Plaisant, Paola Valdivia

Abstract: We propose a new approach -- called PK-clustering -- to help social scientists create meaningful clusters in social networks. Many clustering algorithms exist but most social scientists find them difficult to understand, and tools do not provide any guidance to choose algorithms, or to evaluate results taking into account the prior knowledge of the scientists. Our work introduces a new clustering approach and a visual analytics user interface that address this issue. It is based on a process that 1) captures the prior knowledge of the scientists as a set of incomplete clusters, 2) runs multiple clustering algorithms (similarly to clustering ensemble methods), 3) visualizes the results of all the algorithms ranked and summarized by how well each algorithm matches the prior knowledge, 4) evaluates the consensus between user-selected algorithms, and 5) allows users to review details and iteratively update the acquired knowledge. We describe our approach using an initial functional prototype, then provide two examples of use and early feedback from social scientists. We believe our clustering approach offers a novel constructive method to iteratively build knowledge while avoiding being overly influenced by the results of often randomly selected black-box clustering algorithms.

Submitted 17 May, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

ACM Class: H.5.2

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2021

arXiv:2002.09949

Path Outlines: Browsing Path-Based Summaries of Knowledge Graphs

Authors: Marie Destandau, Olivier Corby, Jean-Daniel Fekete, Alain Giboin

Abstract: Knowledge Graphs have become a ubiquitous technology powering search engines, recommender systems, connected objects, corporate knowledge management and Open Data. They rely on small units of information named triples that can be combined to form higher-level statements across datasets following information needs. But data producers face a problem: reconstituting chains of triples has a high cognitive cost, which hinders them from gaining meaningful overviews of their own datasets. We introduce path outlines: conceptual objects characterizing sequences of triples with descriptive statistics. We interview 11 data producers to evaluate their interest. We present Path Outlines, a tool to browse path-based summaries, based on coordinated views with 2 novel visualisations. We compare Path Outlines with the current baseline technique in an experiment with 36 participants. We show that it is 3 times faster and leads to better task completion and fewer errors, and that participants prefer it and find tasks easier with it.

Submitted 8 October, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

Comments: 16 pages, 9 figures

arXiv:1912.08101

Visualizing and Analyzing Entity Activity on the Bitcoin Network

Authors: Christoph Kinkeldey, Jean-Daniel Fekete, Tanja Blascheck, Petra Isenberg

Abstract: We present BitConduite, a visual analytics tool for explorative analysis of financial activity within the Bitcoin network. Bitcoin is the largest cryptocurrency worldwide and a phenomenon that challenges the underpinnings of traditional financial systems: its users can send money pseudo-anonymously while circumventing traditional banking systems. Yet, despite the fact that all financial transactions in Bitcoin are available in an openly accessible online ledger (the blockchain), not much is known about how different types of actors in the network (we call them entities) actually use Bitcoin. BitConduite offers an entity-centered view on transactions, making the data accessible to non-technical experts through a guided workflow for classification of entities according to several activity metrics. Other novelties are the possibility to cluster entities by similarity and the exploration of transaction data at different scales, from large groups of entities down to a single entity and the associated transactions. Two use cases illustrate the workflow of the system and its analytic power. We report on feedback regarding the approach and the software tool gathered during a workshop with domain experts, and we discuss the potential of the approach based on our findings.

Submitted 18 December, 2019; v1 submitted 17 December, 2019; originally announced December 2019.

arXiv:1812.08032

Progressive Data Science: Potential and Challenges

Authors: Cagatay Turkay, Nicola Pezzotti, Carsten Binnig, Hendrik Strobelt, Barbara Hammer, Daniel A. Keim, Jean-Daniel Fekete, Themis Palpanas, Yunhai Wang, Florin Rusu

Abstract: Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining highly depend on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allowing iterative refinements until converging to a final result. Enabling the user to interact with the intermediate results allows an early detection of erroneous or suboptimal choices, the guided definition of modifications to the pipeline and their quick assessment. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed up the overall data science process.

Submitted 12 September, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

ACM Class: H.5.2; H.3.m; I.2.m; I.3.m

arXiv:1705.05283

Visualizing Dimensionality Reduction Artifacts: An Evaluation

Authors: Nicolas Heulot, Jean-Daniel Fekete, Michael Aupetit

Abstract: Multidimensional scaling allows visualizing high-dimensional data as 2D maps with the premise that insights in 2D reveal valid information in high dimensions. However, the resulting projections suffer from artifacts such as bad local neighborhood preservation and cluster tearing. Interactively coloring the projection according to the discrepancy between original proximities relative to a reference item reveals these artifacts, but it is not clear if conveying these proximities using color and displaying only local information really helps the visual analysis of projections. We conducted a controlled experiment to investigate the relevance of this interactive technique to help the visual analysis of any projection regardless of its quality. We compared the bare projection to the interactive coloring of the original proximities on different visual analysis tasks involving outliers and clusters. Results indicate that the interactive coloring is worthwhile for local tasks, as it is significantly more robust to projection artifacts than the bare projection. However, this interactive technique does not help significantly with visual clustering tasks, for which projections already give a suitable overview.

Submitted 15 May, 2017; originally announced May 2017.

ACM Class: H.5.2
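One simple reading of the interactive coloring described in the abstract above is: pick a reference item, compute each item's distance to it in the original high-dimensional space, and map those distances to a color ramp on the 2D projection. The sketch below computes normalized values in [0, 1] for that ramp. The function `proximity_colors`, the Euclidean distance, and the min-max normalization are all assumptions for illustration, not the evaluated system's implementation.

```python
import math
from typing import Sequence

def proximity_colors(points: Sequence[Sequence[float]], ref: int) -> list[float]:
    """Return one value in [0, 1] per item: 0 for the items closest to
    points[ref] in the ORIGINAL space, 1 for the farthest.

    Hypothetical helper; assumes Euclidean distance and min-max scaling.
    A renderer would map these values to colors on the 2D projection."""
    dists = [math.dist(p, points[ref]) for p in points]
    lo, hi = min(dists), max(dists)
    if hi == lo:
        # All items equidistant: give them the same color.
        return [0.0 for _ in dists]
    return [(d - lo) / (hi - lo) for d in dists]

# Three made-up 4-dimensional items; item 0 is the reference.
data = [[0, 0, 0, 0], [1, 0, 0, 0], [3, 4, 0, 0]]
print(proximity_colors(data, ref=0))  # [0.0, 0.2, 1.0]
```

Because the colors come from the original space rather than the 2D layout, two items drawn close together but colored very differently signal a projection artifact.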

arXiv:1612.05239

doi 10.1145/3092906

The CENDARI Infrastructure

Authors: Nadia Boukhelifa, Mike Bryant, Nataša Bulatović, Ivan Čukić, Jean-Daniel Fekete, Milica Knežević, Jörg Lehmann, David Stuart, Carsten Thiel

Abstract: The CENDARI infrastructure is a research-supporting platform designed to provide tools for transnational historical research, focusing on two topics: Medieval culture and World War I. It offers end users modern web-based tools relying on a sophisticated infrastructure to collect, enrich, annotate, and search through large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures. We describe how we gathered requirements through multiple methods to understand the historians' needs and derive an abstract workflow to support them. We then outline the tools we have built, tying their technical descriptions to the user requirements. The main tools are the Note Taking Environment and its faceted search capabilities, the Data Integration platform including the Data API, supporting semantic enrichment through entity recognition, and the environment supporting the software development processes throughout the project to keep both technical partners and researchers in the loop. The outcomes are technical, together with the new resources developed and gathered and the research workflow that has been described and documented.

Submitted 15 December, 2016; originally announced December 2016.

arXiv:1607.05162

Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

Authors: Jean-Daniel Fekete, Romain Primet

Abstract: Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer be completed in a few seconds and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, that brings a low-latency guarantee to the programming-language level by performing computations in a progressive fashion. Moving progressive computation to the language level frees programmers of exploratory data analysis systems from having to implement the whole analytics pipeline in a progressive way from scratch, streamlining the implementation of scalable exploratory data analysis systems. This article describes the new paradigm through a prototype implementation called ProgressiVis, and explains the requirements it implies through examples.

Submitted 18 July, 2016; originally announced July 2016.

Comments: 10 pages

ACM Class: K.6.1; K.7.m; H.5.m
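To make the idea of progressive computation concrete, the sketch below processes data in fixed-size chunks and yields an updated approximate mean after each chunk, so a caller can display intermediate results and stop early. It is a generic illustration under our own assumptions: the `progressive_mean` generator and the chunk size are hypothetical, and this is not the ProgressiVis API.

```python
from typing import Iterator, Sequence

def progressive_mean(data: Sequence[float],
                     chunk_size: int = 1000) -> Iterator[tuple[int, float]]:
    """Yield (items_processed, running_mean) after each chunk.

    Each step does a bounded amount of work, so the latency between
    successive approximations depends on chunk_size, not on len(data).
    Hypothetical helper for illustration only."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0.0
    count = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count

# Usage: consume approximations as they arrive; a UI could redraw here
# or let the analyst cancel once the estimate looks good enough.
values = [float(i) for i in range(10_000)]
for processed, approx in progressive_mean(values, chunk_size=2_500):
    print(f"after {processed} items: mean ~ {approx:.1f}")
```

Using a generator keeps control with the caller, which decides when to refresh the display or abandon the computation; the final yielded value equals the exact mean.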

arXiv:0705.0599

doi 10.1109/TVCG.2007.70582

NodeTrix: Hybrid Representation for Analyzing Social Networks

Authors: Nathalie Henry, Jean-Daniel Fekete, Michael McGuffin

Abstract: The need to visualize large social networks is growing as hardware capabilities make analyzing large networks feasible and many new data sets become available. Unfortunately, the visualizations in existing systems do not satisfactorily answer the basic dilemma of being readable both for the global structure of the network and also for detailed analysis of local communities. To address this problem, we present NodeTrix, a hybrid representation for networks that combines the advantages of two traditional representations: node-link diagrams are used to show the global structure of a network, while arbitrary portions of the network can be shown as adjacency matrices to better support the analysis of communities. A key contribution is a set of interaction techniques. These allow analysts to create a NodeTrix visualization by dragging selections from either a node-link or a matrix, flexibly manipulate the NodeTrix representation to explore the dataset, and create meaningful summary visualizations of their findings. Finally, we present a case study applying NodeTrix to the analysis of the InfoVis 2004 coauthorship dataset to illustrate the capabilities of NodeTrix as both an exploration tool and an effective means of communicating results.

Submitted 21 June, 2007; v1 submitted 4 May, 2007; originally announced May 2007.

Showing 1–13 of 13 results for author: Fekete, J