Showing 1–18 of 18 results for author: Valduriez, P

  1. arXiv:2312.10935  [pdf, other]

    cs.DC cs.AI cs.LG

    AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices

    Authors: Ji Liu, Tianshi Che, Yang Zhou, Ruoming Jin, Huaiyu Dai, Dejing Dou, Patrick Valduriez

    Abstract: Federated Learning (FL) has made significant progress recently, enabling collaborative model training on distributed data over edge devices. Iterative gradient or model exchanges between devices and the centralized server in the standard FL paradigm suffer from severe efficiency bottlenecks on the server. While enabling collaborative training without a central server, existing decentralize…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: To appear in SDM 2024, 15 pages

  2. KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

    Authors: Daniel Rosendo, Kate Keahey, Alexandru Costan, Matthieu Simonin, Patrick Valduriez, Gabriel Antoniu

    Abstract: Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such complex Edge-to-Cloud Continuu…

    Submitted 24 July, 2023; originally announced July 2023.

    Journal ref: ACM REP '23: ACM Conference on Reproducibility and Replicability, Jun 2023, Santa Cruz, California, United States. pp.62-73

  3. arXiv:2307.10658  [pdf, other]

    cs.DB cs.DC cs.PF

    ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Marta Mattoso, Alexandru Costan, Renan Souza, Débora Pina, Patrick Valduriez, Gabriel Antoniu

    Abstract: Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related dat…

    Submitted 20 July, 2023; originally announced July 2023.

    Journal ref: Cluster 2023 - IEEE International Conference on Cluster Computing, Oct 2023, Santa Fe, New Mexico, United States

  4. arXiv:2207.06667  [pdf, other]

    cs.DC cs.AI cs.LG

    Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

    Authors: Ji Liu, Daxiang Dong, Xi Wang, An Qin, Xingjian Li, Patrick Valduriez, Dejing Dou, Dianhai Yu

    Abstract: Although more layers and more parameters generally improve the accuracy of the models, such big models generally have high computational complexity and require large amounts of memory, which exceeds the capacity of small devices for inference and incurs long training time. In addition, it is difficult to afford the long training and inference times of big models even on high-performance servers. As an…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: To appear in Concurrency and Computation: Practice and Experience, 16 pages, 7 figures, 5 tables

  5. Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review

    Authors: Daniel Rosendo, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu

    Abstract: The explosion of data volumes generated by an increasing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML). While data analytics used to be mainly performed on cloud infrastructures, the rapid development of IoT infrastructures and the requirements for low-latency, secure processing has motivated the devel…

    Submitted 29 April, 2022; originally announced May 2022.

    Journal ref: Journal of Parallel and Distributed Computing, Elsevier, 2022, 166, pp.71-94

  6. arXiv:2109.01379  [pdf, ps, other]

    cs.DC cs.NI cs.PF

    Enabling Reproducible Analysis of Complex Workflows on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Patrick Valduriez

    Abstract: Distributed digital infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex applications to be executed from IoT Edge devices to the HPC Cloud (aka the Computing Continuum, the Digital Continuum, or the Transcontinuum). Understanding end-to-end performance in such a complex continuum is challenging. This breaks down to reconciling many, t…

    Submitted 3 September, 2021; originally announced September 2021.

    Journal ref: Conférence sur la Gestion de Données -- Principles, Technologies et Applications, Oct 2021, Paris, France

  7. arXiv:2108.04033  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Matthieu Simonin, Jean-Christophe Lombardo, Alexis Joly, Patrick Valduriez

    Abstract: In more and more application areas, we are witnessing the emergence of complex workflows that combine computing, analytics and learning. They often require a hybrid execution infrastructure with IoT devices interconnected to cloud/HPC systems (aka Computing Continuum). Such workflows are subject to complex constraints and requirements in terms of performance, resource usage, energy consumption and…

    Submitted 4 August, 2021; originally announced August 2021.

    Journal ref: Cluster 2021 - IEEE International Conference on Cluster Computing, Sep 2021, Portland, OR, United States

  8. Distributed In-memory Data Management for Workflow Executions

    Authors: Renan Souza, Vítor Silva, Alexandre A. B. Lima, Daniel de Oliveira, Patrick Valduriez, Marta Mattoso

    Abstract: Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the para…

    Submitted 11 May, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: 26 pages, 14 figures, PeerJ Computer Science (2021)

    MSC Class: 65Y05; 68P15; 68P20 ACM Class: H.2; C.4

  9. arXiv:2102.03243  [pdf, other]

    cs.CV cs.AI

    Hyperspherical embedding for novel class classification

    Authors: Rafael S. Pereira, Alexis Joly, Patrick Valduriez, Fabio Porto

    Abstract: Deep learning models have become increasingly useful in many different industries. In the domain of image classification, convolutional neural networks have proven able to learn robust features for the closed set problem, as shown on many different datasets, such as MNIST, FASHIONMNIST, CIFAR10, CIFAR100, and IMAGENET. These approaches use deep neural networks with dense layers with softmax activ…

    Submitted 28 February, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: 9 pages with 10 figures and 6 tables. Not currently published

    MSC Class: cs.LG; cs.AI; cs:CV

  10. arXiv:2010.00330  [pdf, other]

    cs.DB cs.AI cs.DC cs.LG

    Workflow Provenance in the Lifecycle of Scientific Machine Learning

    Authors: Renan Souza, Leonardo G. Azevedo, Vítor Lourenço, Elton Soares, Raphael Thiago, Rafael Brandão, Daniel Civitarese, Emilio Vital Brazil, Marcio Moreno, Patrick Valduriez, Marta Mattoso, Renato Cerqueira, Marco A. S. Netto

    Abstract: Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provide for critical requirements, such as reproducibil…

    Submitted 25 August, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

    Comments: 21 pages, 10 figures, text overlap with arXiv:1910.04223, a workshop paper being extended in this journal paper

    MSC Class: 65Y05; 68P15 ACM Class: I.2; H.2; C.4; J.2

    Journal ref: Concurrency Computation Practice Experience. 2021;e6544

  11. arXiv:1910.04223  [pdf, other]

    cs.DC cs.DB cs.LG

    Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering

    Authors: Renan Souza, Leonardo Azevedo, Vítor Lourenço, Elton Soares, Raphael Thiago, Rafael Brandão, Daniel Civitarese, Emilio Vital Brazil, Marcio Moreno, Patrick Valduriez, Marta Mattoso, Renato Cerqueira, Marco A. S. Netto

    Abstract: Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes unfeasible to recreate a ML model from scratch or to explain to stakeholders how it was created. The m…

    Submitted 21 October, 2019; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: 10 pages, 7 figures, Accepted at Workflows in Support of Large-scale Science (WORKS) co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2019, Denver, Colorado

    MSC Class: 65Y05; 68P15 ACM Class: I.2; H.2; C.4; J.2

  12. Keeping Track of User Steering Actions in Dynamic Workflows

    Authors: Renan Souza, Vítor Silva, José Camata, Alvaro Coutinho, Patrick Valduriez, Marta Mattoso

    Abstract: In long-lasting scientific workflow executions in HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions that last for weeks, users can lose track of what ha…

    Submitted 17 May, 2019; originally announced May 2019.

    Journal ref: Future Generation Computer Systems, Elsevier, 2019, 99, pp.624-643

  13. arXiv:1805.03141  [pdf, other]

    cs.DC cs.AI cs.DB

    Parallel Computation of PDFs on Big Spatial Data Using Spark

    Authors: Ji Liu, Noel Moreno Lemus, Esther Pacitti, Fabio Porto, Patrick Valduriez

    Abstract: We consider big spatial data, which is typically produced in scientific areas such as geological or seismic interpretation. The spatial data can be produced by observation (e.g. using sensors or soil instruments) or numerical simulation programs and correspond to points that represent a 3D soil cube area. However, errors in signal processing and modeling create some uncertainty, and thus a lack of…

    Submitted 8 May, 2018; originally announced May 2018.

  14. arXiv:1703.02638  [pdf, other]

    cs.DB

    Constellation Queries over Big Data

    Authors: Fabio Porto, Amir Khatibi, João R. Nobre, Eduardo Ogasawara, Patrick Valduriez, Dennis Shasha

    Abstract: A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. For example, a particularly interesting geometric pattern in astronomy is the Einstein cross, which is an astronomical phenomenon in which a single quasar is…

    Submitted 7 March, 2017; originally announced March 2017.

    ACM Class: H.2.4; H.2.8; H.3.1

  15. arXiv:1310.4802  [pdf, ps, other]

    cs.DB cs.DC

    On Demand Memory Specialization for Distributed Graph Databases

    Authors: Xavier Martinez-Palau, David Dominguez-Sal, Reza Akbarinia, Patrick Valduriez, Josep Lluís Larriba-Pey

    Abstract: In this paper, we propose the DN-tree, a data structure for building lossy summaries of the frequent data access patterns of the queries in a distributed graph data management system. These compact representations allow efficient communication of the data structure in distributed systems. We exploit this data structure with a new Dynamic Data Partitioning strategy (DYDAP) that as…

    Submitted 16 October, 2013; originally announced October 2013.

  16. arXiv:1205.2555  [pdf, other]

    cs.DL

    Public Data Integration with WebSmatch

    Authors: R. Coletta, E. Castanier, P. Valduriez, C. Frisch, D. Ngo, Z. Bellahsene

    Abstract: Integrating open data sources can yield high value information but raises major problems in terms of metadata extraction, data source integration and visualization of integrated data. In this paper, we describe WebSmatch, a flexible environment for Web data integration, based on a real, end-to-end data integration scenario over public data from Data Publica. WebSmatch supports the full process of…

    Submitted 15 May, 2012; v1 submitted 11 May, 2012; originally announced May 2012.

    Comments: Presented at the First International Workshop On Open Data, WOD-2012 (http://arxiv.org/abs/1204.3726)

    Report number: WOD/2012/NANTES/9

  17. Principles of Distributed Data Management in 2020?

    Authors: Patrick Valduriez

    Abstract: With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [10] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management could be…

    Submitted 11 November, 2011; originally announced November 2011.

    Journal ref: Int. Conf. on Databases and Expert Systems Applications (DEXA) 6860 (2011) 1-11

  18. Reducing Network Traffic in Unstructured P2P Systems Using Top-k Queries

    Authors: Reza Akbarinia, Esther Pacitti, Patrick Valduriez

    Abstract: A major problem of unstructured P2P systems is their heavy network traffic. This is caused mainly by high numbers of query answers, many of which are irrelevant for users. One solution to this problem is to use Top-k queries whereby the user can specify a limited number (k) of the most relevant answers. In this paper, we present FD, a (Fully Distributed) framework for executing Top-k queries in…

    Submitted 14 September, 2009; originally announced September 2009.

    Journal ref: Distributed and Parallel Databases 19, 2-3 (2006) 67-86