Showing 1–21 of 21 results for author: Beschastnikh, I

  1. arXiv:2402.06124

    cs.HC

    Teleoscope: Exploring Themes in Large Document Sets By Example

    Authors: Paul Bucci, Leo Foord-Kelcey, Patrick Yung Kang Lee, Alamjeet Singh, Ivan Beschastnikh

    Abstract: Qualitative thematic exploration of data by hand does not scale and researchers create and update a personalized point of view as they explore data. As a result, machine learning (ML) approaches that might help with exploration are challenging to apply. We developed Teleoscope, a web-based system that supports interactive exploration of large corpora (100K-1M) of short documents (1-3 paragraphs).…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 28 pages, 9 figures, pre-print

  2. arXiv:2310.00399

    cs.SE

    Empirical Study on Transformer-based Techniques for Software Engineering

    Authors: Yan Xiao, Xinyue Zuo, Lei Xue, Kailong Wang, Jin Song Dong, Ivan Beschastnikh

    Abstract: Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we review the existing literature, examine the suitability of model architectures for different tasks, and look at the generalization ability of models on different datasets, and their resource consumption. We examine three very representative pre-trained models for code: Code…

    Submitted 30 September, 2023; originally announced October 2023.

  3. arXiv:2305.01657

    cs.LG cs.AI cs.DC

    Scalable Data Point Valuation in Decentralized Learning

    Authors: Konstantin D. Pandl, Chun-Yin Huang, Ivan Beschastnikh, Xiaoxiao Li, Scott Thiebes, Ali Sunyaev

    Abstract: Existing research on data valuation in federated and swarm learning focuses on valuing client contributions and works best when data across clients is independent and identically distributed (IID). In practice, data is rarely distributed IID. We develop an approach called DDVal for decentralized data valuation, capable of valuing individual data points in federated and swarm learning. DDVal is bas…

    Submitted 1 May, 2023; originally announced May 2023.

  4. arXiv:2212.01523

    cs.LG cs.DC

    GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning

    Authors: Shiqi He, Qifan Yan, Feijie Wu, Lanjun Wang, Mathias Lécuyer, Ivan Beschastnikh

    Abstract: Federated learning (FL) is an effective technique to directly involve edge devices in machine learning training while preserving client privacy. However, the substantial communication overhead of FL makes training challenging when edge devices have limited network bandwidth. Existing work to optimize FL bandwidth overlooks downstream transmission and does not account for FL client sampling. In t…

    Submitted 2 December, 2022; originally announced December 2022.

  5. arXiv:2203.05158

    cs.DC

    Scaling Blockchain Consensus via a Robust Shared Mempool

    Authors: Fangyu Gai, Jianyu Niu, Ivan Beschastnikh, Chen Feng, Sheng Wang

    Abstract: There is a resurgence of interest in Byzantine fault-tolerant (BFT) systems due to blockchains. However, leader-based BFT consensus protocols used by permissioned blockchains have limited scalability and robustness. To alleviate the leader bottleneck in BFT consensus, we introduce Stratus, a robust shared mempool protocol that decouples transaction distribution from consensus. Our idea is to have…

    Submitted 25 September, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: This work is to appear in ICDE 2023

  6. arXiv:2201.04322

    cs.DC cs.NI

    Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements

    Authors: Nodir Kodirov, Shane Bergsma, Syed M. Iqbal, Alan J. Hu, Ivan Beschastnikh, Margo Seltzer

    Abstract: Cloud applications use more than just server resources; they also require networking resources. We propose a new technique to model network bandwidth demand of networked cloud applications. Our technique, Gridiron, augments VM workload traces from Azure cloud with network bandwidth requirements. The key to the Gridiron technique is to derive inter-VM network bandwidth requirements using Amdahl's s…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 9 pages, 8 figures, 2 tables

  7. arXiv:2112.13009

    cs.CR

    One Bad Apple Spoils the Bunch: Transaction DoS in MimbleWimble Blockchains

    Authors: Seyed Ali Tabatabaee, Charlene Nicer, Ivan Beschastnikh, Chen Feng

    Abstract: As adoption of blockchain-based systems grows, more attention is being given to privacy of these systems. Early systems like Bitcoin provided few privacy features. As a result, systems with strong privacy guarantees, including Monero, Zcash, and MimbleWimble have been developed. Compared to Bitcoin, these cryptocurrencies are much less understood. In this paper, we focus on MimbleWimble, which use…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: 9 pages, 4 figures

  8. arXiv:2110.02718

    cs.LG

    Generalizing Neural Networks by Reflecting Deviating Data in Production

    Authors: Yan Xiao, Yun Lin, Ivan Beschastnikh, Changsheng Sun, David S. Rosenblum, Jin Song Dong

    Abstract: Trained with a sufficiently large training and testing dataset, Deep Neural Networks (DNNs) are expected to generalize. However, inputs may deviate from the training dataset distribution in real deployments. This is a fundamental issue with using a finite dataset. Even worse, real inputs may change over time from the expected distribution. Taken together, these issues may lead deployed DNNs to mis…

    Submitted 6 October, 2021; originally announced October 2021.

  9. arXiv:2109.02312

    cs.SE

    Linear-time Temporal Logic guided Greybox Fuzzing

    Authors: Ruijie Meng, Zhen Dong, Jialin Li, Ivan Beschastnikh, Abhik Roychoudhury

    Abstract: Software model checking is a verification technique which is widely used for checking temporal properties of software systems. Even though it is a property verification technique, its common usage in practice is in "bug finding", that is, finding violations of temporal properties. Motivated by this observation and leveraging the recent progress in fuzzing, we build a greybox fuzzing framework to f…

    Submitted 19 April, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: To appear in International Conference on Software Engineering (ICSE) 2022

  10. arXiv:2103.02371

    cs.SE

    Self-Checking Deep Neural Networks in Deployment

    Authors: Yan Xiao, Ivan Beschastnikh, David S. Rosenblum, Changsheng Sun, Sebastian Elbaum, Yun Lin, Jin Song Dong

    Abstract: The widespread adoption of Deep Neural Networks (DNNs) in important domains raises questions about the trustworthiness of DNN outputs. Even a highly accurate DNN will make mistakes some of the time, and in settings like self-driving vehicles these mistakes must be quickly detected and properly dealt with in deployment. Just as our community has developed effective techniques and mechanisms to moni…

    Submitted 3 March, 2021; originally announced March 2021.

    Journal ref: 43rd International Conference on Software Engineering (ICSE2021)

  11. arXiv:2103.00777

    cs.CR cs.DC

    Dissecting the Performance of Chained-BFT

    Authors: Fangyu Gai, Ali Farahbakhsh, Jianyu Niu, Chen Feng, Ivan Beschastnikh, Hao Duan

    Abstract: Permissioned blockchains employ Byzantine fault-tolerant (BFT) state machine replication (SMR) to reach agreement on an ever-growing, linearly ordered log of transactions. A new paradigm, combined with decades of research in BFT SMR and blockchain (namely chained-BFT, or cBFT), has emerged for directly constructing blockchain protocols. Chained-BFT protocols have a unifying propose-vote scheme ins…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: 12 pages

  12. arXiv:2012.01636

    cs.DC

    EBFT: Simplifying BFT Consensus Through Egalitarianism

    Authors: Jianyu Niu, Runchao Han, Shengqi Liu, Fangyu Gai, Ivan Beschastnikh, Yinqian Zhang, Chen Feng

    Abstract: We present Egalitarian BFT (EBFT), a simple and high-performance framework of BFT consensus protocols for decentralized systems like blockchains. The key innovation in EBFT is egalitarian block generation: nodes randomly and non-interactively propose blocks containing client transactions, rather than relying on a leader to do so. Apart from deterministic safety and liveness guarantees standard in…

    Submitted 12 March, 2023; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: 17 pages, 12 figures

  13. arXiv:2011.11001

    cs.LG

    Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

    Authors: Jiang Zhang, Ivan Beschastnikh, Sergey Mechtaev, Abhik Roychoudhury

    Abstract: Data-driven decision making is gaining prominence with the popularity of various machine learning models. Unfortunately, real-life data used in machine learning training may capture human biases, and as a result the learned models may lead to unfair decision making. In this paper, we provide a solution to this problem for decision trees and random forests. Our approach converts any decision tree o…

    Submitted 22 November, 2020; originally announced November 2020.

  14. arXiv:2010.13681

    cs.DC cs.HC

    Aggregate-Driven Trace Visualizations for Performance Debugging

    Authors: Vaastav Anand, Matheus Stolet, Thomas Davidson, Ivan Beschastnikh, Tamara Munzner, Jonathan Mace

    Abstract: Performance issues in cloud systems are hard to debug. Distributed tracing is a widely adopted approach that gives engineers visibility into cloud systems. Existing trace analysis approaches focus on debugging single request correctness issues but not debugging single request performance issues. Diagnosing a performance issue in a given request requires comparing the performance of the offending r…

    Submitted 26 October, 2020; originally announced October 2020.

  15. arXiv:2006.05182

    cs.NI

    Parking Packet Payload with P4

    Authors: Swati Goswami, Nodir Kodirov, Craig Mustard, Ivan Beschastnikh, Margo Seltzer

    Abstract: Network Function (NF) deployments suffer from poor link goodput, because popular NFs such as firewalls process only packet headers while receiving and transmitting complete packets. As a result, unnecessary packet payloads needlessly consume link bandwidth. We introduce PayloadPark, which improves goodput by temporarily parking packet payloads in the stateful memory of dataplane programmable switc…

    Submitted 2 November, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  16. arXiv:2005.07826

    cs.CR

    Precise XSS detection and mitigation with Client-side Templates

    Authors: Jose Carlos Pazos, Jean-Sebastien Legare, Ivan Beschastnikh, William Aiello

    Abstract: We present XSnare, a fully client-side XSS solution, implemented as a Firefox extension. Our approach takes advantage of available previous knowledge of a web application's HTML template content, as well as the rich context available in the DOM to block XSS attacks. XSnare prevents XSS exploits by using a database of exploit descriptions, which are written with the help of previously recorded CVEs…

    Submitted 15 May, 2020; originally announced May 2020.

    Comments: 15 pages, 10 figures

  17. arXiv:1905.10518

    cs.CR

    Bandwidth-Efficient Transaction Relay for Bitcoin

    Authors: Gleb Naumenko, Gregory Maxwell, Pieter Wuille, Alexandra Fedorova, Ivan Beschastnikh

    Abstract: Bitcoin is a top-ranked cryptocurrency that has experienced huge growth and survived numerous attacks. The protocols making up Bitcoin must therefore accommodate the growth of the network and ensure security. Security of the Bitcoin network depends on connectivity between the nodes. Higher connectivity yields better security. In this paper we make two observations: (1) current connectivity in the…

    Submitted 3 June, 2019; v1 submitted 25 May, 2019; originally announced May 2019.

  18. arXiv:1812.09975

    cs.NI cs.LG stat.ML

    Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control

    Authors: Fabian Ruffy, Michael Przystupa, Ivan Beschastnikh

    Abstract: Recent networking research has identified that data-driven congestion control (CC) can be more efficient than traditional CC in TCP. Deep reinforcement learning (RL), in particular, has the potential to learn optimal network policies. However, RL suffers from instability and over-fitting, deficiencies which so far render it unacceptable for use in datacenter networks. In this paper, we analyze the…

    Submitted 24 December, 2018; originally announced December 2018.

    Comments: 5 figures, 1 table, 11 pages; accepted to the ML for Systems workshop (http://mlforsystems.org/accepted_papers.html)

    Journal ref: Proceedings of the Workshop on ML for Systems at NeurIPS, 2018

  19. arXiv:1811.09904

    cs.LG cs.CR cs.DC stat.ML

    Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning

    Authors: Muhammad Shayan, Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh

    Abstract: Federated Learning is the current state of the art in supporting secure multi-party machine learning (ML): data is maintained on the owner's device and the updates to the model are aggregated through a secure protocol. However, this process assumes a trusted centralized infrastructure for coordination, and clients must trust that the central service does not use the byproducts of client data. In a…

    Submitted 11 December, 2019; v1 submitted 24 November, 2018; originally announced November 2018.

    Comments: 20 pages

  20. arXiv:1811.09712

    cs.CR cs.DC cs.LG

    Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

    Authors: Clement Fung, Jamie Koerner, Stewart Grant, Ivan Beschastnikh

    Abstract: Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-part…

    Submitted 23 February, 2019; v1 submitted 23 November, 2018; originally announced November 2018.

    Comments: 16 pages

  21. arXiv:1808.04866

    cs.LG cs.CR cs.DC stat.ML

    Mitigating Sybils in Federated Learning Poisoning

    Authors: Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh

    Abstract: Machine learning (ML) over distributed multi-party data is required for a variety of domains. Existing approaches, such as federated learning, collect the outputs computed by a group of devices at a central aggregator and run iterative algorithms to train a globally shared model. Unfortunately, such approaches are susceptible to a variety of attacks, including model poisoning, which is made substa…

    Submitted 15 July, 2020; v1 submitted 14 August, 2018; originally announced August 2018.

    Comments: 16 pages; extended technical version of the conference paper "The Limitations of Federated Learning in Sybil Settings", accepted at RAID 2020