Search | arXiv e-print repository

Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning

Authors: Armand Nicolicioiu, Eugenia Iofinova, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir Shavit, Dan Alistarh

Abstract: The availability of powerful open-source large language models (LLMs) opens exciting use-cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key desiderata for such assistants are personalization-in the sense that the assistant should reflect the user's own style-and privacy-in the sense that users may prefer to always store their personal data locall… ▽ More The availability of powerful open-source large language models (LLMs) opens exciting use-cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key desiderata for such assistants are personalization-in the sense that the assistant should reflect the user's own style-and privacy-in the sense that users may prefer to always store their personal data locally, on their own computing device. We present a new design for such an automated assistant, for the specific use case of personal assistant for email generation, which we call Panza. Specifically, Panza can be both trained and inferenced locally on commodity hardware, and is personalized to the user's writing style. Panza's personalization features are based on a new technique called data playback, which allows us to fine-tune an LLM to better reflect a user's writing style using limited data. We show that, by combining efficient fine-tuning and inference methods, Panza can be executed entirely locally using limited resources-specifically, it can be executed within the same resources as a free Google Colab instance. Finally, our key methodological contribution is a careful study of evaluation metrics, and of how different choices of system components (e.g. the use of Retrieval-Augmented Generation or different fine-tuning approaches) impact the system's performance. △ Less

Submitted 24 June, 2024; originally announced July 2024.

Comments: Panza is available at https://github.com/IST-DASLab/PanzaMail

arXiv:2405.15756 [pdf, other]

Sparse Expansion and Neuronal Disentanglement

Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other on… ▽ More We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively $\textit{disentangling}$ the input-output relationship of every individual neuron across clusters of inputs. Specifically, sparse experts approximate the dense neuron output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each with a separate sparse dot product covering it. Interestingly, we show that the Wasserstein distance between a neuron's output distribution and a Gaussian distribution is an indicator of its entanglement level and contribution to the accuracy of the model. Every layer of an LLM has a fraction of highly entangled Wasserstein neurons, and model performance suffers more when these are sparsified as opposed to others. The code for Sparse Expansion is available at: https://github.com/Shavit-Lab/Sparse-Expansion . △ Less

Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures

arXiv:2405.13754 [pdf, other]

Grounding Toxicity in Real-World Events across Languages

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered… ▽ More Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon in which many different factors interact and still need to be investigated. We will release the data for further research along with our code. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper accepted for at The 29th International Conference on Natural Language & Information Systems (NLDB 2024)

arXiv:2404.18865 [pdf, other]

Truth-value judgment in language models: belief directions are context sensitive

Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM t… ▽ More Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions can be described as being conditional on the preceding (related) sentences. Specifically, we quantify the responsiveness of the probes to the presence of (negated) supporting and contradicting sentences, and score the probes on their consistency. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these belief directions influences the position of the hypothesis along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the type of errors depend on the layer, the (type of) model, and the kind of data. Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18810 [pdf, other]

Unknown Script: Impact of Script on Cross-Lingual Transfer

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolin… ▽ More Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size. △ Less

Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Paper accepted to NAACL Student Research Workshop (SRW) 2024

arXiv:2404.18726 [pdf, other]

The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish,Arabic, and Dutch, covering 80 topics such as Culture, Politic… ▽ More Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish,Arabic, and Dutch, covering 80 topics such as Culture, Politics, and News. We thoroughly analyze how toxicity spikes within different communities in relation to specific topics. We observe consistent patterns of increased toxicity across languages for certain topics, while also noting significant variations within specific language communities. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted to TRAC 2024

arXiv:2311.05787 [pdf, other]

Towards stable real-world equation discovery with assessing differentiating quality influence

Authors: Mikhail Masliaev, Ilya Markov, Alexander Hvatov

Abstract: This paper explores the critical role of differentiation approaches for data-driven differential equation discovery. Accurate derivatives of the input data are essential for reliable algorithmic operation, particularly in real-world scenarios where measurement quality is inevitably compromised. We propose alternatives to the commonly used finite differences-based method, notorious for its instabil… ▽ More This paper explores the critical role of differentiation approaches for data-driven differential equation discovery. Accurate derivatives of the input data are essential for reliable algorithmic operation, particularly in real-world scenarios where measurement quality is inevitably compromised. We propose alternatives to the commonly used finite differences-based method, notorious for its instability in the presence of noise, which can exacerbate random errors in the data. Our analysis covers four distinct methods: Savitzky-Golay filtering, spectral differentiation, smoothing based on artificial neural networks, and the regularization of derivative variation. We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms, providing valuable insights for robust modeling of real-world processes. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.14657 [pdf, other]

Reasoning about Ambiguous Definite Descriptions

Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite… ▽ More Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings

arXiv:2310.09259 [pdf, other]

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

Abstract: Large Language Models (LLMs) from the GPT family have become extremely popular, leading to a race towards reducing their inference costs to allow for efficient local computation. Yet, the vast majority of existing work focuses on weight-only quantization, which can reduce runtime costs in the memory-bound one-token-at-a-time generative setting, but does not address them in compute-bound scenarios,… ▽ More Large Language Models (LLMs) from the GPT family have become extremely popular, leading to a race towards reducing their inference costs to allow for efficient local computation. Yet, the vast majority of existing work focuses on weight-only quantization, which can reduce runtime costs in the memory-bound one-token-at-a-time generative setting, but does not address them in compute-bound scenarios, such as batched inference or prompt processing. In this paper, we address the general quantization problem, where both weights and activations should be quantized. We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. The key feature of our scheme is that it is designed with computational efficiency in mind: we provide GPU kernels matching the QUIK format with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.4x relative to FP16 execution. We provide detailed studies for models from the OPT, LLaMA-2 and Falcon families, as well as a first instance of accurate inference using quantization plus 2:4 sparsity. Code is available at: https://github.com/IST-DASLab/QUIK. △ Less

Submitted 2 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 16 pages

arXiv:2306.09642 [pdf, ps, other]

Cross-Domain Toxic Spans Detection

Authors: Stefan F. Schouten, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

Abstract: Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons perform… ▽ More Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) language models, despite performing best for the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain. △ Less

Submitted 16 June, 2023; originally announced June 2023.

Comments: NLDB 2023

arXiv:2306.09633 [pdf, other]

The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement

Authors: Igor L. Markov

Abstract: Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lag… ▽ More Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower; and in a 2023 open research contest, RL methods weren't in top 5. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting. Before publishing, Google rebuffed internal allegations of fraud, which still stand. We note policy implications and conclusions for chip design. △ Less

Submitted 5 July, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 15 pages, 1 figure, 4 tables, 83 references

arXiv:2303.16531 [pdf, other]

RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov

Abstract: Information surrounds people in modern life. Text is a very efficient type of information that people use for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For the competitive performance, training set must contain many samples that replicate the real-world cases. While… ▽ More Information surrounds people in modern life. Text is a very efficient type of information that people use for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For the competitive performance, training set must contain many samples that replicate the real-world cases. While there are many high-quality datasets for English text recognition; there are no available datasets for Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: 5 pages, 6 figures, 2 tables

arXiv:2303.11580 [pdf, other]

Efficient Multi-stage Inference on Tabular Data

Authors: Daniel S Johnson, Igor L Markov

Abstract: Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, t… ▽ More Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, the separation adds network latency and entails additional CPU overhead. Hence, we simplify inference algorithms and embed them into the product code to reduce network communication. For public datasets and a high-performance real-time platform that deals with tabular data, we show that over half of the inputs are often amenable to such optimization, while the remainder can be handled by the original model. By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50% for a commercial end-to-end ML platform that serves millions of real-time decisions per second. △ Less

Submitted 21 July, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.14139 [pdf, other]

Scalable End-to-End ML Platforms: from AutoML to Self-serve

Authors: Igor L. Markov, Pavlos A. Apostolopoulos, Mia R. Garrard, Tanya Qie, Yin Huang, Tanvi Gupta, Anika Li, Cesar Cardoso, George Han, Ryan Maghsoudian, Norm Zhou

Abstract: ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integrat… ▽ More ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integration to reach the quality we term self-serve that we define with ten requirements and six optional capabilities. With this in mind, we identify long-term goals for platform development, discuss related tradeoffs and future work. Our reasoning is illustrated on two commercially-deployed end-to-end ML platforms that host hundreds of real-time use cases -- one general-purpose and one specialized. △ Less

Submitted 3 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 1 figure, 2 tables

arXiv:2302.12360 [pdf, other]

Practical Knowledge Distillation: Using DNNs to Beat DNNs

Authors: Chung-Wei Lee, Pavlos Athanasios Apostolopulos, Igor L. Markov

Abstract: For tabular data sets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small data sets. We extend these results with input-data distillation and… ▽ More For tabular data sets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small data sets. We extend these results with input-data distillation and optimized ensembling to help DNN performance match or exceed that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain the superiority of DNN ensembles over XGBoost on small data sets. For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of models into a single gradient-boosting model favored for high-performance real-time inference, without performance loss. Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide. △ Less

Submitted 1 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 11 pages, 1 figure, 17 tables

arXiv:2302.02390 [pdf, other]

Quantized Distributed Training of Large Models with Convergence Guarantees

Authors: Ilia Markov, Adrian Vladu, Qi Guo, Dan Alistarh

Abstract: Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is… ▽ More Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model's weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x. △ Less

Submitted 5 February, 2023; originally announced February 2023.

arXiv:2301.07233 [pdf, other]

Enhancing quantum computer performance via symmetrization

Authors: Andrii Maksymov, Jason Nguyen, Yunseong Nam, Igor Markov

Abstract: Large quantum computers promise to solve some critical problems not solvable otherwise. However, modern quantum technologies suffer various imperfections such as control errors and qubit decoherence, inhibiting their potential utility. The overheads of quantum error correction are too great for near-term quantum computers, whereas error-mitigation strategies that address specific device imperfecti… ▽ More Large quantum computers promise to solve some critical problems not solvable otherwise. However, modern quantum technologies suffer various imperfections such as control errors and qubit decoherence, inhibiting their potential utility. The overheads of quantum error correction are too great for near-term quantum computers, whereas error-mitigation strategies that address specific device imperfections may lose relevance as devices improve. To enhance the performance of quantum computers with high-quality qubits, we introduce a strategy based on symmetrization and nonlinear aggregation. On a commercial trapped-ion quantum computer, it improves performance of multiple practical algorithms by 100x with no qubit or gate overhead. △ Less

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2210.17357 [pdf, other]

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

Authors: Mohammadreza Alimohammadi, Ilia Markov, Elias Frantar, Dan Alistarh

Abstract: Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all k… ▽ More Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model's layers dynamically during training, improving the overall compression, while leading to substantial speedups, without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm, which automatically picks the optimal compression parameters for model layers guaranteeing the best compression ratio while satisfying an error constraint. Extensive experiments over image classification and language modeling tasks shows that L-GreCo is effective across all existing families of compression methods, and achieves up to 2.5$\times$ training speedup and up to 5$\times$ compression improvement over efficient implementations of existing approaches, while recovering full accuracy. Moreover, L-GreCo is complementary to existing adaptive algorithms, improving their compression ratio by 50% and practical throughput by 66%. △ Less

Submitted 9 June, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.12526 [pdf, other]

Federated Calibration and Evaluation of Binary Classifiers

Authors: Graham Cormode, Igor Markov

Abstract: We address two major obstacles to practical use of supervised classifiers on distributed private data. Whether a classifier was trained by a federation of cooperating clients or trained centrally out of distribution, (1) the output scores must be calibrated, and (2) performance metrics must be evaluated -- all without assembling labels in one place. In particular, we show how to perform calibratio… ▽ More We address two major obstacles to practical use of supervised classifiers on distributed private data. Whether a classifier was trained by a federation of cooperating clients or trained centrally out of distribution, (1) the output scores must be calibrated, and (2) performance metrics must be evaluated -- all without assembling labels in one place. In particular, we show how to perform calibration and compute precision, recall, accuracy and ROC-AUC in the federated setting under three privacy models (i) secure aggregation, (ii) distributed differential privacy, (iii) local differential privacy. Our theorems and experiments clarify tradeoffs between privacy, accuracy, and data efficiency. They also help decide whether a given application has sufficient data to support federated calibration and evaluation. △ Less

Submitted 22 October, 2022; originally announced October 2022.

Comments: 24 pages

arXiv:2202.09483 [pdf, other]

Data-Driven Mitigation of Adversarial Text Perturbation

Authors: Rasika Bhalerao, Mohammad Al-Rubaie, Anand Bhaskar, Igor Markov

Abstract: Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text p… ▽ More Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text perturbations. In this paper, we classify intentional and unintentional adversarial text perturbation into ten types and propose a deobfuscation pipeline to make NLP models robust to such perturbations. We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words. We show that CW2V embeddings are generally more robust to text perturbations than embeddings based on character ngrams. Our robust classification pipeline combines deobfuscation and classification, using proposed defense methods and word embeddings to classify whether Facebook posts are requesting engagement such as likes. Our pipeline results in engagement bait classification that goes from 0.70 to 0.67 AUC with adversarial text perturbation, while character ngram-based word embedding methods result in downstream classification that goes from 0.76 to 0.64. △ Less

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2111.12795 [pdf, other]

Picasso: Model-free Feature Visualization

Authors: Binh Vu, Igor Markov

Abstract: Today, Machine Learning (ML) applications can have access to tens of thousands of features. With such feature sets, efficiently browsing and curating subsets of most relevant features is a challenge. In this paper, we present a novel approach to visualize up to several thousands of features in a single image. The image not only shows information on individual features, but also expresses feature i… ▽ More Today, Machine Learning (ML) applications can have access to tens of thousands of features. With such feature sets, efficiently browsing and curating subsets of most relevant features is a challenge. In this paper, we present a novel approach to visualize up to several thousands of features in a single image. The image not only shows information on individual features, but also expresses feature interactions via the relative positioning of features. △ Less

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2111.08617 [pdf, other]

doi 10.1145/3528535.3565248

CGX: Adaptive System Support for Communication-Efficient Deep Learning

Authors: Ilia Markov, Hamidreza Ramezanikebrya, Dan Alistarh

Abstract: The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth overprovisioning. Overprovisioning comes at a cost: there is an order of magnitude… ▽ More The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth overprovisioning. Overprovisioning comes at a cost: there is an order of magnitude price difference between "cloud-grade" servers with such support, relative to their popular "consumer-grade" counterparts, although single server-grade and consumer-grade GPUs can have similar computational envelopes. In this paper, we show that the costly hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for compressed communication in ML applications, for both multi-GPU single-node training, as well as larger-scale multi-node training. CGX is based on two technical advances: \emph{At the system level}, it relies on a re-developed communication stack for ML frameworks, which provides flexible, highly-efficient support for compressed communication. \emph{At the application level}, it provides \emph{seamless, parameter-free} integration with popular frameworks, so that end-users do not have to modify training recipes, nor significant training code. This is complemented by a \emph{layer-wise adaptive compression} technique which dynamically balances compression gains with accuracy preservation. CGX integrates with popular ML frameworks, providing up to 3X speedups for multi-GPU nodes based on commodity hardware, and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy. △ Less

Submitted 29 December, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Journal ref: Middleware 2022

arXiv:2110.07554 [pdf, other]

Looper: An end-to-end ML platform for product decisions

Authors: Igor L. Markov, Hanson Wang, Nitya Kasturi, Shaun Singh, Sze Wai Yuen, Mia Garrard, Sarah Tran, Yin Huang, Zehui Wang, Igor Glotov, Tanvi Gupta, Boshuang Huang, Peng Chen, Xiaowen Xie, Michael Belkin, Sal Uryasev, Sam Howie, Eytan Bakshy, Norm Zhou

Abstract: Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support finegrain product-metric evaluation and (iii) optimize for product goals. To address shortcomings of prior p… ▽ More Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support finegrain product-metric evaluation and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce general principles for and the architecture of an ML platform, Looper, with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogenous treatment effects, and Bayesian tuning for product goals. During the 2021 production deployment Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We sum up experiences of platform adopters and describe their learning curve. △ Less

Submitted 21 June, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: 11 pages + references, 7 figures; to appear in KDD 2022

arXiv:2109.11577 [pdf, other]

Text Ranking and Classification using Data Compression

Authors: Nitya Kasturi, Igor L. Markov

Abstract: A well-known but rarely used approach to text categorization uses conditional entropy estimates computed using data compression tools. Text affinity scores derived from compressed sizes can be used for classification and ranking tasks, but their success depends on the compression tools used. We use the Zstandard compressor and strengthen these ideas in several ways, calling the resulting language-… ▽ More A well-known but rarely used approach to text categorization uses conditional entropy estimates computed using data compression tools. Text affinity scores derived from compressed sizes can be used for classification and ranking tasks, but their success depends on the compression tools used. We use the Zstandard compressor and strengthen these ideas in several ways, calling the resulting language-agnostic technique Zest. In applications, this approach simplifies configuration, avoiding careful feature extraction and large ML models. Our ablation studies confirm the value of individual enhancements we introduce. We show that Zest complements and can compete with language-specific multidimensional content embeddings in production, but cannot outperform other counting methods on public datasets. △ Less

Submitted 7 December, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

Journal ref: ICBINB workshop at NeurIPS 2021

arXiv:2108.08538 [pdf, other]

doi 10.1145/3459637.3482275

Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that cor… ▽ More In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that corrects for position bias and trust bias. IPS and AC provably remove bias, conditioned on an accurate estimation of the bias parameters. Estimating the bias parameters, in turn, requires an accurate estimation of the relevance probabilities. This cyclic dependency introduces practical limitations in terms of sensitivity, convergence and efficiency. We propose a new correction method for position and trust bias in CLTR in which, unlike the existing methods, the correction does not rely on relevance estimation. Our proposed method, mixture-based correction (MBC), is based on the assumption that the distribution of the CTRs over the items being ranked is a mixture of two distributions: the distribution of CTRs for relevant items and the distribution of CTRs for non-relevant items. We prove that our method is unbiased. The validity of our proof is not conditioned on accurate bias parameter estimation. Our experiments show that MBC, when used in different bias settings and accompanied by different LTR algorithms, outperforms AC, the state-of-the-art method for correcting position and trust bias, in some settings, while performing on par in other settings. Furthermore, MBC is orders of magnitude more efficient than AC in terms of the training time. △ Less

Submitted 19 August, 2021; originally announced August 2021.

Comments: CIKM 2021

arXiv:2108.03708 [pdf, other]

Detecting Qubit-coupling Faults in Ion-trap Quantum Computers

Authors: Andrii O. Maksymov, Jason Nguyen, Vandiver Chaplin, Yunseong Nam, Igor L. Markov

Abstract: Ion-trap quantum computers offer a large number of possible qubit couplings, each of which requires individual calibration and can be misconfigured. To enhance the duty cycle of an ion trap, we develop a strategy that diagnoses individual miscalibrated couplings using only log-many tests. This strategy is validated on a commercial ion-trap quantum computer, where we illustrate the process of debug… ▽ More Ion-trap quantum computers offer a large number of possible qubit couplings, each of which requires individual calibration and can be misconfigured. To enhance the duty cycle of an ion trap, we develop a strategy that diagnoses individual miscalibrated couplings using only log-many tests. This strategy is validated on a commercial ion-trap quantum computer, where we illustrate the process of debugging faulty quantum gates. Our methodology provides a scalable pathway towards fault detections on a larger scale ion-trap quantum computers, confirmed by simulations up to 32 qubits. △ Less

Submitted 12 December, 2021; v1 submitted 8 August, 2021; originally announced August 2021.

Journal ref: HPCA 2022

arXiv:2108.01521 [pdf, other]

Bit-efficient Numerical Aggregation and Stronger Privacy for Trust in Federated Analytics

Authors: Graham Cormode, Igor L. Markov

Abstract: Private data generated by edge devices -- from smart phones to automotive electronics -- are highly informative when aggregated but can be damaging when mishandled. A variety of solutions are being explored but have not yet won the public's trust and full backing of mobile platforms. In this work, we propose numerical aggregation protocols that empirically improve upon prior art, while providing c… ▽ More Private data generated by edge devices -- from smart phones to automotive electronics -- are highly informative when aggregated but can be damaging when mishandled. A variety of solutions are being explored but have not yet won the public's trust and full backing of mobile platforms. In this work, we propose numerical aggregation protocols that empirically improve upon prior art, while providing comparable local differential privacy guarantees. Sharing a single private bit per value supports privacy metering that enable privacy controls and guarantees that are not covered by differential privacy. We put emphasis on the ease of implementation, compatibility with existing methods, and compelling empirical performance. △ Less

Submitted 3 August, 2021; originally announced August 2021.

Comments: 15 pages

arXiv:2104.13818

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD prov… ▽ More As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods. △ Less

Submitted 1 May, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: This entry is redundant and was created in error. See arXiv:1908.06077 for the latest version

arXiv:2102.09507 [pdf, ps, other]

Regular Expressions for Fast-response COVID-19 Text Classification

Authors: Igor L. Markov, Jacqueline Liu, Adam Vagner

Abstract: Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper introduces infrastructure and methodologies for text classifiers based on large-scale regular expressions. In particular, we describe how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully… ▽ More Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper introduces infrastructure and methodologies for text classifiers based on large-scale regular expressions. In particular, we describe how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN classifier show explainable results, higher precision and recall, and less overfitting. Our learnings can be applied to other narrow-topic classifiers. △ Less

Submitted 21 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 10 pages, 7 tables

arXiv:2102.08465 [pdf, other]

Prioritizing Original News on Facebook

Authors: Xiuyan Ni, Shujian Bu, Igor L. Markov

Abstract: This work outlines how we prioritize original news, a critical indicator of news quality. By examining the landscape and life-cycle of news posts on our social media platform, we identify challenges of building and deploying an originality score. We pursue an approach based on normalized PageRank values and three-step clustering, and refresh the score on an hourly basis to capture the dynamics of… ▽ More This work outlines how we prioritize original news, a critical indicator of news quality. By examining the landscape and life-cycle of news posts on our social media platform, we identify challenges of building and deploying an originality score. We pursue an approach based on normalized PageRank values and three-step clustering, and refresh the score on an hourly basis to capture the dynamics of online news. We describe a near real-time system architecture, evaluate our methodology, and deploy it to production. Our empirical results validate individual components and show that prioritizing original news increases user engagement with news and improves proprietary cumulative metrics. △ Less

Submitted 14 March, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 9 pages, 8 figures, 6 tables, 2 algorithm pseudocodes

Journal ref: CIKM 2021

arXiv:2102.05612 [pdf, other]

Personalization for Web-based Services using Offline Reinforcement Learning

Authors: Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, Igor L. Markov

Abstract: Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We address challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training. Deployed in a production system for user authentication in a major social network, it significantly improves long-term objectives. We articulate practical… ▽ More Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We address challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training. Deployed in a production system for user authentication in a major social network, it significantly improves long-term objectives. We articulate practical challenges, compare several ML techniques, provide insights on training and evaluation of RL models, and discuss generalizations. △ Less

Submitted 10 February, 2021; originally announced February 2021.

Comments: 9 pages, 8 figures, 3 tables

Journal ref: 2nd Offline Reinforcement Learning Workshop at NeurIPS 2021

arXiv:2010.12460 [pdf, other]

Adaptive Gradient Quantization for Data-Parallel SGD

Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

Abstract: Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression sch… ▽ More Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters. △ Less

Submitted 23 October, 2020; originally announced October 2020.

Comments: Accepted at the conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2008.00216 [pdf, other]

Faster Schrödinger-style simulation of quantum circuits

Authors: Aneeqa Fatima, Igor L. Markov

Abstract: Recent demonstrations of superconducting quantum computers by Google and IBM and trapped-ion computers from IonQ fueled new research in quantum algorithms, compilation into quantum circuits, and empirical algorithmics. While online access to quantum hardware remains too limited to meet the demand, simulating quantum circuits on conventional computers satisfies many needs. We advance Schrödinger-st… ▽ More Recent demonstrations of superconducting quantum computers by Google and IBM and trapped-ion computers from IonQ fueled new research in quantum algorithms, compilation into quantum circuits, and empirical algorithmics. While online access to quantum hardware remains too limited to meet the demand, simulating quantum circuits on conventional computers satisfies many needs. We advance Schrödinger-style simulation of quantum circuits that is useful standalone and as a building block in layered simulation algorithms, both cases are illustrated in our results. Our algorithmic contributions show how to simulate multiple quantum gates at once, how to avoid floating-point multiplies, how to best use instruction-level and thread-level parallelism as well as CPU cache, and how to leverage these optimizations by reordering circuit gates. While not described previously, these techniques implemented by us supported published high-performance distributed simulations up to 64 qubits. To show additional impact, we benchmark our simulator against Microsoft, IBM and Google simulators on hard circuits from Google. △ Less

Submitted 24 November, 2020; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: 14 pages, 15 figures, 4 tables. Version 2 : Additional optimizations; improved simulation runtimes; profiling data; comparisons with the latest IBM QISKit simulator; dispelled apparent limitations of techniques. Version 3 : Ablation experiments and images for the code snippets

Journal ref: HPCA 2021

arXiv:2005.11938 [pdf, other]

doi 10.1145/3397271.3401299

Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS. Current propensity estimation methods assume that user click behavior follows the PBM and estimate click propensities based on this assumption. However, in reality, user clicks often follow the CM, where users scan search results from top to bottom and where… ▽ More Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS. Current propensity estimation methods assume that user click behavior follows the PBM and estimate click propensities based on this assumption. However, in reality, user clicks often follow the CM, where users scan search results from top to bottom and where each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to the full-information performance in case the user clicks follow the CM, while PBM-based CLTR has a significant gap towards the full-information. The opposite is true if the user clicks follow PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks. △ Less

Submitted 25 May, 2020; originally announced May 2020.

Comments: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)

arXiv:2005.01588 [pdf]

Workshops on Extreme Scale Design Automation (ESDA) Challenges and Opportunities for 2025 and Beyond

Authors: R. Iris Bahar, Alex K. Jones, Srinivas Katkoori, Patrick H. Madden, Diana Marculescu, Igor L. Markov

Abstract: Integrated circuits and electronic systems, as well as design technologies, are evolving at a great rate -- both quantitatively and qualitatively. Major developments include new interconnects and switching devices with atomic-scale uncertainty, the depth and scale of on-chip integration, electronic system-level integration, the increasing significance of software, as well as more effective means o… ▽ More Integrated circuits and electronic systems, as well as design technologies, are evolving at a great rate -- both quantitatively and qualitatively. Major developments include new interconnects and switching devices with atomic-scale uncertainty, the depth and scale of on-chip integration, electronic system-level integration, the increasing significance of software, as well as more effective means of design entry, compilation, algorithmic optimization, numerical simulation, pre- and post-silicon design validation, and chip test. Application targets and key markets are also shifting substantially from desktop CPUs to mobile platforms to an Internet-of-Things infrastructure. In light of these changes in electronic design contexts and given EDA's significant dependence on such context, the EDA community must adapt to these changes and focus on the opportunities for research and commercial success. The CCC workshop series on Extreme-Scale Design Automation, organized with the support of ACM SIGDA, studied challenges faced by the EDA community as well as new and exciting opportunities currently available. This document represents a summary of the findings from these meetings. △ Less

Submitted 4 May, 2020; originally announced May 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 32 pages

Report number: ccc2014report_1

arXiv:2002.00467 [pdf, other]

Safe Exploration for Optimizing Contextual Bandits

Authors: Rolf Jagerman, Ilya Markov, Maarten de Rijke

Abstract: Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, recommendation, etc. However, existing learning methods for contextual bandit problems have one of two drawbacks: they either do not explore the space of all possible document rankings (i.e., actions) and, thus, may miss the optimal ranking, or they present suboptimal r… ▽ More Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, recommendation, etc. However, existing learning methods for contextual bandit problems have one of two drawbacks: they either do not explore the space of all possible document rankings (i.e., actions) and, thus, may miss the optimal ranking, or they present suboptimal rankings to a user and, thus, may harm the user experience. We introduce a new learning method for contextual bandit problems, Safe Exploration Algorithm (SEA), which overcomes the above drawbacks. SEA starts by using a baseline (or production) ranking system (i.e., policy), which does not harm the user experience and, thus, is safe to execute, but has suboptimal performance and, thus, needs to be improved. Then SEA uses counterfactual learning to learn a new policy based on the behavior of the baseline policy. SEA also uses high-confidence off-policy evaluation to estimate the performance of the newly learned policy. Once the performance of the newly learned policy is at least as good as the performance of the baseline policy, SEA starts using the new policy to execute new actions, allowing it to actively explore favorable regions of the action space. This way, SEA never performs worse than the baseline policy and, thus, does not harm the user experience, while still exploring the action space and, thus, being able to find an optimal policy. Our experiments using text classification and document retrieval confirm the above by comparing SEA (and a boundless variant called BSEA) to online and offline learning methods for contextual bandit problems. △ Less

Submitted 2 February, 2020; originally announced February 2020.

Comments: 23 pages, 3 figures

arXiv:2001.05918 [pdf, other]

Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

Authors: Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

Abstract: Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of… ▽ More Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of stochastic gradient descent (SGD) based optimization. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency enables us to derive convergence bounds for a variety of distributed SGD methods used in practice to train large-scale machine learning models. The proposed framework de-clutters the implementation-specific convergence analysis and provides an abstraction to derive convergence bounds. We utilize the framework to analyze a sparsification scheme for distributed SGD methods in an asynchronous setting for convex and non-convex objectives. We implement the distributed SGD variant to train deep CNN models in an asynchronous shared-memory setting. Empirical results show that error-feedback may not necessarily help in improving the convergence of sparsified asynchronous distributed SGD, which corroborates an insight suggested by our convergence analysis. △ Less

Submitted 28 June, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

arXiv:1908.06077 [pdf, other]

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD prov… ▽ More As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods. △ Less

Submitted 3 May, 2021; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: 42 pages, 21 figures. To appear in the Journal of Machine Learning Research (JMLR)

arXiv:1903.02939 [pdf, other]

doi 10.1145/3308558.3313419

ViTOR: Learning to Rank Webpages Based on Visual Features

Authors: Bram van den Akker, Ilya Markov, Maarten de Rijke

Abstract: The visual appearance of a webpage carries valuable information about its quality and can be used to improve the performance of learning to rank (LTR). We introduce the Visual learning TO Rank (ViTOR) model that integrates state-of-the-art visual features extraction methods by (i) transfer learning from a pre-trained image classification model, and (ii) synthetic saliency heat maps generated from… ▽ More The visual appearance of a webpage carries valuable information about its quality and can be used to improve the performance of learning to rank (LTR). We introduce the Visual learning TO Rank (ViTOR) model that integrates state-of-the-art visual features extraction methods by (i) transfer learning from a pre-trained image classification model, and (ii) synthetic saliency heat maps generated from webpage snapshots. Since there is currently no public dataset for the task of LTR with visual features, we also introduce and release the ViTOR dataset, containing visually rich and diverse webpages. The ViTOR dataset consists of visual snapshots, non-visual features and relevance judgments for ClueWeb12 webpages and TREC Web Track queries. We experiment with the proposed ViTOR model on the ViTOR dataset and show that it significantly improves the performance of LTR with visual features △ Less

Submitted 7 March, 2019; originally announced March 2019.

Comments: In Proceedings of the 2019 World Wide Web Conference (WWW 2019), May 2019, San Francisco

ACM Class: H.3.3; I.2

arXiv:1812.04910 [pdf, other]

Online Learning to Rank with List-level Feedback for Image Filtering

Authors: Chang Li, Artem Grotov, Ilya Markov, Maarten de Rijke

Abstract: Online learning to rank (OLTR) via implicit feedback has been extensively studied for document retrieval in cases where the feedback is available at the level of individual items. To learn from item-level feedback, the current algorithms require certain assumptions about user behavior. In this paper, we study a more general setup: OLTR with list-level feedback, where the feedback is provided only… ▽ More Online learning to rank (OLTR) via implicit feedback has been extensively studied for document retrieval in cases where the feedback is available at the level of individual items. To learn from item-level feedback, the current algorithms require certain assumptions about user behavior. In this paper, we study a more general setup: OLTR with list-level feedback, where the feedback is provided only at the level of an entire ranked list. We propose two methods that allow online learning to rank in this setup. The first method, PGLearn, uses a ranking model to generate policies and optimizes it online using policy gradients. The second method, RegLearn, learns to combine individual document relevance scores by directly predicting the observed list-level feedback through regression. We evaluate the proposed methods on the image filtering task, in which deep neural networks (DNNs) are used to rank images in response to a set of standing queries. We show that PGLearn does not perform well in OLTR with list-level feedback. RegLearn, instead, shows good performance in both online and offline metrics. △ Less

Submitted 9 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

arXiv:1812.04412 [pdf, other]

MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation

Authors: Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi

Abstract: Online ranker evaluation is one of the key challenges in information retrieval. While the preferences of rankers can be inferred by interleaving methods, the problem of how to effectively choose the ranker pair that generates the interleaved list without degrading the user experience too much is still challenging. On the one hand, if two rankers have not been compared enough, the inferred preferen… ▽ More Online ranker evaluation is one of the key challenges in information retrieval. While the preferences of rankers can be inferred by interleaving methods, the problem of how to effectively choose the ranker pair that generates the interleaved list without degrading the user experience too much is still challenging. On the one hand, if two rankers have not been compared enough, the inferred preference can be noisy and inaccurate. On the other, if two rankers are compared too many times, the interleaving process inevitably hurts the user experience too much. This dilemma is known as the exploration versus exploitation tradeoff. It is captured by the $K$-armed dueling bandit problem, which is a variant of the $K$-armed bandit problem, where the feedback comes in the form of pairwise preferences. Today's deployed search systems can evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers is a critical aspect of $K$-armed dueling bandit problems. In this paper, we focus on solving the large-scale online ranker evaluation problem under the so-called Condorcet assumption, where there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that localizes the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms the state-of-the-art dueling bandit algorithms. △ Less

Submitted 9 August, 2020; v1 submitted 11 December, 2018; originally announced December 2018.

Comments: Accepted at TOIS

arXiv:1807.10749 [pdf, other]

Quantum Supremacy Is Both Closer and Farther than It Appears

Authors: Igor L. Markov, Aneeqa Fatima, Sergei V. Isakov, Sergio Boixo

Abstract: As quantum computers improve in the number of qubits and fidelity, the question of when they surpass state-of-the-art classical computation for a well-defined computational task is attracting much attention. The leading candidate task for this milestone entails sampling from the output distribution defined by a random quantum circuit. We develop a massively-parallel simulation tool Rollright that… ▽ More As quantum computers improve in the number of qubits and fidelity, the question of when they surpass state-of-the-art classical computation for a well-defined computational task is attracting much attention. The leading candidate task for this milestone entails sampling from the output distribution defined by a random quantum circuit. We develop a massively-parallel simulation tool Rollright that does not require inter-process communication (IPC) or proprietary hardware. We also develop two ways to trade circuit fidelity for computational speedups, so as to match the fidelity of a given quantum computer --- a task previously thought impossible. We report massive speedups for the sampling task over prior software from Microsoft, IBM, Alibaba and Google, as well as supercomputer and GPU-based simulations. By using publicly available Google Cloud Computing, we price such simulations and enable comparisons by total cost across hardware platforms. We simulate approximate sampling from the output of a circuit with 7x8 qubits and depth 1+40+1 by producing one million bitstring probabilities with fidelity 0.5%, at an estimated cost of $35184. The simulation costs scale linearly with fidelity, and using this scaling we estimate that extending circuit depth to 1+48+1 increases costs to one million dollars. Scaling the simulation to 10M bitstring probabilities needed for sampling 1M bitstrings helps comparing simulation to quantum computers. We describe refinements in benchmarks that slow down leading simulators, halving the circuit depth that can be simulated within the same time. △ Less

Submitted 26 September, 2018; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: 32 pages, 3 figures, 1. A new section on how to simulate sampling. 2. New comparisons with simulators developed by other groups. Edited for clarity

Journal ref: DAC 2020

arXiv:1806.05819 [pdf, other]

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

Authors: Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

Abstract: In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search… ▽ More In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search. However, this approach lacks exploration and thus is limited by the information content of the offline training data. In the online setting, an algorithm can experiment with lists and learn from feedback on them in a sequential fashion. Bandit algorithms are well-suited for this setting but they tend to learn user preferences from scratch, which results in a high initial cost of exploration. This poses an additional challenge of safe exploration in ranked lists. We propose BubbleRank, a bandit algorithm for safe re-ranking that combines the strengths of both the offline and online settings. The algorithm starts with an initial base list and improves it online by gradually exchanging higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive experiments on a large-scale real-world click dataset. △ Less

Submitted 29 June, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

arXiv:1805.03411 [pdf, other]

A Click Sequence Model for Web Search

Authors: Alexey Borisov, Martijn Wardenaar, Ilya Markov, Maarten de Rijke

Abstract: Getting a better understanding of user behavior is important for advancing information retrieval systems. Existing work focuses on modeling and predicting single interaction events, such as clicks. In this paper, we for the first time focus on modeling and predicting sequences of interaction events. And in particular, sequences of clicks. We formulate the problem of click sequence prediction and… ▽ More Getting a better understanding of user behavior is important for advancing information retrieval systems. Existing work focuses on modeling and predicting single interaction events, such as clicks. In this paper, we for the first time focus on modeling and predicting sequences of interaction events. And in particular, sequences of clicks. We formulate the problem of click sequence prediction and propose a click sequence model (CSM) that aims to predict the order in which a user will interact with search engine results. CSM is based on a neural network that follows the encoder-decoder architecture. The encoder computes contextual embeddings of the results. The decoder predicts the sequence of positions of the clicked results. It uses an attention mechanism to extract necessary information about the results at each timestep. We optimize the parameters of CSM by maximizing the likelihood of observed click sequences. We test the effectiveness of CSM on three new tasks: (i) predicting click sequences, (ii) predicting the number of clicks, and (iii) predicting whether or not a user will interact with the results in the order these results are presented on a search engine result page (SERP). Also, we show that CSM achieves state-of-the-art results on a standard click prediction task, where the goal is to predict an unordered set of results a user will click on. △ Less

Submitted 9 May, 2018; originally announced May 2018.

arXiv:1712.03554 [pdf, other]

Simulation of Quantum Circuits via Stabilizer Frames

Authors: Héctor J. García, Igor L. Markov

Abstract: Generic quantum-circuit simulation appears intractable for conventional computers and may be unnecessary because useful quantum circuits exhibit significant structure that can be exploited during simulation. For example, Gottesman and Knill identified an important subclass, called stabilizer circuits, which can be simulated efficiently using group-theory techniques and insights from quantum physic… ▽ More Generic quantum-circuit simulation appears intractable for conventional computers and may be unnecessary because useful quantum circuits exhibit significant structure that can be exploited during simulation. For example, Gottesman and Knill identified an important subclass, called stabilizer circuits, which can be simulated efficiently using group-theory techniques and insights from quantum physics. Realistic circuits enriched with quantum error-correcting codes and fault-tolerant procedures are dominated by stabilizer subcircuits and contain a relatively small number of non-Clifford components. Therefore, we develop new data structures and algorithms that facilitate parallel simulation of such circuits. Stabilizer frames offer more compact storage than previous approaches but require more sophisticated bookkeeping. Our implementation, called Quipu, simulates certain quantum arithmetic circuits (e.g., reversible ripple-carry adders) in polynomial time and space for equal superpositions of $n$-qubits. On such instances, known linear-algebraic simulation techniques, such as the (state-of-the-art) BDD-based simulator QuIDDPro, take exponential time. We simulate quantum Fourier transform and quantum fault-tolerant circuits using Quipu, and the results demonstrate that our stabilizer-based technique empirically outperforms QuIDDPro in all cases. While previous high-performance, structure-aware simulations of quantum circuits were difficult to parallelize, we demonstrate that Quipu can be parallelized with a nontrivial computational speedup. △ Less

Submitted 10 December, 2017; originally announced December 2017.

Comments: 15 pages, 18 figures, 3 tables

Journal ref: IEEE Transactions on Computers, vol. 64, no. 8, 2015

arXiv:1711.07848 [pdf, other]

On the Geometry of Stabilizer States

Authors: Héctor J. García, Igor L. Markov, Andrew W. Cross

Abstract: Large-scale quantum computation is likely to require massive quantum error correction (QEC). QEC codes and circuits are described via the stabilizer formalism, which represents stabilizer states by keeping track of the operators that preserve them. Such states are obtained by stabilizer circuits (consisting of CNOT, Hadamard and Phase gates) and can be represented compactly on conventional compute… ▽ More Large-scale quantum computation is likely to require massive quantum error correction (QEC). QEC codes and circuits are described via the stabilizer formalism, which represents stabilizer states by keeping track of the operators that preserve them. Such states are obtained by stabilizer circuits (consisting of CNOT, Hadamard and Phase gates) and can be represented compactly on conventional computers using $O(n^2)$ bits, where $n$ is the number of qubits. As an additional application, the work by Aaronson and Gottesman suggests the use of superpositions of stabilizer states to represent arbitrary quantum states. To aid in such applications and improve our understanding of stabilizer states, we characterize and count nearest-neighbor stabilizer states, quantify the distribution of angles between pairs of stabilizer states, study succinct stabilizer superpositions and stabilizer bivectors, explore the approximation of non-stabilizer states by single stabilizer states and short linear combinations of stabilizer states, develop an improved inner-product computation for stabilizer states via synthesis of compact canonical stabilizer circuits, propose an orthogonalization procedure for stabilizer states, and evaluate several of these algorithms empirically. △ Less

Submitted 20 November, 2017; originally announced November 2017.

Comments: 38 pages, 10 figures, 2 Appendices. arXiv admin note: substantial text overlap with arXiv:1210.6646

Journal ref: Quantum Information and Computation (QIC), vol. 14, no. 7-8, pp. 683-720, 2014

arXiv:1709.05298 [pdf, other]

Conversational Exploratory Search via Interactive Storytelling

Authors: Svitlana Vakulenko, Ilya Markov, Maarten de Rijke

Abstract: Conversational interfaces are likely to become more efficient, intuitive and engaging way for human-computer interaction than today's text or touch-based interfaces. Current research efforts concerning conversational interfaces focus primarily on question answering functionality, thereby neglecting support for search activities beyond targeted information lookup. Users engage in exploratory search… ▽ More Conversational interfaces are likely to become more efficient, intuitive and engaging way for human-computer interaction than today's text or touch-based interfaces. Current research efforts concerning conversational interfaces focus primarily on question answering functionality, thereby neglecting support for search activities beyond targeted information lookup. Users engage in exploratory search when they are unfamiliar with the domain of their goal, unsure about the ways to achieve their goals, or unsure about their goals in the first place. Exploratory search is often supported by approaches from information visualization. However, such approaches cannot be directly translated to the setting of conversational search. In this paper we investigate the affordances of interactive storytelling as a tool to enable exploratory search within the framework of a conversational interface. Interactive storytelling provides a way to navigate a document collection in the pace and order a user prefers. In our vision, interactive storytelling is to be coupled with a dialogue-based system that provides verbal explanations and responsive design. We discuss challenges and sketch the research agenda required to put this vision into life. △ Less

Submitted 15 September, 2017; originally announced September 2017.

Comments: Accepted at ICTIR'17 Workshop on Search-Oriented Conversational AI (SCAI 2017)

arXiv:1412.0650 [pdf, ps, other]

A review of "Mem-computing NP-complete problems in polynomial time using polynomial resources" (arXiv:1411.4798)

Authors: Igor L. Markov

Abstract: The reviewed paper describes an analog device that empirically solves small instances of the NP-complete Subset Sum Problem (SSP). The authors claim that this device can solve the SSP in polynomial time using polynomial space, in principle, and observe no exponential scaling in resource requirements. We point out that (a) the properties ascribed by the authors to their device are insufficient to s… ▽ More The reviewed paper describes an analog device that empirically solves small instances of the NP-complete Subset Sum Problem (SSP). The authors claim that this device can solve the SSP in polynomial time using polynomial space, in principle, and observe no exponential scaling in resource requirements. We point out that (a) the properties ascribed by the authors to their device are insufficient to solve NP-complete problems in poly-time, (b) runtime analysis offered does not cover the spectral measurement step, (c) the overall technique requires exponentially increasing resources when scaled up because of the spectral measurement step. △ Less

Submitted 22 April, 2015; v1 submitted 29 November, 2014; originally announced December 2014.

arXiv:1408.3821 [pdf, other]

doi 10.1038/nature13570

Limits on Fundamental Limits to Computation

Authors: Igor L. Markov

Abstract: An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help e… ▽ More An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits. △ Less

Submitted 8 January, 2015; v1 submitted 17 August, 2014; originally announced August 2014.

Comments: 15 pages, 4 figures, 1 table

Journal ref: Nature 512, 147-154 (14 August 2014)

arXiv:1304.7516 [pdf, ps, other]

Quantum Circuits for GCD Computation with $O(n \log n)$ Depth and O(n) Ancillae

Authors: Mehdi Saeedi, Igor L. Markov

Abstract: GCD computations and variants of the Euclidean algorithm enjoy broad uses in both classical and quantum algorithms. In this paper, we propose quantum circuits for GCD computation with $O(n \log n)$ depth with O(n) ancillae. Prior circuit construction needs $O(n^2)$ running time with O(n) ancillae. The proposed construction is based on the binary GCD algorithm and it benefits from log-depth circuit… ▽ More GCD computations and variants of the Euclidean algorithm enjoy broad uses in both classical and quantum algorithms. In this paper, we propose quantum circuits for GCD computation with $O(n \log n)$ depth with O(n) ancillae. Prior circuit construction needs $O(n^2)$ running time with O(n) ancillae. The proposed construction is based on the binary GCD algorithm and it benefits from log-depth circuits for 1-bit shift, comparison/subtraction, and managing ancillae. The worst-case gate count remains $O(n^2)$, as in traditional circuits. △ Less

Submitted 28 April, 2013; originally announced April 2013.

Comments: 5 pages, 6 figures, 1 table

Showing 1–50 of 62 results for author: Markov, I