Search | arXiv e-print repository

MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs

Authors: Quang H. Nguyen, Duy C. Hoang, Juliette Decugis, Saurav Manchanda, Nitesh V. Chawla, Khoa D. Doan

Abstract: The rapid progress in machine learning (ML) has brought forth many large language models (LLMs) that excel in various tasks and areas. These LLMs come with different abilities and costs in terms of computation or pricing. Since the demand for each query can vary, e.g., because of the queried domain or its complexity, defaulting to one LLM in an application is not usually the best choice, whether i… ▽ More The rapid progress in machine learning (ML) has brought forth many large language models (LLMs) that excel in various tasks and areas. These LLMs come with different abilities and costs in terms of computation or pricing. Since the demand for each query can vary, e.g., because of the queried domain or its complexity, defaulting to one LLM in an application is not usually the best choice, whether it is the biggest, priciest, or even the one with the best average test performance. Consequently, picking the right LLM that is both accurate and cost-effective for an application remains a challenge. In this paper, we introduce MetaLLM, a framework that dynamically and intelligently routes each query to the optimal LLM (among several available LLMs) for classification tasks, achieving significantly improved accuracy and cost-effectiveness. By framing the selection problem as a multi-armed bandit, MetaLLM balances prediction accuracy and cost efficiency under uncertainty. Our experiments, conducted on popular LLM platforms such as OpenAI's GPT models, Amazon's Titan, Anthropic's Claude, and Meta's LLaMa, showcase MetaLLM's efficacy in real-world scenarios, laying the groundwork for future extensions beyond classification tasks. △ Less

Submitted 24 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.03792 [pdf, other]

NeuroSteiner: A Graph Transformer for Wirelength Estimation

Authors: Sahil Manchanda, Dana Kianfar, Markus Peschl, Romain Lepert, Michaël Defferrard

Abstract: A core objective of physical design is to minimize wirelength (WL) when placing chip components on a canvas. Computing the minimal WL of a placement requires finding rectilinear Steiner minimum trees (RSMTs), an NP-hard problem. We propose NeuroSteiner, a neural model that distills GeoSteiner, an optimal RSMT solver, to navigate the cost--accuracy frontier of WL estimation. NeuroSteiner is trained… ▽ More A core objective of physical design is to minimize wirelength (WL) when placing chip components on a canvas. Computing the minimal WL of a placement requires finding rectilinear Steiner minimum trees (RSMTs), an NP-hard problem. We propose NeuroSteiner, a neural model that distills GeoSteiner, an optimal RSMT solver, to navigate the cost--accuracy frontier of WL estimation. NeuroSteiner is trained on synthesized nets labeled by GeoSteiner, alleviating the need to train on real chip designs. Moreover, NeuroSteiner's differentiability allows to place by minimizing WL through gradient descent. On ISPD 2005 and 2019, NeuroSteiner can obtain 0.3% WL error while being 60% faster than GeoSteiner, or 0.2% and 30%. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: Work-in-Progress poster at the 2024 Design and Automation Conference (DAC'24)

arXiv:2312.05571 [pdf, other]

Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

Authors: Subhabrata Dutta, Joykirat Singh, Ishan Pandey, Sunny Manchanda, Soumen Chakrabarti, Tanmoy Chakraborty

Abstract: Large Language Models (LLM) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thoughts (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs with exorbitant sizes (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be int… ▽ More Large Language Models (LLM) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thoughts (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs with exorbitant sizes (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be introduced around the same time as arithmetic word problems to modularize language-to-formulation, symbolic manipulation of the formulation, and endgame arithmetic. In this paper, we start with the hypothesis that much smaller LMs, which are weak at multi-step reasoning, can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task. In our architecture, which we call SYRELM, the LM serves the role of a translator to map natural language arithmetic questions into a formal language (FL) description. A symbolic solver then evaluates the FL expression to obtain the answer. A small frozen LM, equipped with an efficient low-rank adapter, is capable of generating FL expressions that incorporate natural language descriptions of the arithmetic problem (e.g., variable names and their purposes, formal expressions combining variables, etc.). We adopt policy-gradient reinforcement learning to train the adapted LM, informed by the non-differentiable symbolic solver. This marks a sharp departure from the recent development in tool-augmented LLMs, in which the external tools (e.g., calculator, Web search, etc.) are essentially detached from the learning phase of the LM. SYRELM shows massive improvements (e.g., +30.65 absolute point improvement in accuracy on the SVAMP dataset using GPT-J 6B model) over base LMs, while keeping our testbed easy to diagnose, interpret and within reach of most researchers. △ Less

Submitted 19 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2311.14459 [pdf, other]

IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather

Authors: Furqan Ahmed Shaik, Abhishek Malreddy, Nikhil Reddy Billa, Kunal Chaudhary, Sunny Manchanda, Girish Varma

Abstract: Large-scale deployment of fully autonomous vehicles requires a very high degree of robustness to unstructured traffic, and weather conditions, and should prevent unsafe mispredictions. While there are several datasets and benchmarks focusing on segmentation for drive scenes, they are not specifically focused on safety and robustness issues. We introduce the IDD-AW dataset, which provides 5000 pair… ▽ More Large-scale deployment of fully autonomous vehicles requires a very high degree of robustness to unstructured traffic, and weather conditions, and should prevent unsafe mispredictions. While there are several datasets and benchmarks focusing on segmentation for drive scenes, they are not specifically focused on safety and robustness issues. We introduce the IDD-AW dataset, which provides 5000 pairs of high-quality images with pixel-level annotations, captured under rain, fog, low light, and snow in unstructured driving conditions. As compared to other adverse weather datasets, we provide i.) more annotated images, ii.) paired Near-Infrared (NIR) image for each frame, iii.) larger label set with a 4-level label hierarchy to capture unstructured traffic conditions. We benchmark state-of-the-art models for semantic segmentation in IDD-AW. We also propose a new metric called ''Safe mean Intersection over Union (Safe mIoU)'' for hierarchical datasets which penalizes dangerous mispredictions that are not captured in the traditional definition of mean Intersection over Union (mIoU). The results show that IDD-AW is one of the most challenging datasets to date for these tasks. The dataset and code will be available here: http://iddaw.github.io. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 8 pages excluding references. Accepted in WACV 2024

arXiv:2310.18338 [pdf, other]

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning

Authors: Gurusha Juneja, Subhabrata Dutta, Soumen Chakrabarti, Sunny Manchanda, Tanmoy Chakraborty

Abstract: Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making a… ▽ More Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilites, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone. △ Less

Submitted 27 February, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (Typos corrected)

arXiv:2310.11787 [pdf, other]

NeuroCUT: A Neural Approach for Robust Graph Partitioning

Authors: Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sourav Medya, Sayan Ranu

Abstract: Graph partitioning aims to divide a graph into disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. Conventional methods, like approximation algorithms or heuristics, are designed for distinct partitioning objectives and fail to achieve generalization across other impor… ▽ More Graph partitioning aims to divide a graph into disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. Conventional methods, like approximation algorithms or heuristics, are designed for distinct partitioning objectives and fail to achieve generalization across other important partitioning objectives. Recently machine learning-based methods have been developed that learn directly from data. Further, these methods have a distinct advantage of utilizing node features that carry additional information. However, these methods assume differentiability of target partitioning objective functions and cannot generalize for an unknown number of partitions, i.e., they assume the number of partitions is provided in advance. In this study, we develop NeuroCUT with two key innovations over previous methodologies. First, by leveraging a reinforcement learning-based framework over node representations derived from a graph neural network and positional features, NeuroCUT can accommodate any optimization objective, even those with non-differentiable functions. Second, we decouple the parameter space and the partition count making NeuroCUT inductive to any unseen number of partition, which is provided at query time. Through empirical evaluation, we demonstrate that NeuroCUT excels in identifying high-quality partitions, showcases strong generalization across a wide spectrum of partitioning objectives, and exhibits strong generalization to unseen partition count. △ Less

Submitted 21 June, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: To appear in Knowledge Discovery and Data Mining(KDD), 2024

arXiv:2310.09486 [pdf, other]

Mirage: Model-Agnostic Graph Distillation for Graph Classification

Authors: Mridul Gupta, Sahil Manchanda, Hariprasad Kodamana, Sayan Ranu

Abstract: GNNs, like other deep learning models, are data and computation hungry. There is a pressing need to scale training of GNNs on large datasets to enable their usage on low-resource environments. Graph distillation is an effort in that direction with the aim to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initia… ▽ More GNNs, like other deep learning models, are data and computation hungry. There is a pressing need to scale training of GNNs on large datasets to enable their usage on low-resource environments. Graph distillation is an effort in that direction with the aim to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initial efforts are promising, this work is motivated by two key observations: (1) Existing graph distillation algorithms themselves rely on training with the full dataset, which undermines the very premise of graph distillation. (2) The distillation process is specific to the target GNN architecture and hyper-parameters and thus not robust to changes in the modeling pipeline. We circumvent these limitations by designing a distillation algorithm called Mirage for graph classification. Mirage is built on the insight that a message-passing GNN decomposes the input graph into a multiset of computation trees. Furthermore, the frequency distribution of computation trees is often skewed in nature, enabling us to condense this data into a concise distilled summary. By compressing the computation data itself, as opposed to emulating gradient flows on the original training set-a prevalent approach to date-Mirage transforms into an unsupervised and architecture-agnostic distillation algorithm. Extensive benchmarking on real-world datasets underscores Mirage's superiority, showcasing enhanced generalization accuracy, data compression, and distillation efficiency when compared to state-of-the-art baselines. △ Less

Submitted 31 March, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: 14 pages, 14 figures

arXiv:2310.01452 [pdf, other]

Fooling the Textual Fooler via Randomizing Latent Representations

Authors: Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan

Abstract: Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the mo… ▽ More Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the model architecture or model parameters and thus can be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and to replace these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Different from existing defenses, AdvFooler does not necessitate additional computational overhead during training nor relies on assumptions about the potential adversarial perturbation set while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the significance of robustness resulting from confusing the adversary via randomizing the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks on two benchmark datasets. △ Less

Submitted 9 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of ACL 2024

arXiv:2306.03480 [pdf, other]

GSHOT: Few-shot Generative Modeling of Labeled Graphs

Authors: Sahil Manchanda, Shubham Gupta, Sayan Ranu, Srikanta Bedathur

Abstract: Deep graph generative modeling has gained enormous attraction in recent years due to its impressive ability to directly learn the underlying hidden graph distribution. Despite their initial success, these techniques, like much of the existing deep generative methods, require a large number of training samples to learn a good model. Unfortunately, large number of training samples may not always be… ▽ More Deep graph generative modeling has gained enormous attraction in recent years due to its impressive ability to directly learn the underlying hidden graph distribution. Despite their initial success, these techniques, like much of the existing deep generative methods, require a large number of training samples to learn a good model. Unfortunately, large number of training samples may not always be available in scenarios such as drug discovery for rare diseases. At the same time, recent advances in few-shot learning have opened door to applications where available training data is limited. In this work, we introduce the hitherto unexplored paradigm of few-shot graph generative modeling. Towards this, we develop GSHOT, a meta-learning based framework for few-shot labeled graph generative modeling. GSHOT learns to transfer meta-knowledge from similar auxiliary graph datasets. Utilizing these prior experiences, GSHOT quickly adapts to an unseen graph dataset through self-paced fine-tuning. Through extensive experiments on datasets from diverse domains having limited training samples, we establish that GSHOT generates graphs of superior fidelity compared to existing baselines. △ Less

Submitted 14 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: Accepted in Learning on Graph Conference (LOG,2023),https://openreview.net/forum?id=Hy9K2WiVwW

arXiv:2306.03447 [pdf, other]

GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets

Authors: Shubham Gupta, Sahil Manchanda, Sayan Ranu, Srikanta Bedathur

Abstract: Graph neural networks (GNNs), in general, are built on the assumption of a static set of features characterizing each node in a graph. This assumption is often violated in practice. Existing methods partly address this issue through feature imputation. However, these techniques (i) assume uniformity of feature set across nodes, (ii) are transductive by nature, and (iii) fail to work when features… ▽ More Graph neural networks (GNNs), in general, are built on the assumption of a static set of features characterizing each node in a graph. This assumption is often violated in practice. Existing methods partly address this issue through feature imputation. However, these techniques (i) assume uniformity of feature set across nodes, (ii) are transductive by nature, and (iii) fail to work when features are added or removed over time. In this work, we address these limitations through a novel GNN framework called GRAFENNE. GRAFENNE performs a novel allotropic transformation on the original graph, wherein the nodes and features are decoupled through a bipartite encoding. Through a carefully chosen message passing framework on the allotropic transformation, we make the model parameter size independent of the number of features and thereby inductive to both unseen nodes and features. We prove that GRAFENNE is at least as expressive as any of the existing message-passing GNNs in terms of Weisfeiler-Leman tests, and therefore, the additional inductivity to unseen features does not come at the cost of expressivity. In addition, as demonstrated over four real-world graphs, GRAFENNE empowers the underlying GNN with high empirical efficacy and the ability to learn in continual fashion over streaming feature sets. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: 17 pages, 4 figures and 9 tables. Accepted in ICML 2023, DOI will be updated once it is available

arXiv:2301.12477 [pdf, other]

StriderNET: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes

Authors: Vaibhav Bihani, Sahil Manchanda, Srikanth Sastry, Sayan Ranu, N. M. Anoop Krishnan

Abstract: Optimization of atomic structures presents a challenging problem, due to their highly rough and non-convex energy landscape, with wide applications in the fields of drug design, materials discovery, and mechanics. Here, we present a graph reinforcement learning approach, StriderNET, that learns a policy to displace the atoms towards low energy configurations. We evaluate the performance of Strider… ▽ More Optimization of atomic structures presents a challenging problem, due to their highly rough and non-convex energy landscape, with wide applications in the fields of drug design, materials discovery, and mechanics. Here, we present a graph reinforcement learning approach, StriderNET, that learns a policy to displace the atoms towards low energy configurations. We evaluate the performance of StriderNET on three complex atomic systems, namely, binary Lennard-Jones particles, calcium silicate hydrates gel, and disordered silicon. We show that StriderNET outperforms all classical optimization algorithms and enables the discovery of a lower energy minimum. In addition, StriderNET exhibits a higher rate of reaching minima with energies, as confirmed by the average over multiple realizations. Finally, we show that StriderNET exhibits inductivity to unseen system sizes that are an order of magnitude different from the training system. △ Less

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2209.08769 [pdf, other]

Walk-and-Relate: A Random-Walk-based Algorithm for Representation Learning on Sparse Knowledge Graphs

Authors: Saurav Manchanda

Abstract: Knowledge graph (KG) embedding techniques use structured relationships between entities to learn low-dimensional representations of entities and relations. The traditional KG embedding techniques (such as TransE and DistMult) estimate these embeddings via simple models developed over observed KG triplets. These approaches differ in their triplet scoring loss functions. As these models only use the… ▽ More Knowledge graph (KG) embedding techniques use structured relationships between entities to learn low-dimensional representations of entities and relations. The traditional KG embedding techniques (such as TransE and DistMult) estimate these embeddings via simple models developed over observed KG triplets. These approaches differ in their triplet scoring loss functions. As these models only use the observed triplets to estimate the embeddings, they are prone to suffer through data sparsity that usually occurs in the real-world knowledge graphs, i.e., the lack of enough triplets per entity. To settle this issue, we propose an efficient method to augment the number of triplets to address the problem of data sparsity. We use random walks to create additional triplets, such that the relations carried by these introduced triplets entail the metapath induced by the random walks. We also provide approaches to accurately and efficiently filter out informative metapaths from the possible set of metapaths, induced by the random walks. The proposed approaches are model-agnostic, and the augmented training dataset can be used with any KG embedding approach out of the box. Experimental results obtained on the benchmark datasets show the advantages of the proposed approach. △ Less

Submitted 14 October, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.05555 [pdf]

An Embedding-Based Grocery Search Model at Instacart

Authors: Yuqing Xie, Taesik Na, Xiao Xiao, Saurav Manchanda, Young Rao, Zhihong Xu, Guanghua Shu, Esther Vasiete, Tejaswi Tenneti, Haixun Wang

Abstract: The key to e-commerce search is how to best utilize the large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propo… ▽ More The key to e-commerce search is how to best utilize the large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propose a self-adversarial learning method and a cascade training method. AccOn an offline human evaluation dataset, we achieve 10% relative improvement in RECALL@20, and for online A/B testing, we achieve 4.1% cart-adds per search (CAPS) and 1.5% gross merchandise value (GMV) improvement. We describe how we train and deploy the embedding based search model and give a detailed analysis of the effectiveness of our method. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: Accepted by SIGIR eCom, July 15, 2022

arXiv:2208.12226 [pdf, other]

doi 10.1609/aaai.v37i7.26086

Lifelong Learning for Neural powered Mixed Integer Programming

Authors: Sahil Manchanda, Sayan Ranu

Abstract: Mixed Integer programs (MIPs) are typically solved by the Branch-and-Bound algorithm. Recently, Learning to imitate fast approximations of the expert strong branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single session of training. This ass… ▽ More Mixed Integer programs (MIPs) are typically solved by the Branch-and-Bound algorithm. Recently, Learning to imitate fast approximations of the expert strong branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single session of training. This assumption is often not true, and if the training data is supplied in continual fashion over time, existing techniques suffer from catastrophic forgetting. In this work, we study the hitherto unexplored paradigm of Lifelong Learning to Branch on Mixed Integer Programs. To mitigate catastrophic forgetting, we propose LIMIP, which is powered by the idea of modeling an MIP instance in the form of a bipartite graph, which we map to an embedding space using a bipartite Graph Attention Network. This rich embedding space avoids catastrophic forgetting through the application of knowledge distillation and elastic weight consolidation, wherein we learn the parameters key towards retaining efficacy and are therefore protected from significant drift. We evaluate LIMIP on a series of NP-hard problems and establish that in comparison to existing baselines, LIMIP is up to 50% better when confronted with lifelong learning. △ Less

Submitted 30 June, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 37(7),9047-9054, 2023

arXiv:2206.00787 [pdf, other]

On the Generalization of Neural Combinatorial Optimization Heuristics

Authors: Sahil Manchanda, Sofia Michel, Darko Drakulic, Jean-Marc Andreoli

Abstract: Neural Combinatorial Optimization approaches have recently leveraged the expressiveness and flexibility of deep neural networks to learn efficient heuristics for hard Combinatorial Optimization (CO) problems. However, most of the current methods lack generalization: for a given CO problem, heuristics which are trained on instances with certain characteristics underperform when tested on instances… ▽ More Neural Combinatorial Optimization approaches have recently leveraged the expressiveness and flexibility of deep neural networks to learn efficient heuristics for hard Combinatorial Optimization (CO) problems. However, most of the current methods lack generalization: for a given CO problem, heuristics which are trained on instances with certain characteristics underperform when tested on instances with different characteristics. While some previous works have focused on varying the training instances properties, we postulate that a one-size-fit-all model is out of reach. Instead, we formalize solving a CO problem over a given instance distribution as a separate learning task and investigate meta-learning techniques to learn a model on a variety of tasks, in order to optimize its capacity to adapt to new tasks. Through extensive experiments, on two CO problems, using both synthetic and realistic instances, we show that our proposed meta-learning approach significantly improves the generalization of two state-of-the-art models. △ Less

Submitted 3 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Published in ECML PKDD 2022

arXiv:2203.03564 [pdf, other]

doi 10.1609/aaai.v36i6.20638

TIGGER: Scalable Generative Modelling for Temporal Interaction Graphs

Authors: Shubham Gupta, Sahil Manchanda, Srikanta Bedathur, Sayan Ranu

Abstract: There has been a recent surge in learning generative models for graphs. While impressive progress has been made on static graphs, work on generative modeling of temporal graphs is at a nascent stage with significant scope for improvement. First, existing generative models do not scale with either the time horizon or the number of nodes. Second, existing techniques are transductive in nature and th… ▽ More There has been a recent surge in learning generative models for graphs. While impressive progress has been made on static graphs, work on generative modeling of temporal graphs is at a nascent stage with significant scope for improvement. First, existing generative models do not scale with either the time horizon or the number of nodes. Second, existing techniques are transductive in nature and thus do not facilitate knowledge transfer. Finally, due to relying on one-to-one node mapping from source to the generated graph, existing models leak node identity information and do not allow up-scaling/down-scaling the source graph size. In this paper, we bridge these gaps with a novel generative model called TIGGER. TIGGER derives its power through a combination of temporal point processes with auto-regressive modeling enabling both transductive and inductive variants. Through extensive experiments on real datasets, we establish TIGGER generates graphs of superior fidelity, while also being up to 3 orders of magnitude faster than the state-of-the-art. △ Less

Submitted 8 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: To be published in AAAI-2022, additionally contains technical appendices/supplementary material

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 36(6), 6819-6828,2022

arXiv:2112.01845 [pdf, other]

Semantic Map Injected GAN Training for Image-to-Image Translation

Authors: Balaram Singh Kshatriya, Shiv Ram Dubey, Himangshu Sarma, Kunal Chaudhary, Meva Ram Gurjar, Rahul Rai, Sunny Manchanda

Abstract: Image-to-image translation is the recent trend to transform images from one domain to another domain using generative adversarial network (GAN). The existing GAN models perform the training by only utilizing the input and output modalities of transformation. In this paper, we perform the semantic injected training of GAN models. Specifically, we train with original input and output modalities and… ▽ More Image-to-image translation is the recent trend to transform images from one domain to another domain using generative adversarial network (GAN). The existing GAN models perform the training by only utilizing the input and output modalities of transformation. In this paper, we perform the semantic injected training of GAN models. Specifically, we train with original input and output modalities and inject a few epochs of training for translation from input to semantic map. Lets refer the original training as the training for the translation of input image into target domain. The injection of semantic training in the original training improves the generalization capability of the trained GAN model. Moreover, it also preserves the categorical information in a better way in the generated image. The semantic map is only utilized at the training time and is not required at the test time. The experiments are performed using state-of-the-art GAN models over CityScapes and RGB-NIR stereo datasets. We observe the improved performance in terms of the SSIM, FID and KID scores after injecting semantic training as compared to original training. △ Less

Submitted 3 December, 2021; originally announced December 2021.

Comments: Accepted in Fourth Workshop on Computer Vision Applications (WCVA) at ICVGIP 2021

arXiv:2105.00644 [pdf, other]

Schema-Aware Deep Graph Convolutional Networks for Heterogeneous Graphs

Authors: Saurav Manchanda, Da Zheng, George Karypis

Abstract: Graph convolutional network (GCN) based approaches have achieved significant progress for solving complex, graph-structured problems. GCNs incorporate the graph structure information and the node (or edge) features through message passing and computes 'deep' node representations. Despite significant progress in the field, designing GCN architectures for heterogeneous graphs still remains an open c… ▽ More Graph convolutional network (GCN) based approaches have achieved significant progress for solving complex, graph-structured problems. GCNs incorporate the graph structure information and the node (or edge) features through message passing and computes 'deep' node representations. Despite significant progress in the field, designing GCN architectures for heterogeneous graphs still remains an open challenge. Due to the schema of a heterogeneous graph, useful information may reside multiple hops away. A key question is how to perform message passing to incorporate information of neighbors multiple hops away while avoiding the well-known over-smoothing problem in GCNs. To address this question, we propose our GCN framework 'Deep Heterogeneous Graph Convolutional Network (DHGCN)', which takes advantage of the schema of a heterogeneous graph and uses a hierarchical approach to effectively utilize information many hops away. It first computes representations of the target nodes based on their 'schema-derived ego-network' (SEN). It then links the nodes of the same type with various pre-defined metapaths and performs message passing along these links to compute final node representations. Our design choices naturally capture the way a heterogeneous graph is generated from the schema. The experimental results on real and synthetic datasets corroborate the design choice and illustrate the performance gains relative to competing alternatives. △ Less

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2012.08134 [pdf, other]

Distant-Supervised Slot-Filling for E-Commerce Queries

Authors: Saurav Manchanda, Mohit Sharma, George Karypis

Abstract: Slot-filling refers to the task of annotating individual terms in a query with the corresponding intended product characteristics (product type, brand, gender, size, color, etc.). These characteristics can then be used by a search engine to return results that better match the query's product intent. Traditional methods for slot-filling require the availability of training data with ground truth s… ▽ More Slot-filling refers to the task of annotating individual terms in a query with the corresponding intended product characteristics (product type, brand, gender, size, color, etc.). These characteristics can then be used by a search engine to return results that better match the query's product intent. Traditional methods for slot-filling require the availability of training data with ground truth slot-annotation information. However, generating such labeled data, especially in e-commerce is expensive and time-consuming because the number of slots increases as new products are added. In this paper, we present distant-supervised probabilistic generative models, that require no manual annotation. The proposed approaches leverage the readily available historical query logs and the purchases that these queries led to, and also exploit co-occurrence information among the slots in order to identify intended product characteristics. We evaluate our approaches by considering how they affect retrieval performance, as well as how well they classify the slots. In terms of retrieval, our approaches achieve better ranking performance (up to 156%) over Okapi BM25. Moreover, our approach that leverages co-occurrence information leads to better performance than the one that does not on both the retrieval and slot classification tasks. △ Less

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2003.11774 [pdf, other]

Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space

Authors: Khoa D. Doan, Saurav Manchanda, Fengjiao Wang, Sathiya Keerthi, Avradeep Bhowmik, Chandan K. Reddy

Abstract: For a given image generation problem, the intrinsic image manifold is often low dimensional. We use the intuition that it is much better to train the GAN generator by minimizing the distributional distance between real and generated images in a small dimensional feature space representing such a manifold than on the original pixel-space. We use the feature space of the GAN discriminator for such a… ▽ More For a given image generation problem, the intrinsic image manifold is often low dimensional. We use the intuition that it is much better to train the GAN generator by minimizing the distributional distance between real and generated images in a small dimensional feature space representing such a manifold than on the original pixel-space. We use the feature space of the GAN discriminator for such a representation. For distributional distance, we employ one of two choices: the Fréchet distance or direct optimal transport (OT); these respectively lead us to two new GAN methods: Fréchet-GAN and OT-GAN. The idea of employing Fréchet distance comes from the success of Fréchet Inception Distance as a solid evaluation metric in image generation. Fréchet-GAN is attractive in several ways. We propose an efficient, numerically stable approach to calculate the Fréchet distance and its gradient. The Fréchet distance estimation requires a significantly less computation time than OT; this allows Fréchet-GAN to use much larger mini-batch size in training than OT. More importantly, we conduct experiments on a number of benchmark datasets and show that Fréchet-GAN (in particular) and OT-GAN have significantly better image generation capabilities than the existing representative primal and dual GAN approaches based on the Wasserstein distance. △ Less

Submitted 30 March, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

arXiv:2003.01296 [pdf, other]

Regression via Implicit Models and Optimal Transport Cost Minimization

Authors: Saurav Manchanda, Khoa Doan, Pranjul Yadav, S. Sathiya Keerthi

Abstract: This paper addresses the classic problem of regression, which involves the inductive learning of a map, $y=f(x,z)$, $z$ denoting noise, $f:\mathbb{R}^n\times \mathbb{R}^k \rightarrow \mathbb{R}^m$. Recently, Conditional GAN (CGAN) has been applied for regression and has shown to be advantageous over the other standard approaches like Gaussian Process Regression, given its ability to implicitly mod… ▽ More This paper addresses the classic problem of regression, which involves the inductive learning of a map, $y=f(x,z)$, $z$ denoting noise, $f:\mathbb{R}^n\times \mathbb{R}^k \rightarrow \mathbb{R}^m$. Recently, Conditional GAN (CGAN) has been applied for regression and has shown to be advantageous over the other standard approaches like Gaussian Process Regression, given its ability to implicitly model complex noise forms. However, the current CGAN implementation for regression uses the classical generator-discriminator architecture with the minimax optimization approach, which is notorious for being difficult to train due to issues like training instability or failure to converge. In this paper, we take another step towards regression models that implicitly model the noise, and propose a solution which directly optimizes the optimal transport cost between the true probability distribution $p(y|x)$ and the estimated distribution $\hat{p}(y|x)$ and does not suffer from the issues associated with the minimax approach. On a variety of synthetic and real-world datasets, our proposed solution achieves state-of-the-art results. The code accompanying this paper is available at "https://github.com/gurdaspuriya/ot_regression". △ Less

Submitted 2 March, 2020; originally announced March 2020.

arXiv:2003.00134 [pdf, other]

Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance

Authors: Khoa D. Doan, Saurav Manchanda, Sarkhan Badirli, Chandan K. Reddy

Abstract: Image hashing is one of the fundamental problems that demand both efficient and effective solutions for various practical scenarios. Adversarial autoencoders are shown to be able to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes. However, the existing adversarial hashing methods are inefficient to be employed for large-scale image r… ▽ More Image hashing is one of the fundamental problems that demand both efficient and effective solutions for various practical scenarios. Adversarial autoencoders are shown to be able to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes. However, the existing adversarial hashing methods are inefficient to be employed for large-scale image retrieval applications. Specifically, they require an exponential number of samples to be able to generate optimal hash codes and a significantly high computational cost to train. In this paper, we show that the high sample-complexity requirement often results in sub-optimal retrieval performance of the adversarial hashing methods. To address this challenge, we propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost. Specifically, by exploiting the desired properties of the hash function in the low-dimensional, discrete space, our method efficiently estimates a better variant of Wasserstein distance by averaging a set of easy-to-compute one-dimensional Wasserstein distances. The resulting hashing approach has an order-of-magnitude better sample complexity, thus better generalization property, compared to the other adversarial hashing methods. In addition, the computational cost is significantly reduced using our approach. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing hashing methods, achieving up to 10% improvement over the current state-of-the-art image hashing methods. The code accompanying this paper is available on Github (https://github.com/khoadoan/adversarial-hashing). △ Less

Submitted 25 May, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

arXiv:2002.02879 [pdf, other]

Targeted display advertising: the case of preferential attachment

Authors: Saurav Manchanda, Pranjul Yadav, Khoa Doan, S. Sathiya Keerthi

Abstract: An average adult is exposed to hundreds of digital advertisements daily (https://www.mediadynamicsinc.com/uploads/files/PR092214-Note-only-150-Ads-2mk.pdf), making the digital advertisement industry a classic example of a big-data-driven platform. As such, the ad-tech industry relies on historical engagement logs (clicks or purchases) to identify potentially interested users for the advertisement… ▽ More An average adult is exposed to hundreds of digital advertisements daily (https://www.mediadynamicsinc.com/uploads/files/PR092214-Note-only-150-Ads-2mk.pdf), making the digital advertisement industry a classic example of a big-data-driven platform. As such, the ad-tech industry relies on historical engagement logs (clicks or purchases) to identify potentially interested users for the advertisement campaign of a partner (a seller who wants to target users for its products). The number of advertisements that are shown for a partner, and hence the historical campaign data available for a partner depends upon the budget constraints of the partner. Thus, enough data can be collected for the high-budget partners to make accurate predictions, while this is not the case with the low-budget partners. This skewed distribution of the data leads to "preferential attachment" of the targeted display advertising platforms towards the high-budget partners. In this paper, we develop "domain-adaptation" approaches to address the challenge of predicting interested users for the partners with insufficient data, i.e., the tail partners. Specifically, we develop simple yet effective approaches that leverage the similarity among the partners to transfer information from the partners with sufficient data to cold-start partners, i.e., partners without any campaign data. Our approaches readily adapt to the new campaign data by incremental fine-tuning, and hence work at varying points of a campaign, and not just the cold-start. We present an experimental analysis on the historical logs of a major display advertising platform (https://www.criteo.com/). Specifically, we evaluate our approaches across 149 partners, at varying points of their campaigns. Experimental results show that the proposed approaches outperform the other "domain-adaptation" approaches at different time points of the campaigns. △ Less

Submitted 7 February, 2020; originally announced February 2020.

Comments: IEEE BigData 2019 paper

arXiv:2001.08169 [pdf, other]

AppStreamer: Reducing Storage Requirements of Mobile Games through Predictive Streaming

Authors: Nawanol Theera-Ampornpunt, Shikhar Suryavansh, Sameer Manchanda, Rajesh Panta, Kaustubh Joshi, Mostafa Ammar, Mung Chiang, Saurabh Bagchi

Abstract: Storage has become a constrained resource on smartphones. Gaming is a popular activity on mobile devices and the explosive growth in the number of games coupled with their growing size contributes to the storage crunch. Even where storage is plentiful, it takes a long time to download and install a heavy app before it can be launched. This paper presents AppStreamer, a novel technique for reducing… ▽ More Storage has become a constrained resource on smartphones. Gaming is a popular activity on mobile devices and the explosive growth in the number of games coupled with their growing size contributes to the storage crunch. Even where storage is plentiful, it takes a long time to download and install a heavy app before it can be launched. This paper presents AppStreamer, a novel technique for reducing the storage requirements or startup delay of mobile games, and heavy mobile apps in general. AppStreamer is based on the intuition that most apps do not need the entirety of its files (images, audio and video clips, etc.) at any one time. AppStreamer can, therefore, keep only a small part of the files on the device, akin to a "cache", and download the remainder from a cloud storage server or a nearby edge server when it predicts that the app will need them in the near future. AppStreamer continuously predicts file blocks for the near future as the user uses the app, and fetches them from the storage server before the user sees a stall due to missing resources. We implement AppStreamer at the Android file system layer. This ensures that the apps require no source code or modification, and the approach generalizes across apps. We evaluate AppStreamer using two popular games: Dead Effect 2, a 3D first-person shooter, and Fire Emblem Heroes, a 2D turn-based strategy role-playing game. Through a user study, 75% and 87% of the users respectively find that AppStreamer provides the same quality of user experience as the baseline where all files are stored on the device. AppStreamer cuts down the storage requirement by 87% for Dead Effect 2 and 86% for Fire Emblem Heroes. △ Less

Submitted 16 December, 2019; originally announced January 2020.

Comments: 12 pages; EWSN 2020

arXiv:2001.03386 [pdf]

SUPAID: A Rule mining based method for automatic rollout decision aid for supervisors in fleet management systems

Authors: Sahil Manchanda, Arun Rajkumar, Simarjot Kaur, Narayanan Unny

Abstract: The decision to rollout a vehicle is critical to fleet management companies as wrong decisions can lead to additional cost of maintenance and failures during journey. With the availability of large amount of data and advancement of machine learning techniques, the rollout decisions of a supervisor can be effectively automated and the mistakes in decisions made by the supervisor learnt. In this pap… ▽ More The decision to rollout a vehicle is critical to fleet management companies as wrong decisions can lead to additional cost of maintenance and failures during journey. With the availability of large amount of data and advancement of machine learning techniques, the rollout decisions of a supervisor can be effectively automated and the mistakes in decisions made by the supervisor learnt. In this paper, we propose a novel learning algorithm SUPAID which under a natural 'one-way efficiency' assumption on the supervisor, uses a rule mining approach to rank the vehicles based on their roll-out feasibility thus helping prevent the supervisor from makingerroneous decisions. Our experimental results on real data from a public transit agency from a city in U.S show that the proposed method SUPAID can result in significant cost savings. △ Less

Submitted 15 January, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

arXiv:1911.11358 [pdf, other]

CAWA: An Attention-Network for Credit Attribution

Authors: Saurav Manchanda, George Karypis

Abstract: Credit attribution is the task of associating individual parts in a document with their most appropriate class labels. It is an important task with applications to information retrieval and text summarization. When labeled training data is available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consu… ▽ More Credit attribution is the task of associating individual parts in a document with their most appropriate class labels. It is an important task with applications to information retrieval and text summarization. When labeled training data is available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consuming. In this paper, we present "Credit Attribution With Attention (CAWA)", a neural-network-based approach, that instead of using sentence-level labeled data, uses the set of class labels that are associated with an entire document as a source of distant-supervision. CAWA combines an attention mechanism with a multilabel classifier into an end-to-end learning framework to perform credit attribution. CAWA labels the individual sentences from the input document using the resultant attention-weights. CAWA improves upon the state-of-the-art credit attribution approach by not constraining a sentence to belong to just one class, but modeling each sentence as a distribution over all classes, leading to better modeling of semantically-similar classes. Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform the competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches. △ Less

Submitted 26 November, 2019; originally announced November 2019.

Comments: To appear in AAAI 2020

arXiv:1908.08564 [pdf, other]

Intent term selection and refinement in e-commerce queries

Authors: Saurav Manchanda, Mohit Sharma, George Karypis

Abstract: In e-commerce, a user tends to search for the desired product by issuing a query to the search engine and examining the retrieved results. If the search engine was successful in correctly understanding the user's query, it will return results that correspond to the products whose attributes match the terms in the query that are representative of the query's product intent. However, the search engi… ▽ More In e-commerce, a user tends to search for the desired product by issuing a query to the search engine and examining the retrieved results. If the search engine was successful in correctly understanding the user's query, it will return results that correspond to the products whose attributes match the terms in the query that are representative of the query's product intent. However, the search engine may fail to retrieve results that satisfy the query's product intent and thus degrading user experience due to different issues in query processing: (i) when multiple terms are present in a query it may fail to determine the relevant terms that are representative of the query's product intent, and (ii) it may suffer from vocabulary gap between the terms in the query and the product's description, i.e., terms used in the query are semantically similar but different from the terms in the product description. Hence, identifying the terms that describe the query's product intent and predicting additional terms that describe the query's product intent better than the existing query terms to the search engine is an essential task in e-commerce search. In this paper, we leverage the historical query reformulation logs of a major e-commerce retailer to develop distant-supervised approaches to solve both these problems. Our approaches exploit the fact that the significance of a term is dependent upon the context (other terms in the neighborhood) in which it is used in order to learn the importance of the term towards the query's product intent. We show that identifying and emphasizing the terms that define the query's product intent leads to a 3% improvement in ranking. Moreover, for the tasks of identifying the important terms in a query and for predicting the additional terms that represent product intent, experiments illustrate that our approaches outperform the non-contextual baselines. △ Less

Submitted 22 August, 2019; originally announced August 2019.

Comments: Extended version of paper "Intent term weighing in e-commerce queries" to appear in CIKM'19

arXiv:1904.06730 [pdf, other]

doi 10.1109/ICDM.2018.00154

Text segmentation on multilabel documents: A distant-supervised approach

Authors: Saurav Manchanda, George Karypis

Abstract: Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, i… ▽ More Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, is expensive and time-consuming. In this paper, we develop an approach that instead of using segment-level ground truth information, it instead uses the set of labels that are associated with a document and are easier to obtain as the training data essentially corresponds to a multilabel dataset. Our method, which can be thought of as an instance of distant supervision, improves upon the previous approaches by exploiting the fact that consecutive sentences in a document tend to talk about the same topic, and hence, probably belong to the same class. Experiments on the text segmentation task on a variety of datasets show that the segmentation produced by our method beats the competing approaches on four out of five datasets and performs at par on the fifth dataset. On the multilabel text classification task, our method performs at par with the competing approaches, while requiring significantly less time to estimate than the competing approaches. △ Less

Submitted 14 April, 2019; originally announced April 2019.

Comments: Accepted in 2018 IEEE International Conference on Data Mining (ICDM)

Journal ref: 2018 IEEE International Conference on Data Mining (ICDM), 1170-1175

arXiv:1904.06725 [pdf, other]

doi 10.1007/978-3-319-93037-4_27

Distributed representation of multi-sense words: A loss-driven approach

Authors: Saurav Manchanda, George Karypis

Abstract: Word2Vec's Skip Gram model is the current state-of-the-art approach for estimating the distributed representation of words. However, it assumes a single vector per word, which is not well-suited for representing words that have multiple senses. This work presents LDMI, a new model for estimating distributional representations of words. LDMI relies on the idea that, if a word carries multiple sense… ▽ More Word2Vec's Skip Gram model is the current state-of-the-art approach for estimating the distributed representation of words. However, it assumes a single vector per word, which is not well-suited for representing words that have multiple senses. This work presents LDMI, a new model for estimating distributional representations of words. LDMI relies on the idea that, if a word carries multiple senses, then having a different representation for each of its senses should lead to a lower loss associated with predicting its co-occurring words, as opposed to the case when a single vector representation is used for all the senses. After identifying the multi-sense words, LDMI clusters the occurrences of these words to assign a sense to each occurrence. Experiments on the contextual word similarity task show that LDMI leads to better performance than competing approaches. △ Less

Submitted 14 April, 2019; originally announced April 2019.

Comments: PAKDD 2018 Best paper award runner-up

Journal ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham

arXiv:1903.03332 [pdf, other]

Learning Heuristics over Large Graphs via Deep Reinforcement Learning

Authors: Sahil Manchanda, Akash Mittal, Anuj Dhawan, Sourav Medya, Sayan Ranu, Ambuj Singh

Abstract: There has been an increased interest in discovering heuristics for combinatorial problems on graphs through machine learning. While existing techniques have primarily focused on obtaining high-quality solutions, scalability to billion-sized graphs has not been adequately addressed. In addition, the impact of budget-constraint, which is necessary for many practical scenarios, remains to be studied.… ▽ More There has been an increased interest in discovering heuristics for combinatorial problems on graphs through machine learning. While existing techniques have primarily focused on obtaining high-quality solutions, scalability to billion-sized graphs has not been adequately addressed. In addition, the impact of budget-constraint, which is necessary for many practical scenarios, remains to be studied. In this paper, we propose a framework called GCOMB to bridge these gaps. GCOMB trains a Graph Convolutional Network (GCN) using a novel probabilistic greedy mechanism to predict the quality of a node. To further facilitate the combinatorial nature of the problem, GCOMB utilizes a Q-learning framework, which is made efficient through importance sampling. We perform extensive experiments on real graphs to benchmark the efficiency and efficacy of GCOMB. Our results establish that GCOMB is 100 times faster and marginally better in quality than state-of-the-art algorithms for learning combinatorial algorithms. Additionally, a case-study on the practical combinatorial problem of Influence Maximization (IM) shows GCOMB is 150 times faster than the specialized IM algorithm IMM with similar quality. △ Less

Submitted 3 December, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: To appear in NeurIPS 2020 https://papers.nips.cc/paper/2020/hash/e7532dbeff7ef901f2e70daacb3f452d-Abstract.html

arXiv:1705.05183 [pdf, ps, other]

Representation learning of drug and disease terms for drug repositioning

Authors: Sahil Manchanda, Ashish Anand

Abstract: Drug repositioning (DR) refers to identification of novel indications for the approved drugs. The requirement of huge investment of time as well as money and risk of failure in clinical trials have led to surge in interest in drug repositioning. DR exploits two major aspects associated with drugs and diseases: existence of similarity among drugs and among diseases due to their shared involved gene… ▽ More Drug repositioning (DR) refers to identification of novel indications for the approved drugs. The requirement of huge investment of time as well as money and risk of failure in clinical trials have led to surge in interest in drug repositioning. DR exploits two major aspects associated with drugs and diseases: existence of similarity among drugs and among diseases due to their shared involved genes or pathways or common biological effects. Existing methods of identifying drug-disease association majorly rely on the information available in the structured databases only. On the other hand, abundant information available in form of free texts in biomedical research articles are not being fully exploited. Word-embedding or obtaining vector representation of words from a large corpora of free texts using neural network methods have been shown to give significant performance for several natural language processing tasks. In this work we propose a novel way of representation learning to obtain features of drugs and diseases by combining complementary information available in unstructured texts and structured datasets. Next we use matrix completion approach on these feature vectors to learn projection matrix between drug and disease vector spaces. The proposed method has shown competitive performance with state-of-the-art methods. Further, the case studies on Alzheimer's and Hypertension diseases have shown that the predicted associations are matching with the existing knowledge. △ Less

Submitted 20 May, 2017; v1 submitted 15 May, 2017; originally announced May 2017.

Comments: Accepted to appear in 3rd IEEE International Conference on Cybernetics (Spl Session: Deep Learning for Prediction and Estimation)

Showing 1–31 of 31 results for author: Manchanda, S