Search | arXiv e-print repository

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Authors: Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Exp… ▽ More Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo. CuMo incorporates Co-upcycled Top-K sparsely-gated Mixture-of-experts blocks into both the vision encoder and the MLP connector, thereby enhancing the multimodal LLMs with minimal additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then initializes each expert in the MoE block from the pre-trained MLP block during the visual instruction tuning stage. Auxiliary losses are used to ensure a balanced loading of experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks using models within each model size group, all while training exclusively on open-sourced datasets. The code and model weights for CuMo are open-sourced at https://github.com/SHI-Labs/CuMo. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.07467 [pdf, other]

Trashbusters: Deep Learning Approach for Litter Detection and Tracking

Authors: Kashish Jain, Manthan Juthani, Jash Jain, Anant V. Nimkar

Abstract: The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks. We should try to restrict public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on m… ▽ More The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks. We should try to restrict public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on manual intervention and witness reporting suffer from delays, inaccuracies, and anonymity issues. To overcome these challenges, this paper proposes a fully automated system that utilizes surveillance cameras and advanced computer vision algorithms for litter detection, object tracking, and face recognition. The system accurately identifies and tracks individuals engaged in littering activities, attaches their identities through face recognition, and enables efficient enforcement of anti-littering policies. By reducing reliance on manual intervention, minimizing human error, and providing prompt identification, the proposed system offers significant advantages in addressing littering incidents. The primary contribution of this research lies in the implementation of the proposed system, leveraging advanced technologies to enhance surveillance operations and automate the penalization of litterbugs. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.18819 [pdf, other]

Benchmarking Object Detectors with COCO: A New Path Forward

Authors: Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai

Abstract: The Common Objects in Context (COCO) dataset has been instrumental in benchmarking object detectors over the past decade. Like every dataset, COCO contains subtle errors and imperfections stemming from its annotation procedure. With the advent of high-performing models, we ask whether these errors of COCO are hindering its utility in reliably benchmarking further progress. In search for an answer,… ▽ More The Common Objects in Context (COCO) dataset has been instrumental in benchmarking object detectors over the past decade. Like every dataset, COCO contains subtle errors and imperfections stemming from its annotation procedure. With the advent of high-performing models, we ask whether these errors of COCO are hindering its utility in reliably benchmarking further progress. In search for an answer, we inspect thousands of masks from COCO (2017 version) and uncover different types of errors such as imprecise mask boundaries, non-exhaustively annotated instances, and mislabeled masks. Due to the prevalence of COCO, we choose to correct these errors to maintain continuity with prior research. We develop COCO-ReM (Refined Masks), a cleaner set of annotations with visibly better mask quality than COCO-2017. We evaluate fifty object detectors and find that models that predict visually sharper masks score higher on COCO-ReM, affirming that they were being incorrectly penalized due to errors in COCO-2017. Moreover, our models trained using COCO-ReM converge faster and score higher than their larger variants trained using COCO-2017, highlighting the importance of data quality in improving object detectors. With these findings, we advocate using COCO-ReM for future object detection research. Our dataset is available at https://cocorem.xyz △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Technical report. Dataset website: https://cocorem.xyz and code: https://github.com/kdexd/coco-rem

arXiv:2312.14233 [pdf, other]

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Authors: Jitesh Jain, Jianwei Yang, Humphrey Shi

Abstract: Humans possess the remarkable skill of Visual Perception, the ability to see and understand the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large Language Models (MLLM) have recently achieved impressive performance on vision-language tasks ranging from visual question-answering and image captioning to visual reasoning and image generation. However, when promp… ▽ More Humans possess the remarkable skill of Visual Perception, the ability to see and understand the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large Language Models (MLLM) have recently achieved impressive performance on vision-language tasks ranging from visual question-answering and image captioning to visual reasoning and image generation. However, when prompted to identify or count (perceive) the entities in a given image, existing MLLM systems fail. Working towards developing an accurate MLLM system for perception and reasoning, we propose using Versatile vision enCoders (VCoder) as perception eyes for Multimodal LLMs. We feed the VCoder with perception modalities such as segmentation or depth maps, improving the MLLM's perception abilities. Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task. Thirdly, we introduce metrics to assess the object perception abilities in MLLMs on our COST dataset. Lastly, we provide extensive experimental evidence proving the VCoder's improved object-level perception skills over existing Multimodal LLMs, including GPT-4V. We open-source our dataset, code, and models to promote research. We open-source our code at https://github.com/SHI-Labs/VCoder △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Project Page: https://praeclarumjj3.github.io/vcoder/

arXiv:2306.05399 [pdf, other]

Matting Anything

Authors: Jiachen Li, Jitesh Jain, Humphrey Shi

Abstract: In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance. MAM offers several significant advantages over previous specialized image matting networks: (i) MAM is capable of dealing with various types of image matting, including se… ▽ More In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance. MAM offers several significant advantages over previous specialized image matting networks: (i) MAM is capable of dealing with various types of image matting, including semantic, instance, and referring image matting with only a single model; (ii) MAM leverages the feature maps from the Segment Anything Model (SAM) and adopts a lightweight Mask-to-Matte (M2M) module to predict the alpha matte through iterative refinement, which has only 2.7 million trainable parameters. (iii) By incorporating SAM, MAM simplifies the user intervention required for the interactive use of image matting from the trimap to the box, point, or text prompt. We evaluate the performance of MAM on various image matting benchmarks, and the experimental results demonstrate that MAM achieves comparable performance to the state-of-the-art specialized image matting models under different metrics on each benchmark. Overall, MAM shows superior generalization ability and can effectively handle various image matting tasks with fewer parameters, making it a practical solution for unified image matting. Our code and models are open-sourced at https://github.com/SHI-Labs/Matting-Anything. △ Less

Submitted 16 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: Project web-page: https://chrisjuniorli.github.io/project/Matting-Anything/

arXiv:2211.06220 [pdf, other]

OneFormer: One Transformer to Rule Universal Image Segmentation

Authors: Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

Abstract: Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best p… ▽ More Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. We first propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Secondly, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Thirdly, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20k, CityScapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe even more performance improvement. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer △ Less

Submitted 26 December, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: Project Page: https://praeclarumjj3.github.io/oneformer

arXiv:2208.03382 [pdf, other]

Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand

Authors: Jitesh Jain, Yuqian Zhou, Ning Yu, Humphrey Shi

Abstract: Deep image inpainting has made impressive progress with recent advances in image generation and processing algorithms. We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures. Structures refer to the generated object boundary or novel geometric structures within the hole, while texture refers to high-frequency details, especially man-mad… ▽ More Deep image inpainting has made impressive progress with recent advances in image generation and processing algorithms. We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures. Structures refer to the generated object boundary or novel geometric structures within the hole, while texture refers to high-frequency details, especially man-made repeating patterns filled inside the structural regions. We believe that better structures are usually obtained from a coarse-to-fine GAN-based generator network while repeating patterns nowadays can be better modeled using state-of-the-art high-frequency fast fourier convolutional layers. In this paper, we propose a novel inpainting network combining the advantages of the two designs. Therefore, our model achieves a remarkable visual quality to match state-of-the-art performance in both structure generation and repeating texture synthesis using a single network. Extensive experiments demonstrate the effectiveness of the method, and our conclusions further highlight the two critical factors of image inpainting quality, structures, and textures, as the future design directions of inpainting networks. △ Less

Submitted 2 December, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Project page at https://praeclarumjj3.github.io/fcf-inpainting/

arXiv:2206.13395 [pdf, other]

Gait Cycle Reconstruction and Human Identification from Occluded Sequences

Authors: Abhishek Paul, Manav Mukesh Jain, Jinesh Jain, Pratik Chattopadhyay

Abstract: Gait-based person identification from videos captured at surveillance sites using Computer Vision-based techniques is quite challenging since these walking sequences are usually corrupted with occlusion, and a complete cycle of gait is not always available. In this work, we propose an effective neural network-based model to reconstruct the occluded frames in an input sequence before carrying out g… ▽ More Gait-based person identification from videos captured at surveillance sites using Computer Vision-based techniques is quite challenging since these walking sequences are usually corrupted with occlusion, and a complete cycle of gait is not always available. In this work, we propose an effective neural network-based model to reconstruct the occluded frames in an input sequence before carrying out gait recognition. Specifically, we employ LSTM networks to predict an embedding for each occluded frame both from the forward and the backward directions, and next fuse the predictions from the two LSTMs by employing a network of residual blocks and convolutional layers. While the LSTMs are trained to minimize the mean-squared loss, the fusion network is trained to optimize the pixel-wise cross-entropy loss between the ground-truth and the reconstructed samples. Evaluation of our approach has been done using synthetically occluded sequences generated from the OU-ISIR LP and CASIA-B data and real-occluded sequences present in the TUM-IITKGP data. The effectiveness of the proposed reconstruction model has been verified through the Dice score and gait-based recognition accuracy using some popular gait recognition methods. Comparative study with existing occlusion handling methods in gait recognition highlights the superiority of our proposed occlusion reconstruction approach over the others. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2201.07788 [pdf, other]

ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes

Authors: Rahul Sajnani, Adrien Poulenard, Jivitesh Jain, Radhika Dua, Leonidas J. Guibas, Srinath Sridhar

Abstract: Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, eg., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and p… ▽ More Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, eg., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, and translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer. △ Less

Submitted 14 April, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Accepted to CVPR 2022, New Orleans, Louisiana. For project page and code, see https://ivl.cs.brown.edu/ConDor/

arXiv:2112.12782 [pdf, other]

SeMask: Semantically Masked Transformers for Semantic Segmentation

Authors: Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

Abstract: Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the encoding stage. This paper argues that incorporating semantic information of the image into pretrained hierarchical transformer-based backbones while fin… ▽ More Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the encoding stage. This paper argues that incorporating semantic information of the image into pretrained hierarchical transformer-based backbones while finetuning improves the performance considerably. To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation. In addition, we use a lightweight semantic decoder during training to provide supervision to the intermediate semantic prior maps at every stage. Our experiments demonstrate that incorporating semantic priors enhances the performance of the established hierarchical encoders with a slight increase in the number of FLOPs. We provide empirical proof by integrating SeMask into Swin Transformer and Mix Transformer backbones as our encoder paired with different decoders. Our framework achieves a new state-of-the-art of 58.25% mIoU on the ADE20K dataset and improvements of over 3% in the mIoU metric on the Cityscapes dataset. The code and checkpoints are publicly available at https://github.com/Picsart-AI-Research/SeMask-Segmentation . △ Less

Submitted 13 April, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: Updated experiments with Mix-Transformer (MiT) on ADE20K and added an analysis section

arXiv:2110.12780 [pdf, other]

Battling Hateful Content in Indic Languages HASOC '21

Authors: Aditya Kadam, Anmol Goel, Jivitesh Jain, Jushaan Singh Kalra, Mallika Subramanian, Manvith Reddy, Prashant Kodali, T. H. Arjun, Manish Shrivastava, Ponnurangam Kumaraguru

Abstract: The extensive rise in consumption of online social media (OSMs) by a large number of people poses a critical problem of curbing the spread of hateful content on these platforms. With the growing usage of OSMs in multiple languages, the task of detecting and characterizing hate becomes more complex. The subtle variations of code-mixed texts along with switching scripts only add to the complexity. T… ▽ More The extensive rise in consumption of online social media (OSMs) by a large number of people poses a critical problem of curbing the spread of hateful content on these platforms. With the growing usage of OSMs in multiple languages, the task of detecting and characterizing hate becomes more complex. The subtle variations of code-mixed texts along with switching scripts only add to the complexity. This paper presents a solution for the HASOC 2021 Multilingual Twitter Hate-Speech Detection challenge by team PreCog IIIT Hyderabad. We adopt a multilingual transformer based approach and describe our architecture for all 6 subtasks as part of the challenge. Out of the 6 teams that participated in all the subtasks, our submissions rank 3rd overall. △ Less

Submitted 5 November, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 12 pages, 6 figures, 2 tables, Accepted at FIRE 2021, CEUR Workshop Proceedings (http://fire.irsi.res.in/fire/2021/home)

arXiv:2103.13239 [pdf, other]

doi 10.1145/3487351.3488354

What's Kooking? Characterizing India's Emerging Social Network, Koo

Authors: Asmit Kumar Singh, Chirag Jain, Jivitesh Jain, Rishi Raj Jain, Shradha Sehgal, Tanisha Pandey, Ponnurangam Kumaraguru

Abstract: Social media has grown exponentially in a short period, coming to the forefront of communications and online interactions. Despite their rapid growth, social media platforms have been unable to scale to different languages globally and remain inaccessible to many. In this paper, we characterize Koo, a multilingual micro-blogging site that rose in popularity in 2021, as an Indian alternative to Twi… ▽ More Social media has grown exponentially in a short period, coming to the forefront of communications and online interactions. Despite their rapid growth, social media platforms have been unable to scale to different languages globally and remain inaccessible to many. In this paper, we characterize Koo, a multilingual micro-blogging site that rose in popularity in 2021, as an Indian alternative to Twitter. We collected a dataset of 4.07 million users, 163.12 million follower-following relationships, and their content and activity across 12 languages. We study the user demographic along the lines of language, location, gender, and profession. The prominent presence of Indian languages in the discourse on Koo indicates the platform's success in promoting regional languages. We observe Koo's follower-following network to be much denser than Twitter's, comprising of closely-knit linguistic communities. An N-gram analysis of posts on Koo shows a #KooVsTwitter rhetoric, revealing the debate comparing the two platforms. Our characterization highlights the dynamics of the multilingual social network and its diverse Indian user base. △ Less

Submitted 20 January, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 9 pages, 11 figures, 6 tables

arXiv:2101.06914 [pdf, other]

Capitol (Pat)riots: A comparative study of Twitter and Parler

Authors: Hitkul, Avinash Prabhu, Dipanwita Guhathakurta, Jivitesh jain, Mallika Subramanian, Manvith Reddy, Shradha Sehgal, Tanvi Karandikar, Amogh Gulati, Udit Arora, Rajiv Ratn Shah, Ponnurangam Kumaraguru

Abstract: On 6 January 2021, a mob of right-wing conservatives stormed the USA Capitol Hill interrupting the session of congress certifying 2020 Presidential election results. Immediately after the start of the event, posts related to the riots started to trend on social media. A social media platform which stood out was a free speech endorsing social media platform Parler; it is being claimed as the platfo… ▽ More On 6 January 2021, a mob of right-wing conservatives stormed the USA Capitol Hill interrupting the session of congress certifying 2020 Presidential election results. Immediately after the start of the event, posts related to the riots started to trend on social media. A social media platform which stood out was a free speech endorsing social media platform Parler; it is being claimed as the platform on which the riots were planned and talked about. Our report presents a contrast between the trending content on Parler and Twitter around the time of riots. We collected data from both platforms based on the trending hashtags and draw comparisons based on what are the topics being talked about, who are the people active on the platforms and how organic is the content generated on the two platforms. While the content trending on Twitter had strong resentments towards the event and called for action against rioters and inciters, Parler content had a strong conservative narrative echoing the ideas of voter fraud similar to the attacking mob. We also find a disproportionately high manipulation of traffic on Parler when compared to Twitter. △ Less

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2009.09206 [pdf, other]

DEAP Cache: Deep Eviction Admission and Prefetching for Cache

Authors: Ayush Mangal, Jitesh Jain, Keerat Kaur Guliani, Omkar Bhalerao

Abstract: Recent approaches for learning policies to improve caching, target just one out of the prefetching, admission and eviction processes. In contrast, we propose an end to end pipeline to learn all three policies using machine learning. We also take inspiration from the success of pretraining on large corpora to learn specialized embeddings for the task. We model prefetching as a sequence prediction t… ▽ More Recent approaches for learning policies to improve caching, target just one out of the prefetching, admission and eviction processes. In contrast, we propose an end to end pipeline to learn all three policies using machine learning. We also take inspiration from the success of pretraining on large corpora to learn specialized embeddings for the task. We model prefetching as a sequence prediction task based on past misses. Following previous works suggesting that frequency and recency are the two orthogonal fundamental attributes for caching, we use an online reinforcement learning technique to learn the optimal policy distribution between two orthogonal eviction strategies based on them. While previous approaches used the past as an indicator of the future, we instead explicitly model the future frequency and recency in a multi-task fashion with prefetching, leveraging the abilities of deep networks to capture futuristic trends and use them for learning eviction and admission. We also model the distribution of the data in an online fashion using Kernel Density Estimation in our approach, to deal with the problem of caching non-stationary data. We present our approach as a "proof of concept" of learning all three components of cache strategies using machine learning and leave improving practical deployment for future work. △ Less

Submitted 19 September, 2020; originally announced September 2020.

arXiv:1808.09964 [pdf, other]

Semi-Metrification of the Dynamic Time Warping Distance

Authors: Brijnesh J. Jain

Abstract: The dynamic time warping (dtw) distance fails to satisfy the triangle inequality and the identity of indiscernibles. As a consequence, the dtw-distance is not warping-invariant, which in turn results in peculiarities in data mining applications. This article converts the dtw-distance to a semi-metric and shows that its canonical extension is warping-invariant. Empirical results indicate that the n… ▽ More The dynamic time warping (dtw) distance fails to satisfy the triangle inequality and the identity of indiscernibles. As a consequence, the dtw-distance is not warping-invariant, which in turn results in peculiarities in data mining applications. This article converts the dtw-distance to a semi-metric and shows that its canonical extension is warping-invariant. Empirical results indicate that the nearest-neighbor classifier in the proposed semi-metric space performs comparably to the same classifier in the standard dtw-space. To overcome the undesirable peculiarities of dtw-spaces, this result suggests to further explore the semi-metric space for data mining applications. △ Less

Submitted 2 September, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

arXiv:1711.09156 [pdf, other]

Warped-Linear Models for Time Series Classification

Authors: Brijnesh J. Jain

Abstract: This article proposes and studies warped-linear models for time series classification. The proposed models are time-warp invariant analogues of linear models. Their construction is in line with time series averaging and extensions of k-means and learning vector quantization to dynamic time warping (DTW) spaces. The main theoretical result is that warped-linear models correspond to polyhedral class… ▽ More This article proposes and studies warped-linear models for time series classification. The proposed models are time-warp invariant analogues of linear models. Their construction is in line with time series averaging and extensions of k-means and learning vector quantization to dynamic time warping (DTW) spaces. The main theoretical result is that warped-linear models correspond to polyhedral classifiers in Euclidean spaces. This result simplifies the analysis of time-warp invariant models by reducing to max-linear functions. We exploit this relationship and derive solutions to the label-dependency problem and the problem of learning warped-linear models. Empirical results on time series classification suggest that warped-linear functions better trade solution quality against computation time than nearest-neighbor and prototype-based methods. △ Less

Submitted 24 November, 2017; originally announced November 2017.

arXiv:1706.06932 [pdf, other]

doi 10.1007/978-3-319-66402-6_15

WebPol: Fine-grained Information Flow Policies for Web Browsers

Authors: Abhishek Bichhawat, Vineet Rajani, Jinank Jain, Deepak Garg, Christian Hammer

Abstract: In the standard web browser programming model, third-party scripts included in an application execute with the same privilege as the application's own code. This leaves the application's confidential data vulnerable to theft and leakage by malicious code and inadvertent bugs in the third-party scripts. Security mechanisms in modern browsers (the same-origin policy, cross-origin resource sharing an… ▽ More In the standard web browser programming model, third-party scripts included in an application execute with the same privilege as the application's own code. This leaves the application's confidential data vulnerable to theft and leakage by malicious code and inadvertent bugs in the third-party scripts. Security mechanisms in modern browsers (the same-origin policy, cross-origin resource sharing and content security policies) are too coarse to suit this programming model. All these mechanisms (and their extensions) describe whether or not a script can access certain data, whereas the meaningful requirement is to allow untrusted scripts access to confidential data that they need and to prevent the scripts from leaking data on the side. Motivated by this gap, we propose WebPol, a policy mechanism that allows a website developer to include fine-grained policies on confidential application data in the familiar syntax of the JavaScript programming language. The policies can be associated with any webpage element, and specify what aspects of the element can be accessed by which third-party domains. A script can access data that the policy allows it to, but it cannot pass the data (or data derived from it) to other scripts or remote hosts in contravention of the policy. To specify the policies, we expose a small set of new native APIs in JavaScript. Our policies can be enforced using any of the numerous existing proposals for information flow tracking in web browsers. We have integrated our policies into one such proposal that we use to evaluate performance overheads and to test our examples. △ Less

Submitted 26 June, 2017; v1 submitted 21 June, 2017; originally announced June 2017.

Comments: ESORICS '17

arXiv:1705.05681 [pdf, other]

Optimal Warping Paths are unique for almost every Pair of Time Series

Authors: Brijnesh J. Jain, David Schultz

Abstract: Update rules for learning in dynamic time warping spaces are based on optimal warping paths between parameter and input time series. In general, optimal warping paths are not unique resulting in adverse effects in theory and practice. Under the assumption of squared error local costs, we show that no two warping paths have identical costs almost everywhere in a measure-theoretic sense. Two direct… ▽ More Update rules for learning in dynamic time warping spaces are based on optimal warping paths between parameter and input time series. In general, optimal warping paths are not unique resulting in adverse effects in theory and practice. Under the assumption of squared error local costs, we show that no two warping paths have identical costs almost everywhere in a measure-theoretic sense. Two direct consequences of this result are: (i) optimal warping paths are unique almost everywhere, and (ii) the set of all pairs of time series with multiple equal-cost warping paths coincides with the union of exponentially many zero sets of quadratic forms. One implication of the proposed results is that typical distance-based cost functions such as the k-means objective are differentiable almost everywhere and can be minimized by subgradient methods. △ Less

Submitted 2 March, 2018; v1 submitted 16 May, 2017; originally announced May 2017.

arXiv:1610.04460 [pdf, other]

On the Existence of a Sample Mean in Dynamic Time Warping Spaces

Authors: Brijnesh J. Jain, David Schultz

Abstract: The concept of sample mean in dynamic time warping (DTW) spaces has been successfully applied to improve pattern recognition systems and generalize centroid-based clustering algorithms. Its existence has neither been proved nor challenged. This article presents sufficient conditions for existence of a sample mean in DTW spaces. The proposed result justifies prior work on approximate mean algorithm… ▽ More The concept of sample mean in dynamic time warping (DTW) spaces has been successfully applied to improve pattern recognition systems and generalize centroid-based clustering algorithms. Its existence has neither been proved nor challenged. This article presents sufficient conditions for existence of a sample mean in DTW spaces. The proposed result justifies prior work on approximate mean algorithms, sets the stage for constructing exact mean algorithms, and is a first step towards a statistical theory of DTW spaces. △ Less

Submitted 5 March, 2018; v1 submitted 14 October, 2016; originally announced October 2016.

arXiv:1604.07711 [pdf, ps, other]

Condorcet's Jury Theorem for Consensus Clustering and its Implications for Diversity

Authors: Brijnesh J. Jain

Abstract: Condorcet's Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet's Jury Theorem to the mean partition approach under the additional assumptions that a unique ground-truth partition… ▽ More Condorcet's Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet's Jury Theorem to the mean partition approach under the additional assumptions that a unique ground-truth partition exists and sample partitions are drawn from a sufficiently small ball containing the ground-truth. As an implication of practical relevance, we question the claim that the quality of consensus clustering depends on the diversity of the sample partitions. Instead, we conjecture that limiting the diversity of the mean partitions is necessary for controlling the quality. △ Less

Submitted 10 October, 2016; v1 submitted 26 April, 2016; originally announced April 2016.

arXiv:1604.06626 [pdf, other]

The Mean Partition Theorem of Consensus Clustering

Authors: Brijnesh J. Jain

Abstract: To devise efficient solutions for approximating a mean partition in consensus clustering, Dimitriadou et al. [3] presented a necessary condition of optimality for a consensus function based on least square distances. We show that their result is pivotal for deriving interesting properties of consensus clustering beyond optimization. For this, we present the necessary condition of optimality in a s… ▽ More To devise efficient solutions for approximating a mean partition in consensus clustering, Dimitriadou et al. [3] presented a necessary condition of optimality for a consensus function based on least square distances. We show that their result is pivotal for deriving interesting properties of consensus clustering beyond optimization. For this, we present the necessary condition of optimality in a slightly stronger form in terms of the Mean Partition Theorem and extend it to the Expected Partition Theorem. To underpin its versatility, we show three examples that apply the Mean Partition Theorem: (i) equivalence of the mean partition and optimal multiple alignment, (ii) construction of profiles and motifs, and (iii) relationship between consensus clustering and cluster stability. △ Less

Submitted 26 April, 2016; v1 submitted 22 April, 2016; originally announced April 2016.

arXiv:1602.02543 [pdf, other]

Homogeneity of Cluster Ensembles

Authors: Brijnesh J. Jain

Abstract: The expectation and the mean of partitions generated by a cluster ensemble are not unique in general. This issue poses challenges in statistical inference and cluster stability. In this contribution, we state sufficient conditions for uniqueness of expectation and mean. The proposed conditions show that a unique mean is neither exceptional nor generic. To cope with this issue, we introduce homogen… ▽ More The expectation and the mean of partitions generated by a cluster ensemble are not unique in general. This issue poses challenges in statistical inference and cluster stability. In this contribution, we state sufficient conditions for uniqueness of expectation and mean. The proposed conditions show that a unique mean is neither exceptional nor generic. To cope with this issue, we introduce homogeneity as a measure of how likely is a unique mean for a sample of partitions. We show that homogeneity is related to cluster stability. This result points to a possible conflict between cluster stability and diversity in consensus clustering. To assess homogeneity in a practical setting, we propose an efficient way to compute a lower bound of homogeneity. Empirical results using the k-means algorithm suggest that uniqueness of the mean partition is not exceptional for real-world data. Moreover, for samples of high homogeneity, uniqueness can be enforced by increasing the number of data points or by removing outlier partitions. In a broader context, this contribution can be placed as a further step towards a statistical theory of partitions. △ Less

Submitted 8 February, 2016; originally announced February 2016.

arXiv:1511.00871 [pdf, other]

Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm

Authors: Brijnesh J. Jain

Abstract: One of the most fundamental concepts in statistics is the concept of sample mean. Properties of the sample mean that are well-defined in Euclidean spaces become unwieldy or even unclear in graph spaces. Open problems related to the sample mean of graphs include: non-existence, non-uniqueness, statistical inconsistency, lack of convergence results of mean algorithms, non-existence of midpoints, and… ▽ More One of the most fundamental concepts in statistics is the concept of sample mean. Properties of the sample mean that are well-defined in Euclidean spaces become unwieldy or even unclear in graph spaces. Open problems related to the sample mean of graphs include: non-existence, non-uniqueness, statistical inconsistency, lack of convergence results of mean algorithms, non-existence of midpoints, and disparity to midpoints. We present conditions to resolve all six problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on graph datasets representing images and molecules show that the MMM-Algorithm best approximates a sample mean of graphs compared to six other mean algorithms. △ Less

Submitted 3 November, 2015; originally announced November 2015.

arXiv:1505.08071 [pdf, other]

Geometry of Graph Edit Distance Spaces

Authors: Brijnesh J. Jain

Abstract: In this paper we study the geometry of graph spaces endowed with a special class of graph edit distances. The focus is on geometrical results useful for statistical pattern recognition. The main result is the Graph Representation Theorem. It states that a graph is a point in some geometrical space, called orbit space. Orbit spaces are well investigated and easier to explore than the original graph… ▽ More In this paper we study the geometry of graph spaces endowed with a special class of graph edit distances. The focus is on geometrical results useful for statistical pattern recognition. The main result is the Graph Representation Theorem. It states that a graph is a point in some geometrical space, called orbit space. Orbit spaces are well investigated and easier to explore than the original graph space. We derive a number of geometrical results from the orbit space representation, translate them to the graph space, and indicate their significance and usefulness in statistical pattern recognition. △ Less

Submitted 29 May, 2015; originally announced May 2015.

arXiv:1403.2295 [pdf, ps, other]

Sublinear Models for Graphs

Authors: Brijnesh J. Jain

Abstract: This contribution extends linear models for feature vectors to sublinear models for graphs and analyzes their properties. The results are (i) a geometric interpretation of sublinear classifiers, (ii) a generic learning rule based on the principle of empirical risk minimization, (iii) a convergence theorem for the margin perceptron in the sublinearly separable case, and (iv) the VC-dimension of sub… ▽ More This contribution extends linear models for feature vectors to sublinear models for graphs and analyzes their properties. The results are (i) a geometric interpretation of sublinear classifiers, (ii) a generic learning rule based on the principle of empirical risk minimization, (iii) a convergence theorem for the margin perceptron in the sublinearly separable case, and (iv) the VC-dimension of sublinear functions. Empirical results on graph data show that sublinear models on graphs have similar properties as linear models for feature vectors. △ Less

Submitted 10 March, 2014; originally announced March 2014.

arXiv:1204.4294 [pdf, ps, other]

Learning in Riemannian Orbifolds

Authors: Brijnesh J. Jain, Klaus Obermayer

Abstract: Learning in Riemannian orbifolds is motivated by existing machine learning algorithms that directly operate on finite combinatorial structures such as point patterns, trees, and graphs. These methods, however, lack statistical justification. This contribution derives consistency results for learning problems in structured domains and thereby generalizes learning in vector spaces and manifolds. Learning in Riemannian orbifolds is motivated by existing machine learning algorithms that directly operate on finite combinatorial structures such as point patterns, trees, and graphs. These methods, however, lack statistical justification. This contribution derives consistency results for learning problems in structured domains and thereby generalizes learning in vector spaces and manifolds. △ Less

Submitted 19 April, 2012; originally announced April 2012.

Comments: arXiv admin note: substantial text overlap with arXiv:1001.0921

arXiv:1001.0927 [pdf, other]

Accelerating Competitive Learning Graph Quantization

Authors: Brijnesh J. Jain, Klaus Obermayer

Abstract: Vector quantization(VQ) is a lossy data compression technique from signal processing for which simple competitive learning is one standard method to quantize patterns from the input space. Extending competitive learning VQ to the domain of graphs results in competitive learning for quantizing input graphs. In this contribution, we propose an accelerated version of competitive learning graph quan… ▽ More Vector quantization(VQ) is a lossy data compression technique from signal processing for which simple competitive learning is one standard method to quantize patterns from the input space. Extending competitive learning VQ to the domain of graphs results in competitive learning for quantizing input graphs. In this contribution, we propose an accelerated version of competitive learning graph quantization (GQ) without trading computational time against solution quality. For this, we lift graphs locally to vectors in order to avoid unnecessary calculations of intractable graph distances. In doing so, the accelerated version of competitive learning GQ gradually turns locally into a competitive learning VQ with increasing number of iterations. Empirical results show a significant speedup by maintaining a comparable solution quality. △ Less

Submitted 6 January, 2010; originally announced January 2010.

Comments: 17 pages; submitted to CVIU

arXiv:1001.0921 [pdf, other]

Graph Quantization

Authors: Brijnesh J. Jain, Klaus Obermayer

Abstract: Vector quantization(VQ) is a lossy data compression technique from signal processing, which is restricted to feature vectors and therefore inapplicable for combinatorial structures. This contribution presents a theoretical foundation of graph quantization (GQ) that extends VQ to the domain of attributed graphs. We present the necessary Lloyd-Max conditions for optimality of a graph quantizer and… ▽ More Vector quantization(VQ) is a lossy data compression technique from signal processing, which is restricted to feature vectors and therefore inapplicable for combinatorial structures. This contribution presents a theoretical foundation of graph quantization (GQ) that extends VQ to the domain of attributed graphs. We present the necessary Lloyd-Max conditions for optimality of a graph quantizer and consistency results for optimal GQ design based on empirical distortion measures and stochastic optimization. These results statistically justify existing clustering algorithms in the domain of graphs. The proposed approach provides a template of how to link structural pattern recognition methods other than GQ to statistical pattern recognition. △ Less

Submitted 6 January, 2010; originally announced January 2010.

Comments: 24 pages; submitted to CVIU

arXiv:0912.4598 [pdf, other]

Elkan's k-Means for Graphs

Authors: Brijnesh J. Jain, Klaus Obermayer

Abstract: This paper extends k-means algorithms from the Euclidean domain to the domain of graphs. To recompute the centroids, we apply subgradient methods for solving the optimization-based formulation of the sample mean of graphs. To accelerate the k-means algorithm for graphs without trading computational time against solution quality, we avoid unnecessary graph distance calculations by exploiting the… ▽ More This paper extends k-means algorithms from the Euclidean domain to the domain of graphs. To recompute the centroids, we apply subgradient methods for solving the optimization-based formulation of the sample mean of graphs. To accelerate the k-means algorithm for graphs without trading computational time against solution quality, we avoid unnecessary graph distance calculations by exploiting the triangle inequality of the underlying distance metric following Elkan's k-means algorithm proposed in \cite{Elkan03}. In experiments we show that the accelerated k-means algorithm are faster than the standard k-means algorithm for graphs provided there is a cluster structure in the data. △ Less

Submitted 23 December, 2009; originally announced December 2009.

Comments: 21 pages; submitted to MLJ

Showing 1–29 of 29 results for author: Jain, J