Search | arXiv e-print repository

A Unified and Interpretable Emotion Representation and Expression Generation

Authors: Reni Paskaleva, Mykyta Holubakha, Andela Ilic, Saman Motamed, Luc Van Gool, Danda Paudel

Abstract: Canonical emotions, such as happy, sad, and fearful, are easy to understand and annotate. However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones. Intuitively, emotions are continuous, as represented by the arousal-valence (AV) model. An interpretable unification of these four modalities, namely Canonical, Compound, AUs, and AV, is highly desirable for a better representation and understanding of emotions. However, such a unification is still missing from the current literature. In this work, we propose an interpretable and unified emotion model, referred to as C2A2. We also develop a method that leverages labels of the non-unified models to annotate the novel unified one. Finally, we modify text-conditional diffusion models to understand continuous numbers, which are then used to generate continuous expressions using our unified emotion model. Through quantitative and qualitative experiments, we show that our generated images are rich and capture subtle expressions. Our work allows fine-grained generation of expressions in conjunction with other textual inputs and, at the same time, offers a new label space for emotions.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 10 pages, 9 figures, 3 tables. Accepted at CVPR 2024. Project page: https://emotion-diffusion.github.io

arXiv:2403.07750 [pdf, other]

Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

Authors: Sahand Sharifzadeh, Christos Kaplanis, Shreya Pathak, Dharshan Kumaran, Anastasija Ilic, Jovana Mitrovic, Charles Blundell, Andrea Banino

Abstract: The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM. Although the text-to-image model and the VLM are initially trained on the same data, our approach leverages the image generator's ability to create novel compositions, resulting in synthetic image embeddings that expand beyond the limitations of the original dataset. Extensive experiments demonstrate that our VLM, finetuned on synthetic data, achieves comparable performance to models trained solely on human-annotated data, while requiring significantly less data. Furthermore, we perform a set of analyses on captions which reveal that semantic diversity and balance are key aspects for better downstream performance. Finally, we show that synthesizing images in the image embedding space is 25% faster than in the pixel space. We believe our work not only addresses a significant challenge in VLM training but also opens up promising avenues for the development of self-improving multi-modal models.

Submitted 7 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2401.09865 [pdf, other]

Improving fine-grained understanding in image-text pre-training

Authors: Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrović

Abstract: We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives. This enables more detailed information to be learned in a computationally inexpensive manner. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate our proposed method and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g. classification, and on region-level tasks relying on fine-grained information, e.g. retrieval, object detection, and segmentation. Moreover, SPARC improves model faithfulness and captioning in foundational vision-language models.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 26 pages
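The grouping step described in the abstract can be sketched with NumPy: compute token-to-patch similarities, sparsify them, and use the resulting weights to average the patch embeddings for each token. The min-max normalization, the $1/P$ sparsity threshold (with $P$ patches), and the function name are assumptions made for illustration, and neither the fine-grained nor the global contrastive loss is shown.

```python
import numpy as np


def language_grouped_vision_embeddings(tokens, patches):
    """Hypothetical sketch of per-token patch grouping.

    tokens:  (L, D) text-token embeddings
    patches: (P, D) image-patch embeddings
    Returns (grouped, weights): grouped is (L, D), one weighted average of
    patches per token; weights is (L, P), each row summing to ~1.
    """
    num_patches = patches.shape[0]
    sim = tokens @ patches.T                              # (L, P) token-patch similarities
    lo = sim.min(axis=1, keepdims=True)
    hi = sim.max(axis=1, keepdims=True)
    sim = (sim - lo) / (hi - lo + 1e-8)                   # assumed: min-max normalise each row to [0, 1]
    sim = np.where(sim < 1.0 / num_patches, 0.0, sim)     # assumed: drop weak matches below 1/P
    weights = sim / (sim.sum(axis=1, keepdims=True) + 1e-8)  # renormalise surviving weights
    return weights @ patches, weights


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))     # 4 caption tokens, embedding size 8
patches = rng.normal(size=(16, 8))   # 16 image patches
grouped, weights = language_grouped_vision_embeddings(tokens, patches)
print(grouped.shape)  # (4, 8)
```

Because every row of `weights` is non-negative and sums to about one, each grouped embedding is a convex combination of the patches that the token matched most strongly.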

arXiv:2307.09081 [pdf, ps, other]

Maximal diameter of integral circulant graphs

Authors: Milan Bašić, Aleksandar Ilić, Aleksandar Stamenković

Abstract: Integral circulant graphs have been proposed as models for quantum spin networks that permit a quantum phenomenon called perfect state transfer. Specifically, it is important to know how far information can potentially be transferred between nodes of the quantum networks modelled by integral circulant graphs, and this task is related to calculating the maximal diameter of a graph. The integral circulant graph $ICG_n(D)$ has the vertex set $Z_n = \{0, 1, 2, \ldots, n - 1\}$, and vertices $a$ and $b$ are adjacent if $\gcd(a-b,n)\in D$, where $D \subseteq \{d : d \mid n,\ 1\leq d<n\}$. Motivated by the result on the upper bound of the diameter of $ICG_n(D)$ given in [N. Saxena, S. Severini, I. Shparlinski, \textit{Parameters of integral circulant graphs and periodic quantum dynamics}, International Journal of Quantum Information 5 (2007), 417--430], according to which $2|D|+1$ is one such bound, in this paper we prove that the maximal value of the diameter of the integral circulant graph $ICG_n(D)$ of a given order $n$ with prime factorization $p_1^{\alpha_1}\cdots p_k^{\alpha_k}$ is equal to $r(n)$ if $n\not\in 4N+2$ and to $r(n)+1$ otherwise, where $r(n)=k + |\{ i : \alpha_i > 1,\ 1\leq i\leq k \}|$. Furthermore, we show that, for a given order $n$, a divisor set $D$ with $|D|\leq k$ can always be found such that this bound is attained. Finally, we calculate the maximal diameter in the class of integral circulant graphs of a given order $n$ and cardinality of the divisor set $t\leq k$, and characterize all extremal graphs.
In fact, we show that the maximal diameter can take the values $2t$, $2t+1$, $r(n)$ and $r(n)+1$, depending on the values of $t$ and $n$. In this way, we further improve the upper bound of Saxena, Severini and Shparlinski, and we also characterize all graphs whose diameters are equal to $2|D|+1$, thus generalizing a result in that paper.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 29 pages, 1 figure

MSC Class: 05C12 11A07 81P45
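To make the definition of $ICG_n(D)$ and the quantity $r(n)$ concrete, here is a small brute-force sketch (not the method of the paper): it builds the graph from the gcd rule, computes its diameter by breadth-first search, and evaluates $r(n)$ from the prime factorization. The function names are hypothetical, and `max_icg_diameter` enumerates every divisor set, so it is only usable for small $n$, e.g. as a sanity check of the stated bound.

```python
from collections import deque
from itertools import combinations
from math import gcd, inf


def icg_adjacency(n, D):
    """Adjacency lists of ICG_n(D): a ~ b iff gcd(a - b, n) lies in D."""
    D = set(D)
    return [[b for b in range(n) if b != a and gcd(a - b, n) in D] for a in range(n)]


def icg_diameter(n, D):
    """Diameter via BFS from vertex 0 (circulant graphs are vertex-transitive).
    Returns inf if the graph is disconnected."""
    adj = icg_adjacency(n, D)
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return inf if min(dist) < 0 else max(dist)


def r(n):
    """r(n) = k + #{i : alpha_i > 1} for n = p_1^alpha_1 ... p_k^alpha_k."""
    k = repeated = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            alpha = 0
            while n % p == 0:
                n //= p
                alpha += 1
            k += 1
            if alpha > 1:
                repeated += 1
        p += 1
    if n > 1:            # leftover prime factor with exponent 1
        k += 1
    return k + repeated


def max_icg_diameter(n):
    """Largest finite diameter over all non-empty divisor sets D (exponential)."""
    divisors = [d for d in range(1, n) if n % d == 0]
    best = 0
    for size in range(1, len(divisors) + 1):
        for D in combinations(divisors, size):
            diam = icg_diameter(n, D)
            if diam != inf:
                best = max(best, diam)
    return best


print(icg_diameter(6, {1}))  # 3 (ICG_6({1}) is the 6-cycle)
```

For $n = 6 = 2 \cdot 3$, which lies in $4N+2$, we have $r(6) = 2$, and the brute force finds maximal diameter 3 (attained by $D = \{1\}$), matching $r(n)+1$.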

arXiv:2201.10956 [pdf, other]

Unlocking Personalized Healthcare on Modern CPUs/GPUs: Three-way Gene Interaction Study

Authors: Diogo Marques, Rafael Campos, Sergio Santander-Jiménez, Zakhar Matveev, Leonel Sousa, Aleksandar Ilic

Abstract: Developments in Genome-Wide Association Studies have led to the increasing notion that future healthcare techniques will be personalized to the patient, relying on genetic tests to determine the risk of developing a disease. To this end, the detection of gene interactions that cause complex diseases constitutes an important application. As in many applications in this field, extensive data sets containing genetic information for a series of patients (such as Single-Nucleotide Polymorphisms) are used, leading to high computational complexity and memory utilization, which constitutes a major challenge when targeting high-performance execution on modern computing systems. To close this gap, this work proposes several novel approaches for the detection of three-way gene interactions on modern CPUs and GPUs, using different optimizations to fully exploit the target architectures. Crucial insights from the Cache-Aware Roofline Model are used to ensure the suitability of the applications to the computing devices. An extensive study of the architectural features of 13 CPU and GPU devices from all main vendors is also presented, showing which features are relevant to obtaining high performance in this bioinformatics domain. To the best of our knowledge, this study is the first to perform such an evaluation for epistasis detection.
The proposed approaches surpass the performance of state-of-the-art works on the tested platforms, achieving an average speedup of 3.9$\times$ (7.3$\times$ on CPUs and 2.8$\times$ on GPUs) and a maximum speedup of 10.6$\times$ on the Intel UHD P630 GPU.

Submitted 26 January, 2022; originally announced January 2022.

Comments: To appear in 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)
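Exhaustive three-way epistasis detection enumerates every triple of SNPs and, for each triple, builds a contingency table of genotype combinations per phenotype class, which is then scored. The sketch below shows only the enumeration and table construction, assuming genotypes coded 0, 1, 2 and binary case/control labels; the scoring function and all of the CPU/GPU optimizations discussed in the paper are omitted, and the names are hypothetical.

```python
from itertools import combinations


def three_way_tables(genotypes, phenotype):
    """Yield ((i, j, k), table) for every SNP triple i < j < k.

    genotypes: list of samples, each a list of SNP genotypes coded 0, 1, 2.
    phenotype: list of 0 (control) / 1 (case) labels, one per sample.
    table[c][9*g_i + 3*g_j + g_k] counts the samples of class c with that
    genotype combination (3**3 = 27 cells per class).
    """
    num_snps = len(genotypes[0])
    for i, j, k in combinations(range(num_snps), 3):
        table = [[0] * 27 for _ in range(2)]
        for sample, label in zip(genotypes, phenotype):
            cell = 9 * sample[i] + 3 * sample[j] + sample[k]
            table[label][cell] += 1
        yield (i, j, k), table


genotypes = [[0, 1, 2, 0], [2, 2, 2, 1], [0, 1, 2, 1]]
phenotype = [0, 1, 1]
tables = dict(three_way_tables(genotypes, phenotype))
print(len(tables))              # 4 triples out of 4 SNPs
print(tables[(0, 1, 2)][1][5])  # 1 (third sample: genotypes 0, 1, 2, case)
```

The number of triples grows as $\binom{M}{3}$ for $M$ SNPs, which is why this inner counting loop dominates the cost and is the natural target for the architecture-specific optimizations.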

arXiv:2112.01140 [pdf, other]

On the computational complexity of the Steiner $k$-eccentricity

Authors: Xingfu Li, Guihai Yu, Aleksandar Ilić, Sandi Klavžar

Abstract: The Steiner $k$-eccentricity of a vertex $v$ of a graph $G$ is the maximum Steiner distance over all $k$-subsets of $V(G)$ which contain $v$. A linear-time algorithm for calculating the Steiner $k$-eccentricity of a vertex on block graphs is presented. For general graphs, an $O(n^{\nu(G)+1}(n(G) + m(G) + k))$ algorithm is designed, where $\nu(G)$ is the cyclomatic number of $G$. A linear algorithm for computing the Steiner $3$-eccentricities of all vertices of a tree is also presented, which improves the quadratic algorithm from [Discrete Appl. Math. 304 (2021) 181--195].

Submitted 2 December, 2021; originally announced December 2021.
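On a tree, the Steiner distance of three vertices has a closed form: the smallest subtree containing $\{u, v, w\}$ has $(d(u,v) + d(v,w) + d(u,w))/2$ edges. This gives a simple cubic-time reference implementation of the Steiner 3-eccentricity against which a linear algorithm could be checked. The sketch below is a hypothetical brute force, not the authors' linear algorithm; it assumes the tree is given as 0-indexed adjacency lists with at least three vertices.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Edge distances from source to every vertex of a connected graph."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def steiner3_eccentricities(adj):
    """Brute-force Steiner 3-eccentricity of every vertex of a tree (n >= 3)."""
    n = len(adj)
    d = [bfs_distances(adj, s) for s in range(n)]
    ecc = [0] * n
    for v in range(n):
        # every 3-subset {v, u, w} containing v
        for u, w in combinations([x for x in range(n) if x != v], 2):
            size = (d[u][v] + d[v][w] + d[u][w]) // 2   # edges of the spanning subtree
            ecc[v] = max(ecc[v], size)
    return ecc


# Star with centre 0 and leaves 1, 2, 3.
star = [[1, 2, 3], [0], [0], [0]]
print(steiner3_eccentricities(star))  # [2, 3, 3, 3]
```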

arXiv:2008.09299

Optimal algorithm for computing Steiner 3-eccentricities of trees

Authors: Aleksandar Ilic

Abstract: The Steiner $k$-eccentricity of a vertex $v$ of a graph $G$ is the maximum Steiner distance over all $k$-subsets of $V(G)$ which contain $v$. In this note, we design a linear algorithm for computing the Steiner $3$-eccentricities and the connective Steiner $3$-eccentricity index on a tree, thus improving a quadratic algorithm presented in [G. Yu, X. Li, \emph{Connective Steiner 3-eccentricity index and network similarity measure}, Appl. Math. Comput. 386 (2020), 125446.]

Submitted 21 February, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

Comments: Merged into another paper

arXiv:1905.00661 [pdf, other]

doi 10.1109/PACT.2019.00026

HeTM: Transactional Memory for Heterogeneous Systems

Authors: Daniel Castro, Paolo Romano, Aleksandar Ilic, Amin M. Khan

Abstract: Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, developing applications that take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM uses a novel design that leverages speculative techniques and aims to hide the inherently large communication latency between CPUs and discrete GPUs and to minimize inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows alternative TM implementations to be easily integrated on the CPU and GPU sides, giving the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit.
We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a port of a popular object caching system.

Submitted 2 September, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

Comments: The current work was accepted at the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19)
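The following toy sketch illustrates, under explicitly hypothetical assumptions, the kind of inter-device conflict check that a speculative design can perform at a synchronization point: each device runs a batch of transactions against its own copy, records a read set and buffered writes, and the two batches are merged only if they can be serialized. The `Batch` and `synchronize` names, the policy of discarding the GPU batch on conflict, and the per-address granularity are illustrative choices, not the SHeTM protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Read set and buffered writes of one device's speculative batch (hypothetical)."""
    reads: set = field(default_factory=set)
    writes: dict = field(default_factory=dict)  # address -> new value


def synchronize(memory, cpu, gpu):
    """Merge two speculative batches into the shared memory image.

    Conservatively treats any GPU read or write of an address the CPU wrote
    as a conflict; on conflict the GPU batch is discarded (rolled back),
    otherwise the result equals running the CPU batch first, then the GPU batch.
    Returns (new_memory, gpu_committed).
    """
    gpu_touched = gpu.reads | set(gpu.writes)
    conflict = bool(gpu_touched & set(cpu.writes))
    merged = dict(memory)
    merged.update(cpu.writes)          # CPU batch always commits in this toy policy
    if not conflict:
        merged.update(gpu.writes)
    return merged, not conflict


memory = {"a": 0, "b": 0, "c": 0}
cpu = Batch(reads={"a"}, writes={"a": 1})
gpu = Batch(reads={"b"}, writes={"c": 2})
print(synchronize(memory, cpu, gpu))  # ({'a': 1, 'b': 0, 'c': 2}, True)
```

Real systems track these sets with compact structures (e.g. bitmaps) rather than Python sets; the point here is only the serializability condition being checked.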

arXiv:1810.04870 [pdf, ps, other]

Path matrix and path energy of graphs

Authors: Aleksandar Ilic, Milan Basic

Abstract: Given a graph $G$, we associate a path matrix $P$ whose $(i, j)$ entry represents the maximum number of vertex disjoint paths between the vertices $i$ and $j$, with zeros on the main diagonal. In this note, we resolve four conjectures from [M. M. Shikare, P. P. Malavadkar, S. C. Patekar, I. Gutman, \emph{On Path Eigenvalues and Path Energy of Graphs}, MATCH Commun. Math. Comput. Chem. {\bf 79} (2018), 387--398.] on the path energy of graphs and finally present an efficient $O(|E| |V|^3)$ algorithm for computing the path matrix, used for verifying computational results.

Submitted 18 February, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: 8 pages

MSC Class: 05C50
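Counting internally vertex-disjoint $i$-$j$ paths reduces to a unit-capacity maximum flow once every vertex other than $i$ and $j$ is split into an in-copy and an out-copy (the standard flow form of Menger's theorem). The sketch below builds the path matrix this way with a plain augmenting-path max flow. It reads "vertex disjoint" as internally vertex-disjoint, so an edge $ij$ counts as one path; it is a generic construction rather than the $O(|E||V|^3)$ algorithm of the note, and the function names are hypothetical.

```python
from collections import deque


def max_vertex_disjoint_paths(n, edges, s, t):
    """Maximum number of internally vertex-disjoint s-t paths, computed as a
    unit-capacity max flow with vertex v split into v_in = 2v and v_out = 2v + 1."""
    cap = [dict() for _ in range(2 * n)]

    def add_arc(u, v):
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v].setdefault(u, 0)              # residual arc

    for v in range(n):
        if v not in (s, t):
            add_arc(2 * v, 2 * v + 1)        # an inner vertex can be used once
    for u, v in edges:
        add_arc(2 * u + 1, 2 * v)
        add_arc(2 * v + 1, 2 * u)

    source, sink, flow = 2 * s + 1, 2 * t, 0
    while True:
        parent = {source: None}              # BFS for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if sink not in parent:
            return flow
        y = sink
        while parent[y] is not None:         # push one unit along the path
            x = parent[y]
            cap[x][y] -= 1
            cap[y][x] += 1
            y = x
        flow += 1


def path_matrix(n, edges):
    """P[i][j] = max number of vertex-disjoint i-j paths; zeros on the diagonal."""
    P = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            P[i][j] = P[j][i] = max_vertex_disjoint_paths(n, edges, i, j)
    return P


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(path_matrix(4, k4))  # [[0, 3, 3, 3], [3, 0, 3, 3], [3, 3, 0, 3], [3, 3, 3, 0]]
```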

arXiv:1307.6505 [pdf, ps, other]

On the variable common due date, minimal tardy jobs bicriteria two-machine flow shop problem with ordered machines

Authors: Aleksandar Ilic

Abstract: We consider a special case of the ordinary NP-hard two-machine flow shop problem with the objective of simultaneously determining a minimal common due date and the minimal number of tardy jobs. In [S. S. Panwalkar, C. Koulamas, An O(n^2) algorithm for the variable common due date, minimal tardy jobs bicriteria two-machine flow shop problem with ordered machines, European Journal of Operational Research 221 (2012), 7-13.], the authors presented a quadratic algorithm for the problem when each job has its smaller processing time on the first machine. In this note, we improve the running time of the algorithm to O(n log n) through an efficient implementation using the recently introduced modified binary tree data structure.

Submitted 13 March, 2015; v1 submitted 24 July, 2013; originally announced July 2013.

Comments: 6 pages, 1 algorithm
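To make the two criteria concrete, the sketch below evaluates a fixed job sequence in a two-machine flow shop: it computes each job's completion time on the second machine and counts the jobs that finish after a common due date $d$. It only evaluates a given schedule; choosing the sequence and the minimal due date, which is what the quadratic and $O(n \log n)$ algorithms do, is not shown. The names and the data are hypothetical.

```python
def completion_times(sequence, jobs):
    """Machine-2 completion time of each job when `jobs` (list of (p1, p2)
    processing times) are processed in the order given by `sequence`."""
    end1 = end2 = 0
    result = {}
    for j in sequence:
        p1, p2 = jobs[j]
        end1 += p1                      # machine 1 works without idle time
        end2 = max(end2, end1) + p2     # machine 2 waits for machine 1 and its previous job
        result[j] = end2
    return result


def tardy_jobs(sequence, jobs, due_date):
    """Number of jobs that finish after the common due date."""
    return sum(c > due_date for c in completion_times(sequence, jobs).values())


jobs = [(1, 2), (2, 3), (1, 4)]   # each job's p1 is smaller than its p2
print(completion_times([0, 1, 2], jobs))  # {0: 3, 1: 6, 2: 10}
print(tardy_jobs([0, 1, 2], jobs, 6))     # 1
```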

arXiv:1106.3037 [pdf, ps, other]

Efficient algorithm for the vertex connectivity of trapezoid graphs

Authors: Aleksandar Ilic

Abstract: The intersection graph of a collection of trapezoids with corner points lying on two parallel lines is called a trapezoid graph. These graphs and their generalizations have been applied in various fields, including modeling channel routing problems in VLSI design and identifying the optimal chain of non-overlapping fragments in bioinformatics. Using a modified binary indexed tree data structure, we design an algorithm for calculating the vertex connectivity of a trapezoid graph $G$ with time complexity $O(n \log n)$, where $n$ is the number of trapezoids. Furthermore, we establish a necessary and sufficient condition for a trapezoid graph $G$ to be bipartite and characterize trees that can be represented as trapezoid graphs.

Submitted 15 June, 2011; originally announced June 2011.

Comments: 12 pages, 2 figures

MSC Class: 05C85; 68R10; 05C40
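The definition of a trapezoid graph translates directly into code: with each trapezoid given by its top interval $[a, b]$ and bottom interval $[c, d]$, two trapezoids are disjoint exactly when one lies strictly to the left of the other on both lines. The sketch below builds the graph from that rule and computes the vertex connectivity by exhaustive search. It is an exponential reference implementation for checking small cases, not the $O(n \log n)$ algorithm of the paper, and all names are hypothetical.

```python
from collections import deque
from itertools import combinations


def intersects(s, t):
    """Trapezoids are (a, b, c, d): top interval [a, b], bottom interval [c, d].
    They are disjoint iff one lies strictly left of the other on both lines."""
    a1, b1, c1, d1 = s
    a2, b2, c2, d2 = t
    left = b1 < a2 and d1 < c2
    right = b2 < a1 and d2 < c1
    return not (left or right)


def trapezoid_graph(trapezoids):
    """Adjacency lists of the intersection graph."""
    n = len(trapezoids)
    adj = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if intersects(trapezoids[i], trapezoids[j]):
            adj[i].append(j)
            adj[j].append(i)
    return adj


def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == vertices


def vertex_connectivity(adj):
    """Fewest vertices whose removal disconnects the graph (n - 1 if complete)."""
    n = len(adj)
    for size in range(n - 1):
        for removed in combinations(range(n), size):
            if not is_connected(adj, set(range(n)) - set(removed)):
                return size
    return n - 1


chain = [(1, 3, 1, 3), (2, 5, 2, 5), (4, 6, 4, 6)]
print(vertex_connectivity(trapezoid_graph(chain)))  # 1 (the graph is a path)
```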

arXiv:1106.2351 [pdf, ps, other]

On vertex covers and matching number of trapezoid graphs

Authors: Aleksandar Ilic, Andreja Ilic

Abstract: The intersection graph of a collection of trapezoids with corner points lying on two parallel lines is called a trapezoid graph. Using the binary indexed tree data structure, we improve algorithms for calculating the size and the number of minimum vertex covers (or independent sets), as well as the total number of vertex covers, and reduce the time complexity from $O(n^2)$ to $O(n \log n)$, where $n$ is the number of trapezoids. Furthermore, we present a family of counterexamples to a recently proposed algorithm with time complexity $O(n^2)$ for calculating the maximum cardinality matching in trapezoid graphs.

Submitted 12 June, 2011; originally announced June 2011.

Comments: 9 pages, 1 figure, 4 algorithms

MSC Class: 05C85; 68R10
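A binary indexed (Fenwick) tree supporting prefix-maximum queries is enough to compute the size of a maximum independent set of a trapezoid graph in $O(n \log n)$ time with a left-to-right sweep over the top line; since a set is a vertex cover exactly when its complement is independent, a minimum vertex cover has $n$ minus that many vertices. The sketch below shows this standard sweep, assuming the corner coordinates on each line are distinct. It computes only the size, not the number of minimum covers treated in the paper, and the names are hypothetical.

```python
class FenwickMax:
    """Binary indexed tree answering prefix-maximum queries over positions 1..size."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def update(self, i, value):
        while i < len(self.tree):
            self.tree[i] = max(self.tree[i], value)
            i += i & -i

    def query(self, i):
        best = 0
        while i > 0:
            best = max(best, self.tree[i])
            i -= i & -i
        return best


def max_independent_set_size(trapezoids):
    """Trapezoids are (a, b, c, d): top interval [a, b], bottom interval [c, d].
    An independent set is a chain of trapezoids, each strictly left of the
    next on both lines."""
    bottoms = sorted(x for (_, _, c, d) in trapezoids for x in (c, d))
    rank = {x: i + 1 for i, x in enumerate(bottoms)}
    events = sorted([(a, 0, i) for i, (a, _, _, _) in enumerate(trapezoids)] +
                    [(b, 1, i) for i, (_, b, _, _) in enumerate(trapezoids)])
    tree = FenwickMax(len(bottoms))
    best = [0] * len(trapezoids)
    for _, kind, i in events:
        _, _, c, d = trapezoids[i]
        if kind == 0:   # left top corner: extend the best chain ending before c
            best[i] = 1 + tree.query(rank[c] - 1)
        else:           # right top corner: trapezoid i may now precede later ones
            tree.update(rank[d], best[i])
    return max(best, default=0)


traps = [(1, 3, 1, 3), (2, 5, 2, 5), (4, 6, 4, 6)]
mis = max_independent_set_size(traps)
print(mis, len(traps) - mis)  # 2 1
```

Here the three trapezoids form a path, so the output is an independent set of size 2 and a minimum vertex cover of size 1. Each trapezoid causes one query and one update, each costing $O(\log n)$.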

Showing 1–13 of 13 results for author: Ilić, A