-
A Unified and Interpretable Emotion Representation and Expression Generation
Authors:
Reni Paskaleva,
Mykyta Holubakha,
Andela Ilic,
Saman Motamed,
Luc Van Gool,
Danda Paudel
Abstract:
Canonical emotions, such as happy, sad, and fearful, are easy to understand and annotate. However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones. Intuitively, emotions are continuous as represented by the arousal-valence (AV) model. An interpretable unification of these four modalities - namely, Canonical, Compound, AUs, and AV - is highly desirable, for a better representation and understanding of emotions. However, such a unification remains unknown in the current literature. In this work, we propose an interpretable and unified emotion model, referred to as C2A2. We also develop a method that leverages labels of the non-unified models to annotate the novel unified one. Finally, we modify the text-conditional diffusion models to understand continuous numbers, which are then used to generate continuous expressions using our unified emotion model. Through quantitative and qualitative experiments, we show that our generated images are rich and capture subtle expressions. Our work allows a fine-grained generation of expressions in conjunction with other textual inputs and offers a new label space for emotions at the same time.
Submitted 1 April, 2024;
originally announced April 2024.
-
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Authors:
Sahand Sharifzadeh,
Christos Kaplanis,
Shreya Pathak,
Dharshan Kumaran,
Anastasija Ilic,
Jovana Mitrovic,
Charles Blundell,
Andrea Banino
Abstract:
The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM. Despite the text-to-image model and VLM initially being trained on the same data, our approach leverages the image generator's ability to create novel compositions, resulting in synthetic image embeddings that expand beyond the limitations of the original dataset. Extensive experiments demonstrate that our VLM, finetuned on synthetic data, achieves comparable performance to models trained solely on human-annotated data, while requiring significantly less data. Furthermore, we perform a set of analyses on captions, which reveal that semantic diversity and balance are key aspects for better downstream performance. Finally, we show that synthesizing images in the image embedding space is 25\% faster than in the pixel space. We believe our work not only addresses a significant challenge in VLM training but also opens up promising avenues for the development of self-improving multi-modal models.
Submitted 7 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Improving fine-grained understanding in image-text pre-training
Authors:
Ioana Bica,
Anastasija Ilić,
Matthias Bauer,
Goker Erdogan,
Matko Bošnjak,
Christos Kaplanis,
Alexey A. Gritsenko,
Matthias Minderer,
Charles Blundell,
Razvan Pascanu,
Jovana Mitrović
Abstract:
We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives. This enables more detailed information to be learned in a computationally inexpensive manner. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate our proposed method and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g. classification, as well as region-level tasks relying on fine-grained information, e.g. retrieval, object detection, and segmentation. Moreover, SPARC improves model faithfulness and captioning in foundational vision-language models.
Submitted 18 January, 2024;
originally announced January 2024.
-
Maximal diameter of integral circulant graphs
Authors:
Milan Bašić,
Aleksandar Ilić,
Aleksandar Stamenković
Abstract:
Integral circulant graphs are proposed as models for quantum spin networks that permit a quantum phenomenon called perfect state transfer. Specifically, it is important to know how far information can potentially be transferred between nodes of the quantum networks modelled by integral circulant graphs and this task is related to calculating the maximal diameter of a graph. The integral circulant graph $ICG_n (D)$ has the vertex set $Z_n = \{0, 1, 2, \ldots, n - 1\}$ and vertices $a$ and $b$ are adjacent if $\gcd(a-b,n)\in D$, where $D \subseteq \{d : d \mid n,\ 1\leq d<n\}$. Motivated by the result on the upper bound of the diameter of $ICG_n(D)$ given in [N. Saxena, S. Severini, I. Shparlinski, \textit{Parameters of integral circulant graphs and periodic quantum dynamics}, International Journal of Quantum Information 5 (2007), 417--430], according to which $2|D|+1$ represents one such bound, in this paper we prove that the maximal value of the diameter of the integral circulant graph $ICG_n(D)$ of a given order $n$ with its prime factorization $p_1^{α_1}\cdots p_k^{α_k}$, is equal to $r(n)$ or $r(n)+1$, where $r(n)=k + |\{ i \ | α_i> 1,\ 1\leq i\leq k \}|$, depending on whether $n\not\in 4N+2$ or not, respectively. Furthermore, we show that, for a given order $n$, a divisor set $D$ with $|D|\leq k$ can always be found such that this bound is attained. Finally, we calculate the maximal diameter in the class of integral circulant graphs of a given order $n$ and cardinality of the divisor set $t\leq k$ and characterize all extremal graphs. We actually show that the maximal diameter can have the values $2t$, $2t+1$, $r(n)$ and $r(n)+1$ depending on the values of $t$ and $n$. This way we further improve the upper bound of Saxena, Severini and Shparlinski and we also characterize all graphs whose diameters are equal to $2|D|+1$, thus generalizing a result in that paper.
Submitted 18 July, 2023;
originally announced July 2023.
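The adjacency rule and the stated maximum can be checked by brute force for small orders. The Python sketch below is illustrative only and is not taken from the paper: the function names (`icg_adjacency`, `diameter`, `r`, `brute_force_max_diameter`) are hypothetical, and the exhaustive search over divisor sets is an assumption made for demonstration, feasible only for small $n$. Disconnected graphs are reported as `None` and skipped.

```python
from collections import deque
from itertools import combinations
from math import gcd


def icg_adjacency(n, D):
    """Adjacency lists of ICG_n(D): a and b are adjacent iff gcd(a - b, n) is in D."""
    D = set(D)
    return [[b for b in range(n) if b != a and gcd(a - b, n) in D]
            for a in range(n)]


def diameter(n, D):
    """Diameter of ICG_n(D), or None if the graph is disconnected.

    A single BFS from vertex 0 is enough because circulant graphs are
    vertex-transitive, so every vertex has the same eccentricity.
    """
    adj = icg_adjacency(n, D)
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None if min(dist) < 0 else max(dist)


def r(n):
    """r(n) = k + #{i : alpha_i > 1} for n = p_1^alpha_1 * ... * p_k^alpha_k."""
    total, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            total += 1 + (exponent > 1)  # one for the prime, one more if repeated
        p += 1
    if n > 1:  # leftover prime factor with exponent 1
        total += 1
    return total


def brute_force_max_diameter(n):
    """Largest diameter over all divisor sets D that give a connected ICG_n(D)."""
    divisors = [d for d in range(1, n) if n % d == 0]
    best = None
    for size in range(1, len(divisors) + 1):
        for D in combinations(divisors, size):
            d = diameter(n, D)
            if d is not None and (best is None or d > best):
                best = d
    return best
```

For $n = 6$, which lies in $4N+2$, the exhaustive search returns $3 = r(6) + 1$, attained by $D = \{1\}$ (the 6-cycle); for $n = 4$ it returns $2 = r(4)$.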
-
Unlocking Personalized Healthcare on Modern CPUs/GPUs: Three-way Gene Interaction Study
Authors:
Diogo Marques,
Rafael Campos,
Sergio Santander-Jiménez,
Zakhar Matveev,
Leonel Sousa,
Aleksandar Ilic
Abstract:
Developments in Genome-Wide Association Studies have led to the increasing notion that future healthcare techniques will be personalized to the patient, by relying on genetic tests to determine the risk of developing a disease. To this end, the detection of gene interactions that cause complex diseases constitutes an important application. Similarly to many applications in this field, extensive data sets containing genetic information for a series of patients are used (such as Single-Nucleotide Polymorphisms), leading to high computational complexity and memory utilization, thus constituting a major challenge when targeting high-performance execution in modern computing systems. To close this gap, this work proposes several novel approaches for the detection of three-way gene interactions in modern CPUs and GPUs, making use of different optimizations to fully exploit the target architectures. Crucial insights from the Cache-Aware Roofline Model are used to ensure the suitability of the applications to the computing devices. An extensive study of the architectural features of 13 CPU and GPU devices from all main vendors is also presented, making it possible to understand the features relevant to obtaining high performance in this bioinformatics domain. To the best of our knowledge, this study is the first to perform such an evaluation for epistasis detection. The proposed approaches are able to surpass the performance of state-of-the-art works in the tested platforms, achieving an average speedup of 3.9$\times$ (7.3$\times$ on CPUs and 2.8$\times$ on GPUs) and a maximum speedup of 10.6$\times$ on the Intel UHD P630 GPU.
Submitted 26 January, 2022;
originally announced January 2022.
-
On the computational complexity of the Steiner $k$-eccentricity
Authors:
Xingfu Li,
Guihai Yu,
Aleksandar Ilić,
Sandi Klavžar
Abstract:
The Steiner $k$-eccentricity of a vertex $v$ of a graph $G$ is the maximum Steiner distance over all $k$-subsets of $V (G)$ which contain $v$. A linear time algorithm for calculating the Steiner $k$-eccentricity of a vertex on block graphs is presented. For general graphs, an $O(n^{ν(G)+1}(n(G) + m(G) + k))$ algorithm is designed, where $ν(G)$ is the cyclomatic number of $G$. A linear algorithm for computing the Steiner $3$-eccentricities of all vertices of a tree is also presented which improves the quadratic algorithm from [Discrete Appl.\ Math.\ 304 (2021) 181--195].
Submitted 2 December, 2021;
originally announced December 2021.
-
Optimal algorithm for computing Steiner 3-eccentricities of trees
Authors:
Aleksandar Ilic
Abstract:
The Steiner $k$-eccentricity of a vertex $v$ of a graph $G$ is the maximum Steiner distance over all $k$-subsets of $V (G)$ which contain $v$. In this note, we design a linear algorithm for computing the Steiner $3$-eccentricities and the connective Steiner $3$-eccentricity index on a tree, thus improving on a quadratic algorithm presented in [G. Yu, X. Li, \emph{Connective Steiner 3-eccentricity index and network similarity measure}, Appl. Math. Comput. 386 (2020), 125446.]
Submitted 21 February, 2021; v1 submitted 21 August, 2020;
originally announced August 2020.
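To make the definition concrete, the following Python sketch computes Steiner $3$-eccentricities on a small tree by brute force. It is not the linear algorithm of the note: it relies on the standard fact that, in a tree, the minimal subtree spanning three vertices $u, a, b$ has $(d(u,a) + d(u,b) + d(a,b))/2$ edges, and it enumerates all pairs for every vertex. The function names and the adjacency-dictionary input format are assumptions made for illustration, and the tree is assumed to have at least three vertices.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def steiner_3_eccentricities(adj):
    """Brute-force Steiner 3-eccentricity of every vertex of a tree.

    adj maps each vertex to a list of its neighbours. For each vertex v
    we maximise the Steiner distance of {v, a, b} over all pairs a, b,
    using the tree identity (d(v,a) + d(v,b) + d(a,b)) / 2.
    """
    dist = {v: bfs_distances(adj, v) for v in adj}
    result = {}
    for v in adj:
        others = [u for u in adj if u != v]
        result[v] = max(
            (dist[v][a] + dist[v][b] + dist[a][b]) // 2
            for a, b in combinations(others, 2)
        )
    return result


# A star with centre 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# A path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On the star, the centre has Steiner $3$-eccentricity 2 (any two leaves span two edges with it) while each leaf has 3; on the path, every vertex has 3 because some triple containing it spans the whole path.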
-
HeTM: Transactional Memory for Heterogeneous Systems
Authors:
Daniel Castro,
Paolo Romano,
Aleksandar Ilic,
Amin M. Khan
Abstract:
Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU's and GPU's sides, which gives the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a port of a popular object caching system.
Submitted 2 September, 2019; v1 submitted 2 May, 2019;
originally announced May 2019.
-
Path matrix and path energy of graphs
Authors:
Aleksandar Ilic,
Milan Basic
Abstract:
Given a graph $G$, we associate a path matrix $P$ whose $(i, j)$ entry represents the maximum number of vertex disjoint paths between the vertices $i$ and $j$, with zeros on the main diagonal. In this note, we resolve four conjectures from [M. M. Shikare, P. P. Malavadkar, S. C. Patekar, I. Gutman, \emph{On Path Eigenvalues and Path Energy of Graphs}, MATCH Commun. Math. Comput. Chem. {\bf 79} (2018), 387--398.] on the path energy of graphs and finally present an efficient $O(|E| |V|^3)$ algorithm for computing the path matrix, used for verifying computational results.
Submitted 18 February, 2019; v1 submitted 11 October, 2018;
originally announced October 2018.
-
On the variable common due date, minimal tardy jobs bicriteria two-machine flow shop problem with ordered machines
Authors:
Aleksandar Ilic
Abstract:
We consider a special case of the ordinary NP-hard two-machine flow shop problem with the objective of determining simultaneously a minimal common due date and the minimal number of tardy jobs. In [S. S. Panwalkar, C. Koulamas, An O(n^2) algorithm for the variable common due date, minimal tardy jobs bicriteria two-machine flow shop problem with ordered machines, European Journal of Operational Research 221 (2012), 7-13.], the authors presented a quadratic algorithm for the problem when each job has its smaller processing time on the first machine. In this note, we improve the running time of the algorithm to O(n log n) by an efficient implementation using a recently introduced modified binary tree data structure.
Submitted 13 March, 2015; v1 submitted 24 July, 2013;
originally announced July 2013.
-
Efficient algorithm for the vertex connectivity of trapezoid graphs
Authors:
Aleksandar Ilic
Abstract:
The intersection graph of a collection of trapezoids with corner points lying on two parallel lines is called a trapezoid graph. These graphs and their generalizations have been applied in various fields, including modeling channel routing problems in VLSI design and identifying the optimal chain of non-overlapping fragments in bioinformatics. Using a modified binary indexed tree data structure, we design an algorithm for calculating the vertex connectivity of a trapezoid graph $G$ with time complexity $O (n \log n)$, where $n$ is the number of trapezoids. Furthermore, we establish a necessary and sufficient condition for a trapezoid graph $G$ to be bipartite and characterize trees that can be represented as trapezoid graphs.
Submitted 15 June, 2011;
originally announced June 2011.
-
On vertex covers and matching number of trapezoid graphs
Authors:
Aleksandar Ilic,
Andreja Ilic
Abstract:
The intersection graph of a collection of trapezoids with corner points lying on two parallel lines is called a trapezoid graph. Using a binary indexed tree data structure, we improve algorithms for calculating the size and the number of minimum vertex covers (or independent sets), as well as the total number of vertex covers, and reduce the time complexity from $O (n^2)$ to $O (n \log n)$, where $n$ is the number of trapezoids. Furthermore, we present a family of counterexamples for a recently proposed algorithm with time complexity $O (n^2)$ for calculating the maximum cardinality matching in trapezoid graphs.
Submitted 12 June, 2011;
originally announced June 2011.
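For readers unfamiliar with the data structure named above, here is a minimal Python sketch of a plain binary indexed (Fenwick) tree supporting point updates and prefix sums in $O(\log n)$ each. It is a generic textbook version, not the trapezoid-graph algorithm of the paper; the class and method names are assumptions chosen for illustration, and how the structure is applied to vertex covers is not described in the abstract.

```python
class FenwickTree:
    """Binary indexed tree over positions 1..size (1-based indexing)."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)  # tree[0] is unused

    def add(self, index, delta):
        """Add delta to the value stored at position index."""
        while index <= self.size:
            self.tree[index] += delta
            index += index & -index  # move to the next node covering index

    def prefix_sum(self, index):
        """Sum of the values at positions 1..index (0 if index is 0)."""
        total = 0
        while index > 0:
            total += self.tree[index]
            index -= index & -index  # drop the lowest set bit
        return total


# Small usage example: three point updates followed by prefix queries.
bit = FenwickTree(5)
bit.add(1, 5)
bit.add(3, 2)
bit.add(5, 7)
```

Processing $n$ items with one update and one query each costs $O(n \log n)$ overall, which is the kind of saving over a quadratic scan that the abstract reports.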