Search | arXiv e-print repository

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract… ▽ More Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities including 13 layout attributes along with their LaTeX source codes. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

arXiv:2405.10750 [pdf, other]

Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian Optimization

Authors: Jianzong Pi, Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Abhishek Gupta, Marcello Canova

Abstract: Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are lim… ▽ More Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are limited by their lack of robustness, high computational costs, and susceptibility to local minima. In this study, Bayesian Optimization is used for tuning the dynamic parameters of an electrochemical equivalent circuit battery model (E-ECM) for a nickel-manganese-cobalt (NMC)-graphite cell. The performance of the Bayesian Optimization is compared with baseline methods based on gradient-based and metaheuristic approaches. The robustness of the parameter optimization method is tested by performing verification using an experimental drive cycle. The results indicate that Bayesian Optimization outperforms Gradient Descent and PSO optimization techniques, achieving reductions on average testing loss by 28.8% and 5.8%, respectively. Moreover, Bayesian optimization significantly reduces the variance in testing loss by 95.8% and 72.7%, respectively. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: 6 pages

arXiv:2402.03830 [pdf, other]

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

Authors: Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

Abstract: With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is ex… ▽ More With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is expensive, time-consuming, and unsafe. With the development of implicit rendering technology and in-depth research on using generative models to produce data at scale, we propose OASim, an open and adaptive simulator and autonomous driving data generator based on implicit neural rendering. It has the following characteristics: (1) High-quality scene reconstruction through neural implicit surface reconstruction technology. (2) Trajectory editing of the ego vehicle and participating vehicles. (3) Rich vehicle model library that can be freely selected and inserted into the scene. (4) Rich sensors model library where you can select specified sensors to generate data. (5) A highly customizable data generation system can generate data according to user needs. We demonstrate the high quality and fidelity of the generated data through perception performance evaluation on the Carla simulator and real-world data acquisition. Code is available at https://github.com/PJLab-ADG/OASim. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: 10 pages, 9 figures

arXiv:2305.16840 [pdf, other]

Automatic Surround Camera Calibration Method in Road Scene for Self-driving Car

Authors: Jixiang Li, Jiahao Pi, Guohang Yan, Yikang Li

Abstract: With the development of autonomous driving technology, sensor calibration has become a key technology to achieve accurate perception fusion and localization. Accurate calibration of the sensors ensures that each sensor can function properly and accurate information aggregation can be achieved. Among them, camera calibration based on surround view has received extensive attention. In autonomous dri… ▽ More With the development of autonomous driving technology, sensor calibration has become a key technology to achieve accurate perception fusion and localization. Accurate calibration of the sensors ensures that each sensor can function properly and accurate information aggregation can be achieved. Among them, camera calibration based on surround view has received extensive attention. In autonomous driving applications, the calibration accuracy of the camera can directly affect the accuracy of perception and depth estimation. For online calibration of surround-view cameras, traditional feature extraction-based methods will suffer from strong distortion when the initial extrinsic parameters error is large, making these methods less robust and inaccurate. More existing methods use the sparse direct method to calibrate multi-cameras, which can ensure both accuracy and real-time performance and is theoretically achievable. However, this method requires a better initial value, and the initial estimate with a large error is often stuck in a local optimum. To this end, we introduce a robust automatic multi-cameras (pinhole or fisheye cameras) calibration and refinement method in the road scene. We utilize the coarse-to-fine random-search strategy, and it can solve large disturbances of initial extrinsic parameters, which can make up for falling into optimal local value in nonlinear optimization methods. In the end, quantitative and qualitative experiments are conducted in actual and simulated environments, and the result shows the proposed method can achieve accuracy and robustness performance. The open-source code is available at https://github.com/OpenCalib/SurroundCameraCalib. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 6 pages, 7 figures

arXiv:2302.02057 [pdf, other]

Semantic Diffusion Network for Semantic Segmentation

Authors: Haoru Tan, Sitong Wu, Jimin Pi

Abstract: Precise and accurate predictions over boundary areas are essential for semantic segmentation. However, the commonly-used convolutional operators tend to smooth and blur local detail cues, making it difficult for deep models to generate accurate boundary predictions. In this paper, we introduce an operator-level approach to enhance semantic boundary awareness, so as to improve the prediction of the… ▽ More Precise and accurate predictions over boundary areas are essential for semantic segmentation. However, the commonly-used convolutional operators tend to smooth and blur local detail cues, making it difficult for deep models to generate accurate boundary predictions. In this paper, we introduce an operator-level approach to enhance semantic boundary awareness, so as to improve the prediction of the deep semantic segmentation model. Specifically, we first formulate the boundary feature enhancement as an anisotropic diffusion process. We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module. Our SDN aims to construct a differentiable mapping from the original feature to the inter-class boundary-enhanced feature. The proposed SDN is an efficient and flexible module that can be easily plugged into existing encoder-decoder segmentation models. Extensive experiments show that our approach can achieve consistent improvements over several typical and state-of-the-art segmentation baseline models on challenging public benchmarks. The code will be released soon. △ Less

Submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted by NeurIPS2022

arXiv:2212.04976 [pdf, other]

Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation

Authors: Zhen Zhao, Lihe Yang, Sifan Long, Jimin Pi, Luping Zhou, Jingdong Wang

Abstract: Recent studies on semi-supervised semantic segmentation (SSS) have seen fast progress. Despite their promising performance, current state-of-the-art methods tend to increasingly complex designs at the cost of introducing more network components and additional training procedures. Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approa… ▽ More Recent studies on semi-supervised semantic segmentation (SSS) have seen fast progress. Despite their promising performance, current state-of-the-art methods tend to increasingly complex designs at the cost of introducing more network components and additional training procedures. Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost the SSS performance. We argue that various data augmentations should be adjusted to better adapt to the semi-supervised scenarios instead of directly applying these techniques from supervised learning. Specifically, we adopt a simplified intensity-based augmentation that selects a random number of data transformations with uniformly sampling distortion strengths from a continuous space. Based on the estimated confidence of the model on different unlabeled samples, we also randomly inject labelled information to augment the unlabeled samples in an adaptive manner. Without bells and whistles, our simple AugSeg can readily achieve new state-of-the-art performance on SSS benchmarks under different partition protocols. △ Less

Submitted 9 December, 2022; originally announced December 2022.

Comments: 10 pages, 8 tables

arXiv:2211.11335 [pdf, other]

Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation

Authors: Zhen Zhao, Sifan Long, Jimin Pi, Jingdong Wang, Luping Zhou

Abstract: Recently, semi-supervised semantic segmentation has achieved promising performance with a small fraction of labeled data. However, most existing studies treat all unlabeled data equally and barely consider the differences and training difficulties among unlabeled instances. Differentiating unlabeled instances can promote instance-specific supervision to adapt to the model's evolution dynamically.… ▽ More Recently, semi-supervised semantic segmentation has achieved promising performance with a small fraction of labeled data. However, most existing studies treat all unlabeled data equally and barely consider the differences and training difficulties among unlabeled instances. Differentiating unlabeled instances can promote instance-specific supervision to adapt to the model's evolution dynamically. In this paper, we emphasize the cruciality of instance differences and propose an instance-specific and model-adaptive supervision for semi-supervised semantic segmentation, named iMAS. Relying on the model's performance, iMAS employs a class-weighted symmetric intersection-over-union to evaluate quantitative hardness of each unlabeled instance and supervises the training on unlabeled data in a model-adaptive manner. Specifically, iMAS learns from unlabeled instances progressively by weighing their corresponding consistency losses based on the evaluated hardness. Besides, iMAS dynamically adjusts the augmentation for each instance such that the distortion degree of augmented instances is adapted to the model's generalization capability across the training course. Not integrating additional losses and training procedures, iMAS can obtain remarkable performance gains against current state-of-the-art approaches on segmentation benchmarks under different semi-supervised partition protocols. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.11315 [pdf, other]

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

Authors: Sifan Long, Zhen Zhao, Jimin Pi, Shengsheng Wang, Jingdong Wang

Abstract: Vision transformers have achieved significant improvements on various vision tasks but their quadratic interactions between tokens significantly reduce computational efficiency. Many pruning methods have been proposed to remove redundant tokens for efficient vision transformers recently. However, existing studies mainly focus on the token importance to preserve local attentive tokens but completel… ▽ More Vision transformers have achieved significant improvements on various vision tasks but their quadratic interactions between tokens significantly reduce computational efficiency. Many pruning methods have been proposed to remove redundant tokens for efficient vision transformers recently. However, existing studies mainly focus on the token importance to preserve local attentive tokens but completely ignore the global token diversity. In this paper, we emphasize the cruciality of diverse global semantics and propose an efficient token decoupling and merging method that can jointly consider the token importance and diversity for token pruning. According to the class token attention, we decouple the attentive and inattentive tokens. In addition to preserving the most discriminative local tokens, we merge similar inattentive tokens and match homogeneous attentive tokens to maximize the token diversity. Despite its simplicity, our method obtains a promising trade-off between model complexity and classification accuracy. On DeiT-S, our method reduces the FLOPs by 35% with only a 0.2% accuracy drop. Notably, benefiting from maintaining the token diversity, our method can even improve the accuracy of DeiT-T by 0.1% after reducing its FLOPs by 40%. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.09799 [pdf, other]

CAE v2: Context Autoencoder with CLIP Target

Authors: Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Abstract: Masked image modeling (MIM) learns visual representation by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been proven effective for MIM. However, it is still under-explored how CLIP supervision in MIM influences performance. To investigate strategies for refining the CLIP-targeted MIM, we study two critical elements in MIM, i.e., t… ▽ More Masked image modeling (MIM) learns visual representation by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been proven effective for MIM. However, it is still under-explored how CLIP supervision in MIM influences performance. To investigate strategies for refining the CLIP-targeted MIM, we study two critical elements in MIM, i.e., the supervision position and the mask ratio, and reveal two interesting perspectives, relying on our developed simple pipeline, context autodecoder with CLIP target (CAE v2). Firstly, we observe that the supervision on visible patches achieves remarkable performance, even better than that on masked patches, where the latter is the standard format in the existing MIM methods. Secondly, the optimal mask ratio positively correlates to the model size. That is to say, the smaller the model, the lower the mask ratio needs to be. Driven by these two discoveries, our simple and concise approach CAE v2 achieves superior performance on a series of downstream tasks. For example, a vanilla ViT-Large model achieves 81.7% and 86.7% top-1 accuracy on linear probing and fine-tuning on ImageNet-1K, and 55.9% mIoU on semantic segmentation on ADE20K with the pre-training for 300 epochs. We hope our findings can be helpful guidelines for the pre-training in the MIM area, especially for the small-scale models. △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2209.09422 [pdf, other]

Bit Allocation using Optimization

Authors: Tongda Xu, Han Gao, Chenjian Gao, Yuanyuan Wang, Dailan He, Jinyong Pi, Jixiang Luo, Ziyu Zhu, Mao Ye, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Abstract: In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with GoP (Group-of-Picture)-level likelihood is equivalent to pixel-level bit allocation with precise rate \& quality dependency model. Based on this equ… ▽ More In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with GoP (Group-of-Picture)-level likelihood is equivalent to pixel-level bit allocation with precise rate \& quality dependency model. Based on this equivalence, we establish a new paradigm of bit allocation using SAVI. Different from previous bit allocation methods, our approach requires no empirical model and is thus optimal. Moreover, as the original SAVI using gradient ascent only applies to single-level latent, we extend the SAVI to multi-level such as NVC by recursively applying back-propagating through gradient ascent. Finally, we propose a tractable approximation for practical implementation. Our method can be applied to scenarios where performance outweights encoding speed, and serves as an empirical bound on the R-D performance of bit allocation. Experimental results show that current state-of-the-art bit allocation algorithms still have a room of $\approx 0.5$ dB PSNR to improve compared with ours. Code is available at \url{https://github.com/tongdaxu/Bit-Allocation-Using-Optimization}. △ Less

Submitted 8 May, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: ICML 2023

arXiv:2209.07694 [pdf, other]

An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving

Authors: Guohang Yan, Jiahao Pi, Chengjie Wang, Xinyu Cai, Yikang Li

Abstract: Accurate and reliable sensor calibration is critical for fusing LiDAR and inertial measurements in autonomous driving. This paper proposes a novel three-stage extrinsic calibration method between LiDAR and GNSS/INS for autonomous driving. The first stage can quickly calibrate the extrinsic parameters between the sensors through point cloud surface features so that the extrinsic can be narrowed fro… ▽ More Accurate and reliable sensor calibration is critical for fusing LiDAR and inertial measurements in autonomous driving. This paper proposes a novel three-stage extrinsic calibration method between LiDAR and GNSS/INS for autonomous driving. The first stage can quickly calibrate the extrinsic parameters between the sensors through point cloud surface features so that the extrinsic can be narrowed from a large initial error to a small error range in little time. The second stage can further calibrate the extrinsic parameters based on LiDAR-mapping space occupancy while removing motion distortion. In the final stage, the z-axis errors caused by the plane motion of the autonomous vehicle are corrected, and an accurate extrinsic parameter is finally obtained. Specifically, This method utilizes the planar features in the environment, making it possible to quickly carry out calibration. Experimental results on real-world data sets demonstrate the reliability and accuracy of our method. The codes are open-sourced on the Github website. The code link is https://github.com/OpenCalib/LiDAR2INS. △ Less

Submitted 28 February, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: 7 pages, 12 figures, submitted to IROS 2023

arXiv:2011.11619 [pdf, other]

Neural collapse with unconstrained features

Authors: Dustin G. Mixon, Hans Parshall, Jianzong Pi

Abstract: Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple "unconstrained features model" in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk. Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple "unconstrained features model" in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk. △ Less

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2007.06801 [pdf, other]

Multi-Objective Vehicle Rebalancing for Ridehailing System using a Reinforcement Learning Approach

Authors: Yuntian Deng, Hao Chen, Shiping Shao, Jiacheng Tang, Jianzong Pi, Abhishek Gupta

Abstract: The problem of designing a rebalancing algorithm for a large-scale ridehailing system with asymmetric demand is considered here. We pose the rebalancing problem within a semi Markov decision problem (SMDP) framework with closed queues of vehicles serving stationary, but asymmetric demand, over a large city with multiple nodes (representing neighborhoods). We assume that the passengers queue up at… ▽ More The problem of designing a rebalancing algorithm for a large-scale ridehailing system with asymmetric demand is considered here. We pose the rebalancing problem within a semi Markov decision problem (SMDP) framework with closed queues of vehicles serving stationary, but asymmetric demand, over a large city with multiple nodes (representing neighborhoods). We assume that the passengers queue up at every node until they are matched with a vehicle. The goal of the SMDP is to minimize a convex combination of the waiting time of the passengers and the total empty vehicle miles traveled. The resulting SMDP appears to be difficult to solve for closed-form expression for the rebalancing strategy. As a result, we use a deep reinforcement learning algorithm to determine the approximately optimal solution to the SMDP. The trained policy is compared with other well-known algorithms for rebalancing, which are designed to address other objectives (such as to minimize demand drop probability) for the ridehailing problem. △ Less

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2003.02228 [pdf, other]

doi 10.3233/FAIA200199

PushNet: Efficient and Adaptive Neural Message Passing

Authors: Julian Busch, Jiaxing Pi, Thomas Seidl

Abstract: Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs. Existing methods perform synchronous message passing along all edges in multiple subsequent rounds and consequently suffer from various shortcomings: Propagation schemes are inflexible since they are restricted to $k$-hop neighborhoods and insensitive to actual demands of in… ▽ More Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs. Existing methods perform synchronous message passing along all edges in multiple subsequent rounds and consequently suffer from various shortcomings: Propagation schemes are inflexible since they are restricted to $k$-hop neighborhoods and insensitive to actual demands of information propagation. Further, long-range dependencies cannot be modeled adequately and learned representations are based on correlations of fixed locality. These issues prevent existing methods from reaching their full potential in terms of prediction performance. Instead, we consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence. Our proposed algorithm can equivalently be formulated as a single synchronous message passing iteration using a suitable neighborhood function, thus sharing the advantages of existing methods while addressing their central issues. The resulting neural network utilizes a node-adaptive receptive field derived from meaningful sparse node neighborhoods. In addition, by learning and combining node representations over differently sized neighborhoods, our model is able to capture correlations on multiple scales. We further propose variants of our base model with different inductive bias. Empirical results are provided for semi-supervised node classification on five real-world datasets following a rigorous evaluation protocol. We find that our models outperform competitors on all datasets in terms of accuracy with statistical significance. In some cases, our models additionally provide faster runtime. △ Less

Submitted 17 December, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Journal ref: 24th European Conference on Artificial Intelligence (ECAI 2020)

arXiv:1904.10778 [pdf, other]

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Authors: Abhishek Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar

Abstract: Recursive stochastic algorithms have gained significant attention in the recent past due to data driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the… ▽ More Recursive stochastic algorithms have gained significant attention in the recent past due to data driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory develop here. △ Less

Submitted 23 July, 2020; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: Accepted in SIMODS, 37 pages

arXiv:1904.00450 [pdf, other]

Two Algorithms for Computing Exact and Approximate Nash Equilibria in Bimatrix Games

Authors: Jianzong Pi, Joseph L. Heyman, Abhishek Gupta

Abstract: In this paper, we first devise two algorithms to determine whether or not a bimatrix game has a strategically equivalent zero-sum game. If so, we propose an algorithm that computes the strategically equivalent zero-sum game. If a given bimatrix game is not strategically equivalent to a zero-sum game, we then propose an approach to compute a zero-sum game whose saddle-point equilibrium can be mappe… ▽ More In this paper, we first devise two algorithms to determine whether or not a bimatrix game has a strategically equivalent zero-sum game. If so, we propose an algorithm that computes the strategically equivalent zero-sum game. If a given bimatrix game is not strategically equivalent to a zero-sum game, we then propose an approach to compute a zero-sum game whose saddle-point equilibrium can be mapped to a well-supported approximate Nash equilibrium of the original game. We conduct extensive numerical simulation to establish the efficacy of the two algorithms. △ Less

Submitted 11 August, 2021; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: 20 pages, 3 figures. Replaces "On the Computation of Strategically Equivalent Rank-0 Games" by condensing the main results of that paper and extending the results with an algorithm for well-supported approximate Nash equilibrium. Submitted to 2021 Conference on Decision and Game Theory for Security (GameSec 2021)

arXiv:1805.00625 [pdf, other]

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

Authors: Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E. Shi

Abstract: The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encour… ▽ More The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encouraged participants to address long-term emotion recognition by integrating cues from multiple modalities, including facial expression, audio and language. Intuitively, a multi-modal inference network should be able to leverage information from each modality and their correlations to improve recognition over that achievable by a single modality network. We describe here a multi-modal neural architecture that integrates visual information over time using an LSTM, and combines it with utterance level audio and text cues to recognize human sentiment from multimodal clips. Our model outperforms the unimodal baseline, achieving the concordance correlation coefficients (CCC) of 0.400 on the arousal task, and 0.353 on the valence task. △ Less

Submitted 4 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

Comments: 5 pages, 1 figure, subject to the 2018 IJCNN challenge on One-Minute Gradual-Emotion Recognition

Showing 1–17 of 17 results for author: Pi, J