Aspect-Based Sentiment Analysis Techniques: A Comparative Study

Dineth Jayakody12, Koshila Isuranda12, A V A Malkith12, Nisansa de Silva3,
Sachintha Rajith Ponnamperuma4, G G N Sandamali15, K L K Sudheera15
5{nadeesha, kushan}@eie.ruh.ac.lk
1Department of Electrical and Information Engineering, University of Ruhuna
2{jayakody_ds_e21,isuranda_mak_e21, malkith_ava_e21}@engug.ruh.ac.lk
3Department of Computer Science & Engineering, University of Moratuwa
3[email protected]
4Emojot Inc.
4[email protected]
Abstract

Since the dawn of the digitalisation era, customer feedback and online reviews have unequivocally been major sources of insight for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sentiment analysis is one such method, instrumental in gauging public interest, exposing market trends, and analysing competitors. While traditional sentiment analysis focuses on overall sentiment, needs have advanced with time, and it has become important to explore public opinions and sentiments on the specific subjects, products, and services mentioned in reviews at a finer level of granularity. To this end, Aspect-based Sentiment Analysis (ABSA) focuses on identifying specific aspects within the text and determining the sentiment associated with each aspect. It is supported by advances in Artificial Intelligence (AI) techniques, which have contributed to a paradigm shift from simple word-level analysis to tone- and context-aware analyses. In this study, we compare several deep-NN methods for ABSA on two benchmark datasets (Restaurant-14 and Laptop-14) and find that FAST LSA obtains the best overall results of 87.6% and 82.6% accuracy, respectively, but does not surpass LSA+DeBERTa, which reports 90.33% and 86.21% accuracy on the same datasets.

Index Terms:
Aspect-based Sentiment Analysis, Comparative Analysis, BERT-based Deep Neural Methods, Benchmark Study

I Introduction

Social media and other online platforms have enjoyed exponential growth, which in turn has created an unprecedented abundance of user-generated content. This has also created a de facto expectation that businesses understand the user sentiments expressed in these texts if they intend to make informed decisions and enhance customer satisfaction. Aspect-based sentiment analysis (ABSA) has emerged as a valuable technique in Natural Language Processing (NLP) for analyzing opinions at a finer level of granularity by identifying sentiment towards specific aspects or features within a given domain [1, 2, 3, 4, 5, 6]. In this paper, we focus on domain-specific ABSA [4, 2, 3], particularly its application to customer reviews, which provide valuable feedback for businesses seeking to improve their products or services.

In our initial analysis, we present the accuracy levels of various ABSA models on the benchmark 2014 SemEval restaurant and laptop datasets [7]. The results are tabulated to provide a comparative view of their performance in sentiment analysis tasks. Following this assessment, we explore avenues for improving model performance through fine-tuning and testing on the same datasets. Through fine-tuning, our objective is to adapt these pre-trained models to the specific variations of the domain under consideration, thereby enhancing their effectiveness in capturing sentiment expressions related to the various aspects contained within customer reviews. Specifically, we focus on the LLaMA 2 model, fine-tuned with parameter-efficient techniques enabled by QLoRA, as well as the FAST_LSA_T_V2 model, which is integrated into the PyABSA framework. Additionally, we investigate pre-trained transformer models within the SetFit framework. Leveraging this framework, we experiment with hybrid models by pairing different transformer models and testing them on the aforementioned datasets.

Further elaboration on these models and their fine-tuning methodologies will be provided in subsequent sections of this paper, where we look into their architectures, techniques, and experimental results in greater detail.

II Literature Review

Rietzler et al. [8] undertook a comprehensive study focusing on Aspect-Target Sentiment Classification (ATSC) within ABSA, presenting a novel two-step approach. Their methodology involved domain-specific fine-tuning of BERT [9] language models followed by task-specific fine-tuning, resulting in a mean accuracy of 84.06% across the two benchmark datasets (Table I) with the BERT-ADA model, which surpassed the performance of baseline models such as vanilla BERT-base and XLNet-base [10]. The success of this model underscores the significance of domain-specific considerations for improving model robustness and performance in real-world applications.

Subsequent studies by Karimi et al. [11] and Bai et al. [12] explored alternative approaches utilizing BERT-based models for ABSA. Karimi et al. [11] introduced adversarial training to enhance ABSA performance, leveraging artificial data generation through adversarial processes. Their BERT Adversarial Training (BAT) architecture surpassed both general-purpose BERT and domain-specific post-trained BERT (BERT-PT) models in ABSA tasks, without the need for extensive manual labelling. Similarly, Karimi et al. [13] introduced two novel modules, Parallel Aggregation and Hierarchical Aggregation, to augment ABSA using BERT. These modules aimed to enhance Aspect Extraction (AE) and Aspect Sentiment Classification (ASC) tasks, yielding superior performance compared to post-trained vanilla BERT.

Introducing a multi-task learning model for ABSA, Yang et al. [14] achieved a mean accuracy of 86.60% with their LCF-ATEPC model. Meanwhile, Dai et al. [15] explored the potential of pre-trained models (PTMs), particularly the RoBERTa [16] model, in ABSA tasks, but could not surpass the performance of the LCF-ATEPC model. The superior performance of LCF-ATEPC highlights the efficacy of multi-task learning approaches in ABSA, showcasing the importance of considering aspect term extraction (ATE) alongside polarity classification for comprehensive sentiment analysis.

DeBERTa [17] (Decoding-enhanced BERT with Disentangled Attention)-based models were explored by Silva and Marcacini [18] and Yang and Li [19], introducing ABSA-DeBERTa and LSA+DeBERTa-V3-Large, respectively. Silva and Marcacini [18] delved into disentangled learning to enhance BERT-based representations in ABSA. They separated syntactic and semantic features, showcasing the improvement in ABSA task performance through the incorporation of disentangled attention. This enabled the isolation of position and content vectors, potentially enhancing model performance by focusing on syntactic and semantic aspects separately. On the other hand, Yang and Li [19] introduced a novel perspective in aspect-based sentiment classification (ABSC) by emphasizing the significance of aspect sentiment coherency. Subsequently, Xing and Tsang [20] and Zhang et al. [21] also explored the utilization of BERT-based models, introducing KaGRMN-DSG (Knowledge-aware Gated Recurrent Memory Network with Dual Syntax Graph Modeling) and DPL-BERT, respectively. However, despite these advancements, neither KaGRMN-DSG nor DPL-BERT could surpass the accuracy achieved by the LCF-ATEPC [14] model.

In Table I we summarize the accuracies that the relevant studies in the literature have reported for their models on the benchmark SemEval [7] Restaurant (Res-14) and Laptop (Lap-14) datasets. The mean accuracies range from 82.69% to 88.27%, showcasing the varying degrees of success in sentiment analysis across different approaches.

TABLE I: Accuracies (%) of models on the SemEval 2014 [7] benchmark

Model                         Res-14   Lap-14   Mean
BAT [11]                      86.03    79.35    82.69
PH-SUM [13]                   86.37    79.55    82.96
RGAT+ [12]                    86.68    80.94    83.81
BERT-ADA [8]                  87.89    80.23    84.06
KaGRMN-DSG [20]               87.35    81.87    84.61
RoBERTa+MLP [15]              87.37    83.78    85.58
DPL-BERT [21]                 89.54    81.96    85.75
ABSA-DeBERTa [18]             89.46    82.76    86.11
LCF-ATEPC [14]                90.18    83.02    86.60
LSA+DeBERTa-V3-Large [19]     90.33    86.21    88.27
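
The Mean column of Table I can be read as the unweighted average of the Res-14 and Lap-14 accuracies. The following minimal sketch checks that reading against the tabulated values; the dictionary layout and function names are illustrative choices, not part of any cited work.

```python
# Accuracies (%) copied from Table I: model -> (Res-14, Lap-14, reported mean).
TABLE_I = {
    "BAT": (86.03, 79.35, 82.69),
    "PH-SUM": (86.37, 79.55, 82.96),
    "RGAT+": (86.68, 80.94, 83.81),
    "BERT-ADA": (87.89, 80.23, 84.06),
    "KaGRMN-DSG": (87.35, 81.87, 84.61),
    "RoBERTa+MLP": (87.37, 83.78, 85.58),
    "DPL-BERT": (89.54, 81.96, 85.75),
    "ABSA-DeBERTa": (89.46, 82.76, 86.11),
    "LCF-ATEPC": (90.18, 83.02, 86.60),
    "LSA+DeBERTa-V3-Large": (90.33, 86.21, 88.27),
}


def mean_accuracy(res14: float, lap14: float) -> float:
    """Unweighted mean of the two per-dataset accuracies."""
    return (res14 + lap14) / 2


def max_mean_gap(table: dict) -> float:
    """Largest absolute gap between a recomputed and a reported mean."""
    return max(abs(mean_accuracy(r, l) - m) for r, l, m in table.values())


def best_by_mean(table: dict) -> str:
    """Name of the model with the highest reported mean accuracy."""
    return max(table, key=lambda name: table[name][2])


if __name__ == "__main__":
    print("largest rounding gap:", max_mean_gap(TABLE_I))
    print("best mean accuracy:", best_by_mean(TABLE_I))
```

Any remaining gap between the recomputed and reported means is at the level of two-decimal rounding.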

III Methodology

Here we followed three approaches in NLP: 1) LLaMA 2 fine-tuning with Parameter-Efficient Fine-Tuning (PEFT) techniques such as QLoRA; 2) SETFIT for efficient few-shot fine-tuning of Sentence Transformers; and 3) FAST LSA [22] V2 on the PyABSA framework.

III-A LLaMA with QLoRA

Given the current interest in Large Language Models (LLMs), we opted to include an LLM-based analysis in our comparative study. LLaMA 2 is a collection of second-generation open-source LLMs from Meta that comes with a commercial license. Roumeliotis et al. [23] reported that LLaMA 2 represents a significant leap forward in natural language understanding and generation, owing to its advanced architecture, large training data, and refined training strategies. Its architecture is based on the transformer, a neural network architecture that has proven highly effective in a wide range of NLP tasks, and employs multiple transformer layers with self-attention mechanisms. It is designed to handle a wide range of natural language processing tasks, with models ranging in scale from 7 billion to 70 billion parameters.

Fine-tuning in machine learning is the process of adjusting the weights and parameters of a pre-trained model on new data to improve its performance on a specific task. There are three main fine-tuning methods in this context:

  1. Instruction Fine-Tuning (IFT): According to Peng et al. [24], IFT involves training the model using prompt-completion pairs that show the desired responses to queries.

  2. Full Fine-Tuning: Full fine-tuning involves updating all of the weights in a pre-trained model during training on a new dataset, allowing the model to adapt to a specific task.

  3. Parameter-Efficient Fine-Tuning (PEFT): PEFT selectively updates a small set of parameters, making memory requirements more manageable. There are various ways of achieving parameter-efficient fine-tuning. Low-Rank Adaptation (LoRA) [25] and Quantized Low-Rank Adaptation (QLoRA) [26] are the most widely used and effective.
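
To make the memory argument for LoRA [25] concrete, the sketch below counts trainable parameters for a single weight matrix. LoRA keeps the pre-trained matrix W (d_out x d_in) frozen and learns a low-rank update BA, where B is d_out x r and A is r x d_in. The layer size of 4096 and the rank of 8 are illustrative assumptions, not settings reported in this study.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a LoRA update BA.

    B has shape (d_out, rank) and A has shape (rank, d_in), so the count
    is rank * (d_out + d_in).
    """
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of the full-update parameter count that LoRA actually trains."""
    return lora_update_params(d_out, d_in, rank) / full_update_params(d_out, d_in)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_out, d_in, rank = 4096, 4096, 8
    print("full fine-tuning:", full_update_params(d_out, d_in))
    print("LoRA update:     ", lora_update_params(d_out, d_in, rank))
    print("fraction trained:", trainable_fraction(d_out, d_in, rank))
```

Under these assumed sizes the adapter trains well under one percent of the parameters that a full update of the same matrix would touch.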

Traditional fine-tuning of pre-trained language models (PLMs) requires updating all of the model’s parameters, which is computationally expensive and requires massive amounts of data, making it challenging to attempt on consumer hardware with limited VRAM and compute. Parameter-Efficient Fine-Tuning (PEFT), in contrast, updates only a small subset of parameters, making it much more efficient. Four-bit quantization via QLoRA allows such efficient fine-tuning of very large LLMs on consumer hardware while retaining high performance. QLoRA quantizes a pre-trained language model to four bits and freezes its parameters. A small number of trainable Low-Rank Adapter layers are then added to the model. In our case, we created a 4-bit quantization configuration of type NF4 using BitsAndBytes (https://github.com/TimDettmers/bitsandbytes).
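
The idea of storing frozen weights in four bits can be illustrated with a simplified block-wise absmax quantizer. This is only a sketch under stated assumptions: it uses uniformly spaced integer codes, whereas the NF4 data type used by QLoRA [26] places its levels differently, and the block size of 4 is chosen for readability rather than taken from any library default.

```python
def quantize_block(values, bits=4):
    """Quantize one block of floats to signed integer codes via absmax scaling."""
    levels = 2 ** (bits - 1) - 1              # 7 for signed 4-bit codes
    absmax = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = absmax / levels
    codes = [round(v / scale) for v in values]   # each code lies in [-levels, levels]
    return codes, scale


def quantize(weights, block_size=4, bits=4):
    """Split weights into blocks and quantize each block with its own scale."""
    return [
        quantize_block(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate floats from (codes, scale) blocks."""
    restored = []
    for codes, scale in blocks:
        restored.extend(code * scale for code in codes)
    return restored


if __name__ == "__main__":
    # Toy weights, purely illustrative.
    weights = [1.0, -0.5, 0.25, 0.0, 0.02, -0.01, 0.0, 0.005]
    blocks = quantize(weights)
    for original, approx in zip(weights, dequantize(blocks)):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Keeping one scale per block limits the damage a single large weight can do to the precision of its neighbours, which is the motivation for block-wise schemes in general.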

According to Dettmers et al. [26], supervised fine-tuning (SFT) is a key step in Reinforcement Learning from Human Feedback (RLHF). The TRL library provides tools to train language models using reinforcement learning, starting with supervised fine-tuning, then reward modelling, and finally Proximal Policy Optimization (PPO). In our case, we provided the SFT trainer with the model, dataset, LoRA configuration, tokenizer, and training parameters.

To test the fine-tuned model, we passed the prompt to the Transformers text generation pipeline. The LLaMA 2 model was fine-tuned using techniques such as QLoRA, PEFT, and SFT to overcome memory and computational limitations. By utilizing Hugging Face libraries such as transformers (https://huggingface.co/transformers/), accelerate (https://huggingface.co/accelerate/), peft (https://huggingface.co/peft/), trl (https://huggingface.co/trl/), and bitsandbytes, we were able to successfully fine-tune the 7B parameter LLaMA 2 model on a consumer GPU.
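
As an illustration of prompt-completion pairs for sentiment polarity, the sketch below builds a training text and parses a generated answer. The prompt template, the three-way label set, and all function names are hypothetical; the exact prompt used in this study is not reproduced here. The parser assumes the generated text may echo the prompt before the completion.

```python
# Hypothetical prompt template; not the template used in this study.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Give the sentiment expressed towards the aspect in the review.\n"
    "### Review:\n{review}\n"
    "### Aspect:\n{aspect}\n"
    "### Sentiment:\n"
)

# Assumed label set for illustration.
VALID_LABELS = ("positive", "negative", "neutral")


def build_prompt(review: str, aspect: str) -> str:
    """Fill the template for one (review, aspect) pair."""
    return PROMPT_TEMPLATE.format(review=review.strip(), aspect=aspect.strip())


def build_training_text(review: str, aspect: str, label: str) -> str:
    """Prompt followed by the desired completion, as used for supervised fine-tuning."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label}")
    return build_prompt(review, aspect) + label


def parse_sentiment(generated_text: str, prompt: str):
    """Return the first valid label found after the prompt, or None."""
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    for token in completion.lower().split():
        word = token.strip(".,!?:;\"'")
        if word in VALID_LABELS:
            return word
    return None


if __name__ == "__main__":
    prompt = build_prompt("The pasta was great but the service was slow.", "service")
    print(build_training_text("The pasta was great but the service was slow.",
                              "service", "negative"))
    print(parse_sentiment(prompt + " Negative.\n", prompt))
```

Returning None for unparseable generations keeps such cases visible so that they can be counted as errors rather than silently mapped to a label.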

III-B SetFit

Few-shot learning has become increasingly essential in addressing label-scarce scenarios, where data annotation is often time-consuming and expensive. These methods aim to adapt pre-trained language models (PLMs) to specific downstream tasks using only a limited number of labelled training examples. One of the primary obstacles is the reliance on large-scale language models, which typically contain billions of parameters, demanding substantial computational resources and specialized infrastructure. Moreover, these methods frequently require manual crafting of prompts, introducing variability and complexity in the training process, thus restricting accessibility for researchers and practitioners.

In response to this, Tunstall et al. [27] proposed SETFIT (Sentence Transformer Fine-tuning), an innovative framework for efficient and prompt-free few-shot fine-tuning of Sentence Transformers (ST). Diverging from existing methods, SETFIT does not necessitate manually crafted prompts and achieves high accuracy with significantly fewer parameters. The SETFIT approach consists of two main steps. In the first step, the ST is fine-tuned using a contrastive loss function, encouraging the model to learn discriminative representations of similar and dissimilar text pairs. In the second step, a simple classification head is trained on top of the fine-tuned ST to perform downstream tasks such as text classification or similarity ranking. By decoupling the fine-tuning and classification steps, SETFIT achieves high accuracy with orders of magnitude fewer parameters than existing methods, making it computationally efficient and scalable. In our study, we utilized several available sentence transformers through the SETFIT framework to obtain accuracies for aspect extraction and sentiment polarity identification.
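
The first SETFIT step relies on turning a handful of labelled examples into similar and dissimilar text pairs. The sketch below shows one simple way to build such pairs; it enumerates all pairs exhaustively, which is a simplification of the sampling used in practice, and the example sentences and function name are made up for illustration.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from (text, label) examples.

    target is 1.0 when both texts share a label (similar pair) and 0.0
    otherwise (dissimilar pair).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


if __name__ == "__main__":
    # Toy few-shot set, invented for this example.
    few_shot = [
        ("The pizza was superb", "positive"),
        ("Loved the dessert", "positive"),
        ("The waiter was rude", "negative"),
        ("Battery dies quickly", "negative"),
    ]
    for text_a, text_b, target in make_contrastive_pairs(few_shot):
        print(target, "|", text_a, "|", text_b)
```

Because n labelled examples yield n(n-1)/2 pairs, even a very small labelled set gives the sentence transformer a comparatively large number of training signals.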

III-C PyABSA

Yang and Li [28] addressed the challenge of the lack of a unified framework for ABSA by developing PyABSA, an open-source ABSA framework. PyABSA integrates ATE and text classification functionalities alongside ASC within a modular architecture. This design facilitates adaptation to various ABSA subtasks and supports multilingual modelling and automated dataset annotation, thereby streamlining ABSA applications.

Moreover, PyABSA offers multi-task-based ATESC models, which are pipeline models capable of simultaneously performing the ATE and ASC sub-tasks. To tackle the data shortage problem, PyABSA provides automated dataset annotation interfaces and manual dataset annotation tools, encouraging community participation in annotating and contributing custom datasets to the repository.

In our study, we utilized the PyABSA framework on the SemEval 2014 restaurant and laptop benchmark datasets and evaluated accuracy, consistent with the practice followed in the literature as shown in Table I. Specifically, we employed the FAST_LSA_T_V2 model with PyABSA, which is included in the English checkpoint, to assess aspect extraction performance.

IV Results

TABLE II: Accuracies (%) of models evaluated by this study on the SemEval 2014 [7] benchmark. The Symbol column is used to refer to the same models in Fig. 1 for the sake of brevity. AE = Aspect Extraction; SP = Sentiment Polarity.

Model                                        Symbol   Res-14 AE   Res-14 SP   Lap-14 AE   Lap-14 SP
Llama-2-7b [29] with QLoRA [26]              -        35.75       65.84       71.00       65.00
SETFIT [27]
  BGE [30] (Small)                           A        60.10       73.20       86.50       74.80
  Sentence-T5 [31] (Base)                    B        78.70       77.90       62.60       71.60
  RoBERTa-STSb-v2 [16, 32] (Base)            C        79.20       78.70       78.30       66.10
  Paraphrase-MiniLM-L6-v2 [33, 32]           D        79.80       62.00       80.80       61.40
    +MpNet [34]                              E        85.40       79.50       79.40       70.00
  CLIP-ViT-B-32-multilingual-v1 [35, 32]     F        81.90       69.30       81.70       52.60
  SPECTER [36]                               G        81.90       71.60       78.60       49.60
  GTR [37] (Base)                            H        82.30       72.40       74.10       74.00
  TinyBERT [38, 32]                          I        83.10       72.40       81.20       62.90
  ALBERT [39, 32]                            J        76.99       71.65       77.60       65.35
    +DistilRoBERTa [40]                      K        84.50       75.50       82.00       66.90
  DistilRoBERTa [40]                         L        85.00       73.20       80.80       66.10
    +All-MiniLM-L6-v2 [33, 32]               M        85.90       71.60       77.50       65.30
  MpNet [34]                                 N        87.16       77.95       87.68       70.07
  LaBSE [41]                                 O        90.30       76.40       88.40       65.40
    +MpNet [34]                              P        88.50       74.80       89.50       75.60
    +GTR [37] (Base)                         Q        88.50       74.00       87.30       73.20
    +RoBERTa-STSb-v2 [16, 32] (Base)         R        88.50       80.30       89.50       70.10
FAST LSA [22] V2 on PyABSA [42]              -        87.67 (overall)         82.60 (overall)
[Figure 1 panels: models A to R, with separate series for Res-14 and Lap-14.]
(a) Aspect Extraction Accuracy. Axis: Aspect Extraction Accuracy (%).
(b) Sentiment Polarity Percentage. Axis: Sentiment Polarity Identification Percentage (%).
Figure 1: Results obtained from SETFIT Models. The models are marked as shown in the Symbol column in Table II for brevity.

IV-A LLaMA with QLoRA

The first section of Table II shows the performance of Llama-2-7b [29] with QLoRA [26]. These results were obtained using an L4 GPU. It can be noted that even though the sentiment polarity results are comparable between the two datasets, the aspect extraction accuracy on Res-14 (35.75%) is roughly half that on Lap-14 (71.00%).

IV-B SetFit

In the second section of Table II, we provide a comprehensive overview of the accuracies attained by various sentence models using the SETFIT framework [27]. Due to the modular nature of the SETFIT framework, we could fine-tune and test combinations of models. If a model is reported in a single row, it means we have used the said sentence transformer model for both aspect extraction and sentiment polarity identification (e.g., BGE [30]). Cell blocks in which a model is followed by other models prefixed with + are combinations. For example, the first row of Paraphrase-MiniLM-L6-v2 [33, 32] contains the results of that model being used for both aspect extraction and sentiment polarity identification. The subsequent line with +MpNet [34] indicates that Paraphrase-MiniLM-L6-v2 was used for the aspect extraction component and MpNet was used for the sentiment polarity identification component.

At this point, a question may be raised as to why aspect extraction has two different accuracy values (79.80 vs. 85.40) in the two configurations if the same model (i.e., Paraphrase-MiniLM-L6-v2 in this example) was used for that task in both cases. The reason is that the fine-tuning is conducted end-to-end in a holistic manner, and thus the choice of the model used for sentiment polarity identification ends up influencing the ultimate accuracy obtained by the aspect extraction component. It may enhance the result, as in the case of Paraphrase-MiniLM-L6-v2 and MpNet. It may also hinder it, as in the case of LaBSE.

Overall, it can be noted that LaBSE [41] consistently emerges as a standout performer, either by itself or as the aspect extraction component of a pair. It can be argued that this robust performance is owed to its capability to capture the nuanced, complex information crucial for both the aspect extraction and sentiment polarity classification tasks. Specifically on the sentiment polarity classification task, it can be noted that MpNet and RoBERTa-STSb-v2 [16] elevate performance in multiple configurations. Further, the results also reveal domain-specific variations in model performance, as evidenced by the disparity between the Res-14 and Lap-14 results of SPECTER [36].

To give a better overview of how the various models perform, we include Fig. 1(a) and Fig. 1(b), which visualize the SETFIT results reported in Table II. In Fig. 1(a), we present the aspect extraction accuracy of the various models. LaBSE emerging as the top performer across both datasets can easily be noted. It is also evident how ALBERT+DistilRoBERTa and LaBSE+RoBERTa-STSb closely follow with accuracies verging on 90%. The outlying low accuracies of BGE [30] and Sentence-T5 [31] are also evident. Similarly, in Fig. 1(b), we look into the sentiment polarity identification percentages for the same models and datasets. Here, LaBSE+RoBERTa-STSb shows the highest accuracy for Res-14 while LaBSE+MpNet shows the highest accuracy for Lap-14. These results reaffirm the effectiveness of LaBSE across both tasks and datasets. Conversely, models such as CLIP-ViT-B-32-multilingual-v1 [35] and SPECTER demonstrate relatively lower performance.
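
The per-column comparison of the SETFIT configurations can be reproduced directly from the Table II values. The sketch below is an illustrative helper, not part of the experimental code of this study; the column keys and function name are our own labels.

```python
# SETFIT rows of Table II: symbol -> (configuration, Res-14 AE, Res-14 SP, Lap-14 AE, Lap-14 SP).
SETFIT_RESULTS = {
    "A": ("BGE (Small)", 60.10, 73.20, 86.50, 74.80),
    "B": ("Sentence-T5 (Base)", 78.70, 77.90, 62.60, 71.60),
    "C": ("RoBERTa-STSb-v2 (Base)", 79.20, 78.70, 78.30, 66.10),
    "D": ("Paraphrase-MiniLM-L6-v2", 79.80, 62.00, 80.80, 61.40),
    "E": ("Paraphrase-MiniLM-L6-v2 + MpNet", 85.40, 79.50, 79.40, 70.00),
    "F": ("CLIP-ViT-B-32-multilingual-v1", 81.90, 69.30, 81.70, 52.60),
    "G": ("SPECTER", 81.90, 71.60, 78.60, 49.60),
    "H": ("GTR (Base)", 82.30, 72.40, 74.10, 74.00),
    "I": ("TinyBERT", 83.10, 72.40, 81.20, 62.90),
    "J": ("ALBERT", 76.99, 71.65, 77.60, 65.35),
    "K": ("ALBERT + DistilRoBERTa", 84.50, 75.50, 82.00, 66.90),
    "L": ("DistilRoBERTa", 85.00, 73.20, 80.80, 66.10),
    "M": ("DistilRoBERTa + All-MiniLM-L6-v2", 85.90, 71.60, 77.50, 65.30),
    "N": ("MpNet", 87.16, 77.95, 87.68, 70.07),
    "O": ("LaBSE", 90.30, 76.40, 88.40, 65.40),
    "P": ("LaBSE + MpNet", 88.50, 74.80, 89.50, 75.60),
    "Q": ("LaBSE + GTR (Base)", 88.50, 74.00, 87.30, 73.20),
    "R": ("LaBSE + RoBERTa-STSb-v2 (Base)", 88.50, 80.30, 89.50, 70.10),
}

# Position of each metric inside a row tuple.
COLUMNS = {"res14_ae": 1, "res14_sp": 2, "lap14_ae": 3, "lap14_sp": 4}


def best_symbols(column: str) -> list:
    """Symbols of all configurations that reach the maximum in a column (ties kept)."""
    index = COLUMNS[column]
    top = max(row[index] for row in SETFIT_RESULTS.values())
    return [symbol for symbol, row in SETFIT_RESULTS.items() if row[index] == top]


if __name__ == "__main__":
    for column in COLUMNS:
        winners = [SETFIT_RESULTS[s][0] for s in best_symbols(column)]
        print(column, "->", winners)
```

Ties are returned together rather than resolved arbitrarily, which matters for the Lap-14 aspect extraction column where two LaBSE pairings share the top value.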

IV-C PyABSA

The third section of Table II reports the results obtained from the implementation of the FAST LSA [22] model on PyABSA [42]. PyABSA is also fine-tuned end-to-end, similar to SETFIT [27]. However, unlike SETFIT, PyABSA does not report Aspect Extraction and Sentiment Polarity accuracies separately; it gives only an overall value. This is the reason for it having only one value per dataset in Table II. Alternatively, it is not wrong to take the reported accuracies as the values for the Sentiment Polarity task, as it is the task at the tail end of the pipeline. Regarded from that perspective, it can be claimed that FAST LSA on PyABSA has the best results for Sentiment Polarity among all the model combinations and configurations tested by this study. Hence we have opted to highlight those results in bold, as we did for the best results in the second (SETFIT) section.

V Conclusion

This study evaluates three NLP approaches for ABSA: 1) LLaMA 2 fine-tuning with the Parameter-Efficient Fine-Tuning (PEFT) technique QLoRA; 2) SETFIT for efficient few-shot fine-tuning of Sentence Transformers; and 3) FAST LSA [22] V2 on the PyABSA framework. These approaches aim to overcome memory and computational limitations while enhancing efficiency and scalability in NLP tasks.

We observe that LLaMA 2, a collection of second-generation open-source LLMs, manages only middling performance after fine-tuning with 4-bit quantization via the PEFT technique QLoRA. Among the modular options in SETFIT, fine-tuned LaBSE models demonstrate standout performance. Finally, FAST LSA on PyABSA gives the best overall performance, with 87.6% and 82.6% accuracy for the Res-14 and Lap-14 datasets, respectively. Nevertheless, none of the tested models is able to surpass the reported accuracy of LSA+DeBERTa-V3-Large [19], which claims 90.33% and 86.21%, respectively. In summary, this study explores the importance of innovative methodologies such as fine-tuning techniques, prompt-free few-shot learning, and modular frameworks in advancing NLP tasks.

References

  [1] C. R. Mudalige, D. Karunarathna, I. Rajapaksha, N. de Silva, G. Ratnayaka, A. S. Perera, and R. Pathirana, “SigmaLaw-ABSA: Dataset for Aspect-Based Sentiment Analysis in Legal Opinion Texts,” in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS). IEEE, 2020, pp. 488–493.
  [2] I. Rajapaksha, C. R. Mudalige, D. Karunarathna, N. de Silva, A. S. Perera, and G. Ratnayaka, “Sigmalaw PBSA-A Deep Learning Model for Aspect-Based Sentiment Analysis for the Legal Domain,” in International Conference on Database and Expert Systems Applications. Springer, 2021, pp. 125–137.
  [3] S. Jayasinghe, L. Rambukkanage, A. Silva, N. de Silva, and A. S. Perera, “Party-based Sentiment Analysis Pipeline for the Legal Domain,” in 2021 21st International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2021, pp. 171–176.
  [4] I. Rajapaksha, C. R. Mudalige, D. Karunarathna, N. de Silva, G. Rathnayaka, and A. S. Perera, “Rule-Based Approach for Party-Based Sentiment Analysis in Legal Opinion Texts,” in 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2020, pp. 284–285.
  [5] I. Rajapaksha, C. R. Mudalige, D. Karunarathna, N. de Silva, G. Ratnayaka, and A. S. Perera, “Sigmalaw PBSA-A Deep Learning Approach for Aspect-Based Sentiment Analysis in Legal Opinion Texts,” J. Data Intell., vol. 3, no. 1, pp. 101–115, 2022.
  [6] C. Samarawickrama, M. de Almeida, N. de Silva, G. Ratnayaka, and A. S. Perera, “Legal Party Extraction from Legal Opinion Texts Using Recurrent Deep Neural Networks,” J. Data Intell., vol. 3, no. 3, pp. 350–365, 2022.
  [7] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 task 4: Aspect based sentiment analysis,” in SemEval 2014, 2014, pp. 27–35.
  [8] A. Rietzler, S. Stabinger, P. Opitz, and S. Engl, “Adapt or get left behind: Domain adaptation through bert language model finetuning for aspect-target sentiment classification,” arXiv preprint arXiv:1908.11860, 2019.
  [9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  [10] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” NeurIPS, vol. 32, 2019.
  [11] A. Karimi, L. Rossi, and A. Prati, “Adversarial Training for Aspect-Based Sentiment Analysis with BERT,” in ICPR. IEEE, 2021, pp. 8797–8803.
  [12] X. Bai, P. Liu, and Y. Zhang, “Investigating typed syntactic dependencies for targeted sentiment classification using graph attention neural network,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 503–514, 2020.
  [13] A. Karimi, L. Rossi, and A. Prati, “Improving bert performance for aspect-based sentiment analysis,” arXiv preprint arXiv:2010.11731, 2020.
  [14] H. Yang, B. Zeng, J. Yang, Y. Song, and R. Xu, “A multi-task learning model for chinese-oriented aspect polarity classification and aspect term extraction,” Neurocomputing, vol. 419, pp. 344–356, 2021.
  [15] J. Dai, H. Yan, T. Sun, P. Liu, and X. Qiu, “Does syntax matter? a strong baseline for aspect-based sentiment analysis with roberta,” arXiv preprint arXiv:2104.04986, 2021.
  [16] D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia, “SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation,” in SemEval, 2017.
  [17] P. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-enhanced BERT with Disentangled Attention,” in ICLR, 2020.
  [18] E. H. d. Silva and R. M. Marcacini, “Aspect-based sentiment analysis using bert with disentangled attention,” in ICML, 2021.
  [19] H. Yang and K. Li, “Modeling aspect sentiment coherency via local sentiment aggregation,” arXiv preprint arXiv:2110.08604, 2021.
  [20] B. Xing and I. W. Tsang, “Understand me, if you refer to aspect knowledge: Knowledge-aware gated recurrent memory network,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 6, no. 5, pp. 1092–1102, 2022.
  [21] Y. Zhang, M. Zhang, S. Wu, and J. Zhao, “Towards unifying the label space for aspect- and sentence-based sentiment analysis,” arXiv preprint arXiv:2203.07090, 2022.
  [22] H. Yang and K. Li, “Modeling Aspect Sentiment Coherency via Local Sentiment Aggregation,” in Findings of EACL, 2024, pp. 182–195.
  [23] K. I. Roumeliotis, N. D. Tselikas, and D. K. Nasiopoulos, “Llms in e-commerce: a comparative analysis of gpt and llama models in product review evaluation,” Natural Language Processing Journal, vol. 6, p. 100056, 2024.
  [24] B. Peng, C. Li, P. He, M. Galley, and J. Gao, “Instruction tuning with gpt-4,” arXiv preprint arXiv:2304.03277, 2023.
  [25] E. J. Hu, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “LoRA: Low-Rank Adaptation of Large Language Models,” in ICLR, 2021.
  [26] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient Finetuning of Quantized LLMs,” in NeurIPS, vol. 36, 2023.
  [27] L. Tunstall, N. Reimers, U. E. S. Jo, L. Bates, D. Korat, M. Wasserblat, and O. Pereg, “Efficient few-shot learning without prompts,” arXiv preprint arXiv:2209.11055, 2022.
  [28] H. Yang and K. Li, “Pyabsa: Open framework for aspect-based sentiment analysis,” arXiv preprint arXiv:2208.01368, 2022.
  [29] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open Foundation and Fine-Tuned Chat Models,” arXiv preprint arXiv:2307.09288, 2023.
  [30] S. Xiao, Z. Liu, P. Zhang, and N. Muennighoff, “C-Pack: Packaged Resources To Advance General Chinese Embedding,” arXiv preprint arXiv:2309.07597, 2023.
  [31] J. Ni, G. H. Abrego, N. Constant, J. Ma, K. Hall, D. Cer, and Y. Yang, “Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models,” in Findings of ACL, 2022, pp. 1864–1874.
  [32] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” in EMNLP-IJCNLP, 2019, pp. 3982–3992.
  [33] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, “MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers,” NeurIPS, vol. 33, pp. 5776–5788, 2020.
  [34] K. Song, X. Tan, T. Qin, J. Lu, and T.-Y. Liu, “MPNet: Masked and Permuted Pre-training for Language Understanding,” NeurIPS, vol. 33, pp. 16857–16867, 2020.
  [35] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML. PMLR, 2021, pp. 8748–8763.
  [36] A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. S. Weld, “SPECTER: Document-level Representation Learning using Citation-informed Transformers,” in ACL, 2020.
  [37] J. Ni, C. Qu, J. Lu, Z. Dai, G. H. Abrego, J. Ma, V. Zhao, Y. Luan, K. Hall, M.-W. Chang et al., “Large Dual Encoders Are Generalizable Retrievers,” in EMNLP, 2022, pp. 9844–9855.
  [38] X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, and Q. Liu, “TinyBERT: Distilling BERT for Natural Language Understanding,” in Findings of EMNLP, 2020, pp. 4163–4174.
  [39] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations,” arXiv preprint arXiv:1909.11942, 2019.
  [40] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019.
  [41] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang, “Language-agnostic BERT Sentence Embedding,” in ACL, 2022, pp. 878–891.
  [42] H. Yang, C. Zhang, and K. Li, “PyABSA: a modularized framework for reproducible aspect-based sentiment analysis,” in ICKM, 2023, pp. 5117–5122.