Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding
Abstract
Knowledge Graphs (KGs) are fundamental resources in knowledge-intensive tasks in NLP. Due to the limitation of manually creating KGs, KG Completion (KGC) has an important role in automatically completing KGs by scoring their links with KG Embedding (KGE). To handle many entities in training, KGE relies on Negative Sampling (NS) loss that can reduce the computational cost by sampling. Since the appearance frequencies for each link are at most one in KGs, sparsity is an essential and inevitable problem. The NS loss is no exception. As a solution, the NS loss in KGE relies on smoothing methods like Self-Adversarial Negative Sampling (SANS) and subsampling. However, it is uncertain what kind of smoothing method is suitable for this purpose due to the lack of theoretical understanding. This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and induces a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the characteristics of the conventional smoothing methods. Experimental results of TransE, DistMult, ComplEx, RotatE, HAKE, and HousE on FB15k-237, WN18RR, and YAGO3-10 datasets and their sparser subsets show the soundness of our interpretation and performance improvement by our TANS.
1 Introduction
Knowledge Graphs (KGs) represent human knowledge using various entities and their relationships as graph structures. KGs are fundamental resources for knowledge-intensive tasks such as dialog (Moon et al., 2019), question answering (Reese et al., 2020), named entity recognition (Liu et al., 2019), open-domain questions (Hu et al., 2022), and recommendation systems (Gao et al., 2020).
However, to create complete KGs, we need to consider a large number of entities and all their possible relationships. Given the explosively large number of combinations between entities, relying only on manual approaches to build complete KGs is unrealistic.
Knowledge Graph Completion (KGC) is a task that deals with this problem. KGC involves automatically completing missing links corresponding to relationships between entities in KGs. To complete the KGs, we need to score each link between entities. For this purpose, current KGC commonly relies on Knowledge Graph Embedding (KGE) (Bordes et al., 2011). KGE models predict the missing relations, a task named link prediction, by learning structural representations. In the current KGE, models need to complete a link (triplet) $(h_i, r_j, t_k)$ of entities $h_i$ and $t_k$, and their relationship $r_j$ by answering $t_k$ or $h_i$ from a given query $(h_i, r_j, ?)$ or $(?, r_j, t_k)$, respectively. Hence, KGE needs to handle a large number of entities and their relationships during its training.
To handle a large number of entities and relationships in KGs, the Negative Sampling (NS) loss (Mikolov et al., 2013) is frequently used for training KGE models. The original NS loss was proposed to approximate the softmax cross-entropy loss and reduce computational costs by sampling false labels from its noise distribution in training. Trouillon et al. (2016) import the NS loss from word embedding to KGE, using a uniform distribution as its noise distribution. Sun et al. (2019) extend the NS loss to the Self-Adversarial Negative Sampling (SANS) loss for efficient training of KGE. Unlike the NS loss with a uniform distribution, the SANS loss uses the training model's prediction as the noise distribution. Since the negative samples in the SANS loss become more difficult for the model to discriminate during training, SANS can draw out more of a model's potential than the NS loss with a uniform distribution.
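![Figure 1: Appearance frequencies of queries and answers (entities) in the training data of FB15k-237, WN18RR, and YAGO3-10](extracted/5712115/figures/sum_query_answer_frequency_all.png)
![Figure 2: Effectiveness of SANS and subsampling in KGC performance](x1.png)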
One of the problems left for KGE is the sparsity of KGs. Figure 1 shows the appearance frequency of queries and answers (entities) in the training data of FB15k-237, WN18RR and YAGO3-10 datasets. From the long-tail distribution of this figure, we can understand that both queries and answers necessary for training KGE models may suffer from the sparsity problem.
As a solution, several smoothing methods are used in KGE. Sun et al. (2019) import subsampling from word2vec (Mikolov et al., 2013) to KGE. Subsampling can smooth the appearance frequency of triplets and queries in KGs. Kamigaito and Hayashi (2022a) show a general formulation that covers the basic subsampling of Sun et al. (2019) (Base), their frequency-based subsampling (Freq), and unique-based subsampling (Uniq) for KGE. Kamigaito and Hayashi (2021) indicate that SANS has an effect similar to label smoothing (Szegedy et al., 2016), and thus SANS can smooth the frequencies of answers in training. Figure 2 shows the effectiveness of SANS and subsampling in KGC performance. Since FB15k-237 is sparser (more imbalanced) than WN18RR and YAGO3-10 according to Figure 1, the figure indicates that the choice of smoothing method has a larger influence than the choice of model when data is sparse.
While SANS and subsampling can improve model performance by smoothing the appearance frequencies of triplets, queries, and answers, their theoretical relationship is not clear, leaving their capabilities and deficiencies an open question. For example, conventional works (Sun et al., 2019; Zhang et al., 2020b; Kamigaito and Hayashi, 2022a) jointly use SANS and subsampling with no theoretical background. (Note that Sun et al. (2019) and Zhang et al. (2020b) use subsampling in their released implementations without referring to it in their papers.) Thus, there is a call for further interpretability and performance improvement.
To solve the above problem, we theoretically and empirically study the differences between SANS and subsampling on three common datasets and their sparser subsets with six popular KGE models. Our code and data are available at https://github.com/xincanfeng/ss_kge. Our contributions are as follows:
- By focusing on the smoothing targets, we theoretically reveal the differences between SANS and subsampling and induce a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the smoothing target of both SANS and subsampling.
- We theoretically show that TANS with subsampling can potentially cover the conventional usages of SANS and subsampling.
- We empirically verify that TANS improves KGC performance on sparse KGs in terms of MRR.
- We empirically verify that TANS with subsampling can cover the conventional usages of SANS and subsampling in terms of MRR.
2 Background
In this section, we describe the problem formulation for solving KGC by KGE and explain the conventional NS loss functions in KGE.
2.1 Formulation of KGE
KGC is a research topic for automatically inferring new links in a KG that are likely but not yet known to be true. To infer the new links by KGE, we decompose KGs into a set of triplets (links). By using entities $h_i$, $t_k$ and their relation $r_j$, we represent the triplet as $(h_i, r_j, t_k)$. In a typical KGC task, a KGE model receives a query $(h_i, r_j, ?)$ or $(?, r_j, t_k)$ and predicts the entity corresponding to $?$ as an answer.
In KGE, a KGE model scores a triplet $(h_i, r_j, t_k)$ by using a scoring function $s_\theta(x, y)$, where $\theta$ denotes model parameters. Here, using a softmax function, we represent the existence probability $p_\theta(y|x)$ for an answer $y$ of the query $x$ as follows:
$$p_\theta(y|x) = \frac{\exp(s_\theta(x, y))}{\sum_{y' \in Y} \exp(s_\theta(x, y'))}, \tag{1}$$
where $Y$ is a set of entities.
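As a rough illustration of the softmax formulation above, the following sketch normalizes the scores of all candidate answers for one query into probabilities. It assumes Eq. (1) is the usual softmax over candidate-entity scores; the function name `answer_probabilities` and the toy scores are hypothetical and not taken from the released code.

```python
import math

def answer_probabilities(scores):
    """Softmax over the scores s(x, y') of all candidate answers for one query x."""
    top = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - top) for s in scores]
    normalizer = sum(exps)  # this sum runs over every entity in Y
    return [e / normalizer for e in exps]

# Hypothetical scores of three candidate entities for a single query.
probs = answer_probabilities([2.0, 0.5, -1.0])
```

The normalizer touches every entity, which is the cost that the NS loss in the next subsection avoids by sampling.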
2.2 NS Loss in KGE
To train $s_\theta(x, y)$, we need to calculate losses for the observables $D = \{(x_1, y_1), \ldots, (x_{|D|}, y_{|D|})\}$ that follow $p_d(x, y)$. Even if we can represent KGC by Eq. (1), it does not mean we can tractably perform KGC, due to the large size of $Y$ in KGs. Because of this computational cost, the NS loss (Mikolov et al., 2013) is used to approximate Eq. (1) by sampling false answers.
By modifying that of Mikolov et al. (2013), the following NS loss (Sun et al., 2019; Ahrabian et al., 2020) is commonly used in KGE:
$$\ell_{\text{NS}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ \log \sigma(s_\theta(x, y) + \tau) + \frac{1}{\nu} \sum_{y_i \sim U}^{\nu} \log \sigma(-s_\theta(x, y_i) - \tau) \Big], \tag{2}$$
where $U$ is the noise distribution that follows a uniform distribution, $\sigma$ is the sigmoid function, $\nu$ is the number of negative samples per positive sample $(x, y)$, and $\tau$ is a margin term to adjust the value range decided by $s_\theta$.
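The NS loss described above can be sketched for a single positive sample as follows. This is a minimal illustration under stated assumptions: the positive term is the log-sigmoid of the positive score plus the margin, and the negative term averages the log-sigmoid of the negated, margin-shifted scores over uniformly sampled negatives. The names `log_sigmoid` and `ns_loss_single` and the toy scores are hypothetical.

```python
import math
import random

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def ns_loss_single(pos_score, neg_scores, margin):
    """NS loss for one positive (x, y) and its sampled negatives y_i."""
    pos_term = log_sigmoid(pos_score + margin)
    # Uniform noise distribution: every sampled negative gets the same weight 1/nu.
    neg_term = sum(log_sigmoid(-s - margin) for s in neg_scores) / len(neg_scores)
    return -(pos_term + neg_term)

# Toy setup: random scores for 100 candidate entities of one query.
rng = random.Random(0)
entity_scores = [rng.uniform(-3.0, 0.0) for _ in range(100)]
negative_ids = rng.sample(range(100), 8)  # nu = 8 uniformly sampled negatives
loss = ns_loss_single(-0.5, [entity_scores[i] for i in negative_ids], margin=1.0)
```

Only the 8 sampled negatives are scored here, instead of all 100 candidates needed by the softmax.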
2.3 Smoothing Methods for the NS Loss in KGE
As shown in Figure 1, KGC needs to deal with the sparsity problem caused by low-frequency queries and answers in KGs. Imposing smoothing on the appearance frequencies of queries and answers can mitigate this problem. The following subsections introduce subsampling (Mikolov et al., 2013; Sun et al., 2019; Kamigaito and Hayashi, 2022a) and SANS (Sun et al., 2019), the conventional smoothing methods for the NS loss in KGE.
2.3.1 Subsampling
Subsampling (Mikolov et al., 2013) is a method to smooth the frequency of triplets or queries in the NS loss. Sun et al. (2019) import this approach from word embedding to KGE. Kamigaito and Hayashi (2022b, a) add some variants to subsampling for KGC and theoretically provide a unified expression of them as follows:
$$\ell_{\text{SUB}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ A(x, y; \alpha) \log \sigma(s_\theta(x, y) + \tau) + \frac{1}{\nu} \sum_{y_i \sim U}^{\nu} B(x, y; \alpha) \log \sigma(-s_\theta(x, y_i) - \tau) \Big], \tag{3}$$
where $\alpha$ is a temperature term to adjust the frequency of triplets and queries. Note that we incorporate $\alpha$ into Eq. (3) to consider various loss functions even though Kamigaito and Hayashi (2022b, a) do not consider $\alpha$. In this formulation, we can consider several assumptions for deciding $A(x, y; \alpha)$ and $B(x, y; \alpha)$. We introduce these assumptions in the following paragraphs:
Base
As a basic subsampling approach, Sun et al. (2019) import the one originally used in word2vec (Mikolov et al., 2013) to KGE, defined as follows:
$$A(x, y; \alpha) = B(x, y; \alpha) = \frac{\#(x, y)^{-\alpha} \, |D|}{\sum_{(x', y') \in D} \#(x', y')^{-\alpha}}, \tag{4}$$
where $\#$ is the symbol for frequency and $\#(x, y)$ represents the frequency of $(x, y)$. In word2vec, subsampling randomly discards a word with probability $1 - \sqrt{t/f}$, where $t$ is a constant value and $f$ is the frequency of a word. This is similar to randomly keeping a word with probability $\sqrt{t/f}$. Thus, we can understand that Eq. (4) follows the original use in word2vec. Since the actual $(x, y)$ occurs at most once in KGs, when $(x, y) = (h_i, r_j, t_k)$, they approximate the frequency of $(x, y)$ as:
$$\#(x, y) \approx \#(h_i, r_j) + \#(r_j, t_k), \tag{5}$$
based on the approximation of n-gram language modeling (Katz, 1987).
Freq
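Kamigaito and Hayashi (2022a) propose frequency-based subsampling (Freq) by assuming a case in which $(x, y)$ originally has a frequency, but the observed one in the KG is at most 1.
$$A(x, y; \alpha) = \frac{\#(x, y)^{-\alpha} \, |D|}{\sum_{(x', y') \in D} \#(x', y')^{-\alpha}}, \quad B(x, y; \alpha) = \frac{\#x^{-\alpha} \, |D|}{\sum_{x' \in D} \#x'^{-\alpha}}. \tag{6}$$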
Uniq
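Kamigaito and Hayashi (2022a) also propose unique-based subsampling (Uniq) by assuming a case in which the original frequency and the observed one in the KG are both 1.
$$A(x, y; \alpha) = B(x, y; \alpha) = \frac{\#x^{-\alpha} \, |D|}{\sum_{x' \in D} \#x'^{-\alpha}}. \tag{7}$$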
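The three count-based variants above can be sketched as follows. This is only an illustration under stated assumptions: it takes the tail-prediction view (query = head and relation), approximates the triplet frequency as the sum of the (head, relation) and (relation, tail) counts, and normalizes count^(-alpha) so that the weights average to 1 over the data. The function names and toy triplets are hypothetical.

```python
from collections import Counter

def count_weights(counts, alpha):
    """Turn counts into weights count^(-alpha), rescaled to average 1."""
    raw = [c ** (-alpha) for c in counts]
    total = sum(raw)
    return [len(raw) * w / total for w in raw]

def subsampling_weights(triplets, alpha, variant):
    """Return per-triplet weights (A, B) for the positive and negative terms."""
    hr = Counter((h, r) for h, r, _ in triplets)
    rt = Counter((r, t) for _, r, t in triplets)
    # Assumed frequency approximation: #(x, y) ~ #(h, r) + #(r, t).
    triplet_freq = [hr[(h, r)] + rt[(r, t)] for h, r, t in triplets]
    query_freq = [hr[(h, r)] for h, r, _ in triplets]
    by_triplet = count_weights(triplet_freq, alpha)
    by_query = count_weights(query_freq, alpha)
    if variant == "base":   # A and B both from triplet frequencies
        return by_triplet, by_triplet
    if variant == "freq":   # A from triplet frequencies, B from query frequencies
        return by_triplet, by_query
    if variant == "uniq":   # A and B both from query frequencies
        return by_query, by_query
    raise ValueError(f"unknown variant: {variant}")

# Toy KG: the query ("a", "r") is frequent, ("e", "r") is rare.
triplets = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("e", "r", "b")]
a_base, b_base = subsampling_weights(triplets, 0.5, "base")
a_freq, b_freq = subsampling_weights(triplets, 0.5, "freq")
```

With a positive temperature, the triplet of the rare query receives the largest weight, and setting the temperature to 0 gives every triplet weight 1, which falls back to the unweighted NS loss.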
2.3.2 SANS Loss
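SANS was originally proposed as a kind of NS loss to train KGE models efficiently by considering negative samples close to their corresponding positive ones. Kamigaito and Hayashi (2021) show that using SANS is similar to imposing label smoothing on Eq. (1). Thus, SANS is a method to smooth the frequency of answers in the NS loss. The SANS loss is represented as follows:
$$\ell_{\text{SANS}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ \log \sigma(s_\theta(x, y) + \tau) + \sum_{y_i \sim U}^{\nu} p_\theta(y_i|x; \beta) \log \sigma(-s_\theta(x, y_i) - \tau) \Big], \tag{8}$$
$$p_\theta(y_i|x; \beta) = \frac{\exp(\beta s_\theta(x, y_i))}{\sum_{j=1}^{\nu} \exp(\beta s_\theta(x, y_j))}, \tag{9}$$
where $\beta$ is a temperature to adjust the distribution of negative sampling. Different from subsampling, SANS uses $p_\theta(y_i|x; \beta)$, which is predicted by a model, to adjust the frequency of the answer $y_i$. Since $p_\theta(y_i|x; \beta)$ is essentially a noise distribution, it does not receive any gradient during training.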
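The self-adversarial weighting described above can be sketched as below. The sketch assumes the weights are a temperature-scaled softmax over the scores of the sampled negatives and that they are treated as constants (in an autograd framework they would be detached from the graph). The function names are hypothetical.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def self_adversarial_weights(neg_scores, beta):
    """Softmax of beta * s(x, y_i) over the sampled negatives of one query."""
    logits = [beta * s for s in neg_scores]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sans_loss_single(pos_score, neg_scores, margin, beta):
    """SANS loss for one positive sample; weights act as a fixed noise distribution."""
    weights = self_adversarial_weights(neg_scores, beta)
    pos_term = log_sigmoid(pos_score + margin)
    neg_term = sum(w * log_sigmoid(-s - margin) for w, s in zip(weights, neg_scores))
    return -(pos_term + neg_term)

weights = self_adversarial_weights([0.0, 1.0, 2.0], beta=1.0)
```

Higher-scoring (harder) negatives receive larger weights, and a temperature of 0 makes the weights uniform, so the loss reduces to the NS loss with a uniform noise distribution.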
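Method | Variant | Smoothing $p_d(x, y)$ | Smoothing $p_d(y \mid x)$ | Smoothing $p_d(x)$ | Remarks
---|---|---|---|---|---
Subsampling | Base | ✓ | △ | △ | $p_d(y \mid x)$ and $p_d(x)$ are influenced by $p_d(x, y)$.
Subsampling | Uniq | △ | × | ✓ | $p_d(x, y)$ is indirectly controlled by $p_d(x)$.
Subsampling | Freq | ✓ | △ | ✓ | $p_d(y \mid x)$ is indirectly controlled by $p_d(x, y)$ or $p_d(x)$.
SANS | | △ | ✓ | × | $p_d(x, y)$ is indirectly controlled by $p_d(y \mid x)$.
TANS | | ✓ | ✓ | ✓ |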
3 Triplet Adaptive Negative Sampling
In this section, we explain our proposed Triplet Adaptive Negative Sampling (TANS) in detail. We first show the overview of our TANS through the comparison with the conventional smoothing methods of the NS loss for KGE (See §2.3) in §3.1 and after that we explain the details of TANS through its mathematical formulations in §3.2 and §3.3.
3.1 Overview
TANS is fundamentally different from SANS: SANS takes into account only the conditional probability of negative samples, whereas TANS is a loss function that considers the joint probability of queries and their answers.
Table 1 shows the characteristics of TANS and the conventional smoothing methods of the NS loss for KGE introduced in §2.3. These characteristics are based on the decomposition of $p_d(x, y)$, the appearance probability for the triplet $(x, y)$, into that of its answer $p_d(y|x)$ and query $p_d(x)$:
$$p_d(x, y) = p_d(y|x) \, p_d(x). \tag{10}$$
In Eq. (10), smoothing both $p_d(y|x)$ and $p_d(x)$ is similar to smoothing $p_d(x, y)$. However, smoothing $p_d(x, y)$ does not ensure that both $p_d(y|x)$ and $p_d(x)$ are smoothed, because only one of them may be smoothed while the other remains sparse. Similarly, smoothing only $p_d(y|x)$ or only $p_d(x)$ does not ensure that $p_d(x, y)$ is smoothed, because the other factor may still be sparse. In Table 1, we denote such a case, where the method can influence the probability but there is no guarantee that the probability is smoothed, as △.
In TANS, we aim to smooth $p_d(x, y)$ by smoothing both $p_d(y|x)$ and $p_d(x)$ based on Eq. (10).
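A small numerical example may make the decomposition argument above concrete. The probabilities below are made up for illustration: the answer distribution of each query is already perfectly flat, yet the joint distribution stays skewed because the query distribution is skewed.

```python
def joint_distribution(p_query, p_answer_given_query):
    """Combine p(x) and p(y | x) into p(x, y) = p(y | x) * p(x)."""
    return {
        (x, y): p_query[x] * p
        for x, answers in p_answer_given_query.items()
        for y, p in answers.items()
    }

# Hypothetical distributions: one frequent query and one rare query.
p_query = {"frequent_query": 0.9, "rare_query": 0.1}
# The conditional answer distribution is already fully smoothed (uniform).
p_answer_given_query = {
    "frequent_query": {"a": 0.5, "b": 0.5},
    "rare_query": {"a": 0.5, "b": 0.5},
}

p_joint = joint_distribution(p_query, p_answer_given_query)
imbalance = max(p_joint.values()) / min(p_joint.values())
```

Here the most probable triplet is still about nine times more probable than the least probable one, so smoothing only the answer side leaves the triplet distribution imbalanced.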
3.2 Formulation
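Here, we induce TANS from SANS, aiming to smooth $p_d(x, y)$ by smoothing both $p_d(y|x)$ and $p_d(x)$. First, we assume a simple replacement from $p_\theta(y_i|x)$ to $p_\theta(x, y_i)$ in $\ell_{\text{SANS}}(\theta)$ of Eqs. (8) and (9):
$$-\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ \log \sigma(s_\theta(x, y) + \tau) + \sum_{y_i \sim U}^{\nu} p_\theta(x, y_i) \log \sigma(-s_\theta(x, y_i) - \tau) \Big]. \tag{11}$$
However, using Eq. (11) causes an imbalanced loss between the first and second terms since the sum of $p_\theta(x, y_i)$ over all negative samples is not always 1. Thus, Eq. (11) is impractical as a loss function.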
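As a solution, we focus on the decomposition $p_\theta(x, y_i) = p_\theta(y_i|x) \, p_\theta(x)$ and the fact that the sum of $p_\theta(y_i|x)$ over all negative samples is always 1. By using $p_\theta(x)$ to balance the first and second loss terms, we can modify Eq. (11) and induce our TANS as follows:
$$\ell_{\text{TANS}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ p_\theta(x; \gamma) \log \sigma(s_\theta(x, y) + \tau) + \sum_{y_i \sim U}^{\nu} p_\theta(y_i|x; \beta) \, p_\theta(x; \gamma) \log \sigma(-s_\theta(x, y_i) - \tau) \Big], \tag{12}$$
$$p_\theta(x; \gamma) = \frac{\sum_{y' \in Y} \exp(\gamma s_\theta(x, y'))}{\sum_{x'} \sum_{y' \in Y} \exp(\gamma s_\theta(x', y'))}, \tag{13}$$
where $\gamma$ is a temperature to smooth the frequency of queries. Since TANS uses a noise distribution decided by $p_\theta(y_i|x; \beta)$ and $p_\theta(x; \gamma)$, it does not propagate gradients through probabilities for negative samples, and thus, memory usage is not increased.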
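The idea of weighting both loss terms by a model-predicted query probability can be sketched as below. This is an illustration under loudly stated assumptions, not the released implementation: the query probability is assumed to be proportional to the sum of exp(gamma * score) over the sampled negatives of that query, normalized over the queries of a mini-batch, and all weights are treated as constants. The names `tans_loss`, `softmax`, and `logsumexp` are hypothetical.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tans_loss(pos_scores, neg_scores, margin, beta, gamma):
    """pos_scores[k] scores (x_k, y_k); neg_scores[k] scores its sampled negatives."""
    # Assumption: p(x; gamma) is estimated from the sampled negatives in the batch.
    query_logits = [logsumexp([gamma * s for s in negs]) for negs in neg_scores]
    p_query = softmax(query_logits)
    total = 0.0
    for p_x, pos, negs in zip(p_query, pos_scores, neg_scores):
        p_answer = softmax([beta * s for s in negs])  # sums to 1 per query
        pos_term = p_x * log_sigmoid(pos + margin)
        neg_term = sum(
            p_x * p_y * log_sigmoid(-s - margin) for p_y, s in zip(p_answer, negs)
        )
        total += pos_term + neg_term
    return -total / len(pos_scores)

# Toy batch of two queries with two sampled negatives each.
pos_scores = [-0.5, -2.0]
neg_scores = [[-1.0, -3.0], [-0.2, -0.4]]
loss = tans_loss(pos_scores, neg_scores, margin=1.0, beta=1.0, gamma=0.5)
```

Because the answer weights sum to 1 for each query, both terms of a sample are scaled by the same query probability, which keeps the positive and negative parts balanced. With a query temperature of 0 the query probabilities become uniform and the loss is proportional to the self-adversarial loss.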
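Temperature $\alpha$ | Temperature $\beta$ | Temperature $\gamma$ | Induced NS Loss
---|---|---|---
$= 0$ | $= 0$ | $= 0$ | Equivalent to $\ell_{\text{NS}}$, the basic NS loss in KGE (Eq. (2))
$= 0$ | $= 0$ | $\neq 0$ | Currently does not exist
$= 0$ | $\neq 0$ | $= 0$ | Proportional to $\ell_{\text{SANS}}$, the SANS loss (Eq. (9))
$= 0$ | $\neq 0$ | $\neq 0$ | Equivalent to our $\ell_{\text{TANS}}$, the TANS loss (Eq. (12))
$\neq 0$ | $= 0$ | $= 0$ | Proportional to $\ell_{\text{NS}}$, the basic NS loss in KGE (Eq. (2)) with subsampling in §2.3
$\neq 0$ | $= 0$ | $\neq 0$ | Currently does not exist
$\neq 0$ | $\neq 0$ | $= 0$ | Proportional to $\ell_{\text{SANS}}$, the SANS loss (Eq. (9)) with subsampling in §2.3
$\neq 0$ | $\neq 0$ | $\neq 0$ | Equivalent to our $\ell_{\text{UNI}}$, the unified NS loss in KGE (Eq. (16)), and also equivalent to our $\ell_{\text{TANS}}$, the TANS loss (Eq. (12)) with subsampling in §2.3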
3.3 Theoretical Interpretation
In this subsection, we discuss the differences and similarities among TANS and other smoothing methods for the NS loss in KGE. As shown in Table 1, the subsampling methods, Base and Freq, can smooth triplet frequencies similar to our TANS. To investigate TANS from the viewpoint of subsampling, we reformulate Eq. (12) as follows:
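$$\ell_{\text{TANS}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \Big[ A(x, y; \gamma) \log \sigma(s_\theta(x, y) + \tau) + \frac{1}{\nu} \sum_{y_i \sim U}^{\nu} B(x, y; \beta, \gamma) \log \sigma(-s_\theta(x, y_i) - \tau) \Big], \tag{14}$$
$$A(x, y; \gamma) = p_\theta(x; \gamma), \quad B(x, y; \beta, \gamma) = \nu \, p_\theta(y_i|x; \beta) \, p_\theta(x; \gamma). \tag{15}$$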
Apart from the temperature terms, $\alpha$, $\beta$, and $\gamma$, we can see that the general formulation of subsampling in Eq. (3) and the above Eq. (14) have the same form. Thus, TANS is not merely an extension of SANS but also a novel subsampling method.
Despite their similar characteristics, TANS and subsampling have an essential difference: TANS smooths the frequencies by model-predicted distributions as in Eq. (13), whereas the subsampling methods smooth them by counting appearance frequencies on the observed data as in Eqs. (4), (5), (6), and (7). For instance, TANS can work even when the entities or relations included in the target triplet appear more than once, which is theoretically different from conventional approaches.
Since the superiority of using either model-based or count-based frequencies depends on the model and dataset, we empirically investigate this point through our experiments.
4 Unified Interpretation of SANS and Subsampling
In the previous section, we showed that our TANS can smooth triplets, queries, and answers, which are only partially covered by SANS and subsampling methods. On the other hand, TANS relies only on model-predicted frequencies for smoothing. Neubig and Dyer (2016) point out the benefits of combining count-based and model-predicted frequencies in language modeling. This section integrates smoothing methods for the NS loss in KGE under a unified interpretation.
4.1 Formulation
4.2 Theoretical Interpretation
As shown in Table 2, TANS w/ subsampling has the characteristics of all smoothing methods for the NS loss in KGE introduced in this paper. Therefore, we can expect higher performance from TANS w/ subsampling than from combinations of the conventional methods, i.e., the basic NS, SANS, and subsampling. However, because TANS w/ subsampling uses the subsampling in §2.3, we need to choose one of Base, Uniq, and Freq for TANS w/ subsampling. Since this choice is outside the scope of theoretical interpretation, we investigate it in the experiments.
![Refer to caption](x2.png)
![Refer to caption](x3.png)
![Refer to caption](x4.png)
5 Experiments
In this section, we investigate our theoretical interpretation in §3.3 and §4.2 through experiments.
5.1 Experimental Settings
Datasets We used three common datasets, FB15k-237 (Toutanova and Chen, 2015), WN18RR, and YAGO3-10 (Dettmers et al., 2018). Table 4 in Appendix A shows the dataset statistics.
Comparison Methods As comparison methods, we used TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), ComplEx (Trouillon et al., 2016), RotatE (Sun et al., 2019), HAKE (Zhang et al., 2020a), and HousE (Li et al., 2022). We followed the original settings of Sun et al. (2019) for TransE, DistMult, ComplEx, and RotatE with their implementation (https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding), the original settings of Zhang et al. (2020a) for HAKE with their implementation (https://github.com/MIRALab-USTC/KGE-HAKE), and the original settings of Li et al. (2022) for HousE with their implementation (https://github.com/rui9812/HousE). We tuned temperature on the validation split for each dataset.
Metrics We employed conventional metrics in KGC, i.e., MRR, Hits@1 (H@1), Hits@3 (H@3), and Hits@10 (H@10), and reported the average scores and their standard deviations over three runs with fixed random seeds.
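For readers unfamiliar with these metrics, the sketch below shows how MRR and Hits@k are computed from the rank of the correct answer for each test query, and how scores from several runs can be summarized. The ranks and per-run MRR values are made up for illustration, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank of the correct answer over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Made-up ranks of the correct answer for four test queries (1 is best).
ranks = [1, 3, 12, 2]
mrr = mean_reciprocal_rank(ranks)
hits1 = hits_at_k(ranks, 1)
hits10 = hits_at_k(ranks, 10)

# Made-up MRR values from three runs with different seeds.
run_mrrs = [0.331, 0.335, 0.333]
mean_mrr = statistics.mean(run_mrrs)
std_mrr = statistics.stdev(run_mrrs)  # sample standard deviation (assumption)
```

MRR rewards placing the correct answer near the top of the ranking, while Hits@k only checks whether it appears within the first k candidates.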
5.2 Results
Since the result tables are large, we discuss them individually, focusing on important information in the following subsections. The full experimental results are listed in Appendix B: the scores are included in Tables 5, 6, and 7 of Appendix B.1, and the training loss curves and validation MRR curves for each smoothing method are in Figures 6, 7, and 8 of Appendix B.2.
5.2.1 Effectiveness of TANS
Figure 3(a) shows the MRR scores of each method. The result indicates the effectiveness of considering triplet information in SANS, as done in TANS. Thus, the result is in line with our expectation in §3.3 that TANS can cover the role of subsampling methods. However, as the result of HAKE on WN18RR shows, there are cases where subsampling methods outperform TANS. As discussed in §3.3, using only TANS does not cover all combinations of NS loss and subsampling. Considering this theoretical fact, we further compare TANS with subsampling and the SANS loss with subsampling in the following section.
5.2.2 Validity of the Unified Interpretation
Figure 3(b) shows the result for each configuration. We can see performance improvements from using subsampling in both SANS and TANS. Furthermore, in almost all cases, TANS with subsampling achieves the highest MRR. This observation is in line with the theoretical conclusion in §3.3 that TANS with subsampling can cover the characteristics of other NS losses in terms of smoothing. On the other hand, the results of HAKE on YAGO3-10 show a different tendency: SANS with subsampling achieves the best MRR instead of TANS. Because the model prediction estimates the triplet frequencies, TANS is influenced by the selected model. Therefore, carefully choosing the combination of a loss function and a model is still effective in improving KGC performance with the NS loss and subsampling.
6 Analysis
We analyze how TANS mitigates the sparsity problem in imbalanced KGs commonly caused by low-frequency triplets in KGC. Considering that all triplets in KGs appear at most once, we focus on queries. For the investigation, we extracted from the original data the 0.5% of triplets with the highest or lowest frequency queries in the training, validation, and test splits as the sparser subsets FB15k-237-HL, WN18RR-HL, and YAGO3-10-HL, respectively. Note that we show their appearance frequencies of queries and answers in the training data in Figure 5 and detailed statistics in Table 4 of Appendix C.1 and C.2, respectively.
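One possible way to build such a subset is sketched below. The exact extraction procedure is not spelled out in this section, so the sketch makes explicit assumptions: a query is taken to be the (head, relation) pair, frequencies are counted on the given triplets, and the same fraction is taken from each end of the frequency ranking. The function name `extract_hl_subset` and the toy data are hypothetical.

```python
from collections import Counter

def extract_hl_subset(triplets, fraction):
    """Return (lowest, highest): triplets whose queries are least and most frequent."""
    query_counts = Counter((h, r) for h, r, _ in triplets)
    # Stable sort by query frequency, rarest queries first.
    ranked = sorted(triplets, key=lambda tr: query_counts[(tr[0], tr[1])])
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

# Toy data: query ("a", "r") appears nine times, query ("b", "r") once.
toy_triplets = [("a", "r", str(i)) for i in range(9)] + [("b", "r", "x")]
lowest, highest = extract_hl_subset(toy_triplets, fraction=0.1)
```

On the toy data the lowest-frequency part contains the triplet of the rare query and the highest-frequency part contains a triplet of the frequent query.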
7 Related Work
Knowledge Graph
Knowledge graphs have important roles in various knowledge-intensive NLP tasks such as dialog (Moon et al., 2019), question answering (Reese et al., 2020), named entity recognition (Liu et al., 2019), open-domain questions (Hu et al., 2022), recommendation systems (Gao et al., 2020), and commonsense reasoning (Sakai et al., 2024b). In addition to these text-only tasks, knowledge-intensive vision and language (V&L) tasks such as visual question answering (VQA) (Yue et al., 2023), image generation (Kamigaito et al., 2023), explanation generation (Hayashi et al., 2024), and image review generation (Saito et al., 2024) also require external knowledge. Visual KGs (Zhu et al., 2024) have the potential to contribute to solving these tasks. Therefore, KGs are important resources in a variety of fields.
Knowledge Graph Completion
Even though KGs are useful, their sparsity is a fundamental problem. To solve the sparsity of knowledge graphs, we need to complete them by inferring unseen links between their nodes, which are entities. For that purpose, knowledge graph completion (KGC) and knowledge graph embedding (KGE) (Bordes et al., 2011), which represents KG information in a continuous vector space, are commonly used. As KGE methods, vector space models like TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), ComplEx (Trouillon et al., 2016), RotatE (Sun et al., 2019), HAKE (Zhang et al., 2020a), and HousE (Li et al., 2022), which learn only from task-specific datasets, pioneered this field. In addition to such approaches, pre-trained language model (PLM)-based approaches like KEPLER (Wang et al., 2021) and SimKGC (Wang et al., 2022) also have an important role in KGC due to their ability to utilize the knowledge obtained in pre-training. However, as pointed out by Sakai et al. (2024a), PLM-based approaches have a leakage issue caused by data contamination in pre-training. Generation-based KGC methods like KGT5 (Saxena et al., 2022) and GenKGC (Xie et al., 2022) are unique in directly generating entity names. In hierarchical text classification (HTC), generation-based approaches contribute to improving performance (Kwon et al., 2023), supported by considering label hierarchies through fusing pre-trained text and label embeddings (Xiong et al., 2021; Zhang et al., 2021) on the decoder. However, Sakai et al. (2024a) point out that commonly used KGC methods conduct link-level prediction, and such generation-based KGC methods make it difficult to use structure information of KGs directly. Thus, their performance gain is limited. This situation requires investigating the benefits of inferring links by generation-based KGC under predefined entities and relationships.
Negative Sampling
Mikolov et al. (2013) initially propose the NS loss of the frequent words to train their word embedding model, word2vec. Trouillon et al. (2016) introduce the NS loss to KGE to speed up training. Melamud et al. (2017) use the NS loss to train a language model. In contextualized pre-trained embeddings, Clark et al. (2020a) indicate that ELECTRA (Clark et al., 2020b), a BERT-like model (Devlin et al., 2019), uses the NS loss to perform better and faster than language models. Sun et al. (2019) extend the NS loss to the SANS loss for KGE and propose their noise distribution, which is subsampled by a uniform probability. Kamigaito and Hayashi (2021) point out the sparseness problem of KGs through their theoretical analysis of the NS loss in KGE. Furthermore, Kamigaito and Hayashi (2022a, b) reveal that subsampling (Mikolov et al., 2013) can alleviate the sparseness problem in the NS loss for KGE and conclude three assumptions for subsampling, i.e., Base, Freq, and Uniq. Feng et al. (2023) incorporate their proposed model-based subsampling, which estimates frequencies for entities and their relationships by a trained KGE model, into the subsampling of the NS loss to mitigate the sparseness issue of counting frequencies, at the cost of the additional computation needed to train the extra KGE model.
Our Work
Through our work, we theoretically clarify the position of the previous works on the SANS loss and subsampling from the viewpoint of smoothing methods for the NS loss in KGE. Since our work gives a unified interpretation of the SANS loss and subsampling, our proposed TANS inherits the advantages of conventional works and can deal with the sparsity problem in the NS loss for KGE.
8 Conclusion
We reveal the relationships between SANS loss and subsampling for the KG completion task through theoretical analysis. We explain that SANS loss and subsampling under three assumptions, Base, Freq, and Uniq have similar roles to mitigate the sparseness problem of queries and answers of KGs by smoothing the frequencies of queries and answers. Furthermore, based on our interpretation, we induce a new loss function, Triplet Adaptive Negative Sampling (TANS), by integrating SANS loss and subsampling. We also introduce a theoretical interpretation that TANS with subsampling can cover all conventional combinations of SANS loss and subsampling.
We verified our interpretation by empirical experiments on three common datasets, FB15k-237, WN18RR, and YAGO3-10, and six popular KGE models, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE. The experimental results show that our TANS loss can outperform subsampling and the SANS loss with many models in terms of MRR, as expected from our theoretical interpretation. Furthermore, the combined use of TANS and subsampling achieved comparable or better performance than other combinations and showed the validity of our theoretical interpretation that TANS with subsampling can cover all conventional combinations of the SANS loss and subsampling in KGE.
Limitations
Our experiments are conducted exclusively on public datasets, which are relatively well-balanced. Consequently, we anticipate that our TANS will perform better on real-world KGs.
Ethics Statement
We used the publicly available datasets, FB15k-237, WN18RR, and YAGO3-10, to train and evaluate KGE models, and there is no ethical consideration.
Reproducibility Statement
We used the publicly available code to implement the KGE models, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE, with the author-provided hyperparameters as described in §5.1. Regarding the temperature parameter, we tuned it on the validation split for each dataset and reported the values in Tables 5, 6, and 7 of Appendix B. Our code and data are available at https://github.com/xincanfeng/ss_kge.
Acknowledgements
This work was supported by NAIST Granite, i.e., JST SPRING Grant Number JPMJSP2140.
References
- Ahrabian et al. (2020) Kian Ahrabian, Aarash Feizi, Yasmin Salehi, William L. Hamilton, and Avishek Joey Bose. 2020. Structure aware negative sampling in knowledge graphs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6093–6101, Online. Association for Computational Linguistics.
- Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, pages 2787–2795.
- Bordes et al. (2011) Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Proceedings of the AAAI conference on artificial intelligence, volume 25, pages 301–306.
- Clark et al. (2020a) Kevin Clark, Minh-Thang Luong, Quoc Le, and Christopher D. Manning. 2020a. Pre-training transformers as energy-based cloze models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 285–294, Online. Association for Computational Linguistics.
- Clark et al. (2020b) Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020b. Electra: Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations.
- Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2d knowledge graph embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), pages 1811–1818.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Feng et al. (2023) Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2023. Model-based subsampling for knowledge graph completion. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 910–920, Nusa Dua, Bali. Association for Computational Linguistics.
- Gao et al. (2020) Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan. 2020. Deep learning on knowledge graph for recommender system: A survey.
- Hayashi et al. (2024) Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2024. Artwork explanation in large-scale vision language models.
- Hu et al. (2022) Ziniu Hu, Yichong Xu, Wenhao Yu, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Kai-Wei Chang, and Yizhou Sun. 2022. Empowering language models with knowledge graph reasoning for question answering.
- Kamigaito and Hayashi (2021) Hidetaka Kamigaito and Katsuhiko Hayashi. 2021. Unified interpretation of softmax cross-entropy and negative sampling: With case study for knowledge graph embedding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5517–5531, Online. Association for Computational Linguistics.
- Kamigaito and Hayashi (2022a) Hidetaka Kamigaito and Katsuhiko Hayashi. 2022a. Comprehensive analysis of negative sampling in knowledge graph representation learning. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 10661–10675. PMLR.
- Kamigaito and Hayashi (2022b) Hidetaka Kamigaito and Katsuhiko Hayashi. 2022b. Erratum to: Comprehensive analysis of negative sampling in knowledge graph representation learning. ResearchGate.
- Kamigaito et al. (2023) Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2023. Table and image generation for investigating knowledge of entities in pre-trained vision and language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1904–1917, Toronto, Canada. Association for Computational Linguistics.
- Katz (1987) Slava Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE transactions on acoustics, speech, and signal processing, 35(3):400–401.
- Kwon et al. (2023) Jingun Kwon, Hidetaka Kamigaito, Young-In Song, and Manabu Okumura. 2023. Hierarchical label generation for text classification. In Findings of the Association for Computational Linguistics: EACL 2023, pages 625–632, Dubrovnik, Croatia. Association for Computational Linguistics.
- Li et al. (2022) Rui Li, Jianan Zhao, Chaozhuo Li, Di He, Yiqi Wang, Yuming Liu, Hao Sun, Senzhang Wang, Weiwei Deng, Yanming Shen, Xing Xie, and Qi Zhang. 2022. House: Knowledge graph embedding with householder parameterization.
- Liu et al. (2019) Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, and Ping Wang. 2019. K-bert: Enabling language representation with knowledge graph.
- Melamud et al. (2017) Oren Melamud, Ido Dagan, and Jacob Goldberger. 2017. A simple language model based on PMI matrix approximations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1860–1865, Copenhagen, Denmark. Association for Computational Linguistics.
- Mikolov et al. (2013) Tomás Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.
- Moon et al. (2019) Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. OpenDialKG: Explainable conversational reasoning with attention-based walks over knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 845–854, Florence, Italy. Association for Computational Linguistics.
- Neubig and Dyer (2016) Graham Neubig and Chris Dyer. 2016. Generalizing and hybridizing count-based and neural language models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1163–1172, Austin, Texas. Association for Computational Linguistics.
- Reese et al. (2020) Justin Reese, Deepak Unni, Tiffany Callahan, Luca Cappelletti, Vida Ravanmehr, Seth Carbon, Kent Shefchek, Benjamin Good, James Balhoff, Tommaso Fontana, Hannah Blau, Nicolas Matentzoglu, Nomi Harris, Monica Munoz-Torres, Melissa Haendel, Peter Robinson, Marcin Joachimiak, and Christopher Mungall. 2020. Kg-covid-19: a framework to produce customized knowledge graphs for covid-19 response. Patterns, 2:100155.
- Saito et al. (2024) Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2024. Evaluating image review ability of vision language models.
- Sakai et al. (2024a) Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2024a. Does pre-trained language model actually infer unseen links in knowledge graph completion? In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8091–8106, Mexico City, Mexico. Association for Computational Linguistics.
- Sakai et al. (2024b) Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. 2024b. mcsqa: Multilingual commonsense reasoning dataset with unified creation strategy by language models and humans.
- Saxena et al. (2022) Apoorv Saxena, Adrian Kochsiek, and Rainer Gemulla. 2022. Sequence-to-sequence knowledge graph completion and question answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
- Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge graph embedding by relational rotation in complex space. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019.
- Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826.
- Toutanova and Chen (2015) Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66, Beijing, China. Association for Computational Linguistics.
- Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 2071–2080. JMLR.org.
- Wang et al. (2022) Liang Wang, Wei Zhao, Zhuoyu Wei, and Jingming Liu. 2022. SimKGC: Simple contrastive knowledge graph completion with pre-trained language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4281–4294, Dublin, Ireland. Association for Computational Linguistics.
- Wang et al. (2021) Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2021. KEPLER: A unified model for knowledge embedding and pre-trained language representation. Transactions of the Association for Computational Linguistics, 9:176–194.
- Xie et al. (2022) Xin Xie, Ningyu Zhang, Zhoubo Li, Shumin Deng, Hui Chen, Feiyu Xiong, Mosha Chen, and Huajun Chen. 2022. From discrimination to generation: Knowledge graph completion with generative transformer. In Companion Proceedings of the Web Conference 2022, WWW ’22, page 162–165, New York, NY, USA. Association for Computing Machinery.
- Xiong et al. (2021) Yijin Xiong, Yukun Feng, Hao Wu, Hidetaka Kamigaito, and Manabu Okumura. 2021. Fusing label embedding into BERT: An efficient improvement for text classification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1743–1750, Online. Association for Computational Linguistics.
- Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015.
- Yue et al. (2023) Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. 2023. Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi.
- Zhang et al. (2021) Ying Zhang, Hidetaka Kamigaito, and Manabu Okumura. 2021. A language model-based generative classifier for sentence-level discourse parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2432–2446, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Zhang et al. (2020a) Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. 2020a. Learning hierarchy-aware knowledge graph embeddings for link prediction. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, (AAAI20), pages 3065–3072.
- Zhang et al. (2020b) Zhiyuan Zhang, Xiaoqian Liu, Yi Zhang, Qi Su, Xu Sun, and Bin He. 2020b. Pretrain-KGE: Learning knowledge representation from pretrained language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 259–266, Online. Association for Computational Linguistics.
- Zhu et al. (2024) Xiangru Zhu, Zhixu Li, Xiaodan Wang, Xueyao Jiang, Penglei Sun, Xuwu Wang, Yanghua Xiao, and Nicholas Jing Yuan. 2024. Multi-modal knowledge graph construction and application: A survey. IEEE Transactions on Knowledge and Data Engineering, 36(2):715–735.
Appendix A Dataset Statistics
Appendix B Full Experimental Results
B.1 Results Tables
Tables 5, 6, and 7 list all results on FB15k-237, WN18RR, and YAGO3-10, as explained in §5.2. In these tables, bold scores are the best results for each subsampling type (None, Base, Freq, and Uniq), † indicates the best score for each model, SD denotes the standard deviation over the three trials, and the last column gives the temperature chosen on the development data.
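As a reading aid for the columns of these tables, the following minimal sketch shows how MRR, H@k, and a mean with standard deviation over three trials can be computed. The function names, the example ranks, and the example per-trial scores are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def mrr(ranks):
    """Mean reciprocal rank of the gold answers, as a percentage."""
    return 100.0 * sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    """Share of queries whose gold answer is ranked within the top k, as a percentage."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def mean_and_sd(values):
    """Mean and sample standard deviation over repeated trials (assumed estimator)."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical ranks of the gold answer for five test queries (1 = best).
ranks = [1, 3, 2, 10, 50]
print(mrr(ranks), hits_at(ranks, 1), hits_at(ranks, 3), hits_at(ranks, 10))

# Hypothetical MRR values from three training runs with different seeds.
print(mean_and_sd([33.2, 33.3, 33.4]))
```

Each Mean and SD pair in the tables corresponds to one call of a function like `mean_and_sd` on the three per-trial scores of a given metric.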
B.2 Training Loss and Validation MRR Curve
Figures 6, 7, and 8 show the training loss curves and validation MRR curves for each smoothing method. These figures indicate that the TANS loss converges as well as the SANS and NS losses on FB15k-237, WN18RR, and YAGO3-10 for each KGE model. The time complexity of TANS is also the same as that of the SANS and NS losses.
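To make the cost comparison concrete for the two baseline losses, here is a minimal sketch of the NS loss with uniformly weighted negatives and of the SANS loss in the form given by Sun et al. (2019). It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, scores are assumed to be higher for more plausible triplets with any margin already folded in, and TANS itself is not reproduced here.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ns_loss(pos_score, neg_scores):
    """NS loss for one positive triplet with uniformly weighted negatives."""
    neg_term = sum(log_sigmoid(-s) for s in neg_scores) / len(neg_scores)
    return -log_sigmoid(pos_score) - neg_term


def sans_loss(pos_score, neg_scores, alpha):
    """SANS loss: negatives are weighted by a softmax over alpha * score."""
    m = max(alpha * s for s in neg_scores)  # shift for numerical stability
    exps = [math.exp(alpha * s - m) for s in neg_scores]
    z = sum(exps)
    neg_term = sum((e / z) * log_sigmoid(-s) for e, s in zip(exps, neg_scores))
    return -log_sigmoid(pos_score) - neg_term


# Hypothetical scores for one positive triplet and three sampled negatives.
pos, negs = 2.0, [-1.0, 0.5, 3.0]
print(ns_loss(pos, negs), sans_loss(pos, negs, alpha=1.0))
```

Both functions make a constant number of passes over the sampled negatives, so each is linear in the number of negative samples. With `alpha=0` the softmax weights become uniform and the SANS sketch reduces to the NS sketch.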
![Refer to caption](extracted/5712115/figures/sum_query_answer_frequency_hl.png)
Dataset | Split | Tuple | Query | Entity | Relation |
---|---|---|---|---|---|
FB15k-237 | Total | 310,116 | 150,508 | 14,541 | 237 |
FB15k-237 | #Train | 272,115 | 138,694 | 14,505 | 237 |
FB15k-237 | #Valid | 17,535 | 19,750 | 9,809 | 223 |
FB15k-237 | #Test | 20,466 | 22,379 | 10,348 | 224 |
WN18RR | Total | 93,003 | 77,479 | 40,943 | 11 |
WN18RR | #Train | 86,835 | 74,587 | 40,559 | 11 |
WN18RR | #Valid | 3,034 | 5,431 | 5,173 | 11 |
WN18RR | #Test | 3,134 | 5,565 | 5,323 | 11 |
YAGO3-10 | Total | 1,089,040 | 372,775 | 123,182 | 37 |
YAGO3-10 | #Train | 1,079,040 | 371,077 | 123,143 | 37 |
YAGO3-10 | #Valid | 5,000 | 8,534 | 7,948 | 33 |
YAGO3-10 | #Test | 5,000 | 8,531 | 7,937 | 34 |

Dataset | Split | Tuple | Query | Entity | Relation |
---|---|---|---|---|---|
FB15k-237-HL | Total | 111,631 | 63,330 | 11,828 | 155 |
FB15k-237-HL | #Train | 95,244 | 55,923 | 11,600 | 155 |
FB15k-237-HL | #Valid | 7,571 | 6,918 | 4,933 | 90 |
FB15k-237-HL | #Test | 8,816 | 7,830 | 5,406 | 89 |
WN18RR-HL | Total | 14,697 | 14,675 | 12,973 | 10 |
WN18RR-HL | #Train | 13,758 | 13,785 | 12,275 | 10 |
WN18RR-HL | #Valid | 465 | 619 | 613 | 9 |
WN18RR-HL | #Test | 474 | 623 | 619 | 8 |
YAGO3-10-HL | Total | 366,079 | 182,274 | 95,788 | 29 |
YAGO3-10-HL | #Train | 362,728 | 181,196 | 95,432 | 29 |
YAGO3-10-HL | #Valid | 1,662 | 2,316 | 2,113 | 13 |
YAGO3-10-HL | #Test | 1,689 | 2,359 | 2,135 | 14 |
Appendix C Sparse Queries
C.1 Appearance Frequencies of Queries and Answers
C.2 Data Statistics
C.3 Detailed Results
Tables 8, 9, and 10 show the detailed results on our filtered sparser datasets FB15k-237-HL, WN18RR-HL, and YAGO3-10-HL, explained in §6. The notation is the same as described in §B.1.

Results on FB15k-237:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | H@3 Mean | H@3 SD | H@10 Mean | H@10 SD | Temperature |
---|---|---|---|---|---|---|---|---|---|---|---|
ComplEx | None | NS | 23.9 | 0.2 | 15.8 | 0.1 | 26.1 | 0.3 | 40.0 | 0.2 | - |
ComplEx | None | SANS | 22.3 | 0.1 | 13.8 | 0.1 | 24.2 | 0.0 | 39.5 | 0.2 | - |
ComplEx | None | TANS | 32.8 | 0.2 | 23.2 | 0.1 | 36.2 | 0.2 | 52.2 | 0.1 | -2 |
ComplEx | Base | NS | 27.2 | 0.1 | 19.1 | 0.1 | 29.5 | 0.1 | 43.0 | 0.2 | - |
ComplEx | Base | SANS | 32.3 | 0.0 | 23.0 | 0.1 | 35.4 | 0.1 | 51.2 | 0.1 | - |
ComplEx | Base | TANS | †33.3 | 0.0 | †23.8 | 0.1 | †36.9 | 0.1 | †52.7 | 0.0 | -1 |
ComplEx | Freq | NS | 25.1 | 0.2 | 17.1 | 0.3 | 27.4 | 0.2 | 41.0 | 0.2 | - |
ComplEx | Freq | SANS | 32.7 | 0.1 | 23.6 | 0.1 | 36.0 | 0.1 | 51.2 | 0.1 | - |
ComplEx | Freq | TANS | †33.3 | 0.0 | †23.8 | 0.0 | 36.8 | 0.1 | 52.1 | 0.2 | -0.5 |
ComplEx | Uniq | NS | 22.8 | 0.4 | 14.7 | 0.5 | 24.7 | 0.4 | 39.0 | 0.1 | - |
ComplEx | Uniq | SANS | 32.6 | 0.0 | 23.5 | 0.1 | 35.8 | 0.1 | 51.2 | 0.1 | - |
ComplEx | Uniq | TANS | 33.0 | 0.1 | 23.5 | 0.1 | 36.5 | 0.1 | 52.1 | 0.1 | -0.5 |
DistMult | None | NS | 23.3 | 0.1 | 15.6 | 0.1 | 25.7 | 0.1 | 38.4 | 0.1 | - |
DistMult | None | SANS | 22.3 | 0.1 | 14.0 | 0.2 | 24.1 | 0.1 | 39.2 | 0.0 | - |
DistMult | None | TANS | 31.0 | 0.1 | 21.7 | 0.1 | 34.0 | 0.1 | 49.6 | 0.1 | -1 |
DistMult | Base | NS | 25.4 | 0.1 | 17.9 | 0.1 | 27.6 | 0.1 | 40.4 | 0.1 | - |
DistMult | Base | SANS | 30.8 | 0.1 | 21.9 | 0.1 | 33.6 | 0.1 | 48.4 | 0.1 | - |
DistMult | Base | TANS | †31.5 | 0.1 | †22.4 | 0.1 | †34.6 | 0.1 | †49.7 | 0.0 | -0.5 |
DistMult | Freq | NS | 24.0 | 0.1 | 16.7 | 0.2 | 25.9 | 0.1 | 38.4 | 0.1 | - |
DistMult | Freq | SANS | 29.9 | 0.0 | 21.2 | 0.1 | 32.8 | 0.0 | 47.5 | 0.1 | - |
DistMult | Freq | TANS | 30.7 | 0.0 | 21.6 | 0.0 | 34.0 | 0.0 | 49.0 | 0.0 | -1 |
DistMult | Uniq | NS | 21.0 | 0.1 | 13.5 | 0.2 | 22.8 | 0.2 | 36.3 | 0.2 | - |
DistMult | Uniq | SANS | 29.2 | 0.0 | 20.5 | 0.1 | 31.9 | 0.0 | 46.7 | 0.0 | - |
DistMult | Uniq | TANS | 30.7 | 0.1 | 21.5 | 0.1 | 33.8 | 0.1 | 49.3 | 0.1 | -2 |
TransE | None | NS | 30.4 | 0.0 | 21.3 | 0.1 | 33.4 | 0.1 | 48.5 | 0.0 | - |
TransE | None | SANS | 33.0 | 0.1 | 22.9 | 0.1 | 37.2 | 0.1 | †53.0 | 0.1 | - |
TransE | None | TANS | 33.6 | 0.0 | 23.9 | 0.0 | 37.3 | 0.0 | †53.0 | 0.1 | -0.5 |
TransE | Base | NS | 29.4 | 0.1 | 20.0 | 0.1 | 32.8 | 0.0 | 48.1 | 0.0 | - |
TransE | Base | SANS | 33.0 | 0.1 | 23.1 | 0.1 | 36.8 | 0.1 | 52.7 | 0.1 | - |
TransE | Base | TANS | 33.0 | 0.0 | 23.1 | 0.0 | 36.8 | 0.1 | 52.7 | 0.1 | -0.1 |
TransE | Freq | NS | 29.3 | 0.1 | 20.0 | 0.1 | 32.8 | 0.1 | 47.8 | 0.1 | - |
TransE | Freq | SANS | 33.5 | 0.0 | 23.9 | 0.1 | 37.2 | 0.1 | 52.8 | 0.1 | - |
TransE | Freq | TANS | 33.5 | 0.1 | 23.9 | 0.1 | 37.2 | 0.0 | 52.8 | 0.1 | -0.1 |
TransE | Uniq | NS | 30.1 | 0.1 | 21.0 | 0.1 | 33.6 | 0.0 | 48.0 | 0.0 | - |
TransE | Uniq | SANS | 33.5 | 0.0 | 23.9 | 0.0 | 37.3 | 0.2 | 52.7 | 0.1 | - |
TransE | Uniq | TANS | †34.0 | 0.1 | †24.5 | 0.1 | †37.7 | 0.1 | †53.0 | 0.1 | 0.5 |
RotatE | None | NS | 30.3 | 0.0 | 21.4 | 0.1 | 33.2 | 0.1 | 48.4 | 0.1 | - |
RotatE | None | SANS | 32.9 | 0.1 | 22.8 | 0.1 | 36.8 | 0.0 | 53.1 | 0.2 | - |
RotatE | None | TANS | 34.1 | 0.1 | 24.6 | 0.1 | 37.7 | 0.1 | †53.3 | 0.1 | -0.5 |
RotatE | Base | NS | 29.5 | 0.0 | 20.3 | 0.0 | 32.7 | 0.1 | 47.9 | 0.0 | - |
RotatE | Base | SANS | 33.6 | 0.1 | 23.9 | 0.1 | 37.3 | 0.1 | 53.1 | 0.0 | - |
RotatE | Base | TANS | 33.8 | 0.0 | 24.2 | 0.0 | 37.4 | 0.0 | 53.0 | 0.1 | -0.5 |
RotatE | Freq | NS | 29.4 | 0.1 | 20.2 | 0.1 | 32.6 | 0.1 | 47.6 | 0.1 | - |
RotatE | Freq | SANS | 34.0 | 0.1 | 24.6 | 0.0 | 37.7 | 0.0 | 53.0 | 0.0 | - |
RotatE | Freq | TANS | 34.1 | 0.0 | 24.6 | 0.0 | 37.7 | 0.0 | 53.1 | 0.1 | -0.01 |
RotatE | Uniq | NS | 30.1 | 0.0 | 21.2 | 0.1 | 33.3 | 0.1 | 47.7 | 0.1 | - |
RotatE | Uniq | SANS | 33.9 | 0.1 | 24.4 | 0.1 | 37.6 | 0.1 | 52.9 | 0.1 | - |
RotatE | Uniq | TANS | †34.2 | 0.0 | †24.7 | 0.1 | †37.8 | 0.0 | 53.1 | 0.1 | 0.5 |
HAKE | None | NS | 30.8 | 0.1 | 21.8 | 0.1 | 33.8 | 0.1 | 48.6 | 0.1 | - |
HAKE | None | SANS | 32.8 | 0.2 | 22.7 | 0.3 | 36.9 | 0.1 | 52.8 | 0.1 | - |
HAKE | None | TANS | 34.4 | 0.1 | 24.9 | 0.1 | 37.9 | 0.2 | 53.6 | 0.0 | -0.5 |
HAKE | Base | NS | 30.4 | 0.1 | 21.6 | 0.1 | 33.3 | 0.1 | 48.2 | 0.0 | - |
HAKE | Base | SANS | 34.1 | 0.1 | 24.4 | 0.1 | 37.9 | 0.1 | 53.6 | 0.2 | - |
HAKE | Base | TANS | 34.1 | 0.0 | 24.4 | 0.0 | 37.9 | 0.0 | 53.7 | 0.0 | -0.05 |
HAKE | Freq | NS | 30.2 | 0.1 | 21.5 | 0.0 | 33.1 | 0.0 | 47.7 | 0.1 | - |
HAKE | Freq | SANS | 34.7 | 0.0 | 25.2 | 0.1 | 38.2 | 0.0 | 53.8 | 0.1 | - |
HAKE | Freq | TANS | 34.6 | 0.0 | 25.0 | 0.1 | 38.2 | 0.2 | 53.7 | 0.1 | 0.05 |
HAKE | Uniq | NS | 30.7 | 0.1 | 22.2 | 0.1 | 33.5 | 0.1 | 48.0 | 0.1 | - |
HAKE | Uniq | SANS | 34.7 | 0.1 | 25.1 | 0.1 | 38.3 | 0.1 | 53.9 | 0.1 | - |
HAKE | Uniq | TANS | †34.9 | 0.0 | †25.4 | 0.0 | †38.6 | 0.1 | †54.0 | 0.1 | 0.5 |
HousE | None | NS | 29.1 | 0.1 | 20.6 | 0.1 | 31.6 | 0.1 | 46.3 | 0.1 | - |
HousE | None | SANS | 34.7 | 0.2 | 24.8 | 0.2 | 38.5 | 0.3 | 54.4 | 0.2 | - |
HousE | None | TANS | 35.6 | 0.1 | 26.1 | 0.1 | 39.4 | 0.1 | 54.5 | 0.1 | -1 |
HousE | Base | NS | 28.1 | 0.1 | 19.6 | 0.1 | 30.9 | 0.2 | 45.1 | 0.2 | - |
HousE | Base | SANS | 35.2 | 0.2 | 25.6 | 0.2 | 39.0 | 0.2 | 54.4 | 0.3 | - |
HousE | Base | TANS | 35.6 | 0.1 | 26.1 | 0.1 | 39.4 | 0.2 | 54.5 | 0.1 | -0.5 |
HousE | Freq | NS | 27.9 | 0.1 | 19.2 | 0.1 | 30.7 | 0.2 | 45.2 | 0.1 | - |
HousE | Freq | SANS | 35.9 | 0.2 | 26.4 | 0.2 | 39.5 | 0.2 | 54.7 | 0.1 | - |
HousE | Freq | TANS | 35.8 | 0.2 | 26.4 | 0.2 | 39.6 | 0.2 | 54.7 | 0.1 | -0.01 |
HousE | Uniq | NS | 28.8 | 0.1 | 20.2 | 0.2 | 31.9 | 0.1 | 45.7 | 0.0 | - |
HousE | Uniq | SANS | 36.1 | 0.1 | †26.7 | 0.2 | 39.8 | 0.1 | †54.8 | 0.2 | - |
HousE | Uniq | TANS | †36.2 | 0.1 | †26.7 | 0.2 | †39.9 | 0.1 | †54.8 | 0.1 | 0.1 |

Results on WN18RR:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | H@3 Mean | H@3 SD | H@10 Mean | H@10 SD | Temperature |
---|---|---|---|---|---|---|---|---|---|---|---|
ComplEx | None | NS | 44.5 | 0.1 | 38.1 | 0.2 | 48.3 | 0.2 | 55.5 | 0.1 | - |
ComplEx | None | SANS | 45.0 | 0.1 | 41.0 | 0.1 | 46.5 | 0.3 | 53.3 | 0.3 | - |
ComplEx | None | TANS | 47.3 | 0.0 | 43.3 | 0.0 | 49.1 | 0.1 | 55.7 | 0.1 | -2 |
ComplEx | Base | NS | 45.0 | 0.1 | 38.9 | 0.1 | 48.6 | 0.2 | 55.7 | 0.1 | - |
ComplEx | Base | SANS | 46.9 | 0.1 | 42.7 | 0.2 | 48.5 | 0.2 | 55.5 | 0.2 | - |
ComplEx | Base | TANS | 47.7 | 0.2 | 43.6 | 0.1 | 49.3 | 0.2 | 55.9 | 0.3 | -2 |
ComplEx | Freq | NS | 45.1 | 0.1 | 38.9 | 0.1 | 48.8 | 0.2 | 56.0 | 0.2 | - |
ComplEx | Freq | SANS | 47.4 | 0.1 | 43.2 | 0.1 | 49.2 | 0.2 | 56.0 | 0.2 | - |
ComplEx | Freq | TANS | 48.0 | 0.1 | 43.9 | 0.1 | †49.7 | 0.1 | 56.1 | 0.1 | -2 |
ComplEx | Uniq | NS | 45.0 | 0.1 | 38.7 | 0.1 | 48.8 | 0.1 | 56.0 | 0.3 | - |
ComplEx | Uniq | SANS | 47.5 | 0.1 | 43.3 | 0.1 | 49.1 | 0.2 | 56.2 | 0.2 | - |
ComplEx | Uniq | TANS | †48.3 | 0.1 | †44.4 | 0.2 | 49.6 | 0.1 | †56.3 | 0.2 | -1 |
DistMult | None | NS | 38.5 | 0.2 | 30.6 | 0.3 | 42.9 | 0.2 | 52.5 | 0.1 | - |
DistMult | None | SANS | 42.4 | 0.0 | 38.2 | 0.1 | 43.7 | 0.0 | 51.0 | 0.2 | - |
DistMult | None | TANS | 44.2 | 0.1 | 40.1 | 0.1 | 45.3 | 0.1 | 53.2 | 0.2 | -2 |
DistMult | Base | NS | 39.3 | 0.2 | 31.9 | 0.2 | 43.3 | 0.1 | 53.0 | 0.2 | - |
DistMult | Base | SANS | 43.9 | 0.1 | 39.4 | 0.1 | 45.2 | 0.1 | 53.3 | 0.2 | - |
DistMult | Base | TANS | 44.6 | 0.0 | 40.5 | 0.2 | 45.7 | 0.1 | 53.9 | 0.1 | -2 |
DistMult | Freq | NS | 39.0 | 0.2 | 31.2 | 0.2 | 43.2 | 0.1 | 52.9 | 0.2 | - |
DistMult | Freq | SANS | 44.5 | 0.1 | 40.0 | 0.1 | 46.0 | 0.1 | 54.2 | 0.2 | - |
DistMult | Freq | TANS | 44.7 | 0.1 | 40.5 | 0.2 | 45.8 | 0.0 | 54.0 | 0.2 | -2 |
DistMult | Uniq | NS | 38.8 | 0.2 | 30.8 | 0.2 | 43.1 | 0.1 | 53.0 | 0.2 | - |
DistMult | Uniq | SANS | 44.7 | 0.1 | 40.1 | 0.1 | †46.2 | 0.3 | 54.3 | 0.0 | - |
DistMult | Uniq | TANS | †45.0 | 0.1 | †40.7 | 0.1 | 46.1 | 0.2 | †54.5 | 0.2 | -0.5 |
TransE | None | NS | 21.1 | 0.0 | 2.1 | 0.1 | 36.5 | 0.2 | 50.4 | 0.2 | - |
TransE | None | SANS | 22.5 | 0.1 | 1.7 | 0.1 | 40.2 | 0.1 | 52.5 | 0.2 | - |
TransE | None | TANS | 22.7 | 0.0 | 2.5 | 0.0 | 39.5 | 0.2 | 53.4 | 0.1 | 0.5 |
TransE | Base | NS | 20.3 | 0.1 | 1.6 | 0.1 | 35.1 | 0.2 | 49.9 | 0.2 | - |
TransE | Base | SANS | 22.3 | 0.0 | 1.3 | 0.1 | 40.2 | 0.1 | 52.9 | 0.1 | - |
TransE | Base | TANS | 22.4 | 0.1 | 1.4 | 0.1 | 40.1 | 0.1 | 53.0 | 0.1 | 0.1 |
TransE | Freq | NS | 21.0 | 0.1 | 1.8 | 0.1 | 36.4 | 0.2 | 51.0 | 0.2 | - |
TransE | Freq | SANS | 23.0 | 0.0 | 1.9 | 0.1 | 40.9 | 0.2 | 53.6 | 0.0 | - |
TransE | Freq | TANS | 23.1 | 0.0 | 2.1 | 0.0 | †41.0 | 0.1 | 53.8 | 0.0 | 0.1 |
TransE | Uniq | NS | 21.5 | 0.1 | 2.2 | 0.0 | 37.2 | 0.1 | 51.4 | 0.2 | - |
TransE | Uniq | SANS | 23.2 | 0.0 | 2.3 | 0.1 | 40.9 | 0.2 | 53.6 | 0.1 | - |
TransE | Uniq | TANS | †23.3 | 0.1 | †3.0 | 0.0 | 40.2 | 0.2 | †54.4 | 0.1 | 0.5 |
RotatE | None | NS | 47.0 | 0.1 | 42.5 | 0.2 | 48.6 | 0.2 | 55.8 | 0.3 | - |
RotatE | None | SANS | 47.2 | 0.1 | 42.6 | 0.1 | 49.1 | 0.1 | 56.7 | 0.0 | - |
RotatE | None | TANS | 47.3 | 0.1 | 42.6 | 0.1 | 49.1 | 0.1 | 56.7 | 0.1 | -0.01 |
RotatE | Base | NS | 47.0 | 0.0 | 42.2 | 0.1 | 48.7 | 0.1 | 56.3 | 0.1 | - |
RotatE | Base | SANS | 47.5 | 0.1 | 42.7 | 0.2 | 49.3 | 0.1 | 57.2 | 0.1 | - |
RotatE | Base | TANS | 47.5 | 0.1 | 42.7 | 0.2 | 49.3 | 0.1 | 57.1 | 0.1 | 0.01 |
RotatE | Freq | NS | 47.1 | 0.1 | 42.3 | 0.1 | 48.7 | 0.1 | 56.4 | 0.1 | - |
RotatE | Freq | SANS | 47.7 | 0.1 | †42.9 | 0.2 | 49.6 | 0.0 | 57.4 | 0.1 | - |
RotatE | Freq | TANS | 47.7 | 0.1 | 42.8 | 0.2 | 49.7 | 0.1 | 57.4 | 0.1 | 0.1 |
RotatE | Uniq | NS | 47.2 | 0.2 | 42.7 | 0.2 | 48.7 | 0.1 | 56.3 | 0.1 | - |
RotatE | Uniq | SANS | 47.7 | 0.1 | †42.9 | 0.1 | 49.6 | 0.1 | 57.2 | 0.1 | - |
RotatE | Uniq | TANS | †47.8 | 0.2 | 42.8 | 0.3 | †49.8 | 0.1 | †57.6 | 0.1 | 0.5 |
HAKE | None | NS | 48.8 | 0.1 | 44.5 | 0.1 | 50.5 | 0.2 | 57.3 | 0.1 | - |
HAKE | None | SANS | 48.9 | 0.0 | 44.5 | 0.2 | 50.6 | 0.3 | 57.7 | 0.1 | - |
HAKE | None | TANS | 48.9 | 0.0 | 44.4 | 0.1 | 50.5 | 0.3 | 57.8 | 0.1 | 0.01 |
HAKE | Base | NS | 49.2 | 0.0 | 44.6 | 0.1 | 51.1 | 0.1 | 57.9 | 0.2 | - |
HAKE | Base | SANS | 49.5 | 0.1 | 45.0 | 0.2 | 51.2 | 0.2 | 58.2 | 0.2 | - |
HAKE | Base | TANS | 49.5 | 0.1 | 45.0 | 0.2 | 51.2 | 0.3 | 58.4 | 0.2 | 0.1 |
HAKE | Freq | NS | 49.3 | 0.1 | 44.8 | 0.1 | 51.3 | 0.2 | 58.0 | 0.2 | - |
HAKE | Freq | SANS | 49.7 | 0.1 | 45.2 | 0.2 | 51.5 | 0.1 | 58.4 | 0.2 | - |
HAKE | Freq | TANS | 49.7 | 0.0 | 45.2 | 0.2 | 51.6 | 0.3 | 58.4 | 0.2 | -0.01 |
HAKE | Uniq | NS | 49.4 | 0.2 | 44.9 | 0.2 | 51.3 | 0.2 | 57.8 | 0.2 | - |
HAKE | Uniq | SANS | †49.9 | 0.0 | 45.3 | 0.1 | †51.8 | 0.2 | †58.6 | 0.2 | - |
HAKE | Uniq | TANS | †49.9 | 0.1 | †45.4 | 0.1 | †51.8 | 0.2 | 58.5 | 0.2 | 0.05 |
HousE | None | NS | 47.4 | 0.1 | 41.7 | 0.1 | 50.2 | 0.1 | 57.3 | 0.1 | - |
HousE | None | SANS | 49.7 | 0.1 | 44.8 | 0.2 | 51.5 | 0.1 | 59.5 | 0.1 | - |
HousE | None | TANS | 50.2 | 0.1 | 45.3 | 0.1 | 52.0 | 0.1 | 60.0 | 0.1 | -0.5 |
HousE | Base | NS | 48.1 | 0.1 | 42.4 | 0.1 | 50.9 | 0.1 | 58.5 | 0.2 | - |
HousE | Base | SANS | 51.2 | 0.1 | 46.7 | 0.1 | 53.0 | 0.2 | 60.3 | 0.1 | - |
HousE | Base | TANS | 51.3 | 0.1 | 46.7 | 0.2 | 53.0 | 0.0 | 60.4 | 0.1 | 0.05 |
HousE | Freq | NS | 48.1 | 0.2 | 42.5 | 0.3 | 50.9 | 0.2 | 58.5 | 0.2 | - |
HousE | Freq | SANS | †51.4 | 0.1 | †46.8 | 0.1 | †53.2 | 0.3 | †60.5 | 0.1 | - |
HousE | Freq | TANS | 51.3 | 0.2 | 46.7 | 0.2 | 53.1 | 0.3 | †60.5 | 0.1 | 0.05 |
HousE | Uniq | NS | 48.1 | 0.1 | 42.5 | 0.1 | 50.8 | 0.2 | 58.1 | 0.1 | - |
HousE | Uniq | SANS | 51.2 | 0.2 | †46.8 | 0.2 | 52.7 | 0.1 | 60.1 | 0.1 | - |
HousE | Uniq | TANS | 51.1 | 0.3 | 46.7 | 0.5 | 52.7 | 0.1 | 60.0 | 0.1 | -0.1 |

Results on YAGO3-10:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | H@3 Mean | H@3 SD | H@10 Mean | H@10 SD | Temperature |
---|---|---|---|---|---|---|---|---|---|---|---|
RotatE | None | NS | 43.5 | 0.1 | 32.8 | 0.2 | 49.1 | 0.2 | 63.7 | 0.3 | - |
RotatE | None | SANS | 49.6 | 0.2 | 39.9 | 0.1 | 55.3 | 0.3 | 67.3 | 0.2 | - |
RotatE | None | TANS | 49.6 | 0.2 | 40.0 | 0.2 | 55.4 | 0.5 | 67.2 | 0.3 | -0.05 |
RotatE | Base | NS | 44.8 | 0.1 | 34.5 | 0.3 | 50.0 | 0.2 | 64.7 | 0.2 | - |
RotatE | Base | SANS | 49.6 | 0.3 | 40.1 | 0.3 | 55.2 | 0.4 | 67.4 | 0.3 | - |
RotatE | Base | TANS | 49.5 | 0.3 | 40.1 | 0.3 | 55.0 | 0.5 | 67.3 | 0.3 | -0.05 |
RotatE | Freq | NS | 44.8 | 0.2 | 34.5 | 0.3 | 50.0 | 0.1 | 64.7 | 0.2 | - |
RotatE | Freq | SANS | 49.9 | 0.2 | 40.5 | 0.3 | 55.5 | 0.5 | 67.4 | 0.3 | - |
RotatE | Freq | TANS | 49.9 | 0.2 | 40.5 | 0.3 | 55.5 | 0.5 | 67.4 | 0.2 | 0.01 |
RotatE | Uniq | NS | 44.4 | 0.2 | 34.0 | 0.3 | 49.8 | 0.2 | 64.3 | 0.2 | - |
RotatE | Uniq | SANS | 50.0 | 0.3 | 40.6 | 0.2 | 55.6 | 0.3 | 67.5 | 0.2 | - |
RotatE | Uniq | TANS | †50.1 | 0.2 | †40.7 | 0.1 | †55.7 | 0.3 | †67.6 | 0.3 | 0.05 |
HAKE | None | NS | 47.4 | 0.3 | 36.6 | 0.5 | 53.9 | 0.1 | 67.0 | 0.1 | - |
HAKE | None | SANS | 53.5 | 0.2 | 44.6 | 0.3 | 59.1 | 0.4 | 69.0 | 0.2 | - |
HAKE | None | TANS | 53.7 | 0.1 | 45.3 | 0.3 | 59.0 | 0.1 | 68.8 | 0.1 | 0.05 |
HAKE | Base | NS | 48.8 | 0.3 | 38.4 | 0.4 | 55.0 | 0.2 | 68.1 | 0.3 | - |
HAKE | Base | SANS | 54.6 | 0.2 | 46.2 | 0.3 | 59.9 | 0.2 | 69.6 | 0.2 | - |
HAKE | Base | TANS | 54.5 | 0.2 | 45.9 | 0.3 | 59.9 | 0.2 | 69.9 | 0.1 | -0.1 |
HAKE | Freq | NS | 49.3 | 0.2 | 39.1 | 0.3 | 55.4 | 0.1 | 68.1 | 0.2 | - |
HAKE | Freq | SANS | 54.6 | 0.4 | 46.0 | 0.7 | 60.2 | 0.1 | 69.6 | 0.3 | - |
HAKE | Freq | TANS | 54.8 | 0.2 | 46.4 | 0.3 | 60.1 | 0.1 | 69.6 | 0.3 | 0.05 |
HAKE | Uniq | NS | 45.2 | 0.1 | 34.3 | 0.1 | 51.1 | 0.1 | 65.8 | 0.3 | - |
HAKE | Uniq | SANS | †55.2 | 0.3 | †46.8 | 0.5 | †60.5 | 0.2 | †70.0 | 0.3 | - |
HAKE | Uniq | TANS | 55.1 | 0.2 | †46.8 | 0.3 | 60.3 | 0.1 | 69.9 | 0.2 | -0.1 |
HousE | None | NS | 29.2 | 0.0 | 18.3 | 0.1 | 33.6 | 0.2 | 50.1 | 0.2 | - |
HousE | None | SANS | 54.8 | 1.3 | 46.8 | 1.3 | 59.7 | 1.2 | 68.9 | 1.2 | - |
HousE | None | TANS | 54.8 | 1.2 | 46.9 | 1.2 | 59.6 | 1.2 | 68.8 | 1.1 | 0.01 |
HousE | Base | NS | 29.6 | 0.1 | 19.8 | 0.1 | 33.6 | 0.2 | 48.9 | 0.1 | - |
HousE | Base | SANS | 56.7 | 0.1 | 48.6 | 0.2 | 61.7 | 0.2 | 71.3 | 0.1 | - |
HousE | Base | TANS | 57.0 | 0.2 | 49.0 | 0.4 | 61.9 | 0.3 | †71.5 | 0.2 | -0.1 |
HousE | Freq | NS | 27.3 | 0.8 | 17.5 | 0.9 | 31.0 | 0.8 | 46.6 | 0.8 | - |
HousE | Freq | SANS | 57.0 | 0.1 | 49.0 | 0.2 | 62.0 | 0.1 | 71.4 | 0.1 | - |
HousE | Freq | TANS | 57.2 | 0.1 | 49.3 | 0.1 | †62.3 | 0.1 | 71.4 | 0.1 | -0.1 |
HousE | Uniq | NS | 28.1 | 0.2 | 18.2 | 0.4 | 31.8 | 0.1 | 47.6 | 0.0 | - |
HousE | Uniq | SANS | 57.2 | 0.1 | 49.3 | 0.2 | 62.0 | 0.0 | 71.4 | 0.2 | - |
HousE | Uniq | TANS | †57.3 | 0.2 | †49.5 | 0.3 | 62.2 | 0.1 | †71.5 | 0.1 | -0.05 |

Results on FB15k-237-HL:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | Temperature |
---|---|---|---|---|---|---|---|
HAKE | None | NS | 38.1 | 0.3 | 28.4 | 0.5 | - |
HAKE | None | SANS | 35.2 | 0.2 | 24.5 | 0.3 | - |
HAKE | None | TANS | 41.1 | 0.1 | 33.0 | 0.1 | -1 |
HAKE | Base | NS | 40.5 | 0.1 | 31.8 | 0.2 | - |
HAKE | Base | SANS | 38.4 | 0.2 | 28.9 | 0.2 | - |
HAKE | Base | TANS | 41.8 | 0.1 | 33.6 | 0.2 | -1 |
HAKE | Freq | NS | 41.1 | 0.1 | 32.8 | 0.1 | - |
HAKE | Freq | SANS | 40.2 | 0.0 | 31.5 | 0.1 | - |
HAKE | Freq | TANS | †42.0 | 0.1 | †33.7 | 0.1 | -1 |
HAKE | Uniq | NS | 41.5 | 0.1 | 33.2 | 0.1 | - |
HAKE | Uniq | SANS | 41.1 | 0.0 | 32.8 | 0.0 | - |
HAKE | Uniq | TANS | 41.9 | 0.2 | 33.5 | 0.2 | -0.1 |
RotatE | None | NS | 40.0 | 0.1 | 30.8 | 0.1 | - |
RotatE | None | SANS | 36.3 | 0.1 | 25.3 | 0.2 | - |
RotatE | None | TANS | 41.5 | 0.0 | 33.1 | 0.1 | -1 |
RotatE | Base | NS | 41.8 | 0.1 | 33.6 | 0.1 | - |
RotatE | Base | SANS | 40.7 | 0.1 | 31.7 | 0.2 | - |
RotatE | Base | TANS | 42.0 | 0.1 | 33.8 | 0.1 | -0.5 |
RotatE | Freq | NS | 41.3 | 0.1 | 33.2 | 0.1 | - |
RotatE | Freq | SANS | 42.0 | 0.2 | 33.6 | 0.3 | - |
RotatE | Freq | TANS | †42.3 | 0.0 | †34.1 | 0.1 | -0.5 |
RotatE | Uniq | NS | 41.7 | 0.1 | 33.7 | 0.2 | - |
RotatE | Uniq | SANS | 42.2 | 0.1 | 33.8 | 0.2 | - |
RotatE | Uniq | TANS | 42.1 | 0.1 | 33.8 | 0.2 | -0.05 |
HousE | None | NS | 39.1 | 0.2 | 29.8 | 0.2 | - |
HousE | None | SANS | 37.0 | 0.2 | 26.2 | 0.4 | - |
HousE | None | TANS | 42.3 | 0.1 | 34.1 | 0.2 | -2 |
HousE | Base | NS | 40.3 | 0.1 | 31.3 | 0.2 | - |
HousE | Base | SANS | 40.5 | 0.4 | 31.3 | 0.4 | - |
HousE | Base | TANS | 42.4 | 0.2 | 34.2 | 0.3 | -2 |
HousE | Freq | NS | 39.8 | 0.3 | 31.0 | 0.3 | - |
HousE | Freq | SANS | 42.1 | 0.2 | 33.8 | 0.2 | - |
HousE | Freq | TANS | †42.8 | 0.3 | †34.8 | 0.4 | -1 |
HousE | Uniq | NS | 40.5 | 0.2 | 31.9 | 0.2 | - |
HousE | Uniq | SANS | 42.4 | 0.2 | 34.4 | 0.2 | - |
HousE | Uniq | TANS | 42.5 | 0.1 | 34.5 | 0.0 | -1 |

Results on WN18RR-HL:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | Temperature |
---|---|---|---|---|---|---|---|
HAKE | None | NS | 10.8 | 0.1 | 8.7 | 0.2 | - |
HAKE | None | SANS | 10.3 | 0.1 | 7.8 | 0.1 | - |
HAKE | None | TANS | 13.9 | 0.2 | †12.1 | 0.2 | -2 |
HAKE | Base | NS | 12.1 | 0.2 | 9.5 | 0.3 | - |
HAKE | Base | SANS | 11.1 | 0.1 | 9.1 | 0.1 | - |
HAKE | Base | TANS | 13.7 | 0.1 | 11.7 | 0.3 | -2 |
HAKE | Freq | NS | 12.4 | 0.1 | 10.4 | 0.1 | - |
HAKE | Freq | SANS | 11.9 | 0.2 | 9.5 | 0.2 | - |
HAKE | Freq | TANS | †14.2 | 0.5 | 11.9 | 0.4 | -2 |
HAKE | Uniq | NS | 13.3 | 0.3 | 11.3 | 0.3 | - |
HAKE | Uniq | SANS | 11.9 | 0.2 | 9.7 | 0.2 | - |
HAKE | Uniq | TANS | 14.1 | 0.2 | 11.7 | 0.2 | -2 |
RotatE | None | NS | 14.2 | 0.2 | 11.8 | 0.3 | - |
RotatE | None | SANS | 13.9 | 0.3 | 11.7 | 0.3 | - |
RotatE | None | TANS | 14.4 | 0.1 | 11.8 | 0.2 | -2 |
RotatE | Base | NS | 13.9 | 0.2 | 11.5 | 0.2 | - |
RotatE | Base | SANS | 14.1 | 0.3 | 11.7 | 0.3 | - |
RotatE | Base | TANS | 14.5 | 0.1 | 11.7 | 0.1 | -2 |
RotatE | Freq | NS | 14.4 | 0.1 | 12.0 | 0.1 | - |
RotatE | Freq | SANS | 14.3 | 0.4 | 12.0 | 0.3 | - |
RotatE | Freq | TANS | †15.1 | 0.1 | 12.2 | 0.1 | -2 |
RotatE | Uniq | NS | 14.4 | 0.2 | 12.2 | 0.1 | - |
RotatE | Uniq | SANS | 14.2 | 0.2 | 11.9 | 0.2 | - |
RotatE | Uniq | TANS | †15.1 | 0.2 | †12.3 | 0.3 | -2 |
HousE | None | NS | 10.7 | 1.8 | 8.4 | 1.4 | - |
HousE | None | SANS | 11.7 | 1.1 | 9.5 | 0.9 | - |
HousE | None | TANS | 13.4 | 0.4 | 11.0 | 0.4 | -2 |
HousE | Base | NS | 9.9 | 0.4 | 8.4 | 0.4 | - |
HousE | Base | SANS | 11.5 | 0.2 | 9.5 | 0.2 | - |
HousE | Base | TANS | 13.4 | 0.2 | 11.3 | 0.3 | -2 |
HousE | Freq | NS | †13.9 | 0.1 | 11.8 | 0.2 | - |
HousE | Freq | SANS | 13.8 | 0.2 | 11.9 | 0.3 | - |
HousE | Freq | TANS | †13.9 | 0.3 | †12.0 | 0.2 | 0.1 |
HousE | Uniq | NS | 13.7 | 0.1 | 11.6 | 0.1 | - |
HousE | Uniq | SANS | 13.8 | 0.2 | 11.6 | 0.2 | - |
HousE | Uniq | TANS | 13.8 | 0.2 | 11.7 | 0.3 | -0.05 |

Results on YAGO3-10-HL:

Model | Subsampling | Loss | MRR Mean | MRR SD | H@1 Mean | H@1 SD | Temperature |
---|---|---|---|---|---|---|---|
HAKE | None | NS | 45.9 | 0.0 | 36.9 | 0.1 | - |
HAKE | None | SANS | 47.8 | 0.4 | 40.0 | 0.6 | - |
HAKE | None | TANS | 49.2 | 0.4 | 39.8 | 0.7 | -0.5 |
HAKE | Base | NS | 50.2 | 0.3 | 43.0 | 0.3 | - |
HAKE | Base | SANS | 47.7 | 0.4 | 40.5 | 0.7 | - |
HAKE | Base | TANS | 50.1 | 0.3 | 41.4 | 0.3 | -0.5 |
HAKE | Freq | NS | †50.8 | 0.3 | †43.3 | 0.2 | - |
HAKE | Freq | SANS | 48.8 | 0.1 | 41.3 | 0.2 | - |
HAKE | Freq | TANS | 49.7 | 0.3 | 41.0 | 0.2 | -0.5 |
HAKE | Uniq | NS | 49.4 | 0.2 | 40.8 | 0.2 | - |
HAKE | Uniq | SANS | 46.9 | 0.4 | 39.8 | 0.5 | - |
HAKE | Uniq | TANS | 49.4 | 0.6 | 40.6 | 0.8 | -0.5 |
RotatE | None | NS | 38.0 | 0.1 | 28.7 | 0.3 | - |
RotatE | None | SANS | 41.3 | 0.1 | 32.3 | 0.2 | - |
RotatE | None | TANS | 43.5 | 0.1 | 34.8 | 0.2 | -0.5 |
RotatE | Base | NS | 40.6 | 0.2 | 31.8 | 0.5 | - |
RotatE | Base | SANS | 43.8 | 0.2 | 35.1 | 0.1 | - |
RotatE | Base | TANS | 43.8 | 0.2 | 35.2 | 0.1 | -0.05 |
RotatE | Freq | NS | 40.3 | 0.2 | 31.4 | 0.4 | - |
RotatE | Freq | SANS | 43.5 | 0.2 | 34.6 | 0.1 | - |
RotatE | Freq | TANS | 43.7 | 0.0 | 35.1 | 0.1 | -0.1 |
RotatE | Uniq | NS | 40.2 | 0.0 | 31.3 | 0.2 | - |
RotatE | Uniq | SANS | 43.9 | 0.1 | 35.1 | 0.2 | - |
RotatE | Uniq | TANS | †44.1 | 0.1 | †35.4 | 0.3 | -0.1 |
HousE | None | NS | 37.8 | 0.3 | 26.9 | 0.4 | - |
HousE | None | SANS | 50.3 | 0.1 | 40.7 | 0.3 | - |
HousE | None | TANS | †52.5 | 0.5 | †45.4 | 0.3 | -0.5 |
HousE | Base | NS | 42.8 | 1.2 | 34.3 | 1.9 | - |
HousE | Base | SANS | 51.9 | 0.3 | 44.4 | 0.2 | - |
HousE | Base | TANS | 51.9 | 0.6 | 44.3 | 0.8 | 0.05 |
HousE | Freq | NS | 39.7 | 0.8 | 29.9 | 1.5 | - |
HousE | Freq | SANS | 48.6 | 1.7 | 40.0 | 1.4 | - |
HousE | Freq | TANS | 52.0 | 0.1 | 44.5 | 0.3 | -1 |
HousE | Uniq | NS | 41.0 | 0.1 | 31.6 | 0.1 | - |
HousE | Uniq | SANS | 49.4 | 0.3 | 41.1 | 1.1 | - |
HousE | Uniq | TANS | 52.2 | 0.1 | 44.7 | 0.1 | -0.05 |

![Refer to caption](extracted/5712115/figures/train_valid_curves_fb.png)
![Refer to caption](extracted/5712115/figures/train_valid_curves_wn.png)
![Refer to caption](extracted/5712115/figures/train_valid_curves_yago.png)