Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding

Introduction

This is the PyTorch implementation of the TANS model for knowledge graph embedding (KGE). We propose a new subsampling method, "TANS", together with a theoretical explanation and an experimental validation of recent negative sampling methods used for knowledge graph embedding.

Abstract: Knowledge Graphs (KGs) are fundamental resources in knowledge-intensive tasks in NLP. Due to the limitation of manually creating KGs, KG Completion (KGC) has an important role in automatically completing KGs by scoring their links with KG Embedding (KGE). To handle many entities in training, KGE relies on Negative Sampling (NS) loss that can reduce the computational cost by sampling. Since the appearance frequencies for each link are at most one in KGs, sparsity is an essential and inevitable problem. The NS loss is no exception. As a solution, the NS loss in KGE relies on smoothing methods like Self-Adversarial Negative Sampling (SANS) and subsampling. However, it is uncertain what kind of smoothing method is suitable for this purpose due to the lack of theoretical understanding. This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and induces a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the characteristics of the conventional smoothing methods. Experimental results of TransE, DistMult, ComplEx, RotatE, HAKE, and HousE on FB15k-237, WN18RR, and YAGO3-10 datasets and their sparser subsets show the soundness of our interpretation and performance improvement by our TANS.

Implemented features

Models:

Evaluation Metrics:

MRR, MR, HITS@1, HITS@3, HITS@10 (filtered)
AUC-PR (for Countries data sets)
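
As a rough illustration of how the ranking metrics listed above are commonly computed, the sketch below derives MRR, MR, and HITS@k from a list of 1-based ranks, and shows one way to obtain a filtered rank. The function names `ranking_metrics` and `filtered_rank` are hypothetical and are not taken from this repository; the actual evaluation code lives in codes/ and may differ.

```python
from statistics import mean


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, MR and HITS@k from 1-based ranks of the true entities."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    metrics = {
        "MRR": mean(1.0 / r for r in ranks),  # mean reciprocal rank
        "MR": mean(ranks),                    # mean rank
    }
    for k in ks:
        # Fraction of test triples whose true entity is ranked within the top k.
        metrics[f"HITS@{k}"] = mean(1.0 if r <= k else 0.0 for r in ranks)
    return metrics


def filtered_rank(scores, true_index, known_true_indices):
    """Rank of the true candidate, skipping other candidates known to be true.

    scores: one score per candidate entity (higher means more plausible).
    known_true_indices: candidates that also form true triples elsewhere
    in the data; the "filtered" setting does not count them as errors.
    """
    true_score = scores[true_index]
    rank = 1
    for i, s in enumerate(scores):
        if i == true_index or i in known_true_indices:
            continue
        if s > true_score:
            rank += 1
    return rank
```

For example, `ranking_metrics([1, 2, 4])` averages the reciprocal ranks 1, 1/2, and 1/4 for MRR, and counts two of the three ranks as hits for HITS@3.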

Loss Function:

Uniform Negative Sampling
Self-Adversarial Negative Sampling (SANS)
Subsampling (Base, Freq, Uniq)
Triplet Adaptive Negative Sampling (TANS)
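
To make the difference between uniform negative sampling and SANS concrete, here is a minimal pure-Python sketch of the NS loss for a single positive triple. It assumes the usual formulation in which scores are "higher is more plausible" (for example, gamma minus a distance) and SANS weights each negative by a softmax over `alpha * score`. The function `ns_loss` and its arguments are hypothetical names for illustration; the subsampling variants and TANS are not covered here, and the repository's PyTorch implementation may differ in detail (in an autograd setting the SANS weights are typically treated as constants).

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax(values):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ns_loss(pos_score, neg_scores, alpha=None):
    """NS loss for one positive triple and its sampled negatives.

    alpha=None  -> uniform weights over the negatives.
    alpha=float -> SANS weights, softmax(alpha * neg_score).
    """
    if alpha is None:
        weights = [1.0 / len(neg_scores)] * len(neg_scores)
    else:
        weights = softmax([alpha * s for s in neg_scores])
    pos_term = -log_sigmoid(pos_score)
    neg_term = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
    # Averaging the two terms is an assumption of this sketch.
    return (pos_term + neg_term) / 2


uniform = ns_loss(0.0, [2.0, -2.0])
sans = ns_loss(0.0, [2.0, -2.0], alpha=1.0)
print(uniform < sans)  # SANS puts more weight on the high-scoring (hard) negative
```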

Usage

Knowledge Graph Data:

entities.dict: a dictionary mapping entities to unique ids
relations.dict: a dictionary mapping relations to unique ids
train.txt: the KGE model is trained to fit this data set
valid.txt: create a blank file if no validation data is available
test.txt: the KGE model is evaluated on this data set
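
The following sketch shows one way these files could be read. The file layout is an assumption that the text above does not confirm: each line of a .dict file is taken to be `id<TAB>name`, and each line of train.txt, valid.txt, and test.txt is taken to be `head<TAB>relation<TAB>tail`. The helper names (`read_dict`, `read_triples`, `load_kg`) are hypothetical; check the loading code in codes/run.py for the actual format.

```python
import os


def read_dict(path):
    """Read an assumed `id<TAB>name` file into a name -> id mapping."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            idx, name = line.split("\t", 1)
            mapping[name] = int(idx)
    return mapping


def read_triples(path, entity2id, relation2id):
    """Read assumed `head<TAB>relation<TAB>tail` lines as id triples."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # a blank valid.txt simply yields no triples
            h, r, t = line.split("\t")
            triples.append((entity2id[h], relation2id[r], entity2id[t]))
    return triples


def load_kg(data_path):
    """Load both dictionaries and the three splits from one data directory."""
    entity2id = read_dict(os.path.join(data_path, "entities.dict"))
    relation2id = read_dict(os.path.join(data_path, "relations.dict"))
    splits = {
        name: read_triples(
            os.path.join(data_path, name + ".txt"), entity2id, relation2id
        )
        for name in ("train", "valid", "test")
    }
    return entity2id, relation2id, splits
```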

Train

For example, this command trains a RotatE model on the FB15k dataset with GPU 0.

CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train \
 --cuda \
 --do_valid \
 --do_test \
 --data_path data/FB15k \
 --model RotatE \
 -n 256 -b 1024 -d 1000 \
 -g 24.0 -a 1.0 -adv \
 -lr 0.0001 --max_steps 150000 \
 -save models/RotatE_FB15k_0 --test_batch_size 16 -de

Check argparse configuration at codes/run.py for more arguments and more details.
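
For orientation, the sketch below builds an argparse parser that accepts exactly the flags used in the training command above. The `dest` names and the meanings in the comments (for example, reading `-n` as the negative sample size or `-de` as a doubled entity embedding) are our interpretation of the short flags and are not confirmed by this README; codes/run.py remains the authoritative definition and contains more arguments.

```python
import argparse


def build_parser():
    """Parser covering only the flags shown in the example command."""
    p = argparse.ArgumentParser(description="Sketch of the training flags")
    p.add_argument("--do_train", action="store_true")
    p.add_argument("--do_valid", action="store_true")
    p.add_argument("--do_test", action="store_true")
    p.add_argument("--cuda", action="store_true")
    p.add_argument("--data_path", type=str)
    p.add_argument("--model", type=str)
    # The dest names below are hypothetical readings of the short flags.
    p.add_argument("-n", dest="negative_sample_size", type=int)
    p.add_argument("-b", dest="batch_size", type=int)
    p.add_argument("-d", dest="hidden_dim", type=int)
    p.add_argument("-g", dest="gamma", type=float)
    p.add_argument("-a", dest="adversarial_temperature", type=float)
    p.add_argument("-adv", dest="adversarial_sampling", action="store_true")
    p.add_argument("-lr", dest="learning_rate", type=float)
    p.add_argument("--max_steps", type=int)
    p.add_argument("-save", dest="save_path", type=str)
    p.add_argument("--test_batch_size", type=int)
    p.add_argument("-de", dest="double_entity_embedding", action="store_true")
    return p


args = build_parser().parse_args(
    "--do_train --cuda --do_valid --do_test --data_path data/FB15k "
    "--model RotatE -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv "
    "-lr 0.0001 --max_steps 150000 -save models/RotatE_FB15k_0 "
    "--test_batch_size 16 -de".split()
)
print(args.model, args.hidden_dim, args.gamma)
```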

Test

CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_test --cuda -init $SAVE

Reproducing the best results

To reproduce the results in our paper, Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding, you can run the bash commands in config.sh.

The run.sh script provides an easy way to search hyper-parameters:

bash run.sh train RotatE FB15k 0 0 1024 256 1000 24.0 1.0 0.0001 200000 16 -de

Using the library

The Python library is organized around 3 objects:

TrainDataset (dataloader.py): prepare data stream for training
TestDataSet (dataloader.py): prepare data stream for evaluation
KGEModel (model.py): calculate triple score and provide train/test API

The run.py file contains the main function, which parses arguments, reads data, initializes the model, and provides the training loop.

Add your own model to model.py like:

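def TransE(self, head, relation, tail, mode):
    # Both branches compute head + relation - tail; only the grouping differs.
    if mode == 'head-batch':
        score = head + (relation - tail)
    else:
        score = (head + relation) - tail

    # Score is the margin gamma minus the L1 distance along the embedding
    # dimension (dim=2), so a higher score means a more plausible triple.
    score = self.gamma.item() - torch.norm(score, p=1, dim=2)
    return score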
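
To see what this scoring function computes, the sketch below reproduces it for a single triple using plain Python lists instead of tensors. The standalone function `transe_score` is a hypothetical illustration, not part of model.py. Our reading is that the two branches in the method exist because of tensor shapes during batched negative sampling, since they are mathematically equivalent; this is an assumption, not something the README states.

```python
def transe_score(head, relation, tail, gamma, mode="tail-batch"):
    """Return gamma - ||head + relation - tail||_1 for one triple.

    head, relation, tail: embeddings as equal-length lists of floats.
    """
    if mode == "head-batch":
        diff = [h + (r - t) for h, r, t in zip(head, relation, tail)]
    else:
        diff = [(h + r) - t for h, r, t in zip(head, relation, tail)]
    return gamma - sum(abs(x) for x in diff)


head = [1.0, 0.0]
relation = [0.5, 0.5]
tail = [1.5, 1.0]

# head + relation - tail = [0.0, -0.5], so the L1 distance is 0.5.
print(transe_score(head, relation, tail, gamma=24.0))                     # 23.5
print(transe_score(head, relation, tail, gamma=24.0, mode="head-batch"))  # 23.5
```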

Citation

If you use this code, please cite the following paper:

??


xincanfeng/ss_kge
