Showing 1–41 of 41 results for author: de Silva, N

  1. arXiv:2407.12336  [pdf, other]

    cs.CL

    M2DS: Multilingual Dataset for Multi-document Summarisation

    Authors: Kushan Hewapathirana, Nisansa de Silva, C. D. Athuraliya

    Abstract: In the rapidly evolving digital era, there is an increasing demand for concise information as individuals seek to distil key insights from various sources. Recent attention from researchers on Multi-document Summarisation (MDS) has resulted in diverse datasets covering customer reviews, academic papers, medical and legal documents, and news articles. However, the English-centric nature of these da…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.02834  [pdf, ps, other]

    cs.CL

    Aspect-Based Sentiment Analysis Techniques: A Comparative Study

    Authors: Dineth Jayakody, Koshila Isuranda, A V A Malkith, Nisansa de Silva, Sachintha Rajith Ponnamperuma, G G N Sandamali, K L K Sudheera

    Abstract: Since the dawn of the digitalisation era, customer feedback and online reviews are unequivocally major sources of insights for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sentiment analysis is one such method instrumental in…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  3. SHADE: Semantic Hypernym Annotator for Domain-specific Entities -- DnD Domain Use Case

    Authors: Akila Peiris, Nisansa de Silva

    Abstract: Manual data annotation is an important NLP task but one that takes considerable amount of resources and effort. In spite of the costs, labeling and categorizing entities is essential for NLP tasks such as semantic evaluation. Even though annotation can be done by non-experts in most cases, due to the fact that this requires human labor, the process is costly. Another major challenge encountered in…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.06021  [pdf, other]

    cs.CL

    Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research

    Authors: Surangika Ranathunga, Nisansa de Silva, Dilith Jayakody, Aloka Fernando

    Abstract: We analysed a sample of NLP research papers archived in ACL Anthology as an attempt to quantify the degree of openness and the benefit of such an open culture in the NLP community. We observe that papers published in different NLP venues show different patterns related to artefact reuse. We also note that more than 30% of the papers we analysed do not release their artefacts publicly, despite prom…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Will appear in ACL 2024

  5. Fine Tuning Named Entity Extraction Models for the Fantasy Domain

    Authors: Aravinth Sivaganeshan, Nisansa de Silva

    Abstract: Named Entity Recognition (NER) is a sequence classification Natural Language Processing task where entities are identified in the text and classified into predefined categories. It acts as a foundation for most information extraction systems. Dungeons and Dragons (D&D) is an open-ended tabletop fantasy game with its own diverse lore. DnD entities are domain-specific and are thus unrecognizable by…

    Submitted 16 February, 2024; originally announced February 2024.

  6. arXiv:2402.07446  [pdf, other]

    cs.CL

    Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

    Authors: Surangika Ranathunga, Nisansa de Silva, Menan Velayuthan, Aloka Fernando, Charitha Rathnayake

    Abstract: We conducted a detailed analysis on the quality of web-mined corpora for two low-resource languages (making three language pairs, English-Sinhala, English-Tamil and Sinhala-Tamil). We ranked each corpus according to a similarity measure and carried out an intrinsic and extrinsic evaluation on different portions of this ranked corpus. We show that there are significant quality differences between d…

    Submitted 14 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  7. arXiv:2401.07356  [pdf, other]

    cs.SE

    BUGSPHP: A dataset for Automated Program Repair in PHP

    Authors: K. D. Pramod, W. T. N. De Silva, W. U. K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake

    Abstract: Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of a benchmark PHP bug dataset. This is surprising given that PHP has been one of the most widely used server-side languages for over two decades, being used in a…

    Submitted 21 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  8. arXiv:2311.10436  [pdf, other]

    cs.CL

    Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language

    Authors: Kasun Wickramasinghe, Nisansa de Silva

    Abstract: Since their inception, embeddings have become a primary ingredient in many flavours of Natural Language Processing (NLP) tasks supplanting earlier types of representation. Even though multilingual embeddings have been used for the increasing number of multilingual tasks, due to the scarcity of parallel training data, low-resource languages such as Sinhala, tend to focus more on monolingual embeddi…

    Submitted 17 November, 2023; originally announced November 2023.

    Journal ref: https://aclanthology.org/2023.paclic-1.42/

  9. arXiv:2311.10357  [pdf, ps, other]

    quant-ph cs.DS

    Fast algorithms for classical specifications of stabiliser states and Clifford gates

    Authors: Nadish de Silva, Wilfred Salmon, Ming Yin

    Abstract: The stabiliser formalism plays a central role in quantum computing, error correction, and fault-tolerance. Stabiliser states are used to encode computational basis states. Clifford gates are those which can be easily performed fault-tolerantly in the most common error correction schemes. Their mathematical properties are the subject of significant research interest. Conversions between and verif…

    Submitted 26 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Python implementations available at https://github.com/ndesilva/stabiliser-tools. New in v2: new algorithm for extracting the stabiliser tableau of a Clifford gate matrix that is exponentially faster compared to v1, more thorough complexity analyses. New in v3: new and faster algorithms, comparisons with existing implementations

  10. arXiv:2310.08083  [pdf, other]

    cs.SE cs.IR

    On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization

    Authors: Junayed Mahmud, Nadeeshan De Silva, Safwat Ali Khan, Seyed Hooman Mostafavi, SM Hasan Mansur, Oscar Chaparro, Andrian Marcus, Kevin Moran

    Abstract: One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable sem…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 13 pages, to appear in the Proceedings of the 46th International Conference on Software Engineering (ICSE'24)

  11. arXiv:2309.17171  [pdf, other]

    cs.CL cs.LG

    Comparative Analysis of Named Entity Recognition in the Dungeons and Dragons Domain

    Authors: Gayashan Weerasundara, Nisansa de Silva

    Abstract: Many NLP tasks, although well-resolved for general English, face challenges in specific domains like fantasy literature. This is evident in Named Entity Recognition (NER), which detects and categorizes entities in text. We analyzed 10 NER models on 7 Dungeons and Dragons (D&D) adventure books to assess domain-specific performance. Using open-source Large Language Models, we annotated named entitie…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: 9 pages

  12. Multi-document Summarization: A Comparative Evaluation

    Authors: Kushan Hewapathirana, Nisansa de Silva, C. D. Athuraliya

    Abstract: This paper is aimed at evaluating state-of-the-art models for Multi-document Summarization (MDS) on different types of datasets in various domains and investigating the limitations of existing models to determine future research directions. To address this gap, we conducted an extensive literature review to identify state-of-the-art models and datasets. We analyzed the performance of PRIMERA and P…

    Submitted 12 September, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

  13. Sinhala-English Parallel Word Dictionary Dataset

    Authors: Kasun Wickramasinghe, Nisansa de Silva

    Abstract: Parallel datasets are vital for performing and evaluating any kind of multilingual task. However, in the cases where one of the considered language pairs is a low-resource language, the existing top-down parallel data such as corpora are lacking in both tally and quality due to the dearth of human annotation. Therefore, for low-resource languages, it is more feasible to move in the bottom-up direc…

    Submitted 4 August, 2023; originally announced August 2023.

  14. arXiv:2302.06050  [pdf, other]

    cs.SE

    BURT: A Chatbot for Interactive Bug Reporting

    Authors: Yang Song, Junayed Mahmud, Nadeeshan De Silva, Ying Zhou, Oscar Chaparro, Kevin Moran, Andrian Marcus, Denys Poshyvanyk

    Abstract: This paper introduces BURT, a web-based chatbot for interactive reporting of Android app bugs. BURT is designed to assist Android app end-users in reporting high-quality defect information using an interactive interface. BURT guides the users in reporting essential bug report elements, i.e., the observed behavior, expected behavior, and the steps to reproduce the bug. It verifies the quality of th…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: Accepted by the Demonstrations Track of the 45th International Conference on Software Engineering (ICSE'23). arXiv admin note: substantial text overlap with arXiv:2209.10062

  15. arXiv:2212.09080  [pdf, other]

    cs.CL cs.LG

    Synthesis and Evaluation of a Domain-specific Large Data Set for Dungeons & Dragons

    Authors: Akila Peiris, Nisansa de Silva

    Abstract: This paper introduces the Forgotten Realms Wiki (FRW) data set and domain specific natural language generation using FRW along with related analyses. Forgotten Realms is the de-facto default setting of the popular open ended tabletop fantasy role playing game, Dungeons & Dragons. The data set was extracted from the Forgotten Realms Fandom wiki consisting of more than over 45,200 articles. The FRW…

    Submitted 18 December, 2022; originally announced December 2022.

  16. arXiv:2210.14472  [pdf, other]

    cs.CL

    Sinhala Sentence Embedding: A Two-Tiered Structure for Low-Resource Languages

    Authors: Gihan Weeraprameshwara, Vihanga Jayawickrama, Nisansa de Silva, Yudhanjaya Wijeratne

    Abstract: In the process of numerically modeling natural languages, developing language embeddings is a vital step. However, it is challenging to develop functional embeddings for resource-poor languages such as Sinhala, for which sufficiently large corpora, effective language parsers, and any other required resources are difficult to find. In such conditions, the exploitation of existing models to come up…

    Submitted 26 October, 2022; originally announced October 2022.

  17. arXiv:2210.08523  [pdf, other]

    cs.CL

    Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World

    Authors: Surangika Ranathunga, Nisansa de Silva

    Abstract: Linguistic disparity in the NLP world is a problem that has been widely acknowledged recently. However, different facets of this problem, or the reasons behind this disparity are seldom discussed within the NLP community. This paper provides a comprehensive analysis of the disparity that exists within the languages of the world. We show that simply categorising languages considering data availabil…

    Submitted 19 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

  18. Selecting Seed Words for Wordle using Character Statistics

    Authors: Nisansa de Silva

    Abstract: Wordle, a word guessing game rose to global popularity in the January of 2022. The goal of the game is to guess a five-letter English word within six tries. Each try provides the player with hints by means of colour changing tiles which inform whether or not a given character is part of the solution as well as, in cases where it is part of the solution, whether or not it is in the correct placemen…

    Submitted 6 February, 2024; v1 submitted 7 February, 2022; originally announced February 2022.

  19. Sentiment Analysis with Deep Learning Models: A Comparative Study on a Decade of Sinhala Language Facebook Data

    Authors: Gihan Weeraprameshwara, Vihanga Jayawickrama, Nisansa de Silva, Yudhanjaya Wijeratne

    Abstract: The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To achieve this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade worth of Sinhala posts with millions of reactions. For the purpose of establishing benchmarks and with the goal of identifying the best model for Sin…

    Submitted 13 January, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: 8 pages, LaTeX; typos corrected

  20. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts

    Authors: Vihanga Jayawickrama, Gihan Weeraprameshwara, Nisansa de Silva, Yudhanjaya Wijeratne

    Abstract: The Facebook network allows its users to record their reactions to text via a typology of emotions. This network, taken at scale, is therefore a prime data set of annotated sentiment data. This paper uses millions of such reactions, derived from a decade worth of Facebook post data centred around a Sri Lankan context, to model an eye of the beholder approach to sentiment detection for online Sinha…

    Submitted 1 December, 2021; originally announced December 2021.

  21. arXiv:2111.05721  [pdf, other]

    cs.CL cs.AI cs.LG

    Critical Sentence Identification in Legal Cases Using Multi-Class Classification

    Authors: Sahan Jayasinghe, Lakith Rambukkanage, Ashan Silva, Nisansa de Silva, Amal Shehan Perera

    Abstract: Inherently, the legal domain contains a vast amount of data in text format. Therefore it requires the application of Natural Language Processing (NLP) to cater to the analytically demanding needs of the domain. The advancement of NLP is spreading through various domains, such as the legal domain, in forms of practical applications and academic research. Identifying critical sentences, facts and ar…

    Submitted 14 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

  22. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, et al. (27 additional authors not shown)

    Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have system…

    Submitted 21 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted at TACL; pre-MIT Press publication version

    Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 50-72

  23. SigmaLaw-ABSA: Dataset for Aspect-Based Sentiment Analysis in Legal Opinion Texts

    Authors: Chanika Ruchini Mudalige, Dilini Karunarathna, Isanka Rajapaksha, Nisansa de Silva, Gathika Ratnayaka, Amal Shehan Perera, Ramesh Pathirana

    Abstract: Aspect-Based Sentiment Analysis (ABSA) has been prominent and ongoing research over many different domains, but it is not widely discussed in the legal domain. A number of publicly available datasets for a wide range of domains usually fulfill the needs of researchers to perform their studies in the field of ABSA. To the best of our knowledge, there is no publicly available dataset for the Aspect…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: 6 pages, 2 figures, IEEE International Conference on Industrial and Information Systems(ICIIS) 2020

  24. Rule-Based Approach for Party-Based Sentiment Analysis in Legal Opinion Texts

    Authors: Isanka Rajapaksha, Chanika Ruchini Mudalige, Dilini Karunarathna, Nisansa de Silva, Gathika Ratnayaka, Amal Shehan Perera

    Abstract: A document which elaborates opinions and arguments related to the previous court cases is known as a legal opinion text. Lawyers and legal officials have to spend considerable effort and time to obtain the required information manually from those documents when dealing with new legal cases. Hence, it provides much convenience to those individuals if there is a way to automate the process of extrac…

    Submitted 13 November, 2020; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: 2 pages, 1 figure, The 20th International Conference on Advances in ICT for Emerging Regions (ICTer2020)

  25. arXiv:2011.00318  [pdf, ps, other]

    cs.CL

    Effective Approach to Develop a Sentiment Annotator For Legal Domain in a Low Resource Setting

    Authors: Gathika Ratnayaka, Nisansa de Silva, Amal Shehan Perera, Ramesh Pathirana

    Abstract: Analyzing the sentiments of legal opinions available in Legal Opinion Texts can facilitate several use cases such as legal judgement prediction, contradictory statements identification and party-based sentiment analysis. However, the task of developing a legal domain specific sentiment annotator is challenging due to resource constraints such as lack of domain specific labelled data and domain exp…

    Submitted 31 October, 2020; originally announced November 2020.

  26. arXiv:2007.07884  [pdf]

    cs.CL cs.SI

    Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook

    Authors: Yudhanjaya Wijeratne, Nisansa de Silva

    Abstract: This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages, including politics, media, celebrities, and other c…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 10 pages; Github repo of data linked in summary

  27. arXiv:1908.09775  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Path Learnable Wavelet Neural Network for Image Classification

    Authors: D. D. N. De Silva, H. W. M. K. Vithanage, K. S. D. Fernando, I. T. S. Piyatilake

    Abstract: Despite the remarkable success of deep learning in pattern recognition, deep network models face the problem of training a large number of parameters. In this paper, we propose and evaluate a novel multi-path wavelet neural network architecture for image classification with far less number of trainable parameters. The model architecture consists of a multi-path layout with several levels of wavele…

    Submitted 26 August, 2019; originally announced August 2019.

  28. arXiv:1906.02430  [pdf, other]

    cs.CL

    Shift-of-Perspective Identification Within Legal Cases

    Authors: Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Viraj Salaka Gamage, Menuka Warushavithana, Amal Shehan Perera

    Abstract: Arguments, counter-arguments, facts, and evidence obtained via documents related to previous court cases are of essential need for legal professionals. Therefore, the process of automatic information extraction from documents containing legal opinions related to court cases can be considered to be of significant importance. This study is focused on the identification of sentences in legal opinion…

    Submitted 17 August, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

  29. arXiv:1906.02358  [pdf, other]

    cs.CL

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Authors: Nisansa de Silva

    Abstract: Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cou…

    Submitted 19 April, 2024; v1 submitted 5 June, 2019; originally announced June 2019.

  30. arXiv:1903.03772  [pdf, ps, other]

    cs.AI cs.CL

    Logic Rules Powered Knowledge Graph Embedding

    Authors: Pengwei Wang, Dejing Dou, Fangzhao Wu, Nisansa de Silva, Lianwen Jin

    Abstract: Large scale knowledge graph embedding has attracted much attention from both academia and industry in the field of Artificial Intelligence. However, most existing methods concentrate solely on fact triples contained in the given knowledge graph. Inspired by the fact that logic rules can provide a flexible and declarative language for expressing rich background knowledge, it is natural to integrate…

    Submitted 9 March, 2019; originally announced March 2019.

  31. arXiv:1810.01912  [pdf, other]

    cs.CL

    Fast Approach to Build an Automatic Sentiment Annotator for Legal Domain using Transfer Learning

    Authors: Viraj Gamage, Menuka Warushavithana, Nisansa de Silva, Amal Shehan Perera, Gathika Ratnayaka, Thejan Rupasinghe

    Abstract: This study proposes a novel way of identifying the sentiment of the phrases used in the legal domain. The added complexity of the language used in law, and the inability of the existing systems to accurately predict the sentiments of words in law are the main motivations behind this study. This is a transfer learning approach, which can be used for other domain adaptation tasks as well. The propos…

    Submitted 3 October, 2018; originally announced October 2018.

    Comments: 9 pages, 3 figures

  32. arXiv:1809.03416  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations

    Authors: Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Menuka Warushavithana, Viraj Gamage, Amal Shehan Perera

    Abstract: Case Law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology of applying discourse relations between sentences when processing text documents related to the legal domain. In this study, we developed a mec…

    Submitted 14 September, 2018; v1 submitted 10 September, 2018; originally announced September 2018.

    Comments: Conference: 2018 International Conference on Advances in ICT for Emerging Regions (ICTer)

  33. arXiv:1809.00982  [pdf, other]

    cs.CV

    Wavelet based edge feature enhancement for convolutional neural networks

    Authors: D. D. N. De Silva, S. Fernando, I. T. S. Piyatilake, A. V. S. Karunarathne

    Abstract: Convolutional neural networks are able to perform a hierarchical learning process starting with local features. However, a limited attention is paid to enhancing such elementary level features like edges. We propose and evaluate two wavelet-based edge feature enhancement methods to preprocess the input images to convolutional neural networks. The first method develops feature enhanced representati…

    Submitted 4 February, 2019; v1 submitted 29 August, 2018; originally announced September 2018.

  34. arXiv:1808.01766  [pdf]

    cs.NE

    On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing

    Authors: M. U. B. Dias, D. D. N. De Silva, S. Fernando

    Abstract: Optimization for deep networks is currently a very active area of research. As neural networks become deeper, the ability in manually optimizing the network becomes harder. Mini-batch normalization, identification of effective respective fields, momentum updates, introduction of residual blocks, learning rate adoption, etc. have been proposed to speed up the rate of convergent in manual training p…

    Submitted 6 August, 2018; originally announced August 2018.

  35. arXiv:1805.10685  [pdf, other]

    cs.IR cs.CL

    Legal Document Retrieval using Document Vector Embeddings and Deep Learning

    Authors: Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera

    Abstract: Domain specific information retrieval process has been a prominent and ongoing research in the field of natural language processing. Many researchers have incorporated different techniques to overcome the technical and domain specificity and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy coupling of domain experts, that makes the entire pr…

    Submitted 27 May, 2018; originally announced May 2018.

  36. Semi-Supervised Instance Population of an Ontology using Word Vector Embeddings

    Authors: Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan Perera, Keet Sugathadasa, Buddhi Ayesha, Madhavi Perera

    Abstract: In many modern day systems such as information extraction and knowledge management agents, ontologies play a vital role in maintaining the concept hierarchies of the selected domain. However, ontology population has become a problematic process due to its nature of heavy coupling with manual human intervention. With the use of word embeddings in the field of natural language processing, it became…

    Submitted 9 September, 2017; originally announced September 2017.

  37. arXiv:1709.00013  [pdf, other]

    quant-ph cs.LO

    Logical paradoxes in quantum computation

    Authors: Nadish de Silva

    Abstract: While quantum computers are expected to yield considerable advantages over classical devices, the precise features of quantum theory enabling these advantages remain unclear. Contextuality--the denial of a notion of classical physical reality--has emerged as a promising hypothesis. Magic states are quantum resources critical for practically achieving universal quantum computation. They exhibit t…

    Submitted 7 June, 2018; v1 submitted 31 August, 2017; originally announced September 2017.

    Comments: To appear in the Proceedings of the Thirty-Third Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2018 - Oxford, UK)

  38. Deriving a Representative Vector for Ontology Classes with Instance Word Vector Embeddings

    Authors: Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan Perera, Keet Sugathadasa, Buddhi Ayesha

    Abstract: Selecting a representative vector for a set of vectors is a very common requirement in many algorithmic tasks. Traditionally, the mean or median vector is selected. Ontology classes are sets of homogeneous instance objects that can be converted to a vector space by word vector embeddings. This study proposes a methodology to derive a representative vector for ontology classes whose instances were…

    Submitted 7 June, 2017; originally announced June 2017.

  39. Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity

    Authors: Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera

    Abstract: Semantic similarity measures are an important part in Natural Language Processing tasks. However Semantic similarity measures built for general use do not perform well within specific domains. Therefore in this study we introduce a domain specific semantic similarity measure that was created by the synergistic union of word2vec, a word embedding method that is used for semantic similarity calculat…

    Submitted 8 June, 2017; v1 submitted 6 June, 2017; originally announced June 2017.

    Comments: 6 Pages, 3 figures

  40. arXiv:1705.09995  [pdf]

    cs.CL

    Subject Specific Stream Classification Preprocessing Algorithm for Twitter Data Stream

    Authors: Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne

    Abstract: Micro-blogging service Twitter is a lucrative source for data mining applications on global sentiment. But due to the omnifariousness of the subjects mentioned in each data item; it is inefficient to run a data mining algorithm on the raw data. This paper discusses an algorithm to accurately classify the entire stream in to a given number of mutually exclusive collectively exhaustive streams upon…

    Submitted 28 May, 2017; originally announced May 2017.

    Comments: 6 pages

  41. The Quantum Monad on Relational Structures

    Authors: Samson Abramsky, Rui Soares Barbosa, Nadish de Silva, Octavio Zapata

    Abstract: Homomorphisms between relational structures play a central role in finite model theory, constraint satisfaction and database theory. A central theme in quantum computation is to show how quantum resources can be used to gain advantage in information processing tasks. In particular, non-local games have been used to exhibit quantum advantage in boolean constraint satisfaction, and to obtain quantum…

    Submitted 20 May, 2017; originally announced May 2017.

    Comments: 20 pages

    Journal ref: 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017), Leibniz International Proceedings in Informatics (LIPIcs) 83: 35:1--35:19, 2017