
Showing 1–7 of 7 results for author: Şahinuç, F

Searching in archive cs.
  1. arXiv:2407.04046 [pdf, other]

    cs.CL

    Systematic Task Exploration with LLMs: A Study in Citation Text Generation

    Authors: Furkan Şahinuç, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych

    Abstract: Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks. Yet, this flexibility brings new challenges, as it introduces new degrees of freedom in formulating the task inputs and instructions and in evaluating model performance. To facilitate the exploration of creative NLG tasks, we propose a three-component re…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 (Main)

  2. arXiv:2210.05401 [pdf, other]

    cs.SI cs.CL cs.IR

    MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection

    Authors: Cagri Toraman, Oguzhan Ozcelik, Furkan Şahinuç, Fazli Can

    Abstract: The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation…

    Submitted 11 July, 2024; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Published at LREC-COLING 2024

  3. arXiv:2209.12816 [pdf, other]

    cs.CL cs.AI cs.GL eess.AS

    Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers

    Authors: Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç

    Abstract: Transformer-based language models utilize the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks. Similar attention structures are also extensively studied in several other areas. Although the attention mechanism enhances the model performances significantly, its quadratic complexity prevents efficient processing of long sequences. Re…

    Submitted 16 May, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 11 pages

  4. Impact of Tokenization on Language Models: An Analysis for Turkish

    Authors: Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzhan Ozcelik

    Abstract: Tokenization is an important text preprocessing step to prepare input tokens for deep language models. WordPiece and BPE are de facto methods employed by important models, such as BERT and GPT. However, the impact of tokenization can be different for morphologically rich languages, such as Turkic languages, where many words can be generated by adding prefixes and suffixes. We compare five tokenize…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: submitted to ACM TALLIP

    Journal ref: ACM Transactions on Asian and Low-Resource Language Information Processing (2023), Volume 22, Issue 4, pp. 1-21

  5. arXiv:2203.01111 [pdf, other]

    cs.CL cs.SI

    Large-Scale Hate Speech Detection with Cross-Domain Transfer

    Authors: Cagri Toraman, Furkan Şahinuç, Eyup Halit Yilmaz

    Abstract: The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and…

    Submitted 5 July, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: Published at the Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022)

  6. BlackLivesMatter 2020: An Analysis of Deleted and Suspended Users in Twitter

    Authors: Cagri Toraman, Furkan Şahinuç, Eyup Halit Yilmaz

    Abstract: After George Floyd's death in May 2020, the volume of discussion in social media increased dramatically. A series of protests followed this tragic event, known as the 2020 BlackLivesMatter movement. Eventually, many user accounts were deleted by their owners or suspended for violating the rules of social media platforms. In this study, we analyze what happened in Twitter before and after the ev…

    Submitted 6 July, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: Published at the 14th International ACM Conference on Web Science in 2022 (WebSci 2022)

  7. Imparting Interpretability to Word Embeddings while Preserving Semantic Structure

    Authors: Lutfi Kerem Senel, Ihsan Utlu, Furkan Şahinuç, Haldun M. Ozaktas, Aykut Koç

    Abstract: As a ubiquitous method in natural language processing, word embeddings are extensively employed to map semantic properties of words into a dense vector representation. They capture semantic and syntactic relations among words, but the vectors corresponding to the words are only meaningful relative to each other. Neither the vector nor its dimensions have any absolute, interpretable meaning. We int…

    Submitted 2 July, 2020; v1 submitted 19 July, 2018; originally announced July 2018.

    Comments: 14 pages, 5 figures

    Journal ref: Natural Language Engineering, 1-26, 2020