Showing 1–24 of 24 results for author: Sorensen, J

Searching in archive cs.
  1. arXiv:2309.16905  [pdf, other]

    cs.CL

    Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning

    Authors: Ali Omrani, Alireza S. Ziabari, Preni Golazizian, Jeffrey Sorensen, Morteza Dehghani

    Abstract: Detecting problematic content, such as hate speech, is a multifaceted and ever-changing task, influenced by social dynamics, user populations, diversity of sources, and evolving language. There have been significant efforts, both in academia and in industry, to develop annotated resources that capture various aspects of problematic content. Due to researchers' diverse objectives, the annotations ar…

    Submitted 28 September, 2023; originally announced September 2023.

  2. arXiv:2202.11176  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

    Authors: Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, Lucy Vasserman

    Abstract: On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, it is crucial to develop models that are effect…

    Submitted 22 February, 2022; originally announced February 2022.

  3. arXiv:2112.02164  [pdf, other]

    eess.IV cs.CV

    Bridging the gap between prostate radiology and pathology through machine learning

    Authors: Indrani Bhattacharya, David S. Lim, Han Lin Aung, Xingchen Liu, Arun Seetharaman, Christian A. Kunder, Wei Shao, Simon J. C. Soerensen, Richard E. Fan, Pejman Ghanouni, Katherine J. To'o, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: Indrani Bhattacharya and David S. Lim contributed equally as first authors. Geoffrey A. Sonn and Mirabela Rusu contributed equally as senior authors

  4. arXiv:2111.10223  [pdf, other]

    cs.CL

    Toxicity Detection can be Sensitive to the Conversational Context

    Authors: Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos, Lucas Dixon, Jeffrey Sorensen, Leo Laugier

    Abstract: User posts whose perceived toxicity depends on the conversational context are rare in current toxicity detection datasets. Hence, toxicity detectors trained on existing datasets will also tend to disregard context, making the detection of context-sensitive toxicity harder when it does occur. We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels: (i) annotato…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: 13 pages, 8 figures

  5. arXiv:2102.05456  [pdf, other]

    cs.CL cs.AI cs.LG

    Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

    Authors: Leo Laugier, John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon

    Abstract: Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still…

    Submitted 11 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

  6. arXiv:2010.07410  [pdf, other]

    cs.CL cs.SI

    Six Attributes of Unhealthy Conversation

    Authors: Ilan Price, Jordan Gifford-Moore, Jory Fleming, Saul Musker, Maayan Roichman, Guillaume Sylvain, Nithum Thain, Lucas Dixon, Jeffrey Sorensen

    Abstract: We present a new dataset of approximately 44000 comments labeled by crowdworkers. Each comment is labelled as either 'healthy' or 'unhealthy', in addition to binary labels for the presence of six potentially 'unhealthy' sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisat…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: Appearing in the 4th Workshop on Online Abuse and Harms (2020)

  7. arXiv:2008.00119  [pdf, other]

    eess.IV cs.CV

    CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis

    Authors: Indrani Bhattacharya, Arun Seetharaman, Wei Shao, Rewa Sood, Christian A. Kunder, Richard E. Fan, Simon John Christoph Soerensen, Jeffrey B. Wang, Pejman Ghanouni, Nikola C. Teslovich, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Magnetic Resonance Imaging (MRI) is widely used for screening and staging prostate cancer. However, many prostate cancers have subtle features which are not easily identifiable on MRI, resulting in missed diagnoses and alarming variability in radiologist interpretation. Machine learning models have been developed in an effort to improve cancer identification, but current models localize cancer usi…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: Accepted to MICCAI 2020

  8. arXiv:2006.00998  [pdf, other]

    cs.CL

    Toxicity Detection: Does Context Really Matter?

    Authors: John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos

    Abstract: Moderation is crucial to promoting healthy on-line discussions. Although several 'toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improv…

    Submitted 1 June, 2020; originally announced June 2020.

  9. arXiv:2004.05476  [pdf, other]

    cs.CL cs.CY cs.IR cs.LG

    Classifying Constructive Comments

    Authors: Varada Kolhatkar, Nithum Thain, Jeffrey Sorensen, Lucas Dixon, Maite Taboada

    Abstract: We introduce the Constructive Comments Corpus (C3), comprised of 12,000 annotated news comments, intended to help build new tools for online communities to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of sub-characteristics of const…

    Submitted 4 August, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

  10. arXiv:1903.04561  [pdf, other]

    cs.LG cs.CL stat.ML

    Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

    Authors: Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman

    Abstract: Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary ac…

    Submitted 8 May, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Updated to fix typo in Equation 4

  11. arXiv:1903.02088  [pdf, other]

    stat.ML cs.LG

    Limitations of Pinned AUC for Measuring Unintended Bias

    Authors: Daniel Borkan, Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman

    Abstract: This report examines the Pinned AUC metric introduced and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we highlight in this report, there are ways that the metric can obscure different kinds of unintended biases when the underlying class distributions on which bias is b…

    Submitted 5 March, 2019; originally announced March 2019.

  12. arXiv:1810.13181  [pdf, other]

    cs.CL

    WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

    Authors: Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

    Abstract: We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of deta…

    Submitted 31 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

  13. arXiv:1709.02230  [pdf, other]

    physics.soc-ph cond-mat.quant-gas cs.CY physics.atom-ph physics.data-an quant-ph

    Remote optimization of an ultra-cold atoms experiment by experts and citizen scientists

    Authors: Robert Heck, Oana Vuculescu, Jens Jakob Sørensen, Jonathan Zoller, Morten G. Andreasen, Mark G. Bason, Poul Ejlertsen, Ottó Elíasson, Pinja Haikka, Jens S. Laustsen, Lærke L. Nielsen, Andrew Mao, Romain Müller, Mario Napolitano, Mads K. Pedersen, Aske R. Thorsen, Carsten Bergenholtz, Tommaso Calarco, Simone Montangero, Jacob F. Sherson

    Abstract: We introduce a novel remote interface to control and optimize the experimental production of Bose-Einstein condensates (BECs) and find improved solutions using two distinct implementations. First, a team of theoreticians employed a Remote version of their dCRAB optimization algorithm (RedCRAB), and second a gamified interface allowed 600 citizen scientists from around the world to participate in r…

    Submitted 27 March, 2018; v1 submitted 7 September, 2017; originally announced September 2017.

    Comments: 8 pages, 3 figures and 12 pages, 7 figures supplementary information

    Journal ref: PNAS, published online ahead of print (2018)

  14. arXiv:1709.01347  [pdf, other]

    cs.IT

    Random Pilot and Data Access in Massive MIMO for Machine-type Communications

    Authors: Elisabeth de Carvalho, Emil Björnson, Jesper H. Sørensen, Erik G. Larsson, Petar Popovski

    Abstract: A massive MIMO system, represented by a base station with hundreds of antennas, is capable of spatially multiplexing many devices and thus naturally suited to serve dense crowds of wireless devices in emerging applications, such as machine-type communications. Crowd scenarios pose new challenges in the pilot-based acquisition of channel state information and call for pilot access protocols that ma…

    Submitted 5 September, 2017; originally announced September 2017.

  15. arXiv:1610.04364  [pdf, ps, other]

    cs.IT

    Delay Minimization in Real-time Communications with Joint Buffering and Coding

    Authors: Jesper H. Sørensen, Petar Popovski, Jan Østergaard

    Abstract: We present a closed-form expression for the minimal delay that is achievable in a setting that combines a buffer and an erasure code, used to mitigate the packet delay variance. The erasure code is modeled according to the recent information-theoretic results on finite block length codes. Evaluations reveal that accurate knowledge of the network parameters is essential for optimal operation. Moreo…

    Submitted 14 October, 2016; originally announced October 2016.

  16. arXiv:1606.02080  [pdf, other]

    cs.IT cs.NI

    Random Access Protocols for Massive MIMO

    Authors: Elisabeth de Carvalho, Emil Björnson, Jesper H. Sørensen, Petar Popovski, Erik G. Larsson

    Abstract: 5G wireless networks are expected to support new services with stringent requirements on data rates, latency and reliability. One novel feature is the ability to serve a dense crowd of devices, calling for radically new ways of accessing the network. This is the case in machine-type communications, but also in urban environments and hotspots. In those use cases, the high number of devices and the…

    Submitted 17 March, 2017; v1 submitted 7 June, 2016; originally announced June 2016.

  17. arXiv:1605.05862  [pdf, ps, other]

    cs.IT

    Coded Pilot Access: A Random Access Solution for Massive MIMO Systems

    Authors: Jesper H. Sørensen, Elisabeth de Carvalho, Čedomir Stefanović, Petar Popovski

    Abstract: We present a novel access protocol for crowd scenarios in massive MIMO (Multiple-input multiple-output) systems. Crowd scenarios are characterized by a large number of users with intermittent access behavior, whereby orthogonal scheduling is infeasible. In such scenarios, random access is a natural choice. The proposed access protocol relies on two essential properties of a massive MIMO system, na…

    Submitted 19 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: text overlap with arXiv:1505.05726

  18. A Random Access Protocol for Pilot Allocation in Crowded Massive MIMO Systems

    Authors: Emil Björnson, Elisabeth de Carvalho, Jesper H. Sørensen, Erik G. Larsson, Petar Popovski

    Abstract: The Massive MIMO (multiple-input multiple-output) technology has great potential to manage the rapid growth of wireless data traffic. Massive MIMO achieves tremendous spectral efficiency by spatial multiplexing of many tens of user equipments (UEs). These gains are only achieved in practice if many more UEs can connect efficiently to the network than today. As the number of UEs increases, while ea…

    Submitted 11 February, 2017; v1 submitted 14 April, 2016; originally announced April 2016.

    Comments: To appear in IEEE Transactions on Wireless Communications, 16 pages, 10 figures. This is reproducible research with simulation code available at https://github.com/emilbjornson/sucre-protocol

  19. arXiv:1505.05726  [pdf, ps, other]

    cs.IT

    Massive MIMO for Crowd Scenarios: A Solution Based on Random Access

    Authors: Jesper H. Sørensen, Elisabeth de Carvalho, Petar Popovski

    Abstract: This paper presents a new approach to intra-cell pilot contamination in crowded massive MIMO scenarios. The approach relies on two essential properties of a massive MIMO system, namely near-orthogonality between user channels and near-stability of channel powers. Signal processing techniques that take advantage of these properties allow us to view a set of contaminated pilot signals as a graph cod…

    Submitted 21 May, 2015; originally announced May 2015.

  20. arXiv:1505.05717  [pdf, ps, other]

    cs.IT

    Pilot Decontamination Through Pilot Sequence Hopping in Massive MIMO Systems

    Authors: Jesper H. Sørensen, Elisabeth de Carvalho

    Abstract: This work concerns wireless cellular networks applying massive multiple-input multiple-output (MIMO) technology. In such a system, the base station in a given cell is equipped with a very large number (hundreds or even thousands) of antennas and serves multiple users. Estimation of the channel from the base station to each user is performed at the base station using an uplink pilot sequence. Such…

    Submitted 21 May, 2015; originally announced May 2015.

  21. arXiv:1301.7232  [pdf, ps, other]

    cs.IT

    Coded Splitting Tree Protocols

    Authors: Jesper H. Sørensen, Cedomir Stefanović, Petar Popovski

    Abstract: This paper presents a novel approach to multiple access control called coded splitting tree protocol. The approach builds on the known tree splitting protocols, code structure and successive interference cancellation (SIC). Several instances of the tree splitting protocol are initiated, each instance is terminated prematurely and subsequently iterated. The combined set of leaves from all the tree…

    Submitted 30 January, 2013; originally announced January 2013.

  22. arXiv:1204.4686  [pdf, ps, other]

    cs.IT

    Analysis of LT Codes with Unequal Recovery Time

    Authors: Jesper H. Sørensen, Petar Popovski, Jan Østergaard

    Abstract: In this paper we analyze a specific class of rateless codes, called LT codes with unequal recovery time. These codes provide the option of prioritizing different segments of the transmitted data over others. The result is that segments are decoded in stages during the rateless transmission, where higher prioritized segments are decoded at lower overhead. Our analysis focuses on quantifying the expe…

    Submitted 20 April, 2012; originally announced April 2012.

    Comments: 33 pages, 10 figures

  23. arXiv:1012.2673  [pdf, ps, other]

    cs.IT

    On the Role of Feedback in LT Codes

    Authors: Jesper H. Sørensen, Petar Popovski, Jan Østergaard

    Abstract: This paper concerns application of feedback in LT codes. The considered type of feedback is acknowledgments, where information on which symbols have been decoded is given to the transmitter. We identify an important adaptive mechanism in standard LT codes, which is crucial to their ability to perform well under any channel conditions. We show how precipitate application of acknowledgments can inte…

    Submitted 13 December, 2010; originally announced December 2010.

  24. arXiv:1011.2078  [pdf, ps, other]

    cs.IT cs.NI

    Design and Analysis of LT Codes with Decreasing Ripple Size

    Authors: Jesper H. Sørensen, Petar Popovski, Jan Østergaard

    Abstract: In this paper we propose a new design of LT codes, which decreases the amount of necessary overhead in comparison to existing designs. The design focuses on a parameter of the LT decoding process called the ripple size. This parameter was also a key element in the design proposed in the original work by Luby. Specifically, Luby argued that an LT code should provide a constant ripple size during de…

    Submitted 7 June, 2012; v1 submitted 9 November, 2010; originally announced November 2010.