Showing 1–25 of 25 results for author: Alstrup, S

  1. Unsupervised Multi-Index Semantic Hashing

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster al…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 2021 World Wide Web Conference, published under Creative Commons CC-BY 4.0 License

  2. Unsupervised Semantic Hashing with Pairwise Reconstruction

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by t…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted at SIGIR'20

  3. Factuality Checking in News Headlines with Eye Tracking

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Birger Larsen, Stephen Alstrup, Christina Lioma

    Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that p…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  4. Content-aware Neural Hashing for Cold-start Recommendation

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  5. arXiv:1909.06856  [pdf, other]

    cs.CY

    Modelling End-of-Session Actions in Educational Systems

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Christina Lioma

    Abstract: In this paper we consider the problem of modelling when students end their session in an online mathematics educational system. Being able to model this accurately will help us optimize the way content is presented and consumed. This is done by modelling the probability of an action being the last in a session, which we denote as the End-of-Session probability. We use log data from a system where…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: In proceedings of EDM 2019

  6. arXiv:1908.08937  [pdf, ps, other]

    cs.CY cs.LG stat.ML

    Tracking Behavioral Patterns among Students in an Online Educational System

    Authors: Stephan Lorenzen, Niklas Hjuler, Stephen Alstrup

    Abstract: Analysis of log data generated by online educational systems is an essential task to better the educational systems and increase our understanding of how students learn. In this study we investigate previously unseen data from Clio Online, the largest provider of digital learning content for primary schools in Denmark. We consider data for 14,810 students with 3 million sessions in the period 2015…

    Submitted 21 August, 2019; originally announced August 2019.

    Journal ref: In Proceedings of the 11th International Conference on Educational Data Mining (EDM), p. 280-285. 2018

  7. arXiv:1906.03072  [pdf, ps, other]

    cs.CY cs.CL cs.LG stat.ML

    Investigating Writing Style Development in High School

    Authors: Stephan Lorenzen, Niklas Hjuler, Stephen Alstrup

    Abstract: In this paper we do the first large scale analysis of writing style development among Danish high school students. More than 10K students with more than 100K essays are analyzed. Writing style itself is often studied in the natural language processing community, but usually with the goal of verifying authorship, assessing quality or popularity, or other kinds of predictions. In this work, we ana…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: A short version of this paper will be presented at EDM 2019

  8. arXiv:1906.01635  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Detecting Ghostwriters in High Schools

    Authors: Magnus Stavngaard, August Sørensen, Stephan Lorenzen, Niklas Hjuler, Stephen Alstrup

    Abstract: Students hiring ghostwriters to write their assignments is an increasing problem in educational institutions all over the world, with companies selling these services as a product. In this work, we develop automatic techniques with special focus on detecting such ghostwriting in high school assignments. This is done by training deep neural networks on an unprecedentedly large amount of data supplied…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Presented at ESANN 2019

    Journal ref: Proceedings. ESANN 2019: 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ed. Michel Verleysen. 2019. p 197-202

  9. arXiv:1906.00674  [pdf, other]

    cs.IR cs.CL

    Contextually Propagated Term Weights for Document Representation

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a target word, redistributes part of that word's weight (that has been computed with word embeddings) across words occurring in similar contexts as the target wo…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  10. arXiv:1906.00671  [pdf, other]

    cs.IR cs.CL cs.LG

    Unsupervised Neural Generative Semantic Hashing

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Fast similarity search is a key component in large-scale information retrieval, where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area have been obtained through neural network based models: generative models trained by learning to reconstruct the original documents. We present a novel unsupervised generative semantic hash…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  11. arXiv:1904.00761  [pdf, other]

    cs.CL cs.LG stat.ML

    Neural Speed Reading with Structural-Jump-LSTM

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim…

    Submitted 2 April, 2019; v1 submitted 20 March, 2019; originally announced April 2019.

    Comments: 10 pages

    Journal ref: 7th International Conference on Learning Representations (ICLR) 2019

  12. arXiv:1903.08408  [pdf, other]

    cs.IR cs.LG

    Modelling Sequential Music Track Skips using a Multi-RNN Approach

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Modelling sequential music skips provides streaming companies the ability to better understand the needs of the user base, resulting in a better user experience by reducing the need to manually skip certain music tracks. This paper describes the solution of the University of Copenhagen DIKU-IR team in the 'Spotify Sequential Skip Prediction Challenge', where the task was to predict the skip behavi…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 4 pages

    Journal ref: 12th ACM International Conference on Web Search and Data Mining (WSDM) 2019, WSDM Cup

  13. arXiv:1903.08404  [pdf, other]

    cs.IR cs.CL cs.LG

    Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Automatic fact-checking systems detect misinformation, such as fake news, by (i) selecting check-worthy sentences for fact-checking, (ii) gathering related information to the sentences, and (iii) inferring the factuality of the sentences. Most prior research on (i) uses hand-crafted features to select check-worthy sentences, and does not explicitly account for the recent finding that the top weigh…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 6 pages

    Journal ref: In Companion Proceedings of the 2019 World Wide Web Conference

  14. arXiv:1709.01960  [pdf, other]

    cs.DS

    Constructing Light Spanners Deterministically in Near-Linear Time

    Authors: Stephen Alstrup, Søren Dahlgaard, Arnold Filtser, Morten Stöckel, Christian Wulff-Nilsen

    Abstract: Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [CW18] improved the state-of-the-art for light spanners by constructing a $(2k-1)(1+ε)$-spanner with $O(n^{1+1/k})$ edges and $O_ε(n^{1/k})$ lightness. Soon after, Filtser and Solomon [FS19] showed that the classic greedy spanner construction achieves the same bounds. The…

    Submitted 19 January, 2022; v1 submitted 6 September, 2017; originally announced September 2017.

  15. arXiv:1708.06403  [pdf, other]

    cs.CY

    Smart City Analytics: Ensemble-Learned Prediction of Citizen Home Care

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Christina Lioma

    Abstract: We present an ensemble learning method that predicts large increases in the hours of home care received by citizens. The method is supervised, and uses different ensembles of either linear (logistic regression) or non-linear (random forests) classifiers. Experiments with data available from 2013 to 2017 for every citizen in Copenhagen receiving home care (27,775 citizens) show that prediction can…

    Submitted 21 August, 2017; originally announced August 2017.

  16. arXiv:1708.04164  [pdf, other]

    cs.CY cs.HC

    Sequence Modelling For Analysing Student Interaction with Educational Systems

    Authors: Christian Hansen, Casper Hansen, Niklas Hjuler, Stephen Alstrup, Christina Lioma

    Abstract: The analysis of log data generated by online educational systems is an important task for improving the systems, and furthering our knowledge of how students learn. This paper uses previously unseen log data from Edulab, the largest provider of digital learning for mathematics in Denmark, to analyse the sessions of its users, where 1.08 million student sessions are extracted from a subset of their…

    Submitted 14 August, 2017; originally announced August 2017.

    Comments: The 10th International Conference on Educational Data Mining 2017

  17. arXiv:1607.04911  [pdf, other]

    cs.DS math.CO

    Near-Optimal Induced Universal Graphs for Bounded Degree Graphs

    Authors: Mikkel Abrahamsen, Stephen Alstrup, Jacob Holm, Mathias Bæk Tejs Knudsen, Morten Stöckel

    Abstract: A graph $U$ is an induced universal graph for a family $F$ of graphs if every graph in $F$ is a vertex-induced subgraph of $U$. For the family of all undirected graphs on $n$ vertices Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O\!\left(2^{n/2}\right)$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965]. Let $k= \lceil D/2 \rceil$.…

    Submitted 21 July, 2016; v1 submitted 17 July, 2016; originally announced July 2016.

  18. arXiv:1507.04046  [pdf, ps, other]

    cs.DS

    Distance labeling schemes for trees

    Authors: Stephen Alstrup, Inge Li Gørtz, Esben Bistrup Halvorsen, Ely Porat

    Abstract: We consider distance labeling schemes for trees: given a tree with $n$ nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes. A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use…

    Submitted 14 July, 2015; originally announced July 2015.

  19. arXiv:1507.02618  [pdf, other]

    cs.DS

    Sublinear Distance Labeling

    Authors: Stephen Alstrup, Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Ely Porat

    Abstract: A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider dista…

    Submitted 8 September, 2016; v1 submitted 9 July, 2015; originally announced July 2015.

    Comments: A preliminary version of this paper appeared at ESA'16

  20. arXiv:1504.04498  [pdf, ps, other]

    cs.DS

    Simpler, faster and shorter labels for distances in graphs

    Authors: Stephen Alstrup, Cyril Gavoille, Esben Bistrup Halvorsen, Holger Petersen

    Abstract: We consider how to assign labels to any undirected graph with n nodes such that, given the labels of two nodes and no other information regarding the graph, it is possible to determine the distance between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize the maximum label length and secondarily to minimize the time needed to answer distance queries (decoding…

    Submitted 17 April, 2015; originally announced April 2015.

    ACM Class: E.1; G.2.2; E.4

  21. arXiv:1504.02306  [pdf, other]

    cs.DS

    Optimal induced universal graphs and adjacency labeling for trees

    Authors: Stephen Alstrup, Søren Dahlgaard, Mathias Bæk Tejs Knudsen

    Abstract: We show that there exists a graph $G$ with $O(n)$ nodes, where any forest of $n$ nodes is a node-induced subgraph of $G$. Furthermore, for constant arboricity $k$, the result implies the existence of a graph with $O(n^k)$ nodes that contains all $n$-node graphs as node-induced subgraphs, matching an $Ω(n^k)$ lower bound. The lower bound and previously best upper bounds were presented in Alstrup and…

    Submitted 15 February, 2016; v1 submitted 9 April, 2015; originally announced April 2015.

    Comments: A preliminary version of this paper appeared at FOCS'15

  22. arXiv:1404.3391  [pdf, ps, other]

    cs.DS cs.DM math.CO

    Adjacency labeling schemes and induced-universal graphs

    Authors: Stephen Alstrup, Haim Kaplan, Mikkel Thorup, Uri Zwick

    Abstract: We describe a way of assigning labels to the vertices of any undirected graph on up to $n$ vertices, each composed of $n/2+O(1)$ bits, such that given the labels of two vertices, and no other information regarding the graph, it is possible to decide whether or not the vertices are adjacent in the graph. This is optimal, up to an additive constant, and constitutes the first improvement in almost 50…

    Submitted 13 April, 2014; originally announced April 2014.

    ACM Class: G.2.2; E.1; E.2; G.2.1

  23. Near-optimal labeling schemes for nearest common ancestors

    Authors: Stephen Alstrup, Esben Bistrup Halvorsen, Kasper Green Larsen

    Abstract: We consider NCA labeling schemes: given a rooted tree $T$, label the nodes of $T$ with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the label of their nearest common ancestor. For trees with $n$ nodes we present upper and lower bounds establishing that labels of size $(2\pm ε)\log n$, $ε<1$ are both sufficient and necessary. (All…

    Submitted 16 December, 2013; originally announced December 2013.

    ACM Class: E.1; G.2.2; E.4

  24. arXiv:cs/0310065  [pdf, ps, other]

    cs.DS

    Maintaining Information in Fully-Dynamic Trees with Top Trees

    Authors: Stephen Alstrup, Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup

    Abstract: We introduce top trees as a design of a new simpler interface for data structures maintaining information in a fully-dynamic forest. We demonstrate how easy and versatile they are to use on a host of different applications. For example, we show how to maintain the diameter, center, and median of each tree in the forest. The forest can be updated by insertion and deletion of edges and by changes…

    Submitted 21 November, 2003; v1 submitted 31 October, 2003; originally announced October 2003.

    Comments: Preliminary versions of this work presented at ICALP'97 and SWAT'00. The new version takes layered top trees into account

    ACM Class: E.1; F.2.2; G.2.2

  25. arXiv:cs/0211010  [pdf, ps, other]

    cs.DS

    Efficient Tree Layout in a Multilevel Memory Hierarchy

    Authors: Stephen Alstrup, Michael A. Bender, Erik D. Demaine, Martin Farach-Colton, Theis Rauhe, Mikkel Thorup

    Abstract: We consider the problem of laying out a tree with fixed parent/child structure in hierarchical memory. The goal is to minimize the expected number of block transfers performed during a search along a root-to-leaf path, subject to a given probability distribution on the leaves. This problem was previously considered by Gil and Itai, who developed optimal but slow algorithms when the block-transfe…

    Submitted 28 July, 2004; v1 submitted 11 November, 2002; originally announced November 2002.

    Comments: 18 pages. Version 2 adds faster dynamic programs. Preliminary version appeared in European Symposium on Algorithms, 2002

    ACM Class: E.1; F.2.2