Showing 1–34 of 34 results for author: Berger, A

Searching in archive cs.
  1. arXiv:2407.05714  [pdf, other]

    cs.AI

    Implementing a hybrid approach in a knowledge engineering process to manage technical advice relating to feedback from the operation of complex sensitive equipment

    Authors: Alain Claude Hervé Berger, Sébastien Boblet, Thierry Cartié, Jean-Pierre Cotton, François Vexler

    Abstract: How can technical advice on operating experience feedback be managed efficiently in an organization that has never used knowledge engineering techniques and methods? This article explains how an industrial company in the nuclear and defense sectors adopted such an approach, adapted to its "TA KM" organizational context and falls within the ISO30401 framework, to build a complete system with a "SAR…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In French. 35es Journées francophones d'Ingénierie des Connaissances (IC 2024) @ Plate-Forme Intelligence Artificielle (PFIA 2024), Association Française pour l'Intelligence Artificielle; Laboratoire L3i La Rochelle Université, Jul 2024, La Rochelle, France

  2. arXiv:2403.11001  [pdf, other]

    eess.IV cs.CV cs.LG

    Topologically faithful multi-class segmentation in medical images

    Authors: Alexander H. Berger, Nico Stucki, Laurin Lux, Vincent Buergin, Suprosanna Shit, Anna Banaszak, Daniel Rueckert, Ulrich Bauer, Johannes C. Paetzold

    Abstract: Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenari…

    Submitted 16 March, 2024; originally announced March 2024.

  3. arXiv:2403.06601  [pdf, other]

    cs.CV cs.AI

    Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers

    Authors: Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhov, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold

    Abstract: Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model. Due to the complexity of this task, large training datasets are rare in many domains, which makes the training of large networks challenging. This data sparsity necessitates the establishment of pre-training strategies akin to the state-of-the-art in computer visio…

    Submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2401.05740  [pdf, ps, other]

    cs.GT

    An improved bound for the price of anarchy for related machine scheduling

    Authors: Andre Berger, Arman Rouhani, Marc Schröder

    Abstract: In this paper, we introduce an improved upper bound for the efficiency of Nash equilibria in utilitarian scheduling games on related machines. The machines have varying speeds and adhere to the Shortest Processing Time (SPT) policy as the global order for job processing. The goal of each job is to minimize its completion time, while the social objective is to minimize the sum of completion times.…

    Submitted 11 January, 2024; originally announced January 2024.
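The social objective named in this abstract, the sum of completion times when every machine processes its jobs in SPT order, can be evaluated for one fixed assignment with the minimal Python sketch below. It assumes integer job sizes and machine speeds, and that a job of size p runs for p / s time units on a machine of speed s; the function name and input layout are hypothetical. The paper's contribution is a bound on the price of anarchy, so this sketch only evaluates the objective and does not compute equilibria.

```python
from fractions import Fraction


def sum_of_completion_times(sizes, speeds, assignment):
    """Sum of job completion times when each machine runs its jobs in SPT order.

    Hypothetical input layout (an assumption, not the paper's notation):
      sizes[j]      integer size of job j
      speeds[i]     integer speed of machine i; job j takes sizes[j] / speeds[i]
      assignment[j] index of the machine that job j is assigned to
    """
    total = Fraction(0)
    for i, speed in enumerate(speeds):
        # Exact processing times of the jobs on machine i, shortest first (SPT).
        times = sorted(
            Fraction(sizes[j], speed) for j, m in enumerate(assignment) if m == i
        )
        elapsed = Fraction(0)
        for t in times:
            elapsed += t      # completion time of the current job
            total += elapsed  # accumulate the social objective
    return total
```

For example, sizes [3, 1, 2], speeds [1, 2] and assignment [0, 0, 1] give completion times 1 and 4 on machine 0 and 1 on machine 1, so the function returns 6. Using `Fraction` keeps completion times exact when sizes are not multiples of speeds.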

  5. arXiv:2311.03697  [pdf, other]

    cs.RO

    Towards Autonomous Crop Monitoring: Inserting Sensors in Cluttered Environments

    Authors: Moonyoung Lee, Aaron Berger, Dominic Guri, Kevin Zhang, Lisa Coffee, George Kantor, Oliver Kroemer

    Abstract: We present a contact-based phenotyping robot platform that can autonomously insert nitrate sensors into cornstalks to proactively monitor macronutrient levels in crops. This task is challenging because inserting such sensors requires sub-centimeter precision in an environment which contains high levels of clutter, lighting variation, and occlusion. To address these challenges, we develop a robust…

    Submitted 6 November, 2023; originally announced November 2023.

  6. arXiv:2308.08674  [pdf, other]

    cs.DS

    Approximating Min-Diameter: Standard and Bichromatic

    Authors: Aaron Berger, Jenny Kaufmann, Virginia Vassilevska Williams

    Abstract: The min-diameter of a directed graph $G$ is a measure of the largest distance between nodes. It is equal to the maximum min-distance $d_{min}(u,v)$ across all pairs $u,v \in V(G)$, where $d_{min}(u,v) = \min(d(u,v), d(v,u))$. Our work provides a $O(m^{1.426}n^{0.288})$-time $3/2$-approximation algorithm for min-diameter in DAGs, and a faster $O(m^{0.713}n)$-time almost-$3/2$-approximation variant.…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: ESA 2023
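To make the min-distance definition in this abstract concrete, the following Python sketch computes the min-diameter exactly on a small unweighted directed graph by running a BFS from every node, which takes O(n(n + m)) time. It is a brute-force baseline for checking small cases, not the approximation algorithm from the paper; the adjacency-dict input format and the function names are illustrative assumptions. Pairs in which neither node reaches the other get min-distance infinity, as happens for incomparable nodes in a DAG.

```python
import math
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source in a directed graph.

    adj maps each node to an iterable of its out-neighbors (assumed format).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def min_diameter(adj, nodes):
    """Exact min-diameter: max over pairs {u, v} of min(d(u, v), d(v, u))."""
    dist = {u: bfs_distances(adj, u) for u in nodes}
    best = 0
    for u, v in combinations(nodes, 2):
        # Unreachable directions count as infinite distance.
        d_min = min(dist[u].get(v, math.inf), dist[v].get(u, math.inf))
        best = max(best, d_min)
    return best
```

On the path 0 → 1 → 2 → 3 the result is 3, attained by the pair (0, 3); on a directed 3-cycle every pair has min-distance 1.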

  7. arXiv:2308.06111  [pdf, other]

    cs.CL cs.AI

    Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

    Authors: Lars Hillebrand, Armin Berger, Tobias Deußer, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Maren Pielka, David Leonhard, Christian Bauckhage, Rafet Sifa

    Abstract: Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial envir…

    Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted at DocEng 2023, 4 pages, 1 figure, 2 tables

  8. arXiv:2301.09545  [pdf, other]

    cs.HC cs.AI cs.LG

    The Entoptic Field Camera as Metaphor-Driven Research-through-Design with AI Technologies

    Authors: Jesse Josua Benjamin, Heidi Biggs, Arne Berger, Julija Rukanskaitė, Michael Heidt, Nick Merrill, James Pierce, Joseph Lindley

    Abstract: Artificial intelligence (AI) technologies are widely deployed in smartphone photography; and prompt-based image synthesis models have rapidly become commonplace. In this paper, we describe a Research-through-Design (RtD) project which explores this shift in the means and modes of image production via the creation and use of the Entoptic Field Camera. Entoptic phenomena usually refer to perceptions…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: To be published in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  9. Illuminating Large-Scale IPv6 Scanning in the Internet

    Authors: Philipp Richter, Oliver Gasser, Arthur Berger

    Abstract: While scans of the IPv4 space are ubiquitous, today little is known about scanning activity in the IPv6 Internet. In this work, we present a longitudinal and detailed empirical study on large-scale IPv6 scanning behavior in the Internet, based on firewall logs captured at some 230,000 hosts of a major Content Distribution Network (CDN). We develop methods to identify IPv6 scans, assess current and…

    Submitted 19 October, 2022; originally announced October 2022.

    Journal ref: in Proceedings of the ACM Internet Measurement Conference (IMC), 2022

  10. arXiv:2201.10491  [pdf]

    cs.HC cs.CY

    Playing The Ethics Card: Ethical Aspects In Design Tools For Inspiration And Education

    Authors: Albrecht Kurze, Arne Berger

    Abstract: This paper relates findings of own research in the domain of co-design tools in terms of ethical aspects and their opportunities for inspiration and in HCI education. We overview a number of selected general-purpose HCI/design tools as well as domain specific tools for the Internet of Things. These tools are often card-based, not only suitable for workshops with co-designers but also for internal…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: Workshop Co-designing Resources for Ethics Education in HCI at Conference on Human Factors in Computing Systems (CHI 21). May 9, 2021

  11. arXiv:2101.04035  [pdf, other]

    cs.HC cs.CY cs.LG

    Machine Learning Uncertainty as a Design Material: A Post-Phenomenological Inquiry

    Authors: Jesse Josua Benjamin, Arne Berger, Nick Merrill, James Pierce

    Abstract: Design research is important for understanding and interrogating how emerging technologies shape human experience. However, design research with Machine Learning (ML) is relatively underdeveloped. Crucially, designers have not found a grasp on ML uncertainty as a design opportunity rather than an obstacle. The technical literature points to data and model uncertainties as two main properties of ML…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted to ACM 2021 CHI Conference on Human Factors in Computing Systems (CHI 2021)

    ACM Class: H.5.0

  12. Towards Reconstructing Multi-Step Cyber Attacks in Modern Cloud Environments with Tripwires

    Authors: Mario Kahlhofer, Michael Hölzl, Andreas Berger

    Abstract: Rapidly-changing cloud environments that consist of heavily interconnected components are difficult to secure. Existing solutions often try to correlate many weak indicators to identify and reconstruct multi-step cyber attacks. The lack of a true, causal link between most of these indicators still leaves administrators with a lot of false-positives to browse through. We argue that cyber deception…

    Submitted 25 September, 2020; originally announced September 2020.

    Comments: To be published in European Interdisciplinary Cybersecurity Conference (EICC 2020)

  13. arXiv:2008.10709  [pdf, ps, other]

    cs.DS cs.DM

    Memoryless Worker-Task Assignment with Polylogarithmic Switching Cost

    Authors: Aaron Berger, William Kuszmaul, Adam Polak, Jonathan Tidor, Nicole Wein

    Abstract: We study the basic problem of assigning memoryless workers to tasks with dynamically changing demands. Given a set of $w$ workers and a multiset $T \subseteq[t]$ of $|T|=w$ tasks, a memoryless worker-task assignment function is any function $φ$ that assigns the workers $[w]$ to the tasks $T$ based only on the current value of $T$. The assignment function $φ$ is said to have switching cost at most…

    Submitted 28 April, 2022; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: ICALP 2022

  14. arXiv:2004.12195  [pdf, other]

    cs.DL cs.CL cs.HC

    QURATOR: Innovative Technologies for Content and Data Curation

    Authors: Georg Rehm, Peter Bourgonje, Stefanie Hegele, Florian Kintzel, Julián Moreno Schneider, Malte Ostendorff, Karolina Zaczynska, Armin Berger, Stefan Grill, Sören Räuchle, Jens Rauenbusch, Lisa Rutenburg, André Schmidt, Mikka Wild, Henry Hoffmann, Julian Fink, Sarah Schulz, Jurica Seva, Joachim Quantz, Joachim Böttger, Josefine Matthey, Rolf Fricke, Jan Thomsen, Adrian Paschke, Jamal Al Qundus, et al. (15 additional authors not shown)

    Abstract: In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession requires faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industr…

    Submitted 25 April, 2020; originally announced April 2020.

    Comments: Proceedings of QURATOR 2020: The conference for intelligent content solutions, Berlin, Germany, February 2020

  15. arXiv:1912.13273  [pdf]

    cs.HC

    From Ideation to Implications: Directions for the Internet of Things in the Home

    Authors: Albrecht Kurze, Arne Berger, Teresa Denefleh

    Abstract: In this paper we give a brief overview of our approaches and ongoing work for future directions of the Internet of Things (IoT) with a focus on the IoT in the home. We highlight some of our activities including tools and methods for an ideation-driven approach as well as for an implications-driven approach. We point to some findings of workshops and empirical field-studies. We show examples for ne…

    Submitted 31 December, 2019; originally announced December 2019.

    Comments: Proceedings of the CHI 2019 Workshop on New Directions for the IoT: Automate, Share, Build, and Care, (arXiv:1906.06089)

    Report number: IOTD/2019/05

  16. arXiv:1911.09890  [pdf, other]

    cs.DM cs.DS

    Degree-Bounded Generalized Polymatroids and Approximating the Metric Many-Visits TSP

    Authors: Kristóf Bérczi, André Berger, Matthias Mnich, Roland Vincze

    Abstract: In the Bounded Degree Matroid Basis Problem, we are given a matroid and a hypergraph on the same ground set, together with costs for the elements of that set as well as lower and upper bounds $f(\varepsilon)$ and $g(\varepsilon)$ for each hyperedge $\varepsilon$. The objective is to find a minimum-cost basis $B$ such that $f(\varepsilon) \leq |B \cap \varepsilon| \leq g(\varepsilon)$ for each hype…

    Submitted 14 December, 2019; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: 17 pages

  17. arXiv:1805.06265  [pdf, ps, other]

    cs.DC cs.DS

    Integrated Bounds for Disintegrated Storage

    Authors: Alon Berger, Idit Keidar, Alexander Spiegelman

    Abstract: We point out a somewhat surprising similarity between non-authenticated Byzantine storage, coded storage, and certain emulations of shared registers from smaller ones. A common characteristic in all of these is the inability of reads to safely return a value obtained in a single atomic access to shared storage. We collectively refer to such systems as disintegrated storage, and show integrated spa…

    Submitted 6 August, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

  18. arXiv:1804.06361  [pdf, other]

    cs.DS

    A time- and space-optimal algorithm for the many-visits TSP

    Authors: André Berger, László Kozma, Matthias Mnich, Roland Vincze

    Abstract: The many-visits traveling salesperson problem (MV-TSP) asks for an optimal tour of $n$ cities that visits each city $c$ a prescribed number $k_c$ of times. Travel costs may be asymmetric, and visiting a city twice in a row may incur a non-zero cost. The MV-TSP problem finds applications in scheduling, geometric approximation, and Hamiltonicity of certain graph families. The fastest known algorit…

    Submitted 21 April, 2020; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: Small fixes, journal version

  19. Algorithm guided outlining of 105 pancreatic cancer liver metastases in Ultrasound

    Authors: Alexander Hann, Lucas Bettac, Mark M. Haenle, Tilmann Graeter, Andreas W. Berger, Jens Dreyhaupt, Dieter Schmalstieg, Wolfram G. Zoller, Jan Egger

    Abstract: Manual segmentation of hepatic metastases in ultrasound images acquired from patients suffering from pancreatic cancer is common practice. Semiautomatic measurements promising assistance in this process are often assessed using a small number of lesions performed by examiners who already know the algorithm. In this work, we present the application of an algorithm for the segmentation of liver meta…

    Submitted 9 October, 2017; originally announced October 2017.

    Comments: 7 pages, 3 Figures, 3 Tables, 46 References

    Journal ref: Sci Rep. 2017 Oct 6;7(1):12779

  20. arXiv:1707.03900  [pdf, other]

    cs.NI

    kIP: a Measured Approach to IPv6 Address Anonymization

    Authors: David Plonka, Arthur Berger

    Abstract: Privacy-minded Internet service operators anonymize IPv6 addresses by truncating them to a fixed length, perhaps due to long-standing use of this technique with IPv4 and a belief that it's "good enough." We claim that simple anonymization by truncation is suspect since it does not entail privacy guarantees nor does it take into account some common address assignment practices observed today. To in…

    Submitted 12 July, 2017; originally announced July 2017.
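The baseline that this abstract questions, anonymizing an IPv6 address by truncating it to a fixed prefix length, takes only a few lines with Python's standard `ipaddress` module. This is a sketch of plain truncation, not of kIP; the function name is hypothetical and the default length of 48 bits is an arbitrary illustrative choice.

```python
import ipaddress


def truncate_ipv6(address: str, prefix_len: int = 48) -> str:
    """Anonymize an IPv6 address by zeroing every bit after the first prefix_len bits.

    Plain fixed-length truncation (the baseline criticized in the abstract);
    prefix_len=48 is an illustrative default, not a recommendation.
    """
    # strict=False lets host bits be set; they are masked off in network_address.
    network = ipaddress.IPv6Network(f"{address}/{prefix_len}", strict=False)
    return str(network.network_address)
```

For instance, `truncate_ipv6("2001:db8:1234:5678::1")` returns `"2001:db8:1234::"`, so every address under that /48 becomes indistinguishable after anonymization.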

  21. A unifying framework for fast randomization of ecological networks with fixed (node) degrees

    Authors: Corrie Jacobien Carstens, Annabell Berger, Giovanni Strona

    Abstract: The switching model is a Markov chain approach to sample graphs with fixed degree sequence uniformly at random. The recently invented Curveball algorithm for bipartite graphs applies several switches simultaneously ('trades'). Here, we introduce Curveball algorithms for simple (un)directed graphs which use single or simultaneous trades. We show experimentally that these algorithms converge magnitu…

    Submitted 26 July, 2018; v1 submitted 16 September, 2016; originally announced September 2016.

    Journal ref: Corrie Jacobien Carstens, Annabell Berger, Giovanni Strona, A unifying framework for fast randomization of ecological networks with fixed (node) degrees, MethodsX, Volume 5, 2018, Pages 773-780
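A single move of the switching model described in this abstract can be sketched as follows for a simple undirected graph: two edges {a, b} and {c, d} are rewired to {a, d} and {c, b}, which leaves every vertex degree unchanged, and the move is rejected if it would create a self-loop or a parallel edge. This is the basic one-switch step, not the Curveball trade the paper introduces; the edge representation (a set of frozensets of integer node labels) and the function name are assumptions made for the example.

```python
import random


def switch_step(edges, rng):
    """Attempt one switch on a simple undirected graph, preserving all degrees.

    edges: set of frozenset({u, v}) with comparable node labels (assumed format),
           modified in place; at least two edges are required.
    rng:   a random.Random instance.
    Returns True if the switch was applied, False if it was rejected.
    """
    # Sort for reproducibility under a seeded rng, then pick two distinct edges.
    ordered = sorted(tuple(sorted(e)) for e in edges)
    (a, b), (c, d) = rng.sample(ordered, 2)
    if rng.random() < 0.5:
        c, d = d, c  # choose one of the two possible rewirings at random
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    # Reject moves that would create a self-loop or a parallel edge.
    if a == d or c == b or new1 in edges or new2 in edges:
        return False
    edges -= {frozenset((a, b)), frozenset((c, d))}
    edges |= {new1, new2}
    return True
```

Repeating the step many times gives a random walk over graphs with the same degree sequence; how many repetitions suffice depends on the mixing time of the chain.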

  22. arXiv:1607.04597  [pdf, ps, other]

    math.CO cs.DM

    Query Complexity of Mastermind Variants

    Authors: Aaron Berger, Christopher Chute, Matthew Stone

    Abstract: We study variants of Mastermind, a popular board game in which the objective is sequence reconstruction. In this two-player game, the so-called codemaker constructs a hidden sequence $H = (h_1, h_2, \ldots, h_n)$ of colors selected from an alphabet $\mathcal{A} = \{1,2,\ldots, k\}$ (i.e., $h_i\in\mathcal{A}$ for all $i\in\{1,2,\ldots, n\}$). The game then proceeds in turns, each…

    Submitted 25 September, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

    Comments: Revised and trimmed- 17 pages

    MSC Class: 91A46; 68Q25
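Each turn of the game ends with the codemaker scoring the guess. The Python sketch below implements the classic Mastermind scoring rule (black pegs for a correct color in the correct position, white pegs for correct colors in the wrong position); the paper studies variants whose feedback may differ, so treat this function, including its name, as an illustrative assumption rather than the model analyzed there.

```python
from collections import Counter


def mastermind_feedback(hidden, guess):
    """Classic Mastermind response to a guess as (black, white) peg counts."""
    if len(hidden) != len(guess):
        raise ValueError("hidden and guess must have the same length")
    # Black pegs: positions where the guess matches the hidden sequence exactly.
    black = sum(h == g for h, g in zip(hidden, guess))
    # Colors shared regardless of position, counted with multiplicity.
    common = sum((Counter(hidden) & Counter(guess)).values())
    # White pegs: shared colors that are not already counted as black.
    return black, common - black
```

For example, with hidden sequence (1, 2, 3, 4) the guess (1, 3, 2, 5) scores one black and two white pegs.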

  23. arXiv:1606.04327  [pdf, other]

    cs.NI cs.AI cs.IT

    Entropy/IP: Uncovering Structure in IPv6 Addresses

    Authors: Pawel Foremski, David Plonka, Arthur Berger

    Abstract: In this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We p…

    Submitted 21 November, 2016; v1 submitted 14 June, 2016; originally announced June 2016.

    Comments: Paper presented at the ACM IMC 2016 in Santa Monica, USA (https://dl.acm.org/citation.cfm?id=2987445). Live Demo site available at http://www.entropy-ip.com/

    Journal ref: IMC '16 Proceedings of the 2016 ACM on Internet Measurement Conference, pp. 167-181
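As a concrete instance of an information-theoretic measure over IPv6 addresses, the sketch below computes the Shannon entropy of each of the 32 hexadecimal digits (nibbles) across a set of addresses, using only the Python standard library. Per-nibble entropy is an assumption about one plausible building block; the abstract does not spell out Entropy/IP's pipeline, and the function name is hypothetical. Positions with zero entropy are fixed across the sample, while high-entropy positions vary.

```python
import ipaddress
import math
from collections import Counter


def nibble_entropies(addresses):
    """Shannon entropy (in bits) of each of the 32 hex nibbles across IPv6 addresses."""
    # .exploded gives the full form, e.g. 2001:0db8:0000:...:0001; drop the colons.
    nibbles = [ipaddress.IPv6Address(a).exploded.replace(":", "") for a in addresses]
    n = len(nibbles)
    entropies = []
    for pos in range(32):
        counts = Counter(addr[pos] for addr in nibbles)
        # H = sum p * log2(1 / p) over the hex digits observed at this position.
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies
```

For the two addresses 2001:db8::1 and 2001:db8::2, only the last nibble varies, so it gets 1 bit of entropy and the other 31 positions get 0.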

  24. Beyond Counting: New Perspectives on the Active IPv4 Address Space

    Authors: Philipp Richter, Georgios Smaragdakis, David Plonka, Arthur Berger

    Abstract: In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the hi…

    Submitted 9 September, 2016; v1 submitted 1 June, 2016; originally announced June 2016.

    Comments: in Proceedings of ACM IMC 2016

  25. Marathon: An open source software library for the analysis of Markov-Chain Monte Carlo algorithms

    Authors: Steffen Rechner, Annabell Berger

    Abstract: In this paper, we consider the Markov-Chain Monte Carlo (MCMC) approach for random sampling of combinatorial objects. The running time of such an algorithm depends on the total mixing time of the underlying Markov chain and is unknown in general. For some Markov chains, upper bounds on this total mixing time exist but are too large to be applicable in practice. We try to answer the question, wheth…

    Submitted 14 September, 2016; v1 submitted 19 August, 2015; originally announced August 2015.

  26. Temporal and Spatial Classification of Active IPv6 Addresses

    Authors: David Plonka, Arthur Berger

    Abstract: There is striking volume of World-Wide Web activity on IPv6 today. In early 2015, one large Content Distribution Network handles 50 billion IPv6 requests per day from hundreds of millions of IPv6 client addresses; billions of unique client addresses are observed per month. Address counts, however, obscure the number of hosts with IPv6 connectivity to the global Internet. There are numerous address…

    Submitted 17 July, 2015; v1 submitted 26 June, 2015; originally announced June 2015.

  27. arXiv:1504.06779  [pdf, other]

    cs.CV stat.ML

    Computational Cost Reduction in Learned Transform Classifications

    Authors: Emerson Lopes Machado, Cristiano Jacques Miosso, Ricardo von Borries, Murilo Coutinho, Pedro de Azevedo Berger, Thiago Marques, Ricardo Pezzuol Jacobi

    Abstract: We present a theoretical analysis and empirical evaluations of a novel set of techniques for computational cost reduction of classifiers that are based on learned transform and soft-threshold. By modifying optimization procedures for dictionary and classifier training, as well as the resulting dictionary entries, our techniques allow to reduce the bit precision and to replace each floating-point m…

    Submitted 30 April, 2016; v1 submitted 25 April, 2015; originally announced April 2015.

  28. arXiv:1406.1605  [pdf, ps, other]

    cs.NI

    Energy Efficient and Reliable Wireless Sensor Networks - An Extension to IEEE 802.15.4e

    Authors: Achim Berger, Markus Pichler, Werner Haslmayr, Andreas Springer

    Abstract: Collecting sensor data in industrial environments from up to some tenth of battery powered sensor nodes with sampling rates up to 100Hz requires energy aware protocols, which avoid collisions and long listening phases. The IEEE 802.15.4 standard focuses on energy aware wireless sensor networks (WSNs) and the Task Group 4e has published an amendment to fulfill up to 100 sensor value transmissions p…

    Submitted 6 June, 2014; originally announced June 2014.

  29. arXiv:1404.4249  [pdf, other]

    cs.DM

    Broder's Chain Is Not Rapidly Mixing

    Authors: Annabell Berger, Steffen Rechner

    Abstract: We prove that Broder's Markov chain for approximate sampling near-perfect and perfect matchings is not rapidly mixing for Hamiltonian, regular, threshold and planar bipartite graphs, filling a gap in the literature. In the second part we experimentally compare Broder's chain with the Markov chain by Jerrum, Sinclair and Vigoda from 2004. For the first time, we provide a systematic experimental inv…

    Submitted 16 April, 2014; originally announced April 2014.

    Comments: Keywords: sampling of matchings, rapidly mixing Markov chains, permanent of a matrix, random generation, monomer-dimer systems, Markov chain Monte Carlo

  30. arXiv:1212.5443  [pdf, ps, other]

    math.CO cs.DM

    The Connection between the Number of Realizations for Degree Sequences and Majorization

    Authors: Annabell Berger

    Abstract: The graph realization problem is to find for given nonnegative integers $a_1,\dots,a_n$ a simple graph (no loops or multiple edges) such that each vertex $v_i$ has degree $a_i$. Given pairs of nonnegative integers $(a_1,b_1),\dots,(a_n,b_n)$, (i) the bipartite realization problem asks whether there is a bipartite graph (no loops or multiple edges) such that vectors $(a_1,\dots,a_n)$ and…

    Submitted 1 July, 2014; v1 submitted 21 December, 2012; originally announced December 2012.

    Comments: 30 pages. There was a mistake in case 3 and case 4 in the proof of the result of Proposition 10 (current version). I corrected it. For that I added a further result in Proposition 9.
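The graph realization problem defined in this abstract has a classical decision test, the Erdős–Gallai theorem, sketched below in Python. This only decides whether at least one simple graph realizes a degree sequence; it says nothing about the number of realizations or about majorization, which are the subject of the paper, and the function name is illustrative.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: can `degrees` be realized by a simple undirected graph?

    A sequence d_1 >= ... >= d_n is graphical iff its sum is even and, for every k,
    sum(d_1..d_k) <= k(k - 1) + sum_{i > k} min(d_i, k).  Runs in O(n^2) time.
    """
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    n = len(d)
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, (3, 3, 3, 3) is graphical (the complete graph on four vertices), while (3, 3, 1, 1) is not: both degree-3 vertices would have to be adjacent to both degree-1 vertices, giving those vertices degree 2.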

  31. arXiv:1203.3636  [pdf, other]

    cs.DS cs.DM

    How to Attack the NP-complete Dag Realization Problem in Practice

    Authors: Annabell Berger, Matthias Müller-Hannemann

    Abstract: We study the following fundamental realization problem of directed acyclic graphs (dags). Given a sequence $S := (a_1,b_1),\dots,(a_n,b_n)$ with $a_i, b_i \in \mathbb{Z}_0^+$, does there exist a dag (no parallel arcs allowed) with labeled vertex set $V := \{v_1,\dots,v_n\}$ such that for all $v_i \in V$ indegree and outdegree of $v_i$ match exactly the given numbers $a_i$ and $b_i$, respectively? Recently this decision problem ha…

    Submitted 16 March, 2012; originally announced March 2012.

    Comments: 20 pages, 11 figures, extended abstract to appear in Proceedings of SEA 2012

  32. arXiv:0912.0685  [pdf, ps, other]

    cs.DM cs.DS

    Uniform sampling of undirected and directed graphs with a fixed degree sequence

    Authors: Annabell Berger, Matthias Müller-Hannemann

    Abstract: Many applications in network analysis require algorithms to sample uniformly at random from the set of all graphs with a prescribed degree sequence. We present a Markov chain based approach which converges to the uniform distribution of all realizations for both the directed and undirected case. It remains an open challenge whether these Markov chains are rapidly mixing. For the case of direct…

    Submitted 5 March, 2010; v1 submitted 3 December, 2009; originally announced December 2009.

    ACM Class: F.2.2; G.2.2; G.2.3

  33. A Model of Lexical Attraction and Repulsion

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as co…

    Submitted 16 June, 1997; v1 submitted 12 June, 1997; originally announced June 1997.

    Comments: 8 pages, LaTeX source and postscript figures for ACL/EACL'97 paper

  34. Text Segmentation Using Exponential Models

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large co…

    Submitted 12 June, 1997; v1 submitted 11 June, 1997; originally announced June 1997.

    Comments: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper