Showing 1–23 of 23 results for author: Morozov, D

Searching in archive cs.
  1. arXiv:2407.15089 [pdf, other]

    physics.geo-ph cs.AI cs.LG

    Learning Physics for Unveiling Hidden Earthquake Ground Motions via Conditional Generative Modeling

    Authors: Pu Ren, Rie Nakata, Maxime Lacour, Ilan Naiman, Nori Nakata, Jialin Song, Zhengfa Bi, Osman Asif Malik, Dmitriy Morozov, Omri Azencot, N. Benjamin Erichson, Michael W. Mahoney

    Abstract: Predicting high-fidelity ground motions for future earthquakes is crucial for seismic hazard assessment and infrastructure resilience. Conventional empirical simulations suffer from sparse sensor distribution and geographically localized earthquake locations, while physics-based methods are computationally intensive and require accurate representations of Earth structures and earthquake sources. W…

    Submitted 21 July, 2024; originally announced July 2024.

  2. arXiv:2404.03591 [pdf, other]

    cs.DC

    Wilkins: HPC In Situ Workflows Made Easy

    Authors: Orcun Yildiz, Dmitriy Morozov, Arnur Nigmetov, Bogdan Nicolae, Tom Peterka

    Abstract: In situ approaches can accelerate the pace of scientific discoveries by allowing scientists to perform data analysis at simulation time. Current in situ workflow systems, however, face challenges in handling the growing complexity and diverse computational requirements of scientific tasks. In this work, we present Wilkins, an in situ workflow system that is designed for ease-of-use while providing…

    Submitted 4 April, 2024; originally announced April 2024.

  3. arXiv:2402.15734 [pdf, other]

    cs.LG stat.ML

    Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

    Authors: Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney

    Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insights for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding t…

    Submitted 13 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  4. arXiv:2312.10700 [pdf, other]

    cs.CL cs.AI cs.LG

    Cross-Domain Robustness of Transformer-based Keyphrase Generation

    Authors: Anna Glazkova, Dmitry Morozov

    Abstract: Modern models for text generation show state-of-the-art results in many natural language processing tasks. In this work, we explore the effectiveness of abstractive text summarization models for keyphrase selection. A list of keyphrases is an important element of a text in databases and repositories of electronic documents. In our experiments, abstractive text summarization models fine-tuned for k…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Presented at the XXV International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL), October 2023

    MSC Class: 68T50 ACM Class: I.2.7; I.7.m; H.3.3

  5. arXiv:2310.01698 [pdf, other]

    cs.LG stat.ML

    Robustifying State-space Models for Long Sequences via Approximate Diagonalization

    Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson

    Abstract: State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have c…

    Submitted 2 October, 2023; originally announced October 2023.

  6. arXiv:2306.00258 [pdf, other]

    cs.LG math.NA

    Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

    Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael Mahoney, Amir Gholami

    Abstract: Pre-trained machine learning (ML) models have shown great performance for a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 11 figures

    Journal ref: NeurIPS 2023

  7. arXiv:2301.10838 [pdf, other]

    cs.CG

    Fast Merge Tree Computation via SYCL

    Authors: Arnur Nigmetov, Dmitriy Morozov

    Abstract: A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram, identifying connected components, performing topological simplification, etc. Scientific computing relies more and more on GPUs to achieve fast, scalable computation. F…

    Submitted 27 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Topological Data Analysis and Visualization (TopoInVis) 2022 v2 -- corrected Acknowledgements

  8. Applying Transformer-based Text Summarization for Keyphrase Generation

    Authors: Anna Glazkova, Dmitry Morozov

    Abstract: Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction are aimed at the extraction of the most significant words in the text. But in practice, the list of keyphrases often includes words that do not appear in the text explicitly. In this case, the list of keyphrases represents an abstractive summary of the source text. In this pape…

    Submitted 6 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: 15 pages, 4 figures. DAMDID-2022

    MSC Class: 68T50 ACM Class: I.2.7; I.7.m; H.3.3

    Journal ref: Lobachevskii J Math 44, 123-136 (2023)

  9. arXiv:2203.16748 [pdf, other]

    cs.CG math.AT math.OC

    Topological Optimization with Big Steps

    Authors: Arnur Nigmetov, Dmitriy Morozov

    Abstract: Using persistent homology to guide optimization has emerged as a novel application of topological data analysis. Existing methods treat persistence calculation as a black box and backpropagate gradients only onto the simplices involved in particular pairs. We show how the cycles and chains used in the persistence calculation can be used to prescribe gradients to larger subsets of the domain. In pa…

    Submitted 2 November, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: 26 pages, 29 figures. Updated version (section on consistency of critical sets, more experiments) accepted to DCG

  10. arXiv:2112.03980 [pdf, other]

    cs.CG math.AT

    Output-sensitive Computation of Generalized Persistence Diagrams for 2-filtrations

    Authors: Dmitriy Morozov, Amit Patel

    Abstract: When persistence diagrams are formalized as the Möbius inversion of the birth-death function, they naturally generalize to the multi-parameter setting and enjoy many of the key properties, such as stability, that we expect in applications. The direct definition in the 2-parameter setting, and the corresponding brute-force algorithm to compute them, require $\Omega(n^4)$ operations. But the size of the…

    Submitted 16 May, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: Major revision. The exposition is greatly simplified and background section is expanded

  11. arXiv:2104.04739 [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    MIPT-NSU-UTMN at SemEval-2021 Task 5: Ensembling Learning with Pre-trained Language Models for Toxic Spans Detection

    Authors: Mikhail Kotyushev, Anna Glazkova, Dmitry Morozov

    Abstract: This paper describes our system for SemEval-2021 Task 5 on Toxic Spans Detection. We developed ensemble models using BERT-based neural architectures and post-processing to combine tokens into spans. We evaluated several pre-trained language models using various ensemble techniques for toxic span identification and achieved sizable improvements over our baseline fine-tuned BERT models. Finally, our…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: Accepted at SemEval-2021 Workshop, ACL-IJCNLP 2021

    MSC Class: 68T50 ACM Class: I.2.7; I.7.m; H.3.3

    Journal ref: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pp. 913-918, 2021

  12. arXiv:2011.05290 [pdf, other]

    cs.LG math.AT

    Topological Regularization via Persistence-Sensitive Optimization

    Authors: Arnur Nigmetov, Aditi S. Krishnapriyan, Nicole Sanderson, Dmitriy Morozov

    Abstract: Optimization, a key tool in machine learning and statistics, relies on regularization to reduce overfitting. Traditional regularization methods control a norm of the solution to ensure its smoothness. Recently, topological methods have emerged as a way to provide a more precise and expressive control over the solution, relying on persistent homology to quantify and reduce its roughness. All such e…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: The first two authors contributed equally to this work

  13. arXiv:2010.16027 [pdf, other]

    q-bio.BM cs.LG math.AT

    PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction

    Authors: Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick

    Abstract: Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: The first two authors contributed equally to this work

  14. arXiv:2010.00532 [pdf, other]

    cond-mat.mtrl-sci cs.LG math.AT physics.comp-ph

    Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks

    Authors: Aditi S. Krishnapriyan, Joseph Montoya, Maciej Haranczyk, Jens Hummelshøj, Dmitriy Morozov

    Abstract: Machine learning has emerged as a powerful approach in materials discovery. Its major challenge is selecting features that create interpretable representations of materials, useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material's structure and chemistry. This approach b…

    Submitted 31 March, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: 14 pages main text, 8 figures

  15. arXiv:2001.05972 [pdf, other]

    cond-mat.mtrl-sci cs.LG math.AT physics.comp-ph

    Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials

    Authors: Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov

    Abstract: Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few s…

    Submitted 6 March, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

    Comments: 14 pages, 7 figures

  16. arXiv:1910.14499 [pdf, other]

    eess.SY cs.LG

    Data-driven model for hydraulic fracturing design optimization: focus on building digital database and production forecast

    Authors: A. D. Morozov, D. O. Popkov, V. M. Duplyakov, R. F. Mutalova, A. A. Osiptsov, A. L. Vainshtein, E. V. Burnaev, E. V. Shel, G. V. Paderin

    Abstract: The growing number of hydraulic fracturing (HF) jobs over the past two decades has resulted in a significant amount of measured data available for the development of predictive models via machine learning (ML). In multistage fractured completions, post-fracturing production analysis reveals that different stages produce very non-uniformly due to a combination of geomechanics and fracturing design factors. Hen…

    Submitted 18 July, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

  17. arXiv:1809.09955 [pdf, other]

    cs.IR eess.SP

    Knowledge extraction, modeling and formalization: EEG case study

    Authors: Dmitry Morozov, Mario Lezoche, Hervé Panetto

    Abstract: Formal Concept Analysis (FCA) is a well-established method for data analysis which finds many applications in data mining. Its extension to complex data representation formats brought a wave of new applications to problems such as gene expression mining, prediction of the toxicity of chemical compounds, or clustering of sequences in process event logs. Inspired by this work, our research inherits…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: text overlap with arXiv:1506.05018 by other authors

  18. arXiv:1710.10769 [pdf, other]

    stat.ML cs.DC cs.LG

    Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

    Authors: Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, Sang-Yun Oh

    Abstract: Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CO…

    Submitted 8 April, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Main paper: 15 pages, appendix: 24 pages

    Journal ref: Artificial Intelligence and Statistics vol. 84 1376-1386 (2018)

  19. arXiv:1606.03357 [pdf, other]

    cs.CG

    Geometry Helps to Compare Persistence Diagrams

    Authors: Michael Kerber, Dmitriy Morozov, Arnur Nigmetov

    Abstract: Exploiting geometric structure to improve the asymptotic complexity of discrete assignment problems is a well-studied subject. In contrast, the practical advantages of using geometry for such problems have not been explored. We implement geometric variants of the Hopcroft–Karp algorithm for bottleneck matching (based on previous work by Efrat et al.) and of the auction algorithm by Bertsekas for…

    Submitted 10 June, 2016; originally announced June 2016.

    Comments: 20 pages, 10 figures; extended version of paper published in ALENEX 2016

    ACM Class: G.4; G.2.2

  20. Dualities in persistent (co)homology

    Authors: Vin de Silva, Dmitriy Morozov, Mikael Vejdemo-Johansson

    Abstract: We consider sequences of absolute and relative homology and cohomology groups that arise naturally for a filtered cell complex. We establish algebraic relationships between their persistence modules, and show that they contain equivalent information. We explain how one can use the existing algorithm for persistent homology to process any of the four modules, and relate it to a recently introduced…

    Submitted 28 July, 2011; originally announced July 2011.

    Comments: 16 pages, 3 figures, submitted to the Inverse Problems special issue on Topological Data Analysis

  21. arXiv:1102.4972 [pdf, ps, other]

    cs.CG

    Witnessed k-Distance

    Authors: Leonidas J. Guibas, Quentin Mérigot, Dmitriy Morozov

    Abstract: Distance function to a compact set plays a central role in several areas of computational geometry. Methods that rely on it are robust to the perturbations of the data by the Hausdorff noise, but fail in the presence of outliers. The recently introduced distance to a measure offers a solution by extending the distance function framework to reasoning about the geometry of probability measures, whil…

    Submitted 24 February, 2011; originally announced February 2011.

  22. arXiv:1102.3389 [pdf, ps, other]

    cs.CG math.AT

    Homology and Robustness of Level and Interlevel Sets

    Authors: Paul Bendich, Herbert Edelsbrunner, Dmitriy Morozov, Amit Patel

    Abstract: Given a function $f: \mathbb{X} \to \mathbb{R}$ on a topological space, we consider the preimages of intervals and their homology groups and show how to read the ranks of these groups from the extended persistence diagram of $f$. In addition, we quantify the robustness of the homology classes under perturbations of $f$ using well groups, and we show how to read the ranks of these groups from the same ex…

    Submitted 16 February, 2011; originally announced February 2011.

  23. arXiv:0911.2142 [pdf, ps, other]

    cs.CG math.GM

    Quantifying Transversality by Measuring the Robustness of Intersections

    Authors: Herbert Edelsbrunner, Dmitriy Morozov, Amit Patel

    Abstract: By definition, transverse intersections are stable under infinitesimal perturbations. Using persistent homology, we extend this notion to a measure. Given a space of perturbations, we assign to each homology class of the intersection its robustness, the magnitude of a perturbation in this space necessary to kill it, and prove that robustness is stable. Among the applications of this result is a…

    Submitted 20 April, 2010; v1 submitted 11 November, 2009; originally announced November 2009.