Showing 1–14 of 14 results for author: Hogg, D W

Searching in archive cs.
  1. arXiv:2405.18278  [pdf, other]

    astro-ph.EP astro-ph.IM cs.LG

    NotPlaNET: Removing False Positives from Planet Hunters TESS with Machine Learning

    Authors: Valentina Tardugno Poleo, Nora Eisner, David W. Hogg

    Abstract: Differentiating between real transit events and false positive signals in photometric time series data is a bottleneck in the identification of transiting exoplanets, particularly long-period planets. This differentiation typically requires visual inspection of a large number of transit-like signals to rule out instrumental and astrophysical false positives that mimic planetary transit signals. We…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Under review at The Astronomical Journal

  2. arXiv:2405.18095  [pdf, other]

    stat.ML astro-ph.IM cs.LG physics.data-an

    Is machine learning good or bad for the natural sciences?

    Authors: David W. Hogg, Soledad Villar

    Abstract: Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: A Position Paper accepted for publication in the 2024 International Conference on Machine Learning (ICML)

  3. arXiv:2305.12585  [pdf, other]

    cs.LG

    GeometricImageNet: Extending convolutional neural networks to vector and tensor images

    Authors: Wilson Gregory, David W. Hogg, Ben Blum-Smith, Maria Teresa Arias, Kaze W. K. Wong, Soledad Villar

    Abstract: Convolutional neural networks and their ilk have been very successful for many learning tasks involving images. These methods assume that the input is a scalar image representing the intensity in each pixel, possibly in multiple channels for color images. In natural-science domains however, image-like data sets might have vectors (velocity, say), tensors (polarization, say), pseudovectors (magneti…

    Submitted 21 May, 2023; originally announced May 2023.

  4. arXiv:2301.13724  [pdf, other]

    stat.ML astro-ph.IM cs.LG math-ph physics.data-an

    Towards fully covariant machine learning

    Authors: Soledad Villar, David W. Hogg, Weichi Yao, George A. Kevrekidis, Bernhard Schölkopf

    Abstract: Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led t…

    Submitted 28 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: substantial revision from v1; submitted to TMLR

  5. arXiv:2204.00887  [pdf, other]

    stat.ML cs.LG physics.data-an

    Dimensionless machine learning: Imposing exact units equivariance

    Authors: Soledad Villar, Weichi Yao, David W. Hogg, Ben Blum-Smith, Bianca Dumitrascu

    Abstract: Units equivariance (or units covariance) is the exact symmetry that follows from the requirement that relationships among measured quantities of physics relevance must obey self-consistent dimensional scalings. Here, we express this symmetry in terms of a (non-compact) group action, and we employ dimensional analysis and ideas from equivariant machine learning to provide a methodology for exactly…

    Submitted 31 December, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-32

  6. arXiv:2110.03761  [pdf, other]

    cs.LG

    A simple equivariant machine learning method for dynamics based on scalars

    Authors: Weichi Yao, Kate Storey-Fisher, David W. Hogg, Soledad Villar

    Abstract: Physical systems obey strict symmetry principles. We expect that machine learning methods that intrinsically respect these symmetries should have higher prediction accuracy and better generalization in prediction of physical dynamics. In this work we implement a principled model based on invariant scalars, and release open-source code. We apply this Scalars method to a simple chaotic dynamical sys…

    Submitted 30 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  7. arXiv:2106.06610  [pdf, other]

    cs.LG math-ph stat.ML

    Scalars are universal: Equivariant machine learning, structured like classical physics

    Authors: Soledad Villar, David W. Hogg, Kate Storey-Fisher, Weichi Yao, Ben Blum-Smith

    Abstract: There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries,…

    Submitted 7 February, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

    Journal ref: Advances in Neural Information Processing Systems, 34, 28848-28863. 2021

  8. arXiv:2101.07256  [pdf, other]

    physics.data-an astro-ph.IM cs.LG

    Fitting very flexible models: Linear regression with large numbers of parameters

    Authors: David W. Hogg, Soledad Villar

    Abstract: There are many uses for linear fitting; the context here is interpolation and denoising of data, as when you have calibration data and you want to fit a smooth, flexible function to those data. Or you want to fit a flexible function to de-trend a time series or normalize a spectrum. In these contexts, investigators often choose a polynomial basis, or a Fourier basis, or wavelets, or something equa…

    Submitted 15 January, 2021; originally announced January 2021.

    Comments: all code used to make the figures is available at https://github.com/davidwhogg/FlexibleLinearModels

  9. arXiv:2011.11477  [pdf, other]

    stat.ML cs.LG

    Dimensionality reduction, regularization, and generalization in overparameterized regressions

    Authors: Ningyuan Huang, David W. Hogg, Soledad Villar

    Abstract: Overparameterization in deep learning is powerful: Very large models fit the training data perfectly and yet often generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a "double-descent" behavior: (1) The risk (expected out-of-sample prediction error) can grow arbitrarily when the number o…

    Submitted 19 October, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Journal ref: SIAM Journal on Mathematics of Data Science Vol.4 Iss.1, 2022

  10. arXiv:1711.00028  [pdf, other]

    physics.ed-ph astro-ph.IM cs.CY

    Hack Weeks as a model for Data Science Education and Collaboration

    Authors: Daniela Huppenkothen, Anthony Arendt, David W. Hogg, Karthik Ram, Jake VanderPlas, Ariel Rokem

    Abstract: Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicabi…

    Submitted 31 October, 2017; originally announced November 2017.

    Comments: 15 pages, 2 figures, submitted to PNAS, all relevant code available at https://github.com/uwescience/HackWeek-Writeup

  11. arXiv:1505.03036  [pdf, other]

    stat.ML astro-ph.EP astro-ph.IM cs.LG

    Removing systematic errors for exoplanet search via latent causes

    Authors: Bernhard Schölkopf, David W. Hogg, Dun Wang, Daniel Foreman-Mackey, Dominik Janzing, Carl-Johann Simon-Gabriel, Jonas Peters

    Abstract: We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as half-sibling regression, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.

    Submitted 12 May, 2015; originally announced May 2015.

    Comments: Extended version of a paper appearing in the Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015

    ACM Class: G.3; I.2.6; J.2

  12. arXiv:1406.1528  [pdf, other]

    cs.CV astro-ph.IM

    Towards building a Crowd-Sourced Sky Map

    Authors: Dustin Lang, David W. Hogg, Bernhard Schölkopf

    Abstract: We describe a system that builds a high dynamic-range and wide-angle image of the night sky by combining a large set of input images. The method makes use of pixel-rank information in the individual input images to improve a "consensus" pixel rank in the combined image. Because it only makes use of ranks and the complexity of the algorithm is linear in the number of images, the method is useful fo…

    Submitted 5 June, 2014; originally announced June 2014.

    Comments: Appeared at AI-STATS 2014

    Journal ref: JMLR Workshop and Conference Proceedings, 33 (AI & Statistics 2014), 549

  13. arXiv:1401.2134  [pdf, other]

    cs.DL astro-ph.IM cs.CY

    10 Simple Rules for the Care and Feeding of Scientific Data

    Authors: Alyssa Goodman, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Mercè Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth, Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta Siemiginowska, Aleksandra Slavkovic

    Abstract: This article offers a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized. In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more, but our goal here is not to review…

    Submitted 9 January, 2014; originally announced January 2014.

    Comments: Accepted in PLOS Computational Biology. This paper was written collaboratively, on the web, in the open, using Authorea. The living version of this article, which includes sources and history, is available at http://www.authorea.com/3410/

  14. arXiv:0810.3851  [pdf, ps, other]

    astro-ph cs.CV physics.data-an

    Astronomical imaging: The theory of everything

    Authors: David W. Hogg, Dustin Lang

    Abstract: We are developing automated systems to provide homogeneous calibration meta-data for heterogeneous imaging data, using the pixel content of the image alone where necessary. Standardized and complete calibration meta-data permit generative modeling: A good model of the sky through wavelength and time--that is, a model of the positions, motions, spectra, and variability of all stellar sources, plu…

    Submitted 21 October, 2008; originally announced October 2008.

    Comments: a talk given at "Classification and Discovery in Large Astronomical Surveys", Ringberg Castle, 2008-10-16