-
Validating Climate Models with Spherical Convolutional Wasserstein Distance
Authors:
Robert C. Garrett,
Trevor Harris,
Bo Li,
Zhuo Wang
Abstract:
The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution…
▽ More
The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution of climate variables. We apply this method to evaluate the historical model outputs of the Coupled Model Intercomparison Project (CMIP) members by comparing them to observational and reanalysis data products. Additionally, we investigate the progression from CMIP phase 5 to phase 6 and find modest improvements in the phase 6 models regarding their ability to produce realistic climatologies.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
Learning Polynomial Problems with $SL(2,\mathbb{R})$ Equivariance
Authors:
Hannah Lawrence,
Mitchell Tong Harris
Abstract:
Optimizing and certifying the positivity of polynomials are fundamental primitives across mathematics and engineering applications, from dynamical systems to operations research. However, solving these problems in practice requires large semidefinite programs, with poor scaling in dimension and degree. In this work, we demonstrate for the first time that neural networks can effectively solve such…
▽ More
Optimizing and certifying the positivity of polynomials are fundamental primitives across mathematics and engineering applications, from dynamical systems to operations research. However, solving these problems in practice requires large semidefinite programs, with poor scaling in dimension and degree. In this work, we demonstrate for the first time that neural networks can effectively solve such problems in a data-driven fashion, achieving tenfold speedups while retaining high accuracy. Moreover, we observe that these polynomial learning problems are equivariant to the non-compact group $SL(2,\mathbb{R})$, which consists of area-preserving linear transformations. We therefore adapt our learning pipelines to accommodate this structure, including data augmentation, a new $SL(2,\mathbb{R})$-equivariant architecture, and an architecture equivariant with respect to its maximal compact subgroup, $SO(2, \mathbb{R})$. Surprisingly, the most successful approaches in practice do not enforce equivariance to the entire group, which we prove arises from an unusual lack of architecture universality for $SL(2,\mathbb{R})$ in particular. A consequence of this result, which is of independent interest, is that there exists an equivariant function for which there is no sequence of equivariant polynomials multiplied by arbitrary invariants that approximates the original function. This is a rare example of a symmetric problem where data augmentation outperforms a fully equivariant architecture, and provides interesting lessons in both theory and practice for other problems with non-compact symmetries.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints
Authors:
Mitchell Tong Harris,
Pierre-David Letourneau,
Dalton Jones,
M. Harper Langston
Abstract:
We present an efficient framework for solving constrained global non-convex polynomial optimization problems. We prove the existence of an equivalent nonlinear reformulation of such problems that possesses essentially no spurious local minima. We show through numerical experiments that polynomial scaling in dimension and degree is achievable for computing the optimal value and location of previous…
▽ More
We present an efficient framework for solving constrained global non-convex polynomial optimization problems. We prove the existence of an equivalent nonlinear reformulation of such problems that possesses essentially no spurious local minima. We show through numerical experiments that polynomial scaling in dimension and degree is achievable for computing the optimal value and location of previously intractable global constrained polynomial optimization problems in high dimension.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study
Authors:
Pedram Agand,
Allison Kennedy,
Trevor Harris,
Chanwoo Bae,
Mo Chen,
Edward J Park
Abstract:
As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships requires accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the…
▽ More
As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships requires accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the operational data in real-time. This paper presents models that can predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods prevent over-fitting, missing data, and multicollinearity while providing practical applicability. Prediction models that were investigated include multiple linear regression (MLR), decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance was from a model developed using the XGboost technique which is a boosting ensemble approach. \rvv{Our code is available on GitHub at \url{https://github.com/pagand/model_optimze_vessel/tree/OE} for future research.
△ Less
Submitted 23 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data
Authors:
Adam Tonks,
Trevor Harris,
Bo Li,
William Brown,
Rebecca Smith
Abstract:
Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially awa…
▽ More
Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially aware graph neural network model consisting of GraphSAGE layers to forecast the presence of West Nile virus in Illinois, to aid mosquito surveillance and abatement efforts within the state. More generally, we show that graph neural networks applied to irregularly sampled geospatial data can exceed the performance of a range of baseline methods including logistic regression, XGBoost, and fully-connected neural networks.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution
Authors:
Keshav Santhanam,
Siddharth Krishna,
Ryota Tomioka,
Tim Harris,
Matei Zaharia
Abstract:
The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup…
▽ More
The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies including pipeline parallelism with arbitrary schedules. Our evaluation on MLP training and GPT-2 inference models demonstrates how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations, reducing optimization time by an order of magnitude for certain regimes.
△ Less
Submitted 9 November, 2021;
originally announced November 2021.
-
Modeling memory bandwidth patterns on NUMA machines with performance counters
Authors:
Daniel Goodman,
Roni Haecki,
Tim Harris
Abstract:
Computers used for data analytics are often NUMA systems with multiple sockets per machine, multiple cores per socket, and multiple thread contexts per core. To get the peak performance out of these machines requires the correct number of threads to be placed in the correct positions on the machine. One particularly interesting element of the placement of memory and threads is the way it effects t…
▽ More
Computers used for data analytics are often NUMA systems with multiple sockets per machine, multiple cores per socket, and multiple thread contexts per core. To get the peak performance out of these machines requires the correct number of threads to be placed in the correct positions on the machine. One particularly interesting element of the placement of memory and threads is the way it effects the movement of data around the machine, and the increased latency this can introduce to reads and writes. In this paper we describe work on modeling the bandwidth requirements of an application on a NUMA compute node based on the placement of threads. The model is parameterized by sampling performance counters during 2 application runs with carefully chosen thread placements. Evaluating the model with thousands of measurements shows a median difference from predictions of 2.34% of the bandwidth. The results of this modeling can be used in a number of ways varying from: Performance debugging during development where the programmer can be alerted to potentially problematic memory access patterns; To systems such as Pandia which take an application and predict the performance and system load of a proposed thread count and placement; To libraries of data structures such as Parallel Collections and Smart Arrays that can abstract from the user memory placement and thread placement issues when parallelizing code.
△ Less
Submitted 15 June, 2021;
originally announced June 2021.
-
ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition
Authors:
Daniela Massiceti,
Luisa Zintgraf,
John Bronskill,
Lida Theodorou,
Matthew Tobias Harris,
Edward Cutrell,
Cecily Morrison,
Katja Hofmann,
Simone Stumpf
Abstract:
Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variatio…
▽ More
Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset and benchmark, grounded in the real-world application of teachable object recognizers for people who are blind/low-vision. The dataset contains 3,822 videos of 486 objects recorded by people who are blind/low-vision on their mobile phones. The benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions. We set the benchmark's first state-of-the-art and show there is massive scope for further innovation, holding the potential to impact a broad range of real-world vision applications including tools for the blind/low-vision community. We release the dataset at https://doi.org/10.25383/city.14294597 and benchmark code at https://github.com/microsoft/ORBIT-Dataset.
△ Less
Submitted 8 October, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Persistent Memory Transactions
Authors:
Virendra Marathe,
Achin Mishra,
Amee Trivedi,
Yihe Huang,
Faisal Zaghloul,
Sanidhya Kashyap,
Margo Seltzer,
Tim Harris,
Steve Byan,
Bill Bridge,
Dave Dice
Abstract:
This paper presents a comprehensive analysis of performance trade offs between implementation choices for transaction runtime systems on persistent memory. We compare three implementations of transaction runtimes: undo logging, redo logging, and copy-on-write. We also present a memory allocator that plugs into these runtimes. Our microbenchmark based evaluation focuses on understanding the interpl…
▽ More
This paper presents a comprehensive analysis of performance trade offs between implementation choices for transaction runtime systems on persistent memory. We compare three implementations of transaction runtimes: undo logging, redo logging, and copy-on-write. We also present a memory allocator that plugs into these runtimes. Our microbenchmark based evaluation focuses on understanding the interplay between various factors that contribute to performance differences between the three runtimes -- read/write access patterns of workloads, size of the persistence domain (portion of the memory hierarchy where the data is effectively persistent), cache locality, and transaction runtime bookkeeping overheads. No single runtime emerges as a clear winner. We confirm our analysis in more realistic settings of three "real world" applications we developed with our transactional API: (i) a key-value store we implemented from scratch, (ii) a SQLite port, and (iii) a persistified version of memcached, a popular key-value store. These findings are not only consistent with our microbenchmark analysis, but also provide additional interesting insights into other factors (e.g. effects of multithreading and synchronization) that affect application performance.
△ Less
Submitted 9 March, 2018;
originally announced April 2018.
-
The End of a Myth: Distributed Transactions Can Scale
Authors:
Erfan Zamanian,
Carsten Binnig,
Tim Kraska,
Tim Harris
Abstract:
The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? There would be no need for developers anymore to worry about co-partitioning schemes to achieve decent performance. Application development would become easier as data placement would no longer de…
▽ More
The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? There would be no need for developers anymore to worry about co-partitioning schemes to achieve decent performance. Application development would become easier as data placement would no longer determine how scalable an application is. Hardware provisioning would be simplified as the system administrator can expect a linear scale-out when adding more machines rather than some complex sub-linear function, which is highly application specific.
In this paper, we present the design of our novel scalable database system NAM-DB and show that distributed transactions with the very common Snapshot Isolation guarantee can indeed scale using the next generation of RDMA-enabled network technology without any inherent bottlenecks. Our experiments with the TPC-C benchmark show that our system scales linearly to over 6.5 million new-order (14.5 million total) distributed transactions per second on 56 machines.
△ Less
Submitted 21 November, 2016; v1 submitted 3 July, 2016;
originally announced July 2016.
-
The Influence of Malloc Placement on TSX Hardware Transactional Memory
Authors:
Dave Dice,
Tim Harris,
Alex Kogan,
Yossi Lev
Abstract:
The hardware transactional memory (HTM) implementation in Intel's i7-4770 "Haswell" processor tracks the transactional read-set in the L1 (level-1), L2 (level-2) and L3 (level-3) caches and the write-set in the L1 cache. Displacement or eviction of read-set entries from the cache hierarchy or write-set entries from the L1 results in abort. We show that the placement policies of dynamic storage all…
▽ More
The hardware transactional memory (HTM) implementation in Intel's i7-4770 "Haswell" processor tracks the transactional read-set in the L1 (level-1), L2 (level-2) and L3 (level-3) caches and the write-set in the L1 cache. Displacement or eviction of read-set entries from the cache hierarchy or write-set entries from the L1 results in abort. We show that the placement policies of dynamic storage allocators -- such as those found in common "malloc" implementations -- can influence the L1 conflict miss rate in the L1. Conflict misses -- sometimes called mapping misses -- arise because of less than ideal associativity and represent imbalanced distribution of active memory blocks over the set of available L1 indices. Under transactional execution conflict misses may manifest as aborts, representing wasted or futile effort instead of a simple stall as would occur in normal execution mode.
Furthermore, when HTM is used for transactional lock elision (TLE), persistent aborts arising from conflict misses can force the offending thread through the so-called "slow path". The slow path is undesirable as the thread must acquire the lock and run the critical section in normal execution mode, precluding the concurrent execution of threads in the "fast path" that monitor that same lock and run their critical sections in transactional mode. For a given lock, multiple threads can concurrently use the transactional fast path, but at most one thread can use the non-transactional slow path at any given time. Threads in the slow path preclude safe concurrent fast path execution. Aborts rising from placement policies and L1 index imbalance can thus result in loss of concurrency and reduced aggregate throughput.
△ Less
Submitted 29 April, 2015; v1 submitted 17 April, 2015;
originally announced April 2015.
-
Hardware extensions to make lazy subscription safe
Authors:
Dave Dice,
Timothy L. Harris,
Alex Kogan,
Yossi Lev,
Mark Moir
Abstract:
Transactional Lock Elision (TLE) uses Hardware Transactional Memory (HTM) to execute unmodified critical sections concurrently, even if they are protected by the same lock. To ensure correctness, the transactions used to execute these critical sections "subscribe" to the lock by reading it and checking that it is available. A recent paper proposed using the tempting "lazy subscription" optimi…
▽ More
Transactional Lock Elision (TLE) uses Hardware Transactional Memory (HTM) to execute unmodified critical sections concurrently, even if they are protected by the same lock. To ensure correctness, the transactions used to execute these critical sections "subscribe" to the lock by reading it and checking that it is available. A recent paper proposed using the tempting "lazy subscription" optimization for a similar technique in a different context, namely transactional systems that use a single global lock (SGL) to protect all transactional data. We identify several pitfalls that show that lazy subscription \emph{is not safe} for TLE because unmodified critical sections executing before subscribing to the lock may behave incorrectly in a number of subtle ways. We also show that recently proposed compiler support for modifying transaction code to ensure subscription occurs before any incorrect behavior could manifest is not sufficient to avoid all of the pitfalls we identify. We further argue that extending such compiler support to avoid all pitfalls would add substantial complexity and would usually limit the extent to which subscription can be deferred, undermining the effectiveness of the optimization. Hardware extensions suggested in the recent proposal also do not address all of the pitfalls we identify. In this extended version of our WTTM 2014 paper, we describe hardware extensions that make lazy subscription safe, both for SGL-based transactional systems and for TLE, without the need for special compiler support. We also explain how nontransactional loads can be exploited, if available, to further enhance the effectiveness of lazy subscription.
△ Less
Submitted 23 July, 2014;
originally announced July 2014.