-
Accelerating Dedispersion using Many-Core Architectures
Authors:
Jan Novotný,
Karel Adámek,
M. A. Clark,
Mike Giles,
Wesley Armour
Abstract:
Astrophysical radio signals are excellent probes of the extreme physical processes that emit them. However, to reach Earth, electromagnetic radiation passes through the ionised interstellar medium (ISM), which introduces a frequency-dependent time delay (dispersion) into the emitted signal. Removing dispersion enables searches for transient signals such as Fast Radio Bursts (FRBs) and for repeating signals from isolated pulsars or from pulsars in orbit around other compact objects. The sheer volume and high resolution of the data that next-generation radio telescopes will produce require High-Performance Computing (HPC) solutions and algorithms in time-domain data processing pipelines if scientifically valuable results are to be extracted in real time. This paper presents a state-of-the-art implementation of brute-force incoherent dedispersion on NVIDIA GPUs and on Intel and AMD CPUs. We show that our implementation is 4x faster than other available solutions (for 8-bit input with 8192 channels) and demonstrate, using 11 existing telescopes, that it is at least 20x faster than real time. This work is part of the AstroAccelerate package.
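To make the brute-force approach concrete, here is a minimal pure-Python sketch of incoherent dedispersion; it is not the AstroAccelerate GPU code. It assumes the standard cold-plasma delay Δt = k_DM · DM · (f⁻² − f_ref⁻²) with k_DM ≈ 4.148808 × 10³ s MHz² pc⁻¹ cm³, delays referenced to the highest channel and rounded to whole samples; the names `delay_samples` and `dedisperse` are hypothetical. For every trial DM, each channel is shifted by its delay and the channels are summed.

```python
# Minimal brute-force incoherent dedispersion sketch (pure Python, illustrative).
# Assumptions: data[c][t] is detected power of channel c at sample t, channel
# frequencies are in MHz, and delays are referenced to the highest frequency.

K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3


def delay_samples(dm, f_mhz, f_ref_mhz, tsamp):
    """Dispersion delay of channel f relative to f_ref, in whole samples."""
    return round(K_DM * dm * (f_mhz ** -2 - f_ref_mhz ** -2) / tsamp)


def dedisperse(data, freqs_mhz, tsamp, dm_trials):
    """Return one dedispersed time series per trial DM.

    For each trial, every channel is shifted earlier by its dispersion delay
    and all channels are summed. Only samples for which every shifted index
    stays inside the block are produced.
    """
    f_ref = max(freqs_mhz)
    nsamp = len(data[0])
    out = []
    for dm in dm_trials:
        shifts = [delay_samples(dm, f, f_ref, tsamp) for f in freqs_mhz]
        nvalid = max(nsamp - max(shifts), 0)
        series = [0.0] * nvalid
        for row, s in zip(data, shifts):
            for t in range(nvalid):
                series[t] += row[t + s]
        out.append(series)
    return out


# Usage: one pulse dispersed with DM = 50 pc cm^-3 across 16 channels.
freqs = [1500.0 - 20.0 * c for c in range(16)]   # 1500 ... 1200 MHz
tsamp = 1e-3                                     # 1 ms sampling
data = [[0.0] * 200 for _ in freqs]
for c, f in enumerate(freqs):
    data[c][20 + delay_samples(50.0, f, freqs[0], tsamp)] = 1.0

trials = [10.0 * i for i in range(11)]           # 0, 10, ..., 100
series = dedisperse(data, freqs, tsamp, trials)
peaks = [max(s) for s in series]
best = trials[peaks.index(max(peaks))]
print(best, series[trials.index(best)][20])      # prints: 50.0 16.0
```

A GPU implementation parallelises the same loops over DM trials and time samples; the per-channel shifts are what make memory access patterns the main optimisation target.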
Submitted 9 November, 2023;
originally announced November 2023.
-
Implementing CUDA Streams into AstroAccelerate -- A Case Study
Authors:
Jan Novotný,
Karel Adámek,
Wes Armour
Abstract:
To run tasks asynchronously on NVIDIA GPUs, a programmer must explicitly implement asynchronous execution in their code using CUDA streams. Streams allow a programmer to launch independent, concurrently executing tasks, making it possible to use different functional units of the GPU asynchronously. For example, by placing different tasks in different CUDA streams, it is possible to transfer the results of a previous computation, performed on input data n-1, over the PCIe bus whilst computing the result for input data n. The benefit of this approach is that the time taken by data transfers between the host and the device can be hidden behind computation. This case study deals with the implementation of CUDA streams in AstroAccelerate, a GPU-accelerated real-time signal processing pipeline for time-domain radio astronomy.
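The benefit of overlapping transfers with computation can be estimated with a simple flow-shop timing model. The sketch below is hypothetical and contains no CUDA calls: it assumes that host-to-device copies, kernels and device-to-host copies each run on an independent hardware unit, so that with one stream per chunk the stages of different chunks can overlap. Under this assumption the pipelined time approaches the sum of the stage times plus (n − 1) times the slowest stage, instead of n times the sum.

```python
# Timing model of a chunked GPU pipeline (illustrative only, no CUDA).
# Assumption: H2D copy, kernel and D2H copy each run on their own hardware
# unit, so stage s of chunk i may overlap with other stages of other chunks.


def serial_time(n_chunks, stage_times):
    """Single stream: the stages of each chunk run strictly back to back."""
    return n_chunks * sum(stage_times)


def pipelined_time(n_chunks, stage_times):
    """Overlapped execution: a stage starts once this chunk has finished the
    previous stage and the previous chunk has released this stage's unit."""
    finish = [0.0] * len(stage_times)  # finish time of the latest chunk per stage
    for _ in range(n_chunks):
        ready = 0.0  # completion time of this chunk's previous stage
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


# Usage: 10 chunks with H2D = 2 ms, kernel = 5 ms, D2H = 2 ms.
stages = (2.0, 5.0, 2.0)
print(serial_time(10, stages), pipelined_time(10, stages))  # prints: 90.0 54.0
```

Here the kernel is the slowest stage, so once the pipeline is full the copies are hidden entirely and the total is 2 + 10 × 5 + 2 ms.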
Submitted 6 May, 2021; v1 submitted 4 January, 2021;
originally announced January 2021.
-
Polytropic spheres modelling dark matter halos of dwarf galaxies
Authors:
Jan Novotný,
Zdeněk Stuchlík,
Jan Hladík
Abstract:
Dwarf galaxies and their dark matter (DM) halos have velocity curves of a different character from those of large galaxies. They are modelled by a simple pseudo-isothermal model containing only two parameters, which does not provide insight into the physics of the DM halo. We aim to gain some insight into the physical conditions in the DM halos of dwarf galaxies by using a simple, physically based model of DM halos. To treat the diversity of dwarf galaxy velocity profiles in a unified framework, we apply polytropic spheres, characterised by the polytropic index $n$ and the relativistic parameter $\sigma$, as a model of dwarf-galaxy DM halos, and we match the velocity of circular geodesics of the polytropes to the velocity curves observed in the dwarf galaxies of the LITTLE THINGS ensemble. We introduce three classes of LITTLE THINGS dwarf galaxies, in accordance with the polytrope models, based on the different character of their velocity profiles. The first class corresponds to polytropes with $n < 1$, whose velocity increases linearly along the whole profile; the second class has $1 < n < 2$, and the velocity profile becomes flat in the external region; the third class has $n > 2$, and the velocity profile reaches a maximum and then declines in the external region. The $\sigma$ parameter must be strongly non-relativistic ($\sigma < 10^{-8}$) for all dwarf galaxy models; it varies among the models within each class, but these variations have a negligible influence on the character of the velocity profile. Our results indicate the possibility that at least two different kinds of dark matter make up DM halos. The fits to the observed velocity curves are of the same quality as those obtained with the pseudo-isothermal, core-like models of dwarf galaxy DM halos.
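In the strongly non-relativistic limit ($\sigma \ll 1$) reported above, relativistic polytropes reduce to Newtonian ones described by the Lane-Emden equation $\theta'' + (2/\xi)\theta' + \theta^n = 0$. The following sketch integrates that Newtonian equation, not the authors' general relativistic equations, and evaluates the circular velocity $v^2 = GM(r)/r$; the function names are illustrative. For $n = 0$ it gives a velocity that grows exactly linearly with radius, in line with the first class described above.

```python
import math


def lane_emden(n, h=1e-3, xi_max=50.0):
    """Integrate theta'' + (2/xi) theta' + theta**n = 0 with classical RK4.

    Starts from the series theta ~ 1 - xi^2/6 near the centre and stops at
    the first zero of theta (the polytrope surface) or at xi_max.
    Returns lists of xi, theta and dtheta/dxi.
    """
    def rhs(xi, th, dth):
        return dth, -max(th, 0.0) ** n - 2.0 * dth / xi

    xi, th, dth = h, 1.0 - h * h / 6.0, -h / 3.0
    xs, ths, dths = [xi], [th], [dth]
    while th > 0.0 and xi < xi_max:
        a1, b1 = rhs(xi, th, dth)
        a2, b2 = rhs(xi + h / 2, th + h / 2 * a1, dth + h / 2 * b1)
        a3, b3 = rhs(xi + h / 2, th + h / 2 * a2, dth + h / 2 * b2)
        a4, b4 = rhs(xi + h, th + h * a3, dth + h * b3)
        th += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        dth += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        xi += h
        xs.append(xi)
        ths.append(th)
        dths.append(dth)
    return xs, ths, dths


def surface(xs, ths):
    """First zero of theta, by linear interpolation over the last step."""
    x0, x1, t0, t1 = xs[-2], xs[-1], ths[-2], ths[-1]
    return x0 + (x1 - x0) * t0 / (t0 - t1)


def circular_velocity(xi, dtheta):
    """Newtonian circular speed v = sqrt(G M(r) / r) in units of
    sqrt(4 pi G rho_c a^2), using M(xi) = 4 pi rho_c a^3 (-xi^2 theta')."""
    return math.sqrt(-xi * dtheta)


# Usage: known first zeros are sqrt(6) = 2.449, pi = 3.142 and 6.897.
for n in (0, 1, 3):
    xs, ths, dths = lane_emden(n)
    print(n, round(surface(xs, ths), 3))
```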
Submitted 25 January, 2021; v1 submitted 4 January, 2021;
originally announced January 2021.
-
Development of production-ready GPU data processing pipeline software for AstroAccelerate
Authors:
Cees Carels,
Karel Adámek,
Jan Novotný,
Wesley Armour
Abstract:
Upcoming large-scale telescope projects such as the Square Kilometre Array (SKA) will produce high data rates and large data volumes, requiring tools that can analyse telescope event data quickly and accurately. In modern radio telescopes, analysis software forms a core part of the data readout, and long-term software stability and maintainability are essential. AstroAccelerate is a many-core accelerated software package that uses NVIDIA(R) GPUs to perform real-time analysis of radio telescope data, and it has been shown to be substantially faster than real time at processing simulated SKA-like data. AstroAccelerate contains optimised GPU implementations of signal processing tools used in radio astronomy, including dedispersion, Fourier domain acceleration search, single pulse detection, and others. This article describes the transformation of AstroAccelerate from a C-like prototype code into a production-ready software library with a C++ API and a Python interface, while preserving compatibility with legacy software implemented in C. The design of the software library interfaces, refactoring aspects, and coding techniques are discussed.
Submitted 16 January, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Searching for pulsars in extreme orbits -- GPU acceleration of the Fourier domain 'jerk' search
Authors:
Karel Adámek,
Jan Novotný,
Sofia Dimoudi,
Wesley Armour
Abstract:
Binary pulsars are an important target for radio surveys because they present a natural laboratory for a wide range of astrophysics, for example tests of general relativity, including the detection of gravitational waves. The orbital motion of a pulsar in a binary system causes a frequency shift (a Doppler shift) in its normally very periodic pulse emission. These shifts reduce the sensitivity of traditional periodicity searches. To correct this smearing, Ransom [2001] and Ransom et al. [2002] developed the Fourier domain acceleration search (FDAS), which uses a matched filtering technique. However, this method is limited to a constant pulsar acceleration. Andersen and Ransom [2018] therefore extended the Fourier domain acceleration search to also account for a linear change in the acceleration by implementing the Fourier domain "jerk" search in the PRESTO software package. This extension significantly increases the number of matched filters used. We have implemented the Fourier domain "jerk" search (JERK) on GPUs using CUDA and achieved a 90x performance increase compared to the parallel implementation of JERK in PRESTO. This work is part of the AstroAccelerate project Armour et al. [2019], a many-core accelerated time-domain signal processing library for radio astronomy.
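Why a drifting frequency loses sensitivity, and why a bank of templates recovers it, can be illustrated with a small sketch. It is not FDAS or the GPU JERK code: instead of Fourier-domain matched filters, it removes each trial drift by time-domain demodulation and then applies a plain DFT. We assume the drift is parameterised, as in PRESTO-style searches, by a starting Fourier bin $r_0$, a drift of $z$ bins over the observation and a change $w$ in that drift; all function names are hypothetical. Every extra trial value of $w$ multiplies the number of templates, which is the growth in matched filters mentioned above.

```python
import cmath
import math

# Illustrative template search over (z, w); NOT the FDAS/JERK algorithm.
# Assumed drift model: r(u) = r0 + z*u + w*u**2/2 Fourier bins, u = k/n.


def chirp_phase(k, n, z, w):
    """Phase (in cycles) accumulated by the drift terms at sample k."""
    return z * k**2 / (2 * n**2) + w * k**3 / (6 * n**3)


def make_signal(n, r0, z, w):
    """Unit-amplitude complex tone starting at bin r0 with drift (z, w)."""
    return [cmath.exp(2j * math.pi * (r0 * k / n + chirp_phase(k, n, z, w)))
            for k in range(n)]


def dft_power(x):
    """Power in every Fourier bin via a direct O(n^2) DFT."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * r * k / n)
                    for k in range(n))) ** 2
            for r in range(n)]


def jerk_search(x, z_trials, w_trials):
    """Demodulate every (z, w) template; return (power, r, z, w) of the best peak."""
    n = len(x)
    best = (-1.0, 0, 0, 0)
    for z in z_trials:
        for w in w_trials:
            y = [x[k] * cmath.exp(-2j * math.pi * chirp_phase(k, n, z, w))
                 for k in range(n)]
            power = dft_power(y)
            r = max(range(n), key=power.__getitem__)
            if power[r] > best[0]:
                best = (power[r], r, z, w)
    return best


# Usage: a tone drifting by z = 20 bins with w = 10 over the record.
n = 128
x = make_signal(n, r0=30, z=20, w=10)
print(max(dft_power(x)) / n**2)               # smeared peak, well below 1
power, r, z, w = jerk_search(x, range(0, 33, 4), (0, 10, 20))
print(round(power / n**2, 6), r, z, w)        # prints: 1.0 30 20 10
```

With the matching template all $n$ samples add coherently in one bin, so the normalised peak power returns to 1; without it the power is spread over roughly $z + w/2$ bins.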
Submitted 4 November, 2019;
originally announced November 2019.
-
General relativistic polytropes with a repulsive cosmological constant
Authors:
Zdeněk Stuchlík,
Stanislav Hledík,
Jan Novotný
Abstract:
Spherically symmetric equilibrium configurations of perfect fluid obeying a polytropic equation of state are studied in spacetimes with a repulsive cosmological constant. The configurations are specified by three parameters: the polytropic index $n$, the ratio of the central pressure to the central energy density of matter $\sigma$, and the ratio of the energy density of vacuum to the central density of matter $\lambda$. The static equilibrium configurations are determined by two coupled first-order nonlinear differential equations, which are solved numerically except for polytropes with $n=0$; these correspond to configurations with a uniform distribution of energy density, for which the solution is given in terms of elementary functions. The geometry of the polytropes is conveniently represented by embedding diagrams of both the ordinary space geometry and the optical reference geometry, the latter reflecting some dynamical properties of geodesic motion. The polytropes are represented by radial profiles of energy density, pressure, mass, and metric coefficients. For all tested values of $n>0$, static equilibrium configurations with fixed parameters $n$ and $\sigma$ are allowed only up to a critical value of the cosmological parameter $\lambda_{\mathrm{c}}=\lambda_{\mathrm{c}}(n,\sigma)$. For $n>3$, the critical value $\lambda_{\mathrm{c}}$ tends to zero for special values of $\sigma$. The gravitational potential energy and the binding energy of the polytropes are determined and studied numerically. We discuss in detail the polytropes with an extension comparable to those of the dark matter halos related to galaxies, i.e., with extension $\ell > 100\,\mathrm{kpc}$ and mass $M > 10^{12}\,\mathrm{M}_{\odot}$. ...
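In a standard form, the two coupled first-order equations are the Tolman-Oppenheimer-Volkoff equation with a cosmological constant, $dp/dr = -(\epsilon + p)(m + 4\pi r^3 p - \Lambda r^3/3)/[r(r - 2m - \Lambda r^3/3)]$, together with $dm/dr = 4\pi r^2 \epsilon$ (units $G = c = 1$). The sketch below integrates them under stated assumptions rather than the paper's exact dimensionless scaling: the equation of state $p = K\epsilon^{1+1/n}$ with central energy density $\epsilon_c = 1$, $K = \sigma$ and $\Lambda = 8\pi\lambda$; the function name is hypothetical. Near the centre the numerator behaves as $(4\pi/3) r^3 (1 + 3\sigma - 2\lambda)$, so, reading $\sigma$ and $\lambda$ as ratios to the central energy density, $\lambda < (1 + 3\sigma)/2$ is a necessary condition for a static configuration and bounds $\lambda_{\mathrm{c}}$ from above.

```python
import math

# Illustrative TOV integration with a cosmological constant (G = c = 1).
# Assumptions (not taken from the paper): p = K * eps**(1 + 1/n), eps_c = 1,
# K = sigma (so p_c / eps_c = sigma) and Lambda = 8*pi*lam.


def tov_lambda(n, sigma, lam, steps_per_scale=2000, max_scales=100):
    """Return (R, M) of the static polytrope, or None if the pressure
    gradient vanishes before the pressure reaches zero."""
    Lam = 8.0 * math.pi * lam

    def eps_of_p(p):
        return (max(p, 0.0) / sigma) ** (n / (n + 1.0))

    def rhs(r, p, m):
        eps = eps_of_p(p)
        num = m + 4.0 * math.pi * r**3 * p - Lam * r**3 / 3.0
        den = r * (r - 2.0 * m - Lam * r**3 / 3.0)
        return -(eps + p) * num / den, 4.0 * math.pi * r**2 * eps

    # Near the centre num ~ (4 pi / 3) r^3 (1 + 3 sigma - 2 lam):
    # vacuum repulsion already wins there if lam >= (1 + 3 sigma) / 2.
    if 1.0 + 3.0 * sigma - 2.0 * lam <= 0.0:
        return None

    a = math.sqrt((n + 1.0) * sigma / (4.0 * math.pi))  # Newtonian length scale
    h = a / steps_per_scale
    r, p, m = h, sigma, 4.0 * math.pi * h**3 / 3.0
    for _ in range(steps_per_scale * max_scales):
        k1 = rhs(r, p, m)
        k2 = rhs(r + h / 2, p + h / 2 * k1[0], m + h / 2 * k1[1])
        k3 = rhs(r + h / 2, p + h / 2 * k2[0], m + h / 2 * k2[1])
        k4 = rhs(r + h, p + h * k3[0], m + h * k3[1])
        dp = h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dm = h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if dp >= 0.0:
            return None              # repulsion balances gravity before the surface
        if p + dp <= 0.0:
            frac = p / -dp           # linear interpolation to p = 0
            return r + frac * h, m + frac * dm
        r, p, m = r + h, p + dp, m + dm
    return None


# Usage: for sigma << 1 and lam = 0, n = 1 should approach the Newtonian
# polytrope with R = pi * a and M = 4 * pi**2 * a**3.
sigma = 1e-6
a = math.sqrt(2.0 * sigma / (4.0 * math.pi))
R, M = tov_lambda(1, sigma, 0.0)
print(R / (math.pi * a), M / (4.0 * math.pi**2 * a**3))   # both close to 1
print(tov_lambda(1, sigma, 1.0))                          # None: lam > (1 + 3 sigma) / 2
```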
Submitted 15 November, 2016;
originally announced November 2016.
-
A polyphase filter for many-core architectures
Authors:
Karel Adámek,
Jan Novotný,
Wes Armour
Abstract:
In this article, we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs, and on the Intel Xeon Phi (Knights Corner) platform. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods of achieving this: the first makes use of the L1/texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviour. We measure performance in execution time, which is a critical factor for real-time systems, and we also present results in terms of bandwidth (GB/s), compute (GFlop/s), and type conversions (GTc/s). We also present our results in terms of the sample rate that a chosen platform can process in real time, which describes the expected performance in a signal processing setting more intuitively. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower-precision input data is limited by type conversions rather than by device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation performs 1.47x to 1.95x better than our CPU implementation, but that this is insufficient to compete with the performance of GPUs. We conclude with a comparison of our best-performing code with two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
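The data reuse referred to above follows from the structure of the filter bank: each output spectrum of $C$ channels is built from $T$ consecutive blocks of $C$ input samples, so every input sample contributes to $T$ spectra. The pure-Python sketch below shows one common arrangement of a critically sampled polyphase filter bank (an FIR stage per polyphase branch followed by a DFT); the Hann-windowed sinc prototype, the tap ordering and all names are assumptions for illustration, and real codes replace the DFT with an FFT.

```python
import cmath
import math


def prototype_filter(channels, taps):
    """Hann-windowed sinc low-pass prototype of length channels * taps.

    Illustrative choice only; production designs tune window and cut-off.
    """
    n = channels * taps
    h = []
    for i in range(n):
        u = (i - (n - 1) / 2.0) / channels
        sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
        h.append(sinc * window)
    return h


def polyphase_filter_bank(x, channels, taps, h):
    """Critically sampled polyphase filter bank (FIR stage + DFT stage).

    Spectrum s uses input samples x[s*C : (s + taps)*C], so every input
    sample contributes to `taps` consecutive spectra: the data reuse that a
    cache- or shared-memory-based implementation can exploit.
    """
    C = channels
    spectra = []
    for s in range(len(x) // C - taps + 1):
        # FIR stage: weighted sum of `taps` samples in each polyphase branch c.
        sub = [sum(h[t * C + c] * x[(s + t) * C + c] for t in range(taps))
               for c in range(C)]
        # DFT stage across the C branches (a real code would use an FFT).
        spectra.append([sum(sub[c] * cmath.exp(-2j * math.pi * k * c / C)
                            for c in range(C))
                        for k in range(C)])
    return spectra


# Usage: a complex tone centred on channel 5 of a 16-channel, 8-tap bank.
C, T = 16, 8
x = [cmath.exp(2j * math.pi * 5 * i / C) for i in range(C * 20)]
spectra = polyphase_filter_bank(x, C, T, prototype_filter(C, T))
print(len(spectra), [max(range(C), key=lambda k: abs(X[k])) for X in spectra])
# prints 13 and a list of thirteen 5s
```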
Submitted 21 April, 2016; v1 submitted 11 November, 2015;
originally announced November 2015.
-
Unification of Galileon Dualities
Authors:
Karol Kampf,
Jiri Novotny
Abstract:
We study dualities of the general Galileon theory in d dimensions in terms of coordinate transformations on the coset space corresponding to the spontaneously broken Galileon group. The most general duality transformation is found to be uniquely determined up to four free parameters, and under composition these transformations form a group that can be identified with GL(2,R). This group provides a unified framework for all Galileon dualities known to date. We discuss a representation of this group on the Galileon theory space and, using concrete examples, illustrate its applicability at both the classical and quantum levels.
Submitted 7 October, 2014; v1 submitted 26 March, 2014;
originally announced March 2014.
-
Phase Mixing in Unperturbed and Perturbed Hamiltonian Systems
Authors:
Henry E. Kandrup,
Steven J. Novotny
Abstract:
This paper summarises a numerical investigation of phase mixing in time-independent Hamiltonian systems that admit coexisting regular and chaotic phase space regions, allowing also for low-amplitude perturbations idealised as periodic driving, friction, and/or white and colored noise. The evolution of initially localised ensembles of orbits was probed through lower-order moments and coarse-grained distribution functions. In the absence of time-dependent perturbations, regular ensembles initially disperse as a power law in time and exhibit a coarse-grained approach towards an invariant equilibrium only over comparatively long times. Chaotic ensembles generally diverge exponentially fast on a time scale related to a typical finite-time Lyapunov exponent, but can exhibit complex behaviour if they are affected by cantori or the Arnold web. Viewed over somewhat longer times, chaotic ensembles typically converge exponentially towards an invariant or near-invariant equilibrium. This, however, need not correspond to a true equilibrium, which may only be approached over very long time scales. Time-dependent perturbations can dramatically increase the efficiency of phase mixing, both by accelerating the approach towards a near-equilibrium and by facilitating diffusion through cantori or along the Arnold web, thereby accelerating the approach towards a true equilibrium. The efficacy of such perturbations typically scales logarithmically with amplitude but is comparatively insensitive to most other details, a conclusion that reinforces the interpretation that the perturbations act via a resonant coupling.
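The contrast between power-law dispersion of regular ensembles and exponential divergence of chaotic ones can be illustrated with finite-time Lyapunov exponents. The sketch below swaps in the Chirikov standard map, a simple area-preserving map, for the continuous Hamiltonian flows studied in the paper, and it models no perturbations; the function name and parameter values are illustrative. For $K = 0$ the map is integrable and a tangent vector grows only linearly, so the exponent decays like $\ln t / t$; for large $K$ it stays roughly near $\ln(K/2)$, a standard estimate for strongly chaotic orbits.

```python
import math

# Finite-time Lyapunov exponents for the Chirikov standard map, used only as
# a simple stand-in for a Hamiltonian system (the paper studies flows).


def finite_time_lyapunov(K, p, theta, steps):
    """Average log growth rate of a tangent vector over `steps` iterations.

    Map: p' = p + K sin(theta), theta' = theta + p' (both taken mod 2*pi).
    """
    dp, dth = 1.0, 0.0          # tangent vector
    total = 0.0
    for _ in range(steps):
        c = K * math.cos(theta)
        # Apply the Jacobian [[1, c], [1, 1 + c]] at the current point.
        dp, dth = dp + c * dth, dp + (1.0 + c) * dth
        p = (p + K * math.sin(theta)) % (2 * math.pi)
        theta = (theta + p) % (2 * math.pi)
        norm = math.hypot(dp, dth)
        total += math.log(norm)
        dp, dth = dp / norm, dth / norm   # renormalise to avoid overflow
    return total / steps


# Usage: an ensemble of 10 starting points, integrable vs strongly chaotic.
steps = 10000
starts = [(0.1 * i + 0.05, 0.3 + 0.5 * i) for i in range(10)]
regular = max(finite_time_lyapunov(0.0, p, th, steps) for p, th in starts)
chaotic = sum(finite_time_lyapunov(10.0, p, th, steps)
              for p, th in starts) / len(starts)
print(regular, chaotic)   # regular ~ ln(steps)/steps, chaotic roughly ln(5)
```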
Submitted 1 April, 2002;
originally announced April 2002.