-
Loading-dependent microscale measures control bulk properties in granular material: an experimental test of the Stress-Force-Fabric relation
Authors:
Carmen L. Lee,
Ephraim Bililign,
Emilien Azéma,
Karen E. Daniels
Abstract:
The bulk behaviour of granular materials is tied to their mesoscale and particle-scale features: strength properties arise from the buildup of various anisotropic structures at the particle scale induced by grain connectivity (fabric), force transmission, and frictional mobilization. More fundamentally, these anisotropic structures work collectively to define features like the bulk friction coefficient and the stress tensor at the macroscale and can be explained by the Stress-Force-Fabric (SFF) relationship stemming from the microscale. Although the SFF relation has been extensively verified by discrete numerical simulations, a laboratory realization has remained elusive due to the challenge of measuring both normal and frictional contact forces. In this study, we analyze experiments performed on a photoelastic granular system under four different loading conditions: uniaxial compression, isotropic compression, pure shear, and annular shear. During these experiments, we record particle locations, contacts, and normal and frictional force vectors to measure the particle-scale response to progressing strain. We track microscale measures like the packing fraction, average coordination number and average normal force along with anisotropic distributions of contacts and forces. We match the particle-scale anisotropy to the bulk using the SFF relation, which is founded on two key principles, a Stress Rule to describe the stress tensor and a Sum Rule to describe the bulk friction coefficient; we find that the Sum and Stress Rules accurately describe bulk measurements. Additionally, we test the assumption that fabric and forces transmit load equally through our granular packings and show that this assumption is sufficient at large strain values, and can be applied to areas like rock mechanics, soft colloids, or cellular tissue where force information is inaccessible.
Submitted 12 September, 2024;
originally announced September 2024.
-
Chalcogenide Metasurfaces Enabling Ultra-Wideband Detectors from Visible to Mid-infrared
Authors:
Shutao Zhang,
Shu An,
Mingjin Dai,
Qing Yang Steve Wu,
Nur Qalishah Adanan,
Jun Zhang,
Yan Liu,
Henry Yit Loong Lee,
Nancy Lai Mun Wong,
Ady Suwardi,
Jun Ding,
Robert Edward Simpson,
Qi Jie Wang,
Joel K. W. Yang,
Zhaogang Dong
Abstract:
Thermoelectric materials can be designed to support optical resonances across multiple spectral ranges to enable ultra-wideband photodetection. For instance, antimony telluride (Sb2Te3) chalcogenide exhibits interband plasmonic resonances in the visible range and Mie resonances in the mid-infrared (mid-IR) range, while simultaneously possessing large thermoelectric Seebeck coefficients. In this paper, we designed and fabricated Sb2Te3 metasurface devices to achieve resonant absorption for enabling photodetectors operating across an ultra-wideband spectrum, from visible to mid-IR. Furthermore, relying on an asymmetric Sb2Te3 metasurface, we demonstrated thermoelectric photodetectors with polarization selectivity. This work provides a potential platform towards portable ultra-wideband spectrometers at room temperature for environmental sensing applications.
Submitted 7 September, 2024;
originally announced September 2024.
-
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Authors:
Lazar Atanackovic,
Xi Zhang,
Brandon Amos,
Mathieu Blanchette,
Leo J. Lee,
Yoshua Bengio,
Alexander Tong,
Kirill Neklyudov
Abstract:
Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions unlike previously proposed methods. We demonstrate the ability of MFM to improve prediction of individual treatment responses on a large scale multi-patient single-cell drug screen dataset.
Submitted 26 August, 2024;
originally announced August 2024.
-
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning
Authors:
Max J. L. Lee,
Ju Lin,
Li-Ta Hsu
Abstract:
We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) to enhance seamless positioning systems in IoT environments. By integrating and standardizing heterogeneous sensor data from smartphones, IoT devices, and dedicated systems such as Ultra-Wideband (UWB), our study ensures data compatibility and improves positioning accuracy using the Extended Kalman Filter (EKF). The core components include the Intelligent Data Standardization Module (IDSM), which employs a fine-tuned LLM to convert varied sensor data into a standardized format, and the Transformation Rule Generation Module (TRGM), which automates the creation of transformation rules and scripts for ongoing data standardization. Evaluated in real-time environments, our study demonstrates adaptability and scalability, enhancing operational efficiency and accuracy in seamless navigation. This study underscores the potential of advanced LLMs in overcoming sensor data integration complexities, paving the way for more scalable and precise IoT navigation solutions.
Submitted 21 August, 2024;
originally announced August 2024.
-
Light scrambling and focal ratio degradation of thin multimode fibers with different core geometries
Authors:
Man-Yin Leo Lee,
Zhiheng Lin,
Chit-Ho Hui,
Renbin Yan,
YiuHung Cheung,
Horace Tsz-Hong Hung,
Matthew A. Bershady,
Sabysachi Chattopadhyay,
Michael P. Smith
Abstract:
The performance of fiber-fed astronomical spectrographs is highly influenced by the properties of fibers. The near-field and far-field scrambling characteristics have a profound impact on the line spread function (LSF) of the spectra. Focal ratio degradation (FRD) influences the output beam size, thereby affecting the throughput, as well as the size of the collimator and dispersion elements. While previous research has indicated that these properties depend on the shape of the fiber core and showed that non-circular core fibers can yield uniform near-field scrambling, the result remains inconclusive for far-field. In this study, we investigate the near-field and far-field scrambling properties, along with the FRD, of 50-micron core fibers with different core geometries. We find that in addition to excellent near-field scrambling, octagonal-core fibers can also produce more uniform far-field output when compared to circular-core fibers. They also show less FRD when fed with an f/3 beam.
Submitted 15 August, 2024;
originally announced August 2024.
-
Ionized gas in quiescent galaxies: Temperature measurement and constraint on the ionization source
Authors:
Man-Yin Leo Lee,
Renbin Yan,
Xihan Ji,
Gerome Algodon,
Kyle Westfall,
Zesen Lin,
Francesco Belfiore,
Kevin Bundy
Abstract:
In non-star-forming, passively evolving galaxies, regions with emission lines dominated by low-ionization species are classified as Low-Ionization Emission Regions (LIERs). The ionization mechanism behind such regions has long been a mystery. Active Galactic Nuclei (AGNs), which were once believed to be the source, have been found not to be the dominant mechanism, especially in regions distant from the galaxy nuclei. The remaining candidates, photoionization by post-Asymptotic Giant Branch (pAGB) stars and interstellar shocks, can only be distinguished with in-depth analysis. As the temperature predictions of these two models differ, temperature measurements can provide strong constraints on this puzzle. We selected a sample of 2795 quiescent red-sequence galaxies from the Sloan Digital Sky Survey IV (SDSS-IV) Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. We divided the sample spectra into three groups based on their [N II]/H$α$ flux ratio and utilized stacking techniques to improve the signal-to-noise ratio of the observed spectra. We determined the temperature of [O III], [N II], [S II], and [O II] through their temperature-sensitive emission line ratios. Subsequently, we compared the measured temperatures with predictions from different models. The results demonstrate consistency with the interstellar shock model with preshock density n = 1 cm$^{-3}$ and solar metallicity, thus supporting shocks as the dominant ionization source of LIERs. Additionally, we find that the interstellar dust extinction value measured through the Balmer decrement appears to be larger than that implied by the forbidden line ratios of low-ionization lines.
Submitted 15 August, 2024;
originally announced August 2024.
-
MetaDragonBoat: Exploring Paddling Techniques of Virtual Dragon Boating in a Metaverse Campus
Authors:
Wei He,
Xiang Li,
Shengtian Xu,
Yuzheng Chen,
Chan-In Sio,
Ge Lin Kan,
Lik-Hang Lee
Abstract:
The preservation of cultural heritage, as mandated by the United Nations Sustainable Development Goals (SDGs), is integral to sustainable urban development. This paper focuses on the Dragon Boat Festival, a prominent event in Chinese cultural heritage, and proposes leveraging Virtual Reality (VR) to enhance its preservation and accessibility. Traditionally, participation in the festival's dragon boat races was limited to elite athletes, excluding broader demographics. Our proposed solution, named MetaDragonBoat, enables virtual participation in dragon boat racing, offering immersive experiences that replicate physical exertion through a cultural journey. Thus, we build a digital twin of a university campus located in a region with a rich dragon boat racing tradition. Coupled with three paddling techniques that are enabled by either commercial controllers or physical paddle controllers with haptic feedback, diversified users can engage in realistic rowing experiences. Our results demonstrate that by integrating resistance into the paddle controls, users could simulate the physical effort of dragon boat racing, promoting a deeper understanding and appreciation of this cultural heritage.
Submitted 7 August, 2024;
originally announced August 2024.
-
The ALMA-CRISTAL Survey: Spatial extent of [CII] line emission in star-forming galaxies at $z=4-6$
Authors:
Ryota Ikeda,
Ken-ichi Tadaki,
Ikki Mitsuhashi,
Manuel Aravena,
Ilse De Looze,
Natascha M. Förster Schreiber,
Jorge González-López,
Rodrigo Herrera-Camus,
Justin Spilker,
Loreto Barcos-Muñoz,
Elisabete da Cunha,
Rebecca Davies,
Tanio Díaz-Santos,
Andrea Ferrara,
Meghana Killi,
Lilian L. Lee,
Juno Li,
Dieter Lutz,
Renske Smit,
Manuel Solimano,
Kseniia Telikova,
Hannah Übler,
Sylvain Veilleux,
Vicente Villanueva
Abstract:
We investigate the spatial extent of the [CII] line emission in a sample of 34 galaxies at $z=4-6$ from the ALMA-CRISTAL Survey. By modeling the [CII] line emission in the visibility data directly, we derive the effective radius of [CII] line emission assuming exponential distribution. These measurements comprise not only isolated galaxies but also interacting systems, identified thanks to the high spatial resolution of the data. The [CII] line radius ranges from 0.5 to 3.5 kpc with an average value of 1.9 kpc. We compare the [CII] sizes with the sizes of UV and FIR continua, which were measured from the HST F160W and ALMA Band-7 continuum images, respectively. We confirm that the [CII] line emission is more spatially extended than the continuum emission, with average size ratios of $R_{e,[CII]}/R_{e,UV}=2.90$ and $R_{e,[CII]}/R_{e,FIR}=1.54$, although about half of the FIR-detected sample show comparable spatial extent between [CII] line and FIR continuum emission ($R_{e,[CII]}\approx R_{e, FIR}$). The residual visibility data of the best-fit model do not show evidence of flux excesses either individually or in stacking analysis. This indicates that the [CII] line emission in star-forming galaxies can be characterized by an extended exponential disk profile. Overall, our results suggest that the spatial extent of [CII] line emission can primarily be explained by photodissociation regions associated with star formation activity, while the contribution from diffuse neutral medium (atomic gas) and the effects of mergers may further expand the [CII] line distributions, causing their variations among our sample. We report the correlations between the [CII] line, dust, and Lya line properties, which may be in line with our scenario. Future 3D-analysis of Lya and Ha lines will shed light on the association of the extended [CII] line emission with atomic gas and outflows.
Submitted 6 August, 2024;
originally announced August 2024.
-
Quantum Computing for Climate Resilience and Sustainability Challenges
Authors:
Kin Tung Michael Ho,
Kuan-Cheng Chen,
Lily Lee,
Felix Burt,
Shang Yu,
Po-Heng Lee
Abstract:
The escalating impacts of climate change and the increasing demand for sustainable development and natural resource management necessitate innovative technological solutions. Quantum computing (QC) has emerged as a promising tool with the potential to revolutionize these critical areas. This review explores the application of quantum machine learning and optimization techniques for climate change prediction and enhancing sustainable development. Traditional computational methods often fall short in handling the scale and complexity of climate models and natural resource management. Quantum advancements, however, offer significant improvements in computational efficiency and problem-solving capabilities. By synthesizing the latest research and developments, this paper highlights how QC and quantum machine learning can optimize multi-infrastructure systems towards climate neutrality. The paper also evaluates the performance of current quantum algorithms and hardware in practical applications and presents realistic cases, i.e., waste-to-energy in anaerobic digestion, disaster prevention in flooding prediction, and new material development for carbon capture. The integration of these quantum technologies promises to drive significant advancements in achieving climate resilience and sustainable development.
Submitted 23 July, 2024;
originally announced July 2024.
-
Evidence-Based Temporal Fact Verification
Authors:
Anab Maulana Barik,
Wynne Hsu,
Mong Li Lee
Abstract:
Automated fact verification plays an essential role in fostering trust in the digital space. Despite the growing interest, the verification of temporal facts has not received much attention in the community. Temporal fact verification brings new challenges where cues of the temporal information need to be extracted and temporal reasoning involving various temporal aspects of the text must be applied. In this work, we propose an end-to-end solution for temporal fact verification that considers the temporal information in claims to obtain relevant evidence sentences and harnesses the power of large language models for temporal reasoning. Recognizing that temporal facts often involve events, we model these events in the claim and evidence sentences. We curate two temporal fact datasets to learn time-sensitive representations that encapsulate not only the semantic relationships among the events, but also their chronological proximity. This allows us to retrieve the top-k relevant evidence sentences and provide the context for a large language model to perform temporal reasoning and output whether a claim is supported or refuted by the retrieved evidence sentences. Experimental results demonstrate that the proposed approach significantly enhances the accuracy of temporal claim verification, thereby advancing the current state of the art in automated fact verification.
Submitted 18 August, 2024; v1 submitted 21 July, 2024;
originally announced July 2024.
-
A hidden AGN powering bright [O III] nebulae in a protocluster core at $z=4.5$ revealed by JWST
Authors:
M. Solimano,
J. González-López,
M. Aravena,
B. Alcalde Pampliega,
R. J. Assef,
M. Béthermin,
M. Boquien,
S. Bovino,
C. M. Casey,
P. Cassata,
E. da Cunha,
R. L. Davies,
I. De Looze,
X. Ding,
T. Díaz-Santos,
A. L. Faisst,
A. Ferrara,
D. B. Fisher,
N. M. Förster-Schreiber,
S. Fujimoto,
M. Ginolfi,
C. Gruppioni,
L. Guaita,
N. Hathi,
R. Herrera-Camus
, et al. (26 additional authors not shown)
Abstract:
We present new JWST/NIRSpec IFU observations of the J1000+0234 system at $z=4.54$, the dense core of a galaxy protocluster hosting a massive, dusty star forming galaxy (DSFG) with a low luminosity radio counterpart. The new data reveal two extended, high equivalent width (EW$_0 > 1000$ Å) nebulae at each side of the DSFG disk along its minor axis (namely O3-N and O3-S). On one hand, O3-N's spectrum shows a prominent FWHM $\sim1300$ km s$^{-1}$ broad and blueshifted component, suggesting an outflow origin. On the other hand, O3-S stretches over parsec and has a velocity gradient that spans $800$ km s$^{-1}$ but no evidence of a broad component. Both sources, however, seem to be powered at least partially by an active galactic nucleus (AGN), so we classify them as extended emission-line regions (EELRs). The strongest evidence comes from the detection of the high-ionization [Ne V] $\lambda3427$ line toward O3-N, which paired with the non-detection of hard X-rays implies an obscuring column density above the Compton-thick regime. In O3-S, the [Ne V] line is not detected, but we measure a He II well above the expectation for star formation. We interpret this as O3-S being externally irradiated by the AGN, akin to the famous Hanny's Voorwerp object in the local Universe. In addition, more classical line ratio diagnostics (e.g. [O III]/H$β$ vs [N II]/H$α$) put the DSFG itself in the AGN region of the diagrams, and hence the most probable host of the AGN. These results showcase the ability of JWST to unveil highly obscured AGN at high redshifts.
Submitted 17 July, 2024;
originally announced July 2024.
-
Interim report for the International Muon Collider Collaboration (IMCC)
Authors:
C. Accettura,
S. Adrian,
R. Agarwal,
C. Ahdida,
C. Aimé,
A. Aksoy,
G. L. Alberghi,
S. Alden,
N. Amapane,
D. Amorim,
P. Andreetto,
F. Anulli,
R. Appleby,
A. Apresyan,
P. Asadi,
M. Attia Mahmoud,
B. Auchmann,
J. Back,
A. Badea,
K. J. Bae,
E. J. Bahng,
L. Balconi,
F. Balli,
L. Bandiera,
C. Barbagallo
, et al. (362 additional authors not shown)
Abstract:
The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accelerator complex, detectors and physics for a future muon collider. In 2023, European Commission support was obtained for a design study of a muon collider (MuCol) [3]. This project started on 1st March 2023, with work-packages aligned with the overall muon collider studies. In preparation for and during the 2021-22 U.S. Snowmass process, the muon collider project parameters, technical studies and physics performance studies were performed and presented in great detail. Recently, the P5 panel [4] in the U.S. recommended a muon collider R&D, proposed to join the IMCC and envisages that the U.S. should prepare to host a muon collider, calling this their "muon shot". In the past, the U.S. Muon Accelerator Programme (MAP) [5] has been instrumental in studies of concepts and technologies for a muon collider.
Submitted 17 July, 2024;
originally announced July 2024.
-
On the blow-up formula of the Chow weights for polarized toric manifolds
Authors:
King Leung Lee,
Naoto Yotsutani
Abstract:
Let $X$ be a smooth projective toric variety and let $\widetilde{X}$ be the blow-up manifold of $X$ at finitely many distinct torus-invariant points of $X$. In this paper, we give an explicit combinatorial formula of the Chow weight of $\widetilde{X}$ in terms of the base toric manifold $X$ and the symplectic cuts of the Delzant polytope. We then apply this blow-up formula to the projective plane and see the difference of Chow stability between the toric blow-up manifolds and the manifolds of blow-ups at general points. Finally, we detect the blow-up formula of the Futaki-Ono invariant which is an obstruction for asymptotic Chow semistability of a polarized toric manifold.
Submitted 14 July, 2024;
originally announced July 2024.
-
SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture
Authors:
Xuling Zhang,
Ziru Zhang,
Yuyang Wang,
Lik-hang Lee,
Pan Hui
Abstract:
Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm for changing people's lifestyles. Motion capture has become a reliable approach to achieve seamless synchronization of the movements between avatars and human beings, which plays an important role in diverse Metaverse applications. However, due to the continuous growth of data, current communication systems face a significant challenge of meeting the demand of ultra-low latency during application. In addition, current methods also have shortcomings when selecting keyframes, e.g., relying on recognizing motion types and artificially selected keyframes. Therefore, the utilization of keyframe extraction and motion reconstruction techniques could be considered a feasible and promising solution. In this work, a new motion reconstruction algorithm is designed in a spherical coordinate system involving location and velocity information. Then, we formalize the keyframe extraction problem into an optimization problem to reduce the reconstruction error. Using Deep Q-Learning (DQL), the Spherical Interpolation based Deep Q-Learning (SIDQL) framework is proposed to generate proper keyframes for reconstructing the motion sequences. We use the CMU database to train and evaluate the framework. Our scheme can significantly reduce the data volume and transmission latency compared to various baselines while maintaining a reconstruction error of less than 0.09 when extracting five keyframes.
Submitted 30 June, 2024;
originally announced July 2024.
-
Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces
Authors:
Dmitrii Gromyko,
Shu An,
Sergey Gorelik,
Jiahui Xu,
Li Jun Lim,
Henry Yit Loong Lee,
Febiana Tjiptoharsono,
Zhi-Kuang Tan,
Cheng-Wei Qiu,
Zhaogang Dong,
Lin Wu
Abstract:
Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas.
Submitted 22 June, 2024;
originally announced June 2024.
-
Examining the Legal Status of Digital Assets as Property: A Comparative Analysis of Jurisdictional Approaches
Authors:
Luke Lee
Abstract:
This paper examines the complex legal landscape surrounding digital assets, analysing how they are defined and regulated as property across various jurisdictions. As digital assets such as cryptocurrencies and non-fungible tokens (NFTs) increasingly integrate with global economies, their intangible nature presents unique challenges to traditional property law concepts, necessitating a re-evaluation of legal definitions and ownership frameworks. This research presents a comparative analysis, reviewing how different legal systems classify and manage digital assets within property law, highlighting the variations in regulatory approaches and their implications on ownership, transfer, and inheritance rights. By examining seminal cases and regulatory developments in major jurisdictions, including the United States, the European Union, and Singapore, this paper explores the emerging trends and potential legal evolutions that could influence the global handling of digital assets. The study aims to contribute to the scholarly discourse by proposing a harmonized approach to digital asset regulation, seeking to balance innovation with legal certainty and consumer protection.
Submitted 26 April, 2024;
originally announced June 2024.
-
Financial Assets Dependency Prediction Utilizing Spatiotemporal Patterns
Authors:
Haoren Zhu,
Pengfei Zhao,
Wilfred Siu Hung NG,
Dik Lun Lee
Abstract:
Financial assets exhibit complex dependency structures, which are crucial for investors to create diversified portfolios to mitigate risk in volatile financial markets. To explore the dynamics of financial asset dependencies, we propose a novel approach that models the dependencies of assets as an Asset Dependency Matrix (ADM) and treats the ADM sequences as image sequences. This allows us to leverage deep learning-based video prediction methods to capture the spatiotemporal dependencies among assets. However, unlike images where neighboring pixels exhibit explicit spatiotemporal dependencies due to the natural continuity of object movements, assets in ADM do not have a natural order. This poses challenges to organizing the relational assets to better reveal the spatiotemporal dependencies among neighboring assets for ADM forecasting. To tackle the challenges, we propose the Asset Dependency Neural Network (ADNN), which employs the Convolutional Long Short-Term Memory (ConvLSTM) network, a highly successful method for video prediction. ADNN can employ static and dynamic transformation functions to optimize the representations of the ADM. Through extensive experiments, we demonstrate that our proposed framework consistently outperforms the baselines in the ADM prediction and downstream application tasks. This research contributes to understanding and predicting asset dependencies, offering valuable insights for financial market participants.
Submitted 13 June, 2024;
originally announced June 2024.
-
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation
Authors:
Jiaming Liu,
Mengzhen Liu,
Zhenyu Wang,
Lily Lee,
Kaichen Zhou,
Pengju An,
Senqiao Yang,
Renrui Zhang,
Yandong Guo,
Shanghang Zhang
Abstract:
A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities, while maintaining efficient fine-tuning and inference. Specifically, we first integrate the vision encoder with Mamba, aligning visual data with language embedding through co-training, empowering our model with visual common sense and robot-related reasoning. To further equip RoboMamba with action pose prediction abilities, we explore an efficient fine-tuning strategy with a simple policy head. We find that once RoboMamba possesses sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1\% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model showcases impressive pose prediction results in both simulation and real-world experiments, achieving inference speeds 7 times faster than existing robot MLLMs. Our project web page: https://sites.google.com/view/robomamba-web
Submitted 6 June, 2024;
originally announced June 2024.
-
Anomalous 4$f$ fine structure in TmSe$_{1-x}$Te$_x$ across the metal-insulator transition
Authors:
C. -H. Min,
S. Müller,
W. J. Choi,
L. Dudy,
V. Zabolotny,
M. Heber,
J. D. Denlinger,
C. -J. Kang,
M. Kalläne,
N. Wind,
M. Scholz,
T. L. Lee,
C. Schlueter,
A. Gloskovskii,
E. D. L. Rienks,
V. Hinkov,
H. Bentmann,
Y. S. Kwon,
F. Reinert,
K. Rossnagel
Abstract:
Hybridization between localized 4$f$ and itinerant 5$d$6$s$ states in heavy fermion compounds is a well-studied phenomenon and commonly captured by the paradigmatic Anderson model. However, the investigation of additional electronic interactions, beyond the standard Anderson model, has been limited, despite their predicted important role in the exotic quasiparticle formation in mixed-valence systems. We investigate the 4$f$ states in TmSe$_{1-x}$Te$_x$ throughout a semimetal-insulator phase transition, which drastically varies the interactions related to the 4$f$ states. Using synchrotron-based hard x-ray and extreme ultraviolet photoemission spectroscopy, we resolve subtle peak splitting in the 4$f$ peaks near the Fermi level in the mixed-valent semimetal phase. The separation is enhanced by several tens of meV by increasing the lattice parameter by a few percent. Our results elucidate the evolving nature of the 4$f$ state across the phase transition, and provide direct experimental evidence for electronic interactions beyond the standard Anderson model in mixed-valence systems.
Submitted 4 June, 2024;
originally announced June 2024.
-
2BP: 2-Stage Backpropagation
Authors:
Christopher Rae,
Joseph K. L. Lee,
James Richings
Abstract:
As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators. Pipeline parallelism is a commonly used sharding strategy for training large DNNs. However, current implementations of pipeline parallelism are being unintentionally bottlenecked by the automatic differentiation tools provided by ML frameworks. This paper introduces 2-stage backpropagation (2BP). By splitting the backward propagation step into two separate stages, we can reduce idle compute time. We tested 2BP on various model architectures and pipelining schedules, achieving increases in throughput in all cases. Using 2BP, we were able to achieve a 1.70x increase in throughput compared to traditional methods when training a LLaMa-like transformer with 7 billion parameters across 4 GPUs.
Submitted 28 May, 2024;
originally announced May 2024.
-
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation
Authors:
Jiaming Liu,
Chenxuan Li,
Guanqun Wang,
Lily Lee,
Kaichen Zhou,
Sixiang Chen,
Chuyan Xiong,
Jiaxin Ge,
Renrui Zhang,
Shanghang Zhang
Abstract:
Robot manipulation policies have shown unsatisfactory action performance when confronted with novel tasks or object instances. Hence, the capability to automatically detect and self-correct failed actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities in various tasks. To unleash general MLLMs as an end-to-end robotic agent, we introduce a Self-Corrected (SC)-MLLM, equipping our model not only to predict end-effector poses but also to autonomously recognize and correct failure actions. Specifically, we first conduct parameter-efficient fine-tuning to empower MLLM with pose prediction ability, which is reframed as a language modeling problem. When facing execution failures, our model learns to identify low-level action error causes (i.e., position and rotation errors) and adaptively seeks prompt feedback from experts. Based on the feedback, SC-MLLM rethinks the current failure scene and generates the corrected actions. Furthermore, we design a continuous policy learning method for successfully corrected samples, enhancing the model's adaptability to the current scene configuration and reducing the frequency of expert intervention. To evaluate our SC-MLLM, we conduct extensive experiments in both simulation and real-world settings. The SC-MLLM agent significantly improves manipulation accuracy compared to the previous state-of-the-art robotic MLLM (ManipLLM), increasing from 57\% to 79\% on seen object categories and from 47\% to 69\% on unseen novel categories.
Submitted 27 May, 2024;
originally announced May 2024.
-
Cross-Domain Feature Augmentation for Domain Generalization
Authors:
Yingnan Liu,
Yingtian Zou,
Rui Qiao,
Fusheng Liu,
Mong Li Lee,
Wynne Hsu
Abstract:
Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity, whereas augmentation in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.
Submitted 14 May, 2024;
originally announced May 2024.
-
Chow stability of $λ$-stable toric varieties
Authors:
King leung Lee,
Naoto Yotsutani
Abstract:
For a given polarized toric variety, we define the notion of $λ$-stability which is a natural generalization of uniform K-stability. At the neighbourhoods of the vertices of the corresponding moment polytope $Δ$, we consider appropriate triangulations and give a sufficient criterion for a $λ$-stable polarized toric variety $(X,L)$ to be asymptotically Chow polystable when the obstruction of asymptotic Chow semistability (the Futaki-Ono invariant) vanishes. As an application, we prove that any K-semistable polarized smooth toric variety $(X,L)$ with the vanishing Futaki-Ono invariant is asymptotically Chow polystable.
Submitted 10 May, 2024;
originally announced May 2024.
-
Particle scale anisotropy controls bulk properties in sheared granular materials
Authors:
Carmen L. Lee,
Ephraim Bililign,
Emilien Azéma,
Karen E. Daniels
Abstract:
The bulk dynamics of dense granular materials arise through a combination of particle-scale and mesoscale effects. Theoretical and numerical studies have shown that collective effects are created by particle-scale anisotropic structures such as grain connectivity (fabric), force transmission, and frictional mobilization, all of which influence bulk properties like bulk friction and the stress tensor through the Stress-Force-Fabric (SFF) relationship. To date, establishing the relevance of these effects to laboratory systems has remained elusive due to the challenge of measuring both normal and frictional contact forces at the particle scale. In this study, we perform experiments on a sheared photoelastic granular system in a quasi-2D annular (Couette) cell. During these experiments, we measure particle locations, contacts, and normal and frictional force vectors during loading. We reconstruct the angular distributions of the contact and force vectors, and extract the corresponding emergent anisotropies for each of these metrics. Finally, we show that the SFF relation quantitatively predicts the relationship between particle-scale anisotropies, the stress tensor components, and the bulk friction coefficient, capturing even transient behaviors. As such, this method shows promise for application to other dense particulate systems where fabric anisotropy can provide a useful measure of bulk friction.
Submitted 1 May, 2024;
originally announced May 2024.
-
Pegasus-v1 Technical Report
Authors:
Raehyuk Jung,
Hyojun Go,
Jaehyuk Yi,
Jiho Jang,
Daniel Kim,
Jay Suh,
Aiden Lee,
Cooper Han,
Jae Lee,
Jeff Kim,
Jin-Young Kim,
Junwan Kim,
Kyle Park,
Lucas Lee,
Mars Ha,
Minjoon Seo,
Abraham Jo,
Ed Park,
Hassan Kianinejad,
SJ Kim,
Tony Moon,
Wade Jeong,
Andrei Popescu,
Esther Kim,
EK Yoon
, et al. (19 additional authors not shown)
Abstract:
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.
Submitted 22 April, 2024;
originally announced April 2024.
-
Enhancing Financial Inclusion and Regulatory Challenges: A Critical Analysis of Digital Banks and Alternative Lenders Through Digital Platforms, Machine Learning, and Large Language Models Integration
Authors:
Luke Lee
Abstract:
This paper explores the dual impact of digital banks and alternative lenders on financial inclusion and the regulatory challenges posed by their business models. It discusses the integration of digital platforms, machine learning (ML), and Large Language Models (LLMs) in enhancing financial services accessibility for underserved populations. Through a detailed analysis of operational frameworks and technological infrastructures, this research identifies key mechanisms that facilitate broader financial access and mitigate traditional barriers. Additionally, the paper addresses significant regulatory concerns involving data privacy, algorithmic bias, financial stability, and consumer protection. Employing a mixed-methods approach, which combines quantitative financial data analysis with qualitative insights from industry experts, this paper elucidates the complexities of leveraging digital technology to foster financial inclusivity. The findings underscore the necessity of evolving regulatory frameworks that harmonize innovation with comprehensive risk management. This paper concludes with policy recommendations for regulators, financial institutions, and technology providers, aiming to cultivate a more inclusive and stable financial ecosystem through prudent digital technology integration.
Submitted 18 April, 2024;
originally announced April 2024.
-
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
Authors:
Christopher Rae,
Joseph K. L. Lee,
James Richings,
Michele Weiland
Abstract:
With the rapid increase in machine learning workloads performed on HPC systems, it is beneficial to regularly perform machine learning specific benchmarks to monitor performance and identify issues. Furthermore, as part of the Edinburgh International Data Facility, EPCC currently hosts a wide range of machine learning accelerators including Nvidia GPUs, the Graphcore Bow Pod64 and Cerebras CS-2, which are managed via Kubernetes and Slurm. We extended the Reframe framework to support the Kubernetes scheduler backend, and utilise Reframe to perform machine learning benchmarks, and we discuss the preliminary results collected and challenges involved in integrating Reframe across multiple platforms and architectures.
Submitted 25 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Hyperparameter Selection in Continual Learning
Authors:
Thomas L. Lee,
Sigrid Passano Hellan,
Linus Ericsson,
Elliot J. Crowley,
Amos Storkey
Abstract:
In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unrealistic as in practice a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper answers this question by evaluating several realistic HPO frameworks. We find that all the HPO frameworks considered, including end-of-training HPO, perform similarly. We therefore advocate using the realistic and most computationally efficient method: fitting the hyperparameters on the first task and then fixing them throughout training.
Submitted 9 April, 2024;
originally announced April 2024.
-
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Authors:
Haoran Li,
Haolin Shi,
Wenli Zhang,
Wenjun Wu,
Yong Liao,
Lin Wang,
Lik-hang Lee,
Pengyuan Zhou
Abstract:
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .
Submitted 19 July, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Authors:
Jie Long Lee,
Chen Li,
Gim Hee Lee
Abstract:
We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate the inconsistency problem via the inherent multi-view consistency property of NeRF. Specifically, our I3DS alternates between upscaling low-resolution (LR) rendered images with diffusion models, and updating the underlying 3D representation with standard NeRF training. We further introduce Renoised Score Distillation (RSD), a novel score-distillation objective for 2D image resolution. Our RSD combines features from ancestral sampling and Score Distillation Sampling (SDS) to generate sharp images that are also LR-consistent. Qualitative and quantitative results on both synthetic and real-world datasets demonstrate that our DiSR-NeRF can achieve better results on NeRF super-resolution compared with existing works. Code and video results available at the project website.
Submitted 31 March, 2024;
originally announced April 2024.
-
Dynamic motion trajectory control with nanoradian accuracy for multi-element X-ray optical systems via laser interferometry
Authors:
Sina M Koehlenbeck,
Lance Lee,
Mario D Balcazar,
Ying Chen,
Vincent Esposito,
Jerry Hastings,
Matthias C Hoffmann,
Zhirong Huang,
May-Ling Ng,
Saxon Price,
Takahiro Sato,
Matthew Seaberg,
Yanwen Sun,
Adam White,
Lin Zhang,
Brian Lantz,
Diling Zhu
Abstract:
The past decades have witnessed the development of new X-ray beam sources with brightness growing at a rate surpassing Moore's law. Current and upcoming diffraction-limited and fully coherent X-ray beam sources, including multi-bend achromat based synchrotron sources and high repetition rate X-ray free electron lasers, put increasingly stringent requirements on stability and accuracy of X-ray optics systems. Parasitic motion errors at sub-microradian scale in beam transport and beam conditioning optics can lead to significant loss of coherence and brightness delivered from source to experiment. To address this challenge, we incorporated optical metrology based on interferometry and differential wavefront sensing as part of the X-ray optics motion control system. A prototype X-ray optics system was constructed following the optical layout of a tunable X-ray cavity. On-line interferometric metrology enabled dynamical feedback to a motion control system to track and compensate for motion errors. The system achieved sub-microradian scale performance, as multiple optical elements are synchronously and continuously adjusted. This first proof of principle measurement demonstrated both the potential and necessity of incorporating optical metrology as part of the motion control architecture for large scale X-ray optical systems such as monochromators, delay lines, and in particular, X-ray cavity systems to enable the next generation cavity-based X-ray free electron lasers.
Submitted 20 March, 2024;
originally announced March 2024.
-
Towards Massive Interaction with Generalist Robotics: A Systematic Review of XR-enabled Remote Human-Robot Interaction Systems
Authors:
Xian Wang,
Luyao Shen,
Lik-Hang Lee
Abstract:
The rising interest in generalist robots seeks to create robots with the versatility to handle multiple tasks in a variety of environments, and humans will interact with such robots through immersive interfaces. In the context of human-robot interaction (HRI), this survey provides an exhaustive review of the applications of extended reality (XR) technologies in the field of remote HRI. We developed a systematic search strategy based on the PRISMA methodology. From the initial 2,561 articles selected, 100 research papers that met our inclusion criteria were included. We categorized and summarized the domain in detail, delving into XR technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR), and their applications in facilitating intuitive and effective remote control and interaction with robotic systems. The survey highlights existing articles on the application of XR technologies, user experience enhancement, and various interaction designs for XR in remote HRI, providing insights into current trends and future directions. We also identified potential gaps and opportunities for future research to improve remote HRI systems through XR technology to guide and inform future XR and robotics research.
Submitted 26 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Gemma: Open Models Based on Gemini Research and Technology
Authors:
Gemma Team,
Thomas Mesnard,
Cassidy Hardin,
Robert Dadashi,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
Léonard Hussenot,
Pier Giuseppe Sessa,
Aakanksha Chowdhery,
Adam Roberts,
Aditya Barua,
Alex Botev,
Alex Castro-Ros,
Ambrose Slone,
Amélie Héliou,
Andrea Tacchetti,
Anna Bulanova,
Antonia Paterson,
Beth Tsai,
Bobak Shahriari
, et al. (83 additional authors not shown)
Abstract:
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
Submitted 16 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Authors:
Joseph Cho,
Fachrina Dewi Puspitasari,
Sheng Zheng,
Jingyao Zheng,
Lik-Hang Lee,
Tae-Ho Kim,
Choong Seon Hong,
Chaoning Zhang
Abstract:
The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements consisting of but not limited to core building blocks (vision, language, and temporal) and supporting features from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon minute exploration of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation such as dataset, evaluation metric, efficient architecture, and human-controlled generation. Finally, we conclude that the study of the text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community towards its advancement as the first step to realize artificial general intelligence (AGI).
Submitted 7 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
The ALMA-CRISTAL survey: Extended [CII] emission in an interacting galaxy system at z ~ 5.5
Authors:
A. Posses,
M. Aravena,
J. González-López,
N. M. Förster Schreiber,
D. Liu,
L. Lee,
M. Solimano,
T. Díaz-Santos,
R. J. Assef,
L. Barcos-Muñoz,
S. Bovino,
R. A. A. Bowler,
G. Calistro Rivera,
E. da Cunha,
R. L. Davies,
M. Killi,
I. De Looze,
A. Ferrara,
D. B. Fisher,
R. Herrera-Camus,
R. Ikeda,
T. Lambert,
J. Li,
D. Lutz,
I. Mitsuhashi
, et al. (9 additional authors not shown)
Abstract:
The ALMA [CII] Resolved Ism in STar-forming gALaxies (CRISTAL) survey is a Cycle 8 ALMA Large Programme that studies the cold gas component of high-redshift galaxies. Its sub-arcsecond resolution observations are key to disentangling physical mechanisms that shape galaxies during cosmic dawn. In this paper, we explore the morphology and kinematics of the cold gas, star-forming, and stellar components in the star-forming main-sequence galaxy CRISTAL-05/HZ3, at z = 5.54. Our analysis includes 0.3" spatial resolution (~2 kpc) ALMA observations of the [CII] line. While CRISTAL-05 was previously classified as a single source, our observations reveal that the system is a close interacting pair surrounded by an extended component of carbon-enriched gas. This is imprinted in the disturbed elongated [CII] morphology and the separation of the two components in the position-velocity diagram (~100 km/s). The central region is composed of two components, named C05-NW and C05-SE, with the former being the dominant one. A significant fraction of the [CII] arises beyond the close pair up to 10 kpc, while the regions forming new massive stars and the stellar component seem compact (r_[CII] ~ 4 r_UV), as traced by rest-frame UV and optical imaging obtained with the Hubble Space Telescope and the James Webb Space Telescope. Our kinematic model, using the DYSMALpy software, yields a minor contribution of dark matter of C05-NW within a radius of ~2x Reff. Finally, we explore the resolved [CII]/FIR ratios as a proxy for shock-heating produced by this merger. We argue that the extended [CII] emission is mainly caused by the merger, which could not be discerned with lower-resolution observations. Our work emphasizes the need for high-resolution observations to fully characterize the dynamic stages of infant galaxies and the physical mechanisms that drive the metal enrichment of the circumgalactic medium.
Submitted 5 March, 2024;
originally announced March 2024.
-
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Authors:
Peng Qi,
Zehong Yan,
Wynne Hsu,
Mong Li Lee
Abstract:
Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. While Multimodal Large Language Models (MLLMs) have rich knowledge and innate capability for visual reasoning and explanation generation, they still lack sophistication in understanding and discovering the subtle crossmodal differences. In this paper, we introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation. SNIFFER employs two-stage instruction tuning on InstructBLIP. The first stage refines the model's concept alignment of generic objects with news-domain entities and the second stage leverages language-only GPT-4 generated OOC-specific instruction data to fine-tune the model's discriminatory powers. Enhanced by external tools and retrieval, SNIFFER not only detects inconsistencies between text and image but also utilizes external knowledge for contextual verification. Our experiments show that SNIFFER surpasses the original MLLM by over 40% and outperforms state-of-the-art methods in detection accuracy. SNIFFER also provides accurate and persuasive explanations as validated by quantitative and human evaluations.
Submitted 5 March, 2024;
originally announced March 2024.
-
An Automated Chemical Exploration of NGC 6334I at 340 au Resolution
Authors:
Samer J. El-Abd,
Crystal L. Brogan,
Todd R. Hunter,
Kin Long Kelvin Lee,
Ryan A. Loomis,
Brett A. McGuire
Abstract:
Much of the information gleaned from observations of star-forming regions comes from the analysis of their molecular emission spectra, particularly in the radio regime. The time-consuming nature of fitting synthetic spectra to observations interactively for such line-rich sources, however, often results in such analysis being limited to data extracted from a single-dish observation or a handful of pixels from an interferometric observation. Yet, star-forming regions display a wide variety of physical conditions that are difficult, if not impossible, to accurately characterize with such a limited number of spectra. We have developed an automated fitting routine that visits every pixel in the field of view of an ALMA data cube and determines the best-fit physical parameters, including excitation temperature and column densities, for a given list of molecules. In this proof-of-concept work, we provide an overview of the fitting routine and apply it to 0".26, 1.1 km s$^{-1}$ resolution ALMA observations of two sites of massive star-formation in NGC 6334I. Parameters were found for 21 distinct molecules by generating synthetic spectra across 7.48 GHz of spectral bandwidth between 280 and 351 GHz. Spatial images of the derived parameters for each of the > 8000 pixels are presented with special attention paid to the C$_2$H$_4$O$_2$ isomers and their relative variations. We highlight the greater scientific utility of the column density and velocity images of individual molecules compared to traditional moment maps of single transitions.
Submitted 21 February, 2024;
originally announced February 2024.
-
From GARCH to Neural Network for Volatility Forecast
Authors:
Pengfei Zhao,
Haoren Zhu,
Wilfred Siu Hung NG,
Dik Lun Lee
Abstract:
Volatility, as a measure of uncertainty, plays a crucial role in numerous financial activities such as risk management. The Econometrics and Machine Learning communities have developed two distinct approaches for financial volatility forecasting: the stochastic approach and the neural network (NN) approach. Despite their individual strengths, these methodologies have conventionally evolved in separate research trajectories with little interaction between them. This study endeavors to bridge this gap by establishing an equivalence relationship between models of the GARCH family and their corresponding NN counterparts. With the equivalence relationship established, we introduce an innovative approach, named GARCH-NN, for constructing NN-based volatility models. It obtains the NN counterparts of GARCH models and integrates them as components into an established NN architecture, thereby seamlessly infusing volatility stylized facts (SFs) inherent in the GARCH models into the neural network. We develop the GARCH-LSTM model to showcase the power of the GARCH-NN approach. Experiment results validate that amalgamating the NN counterparts of the GARCH family models into established NN models leads to enhanced outcomes compared to employing the stochastic and NN models in isolation.
Submitted 29 January, 2024;
originally announced February 2024.
-
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Authors:
Liang-Hsuan Tseng,
En-Pei Hu,
Cheng-Han Chiang,
Yuan Tseng,
Hung-yi Lee,
Lin-shan Lee,
Shao-Hua Sun
Abstract:
Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the speech feature segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.
Submitted 28 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Eigenmode Decomposition Method for Full-Wave Modeling of Microring Resonators
Authors:
Yuriy Akimov,
Aswin Alexander Eapen,
Shiyang Zhu,
Doris K. T. Ng,
Nanxi Li,
Woon Leng Loh,
Lennon Y. T. Lee,
Alagappan Gandhi,
Aravind P. Anthur
Abstract:
We develop a theoretical predictive model for an all-pass ring resonator that enables the most complete description of linear coupling regimes. The model is based on eigenmode decomposition of Maxwell's equations with full account of the confined and leaky modes, as opposed to the existing phenomenological methods restricted to the confined modes only. This model enables quantitative description of all-pass ring resonators and provides insights into the physics underlying microring-waveguide coupling. We experimentally validate the model using transmission measurements in the linear regime of aluminium nitride resonators. The developed model is then used to explore the field enhancement in microrings crucial for nonlinear photonic applications.
Submitted 6 February, 2024;
originally announced February 2024.
-
APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
Authors:
Yiming Zhu,
Zhizhuo Yin,
Gareth Tyson,
Ehsan-Ul Haq,
Lik-Hang Lee,
Pan Hui
Abstract:
Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning -- techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve higher weighted F1-score on nine out of twelve experimented datasets, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
Submitted 20 February, 2024; v1 submitted 24 January, 2024;
originally announced February 2024.
-
SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
Authors:
Chyi-Jiunn Lin,
Guan-Ting Lin,
Yung-Sung Chuang,
Wei-Lun Wu,
Shang-Wen Li,
Abdelrahman Mohamed,
Hung-yi Lee,
Lin-shan Lee
Abstract:
Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine must additionally first retrieve passages that possibly contain the answer from a spoken archive, was never considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying this approach is more robust to speech recognition errors.
Submitted 24 August, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Rapid Estimation of Left Ventricular Contractility with a Physics-Informed Neural Network Inverse Modeling Approach
Authors:
Ehsan Naghavi,
Haifeng Wang,
Lei Fan,
Jenny S. Choy,
Ghassan Kassab,
Seungik Baek,
Lik-Chuan Lee
Abstract:
Physics-based computer models based on numerical solution of the governing equations generally cannot make rapid predictions, which in turn, limits their applications in the clinic. To address this issue, we developed a physics-informed neural network (PINN) model that encodes the physics of a closed-loop blood circulation system embedding a left ventricle (LV). The PINN model is trained to satisfy a system of ordinary differential equations (ODEs) associated with a lumped parameter description of the circulatory system. The model predictions have a maximum error of less than 5% when compared to those obtained by solving the ODEs numerically. An inverse modeling approach using the PINN model is also developed to rapidly estimate model parameters (in $\sim$ 3 mins) from single-beat LV pressure and volume waveforms. Using synthetic LV pressure and volume waveforms generated by the PINN model with different model parameter values, we show that the inverse modeling approach can recover the corresponding ground truth values, which suggests that the model parameters are unique. The PINN inverse modeling approach is then applied to estimate LV contractility indexed by the end-systolic elastance $E_{es}$ using waveforms acquired from 11 swine models, including waveforms acquired before and after administration of dobutamine (an inotropic agent) in 3 animals. The estimated $E_{es}$ is about 58% to 284% higher for the data associated with dobutamine compared to those without, which implies that this approach can be used to estimate LV contractility using single-beat measurements. The PINN inverse modeling can potentially be used in the clinic to simultaneously estimate LV contractility and other physiological parameters from single-beat measurements.
Submitted 14 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
IQNet: Image Quality Assessment Guided Just Noticeable Difference Prefiltering For Versatile Video Coding
Authors:
Yu-Han Sun,
Chiang Lo-Hsuan Lee,
Tian-Sheuan Chang
Abstract:
Image prefiltering with just noticeable distortion (JND) improves coding efficiency in a visually lossless way by filtering the perceptually redundant information prior to compression. However, real JND cannot be well modeled with inaccurate masking equations in traditional approaches or image-level subject tests in deep learning approaches. Thus, this paper proposes a fine-grained JND prefiltering dataset guided by image quality assessment for accurate block-level JND modeling. The dataset is constructed from decoded images to include coding effects and is also perceptually enhanced with block overlap and edge preservation. Furthermore, based on this dataset, we propose a lightweight JND prefiltering network, IQNet, which can be applied directly to different quantization cases with the same model and only needs 3K parameters. The experimental results show that the proposed approach to Versatile Video Coding could yield maximum/average bitrate savings of 41\%/15\% and 53\%/19\% for all-intra and low-delay P configurations, respectively, with negligible subjective quality loss. Our method demonstrates higher perceptual quality and a model size that is an order of magnitude smaller than previous deep learning methods.
Submitted 15 December, 2023;
originally announced December 2023.
-
$ρ$-Diffusion: A diffusion-based density estimation framework for computational physics
Authors:
Maxwell X. Cai,
Kin Long Kelvin Lee
Abstract:
In physics, density $ρ(\cdot)$ is a fundamentally important scalar function to model, since it describes a scalar field or a probability density function that governs a physical process. Modeling $ρ(\cdot)$ typically scales poorly with parameter space, however, and quickly becomes prohibitively difficult and computationally expensive. One promising avenue to bypass this is to leverage the capabilities of denoising diffusion models often used in high-fidelity image generation to parameterize $ρ(\cdot)$ from existing scientific data, from which new samples can be trivially drawn. In this paper, we propose $ρ$-Diffusion, an implementation of denoising diffusion probabilistic models for multidimensional density estimation in physics, which is currently in active development and, from our results, performs well on physically motivated 2D and 3D density functions. Moreover, we propose a novel hashing technique that allows $ρ$-Diffusion to be conditioned by arbitrary amounts of physical parameters of interest.
Submitted 13 December, 2023;
originally announced December 2023.
-
The ALMA-CRISTAL survey: Widespread dust-obscured star formation in typical star-forming galaxies at z=4-6
Authors:
Ikki Mitsuhashi,
Ken-ichi Tadaki,
Ryota Ikeda,
Rodrigo Herrera-Camus,
Manuel Aravena,
Ilse De Looze,
Natascha M. Förster Schreiber,
Jorge González-López,
Justin Spilker,
Roberto J. Assef,
Rychard Bouwens,
Loreto Barcos-Munoz,
Jack Birkin,
Rebecca A. A. Bowler,
Gabriela Calistro Rivera,
Rebecca Davies,
Elisabete Da Cunha,
Tanio Díaz-Santos,
Andrea Ferrara,
Deanne Fisher,
Lilian L. Lee,
Juno Li,
Dieter Lutz,
Monica Relaño,
Thorsten Naab
, et al. (7 additional authors not shown)
Abstract:
We present the morphological parameters and global properties of dust-obscured star formation in typical star-forming galaxies at z=4-6. Among 26 galaxies composed of 20 galaxies observed by the Cycle-8 ALMA Large Program, CRISTAL, and six galaxies from archival data, we have individually detected rest-frame 158$μ$m dust continuum emission from 19 galaxies, nine of which are reported for the first time. The derived far-infrared luminosities are in the range $\log_{10} L_{\rm IR}\,[L_{\odot}]=$10.9-12.4, an order of magnitude lower than previously detected massive dusty star-forming galaxies (DSFGs). The average relationship between the fraction of dust-obscured star formation ($f_{\rm obs}$) and the stellar mass is consistent with previous results at z=4-6 in a mass range of $\log_{10} M_{\ast}\,[M_{\odot}]\sim$9.5-11.0 and shows potential evolution from z=6-9. The individual $f_{\rm obs}$ exhibits a significant diversity, and it shows a correlation with the spatial offset between the dust and the UV continuum, suggesting that inhomogeneous dust reddening may cause the source-to-source scatter in $f_{\rm obs}$. The effective radii of the dust emission are on average $\sim$1.5 kpc and are $\sim2$ times more extended than the rest-frame UV. The infrared surface densities of these galaxies ($Σ_{\rm IR}\sim2.0\times10^{10}\,L_{\odot}\,{\rm kpc}^{-2}$) are one order of magnitude lower than those of DSFGs that host compact central starbursts. On the basis of the comparable contribution of dust-obscured and dust-unobscured star formation along with their similar spatial extent, we suggest that typical star-forming galaxies at z=4-6 form stars throughout the entirety of their disks.
Submitted 29 November, 2023;
originally announced November 2023.
-
Buckling instability in a chain of sticky bubbles
Authors:
Carmen L. Lee,
Kari Dalnoki-Veress
Abstract:
A slender object undergoing an axial compression will buckle to alleviate the stress. Typically the morphology of the deformed object depends on the bending stiffness for solids, or the viscoelastic properties for liquid threads. We study a chain of uniform sticky air bubbles that rise due to buoyancy through an aqueous bath. A buckling instability of the bubble chain with a characteristic wavelength is observed. If a chain of bubbles is produced faster than it is able to rise, the dominance of viscous drag over buoyancy results in a compressive stress that is alleviated by buckling the bubble chain. Using low Reynolds number hydrodynamics, we predict the critical buckling speed, the terminal speed of a buckled chain, and the geometry of the buckles.
Submitted 30 May, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
A Study of Partisan News Sharing in the Russian invasion of Ukraine
Authors:
Yiming Zhu,
Ehsan-Ul Haq,
Gareth Tyson,
Lik-Hang Lee,
Yuyang Wang,
Pan Hui
Abstract:
Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characterize the role of such sharing in influencing users' communications. Our analysis covers an eight-month dataset across six Reddit communities related to the Russian invasion. We first perform an analysis of the temporal evolution of partisan news sharing. We confirm that the invasion stimulates discussion in the observed communities, accompanied by an increased volume of partisan news sharing. Next, we characterize users' response to such sharing. We observe that partisan bias plays a role in narrowing its propagation: more biased media is less likely to be spread across multiple subreddits. However, we find that partisan news sharing attracts more users to engage in the discussion, by generating more comments. We then build a predictive model to identify users likely to spread partisan news. The prediction is challenging, however, with 61.57% accuracy on average. Our centrality analysis on the commenting network further indicates that the users who disseminate partisan news possess lower network influence in comparison to those who propagate neutral news.
Submitted 26 November, 2023;
originally announced November 2023.