-
A Digital Beamforming Receiver Architecture Implemented on a FPGA for Space Applications
Authors:
Eduardo Ortega,
Agustín Martínez,
Antonio Oliva,
Fernando Sanz,
Oscar Rodríguez,
Manuel Prieto,
Pablo Parra,
Antonio Da Silva,
Sebastián Sánchez
Abstract:
The burgeoning interest within the space community in digital beamforming is largely attributable to the superior flexibility that satellites with active antenna systems offer for a wide range of applications, notably in communication services. This paper delves into the analysis and practical implementation of a Digital Beamforming and Digital Down Conversion (DDC) chain, leveraging a high-speed Analog-to-Digital Converter (ADC) certified for space applications alongside a high-performance Field-Programmable Gate Array (FPGA). The proposed design strategy focuses on optimizing resource efficiency and minimizing power consumption by strategically sequencing the beamformer processor ahead of the complex down-conversion operation. This innovative approach entails the application of demodulation and low-pass filtering exclusively to the aggregated beam channel, culminating in a marked reduction in the requisite digital signal processing resources relative to traditional, more resource-intensive digital beamforming and DDC architectures. In the experimental validation, an evaluation board integrating a high-speed ADC and an FPGA was utilized. This setup facilitated the empirical validation of the design's efficacy by applying various RF input signals to the digital beamforming receiver system. The ADC employed is capable of high-resolution signal processing, while the FPGA provides the necessary computational flexibility and speed for real-time digital signal processing tasks. The findings underscore the potential of this design to significantly enhance the efficiency and performance of digital beamforming systems in space applications.
Submitted 29 May, 2024;
originally announced May 2024.
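The reordering described in this abstract (beamformer first, then a single down-conversion) relies on mixing and FIR filtering both being linear operations, so the weighted sum can be moved in front of them. The following is a minimal sketch of that equivalence, not the authors' FPGA design: the function names, the omission of decimation, and the use of plain Python lists are all assumptions made for illustration.

```python
import cmath
import math

def ddc(x, f_c, f_s, taps):
    """Mix x down by f_c, then apply a causal FIR low-pass filter (no decimation)."""
    mixed = [s * cmath.exp(-2j * math.pi * f_c / f_s * n) for n, s in enumerate(x)]
    # y[n] = sum_k taps[k] * mixed[n - k], treating samples before n = 0 as zero
    return [
        sum(taps[k] * mixed[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(mixed))
    ]

def beamform(channels, weights):
    """Weighted sum across antenna channels, sample by sample."""
    return [sum(w * s for w, s in zip(weights, samples)) for samples in zip(*channels)]

def ddc_then_beamform(channels, weights, f_c, f_s, taps):
    # Conventional ordering: one DDC per antenna channel, then the weighted sum.
    return beamform([ddc(ch, f_c, f_s, taps) for ch in channels], weights)

def beamform_then_ddc(channels, weights, f_c, f_s, taps):
    # Ordering described in the abstract: weighted sum first, then a single DDC.
    return ddc(beamform(channels, weights), f_c, f_s, taps)
```

Both functions return the same samples up to floating-point rounding, but the second runs the mixer and filter once instead of once per channel, which is where the resource saving claimed in the abstract would come from.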
-
A CT-based deep learning system for automatic assessment of aortic root morphology for TAVI planning
Authors:
Simone Saitta,
Francesco Sturla,
Riccardo Gorla,
Omar A. Oliva,
Emiliano Votta,
Francesco Bedogni,
Alberto Redaelli
Abstract:
Accurate planning of transcatheter aortic valve implantation (TAVI) is important to minimize complications, and it requires anatomic evaluation of the aortic root (AR), commonly done through 3D computed tomography (CT) image analysis. Currently, there is no standard automated solution for this process. Two convolutional neural networks (CNNs) with 3D U-Net architectures (model 1 and model 2) were trained on 310 CT scans for AR analysis. Model 1 performed AR segmentation and model 2 identified the aortic annulus and sinotubular junction (STJ) contours. Results were validated against manual measurements of 178 TAVI candidates. After training, the two models were integrated into a fully automated pipeline for geometric analysis of the AR. The trained CNNs effectively segmented the AR, annulus and STJ, resulting in a mean Dice score of 0.93 for the AR, and mean surface distances of 1.16 mm and 1.30 mm for the annulus and STJ, respectively. Automatic measurements were in good agreement with manual annotations, yielding annulus diameters that differed by 0.52 [-2.96, 4.00] mm (bias and 95% limits of agreement for manual minus algorithm). Evaluating the area-derived diameter, bias and limits of agreement were 0.07 [-0.25, 0.39] mm. STJ and sinus diameters computed by the automatic method yielded differences of 0.16 [-2.03, 2.34] and 0.1 [-2.93, 3.13] mm, respectively. The proposed tool is a fully automatic solution to quantify morphological biomarkers for pre-TAVI planning. The method was validated against manual annotation from clinical experts and was shown to be quick and effective in assessing AR anatomy, with potential for time and cost savings.
Submitted 10 February, 2023;
originally announced February 2023.
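The two kinds of figure quoted in this abstract, a Dice score and a bias with 95% limits of agreement, can be computed as sketched below. This is an illustration only: the function names are hypothetical, masks are assumed to be flat 0/1 sequences, and the limits are assumed to follow the usual Bland-Altman form (bias ± 1.96 sample standard deviations), which the abstract does not state explicitly.

```python
import statistics

def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match (an assumption).
    return 2 * intersection / total if total else 1.0

def bland_altman(manual, automatic):
    """Return (bias, lower limit, upper limit) for manual minus automatic."""
    diffs = [m - a for m, a in zip(manual, automatic)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)  # sample SD, assumed convention
    return bias, bias - half_width, bias + half_width
```

Under this convention a reported value such as 0.52 [-2.96, 4.00] mm is symmetric about the bias, which is consistent with the figures quoted above.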
-
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Authors:
Mathew Monfort,
SouYoung Jin,
Alexander Liu,
David Harwath,
Rogerio Feris,
James Glass,
Aude Oliva
Abstract:
When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who and how) of the observed event and exclude background information that is deemed unimportant to the observer. With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video. These descriptions can be captured in captions that provide expanded attributes for video labeling (e.g. actions/objects/scenes/sentiment/etc.) while allowing us to gain new insight into what people find important or necessary to summarize specific events. Existing caption datasets for video understanding are either small in scale or restricted to a specific domain. To address this, we present the Spoken Moments (S-MiT) dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events. We collect our descriptions using audio recordings to ensure that they remain as natural and concise as possible while allowing us to scale the size of a large classification dataset. In order to utilize our proposed dataset, we present a novel Adaptive Mean Margin (AMM) approach to contrastive learning and evaluate our models on video/caption retrieval on multiple datasets. We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.
Submitted 10 May, 2021;
originally announced May 2021.
-
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Authors:
Mathew Monfort,
Bowen Pan,
Kandan Ramakrishnan,
Alex Andonian,
Barry A McNamara,
Alex Lascelles,
Quanfu Fan,
Dan Gutfreund,
Rogerio Feris,
Aude Oliva
Abstract:
Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not learn the full spectrum of information present in each video in training. To address this, we present the Multi-Moments in Time dataset (M-MiT) which includes over two million action labels for over one million three-second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long-tail multi-label learning, provide improved methods for visualizing and interpreting models trained for multi-label action detection and show the strength of transferring models trained on M-MiT to smaller datasets.
Submitted 27 September, 2021; v1 submitted 1 November, 2019;
originally announced November 2019.
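The penalty problem this abstract describes can be illustrated with a plain multi-label binary cross-entropy, where each action class gets its own sigmoid output. This is a generic sketch, not the long-tail-adapted loss the abstract refers to, whose form is not given here; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over classes; targets is a 0/1 vector
    that may contain any number of ones."""
    eps = 1e-12  # guards against log(0)
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)

# A model that confidently predicts two actions that are both present:
logits = [4.0, 4.0, -4.0]
loss_full_labels = multilabel_bce(logits, [1, 1, 0])   # both actions annotated
loss_single_label = multilabel_bce(logits, [1, 0, 0])  # second action left unlabeled
```

With only one action annotated, the correct second prediction is counted as an error and the loss is much larger, which is the incorrect penalty the abstract describes.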