SPADE-E2VID: Spatially-Adaptive Denormalization for Event-Based Video Reconstruction

IEEE Trans Image Process. 2021;30:2488-2500. doi: 10.1109/TIP.2021.3052070. Epub 2021 Feb 1.

Abstract

Event-based cameras have several advantages over traditional cameras that record video as frames: high temporal resolution, high dynamic range, and almost no motion blur. Event sensors output a stream of events, each generated when a pixel reports a change in brightness. This data format makes it difficult to apply existing algorithms directly and to take advantage of event camera data. Thanks to developments in neural networks, important advances have been made in event-based image reconstruction. Although these networks achieve precise reconstructions while preserving most of the properties of event cameras, they still have an initialization period during which the reconstructed frames have not yet reached their best quality. In this work, we present SPADE-E2VID, a neural network model that improves the quality of the early frames in an event-based reconstructed video, as well as the overall contrast. SPADE-E2VID improves the quality of the first reconstructed frames by 15.87% in MSE, 4.15% in SSIM, and 2.5% in LPIPS. In addition, the SPADE layer allows the model to be trained to reconstruct videos without a temporal loss function. Another advantage of our model is a faster training time: in a many-to-one training style, we avoid evaluating the loss function at every step and instead evaluate it only once, at the end of each loop. We also carried out experiments with event cameras that do not provide polarity data. Our model produces quality video reconstructions from non-polarity events at HD resolution (1200 × 800). The video, code, and datasets will be available at: https://github.com/RodrigoGantier/SPADE_E2VID.
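
The many-to-one training style mentioned in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the recurrent step, the flattened-frame representation, and the names toy_recurrent_step, CountingLoss, many_to_many_loss, and many_to_one_loss are assumptions made for illustration, not the SPADE-E2VID network or the authors' training code. The sketch only shows how often the loss function runs under each scheme.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # hypothetical: a flattened grayscale image


def toy_recurrent_step(events: Frame, state: Frame) -> Tuple[Frame, Frame]:
    """Hypothetical stand-in for one recurrent reconstruction step.

    Blends the incoming event tensor with the previous state and returns
    (predicted frame, updated state).
    """
    new_state = [0.5 * s + 0.5 * e for s, e in zip(state, events)]
    return new_state, new_state


def mse(pred: Frame, target: Frame) -> float:
    """Mean squared error between two flattened frames."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class CountingLoss:
    """Wraps a loss function and counts how many times it is evaluated."""

    def __init__(self, fn: Callable[[Frame, Frame], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, pred: Frame, target: Frame) -> float:
        self.calls += 1
        return self.fn(pred, target)


def many_to_many_loss(event_seq: Sequence[Frame],
                      target_seq: Sequence[Frame],
                      loss_fn: Callable[[Frame, Frame], float]) -> float:
    """Evaluates the loss after every step of the sequence."""
    state = [0.0] * len(event_seq[0])
    total = 0.0
    for events, target in zip(event_seq, target_seq):
        pred, state = toy_recurrent_step(events, state)
        total += loss_fn(pred, target)
    return total


def many_to_one_loss(event_seq: Sequence[Frame],
                     final_target: Frame,
                     loss_fn: Callable[[Frame, Frame], float]) -> float:
    """Runs the whole sequence, then evaluates the loss once at the end."""
    state = [0.0] * len(event_seq[0])
    pred = state
    for events in event_seq:
        pred, state = toy_recurrent_step(events, state)
    return loss_fn(pred, final_target)


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    targets = [[0.5, 0.0], [0.25, 0.5], [0.625, 0.75]]

    per_step = CountingLoss(mse)
    many_to_many_loss(seq, targets, per_step)

    end_only = CountingLoss(mse)
    many_to_one_loss(seq, targets[-1], end_only)

    print("loss evaluations, many-to-many:", per_step.calls)
    print("loss evaluations, many-to-one:", end_only.calls)
```

For a sequence of N steps, the per-step scheme evaluates the loss N times while the many-to-one scheme evaluates it once. In an actual training loop the single end-of-sequence loss would still be backpropagated through the unrolled steps; that part is omitted here.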