In recent years, the use of Internet of Medical Things (IoMT) frameworks, particularly those based on edge computing, has risen notably as a way to enhance remote monitoring in healthcare applications. Most existing models in this field have developed temperature screening methods using RCNN, a face temperature encoder (FTE), and a combination of data from wearable sensors for predicting respiratory rate (RR) and monitoring blood pressure. These methods aim to facilitate remote screening and monitoring of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and COVID-19. However, these models require substantial computing resources and are not suitable for lightweight environments. We propose a multimodal screening framework that leverages deep learning-inspired data fusion models to enhance screening results. A Variation Encoder (VEN) is designed to measure skin temperature from Regions of Interest (RoI) identified by YOLO. Subsequently, a multi-data fusion model integrates features from electronic records with data from wearable human sensors. To improve computational efficiency, a data reduction mechanism eliminates unnecessary features. Furthermore, we employ a contingent probability method to estimate distinct feature weights for each cluster, which captures variations in thermal and sensory data when predicting abnormal COVID-19 instances. Simulation results on our lab dataset show a precision of 95.2%, surpassing state-of-the-art models; we attribute this gain to the multimodal feature fusion model, the weight prediction factor, and the feature selection model.
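As a rough illustration of the fusion, data reduction, and cluster-wise weighting steps described above, the following Python sketch runs on synthetic data with NumPy. Everything in it is an assumption made for illustration and not the framework's actual implementation: the function names, the column-wise concatenation used for fusion, the variance threshold used for data reduction, the median split, and the reading of the contingent probability weights as smoothed conditional probabilities of an abnormal label.

```python
import numpy as np


def fuse(thermal, sensors, records):
    """Hypothetical early fusion: concatenate per-subject feature blocks column-wise."""
    return np.hstack([thermal, sensors, records])


def drop_low_variance(X, threshold=1e-3):
    """Hypothetical data reduction: keep only columns whose variance exceeds the threshold."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep


def cluster_feature_weights(X, clusters, labels):
    """For each cluster c and feature j, estimate
    P(abnormal | feature j above its global median, subject in cluster c),
    then normalise the estimates within each cluster so they sum to 1."""
    high = X > np.median(X, axis=0)
    weights = {}
    for c in np.unique(clusters):
        in_c = clusters == c
        probs = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            mask = in_c & high[:, j]
            # Laplace smoothing keeps the estimate defined when the mask is empty.
            probs[j] = (labels[mask].sum() + 1.0) / (mask.sum() + 2.0)
        weights[int(c)] = probs / probs.sum()
    return weights


# Synthetic stand-ins for the three modalities (not real data).
rng = np.random.default_rng(0)
n = 200
thermal = rng.normal(36.8, 0.5, size=(n, 1))           # skin temperature, degrees C
sensors = rng.normal(size=(n, 3))                       # wearable sensor features
records = np.hstack([
    rng.integers(0, 2, size=(n, 2)).astype(float),      # binary record flags
    np.ones((n, 1)),                                    # constant, uninformative column
])

X = fuse(thermal, sensors, records)
X_reduced, kept = drop_low_variance(X)

labels = (thermal[:, 0] > 37.3).astype(int)             # toy "abnormal" label
clusters = rng.integers(0, 2, size=n)                   # toy cluster assignment
weights = cluster_feature_weights(X_reduced, clusters, labels)
```

In this toy setup the constant record column is removed by the reduction step, and each cluster receives its own normalised weight vector over the remaining features. A real pipeline would replace the synthetic arrays with VEN temperature estimates, wearable sensor readings, and electronic record features.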