In electroencephalography (EEG) classification paradigms, data from a target subject are often scarce, which makes it hard to train a robust deep learning network. Transfer learning and its variants are effective tools for improving models that suffer from a lack of data. However, many of the proposed variants and deep models rely on a single assumed distribution to represent the latent features, which may not scale well given the inter- and intra-subject variability of EEG signals. This leads to significant instability in decoding performance across individual subjects. Non-trivial domain differences between the sets of training or transfer learning data also degrade model generalisation to the target subject, yet these differences are often difficult to detect because EEG domain features are ill-defined. This study proposes a novel inference model, the Joint Embedding Variational Autoencoder, which offers a conditionally tighter approximation of the estimated spatiotemporal feature distribution. It does so by jointly optimising variational autoencoders to produce optimisable, data-dependent inputs that act as an additional variable, improving overall model optimisation and scaling without sacrificing the tightness of the bound. To learn the variational bound, we show that it suffices to maximise the marginal log-likelihood of the second embedding section alone to achieve conditionally tighter lower bounds. Furthermore, we show that this model provides state-of-the-art EEG data reconstruction and deep feature extraction. The domains extracted from each subject's EEG signals help explain why adaptation efficacy differs between subjects.
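For readers less familiar with variational lower bounds, the following is a minimal sketch of the standard single-VAE evidence lower bound (ELBO), ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)), which VAE-based models build on. It is not the Joint Embedding Variational Autoencoder objective, whose details are not given here. It assumes a diagonal Gaussian encoder q(z|x) = N(mu, diag(exp(logvar))), a standard normal prior p(z) = N(0, I) and a unit-variance Gaussian decoder; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterise(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), so the sample stays differentiable in mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def gaussian_log_likelihood(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """log N(x; x_hat, I): the decoder output x_hat is treated as the mean of a unit-variance Gaussian."""
    sq_err = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return -0.5 * sq_err - 0.5 * len(x) * math.log(2.0 * math.pi)


def elbo(x: Sequence[float], x_hat: Sequence[float],
         mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Single-sample ELBO estimate: reconstruction log-likelihood minus the KL regulariser.

    x_hat is assumed to be the decoder output for one reparameterised sample of z.
    """
    return gaussian_log_likelihood(x, x_hat) - gaussian_kl(mu, logvar)


# Usage: a toy 4-sample signal segment reconstructed perfectly, with the posterior equal to the prior.
x = [0.1, -0.2, 0.05, 0.3]
bound = elbo(x, x, mu=[0.0, 0.0], logvar=[0.0, 0.0])
```

With a perfect reconstruction and a posterior equal to the prior, the KL term is zero and the bound reduces to -(D/2) log(2π) for D input samples; any mismatch in either term lowers it.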