Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Xingxing Li, Kefu Gao, and Jingnan Liu This work was supported by the National Key Research and Development Program of China under Grant 2021YFB2501100. (Corresponding author: Shoujian Zhang.) Jingrong Wang, Kefu Gao and Jingnan Liu are with the GNSS research center, Wuhan University, Wuhan 430079, China;Bo Xu, Shoujian Zhang and Xingxing Li are with School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; (email: [email protected]).Ronghe Jin is with The Department of Aeronautical and Aviation Engineering, Hong Kong Polytechnic University, Hong Kong, China
Abstract

Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view images segmentation algorithm based on Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building upon this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to the tightly coupled Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), and visual feature system which is called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system harmonizes Single Point Positioning (SPP) with Real-Time Kinematic (RTK) methodologies to bolster its operational versatility and resilience. In urban canyon environments, the positioning performance of S-NDM algorithm proposed in this paper is evaluated under different tightly coupled SPP-related and RTK-related models. The results exhibit that Sky-GVIO system achieves meter-level accuracy under SPP mode and sub-decimeter precision with RTK, surpassing the performance of GNSS/INS/Vision frameworks devoid of S-NDM. Additionally, the sky-view image dataset, inclusive of training and evaluation subsets, has been made publicly accessible for scholarly exploration at https://github.com/whuwangjr/sky-view-images .

Note to Practitioners

This study focus on the tight integration of multiple homogeneous and heterogeneous sensors (e.g. GNSS/INS/Vision) with the goal of addressing GNSS NLOS interference challenges for wide-area vehicle navigation applications in urban canyon. We propose a sky-view images-based accurate and efficient NLOS detection and mitigation algorithm (named S-NDM), and extend it to the tightly coupled GNSS/INS/Vision integration framework (called Sky-GVIO). The LOS/NLOS satellites are identified by associating the semantic information of sky-view images, and a reasonable stochastic model is constructed to suppress NLOS influence in the tightly coupled GNSS/INS/Vision integration model positioning accuracy. The experimental results explains that the Sky-GVIO is able to maximize the use of as much sensor information as possible to achieve accurate and robust positioning in the real urban canyon scenarios.

Index Terms:
GNSS NLOS, GNSS/INS/Vision system, sky-view images, tightly coupled integration, urban canyon.

I Introduction

Autonomous driving is one of the significant components in the field of intelligent transportation, necessitating high-precision localization, notably within the challenging terrains like urban canyons. Currently, the synergistic use of Global Navigation Satellite System (GNSS) and Inertial Navigation System (INS) has emerged as the predominant approach to navigate the complex urban environments [1], [2], [3]. Within the domain of GNSS, Real-Time Kinematic (RTK) and Precise Point Positioning (PPP) technologies have been extensively adopted to enhance GNSS/INS integrated solutions. Comparative studies indicate that RTK/INS fusion yields superior accuracy over PPP/INS under identical observational conditions [4]. Nonetheless, the urban environment, with its pervasive obstructions like edifices and arboreal coverage, introduces Non-Line-of-Sight (NLOS) errors to GNSS signals, compromising the accuracy of GNSS/INS integration positioning. Scholars have sought to augment the precision and robustness of GNSS/INS systems in urban canyons by incorporating additional sensory apparatus or by developing techniques to detect and rectify NLOS-induced signal distortions.

Cameras are increasingly utilized in vehicular motion estimation due to their energy efficiency and cost benefits. As an external sensor, cameras can provide rich environmental features for vehicle motion estimation [5], [6]. Consequently, the integration of cameras with GNSS and Micro-Electro-Mechanical System (MEMS)-based Inertial Measurement Units (IMUs) is a common strategy to attain precise localization in complex environments [7], [8]. Previous research [9] introduced Visual-Inertial Navigation Systems (VINS)-monocular model, integrating Visual Inertial Odometry (VIO)-derived relative poses with Global Positioning System (GPS) data within a unified optimization structure. In contrast to VINS-Mono, the work in [10] combines differential GNSS results with the VIO model, where the VIO model is transformed from the local frame to the global frame, achieving meter-level positioning accuracy in complex urban environments. Advancing from VINS-Mono, the work in [11] introduced the well-known GVINS model, which performs a joint optimization of GNSS pseudorange measurements, visual features, and inertial measurements through factor graph optimization techniques. While methods based on nonlinear optimization have advantages in handling system nonlinearity, multiple iterations of optimization increase computational complexity. Therefore, some researchers have started to focus on Extended Kalman Filter (EKF)-based methodologies. Building on [12], the paper [13] put forth a tightly-coupled Mono/MEMS-IMU/single-frequency GNSS-RTK model employing a Multi-State Constraint Kalman Filter (MSCKF), which attained decimetre-level positioning accuracy in urban environments.

In the above multi-sensor fusion positioning systems, GNSS is the only subsystem that provides absolute position information. Therefore, the quality control of GNSS raw measurements determines the overall performance of the system. This underscores the significance of NLOS signal detection and mitigation, especially in the convoluted terrains of urban environments.

In the detection and mitigation of GNSS NLOS signals, strategies are divided into hardware-centric designs and algorithmic advancements. Compared to expensive hardware improvements such as antenna design in [14], [15], [16], many researchers have focused on algorithmic improvements. These include empirical weighting models based on elevation angle [17], signal-to-noise ratio (SNR) [18], and methods that leverage multi-source information for satellite visibility. Notably, methods augmented by external sources like LiDAR [19], [20], 3-dimensional (3D) maps [21], [22], and cameras [23], [24] have refined GNSS NLOS signal detection accuracy. Cameras, especially, present a cost-effective alternative to the high expenses and limited scope of LiDAR, and the necessity for continuously updated 3D map databases. Infrared cameras [25] exhibit varying results for objects at different temperatures, making it easier to distinguish between sky and non-sky areas, which is advantageous for determining the satellite’s projection location on the sky-view images. However, compared to regular fish-eye cameras, infrared cameras are more costly. Furthermore, these cameras have not yet seen widespread use in consumer market products like smartphones or vehicle-mounted cameras. Subsequently, many research works began to use sky-pointing fish-eye cameras to capture sky-view images. These images were processed using segmentation algorithms [25], [26], [27] to distinguish between sky and non-sky areas. Finally, the satellites received by GNSS receiver were projected onto the sky-view images, facilitating the visualization of GNSS NLOS satellites. As seen from the results in [24] and [28], this approach significantly enhances the performance of SPP/INS positioning in complex urban environments. However, these traditional segmentation algorithms may not adapt well to sky-view images with varying lighting conditions. Furthermore, we have observed that the use of sky-view images for GNSS NLOS detection has not been extended to tightly coupled GNSS/INS/Vision systems. Additionally, there is an absence of comparative performance analysis of sky-view images in different GNSS positioning modes, both domestically and internationally.

We aim to extend the sky-view images aided GNSS NLOS detection and mitigation method (named S-NDM) to the tightly coupled GNSS/INS/Vision system, thereby enhancing vehicle positioning performance in urban canyons. Here we particularly emphasize the progressiveness from [28]: (a) different from the previous idea of improving the region growth algorithm, we use the algorithm of neural network to achieve segmentation of sky-view images to adapt to different lighting conditions; (b) the original NLOS signal processing algorithm is only used in tightly coupled SPP/INS framework. In this paper, we extend it to tightly coupled SPP/INS/Vision and RTK/INS/Vision framework. In addition, we evaluate the performance of the algorithm in these two frameworks and verify the practicability of the algorithm. This paper emphasizes the following primary contributions:

1) Adaptive Sky-view Images Segmentation: We introduce an adaptive sky-view images segmentation based on Fully Convolutional Networks (FCN) that can adjust to varying lighting conditions, addressing a key limitation of traditional methods.

2) Integration of Sky-GNSS/INS/Vision: We propose an integrated model that combines GNSS, INS, and Vision. And we extend S-NDM method to this model (named Sky-GVIO), enabling a comprehensive approach to vehicle positioning in challenging urban canyon environments.

3) Performance Evaluation: A comprehensive evaluation of S-NDM’s performance is conducted, with a focus on its effectiveness within GNSS pseudorange and carrier phase positioning frameworks, thereby shedding light on its applicability across different GNSS-related integration positioning techniques.

4) Open-Source Sky-view Images Dataset: An open-source repository of sky-view images, including training and testing data, is provided at https://github.com/whuwangjr/sky-view-images , contributing a valuable dataset to the research community and mitigating the lack of available resources in this field.

The reminder of this paper is organized as follows: Section II gives an overview of the tightly coupled GNSS/INS/Vision system enhanced by S-NDM. The experimental description and result analysis are introduced in Section III. Finally, Section IV summarizes and concludes the study.

II System overview

The proposed model Sky-GVIO are described in this section, include sky-view images segmentation based on FCN, the tightly coupled GNSS/INS/Vision integration system and S-NDM, as shown Fig. 1. The tightly coupled model is a fusion of the observed values. Before the fusion, it is very important to process the GNSS original data. We use S-NDM algorithm to process GNSS NLOS signals. In addition, the INS mechanization is used for state prediction and the system covariance would also be propagated. In the visual part, the feature extraction and tracking will be performed following [29]. Finally, we integrate the observation equations of GNSS, INS and vision into the MSCKF framework to obtain the navigation results.

II-A Sky-view Images Segmentation

Sky-view images can be significantly affected by factors such as clouds and lighting conditions, making it challenging to achieve high-precision segmentation using traditional methods based on pixel [29], category [30], region [31], and so on. It is well-known that FCN represent a mature pixel-level semantic segmentation network [32]. The FCN network structure primarily consists of two parts: the fully convolutional part and the deconvolution part. The fully convolutional part comprises classical CNN networks, such as VGG and ResNet, which are used for feature extraction. The deconvolution part, on the other hand, upsamples the feature maps to obtain the original-sized semantic segmentation image.

Refer to caption
Figure 1: The system structure of the proposed Sky-GVIO.

In this paper, the existing ResNet50 [33] is used for downsampling, which includes 48 convolutional layers. The deconvolution part, on the other hand, upsamples the feature maps to obtain the original-sized semantic segmentation images. In this paper, the upsampling is based on FCN-8s. In the upsampling process, FCN-8s uses transposed convolution to scale the 8x, 16x and 32x feature maps to the original size, and combines these three scaled feature maps by introducing skip connection, so as to ensure the learning of features at different scales.

The input of FCN can be any size images, the output is the same size as the input, and the number of channels is n (number of target categories) +1 (background). For the sky-view images segmentation task, two types of labels are required (sky region and non-sky region), so the number of channels for sky-view images segmentation algorithm based on FCN is 2. In addition, in this study, we made 440 training datasets by ourselves. As shown in Fig. 2, we built a sky-view images segmentation model based on FCN.

Refer to caption
Figure 2: The sky-view images segmentation algorithm based on FCN.

II-B Tightly Coupled GNSS/INS/Vision Integration Model

GNSS model, INS dynamic model and visual observation model are introduced, respectively. Subsequently, the state model and measurement model of Sky-GVIO integration model are described. Finally, we use the segmentation results to realize the NLOS detection and construct the LOS/NLOS model for NLOS mitigation.

1) GNSS Observation Model: The original pseudorange and carrier phase observation equations in GNSS positioning are expressed as follows:

P=ρ+c(trts)+I+T+εp𝑃𝜌𝑐subscript𝑡𝑟superscript𝑡𝑠𝐼𝑇subscript𝜀𝑝P=\rho+c(t_{r}-t^{s})+I+T+\varepsilon_{p}italic_P = italic_ρ + italic_c ( italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT - italic_t start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT ) + italic_I + italic_T + italic_ε start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT (1)
L=ρ+c(trts)I+T+λN+εL𝐿𝜌𝑐subscript𝑡𝑟superscript𝑡𝑠𝐼𝑇𝜆𝑁subscript𝜀𝐿L=\rho+c(t_{r}-t^{s})-I+T+\lambda N+\varepsilon_{L}italic_L = italic_ρ + italic_c ( italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT - italic_t start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT ) - italic_I + italic_T + italic_λ italic_N + italic_ε start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT (2)

where P𝑃Pitalic_P and L𝐿Litalic_L represent the pseudorange and carrier phase, respectively. The angular symbols s𝑠sitalic_s and r𝑟ritalic_r refer to satellites and receivers, respectively. ρ𝜌\rhoitalic_ρ denotes the geometric distance between the phase centers of the receiver and satellite antennas. trsubscript𝑡𝑟t_{r}italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and tssuperscript𝑡𝑠t^{s}italic_t start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT respectively represent receiver and satellite clock offsets. The speed of light is c𝑐citalic_c. I𝐼Iitalic_I and T𝑇Titalic_T refer to the ionospheric and troposphere delay, respectively. λ𝜆\lambdaitalic_λ represents the carrier wavelength. N𝑁Nitalic_N represents carrier phase ambiguity. εPsubscript𝜀𝑃\varepsilon_{P}italic_ε start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT and εLsubscript𝜀𝐿\varepsilon_{L}italic_ε start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT represent pseudorange noise and carrier phase noise, respectively.

For SPP model, equation (1) is sufficient. However, in the case of RTK model, it can be represented as follows:

{ΔP=Δρ+ΔI+ΔT+ΔϵPΔL=ΔρΔI+ΔT+λΔN+ΔϵLcasesΔ𝑃Δ𝜌Δ𝐼Δ𝑇Δsubscriptitalic-ϵ𝑃otherwiseΔ𝐿Δ𝜌Δ𝐼Δ𝑇𝜆Δ𝑁Δsubscriptitalic-ϵ𝐿otherwise\begin{cases}\nabla\Delta P=\nabla\Delta\rho+\nabla\Delta I+\nabla\Delta T+% \nabla\Delta\epsilon_{P}\\ \nabla\Delta L=\nabla\Delta\rho-\nabla\Delta I+\nabla\Delta T+\lambda\nabla% \Delta N+\nabla\Delta\epsilon_{L}\end{cases}{ start_ROW start_CELL ∇ roman_Δ italic_P = ∇ roman_Δ italic_ρ + ∇ roman_Δ italic_I + ∇ roman_Δ italic_T + ∇ roman_Δ italic_ϵ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL ∇ roman_Δ italic_L = ∇ roman_Δ italic_ρ - ∇ roman_Δ italic_I + ∇ roman_Δ italic_T + italic_λ ∇ roman_Δ italic_N + ∇ roman_Δ italic_ϵ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW (3)

where ΔΔ\nabla\Delta∇ roman_Δ denotes the double-differenced (DD) operator. The DD operation is used to not only eliminate satellite orbit errors and clock errors but also to mitigate receiver clock errors, tropospheric and ionospheric delays, making it a powerful technique in GNSS positioning.

2) INS Dynamic Model: Considering the noisy measurement of the low-cost IMU, the Coriolis and centrifugal forces due to earth rotation are ignored in the IMU formulation. The inertial measurement can be modeled [34] in b𝑏bitalic_b (body) frame as follows:

𝒂~k=𝒂k+𝒃ak+(𝑹bkn)T𝒈n+𝒏asubscript~𝒂𝑘subscript𝒂𝑘subscript𝒃subscript𝑎𝑘superscriptsubscriptsuperscript𝑹𝑛subscript𝑏𝑘𝑇superscript𝒈𝑛subscript𝒏𝑎\tilde{\bm{a}}_{k}=\bm{a}_{k}+\bm{b}_{a_{k}}+\left(\bm{R}^{n}_{{b}_{k}}\right)% ^{T}\bm{g}^{n}+\bm{n}_{a}over~ start_ARG bold_italic_a end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = bold_italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + bold_italic_b start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ( bold_italic_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_g start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT + bold_italic_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT (4)
𝝎~k=𝝎k+𝒃ωk+𝒏ωsubscript~𝝎𝑘subscript𝝎𝑘subscript𝒃subscript𝜔𝑘subscript𝒏𝜔\tilde{\bm{\omega}}_{k}=\bm{\omega}_{k}+\bm{b}_{\omega_{k}}+\bm{n}_{\omega}over~ start_ARG bold_italic_ω end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = bold_italic_ω start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + bold_italic_b start_POSTSUBSCRIPT italic_ω start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT + bold_italic_n start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT (5)

where [𝒂~k,𝝎~k]subscript~𝒂𝑘subscript~𝝎𝑘\left[\tilde{\bm{a}}_{k},\tilde{\bm{\omega}}_{k}\right][ over~ start_ARG bold_italic_a end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , over~ start_ARG bold_italic_ω end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] is the output of the IMU at time k𝑘kitalic_k and [𝒂k,𝝎k]subscript𝒂𝑘subscript𝝎𝑘\left[\bm{a}_{k},\bm{\omega}_{k}\right][ bold_italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_italic_ω start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] is the linear acceleration and angular velocity of the IMU sensor. 𝒃aksubscript𝒃subscript𝑎𝑘\bm{b}_{a_{k}}bold_italic_b start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT and 𝒃ωksubscript𝒃subscript𝜔𝑘\bm{b}_{\omega_{k}}bold_italic_b start_POSTSUBSCRIPT italic_ω start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT respectively are the biases of the accelerometer and gyroscope at time k𝑘kitalic_k . In addition, 𝒏asubscript𝒏𝑎\bm{n}_{a}bold_italic_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and 𝒏ωsubscript𝒏𝜔\bm{n}_{\omega}bold_italic_n start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT are assumed to be zero-mean Gaussian distributed with 𝒏𝒂N(0,Σna)similar-tosubscript𝒏𝒂𝑁0subscriptΣsubscript𝑛𝑎\bm{n_{a}}\sim N\left(0,\Sigma_{n_{a}}\right)bold_italic_n start_POSTSUBSCRIPT bold_italic_a end_POSTSUBSCRIPT ∼ italic_N ( 0 , roman_Σ start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , 𝒏𝝎N(0,Σnω)similar-tosubscript𝒏𝝎𝑁0subscriptΣsubscript𝑛𝜔\bm{n_{\omega}}\sim N\left(0,\Sigma_{n_{\omega}}\right)bold_italic_n start_POSTSUBSCRIPT bold_italic_ω end_POSTSUBSCRIPT ∼ italic_N ( 0 , roman_Σ start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT end_POSTSUBSCRIPT ). 𝑹bknsubscriptsuperscript𝑹𝑛subscript𝑏𝑘\bm{R}^{n}_{{b}_{k}}bold_italic_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT denotes the rotation matrix from IMU body (b𝑏bitalic_b)-frame to navigation (n𝑛nitalic_n)-frame. 𝒈nsuperscript𝒈𝑛\bm{g}^{n}bold_italic_g start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is the gravity in the n𝑛nitalic_n frame.

The linearized INS dynamic model [35] can be expressed as:

{δ𝐩˙n=δ𝐯nδ𝐯˙n=𝐑bn(𝐚~𝐛a)δ𝜽𝐑bnδ𝐛a𝐑bn𝐧aδ𝜽˙=(𝝎~𝐛w)δ𝜽δ𝐛w𝐧wδ𝐛˙w=𝐧bwδ𝐛˙a=𝐧bacases𝛿superscript˙𝐩𝑛absent𝛿superscript𝐯𝑛𝛿superscript˙𝐯𝑛absentsubscriptsuperscript𝐑𝑛𝑏superscript~𝐚subscript𝐛𝑎𝛿𝜽subscriptsuperscript𝐑𝑛𝑏𝛿subscript𝐛𝑎superscriptsubscript𝐑𝑏𝑛subscript𝐧𝑎𝛿˙𝜽absentsuperscriptbold-~𝝎subscript𝐛𝑤𝛿𝜽𝛿subscript𝐛𝑤subscript𝐧𝑤𝛿subscript˙𝐛𝑤absentsubscript𝐧subscript𝑏𝑤𝛿subscript˙𝐛𝑎absentsubscript𝐧subscript𝑏𝑎otherwise\begin{cases}\begin{aligned} \delta\dot{\mathbf{p}}^{n}&=\delta\mathbf{v}^{n}% \\ \delta\dot{\mathbf{v}}^{n}&=-\mathbf{R}^{n}_{b}\left(\tilde{\mathbf{a}}-% \mathbf{b}_{a}\right)^{\wedge}\delta\bm{\theta}-\mathbf{R}^{n}_{b}\delta% \mathbf{b}_{a}-\mathbf{R}_{b}^{n}\mathbf{n}_{a}\\ \delta\dot{\bm{\theta}}&=-\left(\bm{\tilde{\omega}-\mathbf{b}}_{w}\right)^{% \wedge}\delta\bm{\theta}-\delta\mathbf{b}_{w}-\mathbf{n}_{w}\\ \delta\dot{\mathbf{b}}_{w}&=\mathbf{n}_{b_{w}}\\ \delta\dot{\mathbf{b}}_{a}&=\mathbf{n}_{b_{a}}\\ \end{aligned}\end{cases}{ start_ROW start_CELL start_ROW start_CELL italic_δ over˙ start_ARG bold_p end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_CELL start_CELL = italic_δ bold_v start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ over˙ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_CELL start_CELL = - bold_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( over~ start_ARG bold_a end_ARG - bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT italic_δ bold_italic_θ - bold_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT italic_δ bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT - bold_R start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT bold_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ over˙ start_ARG bold_italic_θ end_ARG end_CELL start_CELL = - ( overbold_~ start_ARG bold_italic_ω end_ARG bold_- bold_b start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT italic_δ bold_italic_θ - italic_δ bold_b start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT - bold_n start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ over˙ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_CELL start_CELL = bold_n start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ over˙ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_CELL start_CELL = bold_n start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW end_CELL start_CELL end_CELL end_ROW (6)

where δ𝜽˙𝛿bold-˙𝜽\delta\bm{\dot{\theta}}italic_δ overbold_˙ start_ARG bold_italic_θ end_ARG, δ𝐯˙n𝛿superscript˙𝐯𝑛\delta\dot{\mathbf{v}}^{n}italic_δ over˙ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and δ𝐩˙n𝛿superscript˙𝐩𝑛\delta\dot{\mathbf{p}}^{n}italic_δ over˙ start_ARG bold_p end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT represent the derivative of attitude, velocity and position errors in n𝑛\mathit{n}italic_n frame, respectively. The derivatives of δ𝐛˙a𝛿subscript˙𝐛𝑎\delta\dot{\mathbf{b}}_{a}italic_δ over˙ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and δ𝐛˙ω𝛿subscript˙𝐛𝜔\delta\dot{\mathbf{b}}_{\omega}italic_δ over˙ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT, denoting the accelerometer and gyroscope biases in b𝑏\mathit{b}italic_b frame, respectively. In addition, 𝐑bnsuperscriptsubscript𝐑𝑏𝑛\mathbf{R}_{b}^{n}bold_R start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT represents the rotation matrix from b𝑏\mathit{b}italic_b frame to n𝑛\mathit{n}italic_n frame; 𝝎~bold-~𝝎\bm{\tilde{\omega}}overbold_~ start_ARG bold_italic_ω end_ARG and 𝒂~~𝒂\tilde{\bm{a}}over~ start_ARG bold_italic_a end_ARG represent the outputs of gyroscope and accelerometer, respectively; 𝒃ωsubscript𝒃𝜔\bm{b}_{\omega}bold_italic_b start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT and 𝒃asubscript𝒃𝑎\bm{b}_{a}bold_italic_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT represent the nominal biases of gyroscope and accelerometer, respectively; δ𝜽𝛿𝜽\delta\bm{\theta}italic_δ bold_italic_θ and δ𝐯n𝛿superscript𝐯𝑛\delta\mathbf{v}^{n}italic_δ bold_v start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT represent the errors of attitude and velocity in n𝑛\mathit{n}italic_n frame, respectively; δ𝒃ω𝛿subscript𝒃𝜔\delta\bm{b}_{\omega}italic_δ bold_italic_b start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT and δ𝒃a𝛿subscript𝒃𝑎\delta\bm{b}_{a}italic_δ bold_italic_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT represent the errors of gyroscope bias and accelerometer bias, respectively; 𝒏ωsubscript𝒏𝜔\bm{n}_{\omega}bold_italic_n start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT and 𝒏asubscript𝒏𝑎\bm{n}_{a}bold_italic_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT represent the noises of angular rate and acceleration, respectively; 𝐧basubscript𝐧subscript𝑏𝑎\mathbf{n}_{b_{a}}bold_n start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT and 𝐧bωsubscript𝐧subscript𝑏𝜔\mathbf{n}_{b_{\omega}}bold_n start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT end_POSTSUBSCRIPT represent the noises of gyroscope bias and accelerometer bias, respectively. The symbol ()superscript\left(\cdot\right)^{\wedge}( ⋅ ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT is the cross-product.

Therefore, the error state vector of INS can be expressed as:

δ𝒙ins=[δ𝐩nδ𝐯nδ𝜽δ𝐛aδ𝐛ω]T𝛿subscript𝒙𝑖𝑛𝑠superscript𝛿superscript𝐩𝑛𝛿superscript𝐯𝑛𝛿𝜽𝛿subscript𝐛𝑎𝛿subscript𝐛𝜔𝑇\delta\bm{x}_{ins}=\left[\delta\mathbf{p}^{n}\quad\delta\mathbf{v}^{n}\quad% \delta\bm{\theta}\quad\delta\mathbf{b}_{a}\quad\delta\mathbf{b}_{\omega}\right% ]^{T}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT = [ italic_δ bold_p start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_δ bold_v start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_δ bold_italic_θ italic_δ bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT italic_δ bold_b start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (7)

3) Visual Measurement Model: The core idea of the well-known MSCKF is to establish geometric constraints between multi-camera states by utilizing the same visual feature points observed by multi-cameras. Following this concept, we establish a visual model. For a visual feature point fjsuperscript𝑓𝑗f^{j}italic_f start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT observed by a stereo camera at time i𝑖iitalic_i, its visual observation model [34] on the normalized projection planes of the left and right cameras can be represented as follows:

zcam,ij=[uc0,ijvc0,ijuc1,ijvc1,ij]=[1Zc0,ij𝐈2×2𝟎2×2𝟎2×21Zc1,ij𝐈2×2][Xc0,ijYc0,ijXc1,ijYc1,ij]+ϵcam,ijsuperscriptsubscript𝑧𝑐𝑎𝑚𝑖𝑗delimited-[]matrixsuperscriptsubscript𝑢subscript𝑐0𝑖𝑗superscriptsubscript𝑣subscript𝑐0𝑖𝑗superscriptsubscript𝑢subscript𝑐1𝑖𝑗superscriptsubscript𝑣subscript𝑐1𝑖𝑗delimited-[]matrix1superscriptsubscript𝑍subscript𝑐0𝑖𝑗subscript𝐈22subscript022subscript0221superscriptsubscript𝑍subscript𝑐1𝑖𝑗subscript𝐈22delimited-[]matrixsuperscriptsubscript𝑋subscript𝑐0𝑖𝑗superscriptsubscript𝑌subscript𝑐0𝑖𝑗superscriptsubscript𝑋subscript𝑐1𝑖𝑗superscriptsubscript𝑌subscript𝑐1𝑖𝑗superscriptsubscriptitalic-ϵ𝑐𝑎𝑚𝑖𝑗\mathit{z}_{cam,i}^{j}=\left[\begin{matrix}\mathit{u}_{c_{0,i}}^{j}\\ \mathit{v}_{c_{0,i}}^{j}\\ \mathit{u}_{c_{1,i}}^{j}\\ \mathit{v}_{c_{1,i}}^{j}\end{matrix}\right]=\left[\begin{matrix}\frac{1}{% \mathit{Z}_{c_{0,i}}^{j}}\mathbf{I}_{2\times 2}&\mathbf{0}_{2\times 2}\\ \mathbf{0}_{2\times 2}&\frac{1}{\mathit{Z}_{c_{1,i}}^{j}}\mathbf{I}_{2\times 2% }\end{matrix}\right]\left[\begin{matrix}\mathit{X}_{c_{0,i}}^{j}\\ \mathit{Y}_{c_{0,i}}^{j}\\ \mathit{X}_{c_{1,i}}^{j}\\ \mathit{Y}_{c_{1,i}}^{j}\end{matrix}\right]+\mathit{\epsilon}_{cam,i}^{j}italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT = [ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_u start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL divide start_ARG 1 end_ARG start_ARG italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_ARG bold_I start_POSTSUBSCRIPT 2 × 2 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT 2 × 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_0 start_POSTSUBSCRIPT 2 × 2 end_POSTSUBSCRIPT end_CELL start_CELL divide start_ARG 1 end_ARG start_ARG italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_ARG bold_I start_POSTSUBSCRIPT 2 × 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] + italic_ϵ start_POSTSUBSCRIPT italic_c italic_a italic_m , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT (8)

where the subscripts 0 and 1 represent the left and right cameras, respectively. (uc0,jj,vc0,jj)Tsuperscriptsuperscriptsubscript𝑢subscript𝑐0𝑗𝑗superscriptsubscript𝑣subscript𝑐0𝑗𝑗𝑇\left(\mathit{u}_{c_{0,j}}^{j},\mathit{v}_{c_{0,j}}^{j}\right)^{T}( italic_u start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_v start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and (uc1,jj,vc1,jj)Tsuperscriptsuperscriptsubscript𝑢subscript𝑐1𝑗𝑗superscriptsubscript𝑣subscript𝑐1𝑗𝑗𝑇\left(\mathit{u}_{c_{1,j}}^{j},\mathit{v}_{c_{1,j}}^{j}\right)^{T}( italic_u start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_v start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT are the pixel coordinates of the same feature point on the normalized plane for left camera and right camera, respectively. ϵcam,ijsuperscriptsubscriptitalic-ϵ𝑐𝑎𝑚𝑖𝑗\mathit{\epsilon}_{cam,i}^{j}italic_ϵ start_POSTSUBSCRIPT italic_c italic_a italic_m , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT is visual measurement noise. (Xc0,jj,Yc0,jj,Zc0,jj)Tsuperscriptsuperscriptsubscript𝑋subscript𝑐0𝑗𝑗superscriptsubscript𝑌subscript𝑐0𝑗𝑗superscriptsubscript𝑍subscript𝑐0𝑗𝑗𝑇\left(\mathit{X}_{c_{0,j}}^{j},\mathit{Y}_{c_{0,j}}^{j},\mathit{Z}_{c_{0,j}}^{% j}\right)^{T}( italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and (Xc1,jj,Yc1,jj,Zc1,jj)Tsuperscriptsuperscriptsubscript𝑋subscript𝑐1𝑗𝑗superscriptsubscript𝑌subscript𝑐1𝑗𝑗superscriptsubscript𝑍subscript𝑐1𝑗𝑗𝑇\left(\mathit{X}_{c_{1,j}}^{j},\mathit{Y}_{c_{1,j}}^{j},\mathit{Z}_{c_{1,j}}^{% j}\right)^{T}( italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT represents the position of the same feature point for left camera and right camera in c𝑐\mathit{c}italic_c frame, which can be expressed as:

[Xc0,ijYc0,ijZc0,ij]=(𝐑c0,in)T(𝒑jn𝒑c0,in)matrixsuperscriptsubscript𝑋subscript𝑐0𝑖𝑗superscriptsubscript𝑌subscript𝑐0𝑖𝑗superscriptsubscript𝑍subscript𝑐0𝑖𝑗superscriptsuperscriptsubscript𝐑subscript𝑐0𝑖𝑛𝑇superscriptsubscript𝒑𝑗𝑛superscriptsubscript𝒑subscript𝑐0𝑖𝑛\begin{bmatrix}\mathit{X}_{c_{0,i}}^{j}\\ \mathit{Y}_{c_{0,i}}^{j}\\ \mathit{Z}_{c_{0,i}}^{j}\end{bmatrix}=\left(\mathbf{R}_{c_{0,i}}^{n}\right)^{T% }\left(\bm{p}_{j}^{n}-\bm{p}_{c_{0,i}}^{n}\right)[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] = ( bold_R start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT - bold_italic_p start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) (9)
[Xc1,ijYc1,ijZc1,ij]=(𝐑c0,ic1,i)T(𝒑jc0,i𝒑c1,ic0,i)matrixsuperscriptsubscript𝑋subscript𝑐1𝑖𝑗superscriptsubscript𝑌subscript𝑐1𝑖𝑗superscriptsubscript𝑍subscript𝑐1𝑖𝑗superscriptsuperscriptsubscript𝐑subscript𝑐0𝑖subscript𝑐1𝑖𝑇superscriptsubscript𝒑𝑗subscript𝑐0𝑖superscriptsubscript𝒑subscript𝑐1𝑖subscript𝑐0𝑖\begin{bmatrix}\mathit{X}_{c_{1,i}}^{j}\\ \mathit{Y}_{c_{1,i}}^{j}\\ \mathit{Z}_{c_{1,i}}^{j}\end{bmatrix}=\left(\mathbf{R}_{c_{0,i}}^{c_{1,i}}% \right)^{T}\left(\bm{p}_{j}^{c_{0,i}}-\bm{p}_{c_{1,i}}^{c_{0,i}}\right)[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Z start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] = ( bold_R start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT - bold_italic_p start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) (10)

where 𝐑c0,ic1,isuperscriptsubscript𝐑subscript𝑐0𝑖subscript𝑐1𝑖\mathbf{R}_{c_{0,i}}^{c_{1,i}}bold_R start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT and 𝒑c0,insuperscriptsubscript𝒑subscript𝑐0𝑖𝑛\bm{p}_{c_{0,i}}^{n}bold_italic_p start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT are the rotation matrix and position of the left camera at time i𝑖\mathit{i}italic_i in n𝑛\mathit{n}italic_n frame, respectively. 𝐑c0,ic1,isuperscriptsubscript𝐑subscript𝑐0𝑖subscript𝑐1𝑖\mathbf{R}_{c_{0,i}}^{c_{1,i}}bold_R start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is the rotation matrix from left camera to right camera at time i𝑖\mathit{i}italic_i, 𝒑c1,ic0,isuperscriptsubscript𝒑subscript𝑐1𝑖subscript𝑐0𝑖\bm{p}_{c_{1,i}}^{c_{0,i}}bold_italic_p start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is the translation matrix from left camera to right camera, which can be accurately corrected in advance [36]. 𝒑jnsuperscriptsubscript𝒑𝑗𝑛\bm{p}_{j}^{n}bold_italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝒑jc0,isuperscriptsubscript𝒑𝑗subscript𝑐0𝑖\bm{p}_{j}^{c_{0,i}}bold_italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT respectively are the positions of the same visual feature point in n𝑛\mathit{n}italic_n frame and left c𝑐\mathit{c}italic_c frame.

We adopted the method proposed by [37] to construct the visual reprojection error between relative camera poses, and the visual state vector was described as:

δ𝒙cam=[δ𝜽c1nδ𝒑c1nδ𝜽c2nδ𝒑c2nδ𝜽csnδ𝒑csn]T𝛿subscript𝒙𝑐𝑎𝑚superscript𝛿subscriptsuperscript𝜽𝑛subscript𝑐1𝛿subscriptsuperscript𝒑𝑛subscript𝑐1𝛿subscriptsuperscript𝜽𝑛subscript𝑐2𝛿subscriptsuperscript𝒑𝑛subscript𝑐2𝛿subscriptsuperscript𝜽𝑛subscript𝑐𝑠𝛿subscriptsuperscript𝒑𝑛subscript𝑐𝑠𝑇\delta\bm{x}_{cam}=\left[\delta\bm{\theta}^{n}_{c_{1}}\quad\delta\bm{p}^{n}_{c% _{1}}\quad\delta\bm{\theta}^{n}_{c_{2}}\quad\delta\bm{p}^{n}_{c_{2}}\quad...% \quad\delta\bm{\theta}^{n}_{c_{s}}\quad\delta\bm{p}^{n}_{c_{s}}\right]^{T}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT = [ italic_δ bold_italic_θ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ bold_italic_p start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ bold_italic_θ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ bold_italic_p start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT … italic_δ bold_italic_θ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ bold_italic_p start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (11)

where δ𝜽cin𝛿subscriptsuperscript𝜽𝑛subscript𝑐𝑖\delta\bm{\theta}^{n}_{c_{i}}italic_δ bold_italic_θ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT and δ𝒑cin𝛿subscriptsuperscript𝒑𝑛subscript𝑐𝑖\delta\bm{p}^{n}_{c_{i}}italic_δ bold_italic_p start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT are attitude errors and position errors at time i𝑖\mathit{i}italic_i. The subscript s𝑠\mathit{s}italic_s represents the total number of camera poses in the sliding window. The measurement equation of visual reprojection error is expressed as follows:

δ𝒛cam=𝒛~cam𝒛^cam=𝑯camδ𝒙cam+𝑽cam𝛿subscript𝒛𝑐𝑎𝑚subscriptbold-~𝒛𝑐𝑎𝑚subscriptbold-^𝒛𝑐𝑎𝑚subscript𝑯𝑐𝑎𝑚𝛿subscript𝒙𝑐𝑎𝑚subscript𝑽𝑐𝑎𝑚\delta\bm{z}_{cam}=\bm{\tilde{z}}_{cam}-\bm{\hat{z}}_{cam}=\bm{H}_{cam}\delta% \bm{x}_{cam}+\bm{V}_{cam}italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT = overbold_~ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT - overbold_^ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT = bold_italic_H start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT + bold_italic_V start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT (12)

𝒛~camsubscriptbold-~𝒛𝑐𝑎𝑚\bm{\tilde{z}}_{cam}overbold_~ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT and 𝒛^camsubscriptbold-^𝒛𝑐𝑎𝑚\bm{\hat{z}}_{cam}overbold_^ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT represent visual observations and visual reprojection observations, respectively; 𝑯camsubscript𝑯𝑐𝑎𝑚\bm{H}_{cam}bold_italic_H start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT represents the Jacobi matrix of stereo camera positioning model.

4) State and Measurement model of the Tightly Coupled GNSS/INS/Vision: This paper employs MSCKF for the tightly coupled GNSS/INS/Vision integration. Based on the above introductions of different sensor models, the complete state model for the tightly coupled GNSS/INS/Vision integration is as follows:

δ𝒙=[δ𝒙insδ𝒙GNSSδ𝒙cam]T𝛿𝒙superscript𝛿subscript𝒙𝑖𝑛𝑠𝛿subscript𝒙𝐺𝑁𝑆𝑆𝛿subscript𝒙𝑐𝑎𝑚𝑇\delta\bm{x}=\left[\delta\bm{x}_{ins}\quad\delta\bm{x}_{GNSS}\quad\delta\bm{x}% _{cam}\right]^{T}italic_δ bold_italic_x = [ italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S end_POSTSUBSCRIPT italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (13)

For both SPP and RTK positioning modes, this paper has constructed state models separately:

δxGNSS,SPP=[δtr]T𝛿subscript𝑥𝐺𝑁𝑆𝑆𝑆𝑃𝑃superscriptdelimited-[]𝛿subscript𝑡𝑟𝑇\delta x_{GNSS,SPP}=[\delta t_{r}]^{T}italic_δ italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_S italic_P italic_P end_POSTSUBSCRIPT = [ italic_δ italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (14)
δxGNSS,RTK=[δΔN]T𝛿subscript𝑥𝐺𝑁𝑆𝑆𝑅𝑇𝐾superscriptdelimited-[]𝛿Δ𝑁𝑇\delta x_{GNSS,RTK}=[\delta\nabla\Delta N]^{T}italic_δ italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_R italic_T italic_K end_POSTSUBSCRIPT = [ italic_δ ∇ roman_Δ italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (15)

where δΔN𝛿Δ𝑁\delta\nabla\Delta Nitalic_δ ∇ roman_Δ italic_N represents the DD carrier phase ambiguity.In addition, the error state model of INS and vision have already been provided in equation (7) and (11).

The state prediction model for the tightly coupled GNSS/INS/Vision integration is as follows:

[δ𝒙˙insδ𝒙˙GNSSδ𝒙˙cam]=[𝑭ins𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎][δ𝒙insδ𝒙GNSSδ𝒙cam]+[𝒏ins𝒏GNSS𝟎]matrix𝛿subscriptbold-˙𝒙𝑖𝑛𝑠𝛿subscriptbold-˙𝒙𝐺𝑁𝑆𝑆𝛿subscriptbold-˙𝒙𝑐𝑎𝑚matrixsubscript𝑭𝑖𝑛𝑠00000000matrix𝛿subscript𝒙𝑖𝑛𝑠𝛿subscript𝒙𝐺𝑁𝑆𝑆𝛿subscript𝒙𝑐𝑎𝑚matrixsubscript𝒏𝑖𝑛𝑠subscript𝒏𝐺𝑁𝑆𝑆0\small\begin{bmatrix}\delta\bm{\dot{x}}_{ins}\\ \delta\bm{\dot{x}}_{GNSS}\\ \delta\bm{\dot{x}}_{cam}\end{bmatrix}=\begin{bmatrix}\bm{F}_{ins}&\mathbf{0}&% \mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}\begin{bmatrix}\delta\bm{x}_{ins}% \\ \delta\bm{x}_{GNSS}\\ \delta\bm{x}_{cam}\end{bmatrix}+\begin{bmatrix}\bm{n}_{ins}\\ \bm{n}_{GNSS}\\ \mathbf{0}\end{bmatrix}[ start_ARG start_ROW start_CELL italic_δ overbold_˙ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ overbold_˙ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ overbold_˙ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL bold_italic_F start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL bold_italic_n start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_n start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL end_ROW end_ARG ] (16)

where 𝑭inssubscript𝑭𝑖𝑛𝑠\bm{F}_{ins}bold_italic_F start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT is the system matrices of INS state which could be directly from equation (6). 𝒏inssubscript𝒏𝑖𝑛𝑠\bm{n}_{ins}bold_italic_n start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT and 𝒏GNSSsubscript𝒏𝐺𝑁𝑆𝑆\bm{n}_{GNSS}bold_italic_n start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S end_POSTSUBSCRIPT are the process noises of INS and map, respectively. In addition, the camera poses in the sliding window are considered constant, so its process noise is 𝟎0\mathbf{0}bold_0. Based on equation (6), the special form of can be written as:

𝑭ins=[𝟎𝐈𝟎𝟎𝟎𝟎𝟎𝐑bn(𝐚~𝐛a)𝐑bn𝟎𝟎𝟎(𝝎~𝐛w)𝟎𝐈𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎]subscript𝑭𝑖𝑛𝑠matrix0𝐈00000subscriptsuperscript𝐑𝑛𝑏superscript~𝐚subscript𝐛𝑎subscriptsuperscript𝐑𝑛𝑏000superscriptbold-~𝝎subscript𝐛𝑤0𝐈0000000000\bm{F}_{ins}=\begin{bmatrix}\mathbf{0}&{\mathbf{I}}&\mathbf{0}&\mathbf{0}&% \mathbf{0}\\ \mathbf{0}&\mathbf{0}&-\mathbf{R}^{n}_{b}\left(\tilde{\mathbf{a}}-\mathbf{b}_{% a}\right)^{\wedge}&-\mathbf{R}^{n}_{b}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&-\left(\bm{\tilde{\omega}-\mathbf{b}}_{w}\right)^{\wedge% }&\mathbf{0}&-\mathbf{I}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}bold_italic_F start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL bold_0 end_CELL start_CELL bold_I end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL - bold_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( over~ start_ARG bold_a end_ARG - bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT end_CELL start_CELL - bold_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL - ( overbold_~ start_ARG bold_italic_ω end_ARG bold_- bold_b start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL - bold_I end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW end_ARG ] (17)

where 𝐈𝐈\mathbf{I}bold_I is the identity matrix.

To deal with discrete time measurement from the INS, the 4th order Runge-kutta [7] numerical integration of equation (17) to propagate the estimated state variables. It is worth noting, only the IMU state variables are propagated, the visual and GNSS state variables are only copied. Meanwhile, we also need to propagate the covariance of the state:

𝑷k,k1subscript𝑷𝑘𝑘1\displaystyle\bm{P}_{k,k-1}bold_italic_P start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT =𝚽k,k1𝑷k1𝚽k,k1T+𝑸k1absentsubscript𝚽𝑘𝑘1subscript𝑷𝑘1superscriptsubscript𝚽𝑘𝑘1𝑇subscript𝑸𝑘1\displaystyle=\bm{\Phi}_{k,k-1}\bm{P}_{k-1}\bm{\Phi}_{k,k-1}^{T}+\bm{Q}_{k-1}= bold_Φ start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT bold_italic_P start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT bold_Φ start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT + bold_italic_Q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT (18)
𝚽k,k1subscript𝚽𝑘𝑘1\displaystyle\bm{\Phi}_{k,k-1}bold_Φ start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT =𝚽(tk1,tk)=exp(tktk1𝑭(τ)dτ)absent𝚽subscript𝑡𝑘1subscript𝑡𝑘superscriptsubscriptsubscript𝑡𝑘subscript𝑡𝑘1𝑭𝜏differential-d𝜏\displaystyle=\bm{\Phi}(t_{k-1},t_{k})=\exp(\int_{t_{k}}^{t_{k-1}}\bm{F}(\tau)% \mathrm{d}\tau)= bold_Φ ( italic_t start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = roman_exp ( ∫ start_POSTSUBSCRIPT italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT bold_italic_F ( italic_τ ) roman_d italic_τ )

where 𝚽k,k1subscript𝚽𝑘𝑘1\bm{\Phi}_{k,k-1}bold_Φ start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT represents the discrete state transition matrix, 𝑭(τ)𝑭𝜏\bm{F}(\tau)bold_italic_F ( italic_τ ) is the continuous time state transition matrix at time τ𝜏\tauitalic_τ (τ(tk,tk+1)𝜏subscript𝑡𝑘subscript𝑡𝑘1\tau\in\left(t_{k},t_{k+1}\right)italic_τ ∈ ( italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT )) and 𝑸k1subscript𝑸𝑘1\bm{Q}_{k-1}bold_italic_Q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT is the discrete time noise covariance. 𝑷k1subscript𝑷𝑘1\bm{P}_{k-1}bold_italic_P start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT represents the error state covariance matrix before augmentation. 𝑷k,k1subscript𝑷𝑘𝑘1\bm{P}_{k,k-1}bold_italic_P start_POSTSUBSCRIPT italic_k , italic_k - 1 end_POSTSUBSCRIPT represents the one-step prediction error covariance matrix from time tk1subscript𝑡𝑘1t_{k-1}italic_t start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT to time tksubscript𝑡𝑘t_{k}italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

It is worth noting that every time a new image is recorded, the state and covariance matrix will be augmented with a copy of the current camera pose estimate. The initial value of camera pose is derived from the INS mechanization and the covariance matrix 𝑷ksubscript𝑷𝑘\bm{P}_{k}bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT after augmented can be expressed as:

𝑷k=[𝐈15+y+6m𝐉]𝑷k1[𝐈15+y+6m𝐉]Tsubscript𝑷𝑘delimited-[]matrixsubscript𝐈15𝑦6𝑚𝐉subscript𝑷𝑘1superscriptdelimited-[]matrixsubscript𝐈15𝑦6𝑚𝐉𝑇\bm{P}_{k}=\left[\begin{matrix}\mathbf{I}_{15+y+6m}\\ \mathbf{J}\end{matrix}\right]\bm{P}_{k-1}\left[\begin{matrix}\mathbf{I}_{15+y+% 6m}\\ \mathbf{J}\end{matrix}\right]^{T}bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT 15 + italic_y + 6 italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_J end_CELL end_ROW end_ARG ] bold_italic_P start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT 15 + italic_y + 6 italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_J end_CELL end_ROW end_ARG ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (19)

where y𝑦yitalic_y and m𝑚mitalic_m represent the number of variables related to GNSS and vision at a certain moment. And when the GNSS is recorded, we only need to remove and add state variables and corresponding covariance. 𝐉𝐉\mathbf{J}bold_J is the Jacobi matrix, which has the following form:

𝐉=[(𝐑cb)T𝟎𝟎𝟎𝟎𝟎3×(y+6m)𝐑bc(𝒑cb)𝟎𝐈𝟎𝟎𝟎3×(y+6m)]𝐉delimited-[]matrixsuperscriptsuperscriptsubscript𝐑𝑐𝑏𝑇0000subscript03𝑦6𝑚superscriptsubscript𝐑𝑏𝑐superscriptsuperscriptsubscript𝒑𝑐𝑏0𝐈00subscript03𝑦6𝑚\mathbf{J}=\left[\begin{matrix}(\mathbf{R}_{c}^{b})^{T}&\mathbf{0}&\mathbf{0}&% \mathbf{0}&\mathbf{0}&\mathbf{0}_{3\times(y+6m)}\\ -\mathbf{R}_{b}^{c}(\bm{p}_{c}^{b})^{\wedge}&\mathbf{0}&\mathbf{I}&\mathbf{0}&% \mathbf{0}&\mathbf{0}_{3\times(y+6m)}\end{matrix}\right]bold_J = [ start_ARG start_ROW start_CELL ( bold_R start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 start_POSTSUBSCRIPT 3 × ( italic_y + 6 italic_m ) end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL - bold_R start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT ( bold_italic_p start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ∧ end_POSTSUPERSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_I end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 start_POSTSUBSCRIPT 3 × ( italic_y + 6 italic_m ) end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] (20)

where 𝐑cbsuperscriptsubscript𝐑𝑐𝑏\mathbf{R}_{c}^{b}bold_R start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT and 𝒑cbsuperscriptsubscript𝒑𝑐𝑏\bm{p}_{c}^{b}bold_italic_p start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT are the rotation matrix and translation matrix between camera and IMU, which are calibrated offline [36].

Based on the previous equations, the measurement equation for the tightly coupled SPP/INS/Vision integration are formulated as follows:

[δ𝑷SPPδ𝒛cam]delimited-[]matrix𝛿subscript𝑷𝑆𝑃𝑃𝛿subscript𝒛𝑐𝑎𝑚\displaystyle\left[\begin{matrix}\delta\bm{P}_{SPP}\\ \delta\bm{z}_{cam}\end{matrix}\right][ start_ARG start_ROW start_CELL italic_δ bold_italic_P start_POSTSUBSCRIPT italic_S italic_P italic_P end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[𝐇P,SPP𝐇cam][δ𝒙insδ𝒙GNSS,SPPδ𝒙cam]+[𝜺P,SPP𝜺cam]absentdelimited-[]matrixsubscript𝐇𝑃𝑆𝑃𝑃subscript𝐇𝑐𝑎𝑚delimited-[]matrix𝛿subscript𝒙𝑖𝑛𝑠𝛿subscript𝒙𝐺𝑁𝑆𝑆𝑆𝑃𝑃𝛿subscript𝒙𝑐𝑎𝑚delimited-[]matrixsubscript𝜺𝑃𝑆𝑃𝑃subscript𝜺𝑐𝑎𝑚\displaystyle=\left[\begin{matrix}{\bf H}_{P,SPP}\\ {\bf H}_{cam}\end{matrix}\right]\left[\begin{matrix}\delta\bm{x}_{ins}\\ \delta\bm{x}_{GNSS,SPP}\\ \delta\bm{x}_{cam}\end{matrix}\right]+\left[\begin{matrix}\bm{\varepsilon}_{P,% SPP}\\ \bm{\varepsilon}_{cam}\end{matrix}\right]= [ start_ARG start_ROW start_CELL bold_H start_POSTSUBSCRIPT italic_P , italic_S italic_P italic_P end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_H start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_S italic_P italic_P end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL bold_italic_ε start_POSTSUBSCRIPT italic_P , italic_S italic_P italic_P end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_ε start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] (21)
[δ𝑷SPPδ𝒛cam]delimited-[]matrix𝛿subscript𝑷𝑆𝑃𝑃𝛿subscript𝒛𝑐𝑎𝑚\displaystyle\left[\begin{matrix}\delta\bm{P}_{SPP}\\ \delta\bm{z}_{cam}\end{matrix}\right][ start_ARG start_ROW start_CELL italic_δ bold_italic_P start_POSTSUBSCRIPT italic_S italic_P italic_P end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[𝑷𝑷^ins𝒛cam𝒛^cam]absentdelimited-[]matrix𝑷subscriptbold-^𝑷𝑖𝑛𝑠subscript𝒛𝑐𝑎𝑚subscript^𝒛𝑐𝑎𝑚\displaystyle=\left[\begin{matrix}\bm{P}-\bm{\hat{P}}_{ins}\\ \bm{z}_{cam}-\hat{\bm{z}}_{cam}\end{matrix}\right]= [ start_ARG start_ROW start_CELL bold_italic_P - overbold_^ start_ARG bold_italic_P end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT - over^ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ]

where δ𝑷SPP𝛿subscript𝑷𝑆𝑃𝑃\delta\bm{P}_{SPP}italic_δ bold_italic_P start_POSTSUBSCRIPT italic_S italic_P italic_P end_POSTSUBSCRIPT is error of pseudorange observation in SPP and δ𝒛cam𝛿subscript𝒛𝑐𝑎𝑚\delta\bm{z}_{cam}italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT is error of visual observation in equation (12). 𝐇P,SPPsubscript𝐇𝑃𝑆𝑃𝑃{\bf H}_{P,SPP}bold_H start_POSTSUBSCRIPT italic_P , italic_S italic_P italic_P end_POSTSUBSCRIPT is the Jacobi matrix of pseudorange error and 𝐇camsubscript𝐇𝑐𝑎𝑚{\bf H}_{cam}bold_H start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT is the Jacobi matrix of the involved camera states in equation (12). Then δ𝒙ins𝛿subscript𝒙𝑖𝑛𝑠\delta\bm{x}_{ins}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT, δ𝒙GNSS,SPP𝛿subscript𝒙𝐺𝑁𝑆𝑆𝑆𝑃𝑃\delta\bm{x}_{GNSS,SPP}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_S italic_P italic_P end_POSTSUBSCRIPT and δ𝒙cam𝛿subscript𝒙𝑐𝑎𝑚\delta\bm{x}_{cam}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT are the error state vectors of INS, SPP and Vision which can be found in equation (7), (14) and (11), respectively. In the same way, 𝜺P,SPPsubscript𝜺𝑃𝑆𝑃𝑃\bm{\varepsilon}_{P,SPP}bold_italic_ε start_POSTSUBSCRIPT italic_P , italic_S italic_P italic_P end_POSTSUBSCRIPT and 𝜺camsubscript𝜺𝑐𝑎𝑚\bm{\varepsilon}_{cam}bold_italic_ε start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT denote pseudorange observation error noise in SPP and visual observation error noise, respectively. In addition, P𝑃Pitalic_P and 𝑷^inssubscriptbold-^𝑷𝑖𝑛𝑠\bm{\hat{P}}_{ins}overbold_^ start_ARG bold_italic_P end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT are the actual measured pseudorange in equation (1) and the pseudorange predicted by INS mechanization, respectively. 𝒛camsubscript𝒛𝑐𝑎𝑚\bm{z}_{cam}bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT and 𝒛^camsubscript^𝒛𝑐𝑎𝑚\hat{\bm{z}}_{cam}over^ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT respectively represent the observed and reprojected visual measurements in equation (12).

The measurement equation for the tightly coupled RTK/INS/Vision integration are formulated as follows:

[δ𝑷RTKδ𝑳RTKδ𝒛cam]delimited-[]matrix𝛿subscript𝑷𝑅𝑇𝐾𝛿subscript𝑳𝑅𝑇𝐾𝛿subscript𝒛𝑐𝑎𝑚\displaystyle\left[\begin{matrix}\delta\bm{P}_{RTK}\\ \delta\bm{L}_{RTK}\\ \delta\bm{z}_{cam}\end{matrix}\right][ start_ARG start_ROW start_CELL italic_δ bold_italic_P start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_L start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[𝐇P,RTK𝐇L,RTK𝐇cam][δ𝒙insδ𝒙GNSS,RTKδ𝒙cam]+[𝜺ΔP,RTK𝜺ΔL,RTK𝜺cam]absentdelimited-[]matrixsubscript𝐇𝑃𝑅𝑇𝐾subscript𝐇𝐿𝑅𝑇𝐾subscript𝐇𝑐𝑎𝑚delimited-[]matrix𝛿subscript𝒙𝑖𝑛𝑠𝛿subscript𝒙𝐺𝑁𝑆𝑆𝑅𝑇𝐾𝛿subscript𝒙𝑐𝑎𝑚delimited-[]matrixsubscript𝜺Δ𝑃𝑅𝑇𝐾subscript𝜺Δ𝐿𝑅𝑇𝐾subscript𝜺𝑐𝑎𝑚\displaystyle=\left[\begin{matrix}{\bf H}_{P,RTK}\\ {\bf H}_{L,RTK}\\ {\bf H}_{cam}\end{matrix}\right]\left[\begin{matrix}\delta\bm{x}_{ins}\\ \delta\bm{x}_{GNSS,RTK}\\ \delta\bm{x}_{cam}\end{matrix}\right]+\left[\begin{matrix}\bm{\varepsilon}_{% \nabla\Delta P,RTK}\\ \bm{\varepsilon}_{\nabla\Delta L,RTK}\\ \bm{\varepsilon}_{cam}\end{matrix}\right]= [ start_ARG start_ROW start_CELL bold_H start_POSTSUBSCRIPT italic_P , italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_H start_POSTSUBSCRIPT italic_L , italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_H start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_x start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL bold_italic_ε start_POSTSUBSCRIPT ∇ roman_Δ italic_P , italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_ε start_POSTSUBSCRIPT ∇ roman_Δ italic_L , italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_ε start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] (22)
[δ𝑷RTKδ𝑳RTKδ𝒛cam]delimited-[]matrix𝛿subscript𝑷𝑅𝑇𝐾𝛿subscript𝑳𝑅𝑇𝐾𝛿subscript𝒛𝑐𝑎𝑚\displaystyle\left[\begin{matrix}\delta\bm{P}_{RTK}\\ \delta\bm{L}_{RTK}\\ \delta\bm{z}_{cam}\end{matrix}\right][ start_ARG start_ROW start_CELL italic_δ bold_italic_P start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_L start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[𝚫𝑷𝚫𝑷^ins𝚫𝑳𝚫𝑳^ins𝒛cam𝒛^cam]absentdelimited-[]matrixbold-∇𝚫𝑷bold-∇𝚫subscriptbold-^𝑷𝑖𝑛𝑠bold-∇𝚫𝑳bold-∇𝚫subscriptbold-^𝑳𝑖𝑛𝑠subscript𝒛𝑐𝑎𝑚subscript^𝒛𝑐𝑎𝑚\displaystyle=\left[\begin{matrix}\bm{\nabla\Delta P}-\bm{\nabla\Delta\hat{P}}% _{ins}\\ \bm{\nabla\Delta L}-\bm{\nabla\Delta\hat{L}}_{ins}\\ \bm{z}_{cam}-\hat{\bm{z}}_{cam}\end{matrix}\right]= [ start_ARG start_ROW start_CELL bold_∇ bold_Δ bold_italic_P - bold_∇ bold_Δ overbold_^ start_ARG bold_italic_P end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_∇ bold_Δ bold_italic_L - bold_∇ bold_Δ overbold_^ start_ARG bold_italic_L end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_z start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT - over^ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_c italic_a italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ]

where δ𝑷RTK𝛿subscript𝑷𝑅𝑇𝐾\delta\bm{P}_{RTK}italic_δ bold_italic_P start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT and δ𝑳RTK𝛿subscript𝑳𝑅𝑇𝐾\delta\bm{L}_{RTK}italic_δ bold_italic_L start_POSTSUBSCRIPT italic_R italic_T italic_K end_POSTSUBSCRIPT represent the observation errors of DD pseudorange and DD carrier phase in RTK, respectively. 𝚫𝑷bold-∇𝚫𝑷\bm{\nabla\Delta P}bold_∇ bold_Δ bold_italic_P and 𝚫𝑳bold-∇𝚫𝑳\bm{\nabla\Delta L}bold_∇ bold_Δ bold_italic_L can be found in equation (3). In addition, 𝚫𝑷^insbold-∇𝚫subscriptbold-^𝑷𝑖𝑛𝑠\bm{\nabla\Delta\hat{P}}_{ins}bold_∇ bold_Δ overbold_^ start_ARG bold_italic_P end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT and 𝚫𝑳^insbold-∇𝚫subscriptbold-^𝑳𝑖𝑛𝑠\bm{\nabla\Delta\hat{L}}_{ins}bold_∇ bold_Δ overbold_^ start_ARG bold_italic_L end_ARG start_POSTSUBSCRIPT italic_i italic_n italic_s end_POSTSUBSCRIPT are DD pseudorange predicted and DD carrier phase predicted by INS mechanization, respectively. Then 𝐇P,RTKsubscript𝐇𝑃𝑅𝑇𝐾{\bf H}_{P,RTK}bold_H start_POSTSUBSCRIPT italic_P , italic_R italic_T italic_K end_POSTSUBSCRIPT and 𝐇L,RTKsubscript𝐇𝐿𝑅𝑇𝐾{\bf H}_{L,RTK}bold_H start_POSTSUBSCRIPT italic_L , italic_R italic_T italic_K end_POSTSUBSCRIPT are the Jacobi matrices of DD pseudorange error and DD carrier phase error. δ𝒙GNSS,RTK𝛿subscript𝒙𝐺𝑁𝑆𝑆𝑅𝑇𝐾\delta\bm{x}_{GNSS,RTK}italic_δ bold_italic_x start_POSTSUBSCRIPT italic_G italic_N italic_S italic_S , italic_R italic_T italic_K end_POSTSUBSCRIPT is the error state vector of RTK which can be found in equation (15). 𝜺ΔP,RTKsubscript𝜺Δ𝑃𝑅𝑇𝐾\bm{\varepsilon}_{\nabla\Delta P,RTK}bold_italic_ε start_POSTSUBSCRIPT ∇ roman_Δ italic_P , italic_R italic_T italic_K end_POSTSUBSCRIPT and 𝜺ΔL,RTKsubscript𝜺Δ𝐿𝑅𝑇𝐾\bm{\varepsilon}_{\nabla\Delta L,RTK}bold_italic_ε start_POSTSUBSCRIPT ∇ roman_Δ italic_L , italic_R italic_T italic_K end_POSTSUBSCRIPT denote DD pseudorange observation error noise and DD carrier phase observation error in RTK.

{RkP,LOS=f×(10SNRS1a((AS0S1101)SNRS1S0S1+1))×σp2RkL,LOS=f×(10SNRS1a((AS0S1101)SNRS1S0S1+1))×σL2casessuperscriptsubscript𝑅𝑘𝑃𝐿𝑂𝑆𝑓superscript10𝑆𝑁𝑅subscript𝑆1𝑎𝐴subscript𝑆0subscript𝑆1101𝑆𝑁𝑅subscript𝑆1subscript𝑆0subscript𝑆11superscriptsubscript𝜎𝑝2otherwisesuperscriptsubscript𝑅𝑘𝐿𝐿𝑂𝑆𝑓superscript10𝑆𝑁𝑅subscript𝑆1𝑎𝐴subscript𝑆0subscript𝑆1101𝑆𝑁𝑅subscript𝑆1subscript𝑆0subscript𝑆11superscriptsubscript𝜎𝐿2otherwise\displaystyle\begin{cases}R_{k}^{P,LOS}=f\times(10^{\frac{SNR-S_{1}}{a}}((% \frac{A}{\frac{S_{0}-S_{1}}{10}}-1)\frac{SNR-S_{1}}{S_{0}-S_{1}}+1))\times% \sigma_{p}^{2}\\ R_{k}^{L,LOS}=f\times(10^{\frac{SNR-S_{1}}{a}}((\frac{A}{\frac{S_{0}-S_{1}}{10% }}-1)\frac{SNR-S_{1}}{S_{0}-S_{1}}+1))\times\sigma_{L}^{2}\end{cases}{ start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P , italic_L italic_O italic_S end_POSTSUPERSCRIPT = italic_f × ( 10 start_POSTSUPERSCRIPT divide start_ARG italic_S italic_N italic_R - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_a end_ARG end_POSTSUPERSCRIPT ( ( divide start_ARG italic_A end_ARG start_ARG divide start_ARG italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 10 end_ARG end_ARG - 1 ) divide start_ARG italic_S italic_N italic_R - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG + 1 ) ) × italic_σ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L , italic_L italic_O italic_S end_POSTSUPERSCRIPT = italic_f × ( 10 start_POSTSUPERSCRIPT divide start_ARG italic_S italic_N italic_R - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_a end_ARG end_POSTSUPERSCRIPT ( ( divide start_ARG italic_A end_ARG start_ARG divide start_ARG italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 10 end_ARG end_ARG - 1 ) divide start_ARG italic_S italic_N italic_R - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG + 1 ) ) × italic_σ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL start_CELL end_CELL end_ROW (23)
{RkP,NLOS=K×RkP,LOSRkL,NLOS=K×RkL,LOScasessuperscriptsubscript𝑅𝑘𝑃𝑁𝐿𝑂𝑆absent𝐾superscriptsubscript𝑅𝑘𝑃𝐿𝑂𝑆superscriptsubscript𝑅𝑘𝐿𝑁𝐿𝑂𝑆absent𝐾superscriptsubscript𝑅𝑘𝐿𝐿𝑂𝑆\displaystyle\begin{cases}R_{k}^{P,NLOS}&=K\times R_{k}^{P,LOS}\\ R_{k}^{L,NLOS}&=K\times R_{k}^{L,LOS}\end{cases}{ start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P , italic_N italic_L italic_O italic_S end_POSTSUPERSCRIPT end_CELL start_CELL = italic_K × italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P , italic_L italic_O italic_S end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L , italic_N italic_L italic_O italic_S end_POSTSUPERSCRIPT end_CELL start_CELL = italic_K × italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L , italic_L italic_O italic_S end_POSTSUPERSCRIPT end_CELL end_ROW (24)

In equation (23), f=1/sin2(ele)𝑓1superscript2𝑒𝑙𝑒f={1}/{\sin^{2}(ele)}italic_f = 1 / roman_sin start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_e italic_l italic_e ), ele𝑒𝑙𝑒eleitalic_e italic_l italic_e and SNR𝑆𝑁𝑅SNRitalic_S italic_N italic_R refer to elevation angles and SNR of satellites, respectively. The work [38] gives s1=50subscript𝑠150s_{1}=50italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 50, A=30𝐴30A=30italic_A = 30, s0=10subscript𝑠010s_{0}=10italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 10 and a=20𝑎20a=20italic_a = 20, these parameters are empirical values. σPsubscript𝜎𝑃\sigma_{P}italic_σ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT and σLsubscript𝜎𝐿\sigma_{L}italic_σ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT represent the standard deviation of pseudorange and carrier phase respectively, which are 0.3 m and 0.03 m given in this paper. The K𝐾Kitalic_K is the scale factor, which is 10 in this paper. If the satellite’s projection is located in the sky semantic region, then Stochastic model of satellite observations will be modeled by using equation (24), and if not, it will be modeled by using equation (23).

II-C The Sky-view Images aided GNSS NLOS Detection and Mitigation Method (S-NDM)

For accurate modeling of the GNSS noise covariance in the tightly coupled GNSS/INS/Vision integration, it is essential to differentiate between LOS and NLOS satellites. Therefore, we obtain sky-mask after performing semantic segmentation of the sky-view images using FCN, and subsequently, based on the projection model mentioned in [28] and satellite information included elevation and azimuth angles provided by satellite ephemeris, we ultimately identify LOS and NLOS conditions around the GNSS receiver. Fig. 3 shows the overall flow of S-NDM algorithm. In this process, if the satellite’s projection is located in the sky region, then the satellite will be classified as an LOS satellite (represented by a blue dot), and if not, it will be classified as an NLOS satellite (represented by a red dot). Through this satellite visualization strategy, we can obtain the judgment conditions of equation (23) and (24). Different from judging conditions by experience threshold in [38], satellite visualization strategy is more reliable, which is also the difference between LOS/NLOS signal modeling in this paper and traditional methods.

Refer to caption
Figure 3: The overall flow of S-NDM algorithm.
TABLE I: Technical specifications of the IMU sensors.
IMU Equipment Grade Sample rates (Hz) Angular Velocity Acc Gyro
(/h)\left({}^{\circ}/\sqrt{h}\right)( start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT / square-root start_ARG italic_h end_ARG ) (m/s/h)𝑚𝑠\left(m/s/\sqrt{h}\right)( italic_m / italic_s / square-root start_ARG italic_h end_ARG ) (mGal)𝑚𝐺𝑎𝑙\left(mGal\right)( italic_m italic_G italic_a italic_l ) (/h)\left({}^{\circ}/h\right)( start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT / italic_h )
ADIS-16470 MEMS 100 0.34 0.18 1300 8
SPAN-ISA-100C Tactical 200 0.005 0.018 100 0.05

III Experiments

This section delineates the experimental methodology undertaken to assess the effectiveness of the sky-view image-assisted GNSS NLOS detection across different positioning models. The study categorizes the models into SPP-related and RTK-related tightly coupled models for comparative analysis. To evaluate positioning performance, we calculated the root mean square error (RMSE) in the three directions of the East (E), North (N) and Up (U).

III-A Experiment Description

As is shown in Fig. 4, The data acquisition platform consists of a GNSS receiver (Septentrio mosaic-X5 mini), GNSS antenna (NovAtel GNSS-850), and two forward-looking cameras (FLIR BFS-U3-31S4C-C), an sky-pointing fish-eye camera (FE185C057HA-1), a tactical grade IMU (NovAtel SPAN-ISA-100C), a MEMS-IMU (ADIS-16470), and a time synchronization board. The time synchronization board unifies the time of all sensors to GPS time through pulse per second (PPS) generated by the GNSS receiver. The sampling rates of GNSS, MEMS-IMU, forward-looking cameras and fish-eye camera are 1 Hz, 100 Hz, 10 Hz, and 1 Hz, respectively. In addition, the NovAtel SPAN-ISA-100C interacts with NovAtel’s ProPak7 receiver via a highly reliable IMU interface. The tightly coupled multi-GNSS post-processing kinematic (PPK)/INS bidirectional smoothing position results can be obtained through commercial IE 8.9 software and used as a reference truth value. Table I lists the specific parameters of the two IMUs. For software, we run the Linux system in the environment with Intel Core i7-9750H@ 2.6GHz, 32GB memory. In addition, we used a 3060Ti GPU (Graphics Processing Unit) for acceleration. Meanwhile, we used the opencv3.4.9 [39] to process the images in the tightly coupled system.

We collected vehicular data in a typical urban canyon area in Wuhan On September 3, 2023. The experimental trajectory and surrounding landscape, featuring high-rise structures, dense foliage, and overpasses, are illustrated in Fig. 5. The GNSS elevation angles and the position dilution of precision (PDOP) values for this route are shown in Fig. 6. Combined with the LOS/NLOS satellite conditions presented in Fig. 5, it is conceivable that our testing environment is plagued by severe GNSS NLOS, multipath, and cycle slip issues. These complications not only impair GNSS-based positioning accuracy but also challenge the reliability of GNSS/INS/Vision integration model that depends on GNSS for absolute position information. The choice of such a challenging environment underscores the purpose of the experiment, which is to validate the efficacy and dependability of S-NDM algorithm and Sky-GVIO model introduced in this study.

Refer to caption
Figure 4: Illustration of experimental hardware platform.
Refer to caption
Figure 5: The experimental route and scene in the urban canyon. (A, B, C and D on the right correspond to the sky-view images of the four scenes in the trajectory, respectively. In the sky-view images on the right, the red dots represent the NLOS satellite, the blue dots represent the LOS satellite.)
Refer to caption
Figure 6: Number of LOS and total satellites (top) and PDOP value of all satellites (bottom).
Refer to caption
Figure 7: The examples of sky-view images dataset by annotated semantically.

In addition, we briefly introduce our sky-view images dataset, including the training dataset and the testing dataset. The training dataset contains 440 images and the testing dataset contains 2000 images. These data were collected in typical urban canyons in two different areas of Wuhan, which contain trees, tall buildings, and light poles. Therefore, this dataset is very suitable for sky-view images segmentation experiments. As shown in Fig. 7, it is the sky-view images that we have labeled semantically, with blue representing the sky area and black representing the non-sky area.

III-B The Results of Sky-view Images Segmentation and GNSS NLOS Detection

The segmentation of sky-view images in urban canyons is challenging due to dynamic environmental factors such as cloud cover and varying light conditions, which can degrade the accuracy of traditional image segmentation techniques. The Result of poor image segmentation accuracy will lead to errors in NLOS detection. Fig. 8 presents a comparison of the results of sky-view images segmentation between traditional segmentation algorithms and the method proposed in this paper.

Refer to caption
Figure 8: Experimental results of sky-view images segmentation using different methods. From left to right: Ground truth with the outline of sky-region (red line), the segmentation results by Otsu, Kmeans, Region growth and our proposed method, respectively. Regions that are incorrectly segmented are highlighted via green boxes.

We compare our method based FCN with representative methods on image segmentation, including Otsu, Kmeans and Region growth. For fair comparison with the other competitors, all tests were performed on our collected dataset. As can be seen in Fig. 8, cloud and light cause obvious errors in Fig. 8(b), Fig. 8(c) and Fig. 8(d), especially in areas close to buildings. The performances of Otsu and Kmeans are relatively similar, but Region Growth demonstrates a higher incidence of misclassification. Furthermore, in images featuring elevated bridges, an erroneous selection of seed points leads to the misidentification of sky regions as non-sky regions, which can be critically detrimental in NLOS identification. In contrast, our proposed FCN-based approach attains a high-precision segmentation outcome. This is because FCN captures global context information for the input image by using convolution and pooling layers. This allows the network to better understand the relationships between different objects and the overall structure of the image, which helps in more accurate segmentation.

FCN-derived segmentation results are utilized for NLOS detection, with the visibility of satellites illustrated in Fig. 9. In the Fig. 9, we can observe that there are no identification errors in the visualization results of LOS and NLOS satellites, underscoring the reliability of our S-NDM algorithm. In the case of mild urban canyon environments (as illustrated in Fig. 9(a) and Fig. 9(c)), LOS satellites dominate. However, in the case of deep urban canyons (as shown in Fig. 9(b)) and environments with elevated bridges (as in Fig. 9(d)), fewer than four LOS satellites are detectable, and the satellite configurations are suboptimal, highlighting the complexity and challenges of our experimental testing environments.

Refer to caption
Figure 9: The detection visibility of LOS/NLOS satellites. (The red dots represent the NLOS satellite, the blue dots represent the LOS satellite.)
TABLE II: Performance comparison of sky-view images using different methods
Method Kmeans Otsu Region growth Ours
FPS 0.34 5.47 3.69 10.85
Accuracy 49.50% 36.45% 44.96% 98.54%

III-C The Quantitative Analysis of Sky-view Images Segmentation

Considering the high requirement of real-time and precision for vehicle positioning, we conducted quantitative tests on the efficiency and accuracy of different segmentation algorithms. The results are shown in Table II. The efficiency of these algorithms is reflected by FPS (Frames Per Second), that FPS refers to the number of images processed per second. The Accuracy of the algorithm is reflected by “Accuracy”, which refers to the percentage of the number of correctly segmented images in the total number of processing results. The experiment is carried out on the training data set, which is convenient for us to calculate the performance index of these algorithms in sky-view images segmentation.

It can be seen from Table II that the FCN-based image segmentation algorithm is more efficient, which is due to FCN supporting GPU acceleration. The accuracy of the other CPU (Central Processing Unit)-based machine learning methods are less than 50%. In addition, the update cycle of the GNSS, INS and vision tightly coupled model based on MSCKF is 1s (synchronizing with the GNSS sampling frequency). Therefore, FCN’s FPS fully meets the demand. Compared with machine learning methods, FCN based on deep learning is also more advantageous in terms of accuracy. Therefore, in terms of efficiency or accuracy, the FCN-based sky-view images segmentation algorithm proposed in this paper is meaningful.

III-D The Experimental Results of Positioning

To verify the effectiveness of Sky-GVIO model on the positioning of car in the urban canyon, and evaluate the performance improvement of the two modes based on SPP-related and RTK-related enhanced by S-NDM. It should be noted that we use the RTK float solution. We conducted several experimental comparisons and compared against state-of-the-art methods.

Refer to caption
Figure 10: The comparisons of the tightly coupled (TC)-SPP/INS/Vision, SPP/INS/Vision/Sky models about the position errors in the urban canyon areas. (SPP/INS/Vision/Sky refer to Sky-GVIO of SPP-related.)
Refer to caption
Figure 11: The comparisons of the TC-RTK/INS/Vision and TC-RTK/INS/Vision/Sky models about the position errors in the urban canyon areas. (RTK/INS/Vision/Sky refer to Sky-GVIO of RTK-related.)
TABLE III: Position RMSEs of VINS-mono, GVINS and (TC)-SPP/INS/Vision, SPP/INS/Vision/Sky, RTK/INS/Vision, RTK/INS/Vision/Sky models
         Method          Position RMSE(m)
         East          North          Up
         Ours          TC-SPP/INS/Vision          3.24          2.14          3.39
         TC-SPP/INS/Vision/Sky          2.07          1.51          2.47
         TC-RTK/INS/Vision          0.21          0.13          0.36
         TC-RTK/INS/Vision/Sky          0.16          0.11          0.27
         Others          VINS-mono          -          -          -
         GVINS          2.50          1.75          2.82

The time series of position errors for different tightly coupled models of SPP-related are presented in Fig. 10 and the corresponding RMSEs are summarized in Table III. The positioning accuracy of TC-SPP/INS/Vision in E-N-U directions is 3.24, 2.14 and 3.39 m. Different from TC-SPP/INS/Vision, TC-SPP/INS/Vision/Sky identify LOS/NLOS satellites under the GNSS challenge environment and model them to inhibit the impact of NLOS on GNSS observations. As expected, the positioning accuracy is improved to 2.07, 1.51 and 2.47 m in E-N-U directions when TC-SPP/INS/Vision enhanced by S-NDM. Compared with TC-SPP/INS/Vision, the positioning accuracy of TC-SPP/INS/Vision/Sky is improved by 36%, 29% and 27% in E-N-U directions, respectively. As seen from results, TC-SPP/INS/Vision/Sky can maintain meter-level positioning accuracy. Therefore the Sky-GVIO of SPP-related is more suitable for mobile phone navigation and pedestrian navigation in urban canyons.

The time series of position errors for different tightly coupled models of RTK-related are presented in Fig. 11 and the corresponding RMSEs are summarized in Table III. The positioning accuracy of TC-RTK/INS/Vision in E-N-U directions are 0.21, 0.13 and 0.36 m. It can be seen that the positioning accuracy of TC-RTK/INS/Vision/Sky is 0.16, 0.11 and 0.27 m in E-N-U directions which outperforms TC-RTK/INS/Vision. Compared with TC-RTK/INS/Vision, the positioning accuracy of TC-RTK/INS/Vision/Sky is improved by 24%, 15% and 25% in E-N-U directions, respectively. These considerable improvements in the positioning accuracy mainly stem from GNSS NLOS detection and mitigation enhanced by sky-view images which makes the weighting of GNSS observations more reasonable.

In addition, we compared against state-of-the-art methods, including VINS-mono [5]and GVINS [7]. As we all know, VINS-mono is a very famous tightly coupled model. However, without external information for correction, VINS-mono will accumulate drift errors, resulting in gradually larger errors in E-N-U directions as shown in Fig. 12. Due to large errors obtained by VINS-mono, statistics were not carried out in Table III. GVINS which GNSS pseudorange measurement, GNSS doppler measurement, visual constraints and inertial constraints were jointly optimized is also mature tightly coupled GNSS/INS/Vision model, which is often used to compare models of the same type. The time series of position errors for GVINS are presented in Fig. 13 and the corresponding RMSEs are summarized in Table III. The positioning accuracy of GVINS in E-N-U directions is 2.50, 1.75 and 2.82 m which outperforms TC-SPP/INS/Vision.in this paper. This is because GVINS adds doppler measurements. However, GVINS did not carry out strict quality control in GNSS preprocessing, especially in GNSS NLOS part. Therefore, TC-SPP/INS/Vision/Sky model proposed in this paper has higher accuracy than GVINS.

Refer to caption
Figure 12: The position errors of VINS-mono in E-N-U directions.
Refer to caption
Figure 13: The position errors of GVINS in E-N-U directions.

IV Conclusion

This paper presents a GNSS NLOS-detectable, reliable tight-coupled model in urban canyons, called Sky-GVIO.We detail a module for GNSS NLOS detection and mitigation, and extend it to the tightly coupled GNSS/INS/Vision model. Based on this, we evaluate the position performance of the SPP-related and RTK-related tight-coupled models. We find that these models can be helped to improve the positioning accuracy by the S-NDM algorithm proposed in this paper.In urban canyon environments where GNSS performance is challenging, our Sky-GVIO model of RTK-related can achieve sub-decimeter accuracy, which is exciting for users with high-precision location service needs. In addition, our Sky-GVIO model of SPP-related also achieves meter-level positioning accuracy in this GNSS-challenged urban canyon environment, which is also very meaningful for low-cost users such as cell phone navigation and pedestrian navigation.

In the future, we still have the following work to do:

(1) Enhancing the utilization of fish-eye camera data beyond GNSS NLOS detection, potentially integrating fish-eye camera observations into the proposed model.

(2) Accuracy is expected to reach centimeter level. By adding prior information (such as high-precision maps), the whole system is more robust and the positioning accuracy is higher.

Acknowledgments

Upon reasonable request to the corresponding author, the experimental data used in this research is available. This work was supported by the National Key Research and Development Program of China under Grant 2021YFB2501100.

References

  • [1] S. Godha and M. Cannon, “Gps/mems ins integrated system for navigation in urban areas,” Gps Solutions, vol. 11, pp. 193–203, 2007.
  • [2] T. Li, H. Zhang, Z. Gao, Q. Chen, and X. Niu, “High-accuracy positioning in urban environments using single-frequency multi-gnss rtk/mems-imu integration,” Remote sensing, vol. 10, no. 2, p. 205, 2018.
  • [3] X. Niu, Y. Dai, T. Liu, Q. Chen, and Q. Zhang, “Feature-based gnss positioning error consistency optimization for gnss/ins integrated system,” Gps Solutions, vol. 27, no. 2, p. 89, 2023.
  • [4] K. Chen, G. Chang, and C. Chen, “Ginav: a matlab-based software for the data processing and analysis of a gnss/ins integrated navigation system,” GPS solutions, vol. 25, no. 3, p. 108, 2021.
  • [5] Q. Sun, J. Yuan, X. Zhang, and F. Duan, “Plane-edge-slam: Seamless fusion of planes and edges for slam in indoor environments,” IEEE Transactions on Automation Science and Engineering, vol. 18, no. 4, pp. 2061–2075, 2020.
  • [6] J. Cheng, C. Wang, and M. Q.-H. Meng, “Robust visual localization in dynamic environments based on sparse motion removal,” IEEE Transactions on Automation Science and Engineering, vol. 17, no. 2, pp. 658–669, 2019.
  • [7] C. Jiang, Z. Hu, Z. P. Mourelatos, D. Gorsich, P. Jayakumar, Y. Fu, and M. Majcher, “R2-rrt*: Reliability-based robust mission planning of off-road autonomous ground vehicle under uncertain terrain environment,” IEEE Transactions on Automation Science and Engineering, vol. 19, no. 2, pp. 1030–1046, 2021.
  • [8] Z. Shen, X. Li, Y. Zhou, S. Li, Z. Wu, and X. Wang, “Accurate and capable gnss-inertial-visual vehicle navigation via tightly coupled multiple homogeneous sensors,” IEEE Transactions on Automation Science and Engineering, 2024.
  • [9] T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE transactions on robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
  • [10] J. Liao, X. Li, X. Wang, S. Li, and H. Wang, “Enhancing navigation performance through visual-inertial odometry in gnss-degraded environment,” Gps Solutions, vol. 25, pp. 1–18, 2021.
  • [11] S. Cao, X. Lu, and S. Shen, “Gvins: Tightly coupled gnss–visual–inertial fusion for smooth and consistent state estimation,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2004–2021, 2022.
  • [12] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint kalman filter for vision-aided inertial navigation,” in Proceedings 2007 IEEE international conference on robotics and automation.   IEEE, 2007, pp. 3565–3572.
  • [13] T. Li, H. Zhang, Z. Gao, X. Niu, and N. El-Sheimy, “Tight fusion of a monocular camera, mems-imu, and single-frequency multi-gnss rtk for precise navigation in gnss-challenged environments,” Remote Sensing, vol. 11, no. 6, p. 610, 2019.
  • [14] P. D. Groves, Z. Jiang, B. Skelton, P. A. Cross, L. Lau, Y. Adane, and I. Kale, “Novel multipath mitigation methods using a dual-polarization antenna,” in Proceedings of the 23rd International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS 2010), 2010, pp. 140–151.
  • [15] S. Liu, D. Li, B. Li, and F. Wang, “A compact high-precision gnss antenna with a miniaturized choke ring,” IEEE Antennas and Wireless Propagation Letters, vol. 16, pp. 2465–2468, 2017.
  • [16] I. J. Gupta, I. M. Weiss, and A. W. Morrison, “Desired features of adaptive antenna arrays for gnss receivers,” Proceedings of the IEEE, vol. 104, no. 6, pp. 1195–1206, 2016.
  • [17] D. H. Won, J. Ahn, S.-W. Lee, J. Lee, S. Sung, H.-W. Park, J.-P. Park, and Y. J. Lee, “Weighted dop with consideration on elevation-dependent range errors of gnss satellites,” IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 12, pp. 3241–3250, 2012.
  • [18] P. D. Groves and Z. Jiang, “Height aiding, c/n0 weighting and consistency checking for gnss nlos and multipath mitigation in urban areas,” The Journal of Navigation, vol. 66, no. 5, pp. 653–669, 2013.
  • [19] W. Wen, G. Zhang, and L.-T. Hsu, “Exclusion of gnss nlos receptions caused by dynamic objects in heavy traffic urban scenarios using real-time 3d point cloud: An approach without 3d maps,” in 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS).   IEEE, 2018, pp. 158–165.
  • [20] W. W. Wen, G. Zhang, and L.-T. Hsu, “Gnss nlos exclusion based on dynamic object detection using lidar point cloud,” IEEE transactions on intelligent transportation systems, vol. 22, no. 2, pp. 853–862, 2019.
  • [21] L. Wang, P. D. Groves, and M. K. Ziebart, “Gnss shadow matching: Improving urban positioning accuracy using a 3d city model with optimized visibility scoring scheme,” NAVIGATION: Journal of the Institute of Navigation, vol. 60, no. 3, pp. 195–207, 2013.
  • [22] L.-T. Hsu, Y. Gu, and S. Kamijo, “3d building model-based pedestrian positioning method using gps/glonass/qzss and its reliability calculation,” GPS solutions, vol. 20, pp. 413–428, 2016.
  • [23] T. Suzuki and N. Kubo, “N-los gnss signal detection using fish-eye camera for vehicle navigation in urban environments,” in Proceedings of the 27th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2014), 2014, pp. 1897–1906.
  • [24] W. Wen, X. Bai, Y. C. Kan, and L.-T. Hsu, “Tightly coupled gnss/ins integration via factor graph and aided by fish-eye camera,” IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 10 651–10 662, 2019.
  • [25] J.-i. Meguro, T. Murata, J.-i. Takiguchi, Y. Amano, and T. Hashizume, “Gps multipath mitigation for urban area using omnidirectional infrared camera,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 1, pp. 22–30, 2009.
  • [26] A. Cohen, C. Meurie, Y. Ruichek, J. Marais, and A. Flancquart, “Quantification of gnss signals accuracy: An image segmentation method for estimating the percentage of sky,” in 2009 IEEE International Conference on Vehicular Electronics and Safety (ICVES).   IEEE, 2009, pp. 35–40.
  • [27] D. Attia, C. Meurie, Y. Ruichek, and J. Marais, “Counting of satellites with direct gnss signals using fisheye camera: A comparison of clustering algorithms,” in 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC).   IEEE, 2011, pp. 7–12.
  • [28] J. Wang, J. Liu, S. Zhang, B. Xu, Y. Luo, and R. Jin, “Sky-view images aided nlos detection and suppression for tightly coupled gnss/ins system in urban canyon areas,” Measurement Science and Technology, vol. 35, no. 2, p. 025112, 2023.
  • [29] P. P. Vijay and N. Patil, “Gray scale image segmentation using otsu thresholding optimal approach,” Journal for Research, vol. 2, no. 05, 2016.
  • [30] N. Dhanachandra, K. Manglem, and Y. J. Chanu, “Image segmentation using k-means clustering algorithm and subtractive clustering algorithm,” Procedia Computer Science, vol. 54, pp. 764–771, 2015.
  • [31] J. Soltani-Nabipour, A. Khorshidi, and B. Noorian, “Lung tumor segmentation using improved region growing algorithm,” Nuclear Engineering and Technology, vol. 52, no. 10, pp. 2313–2319, 2020.
  • [32] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
  • [33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [34] B. Xu, S. Zhang, K. Kuang, and X. Li, “A unified cycle-slip, multipath estimation, detection and mitigation method for vio-aided ppp in urban environments,” GPS Solutions, vol. 27, no. 2, p. 59, 2023.
  • [35] J. Sola, “Quaternion kinematics for the error-state kalman filter,” arXiv preprint arXiv:1711.02508, 2017.
  • [36] P. Furgale, J. Rehder, and R. Siegwart, “Unified temporal and spatial calibration for multi-sensor systems,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.   IEEE, 2013, pp. 1280–1286.
  • [37] K. Sun, K. Mohta, B. Pfrommer, M. Watterson, S. Liu, Y. Mulgaonkar, C. J. Taylor, and V. Kumar, “Robust stereo visual inertial odometry for fast autonomous flight,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 965–972, 2018.
  • [38] A. M. Herrera, H. F. Suhandri, E. Realini, M. Reguzzoni, and M. C. de Lacy, “gogps: open-source matlab software,” GPS solutions, vol. 20, pp. 595–603, 2016.
  • [39] G. Bradski, “The opencv library. dr. dobb’s journal of software tools,(4.7. 0),” computer program] Available at: https://opencv. org [Accessed: 04 April 2023], 2000.
[Uncaptioned image] Jingrong Wang received the B.E. degree in the School of Land Science and Technology from China University of Geosciences, Beijing, China, in 2014 and 2018, and the M.E. degree in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS) from Wuhan University, Wuhan, China, in 2018 and 2020, respectively, where, he is currently pursuing the Ph.D degree with the GNSS Research Center. His research interests include GNSS precise positioning, visual inertial odometry (VIO) and multi-sensor fusion algorithm.
[Uncaptioned image] Bo Xu received the B.E. degree in the School of Land Science and Technology from China University of Geosciences, Beijing, China, in 2014 and 2018, and the M.E. degree in the School of Geodesy and Geomatics from Wuhan University, Wuhan, China, in 2018 and 2021, respectively, where, he is currently pursuing the Ph.D degree with the School of Geodesy and Geomatics. His research interests include GNSS precise positioning, visual SLAM, visual inertial odometry (VIO) and multi-sensor fusion algorithm.
[Uncaptioned image] Ronghe Jin received the B.S. degree in geographic information system and the M.S. degree in surveying and mapping from Wuhan University, Wuhan, China, in 2013 and 2016, respectively. He received Ph.D. degree with the School of Geodesy and Geomatics, from Wuhan University, Wuhan, China, in 2022. He is currently a Postdoctoral Fellow with the Hong Kong Polytechnic University, Hong Kong, China. His research interests include GNSS precise positioning, visual SLAM, visual inertial odometery (VIO), multi-sensor fusion algorithm, and their applications in land vehicle navigation.
[Uncaptioned image] Shoujian Zhang received the B.Eng. and Ph.D. degrees in the School of Geodesy and Geomatics from Wuhan University, Wuhan, China, in 2004 and 2009, respectively. He is an associate professor at the school of geodesy and geomatics from Wuhan University. His research interests are GNSS data processing, multiple sensor data fusion algorithms and applications.
[Uncaptioned image] Xingxing Li received the B.Eng. degree in geodesy and survey engineering from the School of Geodesy and Geomatics, Wuhan University, Wuhan, China, in 2008, and the Ph.D. degree in geodesy from the Department of Geodesy and Remote Sensing, German Research Center for Geosciences (GFZ), Potsdam, Germany, in 2015. He is currently a Professor with Wuhan University. His current research interests include GNSS precise data processing and multisensor fusion navigation.
[Uncaptioned image] Kefu Gao received his Ph.D. degree in informaiton security from Wuhan University in 2016. From October 2011, he worked at GNSS Research Center of Wuhan University. His research interests include the theories and methods of location based service, intelligent computing.
[Uncaptioned image] Jingnan Liu received the B.Eng. degree in astronomical geodesy and the M.Eng. degree in geodesy from the Wuhan Technical University of Surveying and Mapping, China (now Wuhan University) in 1967 and 1982, respectively. He is an Academician of the Chinese Academy of Engineering, a Former President of Wuhan University, the first Chancellor of Duke Kunshan University, and the Director of the National Engineering Research Center for Satellite Positioning System. He has been engaged in the research of geodetic theories and applications, including national coordinate system establishment, global navigation satellite system (GNSS) technology, and software development as well as large project implementation. As an academic authority in the GNSS field, he has been awarded more than ten national or provincial prizes for progress in science and technology.