Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Xingxing Li, Kefu Gao, and Jingnan Liu. This work was supported by the National Key Research and Development Program of China under Grant 2021YFB2501100. (Corresponding author: Shoujian Zhang.) Jingrong Wang, Kefu Gao, and Jingnan Liu are with the GNSS Research Center, Wuhan University, Wuhan 430079, China; Bo Xu, Shoujian Zhang, and Xingxing Li are with the School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China (email: [email protected]). Ronghe Jin is with the Department of Aeronautical and Aviation Engineering, Hong Kong Polytechnic University, Hong Kong, China.

Abstract

Accurate, continuous, and reliable positioning is critical to autonomous driving. However, in complex urban canyon environments, the vulnerability of stand-alone sensors and the non-line-of-sight (NLOS) reception caused by high buildings, trees, and elevated structures seriously degrade positioning results. To address these challenges, a sky-view image segmentation algorithm based on a Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building on this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to a tightly coupled Global Navigation Satellite System (GNSS), Inertial Measurement Unit (IMU), and visual feature system, called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system supports both Single Point Positioning (SPP) and Real-Time Kinematic (RTK) methods to improve its versatility and resilience. The positioning performance of the proposed S-NDM algorithm is evaluated in urban canyon environments under different tightly coupled SPP-related and RTK-related models. The results show that the Sky-GVIO system achieves meter-level accuracy in SPP mode and sub-decimeter precision in RTK mode, outperforming GNSS/INS/Vision frameworks without S-NDM. Additionally, the sky-view image dataset, including training and evaluation subsets, is publicly available at https://github.com/whuwangjr/sky-view-images .

Note to Practitioners

This study focuses on the tight integration of multiple homogeneous and heterogeneous sensors (e.g., GNSS/INS/Vision) to address GNSS NLOS interference in wide-area vehicle navigation in urban canyons. We propose an accurate and efficient sky-view image-based NLOS detection and mitigation algorithm (named S-NDM) and extend it to a tightly coupled GNSS/INS/Vision integration framework (called Sky-GVIO). LOS/NLOS satellites are identified using the semantic information of sky-view images, and a stochastic model is constructed to suppress the influence of NLOS signals on the positioning accuracy of the tightly coupled GNSS/INS/Vision integration. The experimental results show that Sky-GVIO makes full use of the available sensor information to achieve accurate and robust positioning in real urban canyon scenarios.

Index Terms:

GNSS NLOS, GNSS/INS/Vision system, sky-view images, tightly coupled integration, urban canyon.

I Introduction

Autonomous driving is a key component of intelligent transportation and requires high-precision localization, particularly in challenging environments such as urban canyons. Currently, the integration of the Global Navigation Satellite System (GNSS) and the Inertial Navigation System (INS) is the predominant approach to navigation in complex urban environments [1], [2], [3]. Within GNSS, Real-Time Kinematic (RTK) and Precise Point Positioning (PPP) techniques have been widely adopted to enhance GNSS/INS integrated solutions, and comparative studies indicate that RTK/INS fusion yields higher accuracy than PPP/INS under identical observation conditions [4]. Nonetheless, pervasive obstructions in urban environments, such as buildings and trees, introduce non-line-of-sight (NLOS) errors into GNSS signals and degrade the accuracy of GNSS/INS integrated positioning. Researchers have therefore sought to improve the precision and robustness of GNSS/INS systems in urban canyons by incorporating additional sensors or by developing techniques to detect and mitigate NLOS-induced errors.

Cameras are increasingly used for vehicle motion estimation because of their low power consumption and cost. As external sensors, cameras provide rich environmental features for vehicle motion estimation [5], [6]. Consequently, integrating cameras with GNSS and Micro-Electro-Mechanical System (MEMS)-based Inertial Measurement Units (IMUs) is a common strategy for precise localization in complex environments [7], [8]. Previous research [9] introduced the monocular Visual-Inertial Navigation System (VINS-Mono) model, integrating Visual-Inertial Odometry (VIO)-derived relative poses with Global Positioning System (GPS) data in a unified optimization framework. In contrast to VINS-Mono, the work in [10] combines differential GNSS results with the VIO model by transforming the VIO model from the local frame to the global frame, achieving meter-level positioning accuracy in complex urban environments. Building on VINS-Mono, the work in [11] introduced the well-known GVINS model, which jointly optimizes GNSS pseudorange measurements, visual features, and inertial measurements using factor graph optimization. Although nonlinear optimization methods handle system nonlinearity well, their repeated iterations increase computational complexity. Therefore, some researchers have turned to Extended Kalman Filter (EKF)-based methods. Building on [12], the work in [13] proposed a tightly coupled Mono/MEMS-IMU/single-frequency GNSS-RTK model based on the Multi-State Constraint Kalman Filter (MSCKF), which achieved decimeter-level positioning accuracy in urban environments.

In the above multi-sensor fusion positioning systems, GNSS is the only subsystem that provides absolute position information. Therefore, the quality control of raw GNSS measurements determines the overall performance of the system, which underscores the importance of NLOS signal detection and mitigation in complex urban environments.

Strategies for detecting and mitigating GNSS NLOS signals fall into hardware-based designs and algorithmic improvements. Compared with expensive hardware solutions such as the antenna designs in [14], [15], [16], many researchers have focused on algorithmic improvements. These include empirical weighting models based on elevation angle [17] or signal-to-noise ratio (SNR) [18], and methods that use multi-source information to determine satellite visibility. Notably, methods aided by external sources such as LiDAR [19], [20], three-dimensional (3D) maps [21], [22], and cameras [23], [24] have improved GNSS NLOS detection accuracy. Cameras in particular offer a cost-effective alternative to LiDAR, which is expensive and has limited coverage, and to 3D maps, which require continuously updated databases. Infrared cameras [25] respond differently to objects at different temperatures, which makes it easier to distinguish sky from non-sky areas and thus to determine where a satellite projects on the sky-view image. However, infrared cameras are more costly than regular fish-eye cameras and are not yet widely used in consumer products such as smartphones or vehicle-mounted cameras. As a result, many studies use sky-pointing fish-eye cameras to capture sky-view images, which are processed with segmentation algorithms [25], [26], [27] to separate sky from non-sky areas. The satellites tracked by the GNSS receiver are then projected onto the sky-view images, so that NLOS satellites can be identified visually. The results in [24] and [28] show that this approach significantly improves SPP/INS positioning performance in complex urban environments. However, these traditional segmentation algorithms may not adapt well to sky-view images captured under varying lighting conditions.
Furthermore, to the best of our knowledge, sky-view image-based GNSS NLOS detection has not yet been extended to tightly coupled GNSS/INS/Vision systems, and no comparative analysis of its performance across different GNSS positioning modes has been reported.

We aim to extend the sky-view image-aided GNSS NLOS detection and mitigation method (named S-NDM) to the tightly coupled GNSS/INS/Vision system, thereby improving vehicle positioning performance in urban canyons. Compared with [28], this work makes two advances: (a) instead of improving the region-growing algorithm, we use a neural network to segment sky-view images so that the segmentation adapts to different lighting conditions; (b) whereas the original NLOS signal processing algorithm was applied only within a tightly coupled SPP/INS framework, we extend it to tightly coupled SPP/INS/Vision and RTK/INS/Vision frameworks and evaluate its performance in both to verify its practicality. The primary contributions of this paper are as follows:

1) Adaptive Sky-view Image Segmentation: We introduce an adaptive sky-view image segmentation method based on Fully Convolutional Networks (FCN) that adjusts to varying lighting conditions, addressing a key limitation of traditional methods.

2) Sky-view Image-Aided GNSS/INS/Vision Integration: We propose a tightly coupled model that combines GNSS, INS, and vision, and extend the S-NDM method to this model (named Sky-GVIO), providing a comprehensive approach to vehicle positioning in challenging urban canyon environments.

3) Performance Evaluation: We comprehensively evaluate S-NDM within GNSS pseudorange-based and carrier-phase-based integration frameworks, shedding light on its applicability across different GNSS-related integrated positioning techniques.

4) Open-Source Sky-view Images Dataset: An open-source repository of sky-view images, including training and testing data, is provided at https://github.com/whuwangjr/sky-view-images , contributing a valuable dataset to the research community and mitigating the lack of available resources in this field.

The remainder of this paper is organized as follows: Section II gives an overview of the tightly coupled GNSS/INS/Vision system enhanced by S-NDM. Section III describes the experiments and analyzes the results. Finally, Section IV concludes the study.

II System overview

This section describes the proposed Sky-GVIO model, including FCN-based sky-view image segmentation, the tightly coupled GNSS/INS/Vision integration system, and S-NDM, as shown in Fig. 1. The tightly coupled model fuses raw observations, so the raw GNSS data must be processed carefully before fusion; we use the S-NDM algorithm to handle GNSS NLOS signals. The INS mechanization is used for state prediction, and the system covariance is propagated accordingly. In the visual part, feature extraction and tracking are performed following [29]. Finally, the observation equations of GNSS, INS, and vision are integrated into the MSCKF framework to obtain the navigation results.

II-A Sky-view Images Segmentation

Sky-view images can be significantly affected by factors such as clouds and lighting conditions, which makes high-precision segmentation difficult for traditional methods based on pixels [29], categories [30], regions [31], and so on. The FCN is a mature pixel-level semantic segmentation network [32]. Its structure consists of two parts: a fully convolutional part and a deconvolution part. The fully convolutional part is built on a classical CNN backbone, such as VGG or ResNet, and extracts features; the deconvolution part upsamples the feature maps to produce a semantic segmentation image of the original size.

Figure 1: The system structure of the proposed Sky-GVIO.

In this paper, the existing ResNet50 [33], whose 16 bottleneck blocks contain 48 convolutional layers, is used for downsampling. The upsampling follows FCN-8s: transposed convolutions progressively upsample the coarsest (1/32-resolution) score map, fuse it through skip connections with the 1/16- and 1/8-resolution feature maps, and finally restore the original image size, so that features at different scales contribute to the segmentation.

The input of the FCN can be an image of arbitrary size; the output has the same spatial size as the input, with n + 1 channels (n target categories plus the background). Sky-view image segmentation requires two labels (sky and non-sky), so the FCN-based sky-view image segmentation network has 2 output channels. For training, we built a dataset of 440 sky-view images ourselves. Fig. 2 shows the resulting FCN-based sky-view image segmentation model.
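The FCN-8s fusion described above can be illustrated with a toy, dependency-free Python sketch. It is not the authors' network: it assumes the three two-channel score maps (at 1/32, 1/16 and 1/8 resolution) have already been produced by the ResNet50 backbone and 1x1 scoring layers, and it replaces the learned transposed convolutions with nearest-neighbour upsampling. Channel 1 is assumed to be the sky class, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]   # one H x W score map
Scores = List[Grid]        # C score maps; here C = 2 (0: non-sky, 1: sky)


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling (stand-in for a learned transposed convolution)."""
    rows, cols = len(grid) * factor, len(grid[0]) * factor
    return [[grid[r // factor][c // factor] for c in range(cols)] for r in range(rows)]


def add_scores(a: Scores, b: Scores) -> Scores:
    """Element-wise sum of two C x H x W score tensors (skip-connection fusion)."""
    return [[[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ga, gb)]
            for ga, gb in zip(a, b)]


def fcn8s_fuse(s32: Scores, s16: Scores, s8: Scores) -> Scores:
    """FCN-8s-style head: 2x up + 1/16 skip, 2x up + 1/8 skip, then 8x up to input size."""
    x = add_scores([upsample_nearest(g, 2) for g in s32], s16)
    x = add_scores([upsample_nearest(g, 2) for g in x], s8)
    return [upsample_nearest(g, 8) for g in x]


def sky_mask(scores: Scores, sky_channel: int = 1) -> List[List[int]]:
    """Per-pixel argmax over channels; 1 marks sky, 0 marks non-sky."""
    n_ch, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [[int(max(range(n_ch), key=lambda k: scores[k][r][c]) == sky_channel)
             for c in range(w)] for r in range(h)]


# Toy 32 x 32 input: the 1/32, 1/16 and 1/8 score maps are 1x1, 2x2 and 4x4.
s32 = [[[0.0]], [[0.0]]]
s16 = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
s8 = [[[1.0, 1.0, 0.0, 0.0] for _ in range(4)],   # non-sky evidence on the left
      [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]]   # sky evidence on the right
mask = sky_mask(fcn8s_fuse(s32, s16, s8))
print(len(mask), len(mask[0]), mask[0][0], mask[0][31])  # 32 32 0 1
```

In the real network the output has the two channels described above, and the per-pixel argmax yields the binary sky mask that S-NDM consumes.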

II-B Tightly Coupled GNSS/INS/Vision Integration Model

This subsection first introduces the GNSS observation model, the INS dynamic model, and the visual observation model. It then describes the state and measurement models of the Sky-GVIO integration. Finally, we use the segmentation results to detect NLOS signals and construct the LOS/NLOS stochastic model for NLOS mitigation.

1) GNSS Observation Model: The raw pseudorange and carrier phase observation equations in GNSS positioning are:

P=\rho+c(t_{r}-t^{s})+I+T+\varepsilon_{P}

(1)

L=\rho+c(t_{r}-t^{s})-I+T+\lambda N+\varepsilon_{L}

(2)

where $P$ and $L$ are the pseudorange and carrier phase (in meters), respectively; the superscript $s$ and subscript $r$ denote the satellite and the receiver. $\rho$ is the geometric distance between the phase centers of the receiver and satellite antennas; $t_{r}$ and $t^{s}$ are the receiver and satellite clock offsets; $c$ is the speed of light; $I$ and $T$ are the ionospheric and tropospheric delays; $\lambda$ is the carrier wavelength; $N$ is the carrier phase ambiguity; and $\varepsilon_{P}$ and $\varepsilon_{L}$ are the pseudorange and carrier phase noise, respectively.

For the SPP model, equation (1) is sufficient. For the RTK model, the double-differenced observations are:

\begin{cases}\nabla\Delta P=\nabla\Delta\rho+\nabla\Delta I+\nabla\Delta T+\nabla\Delta\varepsilon_{P}\\ \nabla\Delta L=\nabla\Delta\rho-\nabla\Delta I+\nabla\Delta T+\lambda\nabla\Delta N+\nabla\Delta\varepsilon_{L}\end{cases}

(3)

where $\nabla\Delta$ denotes the double-differenced (DD) operator. Double differencing eliminates the receiver and satellite clock offsets and, for short baselines, largely cancels satellite orbit errors as well as tropospheric and ionospheric delays, making it a powerful technique in GNSS positioning.
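The DD operator can be illustrated numerically. The Python sketch below (hypothetical function names and satellite IDs) forms between-receiver single differences and then between-satellite differences against a reference satellite, using synthetic pseudoranges that contain only geometry and clock terms; atmospheric delays and noise are deliberately omitted so that the clock cancellation is easy to see.

```python
from typing import Dict

C = 299_792_458.0  # speed of light (m/s)


def single_difference(rover: Dict[str, float], base: Dict[str, float]) -> Dict[str, float]:
    """Between-receiver difference for satellites tracked by both receivers."""
    return {s: rover[s] - base[s] for s in rover if s in base}


def double_difference(rover: Dict[str, float], base: Dict[str, float],
                      ref_sat: str) -> Dict[str, float]:
    """Between-satellite difference of single differences w.r.t. a reference satellite."""
    sd = single_difference(rover, base)
    return {s: sd[s] - sd[ref_sat] for s in sd if s != ref_sat}


# Synthetic geometric ranges (m) and clock offsets (s); values are illustrative only.
rho_rover = {"G01": 21_000_000.0, "G05": 22_500_000.0, "G12": 20_300_000.0}
rho_base = {"G01": 21_000_750.0, "G05": 22_499_400.0, "G12": 20_300_120.0}
dt_sat = {"G01": 2e-5, "G05": -1e-5, "G12": 3e-5}
dt_rover, dt_base = 1e-3, -2e-4

# Pseudoranges following equation (1) with I, T and noise set to zero.
P_rover = {s: r + C * (dt_rover - dt_sat[s]) for s, r in rho_rover.items()}
P_base = {s: r + C * (dt_base - dt_sat[s]) for s, r in rho_base.items()}

dd_P = double_difference(P_rover, P_base, "G01")
dd_rho = double_difference(rho_rover, rho_base, "G01")
for s in dd_P:
    # Receiver and satellite clocks cancel; only floating-point round-off remains.
    print(s, dd_P[s] - dd_rho[s])
```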

2) INS Dynamic Model: Because low-cost IMU measurements are noisy, the Coriolis and centrifugal terms caused by Earth rotation are neglected in the IMU formulation. The inertial measurements can be modeled [34] in the $b$ (body) frame as follows:

\tilde{\bm{a}}_{k}=\bm{a}_{k}+\bm{b}_{a_{k}}+\left(\bm{R}^{n}_{{b}_{k}}\right)^{T}\bm{g}^{n}+\bm{n}_{a}

(4)

\tilde{\bm{\omega}}_{k}=\bm{\omega}_{k}+\bm{b}_{\omega_{k}}+\bm{n}_{\omega}

(5)

where $\left[\tilde{\bm{a}}_{k},\tilde{\bm{\omega}}_{k}\right]$ is the IMU output at time $k$, and $\left[\bm{a}_{k},\bm{\omega}_{k}\right]$ are the true linear acceleration and angular velocity of the IMU. $\bm{b}_{a_{k}}$ and $\bm{b}_{\omega_{k}}$ are the accelerometer and gyroscope biases at time $k$. The measurement noises $\bm{n}_{a}$ and $\bm{n}_{\omega}$ are assumed to be zero-mean Gaussian, with $\bm{n_{a}}\sim N\left(0,\Sigma_{n_{a}}\right)$ and $\bm{n_{\omega}}\sim N\left(0,\Sigma_{n_{\omega}}\right)$. $\bm{R}^{n}_{{b}_{k}}$ is the rotation matrix from the IMU body ($b$) frame to the navigation ($n$) frame, and $\bm{g}^{n}$ is the gravity vector in the $n$ frame.
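Equations (4) and (5) can be used to simulate IMU outputs, which is handy for testing a filter. In the hedged Python sketch below (hypothetical names), the noise is assumed isotropic ($\Sigma=\sigma^{2}\mathbf{I}$), and $\bm{g}^{n}$ is assumed to be $[0,0,+9.81]^{T}$ m/s² in a z-up navigation frame, the sign convention under which the $+\left(\bm{R}^{n}_{b}\right)^{T}\bm{g}^{n}$ term of equation (4) makes a stationary, level accelerometer read +9.81 m/s² on its z axis.

```python
import random
from typing import List, Optional, Tuple

Vec = List[float]
Mat = List[List[float]]


def transpose_times(R: Mat, v: Vec) -> Vec:
    """Return R^T v for a 3x3 matrix R."""
    return [sum(R[i][j] * v[i] for i in range(3)) for j in range(3)]


def imu_measurement(a_true: Vec, w_true: Vec, R_nb: Mat, g_n: Vec,
                    b_a: Vec, b_w: Vec, sigma_a: float = 0.0, sigma_w: float = 0.0,
                    rng: Optional[random.Random] = None) -> Tuple[Vec, Vec]:
    """Simulated accelerometer/gyroscope output following equations (4) and (5)."""
    rng = rng or random.Random(0)
    g_b = transpose_times(R_nb, g_n)  # (R^n_b)^T g^n: gravity term in the body frame
    acc = [a + b + g + rng.gauss(0.0, sigma_a) for a, b, g in zip(a_true, b_a, g_b)]
    gyr = [w + b + rng.gauss(0.0, sigma_w) for w, b in zip(w_true, b_w)]
    return acc, gyr


# Stationary, level IMU (R^n_b = I) with small biases and no noise.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
acc, gyr = imu_measurement([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], I3, [0.0, 0.0, 9.81],
                           b_a=[0.01, -0.02, 0.03], b_w=[1e-4, 0.0, -1e-4])
print(acc, gyr)  # approximately [0.01, -0.02, 9.84] and [0.0001, 0.0, -0.0001]
```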

The linearized INS dynamic model [35] can be expressed as:

\begin{cases}\begin{aligned} \delta\dot{\mathbf{p}}^{n}&=\delta\mathbf{v}^{n}\\ \delta\dot{\mathbf{v}}^{n}&=-\mathbf{R}^{n}_{b}\left(\tilde{\mathbf{a}}-\mathbf{b}_{a}\right)^{\wedge}\delta\bm{\theta}-\mathbf{R}^{n}_{b}\delta\mathbf{b}_{a}-\mathbf{R}_{b}^{n}\mathbf{n}_{a}\\ \delta\dot{\bm{\theta}}&=-\left(\tilde{\bm{\omega}}-\mathbf{b}_{\omega}\right)^{\wedge}\delta\bm{\theta}-\delta\mathbf{b}_{\omega}-\mathbf{n}_{\omega}\\ \delta\dot{\mathbf{b}}_{\omega}&=\mathbf{n}_{b_{\omega}}\\ \delta\dot{\mathbf{b}}_{a}&=\mathbf{n}_{b_{a}} \end{aligned}\end{cases}

(6)

where $\delta\dot{\mathbf{p}}^{n}$, $\delta\dot{\mathbf{v}}^{n}$ and $\delta\dot{\bm{\theta}}$ are the derivatives of the position, velocity and attitude errors in the $\mathit{n}$ frame, and $\delta\dot{\mathbf{b}}_{a}$ and $\delta\dot{\mathbf{b}}_{\omega}$ are the derivatives of the accelerometer and gyroscope bias errors in the $\mathit{b}$ frame. $\mathbf{R}_{b}^{n}$ is the rotation matrix from the $\mathit{b}$ frame to the $\mathit{n}$ frame; $\tilde{\bm{\omega}}$ and $\tilde{\mathbf{a}}$ are the gyroscope and accelerometer outputs; $\mathbf{b}_{\omega}$ and $\mathbf{b}_{a}$ are the nominal gyroscope and accelerometer biases; $\delta\bm{\theta}$ and $\delta\mathbf{v}^{n}$ are the attitude and velocity errors in the $\mathit{n}$ frame; $\delta\mathbf{b}_{\omega}$ and $\delta\mathbf{b}_{a}$ are the gyroscope and accelerometer bias errors; $\mathbf{n}_{\omega}$ and $\mathbf{n}_{a}$ are the angular rate and acceleration noises; and $\mathbf{n}_{b_{\omega}}$ and $\mathbf{n}_{b_{a}}$ are the gyroscope and accelerometer bias noises, respectively. The operator $\left(\cdot\right)^{\wedge}$ denotes the skew-symmetric (cross-product) matrix of a vector.

Therefore, the error state vector of INS can be expressed as:

\delta\bm{x}_{ins}=\left[\delta\mathbf{p}^{n}\quad\delta\mathbf{v}^{n}\quad\delta\bm{\theta}\quad\delta\mathbf{b}_{a}\quad\delta\mathbf{b}_{\omega}\right]^{T}

(7)

3) Visual Measurement Model: The core idea of the well-known MSCKF is to establish geometric constraints among multiple camera states using the same visual feature points observed from multiple camera poses. Following this idea, we establish the visual model. For a visual feature point $f^{j}$ observed by a stereo camera at time $i$, its observation model [34] on the normalized image planes of the left and right cameras is:

\mathit{z}_{cam,i}^{j}=\left[\begin{matrix}\mathit{u}_{c_{0,i}}^{j}\\ \mathit{v}_{c_{0,i}}^{j}\\ \mathit{u}_{c_{1,i}}^{j}\\ \mathit{v}_{c_{1,i}}^{j}\end{matrix}\right]=\left[\begin{matrix}\frac{1}{\mathit{Z}_{c_{0,i}}^{j}}\mathbf{I}_{2\times 2}&\mathbf{0}_{2\times 2}\\ \mathbf{0}_{2\times 2}&\frac{1}{\mathit{Z}_{c_{1,i}}^{j}}\mathbf{I}_{2\times 2}\end{matrix}\right]\left[\begin{matrix}\mathit{X}_{c_{0,i}}^{j}\\ \mathit{Y}_{c_{0,i}}^{j}\\ \mathit{X}_{c_{1,i}}^{j}\\ \mathit{Y}_{c_{1,i}}^{j}\end{matrix}\right]+\mathit{\epsilon}_{cam,i}^{j}

(8)

where the subscripts 0 and 1 denote the left and right cameras, respectively. $\left(\mathit{u}_{c_{0,i}}^{j},\mathit{v}_{c_{0,i}}^{j}\right)^{T}$ and $\left(\mathit{u}_{c_{1,i}}^{j},\mathit{v}_{c_{1,i}}^{j}\right)^{T}$ are the coordinates of the same feature point on the normalized image planes of the left and right cameras, and $\mathit{\epsilon}_{cam,i}^{j}$ is the visual measurement noise. $\left(\mathit{X}_{c_{0,i}}^{j},\mathit{Y}_{c_{0,i}}^{j},\mathit{Z}_{c_{0,i}}^{j}\right)^{T}$ and $\left(\mathit{X}_{c_{1,i}}^{j},\mathit{Y}_{c_{1,i}}^{j},\mathit{Z}_{c_{1,i}}^{j}\right)^{T}$ are the positions of the same feature point in the left and right camera ($\mathit{c}$) frames, which can be expressed as:

\begin{bmatrix}\mathit{X}_{c_{0,i}}^{j}\\ \mathit{Y}_{c_{0,i}}^{j}\\ \mathit{Z}_{c_{0,i}}^{j}\end{bmatrix}=\left(\mathbf{R}_{c_{0,i}}^{n}\right)^{T}\left(\bm{p}_{j}^{n}-\bm{p}_{c_{0,i}}^{n}\right)

(9)

\begin{bmatrix}\mathit{X}_{c_{1,i}}^{j}\\ \mathit{Y}_{c_{1,i}}^{j}\\ \mathit{Z}_{c_{1,i}}^{j}\end{bmatrix}=\left(\mathbf{R}_{c_{0,i}}^{c_{1,i}}\right)^{T}\left(\bm{p}_{j}^{c_{0,i}}-\bm{p}_{c_{1,i}}^{c_{0,i}}\right)

(10)

where $\mathbf{R}_{c_{0,i}}^{n}$ and $\bm{p}_{c_{0,i}}^{n}$ are the rotation matrix and position of the left camera at time $\mathit{i}$ in the $\mathit{n}$ frame. $\mathbf{R}_{c_{0,i}}^{c_{1,i}}$ is the rotation matrix from the left camera to the right camera at time $\mathit{i}$, and $\bm{p}_{c_{1,i}}^{c_{0,i}}$ is the translation vector from the left camera to the right camera; both can be accurately calibrated in advance [36]. $\bm{p}_{j}^{n}$ and $\bm{p}_{j}^{c_{0,i}}$ are the positions of the same visual feature point in the $\mathit{n}$ frame and the left camera frame, respectively ($\bm{p}_{j}^{c_{0,i}}$ is the left-hand side of equation (9)).
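Equations (8) to (10) amount to transforming a feature into each camera frame and dividing by depth. The Python sketch below implements them literally (noise omitted) under the rotation conventions written in equations (9) and (10); the function names and the toy geometry are hypothetical, and the camera z axis is assumed to point along the optical axis. The observed minus predicted coordinates would form the residual $\delta\bm{z}_{cam}$ of equation (12).

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def transpose_times(R: Mat, v: Vec) -> Vec:
    """Return R^T v for a 3x3 matrix R."""
    return [sum(R[i][j] * v[i] for i in range(3)) for j in range(3)]


def project_stereo(p_j_n: Vec, R_c0_n: Mat, p_c0_n: Vec,
                   R_c0_c1: Mat, p_c1_c0: Vec) -> Vec:
    """Predicted normalized coordinates [u0, v0, u1, v1] of feature j (equation (8))."""
    X0 = transpose_times(R_c0_n, [a - b for a, b in zip(p_j_n, p_c0_n)])   # equation (9)
    X1 = transpose_times(R_c0_c1, [a - b for a, b in zip(X0, p_c1_c0)])    # equation (10)
    if X0[2] <= 0.0 or X1[2] <= 0.0:
        raise ValueError("feature is behind one of the cameras")
    return [X0[0] / X0[2], X0[1] / X0[2], X1[0] / X1[2], X1[1] / X1[2]]


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Feature 5 m ahead and 1 m to the side; 0.5 m stereo baseline along the x axis.
z_hat = project_stereo([1.0, 0.0, 5.0], I3, [0.0, 0.0, 0.0], I3, [0.5, 0.0, 0.0])
print(z_hat)  # [0.2, 0.0, 0.1, 0.0]
```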

We adopt the method proposed in [37] to construct the visual reprojection error between relative camera poses; the visual error state vector is:

\delta\bm{x}_{cam}=\left[\delta\bm{\theta}^{n}_{c_{1}}\quad\delta\bm{p}^{n}_{c_{1}}\quad\delta\bm{\theta}^{n}_{c_{2}}\quad\delta\bm{p}^{n}_{c_{2}}\quad\cdots\quad\delta\bm{\theta}^{n}_{c_{s}}\quad\delta\bm{p}^{n}_{c_{s}}\right]^{T}

(11)

where $\delta\bm{\theta}^{n}_{c_{i}}$ and $\delta\bm{p}^{n}_{c_{i}}$ are the attitude and position errors of the camera pose at time $\mathit{i}$, and $\mathit{s}$ is the total number of camera poses in the sliding window. The measurement equation of the visual reprojection error is:

\delta\bm{z}_{cam}=\bm{\tilde{z}}_{cam}-\bm{\hat{z}}_{cam}=\bm{H}_{cam}\delta\bm{x}_{cam}+\bm{V}_{cam}

(12)

where $\bm{\tilde{z}}_{cam}$ and $\bm{\hat{z}}_{cam}$ are the observed and reprojected (predicted) visual measurements, respectively; $\bm{H}_{cam}$ is the Jacobian matrix of the stereo camera model; and $\bm{V}_{cam}$ is the visual measurement noise.

4) State and Measurement Models of the Tightly Coupled GNSS/INS/Vision Integration: This paper employs the MSCKF for tightly coupled GNSS/INS/Vision integration. Based on the sensor models above, the complete error state vector of the tightly coupled GNSS/INS/Vision integration is:

\delta\bm{x}=\left[\delta\bm{x}_{ins}\quad\delta\bm{x}_{GNSS}\quad\delta\bm{x}% _{cam}\right]^{T}

(13)

The GNSS-related error states are defined separately for the SPP and RTK positioning modes:

\delta x_{GNSS,SPP}=[\delta t_{r}]^{T}

(14)

\delta x_{GNSS,RTK}=[\delta\nabla\Delta N]^{T}

(15)

where $\delta t_{r}$ is the receiver clock offset error and $\delta\nabla\Delta N$ is the DD carrier phase ambiguity. The INS and visual error states are given in equations (7) and (11), respectively.

The state prediction model for the tightly coupled GNSS/INS/Vision integration is as follows:

\small\begin{bmatrix}\delta\bm{\dot{x}}_{ins}\\ \delta\bm{\dot{x}}_{GNSS}\\ \delta\bm{\dot{x}}_{cam}\end{bmatrix}=\begin{bmatrix}\bm{F}_{ins}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}\begin{bmatrix}\delta\bm{x}_{ins}\\ \delta\bm{x}_{GNSS}\\ \delta\bm{x}_{cam}\end{bmatrix}+\begin{bmatrix}\bm{n}_{ins}\\ \bm{n}_{GNSS}\\ \mathbf{0}\end{bmatrix}

(16)

where $\bm{F}_{ins}$ is the system matrix of the INS error state, derived directly from equation (6), and $\bm{n}_{ins}$ and $\bm{n}_{GNSS}$ are the process noises of the INS and GNSS states, respectively. The camera poses in the sliding window are treated as constant, so their process noise is $\mathbf{0}$. Based on equation (6), $\bm{F}_{ins}$ can be written as:

\bm{F}_{ins}=\begin{bmatrix}\mathbf{0}&\mathbf{I}&\mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&-\mathbf{R}^{n}_{b}\left(\tilde{\mathbf{a}}-\mathbf{b}_{a}\right)^{\wedge}&-\mathbf{R}^{n}_{b}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&-\left(\tilde{\bm{\omega}}-\mathbf{b}_{\omega}\right)^{\wedge}&\mathbf{0}&-\mathbf{I}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}

(17)

where $\mathbf{I}$ is the identity matrix.
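Equation (17) can be assembled block by block. The dependency-free Python sketch below (hypothetical helper names) builds the 15 × 15 matrix for the state order of equation (7), $[\delta\mathbf{p}^{n},\delta\mathbf{v}^{n},\delta\bm{\theta},\delta\mathbf{b}_{a},\delta\mathbf{b}_{\omega}]$; it is meant only to make the block layout concrete, not to reproduce the authors' implementation.

```python
from typing import List

Mat = List[List[float]]


def zeros(n: int, m: int) -> Mat:
    return [[0.0] * m for _ in range(n)]


def eye(n: int) -> Mat:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def skew(v: List[float]) -> Mat:
    """(v)^wedge: the skew-symmetric matrix with skew(v) x == v cross x."""
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def scaled(A: Mat, s: float) -> Mat:
    return [[s * x for x in row] for row in A]


def set_block(M: Mat, r0: int, c0: int, B: Mat) -> None:
    for i, row in enumerate(B):
        for j, x in enumerate(row):
            M[r0 + i][c0 + j] = x


def build_F_ins(R_nb: Mat, acc: List[float], gyr: List[float],
                b_a: List[float], b_w: List[float]) -> Mat:
    """Continuous-time INS error-state matrix of equation (17)."""
    f_b = [a - b for a, b in zip(acc, b_a)]   # bias-corrected specific force
    w_b = [w - b for w, b in zip(gyr, b_w)]   # bias-corrected angular rate
    F = zeros(15, 15)
    set_block(F, 0, 3, eye(3))                                  # d(dp)/dt = dv
    set_block(F, 3, 6, scaled(matmul(R_nb, skew(f_b)), -1.0))   # dv <- dtheta
    set_block(F, 3, 9, scaled(R_nb, -1.0))                      # dv <- accel bias error
    set_block(F, 6, 6, scaled(skew(w_b), -1.0))                 # dtheta <- dtheta
    set_block(F, 6, 12, scaled(eye(3), -1.0))                   # dtheta <- gyro bias error
    return F                                                    # bias rows stay zero


F = build_F_ins(eye(3), [0.0, 0.0, 9.81], [0.0, 0.0, 0.1], [0.0] * 3, [0.0] * 3)
print(F[3][7], F[6][7])  # 9.81 0.1
```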

To handle the discrete-time INS measurements, fourth-order Runge-Kutta numerical integration [7] of equation (17) is used to propagate the estimated state variables. Note that only the IMU state variables are propagated; the visual and GNSS state variables are simply copied. Meanwhile, the state covariance is propagated as:

\begin{aligned}\bm{P}_{k,k-1}&=\bm{\Phi}_{k,k-1}\bm{P}_{k-1}\bm{\Phi}_{k,k-1}^{T}+\bm{Q}_{k-1}\\ \bm{\Phi}_{k,k-1}&=\bm{\Phi}(t_{k-1},t_{k})=\exp\left(\int_{t_{k-1}}^{t_{k}}\bm{F}(\tau)\,\mathrm{d}\tau\right)\end{aligned}

(18)

where $\bm{\Phi}_{k,k-1}$ is the discrete state transition matrix, $\bm{F}(\tau)$ is the continuous-time system matrix at time $\tau$ ($\tau\in\left(t_{k-1},t_{k}\right)$), and $\bm{Q}_{k-1}$ is the discrete-time process noise covariance. $\bm{P}_{k-1}$ is the error state covariance matrix before augmentation, and $\bm{P}_{k,k-1}$ is the one-step predicted error covariance matrix from time $t_{k-1}$ to time $t_{k}$.
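A minimal sketch of the covariance propagation in equation (18), assuming a small step $\Delta t$ over which $\bm{F}$ is constant so that the matrix exponential can be truncated to $\bm{\Phi}\approx\mathbf{I}+\bm{F}\Delta t$ (a common first-order approximation, not necessarily the one used by the authors). $\bm{Q}_{k-1}$ is taken as already discretized, and the names are hypothetical.

```python
from typing import List, Tuple

Mat = List[List[float]]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Mat) -> Mat:
    return [list(col) for col in zip(*A)]


def propagate_covariance(P: Mat, F: Mat, Q: Mat, dt: float) -> Tuple[Mat, Mat]:
    """P_{k,k-1} = Phi P_{k-1} Phi^T + Q_{k-1}, with Phi ~= I + F*dt."""
    n = len(P)
    Phi = [[(1.0 if i == j else 0.0) + F[i][j] * dt for j in range(n)] for i in range(n)]
    PhiPPhiT = matmul(matmul(Phi, P), transpose(Phi))
    P_pred = [[PhiPPhiT[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return P_pred, Phi


# 1-D position/velocity example: d(dp)/dt = dv, so F = [[0, 1], [0, 0]].
P_pred, Phi = propagate_covariance([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]],
                                   [[0.0, 0.0], [0.0, 0.0]], dt=1.0)
print(P_pred)  # [[2.0, 1.0], [1.0, 1.0]]
```

For this nilpotent example $\mathbf{I}+\bm{F}\Delta t$ equals the exact exponential; for the full 15-state $\bm{F}_{ins}$ it is only a first-order approximation.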

Each time a new image is recorded, the state vector and covariance matrix are augmented with a copy of the current camera pose estimate. The initial camera pose is derived from the INS mechanization, and the augmented covariance matrix $\bm{P}_{k}$ is:

\bm{P}_{k}=\left[\begin{matrix}\mathbf{I}_{15+y+6m}\\ \mathbf{J}\end{matrix}\right]\bm{P}_{k-1}\left[\begin{matrix}\mathbf{I}_{15+y+6m}\\ \mathbf{J}\end{matrix}\right]^{T}

(19)

where $y$ is the number of GNSS-related states and $m$ is the number of camera poses currently in the sliding window (each contributing six states). When a new GNSS epoch is recorded, only the GNSS-related state variables and their covariance entries need to be removed or added. $\mathbf{J}$ is the Jacobian matrix:

\mathbf{J}=\left[\begin{matrix}(\mathbf{R}_{c}^{b})^{T}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{0}_{3\times(y+6m)}\\ -\mathbf{R}_{b}^{c}(\bm{p}_{c}^{b})^{\wedge}&\mathbf{0}&\mathbf{I}&\mathbf{0}&\mathbf{0}&\mathbf{0}_{3\times(y+6m)}\end{matrix}\right]

(20)

where $\mathbf{R}_{c}^{b}$ and $\bm{p}_{c}^{b}$ are the rotation matrix and translation vector between the camera and the IMU, calibrated offline [36].
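The augmentation in equation (19) only appends the rows of $\mathbf{J}$ below an identity matrix. The sketch below assumes $\mathbf{J}$ has already been evaluated from equation (20) (6 rows, one column per existing state); for readability, the example uses a two-state covariance and a one-row $\mathbf{J}$. Function names are hypothetical.

```python
from typing import List

Mat = List[List[float]]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Mat) -> Mat:
    return [list(col) for col in zip(*A)]


def augment_covariance(P: Mat, J: Mat) -> Mat:
    """Return [I; J] P [I; J]^T, appending the new camera-pose block to P."""
    n = len(P)
    if any(len(row) != n for row in J):
        raise ValueError("J must have one column per existing state")
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)] + [list(r) for r in J]
    return matmul(matmul(G, P), transpose(G))


# Two existing states, one new state equal to their sum.
P_aug = augment_covariance([[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]])
print(P_aug)  # [[1.0, 0.0, 1.0], [0.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
```

The new block inherits the cross-covariances of the states it is derived from, which is what lets later visual updates correct the IMU states.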

Based on the previous equations, the measurement equation for the tightly coupled SPP/INS/Vision integration is:

\begin{aligned}\left[\begin{matrix}\delta\bm{P}_{SPP}\\ \delta\bm{z}_{cam}\end{matrix}\right]&=\left[\begin{matrix}\mathbf{H}_{P,SPP}\\ \mathbf{H}_{cam}\end{matrix}\right]\left[\begin{matrix}\delta\bm{x}_{ins}\\ \delta\bm{x}_{GNSS,SPP}\\ \delta\bm{x}_{cam}\end{matrix}\right]+\left[\begin{matrix}\bm{\varepsilon}_{P,SPP}\\ \bm{\varepsilon}_{cam}\end{matrix}\right]\\ \left[\begin{matrix}\delta\bm{P}_{SPP}\\ \delta\bm{z}_{cam}\end{matrix}\right]&=\left[\begin{matrix}\bm{P}-\bm{\hat{P}}_{ins}\\ \bm{z}_{cam}-\hat{\bm{z}}_{cam}\end{matrix}\right]\end{aligned}

(21)

where $\delta\bm{P}_{SPP}$ is the pseudorange residual in SPP and $\delta\bm{z}_{cam}$ is the visual residual in equation (12). $\mathbf{H}_{P,SPP}$ is the Jacobian matrix of the pseudorange residual, and $\mathbf{H}_{cam}$ is the Jacobian matrix with respect to the involved camera states in equation (12). $\delta\bm{x}_{ins}$, $\delta\bm{x}_{GNSS,SPP}$ and $\delta\bm{x}_{cam}$ are the INS, SPP and visual error state vectors given in equations (7), (14) and (11), respectively. Likewise, $\bm{\varepsilon}_{P,SPP}$ and $\bm{\varepsilon}_{cam}$ denote the pseudorange observation noise in SPP and the visual observation noise. $P$ and $\bm{\hat{P}}_{ins}$ are the measured pseudorange in equation (1) and the pseudorange predicted from the INS mechanization, and $\bm{z}_{cam}$ and $\hat{\bm{z}}_{cam}$ are the observed and reprojected visual measurements in equation (12).
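For intuition about $\mathbf{H}_{P,SPP}$, the sketch below builds one row of it under assumptions that the text does not state: the error state is defined as true minus estimated, the receiver clock state is expressed in meters ($c\,\delta t_{r}$), the antenna lever arm is ignored, satellite clock and atmospheric corrections are assumed already applied, positions share one Cartesian frame, and the columns are the 15 INS states of equation (7) followed by the clock state of equation (14) (camera columns, which are zero for GNSS rows, are omitted). The function name is hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def spp_jacobian_row(p_rcv: Vec, p_sat: Vec, clk_m: float = 0.0,
                     n_ins: int = 15) -> Tuple[Vec, float]:
    """One pseudorange row of H_P,SPP and the predicted pseudorange (meters).

    Linearization: P - P_hat ~= -e^T dp + d(c*t_r), with e the unit vector
    from the receiver to the satellite.
    """
    d = [s - r for s, r in zip(p_sat, p_rcv)]
    rng = math.sqrt(sum(x * x for x in d))
    e = [x / rng for x in d]
    row = [0.0] * (n_ins + 1)
    row[0:3] = [-x for x in e]   # position-error columns
    row[n_ins] = 1.0             # receiver clock column
    return row, rng + clk_m


# Satellite straight "above" the receiver along the z axis; clock state of 15 m.
row, P_hat = spp_jacobian_row([0.0, 0.0, 0.0], [0.0, 0.0, 20_200_000.0], clk_m=15.0)
print(row[:3], row[15], P_hat)   # row[:3] is -e, row[15] is the clock partial
```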

The measurement equation for the tightly coupled RTK/INS/Vision integration is:

\begin{aligned}\left[\begin{matrix}\delta\bm{P}_{RTK}\\ \delta\bm{L}_{RTK}\\ \delta\bm{z}_{cam}\end{matrix}\right]&=\left[\begin{matrix}\mathbf{H}_{P,RTK}\\ \mathbf{H}_{L,RTK}\\ \mathbf{H}_{cam}\end{matrix}\right]\left[\begin{matrix}\delta\bm{x}_{ins}\\ \delta\bm{x}_{GNSS,RTK}\\ \delta\bm{x}_{cam}\end{matrix}\right]+\left[\begin{matrix}\bm{\varepsilon}_{\nabla\Delta P,RTK}\\ \bm{\varepsilon}_{\nabla\Delta L,RTK}\\ \bm{\varepsilon}_{cam}\end{matrix}\right]\\ \left[\begin{matrix}\delta\bm{P}_{RTK}\\ \delta\bm{L}_{RTK}\\ \delta\bm{z}_{cam}\end{matrix}\right]&=\left[\begin{matrix}\bm{\nabla\Delta P}-\bm{\nabla\Delta\hat{P}}_{ins}\\ \bm{\nabla\Delta L}-\bm{\nabla\Delta\hat{L}}_{ins}\\ \bm{z}_{cam}-\hat{\bm{z}}_{cam}\end{matrix}\right]\end{aligned}

(22)

where $\delta\bm{P}_{RTK}$ and $\delta\bm{L}_{RTK}$ are the DD pseudorange and DD carrier phase residuals in RTK, and $\bm{\nabla\Delta P}$ and $\bm{\nabla\Delta L}$ are given in equation (3). $\bm{\nabla\Delta\hat{P}}_{ins}$ and $\bm{\nabla\Delta\hat{L}}_{ins}$ are the DD pseudorange and DD carrier phase predicted from the INS mechanization. $\mathbf{H}_{P,RTK}$ and $\mathbf{H}_{L,RTK}$ are the Jacobian matrices of the DD pseudorange and DD carrier phase residuals, and $\delta\bm{x}_{GNSS,RTK}$ is the RTK error state vector given in equation (15). $\bm{\varepsilon}_{\nabla\Delta P,RTK}$ and $\bm{\varepsilon}_{\nabla\Delta L,RTK}$ denote the DD pseudorange and DD carrier phase observation noises in RTK.
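Similarly, a hedged sketch of $\mathbf{H}_{L,RTK}$: each DD carrier phase row depends on the difference between the receiver-to-satellite unit vectors of the non-reference and reference satellites, plus a wavelength entry in the column of that satellite's DD ambiguity (equation (15)). It assumes a known base-station position, an error state defined as true minus estimated, no lever arm, and a single frequency (about 0.19 m for GPS L1); the matching $\mathbf{H}_{P,RTK}$ row is the same without the ambiguity entry. Satellite IDs and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def unit_vector(p_rcv: Vec, p_sat: Vec) -> Vec:
    d = [s - r for s, r in zip(p_sat, p_rcv)]
    n = math.sqrt(sum(x * x for x in d))
    return [x / n for x in d]


def dd_phase_jacobian(p_rcv: Vec, p_sats: Dict[str, Vec], ref: str,
                      wavelength: float, n_ins: int = 15) -> Tuple[List[str], List[Vec]]:
    """Rows of H_L,RTK over [15 INS states, one DD ambiguity per non-reference satellite]."""
    sats = [s for s in p_sats if s != ref]
    e_ref = unit_vector(p_rcv, p_sats[ref])
    rows = []
    for k, s in enumerate(sats):
        e_s = unit_vector(p_rcv, p_sats[s])
        row = [0.0] * (n_ins + len(sats))
        row[0:3] = [-(a - b) for a, b in zip(e_s, e_ref)]  # -(e_s - e_ref)^T
        row[n_ins + k] = wavelength                         # lambda times DD ambiguity error
        rows.append(row)
    return sats, rows


sats, H_L = dd_phase_jacobian([0.0, 0.0, 0.0],
                              {"G01": [0.0, 0.0, 2e7], "G05": [2e7, 0.0, 0.0],
                               "G12": [0.0, 2e7, 0.0]},
                              ref="G01", wavelength=0.1903)
for s, row in zip(sats, H_L):
    print(s, row[:3], row[15:])
```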

\begin{cases}R_{k}^{P,LOS}=f\times\left(10^{-\frac{SNR-S_{1}}{a}}\left(\left(\frac{A}{10^{-\frac{S_{0}-S_{1}}{a}}}-1\right)\frac{SNR-S_{1}}{S_{0}-S_{1}}+1\right)\right)\times\sigma_{P}^{2}\\ R_{k}^{L,LOS}=f\times\left(10^{-\frac{SNR-S_{1}}{a}}\left(\left(\frac{A}{10^{-\frac{S_{0}-S_{1}}{a}}}-1\right)\frac{SNR-S_{1}}{S_{0}-S_{1}}+1\right)\right)\times\sigma_{L}^{2}\end{cases}

(23)

\displaystyle\begin{cases}R_{k}^{P,NLOS}&=K\times R_{k}^{P,LOS}\\ R_{k}^{L,NLOS}&=K\times R_{k}^{L,LOS}\end{cases}

(24)

In equation (23), $f=1/\sin^{2}(ele)$, where $ele$ and $SNR$ are the satellite elevation angle and SNR, respectively. Following [38], the empirical parameters are set to $S_{1}=50$, $A=30$, $S_{0}=10$ and $a=20$. $\sigma_{P}$ and $\sigma_{L}$ are the standard deviations of the pseudorange and carrier phase, set to 0.3 m and 0.03 m in this paper, and the scale factor $K$ in equation (24) is set to 10. If a satellite's projection lies in the sky region of the segmented image, the satellite is treated as LOS and its observation variance is modeled by equation (23); otherwise, it is treated as NLOS and modeled by equation (24).
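The LOS/NLOS stochastic model of equations (23) and (24) can be evaluated directly. The Python sketch below uses the parameters quoted above ($S_{1}=50$, $S_{0}=10$, $A=30$, $a=20$, $\sigma_{P}=0.3$ m, $K=10$); clamping the SNR factor to 1 for $SNR\geq S_{1}$ is an added assumption not stated in the text, the SNR is assumed to be in dB-Hz and the elevation in degrees, and the function names are hypothetical.

```python
import math


def snr_factor(snr: float, S1: float = 50.0, S0: float = 10.0,
               A: float = 30.0, a: float = 20.0) -> float:
    """SNR-dependent variance factor of equation (23); equals A at S0 and 1 at S1."""
    if snr >= S1:  # assumption: no SNR-based inflation for strong signals
        return 1.0
    return 10 ** (-(snr - S1) / a) * ((A / 10 ** (-(S0 - S1) / a) - 1.0)
                                      * (snr - S1) / (S0 - S1) + 1.0)


def observation_variance(snr: float, ele_deg: float, sigma: float,
                         is_los: bool, K: float = 10.0) -> float:
    """Equation (23) for LOS satellites; equation (24) (K-times inflation) for NLOS ones."""
    f = 1.0 / math.sin(math.radians(ele_deg)) ** 2
    r_los = f * snr_factor(snr) * sigma ** 2
    return r_los if is_los else K * r_los


print(observation_variance(50.0, 90.0, 0.3, is_los=True))    # ~0.09 m^2
print(observation_variance(10.0, 90.0, 0.3, is_los=True))    # ~2.7 m^2
print(observation_variance(10.0, 90.0, 0.3, is_los=False))   # ~27 m^2
```

Low elevation and low SNR both inflate the variance, and the NLOS flag from S-NDM multiplies it by $K$, so NLOS satellites still contribute but with much less weight.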

II-C The Sky-view Images aided GNSS NLOS Detection and Mitigation Method (S-NDM)

To model the GNSS noise covariance accurately in the tightly coupled GNSS/INS/Vision integration, it is essential to distinguish LOS from NLOS satellites. We therefore obtain a sky mask by semantically segmenting the sky-view images with the FCN, and then identify the LOS/NLOS condition of each satellite around the GNSS receiver using the projection model in [28] together with the satellite elevation and azimuth angles computed from the ephemeris. Fig. 3 shows the overall flow of the S-NDM algorithm. If a satellite's projection lies in the sky region, the satellite is classified as LOS (blue dot); otherwise, it is classified as NLOS (red dot). This classification determines whether equation (23) or equation (24) is applied. Unlike the empirical thresholds used to judge signal conditions in [38], this satellite visualization strategy is more reliable, which distinguishes the LOS/NLOS signal modeling in this paper from traditional methods.
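The LOS/NLOS decision can be sketched as a projection followed by a mask lookup. The projection model of [28] is not reproduced in the text, so the Python sketch below assumes an ideal equidistant fisheye model ($r=f\theta$, with $\theta$ the zenith angle and the horizon on a circle of `radius_px` pixels) and an image whose "up" direction points along the vehicle heading. For an upward-looking, unmirrored camera, east appears on the left when north is at the top; the `east_on_left` flag exposes this mounting-dependent choice. Satellites that project outside the image are treated as NLOS, which is also an assumption. All names and parameters are hypothetical; a real system would use the calibrated intrinsics and the estimated vehicle heading.

```python
import math
from typing import List, Optional, Tuple


def project_satellite(az_deg: float, el_deg: float, cx: float, cy: float,
                      radius_px: float, heading_deg: float = 0.0,
                      east_on_left: bool = True) -> Optional[Tuple[int, int]]:
    """Pixel (u, v) of a satellite under an assumed equidistant fisheye model."""
    if el_deg < 0.0:
        return None                              # below the horizon
    theta = math.radians(90.0 - el_deg)          # zenith angle
    r = radius_px * theta / (math.pi / 2.0)      # r = f * theta, horizon at radius_px
    az = math.radians(az_deg - heading_deg)      # azimuth relative to image "up"
    s = -1.0 if east_on_left else 1.0
    return round(cx + s * r * math.sin(az)), round(cy - r * math.cos(az))


def classify_satellite(mask: List[List[int]], az_deg: float, el_deg: float,
                       cx: float, cy: float, radius_px: float, **kw) -> str:
    """'LOS' if the projection falls on a sky pixel (mask == 1), else 'NLOS'."""
    uv = project_satellite(az_deg, el_deg, cx, cy, radius_px, **kw)
    if uv is None:
        return "NLOS"
    u, v = uv
    if not (0 <= v < len(mask) and 0 <= u < len(mask[0])):
        return "NLOS"
    return "LOS" if mask[v][u] == 1 else "NLOS"


# Toy 101 x 101 sky mask: sky for columns 0..50, an obstruction on the right.
mask = [[1 if u <= 50 else 0 for u in range(101)] for _ in range(101)]
for az, el in [(0.0, 90.0), (90.0, 30.0), (270.0, 30.0)]:
    print(az, el, classify_satellite(mask, az, el, cx=50, cy=50, radius_px=50))
# -> LOS, LOS, NLOS: the westward satellite lands on the obstructed right half
```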

TABLE I: Technical specifications of the IMU sensors.

IMU Equipment	Grade	Sample rate (Hz)	Angular random walk $\left({}^{\circ}/\sqrt{h}\right)$	Velocity random walk $\left(m/s/\sqrt{h}\right)$	Accelerometer bias $\left(mGal\right)$	Gyroscope bias $\left({}^{\circ}/h\right)$
ADIS-16470	MEMS	100	0.34	0.18	1300	8
SPAN-ISA-100C	Tactical	200	0.005	0.018	100	0.05

III Experiments

This section describes the experiments conducted to assess the effectiveness of sky-view image-assisted GNSS NLOS detection across different positioning models. The models are divided into SPP-related and RTK-related tightly coupled models for comparison. Positioning performance is evaluated by the root mean square error (RMSE) in the East (E), North (N), and Up (U) directions.
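The per-axis RMSE can be computed as in the minimal sketch below. It assumes that the estimated and reference positions are already time-aligned and that their differences are given in ECEF; these are rotated into the local E, N, U frame at the reference latitude and longitude before the RMSE of each component is taken over all epochs. The function names are hypothetical and not taken from the authors' software.

```python
import math


def ecef_delta_to_enu(dx, dy, dz, lat_rad, lon_rad):
    """Rotate an ECEF position difference (estimate minus reference, metres) into E, N, U."""
    sl, cl = math.sin(lat_rad), math.cos(lat_rad)
    so, co = math.sin(lon_rad), math.cos(lon_rad)
    e = -so * dx + co * dy
    n = -sl * co * dx - sl * so * dy + cl * dz
    u = cl * co * dx + cl * so * dy + sl * dz
    return e, n, u


def rmse_per_axis(enu_errors):
    """Root mean square error of each E, N, U component over all epochs."""
    n = len(enu_errors)
    if n == 0:
        raise ValueError("no epochs to evaluate")
    return tuple(math.sqrt(sum(err[i] ** 2 for err in enu_errors) / n) for i in range(3))
```

For instance, `rmse_per_axis([(3.0, 0.0, 1.0), (-3.0, 0.0, -1.0)])` gives `(3.0, 0.0, 1.0)`: the sign of the error does not matter, only its magnitude.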

III-A Experiment Description

As shown in Fig. 4, the data acquisition platform consists of a GNSS receiver (Septentrio mosaic-X5 mini), a GNSS antenna (NovAtel GNSS-850), two forward-looking cameras (FLIR BFS-U3-31S4C-C), a sky-pointing fish-eye camera (FE185C057HA-1), a tactical-grade IMU (NovAtel SPAN-ISA-100C), a MEMS-IMU (ADIS-16470), and a time synchronization board. The time synchronization board aligns all sensors to GPS time using the pulse per second (PPS) signal generated by the GNSS receiver. The sampling rates of the GNSS receiver, MEMS-IMU, forward-looking cameras, and fish-eye camera are 1 Hz, 100 Hz, 10 Hz, and 1 Hz, respectively. In addition, the NovAtel SPAN-ISA-100C is connected to NovAtel's ProPak7 receiver through a highly reliable IMU interface. The reference truth is the tightly coupled multi-GNSS post-processed kinematic (PPK)/INS solution with bidirectional smoothing, computed with the commercial Inertial Explorer (IE) 8.9 software. Table I lists the specifications of the two IMUs. All software runs under Linux on a computer with an Intel Core i7-9750H CPU @ 2.6 GHz and 32 GB of memory, and a 3060Ti GPU (Graphics Processing Unit) is used for acceleration. OpenCV 3.4.9 [39] is used for image processing in the tightly coupled system.

We collected vehicle data in a typical urban canyon area in Wuhan on September 3, 2023. The experimental trajectory and the surrounding landscape, featuring high-rise buildings, dense foliage, and overpasses, are illustrated in Fig. 5. The GNSS satellite elevation angles and the position dilution of precision (PDOP) along the route are shown in Fig. 6. Together with the LOS/NLOS satellite conditions presented in Fig. 5, these indicate that the test environment suffers from severe GNSS NLOS reception, multipath, and cycle slips. These problems not only degrade GNSS-based positioning accuracy but also challenge the reliability of GNSS/INS/Vision integration, which depends on GNSS for absolute position information. This challenging environment was chosen deliberately to validate the effectiveness and reliability of the S-NDM algorithm and the Sky-GVIO model introduced in this study.
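Because PDOP is used above to characterize the satellite geometry, the sketch below shows how it is conventionally computed from satellite azimuths and elevations: build the geometry matrix from line-of-sight unit vectors plus a receiver-clock column, then take the square root of the trace of the position block of $(G^{T}G)^{-1}$. This is the textbook, equally weighted definition, not necessarily the tool used to produce Fig. 6; the function name is hypothetical.

```python
import numpy as np


def pdop(az_el_deg):
    """PDOP from (azimuth, elevation) pairs in degrees; at least four satellites are required."""
    if len(az_el_deg) < 4:
        raise ValueError("PDOP needs at least four satellites")
    rows = []
    for az, el in az_el_deg:
        a, e = np.radians(az), np.radians(el)
        # Line-of-sight unit vector in E, N, U plus the receiver clock column.
        # (Its sign convention does not change the diagonal of (G^T G)^-1.)
        rows.append([np.cos(e) * np.sin(a), np.cos(e) * np.cos(a), np.sin(e), 1.0])
    G = np.array(rows)
    Q = np.linalg.inv(G.T @ G)
    return float(np.sqrt(Q[0, 0] + Q[1, 1] + Q[2, 2]))
```

Removing satellites, for example when NLOS satellites are blocked or excluded, can never decrease PDOP, and with fewer than four satellites $G^{T}G$ is singular, so no standalone position fix is possible.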

In addition, we briefly introduce our sky-view image dataset, which consists of a training set and a testing set. The training set contains 440 images and the testing set contains 2000 images. The data were collected in typical urban canyons in two different areas of Wuhan, with scenes containing trees, tall buildings, and light poles, which makes the dataset well suited to sky-view image segmentation experiments. Fig. 7 shows examples of the semantically labeled sky-view images, with blue representing the sky area and black representing the non-sky area.

III-B The Results of Sky-view Images Segmentation and GNSS NLOS Detection

Segmenting sky-view images in urban canyons is challenging because dynamic environmental factors such as cloud cover and varying illumination can degrade the accuracy of traditional image segmentation techniques. Poor segmentation, in turn, leads to errors in NLOS detection. Fig. 8 compares the sky-view image segmentation results of traditional segmentation algorithms with those of the method proposed in this paper.

We compare our FCN-based method with representative image segmentation methods, namely Otsu [29], Kmeans [30], and region growing [31]. For a fair comparison, all methods were tested on our collected dataset. As Fig. 8(b), Fig. 8(c), and Fig. 8(d) show, clouds and illumination cause obvious errors, especially in areas close to buildings. Otsu and Kmeans perform similarly, whereas region growing misclassifies more often. Furthermore, in images containing elevated bridges, a poor choice of seed points causes sky regions to be misidentified as non-sky regions, which can be critically detrimental to NLOS identification. In contrast, our FCN-based approach achieves high-precision segmentation. We attribute this to the FCN capturing global context information from the input image through its convolution and pooling layers, which allows the network to better model the relationships between different objects and the overall structure of the image, and thus to segment more accurately.
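To make the contrast with the baselines concrete, the Otsu method [29] can be sketched as choosing a single global threshold that maximizes the between-class variance of the grayscale histogram. This is a textbook version, not the implementation used in the comparison, and treating the brighter class as sky is an assumption. Because one threshold is applied to the whole image, bright clouds or shaded sky near buildings can fall on the wrong side of it, which is one plausible reason for the errors seen in the threshold-based results.

```python
def otsu_threshold(gray_pixels):
    """Return the 8-bit threshold t that maximizes between-class variance (pixels > t are 'bright')."""
    hist = [0] * 256
    for p in gray_pixels:
        hist[p] += 1
    total = len(gray_pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]            # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # number of pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor 1 / total^2.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def segment_sky(gray_pixels):
    """Binary sky mask (1 = sky), assuming the sky is the brighter class."""
    t = otsu_threshold(gray_pixels)
    return [1 if p > t else 0 for p in gray_pixels]
```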

The FCN-derived segmentation results are used for NLOS detection, and the resulting satellite visibility is illustrated in Fig. 9. No misclassification is observed in the LOS/NLOS visualization results in Fig. 9, which supports the reliability of our S-NDM algorithm. In mild urban canyons (Fig. 9(a) and Fig. 9(c)), LOS satellites dominate. However, in deep urban canyons (Fig. 9(b)) and under elevated bridges (Fig. 9(d)), fewer than four LOS satellites are detectable and the satellite geometry is poor, highlighting the complexity of our experimental test environments.

TABLE II: Performance comparison of different sky-view image segmentation methods

Method	Kmeans	Otsu	Region growth	Ours
FPS	0.34	5.47	3.69	10.85
Accuracy	49.50%	36.45%	44.96%	98.54%

III-C The Quantitative Analysis of Sky-view Images Segmentation

Because vehicle positioning requires both real-time operation and high precision, we quantitatively tested the efficiency and accuracy of the different segmentation algorithms; the results are shown in Table II. Efficiency is measured in FPS (frames per second), i.e., the number of images processed per second. Accuracy is the percentage of correctly segmented images among all processed images. The experiment is carried out on the training dataset, which makes it convenient to compute these performance indices for sky-view image segmentation.

As Table II shows, the FCN-based segmentation algorithm is the most efficient, which is due to FCN's support for GPU acceleration. The other methods, which run on the CPU (Central Processing Unit), all achieve accuracies below 50%. In addition, the update cycle of the MSCKF-based tightly coupled GNSS/INS/Vision model is 1 s, synchronized with the GNSS sampling rate; at 10.85 FPS, the FCN needs about 0.09 s per image, which comfortably meets this requirement. The deep learning-based FCN is also far more accurate than the other methods (98.54%). The proposed FCN-based sky-view image segmentation algorithm is therefore advantageous in both efficiency and accuracy.
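The real-time argument above can be expressed as a small check. The sketch times an arbitrary segmentation callable with `time.perf_counter` to obtain FPS, then tests whether one frame fits within the 1 s filter update period. The helper names and the dummy segmenter in the usage example are hypothetical; they are not the benchmarking code behind Table II.

```python
import time


def measure_fps(segment, images):
    """Average frames per second of the callable `segment` over a list of images."""
    start = time.perf_counter()
    for image in images:
        segment(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")


def meets_update_period(fps, period_s=1.0):
    """True if one frame can be segmented within one filter update (1 s with 1 Hz GNSS)."""
    return fps > 0 and 1.0 / fps <= period_s
```

Using the FPS values in Table II, `meets_update_period(10.85)` is true (about 0.092 s per image), whereas `meets_update_period(0.34)` is false (about 2.9 s per image for Kmeans).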

III-D The Experimental Results of Positioning

To verify the effectiveness of the Sky-GVIO model for vehicle positioning in urban canyons, and to evaluate how much S-NDM improves the SPP-related and RTK-related modes, we conducted several experimental comparisons, including comparisons against state-of-the-art methods. Note that the RTK float solution is used.

TABLE III: Position RMSEs of VINS-mono, GVINS, and the TC-SPP/INS/Vision, TC-SPP/INS/Vision/Sky, TC-RTK/INS/Vision, and TC-RTK/INS/Vision/Sky models

Group	Method	East RMSE (m)	North RMSE (m)	Up RMSE (m)
Ours	TC-SPP/INS/Vision	3.24	2.14	3.39
Ours	TC-SPP/INS/Vision/Sky	2.07	1.51	2.47
Ours	TC-RTK/INS/Vision	0.21	0.13	0.36
Ours	TC-RTK/INS/Vision/Sky	0.16	0.11	0.27
Others	VINS-mono	-	-	-
Others	GVINS	2.50	1.75	2.82

The time series of position errors for the SPP-related tightly coupled models are presented in Fig. 10, and the corresponding RMSEs are summarized in Table III. The position RMSEs of TC-SPP/INS/Vision in the E, N, and U directions are 3.24, 2.14, and 3.39 m, respectively. Unlike TC-SPP/INS/Vision, TC-SPP/INS/Vision/Sky identifies LOS/NLOS satellites in the GNSS-challenged environment and models them accordingly to suppress the impact of NLOS signals on the GNSS observations. As expected, when TC-SPP/INS/Vision is enhanced by S-NDM, the RMSEs decrease to 2.07, 1.51, and 2.47 m in the E, N, and U directions, improvements of 36%, 29%, and 27%, respectively. These results show that TC-SPP/INS/Vision/Sky maintains meter-level positioning accuracy, so the SPP-related Sky-GVIO is suitable for mobile phone navigation and pedestrian navigation in urban canyons.

The time series of position errors for the RTK-related tightly coupled models are presented in Fig. 11, and the corresponding RMSEs are summarized in Table III. The position RMSEs of TC-RTK/INS/Vision in the E, N, and U directions are 0.21, 0.13, and 0.36 m, respectively, while those of TC-RTK/INS/Vision/Sky are 0.16, 0.11, and 0.27 m. Compared with TC-RTK/INS/Vision, TC-RTK/INS/Vision/Sky improves the positioning accuracy by 24%, 15%, and 25% in the E, N, and U directions, respectively. These improvements stem mainly from the sky-view image-aided GNSS NLOS detection and mitigation, which assigns more appropriate weights to the GNSS observations.
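The improvement percentages quoted for the SPP-related and RTK-related models are relative RMSE reductions, $(\mathrm{RMSE}_{\mathrm{base}}-\mathrm{RMSE}_{\mathrm{Sky}})/\mathrm{RMSE}_{\mathrm{base}}\times 100\%$, per axis. The short sketch below reproduces them from the Table III values; the helper name is hypothetical.

```python
def improvement_percent(baseline, enhanced):
    """Relative RMSE reduction (%) per axis: (baseline - enhanced) / baseline * 100, rounded."""
    return [round((b - e) / b * 100) for b, e in zip(baseline, enhanced)]


# E, N, U RMSEs (m) from Table III: without S-NDM, then with S-NDM (/Sky).
spp = improvement_percent([3.24, 2.14, 3.39], [2.07, 1.51, 2.47])
rtk = improvement_percent([0.21, 0.13, 0.36], [0.16, 0.11, 0.27])
print(spp, rtk)  # [36, 29, 27] [24, 15, 25]
```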

In addition, we compared our models against state-of-the-art methods, namely VINS-mono [9] and GVINS [11]. VINS-mono is a widely used tightly coupled visual-inertial model. However, without external information for correction, VINS-mono accumulates drift, so its errors in the E, N, and U directions grow steadily, as shown in Fig. 12. Because of these large errors, VINS-mono is not included in the statistics of Table III. GVINS, which jointly optimizes GNSS pseudorange measurements, GNSS Doppler measurements, visual constraints, and inertial constraints, is also a mature tightly coupled GNSS/INS/Vision model and is often used as a baseline for models of this type. The time series of position errors for GVINS are presented in Fig. 13, and the corresponding RMSEs are summarized in Table III. The position RMSEs of GVINS in the E, N, and U directions are 2.50, 1.75, and 2.82 m, which outperforms the TC-SPP/INS/Vision model in this paper; we attribute this to GVINS's additional use of Doppler measurements. However, GVINS does not apply strict quality control in GNSS preprocessing, particularly for NLOS signals. As a result, the TC-SPP/INS/Vision/Sky model proposed in this paper achieves higher accuracy than GVINS (2.07, 1.51, and 2.47 m versus 2.50, 1.75, and 2.82 m).

IV Conclusion

This paper presents Sky-GVIO, a reliable tightly coupled model with GNSS NLOS detection capability for urban canyons. We detail a module for GNSS NLOS detection and mitigation (S-NDM) and extend it to the tightly coupled GNSS/INS/Vision model. On this basis, we evaluate the positioning performance of the SPP-related and RTK-related tightly coupled models and find that the proposed S-NDM algorithm improves the positioning accuracy of both. In GNSS-challenged urban canyon environments, the RTK-related Sky-GVIO model achieves decimeter-level accuracy (RMSEs of 0.16, 0.11, and 0.27 m in the E, N, and U directions), which is valuable for users who need high-precision location services. The SPP-related Sky-GVIO model also achieves meter-level positioning accuracy in the same environment, which is meaningful for low-cost applications such as mobile phone navigation and pedestrian navigation.

Future work includes the following:

(1) Enhancing the utilization of fish-eye camera data beyond GNSS NLOS detection, potentially integrating fish-eye camera observations into the proposed model.

(2) Improving the accuracy toward the centimeter level by incorporating prior information, such as high-precision maps, which should also make the whole system more robust.

Acknowledgments

The experimental data used in this research are available from the corresponding author upon reasonable request. This work was supported by the National Key Research and Development Program of China under Grant 2021YFB2501100.

References

[1] S. Godha and M. Cannon, “GPS/MEMS INS integrated system for navigation in urban areas,” GPS Solutions, vol. 11, pp. 193–203, 2007.
[2] T. Li, H. Zhang, Z. Gao, Q. Chen, and X. Niu, “High-accuracy positioning in urban environments using single-frequency multi-GNSS RTK/MEMS-IMU integration,” Remote Sensing, vol. 10, no. 2, p. 205, 2018.
[3] X. Niu, Y. Dai, T. Liu, Q. Chen, and Q. Zhang, “Feature-based GNSS positioning error consistency optimization for GNSS/INS integrated system,” GPS Solutions, vol. 27, no. 2, p. 89, 2023.
[4] K. Chen, G. Chang, and C. Chen, “GINav: a MATLAB-based software for the data processing and analysis of a GNSS/INS integrated navigation system,” GPS Solutions, vol. 25, no. 3, p. 108, 2021.
[5] Q. Sun, J. Yuan, X. Zhang, and F. Duan, “Plane-Edge-SLAM: Seamless fusion of planes and edges for SLAM in indoor environments,” IEEE Transactions on Automation Science and Engineering, vol. 18, no. 4, pp. 2061–2075, 2020.
[6] J. Cheng, C. Wang, and M. Q.-H. Meng, “Robust visual localization in dynamic environments based on sparse motion removal,” IEEE Transactions on Automation Science and Engineering, vol. 17, no. 2, pp. 658–669, 2019.
[7] C. Jiang, Z. Hu, Z. P. Mourelatos, D. Gorsich, P. Jayakumar, Y. Fu, and M. Majcher, “R2-RRT*: Reliability-based robust mission planning of off-road autonomous ground vehicle under uncertain terrain environment,” IEEE Transactions on Automation Science and Engineering, vol. 19, no. 2, pp. 1030–1046, 2021.
[8] Z. Shen, X. Li, Y. Zhou, S. Li, Z. Wu, and X. Wang, “Accurate and capable GNSS-inertial-visual vehicle navigation via tightly coupled multiple homogeneous sensors,” IEEE Transactions on Automation Science and Engineering, 2024.
[9] T. Qin, P. Li, and S. Shen, “VINS-Mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[10] J. Liao, X. Li, X. Wang, S. Li, and H. Wang, “Enhancing navigation performance through visual-inertial odometry in GNSS-degraded environment,” GPS Solutions, vol. 25, pp. 1–18, 2021.
[11] S. Cao, X. Lu, and S. Shen, “GVINS: Tightly coupled GNSS–visual–inertial fusion for smooth and consistent state estimation,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2004–2021, 2022.
[12] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint Kalman filter for vision-aided inertial navigation,” in Proceedings 2007 IEEE International Conference on Robotics and Automation. IEEE, 2007, pp. 3565–3572.
[13] T. Li, H. Zhang, Z. Gao, X. Niu, and N. El-Sheimy, “Tight fusion of a monocular camera, MEMS-IMU, and single-frequency multi-GNSS RTK for precise navigation in GNSS-challenged environments,” Remote Sensing, vol. 11, no. 6, p. 610, 2019.
[14] P. D. Groves, Z. Jiang, B. Skelton, P. A. Cross, L. Lau, Y. Adane, and I. Kale, “Novel multipath mitigation methods using a dual-polarization antenna,” in Proceedings of the 23rd International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS 2010), 2010, pp. 140–151.
[15] S. Liu, D. Li, B. Li, and F. Wang, “A compact high-precision GNSS antenna with a miniaturized choke ring,” IEEE Antennas and Wireless Propagation Letters, vol. 16, pp. 2465–2468, 2017.
[16] I. J. Gupta, I. M. Weiss, and A. W. Morrison, “Desired features of adaptive antenna arrays for GNSS receivers,” Proceedings of the IEEE, vol. 104, no. 6, pp. 1195–1206, 2016.
[17] D. H. Won, J. Ahn, S.-W. Lee, J. Lee, S. Sung, H.-W. Park, J.-P. Park, and Y. J. Lee, “Weighted DOP with consideration on elevation-dependent range errors of GNSS satellites,” IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 12, pp. 3241–3250, 2012.
[18] P. D. Groves and Z. Jiang, “Height aiding, C/N0 weighting and consistency checking for GNSS NLOS and multipath mitigation in urban areas,” The Journal of Navigation, vol. 66, no. 5, pp. 653–669, 2013.
[19] W. Wen, G. Zhang, and L.-T. Hsu, “Exclusion of GNSS NLOS receptions caused by dynamic objects in heavy traffic urban scenarios using real-time 3D point cloud: An approach without 3D maps,” in 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS). IEEE, 2018, pp. 158–165.
[20] W. W. Wen, G. Zhang, and L.-T. Hsu, “GNSS NLOS exclusion based on dynamic object detection using LiDAR point cloud,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 853–862, 2019.
[21] L. Wang, P. D. Groves, and M. K. Ziebart, “GNSS shadow matching: Improving urban positioning accuracy using a 3D city model with optimized visibility scoring scheme,” NAVIGATION: Journal of the Institute of Navigation, vol. 60, no. 3, pp. 195–207, 2013.
[22] L.-T. Hsu, Y. Gu, and S. Kamijo, “3D building model-based pedestrian positioning method using GPS/GLONASS/QZSS and its reliability calculation,” GPS Solutions, vol. 20, pp. 413–428, 2016.
[23] T. Suzuki and N. Kubo, “N-LOS GNSS signal detection using fish-eye camera for vehicle navigation in urban environments,” in Proceedings of the 27th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2014), 2014, pp. 1897–1906.
[24] W. Wen, X. Bai, Y. C. Kan, and L.-T. Hsu, “Tightly coupled GNSS/INS integration via factor graph and aided by fish-eye camera,” IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 10651–10662, 2019.
[25] J.-i. Meguro, T. Murata, J.-i. Takiguchi, Y. Amano, and T. Hashizume, “GPS multipath mitigation for urban area using omnidirectional infrared camera,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 1, pp. 22–30, 2009.
[26] A. Cohen, C. Meurie, Y. Ruichek, J. Marais, and A. Flancquart, “Quantification of GNSS signals accuracy: An image segmentation method for estimating the percentage of sky,” in 2009 IEEE International Conference on Vehicular Electronics and Safety (ICVES). IEEE, 2009, pp. 35–40.
[27] D. Attia, C. Meurie, Y. Ruichek, and J. Marais, “Counting of satellites with direct GNSS signals using fisheye camera: A comparison of clustering algorithms,” in 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 2011, pp. 7–12.
[28] J. Wang, J. Liu, S. Zhang, B. Xu, Y. Luo, and R. Jin, “Sky-view images aided NLOS detection and suppression for tightly coupled GNSS/INS system in urban canyon areas,” Measurement Science and Technology, vol. 35, no. 2, p. 025112, 2023.
[29] P. P. Vijay and N. Patil, “Gray scale image segmentation using Otsu thresholding optimal approach,” Journal for Research, vol. 2, no. 05, 2016.
[30] N. Dhanachandra, K. Manglem, and Y. J. Chanu, “Image segmentation using k-means clustering algorithm and subtractive clustering algorithm,” Procedia Computer Science, vol. 54, pp. 764–771, 2015.
[31] J. Soltani-Nabipour, A. Khorshidi, and B. Noorian, “Lung tumor segmentation using improved region growing algorithm,” Nuclear Engineering and Technology, vol. 52, no. 10, pp. 2313–2319, 2020.
[32] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[34] B. Xu, S. Zhang, K. Kuang, and X. Li, “A unified cycle-slip, multipath estimation, detection and mitigation method for VIO-aided PPP in urban environments,” GPS Solutions, vol. 27, no. 2, p. 59, 2023.
[35] J. Sola, “Quaternion kinematics for the error-state Kalman filter,” arXiv preprint arXiv:1711.02508, 2017.
[36] P. Furgale, J. Rehder, and R. Siegwart, “Unified temporal and spatial calibration for multi-sensor systems,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013, pp. 1280–1286.
[37] K. Sun, K. Mohta, B. Pfrommer, M. Watterson, S. Liu, Y. Mulgaonkar, C. J. Taylor, and V. Kumar, “Robust stereo visual inertial odometry for fast autonomous flight,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 965–972, 2018.
[38] A. M. Herrera, H. F. Suhandri, E. Realini, M. Reguzzoni, and M. C. de Lacy, “goGPS: open-source MATLAB software,” GPS Solutions, vol. 20, pp. 595–603, 2016.
[39] G. Bradski, “The OpenCV library,” Dr. Dobb's Journal of Software Tools, 2000. [Computer program, version 4.7.0]. Available: https://opencv.org [Accessed: 04 April 2023].