
Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data

Omar Kassi (Ensai, CREST - UMR 9194, France; [email protected]), Sunny G.W. Wang (Ensai, CREST - UMR 9194, France; [email protected])

(September 5, 2024)

Abstract

We introduce directional regularity, a new definition of anisotropy for multivariate functional data. Instead of taking the conventional view which determines anisotropy as a notion of smoothness along a dimension, directional regularity additionally views anisotropy through the lens of directions. We show that faster rates of convergence can be obtained through a change-of-basis by adapting to the directional regularity of a multivariate process. An algorithm for the estimation and identification of the change-of-basis matrix is constructed, made possible due to the unique replication structure of functional data. Non-asymptotic bounds are provided for our algorithm, supplemented by numerical evidence from an extensive simulation study. We discuss two possible applications of the directional regularity approach, and advocate its consideration as a standard pre-processing step in multivariate functional data analysis.

Key words: Anisotropy; Multivariate functional data; Hölder exponent

MSC2020: 62R10; 62G07; 62M99; 60G22

1 Introduction

Anisotropy is a key concept in probability and statistics. In the context of non-parametric statistics, it roughly means that the regularity differs across dimensions. Let $(X_{m},Y_{m})$ be independent and identically distributed (iid) pairs observed under the model

Y_{m}=f(X_{m})+\epsilon_{m},\qquad m=1,\dots,M_{0},

with $f:[0,1]^{d}\rightarrow\mathbb{R}$ , and the design points $X_{m}$ uniformly distributed on the hypercube $[0,1]^{d}$ . The errors $\epsilon_{m}$ are assumed to be uncorrelated, centered random variables. When $f$ belongs to an anisotropic Hölder class, it is well known that under suitable assumptions on the noise, the minimax rate of estimation is $M_{0}^{-\beta/(2\beta+1)}$ , where $\beta$ is given by

\frac{1}{\beta}=\sum_{i=1}^{d}\frac{1}{\beta_{i}},

(1)

with $\beta_{i}$ denoting the regularity along dimension $i$ . The parameter $\beta$ is also known as the “effective smoothness” of $f$ , since it characterizes the regularity of $f$ across different dimensions. Since $\beta^{-1}\leq d\max_{i=1,\dots,d}\beta_{i}^{-1}$ , with equality only when all the $\beta_{i}$ coincide, anisotropy brings along better rates of convergence than the isotropic case with the worst regularity $\min_{i}\beta_{i}$ in the non-parametric regression setup above.
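As a small numerical illustration of (1), the sketch below computes the effective smoothness and the exponent of the minimax rate for two hypothetical sets of regularities. The function names and the chosen values of $\beta_{i}$ are assumptions made for this example only.

```python
def effective_smoothness(betas):
    """Effective smoothness beta defined by 1/beta = sum_i 1/beta_i, as in (1)."""
    if not betas or any(b <= 0 for b in betas):
        raise ValueError("regularities must be positive")
    return 1.0 / sum(1.0 / b for b in betas)


def rate_exponent(betas):
    """Exponent r such that the minimax rate reads M0 ** (-r), r = beta / (2 beta + 1)."""
    beta = effective_smoothness(betas)
    return beta / (2.0 * beta + 1.0)


# Hypothetical regularities: both dimensions as rough as the worst one ...
isotropic = rate_exponent([0.5, 0.5])
# ... versus one dimension being smoother.
anisotropic = rate_exponent([0.5, 1.0])
print(isotropic, anisotropic)
```

With these values the anisotropic exponent is larger than the isotropic one, which is the sense in which anisotropy improves the rate.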

The regularities $\beta_{i}$ , $i=1,\dots,d$ , are usually considered along the canonical basis, in which the data are commonly observed. However, this can lead to misleading conclusions, since anisotropy is not invariant to the choice of directions. A relatively recent body of work in non-parametric statistics, called structural adaptation (Lepski, 2015), aims to account for the directional nature of anisotropy. See Samarov and Tsybakov, (2004), Lepski and Rebelles, (2020), Ammous et al., (2024) for some examples in the density model. In spatial data, this has been considered under the umbrella of fractal analysis, for example in the seminal work of Davies and Hall, (1999).

Surprisingly, to the best of our knowledge, the directional nature of anisotropy has yet to be examined in multivariate functional data analysis. We aim to fill this gap by giving a rigorous definition of anisotropy through the directional regularity of a process. Through several examples, we demonstrate that performing a change-of-basis can increase the effective smoothness of a process along the canonical basis. This can lead to faster rates of convergence for ubiquitous tasks such as smoothing in functional data analysis. This change-of-basis matrix is intimately related to the notion of directional regularity that we introduce in this paper, and an algorithm is constructed to estimate it.

Functional data analysis (FDA) is a burgeoning field that deals with modern, complex data, such as those collected by sensors. For an introduction, see the textbooks Ramsay and Silverman, (2005), Horváth and Kokoszka, (2012), Kokoszka and Reimherr, (2017). FDA deals with data sets where each datum is a function, regarded as a realization of a stochastic process or random field defined over a continuous domain. A unique feature of functional data is replication, where several sample paths are observed.

The remainder of this paper is organized as follows. In Section 2 we give a formal definition of directional regularity, which is elucidated through Lemma 1 and several examples. Lemma 1, which states that the regularity of a surface is the same in every direction except possibly one, is a key property that motivates much of the paper. In Section 3 we describe the estimation procedure for the parameters associated with the directional regularity approach. Section 4 studies the theoretical properties of our algorithm, where non-asymptotic bounds are provided. Section 5 describes an extensive simulation study that illustrates the good finite sample properties of our algorithm. Section 6 discusses additional properties of our algorithm, such as computational considerations and other interesting insights. Section 7 describes some natural applications of our directional regularity approach, including surface smoothing and anisotropic detection. All the proofs are relegated to the Supplementary Material, which also contains additional simulation results.

Figure 1: In general, the worst regularity $H_{1}$ will be obtained along each direction of the canonical basis, leading to isotropy (here $H_{1}<H_{2}$ ). This can lead to slower rates of convergence. A change-of-basis from $(\mathbf{e_{1}},\mathbf{e_{2}})$ to $(\mathbf{u_{1}},\mathbf{u_{2}})$ enables one to obtain an anisotropic process. Locating the basis $(\mathbf{u_{1}},\mathbf{u_{2}})$ is equivalent to locating the angle between $\mathbf{u_{1}}$ and $\mathbf{e_{1}}$ .

2 Directional Regularity

2.1 Data setting and definition

Let $\{X(\mathbf{t}),\mathbf{t}\in\mathcal{T}\}$ be a second-order stochastic process on an open, bounded domain $\mathcal{T}\subset\mathbb{R}^{2}$ . In this paper, we will focus on the common design framework, which represents many applications. In particular, functional data collected by sensors and images fall into such a framework. The observations come in the form of pairs $(Y^{(j)}(\mathbf{t}_{m}),\mathbf{t}_{m})\in\mathbb{R}\times\mathcal{T}$ , generated under the model

Y^{(j)}(\mathbf{t}_{m})=X^{(j)}(\mathbf{t}_{m})+\varepsilon_{m}^{(j)},\qquad 1\leq j\leq N,\quad 1\leq m\leq M_{0},

(2)

where the $\varepsilon_{m}^{(j)}$ are independent, centered errors with variance $\sigma^{2}$ . We suppose without loss of generality that the design points $\mathbf{t}_{m}$ are observed on the canonical basis, and that $\mathcal{T}=(0,1)^{2}$ . The random design case with heteroscedastic errors, together with other possible extensions, is discussed in Section 6.

Definition 1.

Let $X$ be a stochastic process with continuous and non-differentiable sample paths, and $\mathbf{u}\in\mathbb{S}$ be a unit vector. The process $X$ has local regularity $H_{\mathbf{u}}$ at $\mathbf{t}\in\mathcal{T}$ along the direction $\mathbf{u}$ if a non-negative function $L_{\mathbf{u}}:\mathcal{T}\rightarrow\mathbb{R}_{+}$ exists such that:

\theta_{\mathbf{u}}(\mathbf{t},\Delta):=\mathbb{E}\left[\left\{X\left(\mathbf{t}-\frac{\Delta}{2}\mathbf{u}\right)-X\left(\mathbf{t}+\frac{\Delta}{2}\mathbf{u}\right)\right\}^{2}\right]=L_{\mathbf{u}}(\mathbf{t})\Delta^{2H_{\mathbf{u}}(\mathbf{t})}+G_{\mathbf{u}}(\mathbf{t},\Delta),

(3)

where $G_{\mathbf{u}}(\mathbf{t},\Delta)=o\left(\Delta^{2H_{\mathbf{u}}(\mathbf{t})}\right)$ as $\Delta\rightarrow 0$ , for each fixed $\mathbf{t}$ . The map

H:\begin{cases}\mathbb{S}\times\mathcal{T}\rightarrow(0,1)\\ (\mathbf{u},\mathbf{t})\mapsto H_{\mathbf{u}}(\mathbf{t})\end{cases}

(4)

is called the directional regularity of $X$ .

Definition 1 is local in the sense that the local regularity may vary along the domain $\mathcal{T}$ . In the following, we consider the case where $G_{\mathbf{u}}(\mathbf{t},\Delta)=O(\Delta^{2H_{\mathbf{u}}(\mathbf{t})+\zeta})$ for some $\zeta>0$ . If the function $H$ does not depend on the direction $\mathbf{u}$ , we say that $X$ is an isotropic process, otherwise we call it an anisotropic process. Definition 1 formalizes anisotropy as a notion of regularity not only along a dimension, but also along a direction. This is in contrast to the usual consideration of anisotropy, which typically views it through the lens of the canonical basis. Some examples of processes with prescribed directional regularity will be discussed in Section 2.2.

A natural question that arises is the number of possible regularities that an anisotropic process can take, and how these regularities change with the directions. Lemma 1 provides a definitive answer to the preceding question, stating that there can be at most two different regularities. Furthermore, there exists a unique direction, up to a reflection, for the maximal regularity.

Lemma 1.

Let $\mathbf{t}\in\mathcal{T}$ and assume that there exist basis vectors $\mathbf{u_{1}}(\mathbf{t}),\mathbf{u_{2}}(\mathbf{t})\in\mathbb{S}$ such that $H_{\mathbf{u_{1}}}(\mathbf{t})<H_{\mathbf{u_{2}}}(\mathbf{t})$ . Moreover, suppose that the functions $L_{\mathbf{u_{1}}}$ and $L_{\mathbf{u_{2}}}$ are continuously differentiable. For any $\mathbf{v}\in\mathbb{S}$ , we have the following dichotomous relationship:

• If $\mathbf{v}\neq\pm\mathbf{u_{2}}$ , then $H_{\mathbf{v}}(\mathbf{t})=H_{\mathbf{u_{1}}}(\mathbf{t})$ .

• Otherwise, we have $H_{\mathbf{v}}(\mathbf{t})=H_{\mathbf{u_{2}}}(\mathbf{t})$ .

The proof of Lemma 1 is provided in the Supplementary Material. It states that the map $\mathbf{v}\mapsto H_{\mathbf{v}}$ can take at most two possible values. This is in alignment with the results in Davies and Hall, (1999), who studied anisotropy through the lens of the fractal dimension of surfaces, and showed that “the fractal dimension of line transects across a surface must either be constant in every direction or be constant in each direction except one”.

Let $\mathbf{v}^{*}(\mathbf{t})={\arg\max}_{\mathbf{v}\in\mathbb{S}}H_{\mathbf{v}}(\mathbf{t})$ be the direction that maximizes the regularity of $X$ in the sense of Definition 1. Then $\mathbf{v}^{*}(\mathbf{t})$ and $-\mathbf{v}^{*}(\mathbf{t})$ are the only two possible maximizers, and are “singularities” in the sense that if we locally examine any vector other than $\pm\mathbf{v}^{*}(\mathbf{t})$ , the associated regularity is $\min_{\mathbf{u}\in\mathbb{S}}H_{\mathbf{u}}(\mathbf{t})$ . Parameterizing the vectors by their angle, $\mathbf{u}(\beta)=\cos(\beta)\mathbf{e}_{1}+\sin(\beta)\mathbf{e}_{2}$ , the maximization problem $(\mathcal{P}):\arg\max_{\mathbf{v}\in\mathbb{S}}H_{\mathbf{v}}(\mathbf{t})$ is equivalent to ${\arg\max}_{\beta\in[0,2\pi)}H_{\mathbf{u}(\beta)}(\mathbf{t})$ . By restricting the domain to $[0,\pi)$ , $(\mathcal{P})$ admits a unique solution. $(\mathcal{P})$ can thus be parameterized as locating the angle between $\mathbf{v}^{*}$ and $\mathbf{e_{1}}$ , given by

\alpha(\mathbf{t})={\arg\max}_{\beta\in[0,\pi)}H_{\mathbf{u}(\beta)}(\mathbf{t}).

(5)

An illustration can be found in Figure 1.

2.2 Examples

In the following, we provide some examples of processes that motivate Definition 1.

Example 1 (Sums and products of two fractional Brownian motions).

Let $H\in(0,1)$ be the Hurst parameter of the fractional Brownian motion (fBm) $\{B^{H}(t),t\in(0,1)\}$ . $B^{H}$ is a centered Gaussian process with covariance function

\mathbb{E}\left[B^{H}(t)B^{H}(s)\right]=\frac{1}{2}\left\{|t|^{2H}+|s|^{2H}-|t-s|^{2H}\right\},\qquad\forall(t,s)\in(0,1)^{2}.

(6)

The fBm is a generalization of the standard Brownian motion, which has Hurst parameter $H=1/2$ . Unlike the standard Brownian motion, the increments of the fBm are not necessarily independent. A simple calculation reveals that

\mathbb{E}\left[\{B^{H}(t)-B^{H}(s)\}^{2}\right]=|t-s|^{2H},\qquad\forall(t,s)\in(0,1)^{2}.

(7)

Let $(\mathbf{u_{1}},\mathbf{u_{2}})$ be a basis of $\mathbb{R}^{2}$ , $(H_{1},H_{2})\in(0,1)^{2}$ , and $B_{1}$ , $B_{2}$ be two independent fBms with Hurst parameters $H_{1}$ and $H_{2}$ respectively. Set

X_{1}(\mathbf{t})=B_{1}(t_{1})+B_{2}(t_{2}),\qquad\text{ for any }\mathbf{t}\in(0,1)^{2},\quad\text{and }\mathbf{t}=t_{1}\mathbf{u_{1}}+t_{2}\mathbf{u_{2}}.

(8)

The independence of $B_{1}$ and $B_{2}$ , together with (7), implies that for any $\Delta>0$ sufficiently small, we have

\mathbb{E}\left[\left\{X_{1}\left(\mathbf{t}-\frac{\Delta}{2}\mathbf{u_{i}}\right)-X_{1}\left(\mathbf{t}+\frac{\Delta}{2}\mathbf{u_{i}}\right)\right\}^{2}\right]=\Delta^{2H_{i}},\quad i=1,2.

(9)

By Definition 1, for $i=1,2$ , the process $X_{1}$ has a local regularity $H_{i}$ along the direction $\mathbf{u_{i}}$ with $G_{\mathbf{u_{i}}}(\mathbf{t},\Delta)=0$ $(\zeta=\infty)$ , and $L_{\mathbf{u_{i}}}\equiv 1$ .

Similarly, we can define the tensor product of $B_{1}$ and $B_{2}$ with respect to the basis $(\mathbf{u_{1}},\mathbf{u_{2}})$ :

X_{2}(\mathbf{t})=B_{1}(t_{1})\times B_{2}(t_{2}),\qquad\text{ for any }\mathbf{t}\in(0,1)^{2},\quad\text{and }\mathbf{t}=t_{1}\mathbf{u_{1}}+t_{2}\mathbf{u_{2}}.

(10)

For any $\Delta>0$ sufficiently small, we analogously have

\mathbb{E}\left[\left\{X_{2}\left(\mathbf{t}-\frac{\Delta}{2}\mathbf{u_{i}}\right)-X_{2}\left(\mathbf{t}+\frac{\Delta}{2}\mathbf{u_{i}}\right)\right\}^{2}\right]=|t_{j}|^{2H_{j}}\Delta^{2H_{i}},\quad i,j\in\{1,2\}\text{ and }i\neq j,

(11)

so the process $X_{2}$ has a local regularity $H_{i}$ along the direction $\mathbf{u_{i}}$ .
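The dichotomy of Lemma 1 can be checked numerically on the sum process in (8). The sketch below is only an illustration under stated assumptions: the basis $(\mathbf{u_{1}},\mathbf{u_{2}})$ is taken orthonormal with $\mathbf{u_{1}}$ at a hypothetical angle `U1_ANGLE` from $\mathbf{e_{1}}$, so that by (7) and independence the mean-squared variation along a unit vector $\mathbf{v}$ equals $|\Delta\langle\mathbf{v},\mathbf{u_{1}}\rangle|^{2H_{1}}+|\Delta\langle\mathbf{v},\mathbf{u_{2}}\rangle|^{2H_{2}}$. The regularity is then read off from the log-ratio of the variations at spacings $2\Delta$ and $\Delta$. All function names and numerical values are chosen for this example.

```python
import math


def theta_sum_fbm(v_angle, delta, u1_angle, h1, h2):
    """Exact mean-squared variation of X_1 in (8) along u(v_angle) with step delta.

    Assumes (u1, u2) is orthonormal, with u1 at angle u1_angle from e1.
    """
    c1 = abs(math.cos(v_angle - u1_angle))  # |<v, u1>|
    c2 = abs(math.sin(v_angle - u1_angle))  # |<v, u2>|
    return (delta * c1) ** (2 * h1) + (delta * c2) ** (2 * h2)


def regularity_proxy(v_angle, delta, u1_angle, h1, h2):
    """Log-ratio of the variations at spacings 2*delta and delta, divided by 2 log 2."""
    t1 = theta_sum_fbm(v_angle, delta, u1_angle, h1, h2)
    t2 = theta_sum_fbm(v_angle, 2 * delta, u1_angle, h1, h2)
    return (math.log(t2) - math.log(t1)) / (2 * math.log(2))


# Hypothetical parameters for the illustration.
H1, H2, U1_ANGLE, DELTA = 0.3, 0.8, math.pi / 6, 1e-4

along_u2 = regularity_proxy(U1_ANGLE + math.pi / 2, DELTA, U1_ANGLE, H1, H2)
along_e1 = regularity_proxy(0.0, DELTA, U1_ANGLE, H1, H2)
along_e2 = regularity_proxy(math.pi / 2, DELTA, U1_ANGLE, H1, H2)
print(along_u2, along_e1, along_e2)
```

For small spacings, the value along $\mathbf{u_{2}}$ is close to $H_{2}$, while both canonical directions return values close to $H_{1}$, in line with Lemma 1.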

Example 2 (Sum and product of two independent Ornstein-Uhlenbeck processes).

Another example that is regularly used in practice is the stationary fractional Ornstein-Uhlenbeck process $\{U(t),t\in(0,1)\}$ , with index $\rho\in(0,2)$ . It is a centered Gaussian process with a covariance function given by

\mathbb{E}\left[U(t)U(s)\right]=\exp\left(-a|t-s|^{\rho}\right),\quad\text{ for some }a>0\text{ and }t,s\in(0,1).

(12)

The covariance structure exhibits the following property:

\mathbb{E}\left[\{U(t)-U(s)\}^{2}\right]=2a|t-s|^{\rho}+O(|t-s|^{2\rho}),\qquad\text{for any }t,s\in(0,1).

(13)

Its bi-dimensional counterpart can be constructed by considering either the sum or the tensor product of two independent Ornstein-Uhlenbeck processes, similar to the fBm in (8) and (10).
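For completeness, the expansion in (13) can be recovered from the covariance (12) in a few lines. The derivation below is a sketch that uses only the unit variance implied by (12) and a Taylor expansion of the exponential.

```latex
\begin{aligned}
\mathbb{E}\left[\{U(t)-U(s)\}^{2}\right]
&=\mathbb{E}\left[U(t)^{2}\right]+\mathbb{E}\left[U(s)^{2}\right]-2\,\mathbb{E}\left[U(t)U(s)\right]\\
&=2-2\exp\left(-a|t-s|^{\rho}\right)\\
&=2a|t-s|^{\rho}-a^{2}|t-s|^{2\rho}+O\left(|t-s|^{3\rho}\right).
\end{aligned}
```

Matching this with the form in (3) suggests a local regularity $\rho/2\in(0,1)$ , a constant $L\equiv 2a$ , and a remainder of order $\zeta=\rho$ for the one-dimensional process.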

Example 3 (Multi-fractional Brownian sheet).

The local regularity in the previous examples is constant along the domain $\mathcal{T}$ . The multifractional Brownian sheet (MfBs) is a generalization of the standard fractional Brownian sheet, where the Hurst parameter is allowed to vary along the domain. See Herbin, (2006) among others. See Kassi et al., (2023, Proposition 6) for an illustration of the directional regularity property in the case of MfBs.

3 Methodology

3.1 Estimating equations

Let $(\mathbf{e_{1}},\mathbf{e_{2}})$ be the canonical basis of $\mathcal{T}$ , and $(\mathbf{u_{1}},\mathbf{u_{2}})$ be orthonormal basis vectors such that $|H_{\mathbf{u}_{1}}(\mathbf{t})-H_{\mathbf{u}_{2}}(\mathbf{t})|>0$ . Let $\alpha(\mathbf{t})\in[0,\pi)$ be the solution of the problem (5). Denote $H_{1}(\mathbf{t})=H_{\mathbf{u_{1}}}(\mathbf{t})$ and $H_{2}(\mathbf{t})=H_{\mathbf{u_{2}}}(\mathbf{t})$ . The estimating equation for $\alpha$ is found in Proposition 1. Denote $a\wedge b=\min\{a,b\}$ , for $a,b\in\mathbb{R}$ , and $a\vee b=\max\{a,b\}$ . Finally, we use $\#\mathcal{S}$ to denote the cardinality of a finite set $\mathcal{S}$ .

Proposition 1.

Suppose that $\mathbf{u_{1}}\neq\pm\mathbf{e_{i}}$ , for $i=1,2$ . If a process $X$ satisfies (3) for any $\mathbf{t}\in\mathcal{T}$ , then we have

g\left(\alpha(\mathbf{t})\right)=\left(\frac{\theta_{\mathbf{e_{2}}}(\mathbf{t},\Delta)}{\theta_{\mathbf{e_{1}}}(\mathbf{t},\Delta)}\right)^{\frac{1}{2\underline{H}(\mathbf{t})}}\left\{1+O\left(\Delta^{\zeta\wedge|H_{1}(\mathbf{t})-H_{2}(\mathbf{t})|}\right)\right\},

(14)

where $g(\cdot)=|\tan(\cdot)|\mathbf{1}\{H_{1}(\mathbf{t})<H_{2}(\mathbf{t})\}+|\cot(\cdot)|\mathbf{1}\{H_{1}(\mathbf{t})>H_{2}(\mathbf{t})\}$ , and $\underline{H}(\mathbf{t})=H_{1}(\mathbf{t})\wedge H_{2}(\mathbf{t})$ .

The remainder term $O\left(\Delta^{\zeta\wedge|H_{1}(\mathbf{t})-H_{2}(\mathbf{t})|}\right)$ will be discussed in Section 3.3. Proposition 1 allows us to find a function of the angle by examining the ratios of mean-squared variations, as in (3), by searching solely along the canonical basis. In order to focus on the main idea of directional regularity, we will consider the case where $H$ and $\alpha$ are constant over $\mathcal{T}$ . In theory, the general case can be established by considering a local, pointwise study of our methodology. Hereafter, the notational dependence of $H$ and $\alpha$ on $\mathbf{t}$ will sometimes be suppressed. When the directional regularity is constant over $\mathcal{T}$ , more stable estimates can be obtained by averaging over the design points.
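The estimating equation (14) can be checked numerically on the sum of two independent fBms from Example 1. The sketch below makes several assumptions that are not part of the text: the basis $(\mathbf{u_{1}},\mathbf{u_{2}})$ is orthonormal, $\alpha$ is read as the angle between $\mathbf{u_{1}}$ and $\mathbf{e_{1}}$ as in the caption of Figure 1, and the exact mean-squared variations are used in place of estimates. The function names and parameter values are hypothetical.

```python
import math


def theta_canonical(i, delta, alpha, h1, h2):
    """Exact theta_{e_i}(Delta) for X_1 in (8), with u1 at angle alpha from e1."""
    if i == 1:
        c1, c2 = abs(math.cos(alpha)), abs(math.sin(alpha))  # |<e1,u1>|, |<e1,u2>|
    else:
        c1, c2 = abs(math.sin(alpha)), abs(math.cos(alpha))  # |<e2,u1>|, |<e2,u2>|
    return (delta * c1) ** (2 * h1) + (delta * c2) ** (2 * h2)


def g_from_ratio(delta, alpha, h1, h2):
    """Right-hand side of (14) without the remainder factor."""
    h_min = min(h1, h2)
    ratio = theta_canonical(2, delta, alpha, h1, h2) / theta_canonical(1, delta, alpha, h1, h2)
    return ratio ** (1.0 / (2.0 * h_min))


ALPHA = math.pi / 3  # hypothetical angle
small = g_from_ratio(1e-8, ALPHA, 0.3, 0.8)    # H1 < H2: close to |tan(alpha)|
coarse = g_from_ratio(0.1, ALPHA, 0.3, 0.8)    # same setting, coarse spacing
swapped = g_from_ratio(1e-8, ALPHA, 0.8, 0.3)  # H1 > H2: close to |cot(alpha)|
print(small, coarse, swapped, math.tan(ALPHA))
```

In this example the ratio recovers $|\tan(\alpha)|$ or $|\cot(\alpha)|$ for a very small spacing, whereas a coarse spacing leaves a visible gap, which is the effect of the remainder term.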

Due to the unique replication nature of functional data, the quantities in (14) can be easily estimated. A natural plug-in estimator for the mean-squared variations is its empirical counterpart, given by

\widecheck{\theta}_{\mathbf{e_{i}}}(\mathbf{t},\Delta)=\frac{1}{N}\sum_{j=1}^{N}\left\{\widetilde{X}^{(j)}\left(\mathbf{t}-(\Delta/2)\mathbf{e_{i}}\right)-\widetilde{X}^{(j)}\left(\mathbf{t}+(\Delta/2)\mathbf{e_{i}}\right)\right\}^{2},

(15)

where $\widetilde{X}^{(j)}$ denotes an observable approximation of $X^{(j)}$ . Let $\widetilde{\mathcal{T}}$ be a finite approximation of $\mathcal{T}$ with a deterministic number of grid points. Averaging over the design points, we obtain

\widehat{\theta}_{\mathbf{e_{i}}}(\Delta)=\frac{1}{\#\widetilde{\mathcal{T}}}\sum_{\mathbf{t}\in\widetilde{\mathcal{T}}}\widecheck{\theta}_{\mathbf{e_{i}}}(\mathbf{t},\Delta)-2\widehat{\sigma}^{2},\qquad i=1,2,

(16)

where

\widehat{\sigma}^{2}=\frac{1}{M_{0}}\sum_{m=1}^{M_{0}}\frac{1}{2N}\sum_{j=1}^{N}\left(Y^{(j)}(\mathbf{t}_{m})-Y^{(j)}(\mathbf{t}_{m,1})\right)^{2},

(17)

is an estimator of the noise variance in (2), with $\mathbf{t}_{m,1}$ denoting the closest observed point to $\mathbf{t}_{m}$ . Since the mean-squared variations are associated with the true realizations of the process $X$ and not the observed, noisy version, denoising is important to obtain an estimator with good rates of convergence. Theoretical properties of the estimator $\widehat{\sigma}^{2}$ in (17) are provided in Section 4.

In principle, one can choose between a variety of methods to construct the approximations $\widetilde{X}^{(j)}$ . Only a mild moment condition of the form $R_{2}(M_{0})\lesssim M_{0}^{-\nu}$ , for some $\nu>0$ , is required, where $R_{p}(M_{0})=\sup_{\mathbf{t}\in\mathcal{T}}\mathbb{E}[|\widetilde{X}^{(j)}(\mathbf{t})-X^{(j)}(\mathbf{t})|^{p}]$ . This is satisfied by many non-parametric smoothers, such as kernel smoothers and series expansions. See for instance Fan and Guerre, (2016) and Belloni et al., (2015). We will use nearest-neighbor interpolation, breaking ties by the lexicographic order, to construct $\widetilde{X}^{(j)}$ , since it is sufficient for our purposes when coupled with a denoising step as in (16). Nearest-neighbor interpolation has the added advantage of requiring no tuning parameters, and is computationally efficient.

A recent line of work in regularity estimation within the context of FDA has emerged, which we can exploit for our purposes. See for example Golovkine et al., (2022), Golovkine et al., (2023), Wang et al., (2023), Maissoro et al., (2024), Kassi et al., (2023), and Shen and Hsing, (2020). We will adopt the multivariate version of Kassi et al., (2023), where the minimum regularity can be estimated as

\widehat{\underline{H}}(\Delta)=\begin{cases}\min_{i=1,2}\frac{\log(\widehat{\theta}_{\mathbf{e_{i}}}(2\Delta))-\log(\widehat{\theta}_{\mathbf{e_{i}}}(\Delta))}{2\log(2)}\qquad\text{if}\qquad\widehat{\theta}_{\mathbf{e_{i}}}(2\Delta),\widehat{\theta}_{\mathbf{e_{i}}}(\Delta)>0,\\ 1\qquad\text{otherwise}.\end{cases}

(18)

The regularity estimator in (18) is a slight adaptation of Kassi et al., (2023) by taking the minimum over the index of the basis vectors. This is motivated by Lemma 1, which says that at least one of the two vectors gives us the smallest regularity for the process $X$ . Collecting (16), (17) and (18), a plug-in estimator of (14) is given by

\widehat{g}(\alpha;\Delta)=\left(\frac{\widehat{\theta}_{\mathbf{e_{2}}}(\Delta)\mathbbm{1}_{\widehat{\theta}_{\mathbf{e_{2}}}(\Delta)>0}+\mathbbm{1}_{\widehat{\theta}_{\mathbf{e_{2}}}(\Delta)\leq 0}}{\widehat{\theta}_{\mathbf{e_{1}}}(\Delta)\mathbbm{1}_{\widehat{\theta}_{\mathbf{e_{1}}}(\Delta)>0}+\mathbbm{1}_{\widehat{\theta}_{\mathbf{e_{1}}}(\Delta)\leq 0}}\right)^{1/(2\widehat{\underline{H}}(\Delta))}.

(19)

By construction, the only relevant tuning parameter in (19) is the spacing $\Delta$ . We address this issue in Section 3.2, immediately after Algorithm 1 is presented.
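The plug-in quantities (16) to (19) can be sketched in a few lines of NumPy. This is a simplified illustration rather than the implementation used in the paper: the curves are assumed to be observed on a regular $m\times m$ grid stored in an array of shape `(N, m, m)`, increments between grid points `lag` cells apart stand in for the nearest-neighbor approximations in (15), the closest observed point in (17) is taken along the first grid axis, and the fallback value 1 in (18) is applied as soon as one variation is not positive. All function names are hypothetical.

```python
import numpy as np


def sigma2_hat(Y):
    """Noise variance estimate in the spirit of (17), using neighbours along the first grid axis."""
    d = Y[:, 1:, :] - Y[:, :-1, :]
    return 0.5 * np.mean(d ** 2)


def theta_hat(Y, i, lag, sigma2):
    """Denoised mean-squared variation (16) along e_i (i = 1 or 2), for Delta = lag * grid step."""
    n = Y.shape[i]
    upper = np.take(Y, np.arange(lag, n), axis=i)
    lower = np.take(Y, np.arange(0, n - lag), axis=i)
    return np.mean((upper - lower) ** 2) - 2.0 * sigma2


def h_min_hat(Y, lag, sigma2):
    """Minimum regularity estimate (18); returns 1.0 when a variation is not positive."""
    values = []
    for i in (1, 2):
        t1 = theta_hat(Y, i, lag, sigma2)
        t2 = theta_hat(Y, i, 2 * lag, sigma2)
        if t1 <= 0 or t2 <= 0:
            return 1.0
        values.append((np.log(t2) - np.log(t1)) / (2.0 * np.log(2.0)))
    return min(values)


def g_hat(Y, lag, sigma2):
    """Plug-in estimator (19); non-positive variations are replaced by 1."""
    num = theta_hat(Y, 2, lag, sigma2)
    den = theta_hat(Y, 1, lag, sigma2)
    num = num if num > 0 else 1.0
    den = den if den > 0 else 1.0
    return (num / den) ** (1.0 / (2.0 * h_min_hat(Y, lag, sigma2)))


# Toy inputs: pure noise for the variance estimate, a noiseless plane for the ratio.
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 20, 20))
m = 16
grid = np.arange(m) / m
plane = grid[None, :, None] + 2.0 * grid[None, None, :]  # X(t) = t1 + 2 * t2
print(sigma2_hat(noise), g_hat(plane, lag=2, sigma2=0.0))
```

The toy inputs only exercise the arithmetic: the smooth plane is not a process covered by Definition 1, and rough sample paths would be needed to study the statistical behavior.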

3.2 Identification issues

The estimating equation in (14) reveals two identification problems. The first issue is related to the function $g$ , which can either be the tangent ( $\tan$ ) or cotangent ( $\cot$ ) function depending on the ordering of $H_{1}$ and $H_{2}$ . This reflects the phenomenon that the dominating term arising from the ratio of mean-squared variations differs, depending on whether $H_{1}>H_{2}$ . However, this ordering is unknown a priori.

The second issue is associated with the absolute value on the LHS of (14). By using the estimator in (19), one obtains either $g(\alpha)$ or $g(\pi-\alpha)$ . Similarly, the sign is unknown a priori to the statistician. Fortunately, an identification procedure can be constructed to resolve these two issues.

Let $\mathbf{u}$ be a unit vector represented in the canonical basis as $\mathbf{u}(\beta)=\cos(\beta)\mathbf{e_{1}}+\sin(\beta)\mathbf{e_{2}}$ . By construction, the regularity $H_{\mathbf{u}(\alpha)}$ is maximal when built using the angle $\alpha$ in (14). The true value of $\alpha$ can thus be identified as

\alpha={\arg\max}_{\beta\in\left\{\gamma,\pi-\gamma,\pi/2-\gamma,\pi/2+\gamma\right\}}H_{\mathbf{u}(\beta)},

(20)

where

\gamma\approx\gamma(\Delta)=\text{arccot}\left\{\left(\frac{\theta_{\mathbf{e_{2}}}(\Delta)}{\theta_{\mathbf{e_{1}}}(\Delta)}\right)^{\frac{1}{2\underline{H}(\Delta)}}\right\}.

(21)

The angles $\gamma$ are computed by applying the relevant inverse maps to (19). An estimate of the regularity $H$ along an arbitrary direction $\mathbf{u}(\beta)$ can be obtained by

\widehat{H}_{\mathbf{u}(\beta)}(\Delta)=\begin{cases}\frac{\log(\widehat{\theta}_{\mathbf{u}(\beta)}(2\Delta))-\log(\widehat{\theta}_{\mathbf{u}(\beta)}(\Delta))}{2\log(2)}\qquad\text{if}\qquad\widehat{\theta}_{\mathbf{u}(\beta)}(2\Delta)\geq\widehat{\theta}_{\mathbf{u}(\beta)}(\Delta)>0,\\ 1\qquad\text{otherwise}.\end{cases}

(22)

In order to obtain more robust estimates $\widehat{H}_{\mathbf{u}(\beta)}$ with respect to the spacing $\Delta$ , we propose to compute $\widehat{H}_{\mathbf{u}(\beta)}$ over a grid of points $\mathbf{\Delta}=(\Delta_{1},\dots,\Delta_{K_{0}})^{\top}$ . The angle which maximizes the sum over $\mathbf{\Delta}$ is then selected, yielding the angle

\widehat{\alpha}=\arg\max_{\beta\in\left\{\widehat{\gamma},\pi-\widehat{\gamma},\pi/2-\widehat{\gamma},\pi/2+\widehat{\gamma}\right\}}\sum_{k=1}^{K_{0}}\widehat{H}_{\mathbf{u}(\beta)}(\Delta_{k}).

(23)

As seen from Theorem 1, the spacing $\Delta$ used for identification should be chosen such that it is slower in rate compared to the one used for estimation, providing additional justification for (23). The estimation and identification procedure is summarized in Algorithm 1.

Algorithm 1 Estimation and identification of $\alpha$

1: Data $(Y^{(j)}(\mathbf{t}_{m}),\mathbf{t}_{m})$ , Grid $\widetilde{\mathcal{T}}=\left\{\mathbf{t_{1}},\dots,\mathbf{t_{p}}\right\}$ ; Initialize $\widehat{H}_{\mathbf{u}(\beta)}\leftarrow\emptyset$ ;

2: for $\Delta_{k}\in\mathbf{\Delta}$ do

3: Compute $\widehat{g}(\alpha,\Delta_{k})$ according to (19);

4: $\widehat{\alpha}^{tan}\leftarrow\arctan\left(\widehat{g}(\alpha,\Delta_{k})\right)$ ;

5: $\widehat{\alpha}^{cot}\leftarrow\text{arccot}\left(\widehat{g}(\alpha,\Delta_{k})\right)$ ;

6: $\mathbf{u}(\beta)\leftarrow(\cos(\beta),\sin(\beta))^{\top}$ ; $\triangleright$ $\forall\beta\in\{\widehat{\alpha}^{tan},\widehat{\alpha}^{cot},\pi-\widehat{\alpha}^{tan},\pi-\widehat{\alpha}^{cot}\}$

7: Compute $\widehat{H}_{\mathbf{u}(\beta)}(\Delta_{k})$ according to (22);

8: $\widehat{H}_{\mathbf{u}(\beta)}\leftarrow\widehat{H}_{\mathbf{u}(\beta)}\bigcup\widehat{H}_{\mathbf{u}(\beta)}(\Delta_{k})$ ;

9: end for

10: Compute $\widehat{\alpha}$ according to (23);

11: return $\widehat{\alpha}$

In the estimation of $\widehat{g}(\alpha,\Delta)$ , it is sufficient to choose a fixed spacing $\Delta$ . In order to ensure that the nearest neighbor of $\mathbf{t}+(\Delta/2)\mathbf{e_{i}}$ is distinct from the nearest neighbor of $\mathbf{t}-(\Delta/2)\mathbf{e_{i}}$ when computing $\widehat{\theta}_{\mathbf{e_{i}}}$ in $\widehat{g}(\alpha,\Delta)$ , $\Delta$ should be chosen at least as large as $(2M_{0})^{-1/2}$ . However, as seen from the theoretical results in Section 4, we need $\Delta$ to be strictly larger than the minimum size required to capture one point in order to obtain asymptotic concentration. In the non-asymptotic setting, we propose a conservative choice of $\Delta=M_{0}^{-1/4}$ . Simulation results in Section 5 confirm that this choice works well in practice.
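The identification step (22) and (23) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean-squared variation along an arbitrary direction is supplied through a user-provided callable `theta_fn(beta, delta)` (a hypothetical interface), and the check uses the exact variations of the sum process (8) with an orthonormal basis whose first vector $\mathbf{u_{1}}$ sits at a hypothetical angle `A` from $\mathbf{e_{1}}$ and $H_{1}<H_{2}$.

```python
import math


def h_proxy(theta_fn, beta, delta):
    """Regularity estimate (22) along u(beta); 1.0 when the condition in (22) fails."""
    t1, t2 = theta_fn(beta, delta), theta_fn(beta, 2 * delta)
    if not (t2 >= t1 > 0):
        return 1.0
    return (math.log(t2) - math.log(t1)) / (2 * math.log(2))


def identify_alpha(theta_fn, g_value, deltas):
    """Among the four candidate angles, return the one maximising the summed regularity (23)."""
    a_tan = math.atan(g_value)
    a_cot = math.pi / 2 - a_tan  # arccot(g) for g > 0
    candidates = [a_tan, a_cot, math.pi - a_tan, math.pi - a_cot]
    scores = [sum(h_proxy(theta_fn, b, d) for d in deltas) for b in candidates]
    return candidates[scores.index(max(scores))]


# Exact variations of the sum of two independent fBms, u1 at angle A from e1.
A, H1, H2 = math.pi / 3, 0.3, 0.8


def theta_exact(beta, delta):
    c1 = abs(math.cos(beta - A))  # |<u(beta), u1>|
    c2 = abs(math.sin(beta - A))  # |<u(beta), u2>|
    return (delta * c1) ** (2 * H1) + (delta * c2) ** (2 * H2)


alpha_id = identify_alpha(theta_exact, math.tan(A), deltas=[0.01, 0.02, 0.04])
print(alpha_id, A + math.pi / 2)
```

In this example the selected angle is that of $\mathbf{u_{2}}$ , the direction of maximal regularity, which is the target of the maximization in (5).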

3.3 Correction for the remainder term

Let $\widetilde{\mathcal{F}}(\alpha)=\widetilde{\mathcal{F}}(\alpha,\zeta)=O\left(\Delta^{\zeta\wedge|H_{1}-H_{2}|}\right)$ denote the remainder term in (14). It turns out that $\mathcal{F}$ can become large enough to affect the estimation of $g(\alpha)$ through the constants, an effect which is more pronounced for certain angles $\alpha$ . Let $L_{\overline{\mathbf{u}}}$ be the local Hölder constant in the direction of the maximal regularity $\overline{H}$ , and let $L_{\underline{\mathbf{u}}}$ be the local Hölder constant in the direction orthogonal to $\overline{\mathbf{u}}$ . The explicit form of the remainder term is given by

\widetilde{\mathcal{F}}(\alpha)=\mathcal{F}(\alpha)+O\left(\Delta^{\zeta\wedge 1}\right)=\left(\frac{1+\mathcal{F}_{num}}{1+\mathcal{F}_{denom}}\right)^{\frac{1}{2\underline{H}}}+O\left(\Delta^{\zeta\wedge 1}\right),

(24)

where

\mathcal{F}_{num}=\frac{1}{L_{\underline{\mathbf{u}}}(\Delta)}\frac{L_{\overline{\mathbf{u}}}(\Delta)\left|\cos(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\sin(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\overline{H}}\Delta^{2\overline{H}}+R(\Delta)}{\left|\sin(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\cos(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\underline{H}}\Delta^{2\underline{H}}},

(25)

and

\mathcal{F}_{denom}=\frac{1}{L_{\underline{\mathbf{u}}}(\Delta)}\frac{L_{\overline{\mathbf{u}}}(\Delta)\left|\sin(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\cos(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\overline{H}}\Delta^{2\overline{H}}+R(\Delta)}{\left|\cos(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\sin(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\underline{H}}\Delta^{2\underline{H}}},

(26)

and

R(\Delta)=\mathbb{E}\left[\left\{X\left(\mathbf{t}-\frac{a_{1}\Delta}{2}\mathbf{u_{1}}-\frac{a_{2}\Delta}{2}\mathbf{u_{2}}\right)-X\left(\mathbf{t}+\frac{a_{1}\Delta}{2}\mathbf{u_{1}}-\frac{a_{2}\Delta}{2}\mathbf{u_{2}}\right)\right\}\right.\\ \left.\times\left\{X\left(\mathbf{t}+\frac{a_{1}\Delta}{2}\mathbf{u_{1}}-\frac{a_{2}\Delta}{2}\mathbf{u_{2}}\right)-X\left(\mathbf{t}+\frac{a_{1}\Delta}{2}\mathbf{u_{1}}+\frac{a_{2}\Delta}{2}\mathbf{u_{2}}\right)\right\}\right].

(27)

It is easy to see from (24) that $\mathcal{F}(\pi/4)=1$ , regardless of the ordering of $H_{1}$ and $H_{2}$ . This implies that when $\alpha=\pi/4$ , the estimating equations in (14) are most accurate. However, as $\alpha\rightarrow 0$ or $\alpha\rightarrow\pi/2$ , the estimating equations can be dramatically affected by this remainder term. A plot of $\mathcal{F}(\alpha)$ can be seen in Figure 2.
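The behavior of $\mathcal{F}(\alpha)$ can be illustrated on the sum of two independent fBms from Example 1, for which $R=0$ and the Hölder constants equal one. The sketch below evaluates (24) to (26) under these assumptions, with $H_{1}<H_{2}$ , an orthonormal basis, $\alpha$ read as the angle between $\mathbf{u_{1}}$ and $\mathbf{e_{1}}$ , and the true regularities treated as known. Function names and parameter values are hypothetical.

```python
import math


def f_factor(alpha, delta, h1, h2):
    """F(alpha) from (24)-(26), assuming H1 < H2, unit Holder constants and R = 0."""
    s, c = abs(math.sin(alpha)), abs(math.cos(alpha))
    f_num = (c * delta) ** (2 * h2) / (s * delta) ** (2 * h1)
    f_den = (s * delta) ** (2 * h2) / (c * delta) ** (2 * h1)
    return ((1 + f_num) / (1 + f_den)) ** (1 / (2 * h1))


def raw_ratio(alpha, delta, h1, h2):
    """(theta_{e2} / theta_{e1}) ** (1 / (2 H1)) with the exact variations of X_1 in (8)."""
    s, c = abs(math.sin(alpha)), abs(math.cos(alpha))
    theta_e1 = (delta * c) ** (2 * h1) + (delta * s) ** (2 * h2)
    theta_e2 = (delta * s) ** (2 * h1) + (delta * c) ** (2 * h2)
    return (theta_e2 / theta_e1) ** (1 / (2 * h1))


ALPHA, DELTA, H1, H2 = math.pi / 12, 0.05, 0.3, 0.8  # hypothetical values
raw = raw_ratio(ALPHA, DELTA, H1, H2)
adjusted = raw / f_factor(ALPHA, DELTA, H1, H2)
print(raw, adjusted, math.tan(ALPHA), f_factor(math.pi / 4, DELTA, H1, H2))
```

For this process the adjusted ratio matches $\tan(\alpha)$ up to rounding, while the raw ratio is visibly off at an angle far from $\pi/4$ ; in practice the quantities entering $\mathcal{F}$ have to be estimated.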

The remainder term $R$ is intrinsic to the process $X$ . It can be bounded by the Cauchy-Schwarz inequality to obtain $R=O\left(\Delta^{|H_{1}-H_{2}|}\right)$ . However, this is a pessimistic approach, since $R$ can be equal to zero for processes such as (8). In order to adjust for this possibly non-negligible remainder term, the terms $R$ and $\mathcal{F}(\alpha)$ can be estimated, provided that $2(\overline{H}-\underline{H})<\zeta\wedge 1$ . Since $R$ only involves expectations of the process, it can be estimated as in (16). We now focus on the estimation of $\mathcal{F}$ .

Let $\theta_{\overline{\mathbf{u}}}$ and $\theta_{\underline{\mathbf{u}}}$ be the mean-squared variations such that $H_{\overline{\mathbf{u}}}=\overline{H}$ , and $H_{\underline{\mathbf{u}}}=\underline{H}$ respectively. Since

\frac{L_{\overline{\mathbf{u}}}(\Delta)}{L_{\underline{\mathbf{u}}}(\Delta)}\Delta^{2(\overline{H}-\underline{H})}\approx\frac{\theta_{\overline{\mathbf{u}}}(\Delta)}{\theta_{\underline{\mathbf{u}}}(\Delta)},

(28)

we have

\mathcal{F}_{num}\approx\frac{\theta_{\overline{\mathbf{u}}}(\Delta)}{\theta_{\underline{\mathbf{u}}}(\Delta)}\frac{\left|\cos(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\sin(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\overline{H}}}{\left|\sin(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\cos(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\underline{H}}}+\frac{R(\Delta)}{\theta_{\underline{\mathbf{u}}}(\Delta)},

(29)

and

\mathcal{F}_{denom}\approx\frac{\theta_{\overline{\mathbf{u}}}(\Delta)}{\theta_{\underline{\mathbf{u}}}(\Delta)}\frac{\left|\sin(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\cos(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\overline{H}}}{\left|\cos(\alpha)\mathbbm{1}_{\{H_{1}<H_{2}\}}+\sin(\alpha)\mathbbm{1}_{\{H_{1}>H_{2}\}}\right|^{2\underline{H}}}+\frac{R(\Delta)}{\theta_{\underline{\mathbf{u}}}(\Delta)},

(30)

in the sense that the ratios of the LHS and RHS in (29) and (30) go to 1 in the limit as $\Delta\rightarrow 0$ . $\widehat{\mathcal{F}}(\alpha)$ can be computed by replacing $\theta_{\underline{\mathbf{u}}}$ , $\theta_{\overline{\mathbf{u}}}$ , and $\alpha$ with their estimates obtained by Algorithm 1. An adjusted estimate of $g(\alpha)$ will then be given by

\widehat{\widehat{g}}(\alpha;\Delta)=\widehat{g}(\alpha;\Delta)/\mathcal{F}(\widehat{\alpha}),

(31)

where $\widehat{g}(\alpha,\Delta)$ is computed by (19). The simulation results in Section 5 suggest that using the adjusted estimates in (31) yields significantly better results when $\alpha$ gets further away from $\pi/4$ . In order to avoid introducing greater dependence between quantities, we suggest computing the adjusted estimates only once. Moreover, we suggest setting $R=0$ in practice to decrease the computational load. This seems to produce good results, even when $R\neq 0$ , as seen from the simulation results in the Supplementary Material. Likewise, the final estimate $\widehat{\widehat{\alpha}}$ can be obtained by applying the appropriate inverse function (either $\arctan$ or arccot) selected by the identification process when computing the initial estimates $\widehat{\alpha}$ . This not only saves computational time, but also results in valid inference, as opposed to repeating the identification process using $\widehat{\widehat{\alpha}}$ .

4 Theoretical Properties

In the following, we provide non-asymptotic bounds for our main algorithms, as well as the auxiliary estimates. The proofs are provided in detail in the Supplementary Material. The following mild assumptions are imposed.

Assumptions.

(H1)

Let $X$ be an anisotropic process with the two regularities $(H_{1},H_{2})$ , where $X^{(j)}$ , $1\leq j\leq N$ , are independent realizations of $X$ .
(H2)

The errror terms $e^{(j)}_{m}$ in the data model equation (2) are iid. The random variables $e^{(j)}_{m}$ and X are independents

(H3)

Three positive constants $\mathfrak{a}$ , $\mathfrak{A}$ and $r$ exist such that, for any $\boldsymbol{t}\in\mathcal{T}$ ,

\mathbb{E}\left|X^{(j)}\left(\boldsymbol{t}\right)-X^{(j)}\left(\boldsymbol{s}% \right)\right|^{2p}\leq\frac{p!}{2}\mathfrak{a}\mathfrak{A}^{p-2}\|\boldsymbol% {t}-\boldsymbol{s}\|^{2p\underline{H}}\qquad\forall\boldsymbol{s}\in B(% \boldsymbol{t};r),\;\forall p\geq 1.

(H4)

A constant $\mathfrak{G}$ exists such that

\mathbb{E}(\varepsilon^{2p})\leq\frac{p!}{2}\mathfrak{G}^{p-2}\sigma^{2},% \qquad\forall p\geq 1.

(32)

Remark 1.

Assumption (H3) states that $X$ has sub-Gaussian increments in a local sense, for every point $\mathbf{t}\in\mathcal{T}$ . It is satisfied by the processes considered in the simulations, as well as those presented in Section 2.2. In fact, under Definition 1, Assumption (H3) is equivalent to

\mathbb{E}\left|X^{(j)}\left(\boldsymbol{t}\right)-X^{(j)}\left(\boldsymbol{s}% \right)\right|^{2p}\leq\frac{p!}{2}\mathfrak{a}\mathfrak{A}^{p-2}\mathbb{E}% \left[\left|X^{(j)}\left(\boldsymbol{t}\right)-X^{(j)}\left(\boldsymbol{s}% \right)\right|^{2}\right]^{p},

an assumption which is more familiar and widely used in the literature.

Let $H_{\mathbf{u}(\beta)}(\Delta)$ denote a “proxy" of the directional regularity in (22), given by

H_{\mathbf{u}(\beta)}(\Delta)=\frac{\log\left(\theta_{\mathbf{u}(\beta)}(2% \Delta)\right)-\log\left(\theta_{\mathbf{u}(\beta)}(\Delta)\right)}{2\log(2)}.

(33)

Whenever we write $H_{\mathbf{u}(\beta)}(\Delta)$ with an explicit dependence on $\Delta$ , or mention the proxy, we are referring to the quantity in (33). We start by providing a bound for this quantity for different angles, which provides us with important insight into the behavior of the directional regularity in the “real world", where $\Delta$ cannot be infinitesimally small. See Section 6 for a more detailed discussion. Recall that we are working with $G(\mathbf{t},\Delta)=O(\Delta^{H_{\mathbf{u}(\beta)}+\zeta})$ ; see Definition 1.
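As a quick sanity check on (33), the sketch below evaluates the proxy for an idealized mean-squared variation $\theta_{\mathbf{u}}(\Delta)=L\Delta^{2H}$, first as a pure power law and then with a relative remainder of order $\Delta^{\zeta}$. The function name `proxy_regularity`, the constants and the form of the remainder are assumptions made only for illustration; they are not part of the direg package.

```python
import math

def proxy_regularity(theta, delta):
    """Proxy (33): log-ratio of mean-squared variations at 2*delta and delta."""
    return (math.log(theta(2 * delta)) - math.log(theta(delta))) / (2 * math.log(2))

# Idealized power law theta(d) = L * d^(2H); L and H are illustrative values.
L_const, hurst = 1.7, 0.8
pure = lambda d: L_const * d ** (2 * hurst)

# Same power law with an assumed relative remainder of size d^zeta.
zeta = 0.5
perturbed = lambda d: pure(d) * (1 + d ** zeta)

for delta in (0.1, 0.01, 0.001):
    print(delta, proxy_regularity(pure, delta), proxy_regularity(perturbed, delta))
```

For the pure power law the proxy equals $H$ for every $\Delta$, while for the perturbed version the gap to $H$ shrinks as $\Delta$ decreases, in line with the $O(\Delta^{\zeta})$ statement in (34).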

Proposition 2.

We have

\left|H_{\mathbf{u}(\beta)}-H_{\mathbf{u}(\beta)}(\Delta)\right|=O\left(\Delta% ^{\zeta}\right).

(34)

Suppose that Assumption (H3) holds. Then for any $\Delta$ sufficiently small, and for any pair of angles $(\beta_{1},\beta_{2})\in[0,2\pi]$ , the following bound holds:

\left\|H_{\mathbf{u}(\beta_{1})}(\Delta)-H_{\mathbf{u}(\beta_{2})}(\Delta)\right\|\leq\mathfrak{a}\left\{\frac{2^{1+\underline{H}-2H_{\mathbf{u}(\beta_{2})}}}{L_{\mathbf{u}(\beta_{2})}(\mathbf{t})}\Delta^{2(\underline{H}-H_{\mathbf{u}(\beta_{2})})}+\frac{2^{1-\underline{H}}}{L_{\mathbf{u}(\beta_{1})}(\mathbf{t})}\Delta^{2(\underline{H}-H_{\mathbf{u}(\beta_{1})})}\right\}

(35)

\times(\beta_{1}-\beta_{2})^{2\underline{H}}\times\{1+O\left(\Delta^{\zeta}\right)\}.

(36)

Proposition 3 is a critical ingredient in allowing us to derive rates of convergence associated to the identification process in Algorithm 1.

Proposition 3.

Assume that (H1), (H2), (H3) and (H4) hold true. Recall that $\underline{H}=H_{1}\wedge H_{2}$ and $\overline{H}=H_{1}\vee H_{2}$. For any $\eta$ which satisfies

\eta\geq\frac{4\Delta^{-2\overline{H}}\mathfrak{a}}{\underline{L}}\times\left% \{\left(\sqrt{2}\right)^{-\underline{H}}\left\{\left(\sqrt{2}\right)^{2-% \underline{H}}M_{0}^{-\frac{1}{2}\underline{H}}+2\Delta^{\underline{H}}\right% \}M_{0}^{-\frac{1}{2}\underline{H}}+M_{0}^{-\underline{H}}\right\},

(37)

we have for any $r\in[0,\underline{H})$

\mathbb{P}\left(\sup_{\beta\in[0,2\pi)}\|\widehat{H}_{\mathbf{u}(\beta)}(\Delta)-H_{\mathbf{u}(\beta)}(\Delta)\|\geq 2\eta\right)\leq 4\exp\left(-\frac{1}{4}\times\frac{\eta^{2}N\Delta^{4\underline{H}}}{2\mathfrak{b}+\mathfrak{B}\eta\Delta^{2\underline{H}}}\right)

(38)

+4\exp\left(-\frac{1}{8}\times\frac{\eta^{2}\Delta^{4\underline{H}}N}{8\mathfrak{a}_{1}+\mathfrak{A}_{1}\eta\Delta^{2\underline{H}}}\right),

(39)

where

\mathfrak{A}_{1}=4\max\left\{\frac{\mathfrak{A}}{(2M_{0})^{\underline{H}}},4% \mathfrak{G}\right\},\qquad\mathfrak{a}_{1}=\frac{\mathfrak{a}}{(2M_{0})^{2% \underline{H}}}+2^{4}\sigma^{2},

and

\mathfrak{B}=\left(\frac{\sqrt{2}2^{\underline{H}}\mathfrak{A}}{\log(2)}\right% )^{2}(\underline{H}-r)^{-3}\left\{\sqrt{2}M_{0}^{-1/2}+\Delta\right\}^{2r},% \qquad\mathfrak{b}=\mathfrak{B}^{2}\left(\frac{\mathfrak{a}}{2\mathfrak{A}^{2}% }\right)^{\frac{\underline{H}-r}{1-\underline{H}+r}}.

(40)

The condition on $\eta$ in (37) represents the bias term that arises from performing denoised interpolation in order to compute the mean-squared variations. This bias term is an intrinsic feature of the “discretely observed" setup, where the surfaces are observed only at a finite number of discrete points, instead of in continuous time.

Corollary 1.

Suppose that the assumptions of Proposition 3 hold true, and set

\Delta=\exp\left(-\log(M_{0})^{\xi}\right),\qquad\text{for }0<\xi<1.

(41)

Moreover, we assume that a positive constant $\mathfrak{e}>0$ (independent of $M_{0}$ and $N$) exists such that

\mathfrak{e}^{-1}\leq\frac{\log(M_{0})}{\log(N)}\leq\mathfrak{e}.

(42)

Then

\sup_{\beta\in[0,2\pi)}|\widehat{H}_{\mathbf{u}(\beta)}(\Delta)-H_{\mathbf{u}(\beta)}(\Delta)|\\ =O_{\mathbb{P}}\left(\max\left\{M_{0}^{-\frac{1}{2}\underline{H}}\exp\left((2\overline{H}-\underline{H})\log(M_{0})^{\xi}\right),N^{-\frac{1}{2}}\exp\left(2\underline{H}\log(M_{0})^{\xi}\right)\right\}\right).

(43)

Condition (42) requires that $M_{0}$ lies between two powers of $N$; equivalently, that $N$ lies between two powers of $M_{0}$. The choice of $\Delta$ in (41) is made in such a way that $\log(M_{0})^{-b}\gg\Delta(M_{0})\gg M_{0}^{-b}$ for any fixed $b>0$; the same choice can be found in Golovkine et al., (2023) with $\xi=1/3$. The rate in (43) converges to $0$ because $\Delta$ is chosen to be larger than any negative power of $M_{0}$, keeping (42) in mind.
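To make the choice (41) concrete, the following sketch evaluates $\Delta=\exp(-\log(M_{0})^{\xi})$ for a few values of $M_{0}$ and prints $M_{0}^{-1/4}$ next to it. The function name `spacing` and the particular values of $M_{0}$ are illustrative assumptions. Note that the ordering $\Delta\gg M_{0}^{-b}$ is an asymptotic statement: for moderate $M_{0}$ the two quantities can be of similar size.

```python
import math

def spacing(m0, xi=1 / 3):
    """Delta from (41): exp(-log(M0)^xi), with 0 < xi < 1."""
    if not 0 < xi < 1:
        raise ValueError("xi must lie in (0, 1)")
    return math.exp(-math.log(m0) ** xi)

# Illustrative sample sizes; the first two match the simulation settings.
for m0 in (51**2, 101**2, 10**6, 10**12):
    print(m0, spacing(m0), m0 ** -0.25)
```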

Proposition 4.

Under the assumptions of Corollary 1, three positive constants $C_{1}$ , $C_{2}$ and $\mathfrak{u}$ exist such that for any $\varepsilon$ satisfying

\frac{|\log(\Delta)|}{\mathfrak{u}\Delta^{2\underline{H}}}\geq\varepsilon\geq% \mathfrak{u}\max\{\Delta^{-3\underline{H}}|\log(\Delta)|M_{0}^{-\frac{1}{2}% \underline{H}},\Delta^{\zeta\wedge|H_{1}-H_{2}|}\},

(44)

we obtain

\mathbb{P}\left(|\widehat{g(\alpha,\Delta)}-g(\alpha)|\geq\varepsilon\right)% \leq C_{1}\exp\left(-C_{2}\varepsilon^{2}N\frac{\Delta^{8\underline{H}}}{\log^% {2}(\Delta)}\right).

(45)

where $g$ is defined in Proposition 1.

The condition on $\varepsilon$ in (44) represents the remainder term in Proposition 1 combined with the condition (37) on $\eta$ in Proposition 3. This condition on $\varepsilon$ can be satisfied by choosing $\Delta$ as in (41), provided that $M_{0}$ is large enough. The probability bound (45) converges to zero as long as we choose $\Delta$ as in (41) and the condition (42) is satisfied.

Theorem 1.

Suppose that the assumptions of Proposition 3 hold true. Moreover, assume that for any $k\in\{1,\dots,K_{0}\}$ we have

\Delta_{k}\underset{\Delta\rightarrow 0}{=}o\left(\Delta^{\frac{\zeta\wedge\{% \overline{H}-\underline{H}\}}{\overline{H}-\underline{H}}\underline{H}}\right).

(46)

Then for any $\varepsilon$ such that $\Delta^{\zeta\wedge\{\overline{H}-\underline{H}\}}\ll\varepsilon$ , we have for $M_{0}$ sufficiently large

\mathbb{P}\left(|\widehat{\alpha}-\alpha|\geq 2\varepsilon\right)\leq\mathbb{P% }\left(|\widehat{g(\alpha,\Delta)}-g(\alpha)|\geq\varepsilon\right)\\ +\mathbb{P}\left(|\widehat{g(\alpha,\Delta)}-g(\alpha)|^{2\underline{H}}\geq% \frac{\underline{L}(\overline{H}-\underline{H})}{4\mathfrak{a}2^{3-\underline{% H}}K_{0}}\right)\\ +\mathbb{P}\left(\sup_{\beta\in[0,2\pi)}|\widehat{H}_{\mathbf{u}(\beta)}(% \Delta)-H_{\mathbf{u}(\beta)}(\Delta)|\geq\frac{\overline{H}-\underline{H}}{8K% _{0}}\right).

(47)

The convergence of the RHS in (47) can be guaranteed following the discussion provided for Proposition 3, Proposition 4 and Corollary 1. The main message of Theorem 1 is that, in order to identify the right angle that gives the maximal regularity, we need to choose the values in the grid $\mathbf{\Delta}$ introduced in (23) to have a slower rate of decrease than the $\Delta$ used to estimate the function $g$ in (19). This is because, in (46), we have $(\zeta\wedge\{\overline{H}-\underline{H}\})\{\overline{H}-\underline{H}\}^{-1}\underline{H}<1$.

5 Numerical Properties

We begin this section by briefly describing a simple, novel simulator for a class of bivariate anisotropic processes. The simulator, along with the other algorithms mentioned in this paper, is implemented in the R package direg (freely accessible at https://github.com/sunnywang93/direg).

5.1 Anisotropic simulator

Let $B_{1}$, $B_{2}$ be two processes with Hurst exponents $H_{1}$ and $H_{2}$ respectively. Suppose without loss of generality that $H_{1}\neq H_{2}$. In the following, our exposition will be structured assuming that $B_{1}$ and $B_{2}$ are fractional Brownian motions (fBms). In principle, our simulation approach can be generalized to other processes that exhibit the stationary increments property.

Many methods are available to simulate these types of processes, and we do not attempt to provide an exhaustive list. For examples of some papers, see Wood and Chan, (1994), Stein, (2002) and Coeurjolly and Porcu, (2018). A survey can be found in Dieker, (2004). In particular, the circulant embedding method described in Wood and Chan, (1994) is attractive due to its speed, and can be adapted to simulate processes such as the fBm with stationary increments.

The anisotropic component of the simulator will be our main focus, since there is already a wide array of methods available for simulating the individual processes. Let $\mathbf{u_{1}}$ and $\mathbf{u_{2}}$ be unit vectors with associated regularities $H_{1}$ and $H_{2}$ . The vectors can be represented in the canonical basis as $\mathbf{u_{1}}=\cos(\alpha)\mathbf{e_{1}}+\sin(\alpha)\mathbf{e_{2}}$ and $\mathbf{u_{2}}=-\sin(\alpha)\mathbf{e_{1}}+\cos(\alpha)\mathbf{e_{2}}$ . The main difficulty of simulating processes on an equally spaced grid along $\mathbf{u_{1}}$ , $\mathbf{u_{2}}$ is the existence of negative values when $\alpha\in[0,3\pi/2]$ , while the fBm has a domain in $\mathbb{R}_{+}$ .

This problem can be resolved by exploiting the stationary increments property of the fBm, given by $B(t)-B(s)\stackrel{{\scriptstyle d}}{{=}}B(t-s)$ , where $\stackrel{{\scriptstyle d}}{{=}}$ denotes equality in distribution. By using the reference point $t=0$ , we obtain

-B(s)\stackrel{{\scriptstyle d}}{{=}}B(-s).

(48)

An anisotropic fBm can thus be simulated for any $\alpha\in[0,\pi]$ by first simulating an fBm on an equally spaced grid in $[0,|\cos(\alpha)|+\sin(\alpha)]$, before transforming the negative part using (48). A bivariate process can finally be constructed by applying a function $f$ to the individual processes, for example $f(B_{1},B_{2})=B_{1}+B_{2}$. Our simulator is similar in spirit to the turning bands approach; see for example Matheron, (1973). Pseudocode for our simulator can be found in Algorithm SM.1 of the Supplementary Material.
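The construction above can be sketched in a few lines. The block below is only one possible reading of the description, not a transcription of Algorithm SM.1: it draws each fBm on $[0,|\cos(\alpha)|+\sin(\alpha)]$ by a Cholesky factorization (chosen for brevity instead of circulant embedding), evaluates it at the projections $\langle\mathbf{t},\mathbf{u_{1}}\rangle$ and $\langle\mathbf{t},\mathbf{u_{2}}\rangle$ by nearest-grid lookup, and handles negative projections with the odd reflection $B(-s):=-B(s)$, which reproduces (48) marginally. All function names, grid sizes and the jitter constant are assumptions.

```python
import numpy as np

def fbm_cholesky(grid, hurst, rng):
    """Exact fBm sample at strictly positive points via a Cholesky factor."""
    s, t = np.meshgrid(grid, grid, indexing="ij")
    cov = 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - np.abs(s - t) ** (2 * hurst))
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(grid)))  # small jitter
    return chol @ rng.standard_normal(len(grid))

def anisotropic_sum_fbm(alpha, h1, h2, n_side=21, n_fine=200, seed=0):
    """Sum of two fBms indexed by the projections on u1(alpha) and u2(alpha)."""
    rng = np.random.default_rng(seed)
    length = abs(np.cos(alpha)) + np.sin(alpha)
    fine = np.linspace(0.0, length, n_fine + 1)
    # Prepend B(0) = 0 to each simulated path.
    b1 = np.concatenate([[0.0], fbm_cholesky(fine[1:], h1, rng)])
    b2 = np.concatenate([[0.0], fbm_cholesky(fine[1:], h2, rng)])

    axis = np.linspace(0.0, 1.0, n_side)
    t1, t2 = np.meshgrid(axis, axis, indexing="ij")
    p1 = np.cos(alpha) * t1 + np.sin(alpha) * t2    # <t, u1>
    p2 = -np.sin(alpha) * t1 + np.cos(alpha) * t2   # <t, u2>

    def evaluate(path, proj):
        idx = np.rint(np.abs(proj) / length * n_fine).astype(int)
        idx = np.clip(idx, 0, n_fine)
        # Odd reflection for negative projections (assumed reading of (48)).
        return np.sign(proj) * path[idx]

    return evaluate(b1, p1) + evaluate(b2, p2)

surface = anisotropic_sum_fbm(np.pi / 3, h1=0.8, h2=0.5)
print(surface.shape)
```

The Cholesky step costs $O(n^{3})$ in the number of fine grid points, so this sketch is suitable only for small grids; a circulant embedding routine would replace `fbm_cholesky` in any serious use.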

5.2 Parameter settings and error measures

Observations $(Y^{(i)}(\mathbf{t}_{m}),\mathbf{t}_{m}),1\leq i\leq N,1\leq m\leq M_{0}$ were simulated using our algorithm for the sum of two fBms. Other processes (e.g. the product of two fBms) can be found in the Supplementary Material.

A total of 60 different parameter configurations were explored, consisting of all possible combinations of the following parameter sets: number of curves $N\in\{100,150\}$, number of points along each curve $M_{0}\in\{51^{2},101^{2}\}$, noise level $\sigma\in\{0.1,0.5,1\}$, and angles $\alpha\in\{\pi/30,\pi/5,\pi/4,\pi/3,\pi/2-\pi/30\}$. In both processes, the regularities were fixed to be $H_{1}=0.8$ and $H_{2}=0.5$. In line with our discussion in Section 3.2, the parameter $\Delta$ was set to be $\Delta=M_{0}^{-1/4}$. The grid of spacings $\mathbf{\Delta}$ involved in the identification process was set to $\mathbf{\Delta}=\{M_{0}^{-1/4},\Delta_{1},\dots,\Delta_{K_{0}-1},0.4\}$, an evenly spaced grid consisting of $K_{0}=15$ points. The absolute error is used as a risk measure for each experiment, given by

\mathcal{R}_{\alpha}=|\widehat{\widehat{\alpha}}-\alpha|,

(49)

where $\widehat{\widehat{\alpha}}$ is the adjusted estimate given by (31). Parameter settings for anisotropic detection can be found in Table 1.
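The experimental design described above can be enumerated directly. The sketch below lists the parameter configurations, builds the evenly spaced grid of spacings between $M_{0}^{-1/4}$ and $0.4$, and defines the absolute error (49). Variable and function names are illustrative assumptions and do not mirror the direg package.

```python
import itertools
import math
import numpy as np

n_curves = (100, 150)
n_points = (51**2, 101**2)
noise_levels = (0.1, 0.5, 1.0)
angles = (math.pi / 30, math.pi / 5, math.pi / 4, math.pi / 3,
          math.pi / 2 - math.pi / 30)

# All combinations: 2 * 2 * 3 * 5 configurations.
configs = list(itertools.product(n_curves, n_points, noise_levels, angles))
print(len(configs))

def spacing_grid(m0, k0=15, upper=0.4):
    """Evenly spaced grid of k0 spacings from M0^(-1/4) up to `upper`."""
    return np.linspace(m0 ** -0.25, upper, k0)

def absolute_risk(alpha_hat, alpha):
    """Risk measure (49)."""
    return abs(alpha_hat - alpha)
```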

5.3 Empirical Results

Boxplots for the simulation results can be seen in Figure 3. The risk can be seen to be very small, below 0.1 for all different configurations, even in the extremely high noise setting $\sigma=1$. In fact, we see that our algorithms are fairly robust to noise, in the sense that the risk increases less than proportionately to the noise level. As seen in Section 7, the levels of accuracy observed in Figure 3 are sufficient to achieve lower risk levels for applications such as smoothing.

6 Discussion

In this section we discuss some key aspects of our methodology. Extensions of our approach to more general settings, such as the random design framework with heteroscedastic noise, can be found in the Supplementary Material.

6.1 Computational Considerations

The computational complexity of Algorithm 1 is driven primarily by the identification process, due to the additional loop over the grid of spacings $\mathbf{\Delta}$. Since we are performing a linear grid search in each basis vector containing $\sqrt{M_{0}}$ points, the complexity of nearest neighbor interpolation for each surface is $O(M_{0})$, resulting in $O(N\times M_{0})$ for $N$ sample paths. By searching over $M_{0}$ points to determine the closest point to each $\mathbf{t}_{m}$ in terms of the $\ell_{1}$ norm, the complexity of computing $\widehat{\sigma}^{2}$ is $O(N\times M_{0})$, so the complexity of $\widehat{H}_{\mathbf{v}(\beta)}(\mathbf{t},\Delta)$ is $O(N\times M_{0})$ for each $\mathbf{t}$, $\Delta$. The complexity of the full identification process is therefore $O(N\times\#\widetilde{\mathcal{T}}\times K_{0}\times M_{0})$. In the context of functional data analysis (FDA), a complexity of at least $O(N\times\#\widetilde{\mathcal{T}}\times M_{0})$ is to be expected. Since $K_{0}$ is relatively small (e.g. 15 points), the additional cost of our algorithm is modest, considering the possible gains in the rates of convergence.

Clearly, the simulation described in Section 5.1 depends on the method used to generate the individual fBms. Once the fBms are given, only a linear search over $\#\widetilde{\mathcal{T}}$ grid points is required. Using a fast simulation method such as the circulant embedding method in Wood and Chan, (1994), individual one-dimensional curves can be simulated in $O(\{\#\widetilde{\mathcal{T}}\times\log(\#\widetilde{\mathcal{T}})\}^{1/2})$ time. Thus the simulation of $N$ surfaces can be done in $O(N\times\#\widetilde{\mathcal{T}}\times\log(\#\widetilde{\mathcal{T}}))$. To the best of our knowledge, no simulation algorithm in the context of FDA has better computational complexity.

6.2 Singularity along the maximizing direction

Lemma 1 states that there is a singularity present in the concept of directional regularity, in the sense that there exists only one direction for which the regularity is maximal. Since the angles $\alpha$ are not expected to be estimated perfectly, why can there be a gain in convergence rates, as seen in the empirical results in Section 5.3?

The answer lies in the intrinsic nature of directional regularity, which is itself intimately related to the mean-squared variations of a process. It is an “infinitesimal" concept, in the sense that

\lim_{\Delta\rightarrow 0}\frac{\theta_{\mathbf{u}}(\mathbf{t})}{L_{\mathbf{u}% }(\mathbf{t})\Delta^{2H_{\mathbf{u}}}}=1.

(50)

However, the limiting case cannot be obtained with finite precision computers. In reality, the “true" directional regularity that one observes is given by the proxy $H_{\mathbf{u}(\beta)}(\Delta)$. This partly explains the observed “continuity" associated to the directional regularity in practice, so that the estimation error of $\alpha$ only needs to be sufficiently small in order for gains in the rate to be achieved.

A more precise, formal perspective can be taken through the lens of Proposition 2, found in the Supplementary Material. In the upper bound of $H_{\mathbf{u}(\beta_{1})}-H_{\mathbf{u}(\beta_{2})}$ , we observe two competing terms, in the form of $\Delta^{2(\underline{H}-H_{\mathbf{u}(\beta_{2})})}$ and $(\beta_{1}-\beta_{2})^{2\underline{H}}$ . In order for $H_{\mathbf{u}(\beta_{1})}-H_{\mathbf{u}(\beta_{2})}$ to converge to zero, it is necessary that the condition

\lim_{\Delta\rightarrow 0}\frac{(\beta_{1}-\beta_{2})^{2\underline{H}}}{\Delta% ^{2(H_{\mathbf{u}(\beta_{2})}-\underline{H})}}=0

(51)

is satisfied. In other words, $\beta_{1}-\beta_{2}=o(\Delta^{H_{\mathbf{u}(\beta_{2})}/\underline{H}-1})=o(1)$. Condition (51) can be loosely interpreted as saying “for directions that are close enough to each other, the proxies of the directional regularity in (22) stay somewhat close"; see Proposition 2.

7 Applications

In this section, we discuss concrete applications of our directional regularity methodology. They serve as a strong motivation for performing a change of basis using our methodology as a standard pre-processing step.

7.1 Smoothing Bivariate Functional Data

Let $\{X(\mathbf{t}),\mathbf{t}\in\mathcal{T}\subset\mathbb{R}^{2}\}$ be a bi-variate stochastic process satisfying (3), with maximizing regularity along the direction $\mathbf{u_{1}}=\cos(\alpha)\mathbf{e_{1}}+\sin(\alpha)\mathbf{e_{2}}$ . Following the preceding sections, our exposition will be centered around the common design case, although this is not necessary. Observations come in the form of pairs $\mathcal{D}_{0}=(Y_{m}^{(i)},\mathbf{t}_{m}),1\leq m\leq M_{0},1\leq i\leq N$ , generated under

Y^{(i)}_{m}=X^{(i)}(\mathbf{t}_{m})+\sigma e^{(i)}_{m},\qquad 1\leq m\leq M_{0% },1\leq i\leq N,

(52)

where $e_{m}^{(i)}$ are i.i.d centered random variables with unit variance. We call $\mathcal{D}_{0}$ the learning set.

Consider a new realization $X^{new}$ of $X$ , where the observed pairs $\mathcal{D}_{1}=(Y^{new}_{m},\mathbf{t}_{m}),1\leq m\leq M_{0}$ are generated from

Y^{new}_{m}=X^{new}(\mathbf{t}_{m})+\sigma e^{new}_{m},\quad\quad 1\leq m\leq M% _{0},

(53)

where $e_{m}^{new}$ are i.i.d centered random variables with unit variance and $X^{new}$ and $e_{m}^{new}$ are independent. We refer to $\mathcal{D}_{1}$ as the online set.

Our goal is the recovery of $X^{new}({\mathbf{t}_{m}})$ with $\mathcal{D}_{0}$, using a suitable estimator $\widehat{X}^{new}({\mathbf{t}_{m}})$. Several methodologies for smoothing multivariate functional data currently exist. For example, Ramsay, (2002), Sangalli et al., (2013), Wood et al., (2008) consider spatial regression using a penalty that involves the Laplacian. These do not take the anisotropy of the process into account. Azzimonti et al., (2015) and Bernardi et al., (2018) extend these previous approaches by using a penalty term involving a partial differential operator, and take the anisotropy of the process into account. All these methods assume that the processes are differentiable for the penalty to be well defined. This assumption does not always hold in practice, and we propose an approach which works for non-differentiable processes.

The parameters of $\widehat{X}^{new}({\mathbf{t}_{m}})$ are estimated from the learning set. Let $R_{\alpha}$ be a clockwise rotation matrix given by

R_{\alpha}=\begin{pmatrix}\cos(\alpha)&\sin(\alpha)\\ -\sin(\alpha)&\cos(\alpha)\end{pmatrix}.

(54)

Define a new process by applying the rotation matrix to the sampling points, denoted by

Z(\mathbf{t}):=X(R_{\alpha}^{-1}\cdot\mathbf{t}),\qquad\forall\mathbf{t}\in% \mathcal{T}.

(55)

It can be seen easily that $Z(\mathbf{t})$ admits a larger effective smoothness along the canonical basis. It is thus beneficial to work with $Z$ instead of $X$ on the canonical basis. In practice, this transformed process can be obtained by performing a change-of-basis after estimating the rotation matrix $R_{\alpha}$ using our methodology.

For simplicity, we fix the smoother to be the Nadaraya-Watson estimator with the multiplicative Epanechnikov kernel $K:\mathbb{R}^{2}\to\mathbb{R}_{+}$ , supported on $[-1,1]\times[-1,1]$ , where

K(\mathbf{s})=K_{ep}(s_{1})\times K_{ep}(s_{2}),\quad\forall\mathbf{s}=(s_{1},% s_{2})\in\mathbb{R}^{2},\quad\text{and }K_{ep}(x)=\frac{3}{4}(1-x^{2})\mathbbm% {1}_{\{|x|\leq 1\}}.

(56)

Furthermore, $\mathbf{B}=\operatorname{diag}(h_{1}^{-1},h_{2}^{-1})$ is a positive definite bandwidth matrix. Using the rule $0/0=0$ , the Nadaraya-Watson estimator is given by

\widehat{Z}^{new}(\mathbf{t};\mathbf{B})=\sum_{m=1}^{M_{0}}Y^{new}_{m}\frac{K% \left(\mathbf{B}(R_{\alpha}\mathbf{t}^{new}_{m}-\mathbf{t})\right)}{\sum_{m=1}% ^{M_{0}}K\left(\mathbf{B}(R_{\alpha}\mathbf{t}^{new}_{m}-\mathbf{t})\right)}.

(57)

The bandwidths should adapt to the regularity of the process $Z$ to obtain the optimal rate of convergence. In particular, the bandwidths should be selected adaptively to the intrinsic anisotropy of the process. This can be achieved through a plug-in bandwidth rule when explicit risk bounds are available and directly estimable from the data. The estimate of $X^{new}$ is then obtained by

\widehat{X}^{new}(\mathbf{t},\mathbf{B})=\widehat{Z}^{new}(R_{\alpha}\mathbf{t% },\mathbf{B}),\qquad\forall\mathbf{t}\in\mathcal{T}.

(58)
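A minimal sketch of (54)-(58) is given below, assuming the angle $\alpha$ (or its estimate) and the bandwidths are already available. It rotates the sampling points, applies the Nadaraya-Watson smoother with the multiplicative Epanechnikov kernel (56), and evaluates the result at $R_{\alpha}\mathbf{t}$. The function names are hypothetical, and the brute-force weight matrix costs $O(M_{0})$ per evaluation point.

```python
import numpy as np

def rotation(alpha):
    """Clockwise rotation matrix R_alpha from (54)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, s], [-s, c]])

def epanechnikov(x):
    """Univariate Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - x**2) * (np.abs(x) <= 1.0)

def nw_rotated(t_eval, t_obs, y_obs, alpha, h1, h2):
    """X_hat(t) = Z_hat(R_alpha t) with B = diag(1/h1, 1/h2), as in (57)-(58)."""
    rot = rotation(alpha)
    z_obs = t_obs @ rot.T      # rows are R_alpha t_m
    z_eval = t_eval @ rot.T    # rows are R_alpha t
    diff = z_obs[None, :, :] - z_eval[:, None, :]
    weights = epanechnikov(diff[..., 0] / h1) * epanechnikov(diff[..., 1] / h2)
    num = weights @ y_obs
    den = weights.sum(axis=1)
    # Convention 0/0 = 0 when no observation falls in the kernel window.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Toy usage on a regular grid with a constant signal.
axis = np.linspace(0.0, 1.0, 11)
t_obs = np.array([(a, b) for a in axis for b in axis])
y_obs = np.full(len(t_obs), 2.0)
print(nw_rotated(np.array([[0.5, 0.5]]), t_obs, y_obs, np.pi / 3, 0.3, 0.3))
```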

Let $B(\mathbf{0},r)$ denote the ball centered at the origin of $\mathbb{R}^{2}$ with radius $r$ . Consider the $L^{2}$ risk of $\widehat{Z}^{new}$ , given by

\mathcal{R}\left(\textbf{B},M_{0}\right)=\mathbb{E}\left[\|\widehat{Z}^{new}(% \cdot\hskip 2.84544pt;\textbf{B})-Z^{new}(\cdot)\|_{2}\right]=\mathbb{E}\left[% \|\widehat{X}^{new}(\cdot\hskip 2.84544pt;\textbf{B})-X^{new}(\cdot)\|_{2}% \right],

(59)

where the second equality follows by substitution and the fact that the matrix $R_{\alpha}$ has determinant 1. The pointwise risk is given by

\mathcal{R}\left(\mathbf{t},\textbf{B},M_{0}\right)=\mathbb{E}\left[\left|% \widehat{X}^{new}(\mathbf{t};\textbf{B})-X^{new}(\mathbf{t})\right|^{2}\right],

(60)

Observe that Fubini’s theorem implies $\mathcal{R}\left(\textbf{B},M_{0}\right)=\|\mathcal{R}\left(\cdot,\textbf{B},M% _{0}\right)\|,$ so the integrated risk can readily be recovered from the pointwise risk. Proposition 5 provides a pointwise risk bound. Assumptions can be found in the Appendix.

Proposition 5.

Let $h_{1}>0$ and $h_{2}>0$ be in a bandwidth range satisfying

\max\{h_{1},h_{2}\}\rightarrow 0,\qquad\text{and}\quad\sqrt{M_{0}}\times\min\{% h_{1},h_{2}\}\rightarrow+\infty.

(61)

Then the following risk bound holds:

\mathcal{R}(\textbf{B},M_{0})\lesssim\left\{\frac{1}{M_{0}h_{1}h_{2}}+h_{1}^{2% H_{1}}+h_{2}^{2H_{2}}\right\}.

(62)

The proof is provided in the Supplementary Material. The following corollary provides the optimal bandwidth with respect to the $L^{1}$ norm.

Corollary 2.

Under the same assumptions as Proposition 5, the optimal bandwidths $h_{1}^{*}$ and $h_{2}^{*}$ satisfy

h_{1}^{*}\asymp\left(\frac{1}{M_{0}}\right)^{\frac{H_{2}}{2H_{1}H_{2}+H_{1}+H_% {2}}},\quad h_{2}^{*}\asymp\left(\frac{1}{M_{0}}\right)^{\frac{H_{1}}{2H_{1}H_% {2}+H_{1}+H_{2}}}.

(63)

Moreover, if $h_{1}^{*}$ and $h_{2}^{*}$ are used for smoothing, the following risk is obtained:

\mathcal{R}(\textbf{B}^{*},M_{0})\lesssim M_{0}^{-\frac{2\omega}{2\omega+1}},

(64)

where $\omega$ is the “effective smoothness", defined by the relation

\frac{1}{H_{1}}+\frac{1}{H_{2}}=\frac{1}{\omega}.

(65)
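The rates in Corollary 2 follow from balancing the three terms of the bound (62). The short sketch below computes the bandwidths (63) up to constants and checks numerically that $h_{1}^{2H_{1}}$, $h_{2}^{2H_{2}}$ and $(M_{0}h_{1}h_{2})^{-1}$ all coincide with $M_{0}^{-2\omega/(2\omega+1)}$. Function names are illustrative assumptions, and all multiplicative constants are set to one.

```python
import math

def optimal_bandwidths(m0, h1_reg, h2_reg):
    """Bandwidth orders from (63), with constants set to one."""
    denom = 2 * h1_reg * h2_reg + h1_reg + h2_reg
    return m0 ** (-h2_reg / denom), m0 ** (-h1_reg / denom)

def effective_smoothness(h1_reg, h2_reg):
    """omega defined by 1/H1 + 1/H2 = 1/omega, see (65)."""
    return 1.0 / (1.0 / h1_reg + 1.0 / h2_reg)

def risk_exponent(h1_reg, h2_reg):
    """Exponent 2*omega / (2*omega + 1) appearing in (64)."""
    omega = effective_smoothness(h1_reg, h2_reg)
    return 2 * omega / (2 * omega + 1)

m0, reg1, reg2 = 101**2, 0.8, 0.5
bw1, bw2 = optimal_bandwidths(m0, reg1, reg2)
rate = m0 ** (-risk_exponent(reg1, reg2))
print(bw1 ** (2 * reg1), bw2 ** (2 * reg2), 1 / (m0 * bw1 * bw2), rate)
```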

All the quantities in Corollary 2 are estimable from the data using our methodology. Structural adaptation enables the optimal bandwidths to be chosen according to Corollary 2, obtaining the anisotropic rates of convergence, when the intrinsic anisotropy is not in the direction of the canonical basis. By working only on the canonical basis, the rate of convergence that is generally obtained corresponds to the isotropic rate. This is confirmed by our simulation study, which we discuss now.

Our simulation study aims to compare the $L^{1}$ risk of smoothing, with and without a change of basis. Surfaces corresponding to the sum of fBms with regularities $H_{1}=0.8$, $H_{2}=0.5$ were simulated using the algorithm described in Section 5.1. The angles $\alpha$ were given by $\alpha\in\{\pi/3,5\pi/6\}$. $N=150$ surfaces were generated with $M_{0}=101$ evenly spaced sampling points. Gaussian noise $\sigma e_{m}^{(i)}$ was added, where $e_{m}^{(i)}\stackrel{{\scriptstyle iid}}{{\sim}}\mathcal{N}(0,1)$ and $\sigma=0.05$. Estimates of $\alpha$ were obtained with Algorithm 1, with $\Delta=M_{0}^{-1/4}$, $\mathbf{\Delta}=\{M_{0}^{-1/4},\Delta_{1},\dots,\Delta_{K_{0}-1},0.4\}$, and $\#\mathbf{\Delta}=15$.

The online set, consisting of one surface per replication, was generated from the same process, for a total of 400 replications. The true surface was first generated on an equally spaced grid of 201 points on $\mathcal{T}$ without noise. Nearest neighbor interpolation was then performed to discretize the process onto a grid of 101 points. The same Gaussian noise was added as for the learning set.

For the anisotropic setup, a change of basis is performed on the online set. The minimum and maximum regularities were estimated using (18) and (22). The bandwidths were selected according to Corollary 2. In the isotropic setup, no change of basis is performed, and the bandwidths were similarly selected according to Corollary 2, with $H_{1}=H_{2}$. A multiplicative kernel was used, where the Epanechnikov kernel was used in each dimension.

The risk measure was taken to be the relative risk, given by

\mathcal{R}_{rel}=\frac{\mathcal{R}_{ani}}{\mathcal{R}_{iso}},

(66)

where $\mathcal{R}_{ani}$ and $\mathcal{R}_{iso}$ correspond to the $L^{2}$ anisotropic and isotropic risk respectively. Simulation results can be seen in Figure 4. We see that with the exception of the angles near the boundary (i.e. 0 or $\pi/2$), performing a change of basis leads to a reduction in the $L^{2}$ risk by as much as 10%. It is not surprising that at the boundary, the risk is worse, since one is already anisotropic along the canonical basis.

7.2 Anisotropic detection

Our focus so far has been on estimating the direction of the maximizing regularity, which implicitly assumes that anisotropy is present. When the process is intrinsically isotropic, no gains can be made by structural adaptation since the regularity of the process $X$ is invariant to directions. However, there is no real loss in terms of statistical error either, since the underlying regularity remains the same, as implied by Lemma 1. In particular, the rates of convergence do not get degraded, and can only be improved with structural adaptation.

Nevertheless, there might be instances where knowing the presence of anisotropy is relevant for some applications. Some examples include fingerprint verification (Jain et al., 1997; Jiang, 2005) and texture analysis in materials science (Germain et al., 2003). This issue can be naturally addressed using the directional regularity approach, by constructing an anisotropic detection procedure based on thresholding. The underlying idea is to consider an event

\mathcal{A}(\tau):=\left\{\left|\widehat{\underline{H}}-\widehat{\overline{H}}% \right|>\tau\right\},

(67)

and determine anisotropy based on $\mathbbm{1}_{\mathcal{A}(\tau)}$ , for some threshold $\tau$ that is appropriately chosen. In particular, $\tau$ should be selected to account for the estimation errors of $\widehat{\underline{H}}$ and $\widehat{\overline{H}}$ .

Let $\underline{\varepsilon}=|\widehat{\underline{H}}-\underline{H}|$ and $\overline{\varepsilon}=\overline{H}-\underline{H}$ denote the estimation errors of the minimum and maximum regularities respectively. By construction, we have $\overline{\varepsilon}\geq\underline{\varepsilon}$, with strict inequality holding in general, since $\widehat{\underline{H}}$ is used as an auxiliary quantity for estimating $\widehat{\overline{H}}$. A sensible approach is thus to choose $\tau$ such that $\underline{\varepsilon}<\tau<\overline{\varepsilon}$. On the one hand, when $\underline{H}=\overline{H}$ (i.e. isotropy), choosing $\tau>\underline{\varepsilon}$ accounts for the estimation error so that $\mathbbm{1}_{\mathcal{A}(\tau)}=0$ with high probability. On the other hand, when $\underline{H}\neq\overline{H}$, choosing $\tau<\overline{\varepsilon}$ is necessary to avoid falsely detecting isotropy. An illustration of this can be seen in Figure 5.

Figure 5: Illustration of the region in which the thresholding parameter $\tau$ should fall.

Since $\underline{\varepsilon}$ is unknown in practice, we construct a data-driven procedure to estimate it, which we denote $\widehat{\underline{\varepsilon}}$ . Let $\beta=\{\beta_{1},\dots,\beta_{J}\},0<J<\infty$ be a random set of angles, where $J=J(N,M_{0})$ is an integer that depends on the sample size. Let $\beta^{\perp}=\{\beta_{1}^{\perp},\dots,\beta_{J}^{\perp}\}$ be the set of angles orthogonal to $\beta$ , where $\beta_{j}^{\perp}=\beta_{j}+\pi/2$ , $\forall j=1,\dots,J$ . The main idea is to estimate the minimum regularity associated to each $\beta_{j}$ and $\beta_{j}^{\perp}$ , and compute the average difference between these two estimates, given by

\widehat{\underline{\varepsilon}}=\frac{1}{J}\sum_{j=1}^{J}\underline{% \varepsilon}_{j},

(68)

where for all $j=1,\dots,J$ , we have

\underline{\varepsilon}_{j}=\left|\widecheck{H}_{\mathbf{u}(\beta_{j})}-% \widecheck{H}_{\mathbf{u}(\beta_{j}^{\perp})}\right|,\text{ and }\widecheck{H}% _{\mathbf{u}(\beta)}=\frac{1}{\#\mathbf{\Delta}}\sum_{k=1}^{K_{0}}\widehat{H}_% {\mathbf{u}(\beta)}(\Delta_{k}).

(69)

The principle for this approach is rooted in Lemma 1, which states that under anisotropy, there exists only one maximising direction $\mathbf{u}(\alpha)$ . Thus if we take any other random direction $\mathbf{v}(\beta)$ , $\beta\neq\alpha$ , then the regularity that we will “catch" corresponds to the minimum one.

The quantity $\widehat{\underline{\varepsilon}}$ in (68) estimates the “irreducible error" that arises in estimating the worst regularity $\underline{H}$ by simply changing directions, exemplified by Proposition 2. In order to avoid always detecting anisotropy, this error needs to be taken into account, so that $\tau>\underline{\varepsilon}$ . Averaging the estimated regularities over a grid of $\mathbf{\Delta}$ ’s provides added robustness since $\widehat{H}_{\mathbf{u}(\beta)}(\Delta_{k})$ is no longer simply an auxiliary quantity.

There are two reasons for computing the $\underline{\varepsilon}_{j}$'s using different groups of estimates in $\beta$ and $\beta^{\perp}$. The first is to preserve the independence between summands when computing $\widehat{\underline{\varepsilon}}$ in (68). The second is to ensure that the angles are sufficiently well separated, so that the difference in the estimates is not driven by the proximity of angles; see Proposition 2. The threshold is then set to

\tau=\widehat{\underline{\varepsilon}}+\exp(-\log(M_{0})^{\xi}),

(70)

for some $0<\xi<1$. The extra $\exp(-\log(M_{0})^{\xi})$ term in (70) converges to zero slower than the rate of convergence of $\widehat{\underline{H}}$, and ensures that the strict inequality $\underline{\varepsilon}<\tau<\overline{\varepsilon}$ is preserved for any $M_{0}$ large enough. Algorithm 2 provides a summary of the anisotropic detection procedure.

Algorithm 2 Anisotropic Detection

1: Data $(Y^{(j)}(\mathbf{t}_{m}),\mathbf{t}_{m})$ , Grid $\widetilde{\mathcal{T}}=\left\{\mathbf{t_{1}},\dots,\mathbf{t_{p}}\right\}$ ; Integer $J$ ; Estimated Angle $\widehat{\widehat{\alpha}}$ ; Initialize $H\leftarrow\emptyset$ ;
2: for $j=1,\dots,J$
3:   Sample $\beta_{j}\sim\text{Unif}([\widehat{\widehat{\alpha}}+\pi/4,\widehat{\widehat{\alpha}}+3\pi/4])$ ;
4:   Estimate $\widehat{H}_{j}=\widehat{H}_{\mathbf{v}(\beta_{j})}$ according to (22);
5:   $H\leftarrow\widehat{H}_{j}\bigcup H$ ;
6: end for
7: Compute $\widehat{\underline{\varepsilon}}$ according to (68);
8: Compute $\tau$ according to (70);
9: return $\mathbbm{1}_{\mathcal{A}(\tau)}$ ; $\triangleright$ 1 indicates anisotropy, isotropy otherwise
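The loop in Algorithm 2 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the simulations: `estimate_H` is a hypothetical callable standing in for the regularity estimator of (22) averaged over the grid of $\mathbf{\Delta}$ ’s, $\beta_{j}^{\perp}$ is assumed to be $\beta_{j}+\pi/2$ , $\widehat{\underline{\varepsilon}}$ is assumed to be the average of the $\underline{\varepsilon}_{j}$ ’s in (69), and the returned indicator uses the comparison that appears in Proposition 6.

```python
import math
import random
from typing import Callable


def detect_anisotropy(
    estimate_H: Callable[[float], float],  # hypothetical: angle -> regularity estimate
    alpha_hat: float,
    J: int,
    M0: int,
    xi: float = 1 / 3,
    seed: int = 0,
) -> bool:
    """Return True if anisotropy is detected, False otherwise."""
    rng = random.Random(seed)
    eps = []
    for _ in range(J):
        # Step 3: sample an angle kept away from the estimated direction.
        beta = rng.uniform(alpha_hat + math.pi / 4, alpha_hat + 3 * math.pi / 4)
        # Steps 4-5 combined with (69). ASSUMPTION: beta_perp = beta + pi/2.
        eps.append(abs(estimate_H(beta) - estimate_H(beta + math.pi / 2)))
    # Step 7. ASSUMPTION: (68) is taken to be the average of the eps_j.
    eps_hat = sum(eps) / J
    # Step 8: threshold from (70).
    tau = eps_hat + math.exp(-(math.log(M0) ** xi))
    # Step 9: compare the gap between the two estimated regularities with tau.
    gap = abs(estimate_H(alpha_hat) - estimate_H(alpha_hat + math.pi / 2))
    return gap > tau


def toy_estimator(H_max: float, H_min: float, alpha: float) -> Callable[[float], float]:
    """Noise-free toy: regularity H_max along +/- u(alpha), H_min elsewhere."""
    def estimate(angle: float) -> float:
        aligned = abs(math.sin(angle - alpha)) < 1e-9
        return H_max if aligned else H_min
    return estimate


# Toy usage with made-up values; the true angle is passed as alpha_hat.
aniso = detect_anisotropy(toy_estimator(0.9, 0.5, 0.3), 0.3, J=10, M0=101 * 101)
iso = detect_anisotropy(toy_estimator(0.5, 0.5, 0.3), 0.3, J=10, M0=101 * 101)
print(aniso, iso)
```

In this noise-free toy the estimated irreducible error is zero, so detection depends only on whether the gap between the two regularities exceeds the $\exp(-\log(M_{0})^{\xi})$ term.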

In theory, the set of angles $\mathbf{\beta}$ can be chosen randomly, as long as $\mathbf{\beta}\neq\pm\widehat{\widehat{\alpha}}$ . However, in order to avoid any “continuity” issues, each $\beta_{j}$ should be sufficiently far away from $\widehat{\widehat{\alpha}}$ . We thus suggest sampling the angles from a uniform distribution, as seen in Algorithm 2. The power $\xi$ in (70), which governs the rate of convergence, only needs to satisfy $\xi\in(0,1)$ in theory. Following Golovkine et al., (2023), we choose $\xi=1/3$ , a value that seems to work well in practice. The integer $J(N,M_{0})$ should be increasing with $N$ and $M_{0}$ , so that $\widehat{\underline{\varepsilon}}$ eventually converges. For computational efficiency, we suggest selecting $J(N,M_{0})=\lceil(N\times M_{0})^{1/4}\rceil$ , which is aligned with the rate of convergence of $\widehat{H}$ and also works well in practice.
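As a small illustration of the suggested choice $J(N,M_{0})=\lceil(N\times M_{0})^{1/4}\rceil$ , the sketch below computes the ceiling with integer arithmetic, which avoids floating-point rounding when $N\times M_{0}$ is an exact fourth power. The function name `choose_J` and the value $N=100$ are hypothetical.

```python
import math


def choose_J(N: int, M0: int) -> int:
    """Return ceil((N * M0) ** (1/4)) using exact integer arithmetic."""
    n = N * M0
    if n < 1:
        raise ValueError("N * M0 must be a positive integer")
    # isqrt applied twice gives the floor of the fourth root.
    floor_root = math.isqrt(math.isqrt(n))
    return floor_root if floor_root**4 == n else floor_root + 1


# Illustrative values only: N = 100 surfaces observed on a 101 x 101 grid.
print(choose_J(100, 101 * 101))
```

The number of sampled directions grows slowly with the sample size, so the loop in Algorithm 2 adds only a modest computational cost.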

We establish the consistency of Algorithm 2 in the following proposition.

Proposition 6.

Let $\widehat{\overline{H}}=\widehat{H}_{\mathbf{u}(\widehat{\alpha})}$ , $\widehat{\underline{H}}=\widehat{H}_{\mathbf{u}(\widehat{\alpha}+\pi/2)}$ , and $\tau$ be defined as in Algorithm 2. Suppose that the assumptions of Corollary 1 hold true. Then Algorithm 2 is consistent, in the sense that

\lim_{N,M_{0}\rightarrow\infty}\mathbb{P}\left(\left|\widehat{\overline{H}}-\widehat{\underline{H}}\right|>\tau,\underline{H}\neq\overline{H}\right)=1,

(71)

and

\lim_{N,M_{0}\rightarrow\infty}\mathbb{P}\left(\left|\widehat{\overline{H}}-\widehat{\underline{H}}\right|>\tau,\underline{H}=\overline{H}\right)=0.

(72)

The proof can be found in the Supplementary Material.

Simulation results for the anisotropic detection procedure described in Section 7.2 are reported in Table 1. The percentage column indicates the percentage of cases classified as anisotropic ( $\mathbbm{1}_{\mathcal{A}}=1$ ). In the isotropic case, we achieve virtually perfect classification, even for relatively small values of $M_{0}$ . In the anisotropic case, we require either a sufficiently large number of points observed on each surface, or a sufficiently large difference $\overline{H}-\underline{H}$ . When the difference in regularities is large enough, we can achieve almost perfect classification for $M_{0}=101\times 101$ . In the context of regularity estimation and anisotropic detection, this is a relatively small number of points. For example, Richard, (2016) constructs a statistical test for isotropic detection, where the number of surfaces $N$ plays a more important role. In his simulations, $N=6000$ , a much larger quantity.

Setup	$\overline{H}$	$\sigma$	$M_{0}$	Percentage
Isotropic	0.5	0.1	$51\times 51$	0
Isotropic	0.5	0.1	$101\times 101$	0
Anisotropic	0.8	0.1	$51\times 51$	35.8
Anisotropic	0.8	0.1	$101\times 101$	42.4
Anisotropic	0.9	0.1	$51\times 51$	80.6
Anisotropic	0.9	0.1	$101\times 101$	97

Table 1: Results of the anisotropic detection approach using thresholding. The Percentage column indicates the percentage of cases classified as anisotropic (i.e., $\mathbbm{1}_{\mathcal{A}}=1$ ) for the different setups.

Acknowledgements

We thank Valentin Patilea for a careful reading of this paper, and providing detailed suggestions and valuable feedback.

Funding

Sunny Wang gratefully acknowledges support from the PIA EUR DIGISPORT project (ANR-18-EURE-0022).

Supplementary Material

In the supplement, we provide detailed proofs for the Propositions, Lemmas and Theorems in the main paper. Moreover, additional simulation results for different processes are available.

References

Ammous et al., (2024) Ammous, S., Dedecker, J., and Duval, C. (2024). Adaptive directional estimator of the density in $\mathbb{R}^{d}$ for independent and mixing sequences. J. Multivariate Anal., 203:105332.
Azzimonti et al., (2015) Azzimonti, L., Sangalli, L. M., Secchi, P., Domanin, M., and Nobile, F. (2015). Blood flow velocity field estimation via spatial regression with PDE penalization. J. Amer. Statist. Assoc., 110(511):1057–1071.
Belloni et al., (2015) Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Some new asymptotic theory for least squares series: Pointwise and uniform results. J. Econometrics, 186(2):345–366.
Bernardi et al., (2018) Bernardi, M. S., Carey, M., Ramsay, J. O., and Sangalli, L. M. (2018). Modeling spatial anisotropy via regression with partial differential regularization. J. Multivariate Anal., 167:15–30.
Coeurjolly and Porcu, (2018) Coeurjolly, J.-F. and Porcu, E. (2018). Fast and exact simulation of complex-valued stationary Gaussian processes through embedding circulant matrix. J. Comput. Graph. Statist., 27(2):278–290.
Davies and Hall, (1999) Davies, S. and Hall, P. (1999). Fractal analysis of surface roughness by using spatial data. J. R. Stat. Soc. Ser. B Stat. Methodol., 61(1):3–37.
Dieker, (2004) Dieker, T. (2004). Simulation of fractional Brownian motion.
Fan and Guerre, (2016) Fan, Y. and Guerre, E. (2016). Multivariate local polynomial estimators: Uniform boundary properties and asymptotic linear representation. In Essays in Honor of Aman Ullah, volume 36, pages 489–537. Emerald Group Publishing Limited.
Germain et al., (2003) Germain, C., Da Costa, J., Lavialle, O., and Baylou, P. (2003). Multiscale estimation of vector field anisotropy application to texture characterization. Signal Processing, 83(7):1487–1503.
Golovkine et al., (2022) Golovkine, S., Klutchnikoff, N., and Patilea, V. (2022). Learning the smoothness of noisy curves with application to online curve estimation. Electron. J. Stat., 16(1):1485–1560.
Golovkine et al., (2023) Golovkine, S., Klutchnikoff, N., and Patilea, V. (2023). Adaptive estimation of irregular mean and covariance functions. arXiv 2108.06507v2.
Herbin, (2006) Herbin, E. (2006). From $N$ parameter fractional Brownian motions to $N$ parameter multifractional Brownian motions. Rocky Mountain J. Math., 36(4):1249–1284.
Horváth and Kokoszka, (2012) Horváth, L. and Kokoszka, P. (2012). Inference for functional data with applications. Springer Science & Business Media.
Jain et al., (1997) Jain, A., Hong, L., and Bolle, R. (1997). On-line fingerprint verification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):302–314.
Jiang, (2005) Jiang, X. (2005). On orientation and anisotropy estimation for online fingerprint authentication. IEEE Transactions on Signal Processing, 53(10):4038–4049.
Kassi et al., (2023) Kassi, O., Klutchnikoff, N., and Patilea, V. (2023). Learning the regularity of multivariate functional data.
Kokoszka and Reimherr, (2017) Kokoszka, P. and Reimherr, M. (2017). Introduction to functional data analysis. Texts in Statistical Science Series. CRC Press, Boca Raton, FL.
Lepski, (2015) Lepski, O. (2015). Adaptive estimation over anisotropic functional classes via oracle approach. Ann. Stat., 43(3):1178–1242.
Lepski and Rebelles, (2020) Lepski, O. V. and Rebelles, G. (2020). Structural adaptation in the density model. Math. Stat. Learn., 3(3-4):345–386.
Maissoro et al., (2024) Maissoro, H., Patilea, V., and Vimond, M. (2024). Adaptive estimation for weakly dependent functional times series.
Matheron, (1973) Matheron, G. (1973). The intrinsic random functions and their applications. Advances in Appl. Probability, 5:439–468.
Ramsay and Silverman, (2005) Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics. Springer-Verlag, New York, 2 edition.
Ramsay, (2002) Ramsay, T. (2002). Spline smoothing over difficult regions. J. R. Stat. Soc. Ser. B Stat. Methodol., 64(2):307–319.
Richard, (2016) Richard, F. J. P. (2016). Tests of isotropy for rough textures of trended images. Statistica Sinica, 26(3):1279–1304.
Samarov and Tsybakov, (2004) Samarov, A. and Tsybakov, A. (2004). Nonparametric independent component analysis. Bernoulli, 10(4):565–582.
Sangalli et al., (2013) Sangalli, L. M., Ramsay, J. O., and Ramsay, T. O. (2013). Spatial spline regression models. J. R. Stat. Soc. Ser. B Stat. Methodol., 75(4):681–703.
Shen and Hsing, (2020) Shen, J. and Hsing, T. (2020). Hurst function estimation. Ann. Stat., 48(2):838–862.
Stein, (2002) Stein, M. L. (2002). Fast and exact simulation of fractional Brownian surfaces. J. Comput. Graph. Statist., 11(3):587–599.
Wang et al., (2023) Wang, S., Patilea, V., and Klutchnikoff, N. (2023). Adaptive functional principal components analysis. arXiv 2306.16091.
Wood and Chan, (1994) Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in $[0,1]^{d}$ . J. Comput. Graph. Statist., 3(4):409–432.
Wood et al., (2008) Wood, S. N., Bravington, M. V., and Hedley, S. L. (2008). Soap film smoothing. J. R. Stat. Soc. Ser. B Stat. Methodol., 70(5):931–955.