direg_SM
Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data
Abstract
We introduce directional regularity, a new definition of anisotropy for multivariate functional data. Instead of taking the conventional view which determines anisotropy as a notion of smoothness along a dimension, directional regularity additionally views anisotropy through the lens of directions. We show that faster rates of convergence can be obtained through a change-of-basis by adapting to the directional regularity of a multivariate process. An algorithm for the estimation and identification of the change-of-basis matrix is constructed, made possible due to the unique replication structure of functional data. Non-asymptotic bounds are provided for our algorithm, supplemented by numerical evidence from an extensive simulation study. We discuss two possible applications of the directional regularity approach, and advocate its consideration as a standard pre-processing step in multivariate functional data analysis.
Key words: Anisotropy; Multivariate functional data; Hölder exponent
MSC2020: 62R10; 62G07; 62M99; 60G22
1 Introduction
Anisotropy is a key concept in probability and statistics. In the context of non-parametric statistics, it roughly states that the regularity is different between dimensions. Let be independent and identically distributed (iid) pairs observed under the model
with , and the design points ’s are uniformly distributed on the hypercube . The errors are assumed to be uncorrelated, centered random variables. When belongs to an anisotropic Hölder class, it is well known that under suitable assumptions on the noise, the minimax rate of estimation is , where is given by
(1) |
with denoting the regularity along dimension . The parameter is also known as the “effective smoothness" of , since it characterizes the regularity of across different dimensions. Since , anisotropy brings along better rates of convergence in the non-parametric regression setup of (1).
The regularities are usually considered along the canonical basis, the domain where the data is commonly observed on. However, this can lead to misleading conclusions since anisotropy is not invariant to directions. A relatively recent body of work in non-parametric statistics aims to give consideration to the directional nature of anisotropy, called structural adaptation (Lepski, (2015)). See Samarov and Tsybakov, (2004), Lepski and Rebelles, (2020), Ammous et al., (2024) for some examples in the density model. In spatial data, this has been considered under the umbrella of fractal analysis, for example in the seminal work of Davies and Hall, (1999).
Surprisingly, to the best of our knowledge, the directional nature of anisotropy has yet to be examined in multivariate functional data analysis. We aim to fill this gap by giving a rigorous definition of anisotropy through the directional regularity of a process. Through several examples, we demonstrate that performing a change-of-basis can increase the effective smoothness of a process along the canonical basis. This can lead to faster rates of convergence for ubiquitous tasks such as smoothing in functional data analysis. This change-of-basis matrix is intimately related to the notion of directional regularity that we introduce in this paper, and an algorithm is constructed to estimate it.
Functional data analysis (FDA) is a burgeoning field that deals with modern, complex data, such as those collected by sensors. For an introduction, see the textbooks Ramsay and Silverman, (2005), Horváth and Kokoszka, (2012), Kokoszka and Reimherr, (2017). FDA deals with data sets where the datum are functions, regarded as realizations of a stochastic process or random field, defined over a continuous domain. A unique feature of functional data is replication, where several sample paths are observed.
The remainder of this paper is organized as follows. In Section 2 we give a formal definition of directional regularity, which is elucidated through Lemma 1 and several examples. Lemma 1, which states that the regularity of a surface is the same in every direction except possibly one, is a key property that motivates much of the paper. In Section 3 we describe the estimation procedure of the parameters associated to the directional regularity approach. Section 4 studies the theoretical properties of our Algorithm, where non-asymptotic bounds are provided. Section 5 describes an extensive simulation study that illustrates the good finite sample properties of our Algorithms. Section 6 discusses additional properties of our Algorithm, such as computational considerations and other interesting insights. Section 7 describes some natural applications of our directional regularity approach, including surface smoothing and anisotropic detection. All the proofs are relegated to the Supplementary Material, which also contain additional simulation results.
2 Directional Regularity
2.1 Data setting and definition
Let be a second-order stochastic process on an open, bounded domain . In this paper, we will focus on the common design framework, which represents many applications. In particular, functional data collected by sensors and images fall into such a framework. The observations come in the form of pairs , generated under the model
(2) |
where are independent, centered errors with variance . We suppose without loss of generality that the design points are observed on the canonical basis, and that . The random design case with heteroscedastic errors, together with other possible extensions, are discussed in Section 6.
Definition 1.
Let be a stochastic process with continuous and non-differentiable sample paths, and be a unit vector. The process has local regularity at along the direction if a non-negative function exists such that :
(3) |
where for each fixed . The map
(4) |
is called the directional regularity of .
Definition 1 is local in the sense that the local regularity may vary along the domain . In the following, we consider the case where for some . If the function does not depend on the direction , we say that is an isotropic process, otherwise we call it an anisotropic process. Definition 1 formalizes anisotropy as a notion of regularity not only along a dimension, but also along a direction. This is in contrast to the usual consideration of anisotropy, which typically views it through the lens of the canonical basis. Some examples of processes with prescribed directional regularity will be discussed in Section 2.2.
A natural question that arises is the number of possible regularities that an anisotropic process can take, and how these regularities change with the directions. Lemma 1 provides a definitive answer to the preceding question, stating that there can be at most two different regularities. Furthermore, there exists a unique direction, up to a reflection, for the maximal regularity.
Lemma 1.
Let and assume that there exists basis vectors such that . Moreover, suppose that the functions and are continuously differentiable. For any , we have the following dichotomous relationship:
-
•
If , then .
-
•
Otherwise, we have .
The proof of Lemma 1 is provided in the Supplementary Material. It states that the map can take at most two possible values. This is in alignment with the results in Davies and Hall, (1999), who studied anisotropy from the lens of the fractal dimension of surfaces, and showed that “the fractal dimension of line transects across a surface must either be constant in every direction or be constant in each direction except one".
Let be the direction that maximizes the regularity of in the sense of Definition 1. Then and are the only two possible maximizers, and are “singularities" in the sense that if we locally examine any vector other than , the associated regularity is . Parameterizing the vectors by their angle , the maximization problem is equivalent to . By restricting the domain to , admits an unique solution. can thus be parameterized as locating the angle between and , given by
(5) |
An illustration can be found in Figure 1.
2.2 Examples
In the following, we provide some examples of processes that motivate Definition 1.
Example 1 (Sums and products of two fractional Brownian motions).
Let be the Hurst parameter of the fractional Brownian motion (fBm) . is a centered Gaussian process with covariance function
(6) |
The fBm is a generalization of the standard Brownian motion, which has Hurst parameter . Unlike the standard Brownian motion, the increments of the fBm are not necessarily independent. A simple calculation reveals that
(7) |
Let be a basis of , , and , be two independent fBms with Hurst parameters and respectively. Set
(8) |
The independence of and , together with (7), implies that for any sufficiently small, we have
(9) |
By Definition 1, for , the process has a local regularity along the direction with , and .
Similarly, we can define the tensor product of and w.r.t the basis :
(10) |
For any sufficiently small, we analogously have
(11) |
so the process has a local regularity along the direction .
Example 2 (Sum and product of two independent Ornstein-Uhlenbeck processes).
Another example that is regularly used in practice is the stationary fractional Ornstein-Uhlenbeck process , with index . It is a centered Gaussian process with a covariance function given by
(12) |
The covariance structure exhibits the following property:
(13) |
Its bi-dimensional counterpart can be constructed by considering either the sum or the tensor product of two independent Ornstein-Uhlenbeck processes, similar to the fBm in (8) and (10).
Example 3 (Multi-fractional Brownian sheet).
The local regularity in previous examples is constant along the domain . The multifractional Brownian sheet (MfBs) is a generalization of the standard fractional Brownian sheet, where the Hurst parameter is allowed to vary along the domain. See Herbin, (2006) among others. (See Kassi et al., , 2023, Proposition 6) for an illustration of the directional regularity property in the case of MfBs.
3 Methodology
3.1 Estimating equations
Let be the canonical basis of , and be orthonormal basis vectors such that . Let be the solution of the problem (5). Denote and . The estimating equation for is found in Proposition 1. Denote , for , and . Finally, we use to denote the cardinality of a finite set .
Proposition 1.
The remainder term will be discussed in Section 3.3. Proposition 1 allows us to find a function of the angle by examining the ratios of mean-squared variations, as in (3), by searching solely along the canonical basis. In order to focus on the main idea of directional regularity, we will consider the case where and are constant over . In theory, the general case can be established by considering a local, pointwise study of our methodology. Hereafter, the notational dependence of and on will sometimes be suppressed. When the directional regularity is constant over , more stable estimates can be obtained by averaging over the design points.
Due to the unique replication nature of function data, the quantities in (14) can be easily estimated. A natural plug-in estimator for the mean-squared variations is its empirical counterpart, given by
(15) |
where denotes an observable approximation of . Let be a finite approximation of with a deterministic number of grid points. Averaging over the design points, we obtain
(16) |
where
(17) |
is an estimator of the noise term in (2), with denoting the closest observed point to . Since the mean-squared variations are associated to the true realizations of the process and not the observed, noisy version, denoising is important to obtain an estimator with good rates of convergence. Theoretical properties for the estimator of in (17) are provided in Section 4.
In principle, one can choose between a variety of methods to construct the approximations . Only a mild moment condition of the form , for some is required, where . This is satisfied by many non-parametric smoothers, such as kernel smoothers and series expansions. See for instance Fan and Guerre, (2016) and Belloni et al., (2015). We will use nearest-neighbor interpolation, breaking ties by the lexicographic order, to construct since it is sufficient for our purposes, when coupled with a denoising step as in (16). Nearest-neighbor interpolation has the added advantage of requiring no tuning parameters, and is computationally efficient.
A recent line of work in regularity estimation within the context of FDA has emerged, which we can exploit for our purposes. See for example (Golovkine et al., , 2022), Golovkine et al., (2023), Wang et al., (2023), Maissoro et al., (2024), Kassi et al., (2023), and Shen and Hsing, (2020). We will adopt the multivariate version of Kassi et al., (2023), where the minimum regularity can be estimated as
(18) |
The regularity estimator in (18) is a slight adaptation of Kassi et al., (2023) by taking the minimum over the index of the basis vectors. This is motivated by Lemma 1, which says that at least one of the two vectors gives us the smallest regularity for the process . Collecting (16), (17) and (18), a plug-in estimator of (14) is given by
(19) |
By construction, the only relevant tuning parameter in (19) is the spacing . We address this issue in Section 3.2, immediately after Algorithm 1 is presented.
3.2 Identification issues
The estimating equation in (14) reveals two identification problems. The first issue is related to the function , which can either be the tangent () or cotangent () function depending on the ordinal nature of and . This reflects the phenomenon that the dominating term arising from the ratio of mean squared variations differ, depending on whether . However, this ordinality is unknown a priori.
The second issue is associated to the absolute value on the LHS of (14). By using the estimator in (19), one obtains either or . Similarly, the sign is unknown a prior to the statistician. Fortunately, an identification procedure can be constructed to resolve these two issues.
Let be a unit vector represented in the canonical basis as . By construction, the regularity is maximal when built using the angle in (14). The true value of can thus be identified as
(20) |
where
(21) |
The angles are computed by applying the relevant inverse maps to (19). An estimate of the regularity along an arbitrary direction can be obtained by
(22) |
In order to obtain more robust estimates with respect to the spacing , we propose to compute over a grid of points . The angle which maximizes the sum over is then selected, yielding the angle
(23) |
As seen from Theorem 1, the spacing used for identification should be chosen such that it is slower in rate compared to the one used for estimation, providing additional justification for (23). The estimation and identification procedure is summarized in Algorithm 1.
In the estimation of , it is sufficient to choose a fixed spacing . In order to ensure that the nearest-neighbor of is distinct to the nearest-neighbor of when computing in , should be chosen at least as large as . However, as seen from the theoretical results in Section 4, we need to be strictly larger than the minimum size required to capture one point in order to obtain asymptotic concentration. In the non-asymptotic setting, we propose a conservative choice of . Simulation results in Section 5 confirm that this choice works well in practice.
3.3 Correction for the remainder term
Let denote the remainder term in (14). It turns out that can become big enough to affect the estimation of through the constants, which is more pronounced for certain angles . Let be the local Hölder constant in the direction of the maximizing regularity , and let be the local Hölder constant in the orthogonal direction of . The explicit form of the remainder term is given by
(24) |
where
(25) |
and
(26) |
and
(27) |
It is easy to see from (24) that , regardless of the ordinal nature of and . This implies that when , the estimating equations in (14) are most accurate. However as or , the estimating equations can be dramatically affected by this remainder term. A plot of can be seen in Figure 2.
The remainder term is intrinsic to the process . It can be bounded by Cauchy-Schwarz inequality to obtain . However, this is a pessimistic approach, since can be equal to zero, for processes such as (8). In order to adjust for this possibly non-negligible remainder term, the remainder terms and can be estimated, provided that . Since only involves expectations of the process, it can be estimated by (16). We now focus on the estimation of .
Let and be the mean-squared variations such that , and respectively. Since
(28) |
we have
(29) |
and
(30) |
in the sense that the ratios of the LHS and RHS in (30) goes to 1 in the limit as . can be computed by replacing , , and with their estimates obtained by Algorithm 1. An adjusted estimate of will then be given by
(31) |
where is computed by (19). The simulation results in Section 5 suggest that using the adjusted estimates in (31) yields significant better results when gets further away from . In order to avoid introducing greater dependence between quantities, we suggest to only compute the adjusted estimates once. Moreover, we suggest to set in practice to decrease the computational load. This seems to produce good results , even when , as seen from the simulation results in the Supplementary Material. Likewise, the final estimate can be obtained by applying the appropriate inverse function (either or arccot) obtained by the identification process when computing the initial estimates . This not only saves computational time, but also results in valid inference, as opposed to repeating the identification process using .
4 Theoretical Properties
In the following, we provide non-asymptotic bounds for our main algorithms, as well as the auxiliary estimates. The proofs are provided in detail in the Supplementary Material. The following mild assumptions are imposed.
Assumptions.
-
(H1)
Let be anisotropic process with the two regularities , where , , are independent realizations of .
-
(H2)
The errror terms in the data model equation (2) are iid. The random variables and X are independents
-
(H3)
Three positive constants , and exist such that, for any ,
-
(H4)
A constant exists such that
(32)
Remark 1.
Assumption (H3) states that has sub-Gaussian increments in a local sense, for every point . It is satisfied by the processes considered in the simulations, as well as those presented in Section 2.2. In fact, under Definition 1, Assumption (H3) is equivalent to
an assumption which is more familiar and widely used in the literature.
Let denote a “proxy" of the directional regularity in (22), given by
(33) |
Whenever we write with an explicit dependence on , or mention the proxy, we are referring to the quantity in (33). We start by providing a bound for this quantity for different angles, which provides us with important insight into the behavior of the directional regularity in the “real world", where cannot be infinitesimally small. See Section 6 for a more detailed discussion. Recall that we are working with ; see Definition 1.
Proposition 2.
We have
(34) |
Suppose that Assumption (H3) holds. Then for any sufficiently small, and for any pair of angles , the following bound holds:
(35) | ||||
(36) |
Proposition 3 is a critical ingredient in allowing us to derive rates of convergence associated to the identification process in Algorithm 1.
Proposition 3.
The condition on in (37) represents the bias term that arises from performing denoised interpolation in order to compute the mean-squared variations. This bias term is an intrinsic feature of the “discretely observed" setup, where the surfaces are observed only at a finite number of discrete points, instead of in continuous time.
Corollary 1.
Suppose that assumptions of Proposition 3 hold true, and set
(41) |
Moreover, we assume that a non-negative constant (independent of and ) exist such that
(42) |
Then
(43) |
The condition (42) requires that falls within two power of , equivalently, the converse hold true. The choice of in (41) is done is such way that for any fixed , this choice can be found also in Golovkine et al., (2023) with . the rate in (43) converge to since is chosen to be larger than any negative power of , and keeping (42) in mind.
Proposition 4.
The condition on in (44) represent the reminder term in Proposition 1 combined with the condition (37) on in Proposition 3, such condition on can be satisfied by choosing as in (41) provided that is large enough. The probability bound (45) converge to zero as long as we choose as in (41) and the condition (42) is satisfied.
Theorem 1.
Suppose that assumptions of Proposition 3 hold true. Moreover assume that for any we have
(46) |
Then for any such that , we have for sufficiently large
(47) |
The convergence of the RHS in (47) can be guaranteed following the discussion provided for Proposition 3, and Proposition 4 and Corollary 1. The main message of Theorem 1 is that, in order to identify the right angle that gives the maximal regularity, we need to choose the values in the grid introduced in (23) to have a slower rate of decrease than the used to estimate the function in (19). This is because, in (46), we have .
5 Numerical Properties
We begin this section by briefly describing a simple, novel simulator for a class of bivariate anisotropic processes. The simulator, along with the other algorithms mentioned in this paper, are implemented in the R package direg333Freely accessible at https://github.com/sunnywang93/direg..
5.1 Anisotropic simulator
Let , be two processes with Hurst exponents and respectively. Suppose without loss of generality that . In the following, our exposition will be structured assuming that and are fractional brownian motions (fBms). In principle, our simulation approach can be generalized to other processes that exhibit the stationary increments property.
Many methods are available to simulate these types of processes, and we do not attempt to provide an exhaustive list. For examples of some papers, see Wood and Chan, (1994), Stein, (2002) and Coeurjolly and Porcu, (2018). A survey can be found in Dieker, (2004). In particular, the circulant embedding method described in Wood and Chan, (1994) is attractive due to its speed, and can be adapted to simulate processes such as the fBm with stationary increments.
The anisotropic component of the simulator will be our main focus, since there is already a wide array of methods available for simulating the individual processes. Let and be unit vectors with associated regularities and . The vectors can be represented in the canonical basis as and . The main difficulty of simulating processes on an equally spaced grid along , is the existence of negative values when , while the fBm has a domain in .
This problem can be resolved by exploiting the stationary increments property of the fBm, given by , where denotes equality in distribution. By using the reference point , we obtain
(48) |
An anisotropic fBm can thus be simulated on any by first simulating a fBm on an equally spaced grid in , before transforming the negative part using (48). A bivariate process can finally be constructed by applying a function to the individual processes, for example . Our simulator is similar in spirit to the Turning bands approach; see for example Matheron, (1973). Psuedocode for our simulator can be found in Algorithm SM.1 of the Supplementary Material.
5.2 Parameter settings and error measures
Observations were simulated using our algorithm for the sum of two fBms. Other processes (e.g product of two fBms) can be found in the Supplementary Material.
A total of 60 different parameter configurations were explored, consisting of all possible combinations of the following parameter sets: number of curves , number of points along each curve , noise level , and angles . In both processes, the regularities were fixed to be and . In line with our discussion in Section 3.2, the parameter was set to be . The grid of spacings involved in the identification process was set to , an evenly spaced grid consisting of points. The absolute error is used as a risk measure for each experiment, given by
5.3 Empirical Results
Boxplots for the simulation results can be seen in Figure 3. The risk can be seen to be very small, below 0.1 for all different configurations, even in the extremely high noise setting . In fact, we see that our Algorithms are fairly robust to noise, in the sense that the risk increase less than proportionately to the noise levels. As seen in Section 7, the levels of accuracy observed in Figure 3 is sufficient to achieve lower risk levels for applications such as smoothing.
6 Discussion
In this section we discuss some key aspects of our methodology. Extensions of our approach to more general settings, such as the random design framework with heteroscedastic noise, can be found in the Supplementary Material.
6.1 Computational Considerations
The computational complexity of Algorithm 1 is driven primarily by the identification process, due to the additional loop over the grid of spacings . Since we are performing a linear grid search in each basis vector containing points, the complexity of nearest neighbor interpolation for each surface is , resulting in for sample paths. By searching over points to determine the closest point to each in terms of norm, the complexity of computing is , so the complexity of is for each , . The complexity of the full identification process is therefore . In the context of fda, a complexity of at least is to be expected. Since is relatively small (e.g 15 points), our algorithm is reasonably good in terms of computational complexity, considering the possible gains in the rates of convergence.
Clearly, the simulation described in Section 5.1 depends on the method used to generate the individual fBms. Once the fBms are given, only a linear search over grid points is required. Using a fast simulation method such as the circulant embedding method in Wood and Chan, (1994), individual one dimensional curves can be simulated in time. Thus the simulation of surfaces can be done in . To the best of our knowledge, we do not know of any simulation algorithm in the context of fda that has better computational complexity.
6.2 Singularity along the maximizing direction
Lemma 1 states that there is a singularity present in the concept of directional regularity, in the sense that there exists only one direction for which the regularity is maximal. Since the angles are not expected to be estimated perfectly, why can there be a gain in convergence rates, as seen in the empirical results in Sections 5.3?
The answer lies in the intrinsic nature of directional regularity, which is itself intimately related to the mean-squared variations of a process. It is an “infinitesimal" concept, in the sense that
(50) |
However, the limiting case cannot be obtained with finite precision computers. In reality, the “true" directional regularity" that one observes is given by the proxy . This partly explains the observed “continuity" associated to the directional regularity in practice, so that the estimate of only needs to be sufficiently small in order for gains in the rate to be achieved.
A more precise, formal perspective can be taken through the lens of Proposition 2, found in the Supplementary Material. In the upper bound of , we observe two competing terms, in the form of and . In order for to converge to zero, it is necessary that the condition
(51) |
is satisfied. In other words, , (51) can be loosely interpreted as saying “for directions that are close enough to each other, the proxies in (22), of the directional regularity stays somewhat close"; see Proposition 2.
7 Applications
In this section, we discuss concrete applications of our directional regularity methodology. They serve as a strong motivation for performing a change of basis using our methodology as a standard pre-processing step.
7.1 Smoothing Bivariate Functional Data
Let be a bi-variate stochastic process satisfying (3), with maximizing regularity along the direction . Following the preceding sections, our exposition will be centered around the common design case, although this is not necessary. Observations come in the form of pairs , generated under
(52) |
where are i.i.d centered random variables with unit variance. We call the learning set.
Consider a new realization of , where the observed pairs are generated from
(53) |
where are i.i.d centered random variables with unit variance and and are independent. We refer to as the online set.
Our goal is the recovery of with , using a suitable estimator . Several methodologies for smoothing multivariate functional data currently exist. For example, Ramsay, (2002), Sangalli et al., (2013), Wood et al., (2008) consider spatial regression using a penalty that involves the Laplacian. These do not take the anisotropy of the process into account. Azzimonti et al., (2015) and Bernardi et al., (2018) extends these previous approaches by using a penalty term involving the partial different operator, and takes the anisotropy of the process into account. All these methods assume that the processes are differentiable for the penalty to be well defined. This assumption does not always hold in practice, and we propose an approach which works for non-differentiable processes.
The parameters of are estimated from the learning set. Let be a clockwise rotation matrix given by
(54) |
Define a new process by applying the rotation matrix to the sampling points, denoted by
(55) |
It can be seen easily that admits a larger effective smoothness along the canonical basis. It is thus beneficial to work instead with instead of on the canonical basis. In practice, this transformed process can be obtained by performing a change-of-basis after estimating the rotation matrix using our methodology.
For simplicity, we fix the smoother to be the Nadaraya-Watson estimator with the multiplicative Epanechnikov kernel , supported on , where
(56) |
Furthermore, is a positive definite bandwidth matrix. Using the rule , the Nadaraya-Watson estimator is given by
(57) |
The bandwidths should adapt to the regularity of the process to obtain the optimal rate of convergence. In particular, the bandwidths should be selected adaptively to the intrinsic anisotropy of the process. This can be achieved through a plug-in bandwidth rule when explicit risk bounds are available and directly estimable from the data. The estimate of is then obtained by
(58) |
Let denote the ball centered at the origin of with radius . Consider the risk of , given by
(59) |
where the second inequality is given by substitution and the fact that the matrix is with determinant 1. The pointwise risk is given by
(60) |
Observe that Fubini’s theorem implies so the integrated risk can readily be recovered from the pointwise risk. Proposition 5 provides a pointwise risk bound. Assumptions can be found in the Appendix.
Proposition 5.
Let and in a bandwidth range satisfying
(61) |
Then the following risk bounds hold:
(62) |
The proof is provided in the Supplementary Material. The following corollary provides the optimal bandwidth with respect to the norm.
Corollary 2.
Under the same assumptions of Proposition 5, the optimal bandwidths and satisfies
(63) |
Moreover, if and is used for smoothing, the following risk is obtained:
(64) |
where is the “effective smoothness", defined by the relation
(65) |
All the quantities in Corollary 2 are estimable from the data using our methodology. Structural adaptation enables the optimal bandwidths to be chosen according to Corollary 2, obtaining the anisotropic rates of convergence, when the intrinsic anisotropy is not in the direction of the canonical basis. By working only on the canonical basis, the rates of convergence that is generally obtained corresponds to the isotropic rate. This is confirmed by our simulation study, which we discuss now.
Our simulation study aims to compare the risk of smoothing, with and without a change of basis. Surfaces corresponding to the sum of fBms with regularities , were simulated using the Algorithm described in Section 5.1. The angles were given by . surfaces were generated with evenly spaced sampling points. Gaussian noise were added, where and . Estimates of were obtained with Algorithm 1, with , , and .
The online set, consisting of one surface per replication, were generated from the same process, for a total of 400 replications. The true surface was first generated on an equally spaced grid of 201 points on without noise. Nearest neighbor interpolation was then performed to discretize the process onto an grid of 101 points. The same Gaussian noise was added as the learning set.
For the anisotropic setp, a change of basis is performed on the online set. The minimum and maximum regularities were estimated using (18) and (22). The bandwidths were selected according to Corollary 2. In the isotropic setup, no change of basis is performed, and the bandwidths were similarly selected according to Corollary 2, with . A multiplicative kernel was used, where the Epachenikov kernel was used in each dimension.
The risk measure was taken to be the relative risk, given by
(66) |
where and correspond to the anisotropic and isotropic risk respectively. Simulation results can be seen in Figure 4. We see that with the exception of the angles near the boundary (i.e 0 or ), performing a change of basis leads to a reduction in the risk, by levels as much as 10%. It is not surprising that at the boundary, the risk is worse, since one is already anisotropic along the canonical basis.
7.2 Anisotropic detection
Our focus so far has been on estimating the direction of the maximizing regularity, which implicitly assumes that anisotropy is present. When the process is intrinsically isotropic, no gains can be made by structural adaptation since the regularity of the process is invariant to directions. However, there is no real loss in terms of statistical error either, since the underlying regularity remains the same, as implied by Lemma 1. In particular, the rates of convergence do not get degraded, and can only be improved with structural adaptation.
Nevertheless, there might be instances where knowing the presence of anisotropy is relevant for some applications. Some examples include fingerprint verification Jain et al., (1997), Jiang, (2005) and texture analysis in materials science Germain et al., (2003). This issue can be naturally addressed using the directional regularity approach, by constructing an anistropic detection procedure based on thresholding. The underlying idea is to consider an event
(67) |
and determine anisotropy based on , for some threshold that is appropriately chosen. In particular, should be selected to account for the estimation errors of and .
Let and denote the estimation errors of the minimum and maximum regularities respectively. By construction, we have , with strict inequality holding in general, since is used as an auxiliary quantity for estimating . A sensible approach is thus to choose such that . On the one hand, when (i.e isotropy), choosing accounts for the estimation error so that with high probability. On the other hand, when , choosing is necessary to avoid falsely detecting isotropy. An illustration of this can be seen in Figure 5.
Since is unknown in practice, we construct a data-driven procedure to estimate it, which we denote . Let be a random set of angles, where is an integer that depends on the sample size. Let be the set of angles orthogonal to , where , . The main idea is to estimate the minimum regularity associated to each and , and compute the average difference between these two estimates, given by
(68) |
where for all , we have
(69) |
The principle for this approach is rooted in Lemma 1, which states that under anisotropy, there exists only one maximising direction . Thus if we take any other random direction , , then the regularity that we will “catch" corresponds to the minimum one.
The quantity in (68) estimates the “irreducible error" that arises in estimating the worst regularity by simply changing directions, exemplified by Proposition 2. In order to avoid always detecting anisotropy, this error needs to be taken into account, so that . Averaging the estimated regularities over a grid of ’s provides added robustness since is no longer simply an auxiliary quantity.
There are two reasons for computing the ’s using different groups of estimates in and . The first is to preserve the independence between summands when computing in (68). The second is to ensure that the angles are sufficiently well separated to ensure that the difference in the estimates are not driven by the proximity of angles; see Proposition 2. The threshold is then set to
(70) |
for some . The extra term in (70) converges to zero slower than the rate of convergence of , and ensures that the strict inequality is preserved for any large enough. Algorithm 2 provides a summary of the anisotropic detection procedure.
In theory, the set of angles can be chosen randomly, as long as . However, in order to avoid any “continuity" issues, each should be sufficiently far away from . We thus suggest to sample the angles from a uniform distribution, as seen in Algorithm 2. The power in (70), which governs the rate of convergence, only needs to be in theory. Following Golovkine et al., (2023), we choose , a value that seems to work well in practice. The integer should be increasing with and , so that eventually converges. In view of computational efficiency, we suggest to select , which is aligned with the rate of convergence of and also works well in practice.
We establish the consistency of Algorithm 2 in the following proposition.
Proposition 6.
The proof can be found in the Supplementary Material.
Simulation results for the anisotropic detection procedure described in Section 7.2 can be seen in Table 1. The percentage column indicates the percentage of cases classified as anisotropic (). In the isotropic case, we achieve virtually perfect classification, even for relatively small values of . In the anisotropic case, we require either a sufficiently large number of points observed on each surface, or for the difference to be well-separated. When the difference in regularities is large enough, we can achieve almost perfect classification for . In the context of regularity estimation and anisotropic detection, this is a relatively small number of points. For example, Richard, (2016) constructs a statistical test for isotropic detection, where the number of surfaces plays a more important role. In his simulations, , a much larger quantity.
Setup | Percentage | |||
---|---|---|---|---|
Isotropic | 0.5 | 0.1 | 0 | |
Isotropic | 0.5 | 0.1 | 0 | |
Anisotropic | 0.8 | 0.1 | 35.8 | |
Anisotropic | 0.8 | 0.1 | 42.4 | |
Anisotropic | 0.9 | 0.1 | 80.6 | |
Anisotropic | 0.9 | 0.1 | 97 |
Acknowledgements
We thank Valentin Patilea for a careful reading of this paper, and providing detailed suggestions and valuable feedback.
Funding
Sunny Wang gratefully acknowledge support from PIA EUR DIGISPORT project (ANR-18-EURE-0022).
Supplementary Material
In the supplement, we provide detailed proofs for the Propositions, Lemmas and Theorems in the main paper. Moreover, additional simulation results for different processes are available.
References
- Ammous et al., (2024) Ammous, S., Dedecker, J., and Duval, C. (2024). Adaptive directional estimator of the density in for independent and mixing sequences. J. Multivariate Anal., 203:105332.
- Azzimonti et al., (2015) Azzimonti, L., Sangalli, L. M., Secchi, P., Domanin, M., and Nobile, F. (2015). Blood flow velocity field estimation via spatial regression with PDE penalization. J. Amer. Statist. Assoc., 110(511):1057–1071.
- Belloni et al., (2015) Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Some new asymptotic theory for least squares series: Pointwise and uniform results. J. Econometrics, 186(2):345–366.
- Bernardi et al., (2018) Bernardi, M. S., Carey, M., Ramsay, J. O., and Sangalli, L. M. (2018). Modeling spatial anisotropy via regression with partial differential regularization. J. Multivariate Anal., 167:15–30.
- Coeurjolly and Porcu, (2018) Coeurjolly, J.-F. and Porcu, E. (2018). Fast and exact simulation of complex-valued stationary Gaussian processes through embedding circulant matrix. J. Comput. Graph. Statist., 27(2):278–290.
- Davies and Hall, (1999) Davies, S. and Hall, P. (1999). Fractal analysis of surface roughness by using spatial data. J. R. Stat. Soc. Ser. B. Stat. Methodol., 61(1):3–37.
- Dieker, (2004) Dieker, T. (2004). Simulation of fractional brownian motion.
- Fan and Guerre, (2016) Fan, Y. and Guerre, E. (2016). Multivariate local polynomial estimators: Uniform boundary properties and asymptotic linear representation. In Essays in Honor of Aman Ullah, volume 36, pages 489–537. Emerald Group Publishing Limited.
- Germain et al., (2003) Germain, C., Da Costa, J., Lavialle, O., and Baylou, P. (2003). Multiscale estimation of vector field anisotropy application to texture characterization. Signal Processing, 83(7):1487–1503.
- Golovkine et al., (2022) Golovkine, S., Klutchnikoff, N., and Patilea, V. (2022). Learning the smoothness of noisy curves with application to online curve estimation. Electron. J. Stat., 16(1):1485–1560.
- Golovkine et al., (2023) Golovkine, S., Klutchnikoff, N., and Patilea, V. (2023). Adaptive estimation of irregular mean and covariance functions. arxiv 2108.06507v2.
- Herbin, (2006) Herbin, E. (2006). From parameter fractional Brownian motions to parameter multifractional Brownian motions. Rocky Mountain J. Math., 36(4):1249–1284.
- Horváth and Kokoszka, (2012) Horváth, L. and Kokoszka, P. (2012). Inference for functional data with applications. Springer Science & Business Media.
- Jain et al., (1997) Jain, A., Hong, L., and Bolle, R. (1997). On-line fingerprint verification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):302–314.
- Jiang, (2005) Jiang, X. (2005). On orientation and anisotropy estimation for online fingerprint authentication. IEEE Transactions on Signal Processing, 53(10):4038–4049.
- Kassi et al., (2023) Kassi, O., Klutchnikoff, N., and Patilea, V. (2023). Learning the regularity of multivariate functional data.
- Kokoszka and Reimherr, (2017) Kokoszka, P. and Reimherr, M. (2017). Introduction to functional data analysis. Texts in Statistical Science Series. CRC Press, Boca Raton, FL.
- Lepski, (2015) Lepski, O. (2015). Adaptive estimation over anisotropic functional classes via oracle approach. Ann. Stat., 43(3):1178 – 1242.
- Lepski and Rebelles, (2020) Lepski, O. V. and Rebelles, G. (2020). Structural adaptation in the density model. Math. Stat. Learn., 3(3-4):345–386.
- Maissoro et al., (2024) Maissoro, H., Patilea, V., and Vimond, M. (2024). Adaptive estimation for weakly dependent functional times series.
- Matheron, (1973) Matheron, G. (1973). The intrinsic random functions and their applications. Advances in Appl. Probability, 5:439–468.
- Ramsay and Silverman, (2005) Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics. Springer-Verlag, New York, 2 edition.
- Ramsay, (2002) Ramsay, T. (2002). Spline smoothing over difficult regions. J. R. Stat. Soc. Ser. B Stat. Methodol., 64(2):307–319.
- Richard, (2016) Richard, F. J. P. (2016). Tests of isotropy for rough textures of trended images. Statistica Sinica, 26(3):1279–1304.
- Samarov and Tsybakov, (2004) Samarov, A. and Tsybakov, A. (2004). Nonparametric independent component analysis. Bernoulli, 10(4):565–582.
- Sangalli et al., (2013) Sangalli, L. M., Ramsay, J. O., and Ramsay, T. O. (2013). Spatial spline regression models. J. R. Stat. Soc. Ser. B. Stat. Methodol., 75(4):681–703.
- Shen and Hsing, (2020) Shen, J. and Hsing, T. (2020). Hurst function estimation. Ann. Stat., 48(2):838 – 862.
- Stein, (2002) Stein, M. L. (2002). Fast and exact simulation of fractional Brownian surfaces. J. Comput. Graph. Statist., 11(3):587–599.
- Wang et al., (2023) Wang, S., Patilea, V., and Klutchnikoff, N. (2023). Adaptive functional principal components analysis. arxiv 2306.16091.
- Wood and Chan, (1994) Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in . J. Comput. Graph. Statist., 3(4):409–432.
- Wood et al., (2008) Wood, S. N., Bravington, M. V., and Hedley, S. L. (2008). Soap film smoothing. J. R. Stat. Soc. Ser. B Stat. Methodol., 70(5):931–955.