First we load the pcds
package:
pcds
is an R
package that stands for
proximity catch digraphs (PCDs) and provides
construction and visualization of the PCDs, and the spatial pattern
tests (for inference on spatial interaction between classes or species)
based on the two graph invariants of the PCDs. These invariants are the
domination number and the arc density. The package
provides a set of functions for the construction and visualization of
three PCD families, namely arc-slice PCDs (AS-PCDs),
proportional-edge PCDs (PE-PCDs), and central
similarity PCDs (CS-PCDs), and for spatial inference based
on two of these PCD families, namely PE- and CS-PCDs. Here, the spatial
inference concerns testing class/species interaction for point pattern
data, usually in two dimensional space. The spatial interaction patterns
of interest are segregation and association. Segregation is the
pattern in which classes tend to repel each other in the sense that
points tend to be clustered around points from the same class (forming
same-class clusters), while association is the pattern in which
points from one class tend to cluster around points from another class
(forming mixed-class clusters).
The one-dimensional versions of the PCD functions are also provided
where AS-PCD is a special case of PE-PCDs or CS-PCDs. The
one-dimensional versions are currently used for testing uniformity of
points, instead of interaction between classes (although it is possible
to use them for this purpose as well). We only extend the PE-PCD
construction and visualization to three dimensions in this package.
The vignette files for the pcds
package are written as
sections or chapters of a larger main vignette file and are organized as
“VSk_Title” where “k” is the section number and “Title” is the
corresponding markdown file name, starting with “VS0_Intro”. It is
recommended to read the vignettes in this order for more efficient and
informed use, however, they can be used in any order the users/readers
prefer, but some tools or concepts may have been introduced in an
earlier section (mostly with references).
The goal of these vignette files or sections is to facilitate graph
abstraction of spatial point data and make it easier for users to adopt
pcds
by providing a comprehensive overview of the package’s
contents and detailed examples and illustration of certain functions. We
begin with the
pcds
.Then we, illustrate
The discussion covers the structure of function arguments, the required input data formats, and the various output formats. The subsequent sections provide visualization of the proximity regions, and the associated PCDs, and computation of domination number and arc-density of the PCDs, and the computation of the large-sample tests based on these invariants.
For ease of exposition, we have grouped the package contents according their functionality and theme:
Because all of these groups contain many functions, we organize them into subgroups by purpose. Below, we display each group of functions in a table with one column per subgroup.
Utility | PCDs | Pattern Generation | S3 Methods |
---|---|---|---|
AuxDomination | ArcSliceFunctions | PatternGen | ClassFunctions |
AuxGeometry | PropEdge* |
||
AuxDelaunay C | entSim* |
||
AuxExtrema |
|
PropEdge* contains PropEdge1D, PropEdge2D, and PropEdge3D functions, whereas CentSim* contains CentSim1D and CentSim2D functions only.
ClassFunctions contain functions like summary
,
print.summary
, and plot
of the following
object classes: Lines
,TriLines
,
Lines3D
, Planes
, Patterns
,
Uniform
, Extrema
, and PCDs
. Among
these objects, Lines
, TriLines
,
Lines3D
, and Planes
facilitate visualization
of the various geometric structures, proximity regions, and the
corresponding digraphs, while Patterns
is used for spatial
point pattern generation (with Uniform
being a special
case), and PCDs
pertains to the actual PCDs (the number of
arcs, visualization of the digraph, and so on).
In all the pcds
functions, points are vectors, and data
sets are either matrices or data frames.
We illustrate PCDs in a two-class setting, extension to multi-class setting can be done in a pair-wise fashion or one-vs-rest fashion for the classes. For two classes, 𝒳 and 𝒴, of points, let 𝒳 be the class of interest (i.e. the target class) and 𝒴 be the reference class (i.e. the non-target class) and 𝒳n and 𝒴m be samples of size n and m from classes 𝒳 and 𝒴, respectively. The proximity map N(⋅) : Ω → 2Ω associates with each point x ∈ 𝒳, a proximity region N(x) ⊂ Ω. Consider the data-random (or vertex-random) proximity catch digraph (PCD) D = (V, A) with vertex set V = 𝒳 and arc set A defined by (u, v) ∈ A ⇔ {u, v} ⊂ 𝒳 and v ∈ N(u). The digraph D depends on the (joint) distribution of 𝒳 and on the map N(⋅). The adjective proximity — for the digraph D and for the map N(⋅) — comes from thinking of the region N(x) as representing those points in Ω “close” to x. The binary relation u ∼ v, which is defined as v ∈ N(u), is asymmetric, thus the adjacency of u and v is represented with directed edges or arcs which yield a digraph instead of a graph. See Chartrand, Lesniak, and Zhang (2010) and West (2001) for more on graphs and digraphs.
In the PCD approach the points correspond to observations from class 𝒳 and the proximity regions are defined to be (closed) regions (usually convex regions or simply triangles) based on class 𝒳 and 𝒴 points and the proximity region for a class 𝒳 point, x, gets larger as the distance between x and class 𝒴 points increases.
We briefly define three proximity map families. Let Ω = ℝd and let 𝒴m = {y1, y2, …, ym} be m points in general position in ℝd and Ti be the ith Delaunay cell for i = 1, 2, …, Jm, where Jm is the number of Delaunay cells based on 𝒴m. Let 𝒳n be a set of iid random variables from distribution F in ℝd with support 𝒮(F) ⊆ 𝒞H(𝒴m) where 𝒞H(𝒴m) stands for the convex hull of 𝒴m. For illustrative purposes, we focus on ℝ2 where a Delaunay tessellation is a triangulation, provided that no more than three 𝒴 points are cocircular (i.e., lie on the same circle). Furthermore, for simplicity, let 𝒴3 = {y1, y2, y3} be three non-collinear points in ℝ2 and T(𝒴3) = T(y1, y2, y3) be the triangle with vertices 𝒴3. Let 𝒳n be a set of iid random variables from F with support 𝒮(F) ⊆ T(𝒴3).
We define the arc-slice proximity region with M-vertex regions for a point x ∈ T(𝒴3) as follows; see also Figure @ref(fig:NAS-example). Using a center M of T(𝒴3), we partition T(𝒴3) into “vertex regions” RV(y1), RV(y2), and RV(y3). If M is the circumcenter of T(𝒴3), we use perpendicular line segments from M to the opposite edges to form the vertex regions. If M is not the circumcenter but is in the interior of T(𝒴3), we use line segments from M to the opposite edges as extensions of the lines joining the vertices and M to form the vertex regions. For x ∈ T(𝒴3) \ 𝒴3, let v(x) ∈ 𝒴3 be the vertex in whose region x falls, so x ∈ RV(v(x)). If x falls on the boundary of two vertex regions, we assign v(x) arbitrarily to one of the adjacent regions. The arc-slice proximity region is $N_{AS}(x):=\overline B(x,r(x)) \cap T(\mathcal{Y}_3)$ where $\overline B(x,r(x))$ is the closed ball centered at x with radius r(x) := d(x, v(x)). To make the dependence on M explicit, we also use the notationNAS(⋅, M). A natural choice for the radius is r(x) := miny ∈ 𝒴d(x, y) which implicitly uses the CC-vertex regions, since x ∈ RCC(y) iff y = arg minu ∈ 𝒴d(x, u). See Figure @ref(fig:NAS-example) for NAS(x, MCC) with x ∈ RCC(y2) and Ceyhan (2010) for more detail on AS proximity regions.
We define the proportional-edge proximity map with expansion parameter r ≥ 1 as follows; see also Figure @ref(fig:NPE-example). Using a center M of T(𝒴3), we partition T(𝒴3) into “M-vertex regions” as in Section @ref(sec:PE-PCD-construction). Let e(x) be the edge of T(𝒴3) opposite v(x), the vertex whose region contains x, ℓ(x) be the line parallel to e(x) through x, and d(v(x), ℓ(x)) be the Euclidean distance from v(x) to ℓ(x). For r ∈ [1, ∞), let ℓr(x) be the line parallel to e(x) such that d(v(x), ℓr(x)) = rd(v(x), ℓ(x)) and d(ℓ(x), ℓr(x)) < d(v(x), ℓr(x)). Let TPE(x, r) be the triangle similar to and with the same orientation as T(𝒴3) having v(x) as a vertex and ℓr(x) as the opposite edge. Then the proportional-edge proximity region NPE(x, r) is defined to be TPE(x, r) ∩ T(𝒴3). To make the dependence on M explicit, we also use the notationNPE(x, r, M). A natural choice for the center is the center of mass (CM) yielding the CM-vertex regions, which have the same area (equaling one-third of the area of T(𝒴3)). Notice that r ≥ 1 implies x ∈ NPE(x, r). Note also that limr → ∞NPE(x, r) = T(𝒴3) for all x ∈ T(𝒴3) \ 𝒴3, so we define NPE(x, ∞) = T(𝒴3) for all such x. For x ∈ 𝒴3, we define NPE(x, r) = {x} for all r ∈ [1, ∞]. See Ceyhan and Priebe (2005), Ceyhan, Priebe, and Wierman (2006), and Ceyhan (2014) for more detail.
We define the central similarity proximity map with expansion parameter τ > 0 as follows; see also Figure @ref(fig:NCS-example). Let ej be the edge opposite vertex yj for j = 1, 2, 3, and let “M-edge regions” RE(e1), RE(e2), RE(e3) partition T(𝒴3) using line segments from the center M in the interior of T(𝒴3) to the vertices. For x ∈ (T(𝒴3))o, let e(x) be the edge in whose region x falls; x ∈ RE(e(x)). If x falls on the boundary of two edge regions we assign e(x) arbitrarily. For τ > 0, the central similarity proximity region NCS(x, τ) is defined to be the triangle TCS(x, τ) ∩ T(𝒴3) with the following properties:
For τ ∈ (0, 1], the triangle TCS(x, τ) has an edge eτ(x) parallel to e(x) such that d(x, eτ(x)) = τ d(x, e(x)) and d(eτ(x), e(x)) ≤ d(x, e(x)) and for τ > 1, d(eτ(x), e(x)) < d(x, eτ(x)) where d(x, e(x)) is the Euclidean distance from x to e(x),
the triangle TCS(x, τ) has the same orientation as and is similar to T(𝒴3),
the point x is at the center of mass of TCS(x, τ).
Note that (i) implies that the expansion parameter is τ, (ii) implies “similarity”, and (iii) implies “central” in the name, (parameterized) central similarity proximity map. To make the dependence on M explicit, we also use the notationNCS(x, τ, M). A natural choice for the center is the center of mass (CM) yielding the CM-edge regions, which have the same area (equaling one-third of the area of T(𝒴3)). Notice that τ > 0 implies that x ∈ NCS(x, τ) and, by construction, we have NCS(x, τ) ⊆ T(𝒴3) for all x ∈ T(𝒴3). For x ∈ ∂(T(𝒴3)) and τ ∈ (0, ∞], we define NCS(x, τ) = {x}. For all x ∈ T(𝒴3)o the edges eτ(x) and e(x) are coincident iff τ = 1. Note also that limτ → ∞NCS(x, τ) = T(𝒴3) for all x ∈ (T(𝒴3))o, so we define NCS(x, ∞) = T(𝒴3) for all such x. The central similarity proximity maps in Ceyhan and Priebe (2003) and Ceyhan, Priebe, and Marchette (2007) are NCS(⋅, τ) with τ = 1 and τ ∈ (0, 1], respectively, and in Ceyhan (2014) with τ > 1.
The convex hull of the non-target class CH(𝒴m) can be partitioned into Delaunay cells through the Delaunay tessellation of 𝒴m ⊂ ℝd. The Delaunay tessellation becomes a triangulation in ℝ2 which partitions CH(𝒴m) into non-intersecting triangles. For 𝒴 points in general position, the triangles in the Delaunay triangulation satisfy the property that the circumcircle of a triangle contain no 𝒴 points except for the vertices of the triangle. In higher dimensions (i.e., d > 2), Delaunay cells are d-simplices (for example, a tetrahedron in ℝ3). Hence, the CH(𝒴m) is the union of a set of disjoint d-simplices {𝔖k}k = 1K where K is the number of d-simplices, or Delaunay cells. Each d-simplex has d + 1 non-co(hyper)planar vertices where none of the remaining points of 𝒴m are in the interior of the circumsphere of the simplex (except for the vertices of the simplex which are points from 𝒴m). Hence, simplices of the Delaunay tessellations are more likely to be acute (simplices will not have substantially large inner angles). Note that Delaunay tessellation is the dual of the Voronoi diagram of the points 𝒴m. A Voronoi diagram is a partitioning of ℝd into convex polytopes such that the points inside each polytope is closer to the point associated with the polytope than any other point in 𝒴. Hence, a polytope V(y) associated with a point y ∈ 𝒴m is defined as V(y) = {v ∈ ℝd : ∥v − y∥ ≤ ∥v − z∥ for all z ∈ 𝒴m \ {y}}.
Here, ∥⋅∥ stands for the usual Euclidean norm. Observe that the Voronoi diagram is unique for a fixed set of points 𝒴m. A Delaunay graph is constructed by joining the pairs of points in 𝒴m whose boundaries of Voronoi polytopes have nonempty intersections. The edges of the Delaunay graph constitute a partitioning of CH(𝒴m) providing the Delaunay tessellation. By the uniqueness of the Voronoi diagram, the Delaunay tessellation is also unique (except for cases where d + 1 or more points lying on the same hypersphere). Run the below code for an illustration of Delaunay triangulation of 20 uniform 𝒴 points in the unit square (0, 1) × (0, 1). More detail on Voronoi diagrams and Delaunay tessellations can be found in Okabe et al. (2000).
Arc Density: The arc density of a digraph D = (V, A) of order |V| = n, denoted ρ(D), is defined as $$ \rho(D) = \frac{|A|}{n(n-1)} $$ where |⋅| stands for set cardinality (Janson, Łuczak, and Ruciński (2000)). Thus ρ(D) represents the ratio of the number of arcs in the digraph D to the number of arcs in the complete symmetric digraph of order n, which has n(n − 1) arcs.
If $X_1,X_2,\ldots,X_n \stackrel{iid}{\sim} F$, then the relative density of the associated data-random PCD, denoted ρ(𝒳n; h, N), is a U-statistic, $$\rho(\mathcal{X}_n;h,N) = \frac{1}{n(n-1)} \underset{i<j}{\sum\sum}h(X_i,X_j;N) $$
wherewith I{⋅} being the indicator function. We denote h(Xi, Xj; N) as hij for brevity of notation. Since the digraph is asymmetric, hij is defined as the number of arcs in D between vertices Xi and Xj, in order to produce a symmetric kernel with finite variance (Lehmann (2004)).
See Ceyhan, Priebe, and Wierman (2006), Ceyhan, Priebe, and Marchette (2007), and Ceyhan (2014) for arc density of PE-PCDs and its use for spatial interaction for 2D data; and Ceyhan (2012) and Ceyhan (2016) for arc density of PE-PCDs for 1D data and its use for testing uniformity.
Domination Number: In a digraph D = (V, A), a vertex v ∈ V dominates itself and all vertices of the form {u : (v, u) ∈ A}. A dominating set SD for the digraph D is a subset of V such that each vertex v ∈ V is dominated by a vertex in SD. A minimum dominating set SD* is a dominating set of minimum cardinality and the domination number γ(D) is defined as γ(D) := |SD*| (see, e.g., Lee (1998)). If a minimum dominating set is of size one, we call it a dominating point. Note that for |V| = n > 0, 1 ≤ γ(D) ≤ n, since V itself is always a dominating set. See Ceyhan and Priebe (2005) and Ceyhan (2011) for the domination number and its use for testing spatial interaction patterns in 2D data and Ceyhan (2020) for testing uniformity of 1D data.
Geometry Invariance of Arc Density and Domination Number: Let 𝒰(T(𝒴3)) be the uniform distribution on T(𝒴3). If F = 𝒰(T(𝒴3)), a composition of translation, rotation, reflections, and scaling, denoted ϕb(T(𝒴3)), will take any given triangle T(𝒴3) to the standard basic triangle Tb = T((0, 0), (1, 0), (c1, c2)) with 0 < c1 ≤ 1/2, c2 > 0, and (1 − c1)2 + c22 ≤ 1, preserving uniformity. That is, if X ∼ 𝒰(T(𝒴3)) is transformed in the same manner to, say X′ = ϕb(X), then we have X′ ∼ 𝒰(Tb). In fact this will hold for data from any distribution F up to scale. Furthermore, Tb can be transformed to the standard equilateral triangle Te = T(A, B, C) with vertices A = (0, 0), B = (1, 0), and $C=(1/2,\sqrt{3}/2)$ by a transformation ϕe and ϕe(X′) ∼ 𝒰(Te). That is uniform points in any triangle can be mapped to points uniformly distributed in the standard equilateral triangle by a composition of ϕb and ϕe (in the form ϕe ∘ ϕb).
The distribution of the domination number and arc density for AS-PCDs do not change for uniform data in a triangle T(𝒴3) when the data points are transformed to (uniform) points in the standard basic triangle Tb (using ϕb) but not when ϕe is applied to the uniform data in Tb. So, one can focus on Tb for computations and derivations regarding the domination number and arc density of AS-PCD for uniform data. On the other hand, the distribution of the domination number and arc density for PE- and CS-PCDs do not change for uniform data in a triangle T(𝒴3) when the data points are transformed to (uniform) points in the standard equilateral triangle Te (using ϕe ∘ ϕb). That is, distribution of these graph quantities are geometry invariant for uniform data in triangles. So, one can focus on Te for computations and derivations regarding the PE- and CS-PCD quantities for uniform data. A similar geometry invariance holds in 3D setting for uniform data in any tetrahedron being transformed to the standard regular tetrahedron for PE- and CS-PCDs. Therefore, the pcds package has functions customized only for one simplex (i.e., one interval in ℝ, one triangle in ℝ2, and one tetrahedron in ℝ3). However, we don’t cover these functions here, as (i) they serve as utility functions in the more realistic case of multiple simplices (e.g., multiple triangles occur when there are m ≥ 4 𝒴 points) and (ii) they are mainly used for simulations or verifications or illustrations when 𝒳 points are restricted to one simplex. See the vignette files “VS2.1 - Illustration of PCDs in One Triangle”, “VS2.2 - Illustration of PCDs in One Interval”, and “VS2.3 - Illustration of PCDs in One Tetrahedron”.