First we load the pcds package:

library(pcds)

Introduction

pcds is an R package for proximity catch digraphs (PCDs). It provides functions for the construction and visualization of PCDs and for spatial pattern tests (for inference on spatial interaction between classes or species) based on two graph invariants of the PCDs: the domination number and the arc density. The package covers construction and visualization of three PCD families, namely arc-slice PCDs (AS-PCDs), proportional-edge PCDs (PE-PCDs), and central similarity PCDs (CS-PCDs), and spatial inference based on two of these families, namely PE- and CS-PCDs. Here, the spatial inference concerns testing class/species interaction for point pattern data, usually in two-dimensional space. The spatial interaction patterns of interest are segregation and association. Segregation is the pattern in which classes tend to repel each other in the sense that points tend to be clustered around points from the same class (forming same-class clusters), while association is the pattern in which points from one class tend to cluster around points from another class (forming mixed-class clusters).
One-dimensional versions of the PCD functions are also provided; in one dimension, the AS-PCD is a special case of PE-PCDs or CS-PCDs. The one-dimensional versions are currently used for testing uniformity of points rather than interaction between classes (although they can be used for that purpose as well). In this package, only the PE-PCD construction and visualization are extended to three dimensions.

The vignette files for the pcds package are written as sections or chapters of a larger main vignette file and are organized as “VSk_Title” where “k” is the section number and “Title” is the corresponding markdown file name, starting with “VS0_Intro”. We recommend reading the vignettes in this order for more efficient and informed use. They can be read in any order, but some tools or concepts may have been introduced in an earlier section (mostly with references).

The goal of these vignette files or sections is to facilitate graph abstraction of spatial point data and make it easier for users to adopt pcds by providing a comprehensive overview of the package’s contents and detailed examples and illustrations of certain functions. We begin with

an introduction and overview of PCDs,
followed by worked-out examples for an artificial 2D data set, a real-life 2D data set, and an artificial 1D data set, with discussions of how to work with pcds.

Then we illustrate

PCD construction in one Delaunay cell, which is an interval in 1D, a triangle in 2D, and a tetrahedron in 3D space,
data generation from various spatial point patterns in 2D space,
finding local extrema in Delaunay cells, and
finally some auxiliary functions in Euclidean geometry.

The discussion covers the structure of function arguments, the required input data formats, and the various output formats. The subsequent sections provide visualization of the proximity regions and the associated PCDs, computation of the domination number and arc density of the PCDs, and computation of the large-sample tests based on these invariants.

Package Contents

For ease of exposition, we have grouped the package contents according to their functionality and theme:

Utility functions
PCD functions
Pattern Generation Functions
S3 methods¹
Datasets

Because all of these groups contain many functions, we organize them into subgroups by purpose. Below, we display each group of functions in a table with one column per subgroup.

Package Contents: Grouping of the Functions
Utility	PCDs	Pattern Generation	S3 Methods
AuxDomination	ArcSliceFunctions	PatternGen	ClassFunctions
AuxGeometry	PropEdge*
AuxDelaunay	CentSim*
AuxExtrema

PropEdge* contains PropEdge1D, PropEdge2D, and PropEdge3D functions, whereas CentSim* contains CentSim1D and CentSim2D functions only.

ClassFunctions contains functions like summary, print.summary, and plot for the following object classes: Lines, TriLines, Lines3D, Planes, Patterns, Uniform, Extrema, and PCDs. Among these objects, Lines, TriLines, Lines3D, and Planes facilitate visualization of the various geometric structures, proximity regions, and the corresponding digraphs, while Patterns is used for spatial point pattern generation (with Uniform being a special case), and PCDs pertains to the actual PCDs (the number of arcs, visualization of the digraph, and so on).

In all the pcds functions, points are vectors, and data sets are either matrices or data frames.

Proximity Catch Digraphs

We illustrate PCDs in a two-class setting; extension to a multi-class setting can be done in a pairwise or one-vs-rest fashion for the classes. For two classes, 𝒳 and 𝒴, of points, let 𝒳 be the class of interest (i.e. the target class) and 𝒴 be the reference class (i.e. the non-target class), and let 𝒳_n and 𝒴_m be samples of size n and m from classes 𝒳 and 𝒴, respectively. The proximity map N(⋅) : Ω → 2^Ω associates with each point x ∈ 𝒳 a proximity region N(x) ⊂ Ω. Consider the data-random (or vertex-random) proximity catch digraph (PCD) D = (V, A) with vertex set V = 𝒳_n and arc set A defined by (u, v) ∈ A ⇔ {u, v} ⊂ 𝒳_n and v ∈ N(u). The digraph D depends on the (joint) distribution of 𝒳_n and on the map N(⋅). The adjective proximity (for the digraph D and for the map N(⋅)) comes from thinking of the region N(x) as representing those points in Ω “close” to x. The binary relation u ∼ v, which is defined as v ∈ N(u), is asymmetric; thus the adjacency of u and v is represented with directed edges or arcs, which yield a digraph instead of a graph. See Chartrand, Lesniak, and Zhang (2010) and West (2001) for more on graphs and digraphs.

In the PCD approach, the vertices correspond to observations from class 𝒳, and the proximity regions are defined to be (closed) regions (usually convex regions or simply triangles) based on class 𝒳 and 𝒴 points. The proximity region for a class 𝒳 point x gets larger as the distance between x and the class 𝒴 points increases.
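As a rough illustration of the arc-set definition, the following base R sketch builds the adjacency matrix of a PCD from a toy proximity map. The map used here, a closed ball around x whose radius is the distance from x to the nearest 𝒴 point, is an assumption chosen for brevity and is not one of the three families defined below; the names `X`, `Y`, `radius`, and `adj` are illustrative only and are not part of pcds.

```r
# Toy data: three X points (vertices) and one Y point (reference class)
X <- rbind(c(1, 0), c(1.8, 0), c(3, 0))
Y <- rbind(c(0, 0))

# Assumed toy proximity map: N(x) is the closed ball centered at x
# with radius r(x) = distance from x to the nearest Y point
radius <- apply(X, 1, function(x) min(sqrt(colSums((t(Y) - x)^2))))

# Pairwise distances between X points
D <- as.matrix(dist(X))

# Arc (u, v) is present iff v lies in N(u), i.e. d(u, v) <= r(u);
# row i of the comparison uses radius[i] because of column-major recycling
adj <- (D <= radius) * 1
diag(adj) <- 0  # no loops

adj
```

In this toy example the third point catches the first one but not conversely, so the relation is not symmetric and arcs rather than edges are needed.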

Proximity Map Families

We briefly define three proximity map families. Let Ω = ℝ^d and let 𝒴_m = {y₁, y₂, …, y_m} be m points in general position in ℝ^d and T_i be the i^th Delaunay cell for i = 1, 2, …, J_m, where J_m is the number of Delaunay cells based on 𝒴_m. Let 𝒳_n be a set of iid random variables from distribution F in ℝ^d with support 𝒮(F) ⊆ 𝒞_H(𝒴_m) where 𝒞_H(𝒴_m) stands for the convex hull of 𝒴_m. For illustrative purposes, we focus on ℝ² where a Delaunay tessellation is a triangulation, provided that no more than three 𝒴 points are cocircular (i.e., lie on the same circle). Furthermore, for simplicity, let 𝒴₃ = {y₁, y₂, y₃} be three non-collinear points in ℝ² and T(𝒴₃) = T(y₁, y₂, y₃) be the triangle with vertices 𝒴₃. Let 𝒳_n be a set of iid random variables from F with support 𝒮(F) ⊆ T(𝒴₃).

Arc-Slice Proximity Maps and Associated Proximity Regions

We define the arc-slice proximity region with M-vertex regions for a point x ∈ T(𝒴₃) as follows; see also Figure @ref(fig:NAS-example). Using a center M of T(𝒴₃), we partition T(𝒴₃) into “vertex regions” R_V(y₁), R_V(y₂), and R_V(y₃). If M is the circumcenter of T(𝒴₃), we use perpendicular line segments from M to the edges to form the vertex regions. If M is not the circumcenter but is in the interior of T(𝒴₃), we use line segments from M to the opposite edges as extensions of the lines joining the vertices and M to form the vertex regions. For x ∈ T(𝒴₃) \ 𝒴₃, let v(x) ∈ 𝒴₃ be the vertex in whose region x falls, so x ∈ R_V(v(x)). If x falls on the boundary of two vertex regions, we assign x arbitrarily to one of the adjacent regions. The arc-slice proximity region is $N_{AS}(x):=\overline B(x,r(x)) \cap T(\mathcal{Y}_3)$ where $\overline B(x,r(x))$ is the closed ball centered at x with radius r(x) := d(x, v(x)). To make the dependence on M explicit, we also use the notation N_AS(⋅, M). A natural choice for the radius is r(x) := min_{y ∈ 𝒴₃} d(x, y), which implicitly uses the CC-vertex regions, since x ∈ R_CC(y) iff y = arg min_{u ∈ 𝒴₃} d(x, u). See Figure @ref(fig:NAS-example) for N_AS(x, M_CC) with x ∈ R_CC(y₂) and Ceyhan (2010) for more detail on AS proximity regions.

(#fig:NAS-example) Illustration of the construction of arc-slice proximity region, N_AS(x, M_CC) with an x ∈ R_CC(y₂).
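The arc-slice construction with the “natural” radius can be sketched in base R as a membership test: a point z is in N_AS(x, M_CC) when it lies in the triangle and within distance r(x) of x. The helper names `barycentric` and `in_NAS` and the tolerance `tol` are hypothetical and are not functions of pcds; the triangle is assumed to be non-degenerate.

```r
# Barycentric coordinates of a point p with respect to a triangle
# given as a 3 x 2 matrix (one vertex per row)
barycentric <- function(p, tri) solve(rbind(1, t(tri)), c(1, p))

# Is z in the arc-slice region of x, using r(x) = min distance to a vertex?
in_NAS <- function(z, x, tri, tol = 1e-12) {
  in_triangle <- all(barycentric(z, tri) >= -tol)      # z in T(Y_3)
  r_x <- min(sqrt(rowSums(sweep(tri, 2, x)^2)))        # distance to closest vertex
  in_triangle && sqrt(sum((z - x)^2)) <= r_x + tol     # z in closed ball B(x, r(x))
}

tri <- rbind(c(0, 0), c(1, 0), c(0, 1))
x <- c(0.2, 0.2)

in_NAS(c(0.3, 0.3), x, tri)     # inside the ball and the triangle
in_NAS(c(0.45, 0.45), x, tri)   # inside the triangle, outside the ball
in_NAS(c(0.1, -0.05), x, tri)   # inside the ball, outside the triangle
```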

Proportional-Edge Proximity Maps and Associated Proximity Regions

We define the proportional-edge proximity map with expansion parameter r ≥ 1 as follows; see also Figure @ref(fig:NPE-example). Using a center M of T(𝒴₃), we partition T(𝒴₃) into “M-vertex regions” as in Section @ref(sec:PE-PCD-construction). Let e(x) be the edge of T(𝒴₃) opposite v(x), the vertex whose region contains x, let ℓ(x) be the line parallel to e(x) through x, and let d(v(x), ℓ(x)) be the Euclidean distance from v(x) to ℓ(x). For r ∈ [1, ∞), let ℓ_r(x) be the line parallel to e(x) such that d(v(x), ℓ_r(x)) = r d(v(x), ℓ(x)) and d(ℓ(x), ℓ_r(x)) < d(v(x), ℓ_r(x)). Let T_PE(x, r) be the triangle similar to and with the same orientation as T(𝒴₃) having v(x) as a vertex and ℓ_r(x) as the opposite edge. Then the proportional-edge proximity region N_PE(x, r) is defined to be T_PE(x, r) ∩ T(𝒴₃). To make the dependence on M explicit, we also use the notation N_PE(x, r, M). A natural choice for the center is the center of mass (CM), yielding the CM-vertex regions, which have the same area (equaling one-third of the area of T(𝒴₃)). Notice that r ≥ 1 implies x ∈ N_PE(x, r). Note also that lim_{r → ∞} N_PE(x, r) = T(𝒴₃) for all x ∈ T(𝒴₃) \ 𝒴₃, so we define N_PE(x, ∞) = T(𝒴₃) for all such x. For x ∈ 𝒴₃, we define N_PE(x, r) = {x} for all r ∈ [1, ∞]. See Ceyhan and Priebe (2005), Ceyhan, Priebe, and Wierman (2006), and Ceyhan (2014) for more detail.

(#fig:NPE-example) Illustration of the construction of proportional-edge proximity region, N_PE(x, r = 2) (shaded region) for an x ∈ R_V(y₁) where d₁ = d(v(x), ℓ(v(x), x)) and d₂ = d(v(x), ℓ₂(v(x), x)) = 2 d(v(x), ℓ(v(x), x)).
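With CM-vertex regions, the proportional-edge definition reduces to a simple test in barycentric coordinates, sketched below in base R. Two facts are used, both consequences of the construction above rather than statements from the package: x lies in the CM-vertex region of the vertex with the largest barycentric coordinate b_i(x), and the distance from that vertex to a line parallel to the opposite edge is proportional to 1 − b_i. Hence z ∈ N_PE(x, r) iff z ∈ T(𝒴₃) and b_i(z) ≥ 1 − r(1 − b_i(x)). The names `barycentric` and `in_NPE` are hypothetical, and only finite r is handled.

```r
# Barycentric coordinates of a point p with respect to a triangle
# given as a 3 x 2 matrix (one vertex per row)
barycentric <- function(p, tri) solve(rbind(1, t(tri)), c(1, p))

# Is z in N_PE(x, r) with CM-vertex regions? (sketch, finite r >= 1 only)
in_NPE <- function(z, x, tri, r = 1, tol = 1e-12) {
  stopifnot(is.finite(r), r >= 1)
  bx <- barycentric(x, tri)
  bz <- barycentric(z, tri)
  i <- which.max(bx)  # index of v(x); ties are broken by taking the first maximum
  all(bz >= -tol) && bz[i] >= 1 - r * (1 - bx[i]) - tol
}

tri <- rbind(c(0, 0), c(1, 0), c(0, 1))
x <- c(0.2, 0.2)   # barycentric coordinates (0.6, 0.2, 0.2), so v(x) = y_1

in_NPE(c(0.3, 0.3), x, tri, r = 2)     # b_1(z) = 0.4 >= 0.2
in_NPE(c(0.45, 0.45), x, tri, r = 2)   # b_1(z) = 0.1 <  0.2
```

For a vertex x the threshold equals 1, so the test accepts only x itself, in line with the convention N_PE(x, r) = {x} for x ∈ 𝒴₃.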

Central Similarity Proximity Maps and Associated Proximity Regions

We define the central similarity proximity map with expansion parameter τ > 0 as follows; see also Figure @ref(fig:NCS-example). Let e_j be the edge opposite vertex y_j for j = 1, 2, 3, and let “M-edge regions” R_E(e₁), R_E(e₂), R_E(e₃) partition T(𝒴₃) using line segments from the center M in the interior of T(𝒴₃) to the vertices. For x ∈ (T(𝒴₃))^o, let e(x) be the edge in whose region x falls; x ∈ R_E(e(x)). If x falls on the boundary of two edge regions we assign e(x) arbitrarily. For τ > 0, the central similarity proximity region N_CS(x, τ) is defined to be the triangle T_CS(x, τ) ∩ T(𝒴₃) with the following properties:

(i) For τ ∈ (0, 1], the triangle T_CS(x, τ) has an edge e_τ(x) parallel to e(x) such that d(x, e_τ(x)) = τ d(x, e(x)) and d(e_τ(x), e(x)) ≤ d(x, e(x)), and for τ > 1, d(e_τ(x), e(x)) < d(x, e_τ(x)), where d(x, e(x)) is the Euclidean distance from x to e(x),
(ii) the triangle T_CS(x, τ) has the same orientation as and is similar to T(𝒴₃),
(iii) the point x is at the center of mass of T_CS(x, τ).

Note that (i) implies that the expansion parameter is τ, (ii) implies “similarity”, and (iii) implies “central” in the name, (parameterized) central similarity proximity map. To make the dependence on M explicit, we also use the notation N_CS(x, τ, M). A natural choice for the center is the center of mass (CM), yielding the CM-edge regions, which have the same area (equaling one-third of the area of T(𝒴₃)). Notice that τ > 0 implies that x ∈ N_CS(x, τ) and, by construction, we have N_CS(x, τ) ⊆ T(𝒴₃) for all x ∈ T(𝒴₃). For x ∈ ∂(T(𝒴₃)) and τ ∈ (0, ∞], we define N_CS(x, τ) = {x}. For all x ∈ (T(𝒴₃))^o, the edges e_τ(x) and e(x) are coincident iff τ = 1. Note also that lim_{τ → ∞} N_CS(x, τ) = T(𝒴₃) for all x ∈ (T(𝒴₃))^o, so we define N_CS(x, ∞) = T(𝒴₃) for all such x. The central similarity proximity maps in Ceyhan and Priebe (2003) and Ceyhan, Priebe, and Marchette (2007) are N_CS(⋅, τ) with τ = 1 and τ ∈ (0, 1], respectively, and in Ceyhan (2014) with τ > 1.

(#fig:NCS-example) Illustration of the construction of central similarity proximity region, N_CS(x, τ = 1/2) (shaded region) for an x ∈ R_E(e₃) where $h_2=d(x,e_3^\tau(x))=\frac{1}{2}\,d(x,e(x))$ and h₁ = d(x, e(x)).
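With CM-edge regions, the central similarity definition also admits a short barycentric sketch in base R. It rests on a derivation from the properties of T_CS(x, τ) listed above, not on the package's own code: x lies in the CM-edge region of the edge whose opposite vertex has the smallest barycentric coordinate, and a triangle similar to T(𝒴₃) with center of mass x is offset equally in all three barycentric directions, which gives z ∈ N_CS(x, τ) iff z ∈ T(𝒴₃) and b_j(z) ≥ b_j(x) − τ min_k b_k(x) for j = 1, 2, 3. The names `barycentric` and `in_NCS` are hypothetical, and only finite τ is handled.

```r
# Barycentric coordinates of a point p with respect to a triangle
# given as a 3 x 2 matrix (one vertex per row)
barycentric <- function(p, tri) solve(rbind(1, t(tri)), c(1, p))

# Is z in N_CS(x, tau) with CM-edge regions? (sketch, finite tau > 0 only)
in_NCS <- function(z, x, tri, tau = 1, tol = 1e-12) {
  stopifnot(is.finite(tau), tau > 0)
  bx <- barycentric(x, tri)
  bz <- barycentric(z, tri)
  # min(bx) is proportional to d(x, e(x)); the same offset applies to all three edges
  all(bz >= -tol) && all(bz >= bx - tau * min(bx) - tol)
}

tri <- rbind(c(0, 0), c(1, 0), c(0, 1))
x <- c(0.5, 0.2)   # barycentric coordinates (0.3, 0.5, 0.2), so e(x) = e_3

in_NCS(c(0.5, 0.15), x, tri, tau = 0.5)   # within all three offsets
in_NCS(c(0.5, 0.05), x, tri, tau = 0.5)   # too close to e_3: b_3(z) = 0.05 < 0.1
```

For x on the boundary the smallest coordinate is zero, so the test accepts only x itself, matching the convention N_CS(x, τ) = {x} for x ∈ ∂(T(𝒴₃)).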

Delaunay Tessellation

The convex hull of the non-target class 𝒞_H(𝒴_m) can be partitioned into Delaunay cells through the Delaunay tessellation of 𝒴_m ⊂ ℝ^d. The Delaunay tessellation becomes a triangulation in ℝ², which partitions 𝒞_H(𝒴_m) into non-intersecting triangles. For 𝒴 points in general position, the triangles in the Delaunay triangulation satisfy the property that the circumcircle of a triangle contains no 𝒴 points except for the vertices of the triangle. In higher dimensions (i.e., d > 2), Delaunay cells are d-simplices (for example, a tetrahedron in ℝ³). Hence, 𝒞_H(𝒴_m) is the union of a set of disjoint d-simplices {𝔖_k}_{k=1}^K where K is the number of d-simplices, or Delaunay cells. Each d-simplex has d + 1 non-co(hyper)planar vertices, and none of the remaining points of 𝒴_m are in the interior of the circumsphere of the simplex (the vertices of the simplex, which are points from 𝒴_m, lie on the circumsphere). Hence, simplices of the Delaunay tessellations are more likely to be acute (simplices will not have substantially large inner angles). Note that the Delaunay tessellation is the dual of the Voronoi diagram of the points 𝒴_m. A Voronoi diagram is a partitioning of ℝ^d into convex polytopes such that the points inside each polytope are closer to the point associated with the polytope than to any other point in 𝒴_m. Hence, a polytope V(y) associated with a point y ∈ 𝒴_m is defined as V(y) = {v ∈ ℝ^d : ∥v − y∥ ≤ ∥v − z∥ for all z ∈ 𝒴_m \ {y}}.

Here, ∥⋅∥ stands for the usual Euclidean norm. Observe that the Voronoi diagram is unique for a fixed set of points 𝒴_m. A Delaunay graph is constructed by joining the pairs of points in 𝒴_m whose Voronoi polytopes have boundaries with nonempty intersection. The edges of the Delaunay graph constitute a partitioning of 𝒞_H(𝒴_m), providing the Delaunay tessellation. By the uniqueness of the Voronoi diagram, the Delaunay tessellation is also unique (except for cases where d + 2 or more points lie on the same hypersphere). Run the code below for an illustration of the Delaunay triangulation of 20 uniform 𝒴 points in the unit square (0, 1) × (0, 1). More detail on Voronoi diagrams and Delaunay tessellations can be found in Okabe et al. (2000).

ny <- 20  # number of Y points

set.seed(1)
# Y points: uniform in the unit square
Yp <- cbind(runif(ny), runif(ny))

# no X sample is generated here, so Yp is passed for both point arguments
plotDelaunay.tri(Yp, Yp, xlab = "", ylab = "", main = "Delaunay Triangulation of Y points")
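The empty-circumcircle property described above can be checked directly with a brute-force base R sketch: a triple of 𝒴 points forms a Delaunay triangle when no other 𝒴 point lies strictly inside its circumcircle. This is only an illustration for a handful of points in general position (no three collinear) and is not how triangulations are computed in practice; the names `circumcircle` and `is_delaunay` are hypothetical and not part of pcds.

```r
# Circumcenter and circumradius of a triangle (3 x 2 matrix, one vertex per row)
circumcircle <- function(tri) {
  A <- tri[1, ]; B <- tri[2, ]; C <- tri[3, ]
  # The center is equidistant from A, B, and C, which gives a 2 x 2 linear system
  center <- solve(2 * rbind(B - A, C - A),
                  c(sum(B^2) - sum(A^2), sum(C^2) - sum(A^2)))
  list(center = center, radius = sqrt(sum((A - center)^2)))
}

# TRUE if no point of Y other than the triple lies strictly inside the circumcircle
is_delaunay <- function(idx, Y, tol = 1e-12) {
  cc <- circumcircle(Y[idx, ])
  others <- Y[-idx, , drop = FALSE]
  d <- sqrt(rowSums(sweep(others, 2, cc$center)^2))
  all(d >= cc$radius - tol)
}

Y <- rbind(c(0, 0), c(2, 0), c(0, 2), c(3, 3))
triples <- utils::combn(nrow(Y), 3)             # all candidate triangles
keep <- apply(triples, 2, is_delaunay, Y = Y)   # empty-circumcircle test
delaunay_triangles <- t(triples[, keep, drop = FALSE])
delaunay_triangles
```

For these four points, two of the four candidate triples pass the test, and together they cover the convex hull.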

Graph Invariants: Arc Density and Domination Number

Arc Density: The arc density of a digraph D = (V, A) of order |V| = n, denoted ρ(D), is defined as $$ \rho(D) = \frac{|A|}{n(n-1)} $$ where |⋅| stands for set cardinality (Janson, Łuczak, and Ruciński (2000)). Thus ρ(D) represents the ratio of the number of arcs in the digraph D to the number of arcs in the complete symmetric digraph of order n, which has n(n − 1) arcs.

If $X_1,X_2,\ldots,X_n \stackrel{iid}{\sim} F$, then the relative density of the associated data-random PCD, denoted ρ(𝒳_n; h, N), is a U-statistic, $$\rho(\mathcal{X}_n;h,N) = \frac{1}{n(n-1)} \underset{i<j}{\sum\sum}h(X_i,X_j;N) $$

where

$$ h(X_i,X_j;N) = I\{(X_i,X_j) \in A\} + I\{(X_j,X_i) \in A\} = I\{X_j \in N(X_i)\} + I\{X_i \in N(X_j)\} $$

with I{⋅} being the indicator function. We denote h(X_i, X_j; N) as h_ij for brevity of notation. Since the digraph is asymmetric, h_ij is defined as the number of arcs in D between vertices X_i and X_j, in order to produce a symmetric kernel with finite variance (Lehmann (2004)).

See Ceyhan, Priebe, and Wierman (2006), Ceyhan, Priebe, and Marchette (2007), and Ceyhan (2014) for arc density of PE-PCDs and its use for spatial interaction for 2D data; and Ceyhan (2012) and Ceyhan (2016) for arc density of PE-PCDs for 1D data and its use for testing uniformity.
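The two expressions for the arc density, the ratio |A| / (n(n − 1)) and the sum of the symmetric kernel h_ij over pairs i < j, can be compared on a small digraph with base R. The adjacency matrix below is made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical digraph on 4 vertices with arcs 1->2, 2->1, 1->3, 4->3
adj <- matrix(0, 4, 4)
adj[rbind(c(1, 2), c(2, 1), c(1, 3), c(4, 3))] <- 1
n <- nrow(adj)

# Direct definition: number of arcs over the n(n - 1) possible arcs
rho_direct <- sum(adj) / (n * (n - 1))

# U-statistic form: h_ij counts the arcs between i and j (0, 1, or 2),
# summed over pairs i < j
h <- adj + t(adj)
rho_ustat <- sum(h[upper.tri(h)]) / (n * (n - 1))

c(rho_direct, rho_ustat)
```

Both values agree because each arc is counted exactly once in the kernel sum.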

Domination Number: In a digraph D = (V, A), a vertex v ∈ V dominates itself and all vertices of the form {u : (v, u) ∈ A}. A dominating set S_D for the digraph D is a subset of V such that each vertex v ∈ V is dominated by a vertex in S_D. A minimum dominating set S_D^* is a dominating set of minimum cardinality and the domination number γ(D) is defined as γ(D) := |S_D^*| (see, e.g., Lee (1998)). If a minimum dominating set is of size one, we call it a dominating point. Note that for |V| = n > 0, 1 ≤ γ(D) ≤ n, since V itself is always a dominating set. See Ceyhan and Priebe (2005) and Ceyhan (2011) for the domination number and its use for testing spatial interaction patterns in 2D data and Ceyhan (2020) for testing uniformity of 1D data.
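For a small digraph, the domination number can be found by exhaustive search, as in the base R sketch below. The search checks vertex subsets in increasing size and returns the first size at which every vertex is dominated. Its cost grows exponentially with n, so it is meant only to make the definition concrete; the function name `domination_number` and the example digraph are hypothetical.

```r
# Brute-force domination number of a digraph given by its adjacency matrix
# (adj[i, j] > 0 means there is an arc from i to j); small n only
domination_number <- function(adj) {
  n <- nrow(adj)
  # closed[i, j] is TRUE if vertex i dominates vertex j (itself or via an arc)
  closed <- (adj > 0) | (diag(n) > 0)
  for (k in seq_len(n)) {
    subsets <- utils::combn(n, k)
    for (j in seq_len(ncol(subsets))) {
      S <- subsets[, j]
      # S is a dominating set if every vertex is dominated by some member of S
      if (all(colSums(closed[S, , drop = FALSE]) > 0)) return(k)
    }
  }
  n
}

# Hypothetical digraph on 4 vertices with arcs 1->2, 2->1, 1->3, 4->3
adj <- matrix(0, 4, 4)
adj[rbind(c(1, 2), c(2, 1), c(1, 3), c(4, 3))] <- 1

domination_number(adj)
```

Here no single vertex dominates vertex 4 except vertex 4 itself, and vertex 4 does not dominate vertices 1 and 2, so at least two vertices are needed; the set {1, 4} suffices.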

Geometry Invariance of Arc Density and Domination Number: Let 𝒰(T(𝒴₃)) be the uniform distribution on T(𝒴₃). If F = 𝒰(T(𝒴₃)), a composition of translation, rotation, reflection, and scaling, denoted ϕ_b, will take any given triangle T(𝒴₃) to the standard basic triangle T_b = T((0, 0), (1, 0), (c₁, c₂)) with 0 < c₁ ≤ 1/2, c₂ > 0, and (1 − c₁)² + c₂² ≤ 1, preserving uniformity. That is, if X ∼ 𝒰(T(𝒴₃)) is transformed in the same manner to, say, X′ = ϕ_b(X), then we have X′ ∼ 𝒰(T_b). In fact, this will hold for data from any distribution F up to scale. Furthermore, T_b can be transformed to the standard equilateral triangle T_e = T(A, B, C) with vertices A = (0, 0), B = (1, 0), and $C=(1/2,\sqrt{3}/2)$ by a transformation ϕ_e, and ϕ_e(X′) ∼ 𝒰(T_e). That is, uniform points in any triangle can be mapped to points uniformly distributed in the standard equilateral triangle by a composition of ϕ_b and ϕ_e (in the form ϕ_e ∘ ϕ_b).
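Because ϕ_b is a similarity transformation, the vertex (c₁, c₂) of the standard basic triangle depends only on the side lengths of the original triangle. The base R sketch below uses one way to obtain it, which is an assumption of this example rather than the package's routine: scale the longest side to length 1, place it on the x-axis from (0, 0) to (1, 0), and put the endpoint of the shortest side at the origin. The function name `basic_triangle` is hypothetical.

```r
# Standard basic triangle T_b = T((0,0), (1,0), (c1, c2)) for a triangle
# given as a 3 x 2 matrix (one vertex per row); uses side lengths only
basic_triangle <- function(tri) {
  s <- sort(as.vector(dist(tri)))  # side lengths in ascending order
  d0 <- s[1] / s[3]                # distance from (c1, c2) to (0, 0)
  d1 <- s[2] / s[3]                # distance from (c1, c2) to (1, 0)
  c1 <- (1 + d0^2 - d1^2) / 2      # from d0^2 = c1^2 + c2^2 and d1^2 = (1 - c1)^2 + c2^2
  c2 <- sqrt(d0^2 - c1^2)
  rbind(c(0, 0), c(1, 0), c(c1, c2))
}

# A 3-4-5 right triangle
Tb <- basic_triangle(rbind(c(0, 0), c(3, 0), c(0, 4)))
Tb
```

Since d0 ≤ d1 ≤ 1 by construction, the resulting vertex satisfies 0 < c₁ ≤ 1/2, c₂ > 0, and (1 − c₁)² + c₂² ≤ 1 for any non-degenerate triangle.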

The distributions of the domination number and arc density for AS-PCDs do not change for uniform data in a triangle T(𝒴₃) when the data points are transformed to (uniform) points in the standard basic triangle T_b (using ϕ_b), but they do change when ϕ_e is applied to the uniform data in T_b. So, one can focus on T_b for computations and derivations regarding the domination number and arc density of AS-PCDs for uniform data. On the other hand, the distributions of the domination number and arc density for PE- and CS-PCDs do not change for uniform data in a triangle T(𝒴₃) when the data points are transformed to (uniform) points in the standard equilateral triangle T_e (using ϕ_e ∘ ϕ_b). That is, the distributions of these graph quantities are geometry invariant for uniform data in triangles. So, one can focus on T_e for computations and derivations regarding the PE- and CS-PCD quantities for uniform data. A similar geometry invariance holds in the 3D setting for PE- and CS-PCDs, with uniform data in any tetrahedron being transformed to the standard regular tetrahedron. Therefore, the pcds package has functions customized for only one simplex (i.e., one interval in ℝ, one triangle in ℝ², and one tetrahedron in ℝ³). However, we do not cover these functions here, as (i) they serve as utility functions in the more realistic case of multiple simplices (e.g., multiple triangles occur when there are m ≥ 4 𝒴 points) and (ii) they are mainly used for simulations, verifications, or illustrations when 𝒳 points are restricted to one simplex. See the vignette files “VS2.1 - Illustration of PCDs in One Triangle”, “VS2.2 - Illustration of PCDs in One Interval”, and “VS2.3 - Illustration of PCDs in One Tetrahedron”.

References

Ceyhan, E. 2010. “Extension of One-Dimensional Proximity Regions to Higher Dimensions.” Computational Geometry: Theory and Applications 43(9): 721–48.

———. 2011. “Spatial Clustering Tests Based on Domination Number of a New Random Digraph Family.” Communications in Statistics - Theory and Methods 40(8): 1363–95.

———. 2012. “The Distribution of the Relative Arc Density of a Family of Interval Catch Digraph Based on Uniform Data.” Metrika 75(6): 761–93.

———. 2014. “Comparison of Relative Density of Two Random Geometric Digraph Families in Testing Spatial Clustering.” TEST 23(1): 100–134.

———. 2016. “Density of a Random Interval Catch Digraph Family and Its Use for Testing Uniformity.” REVSTAT 14(4): 349–94.

———. 2020. “Domination Number of an Interval Catch Digraph Family and Its Use for Testing Uniformity.” Statistics 54(2): 310–39.

Ceyhan, E., and C. E. Priebe. 2003. “Central Similarity Proximity Maps in Delaunay Tessellations.” In Proceedings of the Joint Statistical Meeting, Statistical Computing Section, American Statistical Association.

———. 2005. “The Use of Domination Number of a Random Proximity Catch Digraph for Testing Spatial Patterns of Segregation and Association.” Statistics & Probability Letters 73(1): 37–50.

Ceyhan, E., C. E. Priebe, and D. J. Marchette. 2007. “A New Family of Random Graphs for Testing Spatial Segregation.” Canadian Journal of Statistics 35(1): 27–50.

Ceyhan, E., C. E. Priebe, and J. C. Wierman. 2006. “Relative Density of the Random r-Factor Proximity Catch Digraphs for Testing Spatial Patterns of Segregation and Association.” Computational Statistics & Data Analysis 50(8): 1925–64.

Chartrand, G., L. Lesniak, and P. Zhang. 2010. Graphs & Digraphs. 5th Edition. Chapman & Hall/CRC, Boca Raton, Florida.

Janson, S., T. Łuczak, and A. Ruciński. 2000. Random Graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, Inc., New York.

Lee, C. 1998. “Domination in Digraphs.” Journal of Korean Mathematical Society 4: 843–53.

Lehmann, E. L. 2004. Elements of Large Sample Theory. Springer, New York.

Okabe, A., B. Boots, K. Sugihara, and S. N. Chiu. 2000. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Wiley, New York.

West, D. B. 2001. Introduction to Graph Theory, 2nd Edition. Prentice Hall, NJ.

¹ For example, calling the generic function summary(x) on an object x with class PCDs actually dispatches the method summary.PCDs on x, which is equivalent to calling summary.PCDs(x). For a nice introduction, see Advanced R by Hadley Wickham, available online.

VS0 - Introduction to pcds