--- title: "VS2.2 - Illustration of PCDs in One Interval" author: "Elvan Ceyhan" date: '`r Sys.Date()` ' output: bookdown::html_document2: base_format: rmarkdown::html_vignette bibliography: References.bib vignette: > %\VignetteIndexEntry{VS2.2 - Illustration of PCDs in One Interval} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE,comment = "#>",fig.width=6, fig.height=4, fig.align = "center") ``` \newcommand{\X}{\mathcal{X}} \newcommand{\Y}{\mathcal{Y}} First we load the `pcds` package: ```{r setup, message=FALSE, results='hide'} library(pcds) ``` # Functions for PCDs with Vertices in One Interval For illustration of construction of PCDs and related quantities, we first focus on one Delaunay cell, which is an interval in the 1D setting. Due to *scale invariance* of PE- and CS-PCDs with vertices from uniform distribution in a bounded interval in $\mathbb R$, most data generation and computations can be done in the unit interval $[0,1]$. Here, we do not treat AS-PCDs separately, as it is a special case of PE-PCDs with expansion parameter $r=2$. Also, geometry invariance and scale invariance for uniform data on bounded intervals in 1D space are equivalent. Most of the PCD functions we will illustrate in this section are counterparts of the functions in the multiple-interval case in Section "VS1_3_1DArtiData" for an arbitrary interval and sometimes for the unit interval $(0,1)$ (for speed and ease of computation). For more detail on the construction of PCDs in the 1D setting, see @ceyhan:metrika-2012, @ceyhan:revstat-2016 and @ceyhan:stat-2020. ```{r } c<-.4 a<-0; b<-10; int<-c(a,b) n<-5 #try also n=10, 50, 100 ``` We first choose an arbitrary interval $[a,b]$ with vertices $a=$ `r a` and $b=$ `r b`, and generate $n=$ `r n` $\X$ points inside this interval using the function `runif` in base `R`, and to construct the vertex regions choose the centrality parameter $c=$ `r c` which corresponds to $M_c=$ `r a+c*(b-a)` in this interval. $\X$ points are denoted as `Xp` and $\Y$ points in Section "VS1_3_1DArtiData" correspond to the vertices $\{a,b\}$ (i.e., end points of the interval $[a,b]$. ```{r } xf<-(int[2]-int[1])*.1 set.seed(123) Xp<-runif(n,a-xf,b+xf) ``` Plot of the interval $[a,b]$ and the $\X$ points in it. ```{r oneint, fig.cap="Scatterplot of the uniform $X$ points in the interval $(0,10)$."} Xp2 =c(Xp,int) Xlim<-range(Xp2) Ylim<-.005*c(-1,1) xd<-Xlim[2]-Xlim[1] plot(Xp2,rep(0,n+2),xlab="x", ylab=" ",xlim=Xlim+xd*c(-.05,.05), yaxt='n', ylim=Ylim,pch=".",cex=3, main="X Points and One Interval (based on Y points)") abline(h=0,lty=2) #now, we add the intervals based on Y points par(new=TRUE) plotIntervals(Xp,int,xlab="",ylab="",main="") ``` Alternatively, we can use the `plotIntervals` function in `pcds` to obtain the same plot by executing `plotIntervals(Xp,int,xlab="",ylab="")` command. ## Functions for Proportional Edge PCDs with Vertices in One Interval We use the same interval `int` and centrality parameter `c=` `r c` and the generated data above. ```{r include=F} r<-1.5 ``` And we choose the expansion parameter $r=$ `r r` to illustrate the PE proximity region construction. The function `NPEint` is used for the construction of PE proximity regions taking the arguments `x,int,r,c` where - `x`, a 1D point for which PE proximity region is constructed, - `r`, a positive real number which serves as the expansion parameter in PE proximity region; must be $\ge 1$, - `c`, a positive real number in $(0,1)$ parameterizing the center inside `int`$=(a,b)$ with the default `c=.5`, For the interval, `int`$=(a,b)$, the parameterized center is $M_c=a+c(b-a)$. - `int`, a vector of two real numbers representing an interval. `NPEint` returns the proximity region as the end points of the interval proximity region. ```{r eval=F } r<-1.5 NPEint(7,int,r,c) #> [1] 5.5 10.0 NPEint(Xp[1],int,r,c) #> [1] 0.000000 3.676395 ``` Indicator for the presence of an arc from a (data or $\X$) point to another for PE-PCDs is the function `IarcPEint` which takes arguments `p1,p2,int,r,c` where - `p1` a 1D point whose PE proximity region is constructed, - `p2` another 1D point. The function determines whether `p2` is inside the PE proximity region of `p1` or not, and - `int,r,c` are as in `NPEint`. This function returns $I(x_2 \in N_{PE}(x_1,r,c))$ for $x_2$, that is, returns 1 if $x_2$ in $N_{PE}(x_1,r,c)$, 0 otherwise. One can use it for points in the data set or for arbitrary points (as if they were in the data set). ```{r eval=F} IarcPEint(7,7,int,r,c) #> [1] 1 IarcPEint(Xp[1],Xp[2],int,r,c) #> [1] 0 ``` Number of arcs of the PE-PCD can be computed by the function `num.arcsPEint` which is an object of class "`NumArcs`" and takes arguments `Xp,int,r,c` where - `Xp` is a set of 1D points which constitute the vertices of PE-PCD - `int,r,c` are as in `NPEint`. The output is as in `num.arcsPE1D`. ```{r eval=F} Narcs = num.arcsPEint(Xp,int,r,c) summary(Narcs) #> Call: #> num.arcsPEint(Xp = Xp, int = int, r = r, c = c) #> #> Description of the output: #> Number of Arcs of the CS-PCD with vertices Xp and Quantities Related to the Support Interval #> #> Number of data (Xp) points in the range of Yp (nontarget) points = 4 #> Number of data points in the partition intervals based on Yp points = 0 4 1 #> Number of arcs in the entire digraph = 2 #> Numbers of arcs in the induced subdigraphs in the partition intervals = 0 2 0 #> #> End points of the support interval: #> 0 10 #> Indices of data points in the intervals: #> left end interval: NA #> middle interval: 1 2 3 4 #> right end interval: 5 #> #plot(Narcs) ``` The incidence matrix of the PE-PCD for the one interval case can be found by `inci.matPEint`, using the `inci.matPEint(Xp,int,r,c)` command. It has the same arguments as `num.arcsPEint` function. Plot of the arcs in the digraph PE-PCD with vertices in or outside one interval is provided by the function `plotPEarcs.int` which has the arguments `Xp,int,r,c=.5,Jit=.1,main=NULL,xlab=NULL,ylab=NULL,xlim=NULL,ylim=NULL,center=FALSE, ...` where `int` is a vector of two 1D points constituting the end points of the interval and the other arguments are as in `plotPEarcs1D`. The digraph is based on the $M_c$-vertex regions with $M_c=$ `r a+c*(b-a)`. Arcs are jittered along the $y$-axis to avoid clutter on the real line for better visualization. Plot of the arcs of the above PE-PCD, together with the center (with the option `center=TRUE`) is provided below. ```{r 1dPEarcs2, fig.cap="The arcs of the PE-PCD for a 1D data set, the end points of the interval (red) and the center (green) are plotted with vertical dashed lines."} jit<-.1 set.seed(1) plotPEarcs.int(Xp,int,r=1.5,c=.3,jit,xlab="",ylab="",center=TRUE) ``` Plots of the PE proximity regions (or intervals) in the one-interval case is provided with the function `plotPEarcs.int`. The proximity regions are plotted with the center with `center=TRUE` option. ```{r 1dPEpr2, fig.cap="The PE proximity regions for 10 $X$ points on the real line, the end points of the interval (black) and the center (green) are plotted with vertical dashed lines."} set.seed(1) plotPEregs.int(Xp,int,r,c,xlab="x",ylab="",center = TRUE) ``` The function `arcsPEint` is an object of class "`PCDs`" with arguments are as in `num.arcsPEint` and the output list is as in the function `arcsPE1D`. The `plot` function returns the same plot as in `plotPEarcs.int`, hence we comment it out below. ```{r PEarcs1i, eval=F, fig.cap="Arcs of the PE-PCD for $X$ points in the interval $(0,10)$. Arcs are jittered along the $y$-axis for better visualization."} Arcs<-arcsPEint(Xp,int,r,c) Arcs #> Call: #> arcsPEint(Xp = Xp, int = int, r = r, c = c) #> #> Type: #> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 1D Points with Expansion Parameter r = 1.5 and Centrality Parameter c = 0.4" summary(Arcs) #> Call: #> arcsPEint(Xp = Xp, int = int, r = r, c = c) #> #> Type of the digraph: #> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 1D Points with Expansion Parameter r = 1.5 and Centrality Parameter c = 0.4" #> #> Vertices of the digraph = Xp #> Partition points of the region = int #> #> Selected tail (or source) points of the arcs in the digraph #> (first 6 or fewer are printed) #> [1] 8.459662 3.907723 #> #> Selected head (or end) points of the arcs in the digraph #> (first 6 or fewer are printed) #> [1] 9.596209 2.450930 #> #> Parameters of the digraph #> $`centrality parameter` #> [1] 0.4 #> #> $`expansion parameter` #> [1] 1.5 #> #> Various quantities of the digraph #> number of vertices number of partition points #> 5.0 2.0 #> number of intervals number of arcs #> 1.0 2.0 #> arc density #> 0.1 plot(Arcs) ``` ## Functions for Construction and Simulation of Central Similarity PCDs with Vertices in One Interval ```{r include=F } tau<-1.5 ``` We use the same interval `int` and centrality parameter `c=.4` and the generated data above, and choose the expansion parameter $\tau=$ `r tau` to illustrate the CS proximity region construction. The function `NCSint` is used for the construction of CS proximity regions taking the same arguments as `NPEint` with expansion parameter `r` being replace by `t` (which must be positive). In fact, most functions for CS PCDs in the 1D setting admit the same arguments as their counterparts for PE PCDs, with expansion parameter `r` being replace by `t`. `NCSint` returns the proximity region as the end points of the interval proximity region. ```{r eval=F} tau<-1.5 NCSint(Xp[3],int,tau,c) #> [1] 0 10 ``` Indicator for the presence of an arc from a (data or $\X$) point to another for CS-PCDs is the function `IarcCSint` which takes the same arguments as `IarcPEint` with expansion parameter `r` replaced with `t`. This function returns $I(x_2 \in N_{PE}(x_1,r,c))$ for $x_2$, that is, returns 1 if $x_2$ in $N_{PE}(x_1,r,c)$, 0 otherwise. One can use it for points in the data set or for arbitrary points (as if they were in the data set). ```{r eval=F} IarcCSint(Xp[1],Xp[2],int,tau,c) #try also IarcCSint(Xp[2],Xp[1],int,tau,c) #> [1] 0 ``` Number of arcs of the CS-PCD can be computed by the function `num.arcsCSint` which is an object of class "`NumArcs`" and takes the same arguments as `num.arcsPEint` with expansion parameter `r` replaced with `t`. The output of this function is as in `num.arcsCS1D`. ```{r eval=F} Narcs = num.arcsCSint(Xp,int,tau,c) summary(Narcs) #> Call: #> num.arcsCSint(Xp = Xp, int = int, t = tau, c = c) #> #> Description of the output: #> Number of Arcs of the CS-PCD with vertices Xp and Quantities Related to the Support Interval #> #> Number of data (Xp) points in the range of Yp (nontarget) points = 4 #> Number of data points in the partition intervals based on Yp points = 0 4 1 #> Number of arcs in the entire digraph = 5 #> Numbers of arcs in the induced subdigraphs in the partition intervals = 0 5 0 #> #> End points of the support interval: #> 0 10 #> Indices of data points in the intervals: #> left end interval: NA #> middle interval: 1 2 3 4 #> right end interval: 5 #> #plot(Narcs) ``` The incidence matrix of the CS-PCD for the one interval case can be found by `inci.matCSint`, using the `inci.matCSint(Xp,int,tau,c)` command. It has the same arguments as `num.arcsCSint` function. Plot of the arcs in the digraph CS-PCD with vertices in or outside one interval is provided by the function `plotCSarcs.int` which has the same arguments as `plotPEarcs.int` with expansion parameter `r` replaced with `t`. The digraph is based on the $M_c$-vertex regions with $M_c$=`r a+c*(b-a)`. Arcs are jittered along the $y$-axis to avoid clutter on the real line for better visualization. Plot of the arcs of the above CS-PCD, together with the center (with the option `center=TRUE`) is provided below. ```{r 1dCSarcs2, fig.cap="The arcs of the CS-PCD for a 1D data set, the end points of the interval (red) and the center (green) are plotted with vertical dashed lines."} set.seed(1) plotCSarcs.int(Xp,int,t=1.5,c=.3,jit,xlab="",ylab="",center=TRUE) ``` Plots of the CS proximity regions (or intervals) in the one-interval case is provided with the function `plotCSarcs.int`. The proximity regions are plotted with the center with `center=TRUE` option. ```{r 1dCSpr2, fig.cap="The CS proximity regions for 10 $X$ points on the real line, the end points of the interval (black) and the center (green) are plotted with vertical dashed lines."} set.seed(1) plotCSregs.int(Xp,int,tau,c,xlab="x",ylab="",center=TRUE) ``` The function `arcsPEint` is an object of class "`PCDs`" with arguments are as in `num.arcsCSint` and the output list is as in the function `arcsCS1D`. The `plot` function returns the same plot as in `plotCSarcs.int`, hence we comment it out below. ```{r CSarcs1i, eval=F, fig.cap="Arcs of the CS-PCD for points in the interval $(0,10)$. Arcs are jittered along the $y$-axis for better visualization."} Arcs<-arcsCSint(Xp,int,tau,c) Arcs #> Call: #> arcsCSint(Xp = Xp, int = int, t = tau, c = c) #> #> Type: #> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 1D Points with Expansion Parameter t = 1.5 and Centrality Parameter c = 0.4" summary(Arcs) #> Call: #> arcsCSint(Xp = Xp, int = int, t = tau, c = c) #> #> Type of the digraph: #> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 1D Points with Expansion Parameter t = 1.5 and Centrality Parameter c = 0.4" #> #> Vertices of the digraph = Xp #> Partition points of the region = int #> #> Selected tail (or source) points of the arcs in the digraph #> (first 6 or fewer are printed) #> [1] 2.450930 8.459662 3.907723 3.907723 3.907723 #> #> Selected head (or end) points of the arcs in the digraph #> (first 6 or fewer are printed) #> [1] 3.907723 9.596209 2.450930 8.459662 9.596209 #> #> Parameters of the digraph #> $`centrality parameter` #> [1] 0.4 #> #> $`expansion parameter` #> [1] 1.5 #> #> Various quantities of the digraph #> number of vertices number of partition points #> 5.00 2.00 #> number of intervals number of arcs #> 1.00 5.00 #> arc density #> 0.25 plot(Arcs) ``` ## Auxiliary Functions to Define Proximity Regions for Points in an Interval PCDs in the 1D setting are constructed using the vertex regions based on a *center* or *central point* inside the support interval(s). We illustrate a center or central point in an interval. The function `centerMc` takes arguments `int,c` and returns the central point $M_c=a+c(b-a)$ in the interval $[a,b]$. On the other hand, the function `centersMc` takes arguments `Yp,c` where `Yp` is a vector real numbers that constitute the end points of intervals. It returns the central points of intervals based on the order statistics of a set 1D points `Yp`. ```{r eval=F} c<-.4 #try also c<-runif(1) a<-0; b<-10 int = c(a,b) centerMc(int,c) #> [1] 4 n<-5 #try also n=10, 50, 100 y<-runif(n) centersMc(y,c) #> [1] 0.2887174 0.6417875 0.7558345 0.9169039 ``` The function `rel.vert.mid.int` takes arguments `p,int,c` where - `p`, a 1D point for which the vertex region is to be found and - `int,c` are as in `centerMc`. This function returns the index of the vertex region in a middle interval that contains a given point. ```{r eval=F} c<-.4 a<-0; b<-10; int = c(a,b) rel.vert.mid.int(6,int,c) #> $rv #> [1] 2 #> #> $int #> vertex 1 vertex 2 #> 0 10 ``` ```{r include=F} n<-5 #try also n=10, 50, 100 ``` We illustrate the vertex regions using the following code. We annotate the vertices with corresponding indices with `rv=i` for $i=1,2$, and also generate $n=$ `r n` points inside the interval $[$`r a` , `r b` $]$ and provide the scatterplot of these points (jittered along the $y$-axis for better visualization) labeled according to the vertex region they reside in. Type also `? rel.vert.mid.int` for the code to generate this figure. ```{r 1DVR, eval=F, fig.cap="$M_c$-Vertex regions in the interval $(0,10)$. Also plotted are the $X$ points which are labeled according to the vertex region they reside in.", echo=FALSE} Mc<-centerMc(int,c) n<-10 #try also n<-20 xr<-range(a,b,Mc) xf<-(int[2]-int[1])*.5 Xp<-runif(n,a,b) Rv<-vector() for (i in 1:n) Rv<-c(Rv,rel.vert.mid.int(Xp[i],int,c)$rv) #Rv jit<-.1 yjit<-runif(n,-jit,jit) Xlim<-range(a,b,Xp) xd<-Xlim[2]-Xlim[1] plot(cbind(Mc,0),main="Vertex region indices for the X points", xlab=" ", ylab=" ", xlim=Xlim+xd*c(-.05,.05),ylim=3*range(yjit),pch=".",cex=3) abline(h=0) points(Xp,yjit,pch=".",cex=3) abline(v=c(a,b,Mc),lty=2,col=c(1,1,2)) text(Xp,yjit,labels=factor(Rv)) text(cbind(c(a,b,Mc),.02),c("rv=1","rv=2","Mc")) ``` The function `rel.vert.end.int` takes arguments `p,int` for the point whose vertex region is to be found and the support interval. It returns the index of the end interval that contains a given point, 1 for left end interval 2 for right end interval. ```{r eval=F} a<-0; b<-10; int<-c(a,b) rel.vert.end.int(-6,int) #> $rv #> [1] 1 #> #> $int #> vertex 1 vertex 2 #> 0 10 rel.vert.end.int(16,int) #> $rv #> [1] 2 #> #> $int #> vertex 1 vertex 2 #> 0 10 ``` # References