Title: | Nearest Neighbor Methods for Spatial Patterns |
---|---|
Description: | Contains the functions for testing the spatial patterns (of segregation, spatial symmetry, association, disease clustering, species correspondence and reflexivity) based on nearest neighbor relations, especially using contingency tables such as nearest neighbor contingency tables (Ceyhan (2010) <doi:10.1007/s10651-008-0104-x> and Ceyhan (2017) <doi:10.1016/j.jkss.2016.10.002> and references therein), nearest neighbor symmetry contingency tables (Ceyhan (2014) <doi:10.1155/2014/698296>), species correspondence contingency tables and reflexivity contingency tables (Ceyhan (2018) <doi:10.2436/20.8080.02.72>) for two (or higher) dimensional data. Also contains functions for generating patterns of segregation, association, uniformity in a multi-class setting (Ceyhan (2014) <doi:10.1007/s00477-013-0824-9>), and various non-random labeling patterns for disease clustering in two dimensional cases (Ceyhan (2014) <doi:10.1002/sim.6053>), and for visualization of all these patterns for the two dimensional data. The tests are usually (asymptotic) normal z-tests and chi-square tests. |
Authors: | Elvan Ceyhan |
Maintainer: | Elvan Ceyhan <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2024-11-05 04:12:51 UTC |
Source: | https://github.com/elvanceyhan/nnspat |
.onAttach start message
.onAttach(libname, pkgname)
libname | defunct |
pkgname | defunct |
invisible()
.onLoad getOption package settings
.onLoad(libname, pkgname)
libname | defunct |
pkgname | defunct |
invisible()
getOption("nnspat.name")
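This function computes the $A = (a_{ij}(\theta))$ matrix useful in calculations for Tango's test $T(\theta)$ for spatial (disease) clustering (see Eqn (2) of Tango (2007)).
Here, $A = (a_{ij}(\theta))$ is any matrix of a measure of the closeness between two points $i$ and $j$ with $a_{ii} = 0$ for all $i = 1, \ldots, n$, and $\theta = (\theta_1, \ldots, \theta_p)^t$ denotes the unknown parameter vector related to cluster size and $\delta = (\delta_1, \ldots, \delta_n)^t$, where $\delta_i = 1$ if $z_i$ is a case and 0 otherwise.
The test is then
$$T(\theta) = \sum_{i=1}^n \sum_{j=1}^n \delta_i \delta_j a_{ij}(\theta) = \delta^t A(\theta) \delta,$$
where $A = (a_{ij}(\theta))$.
$T(\theta)$ becomes Cuzick and Edwards' $T_k$ test statistic (Cuzick and Edwards (1990)) if $a_{ij} = 1$ when $z_j$ is among the k NNs of $z_i$ and 0 otherwise.
In this case $\theta = k$ and aij.theta becomes aij.mat (more specifically, aij.mat(dat,k) and aij.theta(dat,k,model="NN") coincide).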
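In Tango's exponential clinal model (Tango (2000)), $a_{ij} = \exp\left(-4 (d_{ij}/\theta)^2\right)$ if $i \ne j$ and 0 otherwise, where $\theta$ is a predetermined scale of cluster such that any pair of cases farther apart than the distance $\theta$ cannot be considered as a cluster and $d_{ij}$ denotes the Euclidean distance between two points $i$ and $j$.
In the exponential model (Tango (2007)), $a_{ij} = \exp\left(-d_{ij}/\theta\right)$ if $i \ne j$ and 0 otherwise, where $\theta$ and $d_{ij}$ are as above.
In the hot-spot model (Tango (2007)), $a_{ij} = 1$ if $d_{ij} \le \theta$ and $i \ne j$ and 0 otherwise, where $\theta$ and $d_{ij}$ are as above.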
The argument model has four options, NN, exp.clinal, exponential, and hot.spot, with exp.clinal being the default.
The theta argument specifies the scale of clustering or the clustering parameter in the particular spatial disease clustering model.
See also Tango (2007) and the references therein.
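As a rough illustration of the three distance-based models, the following base R sketch builds a closeness matrix and evaluates the quadratic form of Tango's test. It does not use nnspat: `tango_closeness` is a hypothetical helper, and the weight formulas in it (Gaussian-type decay for the exponential clinal model, plain exponential decay, and a distance threshold for the hot-spot model) are the commonly cited forms of Tango's weights, assumed here rather than taken from the package source.

```r
# Hypothetical helper (not part of nnspat): closeness matrix a_ij(theta)
# under the assumed forms of Tango's three distance-based models.
tango_closeness <- function(dat, theta,
                            model = c("exp.clinal", "exponential", "hot.spot")) {
  model <- match.arg(model)
  d <- as.matrix(dist(dat))            # Euclidean interpoint distances d_ij
  A <- switch(model,
    exp.clinal  = exp(-4 * (d / theta)^2),
    exponential = exp(-d / theta),
    hot.spot    = (d <= theta) * 1)    # 1 if within distance theta, else 0
  diag(A) <- 0                         # a_ii = 0 by definition
  A
}

set.seed(1)
Y <- matrix(runif(3 * 20), ncol = 3)        # 20 points in 3D
delta <- sample(0:1, 20, replace = TRUE)    # 1 = case, 0 = control

A <- tango_closeness(Y, theta = 0.2, model = "hot.spot")
T_theta <- as.numeric(t(delta) %*% A %*% delta)   # T(theta) = delta^t A delta
T_theta
```

Only case-case pairs contribute to the quadratic form, so for the hot-spot weights `T_theta` counts the ordered pairs of cases lying within distance `theta` of each other.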
aij.theta(dat, theta, model = "exp.clinal", ...)
dat | The data set in one or higher dimensions, each row corresponds to a data point. |
theta | A predetermined cluster scale so that any pair of cases farther apart than the distance $\theta$ cannot be considered as a cluster. |
model | Type of Tango's spatial clustering model with four options: NN, exp.clinal (default), exponential and hot.spot. |
... | are for further arguments, such as method and p, passed to the dist function. |
The $A = (a_{ij}(\theta))$ matrix useful in calculations for Tango's test $T(\theta)$.
Elvan Ceyhan
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tango T (2000).
“A test for spatial disease clustering adjusted for multiple testing.”
Statistics in Medicine, 19, 191-204.
Tango T (2007).
“A class of multiplicity adjusted tests for spatial clustering based on case-control point data.”
Biometrics, 63, 119-127.
aij.mat, aij.nonzero and ceTk
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
k<-3 #try also 1,2

#aij for CE's Tk
Aij<-aij.theta(Y,k,model = "NN")
Aij2<-aij.mat(Y,k)
sum(abs(Aij-Aij2)) #check equivalence of aij.theta and aij.mat with model="NN"

Aij<-aij.theta(Y,k,method="max")
Aij2<-aij.mat(Y,k)
range(Aij-Aij2)

theta=.2
aij.theta(Y,theta,model = "exp.clinal")
aij.theta(Y,theta,model = "exponential")
aij.theta(Y,theta,model = "hot.spot")
Asymptotic Covariance between $T_k$ and $T_l$ Values
This function computes the asymptotic covariance between $T_k$ and $T_l$ values, which is used in the computation of the asymptotic variance of Cuzick and Edwards' $T_{comb}$ test, which is a linear combination of some $T_k$ tests.
The limit is as $n_1$ goes to infinity.
The argument, $n_1$, is the number of cases (denoted as n1 as an argument).
The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) in the computations.
See page 80 of Cuzick and Edwards (1990) for more details.
asycovTkTl(dat, n1, k, l, nonzero.mat = TRUE, ...)
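dat | The data set in one or higher dimensions, each row corresponds to a data point. |
n1 | Number of cases |
k, l | Integers specifying the numbers of NNs used in $T_k$ and $T_l$, respectively |
nonzero.mat | A logical argument (default is TRUE) to determine whether the $A$ matrix or the matrix of nonzero locations of the $A$ matrix is used in the computations. If TRUE the nonzero location matrix is used, otherwise the $A$ matrix itself is used. |
... | are for further arguments, such as method and p, passed to the dist function. |
Returns the asymptotic covariance between $T_k$ and $T_l$ values.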
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
k<-1 #try also 2,3 or sample(1:5,1)
l<-1 #try also 2,3 or sample(1:5,1)
c(k,l)

asycovTkTl(Y,n1,k,l)
asycovTkTl(Y,n1,k,l,nonzero.mat = FALSE)
asycovTkTl(Y,n1,k,l,method="max")
Asymptotic Variance of Cuzick and Edwards $T_k$ Test Statistic
This function computes the asymptotic variance of Cuzick and Edwards' $T_k$ test statistic based on the number of cases within k NNs of the cases in the data.
The argument, $n_1$, is the number of cases (denoted as n1 as an argument).
The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) for computing $N_s$ and $N_t$, which are required in the computation of the asymptotic variance.
$N_s$ and $N_t$ are defined on page 78 of Cuzick and Edwards (1990) as follows.
$N_s = \sum_i \sum_j a_{ij} a_{ji}$ (i.e., the number of ordered pairs for which the k NN relation is symmetric) and $N_t = \sum \sum_{i \ne l} \sum a_{ij} a_{lj}$ (i.e., the number of triplets $(i, j, l)$ with $i$, $j$, and $l$ distinct so that $j$ is among the k NNs of $i$ and $j$ is among the k NNs of $l$).
For the $A$ matrix, see the description of the functions aij.mat and aij.nonzero.
See Cuzick and Edwards (1990) for more details.
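The two counts that feed the asymptotic variance can be illustrated with a small base R sketch. It assumes the usual reading of the definitions: the number of ordered pairs that are mutual k NNs, and the number of ordered pairs of distinct points that share a common k NN. `knn_indicator` is a hypothetical helper, not a function of nnspat, and the code is not the package's implementation.

```r
# Hypothetical helper (not part of nnspat): kNN indicator matrix,
# a_ij = 1 if point j is among the k nearest neighbors of point i.
knn_indicator <- function(dat, k) {
  d <- as.matrix(dist(dat))
  diag(d) <- Inf                        # a point is not its own neighbor
  A <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    A[i, order(d[i, ])[seq_len(k)]] <- 1
  }
  A
}

set.seed(1)
Y <- matrix(runif(3 * 20), ncol = 3)
A <- knn_indicator(Y, k = 2)

Ns <- sum(A * t(A))                # ordered pairs (i, j) with a_ij = a_ji = 1
indeg <- colSums(A)                # number of points having j among their kNNs
Nt <- sum(indeg * (indeg - 1))     # ordered (i, l), i != l, sharing neighbor j
c(Ns = Ns, Nt = Nt)
```

The shortcut for the second count uses the column sums of the indicator matrix: a point that is a k NN of c other points contributes c(c-1) ordered pairs.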
asyvarTk(dat, n1, k, nonzero.mat = TRUE, ...)
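dat | The data set in one or higher dimensions, each row corresponds to a data point. |
n1 | Number of cases |
k | Integer specifying the number of NNs (of subject $i$) |
nonzero.mat | A logical argument (default is TRUE) to determine whether the $A$ matrix or the matrix of nonzero locations of the $A$ matrix is used in the computation of $N_s$ and $N_t$. If TRUE the nonzero location matrix is used, otherwise the $A$ matrix itself is used. |
... | are for further arguments, such as method and p, passed to the dist function. |
A list with the elements
asy.var | The asymptotic variance of Cuzick and Edwards' $T_k$ test statistic for disease clustering |
Ns | The $N_s$ value, the number of ordered pairs for which the k NN relation is symmetric |
Nt | The $N_t$ value, the number of triplets $(i, j, l)$ with $i$, $j$, and $l$ distinct so that $j$ is among the k NNs of $i$ and $j$ is among the k NNs of $l$ |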
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
k<-3 #try also 2,3

asyvarTk(Y,n1,k)
asyvarTk(Y,n1,k,nonzero.mat=FALSE)
asyvarTk(Y,n1,k,method="max")
Computes the value of the probability density function (i.e. density) of the bivariate normal distribution at the specified point X, with mean mu and standard deviations of the first and second components being $\sigma_1$ and $\sigma_2$ (denoted as s1 and s2 in the arguments of the function, respectively) and correlation between them being rho (i.e., the covariance matrix is $\Sigma = S$ where $S_{11} = \sigma_1^2$, $S_{22} = \sigma_2^2$, $S_{12} = S_{21} = \sigma_1 \sigma_2 \rho$).
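For reference, the density in question can be written out directly in base R. The sketch below uses the standard closed form of the bivariate normal density; `bvn_density` is a hypothetical name introduced for illustration and is not claimed to match how the package computes the value.

```r
# Hypothetical helper (not part of nnspat): bivariate normal density
# with means mu, standard deviations s1, s2 and correlation rho.
bvn_density <- function(X, mu = c(0, 0), s1 = 1, s2 = 1, rho = 0) {
  X <- matrix(X, ncol = 2)                 # accept one point or an n x 2 matrix
  z1 <- (X[, 1] - mu[1]) / s1              # standardized first coordinate
  z2 <- (X[, 2] - mu[2]) / s2              # standardized second coordinate
  q <- (z1^2 - 2 * rho * z1 * z2 + z2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

set.seed(1)
Xp <- cbind(runif(5), runif(5))
bvn_density(Xp, mu = c(0, 0), s1 = 1, s2 = 1, rho = 0.5)
```

With rho = 0 the expression factors into the product of two univariate normal densities, which gives a quick sanity check against `dnorm`.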
bvnorm.pdf(X, mu = c(0, 0), s1 = 1, s2 = 1, rho = 0)
X | A set of 2D points of size $n$ (i.e., an $n \times 2$ matrix) at which the density of the bivariate normal distribution is to be computed. |
mu | A vector of length 2 representing the mean of the bivariate normal distribution, default=c(0,0). |
s1, s2 | The standard deviations of the first and second components of the bivariate normal distribution, with default 1 for both. |
rho | The correlation between the first and second components of the bivariate normal distribution with default=0. |
The value of the probability density function (i.e. density) of the bivariate normal distribution at the specified point X, with mean mu and standard deviations of the first and second components being $\sigma_1$ and $\sigma_2$ and correlation between them being rho.
Elvan Ceyhan
mu<-c(0,0)
s1<-1
s2<-1
rho<-.5

n<-5
Xp<-cbind(runif(n),runif(n))
bvnorm.pdf(Xp,mu,s1,s2,rho)
Returns a matrix of the same dimension as ct, whose entries are the values of the Types I-IV cell-specific test statistics, $T_{ij}$.
The row and column names are inherited from ct.
The type argument specifies the type of the cell-specific test among the types I-IV tests.
Equivalent to the function tct in this package.
See also Ceyhan (2017) and the references therein.
cellsTij(ct, type = "III")
ct | A nearest neighbor contingency table |
type | The type of the cell-specific test, default="III". Takes on values "I"-"IV" (or equivalently 1-4, as in the examples). |
A matrix of the values of Type I-IV cell-specific tests
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

type.lab<-c("I","II","III","IV")
for (i in 1:4)
{
  print(paste("T_ij values for cell specific tests for type",type.lab[i]))
  print(cellsTij(ct,i))
}

cellsTij(ct,"II")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

cellsTij(ct,2)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

cellsTij(ct,2)

ct<-matrix(c(0,10,5,5),ncol=2)
cellsTij(ct,2)
Cuzick and Edwards $T_k$ Test Statistic
This function computes Cuzick and Edwards' $T_k$ test statistic based on the number of cases within k NNs of the cases in the data.
For disease clustering, Cuzick and Edwards (1990) suggested a k-NN test based on the number of cases among the k NNs of the case points.
Let $z_i$ be the $i^{th}$ point and $d_i^k$ be the number of cases among the k NNs of $z_i$.
Then Cuzick-Edwards' k-NN test is $T_k = \sum_{i=1}^n \delta_i d_i^k$, where $\delta_i = 1$ if $z_i$ is a case, and 0 if $z_i$ is a control.
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
Also, $T_1$ is identical to the count for cell $(1,1)$ in the nearest neighbor contingency table (NNCT) (see the function nnct for more detail on NNCTs).
See also Ceyhan (2014), Cuzick and Edwards (1990) and the references therein.
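The statistic is simple enough to compute by hand, which can help when checking results on small data sets. The base R sketch below sums, over the cases, the number of cases among each case's k nearest neighbors; `ce_Tk` is a hypothetical name used for illustration only and is not the package's implementation (it assumes 0/1 labels and no distance ties).

```r
# Hypothetical helper (not part of nnspat): Cuzick-Edwards type statistic,
# the total number of cases among the k nearest neighbors of the cases.
ce_Tk <- function(dat, cc.lab, k = 1) {
  d <- as.matrix(dist(dat))
  diag(d) <- Inf                           # exclude the point itself
  Tk <- 0
  for (i in which(cc.lab == 1)) {          # loop over cases only
    nn <- order(d[i, ])[seq_len(k)]        # indices of the k NNs of point i
    Tk <- Tk + sum(cc.lab[nn] == 1)        # cases among those neighbors
  }
  Tk
}

set.seed(1)
Y <- matrix(runif(3 * 20), ncol = 3)
cls <- sample(0:1, 20, replace = TRUE)     # 1 = case, 0 = control
ce_Tk(Y, cls, k = 1)
ce_Tk(Y, cls, k = 3)
```

The value always lies between 0 and k times the number of cases; values near the upper end indicate that cases tend to have other cases as near neighbors.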
ceTk(dat, cc.lab, k = 1, case.lab = NULL, ...)
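dat | The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab | Case-control labels, 1 for case, 0 for control |
k | Integer specifying the number of NNs (of subject $i$), default is 1. |
case.lab | The label used for cases in the cc.lab argument (if given, the labels are converted so that cases are 1 and controls are 0), default is NULL. |
... | are for further arguments, such as method and p, passed to the dist function. |
Cuzick and Edwards' $T_k$ test statistic for disease clustering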
Elvan Ceyhan
Ceyhan E (2014).
“Segregation indices for disease clustering.”
Statistics in Medicine, 33(10), 1662-1684.
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tcomb, seg.ind, Pseg.coeff and ceTkinv
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))

ceTk(Y,cls)
ceTk(Y,cls,method="max")
ceTk(Y,cls,k=3)
ceTk(Y,cls+1,case.lab = 2)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ceTk(Y,fcls,case.lab="a") #try also ceTk(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) # here ceTk(Y,cls) gives an error message
Cuzick and Edwards $T_k^{inv}$ Test Statistic
This function computes Cuzick and Edwards' $T_k^{inv}$ test statistic based on the sum of the numbers of cases closer to each case than the k-th nearest control to the case.
The $T_k^{inv}$ test statistic is an extension of the run length test allowing a fixed number of controls in the run sequence.
The $T_k^{inv}$ test statistic is defined as $T_k^{inv} = \sum_{i=1}^n \delta_i \nu_i^k$, where $\delta_i = 1$ if $z_i$ is a case, and 0 if $z_i$ is a control, and $\nu_i^k$ is the number of cases closer to the index case than the k-th nearest control, i.e., the number of cases encountered beginning at $z_i$ until the k-th control is encountered.
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
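The counting rule can be made concrete with a short base R sketch: for each case, walk through the other points from nearest to farthest and count the cases met before the k-th control. `ce_Tkinv` is a hypothetical helper, not the package function, and its handling of the situation where fewer than k controls exist (all remaining cases are counted) is an assumption of this sketch.

```r
# Hypothetical helper (not part of nnspat): for each case, count the cases
# encountered (nearest first) before the k-th control is reached.
ce_Tkinv <- function(dat, cc.lab, k = 1) {
  d <- as.matrix(dist(dat))
  run <- numeric(nrow(d))
  for (i in which(cc.lab == 1)) {
    ord <- order(d[i, ])
    ord <- ord[ord != i]                    # other points, nearest first
    ctrl_pos <- which(cc.lab[ord] == 0)     # ranks at which controls occur
    # cases before the k-th control; if fewer than k controls exist,
    # count all other cases (assumption of this sketch)
    run[i] <- if (length(ctrl_pos) >= k) ctrl_pos[k] - k else sum(cc.lab[ord] == 1)
  }
  list(Tkinv = sum(run), run.vec = run)
}

# toy 1D data: cases at 0, 1 and 7; controls at 3 and 15
toy <- matrix(c(0, 1, 3, 7, 15), ncol = 1)
lab <- c(1, 1, 0, 1, 0)
ce_Tkinv(toy, lab, k = 1)   # k = 1: run length until the first control
ce_Tkinv(toy, lab, k = 2)
```

With k = 1 the count stops at the first control, which is the run-length statistic that the k-th control version extends.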
ceTkinv(dat, k, cc.lab, case.lab = NULL, ...)
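dat | The data set in one or higher dimensions, each row corresponds to a data point. |
k | Integer specifying the number of the closest controls to subject $i$. |
cc.lab | Case-control labels, 1 for case, 0 for control |
case.lab | The label used for cases in the cc.lab argument (if given, the labels are converted so that cases are 1 and controls are 0), default is NULL. |
... | are for further arguments, such as method and p, passed to the dist function. |
A list with two elements
Tkinv | Cuzick and Edwards' $T_k^{inv}$ test statistic for disease clustering |
run.vec | The vector of the numbers of cases encountered until the k-th control (the $\nu_i^k$ values) |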
Elvan Ceyhan
n<-20
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
cls
k<-2 #also try 3,4

ceTkinv(Y,k,cls)
ceTkinv(Y,k,cls+1,case.lab = 2)
ceTkinv(Y,k,cls,method="max")

ceTrun(Y,cls)
ceTkinv(Y,k=1,cls)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ceTkinv(Y,k,fcls,case.lab="a") #try also ceTrun(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #here ceTkinv(Y,k,cls) gives an error
Cuzick and Edwards $T_{run}$ Test Statistic
This function computes Cuzick and Edwards' $T_{run}$ test statistic based on the sum of the numbers of successive cases from each case until a control is encountered in the data, for detecting rare large clusters.
The $T_{run}$ test statistic is defined as $T_{run} = \sum_{i=1}^n \delta_i d_i^r$, where $\delta_i = 1$ if $z_i$ is a case, and 0 if $z_i$ is a control, and $d_i^r$ is the number of successive cases encountered beginning at $z_i$ until a control is encountered.
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
See also Cuzick and Edwards (1990) and the references therein.
ceTrun(dat, cc.lab, case.lab = NULL, ...)
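dat | The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab | Case-control labels, 1 for case, 0 for control |
case.lab | The label used for cases in the cc.lab argument (if given, the labels are converted so that cases are 1 and controls are 0), default is NULL. |
... | are for further arguments, such as method and p, passed to the dist function. |
A list with two elements
Trun | Cuzick and Edwards' $T_{run}$ test statistic for disease clustering |
run.vec | The vector of the numbers of successive cases encountered until the first control (the $d_i^r$ values) |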
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))

ceTrun(Y,cls)
ceTrun(Y,cls,method="max")
ceTrun(Y,cls+1,case.lab = 2)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ceTrun(Y,fcls,case.lab="a") #try also ceTrun(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #here ceTrun(Y,cls) gives an error message
Returns the covariance matrix of cell counts $N_{ij}$ for $i,j = 1, \ldots, k$ in the NNCT, ct.
The covariance matrix is of dimension $k^2 \times k^2$ and its entries are $\mathrm{Cov}(N_{ij}, N_{lm})$, where the $N_{ij}$ values are by default ordered according to the row-wise vectorization of ct.
If byrow=FALSE, the column-wise vectorization of ct is used.
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also Dixon (1994, 2002) and Ceyhan (2010, 2017).
cov.nnct(ct, varN, Q, R, byrow = TRUE)
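ct | A nearest neighbor contingency table |
varN | The variance matrix of the cell counts of the NNCT, ct (e.g., as returned by var.nnct) |
Q | The number of shared NNs |
R | The number of reflexive NNs (i.e., twice the number of reflexive NN pairs) |
byrow | A logical argument (default=TRUE). If TRUE, the rows of ct are appended to obtain the vector of $N_{ij}$ values; if FALSE, the columns of ct are appended. |
The $k^2 \times k^2$ covariance matrix of cell counts $N_{ij}$ for $i,j = 1, \ldots, k$ in the NNCT, ct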
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
covNrow2col, cov.tct and cov.nnsym
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

cov.nnct(ct,varN,Qv,Rv)
cov.nnct(ct,varN,Qv,Rv,byrow=FALSE)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

cov.nnct(ct,varN,Qv,Rv)
cov.nnct(ct,varN,Qv,Rv,byrow=FALSE)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

cov.nnct(ct,varN,Qv,Rv)
Returns the covariance matrix of the differences of the cell counts, $N_{ij} - N_{ji}$ for $i,j = 1, \ldots, k$ and $i \ne j$, in the NNCT, ct.
The covariance matrix is of dimension $k(k-1)/2 \times k(k-1)/2$ and its entries are $\mathrm{Cov}(N_{ij} - N_{ji}, N_{lm} - N_{ml})$, where the order of $i,j$ for $N_{ij} - N_{ji}$ is as in the output of ind.nnsym(k).
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
The argument covN is the covariance matrix of $N_{ij}$ (concatenated rowwise).
See also Dixon (1994) and Ceyhan (2014).
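Because each difference of off-diagonal cell counts is a linear combination of the row-wise vectorized counts, its covariance matrix follows from the covariance of the counts by a contrast-matrix product. The base R sketch below illustrates that linear-algebra step only. `diff_cov` is a hypothetical helper, and the pair ordering it uses (all pairs with the first index smaller than the second, in lexicographic order) is an assumption that may differ from the ordering returned by ind.nnsym.

```r
# Hypothetical helper (not part of nnspat): covariance of N_ij - N_ji
# obtained from the covariance of row-wise vectorized cell counts.
diff_cov <- function(covN, k) {
  pairs <- t(utils::combn(k, 2))             # assumed ordering of (i, j), i < j
  C <- matrix(0, nrow(pairs), k^2)           # one contrast row per pair
  for (r in seq_len(nrow(pairs))) {
    i <- pairs[r, 1]; j <- pairs[r, 2]
    C[r, (i - 1) * k + j] <- 1               # +N_ij (row-wise position)
    C[r, (j - 1) * k + i] <- -1              # -N_ji
  }
  C %*% covN %*% t(C)                        # Cov(C N) = C Cov(N) C^t
}

# toy check with uncorrelated unit-variance counts in a 2 x 2 table
diff_cov(diag(4), k = 2)
```

For a table with k classes the result has k(k-1)/2 rows and columns, one per unordered pair of distinct classes.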
cov.nnsym(covN)
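covN | The $k^2 \times k^2$ covariance matrix of row-wise vectorized cell counts of the NNCT, ct |
The $k(k-1)/2 \times k(k-1)/2$ covariance matrix of the differences of the off-diagonal cell counts $N_{ij} - N_{ji}$ for $i,j = 1, \ldots, k$ and $i \ne j$ in the NNCT, ct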
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
var.nnsym, cov.tct, cov.nnct and cov.seg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow

cov.nnsym(covN)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.nnsym(covN)
Returns the covariance matrix of the segregation coefficients in a multi-class case based on the NNCT, ct.
The covariance matrix is of dimension $k(k+1)/2 \times k(k+1)/2$ and its entry $i,j$ corresponds to the entries in the rows $i$ and $j$ of the output of ind.seg.coeff(k).
The segregation coefficients in the multi-class case are the extension of Pielou's segregation coefficient for the two-class case.
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
The argument covN is the covariance matrix of $N_{ij}$ (concatenated rowwise).
See also Ceyhan (2014).
cov.seg.coeff(ct, covN)
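ct | A nearest neighbor contingency table |
covN | The $k^2 \times k^2$ covariance matrix of row-wise vectorized cell counts of the NNCT, ct |
The $k(k+1)/2 \times k(k+1)/2$ covariance matrix of the segregation coefficients for the multi-class case based on the NNCT, ct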
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
seg.coeff, var.seg.coeff, cov.nnct and cov.nnsym
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.seg.coeff(ct,covN)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

cov.seg.coeff(ct,covN)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.seg.coeff(ct,covN)
Returns the covariance matrix of the entries $T_{ij}$ for $i,j = 1, \ldots, k$ in the TCT for the types I, III, and IV cell-specific tests.
The covariance matrix is of dimension $k^2 \times k^2$ and its entries are $\mathrm{Cov}(T_{ij}, T_{lm})$, where the $T_{ij}$ values are by default ordered according to the row-wise vectorization of the TCT.
The argument covN must be the covariance matrix of the $N_{ij}$ values which are obtained from the NNCT by row-wise vectorization.
The functions cov.tctIII and cov.tct3 are equivalent.
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also Ceyhan (2017).
cov.tct(ct, covN, type = "III")
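ct | A nearest neighbor contingency table |
covN | The $k^2 \times k^2$ covariance matrix of row-wise vectorized cell counts of the NNCT, ct |
type | The type of the cell-specific test, default="III". Takes on values "I"-"IV" (or equivalently 1-4, as in the examples). |
The $k^2 \times k^2$ covariance matrix of the entries $T_{ij}$ for $i,j = 1, \ldots, k$ in the Type I-IV TCTs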
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.tct(ct,covN,type=1)
cov.tct(ct,covN,type="I")
cov.tct(ct,covN,type="II")
cov.tct(ct,covN,type="III")
cov.tct(ct,covN,type="IV")
cov.tctI(ct,covN)

cov.tct(ct,covN)
cov.tctIII(ct,covN)
cov.tct3(ct,covN)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.tct(ct,covN,type=3)
cov.tct(ct,covN,type="III")

cov.tctIII(ct,covN)
cov.tct3(ct,covN)
Converts the $k^2 \times k^2$ covariance matrix of row-wise vectorized cell counts $N_{ij}$ for $i,j = 1, \ldots, k$ in the NNCT, ct to the covariance matrix of column-wise vectorized cell counts.
In the output, the covariance matrix entries are $\mathrm{Cov}(N_{ij}, N_{lm})$, where the $N_{ij}$ values are ordered according to the column-wise vectorization of ct.
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also Dixon (1994, 2002) and Ceyhan (2010, 2017).
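The conversion amounts to reordering the rows and columns of the covariance matrix by the permutation that maps column-wise positions to row-wise positions. A minimal base R sketch of that idea follows; `row2col_cov` is a hypothetical name, and the code is an illustration of the reindexing rather than the package's implementation.

```r
# Hypothetical helper (not part of nnspat): reorder a k^2 x k^2 covariance
# matrix from row-wise to column-wise vectorization of a k x k table.
row2col_cov <- function(covN) {
  k <- sqrt(nrow(covN))                             # covN is k^2 x k^2
  pos <- matrix(seq_len(k^2), k, k, byrow = TRUE)   # pos[i, j] = row-wise index of cell (i, j)
  perm <- as.vector(pos)                            # read those indices column by column
  covN[perm, perm]
}

# check on simulated 3 x 3 tables: both vectorizations of the same tables
set.seed(1)
tabs <- replicate(100, matrix(rpois(9, 5), 3, 3), simplify = FALSE)
rowv <- t(sapply(tabs, function(m) as.vector(t(m))))  # row-wise vectorization
colv <- t(sapply(tabs, as.vector))                    # column-wise vectorization
all.equal(row2col_cov(cov(rowv)), cov(colv))
```

Since the permutation swaps the roles of the two cell indices, applying the same reordering twice returns the original matrix.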
covNrow2col(covN)
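covN | The $k^2 \times k^2$ covariance matrix of row-wise vectorized cell counts of the NNCT, ct |
The $k^2 \times k^2$ covariance matrix of column-wise vectorized cell counts $N_{ij}$ for $i,j = 1, \ldots, k$ in the NNCT, ct.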
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

covNrow<-cov.nnct(ct,varN,Qv,Rv)
covNcol1<-cov.nnct(ct,varN,Qv,Rv,byrow=FALSE)
covNcol2<-covNrow2col(covNrow)

covNrow
covNcol1
covNcol2
all.equal(covNcol1,covNcol2)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

covNrow<-cov.nnct(ct,varN,Qv,Rv)
covNcol1<-cov.nnct(ct,varN,Qv,Rv,byrow=FALSE)
covNcol2<-covNrow2col(covNrow)

covNrow
covNcol1
covNcol2
all.equal(covNcol1,covNcol2)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

cov.nnct(ct,varN,Qv,Rv)
Covariance matrix for $T_k$ values in Tcomb
This function computes the covariance matrix for the $T_k$ values used in the $T_{comb}$ test statistic, which is a linear combination of some $T_k$ tests.
The argument, $n_1$, is the number of cases (denoted as n1 as an argument). The number of cases is denoted as $n_1$ to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The argument klist is the vector of integers specifying the indices of the $T_k$ values used in obtaining the $T_{comb}$.
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) in the computations.
The logical argument asy.cov (default=FALSE) is for using the asymptotic covariance or the exact (i.e., finite sample) covariance for the vector of $T_k$ values used in Tcomb. If asy.cov=TRUE, the asymptotic covariance is used; otherwise the exact covariance is used.
See page 87 of (Cuzick and Edwards (1990)) for more details.
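To illustrate why a covariance matrix of the $T_k$ values is needed for a linear combination, the sketch below applies the generic identity Var(a'T) = a'Σa in base R. The matrix `sig` and the weights `a` are made-up illustrative values; they are not output of covTcomb and are not claimed to be the weights the package uses.

```r
# Hypothetical 3x3 covariance matrix of (T_1, T_2, T_3); illustrative numbers only
sig <- matrix(c(4, 2, 1,
                2, 5, 2,
                1, 2, 6), ncol = 3, byrow = TRUE)
a <- c(1, 1, 1)  # assumed weights of the linear combination

# Variance of the linear combination: Var(a' T) = a' Sigma a
var.lin <- drop(t(a) %*% sig %*% a)
var.lin

# With unit weights this equals the sum of all entries of sig
all.equal(var.lin, sum(sig))
```

In practice `sig` would be replaced by the matrix returned by covTcomb for the chosen klist.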
covTcomb(dat, n1, klist, nonzero.mat = TRUE, asy.cov = FALSE, ...)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
n1 |
Number of cases |
klist |
|
nonzero.mat |
A logical argument (default is |
asy.cov |
A logical argument (default is |
... |
are for further arguments, such as |
Returns the covariance matrix for the $T_k$ values used in Tcomb.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
asycovTkTl
, covTcomb
, and Ntkl
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
kl<-sample(1:5,3) #try also sample(1:5,2)
kl
covTcomb(Y,n1,kl)
covTcomb(Y,n1,kl,method="max")
covTcomb(Y,n1,kl,nonzero.mat = FALSE)
covTcomb(Y,n1,kl,asy=TRUE)
Covariance between $T_k$ and $T_l$ Values
This function computes the exact (i.e., finite sample) covariance between $T_k$ and $T_l$ values, which is used in the computation of the exact variance of Cuzick and Edwards $T_{comb}$ test, which is a linear combination of some $T_k$ tests.
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) in the computations.
See page 80 of (Cuzick and Edwards (1990)) for more details.
covTkTl(dat, n1, k, l, nonzero.mat = TRUE, ...)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
n1 |
Number of cases |
k |
Integers specifying the number of NNs (of subjects |
l |
Integers specifying the number of NNs (of subjects |
nonzero.mat |
A logical argument (default is |
... |
are for further arguments, such as |
Returns the exact covariance between $T_k$ and $T_l$ values.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
asycovTkTl
, covTcomb
, and Ntkl
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
k<-1 #try also 2,3 or sample(1:5,1)
l<-1 #try also 2,3 or sample(1:5,1)
c(k,l)
covTkTl(Y,n1,k,l)
covTkTl(Y,n1,k,l,method="max")
asycovTkTl(Y,n1,k,l)
covTkTl(Y,n1,k,l,nonzero.mat = FALSE)
asycovTkTl(Y,n1,k,l,nonzero.mat = FALSE)
This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix which is standardized row-wise or column-wise. That is, the output is the interpoint distance (IPD) matrix of the rows of the standardized version of the given set of points x, computed with the dist function in the stats package of the standard R distribution.
The logical argument column (default=TRUE) determines whether the standardization is row-wise or column-wise. If TRUE, each column is divided by its standard deviation; otherwise, each row is divided by its standard deviation.
This function is different from the dist function in the stats package: dist returns the distance matrix in a lower triangular form, whereas dist.std.data returns the full matrix of distances of the standardized data set.
The ... arguments are for further arguments, such as method and p, passed to the dist function.
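The description above can be sketched in base R as follows. This is only an illustration of the stated behavior (divide by standard deviations, then expand the dist output to a full matrix); `std.dist.sketch` is a hypothetical helper name and is not the package's implementation of dist.std.data.

```r
# Hypothetical helper: standardize, then return the full distance matrix
std.dist.sketch <- function(x, column = TRUE, ...) {
  x <- as.matrix(x)
  if (column) {
    # divide each column by its standard deviation
    xs <- sweep(x, 2, apply(x, 2, sd), "/")
  } else {
    # divide each row by its standard deviation
    xs <- sweep(x, 1, apply(x, 1, sd), "/")
  }
  # dist() returns a lower triangular "dist" object; as.matrix() expands it
  as.matrix(dist(xs, ...))
}

set.seed(1)
Y <- matrix(runif(60, 0, 100), ncol = 3)
D <- std.dist.sketch(Y)
dim(D)
range(D)
```

Further arguments such as method = "maximum" are passed through to dist unchanged.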
dist.std.data(x, column = TRUE, ...)
x |
A set of points in matrix or data frame form where points correspond to the rows. |
column |
A logical argument (default is |
... |
Additional parameters to be passed on the |
A distance matrix whose i,j-th entry is the distance between rows $i$ and $j$ of x, which is standardized row-wise or column-wise.
Elvan Ceyhan
dist
, ipd.mat
, and ipd.mat.euc
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
range(ipd)
ipd2<-dist.std.data(Y) #distance of standardized data
range(ipd2)
ipd2<-dist.std.data(Y,method="max") #distance of standardized data
range(ipd2)
#############
Y<-matrix(runif(60,0,100),ncol=3)
ipd<-ipd.mat(Y)
range(ipd)
ipd2<-dist.std.data(Y) #distance of standardized data
range(ipd2)
Converts a lower triangular distance matrix to a full distance matrix with zeroes in the diagonal. The input is usually the result of the dist function in the stats package.
This function is adapted from Everitt's book (Everitt (2004)).
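As a rough illustration of the conversion, the base R sketch below fills a square matrix from the lower triangular entries of a dist object and mirrors them to the upper triangle. It is not the package's source code; it only shows the idea, and the result is checked against as.matrix from the stats package.

```r
set.seed(1)
X <- matrix(runif(9), ncol = 3)
dst <- dist(X)                 # lower triangular "dist" object

n <- attr(dst, "Size")         # number of points
full <- matrix(0, n, n)        # zeroes on the diagonal
# dist stores the lower triangle column by column, matching lower.tri()
full[lower.tri(full)] <- as.vector(dst)
full <- full + t(full)         # mirror to the upper triangle
full

all.equal(full, as.matrix(dst), check.attributes = FALSE)
```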
dist2full(dis)
dis |
A lower triangular matrix, resulting from the |
A square (symmetric) distance matrix with zeroes in the diagonal.
Elvan Ceyhan
Everitt BS (2004). An R and S-Plus Companion to Multivariate Analysis. Springer-Verlag, London, UK.
#3D data points
n<-3
X<-matrix(runif(3*n),ncol=3)
dst<-dist(X)
dist2full(dst)
Returns the Euclidean distance between x and y, which can be vectors, matrices or data frames of any dimension (x and y should be of the same dimension).
This function is equivalent to the Dist function in the pcds package but is different from the dist function in the stats package of the standard R distribution. dist requires its argument to be a data matrix, and it computes and returns the distance matrix obtained by applying the specified distance measure to the rows of that data matrix (Becker et al. (1988)), while euc.dist needs two arguments to find the distance between.
For two data matrices A and B, dist(rbind(as.vector(A),as.vector(B))) and euc.dist(A,B) yield the same result.
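The equivalence stated above can be checked with a small base R sketch. `euc.sketch` is a hypothetical stand-in written from the description (square root of the sum of squared entry-wise differences), not the package's euc.dist itself.

```r
# Hypothetical helper mirroring the description of euc.dist
euc.sketch <- function(x, y) {
  x <- as.matrix(x); y <- as.matrix(y)
  sqrt(sum((x - y)^2))
}

A <- matrix(1:6, ncol = 2)
B <- matrix(c(2, 0, 1, 5, 5, 9), ncol = 2)
d1 <- euc.sketch(A, B)
# Same value via dist() applied to the two vectorized matrices
d2 <- as.numeric(dist(rbind(as.vector(A), as.vector(B))))
c(d1, d2)
```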
euc.dist(x, y)
x , y
|
Vectors, matrices or data frames (both should be of the same type). |
Euclidean distance between x
and y
Elvan Ceyhan
Becker RA, Chambers JM, Wilks AR (1988). The New S Language. Wadsworth & Brooks/Cole.
dist
from the base package stats
and
Dist
from the package pcds
B<-c(1,0); C<-c(1/2,sqrt(3)/2);
euc.dist(B,C);
euc.dist(B,B);
x<-runif(10)
y<-runif(10)
euc.dist(x,y)
xm<-matrix(x,ncol=2)
ym<-matrix(y,ncol=2)
euc.dist(xm,ym)
euc.dist(xm,xm)
dat.fr<-data.frame(b=B,c=C)
euc.dist(dat.fr,dat.fr)
euc.dist(dat.fr,cbind(B,C))
Returns a vector of length $k$ of expected values of the self entries (i.e., first column) in a species correspondence contingency table (SCCT) or the expected values of the diagonal entries in an NNCT. These expected values are valid under RL or CSR.
The argument ct can be either the NNCT or SCCT.
See also (Ceyhan (2018)).
EV.Nii(ct)
ct |
The NNCT or SCCT |
A vector of length $k$ whose entries are the expected values of the self entries (i.e., first column) in a species correspondence contingency table (SCCT) or of the diagonal entries in an NNCT.
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
EV.Nii(ct)
ct<-scct(ipd,cls)
EV.Nii(ct)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
EV.Nii(ct)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
EV.Nii(ct)
ct<-scct(ipd,cls)
EV.Nii(ct)
Returns a matrix of the same dimension as ct, whose entries are the expected cell counts of the NNCT under RL or CSR. The class sizes are taken as the row sums of ct, and the row and column names are inherited from ct.
See also (Dixon (1994); Ceyhan (2010)).
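For orientation, the expected cell counts under RL given in Dixon (1994) are $n_i(n_i-1)/(n-1)$ on the diagonal and $n_i n_j/(n-1)$ off the diagonal, where $n_i$ are the class sizes and $n$ is the total sample size. The base R sketch below computes them from the row sums of a table; `ev.sketch` is a hypothetical helper name, not the package's EV.nnct.

```r
# Hypothetical helper: expected NNCT cell counts under RL from the row sums
ev.sketch <- function(ct) {
  ni <- rowSums(ct)                      # class sizes
  n <- sum(ni)                           # total sample size
  ev <- outer(ni, ni) / (n - 1)          # off-diagonal: n_i n_j / (n - 1)
  diag(ev) <- ni * (ni - 1) / (n - 1)    # diagonal: n_i (n_i - 1) / (n - 1)
  dimnames(ev) <- dimnames(ct)           # inherit row and column names
  ev
}

ct <- matrix(c(0, 10, 5, 5), ncol = 2)
ev <- ev.sketch(ct)
ev
rowSums(ev)   # each row of expected counts sums to the class size
```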
EV.nnct(ct)
ct |
A nearest neighbor contingency table |
A matrix
of the expected values of cell counts in the NNCT.
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
EV.nnct(ct)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
EV.nnct(ct)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
EV.nnct(ct)
ct<-matrix(c(0,10,5,5),ncol=2)
EV.nnct(ct)
Returns a matrix of the same dimension as the RCT, rfct, whose entries are the expected cell counts of the RCT under RL or CSR.
See also (Ceyhan and Bahadir (2017)).
EV.rct(rfct, nvec)
rfct |
An RCT |
nvec |
The |
A matrix
of the expected values of cell counts in the RCT.
Elvan Ceyhan
Ceyhan E, Bahadir S (2017). “Nearest Neighbor Methods for Testing Reflexivity.” Environmental and Ecological Statistics, 24(1), 69-108.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)
EV.rct(rfct,nvec)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nvec<-as.numeric(table(fcls))
rfct<-rct(ipd,fcls)
EV.rct(rfct,nvec)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
nvec<-as.numeric(table(cls)) #recompute the class sizes for the new labels
rfct<-rct(ipd,cls)
EV.rct(rfct,nvec)
Expected value of the Cuzick and Edwards $T_{comb}$ Test Statistic
This function computes the expected value of Cuzick & Edwards $T_{comb}$ test statistic in disease clustering, where $T_{comb}$ is a linear combination of some $T_k$ tests.
The argument, $n_1$, is the number of cases (denoted as n1 as an argument). The number of cases is denoted as $n_1$ to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The argument klist is the vector of integers specifying the indices of the $T_k$ values used in obtaining the $T_{comb}$.
The argument sig is the covariance matrix of the vector of $T_k$ values used in Tcomb, and can be computed via the covTcomb function.
See page 87 of (Cuzick and Edwards (1990)) for more details.
EV.Tcomb(n1, n, klist, sig)
n1 |
Number of cases |
n |
A positive integer representing the number of points in the data set |
klist |
|
sig |
The covariance matrix of the vector of |
Returns the expected value of the $T_{comb}$ test statistic.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1) #try also n<-50, 100
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
kl<-sample(1:5,3) #try also sample(1:5,2)
kl
sig<-covTcomb(Y,n1,kl)
EV.Tcomb(n1,n,kl,sig)
Returns a matrix of the same dimension as ct, whose entries are the expected values of the $T_{ij}$ values which are the Types I-IV cell-specific test statistics (i.e., $T^{I}_{ij}$ to $T^{IV}_{ij}$) under RL or CSR. The row and column names are inherited from ct. The type argument specifies the type of the cell-specific test among the types I-IV tests.
See also (Ceyhan (2017)) and the references therein.
EV.tct(ct, type = "III")
ct |
A nearest neighbor contingency table |
type |
The type of the cell-specific test, default= |
A matrix
of the expected values of Type I-IV cell-specific tests.
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
EV.tct(ct,2)
EV.tct(ct,"II")
EV.tctI(ct)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
EV.tct(ct,2)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
EV.tct(ct,2)
ct<-matrix(c(0,10,5,5),ncol=2)
EV.tct(ct,2)
Returns a matrix of the same dimension as ct, whose entries are the expected values of the Type I cell-specific test statistics, $T^{I}_{ij}$. The row and column names are inherited from ct. These expected values are valid under RL or CSR.
See also (Ceyhan (2017)) and the references therein.
EV.tctI(ct)
ct |
A nearest neighbor contingency table |
A matrix
of the expected values of Type I cell-specific tests.
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
EV.tctI(ct)
Expected value of the Cuzick and Edwards $T_k^{inv}$ Test statistic
This function computes the expected value of Cuzick and Edwards $T_k^{inv}$ test statistic, which is based on the sum of the number of cases closer to each case than the k-th nearest control to the case.
The number of cases is denoted as $n_1$ (n1 as an argument) and the number of controls as $n_0$ (n0 as an argument), to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
See the function ceTkinv for the details of the $T_k^{inv}$ test.
See (Cuzick and Edwards (1990)) and references therein.
EV.Tkinv(n1, n0, k)
n1 , n0
|
The number of cases and controls |
k |
Integer specifying the number of the closest controls to subject |
The expected value of Cuzick and Edwards test statistic for disease clustering
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n1<-20
n0<-25
k<-2 #try also 2, 3
EV.Tkinv(n1,n0,k)
EV.Tkinv(n1,n0,k=1)
EV.Trun(n1,n0)
An object of class "htest" performing the exact version of Pearson's chi-square test on nearest neighbor contingency tables (NNCTs) for the RL or CSR independence for 2 classes.
Pearson's $\chi^2$ test is based on the test statistic $\mathcal{X}^2=\sum_{j=1}^2\sum_{i=1}^2 (N_{ij}-\mu_{ij})^2/\mu_{ij}$, where $\mu_{ij}$ is the expected value of the cell count $N_{ij}$, which has a $\chi^2_1$ distribution in the limit provided that the contingency table is constructed under the independence null hypothesis.
The exact version of Pearson's test uses the exact distribution of $\mathcal{X}^2$ rather than the large sample $\chi^2$ approximation. That is, for the one-sided alternative, we calculate the $p$-values as in the function exact.pval1s; and for the two-sided alternative, we calculate the $p$-values as in the function exact.pval2s, with the double argument determining the type of the correction.
This test would be equivalent to Fisher's exact test fisher.test if the odds ratio were 1 (which cannot be specified in the current version). The odds ratio for the RL or CSR independence null hypothesis is $\theta_0=\frac{(n_1-1)(n_2-1)}{n_1 n_2}$, which is used in the function, and the $p$-value and confidence interval computations are adapted from fisher.test.
See Ceyhan (2014) for more details.
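The null odds ratio under RL can be recovered from the expected NNCT cell counts of Dixon (1994), $n_i(n_i-1)/(n-1)$ on the diagonal and $n_1 n_2/(n-1)$ off the diagonal, which gives $(n_1-1)(n_2-1)/(n_1 n_2)$. The base R sketch below checks this numerically; the class sizes are assumed values chosen only for illustration.

```r
n1 <- 8; n2 <- 12; n <- n1 + n2   # assumed class sizes, for illustration only

# Expected NNCT cell counts under RL (Dixon 1994)
ev <- matrix(c(n1 * (n1 - 1), n1 * n2,
               n1 * n2,       n2 * (n2 - 1)), ncol = 2, byrow = TRUE) / (n - 1)

# Odds ratio of the table of expected counts
theta0 <- (ev[1, 1] * ev[2, 2]) / (ev[1, 2] * ev[2, 1])
theta0

# Agrees with the closed form (n1 - 1)(n2 - 1) / (n1 n2)
all.equal(theta0, (n1 - 1) * (n2 - 1) / (n1 * n2))
```

Because this value is below 1 for any finite class sizes, a test against an odds ratio of exactly 1 would not match the RL or CSR independence null hypothesis.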
exact.nnct(ct, alternative = "two.sided", conf.level = 0.95, pval.type = "inc", double = FALSE)
ct |
A |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the upper and lower confidence limits, default is |
pval.type |
The type of the |
double |
A logical argument (default is |
A list
with the elements
statistic |
The test statistic, it is |
p.value |
The |
conf.int |
Confidence interval for the odds ratio in the |
estimate |
Estimate, i.e., the observed odds ratio the |
null.value |
Hypothesized null value for the odds ratio in the |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
data.name |
Name of the contingency table, |
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
fisher.test
, exact.pval1s
, and exact.pval2s
n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
exact.nnct(ct)
fisher.test(ct)
exact.nnct(ct,alt="g")
fisher.test(ct,alt="g")
exact.nnct(ct,alt="l",pval.type = "mid")
#############
ct<-matrix(sample(10:20,9),ncol=3)
fisher.test(ct) #here exact.nnct(ct) gives error message, since number of classes > 2
$p$-value correction to the one-sided version of exact NNCT test
In using Fisher's exact test on the nearest neighbor contingency tables (NNCTs) a correction may be needed for the $p$-value. For the one-sided alternatives, the probabilities of more extreme tables are summed up, including or excluding the probability of the table itself (or some middle way).
Let the probability of the contingency table itself be $p_t=f(n_{11}|n_1,n_2,c_1;\theta_0)$ where $\theta_0=(n_1-1)(n_2-1)/(n_1 n_2)$, which is the odds ratio under RL or CSR independence, and $f$ is the probability mass function of the hypergeometric distribution.
For testing the one-sided alternative $H_o:\,\theta=\theta_0$ versus $H_a:\,\theta>\theta_0$, the $p$-value sums $f(t|n_1,n_2,c_1;\theta_0)$ over a set $S$ of table values $t$, and we consider the following four methods in calculating it:
[(i)] with $S=\{t:\,t \geq n_{11}\}$, we get the table-inclusive version, which is denoted as $p^>_{inc}$,
[(ii)] with $S=\{t:\,t> n_{11}\}$, we get the table-exclusive version, denoted as $p^>_{exc}$,
[(iii)] using $p=p^>_{exc}+p_t/2$, we get the mid-$p$ version, denoted as $p^>_{mid}$,
[(iv)] we can also use the Tocher corrected version, which is denoted as $p^>_{Toc}$ (see tocher.cor for details).
See (Ceyhan (2010)) for more details.
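The first three corrections can be sketched in base R as simple arithmetic on the table-inclusive $p$-value and the probability of the observed table. `pval1s.sketch` is a hypothetical helper written from the verbal description, not the package's exact.pval1s, and the Tocher (randomized) version is left out.

```r
# Hypothetical helper: corrections applied to a table-inclusive one-sided p-value
pval1s.sketch <- function(ptable, pval, type = c("inc", "exc", "mid")) {
  type <- match.arg(type)
  switch(type,
         inc = pval,               # table-inclusive: observed table counted in full
         exc = pval - ptable,      # table-exclusive: observed table left out
         mid = pval - ptable / 2)  # mid-p: half of the observed table's probability
}

pv <- 0.30; ptab <- 0.08   # illustrative values only
res <- sapply(c("inc", "exc", "mid"), function(ty) pval1s.sketch(ptab, pv, ty))
res
```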
exact.pval1s(ptable, pval, type = "inc")
ptable |
Probability of the observed |
pval |
Table inclusive |
type |
The type of the |
A modified $p$-value based on the correction specified in type.
Elvan Ceyhan
Ceyhan E (2010). “Exact Inference for Testing Spatial Patterns by Nearest Neighbor Contingency Tables.” Journal of Probability and Statistical Science, 8(1), 45-68.
ct<-matrix(sample(20:40,4),ncol=2)
ptab<-prob.nnct(ct)
pv<-.3
exact.pval1s(ptab,pv)
exact.pval1s(ptab,pv,type="exc")
exact.pval1s(ptab,pv,type="mid")
$p$-value correction to the two-sided version of exact NNCT test
In using Fisher's exact test on the nearest neighbor contingency tables (NNCTs) a correction may be needed for the $p$-value. For the one-sided alternatives, the probabilities of more extreme tables are summed up, including or excluding the probability of the table itself (or some middle way).
There is additional complexity in $p$-values for the two-sided alternatives. A recommended method is adding up probabilities of the same size and smaller than the probability associated with the current table. Alternatively, one can double the one-sided $p$-value (see Agresti (1992)).
Let the probability of the contingency table itself be $p_t=f(n_{11}|n_1,n_2,c_1;\theta_0)$ where $\theta_0=(n_1-1)(n_2-1)/(n_1 n_2)$, which is the odds ratio under RL or CSR independence, and $f$ is the probability mass function of the hypergeometric distribution.
**Type (I):** For doubling the one-sided $p$-value, we propose the following four variants:
[(i)] twice the minimum of $p_{inc}$ for the one-sided tests, which is the table-inclusive version for this type of two-sided test, and denoted as $p^I_{inc}$,
[(ii)] twice the minimum of $p_{inc}$ minus twice the table probability $p_t$, which is the table-exclusive version of this type of two-sided test, and denoted as $p^I_{exc}$,
[(iii)] the table-exclusive version of this type of two-sided test plus $p_t$, which is the mid-$p$-value for this test, and denoted as $p^I_{mid}$,
[(iv)] the Tocher corrected version (see tocher.cor for details).
**Type (II):** For summing the $p$-values of cases more extreme than that of the table in both directions, the following variants are obtained. The $p$-value is $p=\sum_S f(t|n_1,n_2,c_1;\theta=\theta_0)$ with
[(i)] $S=\{t:\,f(t|n_1,n_2,c_1;\theta=\theta_0) \leq p_t\}$, which is called the table-inclusive version, $p^{II}_{inc}$,
[(ii)] the probability of the observed table is included twice, once for each side; that is $p=p^{II}_{inc}+p_t$, which is called the twice-table-inclusive version, $p^{II}_{tinc}$,
[(iii)] table-inclusive minus $p_t$, which is referred to as the table-exclusive version, $p^{II}_{exc}$,
[(iv)] table-exclusive plus one-half the $p_t$, which is called the mid-$p$ version, $p^{II}_{mid}$, and
[(v)] the Tocher corrected version, $p^{II}_{Toc}$, is obtained as before.
See (Ceyhan (2010)) for more details.
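The two families of two-sided corrections can be illustrated on a small made-up probability mass function. The sketch below is base R only; the vector `f` and the position `obs` are invented for illustration and are not produced by prob.nnct or any other package function, and the Tocher versions are left out.

```r
# Illustrative pmf over the possible values of the (1,1) cell; made-up numbers
f <- c(0.02, 0.08, 0.20, 0.30, 0.25, 0.10, 0.05)
obs <- 2          # assumed position of the observed table in the support
pt <- f[obs]      # probability of the observed table

# Type (II): sum the probabilities that do not exceed that of the observed table
p.inc <- sum(f[f <= pt])   # table-inclusive
p.exc <- p.inc - pt        # table-exclusive
p.mid <- p.exc + pt / 2    # mid-p
c(inc = p.inc, exc = p.exc, mid = p.mid)

# Type (I): double the smaller of the two one-sided table-inclusive p-values
p.lower <- sum(f[seq_len(obs)])
p.upper <- sum(f[obs:length(f)])
pI.inc <- 2 * min(p.lower, p.upper)   # table-inclusive
pI.exc <- pI.inc - 2 * pt             # table-exclusive
pI.mid <- pI.exc + pt                 # mid-p
c(inc = pI.inc, exc = pI.exc, mid = pI.mid)
```

With computed probabilities, the comparison `f <= pt` is usually done with a small relative tolerance so that ties are not lost to rounding.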
exact.pval2s(ptable, pval, type = "inc", double = FALSE)
ptable |
Probability of the observed |
pval |
Table inclusive |
type |
The type of the |
double |
A logical argument (default is |
A modified $p$-value based on the correction specified in type.
Elvan Ceyhan
Agresti A (1992).
“A Survey of Exact Inference for Contingency Tables.”
Statistical Science, 7(1), 131-153.
Ceyhan E (2010).
“Exact Inference for Testing Spatial Patterns by Nearest Neighbor Contingency Tables.”
Journal of Probability and Statistical Science, 8(1), 45-68.
ct<-matrix(sample(20:40,4),ncol=2)
ptab<-prob.nnct(ct)
pv<-.23
exact.pval2s(ptab,pv)
exact.pval2s(ptab,pv,type="exc")
exact.pval2s(ptab,pv,type="mid")
Five functions: cov.2cells, cov.cell.col, covNijCk, cov.2cols and covCiCj.
These are auxiliary functions for computing covariances between entries in the TCT for the types I-IV cell-specific tests. The covariances between $T_{ij}$ values for $i,j=1,\ldots,k$ in the TCT require covariances between two cells in the NNCT, between a cell and a column sum, and between two column sums in the NNCT.
cov.2cells computes the covariance between two cell counts $N_{ij}$ and $N_{kl}$ in an NNCT; cov.cell.col and covNijCk are equivalent and they compute the covariance between cell count $N_{ij}$ and the sum of column $k$, $C_k$; cov.2cols and covCiCj are equivalent and they compute the covariance between sums of two columns, $C_i$ and $C_j$.
The index arguments refer to which entry or column sum is intended in the NNCT. The argument covN must be the covariance between $N_{ij}$ values which are obtained from the NNCT by row-wise vectorization.
These covariances are valid under RL or conditional on $Q$ and $R$ under CSR.
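Since a column sum is a sum of cell counts, these covariances follow from covN by bilinearity once the position of each cell in the row-wise vectorization is known. The base R sketch below illustrates this; the helper names, their signatures, and the random matrix standing in for covN are all hypothetical and do not reproduce the package functions (which take ct as an argument).

```r
k <- 3
set.seed(1)
M <- matrix(rnorm((k^2)^2), k^2)
covN <- crossprod(M)   # made-up symmetric matrix standing in for covN

# Position of cell (i, j) after row-wise vectorization of a k x k table
idx <- function(i, j, k) (i - 1) * k + j

# Cov(N_ij, N_kl): a single entry of covN (kk plays the role of the row index k)
cov.cells <- function(i, j, kk, l, covN, k) covN[idx(i, j, k), idx(kk, l, k)]

# Cov(N_ij, C_m): sum over the cells of column m, since C_m = sum_r N_rm
cov.cell.colsum <- function(i, j, m, covN, k)
  sum(covN[idx(i, j, k), idx(seq_len(k), m, k)])

# Cov(C_i, C_j): sum over all pairs of cells taken from columns i and j
cov.colsums <- function(i, j, covN, k)
  sum(covN[idx(seq_len(k), i, k), idx(seq_len(k), j, k)])

cov.cells(1, 1, 1, 2, covN, k)
cov.cell.colsum(2, 2, 1, covN, k)
cov.colsums(2, 1, covN, k)
```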
cov.2cells(i, j, k, l, ct, covN)
cov.cell.col(i, j, k, ct, covN)
covNijCk(i, j, k, ct, covN)
cov.2cols(i, j, ct, covN)
covCiCj(i, j, ct, covN)
i , j , k , l
|
Indices of the cell counts or column sums whose covariance is to be computed. All four are
needed for |
ct |
A nearest neighbor contingency table |
covN |
The |
cov.2cells returns the covariance between two cell counts $N_{ij}$ and $N_{kl}$ in an NNCT; cov.cell.col and covNijCk return the covariance between cell count $N_{ij}$ and the sum of column $k$, $C_k$; cov.2cols and covCiCj return the covariance between sums of two columns, $C_i$ and $C_j$.
Elvan Ceyhan
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
cov.2cells(1,1,1,2,ct,covN)
cov.cell.col(2,2,1,ct,covN)
covNijCk(2,2,1,ct,covN)
cov.2cols(2,1,ct,covN)
covCiCj(2,1,ct,covN)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
cov.2cells(2,3,1,2,ct,covN)
cov.cell.col(1,1,2,ct,covN)
covNijCk(1,1,2,ct,covN)
cov.2cols(3,4,ct,covN)
covCiCj(3,4,ct,covN)
Two functions: base.class.spec.ct and base.class.spec.
Both functions are objects of class "classhtest" but with different arguments (see the parameter list below).
Each one performs class specific segregation tests due to Dixon for $k \geq 2$ classes. That is, each one performs hypothesis tests of deviations of entries in each row of the NNCT from the expected values under RL or CSR for each row. Recall that row labels in the NNCT are base class labels. The test for each row $i$ is based on the chi-squared approximation of the corresponding quadratic form and is due to Dixon (2002).
Each function yields the test statistic, $p$-value and df for each base class $i$, description of the alternative with the corresponding null values (i.e., expected values) for the row $i$, and estimates for the entries in row $i$ for $i=1,\ldots,k$. The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis for each row is that the corresponding $N_{ij}$ entries in row $i$ are equal to their expected values under RL or CSR.
See also (Dixon (2002); Ceyhan (2009)) and the references therein.
base.class.spec.ct(ct, covN)
base.class.spec(dat, lab, ...)
ct |
A nearest neighbor contingency table, used in |
covN |
The |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
type |
Type of the class-specific test, which is |
statistic |
The |
stat.names |
Name of the test statistics |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates of the parameters, NNCT, i.e., matrix of the observed |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of
the |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2009).
“Class-Specific Tests of Segregation Based on Nearest Neighbor Contingency Tables.”
Statistica Neerlandica, 63(2), 149-182.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
NN.class.spec.ct
, NN.class.spec
, class.spec.ct
and class.spec
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
base.class.spec(Y,cls)
base.class.spec.ct(ct,covN)
base.class.spec(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
varN<-var.nnct(ct,Qv,Rv) #recompute for the new table
covN<-cov.nnct(ct,varN,Qv,Rv)
base.class.spec(Y,fcls)
base.class.spec.ct(ct,covN)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
base.class.spec(Y,cls)
base.class.spec.ct(ct,covN)
Two functions: cell.spec.ss.ct and cell.spec.ss.
Both functions return objects of class "cellhtest"
but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the
cell counts (i.e., entries) in the NNCT for k ≥ 2 classes.
Each test is appropriate (i.e., has the appropriate asymptotic sampling distribution)
when the data are obtained by sparse sampling.
Each cell-specific segregation test is based on the normal approximation of the entries in the NNCT and is due to Pielou (1961).
Each function yields a contingency table of the test statistics, p-values for the corresponding
alternative, expected values, lower and upper confidence levels, sample estimates (i.e., observed values)
and null value(s) (i.e., expected values) for the N_ij values for i,j=1,2,...,k,
and also names of the test
statistics, estimates, null values and the method and the data set used.
The null hypothesis is that all E(N_ij) = n_i c_j / n, where n_i is the sum of row i
(i.e., size of class i) and c_j is the sum of column j in the k × k NNCT for k ≥ 2.
In the output, the test statistic, p-value and the lower and upper confidence limits are valid only
for (properly) sparsely sampled data.
See also (Pielou (1961); Ceyhan (2010)) and the references therein.
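The null values described above can be sketched in base R. The helpers nnct_sketch() and expected_indep() below are hypothetical names used only for illustration (they are not nnspat functions), and the formula n_i c_j / n is the usual independence expectation assumed here for the sparse-sampling setting.

```r
# Illustrative base-R sketch; nnct_sketch() and expected_indep() are
# hypothetical helper names, not functions of the nnspat package.

# Build a nearest neighbor contingency table (NNCT) from an
# inter-point distance matrix and a vector of class labels.
nnct_sketch <- function(ipd, lab) {
  ipd <- as.matrix(ipd)
  diag(ipd) <- Inf                  # a point is never its own NN
  nn <- apply(ipd, 1, which.min)    # index of each point's NN (ties: first one)
  lab <- factor(lab)
  table(base = lab, NN = lab[nn])   # rows: base classes, columns: NN classes
}

# Expected cell counts under independence: n_i * c_j / n.
expected_indep <- function(ct) {
  outer(rowSums(ct), colSums(ct)) / sum(ct)
}

x <- c(0, 1, 2.5, 10, 11)           # toy 1D data chosen to avoid distance ties
lab <- c("a", "a", "b", "b", "b")
ct <- nnct_sketch(dist(x), lab)
ct                                  # observed N_ij
expected_indep(ct)                  # null values n_i * c_j / n
```

In practice the package functions nnct and cell.spec.ss.ct shown in the usage below would be used instead; the sketch only makes the row-sum and column-sum roles explicit.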
cell.spec.ss.ct(
  ct,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

cell.spec.ss(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
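ct |
A nearest neighbor contingency table, used in cell.spec.ss.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in cell.spec.ss only |
lab |
The vector of class labels (numerical or categorical), used in cell.spec.ss only |
... |
are for further arguments, such as method and p, passed to the dist function; used in cell.spec.ss only |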
A list
with the elements
statistic |
The matrix of cell-specific test statistics |
stat.names |
Name of the test statistics |
p.value |
The matrix of p-values for the hypothesis test for the corresponding alternative |
LCL, UCL |
Matrix of Lower and Upper Confidence Levels for the entries N_ij in the NNCT at the given confidence level conf.level |
conf.int |
The confidence interval for the estimates, it is NULL here, since the LCL and UCL are provided in matrix form |
cnf.lvl |
Level of the upper and lower confidence limits (i.e., conf.level) of the NNCT entries. |
estimate |
Estimates of the parameters, i.e., matrix of the NNCT entries of the k × k NNCT, N_ij for i,j=1,2,...,k. |
est.name, est.name2 |
Names of the estimates, former is a shorter description of the estimates than the latter. |
null.value |
Hypothesized null value for the expected values of the NNCT entries, E(N_ij) for i,j=1,2,...,k. |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

cell.spec.ss(Y,cls)
cell.spec.ss.ct(ct)
cell.spec.ss.ct(ct,alt="g")

cell.spec.ss(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

cell.spec.ss(Y,fcls)
cell.spec.ss.ct(ct)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

cell.spec.ss(Y,cls,alt="l")
cell.spec.ss.ct(ct)
cell.spec.ss.ct(ct,alt="l")
Two functions: class.spec.ct and class.spec.
Both functions return objects of class "classhtest"
but take different arguments (see the parameter list below).
Each one performs class-specific segregation tests for the rows if type="base" and
for the columns if type="NN" for k ≥ 2 classes.
That is,
each one performs hypothesis tests of deviations of
entries in each row (column) of NNCT from the expected values under RL or CSR for each row (column)
if type="base" ("NN").
Recall that row labels of the NNCT are base class labels and
column labels in the NNCT are NN class labels.
The test for each row (column) is based on the chi-squared approximation of the corresponding quadratic form
and is due to Dixon (2002)
(Ceyhan (2009)).
The argument covN
must be the covariance of the row-wise (column-wise) vectorization of the NNCT if type="base"
(type="NN").
Each function yields the test statistic, p-value and
df
for each base class i, description of the
alternative with the corresponding null values (i.e., expected values) for the row (column) i,
and estimates for the entries in row (column) i for i=1,...,k
if type="base" (type="NN").
The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis for each row (column) is that the corresponding N_ij entries in row (column) i
are
equal to their expected values under RL or CSR.
See also (Dixon (2002); Ceyhan (2009)) and the references therein.
class.spec.ct(ct, covN, type = "base")
class.spec(dat, lab, type = "base", ...)
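ct |
A nearest neighbor contingency table, used in class.spec.ct only |
covN |
The k^2 × k^2 covariance matrix of row-wise (column-wise) vectorized entries of the NNCT if type="base" (type="NN"), used in class.spec.ct only |
type |
The type of the class-specific tests with default="base". Takes on values "base" for the base class-specific (row) tests and "NN" for the NN class-specific (column) tests |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in class.spec only |
lab |
The vector of class labels (numerical or categorical), used in class.spec only |
... |
are for further arguments, such as method and p, passed to the dist function; used in class.spec only |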
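A list
with the elements
type |
Type of the class-specific test, which is "base" or "NN" for this function |
statistic |
The vector of class-specific test statistics |
stat.names |
Name of the test statistics |
p.value |
The vector of p-values for the hypothesis test |
df |
Degrees of freedom for the chi-squared test, which is k-1 for the base class-specific test and k for the NN class-specific test |
estimate |
Estimates of the parameters, NNCT, i.e., the matrix of the
observed N_ij values |
null.value |
The matrix of hypothesized null values for the parameters, which are the expected values of the N_ij values in the NNCT |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |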
Ceyhan E (2009).
“Class-Specific Tests of Segregation Based on Nearest Neighbor Contingency Tables.”
Statistica Neerlandica, 63(2), 149-182.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
base.class.spec.ct, base.class.spec, NN.class.spec.ct and NN.class.spec
n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow

class.spec(Y,cls)
class.spec(Y,cls,type="NN")

class.spec.ct(ct,covN)
class.spec.ct(ct,covN,type="NN")

class.spec(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
#recompute the covariance matrix for the new NNCT
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

class.spec(Y,fcls)
class.spec(Y,fcls,type="NN")

class.spec.ct(ct,covN)
class.spec.ct(ct,covN,type="NN")

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

class.spec(Y,cls)
class.spec(Y,cls,type="NN")

class.spec.ct(ct,covN)
class.spec.ct(ct,covN,type="NN")
Two functions: covNii.ct and covNii.
Both functions return the covariance matrix of the self entries (i.e., first column entries) in a
species correspondence contingency table (SCCT)
but have different arguments (see the parameter list below).
The covariance matrix is of dimension k × k and its entries are cov(S_i, S_j),
where the S_i values are
the entries in the first column of the SCCT (recall that S_i
equals the diagonal entry N_ii
in the NNCT).
These covariances are valid under RL or conditional on Q and R
under CSR.
The argument ct,
which is used in covNii.ct
only, can be either the NNCT or SCCT.
The argument Vsq
is the vector of variances of the diagonal entries N_ii in the NNCT or the self entries
(i.e., the first column) in the SCCT.
See also (Ceyhan (2018)).
covNii.ct(ct, Vsq, Q, R)
covNii(dat, lab, ...)
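ct |
The NNCT or SCCT, used in covNii.ct only |
Vsq |
The vector of variances of the diagonal entries N_ii in the NNCT or the self entries (i.e., the first column) in the SCCT, used in covNii.ct only |
Q |
The number of shared NNs, used in covNii.ct only |
R |
The number of reflexive NNs (i.e., twice the number of reflexive NN pairs), used in covNii.ct only |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in covNii only |
lab |
The vector of class labels (numerical or categorical), used in covNii only |
... |
are for further arguments, such as method and p, passed to the dist function; used in covNii only |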
The k × k covariance matrix of cell counts S_i
in the self (i.e., first) column of the SCCT
or of the diagonal cell counts N_ii
for i=1,...,k
in the NNCT.
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
scct, cov.nnct, cov.tct and cov.nnsym
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)

covNii(Y,cls)
covNii.ct(ct,vsq,Qv,Rv)

covNii(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
vsq<-varNii.ct(ct,Qv,Rv) #recompute the variances for the new NNCT

covNii(Y,fcls)
covNii.ct(ct,vsq,Qv,Rv)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)

covNii(Y,cls)
covNii.ct(ct,vsq,Qv,Rv)
Four functions: cov.tctI, cov.tctIII, cov.tct3 and cov.tctIV.
These functions return the covariances between entries in the TCT for the types I, III, and IV
cell-specific tests in matrix form, which is of dimension k^2 × k^2.
The covariance matrix entries are cov(T_ij, T_kl),
where the T_ij values are by default ordered according to
the row-wise vectorization of the TCT.
The argument
CovN
must be the covariance between the N_ij values, which are obtained from the NNCT by row-wise
vectorization.
The functions
cov.tctIII and cov.tct3
are equivalent.
These covariances are valid under RL or conditional on Q and R
under CSR.
See also (Ceyhan (2017)).
cov.tctI(ct, CovN)
cov.tctIII(ct, CovN)
cov.tct3(ct, CovN)
cov.tctIV(ct, CovN)
ct |
A nearest neighbor contingency table |
CovN |
The k^2 × k^2 covariance matrix of row-wise vectorized entries of the NNCT |
Each of these functions returns a covariance matrix, whose entries are the covariances of
the entries in the TCTs for the corresponding type I-IV cell-specific test.
The row and column names are inherited from ct.
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

cov.tctI(ct,covN)
cov.tctIII(ct,covN)
cov.tctIV(ct,covN)
kth and k NN distances
Two functions: kthNNdist and kNNdist.
kthNNdist
returns the distances between subjects and their kth NNs. The output is an n × 2
matrix where n
is the data size, the first column is the subject index and the second column contains the corresponding
distances to the kth NN subjects.
kNNdist
returns the distances between subjects and their k NNs.
The output is an n × (k+1) matrix where n
is the data size, the first column is the subject index and the remaining
k
columns contain the corresponding
distances to the k NN subjects.
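A minimal base-R sketch of the kth NN distance computation described above may help fix the layout of the output. The helper kth_nn_dist_sketch() is a hypothetical name for illustration only; it is not the nnspat implementation, and it assumes the input is a full inter-point distance matrix (or a dist object).

```r
# Illustrative sketch; kth_nn_dist_sketch() is a hypothetical helper,
# not part of nnspat. It returns an n x 2 matrix: subject index and
# distance to the k-th nearest neighbor (NA when n <= k).
kth_nn_dist_sketch <- function(ipd, k) {
  ipd <- as.matrix(ipd)
  n <- nrow(ipd)
  out <- cbind(index = seq_len(n), kth.dist = NA_real_)
  if (k >= n) return(out)          # fewer than k other points: keep NAs
  for (i in seq_len(n)) {
    d <- sort(ipd[i, -i])          # distances to the other n - 1 points
    out[i, "kth.dist"] <- d[k]     # k-th smallest distance
  }
  out
}

x3 <- c(0, 1, 3)                   # toy 1D data
kth_nn_dist_sketch(dist(x3), 2)    # second NN distances
kth_nn_dist_sketch(dist(x3), 3)    # n <= k, so the distances are NA
```

Keeping all of the first k sorted distances per row, instead of only the kth, would give the n × (k+1) layout described for kNNdist.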
kthNNdist(x, k, is.ipd = TRUE, ...)
kNNdist(x, k, is.ipd = TRUE, ...)
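x |
The IPD matrix (if is.ipd=TRUE) or a data set of points with rows representing the data points (if is.ipd=FALSE) |
k |
Integer specifying the number of NNs (of subjects). |
is.ipd |
A logical parameter (default=TRUE). If TRUE, x is taken as the inter-point distance matrix; otherwise, x is taken as the data set with rows representing the data points. |
... |
are for further arguments, such as method and p, passed to the dist function |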
kthNNdist
returns an n × 2 matrix where n
is the data size (i.e., number of subjects),
the first column is the subject index and the second column contains the kth
NN distances.
kNNdist
returns an n × (k+1) matrix where n
is the data size (i.e., number of subjects),
the first column is the subject index and the remaining
k
columns contain the corresponding
distances to the k NN subjects.
Elvan Ceyhan
#Examples for kthNNdist
#3D data points, gives NAs when n<=k
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
kthNNdist(ipd,3)
kthNNdist(Y,3,is.ipd = FALSE)
kthNNdist(ipd,5)
kthNNdist(Y,5,is.ipd = FALSE)
kthNNdist(Y,3,is.ipd = FALSE,method="max")

#1D data points
X<-as.matrix(runif(5)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(5) would not work
ipd<-ipd.mat(X)
kthNNdist(ipd,3)

#Examples for kNNdist
#3D data points, gives NAs if n<=k for n,n+1,...,kNNs
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
kNNdist(ipd,3)
kNNdist(ipd,5)
kNNdist(Y,5,is.ipd = FALSE)

kNNdist(Y,5,is.ipd = FALSE,method="max")

kNNdist(ipd,1)
kthNNdist(ipd,1)

#1D data points
X<-as.matrix(runif(5)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(5) would not work
ipd<-ipd.mat(X)
kNNdist(ipd,3)
kth and k NN distances between two classes
Two functions: kthNNdist2cl and kNNdist2cl.
kthNNdist2cl
returns the distances between subjects from class i and their kth
NNs from class j.
The output is a
list
whose first entry (kth.nndist) is an n_i × 3 matrix where n_i
is the size of class i,
the first column is the subject index for class i,
the second column is the index of the kth
NN of class i subjects among class j subjects and the third column
contains the corresponding kth
NN distances. The other entries in the
list
are the labels of the base class and NN class
and the value of k, respectively.
kNNdist2cl
returns the distances between subjects from class i and their
k NNs from class j.
The output is a
list
whose first entry (ind.knndist) is an n_i × (k+1) matrix where n_i
is the size of class i,
the first column contains the indices of class i
subjects, and the 2nd to (k+1)-st columns are the indices of the
k NNs of class i subjects among class j
subjects. The second
list
entry (knndist) is an n_i × k matrix where n_i
is the
size of class i
and the columns are the
k NN distances of class i subjects to class j
subjects.
The other entries in the
list
are the labels of the base class and NN class and the value of k, respectively.
The argument within.class.ind
is a logical argument (default=FALSE) to determine the indexing of the class i
subjects. If
TRUE, index numbering of subjects is within the class, from 1 to class size (i.e., 1:n_i),
according to their order in the original data; otherwise, index numbering within class is just the indices
in the original data.
The argument is.ipd
is a logical argument (default=TRUE) to determine the structure of the argument x.
If TRUE, x
is taken to be the inter-point distance (IPD) matrix, and if FALSE, x
is taken to be the data set
with rows representing the data points.
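The between-class lookup and the effect of within.class.ind can be sketched in base R as follows. The helper kth_nn_dist_2cl_sketch() is a hypothetical name and not the nnspat implementation; it assumes i and j are different classes, takes a full inter-point distance matrix, and (as an assumption of this sketch) applies the within-class numbering to the NN index column as well as to the base index column.

```r
# Illustrative sketch; kth_nn_dist_2cl_sketch() is a hypothetical helper,
# not part of nnspat. Assumes i != j.
kth_nn_dist_2cl_sketch <- function(ipd, k, i, j, lab, within.class.ind = FALSE) {
  ipd <- as.matrix(ipd)
  base <- which(lab == i)          # original indices of class i (base class)
  nn <- which(lab == j)            # original indices of class j (NN class)
  res <- matrix(NA_real_, nrow = length(base), ncol = 3,
                dimnames = list(NULL, c("base.ind", "kth.nn.ind", "kth.dist")))
  for (r in seq_along(base)) {
    d <- ipd[base[r], nn]          # distances from this class i point to class j
    ord <- order(d)
    res[r, "base.ind"] <- if (within.class.ind) r else base[r]
    if (k <= length(nn)) {         # otherwise there is no k-th NN: keep NA
      res[r, "kth.nn.ind"] <- if (within.class.ind) ord[k] else nn[ord[k]]
      res[r, "kth.dist"] <- d[ord[k]]
    }
  }
  res
}

x <- c(0, 1, 2, 10)                # toy 1D data
lab <- c(1, 2, 2, 1)
r1 <- kth_nn_dist_2cl_sketch(dist(x), 1, 1, 2, lab)
r2 <- kth_nn_dist_2cl_sketch(dist(x), 2, 1, 2, lab, within.class.ind = TRUE)
r1
r2
```

With within.class.ind = FALSE the indices refer to rows of the original data, which is convenient for looking the points up again; the within-class numbering is more compact when each class is handled separately.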
kthNNdist2cl(x, k, i, j, lab, within.class.ind = FALSE, is.ipd = TRUE, ...)
kNNdist2cl(x, k, i, j, lab, within.class.ind = FALSE, is.ipd = TRUE, ...)
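x |
The IPD matrix (if is.ipd=TRUE) or a data set of points with rows representing the data points (if is.ipd=FALSE) |
k |
Integer specifying the number of NNs (of subjects). |
i, j |
class labels of the base class and NN class, respectively. |
lab |
The vector of class labels (numerical or categorical) |
within.class.ind |
A logical parameter (default=FALSE). If TRUE, index numbering of subjects is within the class, from 1 to class size (i.e., 1:n_i), according to their order in the original data; otherwise, index numbering within class is just the indices in the original data. |
is.ipd |
A logical parameter (default=TRUE). If TRUE, x is taken as the inter-point distance matrix; otherwise, x is taken as the data set with rows representing the data points. |
... |
are for further arguments, such as method and p, passed to the dist function |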
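kthNNdist2cl
returns the list
of elements
kth.nndist |
n_i × 3 matrix where n_i is the size of class i; the first column is the subject index for class i, the second column is the index of the kth NN of class i subjects among class j subjects and the third column contains the corresponding kth NN distances |
base.class |
label of base class |
nn.class |
label of NN class |
k |
value of k in kth NN |
kNNdist2cl
returns the list
of elements
ind.knndist |
n_i × (k+1) matrix where n_i is the size of class i; the first column contains the indices of class i subjects and the 2nd to (k+1)-st columns are the indices of the k NNs of class i subjects among class j subjects |
knndist |
n_i × k matrix where n_i is the size of class i and the columns are the k NN distances of class i subjects to class j subjects |
base.class |
label of base class |
nn.class |
label of NN class |
k |
value of k in k NNs |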
Elvan Ceyhan
NNdist2cl, kthNNdist and kNNdist
#Examples for kthNNdist2cl
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)
kthNNdist2cl(ipd,3,1,2,clab)
kthNNdist2cl(Y,3,1,2,clab,is.ipd = FALSE)
kthNNdist2cl(ipd,3,1,2,clab,within = TRUE)

#three class case
clab<-sample(1:3,n,replace=TRUE) #class labels
table(clab)
kthNNdist2cl(ipd,3,2,3,clab)

#1D data points
n<-15
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)

kthNNdist2cl(ipd,3,1,2,clab)
#here kthNNdist2cl(ipd,3,1,12,clab) gives an error message

kthNNdist2cl(ipd,3,"1",2,clab)

#Examples for kNNdist2cl
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)
kNNdist2cl(ipd,3,1,2,clab)
kNNdist2cl(Y,3,1,2,clab,is.ipd = FALSE)

kNNdist2cl(ipd,3,1,2,clab,within = TRUE)

#three class case
clab<-sample(1:3,n,replace=TRUE) #class labels
table(clab)
kNNdist2cl(ipd,3,1,2,clab)

#1D data points
n<-15
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)

kNNdist2cl(ipd,3,1,2,clab)
kNNdist2cl(ipd,3,"1",2,clab)
#here kNNdist2cl(ipd,3,"a",2,clab) gives an error message
Two functions: overall.nnct.ct and overall.nnct.
Both functions return objects of class "Chisqtest"
but take different arguments (see the parameter list below).
Each one performs hypothesis tests of deviations of
cell counts from the expected values under RL or CSR for all cells (i.e., entries) combined in the NNCT.
That is, each test is Dixon's overall test of segregation based on NNCTs for k ≥ 2 classes.
This overall test is based on the chi-squared approximation of the corresponding quadratic form
and is due to Dixon (1994, 2002).
Both functions exclude the last column of the NNCT (in fact any column will do and the last column
is chosen without loss of generality), to avoid ill-conditioning of the covariance matrix (for its inversion
in the quadratic form).
Each function yields the test statistic, p-value and
df,
which is k(k-1), description of the
alternative with the corresponding null values (i.e., expected values) of NNCT entries, and sample estimates (i.e., observed values) of the entries in NNCT.
The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis is that all E(N_ij) entries are equal to their expected values under RL or CSR.
See also (Dixon (1994, 2002); Ceyhan (2010, 2017)) and the references therein.
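The quadratic form described above, with the last column of the NNCT dropped before inversion, can be sketched as follows. The helper overall_qf_sketch() is a hypothetical name, not the nnspat implementation, and the toy inputs (including the diagonal covariance matrix) are made up purely to exercise the indexing; in practice the expected values and covN would come from the package functions shown in the examples.

```r
# Illustrative sketch; overall_qf_sketch() is a hypothetical helper,
# not part of nnspat. ct and EN are k x k matrices (observed and
# expected NNCT), covN is the k^2 x k^2 covariance matrix of the
# row-wise vectorized NNCT entries.
overall_qf_sketch <- function(ct, EN, covN) {
  k <- nrow(ct)
  # In row-wise order the last-column cells sit at positions k, 2k, ..., k^2.
  keep <- setdiff(seq_len(k^2), k * seq_len(k))
  dev <- as.vector(t(ct))[keep] - as.vector(t(EN))[keep]
  stat <- drop(t(dev) %*% solve(covN[keep, keep], dev))
  df <- k * (k - 1)                # number of cells kept
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Toy inputs (not real data): a 2 x 2 table with a made-up covariance.
ct_toy <- matrix(c(6, 4, 3, 7), 2, byrow = TRUE)
EN_toy <- matrix(5, 2, 2)
cov_toy <- diag(4) * 2
res <- overall_qf_sketch(ct_toy, EN_toy, cov_toy)
res
```

Dropping one column leaves k(k-1) cells, which is why the degrees of freedom match the number of retained entries in this sketch.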
overall.nnct.ct(ct, covN)
overall.nnct(dat, lab, ...)
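ct |
A nearest neighbor contingency table, used in overall.nnct.ct only |
covN |
The k^2 × k^2 covariance matrix of row-wise vectorized entries of the NNCT, used in overall.nnct.ct only |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in overall.nnct only |
lab |
The vector of class labels (numerical or categorical), used in overall.nnct only |
... |
are for further arguments, such as method and p, passed to the dist function; used in overall.nnct only |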
A list
with the elements
statistic |
The overall chi-squared statistic |
stat.names |
Name of the test statistic |
p.value |
The p-value for the hypothesis test |
df |
Degrees of freedom for the chi-squared test, which is k(k-1) for this function |
estimate |
Estimates of the parameters, NNCT, i.e., matrix of the observed N_ij values |
est.name, est.name2 |
Names of the estimates, former is a longer description of the estimates than the latter. |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of
the N_ij values in the NNCT |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
overall.seg.ct, overall.seg, overall.tct.ct and overall.tct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow

overall.nnct(Y,cls)
overall.nnct.ct(ct,covN)

overall.nnct(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
#recompute the covariance matrix for the new NNCT
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

overall.nnct(Y,fcls)
overall.nnct.ct(ct,covN)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

overall.nnct(Y,cls)
overall.nnct.ct(ct,covN)
Two functions: overall.seg.ct and overall.seg.
Both functions return objects of class "Chisqtest"
but take different arguments (see the parameter list below).
Each one performs hypothesis tests of deviations of
cell counts from the expected values under RL or CSR for all cells (i.e., entries) combined in the NNCT or TCT.
That is, each test is one of Dixon's or types I-IV overall tests of segregation based on NNCTs or TCTs
for k ≥ 2 classes.
Each overall test is based on the chi-squared approximation of the corresponding quadratic form
and is due to Dixon (1994, 2002)
and to Ceyhan (2010, 2017), respectively.
Both functions exclude some row and/or column of the TCT, to avoid ill-conditioning of the covariance matrix
of the NNCT (for its inversion in the quadratic form); see the relevant functions under the See also section below.
The type="dixon" or "nnct"
refers to Dixon's overall test of segregation, and
type="I"-"IV"
refers to types I-IV overall tests, respectively.
Each function yields the test statistic, p-value and
df,
which is k(k-1) for type II and Dixon's test
and (k-1)^2
for the other types, description of the
alternative with the corresponding null values (i.e., expected values) of TCT entries, and sample estimates (i.e., observed values) of the entries in TCT.
The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis is that all N_ij or T_ij
entries for the specified type are equal to their expected values
under RL or CSR, respectively.
See also (Dixon (1994, 2002); Ceyhan (2010, 2010)) and the references therein.
overall.seg.ct(ct, covN, type)
overall.seg(dat, lab, type, ...)
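ct |
A nearest neighbor contingency table, used in overall.seg.ct only |
covN |
The k^2 × k^2 covariance matrix of row-wise vectorized entries of the NNCT, used in overall.seg.ct only |
type |
The type of the overall test with no default.
Takes on values "dixon" or "nnct" for Dixon's overall test and "I"-"IV" for types I-IV overall tests, respectively |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in overall.seg only |
lab |
The vector of class labels (numerical or categorical), used in overall.seg only |
... |
are for further arguments, such as method and p, passed to the dist function; used in overall.seg only |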
A list
with the elements
statistic |
The overall chi-squared statistic for the specified type |
stat.names |
Name of the test statistic |
p.value |
The p-value for the hypothesis test |
df |
Degrees of freedom for the chi-squared test, which is k(k-1) for type II and Dixon's tests and (k-1)^2 for the other types |
estimate |
Estimates of the parameters, NNCT for Dixon's test and type I-IV TCT for others. |
est.name, est.name2 |
Names of the estimates, former is a longer description of the estimates than the latter. |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of
the N_ij values in the NNCT or the T_ij values in the TCT |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
overall.nnct.ct, overall.nnct, overall.tct.ct and overall.tct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow

type<-"dixon" #try also "nnct", "I", "II", "III", and "IV"
overall.seg(Y,cls,type)
overall.seg(Y,cls,type,method="max")
overall.seg(Y,cls,type="I")

overall.seg.ct(ct,covN,type)
overall.seg.ct(ct,covN,type="I")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
#recompute the covariance matrix for the new NNCT
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

overall.seg(Y,fcls,type="I")
overall.seg.ct(ct,covN,type)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

overall.seg(Y,cls,type="I")
overall.seg.ct(ct,covN,type)
Two functions: overall.tct.ct and overall.tct.
Both functions return objects of class "Chisqtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of deviations of cell counts from the expected values under RL or CSR for all cells (i.e., entries) combined in the TCT.
That is, each test is one of the Types I-IV overall tests of segregation based on TCTs for k ≥ 2 classes.
This overall test is based on the chi-squared approximation of the corresponding quadratic form and is due to Ceyhan (2010, 2017).
Both functions exclude some row and/or column of the TCT, to avoid ill-conditioning of the covariance matrix of the NNCT (for its inversion in the quadratic form).
In particular, type-II removes the last column, and all other types remove the last row and column.
Each function yields the test statistic, p-value and df, which is k(k-1) for the type II test and (k-1)^2 for the other types, a description of the alternative with the corresponding null values (i.e., expected values) of the TCT entries, and sample estimates (i.e., observed values) of the entries in the TCT.
The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis is that all Tij entries for the specified type are equal to their expected values under RL or CSR.
See also (Ceyhan (2010, 2017)) and the references therein.
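As a rough illustration of the quadratic-form construction described above, the following base R sketch computes a chi-squared statistic from a vector of observed values, their expected values and a covariance matrix. The function name quad.form.test and the toy inputs are hypothetical and are not part of nnspat; the package functions additionally handle the removal of rows and columns and the type-specific covariance.

```r
# Generic quadratic-form chi-squared test (illustrative sketch, not the nnspat code).
# obs, expd: numeric vectors of observed and expected values
# Sigma: covariance matrix of obs, assumed invertible
quad.form.test <- function(obs, expd, Sigma) {
  d <- obs - expd
  # d' Sigma^{-1} d, computed with solve() instead of an explicit inverse
  stat <- as.numeric(t(d) %*% solve(Sigma, d))
  df <- length(d)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Toy input with a diagonal covariance matrix (made-up numbers)
obs <- c(1.2, -0.5, 0.3)
expd <- c(0, 0, 0)
Sigma <- diag(c(1, 0.5, 2))
res <- quad.form.test(obs, expd, Sigma)
res
```

With a diagonal covariance matrix, as in the toy input, the statistic reduces to the sum of squared deviations divided by the corresponding variances.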
overall.tct.ct(ct, covN, type = "III")
overall.tct(dat, lab, type = "III", ...)
ct |
A nearest neighbor contingency table, used in |
covN |
The |
type |
The type of the overall segregation test, default= |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list with the elements
statistic |
The overall chi-squared statistic for the specified type |
stat.names |
Name of the test statistic |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates of the parameters, TCT, i.e., matrix of the observed |
est.name, est.name2 |
Names of the estimates, former is a longer description of the estimates than the latter. |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of the
the |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
overall.seg.ct, overall.seg, overall.nnct.ct and overall.nnct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow

overall.tct(Y,cls)
overall.tct(Y,cls,type="I")
overall.tct(Y,cls,type="II")
overall.tct(Y,cls,type="III")
overall.tct(Y,cls,type="IV")
overall.tct(Y,cls,method="max")

overall.tct.ct(ct,covN)
overall.tct.ct(ct,covN,type="I")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

overall.tct(Y,fcls)
overall.tct.ct(ct,covN)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

overall.tct(Y,cls)
overall.tct.ct(ct,covN)
The ancillary probability functions used in computation of the variance-covariance matrices of various NN spatial tests such as NNCT tests and tests based on other contingency tables.
These functions can be classified as pij and Pij type functions. The pij functions are for individual probabilities and the corresponding Pij functions are the summed pij values. For example, piijk is the probability of any 4 points with 2 from class i, and the others from classes j and k.
These probabilities are for data from RL or CSR.
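A minimal sketch of the simplest pair probabilities, assuming they are the probabilities that two points drawn without replacement from n labeled points fall in the indicated classes (the situation under RL). The helper names p.same2 and p.diff2 are hypothetical stand-ins, not the package's p11 and p12, whose exact definitions should be checked in the package source.

```r
# Probability that two points drawn without replacement are both
# from a class of size k (hypothetical helper, assumes RL)
p.same2 <- function(k, n) k * (k - 1) / (n * (n - 1))

# Probability that the first point is from a class of size k and the
# second from a different class of size l (hypothetical helper)
p.diff2 <- function(k, l, n) k * l / (n * (n - 1))

nvec <- c(3, 4, 5)   # made-up class sizes
n <- sum(nvec)

# Summed version over all classes
P.same2 <- sum(p.same2(nvec, n))

# Summed version over all ordered pairs of distinct classes
cross <- outer(nvec, nvec)
P.diff2 <- (sum(cross) - sum(diag(cross))) / (n * (n - 1))

# The two summed probabilities exhaust all ordered pairs
P.same2 + P.diff2
```

The final sum equals 1 because every ordered pair of points is either a same-class pair or a mixed pair.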
p11(k, n)
P11(nvec)
p12(k, l, n)
P12(nvec)
p111(k, n)
P111(nvec)
p1111(k, n)
P1111(nvec)
p112(k, l, n)
P112(nvec)
p122(k, l, n)
p123(k, l, m, n)
P123(nvec)
p1234(k, l, m, p, n)
P1234(nvec)
p1223(k, l, m, n)
p1123(k, l, m, n)
P1123(nvec)
p1122(k, l, n)
P1122(nvec)
p1112(k, l, n)
P1112(nvec)
k, l, m, p |
Positive integers, usually representing the class sizes, used in |
n |
A positive integer representing the size of the data set (i.e., number of observations in the data set). |
nvec |
A |
Probability values for the selected points being from the indicated classes.
Two functions: scct.ct and scct.
Both functions return the species correspondence contingency table (SCCT) but have different arguments (see the parameter list below).
SCCT is constructed by categorizing the NN pairs according to pair type as self or mixed. A base-NN pair is called a self pair, if the elements of the pair are from the same class; a mixed pair, if the elements of the pair are from different classes.
Row labels in the SCCT are the class labels and the column labels are "self" and "mixed".
The SCCT (whose first column is the self column with entries Si and second column is the mixed column with entries Mi) is closely related to the nearest neighbor contingency table (NNCT) whose entries are Nij, where Si = Nii and Mi = ni - Nii, with ni being the size of class i.
The function scct.ct returns the SCCT given the inter-point distance (IPD) matrix or the NNCT x, and the function scct returns the SCCT given the data set dat. SCCT is a k × 2 matrix where k is the number of classes in the data set.
(See Ceyhan (2018) for more detail, where SCCT is labeled as CCT for correspondence contingency table).
The argument ties is a logical argument (default=FALSE for both functions) to take ties into account or not. If TRUE, a NN contributes 1/m to the NN count if it is one of the m tied NNs of a subject.
The argument nnct is a logical argument for scct.ct only (default=FALSE) to determine the structure of the argument x. If TRUE, x is taken to be the NNCT, and if FALSE, x is taken to be the IPD matrix.
The argument lab is the vector of class labels (default=NULL when nnct=TRUE in the function scct.ct and no default specified for scct).
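The relation between the NNCT and the SCCT described above (self entries from the diagonal, mixed entries as row sum minus diagonal) can be sketched in base R as follows. The helper name nnct2scct and the toy table are hypothetical; this is only an illustration of the stated relation, not the code behind scct.ct.

```r
# Build a self/mixed table from a square NNCT (illustrative sketch).
# Assumes rows are base classes and columns are NN classes.
nnct2scct <- function(ct) {
  self <- diag(ct)                 # NN pairs within the same class
  mixed <- rowSums(ct) - self      # NN pairs with a different class
  out <- cbind(self = self, mixed = mixed)
  rownames(out) <- rownames(ct)
  out
}

# Toy 2 x 2 NNCT with made-up counts
ct <- matrix(c(6, 4,
               3, 7), nrow = 2, byrow = TRUE,
             dimnames = list(c("a", "b"), c("a", "b")))
sc <- nnct2scct(ct)
sc
```

Row sums are preserved by construction, so each row of the result still adds up to the size of the corresponding class.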
scct.ct(x, lab = NULL, ties = FALSE, nnct = FALSE)
scct(dat, lab, ties = FALSE, ...)
x |
The IPD matrix (if |
lab |
The |
ties |
A logical argument (default= |
nnct |
A logical parameter (default= |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
... |
are for further arguments, such as |
Returns the k × 2 SCCT where k is the number of classes in the data set.
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
NNCT<-nnct(ipd,cls)
NNCT

scct(Y,cls)
scct(Y,cls,method="max")

scct.ct(ipd,cls)
scct.ct(ipd,cls,ties = TRUE)
scct.ct(NNCT,nnct=TRUE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))

scct.ct(ipd,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
NNCT<-nnct(ipd,cls)
NNCT

scct(Y,cls)

scct.ct(ipd,cls)
scct.ct(NNCT,nnct=TRUE)
Two functions: Pseg.coeff and seg.coeff.
Each function computes segregation coefficients based on NNCTs.
The function Pseg.coeff computes Pielou's segregation coefficient (Pielou (1961)) for the two-class case (i.e., based on 2 × 2 NNCTs) and seg.coeff is the extension of Pseg.coeff to the multi-class case (i.e., for k × k NNCTs) and provides a matrix of segregation coefficients (Ceyhan (2014)).
Both functions use the same argument, ct, for NNCT.
Pielou's segregation coefficient is defined for two classes; the extended segregation coefficients (for k classes) are defined separately for the diagonal cells and for the off-diagonal cells in the NNCT (see Pielou (1961) and Ceyhan (2014) for the formulas).
Pseg.coeff(ct)
seg.coeff(ct)
ct |
A nearest neighbor contingency table, used in both functions |
Pseg.coeff returns Pielou's segregation coefficient for the 2 × 2 NNCT.
seg.coeff returns a matrix of segregation coefficients (which are extended versions of Pielou's segregation coefficient).
Elvan Ceyhan
Ceyhan E (2014).
“Segregation indices for disease clustering.”
Statistics in Medicine, 33(10), 1662-1684.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
seg.ind, Zseg.coeff.ct and Zseg.coeff
#Examples for Pseg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
Pseg.coeff(ct)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

Pseg.coeff(ct)

#############
ct<-matrix(sample(1:25,9),ncol=3)
#Pseg.coeff(ct)

#Examples for seg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
seg.coeff(ct)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

seg.coeff(ct)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
ct<-nnct(ipd,cls)

seg.coeff(ct)
Two functions: varNii.ct and varNii.
Both functions return a vector of length k (the number of classes) of variances of the self entries (i.e., first column) in a species correspondence contingency table (SCCT) or the variances of the diagonal entries in an NNCT, but have different arguments (see the parameter list below).
These variances are valid under RL or conditional on Q and R under CSR.
The argument ct, which is used in varNii.ct only, can be either the NNCT or SCCT.
See also (Ceyhan (2018)).
varNii.ct(ct, Q, R)
varNii(dat, lab, ...)
ct |
The NNCT or SCCT, used in |
Q |
The number of shared NNs, used in |
R |
The number of reflexive NNs (i.e., twice the number of reflexive NN pairs), used in |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A vector of length k (the number of classes) whose entries are the variances of the self entries (i.e., first column) in a species correspondence contingency table (SCCT) or of the diagonal entries in an NNCT.
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
scct, var.nnct, var.tct, var.nnsym and covNii
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)

varNii(Y,cls)
varNii.ct(ct,Qv,Rv)

varNii(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

varNii(Y,fcls)
varNii.ct(ct,Qv,Rv)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)

varNii(Y,cls)
varNii.ct(ct,Qv,Rv)
Three functions: var.tctI, var.tctIII and var.tctIV.
These functions return the variances of the Tij values in the TCT in matrix form, which is of the same dimension as the TCT for types I, III and IV tests.
The argument covN must be the covariance between the Nij values which are obtained from the NNCT by row-wise vectorization.
These variances are valid under RL or conditional on Q and R under CSR.
See also (Ceyhan (2017)).
var.tctI(ct, covN)
var.tctIII(ct, covN)
var.tctIV(ct, covN)
ct |
A nearest neighbor contingency table |
covN |
The |
Each of these functions returns a matrix of the same dimension as ct, whose entries are the variances of the entries in the TCT for the corresponding type of cell-specific test.
The row and column names are inherited from ct.
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
Two functions: aij.mat and aij.nonzero.
The function aij.mat yields the matrix A = (aij(k)), where aij(k) = 1 if zj is among the k NNs of zi and 0 otherwise, due to Tango (2007).
This matrix is useful in calculation of the moments of Cuzick-Edwards Tk tests.
The function aij.nonzero keeps only the nonzero entries, i.e., the row and column entries where aij(k) = 1; in each row, for the entry (i, j), i is the row entry and j is the column entry. Rows are from 1 to n, which stands for the data point or observation, and column entries are from 1 to k, where k is the number of NNs (of each observation) considered. This representation saves storage memory, but needs to be carefully unfolded in other functions to recover the actual matrix.
See also (Tango (2007)).
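The k NN indicator matrix described above can be sketched in base R as follows, assuming Euclidean distances and that row i marks the k nearest neighbors of point i. The helper name knn.indicator is hypothetical and ties are broken arbitrarily by order(); this is an illustration only, not the implementation of aij.mat.

```r
# k-nearest-neighbor indicator matrix (illustrative sketch).
# A[i, j] = 1 if point j is among the k nearest neighbors of point i.
knn.indicator <- function(dat, k) {
  D <- as.matrix(dist(dat))   # Euclidean inter-point distances
  diag(D) <- Inf              # a point is not its own neighbor
  n <- nrow(D)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(D[i, ])[seq_len(k)]  # indices of the k closest points
    A[i, nn] <- 1
  }
  A
}

set.seed(1)
Y <- matrix(runif(30), ncol = 3)  # 10 made-up points in 3 dimensions
A <- knn.indicator(Y, k = 3)
rowSums(A)  # each row marks exactly k neighbors
```

Under this convention every row sums to k, while column sums vary because a point can be a nearest neighbor of many or of no other points.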
aij.mat(dat, k, ...)
aij.nonzero(dat, k, ...)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
k |
Integer specifying the number of NNs (of subject |
... |
are for further arguments, such as |
The function aij.mat returns the matrix for computation of moments of the Cuzick and Edwards Tk test statistic, while the function aij.nonzero returns the (locations of the) non-zero entries in the matrix.
Elvan Ceyhan
Tango T (2007). “A class of multiplicity adjusted tests for spatial clustering based on case-control point data.” Biometrics, 63, 119-127.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
k<-3 #try also 2,3

Aij<-aij.mat(Y,k)
Aij
Aij2<-aij.mat(Y,k,method="max")
range(Aij,Aij2)

apply(Aij,2,sum) #column sums of Aij

aij.nonzero(Y,k)
aij.nonzero(Y,k,method="max")
Two functions: correct.cf1 and correct.cf2.
Each function yields matrices which are used in obtaining covariance matrices of Tij values for types I and II tests from the usual Chi-Square test of contingency tables (i.e., Pielou's test) applied on NNCTs.
The output matrices are to be term-by-term multiplied with the covariance matrix of the entries of NNCT. See Sections 3.1 and 3.2 in (Ceyhan (2010)) or Sections 3.5.1 and 3.5.2 in (Ceyhan (2008)) for more details.
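Term-by-term multiplication is the element-wise (Hadamard) product, which in R is the `*` operator and not the matrix product `%*%`. The two matrices in this sketch are made-up stand-ins for a covariance matrix and a correction matrix, used only to show the difference.

```r
# Stand-in covariance matrix and correction matrix (made-up values)
S <- matrix(c(4, 1,
              1, 9), nrow = 2, byrow = TRUE)
C <- matrix(c(0.5, 2,
              2, 0.25), nrow = 2, byrow = TRUE)

had <- S * C     # term-by-term product: entry (i,j) is S[i,j] * C[i,j]
mat <- S %*% C   # ordinary matrix product, a different operation

had
mat
```

Only the first form corresponds to the term-by-term multiplication that the description calls for.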
correct.cf1(ct)
correct.cf2(ct)
ct |
A nearest neighbor contingency table |
Both functions return a correction matrix which is to be multiplied with the covariance matrix of entries of the NNCT so as to obtain types I and II overall tests from Pielou's test of segregation. See the description above for further detail.
Elvan Ceyhan
Ceyhan E (2008).
“New Tests for Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
https://arxiv.org/abs/0808.1409v3 [stat.ME].
Technical Report # KU-EC-08-6, Koç University, Istanbul, Turkey.
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)

#correction type 1
CM1<-correct.cf1(ct)
CovN.cf1<-covN*CM1

#correction type 2
CM2<-correct.cf2(ct)
CovN.cf2<-covN*CM2

covN
CovN.cf1
CovN.cf2
Two functions: EV.Tk and EV.Tkaij.
Both functions compute the expected value of the Cuzick and Edwards Tk test statistic based on the number of cases within k NNs of the cases in the data under RL or CSR independence.
The number of cases is denoted as n1 (the argument n1) for both functions and the number of controls as n0 (the argument n0) in EV.Tk, to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The function EV.Tkaij uses Toshiro Tango's moments formulas based on the matrix A = (aij(k)) (and is equivalent to the function EV.Tk, see Tango (2007)), where aij(k) = 1 if zj is among the k NNs of zi and 0 otherwise.
See also (Ceyhan (2014)).
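The equivalence of the two routes can be checked with a short base R sketch. It assumes the statistic counts, over all cases, the number of cases among each case's k NNs, so that under RL linearity of expectation gives k n1 (n1 - 1) / (n - 1) with n = n1 + n0, or equivalently n1 (n1 - 1) / (n (n - 1)) times the sum of the entries of the indicator matrix. The helper names ev.tk.closed and ev.tk.matrix and the toy matrix are hypothetical, not the package functions.

```r
# Closed-form expectation under RL (illustrative sketch)
ev.tk.closed <- function(k, n1, n0) {
  n <- n1 + n0
  k * n1 * (n1 - 1) / (n - 1)
}

# Matrix-based expectation: P(both points are cases) times sum of a_ij
ev.tk.matrix <- function(n1, a) {
  n <- nrow(a)
  n1 * (n1 - 1) / (n * (n - 1)) * sum(a)
}

# Toy indicator matrix with exactly k ones per row and a zero diagonal
n <- 6; k <- 2
a <- matrix(0, n, n)
for (i in 1:n) a[i, ((i + 0:(k - 1)) %% n) + 1] <- 1

n1 <- 3; n0 <- 3
ev.closed <- ev.tk.closed(k, n1, n0)
ev.matrix <- ev.tk.matrix(n1, a)
c(ev.closed, ev.matrix)
```

The two values agree whenever every row of the indicator matrix has exactly k ones, since the entries then sum to n k.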
EV.Tk(k, n1, n0)
EV.Tkaij(k, n1, a)
k |
Integer specifying the number of NNs (of subject |
n1, n0 |
The number of cases and controls, |
a |
The |
The expected value of Cuzick and Edwards test statistic for disease clustering
Elvan Ceyhan
Ceyhan E (2014).
“Segregation indices for disease clustering.”
Statistics in Medicine, 33(10), 1662-1684.
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tango T (2007).
“A class of multiplicity adjusted tests for spatial clustering based on case-control point data.”
Biometrics, 63, 119-127.
n1<-20
n0<-25
k<-1 #try also 3, 5, sample(1:5,1)

EV.Tk(k,n1,n0)

###
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
n1<-sum(cls==1)
n0<-sum(cls==0)
a<-aij.mat(Y,k)

EV.Tk(k,n1,n0)
EV.Tkaij(k,n1,a)
Two functions: EV.Trun and EV.Trun.alt.
Both functions compute the expected value of the Cuzick and Edwards Trun test statistic based on the number of consecutive cases from the cases in the data under RL or CSR independence.
The number of cases is denoted as n1 and the number of controls as n0 (the arguments n1 and n0) for both functions, to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
The function EV.Trun.alt uses a loop and takes slightly longer than the function EV.Trun, hence EV.Trun is used in other functions.
See also (Cuzick and Edwards (1990)).
EV.Trun(n1, n0)
EV.Trun.alt(n1, n0)
n1, n0 |
The number of cases and controls used as arguments for both functions. |
The expected value of Cuzick and Edwards test statistic for disease clustering
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n1<-20
n0<-25
EV.Trun(n1,n0)
Two functions: nnct.cr1 and nnct.cr2.
Each function yields matrices which are used in obtaining the correction term to be added to the usual Chi-Square test of contingency tables (i.e., Pielou's test) applied on NNCTs to obtain types I and II overall tests.
The output contingency tables are to be row-wise vectorized to obtain the corresponding correction vectors.
See Sections 3.1 and 3.2 in (Ceyhan (2010)) or Sections 3.5.1 and 3.5.2 in (Ceyhan (2008)) for more details.
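Row-wise vectorization of a table can be done in base R by transposing before flattening, because R stores matrices column by column. The table below is a made-up example, not an output of nnct.cr1 or nnct.cr2.

```r
# Made-up 2 x 2 table
ct <- matrix(c(6, 4,
               3, 7), nrow = 2, byrow = TRUE)

row.vec <- as.vector(t(ct))  # row-wise: first row, then second row
col.vec <- as.vector(ct)     # column-wise: R's default storage order

row.vec
col.vec
```

The two orderings coincide only in the first and last entries for a 2 x 2 table, so mixing them up silently misaligns the vector with a covariance matrix computed in the other order.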
nnct.cr1(ct)
nnct.cr2(ct)
ct |
A nearest neighbor contingency table |
Both functions return a contingency table which is to be row-wise vectorized to obtain the vectors used in the correction summands that yield types I and II overall tests from Pielou's test of segregation.
See the description above for further detail.
Elvan Ceyhan
Ceyhan E (2008).
“New Tests for Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
https://arxiv.org/abs/0808.1409v3 [stat.ME].
Technical Report # KU-EC-08-6, Koç University, Istanbul, Turkey.
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

#correction type 1
ct1<-nnct.cr1(ct)

#correction type 2
ct2<-nnct.cr2(ct)

ct
ct1
ct2
Two functions: NN.class.spec.ct and NN.class.spec.
Both functions return objects of class "classhtest" but take different arguments (see the parameter list below).
Each one performs class specific segregation tests for the columns, i.e., NN categories for k ≥ 2 classes.
That is, each one performs hypothesis tests of deviations of entries in each column of NNCT from the expected values under RL or CSR for each column.
Recall that column labels in the NNCT are NN class labels.
The test for each column i is based on the chi-squared approximation of the corresponding quadratic form and is due to Ceyhan (2009).
The argument covN must be the covariance of the column-wise vectorization of the NNCT if the logical argument byrow=FALSE; otherwise the function converts covN (which is done row-wise) to the column-wise version with the covNrow2col function.
Each function yields the test statistic, p-value and df for each base class i, a description of the alternative with the corresponding null values (i.e., expected values) for the column i, and estimates for the entries in column i. The functions also provide names of the test statistics, the method and the data set used.
The null hypothesis for each column is that the corresponding entries in column i are equal to their expected values under RL or CSR.
See also (Dixon (2002); Ceyhan (2009)) and the references therein.
NN.class.spec.ct(ct, covN, byrow = TRUE)
NN.class.spec(dat, lab, ...)
ct |
A nearest neighbor contingency table, used in |
covN |
The |
byrow |
A logical argument (default= |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list with the elements
type |
Type of the class-specific test, which is |
statistic |
The |
stat.names |
Name of the test statistics |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates of the parameters, transpose of the NNCT, i.e., transpose of the matrix of the
observed |
null.value |
Transpose of the matrix of hypothesized null values for the parameters which are expected
values of the |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2009).
“Class-Specific Tests of Segregation Based on Nearest Neighbor Contingency Tables.”
Statistica Neerlandica, 63(2), 149-182.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
base.class.spec.ct, base.class.spec, class.spec.ct and class.spec
n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covNrow<-cov.nnct(ct,varN,Qv,Rv)
covNcol<-covNrow2col(covNrow)

NN.class.spec(Y,cls)
NN.class.spec(Y,cls,method="max")

NN.class.spec.ct(ct,covNrow)
NN.class.spec.ct(ct,covNcol,byrow = FALSE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)

NN.class.spec(Y,fcls)
NN.class.spec.ct(ct,covNrow)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covNrow<-cov.nnct(ct,varN,Qv,Rv)
covNcol<-covNrow2col(covNrow)

NN.class.spec(Y,cls)

NN.class.spec.ct(ct,covNrow)
NN.class.spec.ct(ct,covNcol,byrow = FALSE)
Two functions: lab.onevsrest and classirest.
Both functions relabel the points, keeping the label of class i as is and relabeling the other classes as "rest".
Used in the one-vs-rest type comparisons after the overall segregation test is found to be significant.
See also (Ceyhan (2017)).
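The relabeling idea can be sketched in a few lines of base R. The helper name one.vs.rest is hypothetical, and the sketch returns a character vector; the return type of lab.onevsrest and classirest may differ.

```r
# Keep the label of class i and relabel everything else as "rest"
# (illustrative sketch, returns a character vector)
one.vs.rest <- function(i, lab) {
  lab <- as.character(lab)
  ifelse(lab == as.character(i), lab, "rest")
}

fcls <- c("a", "b", "c", "b")      # made-up labels
relab <- one.vs.rest("b", fcls)
relab

num <- one.vs.rest(1, c(1, 2, 1, 3))  # numeric labels are coerced to character
num
```

Coercing to character first keeps the comparison consistent for numeric, character and factor labels alike.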
lab.onevsrest(i, lab)
classirest(i, lab)
i |
label of the class that is to be retained in the post-hoc comparison. |
lab |
The |
Both functions return the data relabeled so that the label of class i is retained and the remaining labels are replaced by "rest".
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
lab.onevsrest(1,cls)
classirest(2,cls)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
lab.onevsrest("a",fcls)
lab.onevsrest("b",fcls)
classirest("b",fcls)

#cls as a factor
fcls<-rep(letters[1:4],rep(10,4))
lab.onevsrest("b",fcls)
classirest("b",fcls)
Two functions: Pseg.ss.ct and Pseg.ss.
Both functions return objects of class "Chisqtest" but take different arguments (see the parameter list below).
Each one performs a hypothesis test of the deviations of the cell counts from their expected values under independence for all cells (i.e., entries) combined in the NNCT.
That is, each test is Pielou's overall test of segregation based on the NNCT for $k \ge 2$ classes.
This overall test is based on the chi-squared approximation, is equivalent to Pearson's chi-squared test on the NNCT, and is due to Pielou (1961).
Each test is appropriate (i.e., has the appropriate asymptotic sampling distribution) when the data are obtained by sparse sampling.
Each function yields the test statistic, the $p$-value and the df, which is $(k-1)^2$, a description of the
alternative with the corresponding null values (i.e., expected values) of the NNCT entries, and the
sample estimates (i.e., observed values) of the entries in the NNCT.
The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis is that $E(N_{ij}) = n_i c_j / n$ for all entries in the NNCT, where $n_i$ is the sum of row $i$
(i.e., the size of class $i$) and $c_j$ is the sum of column $j$ in the $k \times k$ NNCT for $k \ge 2$.
In the output, the test statistic and the $p$-value are valid only for (properly) sparsely sampled data.
See also (Pielou (1961); Ceyhan (2010)) and the references therein.
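As a rough illustration of the Pearson-type computation described above, the following base R sketch computes the expected counts under independence (row total times column total over n), the chi-squared statistic without continuity correction, and the df for a made-up 2 x 2 NNCT. The table values are hypothetical and the code does not call nnspat; it only cross-checks against stats::chisq.test.

```r
# Hypothetical 2 x 2 NNCT: rows are base classes, columns are NN classes
ct <- matrix(c(30, 10,
               12, 28), nrow = 2, byrow = TRUE)
n <- sum(ct)
k <- nrow(ct)

# Expected counts under independence: (row sum) * (column sum) / n
expected <- outer(rowSums(ct), colSums(ct)) / n

# Pearson chi-squared statistic (no continuity correction) and its df
stat <- sum((ct - expected)^2 / expected)
df <- (k - 1)^2
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Cross-check with base R's Pearson test on the same table
ref <- chisq.test(ct, correct = FALSE)
c(stat = stat, df = df, p_value = p_value)
```

The p-value from such a table is only meaningful under the sparse sampling assumption stated above.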
Pseg.ss.ct(ct, yates = TRUE, sim = FALSE, Nsim = 2000)
Pseg.ss(dat, lab, yates = TRUE, sim = FALSE, Nsim = 2000, ...)
ct |
A nearest neighbor contingency table, used in |
yates |
A logical parameter (default= |
sim |
A logical parameter (default= |
Nsim |
A positive integer specifying the number of replicates used in the Monte Carlo test.
Equivalent to the |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The overall chi-squared statistic |
stat.names |
Name of the test statistic |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is (k-1)^2 for this function.
Yields |
estimate |
Estimates of the parameters, NNCT, i.e., matrix of the observed |
est.name, est.name2 |
Names of the estimates, they are identical for this function. |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of the
the |
null.name |
Name of the null values |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
overall.nnct.ct, overall.nnct, overall.seg.ct, overall.seg and chisq.test
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
Pseg.ss(Y,cls)
Pseg.ss.ct(ct)
Pseg.ss.ct(ct,yates=FALSE)
Pseg.ss.ct(ct,yates=FALSE,sim=TRUE)
Pseg.ss.ct(ct,yates=FALSE,sim=TRUE,Nsim=10000)
Pseg.ss(Y,cls,method="max")
Pseg.ss(Y,cls,yates=FALSE,sim=TRUE,Nsim=10000,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Pseg.ss(Y,fcls)
Pseg.ss.ct(ct)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
Pseg.ss(Y,cls)
Pseg.ss.ct(ct,yates=FALSE)
Pseg.ss(Y,cls, sim = TRUE, Nsim = 2000)
Pseg.ss.ct(ct,yates=FALSE)
Four functions: Qval, Qvec, sharedNN and Rval.
Qval returns the $Q$ value, the number of points with shared nearest neighbors (NNs), which occurs when two or
more points share a NN, for data in any dimension.
Qvec returns the $Q$ value and also yields the Qv vector $Qv = (Q_0, Q_1, \ldots)$ for data in any
dimension, where $Q_j$ is the number of points shared as a NN by $j$ other points.
sharedNN returns the vector of the numbers of points with shared NNs, $Q = (Q_0, Q_1, \ldots)$, where $Q_i$ is
the number of points that are NN to $i$ points, and if a point is a NN of $i$ points, then there are
$i(i-1)$ points that share a NN. So $Q = \sum_{i>1} i(i-1) Q_i$.
Rval returns the number of reflexive NNs, $R$ (i.e., twice the number of reflexive NN pairs).
These quantities are used, e.g., in computing the variances and covariances of the entries of the
nearest neighbor contingency tables used for Dixon's tests and other NNCT tests.
The input must be the incidence matrix, $W$, of the NN digraph.
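The quantities above can be illustrated with a small base R sketch that builds an NN incidence matrix by hand instead of calling Wmat. It assumes the convention that W[i, j] = 1 when point j is the NN of point i (an assumption about the orientation of the digraph) and that there are no ties; all variable names are illustrative.

```r
set.seed(1)
n <- 10
X <- matrix(runif(2 * n), ncol = 2)

# Interpoint distances; a point is never its own NN
D <- as.matrix(dist(X))
diag(D) <- Inf
nn <- apply(D, 1, which.min)

# Incidence matrix of the NN digraph (assumed orientation: row i -> its NN)
W <- matrix(0, n, n)
W[cbind(seq_len(n), nn)] <- 1

# indeg[j] = number of points that have point j as their NN
indeg <- colSums(W)

# Qv[j + 1] = number of points serving as NN to exactly j points
j_vals <- 0:max(indeg)
Qv <- tabulate(indeg + 1, nbins = length(j_vals))

# Q = sum over j of j * (j - 1) * Q_j
Q <- sum(j_vals * (j_vals - 1) * Qv)

# R = number of reflexive NNs: i -> j and j -> i, counted once per point
R <- sum(W * t(W))
list(Q = Q, Qv = Qv, R = R)
```

Because each reflexive pair contributes two points, R is always even under this construction.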
Qval(W)
Qvec(W)
sharedNN(W)
Rval(W)
W |
The incidence matrix, |
Qval returns the $Q$ value.
Qvec returns a list with two elements
q |
the |
qvec |
the |
sharedNN returns a matrix with 2 rows, where the first row is the $j$ values and the second row is
the corresponding vector of $Q_j$ values.
Rval returns the $R$ value, the number of reflexive NNs.
See the description above for the details of these quantities.
Elvan Ceyhan
Tval, QRval, sharedNNmc and Ninv
#Examples for Qval
#3D data points
n<-10
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qval(W)
#1D data points
X<-as.matrix(runif(10))  # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(10) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
Qval(W)
#with ties=TRUE in the data
Y<-matrix(round(runif(15)*10),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd,ties=TRUE)
Qval(W)
#Examples for Qvec
#3D data points
n<-10
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qvec(W)
#2D data points
n<-15
Y<-matrix(runif(2*n),ncol=2)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qvec(W)
#1D data points
X<-as.matrix(runif(15))  # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
Qvec(W)
#with ties=TRUE in the data
Y<-matrix(round(runif(15)*10),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd,ties=TRUE)
Qvec(W)
#Examples for sharedNN
#3D data points
n<-10
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
#1D data points
X<-as.matrix(runif(15))  # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
#2D data points
n<-15
Y<-matrix(runif(2*n),ncol=2)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
#with ties=TRUE in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd,ties=TRUE)
sharedNN(W)
#Examples for Rval
#3D data points
n<-10
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Rval(W)
#1D data points
X<-as.matrix(runif(15))  # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
Rval(W)
#with ties=TRUE in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd,ties=TRUE)
Rval(W)
Two functions: row.sum and col.sum.
row.sum returns the row sums of a given matrix (in particular a contingency table) as a vector and
col.sum returns the column sums of a given matrix as a vector. row.sum is equivalent to the
rowSums function and col.sum is equivalent to the colSums function in the base package.
row.sum(ct)
col.sum(ct)
ct |
A matrix, in particular a contingency table |
row.sum returns the row sums of ct as a vector.
col.sum returns the column sums of ct as a vector.
Elvan Ceyhan
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
row.sum(ct)
rowSums(ct)
col.sum(ct)
colSums(ct)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
row.sum(ct)
rowSums(ct)
col.sum(ct)
colSums(ct)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
row.sum(ct)
rowSums(ct)
col.sum(ct)
colSums(ct)
Two functions: varTk and varTkaij.
Both functions compute the (finite sample) variance of Cuzick and Edwards $T_k$ test statistic based on the
number of cases within k NNs of the cases in the data under RL or CSR independence.
The common arguments for both functions are n1, representing the number of cases, and k.
The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function
to match the case-control class labeling,
which is just the reverse of the labeling in Cuzick and Edwards (1990).
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE
or just the matrix of nonzero locations in the $A$ matrix (if TRUE) for computing $N_s$ and $N_t$,
which are required in the computation of the variance.
$N_s$ and $N_t$ are defined on page 78 of (Cuzick and Edwards (1990)) as follows.
$N_s = \sum_i \sum_j a_{ij} a_{ji}$ (i.e., the number of ordered pairs for which the k NN relation is symmetric)
and $N_t = \sum \sum_{i \ne l} \sum a_{ij} a_{lj}$ (i.e., the number of triplets $(i,j,l)$ with $i$, $j$, and $l$
distinct so that $j$ is among the k NNs of $i$ and $j$ is among the k NNs of $l$).
The function varTkaij uses Toshiro Tango's moments formulas based on the $A = (a_{ij})$ matrix
(and is equivalent to the function varTk, see Tango (2007)),
where $a_{ij} = 1$ if $z_j$ is among the k NNs of $z_i$ and 0 otherwise.
The function varTkaij is equivalent to varTk (with $var extension).
See (Cuzick and Edwards (1990); Tango (2007)).
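To make the counts of symmetric pairs and of triplets concrete, here is a minimal base R sketch. It builds a k NN indicator matrix by hand (rather than with aij.mat), assuming A[i, j] = 1 when point j is among the k NNs of point i and no distance ties; the names A, Ns and Nt are illustrative.

```r
set.seed(42)
n <- 12
k <- 2
X <- matrix(runif(2 * n), ncol = 2)

# Interpoint distances; exclude each point from its own NN list
D <- as.matrix(dist(X))
diag(D) <- Inf

# A[i, j] = 1 if j is among the k NNs of i, and 0 otherwise
A <- matrix(0, n, n)
for (i in seq_len(n)) A[i, order(D[i, ])[seq_len(k)]] <- 1

# Ns: ordered pairs (i, j) where the k NN relation holds in both directions
Ns <- sum(A * t(A))

# Nt: triplets (i, j, l), i != l, with j among the k NNs of both i and l.
# For a 0/1 matrix with zero diagonal this is sum_j d_j * (d_j - 1),
# where d_j is the number of points that have j among their k NNs.
indeg <- colSums(A)
Nt <- sum(indeg * (indeg - 1))
c(Ns = Ns, Nt = Nt)
```

The column-sum shortcut avoids the triple loop implied by the definition, which matters once n grows.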
varTk(dat, n1, k, nonzero.mat = TRUE, ...)
varTkaij(n1, k, a)
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in |
n1 |
Number of cases |
k |
Integer specifying the number of NNs (of subject |
nonzero.mat |
A logical argument (default is |
... |
are for further arguments, such as |
a |
The |
The function varTk returns a list with the elements
var.Tk |
The (finite sample) variance of Cuzick and Edwards |
Ns |
The |
Nt |
The |
The function varTkaij returns only var.Tk as above.
Elvan Ceyhan
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tango T (2007).
“A class of multiplicity adjusted tests for spatial clustering based on case-control point data.”
Biometrics, 63, 119-127.
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)  #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
k<-2  #try also 2,3
a<-aij.mat(Y,k)
varTk(Y,n1,k)
varTk(Y,n1,k,nonzero.mat=FALSE)
varTk(Y,n1,k,method="max")
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)  #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
k<-1  #try also 2,3, sample(1:5,1)
a<-aij.mat(Y,k)
varTkaij(n1,k,a)
varTk(Y,n1,k)$var
Two functions: varTrun and varTrun.sim.
The function varTrun computes the (finite sample) variance of Cuzick and Edwards $T_{run}$ test statistic,
which is based on the number of consecutive cases from the cases in the data under RL or CSR independence,
and the function varTrun.sim estimates this variance based on simulations under the RL hypothesis.
The only common argument for both functions is dat, the data set used in the functions.
$n_1$ is an argument for varTrun and is the number of cases (denoted as n1 as an argument).
The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function
to match the case-control class labeling,
which is just the reverse of the labeling in Cuzick and Edwards (1990).
The argument cc.lab is the case-control label, 1 for case and 0 for control. If the argument case.lab is NULL,
then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's
and 1's accordingly. The argument Nsim represents the number of resamplings (without replacement) in the
RL scheme, with default being 1000. cc.lab, case.lab and Nsim are arguments for varTrun.sim only.
The function varTrun might take a very long time when the data size is large (even larger than 50),
hence the need for the varTrun.sim function.
See (Cuzick and Edwards (1990)).
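The simulation idea behind varTrun.sim can be sketched in base R as follows. To keep the example short it uses a simple stand-in statistic (the number of cases whose nearest neighbor is also a case) rather than the run-based statistic itself, so it illustrates the RL resampling scheme only; the function name stat_T1 and all variable names are hypothetical.

```r
set.seed(123)
n <- 30
X <- matrix(runif(2 * n), ncol = 2)
cc <- sample(0:1, n, replace = TRUE)   # 1 = case, 0 = control

# Nearest neighbor of each point (points are fixed under RL)
D <- as.matrix(dist(X))
diag(D) <- Inf
nn <- apply(D, 1, which.min)

# Stand-in statistic: number of cases whose NN is also a case
stat_T1 <- function(lab) sum(lab == 1 & lab[nn] == 1)

# RL scheme: permute the labels (resampling without replacement),
# recompute the statistic, and take the sample variance
Nsim <- 1000
sims <- replicate(Nsim, stat_T1(sample(cc)))
var_hat <- var(sims)
var_hat
```

Each permutation keeps the numbers of cases and controls fixed, which is what distinguishes RL resampling from drawing labels independently.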
varTrun(dat, n1, ...)
varTrun.sim(dat, cc.lab, Nsim = 1000, case.lab = NULL)
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in both functions. |
n1 |
Number of cases, used in |
... |
are for further arguments, such as |
cc.lab |
Case-control labels, 1 for case, 0 for control, used in |
Nsim |
The number of simulations, i.e., the number of resamplings under the RL scheme to estimate the
variance of |
case.lab |
The label used for cases in the |
The function varTrun returns the variance of Cuzick and Edwards $T_{run}$ test statistic
under RL or CSR independence,
and the function varTrun.sim estimates the same variance based on simulations under the RL hypothesis.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20  #or try sample(1:20,1) #try also 40, 50, 60
set.seed(123)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
n1<-sum(cls==1)
n0<-sum(cls==0)
c(n1,n0)
varTrun(Y,n1)
varTrun(Y,n1,method="max")
n<-15  #or try sample(1:20,1) #try also 40, 50, 60
set.seed(123)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
n1<-sum(cls==1)
varTrun(Y,n1)  #the actual value (might take a long time if n is large)
Nmc<-1000
varTrun.sim(Y,cls,Nsim=Nmc)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
varTrun.sim(Y,fcls,Nsim=Nmc,case.lab="a")
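Three functions: W3val, W4val and W5val, each of which is needed to compute $E[T^3]$
(i.e., for the skewness of $T$),
where $T = T(\theta)$, which is defined in Equation (2) of Tango (2007) as follows:
Let $(z_1, \ldots, z_n)$, $n = n_0 + n_1$, denote the locations of the points in the combined sample
when the indices have been randomly permuted so that the $z_i$ contain no information about group membership.
$T(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i \delta_j a_{ij}(\theta) = \boldsymbol\delta^t \boldsymbol A(\theta) \boldsymbol\delta$,
where $\delta_i = 1$ if $z_i$ is a case, and 0 if $z_i$ is a control,
$\boldsymbol A(\theta) = (a_{ij}(\theta))$ could be any matrix of a measure of
the closeness between two points $i$ and $j$ with $a_{ii} = 0$ for all $i = 1, \ldots, n$, and
$\boldsymbol\theta = (\theta_1, \ldots, \theta_p)^t$ denotes the unknown parameter vector related to cluster size and
$\boldsymbol\delta = (\delta_1, \ldots, \delta_n)^t$.
Here the number of cases is denoted as $n_1$ and the number of controls as $n_0$ to match the case-control class
labeling, which is just the reverse of the labeling in Tango (2007).
If $\theta = k$ in the nearest neighbors model with $a_{ij}(k) = 1$ if $z_j$ is among the $k$ NNs of $z_i$ and 0
otherwise, then the test statistic $T(\theta) = T_k$ is the Cuzick and Edwards $k$ NN test statistic,
Cuzick and Edwards (1990), see also ceTk.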
$W_k$ values are used for Tango's correction to Cuzick and Edwards $k$ NN test statistic, $T_k$,
and $W_k$ here corresponds to $W_{k-1}$ in Tango (2007)
(defined for consistency with $p_k$'s and $\alpha_r$ having $r$ distinct elements).
The argument of the function is the $A_{ij}$ matrix, a, which is the output of the function aij.mat.
However, inside the function we symmetrize the matrix a as b <- (a+a^t)/2, to facilitate the formulation.
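The quadratic form and the symmetrization step can be illustrated with a short base R sketch. It builds a k NN indicator matrix by hand (assuming A[i, j] = 1 when point j is among the k NNs of point i), evaluates the statistic as a quadratic form in the case indicators, and checks that replacing A by (A + A^t)/2 leaves the value unchanged. All names are illustrative and the code does not call the package.

```r
set.seed(7)
n <- 15
k <- 3
X <- matrix(runif(2 * n), ncol = 2)
delta <- sample(0:1, n, replace = TRUE)   # 1 = case, 0 = control

# k NN indicator matrix with zero diagonal
D <- as.matrix(dist(X))
diag(D) <- Inf
A <- matrix(0, n, n)
for (i in seq_len(n)) A[i, order(D[i, ])[seq_len(k)]] <- 1

# Statistic as a quadratic form: delta^t A delta
T_quad <- as.numeric(t(delta) %*% A %*% delta)

# Same quantity by direct counting: for each case, count cases among its k NNs
T_count <- sum(vapply(which(delta == 1),
                      function(i) sum(delta[A[i, ] == 1]),
                      numeric(1)))

# Symmetrizing A does not change the quadratic form
B <- (A + t(A)) / 2
T_sym <- as.numeric(t(delta) %*% B %*% delta)
c(T_quad = T_quad, T_count = T_count, T_sym = T_sym)
```

This invariance is why working with the symmetrized matrix is harmless for moment calculations of the statistic.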
W3val(a)
W4val(a)
W5val(a)
a |
|
Each function Wkval returns the $W_k$ value for $k = 3, 4, 5$.
Elvan Ceyhan
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tango T (2007).
“A class of multiplicity adjusted tests for spatial clustering based on case-control point data.”
Biometrics, 63, 119-127.
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
k<-sample(1:5,1)  # try also 3, 5, sample(1:5,1)
k
a<-aij.mat(Y,k)
W3val(a)
W4val(a)
W5val(a)
a<-aij.mat(Y,k,method="max")
W3val(a)
W4val(a)
W5val(a)
Two functions: Xsq.nnref.ct and Xsq.nnref.
Both functions return objects of class "Chisqtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the
diagonal cell counts (i.e., entries) under RL or CSR in the RCT for $k \ge 2$ classes.
That is, each test performs an overall NN reflexivity test (for the vector of entries $(1,1)$ and $(2,2)$,
respectively, in the RCT), which is
appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data.
(See Ceyhan and Bahadir (2017) for more detail).
Each reflexivity test is based on the chi-squared approximation of the corresponding quadratic form for the vector of diagonal entries in the RCT and is due to Ceyhan and Bahadir (2017).
Each function yields the test statistic, the $p$-value and the df, which is 2, a description of the
alternative with the corresponding null values (i.e., expected values) of the diagonal entries,
and also the sample estimates (i.e., observed values) of the diagonal entries of the RCT (as a vector).
The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis is that $E(N_{11}, N_{22}) = (R P_{aa}, (n-R) P_{ab})$ in the RCT, where $R$ is the number of reflexive
NNs, $P_{aa}$ is the probability that any two points selected are from the same class,
and $P_{ab}$ is the probability that any two points selected are from two different classes.
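As a small worked illustration of the null values, the sketch below computes the same-class and different-class probabilities from class sizes and combines them with a number of reflexive NNs. It assumes that the two points are drawn without replacement from the pooled sample (so the probabilities depend only on the class sizes), and both the class sizes and the value of R are made up for the example; the variable names are illustrative.

```r
nvec <- c(12, 8)     # hypothetical class sizes
n <- sum(nvec)
R <- 10              # hypothetical number of reflexive NNs

# Probability that two distinct randomly selected points share a class
P_aa <- sum(nvec * (nvec - 1)) / (n * (n - 1))

# Probability that two distinct randomly selected points differ in class
P_ab <- (sum(outer(nvec, nvec)) - sum(nvec^2)) / (n * (n - 1))

# Null values for the two diagonal entries of the RCT
expected <- c(self_reflexive = R * P_aa,
              mixed_nonreflexive = (n - R) * P_ab)
expected
```

With these sizes P_aa = 188/380 and P_ab = 192/380, so the two probabilities sum to one as they should.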
Xsq.nnref.ct(rfct, nvec, Qv, Tv)
Xsq.nnref(dat, lab, ...)
rfct |
An RCT, used in |
nvec |
The |
Qv |
The number of shared NNs, used in |
Tv |
|
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for overall NN reflexivity test |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is 2 for this function. |
estimate |
Estimates of the parameters, i.e., the observed diagonal entries |
est.name, est.name2 |
Names of the estimates, they are identical for this function. |
null.value |
Hypothesized null values for the diagonal entries |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E, Bahadir S (2017). “Nearest Neighbor Methods for Testing Reflexivity.” Environmental and Ecological Statistics, 24(1), 69-108.
Znnref.ct, Znnref, Zself.ref.ct, Zself.ref, Zmixed.nonref.ct and Zmixed.nonref
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
R<-Rval(W)
Tv<-Tval(W,R)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)
Xsq.nnref(Y,cls)
Xsq.nnref.ct(rfct,nvec,Qv,Tv)
Xsq.nnref(Y,cls,method="max")
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
R<-Rval(W)
Tv<-Tval(W,R)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)
Xsq.nnref(Y,cls)
Xsq.nnref.ct(rfct,nvec,Qv,Tv)
Two functions: Xsq.nnsym.dx.ct and Xsq.nnsym.dx.
Both functions return objects of class "Chisqtest" but take different arguments (see the parameter list below).
Each one performs the hypothesis test of equality of the expected values of the off-diagonal
cell counts (i.e., entries) under RL or CSR in the NNCT for $k \ge 2$ classes.
That is, each performs Dixon's overall NN symmetry test.
The test is appropriate (i.e., has the appropriate asymptotic sampling distribution)
for completely mapped data.
(See Ceyhan (2014) for more detail).
Each symmetry test is based on the chi-squared approximation of the corresponding quadratic form and is an extension of Dixon's NN symmetry test, as extended by Ceyhan (2014).
Each function yields the test statistic, the $p$-value and the df, which is $k(k-1)/2$, a description of the
alternative with the corresponding null values (i.e., expected values) of the differences of the off-diagonal entries (which are
0 for this function), and also the sample estimates (i.e., observed values) of the absolute differences of the off-diagonal entries of the
NNCT (in the upper-triangular form).
The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis is that $E(N_{ij}) = E(N_{ji})$ for all $i \ne j$
(i.e., symmetry in the mixed NN structure).
See also (Ceyhan (2014)) and the references therein.
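The quadratic-form statistic behind an overall symmetry test of this kind can be sketched in base R. Both the NNCT and the covariance matrix below are made-up illustrative values (in practice the covariance would come from cov.nnsym, as in the examples), the ordering of the differences is assumed to match the covariance matrix, and solve() is used for the inverse, which assumes a nonsingular covariance; the package may handle these details differently.

```r
# Hypothetical 3 x 3 NNCT
ct <- matrix(c(10, 4, 3,
                6, 9, 2,
                5, 1, 8), nrow = 3, byrow = TRUE)
k <- nrow(ct)

# Differences of off-diagonal entries, N_ij - N_ji for i < j
# (upper.tri order: (1,2), (1,3), (2,3))
d <- ct[upper.tri(ct)] - t(ct)[upper.tri(ct)]

# Hypothetical covariance matrix of the differences (illustrative only)
covS <- diag(c(5, 4, 3))

# Quadratic form d^t covS^{-1} d, referred to a chi-squared distribution
stat <- as.numeric(t(d) %*% solve(covS) %*% d)
df <- k * (k - 1) / 2
p_value <- pchisq(stat, df = df, lower.tail = FALSE)
c(stat = stat, df = df, p_value = p_value)
```

With a diagonal covariance the statistic reduces to a sum of squared standardized differences, which makes the example easy to verify by hand.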
Xsq.nnsym.dx.ct(ct, covS)
Xsq.nnsym.dx(dat, lab, ...)
ct |
A nearest neighbor contingency table, used in |
covS |
The |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for Dixon's overall NN symmetry test |
stat.names |
Name of the test statistic |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates, i.e., absolute differences of the off-diagonal entries of NNCT (in the upper-triangular form). |
est.name, est.name2 |
Names of the estimates, former is a shorter description of the estimates than the latter. |
null.value |
Hypothesized null values for the differences between the expected values of the off-diagonal entries, which is 0 for this function. |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
Znnsym.dx.ct, Znnsym.dx, Znnsym, Xsq.nnsym, Xsq.nnsym.ss.ct, Xsq.nnsym.ss and Qsym.test
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)  #default is byrow
covS<-cov.nnsym(covN)
Xsq.nnsym.dx(Y,cls)
Xsq.nnsym.dx.ct(ct,covS)
Xsq.nnsym.dx(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Xsq.nnsym.dx(Y,fcls)
Xsq.nnsym.dx.ct(ct,covS)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
covS<-cov.nnsym(covN)
Xsq.nnsym.dx(Y,cls)
Xsq.nnsym.dx.ct(ct,covS)
Two functions: Xsq.nnsym.ss.ct and Xsq.nnsym.ss.
Both functions return objects of class "Chisqtest" but take different arguments (see the parameter list below).
Each one performs the hypothesis test of equality of the expected values of the off-diagonal
cell counts (i.e., entries) under RL or CSR in the NNCT for $k \ge 2$ classes.
That is, each performs Pielou's first type of NN symmetry test, which is also equivalent to McNemar's
test on the NNCT. The test is appropriate (i.e., has the appropriate asymptotic sampling distribution)
provided that the data are obtained by sparse sampling.
(See Ceyhan (2014) for more detail).
Each symmetry test is based on the chi-squared approximation of the corresponding quadratic form and is due to Pielou (1961).
The argument cont.corr is a logical argument (default=TRUE) for continuity correction to this test.
If TRUE, the continuity correction to McNemar's test is implemented,
and if FALSE, such a correction is not implemented.
Each function yields the test statistic, the $p$-value and the df, which is $k(k-1)/2$, a description of the
alternative with the corresponding null values (i.e., expected values) of the differences of the off-diagonal entries (which are
0 for this function), and also the sample estimates (i.e., observed values) of the absolute differences of the
off-diagonal entries of the NNCT (in the upper-triangular form).
The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis is that $E(N_{ij}) = E(N_{ji})$ for all entries with $i \ne j$
(i.e., symmetry in the mixed NN structure).
In the output, the test statistic, the $p$-value and the df are valid only for (properly) sparsely sampled data.
See also (Pielou (1961); Ceyhan (2014)) and the references therein.
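For the two-class case, the McNemar connection mentioned above can be checked in base R with stats::mcnemar.test. The 2 x 2 table below is a made-up NNCT rather than package output, and the sketch only shows the effect of the continuity correction on the statistic.

```r
# Hypothetical 2 x 2 NNCT: rows are base classes, columns are NN classes
ct <- matrix(c(20,  6,
               15, 19), nrow = 2, byrow = TRUE)
n12 <- ct[1, 2]
n21 <- ct[2, 1]

# McNemar statistic with and without continuity correction
stat_cc <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
stat_nocc <- (n12 - n21)^2 / (n12 + n21)

# Base R equivalents
ref_cc <- mcnemar.test(ct, correct = TRUE)
ref_nocc <- mcnemar.test(ct, correct = FALSE)

c(with_correction = stat_cc, without_correction = stat_nocc)
```

Only the off-diagonal counts enter the statistic, which matches the description of the test as a comparison of mixed NN pairs.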
Xsq.nnsym.ss.ct(ct, cont.corr = TRUE)
Xsq.nnsym.ss(dat, lab, cont.corr = TRUE, ...)
ct |
A nearest neighbor contingency table, used in |
cont.corr |
A logical argument (default= |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for Pielou's first type of NN symmetry test |
stat.names |
Name of the test statistic |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates, i.e., absolute differences of the off-diagonal entries of NNCT (in the upper-triangular form). |
est.name, est.name2 |
Names of the estimates, former is a shorter description of the estimates than the latter. |
null.value |
Hypothesized null values for the differences between the expected values of the off-diagonal entries, which is 0 for this function. |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
Znnsym2cl.ss.ct, Znnsym2cl.ss, Znnsym.ss.ct, Znnsym.ss, Xsq.nnsym.dx.ct, Xsq.nnsym.dx and Qsym.test
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
Xsq.nnsym.ss(Y,cls)
Xsq.nnsym.ss.ct(ct)
Xsq.nnsym.ss(Y,cls,method="max")
Xsq.nnsym.ss(Y,cls,cont.corr=FALSE)
Xsq.nnsym.ss.ct(ct,cont.corr=FALSE)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Xsq.nnsym.ss(Y,fcls)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
Xsq.nnsym.ss(Y,cls)
Xsq.nnsym.ss.ct(ct)
Xsq.nnsym.ss.ct(ct,cont.corr = FALSE)
Two functions: Xsq.seg.coeff.ct and Xsq.seg.coeff.
Each one performs hypothesis tests of (simultaneous) equality of the segregation coefficients in an NNCT to the ones under RL or CSR. That is, each performs the combined chi-square test for segregation coefficients, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data. (See Ceyhan (2014) for more detail).
Each test is based on the chi-square approximation of the corresponding quadratic form for the segregation coefficients in an NNCT. The segregation coefficients in the multi-class case are the extension of Pielou's segregation coefficient for the two-class case.
Each function yields the test statistic, the $p$-value and the df, which is $k(k+1)/2-1$, a description of the
alternative with the corresponding null values (i.e., expected values) of the segregation coefficients in the NNCT
(which are 0 for this function), and also the sample estimates (i.e., observed values) of the segregation
coefficients. The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis for all cells is that the corresponding segregation coefficients are all
equal to the expected value (which is 0) under RL or CSR.
Xsq.seg.coeff.ct(ct, covSC)
Xsq.seg.coeff(dat, lab, ...)
ct |
A nearest neighbor contingency table, used in |
covSC |
The covariance matrix for the segregation coefficients in the NNCT, used in |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for the combined segregation coefficients |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
The |
est.name , est.name2
|
Names of the estimates, they are identical for this function. |
null.value |
The null value of the parameters, i.e., expected values of segregation coefficients in the NNCT under RL or CSR (which is 0). |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
seg.coeff, Zseg.coeff.ct and Zseg.coeff
n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
covSC<-cov.seg.coeff(ct,covN)
Xsq.seg.coeff(Y,cls)
Xsq.seg.coeff.ct(ct,covSC)
Xsq.seg.coeff(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Xsq.seg.coeff.ct(ct,covSC)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
covSC<-cov.seg.coeff(ct,covN)
Xsq.seg.coeff(Y,cls)
Xsq.seg.coeff.ct(ct,covSC)
Two functions: Xsq.spec.cor.ct and Xsq.spec.cor.
Each one performs a hypothesis test of the (simultaneous) equality of the self entries (i.e., first column) in a species correspondence contingency table (SCCT), or of the diagonal entries in an NNCT, to their expected values under RL or CSR. That is, each performs the overall species correspondence test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan (2018) for more detail).
Each test is based on the chi-square approximation of the corresponding quadratic form for the first column in an SCCT or the diagonal entries in an NNCT, and is due to Ceyhan (2018).
Each function yields the test statistic, the p-value and the df, a description of the alternative with the corresponding null values (i.e., expected values) of the self entries (i.e., first column) in the SCCT or the diagonal entries in the NNCT, and the sample estimates (i.e., observed values) of these entries. The functions also provide the names of the test statistics, the method and the data set used.
The null hypothesis is that all $E[N_{ii}] = n_i(n_i-1)/(n-1)$, where $N_{ii}$ is the self entry for class $i$, $n_i$ is the size of class $i$ and $n$ is the data size.
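The chi-square approximation of a quadratic form can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the observed self entries, the class sizes and the covariance matrix `Sigma` are hypothetical numbers, and the covariance matrix is assumed to be invertible (so that the df equals the number of entries).

```r
# Hypothetical inputs (not produced by the package)
obs   <- c(9, 6)                                   # observed self entries N_ii
sizes <- c(12, 8)                                  # class sizes n_i
n     <- sum(sizes)
Sigma <- matrix(c(2.0, 0.5, 0.5, 1.5), nrow = 2)   # assumed invertible covariance

# Expected self entries under RL or CSR: n_i (n_i - 1) / (n - 1)
expd <- sizes * (sizes - 1) / (n - 1)

# Quadratic form d' Sigma^{-1} d with d = observed - expected
d    <- obs - expd
xsq  <- drop(t(d) %*% solve(Sigma) %*% d)

# Upper-tail chi-square p-value; df = length(d) under the invertibility assumption
pval <- pchisq(xsq, df = length(d), lower.tail = FALSE)
c(statistic = xsq, df = length(d), p.value = pval)
```

In practice the covariance matrix would come from the package (for example via covNii.ct, as in the examples of this section) rather than being typed in by hand.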
Xsq.spec.cor.ct(ct, covSC, nnct = FALSE)
Xsq.spec.cor(dat, lab, ...)
ct |
The NNCT or SCCT, used in |
covSC |
The covariance matrix for the self entries (i.e. first column) in the SCCT
or the diagonal entries in the NNCT, used in |
nnct |
A logical parameter (default= |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for overall species correspondence test |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
The |
est.name , est.name2
|
Names of the estimates, they are identical for this function. |
null.value |
The |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
Zself.ref.ct, Zself.ref, Xsq.nnref.ct and Xsq.nnref
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-scct(ipd,cls)
ct
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)
cv<-covNii.ct(ct,vsq,Qv,Rv)
Xsq.spec.cor.ct(ct,cv)
Xsq.spec.cor(Y,cls)
Xsq.spec.cor(Y,cls,method="max")
ct<-nnct(ipd,cls)
Xsq.spec.cor.ct(ct,cv,nnct = TRUE)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-scct(ipd,fcls)
Xsq.spec.cor.ct(ct,cv)
Xsq.spec.cor(Y,fcls)
ct<-nnct(ipd,fcls)
Xsq.spec.cor.ct(ct,cv,nnct=TRUE)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-scct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)
cv<-covNii.ct(ct,vsq,Qv,Rv)
Xsq.spec.cor.ct(ct,cv)
ct<-nnct(ipd,cls)
Xsq.spec.cor.ct(ct,cv,nnct = TRUE)
Xsq.spec.cor(Y,cls)
Two functions: Zcell.nnct.ct and Zcell.nnct.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below). Each one performs hypothesis tests of deviations of cell counts from the expected values under RL or CSR for each cell (i.e., entry) in the NNCT. The test for each cell is based on the normal approximation of the corresponding cell count, and is due to Dixon (1994, 2002).
Each function yields a contingency table of the test statistics, p-values for the corresponding alternative, expected values (i.e., null values), lower and upper confidence limits, and sample estimates (i.e., observed values) for the cell counts, as well as the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis for each cell is that the corresponding cell count is equal to its expected value under RL or CSR, that is, $E[N_{ii}] = n_i(n_i-1)/(n-1)$ and $E[N_{ij}] = n_i n_j/(n-1)$ for $i \neq j$, where $n_i$ is the size of class $i$ and $n$ is the size of the data set.
See also Dixon (1994, 2002) and Ceyhan (2010).
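As an illustration of the null values, the expected cell counts depend only on the class sizes. The base R sketch below uses a hypothetical helper, `expected_nnct`, which is not part of the package, together with the standard expectations $n_i(n_i-1)/(n-1)$ on the diagonal and $n_i n_j/(n-1)$ off the diagonal.

```r
# expected_nnct is a hypothetical helper, not a package function.
# It returns the matrix of expected NNCT cell counts under RL or CSR.
expected_nnct <- function(class_sizes) {
  n <- sum(class_sizes)
  # off-diagonal cells: n_i * n_j / (n - 1)
  e <- outer(class_sizes, class_sizes) / (n - 1)
  # diagonal cells: n_i * (n_i - 1) / (n - 1)
  diag(e) <- class_sizes * (class_sizes - 1) / (n - 1)
  e
}

sizes <- c(12, 8)   # hypothetical class sizes
e <- expected_nnct(sizes)
e
rowSums(e)          # each row sums to the size of the corresponding class
```

The row sums recover the class sizes because every point has exactly one nearest neighbor, which gives a quick sanity check on the formulas.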
Zcell.nnct.ct(
  ct,
  varN,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zcell.nnct(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
A nearest neighbor contingency table, used in |
varN |
The variance matrix for cell counts in the NNCT, |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the upper and lower confidence limits, default is |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The |
stat.names |
Name of the test statistics |
p.value |
The |
LCL , UCL
|
Matrix of Lower and Upper Confidence Levels for the cell counts at the given confidence
level |
conf.int |
The confidence interval for the estimates, it is |
cnf.lvl |
Level of the upper and lower confidence limits of the cell counts,
provided in |
estimate |
Estimates of the parameters, i.e., matrix of the observed cell counts which is the NNCT |
est.name , est.name2
|
Names of the estimates, both are same in this function |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of the cell counts. |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
Zcell.nnct.2s, Zcell.nnct.rs, Zcell.nnct.ls, Zcell.nnct.pval and Zcell.tct
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
varN
Zcell.nnct(Y,cls)
Zcell.nnct(Y,cls,alt="g")
Zcell.nnct.ct(ct,varN)
Zcell.nnct.ct(ct,varN,alt="g")
Zcell.nnct(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Zcell.nnct(Y,fcls)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
Zcell.nnct(Y,cls)
Zcell.nnct.ct(ct,varN)
p-values for Cell-specific Z Test Statistics for NNCT
Four functions: Zcell.nnct.2s, Zcell.nnct.rs, Zcell.nnct.ls and Zcell.nnct.pval.
These functions yield a contingency table (i.e., a matrix) of the p-values for the cell-specific Z test statistics for the NNCT and take the cell-specific Z test statistics in matrix form as their argument. Zcell.nnct.pval yields an array whose 1st entry is the matrix of p-values for the two-sided alternative, 2nd entry is the matrix of p-values for the left-sided alternative and 3rd entry is the matrix of p-values for the right-sided alternative. Each of Zcell.nnct.2s, Zcell.nnct.rs and Zcell.nnct.ls yields a matrix of p-values for the two-sided, right-sided and left-sided alternative, respectively.
The functions Zcell.nnct.2s, Zcell.nnct.rs and Zcell.nnct.ls are equivalent to Zcell.nnct(...,alt)$p.val where alt="two.sided", "greater" and "less", respectively, with the appropriate arguments for the function Zcell.nnct (see the examples below).
See also Dixon (1994, 2002) and Ceyhan (2010).
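The mapping from Z statistics to p-values can be sketched in base R as below. This is a minimal illustration that assumes the standard normal reference distribution; the matrix `zt` holds hypothetical values and the code is not the package's own implementation.

```r
# Hypothetical matrix of cell-specific Z statistics
zt <- matrix(c(1.2, -1.2, -0.4, 0.4), nrow = 2, byrow = TRUE)

p_left  <- pnorm(zt)                      # left-sided ("less") alternative
p_right <- pnorm(zt, lower.tail = FALSE)  # right-sided ("greater") alternative
p_two   <- 2 * pnorm(-abs(zt))            # two-sided alternative

p_two
p_left
p_right
```

Each result is a matrix of the same dimension as `zt`, and the two-sided p-value equals twice the smaller of the two one-sided p-values.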
Zcell.nnct.pval(zt)
Zcell.nnct.2s(zt)
Zcell.nnct.ls(zt)
Zcell.nnct.rs(zt)
zt |
A |
Zcell.nnct.pval returns an array whose 1st entry is the matrix of p-values for the two-sided alternative, 2nd entry is the matrix of p-values for the left-sided alternative and 3rd entry is the matrix of p-values for the right-sided alternative.
Zcell.nnct.2s returns a matrix of p-values for the two-sided alternative.
Zcell.nnct.rs returns a matrix of p-values for the right-sided alternative.
Zcell.nnct.ls returns a matrix of p-values for the left-sided alternative.
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
TS<-Zcell.nnct(Y,cls)$statistic
TS
pv<-Zcell.nnct.pval(TS)
pv
Zcell.nnct(Y,cls,alt="t")$p.val
Zcell.nnct(Y,cls,alt="l")$p.val
Zcell.nnct(Y,cls,alt="g")$p.val
Zcell.nnct.2s(TS)
Zcell.nnct.ls(TS)
Zcell.nnct.rs(TS)
Two functions: cell.spec.ct and cell.spec.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below). Each one performs hypothesis tests of deviations of the entries of the NNCT or of the types I-IV TCTs from their expected values under RL or CSR for each entry. The test for each entry is based on the normal approximation of the corresponding NNCT or TCT entry, and the tests are due to Dixon (2002) and Ceyhan (2017), respectively.
The type="dixon" or "nnct" refers to Dixon's cell-specific test of segregation, and type="I"-"IV" refers to the types I-IV cell-specific tests, respectively.
Each function yields a contingency table of the test statistics, p-values for the corresponding alternative, expected values (i.e., null values), lower and upper confidence limits, and sample estimates (i.e., observed values) for the NNCT or TCT entries, as well as the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis for each entry is that the corresponding NNCT or TCT entry is equal to its expected value under RL or CSR.
See also Dixon (1994, 2002), Ceyhan (2010, 2017) and the references therein.
cell.spec.ct(
  ct,
  covN,
  type,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

cell.spec(
  dat,
  lab,
  type,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
A nearest neighbor contingency table, used in |
covN |
The |
type |
The type of the cell-specific test with no default.
Takes on values |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the upper and lower confidence limits, default is |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The |
stat.names |
Name of the test statistics |
p.value |
The |
LCL , UCL
|
Matrix of Lower and Upper Confidence Levels for the |
conf.int |
The confidence interval for the estimates, it is |
cnf.lvl |
Level of the upper and lower confidence limits of the entries, provided in |
estimate |
Estimates of the parameters, NNCT or TCT, i.e., matrix of the observed |
est.name , est.name2
|
Names of the estimates, both are same in this function |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of the
the null |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
Zcell.nnct.ct, Zcell.nnct, Zcell.tct.ct and Zcell.tct
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
type<-"IV"  #try also "dixon", "nnct", "I", "II" and "III"
cell.spec(Y,cls,type)
cell.spec(Y,cls,type,alt="g")
cell.spec.ct(ct,covN,type)
cell.spec.ct(ct,covN,type="II",alt="g")
cell.spec(Y,cls,type,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
cell.spec(Y,fcls,type="I")
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
cell.spec(Y,cls,type)
cell.spec.ct(ct,covN,type)
Two functions: Zcell.tct.ct and Zcell.tct.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below). Each one performs hypothesis tests of deviations of the entries of the types I-IV TCTs from their expected values under RL or CSR for each entry. The test for each entry is based on the normal approximation of the corresponding TCT entry, and is due to Ceyhan (2017).
Each function yields a contingency table of the test statistics, p-values for the corresponding alternative, expected values (i.e., null values), lower and upper confidence limits, and sample estimates (i.e., observed values) for the TCT entries, as well as the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis for each entry is that the corresponding TCT entry is equal to its expected value under RL or CSR; see Ceyhan (2017) and the references therein for more detail.
Zcell.tct.ct(
  ct,
  covN,
  type = "III",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zcell.tct(
  dat,
  lab,
  type = "III",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
A nearest neighbor contingency table, used in |
covN |
The |
type |
The type of the cell-specific test, default= |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the upper and lower confidence limits, default is |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The |
stat.names |
Name of the test statistics |
p.value |
The |
LCL , UCL
|
Matrix of Lower and Upper Confidence Levels for the |
conf.int |
The confidence interval for the estimates, it is |
cnf.lvl |
Level of the upper and lower confidence limits of the entries, provided in |
estimate |
Estimates of the parameters, i.e., matrix of the observed |
est.name , est.name2
|
Names of the estimates, both are same in this function |
null.value |
Matrix of hypothesized null values for the parameters which are expected values of
|
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
type<-"I"  #try also "II", "III", and "IV"
Zcell.tct(Y,cls,type)
Zcell.tct(Y,cls,type,alt="g")
Zcell.tct(Y,cls,type,method="max")
Zcell.tct.ct(ct,covN)
Zcell.tct.ct(ct,covN,type)
Zcell.tct.ct(ct,covN,type,alt="g")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Zcell.tct(Y,fcls,type)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
Zcell.tct(Y,cls,type)
Zcell.tct.ct(ct,covN,type)
Two functions: Zdir.nnct.ct and Zdir.nnct.
Both functions return objects of class "htest" but take different arguments (see the parameter list below). Each one tests the equality of the expected value of the difference between the phat estimates in a $2 \times 2$ NNCT to its value under RL or CSR (which is $-1/(n-1)$), where the phat estimates are $N_{11}/n_1$ and $N_{21}/n_2$. That is, each performs a directional (i.e., one-sided) test based on the $2 \times 2$ NNCT, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan (2010) for more detail).
The one-sided (or directional) test has two types, specified with the type argument, with default type="II". Both types are built on $T_n = N_{11}/n_1 - N_{21}/n_2$, the difference between the phat values; see Ceyhan (2010) for the exact form of each statistic.
Each test is based on the normal approximation of the corresponding statistic computed from the $2 \times 2$ NNCT, and is due to Ceyhan (2010).
Each function yields the test statistic, p-value for the corresponding alternative, the confidence interval, sample estimate (i.e., observed value) and null (i.e., expected) value for the difference in phat values, which is $-1/(n-1)$ for this function, and the method and name of the data set used.
The null hypothesis is that the difference in phat values has mean equal to $-1/(n-1)$, where $n$ is the data size, so that the test statistic has mean 0, either exactly or in the limit as the class sizes go to infinity.
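The null value of the difference in phat values can be checked numerically. The base R sketch below assumes the standard expected cell counts under RL or CSR, $E[N_{11}] = n_1(n_1-1)/(n-1)$ and $E[N_{21}] = n_2 n_1/(n-1)$, and takes the phat values as $N_{11}/n_1$ and $N_{21}/n_2$; the class sizes and the table `ct` are hypothetical.

```r
# Hypothetical class sizes
n1 <- 12; n2 <- 8
n  <- n1 + n2

# Expected cell counts in the first column of the 2 x 2 NNCT under RL or CSR
e11 <- n1 * (n1 - 1) / (n - 1)
e21 <- n2 * n1 / (n - 1)

# Expected difference in phat values: (n1 - 1)/(n - 1) - n1/(n - 1)
null_diff <- e11 / n1 - e21 / n2
c(null_diff, -1 / (n - 1))

# Observed difference for a hypothetical NNCT (rows are base classes)
ct <- matrix(c(8, 4,
               3, 5), nrow = 2, byrow = TRUE)
Tn <- ct[1, 1] / sum(ct[1, ]) - ct[2, 1] / sum(ct[2, ])
Tn
```

The two entries printed for the null value agree, which is why the hypothesized difference is slightly negative rather than 0 for completely mapped data.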
Zdir.nnct.ct(
  ct,
  covN,
  type = "II",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zdir.nnct(
  dat,
  lab,
  type = "II",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
The NNCT, used in |
covN |
The |
type |
The type of the directional (i.e. one-sided) test with default= |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the upper and lower confidence limits, default is |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The |
p.value |
The |
conf.int |
Confidence interval for the difference in phat values in an NNCT
at the given confidence level |
estimate |
Estimate of the parameter, i.e., the observed difference in phat values in an NNCT. |
null.value |
Hypothesized null value for the difference in phat values in an NNCT
which is |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2010). “Directional clustering tests based on nearest neighbour contingency tables.” Journal of Nonparametric Statistics, 22(5), 599-616.
Zdir.nnct.ss.ct, Zdir.nnct.ss, overall.nnct.ct and overall.nnct
n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
Zdir.nnct(Y,cls)
Zdir.nnct.ct(ct,covN)
Zdir.nnct(Y,cls,alt="g")
Zdir.nnct.ct(ct,covN,type="I",alt="l")
Zdir.nnct(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Zdir.nnct(Y,fcls)
Zdir.nnct.ct(ct,covN)
#############
ct<-matrix(1:4,ncol=2)
Zdir.nnct.ct(ct,covN)  #gives an error message if ct is defined as ct<-matrix(1:9,ncol=3)
Two functions: Zdir.nnct.ss.ct and Zdir.nnct.ss.
Both functions return objects of class "htest" but take different arguments (see the parameter list below). Each one performs a hypothesis test of independence in the $2 \times 2$ NNCT, which implies $N_{11}/n_1 = N_{21}/n_2$ or equivalently $N_{11}/c_1 = N_{12}/c_2$, where $N_{ij}$ is the cell count in entry $(i,j)$, $n_i$ is the sum of row $i$ (i.e., size of class $i$), and $c_j$ is the sum of column $j$ in the $2 \times 2$ NNCT; $N_{11}/n_1$ and $N_{21}/n_2$ are also referred to as the phat estimates in the row-wise binomial framework for the $2 \times 2$ NNCT (see Ceyhan (2010)).
That is, each performs a directional (i.e., one-sided) test based on the NNCT, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) when the data are obtained by sparse sampling (see Ceyhan (2010) for more detail).
Each test is based on the normal approximation of the difference between these phat estimates, and is the directional Z-test counterpart of the chi-squared test of independence for contingency tables (Bickel and Doksum 1977).
Each function yields the test statistic, p-value for the corresponding alternative, the confidence interval, sample estimate (i.e., observed value) and null (i.e., expected) value for the difference in the phat values (which is 0 for this test) in an NNCT, and the method and name of the data set used.
The null hypothesis is independence in the $2 \times 2$ NNCT, i.e., that the two phat values are equal.
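The link between a directional Z statistic and the chi-squared test of independence can be illustrated in base R. The sketch below uses the textbook two-proportion form $Z = (N_{11}/n_1 - N_{21}/n_2)\sqrt{n_1 n_2 n/(c_1 c_2)}$, which is an assumption here and not necessarily the exact expression coded in the package; the table `ct` is hypothetical.

```r
# Hypothetical 2 x 2 NNCT (rows: base class, columns: NN class)
ct <- matrix(c(16,  8,
                6, 10), nrow = 2, byrow = TRUE)

n1 <- sum(ct[1, ]); n2 <- sum(ct[2, ])   # row sums (class sizes)
c1 <- sum(ct[, 1]); c2 <- sum(ct[, 2])   # column sums
n  <- sum(ct)

# Directional Z statistic built on the difference in phat values
z <- (ct[1, 1] / n1 - ct[2, 1] / n2) * sqrt(n1 * n2 * n / (c1 * c2))

# One-sided and two-sided p-values from the standard normal distribution
p_greater <- pnorm(z, lower.tail = FALSE)
p_less    <- pnorm(z)
p_two     <- 2 * pnorm(-abs(z))

# z^2 equals Pearson's chi-squared statistic without continuity correction
chisq <- unname(chisq.test(ct, correct = FALSE)$statistic)
all.equal(z^2, chisq)
```

The sign of `z` carries the direction that the chi-squared statistic discards, which is what makes the one-sided alternatives available.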
Zdir.nnct.ss.ct(
  ct,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zdir.nnct.ss(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
The NNCT, used in |
alternative |
Type of the alternative hypothesis in the test, one of |
conf.level |
Level of the confidence limits, default is |
dat |
The data set in one or higher dimensions, each row corresponds to a data point,
used in |
lab |
The |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The |
p.value |
The |
conf.int |
Confidence interval for the difference in phat values in the NNCT
at the given confidence level |
estimate |
Estimate of the parameter, i.e., the observed difference in phat values in the NNCT. |
null.value |
Hypothesized null value for the difference in phat values in the NNCT which is 0 for this function. |
alternative |
Type of the alternative hypothesis in the test, one of |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, |
data.name |
Name of the data set, |
Elvan Ceyhan
Bickel PJ, Doksum KA (1977).
Mathematical Statistics, Basic Ideas and Selected Topics.
Prentice Hall, Englewood Cliffs, NJ.
Ceyhan E (2010).
“Directional clustering tests based on nearest neighbour contingency tables.”
Journal of Nonparametric Statistics, 22(5), 599-616.
Zdir.nnct.ct, Zdir.nnct, Pseg.ss.ct and Pseg.ss
n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
Zdir.nnct.ss(Y,cls)
Zdir.nnct.ss.ct(ct)
Zdir.nnct.ss(Y,cls,alt="g")
Zdir.nnct.ss(Y,cls,method="max")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Zdir.nnct.ss(Y,fcls)
Zdir.nnct.ss.ct(ct)
#############
ct<-matrix(1:4,ncol=2)
Zdir.nnct.ss.ct(ct)  #gives an error message if ct<-matrix(1:9,ncol=3)
Two functions: Zmixed.nonref.ct and Zmixed.nonref.
Both functions return objects of class "htest" but take different arguments (see the parameter list below).
Each one performs a hypothesis test of mixed non-reflexivity in the NN structure using the number of mixed-non-reflexive NN pairs (i.e., the second diagonal entry, $N_{22}$) in the RCT for $k \ge 2$ classes.
That is, each performs a test of mixed non-reflexivity, corresponding to entry $N_{22}$ in the RCT, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan and Bahadir (2017) for more detail).
The mixed non-reflexivity test is based on the normal approximation of the diagonal entry $N_{22}$ in the RCT and is due to Ceyhan and Bahadir (2017).
Each function yields the test statistic, the $p$-value for the corresponding alternative, the confidence interval, the sample estimate (i.e., observed value) and the null (i.e., expected) value for the mixed non-reflexivity value (i.e., diagonal entry $N_{22}$) in the RCT, together with the method and the name of the data set used.
The null hypothesis is that $E(N_{22}) = (n-R) P_{ab}$ in the RCT, where $n$ is the data size, $R$ is the number of reflexive NNs and $P_{ab}$ is the probability that any two selected points are from two different classes.
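The probability that two selected points come from two different classes can be computed from the class sizes alone. The base R sketch below assumes the points are selected without replacement, so that the same-class probability is $\sum_i n_i(n_i-1)/(n(n-1))$, and it assumes that the null expectation of the mixed non-reflexive entry has the form $(n-R)$ times the different-class probability. The helper `pair_probs` and the numbers used are illustrative and are not part of the package.

```r
# Hypothetical helper: same-class and different-class pair probabilities
# for points selected without replacement (assumed definitions).
pair_probs <- function(nvec) {
  n <- sum(nvec)
  P_aa <- sum(nvec * (nvec - 1)) / (n * (n - 1))  # both points from the same class
  P_ab <- 1 - P_aa                                # points from two different classes
  c(P_aa = P_aa, P_ab = P_ab)
}

nvec <- c(10, 10)   # illustrative class sizes
R <- 12             # illustrative number of reflexive NNs
n <- sum(nvec)

pr <- pair_probs(nvec)
pr

# Assumed null expectation of the mixed non-reflexive entry
expected_N22 <- (n - R) * pr[["P_ab"]]
expected_N22
```

In the package examples, the class sizes are obtained with `as.numeric(table(cls))` and the number of reflexive NNs with `Rval(W)`.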
Zmixed.nonref.ct(
  rfct,
  nvec,
  Qv,
  Tv,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zmixed.nonref(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
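rfct |
An RCT, used in Zmixed.nonref.ct only |
nvec |
The vector of class sizes, used in Zmixed.nonref.ct only |
Qv |
The number of shared NNs, used in Zmixed.nonref.ct only |
Tv |
The $T$ value (see Tval), used in Zmixed.nonref.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Zmixed.nonref only |
lab |
The vector of class labels (numerical or categorical), used in Zmixed.nonref only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Zmixed.nonref only |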
A list with the elements
statistic |
The $Z$ test statistic for the mixed non-reflexivity test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the mixed non-reflexivity value (i.e., diagonal entry $N_{22}$) in the RCT at the given confidence level conf.level |
estimate |
Estimate of the parameter, i.e., the observed diagonal entry $N_{22}$ in the RCT |
null.value |
Hypothesized null value for the mixed non-reflexivity value (i.e., expected value of the diagonal entry $N_{22}$) in the RCT |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, rfct |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E, Bahadir S (2017). “Nearest Neighbor Methods for Testing Reflexivity.” Environmental and Ecological Statistics, 24(1), 69-108.
Zself.ref.ct, Zself.ref, Znnref.ct and Znnref
library(nnspat)

n<-20
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
R<-Rval(W)
Tv<-Tval(W,R)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Zmixed.nonref(Y,cls)
Zmixed.nonref.ct(rfct,nvec,Qv,Tv)
Zmixed.nonref(Y,cls,alt="g")

Zmixed.nonref(Y,cls,method="max")

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
R<-Rval(W)
Tv<-Tval(W,R)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Zmixed.nonref(Y,cls,alt="g")

Zmixed.nonref.ct(rfct,nvec,Qv,Tv)
Zmixed.nonref.ct(rfct,nvec,Qv,Tv,alt="l")
Two functions: Znnref.ct and Znnref.
Both functions return objects of class "refhtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the diagonal cell counts (i.e., entries $N_{11}$ and $N_{22}$) under RL or CSR in the RCT for $k \ge 2$ classes.
That is, each performs the NN reflexivity test (i.e., a test of self reflexivity and a test of mixed non-reflexivity, corresponding to entries $N_{11}$ and $N_{22}$, respectively, in the RCT), which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan and Bahadir (2017) for more detail).
The reflexivity test is based on the normal approximation of the diagonal entries in the RCT and is due to Ceyhan and Bahadir (2017).
Each function yields the test statistics, the $p$-values for the corresponding alternative, the expected values (i.e., null values), the confidence intervals and the sample estimates (i.e., observed values) for the self reflexivity and mixed non-reflexivity values (i.e., entries $N_{11}$ and $N_{22}$, respectively) in the RCT. Each function also gives the names of the test statistics and null values, the method and the data set used.
The null hypothesis is that $E(N_{11}) = R P_{aa}$ and $E(N_{22}) = (n-R) P_{ab}$ in the RCT, where $n$ is the data size, $R$ is the number of reflexive NNs, $P_{aa}$ is the probability that any two selected points are from the same class and $P_{ab}$ is the probability that any two selected points are from two different classes.
The Znnref functions (i.e., Znnref.ct and Znnref) are different from the Znnself functions (i.e., Znnself.ct and Znnself), from the Zself.ref functions (i.e., Zself.ref.ct and Zself.ref), and also from the Znnself.sum functions (i.e., Znnself.sum.ct and Znnself.sum).
The Znnref functions test self reflexivity and mixed non-reflexivity using the diagonal entries in the RCT, the Znnself functions test self reflexivity at a class-specific level (i.e., for each class) using the first column in the SCCT, the Zself.ref functions test self reflexivity for the entire data set using entry $N_{11}$ in the RCT, and the Znnself.sum functions test cumulative species correspondence using the sum of the self column (i.e., the first column) in the SCCT.
Znnref.ct(
  rfct,
  nvec,
  Qv,
  Tv,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnref(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
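rfct |
An RCT, used in Znnref.ct only |
nvec |
The vector of class sizes, used in Znnref.ct only |
Qv |
The number of shared NNs, used in Znnref.ct only |
Tv |
The $T$ value (see Tval), used in Znnref.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnref only |
lab |
The vector of class labels (numerical or categorical), used in Znnref only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Znnref only |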
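A list with the elements
statistic |
The $Z$ test statistics for the self reflexivity and mixed non-reflexivity tests, corresponding to entries $N_{11}$ and $N_{22}$ in the RCT |
stat.names |
Name of the test statistics |
p.value |
The $p$-values for the self reflexivity and mixed non-reflexivity tests |
conf.int |
Confidence intervals for the self reflexivity and mixed non-reflexivity values (i.e., diagonal entries $N_{11}$ and $N_{22}$ in the RCT) at the given confidence level conf.level |
cnf.lvl |
Level of the confidence intervals of the diagonal entries, provided in conf.level |
estimate |
Estimates of the parameters, i.e., the observed diagonal entries $N_{11}$ and $N_{22}$ in the RCT |
null.value |
Hypothesized null values for the self reflexivity and mixed non-reflexivity values (i.e., expected values of the diagonal entries $N_{11}$ and $N_{22}$ in the RCT) |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, rfct |
data.name |
Name of the data set, dat |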
Elvan Ceyhan
Ceyhan E, Bahadir S (2017). “Nearest Neighbor Methods for Testing Reflexivity.” Environmental and Ecological Statistics, 24(1), 69-108.
Znnself.ct, Znnself, Zmixed.nonref.ct, Zmixed.nonref, Xsq.nnref.ct and Xsq.nnref
library(nnspat)

n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
Tv<-Tval(W,Rv)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Znnref(Y,cls)
Znnref(Y,cls,method="max")

Znnref.ct(rfct,nvec,Qv,Tv)
Znnref.ct(rfct,nvec,Qv,Tv,alt="g")

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
R<-Rval(W)
Tv<-Tval(W,R)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Znnref(Y,cls,alt="g")

Znnref.ct(rfct,nvec,Qv,Tv)
Znnref.ct(rfct,nvec,Qv,Tv,alt="l")
Two functions: Znnself.ct and Znnself.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the self entries (i.e., first column) in a species correspondence contingency table (SCCT), or of the diagonal entries in an NNCT, to the ones under RL or CSR.
That is, each performs a test of NN self reflexivity for each class, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data.
NN self reflexivity for each class can be viewed as a decomposition of species correspondence for each class (see Ceyhan (2018) for more detail).
Each test is based on the normal approximation of the self entries (i.e., first column) in an SCCT or the diagonal entries in an NNCT and is due to Ceyhan (2018).
Each function yields a vector of length $k$ of the test statistics, the $p$-values for the corresponding alternative, the null values (i.e., expected values) and the sample estimates (i.e., observed values) of the self entries in the SCCT or diagonal entries in the NNCT, a $k \times 2$ matrix of confidence intervals (where each row is the confidence interval for self entry $S_i$ in the SCCT or diagonal entry $N_{ii}$ in the NNCT), and also the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis is that all $E(S_i) = n_i(n_i-1)/(n-1)$, where $n_i$ is the size of class $i$ and $n$ is the data size.
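The expected self (diagonal) counts under RL depend only on the class sizes, through the standard expression $n_i(n_i-1)/(n-1)$ (Dixon 1994). A minimal base R sketch follows; the helper name `expected_self` and the label vector are illustrative and are not part of the package.

```r
# Hypothetical helper: expected self (diagonal) counts under RL,
# n_i (n_i - 1) / (n - 1), computed from a vector of class labels.
expected_self <- function(lab) {
  nvec <- as.numeric(table(lab))   # class sizes n_i
  n <- sum(nvec)                   # data size
  nvec * (nvec - 1) / (n - 1)
}

lab <- rep(c("a", "b", "c"), c(5, 10, 15))  # illustrative labels
exp_self <- expected_self(lab)
exp_self        # one expected count per class
sum(exp_self)   # expected value of the sum of the self entries
```

The last line gives the null value relevant to a test based on the sum of the self entries rather than on each class separately.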
The Znnself functions (i.e., Znnself.ct and Znnself) are different from the Znnref functions (i.e., Znnref.ct and Znnref), from the Zself.ref functions (i.e., Zself.ref.ct and Zself.ref), and also from the Znnself.sum functions (i.e., Znnself.sum.ct and Znnself.sum).
The Znnself functions test self reflexivity at a class-specific level (i.e., for each class) using the first column in the SCCT, the Zself.ref functions test self reflexivity for the entire data set using entry $N_{11}$ in the RCT, the Znnref functions test self reflexivity and mixed non-reflexivity using the diagonal entries in the RCT, and the Znnself.sum functions test cumulative species correspondence using the sum of the self column (i.e., the first column) in the SCCT.
Znnself.ct(
  ct,
  VarNii,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnself(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
ct |
The NNCT or SCCT, used in Znnself.ct only |
VarNii |
The variance vector of differences of self entries in the SCCT or diagonal entries in the NNCT, used in Znnself.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnself only |
lab |
The vector of class labels (numerical or categorical), used in Znnself only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Znnself only |
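A list with the elements
statistic |
The vector (of length $k$) of $Z$ test statistics for the NN self reflexivity test |
stat.names |
Name of the test statistics |
p.value |
The vector of $p$-values for the hypothesis test for the corresponding alternative |
LCL , UCL |
Lower and Upper Confidence Levels, it is NULL here, since the confidence intervals are provided as a $k \times 2$ matrix in conf.int |
conf.int |
The $k \times 2$ matrix of confidence intervals for the self entries in the SCCT or diagonal entries in the NNCT |
cnf.lvl |
Level of the confidence intervals (i.e., conf.level) for the self entries in the SCCT or diagonal entries in the NNCT |
estimate |
The vector of estimates of the parameters, i.e., the observed values of the self entries in the SCCT or diagonal entries in the NNCT |
est.name , est.name2 |
Names of the estimates, both are the same in this function |
null.value |
The vector of null values of the parameters, i.e., the expected values of the self entries in the SCCT or diagonal entries in the NNCT under RL or CSR |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |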
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
Zself.ref.ct, Zself.ref, Znnref.ct, Znnref, Xsq.spec.cor and Xsq.spec.cor.ct
library(nnspat)

n<-20
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
VarN.diag<-varNii.ct(ct,Qv,Rv)

Znnself(Y,cls)
Znnself(Y,cls,alt="g")

Znnself.ct(ct,VarN.diag)
Znnself.ct(ct,VarN.diag,alt="g")

Znnself(Y,cls,method="max")

ct<-scct(ipd,cls)
Znnself.ct(ct,VarN.diag)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Znnself(Y,fcls)
Znnself.ct(ct,VarN.diag)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
VarN.diag<-varNii.ct(ct,Qv,Rv)

Znnself(Y,cls,alt="l")
Znnself.ct(ct,VarN.diag)
Znnself.ct(ct,VarN.diag,alt="l")
Two functions: Znnself.sum.ct and Znnself.sum.
Both functions return objects of class "htest" but take different arguments (see the parameter list below).
Each one performs a hypothesis test of equality of the expected value of the sum of the self entries (i.e., first column) in a species correspondence contingency table (SCCT), or of the sum of the diagonal entries in an NNCT, to the one under RL or CSR.
That is, each performs a cumulative species correspondence test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan (2018) for more detail).
Each test is based on the normal approximation of the sum of the self entries (i.e., first column) in an SCCT or the sum of the diagonal entries in an NNCT and is due to Ceyhan (2018).
Each function yields the test statistic, the $p$-value for the corresponding alternative, the confidence interval, the sample estimate (i.e., observed value) and the null (i.e., expected) value for the sum of the self entries (i.e., first column) in an SCCT or the sum of the diagonal entries in an NNCT, together with the method and the name of the data set used.
The null hypothesis is that $E(S) = \sum_{i=1}^{k} n_i(n_i-1)/(n-1)$, where $S$ is the sum of the self column in the SCCT, $n_i$ is the size of class $i$ and $n$ is the data size.
The Znnself.sum functions (i.e., Znnself.sum.ct and Znnself.sum) are different from the Znnself functions (i.e., Znnself.ct and Znnself), from the Znnref functions (i.e., Znnref.ct and Znnref), and also from the Zself.ref functions (i.e., Zself.ref.ct and Zself.ref).
The Znnself.sum functions test cumulative species correspondence using the sum of the self column (i.e., the first column) in the SCCT, the Znnself functions test self reflexivity at a class-specific level (i.e., for each class) using the first column in the SCCT, the Zself.ref functions test self reflexivity for the entire data set using entry $N_{11}$ in the RCT, and the Znnref functions test self reflexivity and mixed non-reflexivity using the diagonal entries in the RCT.
Znnself.sum.ct(
  ct,
  covSC,
  nnct = FALSE,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnself.sum(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
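ct |
The NNCT or SCCT, used in Znnself.sum.ct only |
covSC |
The covariance matrix for the self entries (i.e., first column) in the SCCT or the diagonal entries in the NNCT, used in Znnself.sum.ct only |
nnct |
A logical parameter (default=FALSE). If TRUE, ct is taken to be an NNCT; if FALSE, ct is taken to be an SCCT. Used in Znnself.sum.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnself.sum only |
lab |
The vector of class labels (numerical or categorical), used in Znnself.sum only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Znnself.sum only |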
A list with the elements
statistic |
The $Z$ test statistic for the cumulative species correspondence test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the sum of the self entries (i.e., first column) in an SCCT or the sum of the diagonal entries in an NNCT at the given confidence level conf.level |
estimate |
Estimate of the parameter, i.e., the observed sum of the self entries (i.e., first column) in an SCCT or the sum of the diagonal entries in an NNCT |
null.value |
Hypothesized null value for the sum of the self entries (i.e., first column) in an SCCT or the sum of the diagonal entries in an NNCT |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2018). “A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.” SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
Znnself.ct, Znnself, Znnref.ct, Znnref, Zself.ref.ct and Zself.ref
library(nnspat)

n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-scct(ipd,cls)
ct

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)
cv<-covNii.ct(ct,vsq,Qv,Rv)

Znnself.sum(Y,cls)

Znnself.sum.ct(ct,cv)
Znnself.sum.ct(ct,cv,alt="g")

Znnself.sum(Y,cls,method="max")

ct<-nnct(ipd,cls)
Znnself.sum.ct(ct,cv,nnct = TRUE)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
ct<-scct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
vsq<-varNii.ct(ct,Qv,Rv)
cv<-covNii.ct(ct,vsq,Qv,Rv)

Znnself.sum(Y,cls)

Znnself.sum.ct(ct,cv)
Znnself.sum.ct(ct,cv,alt="g")

ct<-nnct(ipd,cls)
Znnself.sum.ct(ct,cv,nnct = TRUE)

Znnself.sum(Y,cls,alt="g")
Two functions: Znnsym.dx.ct and Znnsym.dx.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the off-diagonal cell counts (i.e., entries $N_{ij}$ and $N_{ji}$) for each pair of classes $i \ne j$ under RL or CSR in the NNCT for $k \ge 2$ classes.
That is, each performs Dixon's NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Dixon (1994) and Ceyhan (2014) for more detail).
Each symmetry test is based on the normal approximation of the difference of the off-diagonal entries in the NNCT and is due to Dixon (1994).
Each function yields a contingency table of the test statistics, the $p$-values for the corresponding alternative, the expected values (i.e., null values), the lower and upper confidence limits and the sample estimates (i.e., observed values) for the $N_{ij} - N_{ji}$ values for $i \ne j$ (all in upper-triangular form except for the null value, which is 0 for all pairs), and also the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis is that all $E(N_{ij}) = E(N_{ji})$ for $i \ne j$ in the $k \times k$ NNCT (i.e., symmetry in the mixed NN structure) for $k \ge 2$.
In the output, the test statistic, the $p$-value and the lower and upper confidence limits are valid for completely mapped data.
See also Dixon (1994), Ceyhan (2014) and the references therein.
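To make the quantities in the symmetry tests concrete, the base R sketch below builds an NNCT (rows: class of the base point, columns: class of its NN) and the upper-triangular matrix of differences between paired off-diagonal entries. It is not the package's `nnct` implementation: the helper `nnct_sketch` is hypothetical, uses Euclidean distances via `stats::dist`, and assumes there are no distance ties.

```r
# Hypothetical helper: NNCT from a data matrix and a vector of class labels.
nnct_sketch <- function(dat, lab) {
  lab <- as.factor(lab)
  d <- as.matrix(dist(dat))      # Euclidean interpoint distances
  diag(d) <- Inf                 # a point is not its own NN
  nn <- apply(d, 1, which.min)   # index of the NN of each point
  ct <- table(lab, lab[nn])      # base class by NN class
  matrix(as.vector(ct), nrow = nlevels(lab),
         dimnames = list(base = levels(lab), NN = levels(lab)))
}

set.seed(1)
n <- 30
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(1:3, n, replace = TRUE)

ct <- nnct_sketch(Y, cls)
ct

# Differences of paired off-diagonal entries, kept in upper-triangular form
diffs <- ct - t(ct)
diffs[lower.tri(diffs, diag = TRUE)] <- NA
diffs
```

Each row of the table sums to the corresponding class size, because every base point contributes exactly one NN.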
Znnsym.dx.ct(
  ct,
  varS,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnsym.dx(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
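ct |
A nearest neighbor contingency table, used in Znnsym.dx.ct only |
varS |
The variance vector of differences of off-diagonal cell counts in the NNCT, ct, used in Znnsym.dx.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnsym.dx only |
lab |
The vector of class labels (numerical or categorical), used in Znnsym.dx only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Znnsym.dx only |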
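A list with the elements
statistic |
The matrix of $Z$ test statistics for the NN symmetry test (in the upper-triangular form) |
stat.names |
Name of the test statistics |
p.value |
The matrix of $p$-values for the hypothesis test for the corresponding alternative (in the upper-triangular form) |
LCL , UCL |
Matrix of lower and upper confidence limits (in the upper-triangular form) for the $N_{ij} - N_{ji}$ values for $i \ne j$ at the given confidence level conf.level |
conf.int |
The confidence interval for the estimates, it is NULL here, since the lower and upper confidence limits are provided in LCL and UCL |
cnf.lvl |
Level of the upper and lower confidence limits (i.e., conf.level) of the differences of the off-diagonal entries |
estimate |
Estimates of the parameters, i.e., matrix of the differences of the off-diagonal entries (in the upper-triangular form) of the $k \times k$ NNCT, $N_{ij} - N_{ji}$ for $i \ne j$ |
est.name , est.name2 |
Names of the estimates, the former is a shorter description of the estimates than the latter |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, $E(N_{ij}) - E(N_{ji})$ for $i \ne j$, which is 0 for this function |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |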
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994). “Testing spatial segregation using a nearest-neighbor contingency table.” Ecology, 75(7), 1940-1948.
Znnsym2cl.dx.ct, Znnsym2cl.dx, Znnsym.ss.ct, Znnsym.ss, Xsq.nnsym.dx.ct and Xsq.nnsym.dx
library(nnspat)

n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow
varS<-var.nnsym(covN)

Znnsym.dx(Y,cls)
Znnsym.dx.ct(ct,varS)

Znnsym.dx(Y,cls,method="max")

Znnsym.dx(Y,cls,alt="g")
Znnsym.dx.ct(ct,varS,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Znnsym.dx(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow
varS<-var.nnsym(covN)

Znnsym.dx(Y,cls)
Znnsym.dx.ct(ct,varS)
Two functions: Znnsym.ss.ct and Znnsym.ss.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below).
Each one performs hypothesis tests of equality of the expected values of the off-diagonal cell counts (i.e., entries $N_{ij}$ and $N_{ji}$) for each pair of classes $i \ne j$ under RL or CSR in the NNCT for $k \ge 2$ classes.
That is, each performs Pielou's first type of NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) provided that the data are obtained by sparse sampling (see Ceyhan (2014) for more detail).
Each symmetry test is based on the normal approximation of the differences of the off-diagonal entries in the NNCT and is due to Pielou (1961).
Each function yields a contingency table of the test statistics, the $p$-values for the corresponding alternative, the lower and upper confidence limits, the sample estimates (i.e., observed values) and the null values (i.e., expected values) for the $N_{ij} - N_{ji}$ values for $i \ne j$ (all in upper-triangular form except for the null value, which is 0 for all pairs), and also the names of the test statistics, estimates and null values, the method and the data set used.
The null hypothesis is that all $E(N_{ij}) = E(N_{ji})$ for $i \ne j$ in the $k \times k$ NNCT (i.e., symmetry in the mixed NN structure) for $k \ge 2$.
In the output, the test statistic, the $p$-value and the lower and upper confidence limits are valid only for (properly) sparsely sampled data.
See also Pielou (1961), Ceyhan (2014) and the references therein.
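For intuition in the two-class case, a symmetry statistic under sparse sampling can be sketched as a McNemar-type standardization of the difference of the two off-diagonal counts. This is an assumption made for illustration, and the package's internal formula may differ. The counts below are made up, and the comparison uses `stats::mcnemar.test` without continuity correction.

```r
# Illustrative 2 x 2 NNCT (made-up counts); rows = base class, columns = NN class.
ct <- matrix(c(8, 3, 7, 6), nrow = 2,
             dimnames = list(base = c("1", "2"), NN = c("1", "2")))

n12 <- ct[1, 2]
n21 <- ct[2, 1]

# Assumed McNemar-type z statistic for the difference of off-diagonal entries
z <- (n12 - n21) / sqrt(n12 + n21)
p_two_sided <- 2 * pnorm(-abs(z))

# The squared statistic matches McNemar's chi-square test without correction
mc <- mcnemar.test(ct, correct = FALSE)
c(z = z, z.squared = z^2, mcnemar = unname(mc$statistic))
c(p.z = p_two_sided, p.mcnemar = mc$p.value)
```

Unlike the chi-square form, the z form keeps the sign of the difference, which is what makes the "less" and "greater" alternatives meaningful.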
Znnsym.ss.ct(
  ct,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnsym.ss(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
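ct |
A nearest neighbor contingency table, used in Znnsym.ss.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnsym.ss only |
lab |
The vector of class labels (numerical or categorical), used in Znnsym.ss only |
... |
are for further arguments, such as method and p, passed to the dist function, used in Znnsym.ss only |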
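A list with the elements
statistic |
The matrix of $Z$ test statistics for the NN symmetry test (in the upper-triangular form) |
stat.names |
Name of the test statistics |
p.value |
The matrix of $p$-values for the hypothesis test for the corresponding alternative (in the upper-triangular form) |
LCL , UCL |
Matrix of lower and upper confidence limits (in the upper-triangular form) for the $N_{ij} - N_{ji}$ values for $i \ne j$ at the given confidence level conf.level |
conf.int |
The confidence interval for the estimates, it is NULL here, since the lower and upper confidence limits are provided in LCL and UCL |
cnf.lvl |
Level of the upper and lower confidence limits (i.e., conf.level) of the differences of the off-diagonal entries |
estimate |
Estimates of the parameters, i.e., matrix of the differences of the off-diagonal entries (in the upper-triangular form) of the $k \times k$ NNCT, $N_{ij} - N_{ji}$ for $i \ne j$ |
est.name , est.name2 |
Names of the estimates, the former is a shorter description of the estimates than the latter |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, $E(N_{ij}) - E(N_{ji})$ for $i \ne j$, which is 0 for this function |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct |
data.name |
Name of the data set, dat |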
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
Pielou EC (1961). “Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.” Journal of Ecology, 49(2), 255-269.
Znnsym.dx.ct, Znnsym.dx, Znnsym2cl.ss.ct and Znnsym2cl.ss
library(nnspat)

n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

Znnsym.ss(Y,cls)
Znnsym.ss.ct(ct)

Znnsym.ss(Y,cls,method="max")

Znnsym.ss(Y,cls,alt="g")
Znnsym.ss.ct(ct,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Znnsym.ss(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

Znnsym.ss(Y,cls)
Znnsym.ss.ct(ct)
Two functions: Znnsym2cl.dx.ct and Znnsym2cl.dx.
Both functions return objects of class "htest" but take different arguments (see the parameter list below).
Each one performs the hypothesis test of equality of the expected values of the off-diagonal cell counts (i.e., entries $N_{12}$ and $N_{21}$) under RL or CSR in the NNCT for $k = 2$ classes.
That is, each performs Dixon's NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data (see Ceyhan (2014) for more detail).
Each symmetry test is based on the normal approximation of the difference of the off-diagonal entries in the NNCT and is due to Dixon (1994).
Each function yields the test statistic, the $p$-value for the corresponding alternative, the confidence interval, the estimate and the null value for the parameter of interest (which is the difference of the off-diagonal entries in the NNCT), together with the method and the name of the data set used.
The null hypothesis is that $E(N_{12}) = E(N_{21})$ in the $2 \times 2$ NNCT (i.e., symmetry in the mixed NN structure).
See also Dixon (1994), Ceyhan (2014) and the references therein.
Znnsym2cl.dx.ct(
  ct,
  Q,
  R,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnsym2cl.dx(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
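ct |
A nearest neighbor contingency table, used in Znnsym2cl.dx.ct only |
Q |
The number of shared NNs, used in Znnsym2cl.dx.ct only |
R |
The number of reflexive NNs (i.e., twice the number of reflexive NN pairs), used in Znnsym2cl.dx.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnsym2cl.dx only |
lab |
The vector of class labels (numerical or categorical), used in Znnsym2cl.dx only |
... |
Further arguments, such as method and p, passed to the dist function; used in Znnsym2cl.dx only |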
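The quantities Q and R can be illustrated with a small base-R sketch. It assumes Dixon's usual definitions (Q counts ordered pairs of distinct points that share the same NN, R is twice the number of reflexive NN pairs) and Euclidean distances without ties; the package functions Qvec and Rval remain the authoritative implementation and may treat ties differently. All variable names below are illustrative.

```r
# Base-R sketch: NN indices, shared-NN count Q and reflexive-NN count R
set.seed(1)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)

# full Euclidean inter-point distance matrix
D <- as.matrix(dist(Y))
diag(D) <- Inf                     # a point is not its own NN

nn <- apply(D, 1, which.min)       # nn[i] = index of the NN of point i

# R: number of points that belong to a reflexive NN pair
# (i is the NN of j and j is the NN of i)
R <- sum(nn[nn] == seq_len(n))

# m[j] = number of points whose NN is point j
m <- tabulate(nn, nbins = n)

# Q (assumed definition): number of ordered pairs of distinct points
# sharing the same NN, i.e. the sum over j of m[j] * (m[j] - 1)
Q <- sum(m * (m - 1))

c(Q = Q, R = R)
```

Because reflexive pairs contribute two points each, R is always even when there are no ties.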
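A list with the elements
statistic |
The Z test statistic for Dixon's NN symmetry test |
p.value |
The p-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the difference of the off-diagonal entries, $N_{12} - N_{21}$, in the $2 \times 2$ NNCT at the given confidence level conf.level |
estimate |
Estimate, i.e., the difference of the off-diagonal entries of the $2 \times 2$ NNCT, $N_{12} - N_{21}$ |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, $E(N_{12}) - E(N_{21})$, in the $2 \times 2$ NNCT, which is 0 |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat, or name of the contingency table, ct |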
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Znnsym2cl.ss.ct, Znnsym2cl.ss, Znnsym.dx.ct, Znnsym.dx, Xsq.nnsym.dx.ct and Xsq.nnsym.dx
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)

Znnsym2cl.dx(Y,cls)
Znnsym2cl.dx.ct(ct,Qv,Rv)
Znnsym2cl.dx(Y,cls,method="max")
Znnsym2cl.dx(Y,cls,alt="g")
Znnsym2cl.dx.ct(ct,Qv,Rv,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Znnsym2cl.dx(Y,fcls)

#############
ct<-matrix(sample(1:20,4),ncol=2)
Znnsym2cl.dx.ct(ct,Qv,Rv) #gives an error message if ct<-matrix(sample(1:20,9),ncol=3)
#here, Qv and Rv values are borrowed from above, to highlight a point
Two functions: Znnsym2cl.ss.ct and Znnsym2cl.ss.
Both functions return objects of class "htest" but take different arguments (see the parameter list below).
Each one tests the equality of the expected values of the off-diagonal cell counts (i.e., entries) under RL or CSR in the NNCT for $k=2$ classes.
That is, each performs Pielou's first type of NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) provided that the data are obtained by sparse sampling. (See Ceyhan (2014) for more detail.)
Each symmetry test is based on the normal approximation of the difference of the off-diagonal entries in the NNCT and is due to Pielou (1961).
Each function yields the test statistic, the $p$-value for the corresponding alternative, the confidence interval, the estimate and null value for the parameter of interest (which is the difference of the off-diagonal entries in the NNCT), and the method and name of the data set used.
The null hypothesis is that $E(N_{12}) = E(N_{21})$ in the $2 \times 2$ NNCT (i.e., symmetry in the mixed NN structure).
In the output, the test statistic, $p$-value and the confidence interval are valid only for (properly) sparsely sampled data.
See also Pielou (1961) and Ceyhan (2014) and the references therein.
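As a rough illustration of the normal approximation described above, the following base-R sketch computes a McNemar-type z statistic from a 2 x 2 NNCT. It assumes that the sparse-sampling statistic takes the form $(N_{12}-N_{21})/\sqrt{N_{12}+N_{21}}$; the table entries are made up, and Znnsym2cl.ss.ct remains the reference implementation.

```r
# Hypothetical 2 x 2 NNCT: rows = base class, columns = NN class
ct <- matrix(c(12,  5,
                9, 14), nrow = 2, byrow = TRUE,
             dimnames = list(base = c("1", "2"), NN = c("1", "2")))

N12 <- ct[1, 2]
N21 <- ct[2, 1]

# McNemar-type z statistic (assumed form under sparse sampling)
z <- (N12 - N21) / sqrt(N12 + N21)

# p-values for the three alternatives from the standard normal distribution
p.two     <- 2 * pnorm(-abs(z))
p.less    <- pnorm(z)
p.greater <- pnorm(z, lower.tail = FALSE)

c(z = z, two.sided = p.two, less = p.less, greater = p.greater)
```

Squaring z gives the corresponding chi-square form with one degree of freedom.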
Znnsym2cl.ss.ct(
  ct,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Znnsym2cl.ss(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
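ct |
A nearest neighbor contingency table, used in Znnsym2cl.ss.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Znnsym2cl.ss only |
lab |
The vector of class labels (numerical or categorical), used in Znnsym2cl.ss only |
... |
Further arguments, such as method and p, passed to the dist function; used in Znnsym2cl.ss only |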
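A list with the elements
statistic |
The Z test statistic for Pielou's first type of NN symmetry test |
p.value |
The p-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the difference of the off-diagonal entries, $N_{12} - N_{21}$, in the $2 \times 2$ NNCT at the given confidence level conf.level |
estimate |
Estimate, i.e., the difference of the off-diagonal entries of the $2 \times 2$ NNCT, $N_{12} - N_{21}$ |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, $E(N_{12}) - E(N_{21})$, in the $2 \times 2$ NNCT, which is 0 |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat, or name of the contingency table, ct |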
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
Xsq.nnsym.ss.ct, Xsq.nnsym.ss, Znnsym.ss.ct and Znnsym.ss
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

Znnsym2cl.ss(Y,cls)
Znnsym2cl.ss.ct(ct)
Znnsym2cl.ss(Y,cls,method="max")
Znnsym.ss.ct(ct)
Znnsym2cl.ss(Y,cls,alt="g")
Znnsym2cl.ss.ct(ct,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Znnsym2cl.ss(Y,fcls)

#############
ct<-matrix(sample(1:20,4),ncol=2)
Znnsym2cl.ss.ct(ct) #gives an error message if ct<-matrix(sample(1:20,9),ncol=3)
Two functions: Zseg.coeff.ct and Zseg.coeff.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below).
Each one tests the deviation of each segregation coefficient in the NNCT from its expected value under RL or CSR.
The test for each cell is based on the normal approximation of the corresponding segregation coefficient.
That is, each performs the segregation coefficient tests, which are appropriate (i.e., have the appropriate asymptotic sampling distribution) for completely mapped data.
The segregation coefficients in the multi-class case are the extension of Pielou's segregation coefficient for the two-class case. (See Ceyhan (2014) for more detail.)
Each function yields a contingency table of the test statistics, $p$-values for the corresponding alternative, lower and upper confidence limits, sample estimates (i.e., observed values) and null value (i.e., expected value, which is 0) for the segregation coefficients, and also the names of the test statistics, estimates and null value, the method, and the data set used.
The null hypothesis for each cell is that the corresponding segregation coefficient equals its expected value (which is 0) under RL or CSR.
See also Ceyhan (2014).
Zseg.coeff.ct(
  ct,
  VarSC,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zseg.coeff(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
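ct |
A nearest neighbor contingency table, used in Zseg.coeff.ct only |
VarSC |
The variance matrix for the segregation coefficients in the NNCT, used in Zseg.coeff.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Zseg.coeff only |
lab |
The vector of class labels (numerical or categorical), used in Zseg.coeff only |
... |
Further arguments, such as method and p, passed to the dist function; used in Zseg.coeff only |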
A list with the elements
statistic |
The matrix of Z test statistics for the segregation coefficients |
stat.names |
Name of the test statistics |
p.value |
The matrix of p-values for the hypothesis tests for the corresponding alternative |
LCL, UCL |
Matrix of Lower and Upper Confidence Levels for the segregation coefficients at the given confidence level |
conf.int |
Confidence interval for segregation coefficients, it is NULL here, since the LCL and UCL are provided in matrix form |
cnf.lvl |
Level of the upper and lower confidence limits of the segregation coefficients, provided in conf.level |
estimate |
Estimate of the parameter, i.e., matrix of the observed segregation coefficients |
est.name, est.name2 |
Names of the estimates, both are the same in this function |
null.value |
Hypothesized null values for the parameters, i.e., expected values of the segregation coefficients, which are all 0 under RL or CSR |
null.name |
Name of the null value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct, returned by Zseg.coeff.ct only |
data.name |
Name of the data set, dat, returned by Zseg.coeff only |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
varT<-var.seg.coeff(ct,covN)

Zseg.coeff(Y,cls)
Zseg.coeff.ct(ct,varT)
Zseg.coeff(Y,cls,method="max")
Zseg.coeff(Y,cls,alt="g")
Zseg.coeff.ct(ct,varT,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
Zseg.coeff.ct(ct,varT)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
varT<-var.seg.coeff(ct,covN)

Zseg.coeff(Y,cls)
Zseg.coeff.ct(ct,varT)
Zseg.coeff(Y,cls,alt="g")
Zseg.coeff.ct(ct,varT,alt="g")
Two functions: Zseg.ind.ct and Zseg.ind.
Both functions return objects of class "cellhtest" but take different arguments (see the parameter list below).
Each one tests the deviation of each segregation index in the NNCT from its expected value under RL or CSR.
The test for each cell is based on the normal approximation of the corresponding segregation index.
Each function yields a contingency table of the test statistics, $p$-values for the corresponding alternative, lower and upper confidence limits, sample estimates (i.e., observed values) and null value(s) (i.e., expected values) for the segregation indices, and also the names of the test statistics, estimates and null value, the method, and the data set used.
The null hypothesis for each cell is that the corresponding segregation index equals its expected value (which is 0) under RL or CSR.
See also Ceyhan (2014).
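To make the idea of a segregation index with null value 0 concrete, here is a base-R sketch. It assumes the indices are Dixon-type log odds ratios computed from the NNCT and the class sizes; this formula is not stated in this section, so treat it as an assumption, and use seg.ind for the package's actual definition (including the inf.corr correction, which is not shown here). The table is made up.

```r
# Hypothetical 3 x 3 NNCT: rows = base class, columns = NN class
ct <- matrix(c(10, 3, 2,
                4, 8, 3,
                2, 4, 9), nrow = 3, byrow = TRUE)
ni <- rowSums(ct)   # class sizes (row sums of the NNCT)
n  <- sum(ct)       # total number of points
k  <- nrow(ct)

# Assumed Dixon-type index: log of observed odds over odds expected under RL
S <- matrix(NA_real_, k, k)
for (i in 1:k) for (j in 1:k) {
  odds.obs <- ct[i, j] / (ni[i] - ct[i, j])
  odds.exp <- if (i == j) (ni[i] - 1) / (n - ni[i]) else ni[j] / (n - ni[j] - 1)
  S[i, j] <- log(odds.obs / odds.exp)
}
S

# Sanity check: plugging in the RL expectation of N_11 gives an index of 0
e11 <- ni[1] * (ni[1] - 1) / (n - 1)
S11.null <- log((e11 / (ni[1] - e11)) / ((ni[1] - 1) / (n - ni[1])))
S11.null
```

Positive diagonal values indicate more same-class NN pairs than expected under RL, and negative off-diagonal values indicate fewer mixed pairs than expected.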
Zseg.ind.ct(
  ct,
  varN,
  inf.corr = FALSE,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zseg.ind(
  dat,
  lab,
  inf.corr = FALSE,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
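ct |
A nearest neighbor contingency table, used in Zseg.ind.ct only |
varN |
The variance matrix for cell counts in the NNCT, used in Zseg.ind.ct only |
inf.corr |
A logical argument (default=FALSE) controlling the correction for infinite segregation index values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Zseg.ind only |
lab |
The vector of class labels (numerical or categorical), used in Zseg.ind only |
... |
Further arguments, such as method and p, passed to the dist function; used in Zseg.ind only |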
A list with the elements
statistic |
The matrix of Z test statistics for the segregation indices |
stat.names |
Name of the test statistics |
p.value |
The matrix of p-values for the hypothesis tests for the corresponding alternative |
LCL, UCL |
Matrix of Lower and Upper Confidence Levels for the segregation indices at the given confidence level |
cnf.lvl |
Level of the upper and lower confidence limits of the segregation indices, provided in conf.level |
estimate |
Estimate of the parameter, i.e., matrix of the observed segregation indices |
est.name, est.name2 |
Names of the estimates, both are the same in this function |
null.value |
Hypothesized values for the parameters, i.e., the null values of the segregation indices, which are all 0 under RL or CSR |
null.name |
Name of the null value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, ct, returned by Zseg.ind.ct only |
data.name |
Name of the data set, dat, returned by Zseg.ind only |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
seg.ind and Zseg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct

seg.ind(ct)
seg.ind(ct,inf.corr=TRUE)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
varN

Zseg.ind(Y,cls)
Zseg.ind(Y,cls,inf.corr=TRUE)
Zseg.ind.ct(ct,varN)
Zseg.ind(Y,cls,alt="g")
Zseg.ind.ct(ct,varN,alt="g")
Zseg.ind(Y,cls,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Zseg.ind(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
varN

Zseg.ind(Y,cls)
Zseg.ind(Y,cls,inf.corr = TRUE)
Zseg.ind.ct(ct,varN)
Zseg.ind.ct(ct,varN,inf.corr = TRUE)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)

W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)

Zseg.ind(X,cls)
Zseg.ind.ct(ct,varN)
Zseg.ind.ct(ct,varN,inf.corr=TRUE)
Two functions: Zself.ref.ct and Zself.ref.
Both functions return objects of class "htest" but take different arguments (see the parameter list below).
Each one performs a hypothesis test of self reflexivity in the NN structure using the number of self-reflexive NN pairs (i.e., the first diagonal entry, $N_{11}$) in the RCT for $k \ge 2$ classes.
That is, each performs a test of self reflexivity (corresponding to entry $N_{11}$ in the RCT), which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data. (See Ceyhan and Bahadir (2017) for more detail.)
The self reflexivity test is based on the normal approximation of the diagonal entry $N_{11}$ in the RCT and is due to Ceyhan and Bahadir (2017).
Each function yields the test statistic, the $p$-value for the corresponding alternative, the confidence interval, the sample estimate (i.e., observed value) and null (i.e., expected) value for the self reflexivity value (i.e., the diagonal entry $N_{11}$) in the RCT, and the method and name of the data set used.
The null hypothesis is that $E(N_{11}) = R P_{aa}$ in the RCT, where $R$ is the number of reflexive NNs and $P_{aa}$ is the probability that any two selected points are from the same class.
The Zself.ref functions (i.e., Zself.ref.ct and Zself.ref) are different from the Znnref functions (i.e., Znnref.ct and Znnref), from the Znnself functions (i.e., Znnself.ct and Znnself), and also from the Znnself.sum functions (i.e., Znnself.sum.ct and Znnself.sum):
Zself.ref functions test self reflexivity for the entire data set using entry $N_{11}$ in the RCT;
Znnself functions test self reflexivity at a class-specific level (i.e., for each class) using the first column in the SCCT;
Znnref functions test self reflexivity and mixed non-reflexivity using the diagonal entries in the RCT;
Znnself.sum functions test the cumulative species correspondence using the sum of the self column (i.e., the first column) in the SCCT.
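The counts that feed a reflexivity table can be sketched in base R as follows. This is only an illustration: it cross-classifies each (base point, NN) pair as self or mixed and as reflexive or non-reflexive, assuming Euclidean distances without ties. The row and column layout of the package's rct output may differ, and all names below are illustrative.

```r
set.seed(2)
n <- 30
Y <- matrix(runif(2 * n), ncol = 2)
cls <- sample(1:2, n, replace = TRUE)

D <- as.matrix(dist(Y))
diag(D) <- Inf                       # exclude each point from its own NN search
nn <- apply(D, 1, which.min)         # nn[i] = index of the NN of point i

reflexive <- nn[nn] == seq_len(n)    # TRUE if i and its NN are each other's NN
self <- cls == cls[nn]               # TRUE if i and its NN are in the same class

# cross-classification of the n (base point, NN) pairs
tab <- table(pair = ifelse(self, "self", "mixed"),
             type = ifelse(reflexive, "ref", "non-ref"))
tab

# number of points in self-reflexive pairs (the quantity the test is built on)
n.self.ref <- sum(self & reflexive)
n.self.ref
```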
Zself.ref.ct(
  rfct,
  nvec,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Zself.ref(
  dat,
  lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
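rfct |
An RCT, used in Zself.ref.ct only |
nvec |
The vector of class sizes, used in Zself.ref.ct only |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
conf.level |
Level of the upper and lower confidence limits, default is 0.95 |
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in Zself.ref only |
lab |
The vector of class labels (numerical or categorical), used in Zself.ref only |
... |
Further arguments, such as method and p, passed to the dist function; used in Zself.ref only |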
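A list with the elements
statistic |
The Z test statistic for self reflexivity corresponding to entry $N_{11}$ in the RCT |
p.value |
The p-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the self reflexivity value (i.e., diagonal entry $N_{11}$ value) in the RCT at the given confidence level conf.level |
estimate |
Estimate of the parameter, i.e., the observed diagonal entry $N_{11}$ in the RCT |
null.value |
Hypothesized null value for the self reflexivity value (i.e., expected value of the diagonal entry $N_{11}$ in the RCT) |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
ct.name |
Name of the contingency table, rfct, returned by Zself.ref.ct only |
data.name |
Name of the data set, dat, returned by Zself.ref only |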
Elvan Ceyhan
Ceyhan E, Bahadir S (2017). “Nearest Neighbor Methods for Testing Reflexivity.” Environmental and Ecological Statistics, 24(1), 69-108.
Znnref.ct, Znnref, Zmixed.nonref.ct and Zmixed.nonref
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Zself.ref(Y,cls)
Zself.ref(Y,cls,method="max")
Zself.ref.ct(rfct,nvec)
Zself.ref.ct(rfct,nvec,alt="g")

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
nvec<-as.numeric(table(cls))
rfct<-rct(ipd,cls)

Zself.ref(Y,cls,alt="g")
Zself.ref.ct(rfct,nvec)
Zself.ref.ct(rfct,nvec,alt="l")
Z-test for Cuzick and Edwards $T_k^{inv}$ statistic
Two functions: ZTkinv and ZTkinv.sim, each of which returns an object of class "htest" performing a $Z$-test for the Cuzick and Edwards $T_k^{inv}$ test statistic. See ceTkinv for a description of the $T_k^{inv}$ test statistic.
The function ZTkinv performs a $Z$-test for $T_k^{inv}$ using asymptotic normality with a simulation-estimated variance under RL of cases and controls to the given points.
The function ZTkinv.sim performs a test for $T_k^{inv}$ based on MC simulations under the RL hypothesis.
Asymptotic normality of $T_k^{inv}$ is not yet established, but it seems likely according to Cuzick and Edwards (1990).
If asymptotic normality holds, a larger sample size would probably be needed before it becomes an effective approximation.
Hence, to be safe, the simulation-based test ZTkinv.sim is recommended.
When ZTkinv is used, this is also highlighted with the warning "asymptotic normality of $T_k^{inv}$ is not yet established, so simulation-based test is recommended".
All arguments are common to both functions, except for ... and Nvar.sim, which are used in ZTkinv only, and Nsim, which is used in ZTkinv.sim only.
The argument cc.lab is the case-control label, 1 for case and 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
The argument Nvar.sim represents the number of resamplings (without replacement) in the RL scheme, with default 1000, for estimating the variance of the $T_k^{inv}$ statistic in ZTkinv.
The argument Nsim represents the number of resamplings (without replacement) in the RL scheme, with default 1000, for estimating the $T_k^{inv}$ values in ZTkinv.sim.
Both functions might take a very long time when the data size is large or Nsim is large.
See also Cuzick and Edwards (1990) and the references therein.
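The simulation-based approach can be sketched in base R as a random-labeling (RL) permutation test. The statistic below is an assumed reading of $T_k^{inv}$ (for each case, the number of other cases closer than its k-th nearest control, summed over cases); see ceTkinv for the package's actual definition. The helper name Tkinv.sketch, the add-one p-value convention and the simulated data are all illustrative choices, not the package's implementation.

```r
# Assumed statistic: for each case, count the other cases that are closer
# than its k-th nearest control, then sum over all cases
Tkinv.sketch <- function(D, cc, k) {
  cases <- which(cc == 1)
  ctrls <- which(cc == 0)
  total <- 0
  for (i in cases) {
    dk <- sort(D[i, ctrls])[k]                  # distance to k-th nearest control
    total <- total + sum(D[i, setdiff(cases, i)] < dk)
  }
  total
}

set.seed(123)
n <- 20
Y <- matrix(runif(2 * n), ncol = 2)
cc <- sample(rep(0:1, each = n / 2))            # 1 = case, 0 = control
D <- as.matrix(dist(Y))

obs <- Tkinv.sketch(D, cc, k = 2)

# RL: keep the points fixed and permute the labels without replacement
Nsim <- 199
sim <- replicate(Nsim, Tkinv.sketch(D, sample(cc), k = 2))

# one-sided Monte Carlo p-value (large values suggest clustering of cases)
p.greater <- (1 + sum(sim >= obs)) / (Nsim + 1)
p.greater
```

Because the distance matrix is computed once and only the labels are permuted, the cost per resample is dominated by the sorting step for each case.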
ZTkinv(
  dat,
  k,
  cc.lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  case.lab = NULL,
  Nvar.sim = 1000,
  ...
)

ZTkinv.sim(
  dat,
  k,
  cc.lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  case.lab = NULL,
  Nsim = 1000
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point, used in both functions. |
k |
Integer specifying the number of the closest controls to subject i, used in both functions. |
cc.lab |
Case-control labels, 1 for case, 0 for control, used in both functions. |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater", used in both functions. |
conf.level |
Level of the upper and lower confidence limits, default is 0.95, used in both functions. |
case.lab |
The label used for cases in cc.lab, default is NULL, used in both functions. |
Nvar.sim |
The number of simulations, i.e., the number of resamplings under the RL scheme to estimate the variance of Tkinv, used in ZTkinv only. Default is 1000. |
... |
Further arguments, such as method and p, passed to the dist function; used in ZTkinv only. |
Nsim |
The number of simulations, i.e., the number of resamplings under the RL scheme to estimate the $T_k^{inv}$ values, used in ZTkinv.sim only. Default is 1000. |
A list with the elements
statistic |
The Z test statistic for the Cuzick and Edwards $T_k^{inv}$ test |
p.value |
The p-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the Cuzick and Edwards $T_k^{inv}$ value at the given confidence level conf.level. Z-critical values are used in the construction of the confidence interval in ZTkinv, while the percentile values are used in the generated sample of $T_k^{inv}$ values in ZTkinv.sim |
estimate |
Estimate of the parameter, i.e., the Cuzick and Edwards $T_k^{inv}$ value |
null.value |
Hypothesized null value for the Cuzick and Edwards $T_k^{inv}$ value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-10 #try also 20, 50, 100
set.seed(123)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(5,5))
k<-2

ZTkinv(Y,k,cls)
ZTkinv(Y,k,cls+1,case.lab = 2,alt="l")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ZTkinv(Y,k,fcls,case.lab="a")

n<-10 #try also 20, 50, 100
set.seed(123)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(5,5))
k<-2 # try also 3,5

ZTkinv.sim(Y,k,cls)
ZTkinv.sim(Y,k,cls,conf=.9,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ZTkinv.sim(Y,k,fcls,case.lab="a")

#with k=1
ZTkinv.sim(Y,k=1,cls)
ZTrun(Y,cls)
Returns the index matrix for choosing the entries in the covariance matrix for NNCT
used for computing the covariance for Dixon's NN symmetry test.
Each row of the matrix is the index pair i,j of the corresponding entry in the NNCT.
ind.nnsym(k)
k |
An integer specifying the number of classes in the data set |
The index matrix in which each row is the index pair i,j of the corresponding entry in the NNCT
Elvan Ceyhan
Returns the index matrix for choosing the entries in the covariance matrix for NNCT
used for computing the covariance for the extension of Pielou's segregation coefficient to the multi-class
case. Each row of the matrix is the index pair i,j of the corresponding entry in the NNCT.
ind.seg.coeff(k)
k |
An integer specifying the number of classes in the data set |
The index matrix in which each row is the index pair i,j of the corresponding entry in the NNCT
Elvan Ceyhan
cov.seg.coeff, seg.coeff and ind.nnsym
This function computes and returns the distance matrix between the rows of the sets of points x and y, using the specified distance measure and the dist function in the stats package of the standard R distribution.
If y is provided (default=NULL), it yields a matrix of distances between the rows of x and the rows of y. Otherwise, it provides a square matrix with i,j-th entry being the distance between row i and row j of x.
This function is different from the dist function in the stats package: dist returns the distance matrix in a lower triangular form, and ipd.mat returns it as a full matrix.
... are for further arguments, such as method and p, passed to the dist function.
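The difference between the lower-triangular output of dist and a full distance matrix can be seen with base R alone. The sketch below is not the package's implementation; it only shows one way to obtain a full matrix, and a cross-distance matrix between two point sets, from dist. The variable names are illustrative.

```r
X <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d <- dist(X)           # object of class "dist": lower triangle only
M <- as.matrix(d)      # full symmetric 3 x 3 matrix with zero diagonal
M

# Distances between the rows of X and the rows of a second set Y:
# stack both sets, compute all distances, keep the X-by-Y block
Y <- matrix(c(0, 0,
              3, 0), ncol = 2, byrow = TRUE)
nx <- nrow(X)
XY <- as.matrix(dist(rbind(X, Y)))[1:nx, nx + 1:nrow(Y), drop = FALSE]
XY
```

Stacking the two sets computes more distances than needed, so for large inputs a direct loop over the X-by-Y pairs may be preferable.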
ipd.mat(x, y = NULL, ...)
x |
A set of points in matrix or data frame form where points correspond to the rows. |
y |
A set of points in matrix or data frame form where points correspond to the rows (default=NULL). |
... |
Additional parameters to be passed on to the dist function. |
A distance matrix whose i,j-th entry is the distance between row i of x and row j of y if y is provided;
otherwise the i,j-th entry is the distance between rows i and j of x.
Elvan Ceyhan
dist, ipd.mat.euc, dist.std.data
#3D data points
n<-3
X<-matrix(runif(3*n),ncol=3)
mtd<-"euclidean" #try also "maximum", "manhattan", "canberra", "binary"
ipd.mat(X,method=mtd)
ipd.mat(X,method="minkowski",p=6)

n<-5
Y<-matrix(runif(3*n),ncol=3)
ipd.mat(X,Y,method=mtd)
ipd.mat(X[1,],Y,method=mtd)
ipd.mat(c(.1,.2,.3),Y,method=mtd)
ipd.mat(X[1,],Y[3,],method=mtd)

#1D data points
X<-as.matrix(runif(3)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(3) would not work
ipd.mat(X)

Y<-as.matrix(runif(5))
ipd.mat(X,Y)
ipd.mat(X[1,],Y)
ipd.mat(X[1,],Y[3,])
Returns the Euclidean interpoint distance (IPD) matrix of the given sets of points x and y, using two for loops with the euc.dist function of the current package.
If y is provided (default=NULL), it yields a matrix of Euclidean distances between the rows of x and the rows of y;
otherwise it provides a square matrix with i,j-th entry being the Euclidean distance between row i and row j of x.
This function is different from the ipd.mat function in this package.
ipd.mat returns the full distance matrix for a variety of distance metrics (including the Euclidean metric), while ipd.mat.euc uses the Euclidean distance metric only.
ipd.mat.euc(X) and ipd.mat(X) yield the same output for a set of points X, as the default metric in ipd.mat is also "euclidean".
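The two-loop construction described above can be sketched in base R as follows. The helper names euc and ipd.loops are hypothetical stand-ins, not the package's euc.dist or ipd.mat.euc; the final comparison checks the loop result against the full matrix obtained from dist.

```r
# hypothetical helper: Euclidean distance between two numeric vectors
euc <- function(a, b) sqrt(sum((a - b)^2))

# hypothetical two-loop version of a full Euclidean IPD matrix
ipd.loops <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      D[i, j] <- euc(x[i, ], x[j, ])
    }
  }
  D
}

set.seed(3)
X <- matrix(runif(12), ncol = 3)

D.loops <- ipd.loops(X)
D.dist  <- unname(as.matrix(dist(X)))   # default method is "euclidean"
all.equal(D.loops, D.dist)
```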
ipd.mat.euc(x, y = NULL)
x |
A set of points in matrix or data frame form where points correspond to the rows. |
y |
A set of points in matrix or data frame form where points correspond to the rows (default=NULL). |
A distance matrix whose i,j-th entry is the Euclidean distance between row i of x and row j of y if y is provided;
otherwise the i,j-th entry is the Euclidean distance between rows i and j of x.
Elvan Ceyhan
dist, ipd.mat, dist.std.data
#3D data points
n<-3
X<-matrix(runif(3*n),ncol=3)
ipd.mat.euc(X)

n<-5
Y<-matrix(runif(3*n),ncol=3)
ipd.mat.euc(X,Y)
ipd.mat.euc(X[1,],Y)
ipd.mat.euc(c(.1,.2,.3),Y)
ipd.mat.euc(X[1,],Y[3,])

#1D data points
X<-as.matrix(runif(3)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(3) would not work
ipd.mat.euc(X)

Y<-as.matrix(runif(5))
ipd.mat.euc(X,Y)
ipd.mat.euc(X[1,],Y)
ipd.mat.euc(X[1,],Y[3,])
k NNs of a given point
Returns the indices of the k nearest neighbors of subject $i$ given data set or IPD matrix x. Subject indices correspond to rows (i.e. rows 1:n) if x is the data set and to rows or columns if x is the IPD matrix.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
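As a rough illustration of what finding the k NNs from an IPD matrix involves, here is a base R sketch. The helper knn_from_ipd is hypothetical (it is not the package's kNN) and, as an assumption of this sketch, ties are broken by index order.

```r
# Hypothetical helper: indices of the k nearest neighbors of subject i,
# read off row i of an inter-point distance (IPD) matrix
knn_from_ipd <- function(ipd, i, k) {
  d <- ipd[i, ]
  d[i] <- Inf             # a subject is not its own neighbor
  order(d)[seq_len(k)]    # ties are broken by index order in this sketch
}

X <- matrix(c(0, 1, 3, 7), ncol = 1)   # four points on the line
ipd <- as.matrix(dist(X))
knn_from_ipd(ipd, 1, 2)   # neighbors of the point at 0
knn_from_ipd(ipd, 3, 2)   # neighbors of the point at 3
```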
kNN(x, i, k, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
i |
index of (i.e., row number for) the subject whose NN is to be found. |
k |
Integer specifying the number of NNs (of subject |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the indices (i.e. row numbers) of the k NNs of subject $i$
Elvan Ceyhan
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
k<-sample(1:5,1)
k
NN(ipd,1)
kNN(ipd,1,k)
kNN(Y,1,k,is.ipd = FALSE)
kNN(Y,1,k,is.ipd = FALSE,method="max")

NN(ipd,5)
kNN(ipd,5,k)
kNN(Y,5,k,is.ipd = FALSE)

#1D data points
X<-as.matrix(runif(15)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
kNN(ipd,3,k)

#with possible ties in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ny<-nrow(Y)
ipd<-ipd.mat(Y)
for (i in 1:ny)
  cat(i,":",kNN(ipd,i,k),"\n")
Converts the contingency table (or any matrix) ct to a vector by default row-wise (i.e., by appending each row one after the other) or column-wise, and also returns the entry indices (in the original matrix ct) in a matrix.
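The row-wise versus column-wise flattening described above can be sketched in base R. The helper mat_to_vec below is hypothetical and only mirrors the described output (a vector plus a two-column matrix of the original row and column indices); it is not the package's mat2vec.

```r
# Hypothetical helper: flatten a matrix and keep track of the (row, col) index of each entry
mat_to_vec <- function(m, byrow = TRUE) {
  idx <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))  # column-major order
  if (byrow) idx <- idx[order(idx$row, idx$col), ]                    # switch to row-major order
  idx <- as.matrix(idx)
  rownames(idx) <- NULL
  list(vec = m[idx], ind = idx)   # a two-column index matrix picks individual entries
}

M <- matrix(1:6, nrow = 2)        # rows are (1, 3, 5) and (2, 4, 6)
res_row <- mat_to_vec(M)
res_col <- mat_to_vec(M, byrow = FALSE)
res_row$vec
res_col$vec
```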
mat2vec(ct, byrow = TRUE)
ct |
A matrix, in particular a contingency table |
byrow |
A logical argument (default= |
A list with two elements
vec |
The |
ind |
The |
Elvan Ceyhan
ind.nnsym and ind.seg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
mat2vec(ct)
mat2vec(ct,byrow=FALSE)

#an arbitrary 3x3 matrix
M<-matrix(sample(10:20,9),ncol=3)
M
mat2vec(M)
mat2vec(M,byrow=FALSE)
Computes the square root of the matrix $A$, where $A$ does not have to be a square matrix, when the square root exists.
See https://people.orie.cornell.edu/davidr/SDAFE2/Rscripts/SDAFE2.R
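For intuition about what a matrix square root is, the following base R sketch handles only the special case of a symmetric positive semi-definite matrix through its eigendecomposition. It is an assumption-laden illustration: sym_sqrt is a hypothetical helper, and it does not reproduce how matrix.sqrt treats general or non-square matrices.

```r
# Hypothetical helper: square root R of a symmetric positive semi-definite S, so that R %*% R = S
sym_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  if (any(e$values < -1e-12)) stop("no real square root: negative eigenvalue")
  vals <- sqrt(pmax(e$values, 0))
  e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
}

S <- matrix(c(4, 1, 1, 3), ncol = 2)
R <- sym_sqrt(S)
all.equal(R %*% R, S)   # squaring the root recovers S
```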
matrix.sqrt(A)
A |
A matrix, not necessarily square |
Returns the square root of $A$, if it exists, otherwise gives an error message.
Elvan Ceyhan
A<-matrix(sample(20:40,4),ncol=2)
matrix.sqrt(A)

A<-matrix(sample(20:40,16),ncol=4)
matrix.sqrt(A)
#sqrt of inverse of A, or sqrt inverse of A
matrix.sqrt(solve(A))

#non-square matrix
A<-matrix(sample(20:40,20),ncol=4)
matrix.sqrt(A)
Returns the Qvec and R where Qvec $=(Q_0,Q_1,\ldots)$ with $Q_j$ being the number of points shared as a NN by $j$ other points (i.e., the number of points that are NN of $j$ points), for $j=0,1,2,\ldots$, and R is the number of reflexive pairs where A and B are reflexive iff they are NN to each other.
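The quantities described above can be computed by hand on a toy data set, as in the base R sketch below. It is not the package's Ninv: ties are ignored (the first NN is taken), and because the text speaks of both reflexive pairs and reflexive points, the sketch reports both counts rather than asserting which one the function returns.

```r
X <- c(0, 1, 3, 3.5, 10)            # five points on the line
D <- as.matrix(dist(X))
diag(D) <- Inf                      # a point is not its own neighbor
nn <- unname(apply(D, 1, which.min))          # index of the NN of each point
indeg <- tabulate(nn, nbins = length(X))      # number of points having point j as their NN
Qvec <- tabulate(indeg + 1)                   # Q_0, Q_1, Q_2, ...
n_reflexive_points <- sum(nn[nn] == seq_along(nn))  # points whose NN points back to them
n_reflexive_pairs <- n_reflexive_points / 2
Qvec
c(points = n_reflexive_points, pairs = n_reflexive_pairs)
```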
Ninv(x, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns a list with two elements
Qvec |
vector of |
R |
number of reflexive points |
Elvan Ceyhan
Qval, Qvec, sharedNN, Rval and QRval
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
Ninv(ipd)
Ninv(Y,is.ipd = FALSE)
Ninv(Y,is.ipd = FALSE,method="max")

#1D data points
n<-15
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
Ninv(ipd)

#with possible ties in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ny<-nrow(Y)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
sharedNN(W)
Qvec(W)
Ninv(ipd)
Returns the index (or indices) of the nearest neighbor(s) of subject $i$ given data set or IPD matrix x. It will yield a vector if there are ties, and subject indices correspond to rows (i.e. rows 1:n) if x is the data set and to rows or columns if x is the IPD matrix.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
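The remark that ties yield a vector of indices can be illustrated with a small base R sketch. The helper nn_with_ties is hypothetical and not the package's NN; it simply returns every index attaining the minimum distance.

```r
# Hypothetical helper: all nearest neighbors of subject i, so ties give more than one index
nn_with_ties <- function(ipd, i) {
  d <- ipd[i, ]
  d[i] <- Inf                    # exclude the subject itself
  unname(which(d == min(d)))     # every index at the minimum distance
}

X <- matrix(c(0, 1, 2, 5), ncol = 1)
ipd <- as.matrix(dist(X))
nn_with_ties(ipd, 2)   # the point at 1 is equally close to the points at 0 and 2
nn_with_ties(ipd, 4)   # the point at 5 has a unique NN
```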
NN(x, i, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
i |
index of (i.e., row number for) the subject whose NN is to be found. |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the index (indices), i.e. row number(s), of the NN of subject $i$
Elvan Ceyhan
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
NN(ipd,1)
NN(Y,1,is.ipd = FALSE)
NN(ipd,5)
NN(Y,5,is.ipd = FALSE)
NN(Y,5,is.ipd = FALSE,method="max")

#1D data points
X<-as.matrix(runif(15)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
NN(ipd,1)
NN(ipd,5)

#with possible ties in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ny<-nrow(Y)
ipd<-ipd.mat(Y)
for (i in 1:ny)
  cat(i,":",NN(ipd,i),"|",NN(Y,i,is.ipd = FALSE),"\n")
Returns the $k \times k$ NNCT given the IPD matrix or data set x where $k$ is the number of classes in the data set. Rows and columns of the NNCT are labeled with the corresponding class labels.
The argument ties is a logical argument (default=FALSE) to take ties into account or not. If TRUE a NN contributes to the NN count if it is one of the tied NNs of a subject.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
See also (Dixon (1994, 2002); Ceyhan (2010, 2017)) and the references therein.
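To make the construction of an NNCT concrete, here is a base R sketch that cross-tabulates the class of each base point against the class of its NN. The helper nnct_sketch is hypothetical; it ignores ties (first NN only), so it corresponds at best to the ties = FALSE case and is not the package's nnct.

```r
# Hypothetical helper: nearest neighbor contingency table from an IPD matrix and class labels
nnct_sketch <- function(ipd, lab) {
  lab <- factor(lab)
  D <- ipd
  diag(D) <- Inf
  nn <- apply(D, 1, which.min)        # first NN of each point; ties are ignored here
  table(base = lab, NN = lab[nn])     # rows: base class, columns: NN class
}

X <- c(0, 1, 3, 3.5, 10)
lab <- c("a", "a", "b", "b", "a")
ct <- nnct_sketch(as.matrix(dist(X)), lab)
ct
```

The entries sum to the number of points, since each point contributes exactly one (base, NN) pair in this tie-free version.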
nnct(x, lab, ties = FALSE, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
lab |
The |
ties |
A logical argument (default= |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the $k \times k$ NNCT where $k$ is the number of classes in the data set.
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)
nnct(ipd,cls,ties = TRUE)

nnct(Y,cls,is.ipd = FALSE)
nnct(Y,cls,is.ipd = FALSE,method="max")
nnct(Y,cls,is.ipd = FALSE,method="mink",p=6)

#with one class, it works but really uninformative
cls<-rep(1,n)
nnct(ipd,cls)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nnct(ipd,fcls)

#cls as an unsorted factor
fcls1<-sample(c("a","b"),n,replace = TRUE)
nnct(ipd,fcls1)

fcls2<-sort(fcls1)
nnct(ipd,fcls2)
#ipd needs to be sorted as well, otherwise this result will not agree with fcls1

nnct(Y,fcls1,ties = TRUE,is.ipd = FALSE)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)
nnct(Y,cls,is.ipd = FALSE)

#cls as a factor
fcls<-rep(letters[1:4],rep(10,4))
nnct(ipd,fcls)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nnct(ipd,fcls)

#with possible ties in the data
Y<-matrix(round(runif(3*n)*10),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)
nnct(ipd,cls,ties = TRUE)
Returns the $k \times k$ NNCT with sampling replacement of the points for each base point. That is, for each base point, the rows in the IPD matrix are sampled with replacement and the NN counts are updated accordingly. Rows and columns of the NNCT are labeled with the corresponding class labels.
The argument self is a logical argument (default=TRUE) for including the base point in the resampling or not. If TRUE, for each base point all entries in the row are sampled (with replacement) so the point itself can also be sampled multiple times, and if FALSE the point is excluded from the resampling (i.e. other points are sampled with replacement).
The argument ties is a logical argument (default=TRUE) to take ties into account or not. If TRUE a NN contributes to the NN count if it is one of the tied NNs of a subject.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
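One possible reading of this resampling scheme is sketched below in base R. It is a loose illustration only: nnct_boot_sketch is a hypothetical helper, it covers just the case where the base point is excluded from the resampling (analogous to self = FALSE), it ignores ties, and the package's actual procedure may differ in its details.

```r
# Hypothetical helper: for each base point, resample the other points with replacement
# and take the NN among the resampled candidates
nnct_boot_sketch <- function(ipd, lab) {
  lab <- factor(lab)
  n <- nrow(ipd)
  nn <- integer(n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)                         # base point excluded
    cand <- sample(others, length(others), replace = TRUE)   # resampled candidates
    nn[i] <- cand[which.min(ipd[i, cand])]                   # closest resampled candidate
  }
  table(base = lab, NN = lab[nn])
}

set.seed(123)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(1:2, n, replace = TRUE)
ct_boot <- nnct_boot_sketch(as.matrix(dist(Y)), cls)
ct_boot
```

Because a point's true NN may be left out of the resample, the table generally differs from the ordinary NNCT and changes from run to run.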
nnct.boot.dis(x, lab, self = TRUE, ties = TRUE, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
lab |
The |
self |
A logical argument (default= |
ties |
A logical argument (default= |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the $k \times k$ NNCT where $k$ is the number of classes in the data set with sampling replacement of the rows of the IPD matrix.
Elvan Ceyhan
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct.boot.dis(ipd,cls)
nnct.boot.dis(Y,cls,is.ipd = FALSE) #may give different result from above due to random sub-sampling
nnct.boot.dis(ipd,cls,self = FALSE)
nnct.boot.dis(ipd,cls,ties = FALSE) #differences are due to ties and resampling of distances

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nnct.boot.dis(ipd,fcls)

#cls as an unsorted factor
fcls<-sample(c("a","b"),n,replace = TRUE)
nnct.boot.dis(ipd,fcls)

fcls<-sort(fcls)
nnct.boot.dis(ipd,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct.boot.dis(ipd,cls)

#cls as a factor
fcls<-rep(letters[1:4],rep(10,4))
nnct.boot.dis(ipd,fcls)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct.boot.dis(ipd,cls)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nnct.boot.dis(ipd,fcls)

#with possible ties in the data
Y<-matrix(round(runif(3*n)*10),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct.boot.dis(ipd,cls)
nnct.boot.dis(ipd,cls,self = FALSE)
nnct.boot.dis(ipd,cls,ties = FALSE) #differences are due to ties and resampling of distances
Returns the $k \times k$ NNCT with (only) base points restricted to be in the subset of indices ss using the IPD matrix or data set x where $k$ is the number of classes in the data set. That is, the base points are the points with indices in ss but for the NNs the function checks all the points in the data set (including the points in ss). Rows and columns of the NNCT are labeled with the corresponding class labels.
The argument ties is a logical argument (default=FALSE) to take ties into account or not. If TRUE a NN contributes to the NN count if it is one of the tied NNs of a subject.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
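The distinction between restricted base points and unrestricted NNs can be seen in a small base R sketch. The helper nnct_sub_sketch is hypothetical, ignores ties, and is not the package's nnct.sub.

```r
# Hypothetical helper: NNCT where only the points in ss act as base points,
# while their NNs are searched among all points
nnct_sub_sketch <- function(ss, ipd, lab) {
  lab <- factor(lab)
  D <- ipd
  diag(D) <- Inf
  nn <- apply(D[ss, , drop = FALSE], 1, which.min)   # NN among all points, for base points in ss
  table(base = lab[ss], NN = lab[nn])
}

X <- c(0, 1, 3, 3.5, 10)
lab <- c("a", "a", "b", "b", "a")
ss <- c(1, 3, 5)
ct_sub <- nnct_sub_sketch(ss, as.matrix(dist(X)), lab)
ct_sub
```

Here point 5 is in ss and its NN is point 4, which is not in ss but still counts as a NN.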
nnct.sub(ss, x, lab, ties = FALSE, is.ipd = TRUE, ...)
ss |
indices of subjects (i.e., row indices in the data set) chosen to be the base points |
x |
The IPD matrix (if |
lab |
The |
ties |
A logical argument (default= |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the $k \times k$ NNCT where $k$ is the number of classes in the data set with (only) base points restricted to a subsample ss.
Elvan Ceyhan
nnct and nnct.boot.dis
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)

#subsampling indices
ss<-sample(1:n,floor(n/2))
nnct.sub(ss,ipd,cls)
nnct.sub(ss,Y,cls,is.ipd = FALSE)
nnct.sub(ss,ipd,cls,ties = TRUE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
nnct.sub(ss,ipd,fcls)

#cls as an unsorted factor
fcls<-sample(c("a","b"),n,replace = TRUE)
nnct(ipd,fcls)
nnct.sub(ss,ipd,fcls)

fcls<-sort(fcls)
nnct.sub(ss,ipd,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ss<-sample(1:40,30)
nnct.sub(ss,ipd,cls)

#cls as a factor
fcls<-rep(letters[1:4],rep(10,4))
nnct.sub(ss,ipd,fcls)

#1D data points
n<-20 #or try sample(1:20,1)
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
nnct(ipd,cls)

#subsampling indices
ss<-sample(1:n,floor(n/2))
nnct.sub(ss,ipd,cls)

#with possible ties in the data
n<-40 #the data below has 40 rows, so labels and indices must match this size
Y<-matrix(round(runif(3*n)*10),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)
ss<-sample(1:n,30)
nnct.sub(ss,ipd,cls)
nnct.sub(ss,ipd,cls,ties = TRUE)
Returns the distances between subjects and their NNs. The output is an $n \times 2$ matrix where $n$ is the data size and first column is the subject index and second column contains the corresponding distances to NN subjects.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
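The two-column output described above can be mimicked in base R as follows. The helper nn_dist_sketch and its column names are assumptions made for this illustration and are not taken from the package's NNdist.

```r
# Hypothetical helper: subject index and distance to its NN, one row per subject
nn_dist_sketch <- function(ipd) {
  D <- ipd
  diag(D) <- Inf                                   # ignore the zero self-distances
  cbind(index = seq_len(nrow(D)), nn.dist = unname(apply(D, 1, min)))
}

X <- c(0, 1, 3, 3.5, 10)
res <- nn_dist_sketch(as.matrix(dist(X)))
res
```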
NNdist(x, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns an $n \times 2$ matrix where $n$ is data size (i.e. number of subjects) and first column is the subject index and second column is the NN distances.
Elvan Ceyhan
kthNNdist, kNNdist, and NNdist2cl
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
NNdist(ipd)
NNdist(Y,is.ipd = FALSE)
NNdist(Y,is.ipd = FALSE,method="max")

#1D data points
X<-as.matrix(runif(5)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(5) would not work
ipd<-ipd.mat(X)
NNdist(ipd)
NNdist(X,is.ipd = FALSE)
Distances between subjects from class $i$ and their NNs from class $j$
Returns the distances between subjects from class $i$ and their nearest neighbors (NNs) from class $j$. The output is a list with first entry (nndist) being an $n_i \times 3$ matrix where $n_i$ is the size of class $i$ and first column is the subject index in class $i$, second column is the subject index in NN class $j$, and third column contains the corresponding distances of each class $i$ subject to its NN among class $j$ subjects. Class $i$ is labeled as base class and class $j$ is labeled as NN class.
The argument within.class.ind is a logical argument (default=FALSE) to determine the indexing of the class $i$ subjects. If TRUE, index numbering of subjects is within the class, from 1 to class size (i.e., 1:n_i), according to their order in the original data; otherwise, index numbering within class is just the indices in the original data.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
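A base R sketch of the three-column nndist matrix described above is given here for illustration. The helper nn_dist_2cl_sketch and its column names are hypothetical; it uses indices from the original data (as with within.class.ind = FALSE), ignores ties, and is not the package's NNdist2cl.

```r
# Hypothetical helper: for each subject in base class i, its NN among class j subjects
nn_dist_2cl_sketch <- function(ipd, i, j, lab) {
  base <- which(lab == i)
  cand <- which(lab == j)
  t(sapply(base, function(b) {
    cj <- setdiff(cand, b)               # matters only when i and j are the same class
    m <- cj[which.min(ipd[b, cj])]       # closest class j subject
    c(base.ind = b, nn.ind = m, dist = ipd[b, m])
  }))
}

X <- c(0, 1, 3, 3.5, 10)
lab <- c(1, 1, 2, 2, 1)
res2 <- nn_dist_2cl_sketch(as.matrix(dist(X)), 1, 2, lab)
res2
```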
NNdist2cl(x, i, j, lab, within.class.ind = FALSE, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
i, j |
class labels of the base class and the NN class, respectively. |
lab |
The |
within.class.ind |
A logical parameter (default= |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns a list with three elements
nndist |
|
base.class |
label of base class |
nn.class |
label of NN class |
Elvan Ceyhan
kthNNdist, kNNdist, and NNdist
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)
NNdist2cl(ipd,1,2,clab)
NNdist2cl(Y,1,2,clab,is.ipd = FALSE)

NNdist2cl(ipd,1,2,clab,within.class.ind = TRUE)

#three class case
clab<-sample(1:3,n,replace=TRUE) #class labels
table(clab)
NNdist2cl(ipd,2,1,clab)

#1D data points
n<-15
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
table(clab)
NNdist2cl(ipd,1,2,clab)
NNdist2cl(X,1,2,clab,is.ipd = FALSE)
nnspat is a package for computation of spatial pattern tests based on NN relations and generation of various spatial patterns.
The nnspat package contains the functions for segregation/association tests based on nearest neighbor contingency tables (NNCTs), and tests for species correspondence, NN symmetry and reflexivity based on the corresponding contingency tables and functions for generating patterns of segregation, association, uniformity and various non-random labeling (RL) patterns for disease clustering for data in two (or more) dimensions. See (Dixon (1994); Ceyhan (2010, 2017)).
The nnspat functions
The nnspat functions can be grouped as Auxiliary Functions, NNCT Functions, SCCT Functions, RCT Functions, NN-Symmetry Functions and the Pattern (Generation) Functions.
Auxiliary Functions
Contains the auxiliary functions used in NN methods, such as indices of NNs, number of shared NNs, Q, R and T values, and so on. In all these functions the data sets are either matrices or data frames.
NNCT Functions
Contains the functions for testing segregation/association using the NNCT. The types of the tests are cell-specific tests, class-specific tests and overall tests of segregation. See (Ceyhan (2009, 2010)).
SCCT Functions
Contains the functions used for testing species correspondence using the NNCT. The types are NN self and self-sum tests and the overall test of species correspondence. See (Ceyhan (2018)).
RCT Functions
Contains the functions for testing reflexivity using the reflexivity contingency table (RCT). The types are NN self reflexivity and NN mixed-non reflexivity. See (Ceyhan and Bahadir (2017); Bahadir and Ceyhan (2018)).
NN-Symmetry Functions
Contains the functions for testing NN symmetry using the NNCT and $Q$-symmetry contingency table. The types are NN symmetry and symmetry in shared NN structure. See (Ceyhan (2014)).
Pattern (Generation) Functions
Contains the functions for generation and visualization of spatial patterns of segregation, association, uniformity, clustering and non-RL. See (Ceyhan (2014, 2014)).
Bahadir S, Ceyhan E (2018).
“On the Number of reflexive and shared nearest neighbor pairs in one-dimensional uniform data.”
Probability and Mathematical Statistics, 38(1), 123-137.
Ceyhan E (2009).
“Class-Specific Tests of Segregation Based on Nearest Neighbor Contingency Tables.”
Statistica Neerlandica, 63(2), 149-182.
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2010).
“Exact Inference for Testing Spatial Patterns by Nearest Neighbor Contingency Tables.”
Journal of Probability and Statistical Science, 8(1), 45-68.
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
Ceyhan E (2010).
“Directional clustering tests based on nearest neighbour contingency tables.”
Journal of Nonparametric Statistics, 22(5), 599-616.
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Ceyhan E (2014).
“Segregation indices for disease clustering.”
Statistics in Medicine, 33(10), 1662-1684.
Ceyhan E (2014).
“Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.”
Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Ceyhan E (2018).
“A contingency table approach based on nearest neighbor relations for testing self and mixed correspondence.”
SORT-Statistics and Operations Research Transactions, 42(2), 125-158.
Ceyhan E, Bahadir S (2017).
“Nearest Neighbor Methods for Testing Reflexivity.”
Environmental and Ecological Statistics, 24(1), 69-108.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Returns the index (indices) of the nearest neighbor(s) of subject $i$ (other than subject $i$) among the indices of points provided in the subsample ss using the given data set or IPD matrix x. The indices in ss determine the columns of the IPD matrix to be used in this function. It will yield a vector if there are ties, and subject indices correspond to rows (i.e. rows 1:n) if x is the data set and to rows or columns if x is the IPD matrix.
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
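Restricting the NN search to a subsample amounts to looking only at selected columns of the IPD matrix, as in the base R sketch below. The helper nn_sub_sketch is hypothetical; its element names follow the value list of this help page, but its behavior is only an approximation of NNsub.

```r
# Hypothetical helper: NN of subject i among the subsample ss (excluding i itself)
nn_sub_sketch <- function(ss, ipd, i) {
  cand <- setdiff(ss, i)        # columns of the IPD matrix that may be searched
  d <- ipd[i, cand]
  list(base.ind = i,
       ss.ind = cand[d == min(d)],   # more than one index if there are ties
       ss.dis = min(d))
}

X <- c(0, 1, 3, 3.5, 10)
ipd <- as.matrix(dist(X))
ss <- c(1, 4, 5)
r2 <- nn_sub_sketch(ss, ipd, 2)   # the overall NN of subject 2 is also in ss
r4 <- nn_sub_sketch(ss, ipd, 4)   # subject 4 is in ss and is skipped as its own NN
r2$ss.ind
r4$ss.ind
```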
NNsub(ss, x, i, is.ipd = TRUE, ...)
ss |
indices of subjects (i.e., row indices in the data set) among which the NN of subject $i$ is to be found |
x |
The IPD matrix (if |
i |
index of (i.e., row number for) the subject whose NN is to be found. |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns a list with the elements
base.ind |
index of the base subject |
ss.ind |
the index (indices) i.e. row number(s) of the NN of subject |
ss.dis |
distance from subject |
Elvan Ceyhan
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
#indices of the subsample ss
ss<-sample(1:n,floor(n/2),replace=FALSE)
NNsub(ss,ipd,2)
NNsub(ss,Y,2,is.ipd = FALSE)
NNsub(ss,ipd,5)

#1D data points
n<-15
X<-as.matrix(runif(n)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
#two class case
clab<-sample(1:2,n,replace=TRUE) #class labels
#indices of the subsample ss
ss<-sample(1:n,floor(n/2),replace=FALSE)
NNsub(ss,ipd,2)
NNsub(ss,ipd,5)

#with possible ties in the data
Y<-matrix(round(runif(60)*10),ncol=3)
ipd<-ipd.mat(Y)
ss<-sample(1:20,10,replace=FALSE) #indices of the subsample ss
NNsub(ss,ipd,2)
NNsub(ss,ipd,5)
$N_t$ Value (found with the definition formula)
This function computes the $N_t$ value which is required in the computation of the asymptotic variance of Cuzick and Edwards $T_k$ test. $N_t$ is defined on page 78 of (Cuzick and Edwards (1990)) as follows.
$N_t = \sum\sum_{i \ne l}\sum a_{ij} a_{lj}$ (i.e., the number of triplets $(i,j,l)$ with $i$, $j$, and $l$ distinct so that $j$ is among the $k$NNs of $i$ and $j$ is among the $k$NNs of $l$).
This function yields the same result as the asyvarTk and varTk functions with $Nt inserted at the end.
See (Cuzick and Edwards (1990)) for more details.
Nt.def(a)
a |
The |
Returns the $N_t$ value standing for the number of triplets $(i,j,l)$ with $i$, $j$, and $l$ distinct so that $j$ is among the $k$NNs of $i$ and $j$ is among the $k$NNs of $l$. See the description.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
k<-2 #try also 2,3
a<-aij.mat(Y,k)
Nt.def(a)
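The triplet count described above can be checked on a small data set with base R only. The sketch below is illustrative and is not the package's implementation: `knn_indicator`, `Nt_by_definition` and `Nt_by_colsums` are hypothetical names, Euclidean distance is assumed, and ties among NNs are broken by index order.

```r
# Hypothetical helper (not part of nnspat): a_ij = 1 if j is among the k NNs of i
knn_indicator <- function(Y, k) {
  D <- as.matrix(dist(Y))
  diag(D) <- Inf                      # a point is never its own NN
  n <- nrow(D)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    A[i, order(D[i, ])[seq_len(k)]] <- 1
  }
  A
}

# Direct count: sum over i != l and over j of a_ij * a_lj
Nt_by_definition <- function(A) {
  n <- nrow(A)
  total <- 0
  for (i in seq_len(n)) {
    for (l in seq_len(n)) {
      if (i != l) total <- total + sum(A[i, ] * A[l, ])
    }
  }
  total
}

# Same count from column sums: c_j = number of points having j among their k NNs
Nt_by_colsums <- function(A) {
  cj <- colSums(A)
  sum(cj * (cj - 1))
}

A <- knn_indicator(matrix(c(0, 1, 3), ncol = 1), k = 1)
Nt_by_definition(A)
Nt_by_colsums(A)
```

The column-sum form follows because the entries are 0 or 1, so for each $j$ the ordered pairs $(i,l)$ with $i \ne l$ number $c_j(c_j-1)$.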
$N_t(k,l)$ Value
This function computes the $N_t(k,l)$ value which is required in the computation of the exact and asymptotic variance of Cuzick and Edwards $T_{comb}$ test, which is a linear combination of some $T_k$ tests. $N_t(k,l)$ is defined on page 80 of (Cuzick and Edwards (1990)) as follows. Let $a_{ij}(k)$ be 1 if $j$ is a kNN of $i$ and zero otherwise, and
$N_t(k,l) = \sum_{i} \sum_{m \ne i} \sum_{j} a_{ij}(k) a_{mj}(l)$.
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) in the computations.
See (Cuzick and Edwards (1990)) for more details.
Ntkl(dat, k, l, nonzero.mat = TRUE, ...)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
k , l
|
Integers specifying the number of NNs (of subjects |
nonzero.mat |
A logical argument (default is |
... |
are for further arguments, such as |
Returns the $N_t(k,l)$ value. See the description.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
asycovTkTl
, and covTkTl
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
k<-1 #try also 2,3 or sample(1:5,1)
l<-1 #try also 2,3 or sample(1:5,1)
c(k,l)
Ntkl(Y,k,l)
Ntkl(Y,k,l,nonzero.mat = FALSE)
Ntkl(Y,k,l,method="max")
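As a rough illustration of the quantity computed here, the sketch below counts triplets $(i,j,m)$ with $i \ne m$ such that $j$ is a kNN of $i$ and an lNN of $m$, using base R only. It assumes that this is the definition intended by Cuzick and Edwards (1990); `knn_indicator` and `Ntkl_sketch` are hypothetical names, Euclidean distance is assumed, and ties are broken by index order.

```r
# Hypothetical helper (not part of nnspat): a_ij(k) = 1 if j is among the k NNs of i
knn_indicator <- function(Y, k) {
  D <- as.matrix(dist(Y))
  diag(D) <- Inf
  n <- nrow(D)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    A[i, order(D[i, ])[seq_len(k)]] <- 1
  }
  A
}

# Assumed definition: sum over i != m and over j of a_ij(k) * a_mj(l)
Ntkl_sketch <- function(Y, k, l) {
  Ak <- knn_indicator(Y, k)
  Al <- knn_indicator(Y, l)
  # all (i, m) pairs per column j, minus the terms with i == m
  sum(colSums(Ak) * colSums(Al)) - sum(Ak * Al)
}

Y <- matrix(c(0, 1, 3), ncol = 1)
Ntkl_sketch(Y, 1, 2)
```

With k equal to l the count reduces to the single-k triplet count, which gives a quick consistency check.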
Keeps only the specified labels $i$ and $j$ and returns the data from classes with these labels and also the corresponding label vector having class labels $i$ and $j$ only.
See also (Ceyhan (2017)).
pairwise.lab(dat, lab, i, j)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
lab |
The |
i , j
|
Label of the classes that are to be retained in the post-hoc comparison. |
A list
with two elements
data.pair |
The data set restricted to the points from the two retained classes |
lab.pair |
The |
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
pairwise.lab(Y,cls,1,2)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
pairwise.lab(Y,cls,2,3)

#cls as a factor
fcls<-rep(letters[1:4],rep(10,4))
pairwise.lab(Y,fcls,"b","c")
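The subsetting step described for this function amounts to a logical filter on the label vector. The following base R sketch shows the idea; `pairwise_subset` is a hypothetical name, and the element names `data.pair` and `lab.pair` are taken from the value list above. It is not the package's code.

```r
# Hypothetical re-implementation of the idea: keep rows whose label is i or j
pairwise_subset <- function(dat, lab, i, j) {
  dat <- as.matrix(dat)
  keep <- lab %in% c(i, j)
  list(data.pair = dat[keep, , drop = FALSE],  # points from the two retained classes
       lab.pair = lab[keep])                   # their labels, in the original order
}

dat <- matrix(1:8, ncol = 2)
lab <- c(1, 2, 3, 2)
pairwise_subset(dat, lab, 1, 2)
```

Using `drop = FALSE` keeps the result a matrix even when only one point remains.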
This function finds and returns the k
smallest and k
largest distances in a distance matrix or distance object,
and also provides pairs of objects these distances correspond to.
The code is adapted from
http://people.stat.sc.edu/Hitchcock/chapter1_R_examples.txt.
pick.min.max(ds, k = 1)
ds |
A distance matrix or a distance object |
k |
A positive integer representing the number of (min and max) distances to be presented, default is |
A list
with the elements
min.dis |
The |
ind.min.dis |
The indices (i.e. row numbers) of the |
max.dis |
The |
ind.max.dis |
The indices (i.e. row numbers) of the |
Elvan Ceyhan
dist
, ipd.mat
, and ipd.mat.euc
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
pick.min.max(ipd)
#or pick.min.max(dist(Y))
pick.min.max(ipd,2)
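For readers who want to see how the k smallest and k largest distances and their pairs can be extracted, here is a minimal base R sketch. `k_extreme_pairs` is a hypothetical name and the code is not the package's implementation; it only reuses the element names from the value list above and considers each unordered pair once.

```r
k_extreme_pairs <- function(ds, k = 1) {
  D <- as.matrix(ds)                      # accepts a matrix or a dist object
  ut <- upper.tri(D)                      # each unordered pair exactly once
  d <- D[ut]
  pairs <- which(ut, arr.ind = TRUE)      # same (column-major) order as d
  lo <- order(d)[seq_len(k)]
  hi <- order(d, decreasing = TRUE)[seq_len(k)]
  list(min.dis = d[lo], ind.min.dis = pairs[lo, , drop = FALSE],
       max.dis = d[hi], ind.max.dis = pairs[hi, , drop = FALSE])
}

k_extreme_pairs(dist(c(0, 1, 3)))
```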
Probability of k items selected from the class with size $n_1$
Returns the ratio $n_1(n_1-1) \cdots (n_1-(k-1)) / (n(n-1) \cdots (n-(k-1)))$, which is the probability that the k selected objects are from class 1 with size $n_1$ (denoted as n1 as an argument) and the total data size is n. This probability is valid under RL or CSR.
This function computes the $p_k$ value which is required in the computation of the variance of Cuzick and Edwards $T_k$ test. $p_k$ is defined as the ratio $p_k = n_1(n_1-1) \cdots (n_1-(k-1)) / (n(n-1) \cdots (n-(k-1)))$.
The argument, $n_1$, is the number of cases (denoted as n1 as an argument). The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
See (Cuzick and Edwards (1990)) for more details.
pk(n, n1, k)
n |
A positive integer representing the number of points in the data set |
n1 |
Number of cases |
k |
Integer specifying the number of NNs (of subject |
Returns the probability that k items selected from n items are all from the class of interest (i.e., from the class whose size is $n_1$), that is, the $p_k$ value. See the description.
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
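The probability described in this entry is a ratio of falling factorials, which equals a ratio of binomial coefficients. The sketch below computes it in base R; `pk_ratio` is a hypothetical name used only for illustration and is not the package function.

```r
# Probability that k items drawn without replacement from n are all from a class of size n1
pk_ratio <- function(n, n1, k) {
  j <- 0:(k - 1)
  prod((n1 - j) / (n - j))   # n1(n1-1)...(n1-(k-1)) / (n(n-1)...(n-(k-1)))
}

pk_ratio(10, 4, 2)                 # (4/10) * (3/9)
choose(4, 2) / choose(10, 2)       # same value via binomial coefficients
```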
Plot a Clusters object
Plots the points generated from the pattern (color coded for each class) together with the study window.
## S3 method for class 'Clusters'
plot(x, asp = NA, xlab = "x", ylab = "y", ...)
x |
Object of class |
asp |
A numeric value, giving the aspect ratio for y axis to x-axis y/x (default is |
xlab , ylab
|
Titles for the x and y axes, respectively (default is |
... |
Additional parameters for |
None
#TBF
Plot a SpatPatterns object
Plots the points generated from the pattern (color coded for each class) together with the study window.
## S3 method for class 'SpatPatterns'
plot(x, asp = NA, xlab = "x", ylab = "y", ...)
x |
Object of class |
asp |
A numeric value, giving the aspect ratio for y axis to x-axis y/x (default is |
xlab , ylab
|
Titles for the x and y axes, respectively (default is |
... |
Additional parameters for |
None
#TBF
Print a cellhtest object
Printing objects of class "cellhtest" by simple print methods.
## S3 method for class 'cellhtest'
print(x, digits = getOption("digits"), prefix = "\t", ...)
x |
object of class " |
digits |
number of significant digits to be used. |
prefix |
string, passed to |
... |
Additional parameters for |
None
Print a Chisqtest object
Printing objects of class "Chisqtest" by simple print methods.
## S3 method for class 'Chisqtest'
print(x, digits = getOption("digits"), prefix = "\t", ...)
x |
object of class " |
digits |
number of significant digits to be used. |
prefix |
string, passed to |
... |
Additional parameters for |
None
Print a classhtest object
Printing objects of class "classhtest" by simple print methods.
## S3 method for class 'classhtest'
print(x, digits = getOption("digits"), prefix = "\t", ...)
x |
object of class " |
digits |
number of significant digits to be used. |
prefix |
string, passed to |
... |
Additional parameters for |
None
Print a Clusters object
Prints the call of the object of class 'Clusters' and also the type (or description) of the pattern.
## S3 method for class 'Clusters'
print(x, ...)
x |
A |
... |
Additional arguments for the S3 method ' |
The call of the object of class 'Clusters' and also the type (or description) of the pattern.
summary.Clusters
, print.summary.Clusters
, and plot.Clusters
#TBF (to be filled)
Print a refhtest object
Printing objects of class "refhtest" by simple print methods.
## S3 method for class 'refhtest'
print(x, digits = getOption("digits"), prefix = "\t", ...)
x |
object of class " |
digits |
number of significant digits to be used. |
prefix |
string, passed to |
... |
Additional parameters for |
None
Print a SpatPatterns object
Prints the call of the object of class 'SpatPatterns' and also the type (or description) of the pattern.
## S3 method for class 'SpatPatterns'
print(x, ...)
x |
A |
... |
Additional arguments for the S3 method ' |
The call of the object of class 'SpatPatterns' and also the type (or description) of the pattern.
summary.SpatPatterns
, print.summary.SpatPatterns
, and plot.SpatPatterns
#TBF (to be filled)
Print a summary of a Clusters object
Prints some information about the object.
## S3 method for class 'summary.Clusters'
print(x, ...)
x |
object of class " |
... |
Additional parameters for |
None
print.Clusters
, summary.Clusters
, and plot.Clusters
Print a summary of a SpatPatterns object
Prints some information about the object.
## S3 method for class 'summary.SpatPatterns'
print(x, ...)
x |
object of class " |
... |
Additional parameters for |
None
print.SpatPatterns
, summary.SpatPatterns
, and plot.SpatPatterns
Computes the probability of the observed nearest neighbor contingency table (NNCT), $p_t = f(n_{11} \mid n_1, n_2, c_1; \theta_0)$, where $\theta_0 = (n_1-1)(n_2-1)/(n_1 n_2)$, which is the odds ratio under RL or CSR independence, and $f$ is the probability mass function of the hypergeometric distribution. That is, given the margins of the current NNCT, the probability of obtaining the current table with the odds ratio $\theta$ being the value under the null hypothesis. This value is used to compute the table-inclusive and exclusive $p$-values for the exact inference on NNCTs.
See (Ceyhan (2010)) for more details.
prob.nnct(ct)
ct |
A NNCT |
The probability of getting the observed NNCT, ct
, under the null hypothesis.
Elvan Ceyhan
Ceyhan E (2010). “Exact Inference for Testing Spatial Patterns by Nearest Neighbor Contingency Tables.” Journal of Probability and Statistical Science, 8(1), 45-68.
ct<-matrix(sample(20:40,4),ncol=2)
prob.nnct(ct)

ct<-matrix(sample(20:40,4),ncol=2)
prob.nnct(ct)
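To make the idea of a table probability with a non-unit odds ratio concrete, here is a hedged base R sketch for a 2 x 2 NNCT. It assumes that the hypergeometric pmf with odds ratio $\theta$ is Fisher's noncentral hypergeometric distribution and that the null odds ratio is $(n_1-1)(n_2-1)/(n_1 n_2)$, the odds ratio of the expected cell counts under RL; whether `prob.nnct` uses exactly this parametrization is not shown in this text. `nnct_prob_sketch` is a hypothetical name.

```r
# Probability of the observed n11 given the margins, under odds ratio theta (assumed form)
nnct_prob_sketch <- function(ct, theta = NULL) {
  n1 <- sum(ct[1, ]); n2 <- sum(ct[2, ])     # row sums (class sizes)
  c1 <- sum(ct[, 1])                          # first column sum
  if (is.null(theta)) theta <- (n1 - 1) * (n2 - 1) / (n1 * n2)
  support <- max(0, c1 - n2):min(n1, c1)      # feasible values of n11
  # log of C(n1,u) * C(n2,c1-u) * theta^u, computed on the log scale for stability
  lw <- lchoose(n1, support) + lchoose(n2, c1 - support) + support * log(theta)
  w <- exp(lw - max(lw))
  w[support == ct[1, 1]] / sum(w)
}

ct <- matrix(c(25, 30, 35, 22), ncol = 2)
nnct_prob_sketch(ct)              # with the assumed null odds ratio
nnct_prob_sketch(ct, theta = 1)   # reduces to the usual hypergeometric pmf
```

Setting `theta = 1` recovers `dhyper`, which is a convenient sanity check on the weights.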
Returns the $Q$ and $R$ values where $Q$ is the number of points shared as a NN by other points, i.e., the number of points that are NN of other points (which occurs when two or more points share a NN, for data in any dimension) and $R$ is the number of reflexive pairs, where A and B are reflexive iff they are NN to each other.
These quantities are used, e.g., in computing the variances and covariances of the entries of the nearest neighbor contingency tables used for Dixon's tests and other NNCT tests.
QRval(njr)
njr |
A |
A list
with two elements
Q |
the |
R |
the |
Elvan Ceyhan
Qval
, Qvec
, sharedNN
, Rval
and Ninv
#3D data points
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
ninv<-Ninv(ipd)
QRval(ninv)
W<-Wmat(ipd)
Qvec(W)$q

#1D data points
n<-15
X<-as.matrix(runif(n)) #needs to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(n) would not work
ipd<-ipd.mat(X)
ninv<-Ninv(ipd)
QRval(ninv)
W<-Wmat(ipd)
Qvec(W)$q

#with possible ties in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ny<-nrow(Y)
ipd<-ipd.mat(Y)
ninv<-Ninv(ipd)
QRval(ninv)
W<-Wmat(ipd)
Qvec(W)$q
$Q$-symmetry Contingency Table (QCT)
Returns the $k \times 3$ contingency table for $Q$-symmetry (i.e. $Q$-symmetry contingency table (QCT)) given the IPD matrix or data set x where $k$ is the number of classes in the data set. Each row in the QCT is the vector of number of points with shared NNs, $Q_{ij}$, where $Q_{ij}$ is the number of class $i$ points that are NN to $j$ points for $j=0,1$ and $Q_{i2}$ is the number of class $i$ points that are NN to 2 or more points. That is, this function pools the cells 3 or larger together for the $k$ classes, so $Q_2$, $Q_3$, etc. are pooled, so the column labels are $Q_0$, $Q_1$ and $Q_2$, where the last one is actually the sum of $Q_j$ for $j \ge 2$. Rows of the QCT are labeled with the corresponding class labels.
$Q$-symmetry is also equivalent to Pielou's second type of NN symmetry or the symmetry in the shared NN structure for all classes.
The argument is.ipd
is a logical argument (default=TRUE
) to determine the structure of the argument x
.
If TRUE
, x
is taken to be the inter-point distance (IPD) matrix, and if FALSE
, x
is taken to be the data set
with rows representing the data points.
See also (Pielou (1961); Ceyhan (2014)) and the references therein.
Qsym.ct(x, lab, is.ipd = TRUE, ...)
x |
The IPD matrix (if |
lab |
The |
is.ipd |
A logical parameter (default= |
... |
are for further arguments, such as |
Returns the $k \times 3$ QCT where $k$ is the number of classes in the data set.
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
sharedNNmc
, Qsym.test
and scct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(n*3),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
Qsym.ct(ipd,cls)
Qsym.ct(Y,cls,is.ipd = FALSE)
Qsym.ct(Y,cls,is.ipd = FALSE,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Qsym.ct(ipd,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
Qsym.ct(ipd,cls)
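The construction of a shared-NN table of this kind can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `qct_sketch` is a hypothetical name, Euclidean distance is used, each point has a single NN (the first one by index on ties), and counts of 2 or more are pooled into the last column.

```r
qct_sketch <- function(Y, lab) {
  D <- as.matrix(dist(Y))
  diag(D) <- Inf
  nn <- apply(D, 1, which.min)               # index of the NN of each point
  shared <- tabulate(nn, nbins = nrow(D))    # number of points each point is NN to
  pooled <- pmin(shared, 2)                  # pool 2, 3, ... into one cell
  table(class = lab,
        Q = factor(pooled, levels = 0:2, labels = c("Q0", "Q1", "Q2")))
}

Y <- matrix(c(0, 1, 3, 7), ncol = 1)
qct_sketch(Y, c(1, 1, 2, 2))
```

Each point falls in exactly one column, so the row sums of the table equal the class sizes.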
An object of class "Chisqtest" performing the hypothesis test of equality of the probabilities for the rows in the $Q$-symmetry contingency table (QCT). Each row of the QCT is the vector of $Q_{ij}$ values where $Q_{ij}$ is the number of class $i$ points that are NN to $j$ points. That is, the test performs Pielou's second type of NN symmetry test which is also equivalent to Pearson's $\chi^2$ test on the QCT (Pielou (1961)). Pielou's second type of NN symmetry is the symmetry in the shared NN structure for all classes, which is also called $Q$-symmetry. The test is appropriate (i.e. has the appropriate asymptotic sampling distribution) provided that data is obtained by sparse sampling, although simulations suggest it seems to work for completely mapped data as well. (See Ceyhan (2014) for more detail).
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
The argument combine is a logical argument (default=TRUE) to determine whether to combine the 3rd column and the columns to the right. If TRUE, this function pools the cells 3 or larger together for the $k$ classes in the QCT, so $Q_2$, $Q_3$, etc. are pooled, so the column labels are $Q_0$, $Q_1$ and $Q_2$, where the last one is actually the sum of $Q_j$ for $j \ge 2$ in the QCT. If FALSE, the function does not perform the pooling of the cells.
The function yields the test statistic, $p$-value and df which is $(k-1)(c-1)$ where $c$ is the number of columns in QCT (which reduces to $2(k-1)$, if combine=TRUE). It also provides the description of the alternative with the corresponding null values (i.e. expected values) of the entries of the QCT and also the sample estimates of the entries of QCT (i.e., the observed QCT). The function also provides names of the test statistics, the method and the data set used.
The null hypothesis is the symmetry in the shared NN structure for each class, that is, all $E(Q_{ij}) = n_i Q_j / n$ where $n_i$ is the size of class $i$ and $Q_j$ is the sum of column $j$ in the QCT (i.e., the total number of points serving as NN to $j$ other points) (i.e., symmetry in the mixed NN structure).
See also (Pielou (1961); Ceyhan (2014)) and the references therein.
Qsym.test(x, lab, is.ipd = TRUE, combine = TRUE, ...)
x |
The IPD matrix (if |
lab |
The |
is.ipd |
A logical parameter (default= |
combine |
A logical parameter (default= |
... |
are for further arguments, such as |
A list
with the elements
statistic |
The chi-squared test statistic for Pielou's second type of NN symmetry test (i.e., |
p.value |
The |
df |
Degrees of freedom for the chi-squared test, which is |
estimate |
Estimates, i.e., the observed QCT. |
est.name , est.name2
|
Names of the estimates, they are identical for this function. |
null.value |
Hypothesized null values for the entries of the QCT, i.e., the matrix with entries
|
method |
Description of the hypothesis test |
data.name |
Name of the data set, |
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
Qsym.ct(ipd,cls)
Qsym.test(ipd,cls)
Qsym.test(Y,cls,is.ipd = FALSE)
Qsym.test(Y,cls,is.ipd = FALSE,method="max")
Qsym.test(ipd,cls,combine = FALSE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Qsym.test(ipd,fcls)
Qsym.test(Y,fcls,is.ipd = FALSE)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
Qsym.test(ipd,cls)
Qsym.test(Y,cls,is.ipd = FALSE)
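Since the test is described as equivalent to Pearson's chi-squared test on the QCT, the statistic, degrees of freedom and null values can be illustrated on any table of counts with base R. The sketch below is not the package's implementation; `qsym_chisq_sketch` is a hypothetical name, the input table is made-up illustrative data, and the code assumes no column of the table sums to zero.

```r
qsym_chisq_sketch <- function(qct) {
  qct <- as.matrix(qct)
  n <- sum(qct)
  expected <- outer(rowSums(qct), colSums(qct)) / n   # row total * column total / n
  stat <- sum((qct - expected)^2 / expected)          # Pearson's chi-squared statistic
  df <- (nrow(qct) - 1) * (ncol(qct) - 1)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE),
       null.value = expected)
}

# Illustrative 2 x 3 table of counts (rows: classes; columns: pooled shared-NN counts)
qct <- matrix(c(15, 17, 20, 18, 16, 14), nrow = 2)
qsym_chisq_sketch(qct)
```

The result can be compared with `chisq.test(qct)`, which applies no continuity correction for tables larger than 2 x 2.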
An object of class "SpatPatterns".
Generates n_2 2D points associated with the given set of points (i.e. reference points) in the type=type fashion with the parameter=asc.par which specifies the level of association. The generated points are intended to be from a different class, say class 2 (or $X_2$ points) than the reference (i.e. $X_1$ points, say class 1 points, denoted as X1 as an argument of the function).
To generate $n_2$ (denoted as n2 as an argument of the function) $X_2$ points, $n_2$ of $X_1$ points are randomly selected (possibly with replacement) and for a selected X1 point, say $x_{1ref}$, a new point from the class 2, say $x_{2new}$, is generated from a distribution specified by the type argument.
In type I association, i.e., if type="I", first a $Uniform(0,1)$ number, $U$, is generated. If $U \le p$, $x_{2new}$ is generated (uniform in the polar coordinates) within a circle with radius equal to the distance to the closest $X_1$ point, else it is generated uniformly within the smallest bounding box containing $X_1$ points.
In the type C association pattern the new point from the class 2, $x_{2new}$, is generated (uniform in the polar coordinates) within a circle centered at $x_{1ref}$ with radius equal to $r_0$; in type U association pattern $x_{2new}$ is generated similarly except it is uniform in the circle.
In type G association, $x_{2new}$ is generated from the bivariate normal distribution centered at $x_{1ref}$ with covariance $\sigma I_2$ where $I_2$ is the $2 \times 2$ identity matrix.
See Ceyhan (2014) for more detail.
rassoc(X1, n2, asc.par, type)
X1 |
A set of 2D points representing the reference points, also referred as class 1 points. The generated points are associated in a type=type sense with these points. |
n2 |
A positive integer representing the number of class 2 points to be generated. |
asc.par |
A positive real number representing the association parameter. For |
type |
The type of the association pattern. Takes on values |
A list
with the elements
pat.type |
= |
type |
The type of the point pattern |
parameters |
The |
ref.points |
The input set of reference points |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.” Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
rassocI
, rassocC
, rassocU
, and rassocG
n1<-20; n2<-1000; #try also n1<-10; n2<-1000;
#with default bounding box (i.e., unit square)
X1<-cbind(runif(n1),runif(n1))
Xdat<-rassoc(X1,n2,asc.par=.05,type="G") #try other types as well
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#with type U association
Xdat<-rassoc(X1,n2,asc.par=.1,type="U")
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#with type C association
Xdat<-rassoc(X1,n2,asc.par=.1,type=2) #2 is for "C"
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
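The difference between "uniform in the polar coordinates" (type C) and "uniform in the circle" (type U) comes down to how the radius is drawn. The base R sketch below illustrates both under stated assumptions: `rassoc_sketch` and its returned `ref.index` element are hypothetical, reference points are sampled with replacement, and the code is not the package's implementation.

```r
rassoc_sketch <- function(X1, n2, r0, type = c("C", "U")) {
  type <- match.arg(type)
  X1 <- as.matrix(X1)
  ref <- sample.int(nrow(X1), n2, replace = TRUE)   # randomly selected reference points
  u <- runif(n2)
  # type C: radius uniform on (0, r0); type U: sqrt transform gives uniformity over the disc
  r <- if (type == "C") r0 * u else r0 * sqrt(u)
  theta <- runif(n2, 0, 2 * pi)
  gen <- X1[ref, , drop = FALSE] + cbind(r * cos(theta), r * sin(theta))
  list(gen.points = gen, ref.index = ref)
}

set.seed(123)
X1 <- cbind(runif(20), runif(20))
out <- rassoc_sketch(X1, 1000, r0 = 0.1, type = "C")
plot(out$gen.points, asp = 1, pch = ".", xlab = "x", ylab = "y")
points(X1, pch = 16, col = 2)
```

With the radius uniform on $(0, r_0)$ the generated points concentrate near the reference points, whereas the square-root transform spreads them evenly over each disc.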
An object of class "SpatPatterns".
Generates n_2 2D points associated with the given set of points (i.e. reference points) in the type C fashion with a radius of association $r_0$ (denoted as r0 as an argument of the function) which is a positive real number. The generated points are intended to be from a different class, say class 2 (or $X_2$ points) than the reference (i.e. $X_1$ points, say class 1 points, denoted as X1 as an argument of the function). To generate $n_2$ $X_2$ points, $n_2$ of $X_1$ points are randomly selected (possibly with replacement) and for a selected X1 point, say $x_{1ref}$, a new point from the class 2, say $x_{2new}$, is generated within a circle with radius equal to $r_0$ (uniform in the polar coordinates). That is, $x_{2new} = x_{1ref} + r_u (\cos(t_u), \sin(t_u))$ where $r_u \sim U(0, r_0)$ and $t_u \sim U(0, 2\pi)$. Note that the level of association increases as $r_0$ decreases, and the association vanishes when $r_0$ is sufficiently large.
For type C association, it is recommended to take times length of the shorter
edge of a rectangular study region, or take
with the appropriate choice of
to get an association pattern more robust to differences in relative abundances
(i.e. the choice of
implies
times length of the shorter edge to have alternative patterns more
robust to differences in sample sizes).
Here
is the
estimated intensity of points in the study region (i.e., # of points divided by the area of the region).
Type C association is closely related to Type U association, see the function rassocU and the other association types. In the type U association pattern the new point from the class 2, $x_{2new}$, is generated uniformly within a circle centered at $x_{1ref}$ with radius equal to $r_0$. In type G association, $x_{2new}$ is generated from the bivariate normal distribution centered at $x_{1ref}$ with covariance $\sigma I_2$ where $I_2$ is the $2 \times 2$ identity matrix. In type I association, first a $Uniform(0,1)$ number, $U$, is generated. If $U \le p$, $x_{2new}$ is generated (uniform in the polar coordinates) within a circle with radius equal to the distance to the closest $X_1$ point, else it is generated uniformly within the smallest bounding box containing $X_1$ points.
See Ceyhan (2014) for more detail.
rassocC(X1, n2, r0)
X1 |
A set of 2D points representing the reference points, also referred as class 1 points. The generated points are associated in a type C sense (in a circular/radial fashion) with these points. |
n2 |
A positive integer representing the number of class 2 points to be generated. |
r0 |
A positive real number representing the radius of association of class 2 points associated with a randomly selected class 1 point (see the description below). |
A list
with the elements
pat.type |
= |
type |
The type of the point pattern |
parameters |
Radius of association controlling the level of association |
gen.points |
The output set of generated points (i.e. class 2 points) associated with reference (i.e.
|
ref.points |
The input set of reference points |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.” Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
rassocI
, rassocG
, rassocU
, and rassoc
n1<-20; n2<-1000; #try also n1<-10; n2<-1000;
r0<-.15 #try also .10 and .20, runif(1)
#with default bounding box (i.e., unit square)
X1<-cbind(runif(n1),runif(n1)) #try also X1<-1+cbind(runif(n1),runif(n1))
Xdat<-rassocC(X1,n2,r0)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#radius adjusted with the expected NN distance
x<-range(X1[,1]); y<-range(X1[,2])
ar<-(y[2]-y[1])*(x[2]-x[1]) #area of the smallest rectangular window containing X1 points
rho<-n1/ar
r0<-1/(2*sqrt(rho)) #r0=1/(2*sqrt(rho)) where rho is the intensity of X1 points
Xdat<-rassocC(X1,n2,r0)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns".
Generates n_2 2D points associated with the given set of points (i.e. reference points) in the type G fashion with the parameter sigma which is a positive real number representing the variance of the Gaussian marginals. The generated points are intended to be from a different class, say class 2 (or $X_2$ points) than the reference (i.e. $X_1$ points, say class 1 points, denoted as X1 as an argument of the function). To generate $n_2$ (denoted as n2 as an argument of the function) $X_2$ points, $n_2$ of $X_1$ points are randomly selected (possibly with replacement) and for a selected X1 point, say $x_{1ref}$, a new point from the class 2, say $x_{2new}$, is generated from a bivariate normal distribution centered at $x_{1ref}$ where the covariance matrix of the bivariate normal is a diagonal matrix with sigma in the diagonals. That is, $x_{2new} = x_{1ref} + V$ where $V \sim BVN((0,0), \sigma I_2)$ with $I_2$ being the $2 \times 2$ identity matrix. Note that the level of association increases as sigma decreases, and the association vanishes when sigma goes to infinity.
For type G association, it is recommended to take times length of the shorter
edge of a rectangular study region, or take
with the appropriate choice of
to get an association pattern more robust to differences in relative abundances
(i.e. the choice of
implies
times length of the shorter edge to have alternative patterns more
robust to differences in sample sizes).
Here
is the
estimated intensity of points in the study region (i.e., # of points divided by the area of the region).
Type G association is closely related to Types C and U association, see the functions rassocC and rassocU and the other association types. In the type C association pattern the new point from the class 2, $x_{2new}$, is generated (uniform in the polar coordinates) within a circle centered at $x_{1ref}$ with radius equal to $r_0$; in type U association pattern $x_{2new}$ is generated similarly except it is uniform in the circle. In type I association, first a $Uniform(0,1)$ number, $U$, is generated. If $U \le p$, $x_{2new}$ is generated (uniform in the polar coordinates) within a circle with radius equal to the distance to the closest $X_1$ point, else it is generated uniformly within the smallest bounding box containing $X_1$ points.
See Ceyhan (2014) for more detail.
rassocG(X1, n2, sigma)
X1 |
A set of 2D points representing the reference points, also referred as class 1 points. The generated points are associated in a type G sense with these points. |
n2 |
A positive integer representing the number of class 2 points to be generated. |
sigma |
A positive real number representing the variance of the Gaussian marginals, where
the bivariate normal distribution has covariance |
A list
with the elements
pat.type |
= |
type |
The type of the point pattern |
parameters |
The variance of the Gaussian marginals controlling the level of association, where
the bivariate normal distribution has covariance |
gen.points |
The output set of generated points (i.e. class 2 points) associated with reference (i.e.
|
ref.points |
The input set of reference points |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.” Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
rassocI
, rassocC
, rassocU
, and rassoc
n1<-20; n2<-1000; #try also n1<-10; n2<-1000;
stdev<-.05 #try also .075 and .15
#with default bounding box (i.e., unit square)
X1<-cbind(runif(n1),runif(n1)) #try also X1<-1+cbind(runif(n1),runif(n1))
Xdat<-rassocG(X1,n2,stdev)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#sigma adjusted with the expected NN distance
x<-range(X1[,1]); y<-range(X1[,2])
ar<-(y[2]-y[1])*(x[2]-x[1]) #area of the smallest rectangular window containing X1 points
rho<-n1/ar
stdev<-1/(4*sqrt(rho)) #stdev=1/(4*sqrt(rho)) where rho is the intensity of X1 points
Xdat<-rassocG(X1,n2,stdev)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
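The Gaussian displacement step described for type G association can be sketched in base R as below. This is an illustration only: `rassocG_sketch` and its `ref.index` element are hypothetical, and the sketch follows the argument description in treating sigma as the variance of each marginal (so the standard deviation passed to `rnorm` is `sqrt(sigma)`). The examples above name the same argument `stdev`, so how the package itself interprets sigma should be checked against its source.

```r
rassocG_sketch <- function(X1, n2, sigma) {
  X1 <- as.matrix(X1)
  ref <- sample.int(nrow(X1), n2, replace = TRUE)   # randomly selected reference points
  # independent N(0, sigma) displacements in each coordinate (covariance sigma * I_2)
  noise <- matrix(rnorm(2 * n2, mean = 0, sd = sqrt(sigma)), ncol = 2)
  list(gen.points = X1[ref, , drop = FALSE] + noise, ref.index = ref)
}

set.seed(123)
X1 <- cbind(runif(20), runif(20))
out <- rassocG_sketch(X1, 1000, sigma = 0.05^2)
plot(out$gen.points, asp = 1, pch = ".", xlab = "x", ylab = "y")
points(X1, pch = 16, col = 2)
```

Unlike the circular types, the generated points are not confined to a bounded neighborhood of the reference points.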
An object of class "SpatPatterns".

Generates $n_2$ 2D points associated with the given set of points (i.e. reference points) in the type I fashion with circular (or radial) between-class attraction parameter p, which is a probability value between 0 and 1. The generated points are intended to be from a different class, say class 2 (or $X_2$ points), than the reference points (i.e. $X_1$ points, say class 1 points, denoted as X1 as an argument of the function).

To generate $n_2$ (denoted as n2 as an argument of the function) $X_2$ points, $n_2$ of the $X_1$ points are randomly selected (possibly with replacement) and, for a selected X1 point, say $x_{1ij}$, a $U(0,1)$ number, $U$, is generated. If $U \le p$, a new point from class 2, say $x_{2ij}$, is generated within a circle centered at $x_{1ij}$ with radius equal to the distance to the closest $X_1$ point (uniform in the polar coordinates); else the new point is generated uniformly within the smallest bounding box containing the $X_1$ points. That is, if $U \le p$, $x_{2ij} = x_{1ij} + r_u\,(\cos t_u, \sin t_u)$, where $r_u \sim U(0, rad)$ and $t_u \sim U(0, 2\pi)$ with $rad = \min_{x \in X_1 \setminus \{x_{1ij}\}} d(x_{1ij}, x)$, else $x_{2ij} \sim U(S_1)$, where $S_1$ is the smallest bounding box containing the $X_1$ points.

Note that the level of association increases as p increases, and the association vanishes as p approaches 0.

Type I association is closely related to Type C association in Ceyhan (2014); see the function rassocC and also the other association types. In the type C association pattern, the new point from class 2, $x_{2ij}$, is generated (uniform in the polar coordinates) within a circle centered at $x_{1ij}$ with radius equal to $r_0$; in the type U association pattern, $x_{2ij}$ is generated similarly except that it is uniform in the circle. In type G association, $x_{2ij}$ is generated from the bivariate normal distribution centered at $x_{1ij}$ with covariance proportional to $I_2$, where $I_2$ is the $2 \times 2$ identity matrix.
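The generation rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of rassocI: the function name rassocI_sketch and the names of the returned list elements (other than gen.points) are hypothetical, and details such as how the reference points are sampled may differ in the package.

```r
# Base-R sketch of type I association (illustrative only; not nnspat's rassocI code).
rassocI_sketch <- function(X1, n2, p) {
  d <- as.matrix(dist(X1))
  diag(d) <- Inf
  rad <- apply(d, 1, min)                          # distance to the closest other X1 point
  xr <- range(X1[, 1]); yr <- range(X1[, 2])       # smallest bounding box S_1
  idx <- sample.int(nrow(X1), n2, replace = TRUE)  # selected reference points
  attracted <- runif(n2) <= p                      # TRUE when U <= p
  ru <- runif(n2, 0, rad[idx])                     # radius, uniform in polar coordinates
  tu <- runif(n2, 0, 2 * pi)                       # angle
  # default (U > p): uniform in the bounding box
  X2 <- cbind(runif(n2, xr[1], xr[2]), runif(n2, yr[1], yr[2]))
  # attracted points: inside the circle around the selected reference point
  near <- X1[idx, , drop = FALSE] + ru * cbind(cos(tu), sin(tu))
  X2[attracted, ] <- near[attracted, ]
  list(gen.points = X2, ref.index = idx, attracted = attracted, rad = rad[idx])
}

set.seed(123)
X1 <- cbind(runif(20), runif(20))
out <- rassocI_sketch(X1, n2 = 1000, p = 0.75)
mean(out$attracted)  # proportion of attracted points, close to p
```

With p close to 1 almost every generated point falls within the nearest-neighbor circle of its reference point, whereas with p close to 0 the generated points are essentially uniform in the bounding box.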
rassocI(X1, n2, p)
X1 |
A set of 2D points representing the reference points, also referred to as class 1 points. The generated points are associated in a type I sense (in a circular/radial fashion) with these points. |
n2 |
A positive integer representing the number of class 2 (i.e. $X_2$) points to be generated. |
p |
A real number between 0 and 1 representing the attraction probability of class 2 points associated with a randomly selected class 1 point (see the description below). |
A list
with the elements
pat.type |
equals |
type |
The type of the point pattern |
parameters |
Radial (i.e. circular) between class attraction parameter controlling the level of association |
gen.points |
The output set of generated points (i.e. class 2 points) associated with the reference (i.e. $X_1$) points |
ref.points |
The input set of reference points |
desc.pat |
Description of the point pattern |
lab |
The class labels of the generated points, it is |
mtitle |
The |
num.points |
The |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
Ceyhan E (2014). “Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.” Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
rassocC, rassocG, rassocU, and rassoc
library(nnspat)

n1<-20; n2<-1000; #try also n1<-10; n2<-1000;
p<- .75 #try also .25, .5, .9, runif(1)

#with default bounding box (i.e., unit square)
X1<-cbind(runif(n1),runif(n1)) #try also X1<-1+cbind(runif(n1),runif(n1))

Xdat<-rassocI(X1,n2,p)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns".

Generates $n_2$ 2D points associated with the given set of points (i.e. reference points) in the type U fashion with a radius of association $r_0$ (denoted as r0 as an argument of the function), which is a positive real number. The generated points are intended to be from a different class, say class 2 (or $X_2$ points), than the reference points (i.e. $X_1$ points, say class 1 points, denoted as X1 as an argument of the function).

To generate $n_2$ (denoted as n2 as an argument of the function) $X_2$ points, $n_2$ of the $X_1$ points are randomly selected (possibly with replacement) and, for a selected X1 point, say $x_{1ij}$, a new point from class 2, say $x_{2ij}$, is generated uniformly within a circle centered at $x_{1ij}$ with radius equal to $r_0$. That is, $x_{2ij} = x_{1ij} + r_u\,(\cos t_u, \sin t_u)$, where $r_u = \sqrt{U}\, r_0$ with $U \sim U(0,1)$ and $t_u \sim U(0, 2\pi)$.

Note that the level of association increases as $r_0$ decreases, and the association vanishes when $r_0$ is sufficiently large.
For type U association, it is recommended to take times length of the shorter
edge of a rectangular study region, or take
with the appropriate choice of
to get an association pattern more robust to differences in relative abundances
(i.e. the choice of
implies
times length of the shorter edge to have alternative patterns more
robust to differences in sample sizes).
Here
is the
estimated intensity of points in the study region (i.e., # of points divided by the area of the region).
Type U association is closely related to Type C association; see the function rassocC and the other association types. In the type C association pattern, the new point from class 2, $x_{2ij}$, is generated (uniform in the polar coordinates) within a circle centered at the selected reference point $x_{1ij}$ with radius equal to $r_0$. In type G association, $x_{2ij}$ is generated from the bivariate normal distribution centered at $x_{1ij}$ with covariance proportional to $I_2$, where $I_2$ is the $2 \times 2$ identity matrix. In type I association, first a $U(0,1)$ number, $U$, is generated. If $U \le p$, where p is the attraction probability, $x_{2ij}$ is generated (uniform in the polar coordinates) within a circle with radius equal to the distance to the closest $X_1$ point; else it is generated uniformly within the smallest bounding box containing the $X_1$ points.

See Ceyhan (2014) for more detail.
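A minimal base-R sketch of the type U rule is given below. It is not the code of rassocU; the function name rassocU_sketch is hypothetical. The radius is drawn as r0 times the square root of a uniform number, which is the standard way to obtain points that are uniform over the area of a disc (drawing the radius itself uniformly would instead concentrate points near the center, as in the "uniform in the polar coordinates" types). The choice of r0 from the estimated intensity mirrors the one used in the examples of this help page.

```r
# Base-R sketch of type U association (illustrative only; not nnspat's rassocU code).
rassocU_sketch <- function(X1, n2, r0) {
  idx <- sample.int(nrow(X1), n2, replace = TRUE)  # selected reference points
  ru <- r0 * sqrt(runif(n2))                       # sqrt(U) gives uniformity over the disc
  tu <- runif(n2, 0, 2 * pi)                       # angle
  X1[idx, , drop = FALSE] + ru * cbind(cos(tu), sin(tu))
}

set.seed(1)
X1 <- cbind(runif(20), runif(20))
# intensity estimated on the smallest rectangle containing the X1 points
rho.hat <- nrow(X1) / (diff(range(X1[, 1])) * diff(range(X1[, 2])))
r0 <- 1 / (2 * sqrt(rho.hat))
X2 <- rassocU_sketch(X1, 1000, r0)
plot(X2, asp = 1, pch = ".", xlab = "x", ylab = "y")
points(X1, pch = 16, col = "red")
```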
rassocU(X1, n2, r0)
X1 |
A set of 2D points representing the reference points, also referred to as class 1 points. The generated points are associated in a type U sense (in a circular/radial fashion) with these points. |
n2 |
A positive integer representing the number of class 2 points to be generated. |
r0 |
A positive real number representing the radius of association of class 2 points associated with a randomly selected class 1 point (see the description below). |
A list
with the elements
pat.type |
= |
type |
The type of the point pattern |
parameters |
Radius of association controlling the level of association |
gen.points |
The output set of generated points (i.e. class 2 points) associated with the reference (i.e. $X_1$) points |
ref.points |
The input set of reference points |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
Ceyhan E (2014). “Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity.” Stochastic Environmental Research and Risk Assessment (SERRA), 28(5), 1277-1306.
rassocI, rassocG, rassocC, and rassoc
library(nnspat)

n1<-20; n2<-1000; #try also n1<-10; n2<-1000;
r0<-.15 #try also .10 and .20

#with default bounding box (i.e., unit square)
X1<-cbind(runif(n1),runif(n1)) #try also X1<-1+cbind(runif(n1),runif(n1))

Xdat<-rassocU(X1,n2,r0)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#radius adjusted with the expected NN distance
x<-range(X1[,1]); y<-range(X1[,2])
ar<-(y[2]-y[1])*(x[2]-x[1]) #area of the smallest rectangular window containing X1 points
rho<-n1/ar
r0<-1/(2*sqrt(rho)) #r0=1/(2*sqrt(rho)) where rho is the intensity of X1 points
Xdat<-rassocU(X1,n2,r0)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
Returns the reflexivity contingency table (RCT) given the IPD matrix or data set x. The RCT is $2 \times 2$ regardless of the number of classes in the data set.

The RCT is constructed by categorizing the NN pairs according to pair type, as self or mixed, and according to whether the pair is reflexive or non-reflexive. A base-NN pair is called a reflexive pair if the elements of the pair are NN to each other; a non-reflexive pair if the elements of the pair are not NN to each other; a self pair if the elements of the pair are from the same class; and a mixed pair if the elements of the pair are from different classes. Row labels in the RCT are "ref" for reflexive and "non-ref" for non-reflexive, and column labels are "self" and "mixed".

The argument is.ipd is a logical argument (default=TRUE) that determines the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.

See also Ceyhan and Bahadir (2017), Bahadir and Ceyhan (2018), and the references therein.
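The construction described above can be sketched with base R alone. The helper rct_sketch below is hypothetical and is not the implementation of rct; it assumes that there are no ties in the distances, so that each point has a unique NN.

```r
# Base-R sketch of an RCT (illustrative only; not nnspat's rct code).
rct_sketch <- function(ipd, lab) {
  d <- as.matrix(ipd)
  diag(d) <- Inf                       # a point is not its own NN
  nn <- apply(d, 1, which.min)         # index of the NN of each base point
  reflexive <- nn[nn] == seq_along(nn) # the base point is the NN of its own NN
  self <- lab == lab[nn]               # base point and its NN share a class
  table(factor(ifelse(reflexive, "ref", "non-ref"), levels = c("ref", "non-ref")),
        factor(ifelse(self, "self", "mixed"), levels = c("self", "mixed")))
}

set.seed(1)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(1:2, n, replace = TRUE)
tab <- rct_sketch(dist(Y), cls)
tab
```

Each base point contributes exactly one base-NN pair, so the entries of the table sum to the number of points, and the "ref" row sums to an even number because reflexive pairs are counted once from each member.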
rct(x, lab, is.ipd = TRUE, ...)
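x |
The IPD matrix (if is.ipd=TRUE) or the data set with rows representing the data points (if is.ipd=FALSE) |
lab |
The vector of class labels (numerical or categorical) |
is.ipd |
A logical parameter (default=TRUE). If TRUE, x is taken as the inter-point distance matrix; otherwise, x is taken as the data set with rows representing the data points. |
... |
are for further arguments, such as method (used as method="max" in the examples) |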
Returns the RCT, see the description above for more detail.
Elvan Ceyhan
Bahadir S, Ceyhan E (2018).
“On the Number of reflexive and shared nearest neighbor pairs in one-dimensional uniform data.”
Probability and Mathematical Statistics, 38(1), 123-137.
Ceyhan E, Bahadir S (2017).
“Nearest Neighbor Methods for Testing Reflexivity.”
Environmental and Ecological Statistics, 24(1), 69-108.
library(nnspat)

n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)

rct(ipd,cls)
rct(Y,cls,is.ipd = FALSE)
rct(Y,cls,is.ipd = FALSE,method="max")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
rct(ipd,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
rct(ipd,cls)
An object of class "Clusters".

Generates n 2D points with k clusters along the first diagonal, where about $n/k$ points belong to each cluster.

If distribution="uniform", the points are uniformly generated in their square supports, where one square is the unit square (i.e., with vertices $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$), and
the others are unit squares translated
units along the first diagonal for
(i.e. with vertices
).
If distribution="bvnormal"
, the points are generated from the bivariate normal distribution with means equal to the
centers of the above squares (i.e. for each cluster with mean=
for
and the covariance matrix
, where
is the
identity matrix.
Notice that the clusters are more separated, i.e., the generated data indicate clearer clusters, as $d$ increases in either the positive or negative direction, with $d=0$ indicating one cluster in the data. For a fixed $d$, when distribution="bvnormal", the clustering gets stronger if the variance of each component, $sd^2$, gets smaller, and the clustering gets weaker as the variance of each component gets larger, where the default is $sd=1/6$.
rdiag.clust(n, k, d, sd = 1/6, distribution = c("uniform", "bvnormal"))
n |
A positive integer representing the number of points to be generated from all the clusters |
k |
A positive integer representing the number of clusters to be generated |
d |
Shift in the first diagonal indicating the level of clustering in the data. Larger absolute values in either direction (i.e. positive or negative) would yield stronger clustering. |
sd |
The standard deviation of the components of the bivariate normal distribution with default 1/6 |
distribution |
The argument determining the distribution of each cluster. Takes on values "uniform" and "bvnormal" |
A list
with the elements
type |
The type of the clustering pattern |
parameters |
The number of clusters, |
gen.points |
The output set of generated points from the clusters. |
desc.pat |
Description of the clustering pattern |
mtitle |
The |
num.points |
The number of generated points. |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
rhor.clust and rrot.clust
library(nnspat)

n<-20 #or try sample(1:20,1); #try also n<-50; n<-1000;
d<-.5 #try also -.75,.75, 1
k<-3 #try also 5

#data generation
Xdat<-rdiag.clust(n,k,d)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#data generation (bvnormal)
n<-20 #or try sample(1:20,1); #try also n<-50; n<-1000;
d<-.5 #try also -.75,.75, 1
k<-3 #try also 5
Xdat<-rdiag.clust(n,k,d,distribution="bvnormal") #try also Xdat<-rdiag.clust(n,k,d,sd=.09,distribution="bvnormal")
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "Clusters".

Generates n 2D points with k clusters along the horizontal axis, where about $n/k$ points belong to each cluster.

If distribution="uniform", the points are uniformly generated in their square supports, where one square is the unit square (i.e., with vertices $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$), and the others are $d$ units shifted horizontally from each other so that their lower-left vertices are $(j+(j-1)d-1,\,0)$ for $j=1,2,\ldots,k$.

If distribution="bvnormal", the points are generated from the bivariate normal distribution with means equal to the centers of the above squares (i.e. for each cluster with mean $(j+(j-1)d-1/2,\,1/2)$ for $j=1,2,\ldots,k$) and the covariance matrix $sd^2 I_2$, where $I_2$ is the $2 \times 2$ identity matrix.

Notice that the clusters are more separated, i.e., the generated data indicate clearer clusters, as $d$ increases in either direction, with $d=0$ indicating one cluster in the data. For a fixed $d$, when distribution="bvnormal", the clustering gets stronger if the variance of each component, $sd^2$, gets smaller, and the clustering gets weaker as the variance of each component gets larger, where the default is $sd=1/6$.
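For the uniform case, the layout of the square supports can be sketched in base R as below. The function rhor_unif_sketch is hypothetical and is not the code of rhor.clust. It assumes that the lower-left vertex of the j-th square is at (j+(j-1)d-1, 0), which is what the cluster means stated above imply, and that points are assigned to the clusters in turn so that the cluster sizes differ by at most one.

```r
# Base-R sketch of k uniform clusters along the horizontal axis
# (illustrative only; not nnspat's rhor.clust code).
rhor_unif_sketch <- function(n, k, d) {
  cl <- rep_len(seq_len(k), n)        # about n/k points per cluster
  x <- runif(n) + (cl - 1) * (1 + d)  # unit square shifted to its lower-left vertex
  y <- runif(n)
  list(gen.points = cbind(x, y), cluster = cl)
}

set.seed(42)
cl3 <- rhor_unif_sketch(n = 100, k = 3, d = 0.5)
plot(cl3$gen.points, asp = 1, col = cl3$cluster, xlab = "x", ylab = "y")
table(cl3$cluster)
```

With d = 0.5 the three squares are separated by horizontal gaps of width 0.5; with d = 0 they are adjacent and form a single rectangular cluster.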
rhor.clust(n, k, d, sd = 1/6, distribution = c("uniform", "bvnormal"))
n |
A positive integer representing the number of points to be generated from all the clusters |
k |
A positive integer representing the number of clusters to be generated |
d |
Horizontal shift indicating the level of clustering in the data. Larger absolute values in either direction (i.e. positive or negative) would yield stronger clustering. |
sd |
The standard deviation of the components of the bivariate normal distribution with default 1/6 |
distribution |
The argument determining the distribution of each cluster. Takes on values "uniform" and "bvnormal" |
A list
with the elements
type |
The type of the clustering pattern |
parameters |
The number of clusters, |
gen.points |
The output set of generated points from the clusters. |
desc.pat |
Description of the clustering pattern |
mtitle |
The |
num.points |
The number of generated points. |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
library(nnspat)

n<-100; #try also n<-50; or n<-1000;
d<-.5 #try also -.5,.75, 1
k<-3 #try also 5

#data generation
Xdat<-rhor.clust(n,k,d)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#data generation (bvnormal)
n<-100; #try also n<-50; n<-1000;
d<-.1 #try also -.1, .75, 1
k<-3 #try also 5
Xdat<-rhor.clust(n,k,d,distribution="bvnormal") #try also Xdat<-rhor.clust(n,k,d,sd=.15,distribution="bvnormal")
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns".

Given the set of n points, dat, in a region, this function assigns some of them as cases, and the rest as controls, in the non-random labeling (non-RL) fashion specified by the argument type.
Type I nonRL pattern assigns round(n*prop,0) of the data points as cases, and the rest as controls, by first selecting a point as a case and then assigning the label case to the remaining points with infection probabilities prob=c(prop+((1-prop)*rho)/(1:k)), where rho is a parameter adjusting the NN dependence of infection probabilities.

Type II nonRL pattern assigns round(n*ult.prop,0) of them as cases, and the rest as controls, by first selecting round(n*init.prop,0) of them as cases initially, then selecting a contagious case and assigning the label case to the remaining points with infection probabilities inversely proportional to their position among the k NNs.

Type III nonRL pattern assigns round(n*prop,0) of them as cases, and the rest as controls, by first selecting a point as a case and then assigning the label case to the remaining points with infection probabilities defined in terms of the distance from the selected case to each remaining point and the maximum of these distances, where rho is a scaling parameter for the infection probabilities and pow is a parameter in the power adjusting the distance dependence.

Type IV nonRL pattern assigns round(n*ult.prop,0) of them as cases, and the rest as controls, by first selecting round(n*init.prop,0) of them as cases initially and then assigning the label case to the remaining points with infection probabilities equal to the scaled bivariate normal density values at those points.
The number of cases in Types I and III will be round(n*prop,0) on the average if the argument poisson=TRUE (i.e., rpois(1,round(n*prop,0))); otherwise, it will be exactly round(n*prop,0). The initial and ultimate numbers of cases in Types II and IV will be round(n*init.prop,0) and round(n*ult.prop,0) on the average if the argument poisson=TRUE (i.e., rpois(1,round(n*init.prop,0)) and rpois(1,round(n*ult.prop,0))); otherwise, they will be exactly equal to round(n*init.prop,0) and round(n*ult.prop,0).

For each type, we stop when the number of cases first exceeds the target number of cases. That is, the procedure ends when the number of cases exceeds the target number, and the excess cases (other than the initial case(s)) are randomly selected and relabeled as controls, i.e. 0s, so that the number of cases is exactly the target number.
In the output, cases are labeled as 1 and controls as 0, and the initial contagious case is marked with a red cross in the plot of the pattern.

See Ceyhan (2014) and the functions rnonRLI, rnonRLII, rnonRLIII, and rnonRLIV for more detail on each type of non-RL pattern.

Although the non-RL pattern is described for the case-control setting, it can be adapted to any two-class setting in which it is appropriate to treat one class as cases (or one class behaves like cases) and the other class as controls.

The parameters of the non-RL patterns are specified in the argument par.vec, and the logical arguments rand.init and poisson are passed on to the types where required. rand.init is not used in type I but is used in all other types, poisson is used in all types, and init.from.cases is used in type I non-RL only.
rnonRL( dat, par.vec, type, rand.init = TRUE, poisson = FALSE, init.from.cases = TRUE )
dat |
A set of points the non-RL procedure is applied to obtain cases and controls randomly in the
|
par.vec |
The parameter vector. It is |
type |
The type of the non-RL pattern. Takes on values |
rand.init |
A logical argument (default is |
poisson |
A logical argument (default is |
init.from.cases |
A logical argument (default is |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
|
lab |
The labels of the points as 1 for cases and 0 for controls after the nonRL procedure is
applied to the data set, |
init.cases |
The initial cases in the data set, |
cont.cases |
The contagious cases in the data set, |
gen.points , ref.points
|
Both are |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
rnonRLI, rnonRLII, rnonRLIII, and rnonRLIV
library(nnspat)

#data generation
n<-40; #try also n<-20; n<-100;
dat<-cbind(runif(n,0,1),runif(n,0,1))

#Type I non-RL pattern
#c(prop,k,rho) for type I
prop<-.5; knn<-3; rho<- .3
prv<-c(prop,knn,rho)
Xdat<-rnonRL(dat,type="I",prv) #labeled data
# or try Xdat<-rnonRLI(dat,prop,knn,rho) for type I non-RL
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#Type II non-RL pattern
#c(k,rho,pow,init.prop,ult.prop) for type II
rho<-.8; pow<-2; knn<-5; ip<-.3; up<-.5
prv<-c(knn,rho,pow,ip,up)
Xdat<-rnonRL(dat,type="II",prv) #labeled data
# or try Xdat<-rnonRLII(dat,knn,rho,pow,ip,up) for type II non-RL
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#Type III non-RL pattern
#c(prop,rho,pow) for type III
prop<- .5; rho<-.8; pow<-2
prv<-c(prop,rho,pow)
Xdat<-rnonRL(dat,type="III",prv) #labeled data
# or try the function rnonRLIII for type III non-RL
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#Type IV non-RL pattern
#c(init.prop,ult.prop,s1,s2,rho) for type IV
ult<-.5; int<- .1; s1<-s2<-.4; rho<- .1
prv<-c(int,ult,s1,s2,rho)
Xdat<-rnonRL(dat,type="IV",prv) #labeled data
# or try the function rnonRLIV for type IV non-RL
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns".

Given the set of n points, dat, in a region, this function assigns round(n*prop,0) of them as cases, and the rest as controls, by first selecting a point as a case and then assigning the label case to the remaining points with infection probabilities prob=c(prop+((1-prop)*rho)/(1:k)), where rho is a parameter adjusting the NN dependence of infection probabilities. The number of cases will be round(n*prop,0) on the average if the argument poisson=TRUE (i.e., rpois(1,round(n*prop,0))); otherwise, it will be exactly round(n*prop,0). We stop when the number of cases first exceeds this target number. rho must be between -prop/(1-prop) and 1 for the infection probabilities to be valid.
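The range for rho follows from the first NN: prob[1] = prop+(1-prop)*rho lies in [0,1] exactly when rho is between -prop/(1-prop) and 1, and the remaining entries lie between prob[1] and prop. A small base-R check is sketched below; the helper infect_prob_I is hypothetical and is not part of nnspat.

```r
# Infection probabilities of the k NNs in the type I non-RL pattern
# (hypothetical helper, shown only to illustrate the formula and the range of rho).
infect_prob_I <- function(prop, k, rho) {
  lower <- -prop / (1 - prop)  # smallest admissible rho
  stopifnot(prop >= 0, prop < 1, rho >= lower, rho <= 1)
  prop + ((1 - prop) * rho) / (1:k)
}

p1 <- infect_prob_I(prop = 0.5, k = 3, rho = 0.3)
p1                                               # 0.650 0.575 0.550
p.lo <- infect_prob_I(prop = 0.5, k = 3, rho = -1)  # lower limit: first NN is never infected
p.hi <- infect_prob_I(prop = 0.5, k = 3, rho = 1)   # upper limit: first NN is always infected
```

For positive rho the probabilities decrease toward prop as the NN order increases, whereas for negative rho they increase toward prop.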
The init.from.cases argument is a logical argument (default=TRUE) that determines whether the initial cases are selected from the cases or from the controls (the first initial case is always from the controls). If TRUE, the initial cases (other than the first initial case) are selected randomly among the cases (as if they are contagious); otherwise, they are selected from the controls as new cases infecting their k NNs.
otherwise first entry is chosen as the case (or case is recorded as the first entry) in the data set, dat
.
Algorithmically, first all dat points are treated as non-cases (i.e. controls or healthy subjects). Then the function follows these steps for labeling the points:

step 0: If the argument poisson=TRUE, the target number of cases is generated randomly from a Poisson distribution with mean = round(n*prop,0), so that the average number of cases is round(n*prop,0); otherwise, the target number of cases is exactly round(n*prop,0).

step 1: Initially, one point from dat is selected randomly as a case. In the first round, this point is selected from the controls; in the subsequent rounds, it is selected from the cases if the argument init.from.cases=TRUE, and from the controls otherwise. Then the label case is assigned to the k NNs (among the controls) of this initial case with infection probabilities prob=c(prop+((1-prop)*rho)/(1:k)); see the description for the details of the parameters in prob.

step 2: Then this initial case and the cases among its k NNs (possibly all of them) from step 1 are removed from the data, and step 1 is applied to the remaining control points, where the initial point is from the cases or the controls based on the argument init.from.cases.

step 3: The procedure ends when the number of cases exceeds the target number from step 0; then the excess cases (other than the initial cases) are randomly selected and relabeled as controls, i.e. 0s, so that the number of cases is exactly the target number.

In the output, cases are labeled as 1 and controls as 0.
Note that the infection probabilities of the k NNs of each initial case increase with increasing rho, and the infection probability decreases for the farther neighbors among the k NNs (i.e. as the NN order increases from 1 to k).

See Ceyhan (2014) for more detail, where the type I non-RL pattern is case 1 of the non-RL patterns considered in Section 6, with the number of cases fixed as a parameter rather than being generated from a Poisson distribution and init.from.cases=FALSE.
Although the non-RL pattern is described for the case-control setting, it can be adapted to any two-class setting in which it is appropriate to treat one class as cases (or one class behaves like cases) and the other class as controls.
rnonRLI(dat, prop = 0.5, k, rho, poisson = FALSE, init.from.cases = TRUE)
dat |
A set of points the non-RL procedure is applied to obtain cases and controls randomly in the type I fashion (see the description). |
prop |
A real number between 0 and 1 (inclusive) representing the proportion of new cases (on the average)
infected by the initial cases, i.e., number of newly infected cases (in addition to the initial cases) is
Poisson with |
k |
An integer representing the number of NNs considered for each initial case, i.e., |
rho |
A parameter for labeling the |
poisson |
A logical argument (default is |
init.from.cases |
A logical argument (default is |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
|
dat.points |
The set of points non-RL procedure is applied to obtain cases and controls randomly in the type I fashion |
lab |
The labels of the points as 1 for cases and 0 for controls after the type I nonRL procedure is
applied to the data set, |
init.cases |
The initial cases in the data set, |
gen.points , ref.points
|
Both are |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit, ylimit |
The possible ranges of the x- and y-coordinates of the points |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
rnonRLII, rnonRLIII, rnonRLIV, and rnonRL
library(nnspat)

n<-40; #try also n<-20; n<-100;
#data generation
dat<-cbind(runif(n,0,1),runif(n,0,1))

prop<-.5; #try also .25, .75
rho<- .3
knn<-3 #try 2 or 5

Xdat<-rnonRLI(dat,prop,knn,rho,poisson=FALSE,init.from.cases=FALSE) #labeled data, try also poisson=TRUE or init.from.cases=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#normal original data
n<-40; #try also n<-20; n<-100;
#data generation
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))

prop<-.50; #try also .25, .75
rho<- .3
knn<-5 #try 2 or 3

Xdat<-rnonRLI(dat,prop,knn,rho,poisson=FALSE) #labeled data, try also poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns".

Given the set of n points, dat, in a region, this function assigns round(n*ult.prop,0) of them as cases, and the rest as controls, by first selecting round(n*init.prop,0) of them as cases initially, then selecting a contagious case and assigning the label case to the remaining points with infection probabilities inversely proportional to their position among the k NNs.

The initial and ultimate numbers of cases will be round(n*init.prop,0) and round(n*ult.prop,0) on the average if the argument poisson=TRUE (i.e., rpois(1,round(n*init.prop,0)) and rpois(1,round(n*ult.prop,0))); otherwise, they will be exactly equal to round(n*init.prop,0) and round(n*ult.prop,0).
More specifically, given the initial cases, one of the cases is selected as a contagious case, and then its k NNs (among the non-cases) are found. These k NN non-case points are then labeled as cases with infection probabilities prob equal to the values of rho*(1/(1:k))^pow at these points, where rho is a scaling parameter for the infection probabilities and pow is a parameter in the power adjusting the k NN dependence. We stop when the number of cases first exceeds the ultimate number of cases.
rho
has to be in for
prob
to be a vector
of probabilities,
and for a given rho
, pow
must be .
If rand.init=FALSE, the first entries are chosen as the initial cases in the data set, dat; otherwise, the initial cases are selected randomly among the data points.
Algorithmically, first all dat points are treated as non-cases (i.e. controls or healthy subjects). Then the function follows these steps for labeling the points:

step 0: If the argument poisson=TRUE, the ultimate number of cases is generated randomly from a Poisson distribution with mean = round(n*ult.prop,0), so that the average number of ultimate cases is round(n*ult.prop,0); otherwise, it is exactly round(n*ult.prop,0). Likewise, the initial number of cases is generated randomly from a Poisson distribution with mean = round(n*init.prop,0) if poisson=TRUE, so that the average number of initial cases is round(n*init.prop,0); otherwise, it is exactly round(n*init.prop,0).

step 1: Initially, that many points (i.e. the initial number of cases from step 0) from dat are selected as cases. The selection of the initial cases is determined by the argument rand.init (default=TRUE): if rand.init=TRUE, the initial cases are selected randomly from the data points, and if rand.init=FALSE, the first entries in the data set, dat, are selected as the cases.

step 2: Then a contagious case is selected among the cases, and its k control NNs are randomly labeled as cases with decreasing infection probabilities prob=rho*(1/(1:k))^pow. See the description for the details of the parameters in prob.

step 3: Step 2 is repeated until the number of cases exceeds the ultimate number of cases from step 0; then the excess cases (other than the initial cases) are randomly selected and relabeled as controls, i.e. 0s, so that the number of cases is exactly the ultimate number.
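The steps above can be sketched in base R as follows. This is a simplified illustration, not the code of rnonRLII: the function name nonRLII_sketch is hypothetical, only the case poisson=FALSE with randomly selected initial cases is covered, and the contagious case is assumed to be drawn uniformly from the current cases at every round.

```r
# Base-R sketch of the type II non-RL labeling (illustrative only; not nnspat's rnonRLII code).
# Assumes rho > 0, at least one initial case, and init.prop <= ult.prop.
nonRLII_sketch <- function(dat, k, rho, pow, init.prop, ult.prop) {
  n <- nrow(dat)
  n.init <- round(n * init.prop, 0)  # step 0 with poisson=FALSE
  n.ult <- round(n * ult.prop, 0)
  stopifnot(rho > 0, n.init >= 1, n.init <= n.ult)
  d <- as.matrix(dist(dat))
  lab <- integer(n)                  # all points start as controls (0)
  init <- sample.int(n, n.init)      # step 1: random initial cases
  lab[init] <- 1L
  prob <- rho * (1 / (1:k))^pow      # infection probabilities of the k NNs
  # step 2, repeated until the number of cases exceeds n.ult (or no control is left)
  while (sum(lab) <= n.ult && any(lab == 0L)) {
    cases <- which(lab == 1L)
    cont <- cases[sample.int(length(cases), 1)]  # contagious case
    ctrl <- which(lab == 0L)
    nn <- ctrl[order(d[cont, ctrl])][seq_len(min(k, length(ctrl)))]  # its control NNs
    infected <- runif(length(nn)) <= prob[seq_along(nn)]
    lab[nn[infected]] <- 1L
  }
  # step 3: relabel the excess cases (other than the initial cases) as controls
  excess <- sum(lab) - n.ult
  if (excess > 0) {
    cand <- setdiff(which(lab == 1L), init)
    lab[cand[sample.int(length(cand), excess)]] <- 0L
  }
  lab
}

set.seed(7)
dat <- cbind(runif(40), runif(40))
lab <- nonRLII_sketch(dat, k = 5, rho = 0.8, pow = 2, init.prop = 0.3, ult.prop = 0.5)
table(lab)
```

Because the loop stops only once the target is exceeded, the final relabeling leaves exactly round(n*ult.prop,0) cases.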
Note that the infection probabilities of the k NNs of each contagious case increase with increasing rho, and the probability of infection decreases as farther NNs of a contagious case are considered (i.e. as the NN order increases from 1 to k).

See Ceyhan (2014) for more detail, where the type II non-RL pattern is case 2 of the non-RL patterns considered in Section 6, with the number of cases fixed as a parameter rather than being generated from a Poisson distribution and pow=1.
Although the non-RL pattern is described for the case-control setting, it can be adapted for any two-class setting when it is appropriate to treat one of the classes as cases (or when one of the classes behaves like cases) and the other class as controls.
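The case counts of step 0 and the kNN infection probabilities of step 2 can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: the names n_init and n_ult are introduced here, and the NNs are assumed to be infected independently of each other.

```r
# Illustrative sketch (not the nnspat source): step 0 counts and the
# kNN infection probabilities prob = rho*(1/(1:k))^pow described above.
set.seed(123)
n <- 40; k <- 5; rho <- 0.8; pow <- 2
init.prop <- 0.3; ult.prop <- 0.5
poisson <- FALSE

# step 0: initial and ultimate numbers of cases (n_init, n_ult are hypothetical names)
n_init <- if (poisson) rpois(1, round(n * init.prop, 0)) else round(n * init.prop, 0)
n_ult  <- if (poisson) rpois(1, round(n * ult.prop, 0))  else round(n * ult.prop, 0)

# step 2: infection probabilities for the 1st, ..., kth NN of a contagious case
prob <- rho * (1 / (1:k))^pow

# assumption: each control NN is relabeled as a case independently with its probability
infected <- runif(k) < prob
prob
```

With poisson=FALSE the counts are deterministic, and the probabilities decrease from rho at the first NN toward 0 at further NNs.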
rnonRLII( dat, k, rho, pow, init.prop, ult.prop, rand.init = TRUE, poisson = FALSE )
dat |
A set of points the non-RL procedure is applied to obtain cases and controls randomly in the type II fashion (see the description). |
k |
An integer representing the number of NNs considered for each contagious case, i.e.,
|
rho |
A scaling parameter for the probabilities of labeling the points as cases (see the description). |
pow |
A parameter in the power adjusting the |
init.prop |
A real number between 0 and 1 representing the initial proportion of cases in the data set,
|
ult.prop |
A real number between 0 and 1 representing the ultimate proportion of cases in the data set,
|
rand.init |
A logical argument (default is |
poisson |
A logical argument (default is |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
Number of NNs, |
dat.points |
The set of points non-RL procedure is applied to obtain cases and controls randomly in the type II fashion |
lab |
The labels of the points as 1 for cases and 0 for controls after the type II nonRL procedure is
applied to the data set, |
init.cases |
The initial cases in the data set, |
cont.cases |
The contagious cases in the data set, |
gen.points , ref.points
|
Both are |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
rnonRLI
, rnonRLIII
, rnonRLIV
, and rnonRL
n<-40; #try also n<-20; n<-100;
#data generation
dat<-cbind(runif(n,0,1),runif(n,0,1))
rho<-.8
pow<-2
knn<-5 #try 2 or 3
ip<-.3 #initial proportion
up<-.5 #ultimate proportion
Xdat<-rnonRLII(dat,knn,rho,pow,ip,up,poisson=FALSE) #labeled data, try poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#normal original data
n<-40; #try also n<-20; n<-100;
#data generation
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))
rho<-0.8
pow<-2
knn<-5 #try 2 or 3
ip<-.3 #initial proportion
up<-.5 #ultimate proportion
Xdat<-rnonRLII(dat,knn,rho,pow,ip,up,poisson=FALSE) #labeled data, try poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns"
.
Given the set of points,
dat
, in a region, this function assigns round(n*prop,0)
of them as cases,
and the rest as controls with first selecting a point, , as a case and assigning the
label case to the remaining points with infection probabilities
where
is the
distance from
to
for
,
is the maximum of
values,
rho
is a scaling parameter for
the infection probabilities and pow
is a parameter in the power adjusting the distance dependence.
The number of cases will be round(n*prop,0) on average if the argument poisson=TRUE (i.e., it is generated as rpois(1,round(n*prop,0))), and exactly round(n*prop,0) otherwise.
We stop when we first exceed cases.
rho
has to be positive for prob
to be a vector
of probabilities,
and for a given rho
, pow
must be ,
also, when
pow
is given, rho
must be .
If
rand.init=TRUE
, initial case is selected randomly among the data points,
otherwise first entry is chosen as the case (or case is recorded as the first entry) in the data set, dat
.
Algorithmically, first all dat points are treated as non-cases (i.e. controls or healthy subjects). Then the function follows the steps below for labeling of the points:
step 0: The number of cases is generated randomly from a Poisson distribution with mean = round(n*prop,0), so that the average number of cases will be round(n*prop,0), if the argument poisson=TRUE; otherwise it is set to round(n*prop,0).
step 1: Initially, one point from dat is selected as a case. The selection of the initial case is determined by the argument rand.init (with default=TRUE): if rand.init=TRUE, the initial case is selected randomly from the data points, and if rand.init=FALSE, the first entry in the data set, dat, is selected as the case.
step 2: Then it assigns the label case to the remaining points with the infection probabilities prob; see the description for the details of the parameters in prob.
step 3: The procedure ends when the number of cases exceeds the number of cases from step 0; the excess cases, chosen randomly among the cases other than the initial contagious case, are then relabeled as controls, i.e. 0s, so that the number of cases is exactly the number from step 0.
In the output cases are labeled as 1 and controls as 0, and initial contagious case is marked with a red cross
in the plot of the pattern.
Note that the infection probabilities of the points decrease with their distances to the initial case and increase with increasing rho.
This function might take a long time for certain choices of the arguments. For example, if pow
is taken to be
too large, the infection probabilities would be too small, and case assignment will take a rather long time.
See Ceyhan (2014) for more detail where type III non-RL pattern is the
case 3 of non-RL pattern considered in Section 6 with is
fixed as a parameter rather than being generated from a Poisson distribution and
and pow
is represented as
.
Although the non-RL pattern is described for the case-control setting, it can be adapted for any two-class setting when it is appropriate to treat one of the classes as cases (or when one of the classes behaves like cases) and the other class as controls.
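Step 3 above (relabeling the excess cases as controls while protecting the initial case) can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation; n_target is a hypothetical name for the required number of cases, and the overshoot to 24 cases is simulated rather than produced by the actual labeling loop.

```r
# Illustrative sketch (not the nnspat source) of step 3: trim excess cases.
set.seed(1)
n <- 40
n_target <- 20            # hypothetical name for the required number of cases
lab <- rep(0, n)          # all points start as controls (0)
init <- 1                 # index of the initial case (first entry, as with rand.init=FALSE)
lab[init] <- 1

# simulate an overshoot: 23 further points were labeled as cases (24 cases in total)
lab[sample(setdiff(seq_len(n), init), 23)] <- 1

# relabel the excess cases as controls, never touching the initial case
excess <- sum(lab) - n_target
if (excess > 0) {
  candidates <- setdiff(which(lab == 1), init)
  lab[sample(candidates, excess)] <- 0
}
table(lab)
```

After the trimming, exactly n_target points carry the label 1 and the initial case is still among them.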
rnonRLIII(dat, prop, rho, pow, rand.init = TRUE, poisson = FALSE)
dat |
A set of points the non-RL procedure is applied to obtain cases and controls randomly in the type III fashion (see the description). |
prop |
A real number between 0 and 1 (inclusive) representing the proportion of new cases (on the average)
infected by the initial case, i.e., number of newly infected cases (in addition to the first case) is Poisson
with |
rho |
A scaling parameter for the probabilities of labeling the points as cases (see the description). |
pow |
A parameter in the power adjusting the distance dependence in the probabilities of labeling the points as cases (see the description). |
rand.init |
A logical argument (default is |
poisson |
A logical argument (default is |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
rho and pow, where |
dat.points |
The set of points non-RL procedure is applied to obtain cases and controls randomly in the type III fashion |
lab |
The labels of the points as 1 for cases and 0 for controls after the type III nonRL procedure is
applied to the data set, |
init.cases |
The initial case in the data set, |
cont.cases |
The contagious cases in the data set, |
gen.points , ref.points
|
Both are |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
rnonRLI
, rnonRLII
, rnonRLIV
, and rnonRL
n<-40; #try also n<-20; n<-100;
prop<- .5; #try also .25, .75
#data generation
dat<-cbind(runif(n,0,1),runif(n,0,1))
rho<-.8
pow<-2
Xdat<-rnonRLIII(dat,prop,rho,pow,poisson=FALSE) #labeled data, try also poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#normal original data
n<-40; #try also n<-20; n<-100;
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))
prop<- .5; #try also .25, .75
rho<-.8
pow<-2
Xdat<-rnonRLIII(dat,prop,rho,pow,poisson=FALSE) #labeled data, try also poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns"
.
Given the set of points,
dat
, in a region, this function assigns round(n*ult.prop,0)
of them as cases,
and the rest as controls with first selecting round(n*init.prop,0)
as cases initially and assigning the
label case to the remaining points with infection probabilities equal to the scaled bivariate normal density values
at those points.
The initial and ultimate numbers of cases will be round(n*init.prop,0) and round(n*ult.prop,0) on average if the argument poisson=TRUE (i.e., they are generated as rpois(1,round(n*init.prop,0)) and rpois(1,round(n*ult.prop,0))); otherwise they will be exactly equal to round(n*init.prop,0) and round(n*ult.prop,0), respectively.
More specifically, let $z_1,\ldots,z_{n_0}$ be the initial cases and, for $j=1,\ldots,n_0$, let $f_j(x)$ be the value at the point $x$ of the pdf of the bivariate normal (BVN) distribution with mean $z_j$, standard deviations of the first and second components being $s_1$ and $s_2$ (denoted as s1 and s2 as arguments of the function) and correlation between them being $\rho$ (denoted as rho as an argument of the function), i.e., the covariance matrix is $\Sigma$ where $\Sigma_{11}=s_1^2$, $\Sigma_{22}=s_2^2$, $\Sigma_{12}=\Sigma_{21}=\rho s_1 s_2$.
Add these pdf values as $p_i=\sum_{j=1}^{n_0} f_j(x_i)$ for each point $x_i$ and find $p_{\max}=\max_i p_i$.
Then label the points (other than the initial cases) as cases with infection probabilities prob equal to the value of $p_i/p_{\max}$ at these points.
We stop when we first exceed the ultimate number of cases.
$\rho$ has to be in (-1,1) for prob to be a valid probability, and $s_1$ and $s_2$ must be positive (actually these are required for the BVN density to be nondegenerately defined).
If rand.init=TRUE, the initial cases are selected randomly among the data points; otherwise, the first entries in the data set, dat, are chosen as the initial cases.
Algorithmically, first all dat points are treated as non-cases (i.e. controls or healthy subjects). Then the function follows the steps below for labeling of the points:
step 0: The ultimate number of cases is generated randomly from a Poisson distribution with mean = round(n*ult.prop,0), so that the average number of ultimate cases will be round(n*ult.prop,0), if the argument poisson=TRUE; otherwise it is set to round(n*ult.prop,0). Likewise, the initial number of cases is generated randomly from a Poisson distribution with mean = round(n*init.prop,0), so that the average number of initial cases will be round(n*init.prop,0), if poisson=TRUE; otherwise it is set to round(n*init.prop,0).
step 1: Initially, that many points from dat are selected as the initial cases. The selection of the initial cases is determined by the argument rand.init (with default=TRUE): if rand.init=TRUE, the initial cases are selected randomly from the data points, and if rand.init=FALSE, the first entries in the data set, dat, are selected as the cases.
step 2: Then it assigns the label case to the remaining points with infection probabilities equal to the sum of the BVN densities scaled by the maximum of such sums. See the description for the details of the parameters in prob.
step 3: The procedure ends when the number of cases exceeds the ultimate number of cases from step 0; the excess cases, chosen randomly among the cases other than the initial cases, are then relabeled as controls, i.e. 0s, so that the number of cases is exactly the ultimate number of cases.
In the output cases are labeled as 1 and controls as 0, and initial contagious case is marked with a red cross in the plot of the pattern.
See Ceyhan (2014) for more detail where type IV non-RL pattern is the
case 4 of non-RL pattern considered in Section 6 with and
are
fixed as parameters and
rho
is represented as and
in the article.
Although the non-RL pattern is described for the case-control setting, it can be adapted for any two-class setting when it is appropriate to treat one of the classes as cases (or when one of the classes behaves like cases) and the other class as controls.
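The infection probabilities of the type IV pattern (sums of BVN densities centered at the initial cases, scaled by the maximum of such sums) can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: bvn_density is a hypothetical helper written here, and taking the maximum over the non-initial points only is an assumption.

```r
# Illustrative sketch (not the nnspat source): scaled sums of BVN densities.
# bvn_density is a hypothetical helper; x is an n x 2 matrix, mean a length-2 vector.
bvn_density <- function(x, mean, s1, s2, rho) {
  dx <- (x[, 1] - mean[1]) / s1
  dy <- (x[, 2] - mean[2]) / s2
  q <- (dx^2 - 2 * rho * dx * dy + dy^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

set.seed(1)
n <- 40
dat <- cbind(runif(n), runif(n))
init <- 1:4                        # indices of the initial cases
s1 <- 0.4; s2 <- 0.4; rho <- 0.1

# sum, over the initial cases, of the BVN densities evaluated at every point
dens_sum <- rowSums(sapply(init, function(j) bvn_density(dat, dat[j, ], s1, s2, rho)))

# assumption: scale by the maximum over the points other than the initial cases
others <- setdiff(seq_len(n), init)
prob <- dens_sum[others] / max(dens_sum[others])
range(prob)
```

Points close to several initial cases receive probabilities near 1, while points far from all initial cases receive small probabilities.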
rnonRLIV( dat, init.prop, ult.prop, s1, s2, rho, rand.init = TRUE, poisson = FALSE )
dat |
A set of points the non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion (see the description). |
init.prop |
A real number between 0 and 1 representing the initial proportion of cases in the data set,
|
ult.prop |
A real number between 0 and 1 representing the ultimate proportion of cases in the data set,
|
s1 , s2
|
Positive real numbers representing the standard deviations of the first and second components of the bivariate normal distribution. |
rho |
A real number between -1 and 1 representing the correlation between the first and second components of the bivariate normal distribution. |
rand.init |
A logical argument (default is |
poisson |
A logical argument (default is |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
initial and ultimate proportion of cases after the non-RL procedure is applied to the data,
|
dat.points |
The set of points non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion |
lab |
The labels of the points as 1 for cases and 0 for controls after the type IV nonRL procedure is
applied to the data set, |
init.cases |
The initial cases in the data set, |
gen.points , ref.points
|
Both are |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
rnonRLI
, rnonRLII
, rnonRLIII
, and rnonRL
n<-40; #try also n<-20; n<-100;
ult<-.5; #try also .25, .75
#data generation
dat<-cbind(runif(n,0,1),runif(n,0,1))
int<-.1
s1<-s2<-.4
rho<- .1
Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#normal original data
n<-40; #try also n<-20; n<-100;
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))
ult<-.5; #try also .25, .75
int<-.1
s1<-s2<-.4
rho<-0.1
Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "Clusters"
.
Generates n 2D points with k clusters whose centers are d units away from the origin, where the angle between the rays joining successive centers to the origin is $2\pi/k$ and about $n/k$ points belong to each cluster.
If distribution="uniform", the points are uniformly generated in their square supports with unit edge lengths, centered at the $k$ cluster centers.
If distribution="bvnormal", the points are generated from the bivariate normal distribution with means equal to the centers of the above squares and covariance matrix $\sigma^2 I_2$, where $\sigma = d\sqrt{2(1-\cos(2\pi/k))}/3$ by default (the argument sd) and $I_2$ is the $2 \times 2$ identity matrix.
Notice that the clusters are more separated, i.e., the generated data indicate clearer clusters, as $d$ increases in absolute value in either direction, with $d=0$ indicating one cluster in the data. For a fixed $d$, when distribution="bvnormal", the clustering gets stronger if the variance of each component, $\sigma^2$, gets smaller, and the clustering gets weaker as the variance of each component gets larger, where the default is sd = d*sqrt(2*(1-cos(2*pi/k)))/3.
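The geometry of the cluster centers can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation; placing the first center at angle 2*pi/k is an assumption, since only the angle between successive centers matters here. The sketch also shows that the default sd in the usage equals one third of the distance between successive centers.

```r
# Illustrative sketch (not the nnspat source): k centers at distance d from the origin.
k <- 3; d <- 1.5
theta <- 2 * pi * (1:k) / k          # assumption: first center at angle 2*pi/k
centers <- cbind(d * cos(theta), d * sin(theta))

# distance between two successive centers (a chord of the circle with radius d)
chord <- sqrt(sum((centers[1, ] - centers[2, ])^2))

# default standard deviation from the usage of rrot.clust
sd_default <- d * sqrt(2 * (1 - cos(2 * pi / k))) / 3
c(chord = chord, sd_default = sd_default)
```

Tying sd to the chord length keeps neighboring bivariate normal clusters about three standard deviations apart by default.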
rrot.clust( n, k, d, sd = d * sqrt(2 * (1 - cos(2 * pi/k)))/3, distribution = c("uniform", "bvnormal") )
n |
A positive integer representing the number of points to be generated from all the clusters |
k |
A positive integer representing the number of clusters to be generated |
d |
Radial shift indicating the level of clustering in the data. Larger absolute values in either direction (i.e. positive or negative) would yield stronger clustering. |
sd |
The standard deviation of the components of the bivariate normal distribution with default
|
distribution |
The argument determining the distribution of each cluster. Takes on values |
A list
with the elements
type |
The type of the clustering pattern |
parameters |
The number of clusters, |
gen.points |
The output set of generated points from the |
desc.pat |
Description of the clustering pattern |
mtitle |
The |
num.points |
The number of generated points. |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
n<-100; #try also n<-50; n<-1000;
d<- 1.5 #try also -1, 1, 1.5, 2
k<-3 #try also 5
#data generation
Xdat<-rrot.clust(n,k,d)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#data generation (bvnormal)
n<-100; #try also n<-50; n<-1000;
d<- 1.5 #try also -1, 1, 1.5, 2
k<-3 #try also 5
Xdat<-rrot.clust(n,k,d,distr="bvnormal") #also try Xdat<-rrot.clust(n,k,d,sd=.5,distr="bvnormal")
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns"
.
Generates $n_i$ 2D points from class $i$ with parameters $r_i$ for $i=1,2$.
The generated points are from two different classes which are segregated from each other.
The pattern generation starts with the initial points X1.init and X2.init (with default=NULL for both).
If both X1.init=NULL and X2.init=NULL, both X1.init and X2.init are generated uniformly in the unit square.
If only X1.init=NULL, X1.init is the sum of a point uniformly generated in the unit square and X2.init, and if only X2.init=NULL, X2.init is the sum of a point uniformly generated in the unit square and X1.init.
After the initial points from each class are available, points from class $j$ are generated as Xj[i,]<-Xj[(i-1),]+ru*c(cos(tu),sin(tu)) where ru<-runif(1,0,rj) and tu<-runif(1,0,2*pi) for $i=2,\ldots,n_j$ with Xj[1,]=Xj.init for $j=1,2$.
That is, at each step the new point in class $j$ is generated within a circle centered at the previous point with radius equal to $r_j$ (uniform in the polar coordinates).
Note that the level of segregation is stronger if the initial points are further apart, and the level of segregation increases as the radius values get smaller.
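The within-class generation step quoted above can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation; gen_class is a hypothetical helper that applies the update Xj[i,]<-Xj[(i-1),]+ru*c(cos(tu),sin(tu)) for a single class.

```r
# Illustrative sketch (not the nnspat source) of the within-class random walk.
gen_class <- function(nj, rj, init) {     # gen_class is a hypothetical helper
  Xj <- matrix(NA_real_, nrow = nj, ncol = 2)
  Xj[1, ] <- init
  if (nj >= 2) {
    for (i in 2:nj) {
      ru <- runif(1, 0, rj)               # radial step, uniform in (0, rj)
      tu <- runif(1, 0, 2 * pi)           # direction, uniform in (0, 2*pi)
      Xj[i, ] <- Xj[i - 1, ] + ru * c(cos(tu), sin(tu))
    }
  }
  Xj
}

set.seed(1)
X1 <- gen_class(20, 0.3, c(3, 2))         # class 1: n1=20, r1=.3
X2 <- gen_class(20, 0.2, c(4, 2))         # class 2: n2=20, r2=.2

# lengths of the successive steps in class 1; none can exceed r1
steps1 <- sqrt(rowSums((X1[-1, ] - X1[-20, ])^2))
max(steps1)
```

Because each new point stays within distance rj of the previous one, smaller radii keep each class closer to its own initial point, which is why segregation strengthens as the radii shrink.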
rseg(n1, n2, r1, r2, X1.init = NULL, X2.init = NULL)
n1 , n2
|
Positive integers representing the number of class 1 and class 2 (i.e. |
r1 , r2
|
Positive real numbers representing the radius of attraction within class, i.e. radius of the circle center and generated points are from the same class. |
X1.init , X2.init
|
2D points representing the initial points for the segregated classes, default= |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
Radial (i.e. circular) within class radii of segregation, |
lab |
The class labels of the generated points, it is 1 class 1 or |
init.cases |
The initial points for class 1 and class 2, one initial point for each class. |
gen.points |
The output set of generated points (i.e. class 1 and class 2 points) segregated from each other. |
ref.points |
The input set of reference points, it is |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
n1<-20; #try also n1<-10; n1<-100;
n2<-20; #try also n1<-40; n2<-50
r1<-.3; r2<-.2
#data generation
Xdat<-rseg(n1,n2,r1,r2) #labeled data
Xdat
table(Xdat$lab)
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#with one initial point
X1init<-c(3,2)
Xdat<-rseg(n1,n2,r1,r2,X1.init=X1init)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#with two initial points
X1init<-c(3,2)
X2init<-c(4,2)
Xdat<-rseg(n1,n2,r1,r2,X1init,X2init)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns"
.
Generates $n_1$ 2D points from class 1 and $n_2$ 2D points from class 2 (denoted as n1 and n2 as arguments) in such a way that self-reflexive pairs are more frequent than expected under CSR independence.
If distribution="uniform"
, the points from class 1, say are generated as follows:
for
for
where
,
and for
,
where
and
are iid
.
Similarly, the points from class 2, say
are generated as follows:
for
for
where
,
and for
,
where
and
.
This version is the case IV in the article (Ceyhan (2018)).
If distribution="bvnormal"
, the points from class 1, say are generated as follows:
where
is the center of mass of
and I_2x is a
matrix with diagonals
equal to
with
and off-diagonals are 0 for
where
,
and for
,
where
with
being the
matrix with diagonals
and 0 off-diagonals,
and
are iid
.
Similarly, the points from class 2, say
are generated as follows:
where
is the center of mass of
and I_2y is a
matrix with diagonals
equal to
with
and off-diagonals are 0 for
where
,
and for
,
where
with
being the
matrix with diagonals
and 0 off-diagonals,
and
.
Notice that the classes will be segregated if the supports of the two classes are separated, with more separation implying stronger segregation. Furthermore, $r_0$ (denoted as r0 as an argument) determines the level of self-reflexivity or self correspondence, i.e. smaller $r_0$ implies a higher level of self correspondence and vice versa for higher $r_0$.
See also (Ceyhan (2018)) and the references therein.
rself.ref(n1, n2, c1r, c2r, r0, distribution = c("uniform", "bvnormal"))
n1 , n2
|
Positive integers representing the numbers of points to be generated from the two classes |
c1r , c2r
|
Ranges of the squares which constitute the supports of the two classes |
r0 |
The radius of attraction which determines the level of self-reflexivity (or self correspondence) in both the uniform and bvnormal distributions for the two classes |
distribution |
The argument determining the distribution of each class. Takes on values |
A list
with the elements
pat.type |
|
type |
The type of the spatial pattern |
parameters |
The radius of attraction |
lab |
The class labels of the generated points, it is 1 class 1 or X1 points and
2 for class 2 or |
init.cases |
The initial points for class 1 and class 2, one initial point for each class, marked with a cross in the plot. |
gen.points |
The output set of generated points from the self-correspondence pattern. |
ref.points |
The input set of reference points, it is |
desc.pat |
Description of the species correspondence pattern |
mtitle |
The |
num.points |
The number of generated points. |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
n1<-50; #try also n1<-50; n1<-1000;
n2<-50; #try also n2<-50; n2<-1000;
c1r<-c(0,1) #try also c(0,5/6), c(0,3/4), c(0,2/3)
c2r<-c(0,1) #try also c(1/6,1), c(1/4,1), c(1/3,1)
r0<-1/9 #try also 1/7, 1/8
#data generation
Xdat<-rself.ref(n1,n2,c1r,c2r,r0)
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#data generation (bvnormal)
Xdat<-rself.ref(n1,n2,c1r,c2r,r0,distr="bvnormal")
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
An object of class "SpatPatterns"
.
Generates n 2D points uniformly in the circle with center=cent and radius=rad using the rejection sampling approach (i.e., the function generates points in the smallest square containing the circle, keeping only the points inside the circle until n points are generated).
The defaults are cent=c(0,0) and rad=1.
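The rejection sampling approach described above can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation; runif_circ_sketch is a hypothetical name, and it returns only the matrix of points rather than the list returned by runif.circ.

```r
# Illustrative sketch (not the nnspat source) of rejection sampling in a circle.
runif_circ_sketch <- function(n, cent = c(0, 0), rad = 1) {
  pts <- matrix(numeric(0), ncol = 2)
  while (nrow(pts) < n) {
    # candidate uniform in the smallest square containing the circle
    cand <- cent + runif(2, -rad, rad)
    # keep the candidate only if it falls inside the circle
    if (sum((cand - cent)^2) <= rad^2) pts <- rbind(pts, cand)
  }
  unname(pts)
}

set.seed(1)
P <- runif_circ_sketch(20, cent = c(1, 2), rad = 0.1)
dim(P)
```

A candidate is accepted with probability pi/4 (the ratio of the circle's area to the square's), so generating one point at a time is simple but not the fastest option for large n.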
runif.circ(n, cent = c(0, 0), rad = 1)
n |
A positive integer representing the number of points to be generated uniformly in the circle |
cent |
A 2D point representing the center of the circle, with default= |
rad |
A positive real number representing the radius of the circle. |
A list
with the elements
pat.type |
|
type |
The type of the point pattern |
parameters |
center of the circle, |
lab |
The class labels of the generated points, |
init.cases |
The initial points, |
gen.points |
The output set of generated points uniform in the circle. |
ref.points |
The input set of reference points, it is |
desc.pat |
Description of the point pattern |
mtitle |
The |
num.points |
The number of generated points. |
xlimit , ylimit
|
The possible ranges of the |
Elvan Ceyhan
n<-20 #or try sample(1:20,1); #try also 10, 100, or 1000;
r<-.1; #try also r<-.3 or .5
cent<-c(1,2)
#data generation
Xdat<-runif.circ(n,cent,r) #generated data
Xdat
summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)
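Returns Dixon's segregation indices in matrix form based on entries of the NNCT, ct.
Segregation index for cell $(i,j)$ is defined as $\log\left(\frac{N_{ii}(n-n_i)}{(n_i-N_{ii})(n_i-1)}\right)$ if $i=j$ and as $\log\left(\frac{N_{ij}(n-n_j-1)}{(n_i-N_{ij})\,n_j}\right)$ if $i \ne j$, where $N_{ij}$ is the $(i,j)$ entry of ct, $n_i$ is the size of class $i$ and $n$ is the total sample size.
See (Dixon (2002); Ceyhan (2014)).
The argument inf.corr is a logical argument (default=FALSE) to avoid $\pm\infty$ for the segregation indices. If TRUE, the indices are modified so that they are finite, and if FALSE, the above definition is used.
(See Ceyhan (2014) for more detail).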
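The uncorrected indices (the inf.corr=FALSE case) can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: dixon_seg_index is a hypothetical helper, and it assumes the form of Dixon's (2002) index with row sums of the NNCT as class sizes, namely log(N_ii(n-n_i)/((n_i-N_ii)(n_i-1))) on the diagonal and log(N_ij(n-n_j-1)/((n_i-N_ij)n_j)) off the diagonal.

```r
# Illustrative sketch (not the nnspat source) of Dixon's segregation indices.
dixon_seg_index <- function(ct) {          # dixon_seg_index is a hypothetical helper
  ni <- rowSums(ct)                        # class sizes (row sums of an NNCT)
  n <- sum(ct)                             # total sample size
  k <- nrow(ct)
  S <- matrix(NA_real_, k, k, dimnames = dimnames(ct))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) {
        S[i, j] <- log(ct[i, i] * (n - ni[i]) / ((ni[i] - ct[i, i]) * (ni[i] - 1)))
      } else {
        S[i, j] <- log(ct[i, j] * (n - ni[j] - 1) / ((ni[i] - ct[i, j]) * ni[j]))
      }
    }
  }
  S
}

ct <- matrix(c(6, 4, 3, 7), nrow = 2, byrow = TRUE)
S <- dixon_seg_index(ct)
S
```

A zero cell in ct makes the corresponding index -Inf under this definition, which is the situation the inf.corr argument is meant to handle.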
seg.ind(ct, inf.corr = FALSE)
ct |
A contingency table, in particular an NNCT |
inf.corr |
A logical argument (default= |
Returns a matrix
of segregation indices which is of the same dimension as ct
.
Elvan Ceyhan
Pseg.coeff
, seg.coeff
, Zseg.ind
and Zseg.ind.ct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
seg.ind(ct)
seg.ind(ct,inf.corr = TRUE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
seg.ind(ct)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
seg.ind(ct)
seg.ind(ct,inf.corr = TRUE)

ct<-matrix(c(0,10,5,5),ncol=2)
seg.ind(ct)
seg.ind(ct,inf.corr = TRUE)
This function estimates the skewness of Cuzick and Edwards $T_k$ test statistic under the RL hypothesis.
Skewness of a random variable $Y$ is defined as $E(Y-\mu)^3/(E(Y-\mu)^2)^{1.5}$ where $\mu=E(Y)$.
Skewness is used for Tango's correction to Cuzick and Edwards kNN test statistic, $T_k$.
Tango's correction is a chi-square approximation, and its degrees of freedom is estimated using the skewness estimate (see page 121 of Tango (2007)).
The argument, $n_1$, is the number of cases (denoted as n1 as an argument) and k is the number of NNs considered in the $T_k$ test statistic.
The argument a is the matrix that is the output of the function aij.mat.
However, inside the function we symmetrize the matrix a as b <- (a+a^t)/2, to facilitate the formulation.
The number of cases is denoted as $n_1$ and the number of controls as $n_0$ in this function to match the case-control class labeling, which is just the reverse of the labeling in Cuzick and Edwards (1990).
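The symmetrization step and the moment-based notion of skewness can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: the matrix a below is a made-up 0/1 matrix rather than the output of aij.mat, and skew is a hypothetical helper computing the sample analogue of E(Y-mu)^3/(E(Y-mu)^2)^1.5.

```r
# Illustrative sketch (not the nnspat source): symmetrizing a matrix and
# computing a moment-based sample skewness.
set.seed(1)
a <- matrix(sample(0:1, 25, replace = TRUE), 5, 5)   # made-up stand-in for aij.mat output
diag(a) <- 0

# in R the transpose is t(a); the a^t in the text is mathematical notation
b <- (a + t(a)) / 2
isSymmetric(b)

skew <- function(y) {                                # hypothetical helper
  m <- mean(y)
  mean((y - m)^3) / mean((y - m)^2)^1.5
}
skew(c(1, 2, 3))      # a symmetric sample
skew(c(1, 1, 1, 10))  # a right-skewed sample
```

Symmetrizing leaves the total of the entries unchanged, so sums over all pairs computed from b agree with those computed from a.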
SkewTk(n1, k, a)
n1 |
Number of cases |
k |
Integer specifying the number of NNs (of subject |
a |
|
The skewness of Cuzick and Edwards test statistic for disease clustering
Elvan Ceyhan
Cuzick J, Edwards R (1990).
“Spatial clustering for inhomogeneous populations (with discussion).”
Journal of the Royal Statistical Society, Series B, 52, 73-104.
Tango T (2007).
“A class of multiplicity adjusted tests for spatial clustering based on case-control point data.”
Biometrics, 63, 119-127.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
n1<-sum(cls==1)
k<-sample(1:5,1) # try also 3, 5, sample(1:5,1)
k
a<-aij.mat(Y,k)
SkewTk(n1,k,a)
Returns the following information about the Clusters object: the call of the function defining the object, the type of the pattern, parameters of the pattern, study window, some sample points from the generated pattern, reference points (if any for the bivariate pattern), and number of points for each class.
## S3 method for class 'Clusters' summary(object, ...)
object |
Object of class |
... |
Additional parameters for |
The call of the object of class 'Clusters', the type of the pattern, parameters of the pattern, study window, some sample points from the generated pattern, reference points (if any for the bivariate pattern), and number of points for each class.
#TBF
Returns the following information about the SpatPatterns object: the call of the function defining the object, the type of the pattern, parameters of the pattern, study window, some sample points from the generated pattern, reference points (if any for the bivariate pattern), and number of points for each class.
## S3 method for class 'SpatPatterns' summary(object, ...)
object |
Object of class |
... |
Additional parameters for |
The call of the object of class 'SpatPatterns', the type of the pattern, parameters of the pattern, study window, some sample points from the generated pattern, reference points (if any for the bivariate pattern), and number of points for each class.
#TBF
Locations and species classification of trees in a plot in the Savannah River, SC, USA.
Locations are given in meters, rounded to the nearest 0.1 m.
The data come from a one-hectare (200-by-50 m) plot in the Savannah River Site.
The 734 mapped stems include 156 Carolina ashes (Fraxinus caroliniana),
215 water tupelos (Nyssa aquatica), 205 swamp tupelos (Nyssa sylvatica), 98 bald cypresses (Taxodium distichum)
and 60 stems from 8 additional species (labeled as Others (OT)).
The plots were set up by Bill Good, and their spatial patterns are described in Good and Whipple (1982).
The plots have been maintained and resampled by Rebecca Sharitz and her colleagues of the Savannah River
Ecology Laboratory. The data and some of its description are borrowed from the swamp data entry in the dixon
package in the CRAN repository.
See also (Good and Whipple (1982); Jones et al. (1994); Dixon (2002)).
data(swamptrees)
A data frame with 734 rows and 4 variables
Text describing the variable (i.e., column) names in the data set.
x,y: x and y (i.e., Cartesian) coordinates of the trees
live: a categorical variable that indicates the tree is alive (labeled as 1) or dead (labeled as 0)
sp: species label of the trees:
FX: Carolina ash (Fraxinus caroliniana)
NS: Swamp tupelo (Nyssa sylvatica)
NX: Water tupelo (Nyssa aquatica)
TD: Bald cypress (Taxodium distichum)
OT: Other species
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
Good BJ, Whipple SA (1982).
“Tree spatial patterns: South Carolina bottomland and swamp forests.”
Bulletin of the Torrey Botanical Club, 109(4), 529-536.
Jones RH, Sharitz RR, James SM, Dixon PM (1994).
“Tree population dynamics in seven South Carolina mixed-species forests.”
Bulletin of the Torrey Botanical Club, 121(4), 360-368.
data(swamptrees)
plot(swamptrees$x,swamptrees$y, col=as.numeric(swamptrees$sp),pch=19,
xlab='',ylab='',main='Swamp Trees')
This function computes the value of Cuzick & Edwards $T_{comb}$ test statistic in disease clustering, where $T_{comb}$ is a linear combination of some $T_k$ tests.
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
The argument klist is the vector of integers specifying the indices of the $T_k$ values used in obtaining $T_{comb}$.
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE) in the computations.
The logical argument asy.cov (default=FALSE) is for using the asymptotic covariance or the exact (i.e. finite sample) covariance for the vector of $T_k$ values used in Tcomb in the standardization of $T_{comb}$. If asy.cov=TRUE, the asymptotic covariance is used, otherwise the exact covariance is used.
See page 87 of (Cuzick and Edwards (1990)) for more details.
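Tcomb( dat, cc.lab, klist, case.lab = NULL, nonzero.mat = TRUE, asy.cov = FALSE, ... )
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab |
Case-control labels, 1 for case, 0 for control |
klist |
Vector of integers specifying the indices of the $T_k$ values used in obtaining $T_{comb}$. |
case.lab |
The label used for cases in cc.lab, default is NULL. |
nonzero.mat |
A logical argument (default is TRUE) to determine whether the $A$ matrix (if FALSE) or just the matrix of nonzero locations in the $A$ matrix (if TRUE) is used in the computations. |
asy.cov |
A logical argument (default is FALSE) to determine whether the asymptotic covariance (if TRUE) or the exact covariance (if FALSE) of the $T_k$ values is used. |
... |
are for further arguments, such as method and p, passed to the dist function. |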
Returns the value of the $T_{comb}$ test statistic
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1) #try also n<-50, 100
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
n1<-sum(cls==1)
kl<-sample(1:5,3) #try also sample(1:5,2)
kl
Tcomb(Y,cls,kl)
Tcomb(Y,cls,kl,method="max")
Tcomb(Y,cls+1,kl,case.lab=2)
Tcomb(Y,cls,kl,nonzero.mat = FALSE)
Tcomb(Y,cls,kl,asy.cov = TRUE)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Tcomb(Y,fcls,kl,case.lab="a")
$T$ Contingency Table (TCT)
Returns the $T$ contingency table (TCT), which is a matrix of the same dimension as ct, whose entries are the values of the Types I-IV cell-specific test statistics, $T^{I}_{ij}$ through $T^{IV}_{ij}$.
The row and column names are inherited from ct. The type argument specifies the type of the cell-specific test among the types I-IV tests.
See also (Ceyhan (2017)) and the references therein.
tct(ct, type = "III")
ct |
A nearest neighbor contingency table |
type |
The type of the cell-specific test, default="III". Takes on values "I"-"IV" (or equivalently 1-4, respectively). |
A matrix of the values of Type I-IV cell-specific tests
Elvan Ceyhan
Ceyhan E (2017). “Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.” Journal of the Korean Statistical Society, 46(2), 219-245.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
type.lab<-c("I","II","III","IV")
for (i in 1:4)
{ print(paste("T_ij values for cell specific tests for type",type.lab[i]))
print(tct(ct,i))
}
tct(ct,"II")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
tct(ct,2)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
tct(ct,2)
ct<-matrix(c(0,10,5,5),ncol=2)
tct(ct,2)
Tocher's randomized correction to the exact $p$-value
Tocher's modification is used for Fisher's exact test on contingency tables, making it less conservative by including the probability of the current table based on a randomized test (Tocher (1950)).
It is applied when the table-inclusive version of the $p$-value, $p^>_{inc}$, is larger, but the table-exclusive version, $p^>_{exc}$, is less than the level of the test, $\alpha$. In that case a random number, $U$, is generated from the uniform distribution in $(0,1)$, and if $U \le (\alpha-p^>_{exc})/(p^>_{inc}-p^>_{exc})$, $p^>_{exc}$ is used, otherwise $p^>_{inc}$ is used as the $p$-value.
Table-inclusive and exclusive $p$-values are defined as follows.
Let the probability of the contingency table itself be $p_t=f(n_{11}|n_1,n_2,c_1;\theta)$ where $\theta=(\pi_{11}\pi_{22})/(\pi_{12}\pi_{21})$ is the odds ratio under the null hypothesis (e.g. $\theta=1$ under independence) and $f$ is the probability mass function of the hypergeometric distribution.
In testing the one-sided alternative $H_o:\theta=1$ versus $H_a:\theta>1$, let $p=\sum_S f(t|n_1,n_2,c_1;\theta=1)$, then with $S=\{t: t \geq n_{11}\}$, we get the table-inclusive version which is denoted as $p^>_{inc}$ and with $S=\{t: t> n_{11}\}$, we get the table-exclusive version, denoted as $p^>_{exc}$.
See (Ceyhan (2010)) for more details.
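The randomization rule described above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the name tocher_sketch, the explicit alpha argument, and the u argument (which lets the uniform draw be fixed for reproducibility) are assumptions introduced here, and the threshold $(\alpha-p_{exc})/(p_{inc}-p_{exc})$ is the standard choice that gives the randomized test size $\alpha$.

```r
# Sketch of Tocher's randomized correction (hypothetical helper, not nnspat code).
# ptable: probability of the observed table under the null hypothesis
# pval:   table-inclusive p-value
# alpha:  level of the test (assumed argument)
# u:      uniform(0,1) draw; pass a fixed value to make the result reproducible
tocher_sketch <- function(ptable, pval, alpha = 0.05, u = runif(1)) {
  p_inc <- pval
  p_exc <- pval - ptable            # table-exclusive p-value
  if (p_exc < alpha && p_inc >= alpha) {
    # randomize between the two versions so that the test has size alpha
    if (u <= (alpha - p_exc) / (p_inc - p_exc)) p_exc else p_inc
  } else {
    p_inc                           # the two versions agree on the decision
  }
}

tocher_sketch(0.03, 0.06, u = 0.5)  # draw below the threshold 2/3: table-exclusive value
tocher_sketch(0.03, 0.06, u = 0.9)  # draw above the threshold: table-inclusive value
tocher_sketch(0.03, 0.20)           # both versions exceed alpha: pval is returned unchanged
```

Passing u explicitly is only a convenience for testing; in ordinary use the draw would be left random.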
tocher.cor(ptable, pval)
ptable |
Probability of the contingency table under the null hypothesis using the hypergeometric distribution for Fisher's exact test. |
pval |
Table-inclusive $p$-value |
A modified $p$-value based on Tocher's randomized correction.
Elvan Ceyhan
Ceyhan E (2010).
“Exact Inference for Testing Spatial Patterns by Nearest Neighbor Contingency Tables.”
Journal of Probability and Statistical Science, 8(1), 45-68.
Tocher KD (1950).
“Extension of the Neyman-Pearson theory of tests to discontinuous variates.”
Biometrika, 37, 130-144.
prob.nnct, exact.pval1s, and exact.pval2s
ptab<-.03
pval<-.06
tocher.cor(ptab,pval)
$T$ value in NN structure
Returns the $T$ value, which is the number of triplets $(z_i, z_j, z_k)$ with "$NN(z_i)=NN(z_j)=z_k$ and $NN(z_k)=z_j$" where $NN(\cdot)$ is the nearest neighbor function.
Note that in the NN digraph, $T+R$ is the sum of the indegrees of the points in the reflexive pairs.
This quantity (together with $Q$ and $R$) is used in computing the variances and covariances of the entries of the reflexivity contingency table. See (Ceyhan and Bahadir (2017)) for further details.
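To make the definition concrete, the sketch below counts $R$ and $T$ directly from a vector of nearest-neighbor indices in base R. The helper name and the nn input format are assumptions made for illustration (the package works with the incidence matrix $W$ instead), and ties are not handled.

```r
# nn[i] is the index of the (unique) nearest neighbor of point i.
# Hypothetical helper for illustration only.
count_R_T <- function(nn) {
  n <- length(nn)
  reflexive <- nn[nn] == seq_len(n)     # point i is in a reflexive pair: NN(NN(i)) = i
  R <- sum(reflexive)                   # number of reflexive NNs
  indeg <- tabulate(nn, nbins = n)      # indegrees in the NN digraph
  Tv <- sum(indeg[reflexive]) - R       # uses: T + R = sum of indegrees of reflexive points
  c(R = R, T = Tv)
}

# Points on a line at 0, 1 and 3: NN(1)=2, NN(2)=1, NN(3)=2
count_R_T(c(2, 1, 2))                   # R = 2, T = 1
```

In this toy case the only triplet is (point 3, point 1, point 2): points 3 and 1 share point 2 as their NN, and point 2 points back to point 1.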
Tval(W, R)
W |
The incidence matrix, $W$, for the NN digraph |
R |
The number of reflexive NNs (i.e., twice the number of reflexive NN pairs) |
Returns the $T$ value. See the description above for the details of this quantity.
Elvan Ceyhan
#3D data points
n<-10
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd)
R<-Rval(W)
Tval(W,R)
#1D data points
X<-as.matrix(runif(15)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(15) would not work
ipd<-ipd.mat(X)
W<-Wmat(ipd)
R<-Rval(W)
Tval(W,R)
#with ties=TRUE in the data
Y<-matrix(round(runif(30)*10),ncol=3)
ipd<-ipd.mat(Y)
W<-Wmat(ipd,ties=TRUE)
R<-Rval(W)
Tval(W,R)
Returns the variances of cell counts $N_{ij}$ for $i,j=1,\ldots,k$ in the NNCT, ct, in matrix form which is of the same dimension as ct.
These variances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also (Dixon (1994, 2002); Ceyhan (2010, 2017)).
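For orientation, the variance formulas of Dixon (1994) for the diagonal and off-diagonal cell counts can be sketched in base R as follows. The helper name is hypothetical, $Q$ and $R$ are assumed to have been computed beforehand, and the code is an illustration of the formulas rather than the package's source; it needs at least four points in total.

```r
# Sketch of Dixon's (1994) variances of NNCT cell counts under RL (hypothetical helper).
# ct: k x k NNCT, Q: number of shared NNs, R: number of reflexive NNs.
var_nnct_sketch <- function(ct, Q, R) {
  ni <- unname(rowSums(ct))             # class sizes
  n <- sum(ni)
  k <- nrow(ct)
  # descending factorial: x (x-1) ... (x-m+1)
  ff <- function(x, m) prod(x - 0:(m - 1))
  V <- matrix(0, k, k, dimnames = dimnames(ct))
  for (i in 1:k) for (j in 1:k) {
    if (i == j) {
      p2 <- ff(ni[i], 2) / ff(n, 2)     # p_ii
      p3 <- ff(ni[i], 3) / ff(n, 3)     # p_iii
      p4 <- ff(ni[i], 4) / ff(n, 4)     # p_iiii
      V[i, j] <- (n + R) * p2 + (2 * n - 2 * R + Q) * p3 +
        (n^2 - 3 * n - Q + R) * p4 - (n * p2)^2
    } else {
      p2 <- ni[i] * ni[j] / ff(n, 2)                # p_ij
      p3 <- ff(ni[i], 2) * ni[j] / ff(n, 3)         # p_iij
      p4 <- ff(ni[i], 2) * ff(ni[j], 2) / ff(n, 4)  # p_iijj
      V[i, j] <- n * p2 + Q * p3 + (n^2 - 3 * n - Q + R) * p4 - (n * p2)^2
    }
  }
  V
}

ct <- matrix(c(3, 2, 1, 4), 2, byrow = TRUE)  # toy 2 x 2 NNCT with class sizes 5 and 5
var_nnct_sketch(ct, Q = 4, R = 6)             # Q and R here are made-up values
```

A quick sanity check of such a sketch is that, with two classes, the two entries in a row have equal variance, because the row sum is fixed at the class size.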
var.nnct(ct, Q, R)
ct |
A nearest neighbor contingency table |
Q |
The number of shared NNs |
R |
The number of reflexive NNs (i.e., twice the number of reflexive NN pairs) |
A matrix of the same dimension as ct, whose entries are the variances of the cell counts in the NNCT with class sizes given as the row sums of ct. The row and column names are inherited from ct.
Elvan Ceyhan
Ceyhan E (2010).
“On the use of nearest neighbor contingency tables for testing spatial segregation.”
Environmental and Ecological Statistics, 17(3), 247-282.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Dixon PM (2002).
“Nearest-neighbor contingency table analysis of spatial segregation for several species.”
Ecoscience, 9(2), 142-151.
var.tct, var.nnsym and cov.nnct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
ct
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
var.nnct(ct,Qv,Rv)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
var.nnct(ct,Qv,Rv)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
var.nnct(ct,Qv,Rv)
Returns the variances of differences of off-diagonal cell counts $N_{ij}-N_{ji}$ for $i,j=1,\ldots,k$ and $i \ne j$ in the NNCT, ct, in a vector of length $k(k-1)/2$; the order of $i,j$ for $N_{ij}-N_{ji}$ is as in the output of ind.nnsym(k).
These variances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also (Dixon (1994); Ceyhan (2014)).
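Each of these variances follows from $Var(N_{ij}-N_{ji}) = Var(N_{ij}) + Var(N_{ji}) - 2\,Cov(N_{ij},N_{ji})$. A minimal base R sketch of that bookkeeping is given below. The helper name is hypothetical, covN is assumed to be the covariance matrix of the row-wise vectorized cell counts, and the ordering of the pairs, (1,2), (1,3), ..., (k-1,k), is an assumption about what ind.nnsym(k) returns.

```r
# Variances of N_ij - N_ji from the covariance matrix of the row-wise
# vectorized NNCT entries (hypothetical helper for illustration).
var_nnsym_sketch <- function(covN) {
  k <- round(sqrt(nrow(covN)))          # covN is k^2 x k^2
  out <- numeric(0)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      a <- (i - 1) * k + j              # position of N_ij in the row-wise vectorization
      b <- (j - 1) * k + i              # position of N_ji
      out <- c(out, covN[a, a] + covN[b, b] - 2 * covN[a, b])
    }
  }
  out
}

covN <- diag(4)                         # toy covariance matrix for k = 2
covN[2, 3] <- covN[3, 2] <- 0.5         # Cov(N_12, N_21) = 0.5
var_nnsym_sketch(covN)                  # 1 + 1 - 2 * 0.5 = 1
```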
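var.nnsym(covN)
covN |
The $k^2 \times k^2$ covariance matrix of the cell counts $N_{ij}$ in the NNCT (concatenated row-wise) |
A vector of length $k(k-1)/2$, whose entries are the variances of differences of off-diagonal cell counts $N_{ij}-N_{ji}$ for $i,j=1,\ldots,k$ and $i \ne j$ in the NNCT.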
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
var.nnct, var.tct and cov.nnct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv) #default is byrow
var.nnsym(covN)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
var.nnsym(covN)
Returns the variances of segregation coefficients in a multi-class case based on the NNCT, ct, in a vector of length $k(k+1)/2$; the order of the variances is as in the order of the rows output by ind.seg.coeff(k).
These variances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also (Ceyhan (2014)).
The argument covN is the covariance matrix of $N_{ij}$ (concatenated row-wise).
var.seg.coeff(ct, covN)
ct |
A nearest neighbor contingency table |
covN |
The $k^2 \times k^2$ covariance matrix of the cell counts $N_{ij}$ in the NNCT (concatenated row-wise) |
A vector of length $k(k+1)/2$, whose entries are the variances of segregation coefficients for the entry $i,j$ in the NNCT, where the order of the variances is as in the order of the rows output by ind.seg.coeff(k).
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
seg.coeff, cov.seg.coeff, var.nnsym and var.nnct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
var.seg.coeff(ct,covN)
varPseg.coeff(ct,covN)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
var.seg.coeff(ct,covN)
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ipd<-ipd.mat(Y)
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
var.seg.coeff(ct,covN)
Returns the variances of $T_{ij}$ values for $i,j=1,\ldots,k$ in the TCT in matrix form which is of the same dimension as TCT for types I-IV tests.
The argument covN must be the covariance between $N_{ij}$ values which are obtained from the NNCT by row-wise vectorization. type determines the type of the test for which variances are to be computed, with default="III".
These variances are valid under RL or conditional on $Q$ and $R$ under CSR.
See also (Ceyhan (2010, 2017)).
var.tct(ct, covN, type = "III")
ct |
A nearest neighbor contingency table |
covN |
The $k^2 \times k^2$ covariance matrix of the cell counts $N_{ij}$ in the NNCT (obtained by row-wise vectorization) |
type |
The type of the cell-specific test, default="III". Takes on values "I"-"IV" (or equivalently 1-4, respectively). |
A matrix of the same dimension as ct, whose entries are the variances of the entries in the TCT for the corresponding type of cell-specific test. The row and column names are inherited from ct.
Elvan Ceyhan
Ceyhan E (2010).
“New Tests of Spatial Segregation Based on Nearest Neighbor Contingency Tables.”
Scandinavian Journal of Statistics, 37(1), 147-165.
Ceyhan E (2017).
“Cell-Specific and Post-hoc Spatial Clustering Tests Based on Nearest Neighbor Contingency Tables.”
Journal of the Korean Statistical Society, 46(2), 219-245.
var.nnct, var.tctI, var.tctIII, var.tctIV and cov.tct
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
var.tct(ct,covN,"I")
var.tct(ct,covN,2)
var.tct(ct,covN,"III")
var.tct(ct,covN,"IV")
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
var.tct(ct,covN,"I")
var.tct(ct,covN,2)
Returns the variance of Pielou's coefficient of segregation for the two-class case (i.e., based on $2 \times 2$ NNCTs).
This variance is valid under RL or conditional on $Q$ and $R$ under CSR.
See also (Ceyhan (2014)) for more detail.
varPseg.coeff(ct, covN)
ct |
A nearest neighbor contingency table |
covN |
The $k^2 \times k^2$ covariance matrix of the cell counts $N_{ij}$ in the NNCT (concatenated row-wise), with $k=2$ here |
The variance of Pielou's coefficient of segregation for the two-class case.
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
Pseg.coeff, seg.coeff and var.seg.coeff
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
ct<-nnct(ipd,cls)
W<-Wmat(ipd)
Qv<-Qvec(W)$q
Rv<-Rval(W)
varN<-var.nnct(ct,Qv,Rv)
covN<-cov.nnct(ct,varN,Qv,Rv)
varPseg.coeff(ct,covN)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ct<-nnct(ipd,fcls)
varPseg.coeff(ct,covN)
#############
ct<-matrix(sample(1:25,9),ncol=3)
#varPseg.coeff(ct,covN) #left commented out: ct is 3x3, the function is for the two-class case
Simulated variance of Cuzick and Edwards $T_k^{inv}$ Test statistic
This function estimates the variance of Cuzick and Edwards $T_k^{inv}$ test statistic by Monte Carlo simulations under the RL hypothesis.
The exact variance of $T_k^{inv}$ is currently not available and (Cuzick and Edwards (1990)) say that "The permutational variance of $T_k^{inv}$ becomes unwieldy for $k > 1$ and is more easily simulated", hence we estimate the variance of $T_k^{inv}$ by RL of cases and controls to the given point data.
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly. The argument Nsim represents the number of resamplings (without replacement) in the RL scheme, with default being 1000.
See (Cuzick and Edwards (1990)).
See the function ceTkinv for the details of the test.
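The Monte Carlo idea can be sketched in base R as follows: compute the statistic, relabel the points at random while keeping the numbers of cases and controls fixed, and take the sample variance over the relabelings. Both helper names are hypothetical, the statistic is written from its verbal definition (for each case, the number of cases closer than its k-th nearest control), and the sketch assumes no distance ties and at least k controls; it is not the package's implementation.

```r
# T_k^inv: for each case, count the cases that are closer than its k-th nearest control
# (hypothetical helper written from the verbal definition).
tk_inv <- function(dat, lab, k) {
  d <- as.matrix(dist(dat))
  diag(d) <- Inf                         # a point is not its own neighbor
  cases <- which(lab == 1)
  ctrls <- which(lab == 0)
  total <- 0
  for (i in cases) {
    dk <- sort(d[i, ctrls])[k]           # distance to the k-th nearest control
    total <- total + sum(d[i, cases] < dk)
  }
  total
}

# Monte Carlo variance under RL: permute the labels, recompute, take the sample variance
var_rl_sim <- function(dat, lab, k, Nsim = 1000) {
  sims <- replicate(Nsim, tk_inv(dat, sample(lab), k))
  var(sims)
}

set.seed(1)
Y <- matrix(runif(60), ncol = 3)         # 20 points in 3D
cls <- rep(0:1, each = 10)               # 10 controls, 10 cases
var_rl_sim(Y, cls, k = 2, Nsim = 200)
```

Because sample(lab) permutes the labels without replacement, every simulated data set has the same numbers of cases and controls as the original one, which is what the RL scheme requires.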
varTkinv.sim(dat, k, cc.lab, Nsim = 1000, case.lab = NULL)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
k |
Integer specifying the number of the closest controls to subject $i$ |
cc.lab |
Case-control labels, 1 for case, 0 for control |
Nsim |
The number of simulations, i.e., the number of resamplings under the RL scheme to estimate the variance of $T_k^{inv}$ |
case.lab |
The label used for cases in cc.lab, default is NULL. |
The simulation-estimated variance of Cuzick and Edwards $T_k^{inv}$ test statistic for disease clustering
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
set.seed(123)
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
n1<-sum(cls==1)
k<-2
Nmc<-1000
varTkinv.sim(Y,k,cls,Nsim=Nmc)
set.seed(1)
varTrun.sim(Y,cls,Nsim=Nmc)
set.seed(1)
varTkinv.sim(Y,k=1,cls,Nsim=Nmc)
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
varTkinv.sim(Y,k,fcls,Nsim=Nmc,case.lab="a")
Incidence matrix $W$ for the NN digraph
Returns the $W=(w_{ij})$ matrix which is used to compute $Q$, $R$ and $T$ values in the NN structure.
$w_{ij}=I($point $j$ is a NN of point $i)$, i.e. $w_{ij}=1$ if point $j$ is a NN of point $i$ and 0 otherwise.
The argument ties is a logical argument (default=FALSE) to take ties into account or not. If TRUE, the function takes ties into account by making $w_{ij}=1/m$ if point $j$ is a NN of point $i$ and there are $m$ tied NNs, and 0 otherwise. If FALSE, $w_{ij}=1$ if point $j$ is a NN of point $i$ and 0 otherwise.
The matrix $W$ is equivalent to the $A=(a_{ij})$ matrix with $k=1$, i.e., Wmat(X,is.ipd=FALSE)=aij.mat(X,k=1).
The argument is.ipd is a logical argument (default=TRUE) to determine the structure of the argument x.
If TRUE, x is taken to be the inter-point distance (IPD) matrix, and if FALSE, x is taken to be the data set with rows representing the data points.
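As an illustration of the definition, a base R sketch of such an incidence matrix is given below. The helper name is hypothetical, and so is the choice of keeping only the first nearest neighbor when ties are ignored; the package function may resolve ties differently.

```r
# Sketch of the NN incidence matrix (hypothetical helper, not nnspat's Wmat).
# x: data matrix with one point per row; ties = TRUE splits the weight among tied NNs.
w_sketch <- function(x, ties = FALSE) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                         # exclude self-distances
  n <- nrow(d)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- which(d[i, ] == min(d[i, ]))   # all nearest neighbors of point i
    if (ties) {
      W[i, nn] <- 1 / length(nn)         # w_ij = 1/m when there are m tied NNs
    } else {
      W[i, nn[1]] <- 1                   # assumption: keep the first NN when ties are ignored
    }
  }
  W
}

X <- matrix(c(0, 1, 3), ncol = 1)        # three points on a line
w_sketch(X)                              # rows: (0,1,0), (1,0,0), (0,1,0)
```

Row $i$ of the result marks the nearest neighbor(s) of point $i$, so each row sums to 1 and the column sums give the indegrees in the NN digraph.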
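Wmat(x, ties = FALSE, is.ipd = TRUE, ...)
x |
The IPD matrix (if is.ipd=TRUE) or a data set of points with rows representing the data points (if is.ipd=FALSE) |
ties |
A logical parameter (default=FALSE) to take ties into account in computing the $W$ matrix |
is.ipd |
A logical parameter (default=TRUE). If TRUE, x is taken as the inter-point distance matrix, otherwise x is taken as the data set with rows representing the data points. |
... |
are for further arguments, such as method and p, passed to the dist function. |
The incidence matrix $W=(w_{ij})$ where $w_{ij}=I($point $j$ is a NN of point $i)$, i.e. $w_{ij}=1$ if point $j$ is a NN of point $i$ and 0 otherwise.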
Elvan Ceyhan
aij.mat, aij.nonzero, and aij.theta
n<-3
X<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(X)
Wmat(ipd)
Wmat(X,is.ipd = FALSE)
n<-5
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
Wmat(ipd)
Wmat(Y,is.ipd = FALSE)
Wmat(Y,is.ipd = FALSE,method="max")
Wmat(Y,is.ipd = FALSE)
aij.mat(Y,k=1)
#1D data points
X<-as.matrix(runif(5)) # need to be entered as a matrix with one column
#(i.e., a column vector), hence X<-runif(5) would not work
ipd<-ipd.mat(X)
Wmat(ipd)
Wmat(X,is.ipd = FALSE)
#with ties=TRUE in the data
Y<-matrix(round(runif(15)*10),ncol=3)
ipd<-ipd.mat(Y)
Wmat(ipd,ties=TRUE)
Wmat(Y,ties=TRUE,is.ipd = FALSE)
Chi-square Approximation to Cuzick and Edwards $T_k$ Test statistic
An object of class "Chisqtest" performing a chi-square approximation for Cuzick and Edwards $T_k$ test statistic based on the number of cases within k NNs of the cases in the data.
This approximation is suggested by Tango (2007) since the $T_k$ statistic had high skewness rendering the normal approximation less efficient. The chi-square approximation is as follows:
$$\frac{T_k - E[T_k]}{\sqrt{Var[T_k]}} \approx \frac{\chi^2_\nu - \nu}{\sqrt{2\nu}}$$
where $\chi^2_\nu$ is a chi-square random variable with $\nu$ df, and $\nu=8/[\mathrm{skewness}(T_k)]^2$ (see SkewTk for the skewness).
The argument cc.lab is the case-control label, 1 for case, 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
The logical argument nonzero.mat (default=TRUE) is for using the $A$ matrix if FALSE or just the matrix of nonzero locations in the $A$ matrix (if TRUE).
The logical argument asy.var (default=FALSE) is for using the asymptotic variance or the exact (i.e. finite sample) variance for the variance of $T_k$ in its standardization. If asy.var=TRUE, the asymptotic variance is used for $Var[T_k]$ (see asyvarTk), otherwise the exact variance (see varTk) is used.
See also (Tango (2007)) and the references therein.
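The moment-matching step behind this approximation can be sketched in base R. The helper name and its arguments are hypothetical; the sketch assumes that the mean, variance and skewness of the statistic are already available and that skewness means the third standardized moment, which should be checked against what SkewTk returns before relying on it.

```r
# Sketch of the chi-square approximation via moment matching (hypothetical helper).
# Tk, ETk, VarTk: observed value, mean and variance of the statistic
# skew: its skewness (third standardized moment), assumed positive
xsq_approx_sketch <- function(Tk, ETk, VarTk, skew) {
  nu <- 8 / skew^2                       # a chi-square with nu df has skewness sqrt(8/nu)
  z <- (Tk - ETk) / sqrt(VarTk)          # standardized statistic
  stat <- nu + sqrt(2 * nu) * z          # matching value on the chi-square scale
  p <- pchisq(stat, df = nu, lower.tail = FALSE)
  list(statistic = stat, df = nu, p.value = p)
}

xsq_approx_sketch(Tk = 12, ETk = 9, VarTk = 4, skew = sqrt(2))
```

The upper tail is used for the $p$-value because large values of the statistic indicate clustering of cases.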
Xsq.ceTk( dat, cc.lab, k, case.lab = NULL, nonzero.mat = TRUE, asy.var = FALSE, ... )
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab |
Case-control labels, 1 for case, 0 for control |
k |
Integer specifying the number of NNs (of subject $i$) |
case.lab |
The label used for cases in cc.lab, default is NULL. |
nonzero.mat |
A logical argument (default is TRUE) to determine whether the $A$ matrix (if FALSE) or just the matrix of nonzero locations in the $A$ matrix (if TRUE) is used in the computations. |
asy.var |
A logical argument (default is FALSE) to determine whether the asymptotic variance (if TRUE) or the exact variance (if FALSE) is used for $T_k$ in its standardization. |
... |
are for further arguments, such as method and p, passed to the dist function. |
A list with the elements
statistic |
The chi-squared test statistic for Tango's chi-square approximation to Cuzick & Edwards' $T_k$ test statistic |
p.value |
The $p$-value for the hypothesis test |
df |
Degrees of freedom for the chi-squared test, which is $\nu=8/[\mathrm{skewness}(T_k)]^2$ |
estimate |
Estimates, i.e., the observed $T_k$ value |
est.name, est.name2 |
Names of the estimates, they are almost identical for this function. |
null.value |
Hypothesized null value for Cuzick & Edwards' $T_k$ |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Tango T (2007). “A class of multiplicity adjusted tests for spatial clustering based on case-control point data.” Biometrics, 63, 119-127.
set.seed(123)
n<-20
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE)
k<-sample(1:5,1) # try also 1, 3, 5,
k
Xsq.ceTk(Y,cls,k)
Xsq.ceTk(Y,cls,k,nonzero.mat=FALSE)
Xsq.ceTk(Y,cls+1,k,case.lab = 2)
Xsq.ceTk(Y,cls,k,method="max")
Xsq.ceTk(Y,cls,k,asy.var=TRUE)
An object of class "Chisqtest" performing the hypothesis test of equality of the expected values of the off-diagonal cell counts (i.e., entries) under RL or CSR in the NNCT for $k \ge 2$ classes.
That is, the test performs Dixon's or Pielou's (first type of) overall NN symmetry test, which is appropriate (i.e. has the appropriate asymptotic sampling distribution) for completely mapped data or for sparsely sampled data, respectively.
(See Pielou (1961); Dixon (1994); Ceyhan (2014) for more detail).
The type="dixon" refers to Dixon's overall NN symmetry test and type="pielou" refers to Pielou's first type of overall NN symmetry test.
The symmetry test is based on the chi-squared approximation of the corresponding quadratic form; type="dixon" yields the extension of Dixon's NN symmetry test by Ceyhan (2014), and type="pielou" yields Pielou's overall NN symmetry test.
The function yields the test statistic, the $p$-value and the df, which is $k(k-1)/2$, a description of the alternative with the corresponding null values (i.e. expected values) of the differences of the off-diagonal entries (which is 0 for this function), and also the sample estimates (i.e. observed values) of the absolute differences of the off-diagonal entries of the NNCT (in the upper-triangular form).
The function also provides the names of the test statistics, the method and the data set used.
The null hypothesis is that all $E(N_{ij})=E(N_{ji})$ for $i \ne j$ in the $k \times k$ NNCT (i.e., symmetry in the mixed NN structure) for $k \ge 2$.
In the output, if type="pielou", the test statistic, $p$-value and the df are valid only for (properly) sparsely sampled data.
See also (Pielou (1961); Dixon (1994); Ceyhan (2014)) and the references therein.
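For intuition about the sparsely sampled case, the classical symmetry statistic for a square table (McNemar's test for two classes, Bowker's test in general) can be sketched in base R. It is an assumption, not something stated in this manual, that type="pielou" corresponds to this classical form; Dixon's version relies on the covariance structure of the NNCT under RL and is not reproduced here. The helper name is hypothetical.

```r
# Classical symmetry statistic on a k x k table (hypothetical helper):
# sum over i < j of (N_ij - N_ji)^2 / (N_ij + N_ji), with k(k-1)/2 df.
symmetry_sketch <- function(ct) {
  k <- nrow(ct)
  up <- upper.tri(ct)
  nij <- ct[up]                          # upper-triangular entries N_ij, i < j
  nji <- t(ct)[up]                       # matching lower-triangular entries N_ji
  keep <- (nij + nji) > 0                # skip empty pairs to avoid 0/0
  stat <- sum((nij[keep] - nji[keep])^2 / (nij[keep] + nji[keep]))
  df <- k * (k - 1) / 2
  list(statistic = stat, df = df, estimate = abs(nij - nji),
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

ct <- matrix(c(8, 3, 1,
               6, 7, 2,
               4, 5, 9), nrow = 3, byrow = TRUE)  # toy 3 x 3 table of counts
symmetry_sketch(ct)
```

For tables without empty off-diagonal pairs, the statistic agrees with stats::mcnemar.test(ct, correct = FALSE), which offers an independent check of the sketch.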
Xsq.nnsym(dat, lab, type = "dixon", ...)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
lab |
The vector of class labels (numerical or categorical) |
type |
The type of the overall NN symmetry test with default="dixon". Takes on values "dixon" and "pielou". |
... |
are for further arguments, such as method and p, passed to the dist function. |
A list with the elements
statistic |
The chi-squared test statistic for Dixon's or Pielou's (first type of) overall NN symmetry test |
stat.names |
Name of the test statistic |
p.value |
The $p$-value for the hypothesis test |
df |
Degrees of freedom for the chi-squared test, which is $k(k-1)/2$ for this function. |
estimate |
Estimates, i.e., absolute differences of the off-diagonal entries of NNCT (in the upper-triangular form). |
est.name, est.name2 |
Names of the estimates, former is a shorter description of the estimates than the latter. |
null.value |
Hypothesized null values for the differences between the expected values of the off-diagonal entries, which is 0 for this function. |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2014).
“Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.”
The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994).
“Testing spatial segregation using a nearest-neighbor contingency table.”
Ecology, 75(7), 1940-1948.
Pielou EC (1961).
“Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.”
Journal of Ecology, 49(2), 255-269.
Znnsym.ss, Znnsym.dx and Znnsym2cl
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))
Xsq.nnsym(Y,cls)
Xsq.nnsym(Y,cls,method="max")
Xsq.nnsym(Y,cls,type="pielou")
#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Xsq.nnsym(Y,fcls)
Xsq.nnsym(Y,fcls,type="pielou")
#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))
Xsq.nnsym(Y,cls)
Xsq.nnsym(Y,cls,type="pielou")
$Z$-test for Cuzick and Edwards $T_k$ statistic
An object of class "htest" performing a $z$-test for Cuzick and Edwards $T_k$ test statistic based on the number of cases within $k$ NNs of the cases in the data.
For disease clustering, Cuzick and Edwards (1990) suggested a $k$-NN test based on the number of cases among the $k$ NNs of the case points.
Under RL of cases and controls to the given locations in the study region, $T_k$ approximately has a normal distribution for large samples.
The argument cc.lab is the case-control label, 1 for case and 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
Also, $T_1$ is identical to the count for the (case, case) cell in the nearest neighbor contingency table (NNCT) (see the function nnct for more detail on NNCTs).
Thus, the $z$-test for $T_1$ is the same as the cell-specific $z$-test for that cell in the NNCT (see cell.spec).
The logical argument nonzero.mat (default=TRUE) is for using the full matrix (if FALSE) or just the matrix of nonzero locations in that matrix (if TRUE) in the computations.
The logical argument asy.var (default=FALSE) is for using the asymptotic variance or the exact (i.e., finite sample) variance for the variance of $T_k$ in its standardization.
If asy.var=TRUE, the asymptotic variance is used for the variance of $T_k$ (see asyvarTk), otherwise the exact variance (see varTk) is used.
See also (Ceyhan (2014); Cuzick and Edwards (1990)) and the references therein.
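The count behind this test can be illustrated in base R without the package. The following is only a sketch of the quantity described above (the number of cases among the k NNs of the case points), assuming Euclidean distances and no distance ties; `knn_case_count` is a hypothetical helper name and not an nnspat function.

```r
# Sketch only: count the cases among the k nearest neighbors of each case.
# `knn_case_count` is a hypothetical helper, not part of nnspat.
knn_case_count <- function(dat, cc.lab, k) {
  D <- as.matrix(dist(dat))            # pairwise Euclidean distances
  diag(D) <- Inf                       # a point is not its own neighbor
  total <- 0
  for (i in which(cc.lab == 1)) {      # loop over the case points only
    nn <- order(D[i, ])[seq_len(k)]    # indices of the k NNs of point i
    total <- total + sum(cc.lab[nn] == 1)
  }
  total
}

set.seed(1)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(0:1, n, replace = TRUE)  # 1 for case, 0 for control
knn_case_count(Y, cls, k = 1)
knn_case_count(Y, cls, k = 3)
```

The value can never exceed k times the number of cases, which gives a quick sanity check on any implementation.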
ZceTk(
  dat,
  cc.lab,
  k,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  case.lab = NULL,
  nonzero.mat = TRUE,
  asy.var = FALSE,
  ...
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab |
Case-control labels, 1 for case, 0 for control |
k |
Integer specifying the number of NNs (of subject |
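alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater". |
conf.level |
Level of the upper and lower confidence limits, default is 0.95. |
case.lab |
The label used for cases in cc.lab, default is NULL. |
nonzero.mat |
A logical argument (default is TRUE) that selects the matrix of nonzero locations (if TRUE) or the full matrix (if FALSE) for the computations. |
asy.var |
A logical argument (default is FALSE) that selects the asymptotic variance (if TRUE) or the exact (i.e., finite sample) variance (if FALSE) for the standardization of $T_k$. |
... |
are for further arguments, such as method and p, passed to the dist function. |
A list with the elements
statistic |
The $Z$ test statistic for the Cuzick and Edwards $T_k$ test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the Cuzick and Edwards $T_k$ value |
estimate |
Estimate of the parameter, i.e., the Cuzick and Edwards $T_k$ value |
null.value |
Hypothesized null value for the Cuzick and Edwards $T_k$ value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |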
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
k<-1 #try also 2,3, sample(1:5,1)

ZceTk(Y,cls,k)
ZceTk(Y,cls,k,nonzero.mat=FALSE)
ZceTk(Y,cls,k,method="max")

ZceTk(Y,cls+1,k,case.lab = 2,alt="l")
ZceTk(Y,cls,k,asy.var=TRUE,alt="g")
An object of class "cellhtest" performing hypothesis test of equality of the expected values of the off-diagonal cell counts (i.e., entries) for each pair of classes under RL or CSR in the NNCT for two or more classes.
That is, the test performs Dixon's or Pielou's (first type of) NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data or for sparsely sampled data, respectively (see Pielou (1961); Dixon (1994); Ceyhan (2014) for more detail).
The type="dixon" refers to Dixon's NN symmetry test and type="pielou" refers to Pielou's first type of NN symmetry test.
The symmetry test is based on the normal approximation of the difference of the off-diagonal entries in the NNCT and is due to Pielou (1961) and Dixon (1994).
The function yields a contingency table of the test statistics, $p$-values for the corresponding alternative, expected values (i.e., null value(s)), lower and upper confidence levels and sample estimates for the $N_{ij}-N_{ji}$ values for $i \ne j$ (all in the upper-triangular form except for the null value, which is 0 for all pairs), where $N_{ij}$ denotes the $(i,j)$ entry of the NNCT, and also names of the test statistics, estimates, null values and the method and the data set used.
The null hypothesis is that $E(N_{ij})=E(N_{ji})$ for all $i \ne j$ in the NNCT (i.e., symmetry in the mixed NN structure).
In the output, if type="pielou", the test statistic, $p$-value and the lower and upper confidence limits are valid only for (properly) sparsely sampled data.
See also (Pielou (1961); Dixon (1994); Ceyhan (2014)) and the references therein.
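The estimates reported by this test are differences of off-diagonal NNCT entries, so it may help to see how such a table and its upper-triangular differences can be formed. The block below is a base R sketch under stated assumptions: Euclidean distances, no distance ties, and the sign convention row entry minus mirrored entry. `nnct_sketch` is a hypothetical helper, not the package's nnct function.

```r
# Sketch only: build a nearest neighbor contingency table (NNCT) by hand.
# `nnct_sketch` is a hypothetical helper, not nnspat::nnct.
nnct_sketch <- function(dat, lab) {
  lab <- as.factor(lab)
  D <- as.matrix(dist(dat))             # pairwise Euclidean distances
  diag(D) <- Inf                        # exclude each point from its own NN search
  nn <- apply(D, 1, which.min)          # index of the nearest neighbor of each point
  unclass(table(base = lab, NN = lab[nn]))  # rows: base class, columns: NN class
}

set.seed(2)
n <- 40
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(1:4, n, replace = TRUE)

ct <- nnct_sketch(Y, cls)
dif <- ct - t(ct)                       # entry (i,j) minus entry (j,i)
dif[lower.tri(dif, diag = TRUE)] <- NA  # keep the upper-triangular form only
ct
dif
```

Under symmetry in the mixed NN structure, each of the upper-triangular differences has expected value 0, which is the null value used by the test.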
Znnsym(
  dat,
  lab,
  type = "dixon",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  ...
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
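lab |
The vector of class labels (numerical or categorical) |
type |
The type of the NN symmetry test with default="dixon". Takes on values "dixon" and "pielou" for Dixon's and Pielou's (first type) NN symmetry test. |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater". |
conf.level |
Level of the upper and lower confidence limits, default is 0.95. |
... |
are for further arguments, such as method and p, passed to the dist function. |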
A list with the elements
statistic |
The matrix of $Z$ test statistics for the NN symmetry test (in the upper-triangular form) |
stat.names |
Name of the test statistics |
p.value |
The matrix of $p$-values for the hypothesis test for the corresponding alternative (in the upper-triangular form) |
LCL, UCL |
Matrix of Lower and Upper Confidence Levels (in the upper-triangular form) for the differences of the off-diagonal entries of the NNCT |
conf.int |
The confidence interval for the estimates, it is |
cnf.lvl |
Level of the upper and lower confidence limits (i.e., conf.level) of the differences of the off-diagonal entries. |
estimate |
Estimates of the parameters, i.e., matrix of the difference of the off-diagonal entries (in the upper-triangular form) of the NNCT |
est.name, est.name2 |
Names of the estimates, former is a shorter description of the estimates than the latter. |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, which is 0 for this function. |
null.name |
Name of the null values |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994). “Testing spatial segregation using a nearest-neighbor contingency table.” Ecology, 75(7), 1940-1948.
Pielou EC (1961). “Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.” Journal of Ecology, 49(2), 255-269.
Znnsym.ss.ct, Znnsym.ss, Znnsym.dx.ct, Znnsym.dx and Znnsym2cl
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))

Znnsym(Y,cls)
Znnsym(Y,cls,method="max")
Znnsym(Y,cls,type="pielou")
Znnsym(Y,cls,type="pielou",method="max")

Znnsym(Y,cls,alt="g")
Znnsym(Y,cls,type="pielou",alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Znnsym(Y,fcls)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:4,n,replace = TRUE) #or try cls<-rep(1:4,rep(10,4))

Znnsym(Y,cls)
Znnsym(Y,cls,type="pielou")
An object of class "htest" performing hypothesis test of equality of the expected value of the off-diagonal cell counts (i.e., entries) under RL or CSR in the NNCT for two classes.
That is, the test performs Dixon's or Pielou's (first type of) NN symmetry test, which is appropriate (i.e., has the appropriate asymptotic sampling distribution) for completely mapped data and for sparsely sampled data, respectively (see Ceyhan (2014) for more detail).
The symmetry test is based on the normal approximation of the difference of the off-diagonal entries in the NNCT and is due to Pielou (1961) and Dixon (1994).
The type="dixon" refers to Dixon's NN symmetry test and type="pielou" refers to Pielou's first type of NN symmetry test.
The function yields the test statistic, $p$-value for the corresponding alternative, the confidence interval, estimate and null value for the parameter of interest (which is the difference of the off-diagonal entries in the NNCT), and method and name of the data set used.
The null hypothesis is that $E(N_{12})=E(N_{21})$ in the $2 \times 2$ NNCT (i.e., symmetry in the mixed NN structure), where $N_{ij}$ denotes the $(i,j)$ entry of the NNCT.
See also (Pielou (1961); Dixon (1994); Ceyhan (2014)) and the references therein.
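Because the test statistic is referred to a standard normal distribution, the three choices of the alternative argument correspond to three different tail areas. The block below sketches that mapping for a generic $z$ value; `pval_from_z` is a hypothetical helper written for illustration and is not part of nnspat, whose internal computation may differ in detail.

```r
# Sketch only: p-value of a standard normal test statistic for each alternative.
# `pval_from_z` is a hypothetical helper, not part of nnspat.
pval_from_z <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)      # also resolves abbreviations such as "g"
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),             # both tails
         less      = pnorm(z),                       # lower tail
         greater   = pnorm(z, lower.tail = FALSE))   # upper tail
}

z <- 1.5
pval_from_z(z)             # two-sided
pval_from_z(z, "less")
pval_from_z(z, "g")        # abbreviation of "greater"
```

For a continuous reference distribution the "less" and "greater" p-values add up to 1, which is a convenient check.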
Znnsym2cl(
  dat,
  lab,
  type = "dixon",
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
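lab |
The vector of class labels (numerical or categorical) |
type |
The type of the NN symmetry test with default="dixon". Takes on values "dixon" and "pielou" for Dixon's and Pielou's (first type) NN symmetry test. |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater". |
conf.level |
Level of the upper and lower confidence limits, default is 0.95. |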
A list with the elements
statistic |
The $Z$ test statistic for the NN symmetry test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the difference of the off-diagonal entries of the NNCT |
estimate |
Estimate, i.e., the difference of the off-diagonal entries of the NNCT |
null.value |
Hypothesized null value for the expected difference between the off-diagonal entries, which is 0 for this function. |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |
Elvan Ceyhan
Ceyhan E (2014). “Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations.” The Scientific World Journal, Volume 2014, Article ID 698296.
Dixon PM (1994). “Testing spatial segregation using a nearest-neighbor contingency table.” Ecology, 75(7), 1940-1948.
Pielou EC (1961). “Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships.” Journal of Ecology, 49(2), 255-269.
Znnsym2cl.ss.ct, Znnsym2cl.ss, Znnsym2cl.dx.ct, Znnsym2cl.dx, Znnsym.ss.ct, Znnsym.ss, Znnsym.dx.ct, Znnsym.dx and Znnsym
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE) #or try cls<-rep(1:2,c(10,10))

Znnsym2cl(Y,cls)
Znnsym2cl(Y,cls,type="pielou")

Znnsym2cl(Y,cls,alt="g")
Znnsym2cl(Y,cls,type="pielou",alt="g")
$Z$-test for Cuzick and Edwards $T_{comb}$ statistic
An object of class "htest" performing a $z$-test for Cuzick and Edwards $T_{comb}$ test statistic in disease clustering, where $T_{comb}$ is a linear combination of some $T_k$ tests.
For disease clustering, Cuzick and Edwards (1990) developed a $k$-NN test $T_k$ based on the number of cases among the $k$ NNs of the case points, and also proposed a test combining various $T_k$ tests, denoted as $T_{comb}$.
See page 87 of Cuzick and Edwards (1990) for more details.
Under RL of cases and controls to the given locations in the study region, $T_{comb}$ approximately has a normal distribution for large samples.
The argument cc.lab is the case-control label, 1 for case and 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
The argument klist is the vector of integers specifying the indices of the $T_k$ values used in obtaining $T_{comb}$.
The logical argument nonzero.mat (default=TRUE) is for using the full matrix (if FALSE) or just the matrix of nonzero locations in that matrix (if TRUE) in the computations.
The logical argument asy.cov (default=FALSE) is for using the asymptotic covariance or the exact (i.e., finite sample) covariance for the vector of $T_k$ values used in Tcomb in the standardization of $T_{comb}$.
If asy.cov=TRUE, the asymptotic covariance is used, otherwise the exact covariance is used.
See also (Ceyhan (2014); Cuzick and Edwards (1990)) and the references therein.
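To make the role of klist concrete, the sketch below computes the vector of $T_k$-type counts for several values of $k$ in base R and then forms a linear combination of them. The weights used here are placeholder equal weights chosen only for illustration; they are an assumption and not the covariance-based weights of Cuzick and Edwards (1990) or of the package. `tk_vec` is a hypothetical helper name.

```r
# Sketch only: T_k-type counts for each k in klist, then a linear combination.
# `tk_vec` is a hypothetical helper, not part of nnspat.
tk_vec <- function(dat, cc.lab, klist) {
  D <- as.matrix(dist(dat))              # pairwise Euclidean distances
  diag(D) <- Inf                         # a point is not its own neighbor
  ord <- t(apply(D, 1, order))           # row i: neighbors of point i, nearest first
  cases <- which(cc.lab == 1)
  sapply(klist, function(k)
    sum(cc.lab[ord[cases, seq_len(k), drop = FALSE]] == 1))
}

set.seed(3)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(0:1, n, replace = TRUE)    # 1 for case, 0 for control
kl <- c(1, 2, 4)

Tk <- tk_vec(Y, cls, kl)                 # one count per k in kl
w <- rep(1, length(kl))                  # placeholder equal weights (assumption)
sum(w * Tk)                              # a linear combination of the T_k values
```

Since the k NNs of a point are contained in its k+1 NNs, the counts are nondecreasing in $k$.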
ZTcomb(
  dat,
  cc.lab,
  klist,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  case.lab = NULL,
  nonzero.mat = TRUE,
  asy.cov = FALSE,
  ...
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab |
Case-control labels, 1 for case, 0 for control |
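klist |
Vector of integers specifying the indices of the $T_k$ values used in obtaining $T_{comb}$. |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater". |
conf.level |
Level of the upper and lower confidence limits, default is 0.95. |
case.lab |
The label used for cases in cc.lab, default is NULL. |
nonzero.mat |
A logical argument (default is TRUE) that selects the matrix of nonzero locations (if TRUE) or the full matrix (if FALSE) for the computations. |
asy.cov |
A logical argument (default is FALSE) that selects the asymptotic covariance (if TRUE) or the exact (i.e., finite sample) covariance (if FALSE) for the standardization of $T_{comb}$. |
... |
are for further arguments, such as method and p, passed to the dist function. |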
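A list with the elements
statistic |
The $Z$ test statistic for the Cuzick and Edwards $T_{comb}$ test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the Cuzick and Edwards $T_{comb}$ value |
estimate |
Estimate of the parameter, i.e., the Cuzick and Edwards $T_{comb}$ value |
null.value |
Hypothesized null value for the Cuzick and Edwards $T_{comb}$ value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |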
Elvan Ceyhan
Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))
kl<-sample(1:5,3) #try also sample(1:5,2)

ZTcomb(Y,cls,kl)
ZTcomb(Y,cls,kl,method="max")
ZTcomb(Y,cls,kl,nonzero.mat=FALSE)

ZTcomb(Y,cls+1,kl,case.lab = 2,alt="l")
ZTcomb(Y,cls,kl,conf=.9,alt="g")
ZTcomb(Y,cls,kl,asy=TRUE,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ZTcomb(Y,fcls,kl,case.lab="a")
$Z$-test for Cuzick and Edwards $T_{run}$ statistic
An object of class "htest" performing a $z$-test for Cuzick and Edwards $T_{run}$ test statistic, which is based on the number of consecutive cases from the cases in the data under RL or CSR independence.
Under RL of cases and controls to the given locations in the study region, $T_{run}$ approximately has a normal distribution for large samples.
The argument cc.lab is the case-control label, 1 for case and 0 for control. If the argument case.lab is NULL, then cc.lab should be provided in this fashion; if case.lab is provided, the labels are converted to 0's and 1's accordingly.
The logical argument var.sim (default=FALSE) is for using the simulation estimated variance or the exact variance for the variance of $T_{run}$ in its standardization.
If var.sim=TRUE, the simulation estimated variance is used for the variance of $T_{run}$ (see varTrun.sim), otherwise the exact variance (see varTrun) is used.
Moreover, when var.sim=TRUE, the argument Nvar.sim represents the number of resamplings (without replacement) in the RL scheme, with default being 1000.
The function varTrun might take a very long time when the data size is large (even for sizes just above 50); in this case, it is recommended to use var.sim=TRUE in this function.
See also (Cuzick and Edwards (1990)) and the references therein.
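The simulation-based variance described above can be sketched in base R. The block below assumes that the run statistic is the total, over case points, of the number of consecutive cases met in each point's NN ordering before the first control, with Euclidean distances and no ties. `trun_sketch` and `var_trun_sim` are hypothetical helpers, not the package's varTrun.sim, and the resampling is a plain permutation of the labels.

```r
# Sketch only: run-type statistic and its variance estimated under random labeling.
# `trun_sketch` and `var_trun_sim` are hypothetical helpers, not part of nnspat.
trun_sketch <- function(D, cc.lab) {
  total <- 0
  for (i in which(cc.lab == 1)) {
    ord <- order(D[i, ])
    ord <- ord[ord != i]                 # drop the point itself
    nbr_lab <- cc.lab[ord]               # neighbor labels, nearest first
    first_ctrl <- match(0, nbr_lab, nomatch = length(nbr_lab) + 1)
    total <- total + (first_ctrl - 1)    # cases seen before the first control
  }
  total
}

var_trun_sim <- function(dat, cc.lab, Nsim = 1000) {
  D <- as.matrix(dist(dat))              # distances are fixed; only labels change
  # sample(cc.lab) permutes the labels, i.e. resampling without replacement
  sims <- replicate(Nsim, trun_sketch(D, sample(cc.lab)))
  var(sims)
}

set.seed(123)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(0:1, n, replace = TRUE)    # 1 for case, 0 for control
trun_sketch(as.matrix(dist(Y)), cls)
var_trun_sim(Y, cls, Nsim = 1000)
```

The distance matrix is computed once and reused across permutations, so the cost grows with the number of resamplings rather than with repeated distance calculations.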
ZTrun(
  dat,
  cc.lab,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  case.lab = NULL,
  var.sim = FALSE,
  Nvar.sim = 1000,
  ...
)
dat |
The data set in one or higher dimensions, each row corresponds to a data point. |
cc.lab |
Case-control labels, 1 for case, 0 for control |
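alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater". |
conf.level |
Level of the upper and lower confidence limits, default is 0.95. |
case.lab |
The label used for cases in cc.lab, default is NULL. |
var.sim |
A logical argument (default is FALSE) that selects the simulation estimated variance (if TRUE) or the exact variance (if FALSE) for the standardization of $T_{run}$. |
Nvar.sim |
The number of simulations, i.e., the number of resamplings under the RL scheme to estimate the variance of $T_{run}$, default is 1000; used only when var.sim=TRUE. |
... |
are for further arguments, such as method and p, passed to the dist function. |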
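A list with the elements
statistic |
The $Z$ test statistic for the Cuzick and Edwards $T_{run}$ test |
p.value |
The $p$-value for the hypothesis test for the corresponding alternative |
conf.int |
Confidence interval for the Cuzick and Edwards $T_{run}$ value |
estimate |
Estimate of the parameter, i.e., the Cuzick and Edwards $T_{run}$ value |
null.value |
Hypothesized null value for the Cuzick and Edwards $T_{run}$ value |
alternative |
Type of the alternative hypothesis in the test, one of "two.sided", "less" or "greater" |
method |
Description of the hypothesis test |
data.name |
Name of the data set, dat |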
Elvan Ceyhan
Cuzick J, Edwards R (1990). “Spatial clustering for inhomogeneous populations (with discussion).” Journal of the Royal Statistical Society, Series B, 52, 73-104.
n<-20 #or try sample(1:20,1) #try also 40, 50, 60
set.seed(123)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(0:1,n,replace = TRUE) #or try cls<-rep(0:1,c(10,10))

ZTrun(Y,cls)
ZTrun(Y,cls,method="max")
ZTrun(Y,cls,var.sim=TRUE)
ZTrun(Y,cls+1,case.lab = 2,alt="l") #try also ZTrun(Y,cls,conf=.9,alt="g")

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
ZTrun(Y,fcls,case.lab="a")