Package 'OnomasticDiversity'

Title: Onomastic Diversity Measures
Description: Different measures which can be used to quantify similarities between regions. These measures are isonymy, isonymy between, Lasker distance, coefficients of Hedrick and Nei. In addition, it calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill Numbers, Geometric Mean and Cressie and Read statistics.
Authors: Maria Jose Ginzo Villamayor [aut, cre]
Maintainer: Maria Jose Ginzo Villamayor <[email protected]>
License: GPL-2
Version: 0.1
Built: 2024-11-05 03:56:28 UTC
Source: https://github.com/cran/OnomasticDiversity

Help Index


Onomastic Diversity Measures

Description

Measures that quantify similarities between regions: isonymy, isonymy between regions, Lasker distance, and the coefficients of Hedrick and Nei. The package also calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill numbers, the geometric mean and the Cressie and Read statistics.

Details

The DESCRIPTION file:

Package: OnomasticDiversity
Type: Package
Title: Onomastic Diversity Measures
Version: 0.1
Date: 2024-02-07
Authors@R: c(person("Maria Jose", "Ginzo Villamayor", role = c("aut", "cre"),email="[email protected]"))
Author: Maria Jose Ginzo Villamayor [aut, cre]
Maintainer: Maria Jose Ginzo Villamayor <[email protected]>
Depends: R(>= 4.2.0)
Imports: sqldf
Description: Different measures which can be used to quantify similarities between regions. These measures are isonymy, isonymy between, Lasker distance, coefficients of Hedrick and Nei. In addition, it calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill Numbers, Geometric Mean and Cressie and Read statistics.
License: GPL-2
LazyLoad: yes
Packaged: 2024-02-07 09:00:25 UTC; mjginzo
Encoding: UTF-8
NeedsCompilation: no
RoxygenNote: 7.2.3
Date/Publication: 2024-02-08 21:10:09 UTC
Repository: https://mjginzo.r-universe.dev
RemoteUrl: https://github.com/cran/OnomasticDiversity
RemoteRef: HEAD
RemoteSha: c8464d2eba77b70b3a23cd7b65487bbe8891a1ff

Index of help topics:

OnomasticDiversity-package
                        Onomastic Diversity Measures
fCressieRead            Cressie and Read
fGeneralisedMean        Calculate the Generalised Mean
fGeometricMean          Calculate the Geometric Mean
fHeip                   Calculate the Heip's diversity index
fHill                   Calculate the Hill's diversity numbers
fIsonymy                Calculate the Isonymy within a region
fIsonymyAll             Calculate the Isonymy, Isonymy between regions,
                        Lasker distances, Euclidean distance and Nei's
                        distances
fMargalef               Calculate the Margalef's diversity index
fMenhinick              Calculate the Menhinick's diversity index
fPielou                 Calculate the Pielou's diversity index
fShannon                Calculate the Shannon-Weaver diversity index
fSheldon                Calculate the Sheldon's diversity index
fSimpson                Calculate the Simpson's diversity index
fSimpsonInf             Calculate the Simpson's diversity index and the
                        inverse
namesmengal16           namesmengal16 data
nameswomengal16         nameswomengal16 data
surnamesgal14           surnamesgal14 data

This package computes several measures that quantify similarities between regions: isonymy, isonymy between regions, Lasker distance, and the coefficients of Hedrick and Nei. It also computes diversity indices. A diversity index is a numerical measure of how many different types (such as species) are present in a dataset (a community), and it can also take into account the evolutionary relationships among the individuals distributed across those types, through aspects such as richness, divergence and evenness. These indicators represent biodiversity along several dimensions (richness, evenness and dominance). Accordingly, the package calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill numbers, the geometric mean and the Cressie and Read statistics.

Author(s)

Maria Jose Ginzo Villamayor [aut, cre]

Maintainer: Maria Jose Ginzo Villamayor <[email protected]>

References

Buckland, S.T., Studeny, A.C., Magurran, A.E., Illian, J.B., & Newson, S.E. (2011). The geometric mean of relative abundance indices: a biodiversity measure with a difference. Ecosphere, 2(9), art.100. <https://doi.org/10.1890/ES11-00186.1>

Cressie, Noel and Read, Timothy RC (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3), 440–464. <http://www.jstor.org/stable/2345686>

Sheldon, A. L. (1969). Equitability indices: dependence on the species count. Ecology, 50, 466–467. <https://doi.org/10.2307/1933900>

Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688. <https://doi.org/10.1038/163688a0>

Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews. <https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/3414/AngelikaStudenyPhDThesis.pdf?sequence=3&isAllowed=y>

van Strien, A.J., Soldaat, L.L., & Gregory, R.D. (2012). Desirable mathematical properties of indicators for biodiversity change. Ecological Indicators, 14, 202–208. <https://doi.org/10.1016/j.ecolind.2011.07.007>

See Also

fCressieRead, fGeneralisedMean, fGeometricMean, fHeip, fHill, fIsonymy, fIsonymyAll, fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf


Cressie and Read

Description

This function obtains the Cressie and Read statistics introduced by Noel Cressie and Timothy Read. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fCressieRead(x, number, population, ni, location, lambda)

Arguments

x

dataframe of the data values.

number

name of a variable which represents the number of individuals of each species.

population

name of a variable which represents the total number of individuals.

ni

name of a variable which represents the number of species.

location

name of a variable which represents the grouping element.

lambda

free parameter.

Details

For a community $i$, Cressie and Read (1984) introduced the following parametric form for a generalised statistic:

$$I_n(\lambda) = \frac{2}{\lambda(\lambda+1)} \sum_{k\in S_i} n_{ki} \left[ \left(\frac{n_{ki}}{n/S_i}\right)^{\lambda} - 1 \right],$$

where $n_{ki}$ represents the number of individuals of species $k$ in a sample (in the population it is $N_{ki}$), $n$ is the total number of individuals in the sample, $S_i$ represents all species in the community (species richness), and $\lambda$ is a free parameter.

Different values of $\lambda$ give different statistics. $I_n(\lambda)$ is not defined for $\lambda = -1$ or $\lambda = 0$, but in both cases it can be obtained by taking the limits $\lambda \to -1$ and $\lambda \to 0$.

In the onomastic context, $n_{ki}$ ($\approx N_{ki}$) denotes the absolute frequency of surname $k$ in region $i$ ($\approx$ community $i$ in the diversity context).
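The formula can be checked on a single community with a short R sketch. It is not the package's implementation: the helper name `cressie_read()` is hypothetical, it assumes `counts` holds the $n_{ki}$ of the observed species (all positive), and it uses the uniform expected count $n/S_i$ from the formula. The two undefined values of $\lambda$ are handled by their limits.

```r
# Hypothetical helper (not part of OnomasticDiversity): Cressie-Read
# statistic I_n(lambda) for ONE community, with expected counts n / S_i.
cressie_read <- function(counts, lambda, tol = 1e-12) {
  n <- sum(counts)       # total number of individuals in the sample
  s <- length(counts)    # species richness S_i
  e <- n / s             # expected count per species, n / S_i
  if (abs(lambda) < tol) {
    # Limit lambda -> 0: the likelihood-ratio statistic G^2
    return(2 * sum(counts * log(counts / e)))
  }
  if (abs(lambda + 1) < tol) {
    # Limit lambda -> -1
    return(2 * sum(e * log(e / counts)))
  }
  2 / (lambda * (lambda + 1)) * sum(counts * ((counts / e)^lambda - 1))
}

counts <- c(10, 20, 30)                    # hypothetical community
cressie_read(counts, 1)                    # 10
unname(chisq.test(counts)$statistic)       # Pearson's X^2, also 10
```

With $\lambda = 1$ the statistic reduces to Pearson's chi-squared statistic against equal species frequencies, which is why it matches `chisq.test()` here.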

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

cressieRead

the value of Cressie and Read statistics.

Author(s)

Maria Jose Ginzo Villamayor

References

Cressie, Noel and Read, Timothy RC (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3), 440–464.

See Also

fHill

Examples

data(surnamesgal14)
result = fCressieRead(x= surnamesgal14 , number="number",
population="population", location = "muni", ni="ni",
lambda = 2)
result

Calculate the Generalised Mean

Description

This function obtains the generalised mean of relative abundances for a collection of species introduced by Angelika C. Studeny. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fGeneralisedMean (x, pki, pki0, s, location, lambda)

Arguments

x

data frame of the data values, restricted to species with non-zero counts (in a sample, some species may not be represented).

pki

name of a variable which represents the relative frequency for each species.

pki0

vector with the relative frequency of each species with non-zero counts at the baseline time $t_0$ (in a sample, some species may not be represented).

location

name of a variable which represents the grouping element.

s

vector with the total number of species (species richness) in each location.

lambda

free parameter.

Details

For a community $i$, the generalised mean of relative abundances is defined by

$$M_t(\lambda) = \left[\frac{1}{S_i} \sum_{k\in S_i} \left(\frac{N_{ki}^{t}}{N_{ki}^{t_0}}\right)^{\lambda}\right]^{\frac{1}{\lambda}},$$

where $N_{ki}^t$ denotes the number of individuals of species $k$ at time $t$, $t_0$ is the baseline year, $S_i$ represents all species in the community (species richness), and $\lambda$ can be any non-zero real number.

In the onomastic context, $N_{ki}^t$ denotes the absolute frequency of surname $k$ in region $i$ ($\approx$ community in the diversity context) at time $t$.
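A minimal R sketch of $M_t(\lambda)$ for one community, assuming `n_t` and `n_t0` hold $N_{ki}^t$ and $N_{ki}^{t_0}$ for the same species in the same order. The helper `generalised_mean()` is hypothetical and is not the package's `fGeneralisedMean()`.

```r
# Hypothetical helper: generalised mean of abundance ratios for ONE community.
generalised_mean <- function(n_t, n_t0, lambda) {
  stopifnot(lambda != 0, length(n_t) == length(n_t0))
  ratios <- n_t / n_t0                  # N_ki^t / N_ki^t0
  mean(ratios^lambda)^(1 / lambda)      # mean() divides by S_i
}

n_t0 <- c(1, 1)   # hypothetical baseline abundances
n_t  <- c(2, 8)   # hypothetical abundances at time t
generalised_mean(n_t, n_t0, 1)    # arithmetic mean of the ratios: 5
generalised_mean(n_t, n_t0, -1)   # harmonic mean of the ratios: 3.2
```

As $\lambda \to 0$ the generalised mean tends to the geometric mean of the ratios (4 in this toy case), which links it to the geometric mean index.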

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

generalisedMean

the value of generalised mean.

Author(s)

Maria Jose Ginzo Villamayor

References

Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeometricMean, fHeip

Examples

library(sqldf)
data(surnamesgal14)

loc <- length(unique(surnamesgal14$muni))

apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')

result = fGeneralisedMean(x= surnamesgal14[surnamesgal14$number != 0,],
pki="pki", pki0=surnamesgal14[surnamesgal14$number != 0,"pki"],
location  = "muni", s = apes2$ni[1:loc], lambda = 1 )
result

data(namesmengal16)

loc <- length(unique(namesmengal16$muni))

namesmengal16$pki <- (namesmengal16$number /
namesmengal16$population)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

result = fGeneralisedMean(x= namesmengal16[namesmengal16$number != 0,],
pki="pki", pki0=namesmengal16[namesmengal16$number != 0,"pki"],
location  = "muni", s = names2$ni[1:loc], lambda = 1 )
result

data(nameswomengal16)

loc <- length(unique(nameswomengal16$muni))

nameswomengal16$pki <- (nameswomengal16$number /
nameswomengal16$population)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

result = fGeneralisedMean(x= nameswomengal16[nameswomengal16$number != 0,],
pki="pki", pki0=nameswomengal16[nameswomengal16$number != 0,"pki"],
location  = "muni", s = names2$ni[1:loc], lambda = 1 )
result

Calculate the Geometric Mean

Description

This function computes the geometric mean introduced by Stephen Terrence Buckland and coauthors. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fGeometricMean(x, pki, pki0, s, location)

Arguments

x

data frame of the data values, restricted to species with non-zero counts (in a sample, some species may not be represented).

pki

name of a variable which represents the relative frequency for each species.

pki0

name of a variable which represents the relative frequency of each species at the baseline time point $t_0$.

s

vector with the total number of species (species richness) in each location.

location

name of a variable which represents the grouping element.

Details

For a community $i$, the geometric mean of relative abundances is defined by

$$G_t = \exp\left(\frac{1}{S_i} \sum_{k\in S_i} \log \frac{N_{ki}^{t}}{N_{ki}^{t_0}}\right),$$

where $N_{ki}^t$ denotes the number of individuals of species $k$ at time $t$, $t_0$ is the baseline year and $S_i$ represents all species in the community (species richness).

In the onomastic context, $N_{ki}^t$ denotes the absolute frequency of surname $k$ in region $i$ ($\approx$ community in the diversity context) at time $t$.
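The definition translates into a one-line R sketch for a single community. The helper `geometric_mean_index()` is hypothetical, not the package's `fGeometricMean()`, and assumes strictly positive abundances for the same species at both times. Note that in the examples below `pki0` is a copy of `pki`, so under this formula every ratio equals 1.

```r
# Hypothetical helper: geometric mean of abundance ratios for ONE community.
geometric_mean_index <- function(n_t, n_t0) {
  stopifnot(length(n_t) == length(n_t0), all(n_t > 0), all(n_t0 > 0))
  exp(mean(log(n_t / n_t0)))   # exp of the average log-ratio over S_i
}

geometric_mean_index(c(2, 8), c(1, 1))   # 4
# A halving of one species offset by a doubling of another leaves G_t at 1
geometric_mean_index(c(1, 4), c(2, 2))   # 1
```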

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

geometricMean

the value of geometric mean.

Author(s)

Maria Jose Ginzo Villamayor

References

Buckland, S.T., Studeny, A.C., Magurran, A.E., Illian, J.B., & Newson, S.E. (2011). The geometric mean of relative abundance indices: a biodiversity measure with a difference. Ecosphere, 2(9), art.100.

Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews.

van Strien, A.J., Soldaat, L.L., & Gregory, R.D. (2012). Desirable mathematical properties of indicators for biodiversity change. Ecological Indicators, 14, 202–208.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fHeip

Examples

library(sqldf)
data(surnamesgal14)
loc <- length(unique(surnamesgal14$muni))

apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')
surnamesgal14$pki0 <- surnamesgal14$pki

result = fGeometricMean (x= surnamesgal14[surnamesgal14$number != 0,],
pki="pki", pki0="pki0" , location  = "muni",
s = apes2$ni[1:loc])
result

data(namesmengal16)
loc <- length(unique(namesmengal16$muni))

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

namesmengal16$pki <- (namesmengal16$number /
namesmengal16$population)
namesmengal16$pki0 <- namesmengal16$pki

result = fGeometricMean (x= namesmengal16[namesmengal16$number != 0,],
pki="pki", pki0="pki0" , location  = "muni",
s = names2$ni[1:loc])
result

data(nameswomengal16)
loc <- length(unique(nameswomengal16$muni))

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

nameswomengal16$pki <- (nameswomengal16$number /
nameswomengal16$population)
nameswomengal16$pki0 <- nameswomengal16$pki

result = fGeometricMean (x= nameswomengal16[nameswomengal16$number != 0,], 
pki = "pki", pki0 = "pki0", location  = "muni", 
s = names2$ni[1:loc])
result

Calculate the Heip's diversity index

Description

This function computes Heip's diversity index, introduced by Carlo H. R. Heip. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fHeip (x, k, n, location, s)

Arguments

x

data frame of the data values, restricted to species with non-zero counts (in a sample, some species may not be represented).

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

name of a variable which represents the grouping element.

s

vector with the total number of species (species richness) in each location.

Details

For a community $i$, Heip's diversity index is defined by $E_{He} = \frac{2^{H'} - 1}{S_i - 1}$, where $H'$ is the Shannon diversity index and $S_i$ represents all species in the community (species richness). This index ranges from 0 to 1 and measures how evenly the total abundance of the community is distributed among its species.

In the onomastic context, $S_i$ represents all surnames in region $i$ ($\approx$ community in the diversity context).
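A minimal R sketch of Heip's index for one community. It assumes $H'$ is computed with base-2 logarithms, so that $2^{H'}$ is the matching "effective number of species"; the package's own base is not stated here. The helper `heip()` is hypothetical and requires at least two observed species.

```r
# Hypothetical helper: Heip's evenness index for ONE community.
# Assumption: H' uses log2, consistent with the 2^H' in the formula.
heip <- function(counts) {
  counts <- counts[counts > 0]          # keep observed species only
  p <- counts / sum(counts)             # relative frequencies p_ki
  h <- -sum(p * log2(p))                # Shannon index H' in bits
  s <- length(p)                        # species richness S_i
  (2^h - 1) / (s - 1)
}

heip(c(5, 5, 5, 5))     # perfectly even community: 1
heip(c(97, 1, 1, 1))    # one dominant species: close to 0
```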

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

heip

the value of the Heip's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Heip, C. (1974). A New Index Measuring Evenness. Journal of the Marine Biological Association of the United Kingdom, 54(3), 555–557.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean.

Examples

library(sqldf)
data(surnamesgal14)
loc <- length(unique(surnamesgal14$muni))


apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')


result = fHeip (x= surnamesgal14[surnamesgal14$number != 0,],
k="number", n="population", location  = "muni",
s = apes2$ni[1:loc] )
result

data(namesmengal16)
loc <- length(unique(namesmengal16$muni))


names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')


result = fHeip (x= namesmengal16[namesmengal16$number != 0,],
k="number", n="population", location  = "muni",
s = names2$ni[1:loc] )
result


data(nameswomengal16)
loc <- length(unique(nameswomengal16$muni))


names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')


result = fHeip (x= nameswomengal16[nameswomengal16$number != 0,],
k="number", n="population", location  = "muni",
s = names2$ni[1:loc] )
result

Calculate the Hill's diversity numbers

Description

This function computes Hill's diversity numbers, introduced by M. O. Hill. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fHill(x, k, n, location, lambda)

Arguments

x

dataframe of the data values for each species.

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

name of a variable which represents the grouping element.

lambda

free parameter.

Details

For a community $i$, Hill's diversity numbers are defined by

$$J(\lambda) = \left(\sum_{k\in S_i} p_{ki}^{\lambda}\right)^{\frac{1}{1-\lambda}}, \qquad \lambda \geq 0,$$

where $p_{ki}$ represents the relative frequency of species $k$, $S_i$ represents all species in the community (species richness), and $\lambda$ is a free parameter. This is equivalent to the exponential of Rényi's generalised entropy (computed with natural logarithms). The Rényi entropy of order $\lambda$, where $\lambda \geq 0$ and $\lambda \neq 1$, is defined as

$$\mathrm{H}_{\lambda}(X) = \frac{1}{1-\lambda} \log\left(\sum_{i=1}^{n} p_i^{\lambda}\right).$$

Here, $X$ is a discrete random variable with possible outcomes in the set $\mathcal{A} = \{x_1, x_2, \ldots, x_n\}$ and corresponding probabilities $p_i \doteq \Pr(X = x_i)$ for $i = 1, \ldots, n$. The logarithm is conventionally taken to base 2, especially in information theory, where entropy is measured in bits. If $p_i = 1/n$ for all $i = 1, \ldots, n$, then all the Rényi entropies of the distribution are equal: $\mathrm{H}_{\lambda}(X) = \log n$. In general, for any discrete random variable $X$, $\mathrm{H}_{\lambda}(X)$ is a non-increasing function of $\lambda$.

Particular values of $\lambda$: for $\lambda = 0$, $J(0) = S_i$, the species richness; for $\lambda = 1$ (obtained as the limit $\lambda \to 1$), $J(1) = e^{H_t}$, the exponential of Shannon's entropy; and for $\lambda = 2$, $J(2) = D_{S_i}$, the 'inverse' Simpson index.

In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ ($\approx$ community in the diversity context) and $S_i$ represents all surnames in region $i$.
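The three particular cases can be reproduced with a short R sketch of $J(\lambda)$ for one community. The helper `hill_number()` is hypothetical (not the package's `fHill()`); it drops zero counts and treats $\lambda = 1$ as the limit, the exponential of Shannon's entropy with natural logarithms.

```r
# Hypothetical helper: Hill number J(lambda) for ONE community.
hill_number <- function(counts, lambda, tol = 1e-12) {
  stopifnot(lambda >= 0)
  p <- counts[counts > 0] / sum(counts)   # relative frequencies p_ki
  if (abs(lambda - 1) < tol) {
    return(exp(-sum(p * log(p))))         # limit lambda -> 1
  }
  sum(p^lambda)^(1 / (1 - lambda))
}

counts <- c(2, 1, 1, 0)   # hypothetical community; the zero is ignored
hill_number(counts, 0)    # 3, the species richness
hill_number(counts, 1)    # 2^1.5, about 2.83
hill_number(counts, 2)    # 1 / 0.375, about 2.67
```

The values decrease as $\lambda$ grows, in line with the non-increasing property of the Rényi entropy; for a perfectly even community every order returns the richness.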

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

hill

the value of the Hill's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Hill, M. O. (1973). Diversity and Evenness: a unifying notation and its consequences. Ecology, 54, 427–32.

See Also

fCressieRead.

Examples

data(surnamesgal14)
result = fHill (x= surnamesgal14, k="number", n="population",
location  = "muni", lambda= 0)
result

data(namesmengal16)
result = fHill (x= namesmengal16, k="number", n="population",
location  = "muni", lambda= 0)
result

data(nameswomengal16)
result = fHill (x= nameswomengal16, k="number", n="population",
location  = "muni", lambda= 0)
result

Calculate the Isonymy within a region

Description

This function computes the isonymy within a region $i$ that has an associated collection $S_i$ of surnames.

Usage

fIsonymy(x, category)

Arguments

x

a vector with the squared relative frequency of each surname.

category

a vector giving the grouping element of each entry, for example the region.

Details

Isonymy is defined as $I_i = \sum_{k\in S_i} p_{ki}^2$, where $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$.

In the diversity context, $p_{ki}$ denotes the relative frequency of species $k$ in community $i$ ($\approx$ region in the onomastic context) and $S_i$ represents all species in community $i$.
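The definition amounts to a grouped sum of squared relative frequencies, which base R can express with `tapply()`. This is a sketch on hypothetical toy data, not the package's `fIsonymy()`; the `population` column mimics the per-region totals used in the examples.

```r
# Hypothetical toy data: two regions with two surnames each.
toy <- data.frame(
  region = c("A", "A", "B", "B"),
  number = c(2, 2, 3, 1)
)
toy$population <- ave(toy$number, toy$region, FUN = sum)  # region totals
toy$pki2 <- (toy$number / toy$population)^2               # p_ki^2
isonymy <- tapply(toy$pki2, toy$region, sum)              # I_i per region
isonymy   # A: 0.5, B: 0.625
```

Since $I_i = \sum_{k} p_{ki}^2$, it is the reciprocal of Hill's number $J(2)$.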

Value

A dataframe containing the following components:

category

represents the grouping element, for example the regions / communities.

x

the value of isonymy.

Author(s)

Maria Jose Ginzo Villamayor

References

Crow J.F. and Mange A.P., (1965). Measurement of inbreeding from the frequency of marriages between persons of the same surname. Eugenics Quarterly, 12(4), 199–203.

Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez–Larralde, A., (1996). Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Annals of Human Biology, 23, 431–455.

See Also

fIsonymyAll.

Examples

data(surnamesgal14)
surnamesgal14$pki2 <- (surnamesgal14$number / surnamesgal14$population)^2
result = fIsonymy(surnamesgal14$pki2, surnamesgal14$namuni)
result

data(namesmengal16)
namesmengal16$pki2 <- (namesmengal16$number / namesmengal16$population)^2
result = fIsonymy(namesmengal16$pki2, namesmengal16$namuni)
result

data(nameswomengal16)
nameswomengal16$pki2 <- (nameswomengal16$number / nameswomengal16$population)^2
result = fIsonymy(nameswomengal16$pki2, nameswomengal16$namuni)
result

Calculate the Isonymy, Isonymy between regions, Lasker distances, Euclidean distance and Nei's distances

Description

This function computes isonymy, isonymy between regions, the Lasker distance, the Euclidean distance, Nei's distance and Hedrick's coefficient.

Usage

fIsonymyAll (x, n, location, union, measure)

Arguments

x

data frame with the data.

n

number of locations in the data frame.

location

name of a variable which represents the location in the data.

union

name of the variable used to match surnames between two locations.

measure

name of a variable which represents the relative frequency for each surname.

Details

The measures computed are defined as follows.

Surname (dis)similarity among regions can be quantified by different measures. Consider an index $i = 1, \ldots, n$ denoting a certain geographical region (for two regions, $(i, j)$). Each region has an associated collection $S_i$ of surnames, and for a pair of regions the collection of all the surnames in them is denoted by $S_{ij}$ ($S_{ij} = S_i \cup S_j$). The total number of surnames in a certain region $i$ is denoted by $n_i$. Surnames are denoted by the indices $k$ and $l$.

Isonymy is defined as $I_i = \sum_{k\in S_i} p_{ki}^2$, where $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$. Isonymy can also be extended to measure population similarities between groups. Under the assumption of a common origin, isonymy between two regions $i$ and $j$ is defined as $I_{ij} = \sum_{k\in S_{ij}} p_{ki} p_{kj}$.

Other measures of the isonymic distance between a pair of locations can be derived from isonymy between regions. For instance, the Lasker distance is given by $L = -\log(I_{ij})$.

The Lasker distance can be interpreted as a measure of (dis)similarity between two areas: a larger distance indicates less similarity in surname composition. Nevertheless, the Lasker distance is not the only way to quantify surname similarity. Other common coefficients are the Euclidean distance and Nei's distance, given respectively by

$$E = \sqrt{1 - \sum_{k\in S_{ij}} \sqrt{p_{ki} p_{kj}}} \quad\text{and}\quad N = -\log\left(\frac{I_{ij}}{\sqrt{I_i I_j}}\right).$$

Finally, Hedrick's coefficient gives a standardised measure of isonymy, using a procedure similar to the calculation of a correlation coefficient:

$$H_{ij} = \frac{2 \sum_{k\in S_{ij}} p_{ki} p_{kj}}{\sum_{k\in S_{ij}} p_{ki}^2 + \sum_{k\in S_{ij}} p_{kj}^2}, \qquad i, j = 1, \ldots, n.$$

In the diversity context, $p_{ki}$ denotes the relative frequency of species $k$ in community $i$ ($\approx$ region in the onomastic context) and $S_i$ represents all species in community $i$.
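For a single pair of regions, all five measures can be sketched in R by aligning both frequency vectors on $S_{ij}$, with a surname absent from a region contributing $p = 0$. The helper `isonymy_measures()` is hypothetical and is not how `fIsonymyAll()` is implemented; it assumes named vectors whose names are the surnames. The surnames and frequencies below are made up for illustration.

```r
# Hypothetical helper: pairwise isonymy measures for regions i and j.
# p_i, p_j: named vectors of relative frequencies (names = surnames).
isonymy_measures <- function(p_i, p_j) {
  s_ij <- union(names(p_i), names(p_j))   # S_ij = S_i U S_j
  a <- setNames(rep(0, length(s_ij)), s_ij)
  b <- a
  a[names(p_i)] <- p_i                    # p_ki, 0 if absent from region i
  b[names(p_j)] <- p_j                    # p_kj, 0 if absent from region j
  I_i  <- sum(a^2)                        # isonymy of region i
  I_j  <- sum(b^2)                        # isonymy of region j
  I_ij <- sum(a * b)                      # isonymy between i and j
  list(
    isonymy_btw = I_ij,
    lasker  = -log(I_ij),
    distE   = sqrt(max(0, 1 - sum(sqrt(a * b)))),  # guard against rounding
    nei     = -log(I_ij / sqrt(I_i * I_j)),
    hedrick = 2 * I_ij / (I_i + I_j)
  )
}

p_i <- c(Garcia = 0.5, Lopez = 0.3, Perez = 0.2)
p_j <- c(Garcia = 0.4, Lopez = 0.4, Otero = 0.2)
str(isonymy_measures(p_i, p_j))
```

Two regions with identical surname distributions give Nei and Euclidean distances of 0 and a Hedrick coefficient of 1; regions sharing no surname give $I_{ij} = 0$ and therefore an infinite Lasker distance.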

Value

A list containing the following components:

isonymy

data frame with two columns and one row per region / community ($n$ rows). For each location, it gives the value of the isonymy.

isonymy.btw

the value of isonymy between regions, as an $n \times n$ matrix.

hedrick

the value of Hedrick's coefficient, as an $n \times n$ matrix.

nei

the value of Nei's distance, as an $n \times n$ matrix.

lasker

the value of the Lasker distance, as an $n \times n$ matrix.

distE

the value of the Euclidean distance, as an $n \times n$ matrix.

Author(s)

Maria Jose Ginzo Villamayor

References

Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez–Larralde, A., (1996) Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Annals of Human Biology, 23, 431–455.

Cavalli-Sforza, L. L., and Edwards, A. W. F., (1967), Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics, 19, 233–257.

Hedrick, P. W. (1971), A new approach to measuring genetic similarity. Evolution, 25: 276–280.

Lasker, G. W. (1977) A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Human Biology, 49, 489–493.

Mikerezi, I., Shina, E., Scapoli, C., Barbujani, G., Mamolini, E., Sandri, M., Carrieri, A., Rodriguez–Larralde, A. and Barrai, I. (2013). Surnames in Albania: a study of the population of Albania through isonymy. Annals of Human Genetics, 77, 232–243.

Nei, M. (1973). The theory and estimation of genetic distance. In Genetic Structure of Populations, edited by N. E. Morton, (Honolulu: University Press of Hawaii), 45–54.

Weiss, V. (1980). Inbreeding and genetic distance between hierarchically structured populations measured by surname frequencies. Mankind Quarterly, 21, 135–149.

See Also

fIsonymy.

Examples

data(surnamesgal14)
result = fIsonymyAll (x= surnamesgal14, n= 314, location = 'muni',
union = 'surname', measure = 'pki')
result

data(namesmengal16)
namesmengal16$pki <- (namesmengal16$number /
namesmengal16$population)
result = fIsonymyAll (x= namesmengal16, n= 313, location = 'muni',
union = 'name', measure = 'pki')
result

data(nameswomengal16)
nameswomengal16$pki <- (nameswomengal16$number /
nameswomengal16$population)
result = fIsonymyAll (x= nameswomengal16, n= 313, location = 'muni',
union = 'name', measure = 'pki')
result

Calculate the Margalef's diversity index

Description

This function computes Margalef's diversity index, a species diversity index developed by Ramon Margalef Lopez during the 1950s. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fMargalef(x, s, n, location)

Arguments

x

dataframe which contains the number of species and population for each location.

s

name of a variable which represents number of species.

n

name of a variable which represents total number of individuals.

location

name of a variable which represents the grouping element.

Details

For a community $i$, Margalef's diversity index is defined by $R_1 = \frac{S_i - 1}{\ln(N_i)}$, where $S_i$ represents the number of species (richness) and $N_i$ represents the total number of individuals across all $S_i$ species.

In the onomastic context, $N_i$ denotes the number of individuals in region $i$ ($\approx$ community in the diversity context) and $S_i$ represents the number of distinct surnames.
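Because $R_1$ depends only on $S_i$ and $N_i$, it can be computed directly from a per-location summary shaped like the `apes2` table built in the examples. The following R sketch uses a hypothetical helper `margalef()` and made-up values; it is not the package's `fMargalef()`.

```r
# Hypothetical helper: Margalef's index, vectorised over locations.
margalef <- function(s, n) (s - 1) / log(n)

# Made-up per-location summary (ni = distinct surnames, population = people)
toy <- data.frame(muni = c("A", "B"), ni = c(11, 1), population = c(100, 50))
toy$margalef <- margalef(toy$ni, toy$population)
toy   # A: 10 / ln(100), about 2.17; B: 0 (a single surname)
```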

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

margalef

the value of the Margalef's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Margalef D.R., (1958), Information theory in ecology. International Journal of General Systems, 3, 36–71.

See Also

fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

library(sqldf)
data(surnamesgal14)

apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')

result = fMargalef (x= apes2, s="ni", n="population", location  = "muni")
result

data(namesmengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

result = fMargalef (x= names2, s="ni", n="population", location  = "muni")
result

data(nameswomengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

result = fMargalef (x= names2, s="ni", n="population", location  = "muni")
result

Calculate the Menhinick's diversity index

Description

This function computes Menhinick's diversity index, introduced by Edward F. Menhinick. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fMenhinick(x, s, n, location)

Arguments

x

dataframe which contains the number of species and population for each location.

s

name of a variable which represents number of species.

n

name of a variable which represents total number of individuals.

location

name of a variable which represents the grouping element.

Details

For a community $i$, Menhinick's diversity index is defined by $R_2 = \frac{s_i}{\sqrt{N_i}}$, where $s_i$ represents the number of species (richness) and $N_i$ represents the total number of individuals across all $s_i$ species.

In the onomastic context, $N_i$ denotes the number of individuals in region $i$ ($\approx$ community in the diversity context) and $s_i$ represents the number of distinct surnames.
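As with Margalef's index, $R_2$ needs only the richness and the number of individuals per location. A minimal R sketch with a hypothetical helper `menhinick()` and made-up values (not the package's `fMenhinick()`):

```r
# Hypothetical helper: Menhinick's index, vectorised over locations.
menhinick <- function(s, n) s / sqrt(n)

# Made-up per-location summary (ni = distinct surnames, population = people)
toy <- data.frame(muni = c("A", "B"), ni = c(10, 30), population = c(100, 900))
toy$menhinick <- menhinick(toy$ni, toy$population)
toy   # both locations give R_2 = 1
```

The example shows that Menhinick's index scales richness by $\sqrt{N_i}$: location B has three times the surnames of A but nine times the population, so both get the same value.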

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

menhinick

the value of the Menhinick's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Menhinick E.F. (1964) A comparison of some species-individuals diversity indices applied to samples of field insects. Ecology, 45, 859–861.

See Also

fMargalef, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

library(sqldf)
data(surnamesgal14)

apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')

result = fMenhinick(x= apes2, s="ni", n="population",
location  = "muni")
result

data(namesmengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

result = fMenhinick(x= names2, s="ni", n="population",
location  = "muni")
result

data(nameswomengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

result = fMenhinick(x= names2, s="ni", n="population",
location  = "muni")
result

Calculate the Pielou's diversity index

Description

This function computes Pielou's diversity index, introduced by Evelyn Chrystalla Pielou, which relates diversity to species richness. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fPielou(x, k, n, location, s)

Arguments

x

data frame of the data values, restricted to species with non-zero counts (in a sample, some species may not be represented).

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

represents the grouping element.

s

numeric vector with the number of species for each location.

Details

For a community $i$, Pielou's diversity index is defined by $J^{\prime} = \frac{H^{\prime}}{\log_2 S_i}$, where $H^{\prime}$ denotes the Shannon-Wiener index and $\log_2 S_i$, with $S_i$ the number of species, is the maximum possible diversity $H^{\prime}_{\max}$. Pielou's index is therefore the Shannon-Wiener index of the sample divided by its maximum attainable value, and it measures the evenness of the community. If all species are represented in equal numbers in the sample, then $J^{\prime} = 1$; if one species strongly dominates, $J^{\prime}$ is close to zero.

In the onomastic context, $S_i$ refers to the surnames present in region $i$ (the analogue of a community in the diversity context).
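A minimal base-R sketch of $J^{\prime} = H^{\prime} / \log_2 S_i$ for a single region follows. The helper `pielou()` and the two toy count vectors are hypothetical and only illustrate the formula; they are not the package's fPielou code. It contrasts a perfectly even region with one dominated by a single surname.

```r
# Pielou's evenness J' = H' / log2(S) for one region.
# 'counts' holds the absolute frequencies of the surnames present (all > 0).
pielou <- function(counts) {
  p <- counts / sum(counts)        # relative frequencies p_ki
  H <- -sum(p * log2(p))           # Shannon-Wiener index in bits
  H / log2(length(counts))         # divide by H'_max = log2(S_i)
}

# Hypothetical toy regions (not from the package datasets)
even   <- c(Lopez = 25, Garcia = 25, Otero = 25, Vazquez = 25)
skewed <- c(Lopez = 97, Garcia = 1, Otero = 1, Vazquez = 1)

pielou(even)    # 1: every surname equally frequent
pielou(skewed)  # about 0.12: one surname dominates
```

Both regions have the same richness (four surnames), so the difference in $J^{\prime}$ comes entirely from how the individuals are spread across those surnames.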

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

pielou

the value of Pielou's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Pielou, E. C. (1966) The measurement of diversity in different types of biological collections. Journal of Theoretical Biology, 13, 131-144.

See Also

fMargalef, fMenhinick, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

library(sqldf)
data(surnamesgal14)

apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')

result = fPielou (x= surnamesgal14[surnamesgal14$number != 0,],
k="number", n="population", location  = "muni", s = apes2$ni )
result

data(namesmengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

result = fPielou (x= namesmengal16[namesmengal16$number != 0,],
k="number", n="population", location  = "muni", s = names2$ni )
result

data(nameswomengal16)

names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

result = fPielou (x= nameswomengal16[nameswomengal16$number != 0,],
k="number", n="population", location  = "muni", s = names2$ni )
result

Calculate the Shannon-Weaver diversity index

Description

This function computes the Shannon-Weaver diversity index, introduced by Claude Elwood Shannon. The measure originates in information theory, where it quantifies the order (or disorder) observed within a particular system. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fShannon(x, k, n, location)

Arguments

x

data frame with the data values for each species, restricted to species with non-zero frequency (in a sample, some species may not be represented).

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

represents the grouping element.

Details

For a community $i$, the Shannon-Weaver index is defined by $H^{\prime} = -\sum_{k\in S_i} p_{ki} \log_2 p_{ki}$, where $p_{ki} = \frac{N_{ki}}{N_i}$ is the relative frequency of species $k$, $N_{ki}$ denotes the number of individuals of species $k$, $N_i$ is the total number of individuals over all species in the community, and $S_i$ is the set of species present (whose size is the species richness). This index is related to the weighted geometric mean of the proportional abundances of the types.

In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ (the analogue of a community in the diversity context), and $S_i$ is the set of surnames in region $i$.
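The definition and its link to the weighted geometric mean can be checked numerically with a minimal base-R sketch. The helper `shannon()` and the toy counts are hypothetical, not the package's fShannon code. Since $\prod_k p_{ki}^{p_{ki}} = 2^{-H^{\prime}}$, the quantity $2^{H^{\prime}}$ is the reciprocal of the weighted geometric mean of the proportions. The last line shows why zero frequencies must be removed from x beforehand.

```r
# Shannon-Weaver index H' = -sum(p_ki * log2(p_ki)) for one region
shannon <- function(counts) {
  p <- counts / sum(counts)   # relative frequencies p_ki = N_ki / N_i
  -sum(p * log2(p))
}

# Hypothetical surname counts for one region
n_k <- c(Lopez = 50, Garcia = 25, Otero = 25)
H <- shannon(n_k)             # 0.5 * 1 + 2 * (0.25 * 2) = 1.5 bits

# Link to the weighted geometric mean of the proportions:
# prod(p^p) = 2^(-H'), so 2^H' is its reciprocal
p <- n_k / sum(n_k)
wgm <- prod(p^p)

# A zero count breaks the formula: 0 * log2(0) is NaN in R,
# which is why x should only contain species with non-zero frequency
shannon(c(Lopez = 10, Garcia = 0))   # NaN
```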

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

shannon

the value of the Shannon-Weaver diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Shannon C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.

Shannon C.E., Weaver W. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press. USA, 96. pp. 117.

See Also

fMargalef, fMenhinick, fPielou, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

data(surnamesgal14)
result = fShannon (x= surnamesgal14[surnamesgal14$number != 0,],
k="number", n="population", location  = "muni" )
result

data(namesmengal16)
result = fShannon (x= namesmengal16[namesmengal16$number != 0,],
k="number", n="population", location  = "muni" )
result

data(nameswomengal16)
result = fShannon (x= nameswomengal16[nameswomengal16$number != 0,],
k="number", n="population", location  = "muni" )
result

Calculate Sheldon's diversity index

Description

This function computes Sheldon's diversity index, introduced by A. L. Sheldon. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fSheldon(x, k, n, location, s)

Arguments

x

data frame with the data values for each species, restricted to species with non-zero frequency (in a sample, some species may not be represented).

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

represents the grouping element.

s

numeric vector with the number of species for each location.

Details

For a community $i$, Sheldon's diversity index is defined by $E_{She} = \frac{2^{H^{\prime}}}{S_i}$, where $H^{\prime}$ denotes the Shannon-Wiener index (computed with base-2 logarithms) and $S_i$ is the number of species (richness).

In the onomastic context, $S_i$ is the number of surnames in region $i$ (the analogue of a community in the diversity context).
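A minimal base-R sketch of $E_{She} = 2^{H^{\prime}} / S_i$ for a single region is shown below. The helper `sheldon()` and the toy counts are hypothetical illustrations, not the package's fSheldon code. The numerator $2^{H^{\prime}}$ acts as an "effective number" of equally common surnames, so the ratio equals 1 only when all surnames are equally frequent.

```r
# Sheldon's index E_She = 2^H' / S_i, with H' in bits (base-2 logs)
sheldon <- function(counts) {
  p <- counts / sum(counts)
  H <- -sum(p * log2(p))      # Shannon-Wiener index
  2^H / length(counts)        # effective number of surnames / richness
}

# Hypothetical toy regions (not from the package datasets)
sheldon(c(Lopez = 25, Garcia = 25, Otero = 25, Vazquez = 25))  # 1
sheldon(c(Lopez = 50, Garcia = 25, Otero = 25))                # 2^1.5 / 3, about 0.943
```

The base of the exponent must match the base of the logarithm used in $H^{\prime}$; with natural logarithms the equivalent form is $e^{H^{\prime}}/S_i$, which yields the same value.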

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

sheldon

the value of Sheldon's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Sheldon, A. L. (1969). Equitability indices: dependence on the species count. Ecology, 50, 466–467.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

library(sqldf)
data(surnamesgal14)
apes2=sqldf('select  muni, count(surname) as ni,
sum(number) as population from surnamesgal14
group by muni;')

result = fSheldon (x= surnamesgal14[surnamesgal14$number != 0,],
k="number", n="population", location  = "muni",
s = apes2$ni)
result

data(namesmengal16)
names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from namesmengal16
group by muni;')

result = fSheldon (x= namesmengal16[namesmengal16$number != 0,],
k="number", n="population", location  = "muni",
s = names2$ni)
result

data(nameswomengal16)
names2=sqldf('select  muni, count(name) as ni,
sum(number) as population from nameswomengal16
group by muni;')

result = fSheldon (x= nameswomengal16[nameswomengal16$number != 0,],
k="number", n="population", location  = "muni",
s = names2$ni)
result

Calculate Simpson's diversity index

Description

This function computes Simpson's diversity index, introduced by Edward Hugh Simpson, together with its inverse. It was one of the first diversity indices used in ecology. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fSimpson(x, k, n, location)

Arguments

x

dataframe of the data values for each species.

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

represents the grouping element.

Details

For a community $i$, Simpson's diversity index is defined by $D_{S_i} = \sum_{k\in S_i} p_{ki}^2$, where $p_{ki} = \frac{N_{ki}}{N_i}$ is the relative frequency of species $k$, $N_{ki}$ denotes the number of individuals of species $k$, $N_i$ is the total number of individuals over all species in the community, and $S_i$ is the set of species present. The index is the probability that two individuals drawn at random (with replacement) belong to the same species, so it becomes smaller as the community becomes more diverse.

In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ (the analogue of a community in the diversity context), so Simpson's diversity index is equivalent to the isonymy of region $i$, that is, the probability that two individuals share a surname.
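A minimal base-R sketch of $D_{S_i} = \sum_k p_{ki}^2$ for one region follows, on hypothetical toy counts. It assumes the "inverse" Simpson index is $1/D_{S_i}$, a common convention; the package's divSimpson column is not shown here and may be defined differently. The helper `simpson()` is illustrative and is not the package's fSimpson code.

```r
# Simpson's index D = sum(p_ki^2) for one region, plus 1 / D.
# Assumption: the "inverse" is taken to be 1 / D (a common convention).
simpson <- function(counts) {
  p <- counts / sum(counts)   # relative frequencies p_ki
  sum(p^2)
}

n_k <- c(Lopez = 50, Garcia = 25, Otero = 25)   # hypothetical counts
D <- simpson(n_k)   # 0.25 + 0.0625 + 0.0625 = 0.375
D
1 / D               # about 2.67
```

Under this convention $1/D$ reads as an effective number of equally common surnames: about 2.67 here, against a richness of 3.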

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

simpson

the value of Simpson's diversity index.

divSimpson

the value of the inverse of Simpson's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Simpson E.H. (1949) Measurement of diversity. Nature, 163, 688.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.

Examples

data(surnamesgal14)
result = fSimpson (x= surnamesgal14, k="number",
n="population", location  = "muni" )
result

data(namesmengal16)
result = fSimpson (x= namesmengal16, k="number",
n="population", location  = "muni" )
result

data(nameswomengal16)
result = fSimpson (x= nameswomengal16, k="number",
n="population", location  = "muni" )
result

Calculate Simpson's diversity index for a sample

Description

This function computes Simpson's diversity index, introduced by Edward Hugh Simpson, for data assumed to be a sample from a population that is not finite. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.

Usage

fSimpsonInf(x, k, n, location)

Arguments

x

dataframe of the data values for each species.

k

name of a variable which represents absolute frequency for each species.

n

name of a variable which represents total number of individuals.

location

represents the grouping element.

Details

For a community $i$, when the population size $N_i$ is not finite the data are assumed to be a sample of size $n_i$, and Simpson's diversity index is defined by $D^{\prime}_{S_i} = \sum_{k\in S_i} \frac{n_{ki}(n_{ki}-1)}{n_i(n_i-1)}$, where $n_{ki}$ is the number of individuals of species $k$ in the sample (its population counterpart is $N_{ki}$), $n_i$ is the sample size, and $S_i$ is the set of species in the community.

In the onomastic context, $n_{ki}$ ($\approx N_{ki}$) denotes the absolute frequency of surname $k$ in region $i$, and $S_i$ is the set of surnames in region $i$ (the analogue of a community in the diversity context).
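A minimal base-R sketch of the sample form $D^{\prime}_{S_i}$ follows, set against the plug-in $\sum_k p_{ki}^2$ on the same hypothetical counts. The helper `simpson_sample()` is illustrative and is not the package's fSimpsonInf code. $D^{\prime}$ counts pairs of distinct individuals, that is, draws without replacement from the sample. Singletons therefore contribute nothing, and $D^{\prime}$ never exceeds the plug-in value, which also pairs each individual with itself.

```r
# Sample version of Simpson's index:
# D' = sum(n_ki * (n_ki - 1)) / (n_i * (n_i - 1))
simpson_sample <- function(counts) {
  n <- sum(counts)                              # sample size n_i
  sum(counts * (counts - 1)) / (n * (n - 1))
}

n_k <- c(Lopez = 5, Garcia = 3, Otero = 2)       # hypothetical sample counts
simpson_sample(n_k)        # (20 + 6 + 2) / 90, about 0.311
sum((n_k / sum(n_k))^2)    # plug-in sum(p^2) = 38 / 100 = 0.38
```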

Value

A dataframe containing the following components:

location

represents the grouping element, for example the communities / regions.

simpson

the value of Simpson's diversity index.

Author(s)

Maria Jose Ginzo Villamayor

References

Simpson E.H. (1949) Measurement of diversity. Nature, 163, 688.

See Also

fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fGeneralisedMean, fGeometricMean, fHeip.

Examples

data(surnamesgal14)
result = fSimpsonInf (x= surnamesgal14, k="number",
n="population", location  = "muni" )
result

data(namesmengal16)
result = fSimpsonInf (x= namesmengal16, k="number",
n="population", location  = "muni" )
result

data(nameswomengal16)
result = fSimpsonInf (x= nameswomengal16, k="number",
n="population", location  = "muni" )
result

namesmengal16 data

Description

This dataset contains the 25 most frequent men's names by municipality in Galicia in 2016.

Usage

data(namesmengal16)

Format

namesmengal16 is a data frame with men's names from Galicia in 2016.

Source

The data correspond to the 25 most frequent men's names by municipality in Galicia in 2016. The dataset contains 6 columns: prov (the province), muni (the municipality), namuni (the name of the municipality), name (the name), number (the number of people with that name) and population (the total population considered by municipality).

These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE provides information on the surnames and names of the population residing in the Autonomous Community of Galicia. The data are based on the 2014 Municipal Register of Inhabitants file, which the National Institute of Statistics (INE) provides to the IGE.

References

Galician Institute of Statistics (IGE), https://www.ige.eu/

Examples

data(namesmengal16)

nameswomengal16 data

Description

This dataset contains the 25 most frequent women's names by municipality in Galicia in 2016.

Usage

data(nameswomengal16)

Format

nameswomengal16 is a data frame with women's names from Galicia in 2016.

Source

The data correspond to the 25 most frequent women's names by municipality in Galicia in 2016. The dataset contains 6 columns: prov (the province), muni (the municipality), namuni (the name of the municipality), name (the name), number (the number of people with that name) and population (the total population considered by municipality).

These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE provides information on the surnames and names of the population residing in the Autonomous Community of Galicia. The data are based on the 2014 Municipal Register of Inhabitants file, which the National Institute of Statistics (INE) provides to the IGE.

References

Galician Institute of Statistics (IGE), https://www.ige.eu/

Examples

data(nameswomengal16)

surnamesgal14 data

Description

This dataset contains the 25 most frequent surnames by municipality in Galicia in 2014.

Usage

data(surnamesgal14)

Format

surnamesgal14 is a data frame with surnames from Galicia in 2014.

Source

The data correspond to the 25 most frequent surnames by municipality in Galicia in 2014. The dataset contains 8 columns: prov (the province), muni (the municipality), namuni (the name of the municipality), surname (the surname), number (the number of people with that surname), population (the total population considered by municipality), ni (the number of surnames considered) and a column with $p_{ki}$, the relative frequency of surname $k$ in municipality $i$.

These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE provides information on the surnames and names of the population residing in the Autonomous Community of Galicia. The data are based on the 2014 Municipal Register of Inhabitants file, which the National Institute of Statistics (INE) provides to the IGE.

References

Galician Institute of Statistics (IGE), https://www.ige.eu/

Examples

data(surnamesgal14)