Title: | Onomastic Diversity Measures |
---|---|
Description: | Different measures which can be used to quantify similarities between regions. These measures are isonymy, isonymy between, Lasker distance, coefficients of Hedrick and Nei. In addition, it calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill Numbers, Geometric Mean and Cressie and Read statistics. |
Authors: | Maria Jose Ginzo Villamayor [aut, cre] |
Maintainer: | Maria Jose Ginzo Villamayor <[email protected]> |
License: | GPL-2 |
Version: | 0.1 |
Built: | 2024-11-05 03:56:28 UTC |
Source: | https://github.com/cran/OnomasticDiversity |
The DESCRIPTION file:
Package: | OnomasticDiversity |
Type: | Package |
Title: | Onomastic Diversity Measures |
Version: | 0.1 |
Date: | 2024-02-07 |
Authors@R: | c(person("Maria Jose", "Ginzo Villamayor", role = c("aut", "cre"),email="[email protected]")) |
Author: | Maria Jose Ginzo Villamayor [aut, cre] |
Maintainer: | Maria Jose Ginzo Villamayor <[email protected]> |
Depends: | R(>= 4.2.0) |
Imports: | sqldf |
Description: | Different measures which can be used to quantify similarities between regions. These measures are isonymy, isonymy between, Lasker distance, coefficients of Hedrick and Nei. In addition, it calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill Numbers, Geometric Mean and Cressie and Read statistics. |
License: | GPL-2 |
LazyLoad: | yes |
Packaged: | 2024-02-07 09:00:25 UTC; mjginzo |
Encoding: | UTF-8 |
NeedsCompilation: | no |
RoxygenNote: | 7.2.3 |
Date/Publication: | 2024-02-08 21:10:09 UTC |
Repository: | https://mjginzo.r-universe.dev |
RemoteUrl: | https://github.com/cran/OnomasticDiversity |
RemoteRef: | HEAD |
RemoteSha: | c8464d2eba77b70b3a23cd7b65487bbe8891a1ff |
Index of help topics:
OnomasticDiversity-package | Onomastic Diversity Measures |
fCressieRead | Cressie and Read |
fGeneralisedMean | Calculate the Generalised Mean |
fGeometricMean | Calculate the Geometric Mean |
fHeip | Calculate the Heip's diversity index |
fHill | Calculate the Hill's diversity numbers |
fIsonymy | Calculate the Isonymy within a region |
fIsonymyAll | Calculate the Isonymy, Isonymy between regions, Lasker distances, Euclidean distance and Nei's distances |
fMargalef | Calculate the Margalef's diversity index |
fMenhinick | Calculate the Menhinick's diversity index |
fPielou | Calculate the Pielou's diversity index |
fShannon | Calculate the Shannon-Weaver diversity index |
fSheldon | Calculate the Sheldon's diversity index |
fSimpson | Calculate the Simpson's diversity index |
fSimpsonInf | Calculate the Simpson's diversity index and the inverse |
namesmengal16 | namesmengal16 data |
nameswomengal16 | nameswomengal16 data |
surnamesgal14 | surnamesgal14 data |
This package computes several measures that quantify similarities between regions: isonymy, isonymy between regions, the Lasker distance, and the coefficients of Hedrick and Nei. A diversity index is a numerical measure of how many different types (such as species) are present in a dataset (a community); it can also reflect the evolutionary relationships among the individuals distributed across those types, through properties such as richness, divergence, and evenness. These indicators are numerical representations of biodiversity in several dimensions (richness, evenness, and dominance). The package therefore also calculates biodiversity indices such as Margalef, Menhinick, Simpson, Shannon, Shannon-Wiener, Sheldon, Heip, Hill numbers, the geometric mean and the Cressie and Read statistics.
Maria Jose Ginzo Villamayor [aut, cre]
Maintainer: Maria Jose Ginzo Villamayor <[email protected]>
Buckland, S.T., Studeny, A.C., Magurran, A.E., Illian, J.B., & Newson, S.E. (2011). The geometric mean of relative abundance indices: a biodiversity measure with a difference. Ecosphere, 2(9), art.100. <https://doi.org/10.1890/ES11-00186.1>
Cressie, N. and Read, T.R.C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3), 440–464. <http://www.jstor.org/stable/2345686>
Sheldon, A. L. (1969). Equitability indices: dependence on the species count. Ecology, 50, 466–467. <https://doi.org/10.2307/1933900>
Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688. <https://doi.org/10.1038/163688a0>
Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews. <https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/3414/AngelikaStudenyPhDThesis.pdf?sequence=3&isAllowed=y>
van Strien, A.J., Soldaat, L.L., & Gregory, R.D. (2012). Desirable mathematical properties of indicators for biodiversity change. Ecological Indicators, 14, 202–208. <https://doi.org/10.1016/j.ecolind.2011.07.007>
See also: fCressieRead, fGeneralisedMean, fGeometricMean, fHeip, fHill, fIsonymy, fIsonymyAll, fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf.
This function obtains the Cressie and Read statistics introduced by Noel Cressie and Timothy Read. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fCressieRead(x, number, population, ni, location, lambda)
x | dataframe of the data values. |
number | name of a variable which represents the number of individuals of each species. |
population | name of a variable which represents the total number of individuals. |
ni | name of a variable which represents the number of species. |
location | name of a variable which represents the grouping element. |
lambda | free parameter. |
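For a community $i$, Cressie and Read (1984) introduced the following parametric form for a generalised statistic:
$$CR_{\lambda} = \frac{2}{\lambda(\lambda + 1)} \sum_{k=1}^{S_i} N_{ki} \left[ \left( \frac{N_{ki}}{N_i / S_i} \right)^{\lambda} - 1 \right],$$
where $N_{ki}$ represents the number of individuals of species $k$ in a sample (as opposed to the whole population), $N_i$ is the total number of individuals, $S_i$ represents all species at the community (species richness), and $\lambda$ is a free parameter.
Varying the value of $\lambda$ gives different statistics. For $\lambda = 0$ and $\lambda = -1$, $CR_{\lambda}$ is not defined, but the limits $\lambda \to 0$ and $\lambda \to -1$ can be taken.
In the onomastic context, $N_{ki}$ denotes the absolute frequency of surname $k$ in region $i$ (the region plays the role of the community in the diversity context).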
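The statistic can be checked by hand on a small example. The sketch below is not the package's implementation: it assumes the standard power-divergence form with a uniform expectation (total count divided by the number of surnames), and the helper name `cressie_read` and the toy counts are hypothetical. With `lambda = 1` the statistic coincides with Pearson's chi-squared, and as `lambda` approaches 0 it approaches the likelihood-ratio statistic.

```r
# Cressie-Read power-divergence statistic against a uniform expectation.
# counts: absolute frequencies of each surname (species) in one region.
# Assumption: expected count is N / S for every surname.
cressie_read <- function(counts, lambda) {
  stopifnot(lambda != 0, lambda != -1)       # only the limits exist there
  expected <- sum(counts) / length(counts)   # N / S
  2 / (lambda * (lambda + 1)) *
    sum(counts * ((counts / expected)^lambda - 1))
}

counts <- c(10, 20, 30)                      # hypothetical toy data

# lambda = 1 coincides with Pearson's chi-squared statistic
cr_pearson <- cressie_read(counts, lambda = 1)
pearson    <- sum((counts - 20)^2 / 20)

# lambda close to 0 approximates the likelihood-ratio statistic G
cr_small <- cressie_read(counts, lambda = 1e-6)
g_stat   <- 2 * sum(counts * log(counts / 20))

c(cr_pearson = cr_pearson, pearson = pearson, cr_small = cr_small, g_stat = g_stat)
```

The guard on `lambda` mirrors the remark that the statistic is undefined at 0 and -1; a production version would substitute the limiting expressions there.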
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
cressieRead | the value of the Cressie and Read statistic. |
Maria Jose Ginzo Villamayor
Cressie, N. and Read, T.R.C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3), 440–464.
data(surnamesgal14)
result = fCressieRead(x = surnamesgal14, number = "number", population = "population",
                      location = "muni", ni = "ni", lambda = 2)
result
This function obtains the generalised mean of relative abundances for a collection of species, a measure introduced by Angelika C. Studeny. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fGeneralisedMean (x, pki, pki0, s, location, lambda)
x | dataframe of the data values for each species not null (because if you have a sample, there might be species that are not represented). |
pki | name of a variable which represents the relative frequency for each species. |
pki0 | variable which represents the relative frequency for each species not null (because if you have a sample, there might be species that are not represented). |
location | name of a variable which represents the grouping element. |
s | vector which represents total number of species. |
lambda | free parameter. |
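For a community $i$, the generalised mean of relative abundances is defined by
$$M_{\lambda} = \left( \frac{1}{S_i} \sum_{k=1}^{S_i} \left( \frac{N_{kit}}{N_{ki0}} \right)^{\lambda} \right)^{1/\lambda},$$
where $N_{kit}$ denotes the number of individuals of species $k$ at time $t$, $t = 0$ is the baseline year, $S_i$ are all species at the community (species richness), and $\lambda$ can be any non-zero real number.
In the onomastic context, $N_{kit}$ denotes the absolute frequency of surname $k$ in region $i$ (the region plays the role of the community in the diversity context) at time $t$.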
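A small numeric sketch may help fix ideas. It assumes the generalised mean is the power mean of the ratios between current and baseline relative frequencies; the helper `generalised_mean` and the toy frequencies are hypothetical and are not taken from the package.

```r
# Power mean of abundance ratios relative to a baseline.
# p_t: relative frequencies at time t; p_0: relative frequencies at baseline.
generalised_mean <- function(p_t, p_0, lambda) {
  stopifnot(lambda != 0, length(p_t) == length(p_0), all(p_0 > 0))
  mean((p_t / p_0)^lambda)^(1 / lambda)
}

p_0 <- c(0.5, 0.3, 0.2)   # hypothetical baseline frequencies
p_t <- c(0.4, 0.4, 0.2)   # hypothetical frequencies at a later time

gm_arith    <- generalised_mean(p_t, p_0, lambda = 1)    # arithmetic mean of ratios
gm_harm     <- generalised_mean(p_t, p_0, lambda = -1)   # harmonic mean of ratios
gm_baseline <- generalised_mean(p_0, p_0, lambda = 2)    # unchanged frequencies

c(gm_arith = gm_arith, gm_harm = gm_harm, gm_baseline = gm_baseline)
```

Under this definition, frequencies identical to the baseline give a value of 1 for every admissible `lambda`, so departures from 1 indicate change relative to the baseline.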
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
generalisedMean | the value of the generalised mean. |
Maria Jose Ginzo Villamayor
Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews.
See also: fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeometricMean, fHeip.
library(sqldf)

data(surnamesgal14)
loc <- length(unique(surnamesgal14$muni))
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fGeneralisedMean(x = surnamesgal14[surnamesgal14$number != 0, ], pki = "pki",
                          pki0 = surnamesgal14[surnamesgal14$number != 0, "pki"],
                          location = "muni", s = apes2$ni[1:loc], lambda = 1)
result

data(namesmengal16)
loc <- length(unique(namesmengal16$muni))
namesmengal16$pki <- (namesmengal16$number / namesmengal16$population)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fGeneralisedMean(x = namesmengal16[namesmengal16$number != 0, ], pki = "pki",
                          pki0 = namesmengal16[namesmengal16$number != 0, "pki"],
                          location = "muni", s = names2$ni[1:loc], lambda = 1)
result

data(nameswomengal16)
loc <- length(unique(nameswomengal16$muni))
nameswomengal16$pki <- (nameswomengal16$number / nameswomengal16$population)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fGeneralisedMean(x = nameswomengal16[nameswomengal16$number != 0, ], pki = "pki",
                          pki0 = nameswomengal16[nameswomengal16$number != 0, "pki"],
                          location = "muni", s = names2$ni[1:loc], lambda = 1)
result
This function obtains the geometric mean introduced by Stephen Terrence Buckland and coauthors. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fGeometricMean(x, pki, pki0, s, location)
x | dataframe of the data values for each species not null (because if you have a sample, there might be species that are not represented). |
pki | name of a variable which represents the relative frequency for each species. |
pki0 | name of a variable which represents the relative frequency for each species at initial time point. |
s | vector which represents total number of species. |
location | represents the grouping element. |
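For a community $i$, the geometric mean of relative abundances is defined by
$$G = \exp\left( \frac{1}{S_i} \sum_{k=1}^{S_i} \log \frac{N_{kit}}{N_{ki0}} \right),$$
where $N_{kit}$ denotes the number of individuals of species $k$ at time $t$, $t = 0$ is the baseline year, and $S_i$ are all species at the community (species richness).
In the onomastic context, $N_{kit}$ denotes the absolute frequency of surname $k$ in region $i$ (the region plays the role of the community in the diversity context) at time $t$.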
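As an illustration, the geometric mean of abundance ratios can be computed on the log scale. This is a minimal sketch under the assumption that the index is the exponential of the average log-ratio between current and baseline relative frequencies; `geometric_mean` and the toy vectors are hypothetical names and data.

```r
# Geometric mean of abundance ratios relative to a baseline.
# Zero frequencies are rejected because log(0) is not finite, which is
# consistent with restricting the input to species that are represented.
geometric_mean <- function(p_t, p_0) {
  stopifnot(length(p_t) == length(p_0), all(p_t > 0), all(p_0 > 0))
  exp(mean(log(p_t / p_0)))
}

p_0 <- c(0.4, 0.2, 0.4)   # hypothetical baseline frequencies
p_t <- c(0.2, 0.4, 0.4)   # first surname halves, second doubles

gm_balanced <- geometric_mean(p_t, p_0)   # halving and doubling cancel out
gm_decline  <- geometric_mean(c(0.2, 0.1, 0.4), p_0)

c(gm_balanced = gm_balanced, gm_decline = gm_decline)
```

Because the mean is taken on the log scale, a halving and a doubling offset each other exactly, which is the property emphasised by Buckland and coauthors for this kind of index.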
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
geometricMean | the value of the geometric mean. |
Maria Jose Ginzo Villamayor
Buckland, S.T., Studeny, A.C., Magurran, A.E., Illian, J.B., & Newson, S.E. (2011). The geometric mean of relative abundance indices: a biodiversity measure with a difference. Ecosphere, 2(9), art.100.
Studeny, A.C. (2012). Quantifying Biodiversity Trends in Time and Space. PhD thesis, University of St Andrews.
van Strien, A.J., Soldaat, L.L., & Gregory, R.D. (2012). Desirable mathematical properties of indicators for biodiversity change. Ecological Indicators, 14, 202–208.
See also: fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fHeip.
library(sqldf)

data(surnamesgal14)
loc <- length(unique(surnamesgal14$muni))
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
surnamesgal14$pki0 <- surnamesgal14$pki
result = fGeometricMean(x = surnamesgal14[surnamesgal14$number != 0, ], pki = "pki",
                        pki0 = "pki0", location = "muni", s = apes2$ni[1:loc])
result

data(namesmengal16)
loc <- length(unique(namesmengal16$muni))
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
namesmengal16$pki <- (namesmengal16$number / namesmengal16$population)
namesmengal16$pki0 <- namesmengal16$pki
result = fGeometricMean(x = namesmengal16[namesmengal16$number != 0, ], pki = "pki",
                        pki0 = "pki0", location = "muni", s = names2$ni[1:loc])
result

data(nameswomengal16)
loc <- length(unique(nameswomengal16$muni))
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
nameswomengal16$pki <- (nameswomengal16$number / nameswomengal16$population)
nameswomengal16$pki0 <- nameswomengal16$pki
result = fGeometricMean(x = nameswomengal16[nameswomengal16$number != 0, ], pki = "pki",
                        pki0 = "pki0", location = "muni", s = names2$ni[1:loc])
result
This function obtains Heip's diversity index, introduced by Carlo H. R. Heip. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fHeip (x, k, n, location, s)
x | dataframe of the data values for each species not null (because if you have a sample, there might be species that are not represented). |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | represents the grouping element. |
s | vector which represents total number of species. |
For a community $i$, Heip's diversity index is defined by
$$E_{Heip} = \frac{e^{H} - 1}{S_i - 1},$$
where $H$ is the Shannon diversity index and $S_i$ are all species at the community (species richness). This index varies from 0 to 1 and measures how equally the species richness contributes to the total abundance of the community.
In the onomastic context, $S_i$ are all surnames in region $i$ (the region plays the role of the community in the diversity context).
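The behaviour at the two extremes is easy to verify numerically. The sketch below assumes Heip's index is computed from the Shannon entropy with natural logarithms; `heip_index` and the toy counts are hypothetical and do not reproduce the package code.

```r
# Heip's evenness index from a vector of absolute frequencies.
# Assumption: Shannon entropy uses the natural logarithm, so exp(H)
# ranges from 1 (one dominant surname) to S (perfectly even).
heip_index <- function(counts) {
  counts <- counts[counts > 0]      # drop surnames that are not represented
  S <- length(counts)
  stopifnot(S > 1)                  # the index is undefined for S = 1
  p <- counts / sum(counts)
  H <- -sum(p * log(p))
  (exp(H) - 1) / (S - 1)
}

heip_even   <- heip_index(c(25, 25, 25, 25))   # all surnames equally common
heip_uneven <- heip_index(c(97, 1, 1, 1))      # one surname dominates

c(heip_even = heip_even, heip_uneven = heip_uneven)
```

A perfectly even region reaches the upper bound of 1, while strong dominance by a single surname pushes the index towards 0.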
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
heip | the value of Heip's diversity index. |
Maria Jose Ginzo Villamayor
Heip, C. (1974). A New Index Measuring Evenness. Journal of the Marine Biological Association of the United Kingdom, 54(3), 555–557.
See also: fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean.
library(sqldf)

data(surnamesgal14)
loc <- length(unique(surnamesgal14$muni))
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fHeip(x = surnamesgal14[surnamesgal14$number != 0, ], k = "number",
               n = "population", location = "muni", s = apes2$ni[1:loc])
result

data(namesmengal16)
loc <- length(unique(namesmengal16$muni))
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fHeip(x = namesmengal16[namesmengal16$number != 0, ], k = "number",
               n = "population", location = "muni", s = names2$ni[1:loc])
result

data(nameswomengal16)
loc <- length(unique(nameswomengal16$muni))
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fHeip(x = nameswomengal16[nameswomengal16$number != 0, ], k = "number",
               n = "population", location = "muni", s = names2$ni[1:loc])
result
This function obtains Hill's diversity numbers, introduced by M. O. Hill. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fHill(x, k, n, location, lambda)
x | dataframe of the data values for each species. |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | represents the grouping element. |
lambda | free parameter. |
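For a community $i$, Hill's diversity numbers are defined by the expression
$$D_{\lambda} = \left( \sum_{k=1}^{S_i} p_{ki}^{\lambda} \right)^{1/(1-\lambda)}$$
with the restriction $\lambda \geq 0$, $\lambda \neq 1$, where $p_{ki}$ represents the relative frequency of species $k$, $S_i$ are all species at the community (species richness), and $\lambda$ is a free parameter. (This is equivalent to the exponential of Rényi's generalised entropy.) The Rényi entropy of order $\lambda$, where $\lambda \geq 0$ and $\lambda \neq 1$, is defined as
$$H_{\lambda}(X) = \frac{1}{1-\lambda} \log \left( \sum_{k=1}^{n} p_k^{\lambda} \right).$$
Here, $X$ is a discrete random variable with possible outcomes in the set $\{1, \ldots, n\}$ and corresponding probabilities $p_k = P(X = k)$ for $k = 1, \ldots, n$. The logarithm is conventionally taken to be base 2, especially in the context of information theory where bits are used. If the probabilities are $p_k = 1/n$ for all $k = 1, \ldots, n$, then all the Rényi entropies of the distribution are equal: $H_{\lambda}(X) = \log n$. In general, for all discrete random variables $X$, $H_{\lambda}(X)$ is a non-increasing function of $\lambda$.
Particular cases of $\lambda$:
$\lambda = 0$ corresponds to species richness;
$\lambda = 1$ (taken as a limit) corresponds to the exponential of Shannon's entropy; and
$\lambda = 2$ corresponds to the 'inverse' Simpson index.
In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ (the region plays the role of the community in the diversity context) and $S_i$ are all surnames in region $i$.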
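The three particular cases can be reproduced on a toy vector of counts. The following sketch assumes the usual definition of Hill numbers as a power sum of relative frequencies, with the order-1 case handled as the limit (the exponential of Shannon's entropy with natural logarithms); `hill_number` and the counts are hypothetical.

```r
# Hill number of order lambda from a vector of absolute frequencies.
hill_number <- function(counts, lambda) {
  stopifnot(lambda >= 0)
  p <- counts[counts > 0] / sum(counts)
  if (isTRUE(all.equal(lambda, 1))) {
    exp(-sum(p * log(p)))                 # limit as lambda -> 1
  } else {
    sum(p^lambda)^(1 / (1 - lambda))
  }
}

counts <- c(50, 30, 15, 5)                # hypothetical surname counts
p <- counts / sum(counts)

hill0 <- hill_number(counts, 0)           # richness: number of surnames
hill1 <- hill_number(counts, 1)           # exponential of Shannon's entropy
hill2 <- hill_number(counts, 2)           # inverse Simpson index, 1 / sum(p^2)

c(hill0 = hill0, hill1 = hill1, hill2 = hill2)
```

The values decrease as the order grows, reflecting that higher orders give more weight to the most frequent surnames.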
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
hill | the value of Hill's diversity number. |
Maria Jose Ginzo Villamayor
Hill, M. O. (1973). Diversity and Evenness: a unifying notation and its consequences. Ecology, 54, 427–432.
data(surnamesgal14)
result = fHill(x = surnamesgal14, k = "number", n = "population", location = "muni", lambda = 0)
result

data(namesmengal16)
result = fHill(x = namesmengal16, k = "number", n = "population", location = "muni", lambda = 0)
result

data(nameswomengal16)
result = fHill(x = nameswomengal16, k = "number", n = "population", location = "muni", lambda = 0)
result
This function obtains the isonymy within a region, which has an associated collection $S_i$ of surnames.
fIsonymy(x, category)
x | a vector of squared relative frequencies, one for each surname. |
category | represents the grouping element, for example the regions. |
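Isonymy is defined as $I_{ii} = \sum_{k \in S_i} p_{ki}^2$, where $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$.
In the diversity context, $p_{ki}$ denotes the relative frequency of species $k$ in community $i$ (the community plays the role of the region in the onomastic context) and $S_i$ are all species in community $i$.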
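Since isonymy is a sum of squared relative frequencies within each region, it can be reproduced with base R on a small table. The data frame below is invented for illustration; only the column names `muni`, `surname`, `number` and `population` follow the examples in this manual.

```r
# Toy surname counts for two hypothetical regions A and B.
toy <- data.frame(
  muni    = c("A", "A", "A", "B", "B"),
  surname = c("Lopez", "Garcia", "Perez", "Lopez", "Garcia"),
  number  = c(50, 30, 20, 10, 90)
)

# Total population per region, repeated on every row of that region.
toy$population <- ave(toy$number, toy$muni, FUN = sum)

# Squared relative frequency of each surname within its region.
toy$pki2 <- (toy$number / toy$population)^2

# Isonymy: sum of the squared relative frequencies per region.
isonymy <- aggregate(pki2 ~ muni, data = toy, FUN = sum)
isonymy
```

Region B, where one surname accounts for 90% of individuals, has a much higher isonymy than region A, where the three surnames are more evenly spread.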
A dataframe containing the following components:
category | represents the grouping element, for example the regions / communities. |
x | the value of isonymy. |
Maria Jose Ginzo Villamayor
Crow J.F. and Mange A.P., (1965). Measurement of inbreeding from the frequency of marriages between persons of the same surname. Eugenics Quarterly, 12(4), 199–203.
Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez–Larralde, A., (1996). Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Annals of Human Biology, 23, 431–455.
data(surnamesgal14)
surnamesgal14$pki2 <- (surnamesgal14$number / surnamesgal14$population)^2
result = fIsonymy(surnamesgal14$pki2, surnamesgal14$namuni)
result

data(namesmengal16)
namesmengal16$pki2 <- (namesmengal16$number / namesmengal16$population)^2
result = fIsonymy(namesmengal16$pki2, namesmengal16$namuni)
result

data(nameswomengal16)
nameswomengal16$pki2 <- (nameswomengal16$number / nameswomengal16$population)^2
result = fIsonymy(nameswomengal16$pki2, nameswomengal16$namuni)
result
This function obtains the isonymy, the isonymy between regions, the Lasker distance, the Euclidean distance, Nei's distance and Hedrick's coefficient.
fIsonymyAll (x, n, location, union, measure)
x | data frame with the data. |
n | number of the locations in the data frame. |
location | name of a variable which represents the location in the data. |
union | variable to be used to search for matching surnames in two locations. |
measure | name of a variable which represents the relative frequency for each surname. |
Values of the isonymy, the isonymy between regions, the Lasker distance, the Euclidean distance, Nei's distance and Hedrick's coefficient.
Surname (dis)similarity among regions can be quantified by different measures. Consider an index $i$ denoting a certain geographical region (for two regions, $i$ and $j$). Each region has an associated collection $S_i$ of surnames, and for a pair of regions, the collection of all the surnames in them is denoted by $S_{ij}$. The total number of surnames in a certain region $i$ is denoted by $n_i$. Surnames will be denoted by indices $k$ and $l$.
Isonymy is defined as $I_{ii} = \sum_{k \in S_i} p_{ki}^2$, where $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$. Isonymy can also be extended as a measure of population similarity between groups. Under the assumption of a common origin, isonymy between two regions $i$ and $j$ is defined as $I_{ij} = \sum_{k \in S_{ij}} p_{ki} p_{kj}$.
Other measures of the isonymic distance between a pair of locations can be derived from the isonymy between regions. For instance, the Lasker distance is given by $L_{ij} = -\log(I_{ij})$.
The Lasker distance can be interpreted as an inverse measure of similarity between two areas, where larger distances indicate less similarity in surname composition. Nevertheless, the Lasker distance is not the only option to quantify surname similarity. Other common coefficients are the Euclidean distance and Nei's distance, given respectively by
$$E_{ij} = \sqrt{1 - \sum_{k \in S_{ij}} \sqrt{p_{ki} p_{kj}}} \quad \text{and} \quad N_{ij} = -\log\left( \frac{I_{ij}}{\sqrt{I_{ii} I_{jj}}} \right).$$
Finally, Hedrick's coefficient gives a standardized measure of isonymy using a procedure similar to that utilized in the calculation of a correlation coefficient. Specifically:
$$H_{ij} = \frac{2 I_{ij}}{I_{ii} + I_{jj}}.$$
In the diversity context, $p_{ki}$ denotes the relative frequency of species $k$ in community $i$ (the community plays the role of the region in the onomastic context) and $S_i$ are all species in community $i$.
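For a single pair of regions, all of these measures follow from the two vectors of relative frequencies. The sketch below uses the definitions commonly found in the isonymy literature (Lasker distance as the negative log of the isonymy between regions, Nei's distance as the negative log of the normalised isonymy, Hedrick's coefficient as twice the isonymy between regions over the sum of the within-region isonymies); the helper `isonymy_measures` and the frequencies are hypothetical, and the package may differ in details such as the logarithm base.

```r
# Pairwise isonymy measures from relative surname frequencies.
# p_i, p_j: frequencies over the union of surnames of both regions
# (a surname absent from a region has frequency 0).
isonymy_measures <- function(p_i, p_j) {
  stopifnot(length(p_i) == length(p_j))
  I_ii <- sum(p_i^2)                      # isonymy within region i
  I_jj <- sum(p_j^2)                      # isonymy within region j
  I_ij <- sum(p_i * p_j)                  # isonymy between regions
  c(
    isonymy_btw = I_ij,
    lasker      = -log(I_ij),
    nei         = -log(I_ij / sqrt(I_ii * I_jj)),
    hedrick     = 2 * I_ij / (I_ii + I_jj),
    # max(0, .) guards against tiny negative values from rounding
    euclidean   = sqrt(max(0, 1 - sum(sqrt(p_i * p_j))))
  )
}

p_i <- c(Lopez = 0.5, Garcia = 0.3, Perez = 0.2)   # hypothetical region i
p_j <- c(Lopez = 0.1, Garcia = 0.9, Perez = 0.0)   # hypothetical region j

different <- isonymy_measures(p_i, p_j)
same      <- isonymy_measures(p_i, p_i)             # a region against itself
round(rbind(different, same), 4)
```

Comparing a region with itself gives a Hedrick coefficient of 1 and Nei and Euclidean distances of 0, which is a quick sanity check for any implementation.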
A list containing the following components:
isonymy | data frame with two columns and as many rows as there are regions / communities. |
isonymy.btw | the value of isonymy between regions. Matrix. |
hedrick | the value of Hedrick's coefficient. Matrix. |
nei | the value of Nei's distance. Matrix. |
lasker | the value of the Lasker distance. Matrix. |
distE | the value of the Euclidean distance. Matrix. |
Maria Jose Ginzo Villamayor
Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez–Larralde, A., (1996) Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Annals of Human Biology, 23, 431–455.
Cavalli-Sforza, L. L., and Edwards, A. W. F., (1967), Phylogenetic analysis models and estimation procedures. American Journal of Human Genetics, 19, 233–257.
Hedrick, P. W. (1971), A new approach to measuring genetic similarity. Evolution, 25: 276–280.
Lasker, G. W. (1977) A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Human Biology, 49, 489–493.
Mikerezi, I., Shina, E. Scapoli, C., Barbujani, G. Mamolini, E., Sandri, M., Carrieri, A., Rodriguez–Larralde, A. and Barrai, I. (2013). Surnames in Albania: a study of the population of Albania through isonymy. Annals of Human Genetics, 77, 232–243.
Nei, M.(1973). The theory and estimation of genetic distance. In Genetic Structure of Populations, edited by N. E. Morton, (Honolulu: University Press of Hawaii), 45–54.
Weiss, V. 1980. Inbreeding and genetic distance between hierarchically structured populations measured by surname frequencies. Mankind Quarterly, 21, 135–149.
data(surnamesgal14)
result = fIsonymyAll(x = surnamesgal14, n = 314, location = 'muni', union = 'surname', measure = 'pki')
result

data(namesmengal16)
namesmengal16$pki <- (namesmengal16$number / namesmengal16$population)
result = fIsonymyAll(x = namesmengal16, n = 313, location = 'muni', union = 'name', measure = 'pki')
result

data(nameswomengal16)
nameswomengal16$pki <- (nameswomengal16$number / nameswomengal16$population)
result = fIsonymyAll(x = nameswomengal16, n = 313, location = 'muni', union = 'name', measure = 'pki')
result
This function obtains Margalef's diversity index, a species diversity index developed by Ramon Margalef Lopez during the 1950s. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fMargalef(x, s, n, location)
x | dataframe which contains the number of species and population for each location. |
s | name of a variable which represents number of species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
For a community $i$, Margalef's diversity index is defined by
$$D_{Mg} = \frac{S_i - 1}{\ln N_i},$$
where $S_i$ represents the number of species (richness) and $N_i$ represents the total number of individuals in all $S_i$ species.
In the onomastic context, $N_i$ denotes the number of individuals in region $i$ (the region plays the role of the community in the diversity context) and $S_i$ represents the total number of surnames.
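The index needs only two numbers per region: the count of distinct surnames and the population. A minimal sketch, assuming the usual form of Margalef's index (richness minus one, divided by the natural logarithm of the number of individuals); the helper `margalef` and the toy table are hypothetical, with column names `ni` and `population` borrowed from the examples in this manual.

```r
# Margalef's richness index: (S - 1) / ln(N).
margalef <- function(s, n) {
  stopifnot(all(n > 1))     # ln(N) must be positive
  (s - 1) / log(n)
}

# Hypothetical per-region summary: number of surnames and population.
toy <- data.frame(
  muni       = c("A", "B", "C"),
  ni         = c(120, 45, 1),
  population = c(5000, 300, 80)
)
toy$margalef <- margalef(toy$ni, toy$population)
toy
```

A region with a single surname scores 0, and for a fixed population the index grows linearly with the number of surnames.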
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
margalef | the value of Margalef's diversity index. |
Maria Jose Ginzo Villamayor
Margalef D.R., (1958), Information theory in ecology. International Journal of General Systems, 3, 36–71.
See also: fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(sqldf)

data(surnamesgal14)
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fMargalef(x = apes2, s = "ni", n = "population", location = "muni")
result

data(namesmengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fMargalef(x = names2, s = "ni", n = "population", location = "muni")
result

data(nameswomengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fMargalef(x = names2, s = "ni", n = "population", location = "muni")
result
This function obtains Menhinick's diversity index, introduced by Edward F. Menhinick. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fMenhinick(x, s, n, location)
x | dataframe which contains the number of species and population for each location. |
s | name of a variable which represents number of species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
For a community $i$, Menhinick's diversity index is defined by $D_{Mn_i} = \frac{S_i}{\sqrt{N_i}}$, where $S_i$ represents the number of species (richness) and $N_i$ represents the total number of individuals across all $S_i$ species.
In the onomastic context, $N_i$ denotes the number of individuals in region $i$ (the counterpart of a community in the diversity context) and $S_i$ represents the total number of surnames.
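As a minimal sketch of the definition, the index can be computed by hand in base R as richness divided by the square root of the total count. The data frame `toy` and its values are hypothetical and are not taken from the package datasets; this is not the package's internal implementation.

```r
# Hypothetical toy data (not from the package): one row per region.
toy <- data.frame(
  muni       = c("A", "B", "C"),
  ni         = c(25, 25, 10),        # number of distinct surnames per region
  population = c(10000, 400, 100)    # total number of individuals per region
)

# Menhinick's index: richness divided by the square root of the total count.
toy$menhinick <- toy$ni / sqrt(toy$population)
print(toy)
```

Regions A and B have the same richness, but the smaller population of B gives it the larger index value.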
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
menhinick | the value of Menhinick's diversity index. |
Maria Jose Ginzo Villamayor
Menhinick E.F. (1964) A comparison of some species-individuals diversity indices applied to samples of field insects. Ecology, 45, 859–861.
fMargalef, fPielou, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)
library(sqldf)

# Surnames: count the surnames (ni) and sum their frequencies per municipality
data(surnamesgal14)
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fMenhinick(x = apes2, s = "ni", n = "population", location = "muni")
result

# Men's names
data(namesmengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fMenhinick(x = names2, s = "ni", n = "population", location = "muni")
result

# Women's names
data(nameswomengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fMenhinick(x = names2, s = "ni", n = "population", location = "muni")
result
This function computes Pielou's diversity index, introduced by Evelyn Chrystalla Pielou, which measures diversity along with species richness. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fPielou(x, k, n, location, s)
x | dataframe of the data values for each species with a non-zero count (in a sample, some species may not be represented). |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
s | vector which represents number of species. |
For a community $i$, Pielou's diversity index is defined by $J'_i = \frac{H'_i}{H'_{\max,i}}$, where $H'_i$ denotes the Shannon-Wiener index and $H'_{\max,i}$ denotes the maximum diversity, $H'_{\max,i} = \log S_i$ (with the logarithm taken in the same base as in $H'_i$).
Pielou's index is the Shannon-Wiener index computed for the sample $S_i$, normalised by its maximum, and represents a measure of evenness of the community. If all species are represented in equal numbers in the sample, then $J'_i = 1$. If one species strongly dominates, $J'_i$ is close to zero.
In the onomastic context, $S_i$ are all surnames in region $i$ (the counterpart of a community in the diversity context).
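The two limiting cases described above can be checked with a short base R sketch. The helper `pielou()` is hypothetical (it is not the package's `fPielou`) and uses the natural logarithm; because the index is a ratio of two logarithmic quantities, the choice of base cancels out.

```r
# Hypothetical helper: Pielou's evenness from a vector of absolute frequencies.
pielou <- function(counts) {
  counts <- counts[counts > 0]      # species with zero count are dropped
  p <- counts / sum(counts)         # relative frequencies
  h <- -sum(p * log(p))             # Shannon index (natural log)
  h / log(length(counts))           # divide by the maximum diversity log(S)
}

even   <- pielou(c(50, 50, 50, 50))  # all species equally represented
skewed <- pielou(c(197, 1, 1, 1))    # one species strongly dominates
print(c(even = even, skewed = skewed))
```

With a single species the denominator is `log(1) = 0`, so this sketch returns `NaN` in that case.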
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
pielou | the value of Pielou's diversity index. |
Maria Jose Ginzo Villamayor
Pielou, E. C. (1966) The measurement of diversity in different types of biological collections. Journal of Theoretical Biology, 13, 131-144.
fMargalef, fMenhinick, fShannon, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)
library(sqldf)

# Surnames: ni (number of surnames per municipality) is passed as s;
# rows with a zero count are dropped before calling the function
data(surnamesgal14)
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fPielou(x = surnamesgal14[surnamesgal14$number != 0, ], k = "number", n = "population", location = "muni", s = apes2$ni)
result

# Men's names
data(namesmengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fPielou(x = namesmengal16[namesmengal16$number != 0, ], k = "number", n = "population", location = "muni", s = names2$ni)
result

# Women's names
data(nameswomengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fPielou(x = nameswomengal16[nameswomengal16$number != 0, ], k = "number", n = "population", location = "muni", s = names2$ni)
result
This function obtains the Shannon-Weaver diversity index introduced by Claude Elwood Shannon. This diversity measure came from information theory and measures the order (or disorder) observed within a particular system. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fShannon(x, k, n, location)
x | dataframe of the data values for each species with a non-zero count (in a sample, some species may not be represented). |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
For a community $i$, the Shannon-Weaver index is defined by the expression $H'_i = -\sum_{k \in S_i} p_{ki} \log p_{ki}$, where $p_{ki}$ represents the relative frequency of species $k$, that is, $p_{ki} = \frac{N_{ki}}{N_i}$ (where $N_{ki}$ denotes the number of individuals of species $k$ and $N_i$ the total number of individuals in all $S_i$ species of the community, $S_i$ being the species richness). This index is related to the weighted geometric mean of the proportional abundances of the types.
In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ (the counterpart of a community in the diversity context) and $S_i$ are all surnames in region $i$.
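The following base R sketch illustrates the definition on hypothetical toy data and checks the stated link to the weighted geometric mean of the proportions. Two assumptions are made here that the source does not confirm: the natural logarithm is used, and relative frequencies are normalised by the sum of the listed counts rather than by a separate population column. The helper `shannon()` is illustrative and is not the package's `fShannon`.

```r
# Hypothetical toy data: surname counts in two regions.
toy <- data.frame(
  muni    = c("A", "A", "A", "A", "B", "B"),
  surname = c("s1", "s2", "s3", "s4", "s1", "s2"),
  number  = c(25, 25, 25, 25, 90, 10)
)

# Hypothetical helper: Shannon index from absolute frequencies (natural log).
shannon <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log(p))
}

# One index value per region.
h <- sapply(split(toy$number, toy$muni), shannon)
print(h)

# Link to the weighted geometric mean of the proportions in region B:
# exp(-H) equals the product of p_k raised to the power p_k.
p_b <- c(0.9, 0.1)
geo_mean_b <- prod(p_b^p_b)
print(c(exp(-h[["B"]]), geo_mean_b))
```

Region A, where the four surnames are equally frequent, reaches the maximum `log(4)`; region B, dominated by one surname, scores lower.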
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
shannon | the value of the Shannon-Weaver diversity index. |
Maria Jose Ginzo Villamayor
Shannon C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
Shannon C.E., Weaver W. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press. USA, 96. pp. 117.
fMargalef, fMenhinick, fPielou, fSheldon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)

# Surnames (rows with a zero count are dropped before calling the function)
data(surnamesgal14)
result = fShannon(x = surnamesgal14[surnamesgal14$number != 0, ], k = "number", n = "population", location = "muni")
result

# Men's names
data(namesmengal16)
result = fShannon(x = namesmengal16[namesmengal16$number != 0, ], k = "number", n = "population", location = "muni")
result

# Women's names
data(nameswomengal16)
result = fShannon(x = nameswomengal16[nameswomengal16$number != 0, ], k = "number", n = "population", location = "muni")
result
This function obtains the Sheldon's diversity index introduced by A. L. Sheldon. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fSheldon (x, k, n, location, s)
x | dataframe of the data values for each species with a non-zero count (in a sample, some species may not be represented). |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
s | vector which represents number of species. |
For a community $i$, Sheldon's diversity index is defined by $E_{Sh_i} = \frac{\exp(H'_i)}{S_i}$, where $H'_i$ denotes the Shannon-Wiener index and $S_i$ represents the number of species (richness).
In the onomastic context, $S_i$ are all surnames in region $i$ (the counterpart of a community in the diversity context).
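A minimal base R sketch of this definition follows, assuming the Shannon index is computed with the natural logarithm so that it pairs with `exp()`. The helper `sheldon()` and the count vectors are hypothetical and do not reproduce the package's `fSheldon`.

```r
# Hypothetical helper: Sheldon's index from a vector of absolute frequencies.
sheldon <- function(counts) {
  counts <- counts[counts > 0]
  p <- counts / sum(counts)
  h <- -sum(p * log(p))        # Shannon index with the natural logarithm
  exp(h) / length(counts)      # exp(H') divided by the species richness
}

even   <- sheldon(c(25, 25, 25, 25))  # equally frequent surnames
skewed <- sheldon(c(90, 10))          # one dominant surname
print(c(even = even, skewed = skewed))
```

The index reaches 1 when all surnames are equally frequent and falls below 1 as one surname comes to dominate.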
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
sheldon | the value of Sheldon's diversity index. |
Maria Jose Ginzo Villamayor
Sheldon, A. L. (1969). Equitability indices: dependence on the species count. Ecology, 50, 466–467.
fMargalef, fMenhinick, fPielou, fShannon, fSimpson, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)
library(sqldf)

# Surnames: ni (number of surnames per municipality) is passed as s;
# rows with a zero count are dropped before calling the function
data(surnamesgal14)
apes2 = sqldf('select muni, count(surname) as ni, sum(number) as population from surnamesgal14 group by muni;')
result = fSheldon(x = surnamesgal14[surnamesgal14$number != 0, ], k = "number", n = "population", location = "muni", s = apes2$ni)
result

# Men's names
data(namesmengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from namesmengal16 group by muni;')
result = fSheldon(x = namesmengal16[namesmengal16$number != 0, ], k = "number", n = "population", location = "muni", s = names2$ni)
result

# Women's names
data(nameswomengal16)
names2 = sqldf('select muni, count(name) as ni, sum(number) as population from nameswomengal16 group by muni;')
result = fSheldon(x = nameswomengal16[nameswomengal16$number != 0, ], k = "number", n = "population", location = "muni", s = names2$ni)
result
This function computes Simpson's diversity index and its inverse. The index, introduced by Edward Hugh Simpson, was the first diversity index used in ecology. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fSimpson(x, k, n, location)
x | dataframe of the data values for each species. |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
For a community $i$, Simpson's diversity index is defined by $D_{S_i} = \sum_{k \in S_i} p_{ki}^2$, where $p_{ki}$ represents the relative frequency of species $k$, that is, $p_{ki} = \frac{N_{ki}}{N_i}$ (where $N_{ki}$ denotes the number of individuals of species $k$ and $N_i$ the total number of individuals in all $S_i$ species of the community, $S_i$ being the species richness). The Simpson index tends to be smaller when the community is more diverse.
In the onomastic context, $p_{ki}$ denotes the relative frequency of surname $k$ in region $i$ (the counterpart of a community in the diversity context), i.e., Simpson's diversity index is equivalent to the concept of isonymy.
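The behaviour described above (a smaller value for a more diverse community) can be seen in a short base R sketch. The helper `simpson()` and the counts are hypothetical. The inverse is taken here as the reciprocal of the index, which is an assumption and may differ from how the package defines its `divSimpson` column.

```r
# Hypothetical helper: Simpson's index as the sum of squared relative frequencies.
simpson <- function(counts) {
  p <- counts / sum(counts)
  sum(p^2)
}

diverse   <- simpson(c(25, 25, 25, 25))  # four equally frequent surnames
dominated <- simpson(c(90, 10))          # one dominant surname

# Reciprocal form (assumed reading of "inverse"): larger means more diverse.
inv_diverse   <- 1 / diverse
inv_dominated <- 1 / dominated
print(c(diverse = diverse, dominated = dominated))
print(c(inv_diverse = inv_diverse, inv_dominated = inv_dominated))
```

The index can be read as the probability that two individuals drawn at random with replacement share a surname, which is why it lines up with isonymy.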
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
simpson | the value of Simpson's diversity index. |
divSimpson | the value of the inverse Simpson's diversity index. |
Maria Jose Ginzo Villamayor
Simpson (1949) Measurement of diversity. Nature, 163.
fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpsonInf, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)

# Surnames
data(surnamesgal14)
result = fSimpson(x = surnamesgal14, k = "number", n = "population", location = "muni")
result

# Men's names
data(namesmengal16)
result = fSimpson(x = namesmengal16, k = "number", n = "population", location = "muni")
result

# Women's names
data(nameswomengal16)
result = fSimpson(x = nameswomengal16, k = "number", n = "population", location = "muni")
result
This function computes the sample version of Simpson's diversity index (introduced by Edward Hugh Simpson), used when the population is not finite and the data are assumed to come from a sample. It is a method for quantifying species biodiversity that can be adapted to the context of onomastics.
fSimpsonInf(x, k, n, location)
x | dataframe of the data values for each species. |
k | name of a variable which represents absolute frequency for each species. |
n | name of a variable which represents total number of individuals. |
location | name of a variable which represents the grouping element. |
For a community $i$, the Simpson diversity index (when $N_i$ is not finite, data are assumed to come from a sample of size $n_i$) is defined by $\hat{D}_{S_i} = \sum_{k \in S_i} \frac{n_{ki}(n_{ki} - 1)}{n_i(n_i - 1)}$, where $n_{ki}$ represents the number of individuals of species $k$ in a sample (in the total it is $N_{ki}$) and $S_i$ represents all species of the community, the species richness.
In the onomastic context, $n_{ki}$ ($\approx N_{ki}$) denotes the absolute frequency of surname $k$ in region $i$ and $S_i$ are all surnames in region $i$ (the counterpart of a community in the diversity context).
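As a sketch of the sample form of the index, assuming the standard without-replacement estimator based on $n_{ki}(n_{ki} - 1)$, the following base R code compares it with the sum of squared proportions. The helper `simpson_sample()` and the counts are hypothetical and do not reproduce the package's `fSimpsonInf`.

```r
# Hypothetical helper: sample-based Simpson index (pairs drawn without replacement).
simpson_sample <- function(counts) {
  n <- sum(counts)
  sum(counts * (counts - 1)) / (n * (n - 1))
}

small <- simpson_sample(c(3, 2))        # tiny sample: correction is noticeable
large <- simpson_sample(c(3000, 2000))  # same proportions, large sample
plain <- sum(c(0.6, 0.4)^2)             # sum of squared proportions
print(c(small = small, large = large, plain = plain))
```

For the same proportions, the sample-based value approaches the sum of squared proportions as the sample size grows.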
A dataframe containing the following components:
location | represents the grouping element, for example the communities / regions. |
simpson | the value of Simpson's diversity index. |
Maria Jose Ginzo Villamayor
Simpson (1949) Measurement of diversity. Nature, 163.
fMargalef, fMenhinick, fPielou, fShannon, fSheldon, fSimpson, fGeneralisedMean, fGeometricMean, fHeip.
library(OnomasticDiversity)

# Surnames
data(surnamesgal14)
result = fSimpsonInf(x = surnamesgal14, k = "number", n = "population", location = "muni")
result

# Men's names
data(namesmengal16)
result = fSimpsonInf(x = namesmengal16, k = "number", n = "population", location = "muni")
result

# Women's names
data(nameswomengal16)
result = fSimpsonInf(x = nameswomengal16, k = "number", n = "population", location = "muni")
result
This dataset contains the 25 most frequent men's names by municipality in Galicia in 2016.
data(namesmengal16)
namesmengal16 is a data frame with men's names from Galicia in 2016.
The data correspond to the 25 most frequent men's names by municipality in Galicia in 2016.
The dataset contains 6 columns: prov, the province; muni, the municipality; namuni, the name of the municipality; name, the name; number, the number of people with that name; and population, the total population considered by municipality.
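The examples in this manual aggregate these columns per municipality with an sqldf query (`count(name) as ni, sum(number) as population`). As a sketch of what that aggregation produces, the same table can be built in base R. The data frame `toy` below is hypothetical; it only mimics the documented column names and its values are made up.

```r
# Hypothetical rows mimicking the documented columns (values are made up).
toy <- data.frame(
  muni       = c("m1", "m1", "m1", "m2", "m2"),
  name       = c("a", "b", "c", "a", "d"),
  number     = c(30, 20, 10, 25, 5),
  population = c(100, 100, 100, 40, 40)
)

# Per municipality: how many names are listed, and the sum of their counts.
ni    <- tapply(toy$name, toy$muni, length)
total <- tapply(toy$number, toy$muni, sum)

names2 <- data.frame(
  muni       = names(ni),
  ni         = as.vector(ni),
  population = as.vector(total)   # sum of listed counts, as in the query
)
print(names2)
```

Note that the aggregated `population` here is the sum of the listed counts, which in this toy data differs from the original `population` column.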
These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE offers information on the surnames and names of the population resident in the Autonomous Community of Galicia. The base information for compiling the data is the 2014 Municipal Register of Inhabitants file that the National Institute of Statistics (INE) provides to the IGE.
Galician Institute of Statistics (IGE), https://www.ige.eu/
data(namesmengal16)
This dataset contains the 25 most frequent women's names by municipality in Galicia in 2016.
data(nameswomengal16)
nameswomengal16 is a data frame with women's names from Galicia in 2016.
The data correspond to the 25 most frequent women's names by municipality in Galicia in 2016.
The dataset contains 6 columns: prov, the province; muni, the municipality; namuni, the name of the municipality; name, the name; number, the number of people with that name; and population, the total population considered by municipality.
These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE offers information on the surnames and names of the population resident in the Autonomous Community of Galicia. The base information for compiling the data is the 2014 Municipal Register of Inhabitants file that the National Institute of Statistics (INE) provides to the IGE.
Galician Institute of Statistics (IGE), https://www.ige.eu/
data(nameswomengal16)
This dataset contains the 25 most frequent surnames by municipality in Galicia in 2014.
data(surnamesgal14)
surnamesgal14 is a data frame with surnames from Galicia in 2014.
The data correspond to the 25 most frequent surnames by municipality in Galicia in 2014.
The dataset contains 8 columns: prov, the province; muni, the municipality; namuni, the name of the municipality; surname, the surname; number, the number of people with that surname; population, the total population considered by municipality; ni, the number of surnames considered; and a further column giving the frequency of surname $k$ in municipality $i$.
These data were extracted from the website of the Galician Institute of Statistics (IGE). The IGE offers information on the surnames and names of the population resident in the Autonomous Community of Galicia. The base information for compiling the data is the 2014 Municipal Register of Inhabitants file that the National Institute of Statistics (INE) provides to the IGE.
Galician Institute of Statistics (IGE), https://www.ige.eu/
data(surnamesgal14)