Please make sure to complete the following preparatory steps before the training workshop:
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("readr")
install.packages("knitr")
install.packages("data.table")
install.packages("Matrix")
# DemoKin is available on [CRAN](https://cran.r-project.org/web/packages/DemoKin/index.html),
# but we'll use the development version on [GitHub](https://github.com/IvanWilli/DemoKin):
install.packages("remotes")
remotes::install_github("IvanWilli/DemoKin")
Now, check if the packages can be loaded:
< > Code
and then
Download ZIP
, as shown below:This workshop includes material from the course Matrix
Approaches to Modelling Kinship: Theory and Applications prepared by
Ivan Williams and adapted by
Amanda
Martins for the European
Doctoral School of Demography and for the ALAP 2024
Conference. The multistate code was adapted from the corresponding
vignette
in DemoKin
contributed by Joe
Butterick and adapted to the case of Singapore by Sha
Jiang. Saroja
Adhikari provided support with the Wittgenstein educational
data.
For a full list of DemoKin
contributors, see https://github.com/IvanWilli/DemoKin.
Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are reviewed in this course.
Load packages that will be useful:
Load a couple of functions we prepared:
We will use data for Singapore from the 2024 Revision of the World Population Prospects, which we pre-processed for this workshop.
#Load data for women
rates_female <- read.csv("data/sgp_female.csv", stringsAsFactors = F) #female rates
pop_female <- read.csv("data/sgp_popfemale.csv", stringsAsFactors = F) #female population (N)
# For data for men
rates_male <- read.csv("data/sgp_male.csv", stringsAsFactors = F) # male rates
pop_male <- read.csv("data/sgp_popmale.csv", stringsAsFactors = F) # male population (N)
Let’s see what data we will be using. The first object is
rates_female
, a long-format data frame with year- and
age-specific fertility rates (‘fx’) and survival probabilities from a
life table (‘px’).
For women:
## year age px fx
## 1 1950 0 0.9149421 0
## 2 1950 1 0.9836904 0
## 3 1950 2 0.9884776 0
## 4 1950 3 0.9918538 0
## 5 1950 4 0.9942263 0
## 6 1950 5 0.9958862 0
Let us visualise some of our demographic data. We start with the probability of dying between ages \(x\) and \(x+1\),\(q_x\) in a life table for women:
rates_female %>%
filter(year %in% seq(1950, 2020, 30)) %>%
mutate(qx = 1-px) %>%
ggplot() +
geom_line(aes(x = age, y = qx, col = as.character(year)), linewidth = 1) +
scale_y_log10() +
labs(y = "Probability of dying, qx", col = NULL) +
theme_bw() +
theme(legend.position = "bottom")
Next, we have age-specific fertility rates for multiple years:
rates_female %>%
filter(year %in% seq(1950, 2020, 30)) %>%
ggplot() +
geom_line(aes(x = age, y = fx, col = as.character(year)), linewidth = 1) +
labs(y = "Age-specific fertility rate, fx", col = NULL) +
theme_bw() +
theme(legend.position = "bottom")
Finally, we have the population counts:
pop_female %>%
mutate(age = 0:100) %>%
pivot_longer(-age, names_to = "year", values_to = "pop") %>%
mutate(year = gsub("X", "", year)) %>%
filter(year %in% seq(1950, 2020, 30)) %>%
ggplot() +
geom_line(aes(x = age, y = pop, col = as.character(year)), linewidth = 1) +
labs(y = "Population alive in thousands", col = NULL) +
theme_bw() +
theme(legend.position = "bottom")
DemoKin
can be used to compute the number and age
distribution of Focal’s relatives under a range of assumptions,
including living and deceased kin. Let’s explore the main functions.
kin()
The function DemoKin::kin()
currently does most of the
heavy lifting in terms of implementing matrix kinship models.
For this example, we will run the simplest model, with the following assumptions:
First, we need to extract the data for women in 2020 and save it as a matrix:
# First, reshape fertility and survival for a given year
asfr_2020 <- rates_female %>%
filter(year == 2020) %>%
select(age, year, fx) %>%
pivot_wider(names_from = year, values_from = fx) %>%
select(-age) %>%
as.matrix()
surv_2020 <- rates_female %>%
filter(year == 2020) %>%
select(age, year, px) %>%
pivot_wider(names_from = year, values_from = px) %>%
select(-age) %>%
as.matrix()
Now we can run the kinship models:
kin()
Relatives for the output_kin
argument are identified by
a unique code. Note that the relationship codes used in
DemoKin
differ from those in Caswell (2019). The equivalence between the two set of
codes is given in the following table:
## DemoKin Caswell Labels_female Labels_male
## 1 coa t Cousins from older aunts Cousins from older uncles
## 2 cya v Cousins from younger aunts Cousins from younger uncles
## 3 c <NA> Cousins Cousins
## 4 d a Daughters Sons
## 5 gd b Grand-daughters Grand-sons
## 6 ggd c Great-grand-daughters Great-grand-sons
## 7 ggm h Great-grandmothers Great-grandfathers
## 8 gm g Grandmothers Grandfathers
## 9 m d Mother Father
## 10 nos p Nieces from older sisters Nephews from older brothers
## 11 nys q Nieces from younger sisters Nephews from younger brothers
## 12 n <NA> Nieces Nephews
## 13 oa r Aunts older than mother Uncles older than fathers
## 14 ya s Aunts younger than mother Uncles younger than father
## 15 a <NA> Aunts Uncles
## 16 os m Older sisters Older brothers
## 17 ys n Younger sisters Younger brothers
## 18 s <NA> Sisters Brothers
## Labels_2sex
## 1 Cousins from older aunts/uncles
## 2 Cousins from younger aunts/uncles
## 3 Cousins
## 4 Children
## 5 Grand-childrens
## 6 Great-grand-childrens
## 7 Great-grandfparents
## 8 Grandparents
## 9 Parents
## 10 Niblings from older siblings
## 11 Niblings from younger siblings
## 12 Niblings
## 13 Aunts/Uncles older than parents
## 14 Aunts/Uncles younger than parents
## 15 Aunts/Uncles
## 16 Older siblings
## 17 Younger siblings
## 18 Siblings
DemoKin::kin()
returns a list containing two data
frames: kin_full
and kin_summary
.
## List of 2
## $ kin_full : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
## ..$ kin : chr [1:142814] "d" "d" "d" "d" ...
## ..$ age_kin : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
## ..$ living : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dead : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cohort : logi [1:142814] NA NA NA NA NA NA ...
## ..$ year : logi [1:142814] NA NA NA NA NA NA ...
## $ kin_summary: tibble [1,414 × 9] (S3: tbl_df/tbl/data.frame)
## ..$ age_focal : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ kin : chr [1:1414] "coa" "cya" "d" "gd" ...
## ..$ year : logi [1:1414] NA NA NA NA NA NA ...
## ..$ count_living : num [1:1414] 0.0773 0.0243 0 0 0 ...
## ..$ mean_age : num [1:1414] 7.84 3.84 NaN NaN NaN ...
## ..$ sd_age : num [1:1414] 5.83 3.57 NaN NaN NaN ...
## ..$ count_dead : num [1:1414] 0.00001534 0.00000902 0 0 0 ...
## ..$ count_cum_dead: num [1:1414] 0.00001534 0.00000902 0 0 0 ...
## ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...
kin_full
This data frame contains expected kin counts by year (or cohort), age of Focal, type of kin and, age of kin, including living and dead kin at that age.
## # A tibble: 6 × 7
## kin age_kin age_focal living dead cohort year
## <chr> <int> <int> <dbl> <dbl> <lgl> <lgl>
## 1 d 0 0 0 0 NA NA
## 2 d 0 1 0 0 NA NA
## 3 d 0 2 0 0 NA NA
## 4 d 0 3 0 0 NA NA
## 5 d 0 4 0 0 NA NA
## 6 d 0 5 0 0 NA NA
kin_summary
This is a ‘summary’ data frame derived from kin_full
. To
produce it, we sum over all ages of kin to produce a data frame of
expected kin counts by year or cohort and age of Focal (but not
by age of kin).
## # A tibble: 6 × 9
## age_focal kin year count_living mean_age sd_age count_dead count_cum_dead
## <int> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 coa NA 0.0773 7.84 5.83 0.0000153 0.0000153
## 2 0 cya NA 0.0243 3.84 3.57 0.00000902 0.00000902
## 3 0 d NA 0 NaN NaN 0 0
## 4 0 gd NA 0 NaN NaN 0 0
## 5 0 ggd NA 0 NaN NaN 0 0
## 6 0 ggm NA 0.226 88.5 6.07 0.0257 0.0257
## # ℹ 1 more variable: mean_age_lost <dbl>
We can visualize the kinship structure we just computed using a
network or ‘Keyfitz’ kinship diagram (Keyfitz and
Caswell 2005) with the plot_diagram
function. Let’s
see the expected number of living kin for a 65 yo woman in Singapore
according to our model:
kin_2020$kin_summary %>%
filter(age_focal == 65) %>%
select(kin, count = count_living) %>%
plot_diagram(rounding = 2)
Let’s run the model again with the same parameters, selecting
specific kin types (note the output_kin
argument:
kin_2020 <-
kin(
p = surv_2020
, f = asfr_2020
, output_kin = c("c", "d", "gd", "ggd", "gm", "m", "n", "a", "s")
, time_invariant = TRUE
)
Now, let’s visualize how the expected number of daughters, siblings,
cousins, etc., changes over the life course of Focal (now, with full
names to identify each relative type using the function
DemoKin::rename_kin()
).
kin_2020$kin_summary %>%
rename_kin() %>%
ggplot() +
geom_line(aes(age_focal, count_living)) +
theme_bw() +
labs(x = "Age of focal", y= "Number of living female relatives") +
facet_wrap(~kin_label)
Note that we are working in a time invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
How does overall family size (and family composition) vary over life for an average woman who survives to each age?
counts <-
kin_2020$kin_summary %>%
group_by(age_focal) %>%
summarise(count_living = sum(count_living)) %>%
ungroup()
kin_2020$kin_summary %>%
select(age_focal, kin, count_living) %>%
rename_kin() %>%
ggplot(aes(x = age_focal, y = count_living)) +
geom_area(aes(fill = kin_label), colour = "black") +
geom_line(data = counts, size = 2) +
labs(x = "Age of focal",
y = "Number of living female relatives",
fill = "Kin") +
theme_bw() +
theme(legend.position = "bottom")
How old are Focal’s relatives? Using the kin_full
data
frame, we can visualize the age distribution of Focal’s relatives
throughout Focal’s life. For example when Focal is 65 (the red vertical
line in the plot), what are the ages of her relatives:
kin_2020$kin_full %>%
rename_kin() %>%
filter(age_focal == 65) %>%
ggplot(aes(age_kin, living)) +
geom_line() +
geom_vline(xintercept = 65, colour = "red") +
labs(x = "Age of kin",
y = "Expected number of living relatives") +
theme_bw() +
facet_wrap(~kin_label)
Human males generally live shorter and reproduce later than females.
These sex-specific processes affect kinship dynamics in a number of
ways. For example, the degree to which an average member of the
population, call her Focal, has a living grandparent is affected by
differential mortality affecting the parental generation at older ages.
We may also be interested in considering how kinship structures vary by
Focal’s sex: a male Focal may have a different number of grandchildren
than a female Focal given differences in fertility by sex. Documenting
these differences matters since women often face greater expectations to
provide support and informal care to relatives. As they live longer,
they may find themselves at greater risk of being having no living kin.
The function kin2sex
implements two-sex kinship models as
introduced by Caswell (2022).
Load packages:
Load male and female data (restricting it to data in the 1950-2023 period):
#Load data for women
rates_female <- read.csv("data/sgp_female.csv", stringsAsFactors = F) #female rates
pop_female <- read.csv("data/sgp_popfemale.csv", stringsAsFactors = F) #female population (N)
# For data for men
rates_male <- read.csv("data/sgp_male.csv", stringsAsFactors = F) # male rates
pop_male <- read.csv("data/sgp_popmale.csv", stringsAsFactors = F) # male population (N)
Since we’re not running projections (yet), we’ll keep only data in the 1950-2023 period:
rates_female <- rates_female %>% filter(year <= 2023)
pop_female <- pop_female[ , 1:(2024-1950)]
rates_male <- rates_male %>% filter(year <= 2023)
pop_male <- pop_male[ , 1:(2024-1950)]
Note that for men there is no fx
(fertility) column
because the WPP does not provide it:
## year age px
## 1 1950 0 0.8989109
## 2 1950 1 0.9800395
## 3 1950 2 0.9872590
## 4 1950 3 0.9917858
## 5 1950 4 0.9945839
## 6 1950 5 0.9962990
Before running the model, we need to reshape the data to have matrices as input.
# Female fertility
asfr_females <-
rates_female %>%
select(age, year, fx) %>%
pivot_wider(names_from = year, values_from = fx) %>%
select(-age) %>%
as.matrix()
# Female survival
surv_females <-
rates_female %>%
select(age, year, px) %>%
pivot_wider(names_from = year, values_from = px) %>%
select(-age) %>%
as.matrix()
# Male survival
surv_males <-
rates_male %>%
select(age, year, px) %>%
pivot_wider(names_from = year, values_from = px) %>%
select(-age) %>%
as.matrix()
The data now has years in columns and ages in rows. For example, male survival probabilities:
## 1950 1951 1952 1953 1954 1955
## [1,] 0.8989109 0.9081987 0.9168968 0.9248786 0.9321124 0.9383029
## [2,] 0.9800395 0.9836895 0.9867753 0.9891777 0.9908956 0.9919618
## [3,] 0.9872590 0.9892537 0.9910078 0.9924139 0.9935382 0.9944831
## [4,] 0.9917858 0.9928559 0.9938370 0.9946455 0.9953902 0.9961763
## [5,] 0.9945839 0.9951549 0.9957015 0.9961628 0.9966740 0.9972937
## [6,] 0.9962990 0.9966080 0.9969170 0.9971827 0.9975558 0.9980201
## [7,] 0.9973456 0.9975199 0.9977021 0.9978603 0.9981553 0.9984851
## [8,] 0.9979803 0.9980872 0.9982033 0.9983044 0.9985583 0.9987756
## [9,] 0.9983634 0.9984369 0.9985192 0.9985906 0.9988243 0.9989514
## [10,] 0.9985861 0.9986440 0.9987099 0.9987666 0.9989925 0.9990482
Be patient: this can take around 5-10 minutes to run…
kin_out <-
kin2sex(
# Survival probabilities of women
pf = surv_females
# Survival probabilities of men
, pm = surv_males
# Fertility rates for women
, ff = asfr_females
# Fertility rates for men (in this example, we use female rates)
, fm = asfr_females
# Female population counts
, nf = pop_female
# Male population counts
, nm = pop_male
# If FALSE, rates vary over time
, time_invariant = FALSE
# Which kin you want estimates for (see demokin_codes)
, output_kin = c("c", "d", "gd", "ggd", "ggm", "gm", "m", "n", "a", "s")
# The sex of the Focal or anchor invididual
, sex_focal = "f"
)
Our model assumes that:
The value returned by kin2sex
is equivalent to the one
returned by kin
, with a couple new columns:
sex_kin
that distinguishes between male and female
kinyear
: period (calendar) yearcohort
: birth cohort of Focal## # A tibble: 6 × 11
## age_focal kin sex_kin year cohort count_living mean_age sd_age count_dead
## <int> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0 a f 1950 1950 5.03 24.8 10.7 0.0169
## 2 0 a f 1951 1951 5.02 25.7 10.8 0.0169
## 3 0 a f 1952 1952 5.02 25.6 10.8 0.0166
## 4 0 a f 1953 1953 5.02 25.5 10.8 0.0158
## 5 0 a f 1954 1954 5.02 25.5 10.8 0.0151
## 6 0 a f 1955 1955 5.03 25.4 10.8 0.0139
## # ℹ 2 more variables: count_cum_dead <dbl>, mean_age_lost <dbl>
A note on terminology: The
kin2sex
function uses the same codes askin
to identify relatives (seedemokin_codes()
for reference).
Please note that when running a two-sex model, the code “m” refers to either mothers or fathers!
Use thesex_kin
column to filter by the sex of a specific relative.
For example, to consider only sons and ignore daughters, use:
## # A tibble: 6 × 11
## age_focal kin sex_kin year cohort count_living mean_age sd_age count_dead
## <int> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0 d m 1950 1950 0 NaN NaN 0
## 2 0 d m 1951 1951 0 NaN NaN 0
## 3 0 d m 1952 1952 0 NaN NaN 0
## 4 0 d m 1953 1953 0 NaN NaN 0
## 5 0 d m 1954 1954 0 NaN NaN 0
## 6 0 d m 1955 1955 0 NaN NaN 0
## # ℹ 2 more variables: count_cum_dead <dbl>, mean_age_lost <dbl>
The two sex model provides kinship estimates by age, period, and birth cohort (of Focal) (Caswell 2019):
Let’s take a look at the resulting kin counts for a Focal born in 1960. First, we plot total family size over the life course of Focal for all kin and irrespective of the sex of kin:
kin_out$kin_summary %>%
filter(cohort == 1960) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count_living), .by = c(kin_label, age_focal)) %>%
ggplot(aes(age_focal, count, fill=kin_label)) +
geom_area(colour = "black") +
labs(y = "Expected number of living kin by Focal's age",
x = "Age of a Focal born 1960") +
theme_bw() +
theme(legend.position = "bottom")
Now, we see the distribution of kin by sex of kin for a subset of kin:
kin_out$kin_summary %>%
filter(cohort == 1960) %>%
filter(kin%in% c("c", "d", "gd", "ggd", "gm", "m", "n", "a", "s")) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count_living), .by = c(kin_label, age_focal, sex_kin)) %>%
ggplot(aes(age_focal, count, fill=sex_kin)) +
geom_area() +
labs(y = "Expected number of living kin by sex and Focal's age",
x = "Age of a Focal born 1960",
fill = "Sex of Kin") +
facet_wrap(~kin_label, scales = "free_y") +
theme_bw() +
theme(legend.position = "bottom")
We now focus on the age distribution of kin. Note that we use the
object kin_full
for this:
kin_out$kin_full %>%
filter(age_focal == 60, cohort == 1960) %>%
filter(kin%in% c("d", "gd", "m", "s")) %>%
rename_kin(sex = "2sex") %>%
ggplot(aes(x = age_kin, y = living, colour = sex_kin)) +
geom_line() +
theme_bw() +
labs(y = "Expected number of living kin for a 65 yo Focal born in 1960",
x = "Age of kin",
colour = "Sex of Kin") +
facet_wrap(.~kin_label)
Let us take a snapshot
of the kin distribution in 2023.
We can plot total family size over the life course of Focal for all kin
and irrespective of the sex of kin:
kin_out$kin_summary %>%
filter(year == 2023) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count_living), .by = c(kin_label, age_focal)) %>%
ggplot(aes(age_focal, count, fill=kin_label)) +
geom_area(colour = "black") +
labs(y = "Expected number of living kin by sex and Focal's age",
x = "Age of Focal") +
theme_bw() +
theme(legend.position = "bottom")
Next, let’s plot the number of kin over the life of Focal including the sex of kin for a selected number of kin:
kin_out$kin_summary %>%
filter(year == 2023) %>%
filter(kin%in% c("c", "d", "gd", "ggd", "gm", "m", "n", "a", "s")) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count_living), .by = c(kin_label, age_focal, sex_kin)) %>%
ggplot(aes(age_focal, count, fill=sex_kin)) +
geom_area() +
labs(y = "Expected number of living kin by sex and Focal's age",
x = "Age of Focal",
fill = "Sex of Kin") +
facet_wrap(~kin_label, scales = "free_y") +
theme_bw() +
theme(legend.position = "bottom")
Finally, we see the age distribution of kin for a 65 yo woman in
calendar year 2023. Note that we use the object kin_full
for this:
kin_out$kin_full %>%
filter(age_focal == 65, year == 2023) %>%
filter(kin%in% c("d", "gd", "m", "s")) %>%
rename_kin(sex = "2sex") %>%
ggplot(aes(x = age_kin, y = living, colour = sex_kin)) +
geom_line() +
theme_bw() +
labs(y = "Expected number of living kin for a 65 yo Focal born in 1960",
x = "Age of kin",
colour = "Sex of Kin") +
facet_wrap(.~kin_label)
Load packages:
DemoKin
Since Caswell’s (Caswell 2019) one-sex
time-invariant age-structured matrix model of kinship, the framework has
expanded to incorporate time-varying vital rates (2021), two-sex of kin and focal (2022) and multi-state kin populations (2020).
Here, we provide an R
functionkin_multi_stage_time_variant_2sex
, which combines
the three aforementioned models and provide a two-sex time-varying
multi-state model.
This function computes stage-specific kinship networks across both sexes for an average member of a population (focal) under time-varying demographic rates. It estimates the number, age, sex, and stage distribution of Focal’s relatives, for each age of Focal’s life, and as a function of the year in which Focal is born.
In this example we use education as an example stage. Singaporean data ranging from 2020 - 2090 is sourced from the Wittgenstein Center. The data is aggregated into 5-year age group and 5-year time interval.
In order to implement the model, the function
kin_multi_stage_time_variant_2sex
expects the following 7
inputs of vital rates, fed in as lists:
U_list_females
A list of female age-and-education
specific survival probabilities over the timescale (in matrix forms).
This input list has length = the timescale, and each entry represents
the rates of a specific period in matrix form: stage columns, age
rows.
U_list_males
A list in the same format as
U_list_females
, but for males.
F_list_females
A list of female age-and-education
specific fertility rates over the timescale (in matrix forms). This
input list has length = the timescale, and each entry represents the
rates of a specific period in matrix form: stage columns, age
rows.
F_list_males
A list in the same format as
F_list_females
, but for males.
T_list_females
A list of lists of female
age-specific probabilities of moving up education over the timescale (in
matrix forms). The outer list has length = the timescale. The inner list
has length = number of ages. Each outer list entry is comprised of a
list of matrices (stage*stage dimensional), each matrix describes
age-specific probabilities of moving stage. Thus for each year, we have
a list of age-specific probabilities of moving from one stage to the
next.
T_list_males
A list in the same format as
T_list_females
, but for males.
H_list
A list of length = timescale, in which each
element is a matrix which assigns the offspring of individuals in some
stage to the appropriate age class (age in rows and states in columns).
This list of matrices applies to both male and female
offspring.
We pre-processed the data for this workshop:
We feed the above inputs into the matrix model, along with other arguments:
this run takes some time (ca. 5 min)
# Run kinship model for a female Focal over a timescale of no_years
time_range <- seq(2020,2090,5)
no_years <- length(time_range)-1
# For 5-y age and year intervals
output_year_select <- seq(1, no_years+1, 1)
# since we use 5-year time frame, we are projecting from 2020 to 2020+5*14 = 2090
# We decide here to count accumulated kin by age of Focal, and not distributions of kin
kin_out_2020_2090 <-
kin_multi_stage_time_variant_2sex(U_list_females = U_mat_fem_edu[1:(1+no_years)],
U_list_males = U_mat_male_edu[1:(1+no_years)],
F_list_females = F_mat_fem_edu[1:(1+no_years)],
F_list_males = F_mat_male_edu[1:(1+no_years)],
T_list_females = T_mat_fem_edu[1:(1+no_years)],
T_list_males = T_mat_fem_edu[1:(1+no_years)],
H_list = H_mat_edu[1:(1+no_years)],
birth_female = 1/(1.06+1), ## Sex ratio at birth -- Singapore value (1.06)
parity = FALSE,
summary_kin = TRUE,
sex_Focal = "Female", ## define Focal's sex at birth
initial_stage_Focal = 1, ## Define Focal's stage at birth
output_years = output_year_select # it seems tricky to set up output years when the time interval is not 1
# now I select every 10 year for output
)
## 81.69 sec elapsed
Now we have to recode year and age related variables to show the correct years and educational labels:
# also need to change the age_focal and year variables accordingly
kin_out_2020_2090$kin_summary <-
kin_out_2020_2090$kin_summary %>%
mutate(year = (year-1)*5+min(time_range),
age_focal = age_focal*5,
cohort = year - age_focal,
stage_kin = factor(stage_kin, levels = c(1, 2, 3, 4, 5, 6),
labels = c(
"no education",
"incomplete primary",
"primary",
"lower secondary",
"upper secondary",
"post-secondary"
)))
kin_out_2020_2090$kin_full <- kin_out_2020_2090$kin_full%>%
mutate(year = (year-1)*5+min(time_range),
age_focal = age_focal*5,
age_kin = age_kin*5,
cohort = year - age_focal,
stage_kin = factor(stage_kin, levels = c(1, 2, 3, 4, 5, 6),
labels = c(
"no education",
"incomplete primary",
"primary",
"lower secondary",
"upper secondary",
"post-secondary"
)))
Notice the structure of the output data kin_summary
. We
have columns age_focal
and kin_stage
because
we sum over all ages of kin, and produce the marginal stage distribution
given age of Focal. We have a column corresponding to sex of kin
sex_kin
, a column showing which year
we are
considering, and a column headed group
which selects the
kin type. Finally, we have columns showing Focal’s cohort of birth
cohort
(e.g., year - age of Focal).
Let’s take a look at kin_summary
by selecting 10 rows,
from row 1000 to row 1010.
## # A tibble: 11 × 8
## age_focal stage_kin count sex_kin year group cohort cohort_factor
## <dbl> <fct> <dbl> <chr> <dbl> <chr> <dbl> <fct>
## 1 100 incomplete primary 1.72e-1 Male 2020 ggd 1920 -19
## 2 100 primary 1.05e-1 Female 2020 ggd 1920 -19
## 3 100 primary 1.11e-1 Male 2020 ggd 1920 -19
## 4 100 lower secondary 3.71e-2 Female 2020 ggd 1920 -19
## 5 100 lower secondary 3.93e-2 Male 2020 ggd 1920 -19
## 6 100 upper secondary 1.14e-2 Female 2020 ggd 1920 -19
## 7 100 upper secondary 1.20e-2 Male 2020 ggd 1920 -19
## 8 100 post-secondary 1.14e-4 Female 2020 ggd 1920 -19
## 9 100 post-secondary 1.21e-4 Male 2020 ggd 1920 -19
## 10 0 no education 0 Female 2020 m 2020 1
## 11 0 no education 0 Male 2020 m 2020 1
We start by showing the number of living kin for an average woman born in 2020 in Singapore by level of educational attainment of her kin:
kin_out_2020_2090$kin_summary %>%
# Multistate models also include an (educational) state for the Focal individual
# which is not of interest here
filter(group != "Focal") %>%
filter(cohort == 2020) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count), .by = c(stage_kin, age_focal)) %>%
ggplot(aes(x = age_focal, y = count, fill=stage_kin)) +
geom_area(colour = "black") +
labs(y = "Expected number of living kin",
x = "Age of a woman born in 2020") +
theme_bw() +
theme(legend.position = "bottom")
Show the same values, but separately for selected kin:
kin_out_2020_2090$kin_summary %>%
filter(group %in% c("d", "gd", "m", "gm")) %>%
filter(cohort == 2020) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count), .by = c(kin_label, stage_kin, age_focal)) %>%
ggplot(aes(x = age_focal, y = count, fill=stage_kin)) +
geom_area(colour = "black") +
labs(y = "Expected number of living kin",
x = "Age of a woman born in 2020") +
facet_wrap(. ~ kin_label) +
theme_bw() +
theme(legend.position = "bottom")
Next, we see the age distribution of kin (by education of kin) for a 60 yo woman born in the year 2020:
kin_out_2020_2090$kin_full %>%
dplyr::filter(
group %in% c("d", "gd", "m", "gm")
, age_focal == 60
, cohort == 2020
) %>%
rename(kin = group) %>%
rename_kin("2sex") %>%
ggplot2::ggplot(ggplot2::aes(x = age_kin, y = count, color = stage_kin, fill = stage_kin)) +
ggplot2::geom_bar(position = "stack", stat = "identity") +
ggplot2::scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
labs(x ="Age of kin for a 65 yo woman born in 2020") +
ggplot2::facet_wrap(. ~ kin_label) +
ggplot2::theme_bw() +
ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)
, legend.position = "bottom"
)
Let’s see the expected number of living children and grandchildren for a 60 yo woman over time by the level of educational attainment of the children and grandchildren:
kin_out_2020_2090$kin_summary %>%
filter(
group %in% c("d", "gd")
, age_focal == 60
) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count), .by = c(stage_kin, kin_label, year)) %>%
ggplot(aes(x = year, y = count, fill=stage_kin)) +
geom_area(colour = "black") +
labs(y = "Expected number of descendants a for woman aged 60",
x = "Year") +
facet_grid( . ~ kin_label) +
theme_bw() +
theme(legend.position = "bottom")
And, finally, the level of education of the parents and grandparents of a newborn woman over time:
kin_out_2020_2090$kin_summary %>%
filter(
group %in% c("m", "gm")
, age_focal == 0
) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count=sum(count), .by = c(stage_kin, kin_label, year)) %>%
ggplot(aes(x = year, y = count, fill=stage_kin)) +
geom_area(colour = "black") +
labs(y = "Expected number of descendants a for newborn woman (aged 0)",
x = "Year") +
facet_grid( . ~ kin_label) +
theme_bw() +
theme(legend.position = "bottom")
We note that there are following simplifying assumptions due to data limitations:
Here, we show how to download fertility, mortality, and population data for other countries and years.
First, we will load the download_wpp24()
function, which
automates the process of downloading these datasets. Note that this
process may take a couple of minutes.
These packages will be used:
Load functions to download and process:
Run the function to download data. In this case, we’re downloading data for Guatemala, but you can chose any country you like. To select the country of interest, refer to the country names available here.
NOTE: This function will take a really long time to run since it downloads a bunch of large files, so make sure to have good and stable internet connection. The good thing is you only need to download the files once!
# Download the data
# Use include_projections = T to include data after 2023
download_wpp24(include_projections = T)
Once the data is downloaded, we can filter the information of interest—for example, by country. To do this, we will load two functions:
UNWPP_data
: This function filters mortality data
(px, qx, mx) and fertility data (fx) by the country of interest and the
desired time period.
UNWPP_pop
: This function filters population size (N)
data.
Now, we read the data and reformat the data from the csv files we downloaded in the earlier step.
## Example output for DemoKin using female rates
# Select country, year and sex to obtain px and fx
fert_female <-
UNWPP_data(country = "Guatemala",
start_year = 1950,
end_year = 2100,
sex = c("Female"),
indicators = c("fx"), #indicators of interest
output_format = "csv", #format you want to save (csv or rdata)
output_file = "Guatemala")
mort_both <-
UNWPP_data(country = "Guatemala",
start_year = 1950,
end_year = 2100,
sex = c("Male", "Female"),
indicators = c("px"), #indicators of interest
output_format = "csv", #format you want to save (csv or rdata)
output_file = "Guatemala")
pop_female <-
UNWPP_pop(country = "Guatemala",
start_year = 1950,
end_year = 2100,
sex = "Female")
pop_male <-
UNWPP_pop(country = "Guatemala",
start_year = 1950,
end_year = 2100,
sex = "Male")
# Join rate objects by sex
rates_female <-
left_join(fert_female, mort_both) %>%
filter(Sex == "Female")
rates_male <-
mort_both %>%
filter(Sex == "Male")
Finally, you can save this data to reuse it in your own project:
write.csv(rates_female, "data/gtm_female.csv", row.names = F)
write.csv(rates_male, "data/gtm_male.csv", row.names = F)
write.csv(pop_female, "data/gtm_popfemale.csv", row.names = F)
write.csv(pop_male, "data/gtm_popmale.csv", row.names = F)
Now we have all the input data needed to run the 2-sex time-variant model (see “1. Setup” in Part 2). We can simply repeat the analysis in Part 2 (starting from “2. Run kinship model”) and get the kin estimates for Guatemala.