Easy hacks for plotting population pyramids in R

Following the explosion of data visualisations in the context of the COVID-19 pandemic, I thought I’d share some easy and flexible code to plot population pyramids in R.

For this exercise, we’ll create a bogus dataset with population totals by age and sex, plus a column giving each group’s share of the total population.

pop <- data.frame(
  sex = sort(rep(c("Female", "Male"), 6))
  , age = c("0-9", "10-19", "20-29", "30-39", "40-49", "50+", 
            "0-9", "10-19", "20-29", "30-39", "40-49", "50+")
  , pop = c(256L, 335L, 278L, 155L, 103L, 88L, 266L, 317L, 286L, 145L, 118L, 87L)
  , frac = c(0.11, 0.14, 0.11, 0.06, 0.04, 0.04, 0.11, 0.13, 0.12, 
            0.06, 0.05, 0.04)
)

and explore the data:

head(pop)
##      sex   age pop frac
## 1 Female   0-9 256 0.11
## 2 Female 10-19 335 0.14
## 3 Female 20-29 278 0.11
## 4 Female 30-39 155 0.06
## 5 Female 40-49 103 0.04
## 6 Female   50+  88 0.04
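The frac values above were typed in by hand. If you are starting from your own counts, a minimal sketch like the one below could derive the shares instead. The name pop_check is just a placeholder so that the snippet runs on its own; the counts are the same as in the dataset above.

```r
# Same counts as in the dataset above, rebuilt so the snippet is self-contained.
# 'pop_check' is a hypothetical name, not part of the original code.
pop_check <- data.frame(
  sex = rep(c("Female", "Male"), each = 6),
  age = rep(c("0-9", "10-19", "20-29", "30-39", "40-49", "50+"), times = 2),
  pop = c(256L, 335L, 278L, 155L, 103L, 88L, 266L, 317L, 286L, 145L, 118L, 87L)
)

# Share of each age-sex group in the total population, rounded to two decimals
pop_check$frac <- round(pop_check$pop / sum(pop_check$pop), 2)

head(pop_check)
```

Rounding to two decimals reproduces the hand-typed frac column.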

We’ll need three packages from the tidyverse family to plot the pyramid (we’re using ggplot2 3.3, which can draw horizontal bars directly, so please make sure you have at least that version installed!):

library(dplyr)
library(ggplot2)
library(scales)

and a simple function to determine the position of the labels next to the bars:

nudge_fun <- function(df){
  ifelse(df$sex == "Female", (sd(df$pop)/3)*-1, sd(df$pop)/3)
}
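To get a feel for what nudge_fun() returns, here is a quick check on a made-up two-row data frame (toy is hypothetical, not part of the dataset above). The offset is one third of the standard deviation of the pop column, negative for females and positive for males, so each label should end up just beyond the tip of its bar.

```r
# Helper repeated from above so the snippet runs on its own
nudge_fun <- function(df){
  ifelse(df$sex == "Female", (sd(df$pop)/3)*-1, sd(df$pop)/3)
}

# 'toy' is a hypothetical example with one row per sex
toy <- data.frame(sex = c("Female", "Male"), pop = c(100, 200))

# One offset per row: same magnitude, opposite signs
offsets <- nudge_fun(toy)
offsets
```

Dividing by 3 is just a convenient scaling; a smaller divisor pushes the labels further from the bars.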

Now comes the code for plotting. I’ve added comments throughout. Try adjusting it to fit your purposes!


pop %>%
  # First, we transform the columns so that female values show on the
  # left-hand side of the plot, in this case as 'negative values'.
  # I also round the percentages for convenience.
  mutate(
    pop = ifelse(sex=="Female", pop*(-1), pop*1)
    , frac = ifelse(sex=="Female", frac*(-1), frac*1)
    , share = paste0(abs(round(frac*100,1)), "%")
  ) %>% 
  # This starts the actual plotting, first we define which columns 
  # have the data that we want to use
  ggplot(aes(x = pop, y=age, label = share)) +
  # Now we add a layer to the plot with the bars of the pyramid
  geom_col(aes(fill=sex)) +
  # Add the labels indicating the percentages. Note that nudge_fun(pop)
  # is evaluated on the original 'pop' data frame, not on the piped data
  geom_text(aes(label = share),
            position = position_nudge(x = nudge_fun(pop)),
            size = 4
  ) +
  # Custom colours for plotting, you can change these
  scale_fill_manual("", values = c("#990099", "#009900")) +
  # Now we make sure that all values on the horizontal axis are shown as positive
  scale_x_continuous(
    "", breaks = scales::pretty_breaks(n = 6),
    # Small function to format the x-axis labels (e.g. 2000 becomes "2k")
    labels =  function(br) ifelse(abs(br)>=1000,paste0(abs(br)/1000, "k"), abs(br))
  ) +
  # Here you can add your own captions and axis titles
  labs(x = "", y = "", caption = "Your caption here: by @d_alburez") +
  theme_bw() +
  theme(
    legend.position = 'bottom'
    ,axis.title.x=element_blank()
  )

[Figure: population pyramid produced by the code above]
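The labels function passed to scale_x_continuous() can also be tried out on its own. In this sketch I’ve given it a name, lab_fun, purely for illustration (it is anonymous in the plot code), and the break values are made up:

```r
# Same logic as the anonymous function in the plot code;
# 'lab_fun' and 'breaks' are hypothetical names for this illustration.
lab_fun <- function(br) ifelse(abs(br) >= 1000, paste0(abs(br)/1000, "k"), abs(br))

# Made-up axis breaks, including negative ones (the female side)
breaks <- c(-2000, -500, 0, 500, 2000)

# Signs are dropped, and values of 1000 or more are abbreviated with "k"
labs_out <- lab_fun(breaks)
labs_out
```

With the bogus data in this post no bar reaches 1000, so the axis labels are simply the absolute values.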

# You can export this graph to your current working directory:
## NOT RUN
# ggsave(filename = "pyramid.pdf")
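If you want more control over the exported file, ggsave() also accepts the plot object and explicit dimensions. The sketch below uses a tiny stand-in plot and a temporary file so that it runs on its own; both are assumptions for illustration, and in practice you would pass the pyramid itself and a file name such as "pyramid.pdf".

```r
library(ggplot2)

# Hypothetical stand-in plot so the snippet is self-contained;
# replace 'p' with the pyramid built above.
p <- ggplot(data.frame(x = c(-2, 3), y = c("a", "b")), aes(x = x, y = y)) +
  geom_col()

# Temporary file for this example; use e.g. "pyramid.pdf" to write
# to the current working directory instead.
out_file <- tempfile(fileext = ".pdf")

# Explicit size in centimetres (the values are arbitrary)
ggsave(filename = out_file, plot = p, width = 16, height = 12, units = "cm")
```

Passing the plot explicitly avoids relying on whichever plot happened to be displayed last.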