9 Огноотой ажиллах нь
R дээр огноог ашиглах нь бусад объектын төрлүүдтэй ажиллахаас илүү анхаарал шаарддаг. Доор бид энэ үйл явцыг арай хялбар болгохын тулд зарим хэрэгсэл, жишээг санал болгож байна. Аз болоход lubridate гэх мэт багцуудын тусламжтайгаар, практик дадлагын үр дүнд огноог хялбархан янзалж чаддаг болно.
Боловсруулаагүй өгөгдлийг R руу импортлох үед огноог ихэвчлэн тэмдэгт (character) обьект хэмээн таньж оруулдаг бөгөөд энэ обьектыг цагийн цуврал хийх, хугацааны интервалыг тооцоолох гэх мэт огнооны ерөнхий үйлдлүүдэд ашиглах боломжгүй байдаг. Дээрээс нь огноог форматлах олон арга байдаг тул та огнооны аль хэсэг нь юу болохыг (сар, өдөр, цаг гэх мэт) танихад нь R-д туслах шаардлагатай болдог нь огноотой ажиллах процессыг улам төвөгтэй болгодог.
R дахь огноо нь өөрсдийн гэсэн объектын ангилал- Date
ангилал. Энд огноо, цаг хоёуланг нь агуулсан объектуудыг хадгалдаг тусгай ангилал бас байдаг гэдгийг тэмдэглэх нь зүйтэй. Огноо, цагийн объектуудыг албан ёсоор POSIXt
, POSIXct
ба/эсвэл POSIXlt
ангилал гэж нэрлэдэг (ялгаа нь чухал биш). Эдгээр объектыг албан бусаар datetime ангилал гэдэг.
Багана нь огноог агуулсан бол R-д тухайн баганыг огноо гэдгийг таниулах нь чухал.
Огноо бол объектын ангилал бөгөөд түүнтэй ажиллахад төвөгтэй байдаг.
-
Энд бид огноо агуулсан багануудыг
Date
ангилалд хөрвүүлэх хэд хэдэн аргыг танилцуулж байна.
9.1 Бэлтгэл
Багцуудыг ачаалах
This code chunk shows the loading of packages required for this page. In this handbook we emphasize p_load()
from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library()
from base R. See the page on [R basics] for more information on R packages.
# Checks if package is installed, installs if necessary, and loads package for current session
pacman::p_load(
lubridate, # general package for handling and converting dates
linelist, # has function to "guess" messy dates
aweek, # another option for converting dates to weeks, and weeks to dates
zoo, # additional date/time functions
tidyverse, # data management and visualization
rio) # data import/export
Import data
We import the dataset of cases from a simulated Ebola epidemic. If you want to download the data to follow along step-by-step, see instruction in the [Download handbook and data] page. We assume the file is in the working directory so no sub-folders are specified in this file path.
linelist <- import("linelist_cleaned.xlsx")
9.2 Current date
You can get the current “system” date or system datetime of your computer by doing the following with base R.
# get the system date - this is a DATE class
Sys.Date()
## [1] "2021-08-31"
# get the system time - this is a DATETIME class
Sys.time()
## [1] "2021-08-31 20:15:36 +08"
With the lubridate package these can also be returned with today()
and now()
, respectively. date()
returns the current date and time with weekday and month names.
9.3 Convert to Date
After importing a dataset into R, date column values may look like “1989/12/30”, “05/06/2014”, or “13 Jan 2020”. In these cases, R is likely still treating these values as Character values. R must be told that these values are dates… and what the format of the date is (which part is Day, which is Month, which is Year, etc).
Once told, R converts these values to class Date. In the background, R will store the dates as numbers (the number of days from its “origin” date 1 Jan 1970). You will not interface with the date number often, but this allows for R to treat dates as continuous variables and to allow special operations such as calculating the distance between dates.
By default, values of class Date in R are displayed as YYYY-MM-DD. Later in this section we will discuss how to change the display of date values.
Below we present two approaches to converting a column from character values to class Date.
TIP: You can check the current class of a column with base R function class()
, like class(linelist$date_onset)
.
base R
as.Date()
is the standard, base R function to convert an object or column to class Date (note capitalization of “D”).
Use of as.Date()
requires that:
- You specify the existing format of the raw character date or the origin date if supplying dates as numbers (see section on Excel dates)
- If used on a character column, all date values must have the same exact format (if this is not the case, try
guess_dates()
from the linelist package)
First, check the class of your column with class()
from base R. If you are unsure or confused about the class of your data (e.g. you see “POSIXct”, etc.) it can be easiest to first convert the column to class Character with as.character()
, and then convert it to class Date.
Second, within the as.Date()
function, use the format =
argument to tell R the current format of the character date components - which characters refer to the month, the day, and the year, and how they are separated. If your values are already in one of R’s standard date formats (“YYYY-MM-DD” or “YYYY/MM/DD”) the format =
argument is not necessary.
To format =
, provide a character string (in quotes) that represents the current date format using the special “strptime” abbreviations below. For example, if your character dates are currently in the format “DD/MM/YYYY”, like “24/04/1968”, then you would use format = "%d/%m/%Y"
to convert the values into dates. Putting the format in quotation marks is necessary. And don’t forget any slashes or dashes!
# Convert to class date
linelist <- linelist %>%
mutate(date_onset = as.Date(date_of_onset, format = "%d/%m/%Y"))
Most of the strptime abbreviations are listed below. You can see the complete list by running ?strptime
.
%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.) %w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (01-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%h = hours (24-hr clock)
%m = minutes
%s = seconds %z = offset from GMT
%Z = Time zone (character)
TIP: The format =
argument of as.Date()
is not telling R the format you want the dates to be, but rather how to identify the date parts as they are before you run the command.
TIP: Be sure that in the format =
argument you use the date-part separator (e.g. /, -, or space) that is present in your dates.
Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.
lubridate
Converting character objects to dates can be made easier by using the lubridate package. This is a tidyverse package designed to make working with dates and times more simple and consistent than in base R. For these reasons, lubridate is often considered the gold-standard package for dates and time, and is recommended whenever working with them.
The lubridate package provides several different helper functions designed to convert character objects to dates in an intuitive, and more lenient way than specifying the format in as.Date()
. These functions are specific to the rough date format, but allow for a variety of separators, and synonyms for dates (e.g. 01 vs Jan vs January) - they are named after abbreviations of date formats.
# install/load lubridate
pacman::p_load(lubridate)
The ymd()
function flexibly converts date values supplied as year, then month, then day.
# read date in year-month-day format
ymd("2020-10-11")
## [1] "2020-10-11"
ymd("20201011")
## [1] "2020-10-11"
The mdy()
function flexibly converts date values supplied as month, then day, then year.
# read date in month-day-year format
mdy("10/11/2020")
## [1] "2020-10-11"
mdy("Oct 11 20")
## [1] "2020-10-11"
The dmy()
function flexibly converts date values supplied as day, then month, then year.
# read date in day-month-year format
dmy("11 10 2020")
## [1] "2020-10-11"
dmy("11 October 2020")
## [1] "2020-10-11"
If using piping, the conversion of a character column to dates with lubridate might look like this:
linelist <- linelist %>%
mutate(date_onset = lubridate::dmy(date_onset))
Once complete, you can run class()
to verify the class of the column
# Check the class of the column
class(linelist$date_onset)
Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.
Note that the above functions work best with 4-digit years. 2-digit years can produce unexpected results, as lubridate attempts to guess the century.
To convert a 2-digit year into a 4-digit year (all in the same century) you can convert to class character and then combine the existing digits with a pre-fix using str_glue()
from the stringr package (see Characters and strings). Then convert to date.
two_digit_years <- c("15", "15", "16", "17")
str_glue("20{two_digit_years}")
## 2015
## 2015
## 2016
## 2017
Combine columns
You can use the lubridate functions make_date()
and make_datetime()
to combine multiple numeric columns into one date column. For example if you have numeric columns onset_day
, onset_month
, and onset_year
in the data frame linelist
:
linelist <- linelist %>%
mutate(onset_date = make_date(year = onset_year, month = onset_month, day = onset_day))
9.4 Excel dates
In the background, most software store dates as numbers. R stores dates from an origin of 1st January, 1970. Thus, if you run as.numeric(as.Date("1970-01-01))
you will get 0
.
Microsoft Excel stores dates with an origin of either December 30, 1899 (Windows) or January 1, 1904 (Mac), depending on your operating system. See this Microsoft guidance for more information.
Excel dates often import into R as these numeric values instead of as characters. If the dataset you imported from Excel shows dates as numbers or characters like “41369”… use as.Date()
(or lubridate’s as_date()
function) to convert, but instead of supplying a “format” as above, supply the Excel origin date to the argument origin =
.
This will not work if the Excel date is stored in R as a character type, so be sure to ensure the number is class Numeric!
NOTE: You should provide the origin date in R’s default date format (“YYYY-MM-DD”).
# An example of providing the Excel 'origin date' when converting Excel number dates
data_cleaned <- data %>%
mutate(date_onset = as.numeric(date_onset)) %>% # ensure class is numeric
mutate(date_onset = as.Date(date_onset, origin = "1899-12-30")) # convert to date using Excel origin
9.5 Messy dates
The function guess_dates()
from the linelist package attempts to read a “messy” date column containing dates in many different formats and convert the dates to a standard format. You can read more online about guess_dates()
. If guess_dates()
is not yet available on CRAN for R 4.0.2, try install via pacman::p_load_gh("reconhub/linelist")
.
For example guess_dates
would see a vector of the following character dates “03 Jan 2018”, “07/03/1982”, and “08/20/85” and convert them to class Date as: 2018-01-03
, 1982-03-07
, and 1985-08-20
.
linelist::guess_dates(c("03 Jan 2018",
"07/03/1982",
"08/20/85"))
## [1] "2018-01-03" "1982-03-07" "1985-08-20"
Some optional arguments for guess_dates()
that you might include are:
-
error_tolerance
- The proportion of entries which cannot be identified as dates to be tolerated (defaults to 0.1 or 10%) -
last_date
- the last valid date (defaults to current date)
-
first_date
- the first valid date. Defaults to fifty years before the last_date.
# An example using guess_dates on the column dater_onset
<- linelist %>% # the dataset is called linelist
linelist mutate(
date_onset = linelist::guess_dates( # the guess_dates() from package "linelist"
date_onset,error_tolerance = 0.1,
first_date = "2016-01-01"
)
9.6 Working with date-time class
As previously mentioned, R also supports a datetime
class - a column that contains date and time information. As with the Date
class, these often need to be converted from character
objects to datetime
objects.
Convert dates with times
A standard datetime
object is formatted with the date first, which is followed by a time component - for example 01 Jan 2020, 16:30. As with dates, there are many ways this can be formatted, and there are numerous levels of precision (hours, minutes, seconds) that can be supplied.
Luckily, lubridate helper functions also exist to help convert these strings to datetime
objects. These functions are extensions of the date helper functions, with _h
(only hours supplied), _hm
(hours and minutes supplied), or _hms
(hours, minutes, and seconds supplied) appended to the end (e.g. dmy_hms()
). These can be used as shown:
Convert datetime with only hours to datetime object
ymd_h("2020-01-01 16hrs")
## [1] "2020-01-01 16:00:00 UTC"
ymd_h("2020-01-01 4PM")
## [1] "2020-01-01 04:00:00 UTC"
Convert datetime with hours and minutes to datetime object
dmy_hm("01 January 2020 16:20")
## [1] "2020-01-01 16:20:00 UTC"
Convert datetime with hours, minutes, and seconds to datetime object
mdy_hms("01 January 2020, 16:20:40")
## [1] "2020-01-20 16:20:40 UTC"
You can supply time zone but it is ignored. See section later in this page on time zones.
mdy_hms("01 January 2020, 16:20:40 PST")
## [1] "2020-01-20 16:20:40 UTC"
When working with a data frame, time and date columns can be combined to create a datetime column using str_glue()
from stringr package and an appropriate lubridate function. See the page on Characters and strings for details on stringr.
In this example, the linelist
data frame has a column in format “hours:minutes”. To convert this to a datetime we follow a few steps:
- Create a “clean” time of admission column with missing values filled-in with the column median. We do this because lubridate won’t operate on missing values. Combine it with the column
date_hospitalisation
, and then use the functionymd_hm()
to convert.
# packages
::p_load(tidyverse, lubridate, stringr)
pacman
# time_admission is a column in hours:minutes
<- linelist %>%
linelist
# when time of admission is not given, assign the median admission time
mutate(
time_admission_clean = ifelse(
is.na(time_admission), # if time is missing
median(time_admission), # assign the median
# if not missing keep as is
time_admission %>%
)
# use str_glue() to combine date and time columns to create one character column
# and then use ymd_hm() to convert it to datetime
mutate(
date_time_of_admission = str_glue("{date_hospitalisation} {time_admission_clean}") %>%
ymd_hm()
)
Convert times alone
If your data contain only a character time (hours and minutes), you can convert and manipulate them as times using strptime()
from base R. For example, to get the difference between two of these times:
# raw character times
time1 <- "13:45"
time2 <- "15:20"
# Times converted to a datetime class
time1_clean <- strptime(time1, format = "%H:%M")
time2_clean <- strptime(time2, format = "%H:%M")
# Difference is of class "difftime" by default, here converted to numeric hours
as.numeric(time2_clean - time1_clean) # difference in hours
## [1] 1.583333
Note however that without a date value provided, it assumes the date is today. To combine a string date and a string time together see how to use stringr in the section just above. Read more about strptime()
here.
To convert single-digit numbers to double-digits (e.g. to “pad” hours or minutes with leading zeros to achieve 2 digits), see this “Pad length” section of the Characters and strings page.
Extract time
You can extract elements of a time with hour()
, minute()
, or second()
from lubridate.
Here is an example of extracting the hour, and then classifing by part of the day. We begin with the column time_admission
, which is class Character in format “HH:MM”. First, the strptime()
is used as described above to convert the characters to datetime class. Then, the hour is extracted with hour()
, returning a number from 0-24. Finally, a column time_period
is created using logic with case_when()
to classify rows into Morning/Afternoon/Evening/Night based on their hour of admission.
linelist <- linelist %>%
mutate(hour_admit = hour(strptime(time_admission, format = "%H:%M"))) %>%
mutate(time_period = case_when(
hour_admit > 06 & hour_admit < 12 ~ "Morning",
hour_admit >= 12 & hour_admit < 17 ~ "Afternoon",
hour_admit >= 17 & hour_admit < 21 ~ "Evening",
hour_admit >=21 | hour_admit <= 6 ~ "Night"))
To learn more about case_when()
see the page on Cleaning data and core functions.
9.7 Working with dates
lubridate
can also be used for a variety of other functions, such as extracting aspects of a date/datetime, performing date arithmetic, or calculating date intervals
Here we define a date to use for the examples:
# create object of class Date
example_date <- ymd("2020-03-01")
Extract date components
You can extract common aspects such as month, day, weekday:
month(example_date) # month number
## [1] 3
day(example_date) # day (number) of the month
## [1] 1
wday(example_date) # day number of the week (1-7)
## [1] 1
You can also extract time components from a datetime
object or column. This can be useful if you want to view the distribution of admission times.
example_datetime <- ymd_hm("2020-03-01 14:45")
hour(example_datetime) # extract hour
minute(example_datetime) # extract minute
second(example_datetime) # extract second
There are several options to retrieve weeks. See the section on Epidemiological weeks below.
Note that if you are seeking to display a date a certain way (e.g. “Jan 2020” or “Thursday 20 March” or “Week 20, 1977”) you can do this more flexibly as described in the section on Date display.
Date math
You can add certain numbers of days or weeks using their respective function from lubridate.
# add 3 days to this date
example_date + days(3)
## [1] "2020-03-04"
# add 7 weeks and subtract two days from this date
example_date + weeks(7) - days(2)
## [1] "2020-04-17"
Date intervals
The difference between dates can be calculated by:
- Ensure both dates are of class date
- Use subtraction to return the “difftime” difference between the two dates
- If necessary, convert the result to numeric class to perform subsequent mathematical calculations
Below the interval between two dates is calculated and displayed. You can find intervals by using the subtraction “minus” symbol on values that are class Date. Note, however that the class of the returned value is “difftime” as displayed below, and must be converted to numeric.
# find the interval between this date and Feb 20 2020
output <- example_date - ymd("2020-02-20")
output # print
## Time difference of 10 days
class(output)
## [1] "difftime"
To do subsequent operations on a “difftime”, convert it to numeric with as.numeric()
.
This can all be brought together to work with data - for example:
pacman::p_load(lubridate, tidyverse) # load packages
linelist <- linelist %>%
# convert date of onset from character to date objects by specifying dmy format
mutate(date_onset = dmy(date_onset),
date_hospitalisation = dmy(date_hospitalisation)) %>%
# filter out all cases without onset in march
filter(month(date_onset) == 3) %>%
# find the difference in days between onset and hospitalisation
mutate(days_onset_to_hosp = date_hospitalisation - date_of_onset)
In a data frame context, if either of the above dates is missing, the operation will fail for that row. This will result in an NA
instead of a numeric value. When using this column for calculations, be sure to set the na.rm =
argument to TRUE
. For example:
# calculate the median number of days to hospitalisation for all cases where data are available
median(linelist_delay$days_onset_to_hosp, na.rm = T)
9.8 Date display
Once dates are the correct class, you often want them to display differently, for example to display as “Monday 05 January” instead of “2018-01-05”. You may also want to adjust the display in order to then group rows by the date elements displayed - for example to group by month-year.
format()
Adjust date display with the base R function format()
. This function accepts a character string (in quotes) specifying the desired output format in the “%” strptime abbreviations (the same syntax as used in as.Date()
). Below are most of the common abbreviations.
Note: using format()
will convert the values to class Character, so this is generally used towards the end of an analysis or for display purposes only! You can see the complete list by running ?strptime
.
%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (01-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%h = hours (24-hr clock)
%m = minutes
%s = seconds
%z = offset from GMT
%Z = Time zone (character)
An example of formatting today’s date:
## [1] "31 Наймдугаар сар 2021"
# easy way to get full date and time (default formatting)
date()
## [1] "Tue Aug 31 20:15:37 2021"
# formatted combined date, time, and time zone using str_glue() function
str_glue("{format(Sys.Date(), format = '%A, %B %d %Y, %z %Z, ')}{format(Sys.time(), format = '%H:%M:%S')}")
## мягмар, Наймдугаар сар 31 2021, +0000 UTC, 20:15:37
## [1] "2021 Week 35"
Note that if using str_glue()
, be aware of that within the expected double quotes " you should only use single quotes (as above).
Month-Year
To convert a Date column to Month-year format, we suggest you use the function as.yearmon()
from the zoo package. This converts the date to class “yearmon” and retains the proper ordering. In contrast, using format(column, "%Y %B")
will convert to class Character and will order the values alphabetically (incorrectly).
Below, a new column yearmonth
is created from the column date_onset
, using the as.yearmon()
function. The default (correct) ordering of the resulting values are shown in the table.
# create new column
test_zoo <- linelist %>%
mutate(yearmonth = zoo::as.yearmon(date_onset))
# print table
table(test_zoo$yearmon)
##
## 4-р сар 2014 5-р сар 2014 6-р сар 2014 7-р сар 2014 8-р сар 2014 9-р сар 2014 10-р сар 2014 11-р сар 2014 12-р сар 2014
## 7 64 100 226 528 1070 1112 763 562
## 1-р сар 2015 2-р сар 2015 3-р сар 2015 4-р сар 2015
## 431 306 277 186
In contrast, you can see how only using format()
does achieve the desired display format, but not the correct ordering.
# create new column
test_format <- linelist %>%
mutate(yearmonth = format(date_onset, "%b %Y"))
# print table
table(test_format$yearmon)
##
## 1-р сар 2015 10-р сар 2014 11-р сар 2014 12-р сар 2014 2-р сар 2015 3-р сар 2015 4-р сар 2014 4-р сар 2015 5-р сар 2014
## 431 1112 763 562 306 277 7 186 64
## 6-р сар 2014 7-р сар 2014 8-р сар 2014 9-р сар 2014
## 100 226 528 1070
Note: if you are working within a ggplot()
and want to adjust how dates are displayed only, it may be sufficient to provide a strptime format to the date_labels =
argument in scale_x_date()
- you can use "%b %Y"
or "%Y %b"
. See the ggplot tips page.
zoo also offers the function as.yearqtr()
, and you can use scale_x_yearmon()
when using ggplot()
.
9.9 Epidemiological weeks
lubridate
See the page on Grouping data for more extensive examples of grouping data by date. Below we briefly describe grouping data by weeks.
We generally recommend using the floor_date()
function from lubridate, with the argument unit = "week"
. This rounds the date down to the “start” of the week, as defined by the argument week_start =
. The default week start is 1 (for Mondays) but you can specify any day of the week as the start (e.g. 7 for Sundays). floor_date()
is versitile and can be used to round down to other time units by setting unit =
to “second”, “minute”, “hour”, “day”, “month”, or “year”.
The returned value is the start date of the week, in Date class. Date class is useful when plotting the data, as it will be easily recognized and ordered correctly by ggplot()
.
If you are only interested in adjusting dates to display by week in a plot, see the section in this page on Date display. For example when plotting an epicurve you can format the date display by providing the desired strptime “%” nomenclature. For example, use “%Y-%W” or “%Y-%U” to return the year and week number (given Monday or Sunday week start, respectively).
Weekly counts
See the page on Grouping data for a thorough explanation of grouping data with count()
, group_by()
, and summarise()
. A brief example is below.
- Create a new ‘week’ column with
mutate()
, usingfloor_date()
withunit = "week"
- Get counts of rows (cases) per week with
count()
; filter out any cases with missing date
- Finish with
complete()
from tidyr to ensure that all weeks appear in the data - even those with no rows/cases. By default the count values for any “new” rows are NA, but you can make them 0 with thefill =
argument, which expects a named list (below,n
is the name of the counts column).
# Make aggregated dataset of weekly case counts
weekly_counts <- linelist %>%
drop_na(date_onset) %>% # remove cases missing onset date
mutate(weekly_cases = floor_date( # make new column, week of onset
date_onset,
unit = "week")) %>%
count(weekly_cases) %>% # group data by week and count rows per group (creates column 'n')
tidyr::complete( # ensure all weeks are present, even those with no cases reported
weekly_cases = seq.Date( # re-define the "weekly_cases" column as a complete sequence,
from = min(weekly_cases), # from the minimum date
to = max(weekly_cases), # to the maxiumum date
by = "week"), # by weeks
fill = list(n = 0)) # fill-in NAs in the n counts column with 0
Here are the first rows of the resulting data frame:
Epiweek alternatives
Note that lubridate also has functions week()
, epiweek()
, and isoweek()
, each of which has slightly different start dates and other nuances. Generally speaking though, floor_date()
should be all that you need. Read the details for these functions by entering ?week
into the console or reading the documentation here.
You might consider using the package aweek to set epidemiological weeks. You can read more about it on the RECON website. It has the functions date2week()
and week2date()
in which you can set the week start day with week_start = "Monday"
. This package is easiest if you want “week”-style outputs (e.g. “2020-W12”). Another advantage of aweek is that when date2week()
is applied to a date column, the returned column (week format) is automatically of class Factor and includes levels for all weeks in the time span (this avoids the extra step of complete()
described above). However, aweek does not have the functionality to round dates to other time units such as months, years, etc.
Another alternative for time series which also works well to show a a “week” format (“2020 W12”) is yearweek()
from the package tsibble, as demonstrated in the page on Time series and outbreak detection.
9.10 Converting dates/time zones
When data is present in different time time zones, it can often be important to standardise this data in a unified time zone. This can present a further challenge, as the time zone component of data must be coded manually in most cases.
In R, each datetime object has a timezone component. By default, all datetime objects will carry the local time zone for the computer being used - this is generally specific to a location rather than a named timezone, as time zones will often change in locations due to daylight savings time. It is not possible to accurately compensate for time zones without a time component of a date, as the event a date column represents cannot be attributed to a specific time, and therefore time shifts measured in hours cannot be reasonably accounted for.
To deal with time zones, there are a number of helper functions in lubridate that can be used to change the time zone of a datetime object from the local time zone to a different time zone. Time zones are set by attributing a valid tz database time zone to the datetime object. A list of these can be found here - if the location you are using data from is not on this list, nearby large cities in the time zone are available and serve the same purpose.
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
# assign the current time to a column
time_now <- Sys.time()
time_now
## [1] "2021-08-31 20:15:37 +08"
# use with_tz() to assign a new timezone to the column, while CHANGING the clock time
time_london_real <- with_tz(time_now, "Europe/London")
# use force_tz() to assign a new timezone to the column, while KEEPING the clock time
time_london_local <- force_tz(time_now, "Europe/London")
# note that as long as the computer that was used to run this code is NOT set to London time,
# there will be a difference in the times
# (the number of hours difference from the computers time zone to london)
time_london_real - time_london_local
## Time difference of -7 hours
This may seem largely abstract, and is often not needed if the user isn’t working across time zones.
9.11 Lagging and leading calculations
lead()
and lag()
are functions from the dplyr package which help find previous (lagged) or subsequent (leading) values in a vector - typically a numeric or date vector. This is useful when doing calculations of change/difference between time units.
Let’s say you want to calculate the difference in cases between a current week and the previous one. The data are initially provided in weekly counts as shown below.
When using lag()
or lead()
the order of rows in the dataframe is very important! - pay attention to whether your dates/numbers are ascending or descending
First, create a new column containing the value of the previous (lagged) week.
- Control the number of units back/forward with
n =
(must be a non-negative integer)
- Use
default =
to define the value placed in non-existing rows (e.g. the first row for which there is no lagged value). By default this isNA
.
- Use
order_by = TRUE
if your the rows are not ordered by your reference column
counts <- counts %>%
mutate(cases_prev_wk = lag(cases_wk, n = 1))
Next, create a new column which is the difference between the two cases columns:
counts <- counts %>%
mutate(cases_prev_wk = lag(cases_wk, n = 1),
case_diff = cases_wk - cases_prev_wk)
You can read more about lead()
and lag()
in the documentation here or by entering ?lag
in your console.
9.12 Resources
lubridate tidyverse page
lubridate RStudio cheatsheet
R for Data Science page on dates and times
Online tutorial Date formats