45 Directory interactions
This page covers common scenarios in which you create, inspect, save to, and import from directories (folders).
45.1 Preparation
fs package
The fs package is a tidyverse package that facilitates directory interactions, improving on some of the base R functions. In the sections below we will often use functions from fs.
pacman::p_load(
fs, # file/directory interactions
rio, # import/export
here, # relative file pathways
tidyverse) # data management and visualization
Print directory as a dendrogram tree
Use the function dir_tree() from fs.
Provide the folder filepath to path = and decide whether you want to show only one level (recurse = FALSE) or all files in all sub-levels (recurse = TRUE). Below we use here() as shorthand for the R project and specify its sub-folder “data”, which contains all the data used for this R handbook. We set it to display all files within “data” and its sub-folders (e.g. “cache”, “epidemic models”, “population”, “shp”, and “weather”).
## C:/Users/Temuulen/My Drive (temuulen@alumni.usc.edu)/#applied-epi/epiRhandbook_mn/data
## +-- cache
## | \-- epidemic_models
## | +-- 2015-04-30
## | | +-- estimated_reported_cases_samples.rds
## | | +-- estimate_samples.rds
## | | +-- latest_date.rds
## | | +-- reported_cases.rds
## | | +-- summarised_estimated_reported_cases.rds
## | | +-- summarised_estimates.rds
## | | \-- summary.rds
## | +-- epinow_res.rds
## | +-- epinow_res_small.rds
## | +-- generation_time.rds
## | \-- incubation_period.rds
## +-- case_linelists
## | +-- cleaning_dict.csv
## | +-- fluH7N9_China_2013.csv
## | +-- linelist_cleaned.rds
## | +-- linelist_cleaned.xlsx
## | \-- linelist_raw.xlsx
## +-- example
## | +-- Central Hospital.csv
## | +-- district_weekly_count_data.xlsx
## | +-- fluH7N9_China_2013.csv
## | +-- hospital_linelists.xlsx
## | +-- linelists
## | | +-- 20201007linelist.csv
## | | +-- case_linelist20201006.csv
## | | +-- case_linelist_2020-10-02.csv
## | | +-- case_linelist_2020-10-03.csv
## | | +-- case_linelist_2020-10-04.csv
## | | +-- case_linelist_2020-10-05.csv
## | | \-- case_linelist_2020-10-08.xlsx
## | +-- Military Hospital.csv
## | +-- Missing.csv
## | +-- Other.csv
## | +-- Port Hospital.csv
## | \-- St. Mark's Maternity Hospital (SMMH).csv
## +-- flexdashboard
## | +-- outbreak_dashboard.html
## | +-- outbreak_dashboard.Rmd
## | +-- outbreak_dashboard_shiny.Rmd
## | +-- outbreak_dashboard_test.html
## | \-- outbreak_dashboard_test.Rmd
## +-- gis
## | +-- africa_countries.geo.json
## | +-- covid_incidence.csv
## | +-- covid_incidence_map.R
## | +-- linelist_cleaned_with_adm3.rds
## | +-- population
## | | +-- sle_admpop_adm3_2020.csv
## | | \-- sle_population_statistics_sierraleone_2020.xlsx
## | \-- shp
## | +-- README.txt
## | +-- sle_adm3.CPG
## | +-- sle_adm3.dbf
## | +-- sle_adm3.prj
## | +-- sle_adm3.sbn
## | +-- sle_adm3.sbx
## | +-- sle_adm3.shp
## | +-- sle_adm3.shp.xml
## | +-- sle_adm3.shx
## | +-- sle_hf.CPG
## | +-- sle_hf.dbf
## | +-- sle_hf.prj
## | +-- sle_hf.sbn
## | +-- sle_hf.sbx
## | +-- sle_hf.shp
## | \-- sle_hf.shx
## +-- godata
## | +-- cases_clean.rds
## | +-- contacts_clean.rds
## | +-- followups_clean.rds
## | \-- relationships_clean.rds
## +-- likert_data.csv
## +-- linelist_cleaned.xlsx
## +-- make_evd_dataset.R
## +-- malaria_app
## | +-- app.R
## | +-- data
## | | \-- facility_count_data.rds
## | +-- funcs
## | | \-- plot_epicurve.R
## | +-- global.R
## | +-- malaria_app.Rproj
## | +-- server.R
## | \-- ui.R
## +-- malaria_facility_count_data.rds
## +-- phylo
## | +-- sample_data_Shigella_tree.csv
## | +-- Shigella_subtree_2.nwk
## | +-- Shigella_subtree_2.txt
## | \-- Shigella_tree.txt
## +-- rmarkdown
## | +-- outbreak_report.docx
## | +-- outbreak_report.html
## | +-- outbreak_report.pdf
## | +-- outbreak_report.pptx
## | +-- outbreak_report.Rmd
## | +-- report_tabbed_example.html
## | \-- report_tabbed_example.Rmd
## +-- standardization
## | +-- country_demographics.csv
## | +-- country_demographics_2.csv
## | +-- deaths_countryA.csv
## | +-- deaths_countryB.csv
## | \-- world_standard_population_by_sex.csv
## +-- surveys
## | +-- population.xlsx
## | +-- survey_data.xlsx
## | \-- survey_dict.xlsx
## \-- time_series
## +-- campylobacter_germany.xlsx
## \-- weather
## +-- germany_weather2002.nc
## +-- germany_weather2003.nc
## +-- germany_weather2004.nc
## +-- germany_weather2005.nc
## +-- germany_weather2006.nc
## +-- germany_weather2007.nc
## +-- germany_weather2008.nc
## +-- germany_weather2009.nc
## +-- germany_weather2010.nc
## \-- germany_weather2011.nc
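As a minimal sketch of the kind of call that prints a tree like the one above, the example below builds a small throwaway folder in the session's temporary directory and prints it with dir_tree(), first one level deep and then recursively. The folder and file names are hypothetical and are not part of the handbook data; in an R project you would pass something like here("data") to path = instead.

```r
library(fs)

# Hypothetical demo folder inside the session's temporary directory
demo_dir <- path(tempdir(), "dir_tree_demo")
dir_create(path(demo_dir, "gis", "population"))
file_create(path(demo_dir, "linelist.csv"))
file_create(path(demo_dir, "gis", "population", "pop_2020.csv"))

# Top level only
dir_tree(path = demo_dir, recurse = FALSE)

# All files in all sub-folders
dir_tree(path = demo_dir, recurse = TRUE)

# dir_ls() accepts the same recurse argument and returns the paths themselves
all_paths <- dir_ls(demo_dir, recurse = TRUE)
```

Printing the tree is mainly useful for documentation and quick checks; when you need to work with the paths programmatically, a vector such as all_paths is usually easier to handle.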
45.2 List files in a directory
To list just the file names in a directory you can use dir() from base R. For example, this command lists the names of the files in the “population” subfolder, located within “data/gis” in an R project. The relative filepath is provided using here() (which you can read more about in the [Import and export] page).
## [1] "sle_admpop_adm3_2020.csv" "sle_population_statistics_sierraleone_2020.xlsx"
To list the full file paths of the directory’s files, you can use dir_ls() from fs. A base R alternative is list.files() with full.names = TRUE.
# file paths
dir_ls(here("data", "gis", "population"))
## C:/Users/Temuulen/My Drive (temuulen@alumni.usc.edu)/#applied-epi/epiRhandbook_mn/data/gis/population/sle_admpop_adm3_2020.csv
## C:/Users/Temuulen/My Drive (temuulen@alumni.usc.edu)/#applied-epi/epiRhandbook_mn/data/gis/population/sle_population_statistics_sierraleone_2020.xlsx
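The difference between file names and full file paths can be seen side by side in a small sketch. The folder and file names below are hypothetical and are created in a temporary directory so that the example runs anywhere; in an R project you would supply a path built with here() instead.

```r
# Hypothetical folder with two empty files
pop_dir <- file.path(tempdir(), "population_demo")
dir.create(pop_dir, showWarnings = FALSE)
file.create(file.path(pop_dir, c("pop_a.csv", "pop_b.xlsx")))

# File names only
file_names <- dir(pop_dir)

# Full paths with base R
full_paths <- list.files(pop_dir, full.names = TRUE)

# Full paths with fs (returned as an fs_path vector)
fs_paths <- fs::dir_ls(pop_dir)

file_names
full_paths
```

Full paths are usually what you want when passing the result on to an import function, because bare file names are only valid relative to the folder they came from.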
To get all the metadata about each file in a directory (e.g. path, modification date, etc.), you can use dir_info() from fs.
This can be particularly useful if you want to extract the last modification time of the file, for example if you want to import the most recent version of a file. For an example of this, see the [Import and export] page.
# file info
dir_info(here("data", "gis", "population"))
dir_info() returns a data frame with one row per file and columns such as path, type, size, and modification_time.
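One way to pick out the most recently modified file from this output is sketched below. The folder, file names, and modification times are hypothetical and are set by hand in a temporary directory so that the result is predictable; this is an illustration of the idea, not the method used on the [Import and export] page.

```r
library(fs)

# Hypothetical folder with two files whose modification times we set by hand
demo_dir <- path(tempdir(), "dir_info_demo")
dir_create(demo_dir)
old_file <- path(demo_dir, "linelist_old.csv")
new_file <- path(demo_dir, "linelist_new.csv")
file_create(c(old_file, new_file))
Sys.setFileTime(old_file, as.POSIXct("2021-01-01 12:00:00", tz = "UTC"))
Sys.setFileTime(new_file, as.POSIXct("2021-06-01 12:00:00", tz = "UTC"))

# One row per file; modification_time is a date-time column
info <- dir_info(demo_dir)

# Keep the path of the most recently modified file
latest_row  <- order(info$modification_time, decreasing = TRUE)[1]
latest_path <- info$path[latest_row]
latest_path
```

The resulting path could then be handed to an import function such as rio::import().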
45.3 File information
To extract metadata about a specific file, you can use file_info() from fs (or file.info() from base R).
file_info(here("data", "case_linelists", "linelist_cleaned.rds"))
Here we use the $ operator to index the result and return only the modification_time value.
file_info(here("data", "case_linelists", "linelist_cleaned.rds"))$modification_time
## [1] "2021-08-31 20:15:36 +08"
45.4 Check if exists
R objects
You can use exists() from base R to check whether an R object exists within R (supply the object name in quotes).
exists("linelist")
## [1] TRUE
Note that some base R packages use generic object names like “data” behind the scenes, which will make exists() return TRUE unless inherits = FALSE is specified. This is one reason not to name your dataset “data”.
exists("data")
## [1] TRUE
exists("data", inherits = FALSE)
## [1] FALSE
If you are writing a function, you should use missing() from base R, instead of exists(), to check whether an argument was supplied.
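A minimal sketch of missing() inside a function is shown below. The function describe_cases() and its arguments are hypothetical and exist only to illustrate the check.

```r
# Hypothetical helper: 'digits' is optional
describe_cases <- function(cases, digits) {
  if (missing(digits)) {
    # No value supplied by the caller, so fall back to a default
    digits <- 1
  }
  round(mean(cases), digits)
}

describe_cases(c(3, 4, 6))      # digits not supplied, default of 1 is used
describe_cases(c(3, 4, 6), 3)   # digits supplied by the caller
```

In many cases a default value in the function definition (e.g. digits = 1) is simpler; missing() is most useful when the function needs to behave differently depending on whether the argument was given at all.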
Directories
To check whether a directory exists, provide the directory path to is_dir() from fs. The path is printed with TRUE beneath it.
is_dir(here("data"))
## C:/Users/Temuulen/My Drive (temuulen@alumni.usc.edu)/#applied-epi/epiRhandbook_mn/data
## TRUE
Base R alternatives are dir.exists(), or file.exists(), which returns TRUE for both files and directories.
Files
To check if a specific file exists, use is_file() from fs. The path is printed with TRUE beneath it.
is_file(here("data", "case_linelists", "linelist_cleaned.rds"))
## C:/Users/Temuulen/My Drive (temuulen@alumni.usc.edu)/#applied-epi/epiRhandbook_mn/data/case_linelists/linelist_cleaned.rds
## TRUE
A base R alternative is file.exists().
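The sketch below illustrates how these checks differ for files and directories, and how a check might guard an import step. The folder and file names are hypothetical and live in a temporary directory.

```r
# Hypothetical folder and file in the temporary directory
demo_dir  <- file.path(tempdir(), "exists_demo")
demo_file <- file.path(demo_dir, "linelist.csv")
dir.create(demo_dir, showWarnings = FALSE)
file.create(demo_file)

# file.exists() is TRUE for files and for directories
file.exists(demo_file)
file.exists(demo_dir)

# dir.exists() and the fs helpers distinguish between the two
dir.exists(demo_file)
fs::is_file(demo_dir)
fs::is_dir(demo_dir)

# Guard an import so that a missing file gives a clear message
missing_file <- file.path(demo_dir, "not_there.csv")
if (!file.exists(missing_file)) {
  message("File not found: ", missing_file)
}
```

Checking before importing lets a script stop or skip with a readable message rather than failing part-way through with a less obvious error.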
45.5 Create
Directories
To create a new directory (folder) you can use dir_create() from fs. If the directory already exists, it will not be overwritten and no error will be returned.
dir_create(here("data", "test"))
An alternative is dir.create() from base R, which will show a warning (and create nothing) if the directory already exists. In contrast, dir_create() in this scenario will be silent.
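The contrast between the two functions can be checked with a short sketch. The folder name is hypothetical; note that in current versions of R, dir.create() on an existing folder signals a warning and returns FALSE rather than stopping the script.

```r
new_dir <- file.path(tempdir(), "create_demo")

# First call creates the folder; the second call is silent with fs
fs::dir_create(new_dir)
fs::dir_create(new_dir)

# Base R signals a warning when the folder already exists;
# here the warning is caught and its message is kept as text
base_result <- tryCatch(
  dir.create(new_dir),
  warning = function(w) conditionMessage(w)
)
base_result

# showWarnings = FALSE silences the base R warning
dir.create(new_dir, showWarnings = FALSE)
```

Because dir_create() is safe to call repeatedly, it can sit at the top of a script that is run many times without needing an existence check first.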
Files
You can create an (empty) file with file_create() from fs. If the file already exists, it will not be overwritten or changed.
file_create(here("data", "test.rds"))
A base R alternative is file.create(). But if the file already exists, this option will truncate it. If you use file_create() the file will be left unchanged.
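To make the truncation risk concrete, the sketch below writes some text to a hypothetical file in the temporary directory and compares its size after calling each function.

```r
demo_file <- file.path(tempdir(), "create_demo.txt")
writeLines("important notes", demo_file)

# fs::file_create() leaves an existing file untouched
fs::file_create(demo_file)
size_after_fs <- file.size(demo_file)

# file.create() truncates the existing file to zero bytes
file.create(demo_file)
size_after_base <- file.size(demo_file)

c(fs = size_after_fs, base = size_after_base)
```

If you do use file.create(), checking file.exists() beforehand avoids emptying a file by accident.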
45.6 Delete
R objects
Use rm() from base R to remove an R object.
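A short sketch of rm() follows; the object names are hypothetical. Objects can be named directly, or supplied as a character vector to list =, which is convenient when the names are generated by code.

```r
# Hypothetical objects
temp_counts <- c(1, 2, 3)
temp_labels <- c("a", "b", "c")
keep_me     <- "still here"

# Remove a single object by name
rm(temp_counts)

# Remove objects by supplying their names as a character vector
rm(list = c("temp_labels"))

# Confirm what is left
exists("temp_counts", inherits = FALSE)
exists("keep_me", inherits = FALSE)
```

The common idiom rm(list = ls()) removes every object in the current environment, so use it with care.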
45.7 Running other files
source()
To run one R script from another R script, you can use the source() command (from base R), providing the file path of the script to run.
This is equivalent to opening that R script in RStudio and clicking the “Source” button in the upper-right of the script pane. This will execute the script silently (no output to the R console) unless the script prints output explicitly. See the page on [Interactive console] for examples of using source() to interact with a user via the R console in question-and-answer mode.
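The behaviour of source() can be tried out with a throwaway script, as in the sketch below. The script name and its contents are hypothetical and are written to a temporary file so that the example is self-contained; in an R project you would typically build the path with here().

```r
# Write a small hypothetical script to a temporary file
script_path <- file.path(tempdir(), "clean_demo.R")
writeLines(c(
  "cleaned_total <- sum(c(10, 20, 30))",
  "cleaned_total"
), script_path)

# Silent by default: the object is created but nothing is printed
source(script_path)
cleaned_total

# echo = TRUE prints each expression and its result as it runs
source(script_path, echo = TRUE)

# local = TRUE evaluates the script in the calling environment,
# e.g. inside a function, instead of the global environment
run_isolated <- function(path) {
  source(path, local = TRUE)
  cleaned_total * 2
}
run_isolated(script_path)
```

By default source() evaluates the script in the global environment, which is why cleaned_total is available afterwards.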
render()
render(), from the rmarkdown package, is a variation on source() most often used for R markdown scripts. You provide the input = which is the R markdown file, and also the output_format = (typically “html_document”, “pdf_document”, or “word_document”).
See the page on Reports with R Markdown for more details. Also see the documentation for render() by entering ?render.
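A minimal sketch of a render() call is given below. The R markdown file is hypothetical and is written to the temporary directory; the call is wrapped in a check because rendering requires both the rmarkdown package and pandoc to be available, which is an assumption about your machine.

```r
# Write a minimal hypothetical R markdown file to the temporary directory
rmd_path <- file.path(tempdir(), "report_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "---",
  "",
  "The mean is `r mean(c(1, 2, 3))`."
), rmd_path)

# Render only if rmarkdown and pandoc are available on this machine
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  output_path <- rmarkdown::render(
    input         = rmd_path,
    output_format = "html_document",
    output_dir    = tempdir(),
    quiet         = TRUE
  )
  output_path   # path of the rendered HTML file
}
```

render() returns the path of the output file, which is handy if a later step needs to copy or email the report.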
Run files in a directory
You can create a for loop and use it to source() every file in a directory, as identified with dir().
for(script in dir(here("scripts"), pattern = "\\.R$")) { # for each script name ending in ".R" in the R project's "scripts" folder
  source(here("scripts", script)) # source the file with the matching name in the scripts folder
}
If you only want to run certain scripts, you can identify them by name like this:
scripts_to_run <- c(
"epicurves.R",
"demographic_tables.R",
"survival_curves.R"
)
for(script in scripts_to_run) {
source(here("scripts", script))
}
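When scripts are listed by name, a typo or a deleted file will stop the loop part-way through. One possible safeguard, sketched below with hypothetical script names in a temporary folder, is to check which of the requested scripts exist before running any of them.

```r
# Hypothetical "scripts" folder in the temporary directory
scripts_dir <- file.path(tempdir(), "scripts_demo")
dir.create(scripts_dir, showWarnings = FALSE)
writeLines("step_one <- 1",            file.path(scripts_dir, "01_import.R"))
writeLines("step_two <- step_one + 1", file.path(scripts_dir, "02_clean.R"))

# Scripts we would like to run, in order (the third does not exist)
requested <- c("01_import.R", "02_clean.R", "03_missing.R")

# Split the request into scripts that exist and scripts that do not
found     <- requested[file.exists(file.path(scripts_dir, requested))]
not_found <- setdiff(requested, found)

if (length(not_found) > 0) {
  message("Skipping missing scripts: ", paste(not_found, collapse = ", "))
}

# Run the scripts that were found, in the order requested
for (script in found) {
  message("Running ", script)
  source(file.path(scripts_dir, script))
}
```

Whether to skip missing scripts with a message, as here, or to stop with an error depends on whether later scripts rely on the earlier ones.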
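Here is a comparison of the fs functions covered on this page and their base R counterparts:
- dir_ls(): list.files()
- file_info(): file.info()
- is_dir() and is_file(): file.exists()
- dir_create(): dir.create()
- file_create(): file.create()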
Import files in a directory
See the page on [Import and export] for importing and exporting individual files.
Also see the [Import and export] page for methods to automatically import the most recent file, based on a date in the file name or by looking at the file meta-data.
See the page on Iteration, loops, and lists for an example with the package purrr demonstrating:
- Splitting a data frame and saving it out as multiple CSV files
- Splitting a data frame and saving each part as a separate sheet within one Excel workbook
- Importing multiple CSV files and combining them into one dataframe
- Importing an Excel workbook with multiple sheets and combining them into one dataframe
45.8 base R
The functions list.files() and dir(), shown below, perform the same operation of listing files within a specified directory. You can specify a pattern = to look for (interpreted as a regular expression) and set ignore.case = to control case sensitivity.
list.files(path = here("data"))
list.files(path = here("data"), pattern = "\\.csv$")
# dir(path = here("data"), pattern = "\\.csv$")
list.files(path = here("data"), pattern = "evd", ignore.case = TRUE)
If a file (for example, an Excel workbook) is currently “open”, a temporary lock file will display in your folder with a tilde in front, like “~$hospital_linelists.xlsx”.
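Because pattern = is a regular expression, an unescaped dot matches any character, and temporary “~$” files can slip into the results. The sketch below uses hypothetical file names in a temporary folder to show a loose pattern, a stricter one, and one way to drop the temporary files.

```r
# Hypothetical folder containing look-alike file names
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c(
  "cases.csv",
  "EVD_linelist.xlsx",
  "old_csv_notes.txt",
  "~$EVD_linelist.xlsx"
)))

# "." matches any character, so "_csv" in the notes file also matches
loose <- list.files(demo_dir, pattern = ".csv")

# Escape the dot and anchor the end to match the extension only
strict <- list.files(demo_dir, pattern = "\\.csv$")

# glob2rx() converts a wildcard pattern into a regular expression
globbed <- list.files(demo_dir, pattern = glob2rx("*.csv"))

# Case-insensitive match, then drop temporary "~$" files
evd_files <- list.files(demo_dir, pattern = "evd", ignore.case = TRUE)
evd_files <- evd_files[!startsWith(evd_files, "~$")]

loose
strict
evd_files
```

Filtering out the “~$” files before importing avoids errors from trying to read a temporary file as if it were a workbook.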