There are two types of configuration:

- Global configuration: settings that control how `load.project()` behaves when executed. For example, whether to have logging enabled.
- Project-specific configuration: settings defined for your own project and available to `src` or `munge` scripts. For example, you may define `plot_footnote = "My Proj"` to control a consistent look and feel for plots.

Both types are stored in the `config` object accessible from the global environment. The function `project.config()` will display the current configuration, including project-specific configuration.
The current `ProjectTemplate` configuration settings exist in the `config/global.dcf` file:

- `data_loading`: This can be set to ‘on’ or ‘off’. If `data_loading` is on, the system will load data from both the `cache` and `data` directories, with `cache` taking precedence in the case of a name conflict. By default, `data_loading` is on.
- `data_loading_header`: This can be set to ‘on’ or ‘off’. If `data_loading_header` is on, the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as a header.
- `data_ignore`: A comma-separated list of files to be ignored when importing from the `data/` directory. Regular expressions can be used but should be delimited (on both sides) by `/`. The default is to ignore no files. Filenames and file paths should never begin with a `/`. Entire directories under `data/` can be ignored by adding a trailing `/`. See Mastering ProjectTemplate for more details.
- `cache_loading`: This can be set to ‘on’ or ‘off’. If `cache_loading` is on, the system will load data from the `cache` directory before any attempt to load from the `data` directory. By default, `cache_loading` is on.
- `recursive_loading`: This can be set to ‘on’ or ‘off’. If `recursive_loading` is on, the system will load data from the `data` directory and all its subdirectories recursively. By default, `recursive_loading` is off.
- `munging`: This can be set to ‘on’ or ‘off’. If `munging` is on, the system will execute the files in the `munge` directory sequentially, using the order implied by the `sort()` function. If `munging` is off, none of the files in the `munge` directory will be executed. By default, `munging` is on.
- `logging`: This can be set to ‘on’ or ‘off’. If `logging` is on, a logger object using the `log4r` package is automatically created when you run `load.project()`. This logger will write to the `logs` directory. By default, `logging` is off.
- `logging_level`: The value of `logging_level` is passed to a logger object using the `log4r` package during logging when you run `load.project()`. By default, `logging_level` is INFO.
- `load_libraries`: This can be set to ‘on’ or ‘off’. If `load_libraries` is on, the system will load all of the R packages listed in the `libraries` field described below. By default, `load_libraries` is off.
- `libraries`: A comma-separated list of all the R packages that the user wants to load automatically when `load.project()` is called. These packages must already be installed before calling `load.project()`. By default, the reshape2, plyr, tidyverse, stringr and lubridate packages are included in this list.
- `as_factors`: This can be set to ‘on’ or ‘off’. If `as_factors` is on, the system will convert every character vector into a factor when creating data frames; most importantly, this conversion occurs when data is read in automatically. If ‘off’, character vectors will remain character vectors. By default, `as_factors` is off.
- `tables_type`: The format for default tables. Values can be ‘tibble’ (default), ‘data_table’, or ‘data_frame’.
- `attach_internal_libraries`: This can be set to ‘on’ or ‘off’. If `attach_internal_libraries` is on, then every time a new package is loaded into memory during `load.project()`, a warning is displayed to report it. By default, `attach_internal_libraries` is off.
- `cache_loaded_data`: This can be set to ‘on’ or ‘off’. If `cache_loaded_data` is on, then data loaded from the `data` directory during `load.project()` will be automatically cached (so it won’t need to be reloaded next time `load.project()` is called). By default, `cache_loaded_data` is on for newly created projects. Existing projects created without this configuration setting will default to off. Similarly, when `migrate.project()` is called in those cases, the default will be off.
- `sticky_variables`: A comma-separated list of any project-specific variables that should remain in the global environment after a `clear()` command. This can be used to clear the global environment but keep any large datasets in place, so they are not unnecessarily regenerated during `load.project()`. Note that this will be overridden if the `force=TRUE` parameter is passed to `clear()`. By default, `sticky_variables` is NONE.
- `underscore_variables`: This can be set to TRUE to use underscores (`_`) in variable names or FALSE to replace underscores (`_`) with dots (`.`). The default is TRUE. When migrating old projects, `underscore_variables` is set to FALSE.
- `cache_file_format`: The default file format for cached data is ‘RData’. This can be set to ‘qs’ in order to benefit from the quick serialization of R objects provided by the `qs` package.

You can override the values in `global.dcf` when loading the project by providing the option with the new setting:
> load.project(cache_loading = FALSE) # load the project without loading from the cache
> reload.project(cache_loading = FALSE, # Don't load from cache
data_ignore = '*.tsv') # Don't load tsv files
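
The relationship between file settings and overrides can be sketched in base R. This is only an illustration of the precedence described above, not ProjectTemplate's own implementation: the inlined DCF text is a hypothetical excerpt of `config/global.dcf`, and `dcf_lines`, `file_settings`, `overrides`, `effective` and `libs` are names made up for this sketch.

```r
# Hypothetical excerpt of config/global.dcf, inlined so the sketch is self-contained.
dcf_lines <- c(
  "logging_level: INFO",
  "libraries: reshape2, plyr, tidyverse, stringr, lubridate",
  "cache_file_format: RData"
)

# read.dcf() returns a character matrix with one row per record;
# the inlined text holds a single record, so take the first row.
con <- textConnection(dcf_lines)
file_settings <- as.list(read.dcf(con)[1, ])
close(con)

# Named overrides, standing in for arguments given to load.project().
overrides <- list(cache_file_format = "qs")

# Overrides replace file values of the same name; everything else is kept.
effective <- utils::modifyList(file_settings, overrides)

# Comma-separated fields such as `libraries` can be split into a vector.
libs <- strsplit(effective$libraries, "\\s*,\\s*")[[1]]

print(effective$cache_file_format)
print(libs)
```

Settings that are not named in the override keep the value read from the file, which is why only the options you want to change need to be passed.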
For backward compatibility it is still possible to provide a list of options, either as an unnamed argument or as the named argument `override.config`:
> load.project(list(cache_loading = FALSE)) # load the project without loading from the cache
> reload.project(override.config = list(cache_loading = FALSE, # Don't load from cache
data_ignore = '*.tsv')) # Don't load tsv files
Note that this behavior might be removed in a future version of `ProjectTemplate`.
The project-specific configuration is specified in the `lib/globals.R` file using the `add.config` function. This will contain whatever is relevant for your project, and will look something like this:
> add.config(
keep_data = FALSE, # should temporary data be kept?
header = "Private & Confidential" # header in reports
)
Note that commas need to be present after each config item except the last. Comments can also be inserted to document what each config variable does.
To use project-specific configuration in any `lib`, `munge` or `src` script, simply use the form `config$keep_data`.
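
As a minimal sketch of how a script might read these values, the example below builds a stand-in `config` list so that it runs outside a project. In a real project the object is created by `load.project()`. The `temp_scores` data frame and the report title are hypothetical and exist only for illustration.

```r
# Stand-in for the `config` object that load.project() would normally create;
# the values mirror the add.config() example above.
config <- list(
  keep_data = FALSE,
  header    = "Private & Confidential"
)

# Hypothetical intermediate result produced by a munge script.
temp_scores <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))

# Branch on the project-specific setting: drop temporary data unless asked to keep it.
if (!isTRUE(config$keep_data)) {
  rm(temp_scores)
}

# Reuse the shared header text wherever a report title is built.
report_title <- paste(config$header, "Weekly summary", sep = " | ")
print(report_title)
```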
`ProjectTemplate` will automatically load project-specific content in `lib/globals.R` before any other file in `lib`, so the filename should not be changed.
The `add.config()` function can also be used anywhere in the project. So if a particular analysis in `src` needs to override the value in `globals.R`, you can simply add the relevant `add.config()` command to the top of that script.
Another option to override the setting is to pass it to `load.project()` as with normal options:
> load.project(keep_data = TRUE)
This only works for calls to `add.config` with the parameter `apply.override` set to TRUE; for the moment this parameter defaults to FALSE. The `globals.R` file in the standard `full` template therefore includes two calls to `add.config`, with an explanation of which section can be overridden.
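
A `lib/globals.R` file following that two-call pattern might look like the sketch below. It is not the file shipped with the template. It assumes the file is sourced by `load.project()` inside a ProjectTemplate project, where `add.config()` and the `config` object are available, and it reuses the illustrative settings from the earlier example.

```r
# Sketch of lib/globals.R; assumes it is sourced by load.project() inside a
# ProjectTemplate project, so add.config() and the config object exist.

# Settings in this call can be overridden from load.project(),
# e.g. load.project(keep_data = TRUE).
add.config(
  keep_data = FALSE,        # should temporary data be kept?
  apply.override = TRUE
)

# Settings in this call are not affected by load.project() arguments,
# because apply.override is left at its default of FALSE.
add.config(
  header = "Private & Confidential"   # header in reports
)
```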
### Additional parameters

`munge_files`: Passing `munge_files` to `load.project()` allows you to run a given list of munge files from the `munge` directory.

> load.project(munge_files = c("01-preprocess.R", "03-output.R"))

`logs_sub_dir`: `logs_sub_dir` can be set to maintain separate log files for every run of the project. This creates a subdirectory under the `logs` directory, and the logs are then written to that subdirectory of `logs/`.

> load.project(logs_sub_dir = "08-02-2021")

`munge_sub_dir`: `munge_sub_dir` can be set to run files from a subdirectory under `munge`. For example, if the `munge` folder contains directories for multiple experiments, such as `munge/experiment1` and `munge/experiment2`, set `munge_sub_dir = "experiment1"` to run only the files in `munge/experiment1`.

> load.project(munge_sub_dir = "experiment1")

Running the command below will run files from `munge/experiment1` and write logs to `logs/experiment1/project.log`:
> load.project(munge_sub_dir="experiment1", logs_sub_dir="experiment1")
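
One possible way to keep a separate log directory per run is to derive `logs_sub_dir` from the current date. The sketch below assumes a day-month-year format similar to the `"08-02-2021"` example; the variable `run_dir` and the guard around `load.project()` are illustrative additions, there only so the snippet also runs outside a project.

```r
# Build a per-run subdirectory name from today's date.
# Assumption: day-month-year, similar in shape to "08-02-2021".
run_dir <- format(Sys.Date(), "%d-%m-%Y")

# Only call load.project() when ProjectTemplate is installed and the working
# directory looks like a project (assumption: config/global.dcf exists there).
if (requireNamespace("ProjectTemplate", quietly = TRUE) &&
    file.exists(file.path("config", "global.dcf"))) {
  ProjectTemplate::load.project(logs_sub_dir = run_dir)
}
```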