Title: | Cache R Expressions, Taking Their Dependencies into Account |
---|---|
Description: | Hash an expression with its dependencies and store its value, reloading it from a file as long as both the expression and its dependencies stay the same. |
Authors: | Ivan Krylov [aut, cre] |
Maintainer: | Ivan Krylov <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2 |
Built: | 2025-03-03 03:17:08 UTC |
Source: | https://github.com/aitap/depcache |
The functions in this package take an expression, walk its code to find its dependencies and calculate a hash of them. If a corresponding file already exists, it is loaded; otherwise, the expression is evaluated and its value is saved in the file. Optionally, this check may be performed every time a variable is accessed.
By default, a subdirectory of the current directory is used to store the cache files.
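The dependency walk described above can be approximated with base R alone. The following is a minimal sketch, not the package's own code walker: it uses `all.vars()` to list the variables an expression refers to and `mget()` to collect their current values. The names `expr`, `env`, `dep_names` and `dep_values` are illustrative only.

```r
# An expression captured without evaluating it
expr <- quote({ message("evaluating expression"); a + b * 2 })

# all.vars() lists variable names and leaves out function names
# such as `message`, `+` and `*`
dep_names <- all.vars(expr)

# Look up the current values of those variables in a chosen environment
env <- new.env()
assign("a", 1, envir = env)
assign("b", 2, envir = env)
dep_values <- mget(dep_names, envir = env)
```

Hashing `expr` together with `dep_values` would then give a key that changes whenever either the code or the values it reads change.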
Index of help topics:
cache               Evaluate an expression and cache its results
depcache-package    Cache R Expressions, Taking Their Dependencies into Account
depcache.options    Caching options
setCached           Cache-tracking assignment
Ivan Krylov
As an implementation detail, the package currently uses the 64-bit FNV-1a hash: http://www.isthe.com/chongo/tech/comp/fnv/.
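To illustrate how FNV-1a works, here is a small sketch of the 32-bit variant in base R. This is an assumption-laden illustration, not the package's implementation: the package is documented as using the 64-bit variant, and the 32-bit one is shown only because base R has no 64-bit integer type. The function name `fnv1a32` is hypothetical.

```r
# 32-bit FNV-1a over a raw vector, kept in doubles so that no
# intermediate value exceeds 2^53 and all arithmetic stays exact.
fnv1a32 <- function(bytes) {
  stopifnot(is.raw(bytes))
  hash <- 2166136261                      # offset basis, 0x811c9dc5
  for (byte in as.integer(bytes)) {
    # XOR only touches the lowest 8 bits of the hash
    low <- hash %% 256
    hash <- hash - low + bitwXor(low, byte)
    # Multiply by the FNV prime 16777619 = 2^24 + 403, modulo 2^32
    hash <- ((hash %% 256) * 16777216 + hash * 403) %% 4294967296
  }
  hash
}

hash_of_a <- fnv1a32(charToRaw("a"))
```

Each input byte is first XOR-ed into the hash and the result is then multiplied by the prime, which is what distinguishes FNV-1a from FNV-1.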
The reproducible package uses a similar approach to caching.
    a <- 1
    # will evaluate expression
    cache({ message('evaluating expression'); a + 1 }) # 2
    # will reuse cached value
    x %<-% { message('evaluating expression'); a + 1 } # 2
    x
    a <- 2
    # will recalculate the value
    x # 3
This function extracts all dependencies of an R expression, hashes them together with the expression itself and either loads the already-existing file, or evaluates the expression and stores the result in that file.
cache(expr, extra = NULL, ...)
expr |
An expression to evaluate or load from cache, unquoted. |
extra |
Any R value that should be considered part of the state deciding
whether the expression should be re-computed. For example, if
|
... |
Additional options, see |
Currently, the hash is obtained by means of serialisation. To make semantically identical values produce the same hash across a wide range of R versions, the following steps are taken:
When computing the hash of the serialized data (only the XDR format version 2 or 3 is supported), the first 14 bytes containing the header (including the version of R that serialized the data) are ignored.
Every function is “rebuilt” from its body before hashing, forcing R to discard the bytecode and the source references from the copy of the function before it's hashed.
Strings are converted to UTF-8 before hashing.
All this is done recursively.
The exact algorithm used and the way the hash is obtained are implementation details and may eventually change, though not without a good reason.
Other aspects of R data structures are currently not handled:
Nothing is done about environments. Due to them being reference objects, any fix-up must re-create them from scratch, taking potentially recursive dependencies into account, which is likely expensive.
Some S4 classes (like reference class implementations) just have different representations in different versions of R and third-party packages. They may mean the same thing, but they serialize to different byte sequences.
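Two of the normalisation steps described above, skipping the serialisation header and converting strings to UTF-8, can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's code: it only handles top-level character vectors, it does not rebuild functions or recurse into lists, and the helper name `payload_bytes` is hypothetical.

```r
# Serialise a value in XDR format version 2 and drop the 14-byte header,
# which includes the version of R that wrote the data.
payload_bytes <- function(value) {
  if (is.character(value)) value <- enc2utf8(value)
  bytes <- serialize(value, connection = NULL, ascii = FALSE,
                     xdr = TRUE, version = 2)
  bytes[-seq_len(14)]
}

header <- serialize(1:3, connection = NULL, version = 2)[1:14]

# The same text stored in two different encodings
latin1_text <- rawToChar(as.raw(0xe9))
Encoding(latin1_text) <- "latin1"
utf8_text <- intToUtf8(233)

same_payload <- identical(payload_bytes(latin1_text),
                          payload_bytes(utf8_text))
```

The bytes returned by `payload_bytes` are what a hash function would consume in this sketch. Without the UTF-8 conversion the two strings would serialise differently even though they represent the same text.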
The result of evaluating expr, either directly, or loaded from cache.
    a <- 1
    # will evaluate the expression the first time
    cache({ message('evaluating expression'); a + 1 }) # 2
    # saved value of the expression will be used
    cache({
      message('evaluating expression') # even if written a bit differently
      a + 1
    }) # 2
    a <- -1
    # expression evaluated again because dependencies changed
    cache({ message('evaluating expression'); a + 1 }) # 0
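The load-or-evaluate behaviour shown in these examples can be mimicked with a toy function in base R. This is a rough sketch under several assumptions and is not how the package is implemented: it keys the cache file on an MD5 digest from `tools::md5sum()` instead of FNV-1a, finds dependencies with `all.vars()`, and applies none of the normalisation steps described earlier. The names `toy_cache`, `tick`, `evaluations` and `cache_dir` are hypothetical.

```r
toy_cache <- function(expr, dir, env = parent.frame()) {
  expr <- substitute(expr)
  # Dependencies: the variables named in the expression and their values
  dep_names <- sort(all.vars(expr))
  deps <- mget(dep_names, envir = env, inherits = TRUE,
               ifnotfound = list(NULL))
  # Key: digest of the deparsed expression together with the dependencies
  key_file <- tempfile()
  on.exit(unlink(key_file))
  writeBin(serialize(list(deparse(expr), deps), NULL), key_file)
  key <- unname(tools::md5sum(key_file))
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(key, ".rds"))
  if (file.exists(path)) return(readRDS(path))   # cache hit
  value <- eval(expr, env)                       # cache miss
  saveRDS(value, path)
  value
}

# Count how many times the expression is really evaluated
evaluations <- 0
tick <- function() evaluations <<- evaluations + 1
cache_dir <- tempfile("toy-depcache-")

a <- 1
first <- toy_cache({ tick(); a + 1 }, dir = cache_dir)   # evaluated
second <- toy_cache({ tick(); a + 1 }, dir = cache_dir)  # loaded from file
a <- -1
third <- toy_cache({ tick(); a + 1 }, dir = cache_dir)   # evaluated again
```

Because `tick` is called as a function, `all.vars()` does not report it as a dependency, so only the value of `a` influences the key.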
Control how the dependencies are gathered and hashed to determine the name of the file to load the cached object from.
    depcache.options(
      defaults = getOption("depcache.version", '0.2'),
      skip = getOption("depcache.skip", NULL),
      dir, compress, local.only, format.version,
      eval.ellipsis, trace.functions
    )
defaults |
A string containing the version of depcache to get other
defaults from. If not set, takes the value from the
To make the caching more reproducible against package updates, call
Currently, versions |
skip |
A character vector of variables to omit from automatically-gathered dependencies. Variables carrying unintended or unimportant state, which would otherwise interfere with obtaining a reproducible hash, should be listed here. This may be useful when a symbol encountered in the expression doesn't signify a variable in the evaluation frame (e.g. non-standard evaluation when plotting with lattice), or when the variable is being assigned to as part of the expression. Defaults to the |
dir |
The directory to store the cache files inside. |
compress |
Passed as the |
local.only |
If |
format.version |
Passed as the |
eval.ellipsis |
Whether to consider |
trace.functions |
Whether to visit the function definitions inside the
expressions being considered for caching. If |
In all cases, explicitly passed arguments override settings from options(), which override the defaults. Depending on the defaults argument or the depcache.version option, the defaults may change; setting it explicitly will help your scripts stay forward-compatible.
Here you can find all the versioned parameters with their defaults:
Parameter | Option name | 0.1 | 0.2 |
---|---|---|---|
dir | depcache.dir | ‘.depcache’ | ‘.depcache’ |
compress | depcache.compress | TRUE | TRUE |
local.only | depcache.local.only | TRUE | TRUE |
format.version | depcache.format.version | 2 | 2 |
eval.ellipsis | depcache.eval.ellipsis | FALSE | TRUE |
trace.functions | depcache.trace.functions | TRUE | FALSE |
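The precedence described above (an explicitly passed argument first, then the matching entry in options(), then the versioned default) can be sketched with `getOption()`. This is an illustrative helper only, not the package's internal code; `resolve_setting` and `versioned_defaults` are hypothetical names, while the option names are taken from the table.

```r
# Resolve one setting: explicit argument > options() > versioned default
resolve_setting <- function(name, explicit = list(), defaults = list()) {
  if (!is.null(explicit[[name]])) return(explicit[[name]])
  getOption(paste0("depcache.", name), defaults[[name]])
}

versioned_defaults <- list(dir = ".depcache", compress = TRUE)

# Temporarily set one option and clear another, remembering the old values
old <- options(depcache.compress = FALSE, depcache.dir = NULL)

from_default <- resolve_setting("dir", defaults = versioned_defaults)
from_option <- resolve_setting("compress", defaults = versioned_defaults)
from_explicit <- resolve_setting("compress",
                                 explicit = list(compress = TRUE),
                                 defaults = versioned_defaults)

options(old)  # restore the previous option values
```

Here `from_default` falls through to the default, `from_option` picks up the value set through options(), and `from_explicit` shows the explicit argument winning over both.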
This function shouldn't normally be called by the user (except, perhaps, to verify the parameters about to be passed to the caching functions), but it is automatically invoked on every call to cache, setCached, or the use of cache-tracking assignment operators %<-% and %->%. Any additional options passed to the functions as ... are handled here, and so are the global options.
A list containing the settings to be used by the caching system.
dir |
The directory used for storage of the cache files. |
compress |
Passed to |
skip |
Variables to skip when hashing the dependencies of the expressions. |
local.only |
Whether to ignore non-local dependencies. |
format.version |
Passed to |
    # The output is affected by the user's use of options(...) and the
    # current version of the package
    options(depcache.local.only = FALSE)
    print(depcache.options(format.version = 3))
    options(depcache.local.only = TRUE)
    print(depcache.options())
    # "skip" makes it possible to avoid mistaking arguments evaluated in a
    # non-standard way for local variables
    speed <- 1
    options(depcache.skip = 'speed')
    x %<-% { message('fitting the model'); lm(dist ~ speed, cars) }
    speed <- 0
    # not fitted again despite change in local variable "speed"
    summary(x)
Cache expression values and automatically recalculate them when their dependencies change
    symbol %<-% expr
    expr %->% symbol
    setCached(symbol, expr, extra = NULL, ...)
symbol |
A variable name to associate with the expression, unquoted. |
expr |
The expression to cache, taking dependencies into account. |
extra |
An unquoted expression to be considered an extra part of the state, in addition to the automatically determined dependencies. Will be evaluated every time the variable is accessed to determine whether it should be recalculated. |
... |
Additional settings, see |
Sets up the variable symbol to automatically recalculate the value of expr any time its dependencies change, using makeActiveBinding and the same mechanisms that power cache.
Initially, expr is loaded from cache or evaluated, and the hash is remembered. When the variable named by symbol is accessed, its dependencies are hashed together with expr (this may be done recursively if the dependencies are themselves active bindings set up the same way). If the hash changes, the value of expr is again loaded from cache (if available) or evaluated anew.
To prevent infinite loops during dependency calculation, symbol is automatically skipped, but a self-dependent expr is probably a bad idea anyway.
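The recalculate-on-access behaviour described above can be sketched with `makeActiveBinding()` from base R. This is a deliberately simplified illustration, not the package's implementation: the value of the single dependency `a` stands in for the hash, nothing is written to disk, and the names `env`, `evaluations`, `last_key` and `last_value` are hypothetical.

```r
env <- new.env()
env$a <- 1
evaluations <- 0

local({
  last_key <- NULL     # stand-in for the remembered hash
  last_value <- NULL   # value computed for that hash
  makeActiveBinding("x", function() {
    key <- env$a       # a real implementation would hash expr and its dependencies
    if (!identical(key, last_key)) {
      evaluations <<- evaluations + 1
      last_value <<- env$a + 1
      last_key <<- key
    }
    last_value
  }, env)
})

first <- env$x      # evaluated on first access
again <- env$x      # key unchanged, remembered value reused
env$a <- -1
changed <- env$x    # dependency changed, evaluated again
```

Every read of `env$x` runs the binding function, which is why the check against the remembered key happens on each access and not only when `a` is assigned.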
Returns the value of expr, invisibly. Called for the side effect of creating an active binding with a name specified by symbol.
    a <- 1
    # will evaluate the expression first
    x %<-% { message('evaluating expression'); a + 1 }
    x # 2
    # will reuse cached value
    {
      message('evaluating expression')
      a + 1 # even if written a bit differently
    } %->% y
    y # 2
    a <- -1
    # will evaluate the expression again
    x # 0
    # will load the new cached value
    y # 0
    (setCached(z, x + y)) # 0
    a <- 0
    # recalculates two out of three
    z # 2