| Title: | Vector Look-Ups and Safer Sampling |
|---|---|
| Description: | A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation. |
| Authors: | Magnus Thor Torfason |
| Maintainer: | Magnus Thor Torfason <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.3.9006 |
| Built: | 2026-05-05 06:22:16 UTC |
| Source: | https://github.com/torfason/zmisc |
Assert specific values and set memberships
assert_choice(x, choices, ...)
assert_environment(x, ...)
| Argument | Description |
|---|---|
| x | The variable to assert. |
| choices | A vector of values of which x must be an element. |
| ... | Additional parameters passed to the corresponding checkmate functions. |
The original object if the assertion passes.
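The contract described above is: check membership, fail loudly, and otherwise hand back the original object. The sketch below illustrates that contract in base R. The helper assert_choice_sketch() is hypothetical and is not the zmisc implementation, which delegates to checkmate. It also assumes that x must be a single, non-missing value.

```r
# Hypothetical base-R sketch of a choice assertion (not the zmisc source).
# Assumption: a valid choice is a single, non-missing value found in `choices`.
assert_choice_sketch <- function(x, choices) {
  if (length(x) != 1L || is.na(x) || !(x %in% choices)) {
    stop(sprintf("`x` must be one of: %s", paste(choices, collapse = ", ")),
         call. = FALSE)
  }
  invisible(x)  # return the original object so the call can be chained
}

method <- assert_choice_sketch("mean", c("mean", "median"))
try(assert_choice_sketch("mode", c("mean", "median")))
```

Because the value is returned on success, an assertion can wrap an argument inline, for example `m <- assert_choice_sketch(m, opts)`.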
assert_dots_empty() is an alias for rlang::check_dots_empty(), provided
for naming consistency with the other assertion functions. It throws an error
if any arguments are passed through the ... argument.
assert_dots_empty(env = caller_env(), error = NULL, call = caller_env(), action = abort)
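| Argument | Description |
|---|---|
| env | Environment in which to look for the ... arguments. |
| error | An optional error handler; see rlang::check_dots_empty(). |
| call | The execution environment of a currently running function, e.g. caller_env(). |
| action | See rlang::check_dots_empty(). |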
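An empty-dots check is useful because ... silently absorbs misspelled argument names. The sketch below shows the idea with only base R's ...length(). The function summarise_values() is a hypothetical example and is unrelated to the package. In practice, assert_dots_empty() (via rlang) produces richer error messages.

```r
# Hypothetical example function that rejects anything passed through `...`.
# Base R only; rlang::check_dots_empty() gives richer messages.
summarise_values <- function(x, ..., na.rm = FALSE) {
  if (...length() > 0L) {
    stop("`...` must be empty; did you misspell an argument name?", call. = FALSE)
  }
  sum(x, na.rm = na.rm)
}

summarise_values(c(1, NA, 3), na.rm = TRUE)       # 4
try(summarise_values(c(1, NA, 3), na.mr = TRUE))  # typo lands in `...` and is caught
```

Without the check, the misspelled `na.mr = TRUE` would be swallowed by `...`, and the function would silently return NA.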
The lookup() function implements lookup of values (such as variable names)
from a lookup table which maps keys onto values (such as variable labels or
descriptions).
The lookup table can be in the form of a two-column data.frame, a named
vector, or a list. If the table is a data.frame, the key column should be
named either key or name, and the value column should be named value. If the
lookup table is a named vector or list, the names are used as the keys, and
the returned values are taken from the elements of the vector or list.
The underlying lookup is done using base::match(), and all atomic data
types except factor are supported. Factors are excluded because of the
ambiguity in what should be looked up (the values or the levels). It is
important that x, .default, and the columns of lookup_table all have the same
type (specifically, the same base::mode()). If the lookup table is specified
as a vector or list, only character values of x are supported, because
names(lookup_table) is always of mode character.
Original values are returned if they are not found in the lookup table.
Alternatively, a .default can be specified for values that are not found.
Note that NA can be specified as one of the keys in order to look up NA
values (only when using a data.frame as the lookup table).
Any names or attributes of x are preserved.
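Because the lookup rests on base::match(), the core mechanism can be sketched in a few lines of base R. lookup_sketch() is a hypothetical helper, not the package source. It assumes a data.frame table with key and value columns; a named vector v could first be converted with data.frame(key = names(v), value = unname(v)). match() also matches NA against an NA key, which is why NA lookups work with a data.frame table.

```r
# Hypothetical match()-based lookup (not the zmisc source).
# Assumes `lookup_table` is a data.frame with `key` and `value` columns.
lookup_sketch <- function(x, lookup_table, .default = x) {
  idx <- match(x, lookup_table$key)      # position of each x among the keys, NA if absent
  result <- lookup_table$value[idx]      # NA wherever there was no match
  unmatched <- is.na(idx)
  result[unmatched] <- rep_len(.default, length(x))[unmatched]  # fill from .default
  attributes(result) <- attributes(x)    # keep names/attributes of x
  result
}

fruits <- data.frame(key = c("a", "b", "c"),
                     value = c("Apple", "Banana", "Cherry"),
                     stringsAsFactors = FALSE)
lookup_sketch(letters[1:5], fruits)
# "Apple" "Banana" "Cherry" "d" "e"

numeric_table <- data.frame(key = c(1:5, NA), value = c(sqrt(1:5), 99999))
lookup_sketch(c(0:6, NA), numeric_table)  # NA key matches NA in x -> 99999
```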
The lookuper() function returns a function equivalent to the lookup()
function, except that instead of taking a lookup table as an argument, the
lookup table is embedded in the function itself.
This can be very useful, in particular when using the lookup function as an
argument to other functions that expect a function which maps
character->character (or other data types), but do not offer a good way
to pass additional arguments to that function.
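Embedding the table in the function is a closure, and the closure lets the result be passed wherever a one-argument function is expected. The factory make_lookup() below is a hypothetical base-R sketch of that pattern, not the package's lookuper(). It assumes the table is a named vector or a list of scalars.

```r
# Hypothetical closure factory in the spirit of lookuper() (not the zmisc source).
# Assumes `lookup_table` is a named vector or a list of length-one elements.
make_lookup <- function(lookup_table, .default = NULL) {
  values <- unlist(lookup_table)  # flatten list -> named vector
  keys <- names(values)
  force(.default)
  function(x) {
    idx <- match(x, keys)
    # Unmatched elements come from .default if given, otherwise from x itself
    fallback <- if (is.null(.default)) x else rep_len(.default, length(x))
    ifelse(is.na(idx), fallback, values[idx])
  }
}

lookup_fruits <- make_lookup(list(a = "Apple", b = "Banana", c = "Cherry"))
lookup_fruits(letters[1:5])              # "Apple" "Banana" "Cherry" "d" "e"
lapply(list(c("a", "z"), "c"), lookup_fruits)  # usable wherever a function is expected
```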
lookup(x, lookup_table, ..., .default = x)
lookuper(lookup_table, ..., .default = NULL)
| Argument | Description |
|---|---|
| x | A vector whose elements are to be looked up. |
| lookup_table | The lookup table to use. |
| ... | Reserved for future use. |
| .default | Values to use for elements that are not found in the lookup table. By default, unmatched elements of x are returned unchanged. |
The lookup() function returns a vector based on x, with
values replaced with the lookup values from lookup_table. Any values not
found in the lookup table are taken from .default.
The lookuper() function returns a function that takes a vector as its
argument x and returns, for each element, either the corresponding value from
the embedded lookup table or, for elements not found in the lookup table, the
original value from x (or the value from .default, if one was specified).
fruit_lookup_vector <- c(a = "Apple", b = "Banana", c = "Cherry")
lookup(letters[1:5], fruit_lookup_vector)
lookup(letters[1:5], fruit_lookup_vector, .default = NA)

mtcars_lookup_data_frame <- data.frame(
  name = c("mpg", "hp", "wt"),
  value = c("Miles/(US) gallon", "Gross horsepower", "Weight (1000 lbs)"))
lookup(names(mtcars), mtcars_lookup_data_frame)

# A more complex example, with numeric and NA values
numeric_lookup_table <- data.frame(
  key = c(1:5, NA),
  value = c(sqrt(1:5), 99999))
lookup(c(0:6, NA), numeric_lookup_table)

lookup_fruits <- lookuper(list(a = "Apple", b = "Banana", c = "Cherry"))
lookup_fruits(letters[1:5])

lookup_fruits_nomatch_na <- lookuper(list(a = "Apple", b = "Banana", c = "Cherry"), .default = NA)
lookup_fruits_nomatch_na(letters[1:5])
This function adds level/label information as an annotation to either factors
or labelled variables. This function is called notate() rather than
annotate() to avoid conflict with ggplot2::annotate(). It is a generic that
can operate either on individual vectors or on a data.frame.
When printing labelled variables from a tibble in a console, both the
numeric value and the text label are shown, but no variable labels. When
using the View() function, only variable labels are shown but no value
labels. For factors, there is no way to view the integer levels and values at
the same time.
To allow both variable and value labels to be viewed at the same time, this
function converts both factor and labelled variables to character, including
both the numeric levels (factor integer codes or labelled values) and the
character values (factor levels or value labels) in the output.
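The core idea for factors, putting the integer code and its text side by side in a single character value, can be sketched in base R as follows. notate_factor_sketch() and notate_df_sketch() are hypothetical helpers. The "[code] label" format is an assumption made for illustration and is not necessarily the format notate() produces. Labelled vectors are not handled here.

```r
# Hypothetical sketch of the idea behind notate() for factors (not the zmisc source).
# The "[code] label" format is an illustrative assumption.
notate_factor_sketch <- function(f) {
  stopifnot(is.factor(f))
  out <- paste0("[", as.integer(f), "] ", as.character(f))
  out[is.na(f)] <- NA_character_  # keep missing values missing
  out
}

# Apply column-wise to a data.frame, leaving non-factor columns untouched
notate_df_sketch <- function(d) {
  d[] <- lapply(d, function(col) if (is.factor(col)) notate_factor_sketch(col) else col)
  d
}

notate_factor_sketch(factor(c("alpha", "bravo", NA, "delta")))
# "[1] alpha" "[2] bravo" NA "[3] delta"
```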
notate(x)
| Argument | Description |
|---|---|
| x | The object (either a vector or a data.frame) to process. |
The processed data.frame, suitable for viewing, in particular
through the View() function.
if (getRversion() >= "4") {
  d <- data.frame(
    chr = letters[1:4],
    fct = factor(c("alpha", "bravo", "chrly", "delta")),
    lbl = ll_labelled(c(1, 2, 3, NA),
                      labels = c(one = 1, two = 2),
                      label = "A labelled vector")
  )
  dn <- notate(d)
  dn
  # View(dn)
}
The most common checkmate assertion functions, adapted to output rlang-style
error messages on failed assertions. The actual checking is done by
checkmate::qtest(), checkmate::check_flag(), and related functions.
| R Type | Scalar | Vector |
|---|---|---|
| logical | assert_flag(x) | assert_logical(x) |
| character | assert_string(x) | assert_character(x) |
| numeric | assert_number(x) | assert_numeric(x) |
| integer | assert_inumber(x)⁴ | assert_integer(x) |
| double | assert_dnumber(x)⁴ | assert_double(x) |
| integerish¹ | assert_int(x) | assert_integerish(x) |
| naturalish² | assert_count(x) | assert_naturalish(x)⁴ |
| factor | ³ | assert_factor(x) |
| complex | ³ | assert_complex(x) |
| raw | ³ | assert_raw(x) |
| Date | assert_day(x) | assert_date(x) |
¹ integerish refers to functional integers (numbers that are very close to integer values), regardless of type (integer or double).
² naturalish refers to functional integers restricted to the natural numbers (zero and positive numbers).
³ No assertion functions are provided for scalar factor, complex, or raw values.
⁴ Not available in the checkmate package.
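The "functional integer" notions in the footnotes can be expressed as predicates with a tolerance. The sketch below is hypothetical base R, not the package's or checkmate's code. The tolerance sqrt(.Machine$double.eps) is an assumption chosen for illustration, and non-finite values (NA, Inf) are rejected by choice.

```r
# Hypothetical predicates mirroring the integerish/naturalish footnotes.
# Assumptions: tolerance sqrt(.Machine$double.eps); NA and Inf are rejected.
is_integerish <- function(x, tol = sqrt(.Machine$double.eps)) {
  is.numeric(x) && all(is.finite(x)) && all(abs(x - round(x)) <= tol)
}

is_naturalish <- function(x, tol = sqrt(.Machine$double.eps)) {
  is_integerish(x, tol) && all(round(x) >= 0)  # zero and positive only
}

is_integerish(c(1, 2, 3))    # TRUE: doubles, but whole numbers
is_integerish(0.1 * 3 * 10)  # TRUE: 3.0000000000000004 is "very close" to 3
is_integerish(2.5)           # FALSE
is_naturalish(c(0L, 5L))     # TRUE
is_naturalish(-1)            # FALSE
```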
qassert(x, ...)
assert_flag(x, ...)
assert_string(x, ...)
assert_number(x, ...)
assert_inumber(x, ...)
assert_dnumber(x, ...)
assert_int(x, ...)
assert_count(x, ...)
assert_day(x, ...)
assert_logical(x, ...)
assert_character(x, ...)
assert_numeric(x, ...)
assert_integer(x, ...)
assert_double(x, ...)
assert_integerish(x, ...)
assert_naturalish(x, ...)
assert_factor(x, ...)
assert_complex(x, ...)
assert_raw(x, ...)
assert_date(x, ...)
assert_scalar(x, ...)
assert_atomic(x, ...)
assert_list(x, ...)
assert_class(x, ...)
assert_data_frame(x, ...)
assert_data_table(x, ...)
assert_tibble(x, ...)
| Argument | Description |
|---|---|
| x | The variable to assert. |
| ... | Additional parameters passed to the corresponding checkmate functions. |
The original object if the assertion passes.
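checkmate-style packages split each assertion into a check that returns TRUE or a message, plus an assert that turns the message into an error. The base-R sketch below illustrates that split. Both helpers are hypothetical, and where the package produces rlang-style messages, this sketch uses base stop() so that it stays self-contained.

```r
# Hypothetical check/assert split (not the zmisc or checkmate source).
check_flag_sketch <- function(x) {
  if (is.logical(x) && length(x) == 1L && !is.na(x)) {
    TRUE
  } else {
    "Must be a single TRUE or FALSE value"
  }
}

# Turn any check_*() result into an assertion that returns x on success
assert_from_check <- function(x, check, arg = deparse(substitute(x))) {
  res <- check(x)
  if (!isTRUE(res)) stop(sprintf("`%s`: %s.", arg, res), call. = FALSE)
  invisible(x)
}

verbose <- TRUE
assert_from_check(verbose, check_flag_sketch)       # passes, returns TRUE invisibly
try(assert_from_check(c(TRUE, FALSE), check_flag_sketch))
```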
Recode elements of a vector using a series of formulas (lhs ~ rhs) passed
via .... Each lhs is matched against elements of x, and the
corresponding rhs provides the new value.
This function is closely based on dplyr::case_match(), with minimal changes
to make it more intuitive for recoding tasks. In particular, rather than
setting unmatched values to NA by default, it sets them to .default, which
itself defaults to x, so unmatched values remain unchanged. The output type
can be controlled with .ptype. When .ptype is not given, the type of .default
is used, which means that the type can be changed by setting .default either
to NA or to a value of the same type as the rhs formula values. Incompatibility
between the rhs values and the .ptype results in a type error.
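The formula mechanics (evaluate each lhs, match it against x, and write the rhs into those positions) can be sketched in base R. recode_tilde_sketch() is hypothetical. Unlike recode_tilde(), it does no vctrs-style type checking, so R's ordinary coercion applies. For example, recoding 1:5 with 1 ~ 10 yields a double here. In this sketch, the first matching formula wins.

```r
# Hypothetical base-R sketch of lhs ~ rhs recoding (no type checking; first match wins).
recode_tilde_sketch <- function(x, ..., .default = x) {
  out <- rep_len(.default, length(x))
  matched <- rep(FALSE, length(x))
  for (f in list(...)) {
    stopifnot(inherits(f, "formula"), length(f) == 3L)
    lhs <- eval(f[[2]], environment(f))          # values to match in x
    rhs <- eval(f[[3]], environment(f))          # replacement value(s)
    hit <- (!matched) & (x %in% lhs)             # only positions not yet recoded
    out[hit] <- rep_len(rhs, length(x))[hit]
    matched <- matched | hit
  }
  out
}

recode_tilde_sketch(letters, "a" ~ "first", "z" ~ "last")
recode_tilde_sketch(1:4, 1 ~ "low", 2 ~ "medium", 3 ~ "high", .default = NA)
# "low" "medium" "high" NA
```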
recode_tilde(x, ..., .default = x, .ptype = NULL)
| Argument | Description |
|---|---|
| x | A vector to recode. |
| ... | Formulae specifying recoding rules, recoding from lhs to rhs (lhs ~ rhs). |
| .default | Default value for unmatched inputs. Defaults to x. |
| .ptype | Optional output type. Defaults to the type of .default. |
A vector with recoded values.
recode_tilde(letters, "a" ~ "first", "z" ~ "last")
recode_tilde(1:5, 1 ~ 10, 2 ~ 20)

# Recoding to different type requires explicit .default values
recode_tilde(1:4, 1 ~ "low", 2 ~ "medium", 3 ~ "high", .default = NA)
The zample() function duplicates the functionality of sample(), except that
it does not attempt the (sometimes dangerous) user-friendliness of
reinterpreting the first argument as a number n when that argument has
length 1. zample() always treats its first argument as a vector of elements
to be sampled, so your code won't break in unexpected ways when the input
vector happens to be of length 1.
zample(x, size = length(x), replace = FALSE, prob = NULL)
| Argument | Description |
|---|---|
| x | The vector to sample from. |
| size | The number of elements to sample from x. |
| replace | Should elements be replaced after sampling? Defaults to FALSE. |
| prob | A vector of probability weights (defaults to equal probabilities). |
If what you really want is to sample from the integers 1 to n, you can use
sample(n) or sample.int(n) (but make sure to pass only values of length one
to those functions).
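One way to get zample()'s behavior is to sample positions rather than values. sample.int(length(x), ...) always draws from 1:length(x), so a length-one x can only ever yield its own element. R's own ?sample help page suggests a similar resample() idiom. zample_sketch() below is a hypothetical illustration, not necessarily how zample() is implemented.

```r
# Hypothetical index-based sampler; avoids sample()'s length-one special case.
zample_sketch <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

set.seed(42); zample_sketch(7)   # always 7, the only element of x
set.seed(42); sample(7)          # a permutation of 1:7 instead
```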
The resulting sample
# For vectors of length 2 or more, zample() and sample() are identical
set.seed(42); zample(7:11)
set.seed(42); sample(7:11)

# For vectors of length 1, zample() will still sample from the vector,
# whereas sample() will "magically" switch to interpreting the input
# as a number n, and sampling from the vector 1:n.
set.seed(42); zample(7)
set.seed(42); sample(7)

# The other arguments work in the same way as for sample()
set.seed(42); zample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
set.seed(42); sample(7:11, size=13, replace=TRUE, prob=(5:1)^3)

# Of course, sampling more than the available elements without
# setting replace=TRUE will result in an error
set.seed(42); tryCatch(zample(7, size=2), error=wrap_error)
The zeq() function creates an increasing integer sequence, but differs from
the standard seq() in that it will not silently generate a decreasing
sequence when the second argument is smaller than the first. If the second
argument is one smaller than the first, it generates an empty sequence; if
the difference is greater, it throws an error.
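The three cases above (increasing, empty, and error) map directly onto a short guard in front of the `:` operator. zeq_sketch() is a hypothetical illustration that assumes whole-number arguments. It is not the package source.

```r
# Hypothetical sketch of zeq()'s rules; assumes whole-number arguments.
zeq_sketch <- function(from, to) {
  if (to < from - 1) stop("`to` must not be less than `from - 1`.", call. = FALSE)
  if (to == from - 1) return(integer(0))  # empty sequence, never a decreasing one
  from:to
}

zeq_sketch(11, 15)   # 11 12 13 14 15
zeq_sketch(11, 10)   # integer(0)
n <- 0
for (i in zeq_sketch(1, n)) print(i)  # loop body never runs, unlike 1:n == c(1, 0)
```

The loop case is the practical motivation: `for (i in 1:n)` runs twice when n is 0, whereas a zeq-style sequence runs zero times.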
zeq(from, to)
| Argument | Description |
|---|---|
| from | The lower bound of the sequence. |
| to | The higher bound of the sequence. |
A sequence ranging from from to to
# For increasing sequences, zeq() and seq() are identical
zeq(11,15)
zeq(11,11)

# If second argument equals first-1, an empty sequence is returned
zeq(11,10)

# If second argument is less than first-1, the function throws an error
tryCatch(zeq(11,9), error=wrap_error)
The zingle() function returns the first element in a vector, but only if
all the other elements are identical to the first one (the vector only has a
zingle value). If the elements are not all identical, it throws an error.
The vector must contain at least one non-NA value, or the function errors
out as well. This is especially useful in aggregations, when all values in a
given group should be identical, but you want to make sure.
zingle(x, na.rm = FALSE)
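| Argument | Description |
|---|---|
| x | Vector of elements that should all be identical. |
| na.rm | Should NA values be removed before the elements are compared? Defaults to FALSE. |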
Optionally takes a na.rm parameter, similarly to sum, mean and other
aggregate functions. If TRUE, NA values will be removed prior to
comparing the elements, so the function will accept input values that contain
a combination of the single value and any NA values (but at least one
non-NA value is required).
Only values are tested for equality. Any names are simply ignored, and the result is an unnamed value. This is in line with how other aggregation functions handle names.
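The rules above (optional NA removal, names ignored, at least one non-NA value, and exactly one distinct value) can be sketched in base R. zingle_sketch() is hypothetical, not the package source. The example uses base tapply() instead of dplyr so that the block runs without extra packages.

```r
# Hypothetical sketch of zingle()'s rules (not the zmisc source).
zingle_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  x <- unname(x)  # names are ignored
  if (length(x) == 0L || all(is.na(x))) {
    stop("`x` must contain at least one non-NA value.", call. = FALSE)
  }
  if (length(unique(x)) != 1L) {
    stop("All elements of `x` must be identical.", call. = FALSE)
  }
  x[[1]]
}

d <- data.frame(id = c(1, 2, 1), name = c("James", "Jack", "James"),
                fouls = c(3, 2, 4), stringsAsFactors = FALSE)
tapply(d$name, d$id, zingle_sketch)  # one name per id, or an error if they disagree
```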
The zingle element in the vector
# If all elements are identical, all is good.
# The value of the element is returned.
zingle(c("Alpha", "Alpha", "Alpha"))

# If any elements differ, an error is thrown
tryCatch(zingle(c("Alpha", "Beta", "Alpha")), error=wrap_error)

if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  d <- tibble::tribble(
    ~id, ~name, ~fouls,
    1, "James", 3,
    2, "Jack", 2,
    1, "James", 4
  )

  # If the data is of the correct format, all is good
  d %>%
    dplyr::group_by(id) %>%
    dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
}

if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  # If a name does not match its ID, we should get an error
  d[1,"name"] <- "Jammes"
  tryCatch({
    d %>%
      dplyr::group_by(id) %>%
      dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
  }, error=wrap_error)
}