| Title: | Bidirectional Spreadsheet-Formula to OOXML Translator |
|---|---|
| Description: | Translates spreadsheet formulas between the user-facing format (as displayed to the end user) and the OOXML storage format (as found in .xlsx XML source). Handles future function prefixes (_xlfn.), web-service namespaces (_xlfn._xlws.), LAMBDA parameter prefixes (_xlpm.), spilled range operators (ANCHORARRAY), implicit intersection (SINGLE), and optional localised function name translation across the locales in the Microsoft Terminology Collection (run setup_terminology() once to download and cache the translations). |
| Authors: | Jan Marvin Garbuszus [aut, cre] |
| Maintainer: | Jan Marvin Garbuszus <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-05-16 11:04:31 UTC |
| Source: | https://github.com/JanMarvin/spaghetti |
Tokenises a formula and reports any function names that are not in the function registry, with spelling suggestions for likely typos.
check_formula(formula, locale = NULL)check_formula(formula, locale = NULL)
formula |
Character scalar or vector of formulas. |
locale |
Locale code or NULL. When set, localised names are translated before checking. |
Unlike to_xml(), this does not translate — it is a pure linting pass.
A data frame with columns formula, fn, suggestion.
fn is the unknown function name. suggestion is a comma-separated
string of close matches, or NA if no suggestion was found.
Returns an empty data frame (invisibly) if no issues are found.
check_formula("=SUIM(A1:A10)") # typo: SUIM -> SUM check_formula("=VLOKUP(A1,B:B,2,0)") # typo: VLOKUP -> VLOOKUP check_formula(c("=SUM(A1)", "=FLITER(A1:A10,B1:B10>0)"))check_formula("=SUIM(A1:A10)") # typo: SUIM -> SUM check_formula("=VLOKUP(A1,B:B,2,0)") # typo: VLOKUP -> VLOOKUP check_formula(c("=SUM(A1)", "=FLITER(A1:A10,B1:B10>0)"))
Remove the cached terminology RDS.
clear_terminology()clear_terminology()
Invisibly TRUE if a cache was removed, FALSE if none existed.
Convert an OOXML storage formula to user-facing format
from_xml(formula, locale = NULL)from_xml(formula, locale = NULL)
formula |
Character scalar or vector. OOXML formula(s), with or without |
locale |
Two-letter locale code or NULL. When set, function names are translated to the target locale and the locale separator is used in output. |
Character scalar or vector: user-facing formula(s) starting with =.
from_xml("=_xlfn.SEQUENCE(10)") from_xml("=_xlfn.LAMBDA(_xlpm.temp, (5/9) * (_xlpm.temp-32))(100)") from_xml("=_xlfn._xlws.FILTER(A1:A10,B1:B10>5)") from_xml("=SUM(_xlfn.ANCHORARRAY(A1))") ## Not run: from_xml("=_xlfn.SEQUENCE(10)", locale = "de") from_xml(c("=_xlfn.SEQUENCE(5)", "=SUM(_xlfn.ANCHORARRAY(A1))"))from_xml("=_xlfn.SEQUENCE(10)") from_xml("=_xlfn.LAMBDA(_xlpm.temp, (5/9) * (_xlpm.temp-32))(100)") from_xml("=_xlfn._xlws.FILTER(A1:A10,B1:B10>5)") from_xml("=SUM(_xlfn.ANCHORARRAY(A1))") ## Not run: from_xml("=_xlfn.SEQUENCE(10)", locale = "de") from_xml(c("=_xlfn.SEQUENCE(5)", "=SUM(_xlfn.ANCHORARRAY(A1))"))
Reads from the cached Microsoft Terminology Collection table; available
only after setup_terminology() has been run. Returns NA for every
input if the cache is missing.
function_description(fn)function_description(fn)
fn |
Character vector of function names (English, any case). |
Character vector of descriptions (NA where not found / available).
function_description(c("SUM", "XLOOKUP", "LAMBDA"))function_description(c("SUM", "XLOOKUP", "LAMBDA"))
Useful for inspecting the registry without running a full translation.
function_prefix(fn)function_prefix(fn)
fn |
Character vector of function names. |
Character vector: "legacy", "xlfn", or "xlws".
function_prefix(c("SUM", "SEQUENCE", "FILTER", "LAMBDA", "XLOOKUP"))function_prefix(c("SUM", "SEQUENCE", "FILTER", "LAMBDA", "XLOOKUP"))
Columns: fn (English name), description (if available), and one
column per supported locale with the localised function name (NA if
not translated for that locale).
function_table()function_table()
If setup_terminology() hasn't been run, returns a single-column
data frame with just fn.
data.frame
setup_terminology(), supported_locales()
head(function_table())head(function_table())
Check whether locale terminology has been loaded.
has_terminology()has_terminology()
Logical scalar.
setup_terminology(), clear_terminology()
has_terminology()has_terminology()
A formula is considered "already OOXML" if it contains at least one
_xlfn., _xlws., or _xlpm. token.
is_ooxml(formula)is_ooxml(formula)
formula |
Character scalar. |
Logical.
is_ooxml("=_xlfn.SEQUENCE(10)") # TRUE is_ooxml("=SEQUENCE(10)") # FALSEis_ooxml("=_xlfn.SEQUENCE(10)") # TRUE is_ooxml("=SEQUENCE(10)") # FALSE
Converts to OOXML then back to user-facing form. Useful for testing idempotency.
round_trip(formula, locale = NULL, out_locale = NULL)round_trip(formula, locale = NULL, out_locale = NULL)
formula |
Character scalar. |
locale |
Locale for input formula (passed to |
out_locale |
Locale for output (passed to |
Named list with xml (OOXML form) and formula (round-tripped
user-facing form).
round_trip("=LAMBDA(x, x * 2)(5)")round_trip("=LAMBDA(x, x * 2)(5)")
Source()s the parser script bundled in inst/extdata/parse_locales.R,
then calls its download_and_parse_tbx() function to fetch the zip
from Microsoft's public download URL, validate the SHA-256 (if you
supply one), unzip, parse, and write the resulting translation table
to a per-user cache directory.
setup_terminology( expected_sha256 = .MTC_EXPECTED_SHA256, force = FALSE, workers = max(1L, parallel::detectCores() - 1L), quiet = FALSE )setup_terminology( expected_sha256 = .MTC_EXPECTED_SHA256, force = FALSE, workers = max(1L, parallel::detectCores() - 1L), quiet = FALSE )
expected_sha256 |
SHA-256 hex digest expected for the downloaded
zip. Defaults to the digest of the version of the zip known to this
release of spaghetti. A mismatch produces a warning (not an error),
since Microsoft may republish the file. Pass |
force |
If TRUE, re-download even if a cache exists. |
workers |
Number of parallel TBX-parsing workers. Default detects cores - 1, capped to 8. |
quiet |
If TRUE, suppress progress messages. |
Subsequent R sessions load the cache automatically on package attach.
Invisibly, the path to the cached RDS.
The parser uses the openxlsx2 package (for XML parsing) and
digest (for SHA-256 verification). Both are declared in
Suggests: and installed only if you call this function.
Microsoft has not published an explicit license for the contents of the Terminology Collection. The data is downloaded directly from Microsoft; this package does not redistribute it.
has_terminology(), clear_terminology(), terminology_info()
## Not run: setup_terminology() # Skip verification entirely: setup_terminology(expected_sha256 = NULL) ## End(Not run)## Not run: setup_terminology() # Skip verification entirely: setup_terminology(expected_sha256 = NULL) ## End(Not run)
Returns the locale columns present in the cached function table, i.e.
the locales for which at least partial translation data was loaded.
Returns character(0) if setup_terminology() hasn't been run.
supported_locales()supported_locales()
Character vector of locale codes (e.g. c("de", "fr", "es", …)).
setup_terminology(), has_terminology()
supported_locales()supported_locales()
Returns the provenance attributes that were attached to the cached
RDS at download time: source URL, observed SHA-256, download timestamp,
and the spaghetti version that produced the cache. Returns NULL if
no terminology is currently loaded.
terminology_info()terminology_info()
A named list, or NULL.
terminology_info()terminology_info()
Convert a user-facing formula to OOXML storage format
to_xml(formula, locale = NULL, warn_unknown = TRUE)to_xml(formula, locale = NULL, warn_unknown = TRUE)
formula |
Character scalar or vector. Formula(s), with or without |
locale |
Two-letter locale code ( |
warn_unknown |
Logical; warn for unknown function names (default TRUE). |
Character scalar or vector: OOXML formula(s) starting with =.
to_xml("=SEQUENCE(10)") to_xml("=LAMBDA(temp, (5/9) * (temp-32))(100)") to_xml("=FILTER(A1:A10, B1:B10 > 5)") to_xml("=SUM(A1#)") to_xml("=LET(tc,(B2-32)*5/9,rh,0.6,tc*ATAN(0.151977*(rh*100+8.313659)^0.5))") ## Not run: to_xml("=SUMMEWENN(A1:A10;\"x\";B1:B10)", locale = "de") to_xml(c("=SUM(A1:A10)", "=SEQUENCE(5)", "=FILTER(A1:A10, B1:B10 > 0)"))to_xml("=SEQUENCE(10)") to_xml("=LAMBDA(temp, (5/9) * (temp-32))(100)") to_xml("=FILTER(A1:A10, B1:B10 > 5)") to_xml("=SUM(A1#)") to_xml("=LET(tc,(B2-32)*5/9,rh,0.6,tc*ATAN(0.151977*(rh*100+8.313659)^0.5))") ## Not run: to_xml("=SUMMEWENN(A1:A10;\"x\";B1:B10)", locale = "de") to_xml(c("=SUM(A1:A10)", "=SEQUENCE(5)", "=FILTER(A1:A10, B1:B10 > 0)"))
Parses a formula using spaghetti's lexer and renders the token tree in
the style of tidyxl::xlex(). Nesting follows the call structure: tokens
inside a function's parentheses appear as children of that function node.
xlex(formula, locale = NULL, print = TRUE)xlex(formula, locale = NULL, print = TRUE)
formula |
Character scalar. Formula with or without leading |
locale |
Two-letter locale code or NULL. Used to select the correct
argument separator ( |
print |
Logical. Print the tree to the console (default TRUE). Set FALSE to get the data frame silently. |
The formula is displayed as-is (no OOXML translation). Pass the result of
to_xml() if you want to inspect the storage form.
A data frame with columns depth, val, label, invisibly.
depth is the nesting level (0 = root, 1 = top-level arguments, …).
Printed as a side-effect when print = TRUE.
xlex("=SUM(A1:A10)") xlex("=IF(A1>0, VLOOKUP(A1, B:C, 2, 0), NA())") xlex("=LAMBDA(x, x * 2)(5)") xlex("=SUMMEWENNS(C2:C10; A2:A10; \"Berlin\")", locale = "de") # Inspect OOXML form xlex(to_xml("=FILTER(A1:A10, B1:B10 > 0)"))xlex("=SUM(A1:A10)") xlex("=IF(A1>0, VLOOKUP(A1, B:C, 2, 0), NA())") xlex("=LAMBDA(x, x * 2)(5)") xlex("=SUMMEWENNS(C2:C10; A2:A10; \"Berlin\")", locale = "de") # Inspect OOXML form xlex(to_xml("=FILTER(A1:A10, B1:B10 > 0)"))