Add formulas to a workbook

library(openxlsx2)

Below you find various examples how to create formulas with openxlsx2. Though, before we start with the examples, let us begin with a word of warning. Please be aware, while it is possible to create all these formulas, they are not evaluated unless they are opened in spreadsheet software. Even worse, if there are cells containing the result of some formula, it can not be trusted unless the formula is evaluated in spreadsheet software.

This can be shown in a simple example: We have a spreadsheet with a formula A1 + B1. This formula was evaluated with spreadsheet software as A1 + B1 = 2. Therefore if we read the cell, we see the value 2. Lets recreate this output in openxlsx2

# Create artificial xlsx file
wb <- wb_workbook()$add_worksheet()$add_data(x = t(c(1, 1)), col_names = FALSE)$
  add_formula(dims = "C1", x = "A1 + B1")
# Users should never modify cc as shown here
wb$worksheets[[1]]$sheet_data$cc$v[3] <- 2

# we expect a value of 2
wb_to_df(wb, col_names = FALSE)
#>   A B C
#> 1 1 1 2

Now, lets assume we modify the data in cell A1 to 2.

wb$add_data(x = 2)

# we expect 3
wb_to_df(wb, col_names = FALSE)
#>   A B C
#> 1 2 1 2

What happened? Even though we see cells A1 and B1 show a value of 2 and 1 our formula in C1 was not updated. It still shows a value of 2. This is because openxlsx2 does not evaluate formulas and workbooks on a more general scale. In the open xml style the cell looks something like this:

<c r="C1">
  <f>A1 + B1</f>
  <v>2</v>
</c>

And when we read from this cell, we always return the value of v. In this case it is obvious, but still wrong and it is a good idea to check if underlying fields contain formulas.

wb_to_df(wb, col_names = FALSE, show_formula = TRUE)
#>   A B       C
#> 1 2 1 A1 + B1

If openxlsx2 writes formulas, as shown in the examples below, the fields will be entirely blank. These fields will only be evaluated and filled, once the output file is opened in spreadsheet software.

The only way to avoid surprises is to be aware of this all the time and similar, checking for similar things all the time.

Simple formulas

wb <- wb_workbook()$add_worksheet()$
  add_data(x = head(cars))$
  add_formula(x = "SUM(A2, B2)", dims = "D2")$
  add_formula(x = "A2 + B2", dims = "D3")
# wb$open()

Array formulas

wb <- wb_workbook()$add_worksheet()$
  add_data(x = head(cars))$
  add_formula(x = "A2:A7 * B2:B7", dims = "C2:C7", array = TRUE)
# wb$open()

Array formulas creating multiple fields

In the example below we want to use MMULT() which creates a matrix multiplication. This requires us to write an array formula and to specify the region where the output will be written to.

m1 <- matrix(1:6, ncol = 2)
m2 <- matrix(7:12, nrow = 2)

wb <- wb_workbook()$add_worksheet()$
  add_data(x = m1)$
  add_data(x = m2, dims = wb_dims(from_col = 4))$
  add_formula(x = "MMULT(A2:B4, D2:F3)", dims = "H2:J4", array = TRUE)
# wb$open()

Similar a the coefficients of a linear regression

# we expect to find this in D1:E1
# coef(lm(head(cars)))
wb <- wb_workbook()$add_worksheet()$
  add_data(x = head(cars))$
  add_formula(x = "LINEST(A2:A7, B2:B7, TRUE)", dims = "D2:E2", array = TRUE)
# wb$open()

cells metadata (cm) formulas

Similar to array formulas, these cell metadata (cm) formulas hide to the user that they are array formulas. Using these is implemented in openxlsx2 > 0.6.1:

wb <- wb_workbook()$add_worksheet()$
  add_data(x = head(cars))$
  add_formula(x = "SUM(ABS(A2:A7))", dims = "D2", cm = TRUE)
#> Warning in write_data2(wb = wb, sheet = sheet, data = x, name = name, colNames
#> = colNames, : modifications with cm formulas are experimental. use at own risk
# wb$open()

dataTable formulas1

dataTable formula differences

A B C
1 sales_price COGS sales_quantity
2 20 5 1
3 30 11 2
4 40 13 3

Given a basic table like the above, a similarly basic formula for total_sales would be “= A2 * C2” with the row value changing at each row.

An implementation for this formula using wb_add_formula() would look this (taken from current documentation) lets say we’ve read in the data and assigned it to the table company_sales

## creating example data
company_sales <- data.frame(
    sales_price = c(20, 30, 40),
    COGS = c(5, 11, 13),
    sales_quantity = c(1, 2, 3)
)

## write in the formula
company_sales$total_sales  <- paste(paste0("A", 1:3 + 1L), paste0("C", 1:3 + 1L), sep = " * ")
## add the formula class
class(company_sales$total_sales) <- c(class(company_sales$total_sales), "formula")

## write a workbook
wb <- wb_workbook()$
  add_worksheet("Total Sales")$
  add_data_table(x = company_sales)

Then we create the workbook, worksheet, and use wb_add_data_table().

One of the advantages of the open xml dataTable syntax is that we don’t have to specify row numbers or columns as letters. The table also grows dynamically, adding new rows as new data is appended and extending formulas to the new rows. These dataTable have named columns that we can use instead of letters. When writing the formulas within the dataTable we would use the following syntax [@[column_name]] to reference the current row. So the “total_sales” formula written in open xml in dataTable would look like this; =[@[sales_price]] * [@[sales_quantity]]

If we are writing the formula outside of the dataTable we have to reference the table name. In this case lets say the table name is ‘daily_sales’ =daily_sales[@[sales_price]] * daily_sales[@[sales_quantity]]

However, if we were to pass this as the text for the formula to be written it would cause an error because the syntax that open xml requires for selecting the current row is different.

In open xml the dataTable formula looks like this:

<calculatedColumnFormula>
  daily_sales[[#This Row],[sales_price]]*daily_sales[[#ThisRow],[sales_quantity]]
</calculatedColumnFormula>

Now we can see that open xml replaces [@[sales_price]] with daily_sales[[#This Row],[sales_price]] We must then use this syntax when writing formulas for dataTable

## Because we want the `dataTable` formula to propagate down the entire column of the data
## we can assign the formula by itself to any column and allow that single string to be repeated for each row.

## creating example data
example_data <-
  data.frame(
    sales_price = c(20, 30, 40),
    COGS = c(5, 11, 13),
    sales_quantity = c(1, 2, 3)
  )

## base R method
example_data$gross_profit       <- "daily_sales[[#This Row],[sales_price]] - daily_sales[[#This Row],[COGS]]"
example_data$total_COGS        <- "daily_sales[[#This Row],[COGS]] * daily_sales[[#This Row],[sales_quantity]]"
example_data$total_sales       <- "daily_sales[[#This Row],[sales_price]] * daily_sales[[#This Row],[sales_quantity]]"
example_data$total_gross_profit <- "daily_sales[[#This Row],[total_sales]] - daily_sales[[#This Row],[total_COGS]]"

class(example_data$gross_profit)       <- c(class(example_data$gross_profit),       "formula")
class(example_data$total_COGS)        <- c(class(example_data$total_COGS),          "formula")
class(example_data$total_sales)       <- c(class(example_data$total_sales),         "formula")
class(example_data$total_gross_profit) <- c(class(example_data$total_gross_profit), "formula")
wb$
  add_worksheet("Daily Sales")$
  add_data_table(
    x           = example_data,
    table_style = "TableStyleMedium2",
    table_name  = "daily_sales"
  )

And if we open the workbook to view the table we created we can see that the formula has worked.

A B C D E F G
1 sales_price COGS sales_quantity gross_profit total_COGS total_sales total_gross_profit
2 20 5 1 15 5 20 15
3 30 11 2 19 22 60 38
4 40 13 3 27 39 120 81

We can also see that it has replaced [#This Row] with @.

A B C D E F G
1 sales_price COGS sales_quantity gross_profit total_COGS total_sales total_gross_profit
2 20 5 1 =[@sales_price] - [@COGS] =[@COGS] * [@sales_quantity] =[@sales_price] * [@sales_quantity] =[@[total_sales]] - [@[total_COGS]]
3 30 11 2 =[@sales_price] - [@COGS] =[@COGS] * [@sales_quantity] =[@sales_price] * [@sales_quantity] =[@[total_sales]] - [@[total_COGS]]
4 40 13 3 =[@sales_price] - [@COGS] =[@COGS] * [@sales_quantity] =[@sales_price] * [@sales_quantity] =[@[total_sales]] - [@[total_COGS]]

For completion, the formula as we wrote it appears as;

D E F G
gross_profit total_COGS total_sales total_gross_profit
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]]
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]]
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]]
#### sum dataTable examples
wb$add_worksheet("sum_examples")

### Note: dataTable formula do not need to be used inside of dataTables. dataTable formula are for referencing the data within the dataTable.

### Note: dataTable formula do not need to be used inside of dataTables. dataTable formula are for referencing the data within the dataTable.
sum_examples <- data.frame(
  description = c("sum_sales_price", "sum_product_Price_Quantity"),
  formula = c("", "")
)

wb$add_data(x = sum_examples)

# add formulas
wb$add_formula(x = "sum(daily_sales[[#Data],[sales_price]])", dims = "B2")
wb$add_formula(x = "sum(daily_sales[[#Data],[sales_price]] * daily_sales[[#Data],[sales_quantity]])", dims = "B3", array = TRUE)

#### dataTable referencing
wb$add_worksheet("dt_references")

### Adding the headers by themselves.
wb$add_formula(
  x = "daily_sales[[#Headers],[sales_price]:[total_gross_profit]]",
  dims = "A1:G1",
  array = TRUE
)

### Adding the raw data by reference and selecting them directly.
wb$add_formula(
  x = "daily_sales[[#Data],[sales_price]:[total_gross_profit]]",
  start_row = 2,
  dims = "A2:G4",
  array = TRUE
)
# wb$open()

  1. this example was originally provided by @zykezero for openxlsx.↩︎