Introduction

The zztable1 package creates publication-ready summary tables (commonly called “Table 1”) of the kind used in biomedical research and clinical trials. It is built on a lazy-evaluation “blueprint” architecture: a table specification is created quickly, and the summaries are computed only when the table is displayed. This vignette demonstrates the package's key features.

Key Features

  • Lazy Evaluation Architecture: Fast blueprint creation with computation on demand
  • Journal-Specific Theming: NEJM, Lancet, JAMA, BMJ formatting styles
  • Advanced Footnote System: Variable-specific, column-specific, and general footnotes with superscript markers
  • Multiple Output Formats: Console, LaTeX, and HTML output with proper column headers
  • Flexible Statistics: Built-in and custom summary statistics
  • Stratified Analysis: Support for subgroup analyses
  • Full Compatibility: Same interface as original zztable1 package
  • R Markdown Integration: Automatic format detection for seamless PDF/HTML output

Installation and Setup

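# Development version (requires the devtools package)
devtools::install_github("rgt47/zztable1")

# Load the package before running the examples below
library(zztable1)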

Basic Usage

Simple Summary Tables

Let’s start with a basic example using the mtcars dataset:

# Prepare data
data(mtcars)
mtcars$transmission <- factor(
  ifelse(mtcars$am == 1, "Manual", "Automatic"),
  levels = c("Automatic", "Manual")
)
mtcars$engine_type <- factor(
  ifelse(mtcars$vs == 1, "V-shaped", "Straight"),
  levels = c("Straight", "V-shaped")
)

# Create basic summary table
create_table(transmission ~ mpg + hp + wt, data = mtcars)
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18
wt        3.8 (0.8)         2.4 (0.6)      0
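
As a sanity check, the Mean (SD) cells above can be reproduced in base R. This sketch does not use zztable1 at all; the helper name mean_sd is hypothetical, and it assumes the package rounds to one decimal place, as the output suggests.

```r
# Rebuild the grouping factor used above
data(mtcars)
mtcars$transmission <- factor(
  ifelse(mtcars$am == 1, "Manual", "Automatic"),
  levels = c("Automatic", "Manual")
)

# Hypothetical helper: format a numeric vector as "mean (SD)" with 1 decimal
mean_sd <- function(x) {
  sprintf("%.1f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# One formatted cell per transmission group
mpg_cells <- vapply(
  split(mtcars$mpg, mtcars$transmission),
  mean_sd,
  character(1)
)
print(mpg_cells)
```

The two strings should match the mpg row of the table, which is a quick way to confirm that the grouping factor is coded as intended.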

Adding Statistical Tests

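Request p-values for group comparisons explicitly with pvalue = TRUE. The result is identical to the previous table, which suggests that p-values are already shown by default: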

create_table(transmission ~ mpg + hp + wt,
             data = mtcars,
             pvalue = TRUE)
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18
wt        3.8 (0.8)         2.4 (0.6)      0
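
A P value printed as 0 is a very small value rounded for display, not an exact zero. A column footnote later in this vignette describes the test as a two-tailed t-test. Assuming that this means a pooled (equal-variance) two-sample t-test, which is an assumption and not something confirmed from the package source, base R gives values consistent with the table:

```r
data(mtcars)
is_manual <- mtcars$am == 1

# Pooled two-sample t-test per variable (assumed test; see the note above)
p_values <- vapply(
  c("mpg", "hp", "wt"),
  function(v) {
    t.test(
      x = mtcars[[v]][!is_manual],   # automatic
      y = mtcars[[v]][is_manual],    # manual
      var.equal = TRUE
    )$p.value
  },
  numeric(1)
)

# Unrounded values, then a display that avoids printing "0"
print(signif(p_values, 3))
print(format.pval(p_values, digits = 2, eps = 0.001))
```

format.pval() with eps = 0.001 reports very small values as "<0.001", which is the convention most journals expect.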

Including Total Column

Add an overall summary column:

create_table(transmission ~ mpg + hp + wt,
             data = mtcars,
             pvalue = TRUE,
             totals = TRUE)
Variable  Automatic (N=19)  Manual (N=13)  Total (N=32)  P value
mpg       17.1 (3.8)        24.4 (6.2)     20.1 (6)      0
hp        160.3 (53.9)      126.8 (84.1)   146.7 (68.6)  0.18
wt        3.8 (0.8)         2.4 (0.6)      3.2 (1)       0

Advanced Features

Custom Numeric Summaries

Built-in Options

The package provides several built-in summary statistics:

# Default: Mean (SD)
cat("Mean (SD) format:\n")

Mean (SD) format:

create_table(transmission ~ mpg + hp, data = mtcars)
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18

# Median [IQR]
cat("\nMedian [IQR] format:\n")

Median [IQR] format:

create_table(transmission ~ mpg + hp, data = mtcars,
             numeric_summary = "median_iqr")
Variable  Automatic (N=19)   Manual (N=13)   P value
mpg       17.3 [14.9-19.2]   22.8 [21-30.4]  0
hp        175 [116.5-192.5]  109 [66-113]    0.18

# Mean +/- SE
cat("\nMean +/- SE format:\n")

Mean +/- SE format:

create_table(transmission ~ mpg + hp, data = mtcars,
             numeric_summary = "mean_se")
Variable  Automatic (N=19)  Manual (N=13)   P value
mpg       17.1 +/- 0.9      24.4 +/- 1.7    0
hp        160.3 +/- 12.4    126.8 +/- 23.3  0.18
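
The built-in formats correspond to standard base-R computations. The sketch below shows one plausible way to compute two of them. The function names are hypothetical, and the quantile method (R's default, type 7) is an assumption about the package, not a documented fact.

```r
# Hypothetical base-R equivalents of two built-in summaries
mean_se <- function(x) {
  x <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))  # standard error of the mean
  sprintf("%.1f +/- %.1f", mean(x), se)
}

median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  paste0(round(q[2], 1), " [", round(q[1], 1), "-", round(q[3], 1), "]")
}

auto_mpg <- mtcars$mpg[mtcars$am == 0]    # automatic transmission
manual_mpg <- mtcars$mpg[mtcars$am == 1]  # manual transmission

print(mean_se(auto_mpg))
print(mean_se(manual_mpg))
print(median_iqr(auto_mpg))
```

The standard error is the SD divided by the square root of the group size, which is why it is much smaller than the SD shown in the default format.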

Custom Functions

To define your own summary statistic, pass a function that takes a numeric vector and returns a formatted string:

# Custom function: Median (Min-Max)
custom_summary <- function(x) {
  med <- round(median(x, na.rm = TRUE), 1)
  min_val <- round(min(x, na.rm = TRUE), 1)
  max_val <- round(max(x, na.rm = TRUE), 1)
  paste0(med, " (", min_val, "-", max_val, ")")
}

cat("Custom Median (Min-Max) format:\n")

Custom Median (Min-Max) format:

create_table(transmission ~ mpg + hp, data = mtcars,
             numeric_summary = custom_summary)
Variable  Automatic (N=19)  Manual (N=13)   P value
mpg       17.3 (10.4-24.4)  22.8 (15-33.9)  0
hp        175 (62-245)      109 (52-335)    0.18
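
Because round() drops trailing zeros, the function above prints 15 where 15.0 might be expected. If fixed decimals matter, a variant can format with formatC() instead. This is a sketch that assumes, as the example above suggests, that a custom summary only needs to accept a numeric vector and return a single string; the name median_range is hypothetical.

```r
# Variant of the custom summary that always shows a fixed number of decimals
median_range <- function(x, digits = 1) {
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  paste0(
    fmt(median(x, na.rm = TRUE)), " (",
    fmt(min(x, na.rm = TRUE)), "-",
    fmt(max(x, na.rm = TRUE)), ")"
  )
}

# Try it on one group before handing it to create_table()
manual_mpg <- mtcars$mpg[mtcars$am == 1]
print(median_range(manual_mpg))
```

It could then be supplied as numeric_summary = median_range in the same way as custom_summary.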

Stratified Analysis

Perform subgroup analyses using stratification:

# Create stratification variable
mtcars$cylinder_group <- factor(
  ifelse(mtcars$cyl <= 4, "4-cylinder",
         ifelse(mtcars$cyl <= 6, "6-cylinder", "8-cylinder")),
  levels = c("4-cylinder", "6-cylinder", "8-cylinder")
)

# Stratified analysis
create_table(transmission ~ mpg + hp,
             data = mtcars,
             strata = "cylinder_group",
             pvalue = TRUE)
Variable                    Automatic (N=19)  Manual (N=13)  P value
Cylinder_group: 6-cylinder
mpg                         19.1 (1.6)        20.6 (0.8)     0
hp                          115.2 (9.2)       131.7 (37.5)   0.18
Cylinder_group: 4-cylinder
mpg                         22.9 (1.5)        28.1 (4.5)     0
hp                          84.7 (19.7)       81.9 (22.7)    0.18
Cylinder_group: 8-cylinder
mpg                         15.1 (2.8)        15.4 (0.6)     0
hp                          194.2 (33.4)      299.5 (50.2)   0.18
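
Before reading a stratified table it helps to know how many observations fall in each stratum-by-group cell, since small cells make the summaries unstable. The base-R sketch below builds the same stratification variable with factor() labels in place of nested ifelse() calls and cross-tabulates it against transmission.

```r
data(mtcars)
mtcars$transmission <- factor(
  ifelse(mtcars$am == 1, "Manual", "Automatic"),
  levels = c("Automatic", "Manual")
)

# Map the three cylinder counts directly to labels
mtcars$cylinder_group <- factor(
  mtcars$cyl,
  levels = c(4, 6, 8),
  labels = c("4-cylinder", "6-cylinder", "8-cylinder")
)

# Cell sizes: rows are strata, columns are comparison groups
cell_n <- table(mtcars$cylinder_group, mtcars$transmission)
print(cell_n)
```

Only two manual cars have eight cylinders, so the mean and SD in that cell rest on very little data.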

Journal-Specific Theming

Available Themes

View all available themes:

themes <- list_available_themes()
print(themes)

[1] "console" "nejm"    "lancet"  "jama"    "bmj"     "simple"

Theme Comparison

Console Theme (Default)

cat("Console Theme:\n")

Console Theme:

create_table(transmission ~ mpg + hp, data = mtcars,
             theme = "console")
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18

NEJM Theme (1 decimal place)

cat("NEJM Theme (1 decimal place):\n")

NEJM Theme (1 decimal place):

create_table(transmission ~ mpg + hp, data = mtcars,
             theme = "nejm")
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 ± 3.8        24.4 ± 6.2     0
hp        160.3 ± 53.9      126.8 ± 84.1   0.18

JAMA Theme (1 decimal place)

cat("JAMA Theme (1 decimal place):\n")

JAMA Theme (1 decimal place):

create_table(transmission ~ mpg + hp, data = mtcars,
             theme = "jama")
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18

Lancet Theme

cat("Lancet Theme:\n")

Lancet Theme:

create_table(transmission ~ mpg + hp, data = mtcars,
             theme = "lancet")
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 (3.8)        24.4 (6.2)     0
hp        160.3 (53.9)      126.8 (84.1)   0.18

Footnote System

Variable-Specific Footnotes

Add footnotes to specific variables with superscript markers:

create_table(transmission ~ mpg + hp + wt,
             data = mtcars,
             theme = "nejm",
             footnotes = list(
               variables = list(
                 mpg = "EPA fuel economy rating in miles per gallon",
                 hp = "Gross horsepower measured at crankshaft",
                 wt = "Vehicle weight in thousands of pounds"
               )
             ))
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg*      17.1 ± 3.8        24.4 ± 6.2     0
hp†       160.3 ± 53.9      126.8 ± 84.1   0.18
wt‡       3.8 ± 0.8         2.4 ± 0.6      0
* EPA fuel economy rating in miles per gallon
† Gross horsepower measured at crankshaft
‡ Vehicle weight in thousands of pounds
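
When variable descriptions already live in a data dictionary, the variables list can be generated instead of typed by hand. A minimal base-R sketch, assuming the footnotes argument accepts an ordinary named list as in the call above; the dictionary data frame is a made-up example.

```r
# Hypothetical data dictionary: one row per variable
dictionary <- data.frame(
  variable = c("mpg", "hp", "wt"),
  description = c(
    "EPA fuel economy rating in miles per gallon",
    "Gross horsepower measured at crankshaft",
    "Vehicle weight in thousands of pounds"
  ),
  stringsAsFactors = FALSE
)

# Named list: names are variables, values are footnote texts
variable_notes <- as.list(
  setNames(dictionary$description, dictionary$variable)
)
footnotes <- list(variables = variable_notes)

str(footnotes)
```

The resulting object can be passed as footnotes = footnotes in the create_table() call.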

Column-Specific Footnotes

Add footnotes to columns:

create_table(transmission ~ mpg + hp,
             data = mtcars,
             theme = "nejm",
             pvalue = TRUE,
             footnotes = list(
               columns = list(
                 "p.value" = "Two-tailed t-test, alpha = 0.05"
               )
             ))
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 ± 3.8        24.4 ± 6.2     0
hp        160.3 ± 53.9      126.8 ± 84.1   0.18
* Two-tailed t-test, alpha = 0.05

Comprehensive Footnotes

Combine multiple footnote types:

create_table(transmission ~ mpg + hp,
             data = mtcars,
             theme = "nejm",
             pvalue = TRUE,
             footnotes = list(
               variables = list(
                 mpg = "EPA fuel economy standard",
                 hp = "Gross horsepower"
               ),
               columns = list(
                 "p.value" = "Statistical significance testing"
               ),
               general = list(
                 "Data source: Henderson and Velleman (1981)",
                 "Missing values excluded from analysis"
               )
             ))
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg*      17.1 ± 3.8        24.4 ± 6.2     0
hp†       160.3 ± 53.9      126.8 ± 84.1   0.18
* EPA fuel economy standard
† Gross horsepower
‡ Statistical significance testing

Clinical Trial Example

Simulated Clinical Trial Data

Let’s create a more realistic clinical trial example:

set.seed(123)
n <- 200

# Generate clinical trial data
trial_data <- data.frame(
  patient_id = 1:n,
  treatment = factor(
    sample(c("Placebo", "Drug A", "Drug B"), n, replace = TRUE),
    levels = c("Placebo", "Drug A", "Drug B")
  ),
  age = round(rnorm(n, 65, 12)),
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  race = factor(
    sample(c("White", "Black", "Hispanic", "Asian", "Other"), 
           n, replace = TRUE, prob = c(0.6, 0.2, 0.1, 0.08, 0.02)),
    levels = c("White", "Black", "Hispanic", "Asian", "Other")
  ),
  baseline_bmi = round(rnorm(n, 28, 5), 1),
  diabetes = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.7, 0.3))),
  hypertension = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.6, 0.4))),
  center = factor(sample(paste("Center", 1:4), n, replace = TRUE))
)

# Preview the data
head(trial_data, 10)

   patient_id treatment age    sex  race baseline_bmi diabetes hypertension
1           1    Drug B  70 Female Black         25.5      Yes           No
2           2    Drug B  65   Male White         20.9       No          Yes
3           3    Drug B  60   Male Black         28.6       No          Yes
4           4    Drug A  40   Male White         37.7      Yes           No
5           5    Drug B  79   Male White         32.0      Yes          Yes
6           6    Drug A  47   Male White         33.8       No           No
7           7    Drug A  74 Female White         29.8       No           No
8           8    Drug A  88 Female White         25.0       No           No
9           9    Drug B  48 Female White         27.0      Yes          Yes
10         10   Placebo  73   Male White         26.6       No           No
     center
1  Center 3
2  Center 1
3  Center 3
4  Center 4
5  Center 1
6  Center 1
7  Center 4
8  Center 2
9  Center 3
10 Center 3

Basic Clinical Table 1

create_table(treatment ~ age + sex + race + baseline_bmi +
             diabetes + hypertension,
             data = trial_data,
             theme = "nejm",
             pvalue = TRUE)
Variable                Placebo (N=63)  Drug A (N=70)  Drug B (N=67)  P value
age                     64.2 ± 9.3      66.6 ± 13.6    66.3 ± 12.3    0.246
sex – no. (%)           30 (47.6)       41 (58.6)      33 (49.3)      0.385
race
  White                 42 (67%)        46 (66%)       40 (60%)       0.231
  Black                 10 (16%)        11 (16%)       17 (25%)
  Hispanic              5 (8%)          6 (9%)         9 (13%)
  Asian                 5 (8%)          7 (10%)        1 (1%)
  Other                 1 (2%)          0 (0%)         0 (0%)
baseline_bmi            27 ± 4.8        27.7 ± 5.2     28.1 ± 5       0.415
diabetes – no. (%)      25 (39.7)       19 (27.1)      22 (32.8)      0.333
hypertension – no. (%)  27 (42.9)       28 (40)        37 (55.2)      0.165

With Footnotes and Stratification

create_table(treatment ~ age + sex + race + baseline_bmi +
             diabetes + hypertension,
             data = trial_data,
             strata = "center",
             theme = "nejm",
             pvalue = TRUE,
             footnotes = list(
               variables = list(
                 age = "Age at enrollment (years)",
                 baseline_bmi = "Body mass index at baseline (kg/m²)",
                 diabetes = "Type 2 diabetes mellitus diagnosis",
                 hypertension = "Hypertension diagnosis"
               ),
               columns = list(
                 "p.value" = "ANOVA for continuous, chi-squared for categorical"
               ),
               general = list(
                 "Data are mean (SD) or n (%)",
                 "ITT population (N=200)"
               )
             ))
Variable                 Placebo (N=63)  Drug A (N=70)  Drug B (N=67)  P value
Center: Center 3
age                      63.2 ± 7.7      65.5 ± 12.4    64.2 ± 11.5    0.246
sex – no. (%)            30 (250)        41 (205)       33 (206.2)     0.385
race
  White                  9 (75%)         14 (70%)       11 (68.8%)     0.231
  Black                  1 (8.3%)        6 (30%)        4 (25%)
  Hispanic               0 (0%)          0 (0%)         1 (6.2%)
  Asian                  2 (16.7%)       0 (0%)         0 (0%)
  Other                  0 (0%)          0 (0%)         0 (0%)
baseline_bmi             26 ± 4.2        28 ± 4.1       28.4 ± 5.1     0.415
diabetes – no. (%)‡      25 (208.3)      19 (95)        22 (137.5)     0.333
hypertension – no. (%)§  27 (225)        28 (140)       37 (231.2)     0.165
Center: Center 1
age                      67.2 ± 10.2     63.3 ± 10.9    68.2 ± 13      0.246
sex – no. (%)            30 (214.3)      41 (195.2)     33 (137.5)     0.385
race
  White                  7 (50%)         12 (57.1%)     18 (75%)       0.231
  Black                  5 (35.7%)       3 (14.3%)      3 (12.5%)
  Hispanic               1 (7.1%)        3 (14.3%)      3 (12.5%)
  Asian                  0 (0%)          3 (14.3%)      0 (0%)
  Other                  1 (7.1%)        0 (0%)         0 (0%)
baseline_bmi             29.1 ± 3.5      27.8 ± 4.5     27.3 ± 5.1     0.415
diabetes – no. (%)‡      25 (178.6)      19 (90.5)      22 (91.7)      0.333
hypertension – no. (%)§  27 (192.9)      28 (133.3)     37 (154.2)     0.165
Center: Center 4
age                      62.2 ± 9.6      68.8 ± 13.9    66.6 ± 14.7    0.246
sex – no. (%)            30 (214.3)      41 (292.9)     33 (366.7)     0.385
race
  White                  9 (64.3%)       10 (71.4%)     3 (33.3%)      0.231
  Black                  2 (14.3%)       0 (0%)         4 (44.4%)
  Hispanic               1 (7.1%)        1 (7.1%)       1 (11.1%)
  Asian                  2 (14.3%)       3 (21.4%)      1 (11.1%)
  Other                  0 (0%)          0 (0%)         0 (0%)
baseline_bmi             25.9 ± 5.9      27.2 ± 7.3     27.1 ± 6.9     0.415
diabetes – no. (%)‡      25 (178.6)      19 (135.7)     22 (244.4)     0.333
hypertension – no. (%)§  27 (192.9)      28 (200)       37 (411.1)     0.165
Center: Center 2
age                      64 ± 9.5        70.5 ± 17.8    65.3 ± 11.3    0.246
sex – no. (%)            30 (130.4)      41 (273.3)     33 (183.3)     0.385
race
  White                  17 (73.9%)      10 (66.7%)     8 (44.4%)      0.231
  Black                  2 (8.7%)        2 (13.3%)      6 (33.3%)
  Hispanic               3 (13%)         2 (13.3%)      4 (22.2%)
  Asian                  1 (4.3%)        1 (6.7%)       0 (0%)
  Other                  0 (0%)          0 (0%)         0 (0%)
baseline_bmi             26.8 ± 4.8      27.4 ± 5.7     29.4 ± 3.5     0.415
diabetes – no. (%)‡      25 (108.7)      19 (126.7)     22 (122.2)     0.333
hypertension – no. (%)§  27 (117.4)      28 (186.7)     37 (205.6)     0.165
* Age at enrollment (years)
† Body mass index at baseline (kg/m²)
‡ Type 2 diabetes mellitus diagnosis
§ Hypertension diagnosis
ANOVA for continuous, chi-squared for categorical
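
Several binary rows in the output above (sex, diabetes, hypertension) show percentages above 100%, and their counts match the unstratified table, so those cells deserve an independent check. The base-R sketch below computes n (%) within each stratum-by-arm cell, which is the denominator a stratified Table 1 would normally use. It runs on a small made-up data frame; the helper n_pct is hypothetical and not part of zztable1.

```r
# Small illustrative data set (not the simulated trial above)
toy <- data.frame(
  center = c("C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2"),
  arm = c("Placebo", "Placebo", "Drug", "Drug",
          "Placebo", "Placebo", "Drug", "Drug"),
  diabetes = c("Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "n (pct)" of one level among the values supplied
n_pct <- function(x, level = "Yes") {
  n <- sum(x == level, na.rm = TRUE)
  sprintf("%d (%.1f)", n, 100 * n / sum(!is.na(x)))
}

# One cell per center-by-arm combination; the denominator is the cell size
cells <- tapply(toy$diabetes, list(toy$center, toy$arm), n_pct)
print(cells)
```

Applied to trial_data with trial_data$center and trial_data$treatment as the grouping factors, the same call yields percentages that cannot exceed 100.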

Different Output Formats

Console Output (Default)

create_table(transmission ~ mpg + hp, data = mtcars, theme = "nejm")
Variable  Automatic (N=19)  Manual (N=13)  P value
mpg       17.1 ± 3.8        24.4 ± 6.2     0
hp        160.3 ± 53.9      126.8 ± 84.1   0.18

LaTeX Output

bp_latex <- table1(transmission ~ mpg + hp, data = mtcars,
                   layout = "latex", theme = "nejm")

# Note: LaTeX output would contain LaTeX markup
cat("LaTeX theme config:\n")

LaTeX theme config:

cat("Font size:", bp_latex$metadata$theme$latex$font_size, "\n")

Font size:

cat("Packages:", paste(bp_latex$metadata$theme$latex$packages, collapse = ", "), "\n")

Packages:

HTML Output

bp_html <- table1(transmission ~ mpg + hp, data = mtcars,
                  layout = "html", theme = "nejm")

# Note: HTML output would contain HTML markup
cat("HTML theme ready for web display\n")

HTML theme ready for web display

Performance and Architecture

Blueprint Architecture

With lazy evaluation, creating a blueprint is fast even for large data because the summaries are not computed until the table is displayed:

# Large dataset simulation
large_data <- data.frame(
  group = factor(sample(c("A", "B", "C"), 10000, replace = TRUE)),
  var1 = rnorm(10000),
  var2 = rnorm(10000),
  var3 = rnorm(10000),
  var4 = rnorm(10000),
  var5 = rnorm(10000)
)

# Fast blueprint creation (no computations yet)
system.time({
  bp_large <- table1(group ~ var1 + var2 + var3 + var4 + var5, 
                     data = large_data)
})

   user  system elapsed
  0.004   0.001   0.004


# Computations happen only during display
cat("Blueprint created instantly. Computations happen during display.\n")

Blueprint created instantly. Computations happen during display.

cat("Blueprint dimensions:", dim(bp_large), "\n")

Blueprint dimensions: 5 5
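
The idea behind a blueprint can be illustrated with plain closures: each cell stores a function describing its computation, and nothing is evaluated until the table is rendered. The sketch below is a conceptual illustration only, not zztable1's actual implementation; make_blueprint and render_blueprint are hypothetical names.

```r
# Conceptual sketch of lazy cells; NOT the zztable1 internals
make_blueprint <- function(data, vars, group) {
  grid <- expand.grid(
    var = vars,
    level = levels(data[[group]]),
    stringsAsFactors = FALSE
  )
  # Each cell is a closure that captures its own variable and group level
  cells <- Map(function(v, g) {
    force(v)
    force(g)
    function() mean(data[[v]][data[[group]] == g], na.rm = TRUE)
  }, grid$var, grid$level)
  names(cells) <- paste(grid$var, grid$level, sep = " | ")
  structure(list(cells = cells), class = "toy_blueprint")
}

# Rendering is the step that actually runs the stored computations
render_blueprint <- function(bp) {
  vapply(bp$cells, function(cell) cell(), numeric(1))
}

set.seed(1)
big <- data.frame(
  group = factor(sample(c("A", "B", "C"), 10000, replace = TRUE)),
  var1 = rnorm(10000),
  var2 = rnorm(10000)
)

bp <- make_blueprint(big, c("var1", "var2"), "group")  # cheap: closures only
values <- render_blueprint(bp)                         # means computed here
print(round(values, 3))
```

Creation cost depends on the number of cells and not on the number of rows, which is consistent with the near-zero timing shown above.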

Memory Efficiency

# Blueprint object structure
bp_small <- table1(transmission ~ mpg, data = mtcars)

cat("Blueprint components:\n")

Blueprint components:

cat("- Cells: ", length(bp_small$cells), "\n")

- Cells:  4

cat("- Dimensions: ", dim(bp_small), "\n")

- Dimensions:  1 4

cat("- Metadata keys: ", names(bp_small$metadata), "\n")

- Metadata keys:  formula options data_info data dimensions footnote_markers footnote_list created optimized version cell_count theme stat_cache spanner_store summary_store

Best Practices

Recommendations

  1. Choose Appropriate Themes: Use journal-specific themes for manuscript preparation
  2. Add Informative Footnotes: Explain variables and statistical methods
  3. Use Stratification Wisely: For meaningful subgroup analyses
  4. Custom Functions: Create domain-specific summary statistics
  5. Validate Results: Check statistical assumptions and interpret p-values carefully

Common Patterns

# Standard clinical trial baseline table
create_baseline_table <- function(data, treatment_var, theme = "nejm") {
  formula_str <- paste(treatment_var, "~ .")
  bp <- table1(as.formula(formula_str), 
               data = data,
               theme = theme,
               pvalue = TRUE,
               footnotes = list(
                 general = list(
                   "Data are mean (SD) or n (%)",
                   "P-values from ANOVA or chi-squared test"
                 )
               ))
  return(bp)
}

# Example usage
# bp_standard <- create_baseline_table(trial_data, "treatment")
cat("Utility function created for standardized baseline tables\n")

Utility function created for standardized baseline tables
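
A formula of the form treatment ~ . pulls in every other column, including identifiers such as patient_id. One alternative is to build the formula from an explicit vector of variable names with reformulate() from the stats package. This sketch uses only base R and the column names of the simulated trial data above.

```r
# Variables to summarise, listed explicitly so identifiers are left out
vars <- c("age", "sex", "race", "baseline_bmi", "diabetes", "hypertension")

# Builds: treatment ~ age + sex + race + baseline_bmi + diabetes + hypertension
f <- reformulate(termlabels = vars, response = "treatment")
print(f)
```

The result can be passed to table1() in place of as.formula(formula_str).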

Troubleshooting

Common Issues

  1. Missing Variables: Ensure all formula variables exist in the data
  2. Factor Levels: Check factor level ordering for expected display
  3. Missing Values: Use missing = TRUE to show missing counts
  4. Theme Application: Themes affect decimal places and formatting
  5. Large Tables: Use stratification to break down complex tables

Error Handling

# Example of error handling
tryCatch({
  # This will cause an error - variable doesn't exist
  bp_error <- table1(nonexistent_var ~ mpg, data = mtcars)
}, error = function(e) {
  cat("Error caught:", e$message, "\n")
  cat("Solution: Check that all variables in formula exist in data\n")
})

Error caught: Variables not found in data: nonexistent_var

Available variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, transmission, engine_type, cylinder_group
Solution: Check that all variables in formula exist in data
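
The same check can be done up front, before calling table1(), by comparing the variables named in the formula with the columns of the data. A small base-R sketch; check_formula_vars is a hypothetical helper, not a zztable1 function.

```r
# Hypothetical helper: names used in the formula that are absent from the data
check_formula_vars <- function(formula, data) {
  wanted <- setdiff(all.vars(formula), ".")  # "." stands for all other columns
  setdiff(wanted, names(data))
}

missing_vars <- check_formula_vars(nonexistent_var ~ mpg + hp, mtcars)
if (length(missing_vars) > 0) {
  message("Not in data: ", paste(missing_vars, collapse = ", "))
}
```

An empty result means every variable in the formula exists in the data.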

Conclusion

The zztable1 package provides a powerful, flexible system for creating publication-ready summary tables. Key advantages include:

  • Performance: Lazy evaluation for fast blueprint creation
  • Flexibility: Multiple themes, custom statistics, advanced footnotes
  • Compatibility: Same interface as original zztable1
  • Publication-Ready: Journal-specific formatting out of the box

For more information, see the package documentation and function help files.


Session Information

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] zztable1_0.5.0   kableExtra_1.4.0 htmltools_0.5.9

loaded via a namespace (and not attached):
 [1] vctrs_0.7.3        svglite_2.2.2      cli_3.6.6          knitr_1.51
 [5] rlang_1.2.0        xfun_0.57          stringi_1.8.7      otel_0.2.0
 [9] textshaping_1.0.5  jsonlite_2.0.0     glue_1.8.1         ragg_1.5.2
[13] sass_0.4.10        scales_1.4.0       rmarkdown_2.31     evaluate_1.0.5
[17] jquerylib_0.1.4    fastmap_1.2.0      yaml_2.3.12        lifecycle_1.0.5
[21] stringr_1.6.0      compiler_4.6.0     RColorBrewer_1.1-3 fs_2.1.0
[25] htmlwidgets_1.6.4  rstudioapi_0.18.0  farver_2.1.2       systemfonts_1.3.2
[29] digest_0.6.39      viridisLite_0.4.3  R6_2.6.1           pillar_1.11.1
[33] parallel_4.6.0     magrittr_2.0.5     bslib_0.10.0       tools_4.6.0
[37] xml2_1.5.2         pkgdown_2.2.0      cachem_1.1.0       desc_1.4.3