
Customizing Statistics and Tests in zztable1
zztable1
2026-05-02
Source:vignettes/customizing_statistics.Rmd
customizing_statistics.RmdIntroduction
The zztable1 package provides powerful customization
options for both summary statistics and
statistical tests used in Table 1 generation. This
vignette demonstrates how to:
- Customize numeric summary statistics with built-in and custom functions
- Select appropriate statistical tests for different data types
- Apply these customizations across different medical journal themes
- Handle real-world scenarios with varying data distributions
Key Customization Parameters
The package provides two main customization parameters that mirror each other in design:
-
numeric_summary: Controls how continuous variables are summarized -
continuous_test: Controls statistical tests for continuous variables
-
categorical_test: Controls statistical tests for categorical variables
Dataset Preparation
We’ll use a simulated clinical trial dataset to demonstrate all customization options:
# Create comprehensive clinical trial dataset
set.seed(123)
n_per_group <- 40
clinical_data <- data.frame(
# Treatment groups
treatment = factor(rep(c("Placebo", "Low Dose", "High Dose"), each = n_per_group)),
# Patient characteristics
age = c(
rnorm(n_per_group, mean = 65, sd = 12), # Placebo
rnorm(n_per_group, mean = 63, sd = 10), # Low Dose
rnorm(n_per_group, mean = 67, sd = 15) # High Dose
),
# Primary efficacy endpoint (with treatment effect)
efficacy_score = c(
rnorm(n_per_group, mean = 20, sd = 8), # Placebo: lower scores
rnorm(n_per_group, mean = 28, sd = 9), # Low Dose: moderate improvement
rnorm(n_per_group, mean = 35, sd = 7) # High Dose: best improvement
),
# Safety endpoint (non-normal distribution)
biomarker_level = c(
rexp(n_per_group, rate = 0.1), # Placebo: exponential distribution
rexp(n_per_group, rate = 0.08), # Low Dose
rexp(n_per_group, rate = 0.12) # High Dose
),
# Binary outcomes
response = factor(c(
sample(c("Responder", "Non-responder"), n_per_group, replace = TRUE, prob = c(0.3, 0.7)),
sample(c("Responder", "Non-responder"), n_per_group, replace = TRUE, prob = c(0.6, 0.4)),
sample(c("Responder", "Non-responder"), n_per_group, replace = TRUE, prob = c(0.8, 0.2))
)),
# Categorical safety outcome
safety_grade = factor(c(
sample(c("None", "Mild", "Moderate", "Severe"), n_per_group, replace = TRUE, prob = c(0.4, 0.4, 0.15, 0.05)),
sample(c("None", "Mild", "Moderate", "Severe"), n_per_group, replace = TRUE, prob = c(0.3, 0.45, 0.2, 0.05)),
sample(c("None", "Mild", "Moderate", "Severe"), n_per_group, replace = TRUE, prob = c(0.2, 0.5, 0.25, 0.05))
), levels = c("None", "Mild", "Moderate", "Severe"))
)
# Display sample data
knitr::kable(head(clinical_data),
caption = "Sample of Clinical Trial Dataset")| treatment | age | efficacy_score | biomarker_level | response | safety_grade |
|---|---|---|---|---|---|
| Placebo | 58.27429 | 20.94117 | 15.9962002 | Responder | None |
| Placebo | 62.23787 | 12.42020 | 3.7548537 | Non-responder | None |
| Placebo | 83.70450 | 16.07554 | 0.8025235 | Non-responder | None |
| Placebo | 65.84610 | 17.95126 | 3.5644771 | Non-responder | Moderate |
| Placebo | 66.55145 | 34.75090 | 12.7150987 | Non-responder | Moderate |
| Placebo | 85.58078 | 14.78440 | 22.0726519 | Non-responder | Mild |
Customizing Summary Statistics
Built-in Summary Options
The package provides several built-in summary statistics optimized for different data types and journal requirements:
cat("**Available Built-in Summary Statistics:**\n\n")Available Built-in Summary Statistics:
cat("- `mean_sd`: Mean +/- SD (default, most common)\n")-
mean_sd: Mean +/- SD (default, most common)
cat("- `median_iqr`: Median [Q1-Q3] (for non-normal data)\n") -
median_iqr: Median [Q1-Q3] (for non-normal data)
cat("- `median_range`: Median (min-max) (for small samples)\n")-
median_range: Median (min-max) (for small samples)
cat("- `mean_se`: Mean +/- SE (for experimental data)\n")-
mean_se: Mean +/- SE (for experimental data)
cat("- `mean_ci`: Mean (95% CI) (for effect estimates)\n\n")-
mean_ci: Mean (95% CI) (for effect estimates)
Comparison of Built-in Summaries
Let’s see how different summary statistics present the same data:
cat("### Mean +/- SD (Default Clinical Standard)\n")Mean +/- SD (Default Clinical Standard)
create_table(
treatment ~ age + efficacy_score + biomarker_level,
data = clinical_data,
numeric_summary = "mean_sd",
pvalue = TRUE,
theme = "nejm"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| age | 67.1 ± 12.7 | 62.9 ± 9.6 | 65.5 ± 10.8 | 0.094 |
| efficacy_score | 35.1 ± 6.5 | 28.1 ± 8.9 | 19.2 ± 8.4 | 0 |
| biomarker_level | 8 ± 7.5 | 10.4 ± 12.1 | 9.2 ± 8 | 0.259 |
cat("\n### Median [IQR] (For Non-Normal Data)\n")Median [IQR] (For Non-Normal Data)
create_table(
treatment ~ age + efficacy_score + biomarker_level,
data = clinical_data,
numeric_summary = "median_iqr",
pvalue = TRUE,
theme = "nejm"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| age | 66.2 [57.9-74.9] | 62.4 [57.5-67.8] | 66.1 [58.3-73.3] | 0.094 |
| efficacy_score | 33.4 [30-38.9] | 27 [22.2-32.1] | 18.5 [12.4-25.5] | 0 |
| biomarker_level | 5.3 [3.2-9.9] | 5.6 [1.7-15] | 7.4 [3.4-11.6] | 0.259 |
cat("\n### Median (Range) (For Small Samples)\n")Median (Range) (For Small Samples)
create_table(
treatment ~ age + efficacy_score + biomarker_level,
data = clinical_data,
numeric_summary = "median_range",
pvalue = TRUE,
theme = "nejm"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| age | 66.2 (42-99.8) | 62.4 (39.9-84.7) | 66.1 (41.4-86.4) | 0.094 |
| efficacy_score | 33.4 (25.8-50.4) | 27 (16.2-57.2) | 18.5 (3.6-36.8) | 0 |
| biomarker_level | 5.3 (0-32.3) | 5.6 (0.2-45.5) | 7.4 (0.1-30.4) | 0.259 |
cat("\n### Mean +/- SE (For Experimental Data)\n")Mean +/- SE (For Experimental Data)
create_table(
treatment ~ age + efficacy_score,
data = clinical_data,
numeric_summary = "mean_se",
pvalue = TRUE,
theme = "nejm"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| age | 67.1 +/- 2 | 62.9 +/- 1.5 | 65.5 +/- 1.7 | 0.094 |
| efficacy_score | 35.1 +/- 1 | 28.1 +/- 1.4 | 19.2 +/- 1.3 | 0 |
Custom Summary Functions
You can create custom summary functions for specialized requirements:
# Example 1: Bootstrap confidence intervals
bootstrap_ci_summary <- function(x) {
if (all(is.na(x))) return("N/A")
# Bootstrap 95% CI for mean
set.seed(42) # For reproducibility in vignette
n_boot <- 1000
boot_means <- replicate(n_boot, {
sample_data <- sample(x[!is.na(x)], replace = TRUE)
mean(sample_data)
})
mean_est <- round(mean(x, na.rm = TRUE), 1)
ci_lower <- round(quantile(boot_means, 0.025), 1)
ci_upper <- round(quantile(boot_means, 0.975), 1)
paste0(mean_est, " (", ci_lower, "-", ci_upper, ")")
}
# Example 2: Robust statistics (median with MAD)
robust_summary <- function(x) {
if (all(is.na(x))) return("N/A")
med <- round(median(x, na.rm = TRUE), 1)
mad_val <- round(mad(x, na.rm = TRUE), 1)
paste0(med, " [+/-", mad_val, "]")
}
# Example 3: Multi-line detailed summary
detailed_summary <- function(x) {
if (all(is.na(x))) return("N/A")
mean_val <- round(mean(x, na.rm = TRUE), 1)
median_val <- round(median(x, na.rm = TRUE), 1)
sd_val <- round(sd(x, na.rm = TRUE), 1)
paste0(mean_val, " +/- ", sd_val, "\n", "(median: ", median_val, ")")
}
cat("### Custom Summary: Bootstrap 95% CI\n")Custom Summary: Bootstrap 95% CI
create_table(
treatment ~ efficacy_score,
data = clinical_data,
numeric_summary = bootstrap_ci_summary,
pvalue = TRUE,
theme = "jama"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| efficacy_score | 35.1 (33.2-37) | 28.1 (25.8-30.9) | 19.2 (16.7-21.8) | 0 |
cat("\n### Custom Summary: Robust Statistics (Median +/- MAD)\n")Custom Summary: Robust Statistics (Median +/- MAD)
create_table(
treatment ~ biomarker_level,
data = clinical_data,
numeric_summary = robust_summary,
pvalue = TRUE,
theme = "jama"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| biomarker_level | 5.3 [+/-5] | 5.6 [+/-7.1] | 7.4 [+/-6.1] | 0.259 |
cat("\n### Custom Summary: Multi-line Detailed\n")Custom Summary: Multi-line Detailed
create_table(
treatment ~ efficacy_score,
data = clinical_data,
numeric_summary = detailed_summary,
pvalue = TRUE,
theme = "console"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| efficacy_score | 35.1 +/- 6.5 (median: 33.4) | 28.1 +/- 8.9 (median: 27) | 19.2 +/- 8.4 (median: 18.5) | 0 |
Customizing Statistical Tests
Available Statistical Tests
The package supports multiple statistical tests appropriate for different data distributions and study designs:
cat("**Continuous Variable Tests:**\n")Continuous Variable Tests:
cat("- `ttest`: Linear model t-test (default, robust for multiple groups)\n")-
ttest: Linear model t-test (default, robust for multiple groups)
cat("- `anova`: Traditional ANOVA F-test\n") -
anova: Traditional ANOVA F-test
cat("- `welch`: Welch's t-test (unequal variances, two groups only)\n")-
welch: Welch’s t-test (unequal variances, two groups only)
cat("- `kruskal`: Kruskal-Wallis test (non-parametric)\n\n")-
kruskal: Kruskal-Wallis test (non-parametric)
cat("**Categorical Variable Tests:**\n")Categorical Variable Tests:
cat("- `fisher`: Fisher's exact test (default, conservative)\n")-
fisher: Fisher’s exact test (default, conservative)
cat("- `chisq`: Chi-square test (requires adequate cell counts)\n\n")-
chisq: Chi-square test (requires adequate cell counts)
Comparing Different Statistical Tests
Let’s demonstrate how different tests can give different p-values for the same data:
cat("### Default Tests (ttest + fisher)\n")Default Tests (ttest + fisher)
create_table(
treatment ~ efficacy_score + response + safety_grade,
data = clinical_data,
pvalue = TRUE,
theme = "console"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| efficacy_score | 35.1 (6.5) | 28.1 (8.9) | 19.2 (8.4) | 0 |
| response | ||||
| Non-responder | 8 (20%) | 18 (45%) | 31 (78%) | 0 |
| Responder | 32 (80%) | 22 (55%) | 9 (22%) | |
| safety_grade | ||||
| None | 11 (28%) | 17 (42%) | 19 (48%) | 0.019 |
| Mild | 14 (35%) | 17 (42%) | 15 (38%) | |
| Moderate | 15 (38%) | 4 (10%) | 4 (10%) | |
| Severe | 0 (0%) | 2 (5%) | 2 (5%) |
cat("\n### ANOVA + Chi-square\n")ANOVA + Chi-square
create_table(
treatment ~ efficacy_score + response + safety_grade,
data = clinical_data,
pvalue = TRUE,
continuous_test = "anova",
categorical_test = "chisq",
theme = "console"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| efficacy_score | 35.1 (6.5) | 28.1 (8.9) | 19.2 (8.4) | 0 |
| response | ||||
| Non-responder | 8 (20%) | 18 (45%) | 31 (78%) | 0 |
| Responder | 32 (80%) | 22 (55%) | 9 (22%) | |
| safety_grade | ||||
| None | 11 (28%) | 17 (42%) | 19 (48%) | 0.019 |
| Mild | 14 (35%) | 17 (42%) | 15 (38%) | |
| Moderate | 15 (38%) | 4 (10%) | 4 (10%) | |
| Severe | 0 (0%) | 2 (5%) | 2 (5%) |
cat("\n### Non-parametric Approach (Kruskal-Wallis + Fisher)\n")Non-parametric Approach (Kruskal-Wallis + Fisher)
create_table(
treatment ~ efficacy_score + biomarker_level + response,
data = clinical_data,
pvalue = TRUE,
continuous_test = "kruskal",
categorical_test = "fisher",
theme = "console"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|---|
| efficacy_score | 35.1 (6.5) | 28.1 (8.9) | 19.2 (8.4) | 0 |
| biomarker_level | 8 (7.5) | 10.4 (12.1) | 9.2 (8) | 0.822 |
| response | ||||
| Non-responder | 8 (20%) | 18 (45%) | 31 (78%) | 0 |
| Responder | 32 (80%) | 22 (55%) | 9 (22%) |
Manual Verification of Test Results
Let’s manually verify the different test results:
cat("**Manual Statistical Test Verification:**\n\n")Manual Statistical Test Verification:
# Test efficacy_score with different methods
lm_result <- lm(efficacy_score ~ treatment, data = clinical_data)
aov_result <- aov(efficacy_score ~ treatment, data = clinical_data)
kw_result <- kruskal.test(efficacy_score ~ treatment, data = clinical_data)
cat("Efficacy Score Tests:\n")Efficacy Score Tests:
- Linear model p-value: 2e-04
- ANOVA p-value: 0
- Kruskal-Wallis p-value: 0
# Test categorical data
response_table <- table(clinical_data$treatment, clinical_data$response)
fisher_result <- fisher.test(response_table)
chisq_result <- chisq.test(response_table)
cat("Response Rate Tests:\n")Response Rate Tests:
print(response_table) Non-responder Responder
High Dose 8 32 Low Dose 18 22 Placebo 31 9
- Fisher’s exact p-value: 0
- Chi-square p-value: 0
Two-Group Comparisons
For two-group studies, Welch’s t-test is often preferred when variances are unequal:
# Create two-group subset
two_group_data <- clinical_data[clinical_data$treatment %in% c("Placebo", "High Dose"), ]
two_group_data$treatment <- factor(two_group_data$treatment)
cat("### Two-Group Comparison: Welch's t-test vs Standard t-test\n")Two-Group Comparison: Welch’s t-test vs Standard t-test
cat("**Welch's t-test (unequal variances assumed):**\n")Welch’s t-test (unequal variances assumed):
create_table(
treatment ~ age + efficacy_score + biomarker_level,
data = two_group_data,
pvalue = TRUE,
continuous_test = "welch",
theme = "nejm"
)| Variable |
High Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|
| age | 67.1 ± 12.7 | 65.5 ± 10.8 | 0.551 |
| efficacy_score | 35.1 ± 6.5 | 19.2 ± 8.4 | 0 |
| biomarker_level | 8 ± 7.5 | 9.2 ± 8 | 0.501 |
cat("\n**Standard linear model t-test:**\n")Standard linear model t-test:
create_table(
treatment ~ age + efficacy_score + biomarker_level,
data = two_group_data,
pvalue = TRUE,
continuous_test = "ttest",
theme = "nejm"
)| Variable |
High Dose (N=40) |
Placebo (N=40) |
P value |
|---|---|---|---|
| age | 67.1 ± 12.7 | 65.5 ± 10.8 | 0.551 |
| efficacy_score | 35.1 ± 6.5 | 19.2 ± 8.4 | 0 |
| biomarker_level | 8 ± 7.5 | 9.2 ± 8 | 0.501 |
# Manual verification
cat("\n**Manual Verification for Efficacy Score:**\n")Manual Verification for Efficacy Score:
welch_test <- t.test(efficacy_score ~ treatment, data = two_group_data, var.equal = FALSE)
standard_test <- t.test(efficacy_score ~ treatment, data = two_group_data, var.equal = TRUE)
cat("- Welch's t-test p-value: ", round(welch_test$p.value, 4), "\n")- Welch’s t-test p-value: 0
- Standard t-test p-value: 0
Real-World Clinical Scenarios
Scenario 1: Dose-Escalation Study
For dose-escalation studies, you might want non-parametric tests and robust summaries:
# Simulate dose-escalation data with safety focus
set.seed(456)
dose_data <- data.frame(
dose_level = factor(c("Cohort 1 (1mg)", "Cohort 2 (3mg)", "Cohort 3 (10mg)", "Cohort 4 (30mg)"),
levels = c("Cohort 1 (1mg)", "Cohort 2 (3mg)", "Cohort 3 (10mg)", "Cohort 4 (30mg)")),
# Efficacy increases with dose
efficacy = c(rnorm(8, 15, 5), rnorm(8, 25, 6), rnorm(8, 35, 8), rnorm(8, 40, 10)),
# Safety events increase with dose
dlt_grade = c(rexp(8, 2), rexp(8, 1.5), rexp(8, 1), rexp(8, 0.8)),
# Binary DLT outcome
dlt = factor(c(
sample(c("Yes", "No"), 8, replace = TRUE, prob = c(0.1, 0.9)),
sample(c("Yes", "No"), 8, replace = TRUE, prob = c(0.2, 0.8)),
sample(c("Yes", "No"), 8, replace = TRUE, prob = c(0.4, 0.6)),
sample(c("Yes", "No"), 8, replace = TRUE, prob = c(0.6, 0.4))
))
)
dose_footnotes <- list(
variables = list(
efficacy = "Efficacy endpoint measured on 0-50 scale",
dlt_grade = "Dose-limiting toxicity severity score",
dlt = "Dose-limiting toxicity occurrence (binary)"
),
general = c(
"Phase I dose-escalation study with 3+3 design",
"Non-parametric tests used due to small sample sizes",
"Median with IQR preferred for safety data"
)
)
cat("### Phase I Dose-Escalation Study Analysis\n")Phase I Dose-Escalation Study Analysis
create_table(
dose_level ~ efficacy + dlt_grade + dlt,
data = dose_data,
numeric_summary = "median_iqr",
continuous_test = "kruskal",
categorical_test = "fisher",
pvalue = TRUE,
footnotes = dose_footnotes,
theme = "nejm"
)| Variable |
Cohort 1 (1mg) (N=8) |
Cohort 2 (3mg) (N=8) |
Cohort 3 (10mg) (N=8) |
Cohort 4 (30mg) (N=8) |
P value |
|---|---|---|---|---|---|
| efficacy* | 31.1 [26.1-42] | 28.8 [20.5-35.7] | 21.5 [18.9-34.9] | 35.4 [28.7-36.7] | 0.893 |
| dlt_grade† | 0.5 [0.3-1] | 0.5 [0.1-0.6] | 0.6 [0.1-0.7] | 1 [0.6-1.2] | 0.316 |
| dlt – no. (%)‡ | 5 (62.5) | 1 (12.5) | 3 (37.5) | 2 (25) | 0.258 |
|
* Efficacy endpoint measured on 0-50 scale † Dose-limiting toxicity severity score ‡ Dose-limiting toxicity occurrence (binary) • Phase I dose-escalation study with 3+3 design • Non-parametric tests used due to small sample sizes • Median with IQR preferred for safety data |
|||||
Scenario 2: Bioequivalence Study
For bioequivalence studies, you might prefer confidence intervals and specific tests:
# Simulate crossover bioequivalence data
set.seed(789)
be_data <- data.frame(
formulation = factor(rep(c("Reference", "Test"), each = 24)),
# Primary PK parameters
cmax = c(
rnorm(24, mean = 100, sd = 20), # Reference
rnorm(24, mean = 105, sd = 18) # Test (slight difference)
),
auc = c(
rnorm(24, mean = 500, sd = 80), # Reference
rnorm(24, mean = 495, sd = 75) # Test
),
tmax = c(
rexp(24, rate = 0.5) + 1, # Reference (non-normal)
rexp(24, rate = 0.6) + 1 # Test
)
)
# Custom summary for bioequivalence (geometric mean +/- %CV)
geometric_mean_summary <- function(x) {
if (all(is.na(x))) return("N/A")
# Remove zeros and negative values for log transformation
x_pos <- x[x > 0 & !is.na(x)]
if (length(x_pos) == 0) return("N/A")
geom_mean <- round(exp(mean(log(x_pos))), 1)
cv_percent <- round(100 * sqrt(exp(sd(log(x_pos))^2) - 1), 1)
paste0(geom_mean, " (", cv_percent, "%CV)")
}
be_footnotes <- list(
variables = list(
cmax = "Maximum plasma concentration (ng/mL)",
auc = "Area under concentration-time curve (ng*h/mL)",
tmax = "Time to maximum concentration (hours)"
),
general = c(
"Randomized crossover bioequivalence study",
"Geometric mean and %CV shown for PK parameters",
"Non-parametric tests used for tmax (non-normal distribution)"
)
)
cat("### Bioequivalence Study Analysis\n")Bioequivalence Study Analysis
create_table(
formulation ~ cmax + auc,
data = be_data,
numeric_summary = geometric_mean_summary,
continuous_test = "welch",
pvalue = TRUE,
footnotes = be_footnotes,
theme = "jama"
)| Variable |
Reference (N=24) |
Test (N=24) |
P value |
|---|---|---|---|
| cmax¹ | 93.3 (16.6%CV) | 101.3 (21.8%CV) | 0.077 |
| auc² | 515 (14.1%CV) | 479.7 (18.6%CV) | 0.162 |
|
1 Maximum plasma concentration (ng/mL) 2 Area under concentration-time curve (ng*h/mL) • Randomized crossover bioequivalence study • Geometric mean and %CV shown for PK parameters • Non-parametric tests used for tmax (non-normal distribution) |
|||
cat("\n### Non-parametric Analysis for Tmax\n")Scenario 3: Multi-center Trial
For large multi-center trials, you might use different approaches:
# Simulate multi-center data
set.seed(101112)
center_data <- data.frame(
center = factor(paste("Center", rep(1:4, each = 30))),
treatment = factor(rep(rep(c("Active", "Control"), each = 15), 4)),
# Primary endpoint with center effects
primary_endpoint = c(
# Center 1
rnorm(15, mean = 75, sd = 12), rnorm(15, mean = 65, sd = 15),
# Center 2
rnorm(15, mean = 78, sd = 10), rnorm(15, mean = 68, sd = 12),
# Center 3
rnorm(15, mean = 72, sd = 14), rnorm(15, mean = 62, sd = 18),
# Center 4
rnorm(15, mean = 76, sd = 11), rnorm(15, mean = 66, sd = 13)
),
# Secondary binary endpoint
response = factor(c(
# Center 1: Active vs Control
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.7, 0.3)),
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.4, 0.6)),
# Center 2
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.75, 0.25)),
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.35, 0.65)),
# Center 3
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.65, 0.35)),
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.45, 0.55)),
# Center 4
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.8, 0.2)),
sample(c("Success", "Failure"), 15, replace = TRUE, prob = c(0.3, 0.7))
))
)
multicenter_footnotes <- list(
variables = list(
primary_endpoint = "Primary efficacy endpoint (0-100 scale)",
response = "Binary treatment response (success/failure)"
),
general = c(
"Multi-center randomized controlled trial",
"ANOVA used to account for treatment and center effects",
"Chi-square test for categorical outcomes (adequate sample size)"
)
)
cat("### Multi-center Trial: Overall Treatment Comparison\n")Multi-center Trial: Overall Treatment Comparison
create_table(
treatment ~ primary_endpoint + response,
data = center_data,
continuous_test = "anova", # ANOVA for multi-center
categorical_test = "chisq", # Chi-square for large samples
pvalue = TRUE,
totals = TRUE,
footnotes = multicenter_footnotes,
theme = "lancet"
)| Variable |
Active (N=60) |
Control (N=60) |
Total (N=120) |
P value |
|---|---|---|---|---|
| primary_endpoint¹ | 76.1 (12.5) | 64.7 (16.9) | 70.4 (15.9) | 0 |
| response² | ||||
| Failure | 15 (25%) | 30 (50%) | 45 (38%) | 0.008 |
| Success | 45 (75%) | 30 (50%) | 75 (62%) | |
|
1 Primary efficacy endpoint (0-100 scale) 2 Binary treatment response (success/failure) • Multi-center randomized controlled trial • ANOVA used to account for treatment and center effects • Chi-square test for categorical outcomes (adequate sample size) |
||||
cat("\n### Multi-center Trial: Stratified by Center\n")Multi-center Trial: Stratified by Center
create_table(
treatment ~ primary_endpoint + response,
data = center_data,
strata = "center",
continuous_test = "anova",
categorical_test = "chisq",
pvalue = TRUE,
theme = "lancet"
)| Variable |
Active (N=60) |
Control (N=60) |
P value |
|---|---|---|---|
| Center: Center 1 | |||
| primary_endpoint | 77.3 (14.7) | 71.5 (10.9) | 0 |
| response | |||
| Failure | 1 (6.7%) | 8 (53.3%) | 0.008 |
| Success | 14 (93.3%) | 7 (46.7%) | |
| Center: Center 2 | |||
| primary_endpoint | 76.1 (11.2) | 71.4 (15) | 0 |
| response | |||
| Failure | 2 (13.3%) | 6 (40%) | 0.008 |
| Success | 13 (86.7%) | 9 (60%) | |
| Center: Center 3 | |||
| primary_endpoint | 74.9 (15.5) | 55.5 (22.2) | 0 |
| response | |||
| Failure | 3 (20%) | 7 (46.7%) | 0.008 |
| Success | 12 (80%) | 8 (53.3%) | |
| Center: Center 4 | |||
| primary_endpoint | 76.2 (8.4) | 60.4 (12.5) | 0 |
| response | |||
| Failure | 9 (60%) | 9 (60%) | 0.008 |
| Success | 6 (40%) | 6 (40%) |
Best Practice Guidelines
Choosing Summary Statistics
cat("### Summary Statistic Selection Guidelines:\n\n")
#> ### Summary Statistic Selection Guidelines:
cat("**Mean +/- SD:** Use when data is approximately normal and for most clinical trials\n")
#> **Mean +/- SD:** Use when data is approximately normal and for most clinical trials
cat("- Standard for continuous endpoints in medical literature\n")
#> - Standard for continuous endpoints in medical literature
cat("- Allows readers to assess both central tendency and variability\n\n")
#> - Allows readers to assess both central tendency and variability
cat("**Median [IQR]:** Use for non-normal data or when outliers are present\n")
#> **Median [IQR]:** Use for non-normal data or when outliers are present
cat("- Biomarker levels, time-to-event data, cost data\n")
#> - Biomarker levels, time-to-event data, cost data
cat("- More robust to extreme values\n\n")
#> - More robust to extreme values
cat("**Median (range):** Use for small samples or ordinal data\n")
#> **Median (range):** Use for small samples or ordinal data
cat("- Phase I studies with small cohorts\n")
#> - Phase I studies with small cohorts
cat("- When full range of values is important\n\n")
#> - When full range of values is important
cat("**Mean +/- SE:** Use when emphasizing precision of the estimate\n")
#> **Mean +/- SE:** Use when emphasizing precision of the estimate
cat("- Experimental studies where precision matters\n")
#> - Experimental studies where precision matters
cat("- Generally not recommended for descriptive Table 1\n\n")
#> - Generally not recommended for descriptive Table 1
cat("### Statistical Test Selection Guidelines:\n\n")
#> ### Statistical Test Selection Guidelines:
cat("**Continuous Variables:**\n")
#> **Continuous Variables:**
cat("- `ttest` (default): Robust, works well for most scenarios\n")
#> - `ttest` (default): Robust, works well for most scenarios
cat("- `anova`: Traditional choice, equivalent to ttest for multiple groups\n")
#> - `anova`: Traditional choice, equivalent to ttest for multiple groups
cat("- `welch`: Two groups with potentially unequal variances\n")
#> - `welch`: Two groups with potentially unequal variances
cat("- `kruskal`: Non-parametric alternative for non-normal data\n\n")
#> - `kruskal`: Non-parametric alternative for non-normal data
cat("**Categorical Variables:**\n")
#> **Categorical Variables:**
cat("- `fisher` (default): Conservative, exact test, works with small samples\n")
#> - `fisher` (default): Conservative, exact test, works with small samples
cat("- `chisq`: Requires adequate cell counts (rule of thumb: all cells >= 5)\n\n")
#> - `chisq`: Requires adequate cell counts (rule of thumb: all cells >= 5)Theme Integration
All customization options work seamlessly with medical journal themes:
cat("### NEJM Theme with Custom Bootstrap CI Summary\n")NEJM Theme with Custom Bootstrap CI Summary
create_table(
treatment ~ efficacy_score + response,
data = clinical_data,
numeric_summary = bootstrap_ci_summary,
continuous_test = "anova",
categorical_test = "fisher",
pvalue = TRUE,
totals = TRUE,
theme = "nejm"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
Total (N=120) |
P value |
|---|---|---|---|---|---|
| efficacy_score | 35.1 (33.2-37) | 28.1 (25.8-30.9) | 19.2 (16.7-21.8) | 27.5 (25.7-29.3) | 0 |
| response – no. (%) | 32 (80) | 22 (55) | 9 (22.5) | 63 (52.5) | 0 |
cat("\n### JAMA Theme with Robust Statistics\n")JAMA Theme with Robust Statistics
create_table(
treatment ~ biomarker_level + safety_grade,
data = clinical_data,
numeric_summary = robust_summary,
continuous_test = "kruskal",
categorical_test = "chisq",
pvalue = TRUE,
totals = TRUE,
theme = "jama"
)| Variable |
High Dose (N=40) |
Low Dose (N=40) |
Placebo (N=40) |
Total (N=120) |
P value |
|---|---|---|---|---|---|
| biomarker_level | 5.3 [+/-5] | 5.6 [+/-7.1] | 7.4 [+/-6.1] | 6.3 [+/-6.8] | 0.822 |
| safety_grade | |||||
| None | 11 (28%) | 17 (42%) | 19 (48%) | 47 (39%) | 0.019 |
| Mild | 14 (35%) | 17 (42%) | 15 (38%) | 46 (38%) | |
| Moderate | 15 (38%) | 4 (10%) | 4 (10%) | 23 (19%) | |
| Severe | 0 (0%) | 2 (5%) | 2 (5%) | 4 (3%) |
Conclusion
The zztable1 package provides comprehensive
customization options that allow you to:
- Adapt to different data types using appropriate summary statistics
- Choose statistically appropriate tests for your study design
- Maintain journal formatting standards across all customizations
- Handle complex study designs like dose-escalation and multi-center trials
Key Features Demonstrated:
- Built-in summaries: mean_sd, median_iqr, median_range, mean_se, mean_ci
- Custom summary functions: Bootstrap CI, robust statistics, multi-line summaries
- Statistical tests: ttest, anova, welch, kruskal for continuous; fisher, chisq for categorical
- Real-world scenarios: Dose-escalation, bioequivalence, multi-center studies
- Theme integration: All customizations work with NEJM, JAMA, Lancet themes
The consistent parameter interface (numeric_summary,
continuous_test, categorical_test) makes it
easy to standardize analyses across studies while maintaining the
flexibility to adapt to specific requirements.
Package Features Demonstrated:
- Flexible summary statistics with custom function support
- Comprehensive statistical test options
- Medical journal theme integration
- Real-world clinical trial scenarios
- Best practice guidelines for method selection
- Seamless parameter integration across all output formats