
Stratified Table 1 Analysis Examples
zztable1
2026-05-02
Source: vignettes/stratified_examples.Rmd
Introduction
Stratified analysis is a crucial component of epidemiological and
clinical research. This vignette demonstrates how to create stratified
Table 1 analyses using the zztable1 package. Stratified
tables allow you to examine patterns within subgroups of your data,
revealing important differences that might be obscured in overall
analyses.
Clinical Trial Data Example
Let’s create a comprehensive clinical trial dataset that includes multiple potential stratification variables.
# Create a realistic multi-center clinical trial dataset
set.seed(42)
n <- 300
clinical_data <- data.frame(
# Primary treatment variable
treatment = factor(
sample(c("Placebo", "Low Dose", "High Dose"), n, replace = TRUE,
prob = c(0.4, 0.3, 0.3)),
levels = c("Placebo", "Low Dose", "High Dose")
),
# Potential stratification variables
site = factor(sample(paste("Site", LETTERS[1:4]), n, replace = TRUE)),
sex = factor(sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.55, 0.45))),
age_group = factor(
sample(c("18-44", "45-64", "65+"), n, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
levels = c("18-44", "45-64", "65+")
),
disease_severity = factor(
sample(c("Mild", "Moderate", "Severe"), n, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
levels = c("Mild", "Moderate", "Severe")
),
# Baseline characteristics
age = round(rnorm(n, 58, 15)),
bmi = round(rnorm(n, 26.5, 4.2), 1),
systolic_bp = round(rnorm(n, 135, 18)),
# Comorbidities
diabetes = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.75, 0.25))),
hypertension = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.65, 0.35))),
# Lab values
hemoglobin = round(rnorm(n, 13.2, 1.8), 1),
creatinine = round(rnorm(n, 1.1, 0.3), 2)
)
# Add some realistic missing values
clinical_data$bmi[sample(1:n, 8)] <- NA
clinical_data$hemoglobin[sample(1:n, 5)] <- NA
clinical_data$creatinine[sample(1:n, 3)] <- NA
# Show dataset structure
str(clinical_data)
'data.frame':   300 obs. of  12 variables:
 $ treatment       : Factor w/ 3 levels "Placebo","Low Dose",..: 2 2 1 2 3 3 2 1 3 2 ...
 $ site            : Factor w/ 4 levels "Site A","Site B",..: 1 4 2 4 1 1 2 3 1 2 ...
 $ sex             : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 2 1 2 2 2 ...
 $ age_group       : Factor w/ 3 levels "18-44","45-64",..: 2 2 3 2 3 1 2 2 3 2 ...
 $ disease_severity: Factor w/ 3 levels "Mild","Moderate",..: 1 1 1 2 3 3 1 1 2 1 ...
 $ age             : num  34 29 53 61 24 62 64 77 60 61 ...
 $ bmi             : num  27.5 28.3 26.5 29.1 28.3 28.8 28.6 33.5 22.1 32.8 ...
 $ systolic_bp     : num  128 132 148 126 144 145 117 134 125 143 ...
 $ diabetes        : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ hypertension    : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 2 1 1 ...
 $ hemoglobin      : num  13.9 12.4 13.4 12 14.6 10.1 13.5 13.2 15.1 12.8 ...
 $ creatinine      : num  0.94 1.54 1.23 0.79 0.71 1.17 1.01 1.07 1.05 1.3 ...
head(clinical_data, 10)
   treatment   site    sex age_group disease_severity age  bmi systolic_bp
1   Low Dose Site A   Male     45-64             Mild  34 27.5         128
2   Low Dose Site D   Male     45-64             Mild  29 28.3         132
3    Placebo Site B Female       65+             Mild  53 26.5         148
4   Low Dose Site D   Male     45-64         Moderate  61 29.1         126
5  High Dose Site A   Male       65+           Severe  24 28.3         144
6  High Dose Site A   Male     18-44           Severe  62 28.8         145
7   Low Dose Site B Female     45-64             Mild  64 28.6         117
8    Placebo Site C   Male     45-64             Mild  77 33.5         134
9  High Dose Site A   Male       65+         Moderate  60 22.1         125
10  Low Dose Site B   Male     45-64             Mild  61 32.8         143
   diabetes hypertension hemoglobin creatinine
1        No           No       13.9       0.94
2        No           No       12.4       1.54
3        No           No       13.4       1.23
4       Yes          Yes       12.0       0.79
5        No           No       14.6       0.71
6        No           No       10.1       1.17
7        No           No       13.5       1.01
8        No          Yes       13.2       1.07
9        No           No       15.1       1.05
10       No           No       12.8       1.30
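In the simulated data, `age` and `age_group` are drawn independently, so the two can disagree (row 1 above has `age` 34 but `age_group` "45-64"). If a consistent pair is wanted, one option is to derive the group from the numeric age with base R's `cut()`. The sketch below uses the ten ages printed above; the break points are an assumption chosen to match the three labels.

```r
# Ages from the first ten rows printed above
age <- c(34, 29, 53, 61, 24, 62, 64, 77, 60, 61)

# Left-closed intervals [18, 45), [45, 65), [65, Inf).
# Ages below 18 fall outside the breaks and become NA.
age_group <- cut(
  age,
  breaks = c(18, 45, 65, Inf),
  labels = c("18-44", "45-64", "65+"),
  right = FALSE
)

# Tabulate the derived groups
table(age_group)
```

With `right = FALSE`, an age of exactly 45 or 65 lands in the higher group, which matches the label boundaries.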
Stratified Analysis Examples
Example 1: Stratified by Study Site
Understanding how baseline characteristics vary across different study sites is crucial for multi-center trials.
create_table(
treatment ~ age + sex + bmi + diabetes + systolic_bp,
data = clinical_data,
strata = "site",
theme = "nejm",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Site: Site A | |||||
| age | 57 ± 13 | 57.2 ± 19.7 | 61.6 ± 17.1 | NaN ± NA | 0.822 |
| sex – no. (%) | 75 (267.9) | 52 (247.6) | 45 (160.7) | 172 (223.4) | 0.558 |
| bmi | 26.9 ± 3.6 | 26.9 ± 4.4 | 26.4 ± 4.8 | NaN ± NA | 0.746 |
| diabetes – no. (%) | 36 (128.6) | 29 (138.1) | 25 (89.3) | 90 (116.9) | 0.987 |
| systolic_bp | 138.9 ± 14.8 | 134 ± 17.3 | 134.1 ± 14.7 | NaN ± NA | 0.724 |
| Site: Site D | |||||
| age | 62.5 ± 16.1 | 60.6 ± 14.9 | 56.3 ± 12.6 | NaN ± NA | 0.822 |
| sex – no. (%) | 75 (340.9) | 52 (371.4) | 45 (281.2) | 172 (330.8) | 0.558 |
| bmi | 26.9 ± 3.5 | 27.6 ± 5.4 | 25.9 ± 3 | NaN ± NA | 0.746 |
| diabetes – no. (%) | 36 (163.6) | 29 (207.1) | 25 (156.2) | 90 (173.1) | 0.987 |
| systolic_bp | 131.9 ± 16.7 | 136 ± 18.9 | 135.8 ± 12.6 | NaN ± NA | 0.724 |
| Site: Site B | |||||
| age | 54.7 ± 16.6 | 54.2 ± 12.7 | 56 ± 19.3 | NaN ± NA | 0.822 |
| sex – no. (%) | 75 (182.9) | 52 (152.9) | 45 (214.3) | 172 (179.2) | 0.558 |
| bmi | 27.1 ± 4.2 | 26.2 ± 4.2 | 27.2 ± 4.4 | NaN ± NA | 0.746 |
| diabetes – no. (%) | 36 (87.8) | 29 (85.3) | 25 (119) | 90 (93.8) | 0.987 |
| systolic_bp | 135.6 ± 18.2 | 131.4 ± 18.7 | 134.2 ± 21.7 | NaN ± NA | 0.724 |
| Site: Site C | |||||
| age | 58.2 ± 18.3 | 58.7 ± 16.4 | 61 ± 14.3 | NaN ± NA | 0.822 |
| sex – no. (%) | 75 (234.4) | 52 (208) | 45 (250) | 172 (229.3) | 0.558 |
| bmi | 25.2 ± 4 | 26.9 ± 3.9 | 24.9 ± 4.4 | NaN ± NA | 0.746 |
| diabetes – no. (%) | 36 (112.5) | 29 (116) | 25 (138.9) | 90 (120) | 0.987 |
| systolic_bp | 137.3 ± 15.9 | 141.2 ± 17.7 | 129.2 ± 19.9 | NaN ± NA | 0.724 |
Key Observations:
- Each study site shows as a separate table section
- Within-site treatment group comparisons
- Helps identify site-specific recruitment patterns
- Essential for assessing treatment balance across sites
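Several categorical cells in the table above show percentages over 100, for example `75 (267.9)` for sex at Site A. The count 75 equals the overall number of men in the placebo arm, and 267.9 is 75 divided by 28, the Site A placebo count, so the numerator and denominator appear to come from different subsets. As an independent cross-check, within-site percentages can be computed with base R. The sketch below rebuilds the Site A rows from the site-by-sex counts tabulated later in this vignette; it does not use zztable1, and the object names are illustrative.

```r
# Site A counts per arm, taken from the site-by-sex cross-tabulation:
# Female: 10, 8, 13; Male: 18, 13, 15 (Placebo, Low Dose, High Dose)
arms <- c("Placebo", "Low Dose", "High Dose")
site_a <- data.frame(
  treatment = factor(rep(arms, times = c(28, 21, 28)), levels = arms),
  sex = c(
    rep(c("Female", "Male"), times = c(10, 18)),
    rep(c("Female", "Male"), times = c(8, 13)),
    rep(c("Female", "Male"), times = c(13, 15))
  )
)

# Counts and column percentages (each treatment arm sums to 100)
counts <- table(site_a$sex, site_a$treatment)
pct <- round(100 * prop.table(counts, margin = 2), 1)

# Percentage of men per arm within Site A
pct_male <- pct["Male", ]
pct_male
```

Because each percentage is a count divided by the size of the same site-by-arm cell, no value can exceed 100.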
Example 2: Stratified by Sex
Sex-stratified analysis is important for understanding treatment effects and baseline differences between male and female participants.
create_table(
treatment ~ age + age_group + bmi + diabetes + hypertension + hemoglobin,
data = clinical_data,
strata = "sex",
theme = "lancet",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Sex: Male | |||||
| age | 58.2 (17.7) | 54.4 (16) | 59.6 (15.8) | NaN (NA) | 0.822 |
| age_group | |||||
| 18-44 | 21 (28%) | 13 (25%) | 12 (26.7%) | 0 (0%) | 0.936 |
| 45-64 | 33 (44%) | 25 (48.1%) | 18 (40%) | 0 (0%) | |
| 65+ | 21 (28%) | 14 (26.9%) | 15 (33.3%) | 0 (0%) | |
| bmi | 26.9 (4.1) | 26.9 (4.3) | 26.3 (3.3) | NaN (NA) | 0.746 |
| diabetes | |||||
| No | 50 (66.7%) | 36 (69.2%) | 32 (71.1%) | 0 (0%) | 0.987 |
| Yes | 25 (33.3%) | 16 (30.8%) | 13 (28.9%) | 0 (0%) | |
| hypertension | |||||
| No | 45 (60%) | 31 (59.6%) | 33 (73.3%) | 0 (0%) | 0.336 |
| Yes | 30 (40%) | 21 (40.4%) | 12 (26.7%) | 0 (0%) | |
| hemoglobin | 13.3 (1.7) | 13 (1.8) | 13.7 (1.7) | NaN (NA) | 0.329 |
| Sex: Female | |||||
| age | 56.5 (13.8) | 60.2 (14.9) | 58.4 (17) | NaN (NA) | 0.822 |
| age_group | |||||
| 18-44 | 17 (35.4%) | 12 (28.6%) | 13 (34.2%) | 0 (0%) | 0.932 |
| 45-64 | 16 (33.3%) | 17 (40.5%) | 18 (47.4%) | 0 (0%) | |
| 65+ | 15 (31.2%) | 13 (31%) | 7 (18.4%) | 0 (0%) | |
| bmi | 26.1 (3.6) | 26.5 (4.4) | 26.1 (5.3) | NaN (NA) | 0.746 |
| diabetes | |||||
| No | 37 (77.1%) | 29 (69%) | 26 (68.4%) | 0 (0%) | 0.987 |
| Yes | 11 (22.9%) | 13 (31%) | 12 (31.6%) | 0 (0%) | |
| hypertension | |||||
| No | 27 (56.2%) | 28 (66.7%) | 24 (63.2%) | 0 (0%) | 0.336 |
| Yes | 21 (43.8%) | 14 (33.3%) | 14 (36.8%) | 0 (0%) | |
| hemoglobin | 13.3 (2) | 13.1 (1.9) | 12.4 (1.7) | NaN (NA) | 0.329 |
Key Observations:
- Separate baseline characteristics for males and females
- Age and BMI distributions may differ by sex
- Hemoglobin levels typically differ between sexes
- Treatment allocation balance within each sex
Example 3: Stratified by Disease Severity
Disease severity stratification helps understand how patient characteristics and treatment allocation vary by baseline disease status.
create_table(
treatment ~ age + sex + bmi + systolic_bp + diabetes + hypertension + creatinine,
data = clinical_data,
strata = "disease_severity",
theme = "jama",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Disease_severity: Mild | |||||
| age | 56.9 (15.8) | 57.8 (15.5) | 58.5 (17.2) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 23 (43.4%) | 22 (50%) | 9 (34.6%) | 0 (0%) | 0.558 |
| Male | 30 (56.6%) | 22 (50%) | 17 (65.4%) | 0 (0%) | |
| bmi | 26.7 (3.7) | 26.8 (4.4) | 25.2 (4.7) | NaN (NA) | 0.746 |
| systolic_bp | 137.2 (16.4) | 135.9 (19.3) | 135.7 (19.5) | NaN (NA) | 0.724 |
| diabetes | |||||
| No | 43 (81.1%) | 32 (72.7%) | 15 (57.7%) | 0 (0%) | 0.987 |
| Yes | 10 (18.9%) | 12 (27.3%) | 11 (42.3%) | 0 (0%) | |
| hypertension | |||||
| No | 30 (56.6%) | 26 (59.1%) | 17 (65.4%) | 0 (0%) | 0.336 |
| Yes | 23 (43.4%) | 18 (40.9%) | 9 (34.6%) | 0 (0%) | |
| creatinine | 1.1 (0.3) | 1 (0.4) | 1.1 (0.3) | NaN (NA) | 0.777 |
| Disease_severity: Moderate | |||||
| age | 56.4 (16) | 53.8 (14.8) | 59.2 (16.4) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 20 (39.2%) | 12 (35.3%) | 21 (52.5%) | 0 (0%) | 0.558 |
| Male | 31 (60.8%) | 22 (64.7%) | 19 (47.5%) | 0 (0%) | |
| bmi | 26.7 (4.3) | 26.5 (4.6) | 27 (4.1) | NaN (NA) | 0.746 |
| systolic_bp | 135.9 (17.8) | 134.2 (16) | 133.2 (16.5) | NaN (NA) | 0.724 |
| diabetes | |||||
| No | 31 (60.8%) | 24 (70.6%) | 30 (75%) | 0 (0%) | 0.987 |
| Yes | 20 (39.2%) | 10 (29.4%) | 10 (25%) | 0 (0%) | |
| hypertension | |||||
| No | 35 (68.6%) | 24 (70.6%) | 32 (80%) | 0 (0%) | 0.336 |
| Yes | 16 (31.4%) | 10 (29.4%) | 8 (20%) | 0 (0%) | |
| creatinine | 1.1 (0.3) | 1.2 (0.3) | 1.1 (0.3) | NaN (NA) | 0.777 |
| Disease_severity: Severe | |||||
| age | 62.1 (18.5) | 61.9 (17.5) | 59.5 (15.4) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 5 (26.3%) | 8 (50%) | 8 (47.1%) | 0 (0%) | 0.558 |
| Male | 14 (73.7%) | 8 (50%) | 9 (52.9%) | 0 (0%) | |
| bmi | 26 (3.7) | 27.3 (3.8) | 25.7 (3.9) | NaN (NA) | 0.746 |
| systolic_bp | 133.8 (14) | 136.1 (20.8) | 130.5 (16.6) | NaN (NA) | 0.724 |
| diabetes | |||||
| No | 13 (68.4%) | 9 (56.2%) | 13 (76.5%) | 0 (0%) | 0.987 |
| Yes | 6 (31.6%) | 7 (43.8%) | 4 (23.5%) | 0 (0%) | |
| hypertension | |||||
| No | 7 (36.8%) | 9 (56.2%) | 8 (47.1%) | 0 (0%) | 0.336 |
| Yes | 12 (63.2%) | 7 (43.8%) | 9 (52.9%) | 0 (0%) | |
| creatinine | 1.3 (0.3) | 1.2 (0.3) | 1.2 (0.2) | NaN (NA) | 0.777 |
Key Observations:
- Baseline characteristics across mild, moderate, and severe disease
- Treatment allocation may vary by severity
- Comorbidity prevalence often increases with disease severity
- Important for stratified randomization assessment
Example 4: Stratified by Age Group
Age-group stratified analysis reveals how baseline characteristics and treatment allocation vary across different age ranges.
create_table(
treatment ~ sex + bmi + systolic_bp + diabetes + hypertension + hemoglobin + creatinine,
data = clinical_data,
strata = "age_group",
theme = "nejm",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Age_group: 45-64 | |||||
| sex – no. (%) | 75 (153.1) | 52 (123.8) | 45 (125) | 172 (135.4) | 0.558 |
| bmi | 26.9 ± 3.9 | 25.7 ± 4.5 | 25.5 ± 4.5 | NaN ± NA | 0.746 |
| systolic_bp | 135.7 ± 15.7 | 134.8 ± 19.9 | 131.8 ± 18.5 | NaN ± NA | 0.724 |
| diabetes – no. (%) | 36 (73.5) | 29 (69) | 25 (69.4) | 90 (70.9) | 0.987 |
| hypertension – no. (%) | 51 (104.1) | 35 (83.3) | 26 (72.2) | 112 (88.2) | 0.336 |
| hemoglobin | 13.6 ± 1.7 | 13.1 ± 1.5 | 13.2 ± 1.8 | NaN ± NA | 0.329 |
| creatinine | 1.1 ± 0.3 | 1.2 ± 0.4 | 1 ± 0.3 | NaN ± NA | 0.777 |
| Age_group: 65+ | |||||
| sex – no. (%) | 75 (208.3) | 52 (192.6) | 45 (204.5) | 172 (202.4) | 0.558 |
| bmi | 26.1 ± 4.1 | 28 ± 4.9 | 26.4 ± 3.6 | NaN ± NA | 0.746 |
| systolic_bp | 134.2 ± 15.1 | 133.6 ± 17.3 | 132.5 ± 17.9 | NaN ± NA | 0.724 |
| diabetes – no. (%) | 36 (100) | 29 (107.4) | 25 (113.6) | 90 (105.9) | 0.987 |
| hypertension – no. (%) | 51 (141.7) | 35 (129.6) | 26 (118.2) | 112 (131.8) | 0.336 |
| hemoglobin | 13.1 ± 1.8 | 13 ± 2.1 | 13.3 ± 1.7 | NaN ± NA | 0.329 |
| creatinine | 1.2 ± 0.3 | 1 ± 0.3 | 1.1 ± 0.3 | NaN ± NA | 0.777 |
| Age_group: 18-44 | |||||
| sex – no. (%) | 75 (197.4) | 52 (208) | 45 (180) | 172 (195.5) | 0.558 |
| bmi | 26.6 ± 3.8 | 27.1 ± 3 | 27 ± 4.6 | NaN ± NA | 0.746 |
| systolic_bp | 138.5 ± 19 | 138 ± 16.8 | 136.5 ± 15.4 | NaN ± NA | 0.724 |
| diabetes – no. (%) | 36 (94.7) | 29 (116) | 25 (100) | 90 (102.3) | 0.987 |
| hypertension – no. (%) | 51 (134.2) | 35 (140) | 26 (104) | 112 (127.3) | 0.336 |
| hemoglobin | 13.1 ± 1.9 | 13 ± 2.1 | 12.7 ± 1.9 | NaN ± NA | 0.329 |
| creatinine | 1.1 ± 0.3 | 1.1 ± 0.2 | 1.2 ± 0.3 | NaN ± NA | 0.777 |
Key Observations:
- Younger vs middle-aged vs older participants
- Comorbidity prevalence increases with age
- Lab values may vary by age group
- Treatment allocation balance across age groups
Advanced Stratified Analysis
Multiple Variables with Missing Data
Let’s examine how stratified analysis handles missing data and multiple variable types.
create_table(
treatment ~ age + bmi + hemoglobin + creatinine + diabetes + hypertension,
data = clinical_data,
strata = "sex",
theme = "lancet",
missing = TRUE, # Show missing value patterns
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Sex: Male | |||||
| age | 58.2 (17.7) | 54.4 (16) | 59.6 (15.8) | NaN (NA) | 0.822 |
| bmi | 26.9 (4.1) | 26.9 (4.3) | 26.3 (3.3) | NaN (NA) | 0.746 |
| hemoglobin | 13.3 (1.7) | 13 (1.8) | 13.7 (1.7) | NaN (NA) | 0.329 |
| creatinine | 1.1 (0.3) | 1.2 (0.3) | 1.1 (0.3) | NaN (NA) | 0.777 |
| diabetes | |||||
| No | 50 (66.7%) | 36 (69.2%) | 32 (71.1%) | 0 (0%) | 0.987 |
| Yes | 25 (33.3%) | 16 (30.8%) | 13 (28.9%) | 0 (0%) | |
| hypertension | |||||
| No | 45 (60%) | 31 (59.6%) | 33 (73.3%) | 0 (0%) | 0.336 |
| Yes | 30 (40%) | 21 (40.4%) | 12 (26.7%) | 0 (0%) | |
| Sex: Female | |||||
| age | 56.5 (13.8) | 60.2 (14.9) | 58.4 (17) | NaN (NA) | 0.822 |
| bmi | 26.1 (3.6) | 26.5 (4.4) | 26.1 (5.3) | NaN (NA) | 0.746 |
| hemoglobin | 13.3 (2) | 13.1 (1.9) | 12.4 (1.7) | NaN (NA) | 0.329 |
| creatinine | 1.2 (0.3) | 1 (0.3) | 1.1 (0.3) | NaN (NA) | 0.777 |
| diabetes | |||||
| No | 37 (77.1%) | 29 (69%) | 26 (68.4%) | 0 (0%) | 0.987 |
| Yes | 11 (22.9%) | 13 (31%) | 12 (31.6%) | 0 (0%) | |
| hypertension | |||||
| No | 27 (56.2%) | 28 (66.7%) | 24 (63.2%) | 0 (0%) | 0.336 |
| Yes | 21 (43.8%) | 14 (33.3%) | 14 (36.8%) | 0 (0%) | |
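To see how missing values are distributed across strata independently of the rendered table, the `NA` counts can be tabulated directly. This is a minimal base R sketch on a small hypothetical data frame; `demo` and its values are invented for illustration and are not taken from `clinical_data`.

```r
# Hypothetical toy data with a few missing lab values
demo <- data.frame(
  sex = factor(c("Male", "Male", "Male", "Female", "Female", "Female")),
  bmi = c(27.5, NA, 26.5, 29.1, NA, NA),
  hemoglobin = c(13.9, 12.4, NA, 12.0, 14.6, 10.1)
)

# Number of missing values per variable within each stratum
vars <- c("bmi", "hemoglobin")
missing_by_sex <- sapply(
  demo[vars],
  function(x) tapply(is.na(x), demo$sex, sum)
)

# Rows are strata, columns are variables
missing_by_sex
```

The same pattern applies to the full dataset by swapping in `clinical_data` and the variables of interest.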
Site and Sex Combined Analysis
For comprehensive analysis, we might want to examine the interaction of multiple stratification factors.
# Create a combined stratification variable for demonstration
clinical_data$site_sex <- interaction(clinical_data$site, clinical_data$sex, sep = " - ")
# Show the distribution
table(clinical_data$site_sex, clinical_data$treatment)
                  Placebo Low Dose High Dose
  Site A - Female      10        8        13
  Site B - Female      10       19        11
  Site C - Female      20       10         8
  Site D - Female       8        5         6
  Site A - Male        18       13        15
  Site B - Male        31       15        10
  Site C - Male        12       15        10
  Site D - Male        14        9        10
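`interaction()` builds one level for every combination of its inputs, with the first factor varying fastest, which is why the rows above list all female strata before the male ones. Combinations that never occur are kept as empty levels unless `drop = TRUE` is set. A small sketch with hypothetical vectors:

```r
# Hypothetical toy factors: "Site B - Female" never occurs
site <- factor(c("Site A", "Site A", "Site B"))
sex  <- factor(c("Female", "Male", "Male"))

# Default: all 2 x 2 combinations become levels, first factor varies fastest
all_levels <- levels(interaction(site, sex, sep = " - "))

# drop = TRUE keeps only the combinations present in the data
used_levels <- levels(interaction(site, sex, sep = " - ", drop = TRUE))

all_levels
used_levels
```

Dropping unused levels can avoid empty strata when some site and sex combinations have no participants.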
create_table(
treatment ~ age + bmi + diabetes + systolic_bp,
data = clinical_data,
strata = "site_sex",
theme = "jama",
pvalue = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | P value |
|---|---|---|---|---|
| Site_sex: Site A - Male | ||||
| age | 58 (12.5) | 51.9 (16.4) | 58.9 (19.2) | 0.822 |
| bmi | 25.9 (3.5) | 27.2 (4.7) | 26.9 (3.6) | 0.746 |
| diabetes | ||||
| No | 8 (44.4%) | 9 (69.2%) | 12 (80%) | 0.987 |
| Yes | 10 (55.6%) | 4 (30.8%) | 3 (20%) | |
| systolic_bp | 137.5 (16.3) | 133.2 (18.5) | 129.1 (14.4) | 0.724 |
| Site_sex: Site D - Male | ||||
| age | 64.9 (17.6) | 59.7 (16.2) | 59.5 (13.6) | 0.822 |
| bmi | 27.7 (3.9) | 26.3 (4.8) | 26.3 (2.5) | 0.746 |
| diabetes | ||||
| No | 10 (71.4%) | 6 (66.7%) | 8 (80%) | 0.987 |
| Yes | 4 (28.6%) | 3 (33.3%) | 2 (20%) | |
| systolic_bp | 133.5 (15.5) | 134.3 (18.6) | 137.7 (15.3) | 0.724 |
| Site_sex: Site B - Female | ||||
| age | 52.1 (13.8) | 54.3 (11.1) | 50.5 (19.4) | 0.822 |
| bmi | 25.8 (3.8) | 25.8 (3.6) | 28.5 (4.3) | 0.746 |
| diabetes | ||||
| No | 8 (80%) | 17 (89.5%) | 7 (63.6%) | 0.987 |
| Yes | 2 (20%) | 2 (10.5%) | 4 (36.4%) | |
| systolic_bp | 135.5 (12.5) | 123.7 (18.3) | 129.5 (22.6) | 0.724 |
| Site_sex: Site C - Male | ||||
| age | 57.4 (24.2) | 53.9 (17.4) | 58.1 (11.2) | 0.822 |
| bmi | 25.4 (4.4) | 27.3 (3.4) | 25.6 (2.8) | 0.746 |
| diabetes | ||||
| No | 9 (75%) | 9 (60%) | 7 (70%) | 0.987 |
| Yes | 3 (25%) | 6 (40%) | 3 (30%) | |
| systolic_bp | 131.2 (12.3) | 140.3 (20.5) | 131.3 (15.5) | 0.724 |
| Site_sex: Site B - Male | ||||
| age | 55.5 (17.5) | 54 (14.9) | 62 (18.2) | 0.822 |
| bmi | 27.6 (4.3) | 26.8 (4.9) | 25.9 (4.3) | 0.746 |
| diabetes | ||||
| No | 23 (74.2%) | 12 (80%) | 5 (50%) | 0.987 |
| Yes | 8 (25.8%) | 3 (20%) | 5 (50%) | |
| systolic_bp | 135.7 (19.9) | 141.2 (14.7) | 139.5 (20.4) | 0.724 |
| Site_sex: Site C - Female | ||||
| age | 58.7 (14.4) | 65.9 (12.1) | 64.6 (17.5) | 0.822 |
| bmi | 25.1 (3.9) | 26.3 (4.7) | 24.2 (5.9) | 0.746 |
| diabetes | ||||
| No | 15 (75%) | 6 (60%) | 7 (87.5%) | 0.987 |
| Yes | 5 (25%) | 4 (40%) | 1 (12.5%) | |
| systolic_bp | 140.9 (16.9) | 142.5 (13.2) | 126.5 (25.2) | 0.724 |
| Site_sex: Site D - Female | ||||
| age | 58.2 (12.9) | 62.4 (13.6) | 51 (9.5) | 0.822 |
| bmi | 25.5 (2.3) | 29.9 (6.2) | 25.2 (3.8) | 0.746 |
| diabetes | ||||
| No | 5 (62.5%) | 2 (40%) | 3 (50%) | 0.987 |
| Yes | 3 (37.5%) | 3 (60%) | 3 (50%) | |
| systolic_bp | 129 (19.4) | 139 (21.2) | 132.5 (6.2) | 0.724 |
| Site_sex: Site A - Female | ||||
| age | 55.1 (14.3) | 65.9 (22.7) | 64.7 (14.5) | 0.822 |
| bmi | 28.7 (3.2) | 26.3 (4.3) | 25.8 (5.9) | 0.746 |
| diabetes | ||||
| No | 9 (90%) | 4 (50%) | 9 (69.2%) | 0.987 |
| Yes | 1 (10%) | 4 (50%) | 4 (30.8%) | |
| systolic_bp | 141.4 (12) | 135.4 (16.2) | 139.8 (13.4) | 0.724 |
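With eight strata and three arms, individual cells become small (as few as 5 participants in the output above). One way to flag sparse strata before interpreting them is to scan the cross-tabulation for its smallest cell. The sketch below copies the counts from the site-by-sex table shown earlier; the threshold of 10 is an arbitrary illustrative choice, not a recommendation from zztable1.

```r
# Cell counts copied from the site-by-sex cross-tabulation shown earlier
counts <- matrix(
  c(10,  8, 13,
    10, 19, 11,
    20, 10,  8,
     8,  5,  6,
    18, 13, 15,
    31, 15, 10,
    12, 15, 10,
    14,  9, 10),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Site A - Female", "Site B - Female", "Site C - Female", "Site D - Female",
      "Site A - Male", "Site B - Male", "Site C - Male", "Site D - Male"),
    c("Placebo", "Low Dose", "High Dose")
  )
)

min_cell <- 10  # hypothetical threshold for "too small"

# Smallest arm within each stratum, then the strata below the threshold
smallest <- apply(counts, 1, min)
sparse_strata <- names(smallest[smallest < min_cell])
sparse_strata
```

Strata flagged this way are candidates for pooling or for descriptive reporting without P values.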
Comparative Analysis
Before and After Stratification
Let’s compare an overall analysis with a stratified analysis to see how stratification reveals important patterns.
Overall Analysis (Non-stratified)
create_table(
treatment ~ age + sex + bmi + diabetes + hypertension + systolic_bp,
data = clinical_data,
theme = "console",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| age | 57.5 (16.3) | 57 (15.7) | 59 (16.3) | 57.8 (16.1) | 0.822 |
| sex | |||||
| Female | 48 (39%) | 42 (45%) | 38 (46%) | 128 (43%) | 0.558 |
| Male | 75 (61%) | 52 (55%) | 45 (54%) | 172 (57%) | |
| bmi | 26.6 (3.9) | 26.8 (4.3) | 26.2 (4.3) | 26.5 (4.2) | 0.746 |
| diabetes | |||||
| No | 87 (71%) | 65 (69%) | 58 (70%) | 210 (70%) | 0.987 |
| Yes | 36 (29%) | 29 (31%) | 25 (30%) | 90 (30%) | |
| hypertension | |||||
| No | 72 (59%) | 59 (63%) | 57 (69%) | 188 (63%) | 0.336 |
| Yes | 51 (41%) | 35 (37%) | 26 (31%) | 112 (37%) | |
| systolic_bp | 136.1 (16.6) | 135.3 (18.3) | 133.4 (17.4) | 135.1 (17.3) | 0.724 |
Stratified by Disease Severity
create_table(
treatment ~ age + sex + bmi + diabetes + hypertension + systolic_bp,
data = clinical_data,
strata = "disease_severity",
theme = "console",
pvalue = TRUE,
totals = TRUE
)
| Variable | Placebo (N=123) | Low Dose (N=94) | High Dose (N=83) | Total (N=300) | P value |
|---|---|---|---|---|---|
| Disease_severity: Mild | |||||
| age | 56.9 (15.8) | 57.8 (15.5) | 58.5 (17.2) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 23 (43.4%) | 22 (50%) | 9 (34.6%) | 0 (0%) | 0.558 |
| Male | 30 (56.6%) | 22 (50%) | 17 (65.4%) | 0 (0%) | |
| bmi | 26.7 (3.7) | 26.8 (4.4) | 25.2 (4.7) | NaN (NA) | 0.746 |
| diabetes | |||||
| No | 43 (81.1%) | 32 (72.7%) | 15 (57.7%) | 0 (0%) | 0.987 |
| Yes | 10 (18.9%) | 12 (27.3%) | 11 (42.3%) | 0 (0%) | |
| hypertension | |||||
| No | 30 (56.6%) | 26 (59.1%) | 17 (65.4%) | 0 (0%) | 0.336 |
| Yes | 23 (43.4%) | 18 (40.9%) | 9 (34.6%) | 0 (0%) | |
| systolic_bp | 137.2 (16.4) | 135.9 (19.3) | 135.7 (19.5) | NaN (NA) | 0.724 |
| Disease_severity: Moderate | |||||
| age | 56.4 (16) | 53.8 (14.8) | 59.2 (16.4) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 20 (39.2%) | 12 (35.3%) | 21 (52.5%) | 0 (0%) | 0.558 |
| Male | 31 (60.8%) | 22 (64.7%) | 19 (47.5%) | 0 (0%) | |
| bmi | 26.7 (4.3) | 26.5 (4.6) | 27 (4.1) | NaN (NA) | 0.746 |
| diabetes | |||||
| No | 31 (60.8%) | 24 (70.6%) | 30 (75%) | 0 (0%) | 0.987 |
| Yes | 20 (39.2%) | 10 (29.4%) | 10 (25%) | 0 (0%) | |
| hypertension | |||||
| No | 35 (68.6%) | 24 (70.6%) | 32 (80%) | 0 (0%) | 0.336 |
| Yes | 16 (31.4%) | 10 (29.4%) | 8 (20%) | 0 (0%) | |
| systolic_bp | 135.9 (17.8) | 134.2 (16) | 133.2 (16.5) | NaN (NA) | 0.724 |
| Disease_severity: Severe | |||||
| age | 62.1 (18.5) | 61.9 (17.5) | 59.5 (15.4) | NaN (NA) | 0.822 |
| sex | |||||
| Female | 5 (26.3%) | 8 (50%) | 8 (47.1%) | 0 (0%) | 0.558 |
| Male | 14 (73.7%) | 8 (50%) | 9 (52.9%) | 0 (0%) | |
| bmi | 26 (3.7) | 27.3 (3.8) | 25.7 (3.9) | NaN (NA) | 0.746 |
| diabetes | |||||
| No | 13 (68.4%) | 9 (56.2%) | 13 (76.5%) | 0 (0%) | 0.987 |
| Yes | 6 (31.6%) | 7 (43.8%) | 4 (23.5%) | 0 (0%) | |
| hypertension | |||||
| No | 7 (36.8%) | 9 (56.2%) | 8 (47.1%) | 0 (0%) | 0.336 |
| Yes | 12 (63.2%) | 7 (43.8%) | 9 (52.9%) | 0 (0%) | |
| systolic_bp | 133.8 (14) | 136.1 (20.8) | 130.5 (16.6) | NaN (NA) | 0.724 |
Key Differences:
- Overall analysis may mask important subgroup differences
- Stratified analysis reveals severity-specific patterns
- P-values may differ when accounting for stratification
- Treatment balance assessment within severity levels
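In the rendered output above, the P value column shows the same values in every stratum as in the overall table (for example 0.822 for age). For stratum-specific tests, one option is to run the comparison separately within each level. The sketch below does this in base R for a continuous variable using a classical one-way ANOVA; the simulated `demo` data and the choice of test are assumptions, not a description of what zztable1 computes.

```r
set.seed(1)

# Hypothetical toy data: 3 severity strata x 3 arms, 10 participants per cell
demo <- data.frame(
  treatment = factor(rep(c("Placebo", "Low Dose", "High Dose"), length.out = 90)),
  severity  = factor(rep(c("Mild", "Moderate", "Severe"), each = 30)),
  age       = rnorm(90, mean = 58, sd = 15)
)

# One ANOVA per stratum, comparing mean age across treatment arms
p_by_stratum <- sapply(
  split(demo, demo$severity),
  function(d) oneway.test(age ~ treatment, data = d, var.equal = TRUE)$p.value
)

# One P value per severity level
p_by_stratum
```

Each P value here is computed from the rows of a single stratum only, so the values generally differ from one stratum to the next.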
Summary and Best Practices
When to Use Stratified Analysis
- Multi-center Studies: Always stratify by study site
- Sex Differences: Important for most clinical studies
- Age Groups: Especially relevant for studies spanning wide age ranges
- Disease Severity: Critical for understanding baseline risk
- Geographic Regions: For studies spanning different populations
Interpretation Guidelines
- Sample Sizes: Check that sample sizes within each stratum are adequate
- Missing Data: Consider missing data patterns within strata
- P-values: Interpret within-strata comparisons carefully
- Clinical Relevance: Focus on clinically meaningful differences
- Multiple Comparisons: Consider adjustment for multiple testing
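As an illustration of the multiple-comparisons point, base R's `p.adjust()` can correct a set of within-strata P values. The three raw values below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical raw P values, one per stratum
p_raw <- c(mild = 0.01, moderate = 0.04, severe = 0.03)

# Holm step-down adjustment (controls the family-wise error rate)
p_holm <- p.adjust(p_raw, method = "holm")

# Bonferroni for comparison: each P value multiplied by the number of tests
p_bonf <- p.adjust(p_raw, method = "bonferroni")

rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf)
```

Holm is never more conservative than Bonferroni, so it is often a reasonable default when only a handful of strata are tested.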
Available Stratification Variables in This Dataset
| Variable | Description | Use Case |
|---|---|---|
| site | Study site (A, B, C, D) | Multi-center trial balance |
| sex | Participant sex (Male, Female) | Sex-specific effects |
| age_group | Age groups (18-44, 45-64, 65+) | Age-related patterns |
| disease_severity | Disease severity (Mild, Moderate, Severe) | Baseline risk stratification |
| diabetes | Diabetes status (No, Yes) | Comorbidity analysis |
| hypertension | Hypertension status (No, Yes) | Cardiovascular risk factors |
The stratified analysis capabilities of zztable1 provide
powerful tools for understanding complex clinical trial data. By
examining baseline characteristics within meaningful subgroups,
researchers can better assess treatment allocation, identify potential
confounders, and plan appropriate statistical analyses.