Keith Baxelbaum, Rose Hartman, and
Alexis Zavez (presenter)
Data Science and Biostatistics Unit (DSBU) and
Arcus Education, DBHI
2024-04-08
Link to join: https://bit.ly/chopRusers
Link to calendar: https://bit.ly/chopROfficeHours
R-based tools should have a substantial role in the work
The content of the presentation should be of interest to the R/Medicine community
CFP ends Monday, April 22 at 11:59 PM EDT
R/Medicine 2024: June 10-14 (virtual)
This is the second talk in a new series called R102: MasteRing the Fundamentals
Next up: Reshaping Data with tidyr, May 6th 12:00pm ET
Learn more about this new series, including dates and titles for each session:
https://arcus.github.io/r102/
Summary statistics, also called descriptive statistics, serve a few important functions:
Categorical variables:
Continuous variables:
Measures of central tendency:
Measures of variability:
We won’t get into the mathematical definitions of each of these statistics here, but if you’d like a refresher, check out these excellent Khan academy videos:
Two options:
Work in the cloud: https://posit.cloud/content/6126628
Work on your computer: https://github.com/arcus/education_r_environment
There’s a lot of helpful information (including examples and tutorials) on the package websites for each of the packages we’ll be using:
Only if needed:
Each R session:
In the console or in the exercises rmd file, run the following command:
ID age sex race diagnosis diagnosis.type
1 1 61 1 0 acute myeloid leukemia 1
2 2 62 1 1 non-Hodgkin lymphoma 0
3 3 63 0 1 non-Hodgkin lymphoma 0
4 4 33 0 1 Hodgkin lymphoma 0
5 5 54 0 1 acute lymphoblastic leukemia 0
6 6 55 1 1 myelofibrosis 1
time.to.transplant prior.radiation prior.chemo prior.transplant recipient.cmv
1 5.16 0 2 0 1
2 79.05 1 3 0 0
3 35.58 0 4 0 1
4 33.02 1 4 0 1
5 11.40 0 5 0 1
6 2.43 0 0 0 1
donor.cmv donor.sex TNC.dose CD34.dose CD3.dose CD8.dose TBI.dose C1/C2 aKIRs
1 0 0 18.31 2.29 3.21 0.95 200 0 1
2 0 1 4.26 2.04 NA NA 200 1 5
3 1 0 8.09 6.97 2.19 0.59 200 0 3
4 0 1 21.02 6.09 4.87 2.32 200 0 2
5 1 0 14.70 2.36 6.55 2.40 400 0 6
6 1 1 4.29 6.91 2.53 0.86 200 0 2
cmv time.to.cmv agvhd time.to.agvhd cgvhd time.to.cgvhd
1 1 3.91 1 3.55 0 6.28
2 0 65.12 0 65.12 0 65.12
3 0 3.75 0 3.75 0 3.75
4 0 48.49 1 28.55 1 10.45
5 0 4.37 1 2.79 0 4.37
6 1 4.53 1 3.88 0 6.87
To learn more about this dataset:
From the help documentation:
This data set contains 64 consecutive patients who underwent T-cell replete, matched sibling donor reduced-intensity conditioning allogeneic hematopoietic stem cell transplant. The primary risk factor of interest was the number of activating killer immunoglobulin-like receptors (aKIRs: 1-4 vs. 5-6).
medicaldata
R package these data are published in, see the medicaldata
package website – and note that the maintainers are always looking for more data contributions!summary()
First try running summary()
on just the age
column:
Min. 1st Qu. Median Mean 3rd Qu. Max.
29.00 46.00 55.00 52.44 61.00 67.00
summary()
Now try running summary
on the whole dataframe all at once:
ID age sex race
Min. : 1.00 Min. :29.00 Min. :0.0000 Min. :0.0000
1st Qu.:16.75 1st Qu.:46.00 1st Qu.:0.0000 1st Qu.:1.0000
Median :32.50 Median :55.00 Median :1.0000 Median :1.0000
Mean :32.50 Mean :52.44 Mean :0.5312 Mean :0.9062
3rd Qu.:48.25 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :64.00 Max. :67.00 Max. :1.0000 Max. :1.0000
diagnosis diagnosis.type time.to.transplant prior.radiation
Length:64 Min. :0.0000 Min. : 1.84 Min. :0.0000
Class :character 1st Qu.:0.0000 1st Qu.: 8.15 1st Qu.:0.0000
Mode :character Median :1.0000 Median : 13.70 Median :0.0000
Mean :0.5172 Mean : 30.85 Mean :0.1719
3rd Qu.:1.0000 3rd Qu.: 33.28 3rd Qu.:0.0000
Max. :1.0000 Max. :173.83 Max. :1.0000
NA's :6 NA's :1
prior.chemo prior.transplant recipient.cmv donor.cmv
Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.0000
1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
Median :2.000 Median :0.0000 Median :1.000 Median :1.0000
Mean :2.188 Mean :0.1875 Mean :0.625 Mean :0.5938
3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:1.000 3rd Qu.:1.0000
Max. :8.000 Max. :1.0000 Max. :1.000 Max. :1.0000
donor.sex TNC.dose CD34.dose CD3.dose CD8.dose
Min. :0.0 Min. : 2.06 Min. : 2.040 Min. :1.08 Min. :0.160
1st Qu.:0.0 1st Qu.: 6.89 1st Qu.: 4.165 1st Qu.:3.08 1st Qu.:0.805
Median :0.5 Median :10.48 Median : 5.840 Median :3.98 Median :1.200
Mean :0.5 Mean :10.44 Mean : 5.546 Mean :4.30 Mean :1.323
3rd Qu.:1.0 3rd Qu.:13.42 3rd Qu.: 6.990 3rd Qu.:5.48 3rd Qu.:1.663
Max. :1.0 Max. :21.02 Max. :12.510 Max. :8.18 Max. :3.190
NA's :3 NA's :12
TBI.dose C1/C2 aKIRs cmv
Min. :200.0 Min. :0.0000 Min. :1.000 Min. :0.0000
1st Qu.:200.0 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:0.0000
Median :200.0 Median :1.0000 Median :2.000 Median :0.0000
Mean :287.5 Mean :0.5156 Mean :2.844 Mean :0.4062
3rd Qu.:400.0 3rd Qu.:1.0000 3rd Qu.:4.250 3rd Qu.:1.0000
Max. :400.0 Max. :1.0000 Max. :6.000 Max. :1.0000
time.to.cmv agvhd time.to.agvhd cgvhd
Min. : 0.430 Min. :0.0000 Min. : 0.660 Min. :0.0000
1st Qu.: 2.712 1st Qu.:0.0000 1st Qu.: 2.623 1st Qu.:0.0000
Median : 5.030 Median :0.0000 Median : 5.800 Median :0.0000
Mean :13.206 Mean :0.4062 Mean :15.605 Mean :0.4375
3rd Qu.:16.012 3rd Qu.:1.0000 3rd Qu.:21.410 3rd Qu.:1.0000
Max. :84.470 Max. :1.0000 Max. :85.190 Max. :1.0000
time.to.cgvhd
Min. : 0.820
1st Qu.: 3.848
Median : 6.735
Mean :10.304
3rd Qu.:11.307
Max. :65.120
Convert diagnosis column from character to factor:
Re-run the summary for that column:
acute lymphoblastic leukemia acute myeloid leukemia
1 12
aplastic anemia chronic lymphocytic leukemia
1 5
chronic myeloid leukemia congenital anemia
4 1
Hodgkin lymphoma multiple myelomas
3 7
myelodysplastic syndrome myelofibrosis
9 4
myeloproliferative disorder non-Hodgkin lymphoma
1 12
renal cell carcinoma
4
cytomegalovirus <- cytomegalovirus |>
mutate(sex = factor(sex, levels = c(0, 1), labels = c("Female", "Male")),
race = factor(race, levels = c(0, 1), labels = c("African-American", "White")),
diagnosis.type = factor(diagnosis.type, levels = c(0, 1), labels = c("Lymphoid", "Myeloid")),
prior.radiation = factor(prior.radiation, levels = c(0, 1), labels = c("No", "Yes")),
prior.transplant = factor(prior.transplant, levels = c(0, 1), labels = c("No", "Yes")),
donor.cmv = factor(donor.cmv, levels = c(0, 1), labels = c("Negative", "Positive")),
recipient.cmv = factor(recipient.cmv, levels = c(0, 1), labels = c("Negative", "Positive")),
donor.sex = factor(donor.sex, levels = c(0, 1), labels = c("Male", "Female")),
`C1/C2` = factor(`C1/C2`, levels = c(0, 1), labels = c("Heterozygous", "Homozygous")),
cmv = factor(cmv, levels = c(0, 1), labels = c("No", "Yes")),
agvhd = factor(agvhd, levels = c(0, 1), labels = c("No", "Yes")),
cgvhd = factor(cgvhd, levels = c(0, 1), labels = c("No", "Yes"))
)
A few things from the above code that you might want to look into further:
|>
instead of %>%
.mutate
and data transformation in general in R4DS section on mutateIf you make a mistake modifying the data, how can you undo it?
If this were a dataset we read in from an external file (like a .csv), you could just read it in again to get a fresh copy. But how do you get a fresh copy of a built-in dataset?
To reset the data to its original state, run rm(cytomegalovirus)
in the console. This will delete your current version of the data from R’s environment, and you’ll just be left with the original clean copy from the medicaldata
package.
Let’s run summary
on the whole dataframe again to see the results now that we’ve cleaned up the column types.
ID age sex race
Min. : 1.00 Min. :29.00 Female:30 African-American: 6
1st Qu.:16.75 1st Qu.:46.00 Male :34 White :58
Median :32.50 Median :55.00
Mean :32.50 Mean :52.44
3rd Qu.:48.25 3rd Qu.:61.00
Max. :64.00 Max. :67.00
diagnosis diagnosis.type time.to.transplant
acute myeloid leukemia :12 Lymphoid:28 Min. : 1.84
non-Hodgkin lymphoma :12 Myeloid :30 1st Qu.: 8.15
myelodysplastic syndrome : 9 NA's : 6 Median : 13.70
multiple myelomas : 7 Mean : 30.85
chronic lymphocytic leukemia: 5 3rd Qu.: 33.28
chronic myeloid leukemia : 4 Max. :173.83
(Other) :15 NA's :1
prior.radiation prior.chemo prior.transplant recipient.cmv donor.cmv
No :53 Min. :0.000 No :52 Negative:24 Negative:26
Yes:11 1st Qu.:1.000 Yes:12 Positive:40 Positive:38
Median :2.000
Mean :2.188
3rd Qu.:3.000
Max. :8.000
donor.sex TNC.dose CD34.dose CD3.dose CD8.dose
Male :32 Min. : 2.06 Min. : 2.040 Min. :1.08 Min. :0.160
Female:32 1st Qu.: 6.89 1st Qu.: 4.165 1st Qu.:3.08 1st Qu.:0.805
Median :10.48 Median : 5.840 Median :3.98 Median :1.200
Mean :10.44 Mean : 5.546 Mean :4.30 Mean :1.323
3rd Qu.:13.42 3rd Qu.: 6.990 3rd Qu.:5.48 3rd Qu.:1.663
Max. :21.02 Max. :12.510 Max. :8.18 Max. :3.190
NA's :3 NA's :12
TBI.dose C1/C2 aKIRs cmv time.to.cmv
Min. :200.0 Heterozygous:31 Min. :1.000 No :38 Min. : 0.430
1st Qu.:200.0 Homozygous :33 1st Qu.:1.000 Yes:26 1st Qu.: 2.712
Median :200.0 Median :2.000 Median : 5.030
Mean :287.5 Mean :2.844 Mean :13.206
3rd Qu.:400.0 3rd Qu.:4.250 3rd Qu.:16.012
Max. :400.0 Max. :6.000 Max. :84.470
agvhd time.to.agvhd cgvhd time.to.cgvhd
No :38 Min. : 0.660 No :36 Min. : 0.820
Yes:26 1st Qu.: 2.623 Yes:28 1st Qu.: 3.848
Median : 5.800 Median : 6.735
Mean :15.605 Mean :10.304
3rd Qu.:21.410 3rd Qu.:11.307
Max. :85.190 Max. :65.120
What is the shortest time from diagnosis to transplant in the data?
[1] NA
From the help documentation for min
:
Usage
max(..., na.rm = FALSE)
min(..., na.rm = FALSE)
[1] 1.84
What is the longest time from diagnosis to transplant?
What is the mean time to transplant?
Your turn!
Look in the summary_stats_exercises.rmd
file to find your first coding challenge.
02:00
For categorical variables, summary statistics are just counts.
We’ll look at two ways to get the counts for a single categorical variable in R:
Lymphoid Myeloid NA's
28 30 6
To get counts of two categorical variables together (e.g. what is the diagnosis type breakdown by recipient CMV status?), use cross-tabulation, also called “crosstabs” or “two-way frequency tables”.
diagnosis.type
cmv Lymphoid Myeloid
No 15 19
Yes 13 11
To get the results of a chi-squared test on the crosstabs, you can apply the summary
function to an xtabs object.
Option 1:
Option 2:
Call: xtabs(formula = ~cmv + diagnosis.type, data = cytomegalovirus)
Number of cases in table: 58
Number of factors: 2
Test for independence of all factors:
Chisq = 0.569, df = 1, p-value = 0.4507
For beautifully printed crosstabs, see the tbl_cross
function in the gtsummary
package.
Your turn!
Look in the summary_stats_exercises.rmd
file to find your next coding challenge.
02:00
gtsummary
We’ll recreate an approximation of the cytomegalovirus Table 1 using gtsummary
:
cytomegalovirus <- cytomegalovirus |>
mutate(aKIRs_groups = if_else(aKIRs > 4,
"5 to 6 aKIRs",
"1 to 4 aKIRs"),
aKIRs_groups = factor(aKIRs_groups,
levels = c("1 to 4 aKIRs", "5 to 6 aKIRs")),
prior.chemo_groups = if_else(prior.chemo >= 4,
"≥ 4",
as.character(prior.chemo)),
prior.chemo_groups = factor(prior.chemo_groups,
levels = c("0", "1", "2", "3", "≥ 4")))
The if_else
function is what’s called a ternary operator, and it has three parts:
TRUE
FALSE
.For more on using mutate
and if_else
, see see the R Basics: Data Transformation sections on mutate
and logical operators to see more examples of the kinds of conditional tests you can put in.
Note that there are two different functions that look similar and do almost (but not quite!) exactly the same thing: if_else
and ifelse
.
if_else
is part of the dplyr
package, and ifelse
is part of base R. This is a common point of confusion. There are a few differences in how the two functions work, but one of the big ones is that ifelse
does NOT preserve missingness. It’s a subtle difference, but watch out!
Now we can use the new aKIRs_groups
and prior.chemo_groups
variables in our Table 1.
Characteristic | 1 to 4 aKIRs, N = 481 | 5 to 6 aKIRs, N = 161 |
---|---|---|
age | 57 (45, 61) | 54 (52, 60) |
sex | ||
Female | 23 (48%) | 7 (44%) |
Male | 25 (52%) | 9 (56%) |
race | ||
African-American | 5 (10%) | 1 (6.3%) |
White | 43 (90%) | 15 (94%) |
diagnosis.type | ||
Lymphoid | 22 (51%) | 6 (40%) |
Myeloid | 21 (49%) | 9 (60%) |
Unknown | 5 | 1 |
time.to.transplant | 16 (9, 35) | 11 (5, 16) |
Unknown | 1 | 0 |
prior.chemo_groups | ||
0 | 7 (15%) | 3 (19%) |
1 | 10 (21%) | 5 (31%) |
2 | 13 (27%) | 4 (25%) |
3 | 7 (15%) | 2 (13%) |
≥ 4 | 11 (23%) | 2 (13%) |
prior.transplant | 10 (21%) | 2 (13%) |
cmv | 23 (48%) | 3 (19%) |
donor.cmv | ||
Negative | 20 (42%) | 6 (38%) |
Positive | 28 (58%) | 10 (63%) |
1 Median (IQR); n (%) |
gtsummary
tableThis is a decent table! Notice there are a few things about this table that we might still want to change, for example:
Characteristic | 1 to 4 aKIRs, N = 481 | 5 to 6 aKIRs, N = 161 | p-value2 |
---|---|---|---|
age | 57 (45, 61) | 54 (52, 60) | 0.7 |
sex | 0.8 | ||
Female | 23 (48%) | 7 (44%) | |
Male | 25 (52%) | 9 (56%) | |
race | >0.9 | ||
African-American | 5 (10%) | 1 (6.3%) | |
White | 43 (90%) | 15 (94%) | |
diagnosis.type | 0.5 | ||
Lymphoid | 22 (51%) | 6 (40%) | |
Myeloid | 21 (49%) | 9 (60%) | |
Unknown | 5 | 1 | |
time.to.transplant | 16 (9, 35) | 11 (5, 16) | 0.2 |
Unknown | 1 | 0 | |
prior.radiation | 9 (19%) | 2 (13%) | 0.7 |
prior.chemo_groups | 0.9 | ||
0 | 7 (15%) | 3 (19%) | |
1 | 10 (21%) | 5 (31%) | |
2 | 13 (27%) | 4 (25%) | |
3 | 7 (15%) | 2 (13%) | |
≥ 4 | 11 (23%) | 2 (13%) | |
prior.transplant | 10 (21%) | 2 (13%) | 0.7 |
recipient.cmv | 0.2 | ||
Negative | 16 (33%) | 8 (50%) | |
Positive | 32 (67%) | 8 (50%) | |
donor.cmv | 0.8 | ||
Negative | 20 (42%) | 6 (38%) | |
Positive | 28 (58%) | 10 (63%) | |
1 Median (IQR); n (%) | |||
2 Wilcoxon rank sum test; Pearson’s Chi-squared test; Fisher’s exact test |
The help documentation (?add_p.tbl_summary
) explains a little more thoroughly:
Tests default to “kruskal.test” for continuous variables (“wilcox.test” when “by” variable has two levels), “chisq.test.no.correct” for categorical variables with all expected cell counts >=5, and “fisher.test” for categorical variables with any expected cell count <5.
Make the p values round to 2 digits instead of 1, and use t.test instead of wilcoxon rank sum test:
Characteristic | 1 to 4 aKIRs, N = 481 | 5 to 6 aKIRs, N = 161 | p-value2 |
---|---|---|---|
age | 57 (45, 61) | 54 (52, 60) | 0.43 |
sex | 0.77 | ||
Female | 23 (48%) | 7 (44%) | |
Male | 25 (52%) | 9 (56%) | |
race | 0.62 | ||
African-American | 5 (10%) | 1 (6.3%) | |
White | 43 (90%) | 15 (94%) | |
diagnosis.type | 0.46 | ||
Lymphoid | 22 (51%) | 6 (40%) | |
Myeloid | 21 (49%) | 9 (60%) | |
Unknown | 5 | 1 | |
time.to.transplant | 16 (9, 35) | 11 (5, 16) | 0.86 |
Unknown | 1 | 0 | |
prior.radiation | 9 (19%) | 2 (13%) | 0.57 |
prior.chemo_groups | 0.85 | ||
0 | 7 (15%) | 3 (19%) | |
1 | 10 (21%) | 5 (31%) | |
2 | 13 (27%) | 4 (25%) | |
3 | 7 (15%) | 2 (13%) | |
≥ 4 | 11 (23%) | 2 (13%) | |
prior.transplant | 10 (21%) | 2 (13%) | 0.46 |
recipient.cmv | 0.23 | ||
Negative | 16 (33%) | 8 (50%) | |
Positive | 32 (67%) | 8 (50%) | |
donor.cmv | 0.77 | ||
Negative | 20 (42%) | 6 (38%) | |
Positive | 28 (58%) | 10 (63%) | |
1 Median (IQR); n (%) | |||
2 Two Sample t-test; Pearson’s Chi-squared test |
If you’re feeling like these arguments look confusing (why does it start with list()
? What does the ~
mean??), you’re not alone!
As you get further away from the default settings in gtsummary
(and other packages), your code is likely to become less intuitive and harder to read. That’s okay!
When you ran the above code, you may have noticed that R printed out several warnings with text like Warning for variable 'race': simpleWarning in stats::chisq.test(x = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, : Chi-squared approximation may be incorrect
.
One glaring issue that remains is the row labels. We don’t want labels like “prior.chemo_groups” in our final table!
We can clean those up with the label
argument in the tbl_summary
command:
table_data |>
tbl_summary(by = aKIRs_groups,
label = list(age ~ "Age at transplant (yrs)",
sex ~ "Sex",
race ~ "Race",
diagnosis.type ~ "Diagnostic category",
time.to.transplant ~ "Months from diagnosis to transplant",
prior.radiation ~ "Prior radiation therapy",
prior.chemo_groups ~ "Number of prior chemotherapy regimens",
prior.transplant ~ "Prior transplants",
recipient.cmv ~ "Recipient CMV seropositive",
donor.cmv ~ "Donor CMV seropositive")) |>
add_p(test = list(all_continuous() ~ "t.test",
all_categorical() ~ "chisq.test.no.correct"),
test.args = all_continuous() ~ list(var.equal = TRUE),
pvalue_fun = function(x) style_pvalue(x, digits = 2))
Characteristic | 1 to 4 aKIRs, N = 481 | 5 to 6 aKIRs, N = 161 | p-value2 |
---|---|---|---|
Age at transplant (yrs) | 57 (45, 61) | 54 (52, 60) | 0.43 |
Sex | 0.77 | ||
Female | 23 (48%) | 7 (44%) | |
Male | 25 (52%) | 9 (56%) | |
Race | 0.62 | ||
African-American | 5 (10%) | 1 (6.3%) | |
White | 43 (90%) | 15 (94%) | |
Diagnostic category | 0.46 | ||
Lymphoid | 22 (51%) | 6 (40%) | |
Myeloid | 21 (49%) | 9 (60%) | |
Unknown | 5 | 1 | |
Months from diagnosis to transplant | 16 (9, 35) | 11 (5, 16) | 0.86 |
Unknown | 1 | 0 | |
Prior radiation therapy | 9 (19%) | 2 (13%) | 0.57 |
Number of prior chemotherapy regimens | 0.85 | ||
0 | 7 (15%) | 3 (19%) | |
1 | 10 (21%) | 5 (31%) | |
2 | 13 (27%) | 4 (25%) | |
3 | 7 (15%) | 2 (13%) | |
≥ 4 | 11 (23%) | 2 (13%) | |
Prior transplants | 10 (21%) | 2 (13%) | 0.46 |
Recipient CMV seropositive | 0.23 | ||
Negative | 16 (33%) | 8 (50%) | |
Positive | 32 (67%) | 8 (50%) | |
Donor CMV seropositive | 0.77 | ||
Negative | 20 (42%) | 6 (38%) | |
Positive | 28 (58%) | 10 (63%) | |
1 Median (IQR); n (%) | |||
2 Two Sample t-test; Pearson’s Chi-squared test |
So how do you get the table from R to Word? You can save the table as a Word doc, which you can then open and copy into your article.
To save a table as a Word file, add the following two lines to the end of your table commands (you can change “my_table.docx” to be whatever you want the file name to be):
Note that this requires a function from the flextable
package. If you’ve never used flextable
before, you may need to run install.packages("flextable")
before this code will work.
For example, to save the table above, you could do the following:
table_data |>
tbl_summary(by = aKIRs_groups,
label = list(age ~ "Age at transplant (yrs)",
sex ~ "Sex",
race ~ "Race",
diagnosis.type ~ "Diagnostic category",
time.to.transplant ~ "Months from diagnosis to transplant",
prior.radiation ~ "Prior radiation therapy",
prior.chemo_groups ~ "Number of prior chemotherapy regimens",
prior.transplant ~ "Prior transplants",
recipient.cmv ~ "Recipient CMV seropositive",
donor.cmv ~ "Donor CMV seropositive")) |>
add_p(test = list(all_continuous() ~ "t.test",
all_categorical() ~ "chisq.test.no.correct"),
test.args = all_continuous() ~ list(var.equal = TRUE),
pvalue_fun = function(x) style_pvalue(x, digits = 2)) |>
as_flex_table() |>
flextable::save_as_docx(path = "my_table.docx")
For a review of how to specify a path, see the explanation of absolute and relative paths in our Directories and File Paths module.
For your final coding challenge, return to RStudio and follow the instructions there to create a Table 1 for a brand new data set, the laryngoscope data.
Here’s the published table you’ll be trying to recreate:
For inspiration (and lots of examples of things to try), check out the tbl_summary
help documentation.
Don’t struggle in silence!
summary_stats_solutions.Rmd
, but you’ll learn a lot more if you try it yourself firstmedicaldata
packagesummary()
min()
, max()
, mean()
, median()
, var()
, sd()
, quantile()
table()
xtabs()
gtsummary
package, specifically using tbl_summary()
For general background on statistics:
For more on the gtsummary
package, see their website, especially the FAQ and gallery of example tables
Want to go through this material again? It’s posted as an interactive tutorial online as part of DART (Data and Analytics for Research Training)!
You can also explore the rest of the DART modules with the new module discovery app prototype
Have funding for your research project and interested in working with an experienced biostatistician or data scientist to analyze your data?
The Data Science and Biostatistics Unit (DSBU) is DBHi’s and CHOP Research Institute’s centralized service unit for biostatistics and data science analysis support. Reach out to Alexis Zavez (zaveza@chop.edu) or Keith Baxelbaum (baxelbaumk@chop.edu) for more info!