This article showcases core functionalities of the package, and how to apply them in practice.
Introduction
The metapsyTools
package is part of the Metapsy project.
Functions included in the package can be used to run metaanalyses of
Metapsy databases, or
of databases that adhere to the same data
standard, “out of the box” using R.
The package also contains a set of tools that can be used to prepare metaanalytic data sets based on the Metapsy data standard, and to calculate effect sizes (Hedges’ \(g\) and risk ratios, \(RR\)).
The package consists of two modules:
 A module to check the data format and calculate effect sizes for all possible study comparisons (preparation module);
 A module to select relevant comparisons for a specific metaanalysis, calculate the results (including subgroup, metaregression, and publication bias analyses), and generate tables (analysis module).
The idea is to use the two modules in different contexts. For example, the preparation module can be used every time the database is updated to gather all information, calculate effect sizes, and bring the data into a format suitable for further analyses.
Prepared data sets that follow the Metapsy data standard build the basis for the analysis module. Researchers simply have to filter out the comparisons that are relevant for the investigation, and can then use functions of the package to run a full metaanalysis, including sensitivity and subgroup analyses.
Important: The preparation module requires a
familiarity with the general structure of the database, as well as
intermediatelevel knowledge of R and the metapsyTools
package itself, in order to diagnose (potential) issues. We advise that
the preparation step (i.e. checking the data format and calculating
effect sizes each time the database is updated) should be conducted by a
person with some proficiency in R & data wrangling. The analysis
module, on the other hand, should be usable by all researchers.
The companion package of metapsyTools
is metapsyData
. Using
metapsyData
, metaanalytic databases can be automatically
downloaded into your R environment. Databases retrieved using
metapsyData
can directly be analyzed using
metapsyTools
.
If your goal is to analyze existing databases included in
Metapsy, only the analysis module in metapsyTools
is
relevant to you. Click here
to navigate directly to the analysis module
section.
The Preparation Module
The preparation module in metapsyTools
allows to:
 check if the classes of variables in the imported data are set as required, using the checkDataFormat function.
 check for formatting conflicts which have to be resolved prior to any further steps, using the checkConflicts function.
 calculate Hedges’ \(g\) and/or risk ratios (\(RR\)) for each comparison, provided suitable data is available, and generate a data set ready for metaanalyses, using the calculateEffectSizes function.
Required Data Structure
For your own convenience, it is highly advised to strictly follow some data formatting rules prior to importing a data set for effect size calculations. If all of these rules are followed closely, it minimizes the risk of receiving error messages, producing conflicts, etc.
A detailed description of the preferred data structure can be found
in the Metapsy
documentation. The metapsyTools
package also
includes a “toy” data set called depressionPsyCtr
,
which has already been preformatted to follow the required data
format.
To use the data preparation module (i.e. calculate effect sizes), study design (I) and effect size data variables (II) must be included in the data set.
Study Design Variables (I)

study
: The study name, formatted as “last name of the first author”, “year” (e.g."Smith, 2011"
). 
condition_arm1
: Condition in the first trial arm. The condition name is standardized to ensure comparability across trials (e.g.cbt
for all trial arms that employed cognitivebehavioral psychotherapy). 
condition_arm2
: Condition in the second trial arm. The condition name is standardized to ensure comparability across trials (e.g.wlc
for all trial arms that employed a waitlist control group). 
multi_arm1
: In multiarm trials, this variable provides a “specification” of the type of treatment used in the first arm. This variable is set toNA
(missing) when the study was not a multiarm trial. For example, if a multiarm trial employed two types of CBT interventions, facetoface and Internetbased, this variable would be set tof2f
andInternet
, respectively. 
multi_arm2
: In multiarm trials, this variable provides a “specification” of the type of treatment used in the second arm. This variable is set toNA
(missing) when the study was not a multiarm trial. For example, if a multiarm trial employed two types of control groups, waitlist and placebo, this variable would be set towl
andplac
, respectively. 
outcome_type
: This variable encodes the type of outcome that builds the basis of the comparison, e.g."response"
,"remission"
or"deterioration"
. This is variable is particularly relevant for dichotomous effect size data, because it indicates what the event counts refer to. The"msd"
factor level is used for outcomes expressed in means and standard deviations. 
instrument
: This variable describes the instrument through which the relevant outcome was measured. 
time
: The measurement point at which the outcome was obtained (e.g.post
orfollowup
). 
time_weeks
: The measurement point at which the outcome was obtained, in weeks after randomization (set toNA
if this information was not available). 
rating
: This variable encodes if the reported outcome was selfreported ("selfreport"
) or clinicianrated ("clinician"
).
Effect Size Data Variables (II)
Each Metapsy database also contains variables in which the (raw or
precalculated) effect size data is stored. In each row, one of the
following variable groups (a) to (d) are specified, depending on the
type of outcome data reported in the paper. The rest of the
variable groups will contain NA
in that row.
When preparing the effect size data, it is very important that each
row only has data provided for one variable group (as defined below),
while the other variable groups are NA
(i.e., have missing
values). Otherwise, the calculateEffectSizes
function that we use later will not know which type of effect size
should be calculated.
(a) Continuous Outcome Data

mean_arm1
: Mean of the outcome in the first arm at the measured time point. 
mean_arm2
: Mean of the outcome in the second arm at the measured time point. 
sd_arm1
: Standard deviation of the outcome in the first arm at the measured time point. 
sd_arm2
: Standard deviation of the outcome in the second arm at the measured time point. 
n_arm1
: Sample size in the first trial arm. 
n_arm2
: Sample size in the second trial arm.
(b) Change Score Data

mean_change_arm1
: Mean score change between baseline and the measured time point in the first arm. 
mean_change_arm2
: Mean score change between baseline and the measured time point in the second arm. 
sd_change_arm1
: Standard deviation of the mean change in the first arm. 
sd_change_arm2
: Standard deviation of the mean change in the second arm. 
n_change_arm1
: Sample size in the first trial arm. 
n_change_arm2
: Sample size in the second trial arm.
(c) Dichotomous Outcome Data (Response, Remission, Deterioration, …)

event_arm1
: Number of events (responders, remission, deterioration cases) in the first trial arm. 
event_arm2
: Number of events (responders, remission, deterioration cases) in the second trial arm. 
totaln_arm1
: Sample size in the first trial arm. 
totaln_arm2
: Sample size in the second trial arm.
(d) Precalculated Hedges’ \(g\)

precalc_g
: The precalculated value of Hedges’ \(g\) (smallsample bias corrected standardized mean difference; Hedges, 1981). 
precalc_g_se
: Standard error of \(g\), viz. \(\sqrt{V_g}\).
(d) Precalculated logrisk ratio

precalc_log_rr
: The precalculated value of the logrisk ratio \(\log_{e}\text{RR}\), comparing events in the first arm to events in the second arm. 
precalc_log_rr_se
: The standard error of the logrisk ratio \(\log_{e}\text{RR}\), comparing events in the first arm to events in the second arm.
Additional Considerations
Metapsy databases also contain additional variables. These are used, for example, to collect subjectspecific information that is not relevant for all indications. Nevertheless, there are several formatting rules that all variables/columns follow:
 All variable names are in
snake_case
.  Variable names start with a standard letter (
_
is not allowed,.
is only allowed formetapsyTools
variables).  Variable names do not contain special characters (like @, ğ).
 Semicolons (
;
) are not used; neither as variable names nor as cell content.  Character values contain no leading/trailing white spaces.
 Missing values are encoded using
NA
.
Data Format & Conflict Check
Once the data set has been prestructured correctly,
metapsyTools
allows you to check the required variable
format and control for potential formatting conflicts.
If the formatting rules above are followed, none of the default
behavior of the preparation module has to be changed. At first, one can
run checkDataFormat
. As an illustration, we will use the depressionPsyCtr
data frame, a preinstalled “toy” data set which is directly available
after installing metapsyTools
.
# Load metapsyTools & dplyr
library(metapsyTools)
library(dplyr)
# Load preinstalled "toy" dataset
data("depressionPsyCtr")
# Check data format
depressionPsyCtr < checkDataFormat(depressionPsyCtr)
#  [OK] Data set contains all variables in 'must.contain'.
#  [OK] 'study' has desired class character.
#  [OK] 'condition_arm1' has desired class character.
# [...]
We see that the function has checked the required variables and their class. If divergent, the function will also try to convert the variable to the desired format.
Required variables for unique IDs: To function
properly, the metapsyTools
functions must be able to
generate a unique ID for each comparison. By default, this is
achieved by combining information of the study
,
condition_arm1
, condition_arm2
,
multi_arm1
, multi_arm2
,
outcome_type
, instrument
, time
,
time_weeks
, rating
variables. These variables
are included by default in the must.contain
argument in the
checkDataFormat
function. If all required variables are
detected, an OK
message is returned; as is the case in our
example.
We can also check the data for potential formatting issues, using the
checkConflicts
function.
depressionPsyCtr < checkConflicts(depressionPsyCtr)
#  [OK] No data format conflicts detected.
In our case, no data format conflicts were detected. This is because all the data were prestructured correctly. If data conflicts exist, the output looks similar to this:
#  [!] Data format conflicts detected!
# ID conflicts
#  check if variable(s) study create(s) unique assessment point IDs
#  check if specification uniquely identifies all trial arms in multiarm trials
# 
# Abas, 2018
# Ammerman, 2013
# Andersson, 2005
# [...]
ID conflicts typically occur because the information in the variables
used to build the IDs (see box above) is not sufficient to create a
unique ID for each comparison. Studies with nonunique IDs are
subsequently printed out. The checkConflicts function also has an
argument called vars.for.id
, which can be used define
(other or additional) columns to be used for ID generation. This can
sometimes help to create unique IDs.
As an illustration, we now define an extended variable list to be
used for our IDs in checkConflicts
:
# Define extended ID list
# "target_group" is now also included
id.list < c("study", "outcome_type",
"instrument", "time", "time_weeks",
"rating", "target_group")
# Use id.list as input for vars.for.id argument
checkConflicts(depressionPsyCtr,
vars.for.id = id.list)
Effect Size Calculation
If the data set has been preformatted correctly, and all ID
conflicts were resolved, the calculateEffectSizes
function can be used to calculate effect sizes (Hedges’
\(g\) or risk ratios, \(RR\)) for all comparisons in the
database.
The calculateEffectSizes
function uses the information
stored in the effect
size data variables to calculate \(g\) and/or \(RR\), depending on what information is
available. Therefore, it is very important to make sure these variables
are defined correctly before running
calculateEffectSizes
.
The calculateEffectSizes
function adds nine columns to
the data set, all of which start with a dot (.
). They are
included so that metaanalysis functions in metapsyTools
can be applied “out of the box”.

.id
: Unique identifier for a trial arm comparison/row. 
.g
: Calculated effect size (Hedges’ \(g\)). 
.g_se
: Standard error of Hedges’ \(g\). 
.log_rr
: Calculated effect size (\(\log_{e}\text{RR}\)). 
.log_rr_se
: Standard error of \(\log_{e}\text{RR}\). 
.event_arm1
: Number of events (e.g. responders) in the first trial arm. 
.event_arm2
: Number of events (e.g. responders) in the second trial arm. 
.totaln_arm1
: Total sample size in the first trial arm. 
.totaln_arm2
: Total sample size in the second trial arm.
data < calculateEffectSizes(depressionPsyCtr)
Now, let us have a look at the calculated effect sizes (\(g\) and \(\log_eRR\)) in a random slice of the data:
# Show rows 10 to 15, and variables "study", ".g", and ".log_rr"
data %>%
slice(10:15) %>%
select("study", ".g", ".log_rr")
# study .g .log_rr
# 10 Ahmadpanah, 2016 1.7788677 NA
# 11 Alexopoulos, 2016 0.2151835 0.2436221
# 12 Alexopoulos, 2016 0.2151835 0.2436221
# 13 Allart van Dam, 2003 0.5696379 NA
# 14 Allart van Dam, 2003 0.5696379 NA
# 15 Ammerman, 2013 0.9439887 NA
We see one important detail here. When dichotomous outcome data is
provided, the calculateEffectSizes
function automatically calculates both the logrisk ratio, and an
estimate of \(g\). Therefore, effects
on remission, response, etc. are by default also included in
metaanalyses based on Hedges’ \(g\).
Please make sure to filter these effect sizes out if you do not want to
have them included in your metaanalysis. The filterPoolingData
function, which we will cover later, allows to do this.
Change Effect Size Sign
The change.sign
argument in
calculateEffectSizes
can be used to change the sign of one
or several calculated effect sizes (e.g. change the calculated effect
size from \(g\)=0.30 to \(g\)=—0.30, or vice versa). This is
illustrated below:
# Take a slice of the data for illustration
depressionPsyCtrSubset < depressionPsyCtr %>% slice(10:15)
# Create a logical variable which, for each row,
# indicates if the sign of g should be changed
depressionPsyCtrSubset$sign_change < c(TRUE, TRUE, TRUE,
TRUE, TRUE, FALSE)
# Calculate effect sizes while defining the variable in
# our data in which the change sign information is stored
data.change.sign < calculateEffectSizes(
depressionPsyCtrSubset,
change.sign = "sign_change")
# Show the relevant columns
data.change.sign %>%
select(study, .g, .log_rr)
# study .g .log_rr
# 1 Ahmadpanah, 2016 1.7788677 NA
# 2 Alexopoulos, 2016 0.2151835 0.2436221
# 3 Alexopoulos, 2016 0.2151835 0.2436221
# 4 Allart van Dam, 2003 0.5696379 NA
# 5 Allart van Dam, 2003 0.5696379 NA
# 6 Ammerman, 2013 0.9439887 NA
We see that .g
has now changed where ever
sign_change
has been set to TRUE
. We also see
that this does not affect the estimates of \(\log_eRR\).
Including Switched Arms
By default, Metapsy databases will only include all unique trial arm combinations, not all unique comparisons. To show when and why this difference matters, let us consider a small example.
Imagine that we have data of a fourarm trial. Say that this trial included CBT, PST, CAU and a waitlist condition. While this trial provides 6 unique trial arm combinations^{1}, the number of unique comparisons is higher: \(\frac{n!}{(nk)!} = \frac{4!}{2!} = 12\).
This is because the sign of the calculated effect size depends on which treatment serves as the reference. The effect of CBT vs. PST, for example, depends on which arm serves as the reference group (i.e. CBT vs. PST or PST vs. CBT). Depending on which arm is chosen (and assuming nonzero mean differences), the calculated effect size (e.g. Hedges’ \(g\)) value will either be negative (e.g. \(g\)=0.31) or positive (\(g\)=0.31). We see this visualized in the graph below. There are now two directed arrows for each comparison (12 in total), and each arrow reads as “is compared to”:
To truly calculate all unique comparisons, we have to set
the include.switched.arms
argument to TRUE
when running calculateEffectSizes
.
data.switched.arms < calculateEffectSizes(
depressionPsyCtr,
include.switched.arms = TRUE)
The resulting data frame data
now includes all effect
sizes that can theoretically be calculated (assuming that all relevant
combinations were included in the original data).
Should I use include.switched.arms? It is not essential to calculate all possible arm comparisons. However, having all possible comparisons in your data set is convenient, because all required comparisons pertaining to a particular research question (e.g. “how effective is PST when CBT is the reference group?”) can directly be filtered out; there is typically no need for further data manipulation steps then.
The include.switched.arms
argument can also be helpful
if you want to prepare network metaanalysis data
(which is subsequently analyzed using, e.g., the netmeta
package). However, this assumes that the data set already contains all
unique combinations in multiarm trials (e.g. all 6
combinations in the fourarm study above, not just a selection).
This leads to our final data set which can be used with the analysis module, which will be described next.
The Analysis Module
The analysis module in metapsyTools
allows to run
different kinds of metaanalyses based on the final data set created by
the preparation module. It is designed to also be usable by researchers
who were not directly involved in the preparation step, and is less
demanding in terms of required background knowledge. Metapsy databases
included in metapsyData can
directly be used by the analysis module in
metapsyTools
.
The analysis module allows, among other things, to:
 Filter data using strict rules, fuzzy string matching, or by
implementing prioritization rules (
filterPoolingData
andfilterPriorityRule
)^{2}.  Run conventional randomeffects metaanalysis models, as well as
commonly reported sensitivity analyses (
runMetaAnalysis
).  Correct effect estimates for potential publication bias or
smallstudy effects (
correctPublicationBias
).  Run subgroup analyses based on the fitted model (
subgroupAnalysis
)  Create study information tables (
createStudyTable
).
Filtering Relevant Comparisons
The metapsyTools
package contains functions which allow
to flexibly filter out comparisons that are relevant for a particular
metaanalysis.
The filterPoolingData
function is the main function for filtering. We simply have to provide
the final data set with calculated effect sizes (see preparation
module), as well as one or several filtering criteria pertaining to
our specific research question.
Say that we want to run a metaanalysis of all studies in which CBT was compared to waitlists, with the BDII at posttest being the analyzed outcome. We can filter out the relevant comparisons like this:
# Load the preinstalled "depressionPsyCtr" dataset
load("depressionPsyCtr")
data < depressionPsyCtr
# Filter relevant comparisons
data %>%
filterPoolingData(
condition_arm1 == "cbt",
condition_arm2 == "wl",
instrument == "bdi1",
time == "post") %>% nrow()
# [1] 7
We see that \(k\)=7 comparisons
fulfill these criteria. Note, however, that this will only select
comparisons for which condition_arm1
was exactly
defined as "cbt"
.
It is also possible to filter out all rows that simply
contain a specific search term (fuzzy
matching). For example, instead of "cbt"
, we might
want to filter out all studies that employed some type of “other”
therapy format. This can be achieved by using the Detect
function within filterPoolingData
:
data %>%
filterPoolingData(
Detect(condition_arm1, "other")) %>%
pull(condition_arm1)
# [1] "other psy" "other ctr" "other ctr" "other ctr" "other ctr" "other ctr"
# [7] "other ctr" "other ctr" "other psy"
We can create even more complex filters. Say that we want to examine the effect, as measured by the BDII at posttest, of CBT, PST, and BAT compared to either CAU or waitlists. We can visualize this with a graph again:
We can use the OROperator 
within
filterPoolingData
for such filter types. Again, we use the
Detect
function to allow for fuzzy matching:
data %>%
filterPoolingData(
Detect(condition_arm1, "cbtpstbat"),
Detect(condition_arm2, "cauwl"),
instrument == "bdi1",
time == "post") %>% nrow()
# [1] 11
Lastly, one can also filter data according to a specified
priority rule, using the filterPriorityRule
function. This is particularly helpful to select instruments. Say, for
example, that we ordered certain instruments based on their known
reliability. Now, for each study, we only want to select the comparison
in which the most reliable instrument was used. It is possible that, for
some studies, all used instruments are relatively unreliable. However,
given our priority rule, we can still extract the comparison with a
relatively high reliability, and discard all the other
measurements within one study that are even less reliable.
Assume that our priority rule for the employed instrument is
"hdrs"
(priority 1), "bdi2"
(priority 2),
"phq9"
(priority 3) and "bdi1"
(priority 4).
We can implement the rule like this:
data %>%
# First, filter all other relevant characteristics
filterPoolingData(
Detect(condition_arm1, "cbtpstbat"),
Detect(condition_arm2, "cauwl"),
time == "post") %>%
# Now, implement the priority rule for the outcome instrument
filterPriorityRule(
instrument = c("hdrs", "bdi2",
"phq9", "bdi1")) %>%
# Show number of entries
nrow()
# [1] 20
Pooling Effects
After all relevant rows have been filtered out, we can start to pool
the effect sizes. The runMetaAnalysis
function serves as a wrapper for several commonly used metaanalytic
approaches, and, by default, applies them all at once to our data:

"overall"
. Runs a generic inversevariance (randomeffects) model. All included effect sizes are treated as independent. If raw event data is available in theevent_arm1
,event_arm2
,totaln_arm1
,totaln_arm2
columns (see Effect Size Calculation), the MantelHaenszel method (Greenland & Robins, 1985) is used to pool effects. Please note that assumptions of this model are violated whenever a study contributes more than one effect sizes (since these effect sizes are assumed to be correlated). In this case, results of the"combined"
and"threelevel"
/"threelevel.che"
models should be reported instead. 
"combined"
. Aggregates all effect sizes within one study before calculating the overall effect. This ensures that all effect sizes are independent (i.e., unitofanalysis error & doublecounting is avoided). To combine the effects, one has to assume a correlation of effect sizes within studies, empirical estimates of which are typically not available. By default,runMetaAnalysis
assumes that \(\rho\)=0.6. 
"lowest.highest"
. Runs a metaanalysis, but with only (i) the lowest and (ii) highest effect size within each study included. 
"outlier"
. Runs a metaanalysis without statistical outliers (i.e. effect sizes for which the confidence interval does not overlap with the confidence interval of the overall effect). 
"influence"
. Runs a metaanalysis without influential cases, as defined using the influence diagnostics in Viechtbauer & Cheung (2010). See also theInfluenceAnalysis
function indmetar
, and Harrer et al. (2022, chap. 5.4.2). 
"rob"
. Runs a metaanalysis with only lowRoB studies included. By default, only studies with a value> 2
in therob
variable are considered for this analysis. 
"threelevel"
. Runs a multilevel (threelevel) metaanalysis model, with effect sizes nested in studies (Harrer et al., 2022, chap. 10). 
"threelevel.che"
. Runs a multilevel (threelevel) “correlated and hierarchical effects” (CHE) metaanalysis model. In this model, effect sizes are nested in studies, and effects within studies are assumed to be correlated (Pustejovsky & Tipton, 2022; by default, it is assumed that \(\rho\)=0.6). This will typically be a plausible modeling assumption in most realworld use cases. In most scenarios, the"threelevel.che"
model can be seen as a better (and therefore preferable) approximation of the data structure at hand, compared to the simpler"threelevel"
model.
To illustrate how the function works, we first select a subset of our
“toy” depressionPsyCtr
data.
# Only select data comparing CBT to waitlists and CAU
depressionPsyCtr %>%
filterPoolingData(
condition_arm1 == "cbt",
Detect(condition_arm2, "wlcau")
) > ma.data
Then, we plug the resulting ma.data
data set into the
runMetaAnalysis
function:
res < runMetaAnalysis(ma.data)
#  Running metaanalyses...
#  [OK] Using Hedges' g as effect size metric...
#  [OK] Calculating overall effect size... DONE
#  [OK] Calculating effect size using only lowest effect... DONE
#  [OK] Calculating effect size using only highest effect... DONE
#  [OK] Calculating effect size using combined effects (rho=0.6; armwise)... DONE
#  [OK] Calculating effect size with outliers removed... DONE
#  [OK] Calculating effect size with influential cases removed... DONE
#  [OK] Calculating effect size using only low RoB information... DONE
#  [OK] Calculating effect size using threelevel MA model... DONE
#  [OK] Robust variance estimation (RVE) used for threelevel MA model... DONE
#  [OK] Calculating effect size using threelevel CHE model (rho=0.6)... DONE
#  [OK] Robust variance estimation (RVE) used for threelevel CHE model... DONE
#  [OK] Done!
As we can see, while running, the runMetaAnalysis
function provides us with a few messages that allow to trace if all
models have been fitted successfully. When there are problems during the
calculation, the runMetaAnalysis
function will print a
warning.
# This tells the function to only consider studies with a rob score
# above 4 in the "low risk of bias" sensitivity analysis
runMetaAnalysis(ma.data, low.rob.filter = "rob > 4")
# [...]
#  [!] No low risk of bias studies detected! Switching to 'general'... DONE
# [...]
We can inspect the results by calling the created res
object in the console.
res
# Model results 
# Model k g g.ci p i2 i2.ci prediction.ci nnt
# Overall 26 0.57 [0.85; 0.28] <0.001 76.4 [65.65; 83.76] [1.57; 0.43] 5.22
# Combined 17 0.54 [0.89; 0.18] 0.005 77.1 [63.64; 85.57] [1.64; 0.57] 5.56
# One ES/study (lowest) 14 0.57 [1; 0.13] 0.015 79.8 [66.82; 87.65] [1.78; 0.64] 5.22
# One ES/study (highest) 14 0.47 [0.84; 0.11] 0.015 73.9 [55.75; 84.62] [1.45; 0.5] 6.41
# Outliers removed 23 0.46 [0.63; 0.29] <0.001 61.6 [39.55; 75.55] [1.11; 0.19] 6.64
# Influence Analysis 24 0.43 [0.6; 0.27] <0.001 63.9 [44.18; 76.66] [1.08; 0.22] 7.08
# Only rob > 2 13 0.34 [0.58; 0.11] 0.008 73.4 [53.91; 84.71] [1.1; 0.41] 9.12
# ThreeLevel Model 26 0.61 [1.05; 0.17] 0.011 92.1  [2.17; 0.96] 4.81
# ThreeLevel Model (CHE) 26 0.55 [0.92; 0.18] 0.007 88  [1.8; 0.7] 5.42
#
# Variance components (threelevel model) 
# Source tau2 i2
# Between Studies 0.536 92.1
# Within Studies 0 0
# Total 0.536 92.1
#
# Variance components (threelevel CHE model) 
# Source tau2 i2
# Between Studies 0.336 87.2
# Within Studies 0.003 0.8
# Total 0.339 88
By default, an HTML table should pop up along with
the console output. These preformatted HTML results tables can easily
be transferred to, for example, MS Word using copy and paste. To disable
this feature, we have to specify html=FALSE
inside the
function call.
Using the summary
method, details of the analysis
settings can be printed. This function also returns recommended
citations for the applied methods and/or packages. Furthermore, it
creates a summary forest plot in which all pooled
effects are displayed.
summary(res)
## Analysis settings 
##
## ✓ [Overall] Random effects model assumed.
## ✓ [Overall] Heterogeneity variance (tau2) calculated using restricted maximumlikelihood estimator.
## ✓ [Overall] Test statistic and CI of the pooled effect calculated using the KnappHartung adjustment.
## ✓ [Outliers removed] ES removed as outliers if the CI did not overlap with pooled effect CI.
## ✓ [Influence Analysis] Influential cases determined using diagnostics of Viechtbauer and Cheung (2010).
## [...]
##
##
## Cite the following packages/methods: 
##
##  {meta}: Balduzzi S, Rücker G, Schwarzer G (2019),
## How to perform a metaanalysis with R: a practical tutorial,
## EvidenceBased Mental Health; 22: 153160.
##  {metafor}: Viechtbauer, W. (2010). Conducting metaanalyses in R
## with the metafor package. Journal of Statistical Software, 36(3), 148.
## https://doi.org/10.18637/jss.v036.i03.
## [...]
Additional Arguments
The runMetaAnalysis
function allows to tweak many, many
details of the specific metaanalysis models (run
?runMetaAnalysis
to see the entire documentation). The most
important arguments one may want to specify are:

es.measure
. By default, the function conducts analyses using the Hedges \(g\) values stored in the data set. To conduct analyses using dichotomous outcome data (i.e. response, remission, etc.), one has to set this argument to"RR"
. 
es.type
. By default, analyses are conducted using the"precalculated"
effect sizes stored in the prepared data set. For risk ratios, one can alternatively set this argument to"raw"
, which means that the “raw” event counts are used for computations (instead of the precalculated logrisk ratio and its standard error). This is typically preferable, but may not be possible if this information could not be extracted from all primary studies. 
method.tau
. This argument controls the method to be used for estimating the betweenstudy heterogeneity variance \(\tau^2\). The default is"REML"
(restricted maximum likelihood), but other options such as the DerSimonianLaird estimator ("DL"
) are also available (see therunMetaAnalysis
function documentation for more details). Note that threelevel metaanalysis models can only be fitted using (restricted) maximum likelihood. 
nnt.cer
. TherunMetaAnalysis
function uses the method by Furukawa and Leucht (2011) to calculate the Number Needed to Treat (NNT) for each pooled effect size. This method needs an estimate of the control group event rate (CER). By default,nnt.cer = 0.2
is used, but you can set this to another value if desired. 
rho.within.study
. To combine effect sizes on a study level before pooling ("combined"
model), one has to assume a withinstudy correlation of effects. By default,rho.within.study = 0.6
is assumed, but this value can and should be changed based on better approximations. This value also controls the assumed withinstudy correlation in the multilevel CHE model ("theelevel.che"
). 
low.rob.filter
. By default, the function uses all comparisons for which the risk of bias rating in therob
variable has a value greater2
(low.rob.filter = "rob > 2"
). If your risk of bias rating is in another variable, or if another threshold should be used, you can change this argument accordingly.
Here is an example of a runMetaAnalysis
call with
nondefault settings:
runMetaAnalysis(ma.data,
es.measure = "RR",
es.type = "raw",
method.tau = "DL",
nnt.cer = 0.4,
rho.within.study = 0.8)
Also note that for the "combined"
model, by default,
effect sizes of multiarm trials will be aggregated on an
arms level. If, for example, a study
included a CBT, Internetbased CBT, and waitlist arm (where waitlist
is the reference condition), two separate aggregated effects will be
calculated: one for CBT vs. waitlist, and one for Internetbased CBT
vs. waitlist.
From a statistical perspective, it is typically preferable to aggregate on a study level instead^{3} (in our example, this would lead to one aggregated effect for CBT/Internetbased CBT combined vs. waitlist), although the aggregated effect sizes may then be more difficult to interpret from a clinical perspective.
To combine effect sizes on a study level instead, we can set the
which.combine
argument to "studies"
.
runMetaAnalysis(ma.data, which.combine = "studies")
Confidence Intervals for ThreeLevel Model \(\tau^2\) and \(I^2\) Values
Please note that, by default, no confidence intervals are printed for
the \(I^2\) value of the
"threelevel"
and "threelevel.che"
model (the
same is also the case for the \(G^2\)
value, the equivalent of \(I^2\) in
limit metaanalyses; see Publication
Bias section)^{4}. This is because there are two
heterogeneity variances in threelevel models: the “conventional”
betweenstudy heterogeneity variance \(\tau^2_{(1)}\) and the heterogeneity
variance within studies, \(\tau^2_{(2)}\). This means that there are
also two \(I^2\) values, one within
studies, and one for the betweenstudy heterogeneity. Established
closedform or iterative solutions (such as the \(Q\)Profile method; Viechtbauer, 2007) are not
directly applicable here, which means that confidence intervals cannot
be calculated that easily.
In metapsyTools
, one can resort to the
parametric bootstrap (van den Noortgate & Onghena,
2005)
to obtain approximate confidence intervals around the \(\tau^2\) and \(I^2\) values. Bootstrapping is performed
when the i2.ci.boot
argument is set to TRUE
.
By default, \(n\)=5000 bootstrap
samples are drawn, which should be seen as the absolute lower limit;
more replications are preferable and can be specified via the
nsim.boot
argument.
Please note that the bootstrapping is computationally expensive, which means that it can take up to an hour and longer until final results are obtained. During the bootstrapping procedure, the function prints a progress update so that users get an indication of how long the function (still) has to run.
res.boot < runMetaAnalysis(ma.data,
i2.ci.boot = TRUE,
nsim.boot = 500)
res.boot
# 1% completed  2% completed  3% completed ...
Confidence intervals around \(I^2\) and the \(\tau^2\) will then be printed out along with the usual results.
# [...]
# Variance components (threelevel model) 
# Source tau2 tau2.ci i2 i2.ci
# Between Studies 0.536 [0.178; 1.092] 92.1 [74.29; 95.72]
# Within Studies 0 [0; 0.055] 0 [0; 10.43]
# Total 0.536 [0.182; 1.107] 92.1 [79.83; 96.01]
#
# Variance components (threelevel CHE model) 
# Source tau2 tau2.ci i2 i2.ci
# Between Studies 0.336 [0.068; 0.667] 87.2 [56.65; 92.9]
# Within Studies 0.003 [0; 0.03] 0.8 [0; 10.31]
# Total 0.339 [0.079; 0.681] 88 [63.17; 93.67]
When runMetaAnalysis
was run with
i2.ci.boot
set to TRUE
, the \(I^2\) (or, more correctly, \(G^2\)) value of the limit
metaanalysis will also be calculated using parametric
bootstraping, provided that correctPublicationBias
is
called afterwards (see publication
bias section).
Complex VarianceCovariance Approximation
The vcov
argument in
runMetaAnalysis
controls if the effect size dependencies
within the data should be approximated using a "simple"
(default) or more "complex"
(but potentially more accurate)
method. This argument is only relevant for the "combined"
and "threelevel.che"
models. The default “simple” method
constructs variancecovariance matrices \(\Sigma_k\) for each study using a constant
sampling correlation \(\rho\) (defined
by rho.within.study
), which is identical across all
studies, outcomes, and time points. This simplifying assumption is part
of the formulation of the CHE model originally provided by Pustejovsky
and Tipton (2022).
Naturally, employing a common value of \(\rho\) across all studies may not be
reasonable in some analyses, and other information may be available to
better approximate the effect size dependencies in the collected data.
Setting vcov
to "complex"
allows to assume
that correlations between effect sizes may differ conditional on the
type of dependency. This means that the variancecovariance matrix \(\Sigma_k\) of some study \(k\) is approximated by an unstructured
matrix with varying \(\rho_{ij}\).
Put simply, this method allows to account for the fact that there can be various types of effect size dependencies (e.g. effect sizes are dependent because different instruments were used on the same sample; or they are dependent because effects of two different intervention groups were compared to the same control group in a multiarm trial, and so on) and that, therefore, the magnitude of the correlation also differs.
Setting vcov = "complex"
allows to additionally
incorporate assumed correlations specific to multiple testing over time
(e.g. correlations between effects at posttest and longterm
followup). The value provided in phi.within.study
represents the (auto)correlation coefficient \(\phi\), which serves as a rough estimate of
the retest correlation after 1 week. This allows to model the gradual
decrease in correlation between measurements over time.
Furthermore, it is possible to calculate a correlation coefficient
\(\rho_w\) for multiarm
trials, which is directly proportional to the size of each
individual trial arm. When all trial arms have the same size, meaning
that each arm’s weight \(w\) is
identical, \(\rho_w\) is known to be
0.5. Multiarm weights \(w\) (and thus
\(\rho_w\)) can be derived if the
w1.var
and w2.var
variables, containing the
sample size of each study arm, are available.
Using the complex approximation method increases the risk that at least one studies’ \(\Sigma_k\) matrix is not positive definite. In this case, the function automatically switches back to the constant sampling correlation approximation.
If the supplied data set format strictly follows the Metapsy
data standard, it is possible to set vcov
to
"complex"
without any further preparation steps. The
function will then inform us if the complex variancecovariance
approximation was not possible, meaning that the function automatically
reverted to the standard "simple"
approach.
runMetaAnalysis(ma.data,
vcov = "complex")
If you want to use the complex variancecovariance matrix approximation feature, it is advised to make sure that the data set contains the following columns:
"n_arm1"
and"n_arm2"
: sample sizes in the two compared groups."time_weeks"
: postrandomization time point at which the measurement was made (in weeks).
For a correctly preformatted dataset, users are referred to the depressionPsyCtr
“toy” dataset.
Internal Objects
It is important to note that runMetaAnalysis
is a
wrapper function. It provides a common interface to run
several metaanalysis models, while the models themselves are fitted
internally by other packages. In particular, we use functionality
provided by the meta
(Balduzzi, Rücker & Schwarzer, 2019), metafor
(Viechtbauer, 2010),
and dmetar
(Harrer, Cuijpers, Furukawa & Ebert, 2019) packages.
When returning the results, runMetaAnalysis
saves all
fitted models inside the returned object. Depending on their type, these
internal metaanalysis models work just like regular
meta
or metafor
models, which means that all
functionality developed for them is also available.
It is also possible to directly extract each calculated model from
the runMetaAnalysis
results object. Each of these models
are identical to the ones one would receive by running metagen
,
metabin
,
or rma.mv
directly.
res$model.overall # "overall" model (metagen/metabin)
res$model.combined # "combined" model (metagen/metabin)
res$model.lowest # lowest effect size/study only model (metagen/metabin)
res$model.highest # highest effect size/study only model (metagen/metabin)
res$model.outliers # outliers removed model (metagen/metabin)
res$model.influence # influential cases removed model (metagen/metabin)
res$model.rob # low RoB model (metagen/metabin)
res$model.threelevel # threelevel model
res$model.threelevel.che # threelevel CHE model
This means that any kind of operation available for
metagen
or rma
models is also
available for the models created by
runMetaAnalysis
. For example, we can generate a
funnel plot for our “overall” model like this:
library(meta)
# model.overall is a "meta" model, so we can directly
# Apply the funnel function included in meta
funnel(res$model.overall,
studlab = TRUE,
contour = c(0.9, 0.95, 0.99),
col.contour = c("darkgreen", "green", "lightgreen"))
This feature can also be used to run, for example, Egger’s
regression test, using functions already included in
meta
and dmetar
:
# Egger's regression test applied to the "overall" model:
library(dmetar)
eggers.test(res$model.overall)
# Eggers' test of the intercept
# =============================
#
# intercept 95% CI t p
# 2.814 3.97  1.66 4.785 7.17608e05
#
# Eggers' test indicates the presence of funnel plot asymmetry.
For an overview of all available functionality, you can consult the specific documentation entry for each metaanalysis model type:
"overall"
,"combined"
,"lowest"
,"highest"
,"outliers"
,"influence"
,"rob"
: run?metagen
in the console (if effect size if \(g\), or if if effect size is \(RR\) andes.type
is"precalculated"
); otherwise, run?metabin
."threelevel"
and"threelevel.che"
: run?rma.mv
in the console. The methods section in this documentation entry is particularly relevant, since it shows functions that can be applied to this model type (socalled S3 methods).
Forest Plots
It is also possible to generate forest plots of all
the calculated models. We only have to plug the results object into
plot
, and specify the name of the model for which the
forest plot should be retrieved:
plot(res, "overall") # Overall model (ES assumed independent)
plot(res, "combined") # ES combined within studies before pooling
plot(res, "lowest.highest") # Lowest and highest ES removed (creates 2 plots)
plot(res, "outliers") # Outliersremoved model
plot(res, "influence") # Influential casesremoved model
plot(res, "threelevel") # Threelevel model
plot(res, "threelevel.che") # Threelevel CHE model
This is what the "overall"
model forest plot looks like,
for example:
plot(res, "overall")
Note that the plot
function is simply a wrapper for the
forest.meta
function in the meta
package.
Therefore, all the advanced styling
options are also available using extra arguments.
plot(res, "overall",
col.predict = "lightgreen",
col.square = "gray",
fontfamily = "Palatino")
It is also possible to generate Baujat and LeaveOneOut Plots (not displayed here).
Replacement Functions
Once a model has been fitted using runMetaAnalysis
,
replacement functions are defined for each function
argument. This allows to quickly tweak one or more analysis settings,
which are implemented once the rerun
function is
called.
If, for example, we want to check the results using a different estimator of \(\tau^2\), and a different CER to calculate the number needed to treat, leaving all other settings the same, we can run:
method.tau(res) < "PM"
nnt.cer(res) < 0.7
Then, we have to call the rerun
function:
rerun(res)
A list of all available setting replacement functions is provided here. Replacement functions may be particularly helpful for sensitivity analyses.
Publication Bias
It is also possible to correct effect size estimates for potential
publication bias and/or smallstudy effects using the correctPublicationBias
function. This will apply three correction methods at the same time,
providing a range of (more or less) plausible corrected values. Using
the which.run
argument, we can apply these methods to a
specific model we fitted previously.^{5} Here, we select the "combined"
analysis.
correctPublicationBias(res, which.run = "combined")
# [...]
# Publication bias correction ('combined' model) 
# Model k g g.ci p i2 i2.ci prediction.ci nnt
# Trimandfill method 24 0.16 [0.62; 0.31] 0.490 85.4 [79.41; 89.61] [2.16; 1.85] 24.4
# Limit metaanalysis 17 0.1 [0.32; 0.52] 0.649 95.1  [1.05; 1.25] 35.4
# Selection model 17 0.87 [1.38; 0.36] <0.001 85.7 [64.72; 94.57] [1.98; 0.24] 6.38
# [...]
The reason why three publication bias methods are applied at the same time is because all publication bias methods are based on (to some extent) untestable assumptions, and these assumption differ between methods:
The trimandfill method is included mostly as a “legacy” method, given its popularity in the last decades. This method is also frequently requested by reviewers. However, it has been found that this method does not perform well as a way to correct for publication bias, especially when the betweenstudy heterogeneity is high. The trimandfill method assumes that publication bias results in funnel plot asymmetry, and provides an algorithm that imputes studies so that this asymmetry is removed, after which the results are reestimated.
The limit metaanalysis method assumes that publication bias manifests itself in socalled smallstudy effects (i.e., it assumes that small studies with high standard errors are more likely to be affected by publication bias). It calculates the expected (“shrunken”) pooled effect as the standard error \(\epsilon_k\) goes to zero, while accounting for betweenstudy heterogeneity. Importantly, this methods controls for smallstudy effects only, which can be associated with publication bias, but does not have to.
Lastly, a step function selection model is calculated, which allows to account for the possibility that results are more or less likely to get published depending on their \(P\) value. By default, the
selmodel.steps
argument is set to 0.05, which means that we assume that results with \(p\)<0.1 are more likely to get published. It is also possible to define more than one selection threshold, e.g. by settingselmodel.steps
toc(0.025, 0.05)
.
For further details, see Harrer, Cuijpers, Furukawa & Ebert (2022, chap. 9.3).
The correctPublicationBias
function also allows to
provide additional arguments that will be applied if they match the ones
defined in the trimfill
and limitmeta
functions in meta
and metasens
(Schwarzer,
Carpenter & Rücker, 2022),
respectively. You can find an example below:
correctPublicationBias(
res, which.run = "combined",
# Use the Rtype estimator to trim & fill
type = "R",
# Use a randomeffects model without bias parameter
# to shrink effects in the limi metaanalysis
method.adjust = "mulim")
Subgroup Analysis
The subgroupAnalysis
function can be used to perform
subgroup analyses. Every column included in the data set initially
supplied to runMetaAnalysis
can be used as a subgroup
variable.
For example, we might want to check if effects differ by country
(country
) or intervention type
(condition_arm1
):
sg < subgroupAnalysis(res, country, condition_arm2)
sg
# Subgroup analysis results 
# variable group n.comp g g.ci i2 i2.ci nnt p
# country 3 13 0.61 [1.3; 0.07] 81.5 [69.5; 88.8] 4.80 0.666
# . 1 13 0.76 [0.93; 0.58] 0 [0; 56.6] 3.74 .
# condition_arm2 cau 12 0.32 [0.54; 0.09] 71.9 [49.6; 84.3] 9.90 0.025
# . wl 14 0.96 [1.54; 0.38] 70.6 [49.4; 83] 2.88 .
By default, an HTML table should also pop up. The p
column to the right represents the significance of the overall subgroup
effect (there is, for example, no significant moderator effect of
country
).
It is also possible to conduct subgroup analyses using another model,
say, the threelevel model. We only have to specify
.which.run
:
sg < subgroupAnalysis(res,
country, condition_arm2,
.which.run = "threelevel")
When the number of studies in subgroups are small, it is sensible to
use a common estimate of the betweenstudy
heterogeneity variance \(\tau^2\) (in
lieu of subgroupspecific estimates). This can be done by setting the
.tau.common
argument to TRUE
in the
function:
sg < subgroupAnalysis(res,
country, condition_arm2,
.which.run = "combined",
.tau.common = TRUE)
MetaRegression
Since the runMetaAnalysis
function saves all fitted
models internally, it is also possible to extract each of them
individually to do further computations. Say, for example, that we want
to perform multiple metaregression using our threelevel
model. We want to use the risk of bias rating, as well as the scaled
study year as predictors. This can be achieved by extracting the model
to be analyzed using the $
operator, and then using
metaRegression
to fit the model.
metaRegression(res$model.threelevel,
~ scale(as.numeric(rob)) + scale(year))
# Multivariate MetaAnalysis Model (k = 26; method: REML)
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2.1 0.3821 0.6181 14 no study
# sigma^2.2 0.0000 0.0000 26 no study/es.id
#
# Test for Residual Heterogeneity:
# QE(df = 23) = 64.6042, pval < .0001
#
# Test of Moderators (coefficients 2:3):
# F(df1 = 2, df2 = 23) = 3.0051, pval = 0.0693
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# intrcpt 0.6999 0.1923 3.6394 23 0.0014 1.0978 0.3021 **
# scale(as.numeric(rob)) 0.5538 0.2364 2.3428 23 0.0282 0.0648 1.0428 *
# scale(year) 0.2656 0.2827 0.9395 23 0.3572 0.8503 0.3192
#
# 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Please note that, to be available in metaRegression
, is
is necessary that the moderator variable(s) are already included in the
data set that we originally used to call
runMetaAnalysis
.
Study Tables
The createStudyTable
function allows to create an
overview table for the included studies/comparisons. One only has to
supply the filtered data set first, and then the names of the desired
variables in the order in which they should appear in the table. It is
also possible to rename factor labels directly in the function,
determine how far values should be rounded, and if variable names should
be changed:
createStudyTable(
# Dataset. Alternatively, a runMetaAnalysis
# object can also be used.
ma.data,
## Columns 
# Simply add columns in the order in which
# they should appear in the table
study, age_group, mean_age, percent_women,
# You can directly recode values within a variable
condition_arm1 = c("CBT" = "cbt"),
multi_arm1,
condition_arm2 = c("WaitList" = "wl",
"CareAsUsual" = "cau"),
n_arm1, n_arm2,
country = c("Europe" = "3", "USA" = "1"),
sg, ac, itt,
## Specifications 
# .round.by.digits controls the number of rounded digits for
# specified columns
.round.by.digits = list(mean_age = 0,
n_arm1 = 0,
n_arm2 = 0),
# .column.names allows to rename columns
.column.names = list(age_group = "age group",
mean_age = "mean age",
percent_women = "% female",
condition_arm1 = "Intervention",
condition_arm2 = "Control",
n_arm1 = "N (IG)",
n_arm2 = "N (CG)",
multi_arm1 = "Specification"))
By default, createStudyTable
also returns an HTML table
that one can copy and paste into MS Word.
Still have questions? This vignette only provides a
superficial overview of metapsyTools
’ functionality. Every
function also comes with a detailed documentation, which you may consult
to learn more about available options and covered use cases. Please note
that metapsyTools
is still under active development, which
means that errors or other problems may still occur under some
circumstances. To report an issue or ask for help, you can contact
Mathias (mathias.harrer@tum.de).
CBT vs. PST, CBT vs. CAU, CBT vs. Waitlist, PST vs. CAU, PST vs. Waitlist, and CAU vs. Waitlist. You can also confirm this by running:
combn(c("cbt", "pst", "cau", "wl"), 2)
. Please note that not all possible combinations will be included in all Metapsy data set (e.g., the CAUWaitlist combination in our example may typically not be included).↩︎These functions are technically part of the preparation module, but are listed here for convenience.↩︎
This is because, if effect sizes are aggregated on an arms level, the resulting effect size estimates of multiarm studies are correlated, while the metaanalytic model assumes that they are independent.↩︎
The \(G^2\) statistic (Rücker et al., 2011) quanitifies the percentage of betweenstudy heterogeneity that remains after smallstudy effects have been accounted for (in
metapsyTools
, \(G^2\) is printed as \(I^2\) ).↩︎By default,
correctPublicationBias
uses the first model defined viawhich.run
inrunMetaAnalysis
(typically, the"overall"
model). Other possible values are"overall"
,"combined"
,"lowest"
,"highest"
,"outliers"
,"influence"
and"rob"
. Currently, threelevel models are not supported incorrectPublicationBias
.↩︎