metaDETECT is the data-extraction error framework now built into metaConvert. As the package computes your effect sizes, it checks every study in a pairwise meta-analysis and flags the values that don't add up, before they reach your pooled estimate.
metaDETECT checks every row of your extraction at two stages: the values you entered (the Invalid family) and the effect sizes they produce (the Unusual and Discordant families). This guidance is organised in two parts: the three flag families, then a worked example that follows one extraction from article to flagged row.
Each family is presented in turn: a worked example, what makes the value wrong, its likely cause, and how to resolve it.
Invalid: values that cannot be true for any study, whatever the data.
| study | Smith 2021 |
|---|---|
| or | 1.80 |
| ci_low | 2.10 |
| ci_up | 1.55 |
The interval's lower bound (2.10) is above its upper bound (1.55), which is impossible for any interval.
The two bounds were swapped, or a digit was dropped.
Re-enter the interval in the right order. The value is set to `NA` so that it cannot enter the pool.
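The interval check behind this flag reduces to a few comparisons. The Python sketch below is an illustration, not metaDETECT's code. `interval_problems` is a hypothetical helper, and the positivity rule assumes a ratio measure such as an odds ratio reported on its natural scale.

```python
def interval_problems(estimate, ci_low, ci_up, ratio_measure=False):
    """Return the reasons why an estimate and its interval cannot be true."""
    problems = []
    if ci_low > ci_up:
        problems.append("lower bound above upper bound")
    elif not (ci_low <= estimate <= ci_up):
        problems.append("estimate outside its own interval")
    # Ratio measures (odds ratios, risk ratios) cannot be zero or negative.
    if ratio_measure and min(estimate, ci_low, ci_up) <= 0:
        problems.append("ratio measure must be positive")
    return problems


# Smith 2021 as extracted: the bounds are swapped.
print(interval_problems(1.80, 2.10, 1.55, ratio_measure=True))
# ['lower bound above upper bound']

# With the bounds re-entered in order, nothing is flagged.
print(interval_problems(1.80, 1.55, 2.10, ratio_measure=True))
# []
```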
Unusual: values that are technically possible but highly improbable, given the rest of the dataset.
| study | Lee 2020 |
|---|---|
| mean_exp | 18.4 |
| mean_sd_exp | 1.2 |
| n_exp | 54 |
| mean_nexp | 12.1 |
| mean_sd_nexp | 1.4 |
| n_nexp | 51 |
The two SDs (1.2 and 1.4) are about ten times smaller than those of the other studies in the pool (≈ 14), giving a Hedges' g of about 4.8.
A standard error entered as a standard deviation (SD ≈ SE × √n).
Check the sample size to establish whether the column holds SDs or SEs. The value is kept, not deleted.
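Recomputing g under both readings shows how much the SD/SE question matters for this row. Below is a minimal Python sketch. It uses the textbook pooled-SD formula with Hedges' correction J = 1 - 3/(4·df - 1). `hedges_g` is our own helper, not a metaConvert function. The second call tests a hypothesis about the article; it does not correct the data.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g from two group means and SDs (pooled SD, small-sample correction)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    return d * (1 - 3 / (4 * df - 1))


# Lee 2020 as entered: the dispersion columns are read as SDs.
g_as_entered = hedges_g(18.4, 1.2, 54, 12.1, 1.4, 51)

# Hypothesis to check against the article: the columns are SEs, so SD = SE * sqrt(n).
g_if_se = hedges_g(18.4, 1.2 * math.sqrt(54), 54, 12.1, 1.4 * math.sqrt(51), 51)

print(round(g_as_entered, 2), round(g_if_se, 2))  # 4.81 0.66
```

If the SE reading is right, the effect falls from an implausible magnitude to a moderate one. Only the source article can say which reading holds.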
Discordant: independent computations of the same effect that ought to agree, but do not.
| study | Park 2019 |
|---|---|
| mean_exp | 13.1 |
| mean_sd_exp | 5.1 |
| n_exp | 52 |
| mean_nexp | 12.0 |
| mean_sd_nexp | 5.4 |
| n_nexp | 53 |
| student_t | 3.18 |
The two estimates disagree: the means and SDs give g = 0.21, whereas the t gives g = 0.62.
At least one of the redundant inputs was mis-extracted.
Re-check the means, the SDs and the t against the article. The disagreement narrows the search to these redundant inputs.
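Both routes can be recomputed directly. The Python sketch below assumes a pooled-variance independent-samples t, so that d = t·√(1/n₁ + 1/n₂), and applies the textbook Hedges' correction. The function names are illustrative, and metaConvert's own formulas may differ in detail.

```python
import math


def correction(df):
    """Hedges' small-sample correction factor J."""
    return 1 - 3 / (4 * df - 1)


def g_from_means(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g from group means and SDs via the pooled SD."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    return (m1 - m2) / sd_pooled * correction(df)


def g_from_t(t, n1, n2):
    """Hedges' g from an independent-samples t: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2) * correction(n1 + n2 - 2)


# Park 2019: two routes to the same comparison.
g_means = g_from_means(13.1, 5.1, 52, 12.0, 5.4, 53)
g_t = g_from_t(3.18, 52, 53)
print(round(g_means, 2), round(g_t, 2))  # 0.21 0.62
```

metaDETECT sets its own threshold for how large the gap must be before a row is called Discordant. That threshold is not reproduced here.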
The analyst works from the included articles, recording each study's statistics in the wide-format sheet. Capturing every statistic a paper reports for an effect - not only the minimum one formula needs - gives metaDETECT more than one estimate of that effect to compare. Run before pooling, metaDETECT can then surface an error that a forest plot cannot show: two redundant statistics in one row that disagree, while the value entering the model still looks unremarkable.
It begins with the included studies themselves: each article is the source for one row, and the analyst records its statistics directly from the paper. Five trials are included here. All report group means and standard deviations, and one, Okafor 2020, also reports a Student's t for the comparison.
A paper usually reports an effect several ways - group means and standard deviations, a test statistic, a confidence interval, a p-value. Extract all of them, not only the one a single formula needs: metaConvert turns each into an effect size, and the agreement between those routes is the check that follows. Each statistic must be matched to the right outcome, since a paper often reports several outcomes.
“Primary outcome (cognitive score): the intervention group exceeded control (24.6 ± 6.3 versus 21.7 ± 6.5; n = 52, 48), t(98) = 2.26, P = .026.”
“Secondary outcome (fatigue scale): a larger separation was observed, t(98) = 4.78, P < .001.”
Each study occupies one row. Okafor 2020 carries both the group statistics and its reported t (in the `student_t` column), giving metaDETECT two estimates of that effect to compare; the other four offer a single route.
| study_id | n_exp | mean_exp | mean_sd_exp | n_nexp | mean_nexp | mean_sd_nexp | student_t |
|---|---|---|---|---|---|---|---|
| Adesina 2018 | 44 | 31.2 | 7.4 | 45 | 27.6 | 7.6 | - |
| Brandt 2019 | 39 | 28.9 | 6.9 | 41 | 25.7 | 7.2 | - |
| Okafor 2020 | 52 | 24.6 | 6.3 | 48 | 21.7 | 6.5 | 4.78 |
| Petrov 2021 | 50 | 35.1 | 8.1 | 49 | 30.9 | 8.4 | - |
| Sandberg 2022 | 36 | 19.8 | 5.9 | 38 | 16.9 | 6.1 | - |
metaConvert recomputes every effect size each row allows, and metaDETECT checks all of them - including the routes a forest plot discards - and flags any that disagree. In R this takes two lines of code; in the web application, the same sheet is uploaded and the same checks run.
The summary returns one row per study: the selected estimate in `es_crude`, its standard error in `se_crude`, and any flags in `flags_crude`. Four rows are clear; Okafor 2020 carries a flag.
| study_id | es_crude | se_crude | flags_crude |
|---|---|---|---|
| Adesina 2018 | 0.476 | 0.213 | - |
| Brandt 2019 | 0.449 | 0.224 | - |
| Okafor 2020 | 0.450 | 0.201 | Discordant |
| Petrov 2021 | 0.505 | 0.203 | - |
| Sandberg 2022 | 0.478 | 0.233 | - |
`es_crude`, `se_crude` and `flags_crude` are the crude-scope columns metaConvert returns by default. The `flags_crude` cell above is abbreviated to its family and omits the complete message.
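The `es_crude` and `se_crude` columns can be checked by hand. The Python sketch below applies textbook formulas: the pooled SD, Hedges' correction J = 1 - 3/(4·df - 1), and the large-sample SE of d multiplied by J. These formulas reproduce the table to three decimals, but the sketch is our reconstruction, not metaConvert's source.

```python
import math


def g_and_se(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and its standard error from group means and SDs."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction
    se_d = math.sqrt(1 / n1 + 1 / n2 + d**2 / (2 * (n1 + n2)))
    return d * j, se_d * j


# (mean_exp, mean_sd_exp, n_exp, mean_nexp, mean_sd_nexp, n_nexp) from the sheet.
rows = {
    "Adesina 2018": (31.2, 7.4, 44, 27.6, 7.6, 45),
    "Brandt 2019": (28.9, 6.9, 39, 25.7, 7.2, 41),
    "Okafor 2020": (24.6, 6.3, 52, 21.7, 6.5, 48),
    "Petrov 2021": (35.1, 8.1, 50, 30.9, 8.4, 49),
    "Sandberg 2022": (19.8, 5.9, 36, 16.9, 6.1, 38),
}

for study, stats in rows.items():
    g, se = g_and_se(*stats)
    print(f"{study:<14} {g:.3f} {se:.3f}")
```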
The two routes for Okafor 2020 disagree. The means and standard deviations give g = 0.45; the recorded t = 4.78 implies g = 0.95, more than twice as large, and the two confidence intervals overlap by only 24 percent. Because both describe the same comparison, they cannot both be right.
Returning to the article resolves it: the recorded t = 4.78 is the secondary-outcome (fatigue) statistic, entered in place of the primary-outcome t = 2.26. With the value corrected, the two routes coincide (g = 0.45 from each, intervals overlapping by 99.8 percent) and the row clears:
| study_id | mean_exp | mean_sd_exp | mean_nexp | mean_sd_nexp | student_t | es_crude | flags_crude |
|---|---|---|---|---|---|---|---|
| Okafor 2020 | 24.6 | 6.3 | 21.7 | 6.5 | 2.26 | 0.450 | - |
The selected estimate is unchanged (`es_crude` = 0.450). With the corrected t = 2.26 the two routes now agree, and metaDETECT no longer flags the row.
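The article's own numbers also show which t belongs with the means. A pooled-variance t recomputed from the group statistics should match the reported one to rounding. The Python sketch below assumes the paper used Student's pooled-variance t; a Welch test would give a slightly different value. `t_from_means` is a hypothetical helper, not a metaConvert function.

```python
import math


def t_from_means(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance independent-samples t implied by group means and SDs."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    return (m1 - m2) / (sd_pooled * math.sqrt(1 / n1 + 1 / n2))


# Okafor 2020 primary outcome: 24.6 ± 6.3 (n = 52) versus 21.7 ± 6.5 (n = 48).
implied_t = t_from_means(24.6, 6.3, 52, 21.7, 6.5, 48)
print(f"t implied by the means: {implied_t:.2f}")  # 2.26

# 0.05 is an arbitrary rounding tolerance, not a metaDETECT setting.
for label, reported in [("fatigue outcome", 4.78), ("primary outcome", 2.26)]:
    print(label, reported, abs(reported - implied_t) < 0.05)
# fatigue outcome 4.78 False
# primary outcome 2.26 True
```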
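Reviewers and editors rarely hold the data file, yet a substantial share of extraction errors remains detectable from the published forest plot together with the source articles. This guidance is organised in two parts: the recurring patterns a forest plot can reveal, then a worked example that checks a published plot with metaConvert and metaDETECT.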
Each recurring pattern is presented in turn: its visual signature, its likely cause, and how to check it.
One study's effect is far larger than any intervention could plausibly produce: here a standardised mean difference above 3.
A standard error entered as a standard deviation, or a unit error. For scale, an SMD of 3 on a cognitive test is a 45-point IQ gain (3 × the 15-point SD).
Convert the effect back to the test's units. If the implied change is clinically impossible, an input is wrong.
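The conversion is a single multiplication by the outcome's reference SD. The Python sketch below is deliberately trivial. The helper name is illustrative, and the choice of reference SD is left to the analyst: a population norm such as IQ's 15 points, or the trials' pooled SD.

```python
def smd_in_test_units(smd, reference_sd):
    """Convert a standardised mean difference back to raw test points."""
    return smd * reference_sd


# An SMD of 3 on a test scaled like IQ (SD = 15 points).
print(smd_in_test_units(3, 15))  # 45
```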
One study sits far from the others, though its value is not impossible on its own.
A wrong row or arm, a unit mismatch, or a misread statistic.
Re-extract that study from the source and recompute.
One interval is very short and its square dominates the plot, though the study is no larger than its neighbours.
A standard error entered as a confidence-interval width, or an inflated sample size.
Check the interval against the reported sample size.
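One way to make this check concrete is to compare the SE implied by the plotted interval with the SE the arm sizes support. The Python sketch below rests on three assumptions: the interval is a symmetric 95% interval on the SMD scale, the expected SE uses the usual large-sample formula, and the study values are hypothetical. A ratio near 1 is consistent with the sample size; no cut-off is implied.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def se_from_interval(ci_low, ci_up):
    """SE implied by a symmetric 95% interval."""
    return (ci_up - ci_low) / (2 * Z95)


def expected_se_smd(smd, n1, n2):
    """Large-sample SE of a standardised mean difference given the arm sizes."""
    return math.sqrt(1 / n1 + 1 / n2 + smd**2 / (2 * (n1 + n2)))


def width_ratio(smd, ci_low, ci_up, n1, n2):
    """Interval-implied SE divided by the SE the sample size supports."""
    return se_from_interval(ci_low, ci_up) / expected_se_smd(smd, n1, n2)


# Hypothetical study, 50 per arm, SMD = 0.40, correctly reported interval.
print(round(width_ratio(0.40, 0.00, 0.80, 50, 50), 2))  # 1.01
# The same study if its SE (about 0.20) had been typed in as the interval width.
print(round(width_ratio(0.40, 0.30, 0.50, 50, 50), 2))  # 0.25
```

In the second call the ratio is about a quarter, because an interval one SE wide is roughly a quarter of a true 95% interval, which spans 2 × 1.96 ≈ 3.9 SEs.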
The pool splits into two opposed clusters: a group of studies is significant in one direction and another group in the other, each with intervals that exclude the null. A few studies still straddle it.
In one cluster the treatment and control arms were swapped, or the sign of a difference was dropped, flipping those studies across the null.
Confirm the arm coding for the studies on each side of the split.
Two rows carry an identical estimate and confidence interval: the same trial appears more than once.
Usually an accidental duplicate of one trial. A study can also contribute several rows legitimately (subgroups, timepoints, or outcomes), all belonging to the same trial.
Remove an accidental duplicate. If the rows are genuinely separate estimates from one trial, model the dependency with a multilevel or multivariate meta-analysis rather than entering them independently, so the trial is not double-counted.
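Spotting identical rows is a grouping exercise. The Python sketch below uses hypothetical rows, and `identical_estimates` is an illustrative helper. A match only prompts a check, since it cannot tell an accidental copy from legitimate rows of one trial.

```python
from collections import defaultdict


def identical_estimates(rows):
    """Return groups of study labels that share the same estimate and 95% CI.

    rows: iterable of (study, estimate, ci_low, ci_up), with values as extracted.
    Exact equality is used because the values are typed-in, already-rounded numbers.
    """
    groups = defaultdict(list)
    for study, estimate, ci_low, ci_up in rows:
        groups[(estimate, ci_low, ci_up)].append(study)
    return [studies for studies in groups.values() if len(studies) > 1]


# Hypothetical extraction in which one trial was pasted twice.
rows = [
    ("Trial A", 0.42, 0.10, 0.74),
    ("Trial B", 0.55, 0.22, 0.88),
    ("Trial A, second row", 0.42, 0.10, 0.74),
]
print(identical_estimates(rows))  # [['Trial A', 'Trial A, second row']]
```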
A published forest plot reports each trial's group statistics alongside the effect size they yield. The procedure below re-derives those effect sizes with metaConvert and checks them with metaDETECT.
The forest plot lists, for each trial, the per-arm sample size, mean and standard deviation, and the standardised mean difference they produce. Five estimates fall between g = 0.38 and 0.61; Nakamura 2021 is reported at g = 3.04. Its standard deviations (2.0 and 2.1) are also far smaller than those of the other trials (about 10 to 12) - a combination that is implausible for a cognitive-training outcome and requires verification.
An implausible value is best confirmed by recomputation. The statistics the plot reports are entered into metaConvert, which recomputes each effect size and applies the metaDETECT checks - an objective record that can be cited in a review.
Every trial is entered in a single sheet: the per-arm summary statistics in the standard columns, and the published effect size in metaConvert's user-input columns - not a column named after the measure. Holding both lets metaConvert recompute each estimate from the raw data and cross-check it against the published value.
| study | n_exp | mean_exp | mean_sd_exp | n_nexp | mean_nexp | mean_sd_nexp | user_es_original_measure_crude | user_es_crude | user_ci_lo_crude | user_ci_up_crude |
|---|---|---|---|---|---|---|---|---|---|---|
| Alvarez 2017 | 41 | 62.4 | 11.0 | 43 | 57.7 | 11.4 | g | 0.42 | 0.10 | 0.74 |
| Bianchi 2018 | 38 | 58.1 | 9.8 | 39 | 52.6 | 10.1 | g | 0.55 | 0.22 | 0.88 |
| Cohen 2019 | 52 | 70.3 | 12.1 | 50 | 65.8 | 11.6 | g | 0.38 | 0.05 | 0.71 |
| Duarte 2020 | 45 | 49.2 | 10.4 | 44 | 44.1 | 10.7 | g | 0.49 | 0.16 | 0.82 |
| Faulkner 2021 | 33 | - | - | 31 | - | - | g | 0.61 | 0.20 | 1.02 |
| Nakamura 2021 | 30 | 24.5 | 2.0 | 30 | 18.2 | 2.1 | g | 3.04 | 2.62 | 3.46 |
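`user_es_original_measure_crude` records the measure in which the published estimate is expressed, here `g`; `user_es_target_measure_crude` is set from the analysis measure and is not extracted. Faulkner 2021 reports no means or SDs, so its published effect size alone is used. The input-data page lists every recognised column.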
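The cross-check can be sketched directly from the sheet. The Python sketch below assumes textbook Hedges' g formulas, and metaConvert's code may differ slightly. No agreement tolerance from metaDETECT is implied.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g from group means and SDs (pooled SD, small-sample correction)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    return (m1 - m2) / sd_pooled * (1 - 3 / (4 * df - 1))


# (n_exp, mean_exp, sd_exp, n_nexp, mean_nexp, sd_nexp, published g) from the sheet.
# Faulkner 2021 is omitted: with no means or SDs there is nothing to recompute.
trials = {
    "Alvarez 2017": (41, 62.4, 11.0, 43, 57.7, 11.4, 0.42),
    "Bianchi 2018": (38, 58.1, 9.8, 39, 52.6, 10.1, 0.55),
    "Cohen 2019": (52, 70.3, 12.1, 50, 65.8, 11.6, 0.38),
    "Duarte 2020": (45, 49.2, 10.4, 44, 44.1, 10.7, 0.49),
    "Nakamura 2021": (30, 24.5, 2.0, 30, 18.2, 2.1, 3.04),
}

recomputed = {}
for study, (n1, m1, sd1, n2, m2, sd2, published) in trials.items():
    recomputed[study] = hedges_g(m1, sd1, n1, m2, sd2, n2)
    print(f"{study:<14} recomputed {recomputed[study]:.2f}  published {published:.2f}")
```

Every recomputed value lies within about 0.01 of the published one, Nakamura 2021 included (3.03 against 3.04). The published estimate therefore follows from the reported statistics, and the question lies with those statistics themselves.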
metaConvert recomputes each effect size from the summary statistics and applies the metaDETECT checks to every row. The procedure can be run in R or, without code, in the web application.
In the metaConvert web app the same sheet is uploaded and the metaDETECT panel opened. In either case, one row is returned with a flag: Nakamura 2021.
metaDETECT identifies an implausible value, not its cause. A standardised mean difference of 3.04 corresponds to a separation of approximately three standard deviations between arms. The authors should be asked to verify the means and standard deviations for the trial and, in particular, to confirm whether the reported dispersion is a standard deviation or a standard error - the latter being smaller by a factor of √n.
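Before writing to the authors, the SE hypothesis can be sized from the figures already on the plot. If SEs were reported as SDs, the other trials' SDs should exceed Nakamura 2021's by roughly √n. The Python sketch below uses the median of the other four trials' SDs as the reference, which assumes they measured the same outcome on the same scale. A close match makes the question worth asking; it does not answer it.

```python
import math
from statistics import median

# SDs reported by Alvarez, Bianchi, Cohen and Duarte (Faulkner 2021 reports none).
peer_sds = [11.0, 11.4, 9.8, 10.1, 12.1, 11.6, 10.4, 10.7]
nakamura_sds = [2.0, 2.1]
n_per_arm = 30

# If SEs had been reported as SDs, the true SDs would be larger by sqrt(n).
shortfall = median(peer_sds) / (sum(nakamura_sds) / len(nakamura_sds))
print(f"peer SD / Nakamura SD: {shortfall:.1f}")  # 5.3
print(f"sqrt(n) per arm: {math.sqrt(n_per_arm):.1f}")  # 5.5
```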