metaDETECT | metaConvert

For meta-analysts

Checking your extraction before you pool

metaDETECT checks every row of your extraction at two stages: the values you entered (the Invalid family) and the effect sizes they produce (the Unusual and Discordant families). It flags any that do not hold up, before they reach your meta-analysis. This guidance is organised in two parts.

Part 1 · The three families: what each check looks for and why.
Part 2 · A worked example: catching an inconsistency a forest plot cannot show.

Part 1 · The three families

Three families of error

Each family is presented in turn: a worked example, what makes the value wrong, and how to resolve it.

Invalid · Mathematically impossible

Values that cannot be true for any study, whatever the data.

Worked example

study	Smith 2021
or	1.80
ci_low	2.10
ci_up	1.55

What's wrong

The interval's lower bound (2.10) is above its upper bound (1.55), so no value, including the reported OR of 1.80, can lie inside it.

How to resolve it

Likely cause

The two bounds were swapped, or a digit was dropped.

What to do

Re-enter the interval in the right order. The value is set to NA so it cannot enter the pool.

Also in this family

SD < 0 · |r| > 1 · OR / RR ≤ 0 · RD ∉ [−1, 1] · estimate ∉ its CI · cells > margin

metaDETECT flag

⚑ Inverted CI for 'or': lower > upper, set to NA
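The inverted-interval check can be sketched in a few lines of base R. This is only an illustration of the rule described above, not metaDETECT's implementation: the helper name flag_inverted_ci, the flag column and the choice to blank both bounds are assumptions made for the example.

```r
# Hypothetical helper (not part of metaConvert): find rows whose interval is
# inverted, record a flag, and blank the bounds so they cannot be used.
flag_inverted_ci <- function(df) {
  inverted <- !is.na(df$ci_low) & !is.na(df$ci_up) & df$ci_low > df$ci_up
  df$flag <- ifelse(inverted, "Inverted CI: lower > upper, set to NA", NA_character_)
  df$ci_low[inverted] <- NA_real_  # assumption: both bounds are blanked
  df$ci_up[inverted] <- NA_real_
  df
}

# The Smith 2021 row from the worked example
sheet <- data.frame(study = "Smith 2021", or = 1.80, ci_low = 2.10, ci_up = 1.55)
checked <- flag_inverted_ci(sheet)
print(checked)
```

The row itself stays in the sheet with its flag, so the analyst can see what needs re-entering.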

Unusual · Statistically implausible

Values that are technically possible but highly improbable, given the rest of the dataset.

Worked example

study	Lee 2020
mean_exp	18.4
mean_sd_exp	1.2
n_exp	54
mean_nexp	12.1
mean_sd_nexp	1.4
n_nexp	51

What's wrong

The two SDs (1.2 and 1.4) are about ten times smaller than the peers in the pool (≈ 14), giving a Hedges' g of 4.70.

How to resolve it

Likely cause

A standard error entered as a standard deviation (SD ≈ SE × √n).

What to do

Check whether the column is an SD or an SE, against the sample size. The value is kept, not deleted.

Also in this family

|logOR| > 5 · α / ICC > 0.99 · n-ratio > 10 · SE ≫ ES · cross-row IQR outlier

metaDETECT flag

⚑ Large SMD: |g| = 4.70 (threshold: 3) (from means_sd)
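To test the "SE entered as SD" explanation for Lee 2020, the recorded dispersions can be converted as if they were standard errors, using SD ≈ SE × √n. The sketch below is plain arithmetic in base R; se_to_sd is a hypothetical helper name, not a metaConvert function.

```r
# Hypothetical helper: the SD implied if a recorded value were really an SE
se_to_sd <- function(se, n) se * sqrt(n)

# Lee 2020: recorded dispersions and sample sizes from the worked example
implied_sd_exp <- se_to_sd(1.2, 54)
implied_sd_nexp <- se_to_sd(1.4, 51)

round(c(exp = implied_sd_exp, nexp = implied_sd_nexp), 1)
```

The implied SDs come out near 9 and 10, the same order of magnitude as the peers in the pool (≈ 14), which makes the SE explanation worth checking against the paper.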

Discordant · Inconsistent across methods

Independent computations of the same effect that ought to agree, but do not.

Worked example

study	Park 2019
mean_exp	13.1
mean_sd_exp	5.1
n_exp	52
mean_nexp	12.0
mean_sd_nexp	5.4
n_nexp	53
student_t	3.18

from means + SDs: g = 0.21

from student_t: g = 0.62

What's wrong

The two estimates disagree: g = 0.21 versus g = 0.62.

How to resolve it

Likely cause

At least one of the redundant inputs was mis-extracted.

What to do

Re-check the means, the SDs and the t. The disagreement localises the suspect value.

Also in this family

CI overlap = 0 · large min-max range · CI width ≠ SE · cross-study sign flip

metaDETECT flag

⚑ Low CI overlap between min/max estimates: 31.2% (threshold: 85%) - min: means_sd, max: student_t
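The two routes for Park 2019 can be reproduced by hand. The sketch below uses the textbook formulas for Hedges' g (pooled SD, the conversion d = t × √(1/n₁ + 1/n₂), and the usual small-sample correction); it is assumed, not confirmed, that metaConvert uses the same formulas internally. The function names are hypothetical.

```r
# Small-sample correction factor (common approximation), df = n1 + n2 - 2
hedges_j <- function(df) 1 - 3 / (4 * df - 1)

# Route 1: group means and standard deviations
g_from_means <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  hedges_j(df) * (m1 - m2) / sd_pooled
}

# Route 2: the reported Student's t for the same comparison
g_from_t <- function(t, n1, n2) {
  hedges_j(n1 + n2 - 2) * t * sqrt(1 / n1 + 1 / n2)
}

# Park 2019, values from the worked example
g_means <- g_from_means(13.1, 5.1, 52, 12.0, 5.4, 53)
g_t <- g_from_t(3.18, 52, 53)
round(c(means_sd = g_means, student_t = g_t), 2)
```

With these inputs the two routes round to 0.21 and 0.62, the disagreement the flag reports.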

Part 2 · A worked example

From the source articles to a clean dataset

The analyst works from the included articles, recording each study's statistics in the wide-format sheet. Capturing every statistic a paper reports for an effect - not only the minimum one formula needs - gives metaDETECT more than one estimate of that effect to compare. Run before pooling, metaDETECT then surfaces an error a forest plot cannot: two redundant statistics in one row that disagree, while the value entering the model still looks unremarkable.

Collect the source articles

It begins with the included studies themselves: each article is the source for one row, and the analyst records its statistics directly from the paper. Five trials are included here; what each reports varies, and one, Okafor 2020, also reports a Student's t for the comparison.

Adesina 2018: means + SDs

Brandt 2019: means + SDs

Okafor 2020: means, SDs + t

Petrov 2021: means + SDs

Sandberg 2022: means + SDs

Each article is the primary source for one study. Most report group means and standard deviations; Okafor 2020 also reports a test statistic, a second route to the same effect.

Extract every statistic reported for the effect

A paper usually reports an effect several ways - group means and standard deviations, a test statistic, a confidence interval, a p-value. Extract all of them, not only the one a single formula needs: metaConvert turns each into an effect size, and the agreement between those routes is the check that follows. Each statistic must be matched to the right outcome, since a paper reports more than one.

Okafor 2020 · results section

“Primary outcome (cognitive score): the intervention group exceeded control (24.6 ± 6.3 versus 21.7 ± 6.5; n = 52, 48), t(98) = 2.26, P = .026.”

“Secondary outcome (fatigue scale): a larger separation was observed, t(98) = 4.78, P < .001.”

One effect, several routes. The means, SDs, sample sizes and the primary-outcome t all describe the cognitive-score effect, and each yields an effect size. The t = 4.78 belongs to a different outcome and must not be paired with this comparison.

Record each statistic in the extraction sheet

Each study occupies one row. Okafor 2020 carries both the group statistics and its reported t (in the student_t column), giving metaDETECT two estimates of that effect to compare; the other four offer a single route.

study_id	n_exp	mean_exp	mean_sd_exp	n_nexp	mean_nexp	mean_sd_nexp	student_t
Adesina 2018	44	31.2	7.4	45	27.6	7.6	-
Brandt 2019	39	28.9	6.9	41	25.7	7.2	-
Okafor 2020	52	24.6	6.3	48	21.7	6.5	4.78
Petrov 2021	50	35.1	8.1	49	30.9	8.4	-
Sandberg 2022	36	19.8	5.9	38	16.9	6.1	-

The column names are metaConvert's recognised inputs; the input-data page lists every one. Only Okafor 2020 provides a second route, so only its row can be cross-checked.
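For readers working in R rather than a spreadsheet, the sheet above can be built as a data frame. The column names are the ones shown in the table; representing the "-" cells as NA is an assumption made here for illustration.

```r
# The extraction sheet as a data frame; empty student_t cells become NA
data <- data.frame(
  study_id = c("Adesina 2018", "Brandt 2019", "Okafor 2020",
               "Petrov 2021", "Sandberg 2022"),
  n_exp = c(44, 39, 52, 50, 36),
  mean_exp = c(31.2, 28.9, 24.6, 35.1, 19.8),
  mean_sd_exp = c(7.4, 6.9, 6.3, 8.1, 5.9),
  n_nexp = c(45, 41, 48, 49, 38),
  mean_nexp = c(27.6, 25.7, 21.7, 30.9, 16.9),
  mean_sd_nexp = c(7.6, 7.2, 6.5, 8.4, 6.1),
  student_t = c(NA, NA, 4.78, NA, NA),
  stringsAsFactors = FALSE
)

# Rows that offer a second route to the effect, and so can be cross-checked
has_second_route <- !is.na(data$student_t)
data$study_id[has_second_route]
```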

metaDETECT flags suspect estimates before pooling

metaConvert recomputes every effect size each row allows, and metaDETECT checks all of them - including the routes a forest plot discards - and flags any that disagree. In R this is two lines; the same sheet is uploaded, and the same checks run, in the web application:

## compute every effect size, then flag suspect rows
res <- convert_df(data, measure = "g")
summary(res, flags = TRUE)

The summary returns one row per study: the selected estimate in es_crude, its standard error in se_crude, and any checks in flags_crude. Four rows are clear; Okafor 2020 carries a flag.

study_id	es_crude	se_crude	flags_crude
Adesina 2018	0.476	0.213	-
Brandt 2019	0.449	0.224	-
Okafor 2020	0.450	0.201	Discordant
Petrov 2021	0.505	0.203	-
Sandberg 2022	0.478	0.233	-

⚑ Discordant - Low CI overlap between min/max estimates: 24% (threshold: 85%) - min: means_sd, max: student_t

es_crude, se_crude and flags_crude are the crude-scope columns metaConvert returns by default; the flags_crude cell above is abbreviated to its family, and the complete message appears beneath the table.

Read the flag, and correct it at the source

The two routes for Okafor 2020 disagree. The means and standard deviations give g = 0.45; the recorded t = 4.78 implies g = 0.95, more than twice as large, and the two confidence intervals overlap by only 24 percent. Because both describe the same comparison, they cannot both be right.

from means + SDs: g = 0.45 [0.05, 0.85]

from student_t: g = 0.95 [0.53, 1.37]

CI overlap: 24% (threshold 85%)

Two estimates of one effect that ought to coincide. metaConvert carries the higher-ranked means-and-SD estimate, g = 0.45, into the analysis, so a forest plot would look entirely normal; the conflict shows only when metaDETECT checks the second route.
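One way to arrive at an overlap figure like the one above is to divide the length shared by the two intervals by the total span they cover. This definition is an assumption, not taken from the metaDETECT documentation, but applied to the rounded intervals shown above it reproduces the 24 percent. The function name ci_overlap is hypothetical.

```r
# Hypothetical helper: shared length of two intervals over their total span
ci_overlap <- function(lo1, up1, lo2, up2) {
  shared <- max(0, min(up1, up2) - max(lo1, lo2))  # 0 when intervals are disjoint
  span <- max(up1, up2) - min(lo1, lo2)
  shared / span
}

# Okafor 2020: means + SDs route versus student_t route
overlap <- ci_overlap(0.05, 0.85, 0.53, 1.37)
round(100 * overlap)
```

Under this definition two identical intervals score 100 percent and two disjoint intervals score 0, so an 85 percent threshold tolerates only small differences between routes.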

Returning to the article resolves it: the t = 4.78 is the secondary-outcome (fatigue) value, recorded in place of the primary-outcome statistic, t = 2.26. With the value corrected, the two routes coincide (g = 0.45 from each, intervals overlapping by 99.8 percent) and the row clears:

study_id	mean_exp	mean_sd_exp	mean_nexp	mean_sd_nexp	student_t	es_crude	flags_crude
Okafor 2020	24.6	6.3	21.7	6.5	2.26	0.450	-

The means and SDs were always the higher-ranked route, so the selected estimate is unchanged (es_crude = 0.450); with the corrected t = 2.26 the two routes now agree, and metaDETECT no longer flags the row.
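The same diagnosis can be reached from the other direction: compute the t statistic that the recorded means and SDs imply, and compare it with the value in the student_t column. The sketch assumes the standard pooled-variance two-sample t; t_from_means is a hypothetical helper name.

```r
# Hypothetical helper: pooled-variance t implied by two groups' summary statistics
t_from_means <- function(m1, sd1, n1, m2, sd2, n2) {
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / (sd_pooled * sqrt(1 / n1 + 1 / n2))
}

# Okafor 2020, primary outcome
t_implied <- t_from_means(24.6, 6.3, 52, 21.7, 6.5, 48)
t_implied
```

The implied value sits close to the reported primary-outcome t of 2.26 and far from the recorded 4.78, which points to the t column, not the means or SDs, as the mis-extracted cell.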

Why capture the redundant statistic. Had only the means and SDs been extracted, Okafor 2020 would have entered the pool at a plausible g = 0.45 and the mis-recorded t would never have been examined. The second route is what gives metaConvert a value to disagree with, and metaDETECT is what reads the disagreement before pooling.

Checklist before pooling

Was every statistic each paper reports for an effect captured, not only the minimum one formula needs?
Where a row offers two routes to the same effect, do they agree?
Is any effect size implausibly large for the intervention or exposure?
Does any estimate sit far from the rest of the pool?
Do the studies split into opposed directions — some intervals entirely above the null, others entirely below?
Does the same trial enter the dataset more than once?


For reviewers & editors

Detecting extraction errors in a published meta-analysis

Likely cause

Usually an accidental duplicate of one trial. A study can also contribute several rows legitimately (subgroups, timepoints, or outcomes), all belonging to the same trial.

Detected by

Info: duplicate study_id

Reviewer action

Remove an accidental duplicate. If the rows are genuinely separate estimates from one trial, model the dependency with a multilevel or multivariate meta-analysis rather than entering them independently, so the trial is not double-counted.
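A repeated study_id is simple to find before any modelling. The sketch below uses base R's duplicated(); the helper name and the example identifiers are illustrative only, and the check cannot tell an accidental duplicate from legitimate multiple rows, so each hit still needs a look at the source.

```r
# Hypothetical helper: study identifiers that appear more than once
find_duplicate_ids <- function(study_id) {
  unique(study_id[duplicated(study_id)])
}

# Illustrative identifiers, with one trial entered twice
ids <- c("Alvarez 2017", "Bianchi 2018", "Cohen 2019", "Bianchi 2018")
find_duplicate_ids(ids)
```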

Part 2 · A worked review

Verifying a suspect estimate, from plot to author query

Imagine you are a reviewer or a member of an editorial team and want to critically appraise the meta-analytic findings of a paper. The steps below show a concrete procedure for double-checking the results.

Identify the anomalous estimate

The forest plot lists, for each trial, the per-arm sample size, mean and standard deviation, and the standardised mean difference they produce. Five estimates fall between g = 0.38 and 0.61; Nakamura 2021 is reported at g = 3.04. Its standard deviations (2.0 and 2.1) are also far smaller than those of the other trials (about 10 to 12) - a combination that is implausible for a cognitive-training outcome and requires verification.

Re-derive the estimate, rather than judge it by eye

An implausible value is best confirmed by recomputation. The statistics the plot reports are entered into metaConvert, which recomputes each effect size and applies the metaDETECT checks - an objective record that can be cited in a review.

Transcribe the data into one extraction sheet

Every trial is entered in a single sheet: the per-arm summary statistics in the standard columns, and the published effect size in metaConvert's user-input columns - not a column named after the measure. Holding both lets metaConvert recompute each estimate from the raw data and cross-check it against the published value.

study	n_exp	mean_exp	mean_sd_exp	n_nexp	mean_nexp	mean_sd_nexp	user_es_original_measure_crude	user_es_crude	user_ci_lo_crude	user_ci_up_crude
Alvarez 2017	41	62.4	11.0	43	57.7	11.4	g	0.42	0.10	0.74
Bianchi 2018	38	58.1	9.8	39	52.6	10.1	g	0.55	0.22	0.88
Cohen 2019	52	70.3	12.1	50	65.8	11.6	g	0.38	0.05	0.71
Duarte 2020	45	49.2	10.4	44	44.1	10.7	g	0.49	0.16	0.82
Faulkner 2021	33	-	-	31	-	-	g	0.61	0.20	1.02
Nakamura 2021	30	24.5	2.0	30	18.2	2.1	g	3.04	2.62	3.46

The reported effect size occupies the user-input columns, never a column named g; user_es_target_measure_crude is set from the analysis measure and is not extracted. Faulkner 2021 reports no means or SDs, so its effect size alone is used. The input-data page lists every recognised column.
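Before any recomputation, the published effect sizes in the sheet can be screened for a value far from the rest. The sketch below applies the conventional 1.5 × IQR fences to the user_es_crude column; this is a generic outlier rule used for illustration, and it is not claimed to be the exact rule metaDETECT applies.

```r
# Published effect sizes transcribed from the sheet above
g_reported <- c(
  "Alvarez 2017" = 0.42, "Bianchi 2018" = 0.55, "Cohen 2019" = 0.38,
  "Duarte 2020" = 0.49, "Faulkner 2021" = 0.61, "Nakamura 2021" = 3.04
)

# Conventional fences: 1.5 interquartile ranges beyond the quartiles
q <- quantile(g_reported, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

outliers <- names(g_reported)[g_reported < lower_fence | g_reported > upper_fence]
outliers
```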

Recompute and check with metaConvert

metaConvert recomputes each effect size from the summary statistics and applies the metaDETECT checks to every row. The procedure runs in R, or without code in the web application:

## recompute every effect size, then check each row
res <- convert_df(data, measure = "g")
summary(res, flags = TRUE)

In the metaConvert web app the same sheet is uploaded and the metaDETECT panel opened. In either case, one row is returned with a flag:

⚑ Unusual - Large SMD: |g| = 3.04 (threshold: 3) (from means_sd)

Refer the discrepancy to the authors

metaDETECT identifies an implausible value, not its cause. A standardised mean difference of 3.04 corresponds to a separation of approximately three standard deviations between arms. The authors should be asked to verify the means and standard deviations for the trial and, in particular, to confirm whether the reported dispersion is a standard deviation or a standard error - the latter being smaller by a factor of √n.

Suggested query. “The reported standardised mean difference for Nakamura 2021 (g ≈ 3.04) implies a three-standard-deviation separation between arms. Please confirm the group means and standard deviations, and whether the value in the SD column is a standard deviation or a standard error.”
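A reviewer can support the query with a quick what-if: treat the reported dispersions for Nakamura 2021 as standard errors, convert them with SD ≈ SE × √n, and recompute the effect. The sketch uses the textbook Hedges' g formula and hypothetical variable names; it does not establish the cause, it only shows whether the SE explanation is numerically plausible.

```r
# Nakamura 2021, as transcribed in the extraction sheet
n_arm <- 30
mean_exp <- 24.5
mean_nexp <- 18.2

# What the SDs would be if the reported 2.0 and 2.1 were standard errors
sd_if_se <- c(exp = 2.0, nexp = 2.1) * sqrt(n_arm)

# Hedges' g recomputed under that hypothesis (equal arm sizes)
df <- 2 * n_arm - 2
sd_pooled <- sqrt(((n_arm - 1) * sd_if_se[["exp"]]^2 +
                   (n_arm - 1) * sd_if_se[["nexp"]]^2) / df)
g_if_se <- (1 - 3 / (4 * df - 1)) * (mean_exp - mean_nexp) / sd_pooled

round(sd_if_se, 1)
round(g_if_se, 2)
```

Under this hypothesis the SDs land in the 10 to 12 range of the other trials and the effect falls back among the other five estimates, which is why the SD-or-SE question is the one to put to the authors.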

Checklist for reviewers

Is any effect larger than the intervention could plausibly produce?
Does any study sit far from the others?
Is any interval so narrow that one study holds most of the weight?
Do the studies split into opposed directions, some intervals excluding the null above it and others below?
Does any interval sit unevenly around its estimate?
Do two studies report identical numbers?
Does the same trial appear more than once?
