Two-stage least squares

Power or sample size calculations for two-stage least squares Mendelian Randomization studies using a genetic instrument \(Z\) (a SNP or allele score), a continuous exposure variable \(X\) (e.g. body mass index [BMI, \(\frac{kg}{m^2}\)]) and a continuous outcome variable \(Y\) (e.g. blood pressure [mmHg]).

YZ association

Power or sample size calculations for the regression association of a genetic instrument \(Z\) (e.g. a BMI SNP) with a continuous outcome variable \(Y\) (e.g. blood pressure).

Working Example

To calculate the minimum sample size required for a Mendelian Randomization (MR) study of the causal effect of body mass index (BMI) on systolic blood pressure (SBP) in children, the parameters for this online calculator could be taken from, for example, a published observational epidemiology study reporting associations between BMI and SBP, together with a SNP instrument that is reliably associated with BMI.

In an observational study reporting the association of BMI and SBP in children\(^{[1]}\), the regression coefficient for the association between BMI and SBP (averaged over the coefficients for boys and girls) was \(1.41 \frac{mmHg}{SD}\) without confounder adjustment and \(1.30 \frac{mmHg}{SD}\) \(^{[*]}\) after adjustment for confounders. The standard deviation (SD) of SBP in this sample (from the paper’s online supplementary data) was \(10.8\) mmHg, with an SD of \(1\) for BMI.

Assume that the causal effect of BMI on SBP is \(1.30 \frac{mmHg}{SD}\) \(^{[*]}\) and that the population regression coefficient of BMI on SBP, including the effects of confounders, is \(1.41 \frac{mmHg}{SD}\). Also assume that for the MR study we have a genetic instrument that explains \(R^2_{xz} = 0.01\) of variation in BMI (based on e.g. FTO SNP, which explains \(\sim 1 \%\) of the variation in BMI)\(^{[2]}\). Then we can calculate the power of an MR study using the following parameters:

\(\beta_{OLS} = 1.41 \frac{mmHg}{SD}\)

\(\beta_{yx} = 1.3 \frac{mmHg}{SD}\) \(^{[*]}\)

\(\sigma^2(x) = 1\)

\(\sigma^2(y) = 10.8^2 = 116.6~mmHg^2\)

\(R^2_{xz} = 0.01\)

For an \(\alpha\) of \(0.05\) and power of \(0.8\), the calculated minimum sample size for the Mendelian Randomization study is \(N = 53,218\). This sample size is so large because BMI explains only a small amount of the variation in SBP in this case and because the genetic instrument explains only a small proportion of the variance in BMI.
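The text above does not spell out the formula behind this number. The sketch below is one way to arrive at a figure of that size. It assumes a 2SLS estimator with mean \(\beta_{yx} + \frac{c}{N R^2_{xz}}\), where \(c = (\beta_{OLS} - \beta_{yx})\sigma^2(x)\), and variance \(\frac{\sigma^2_e}{N R^2_{xz} \sigma^2(x)}\), and uses a normal approximation for the target non-centrality parameter. These expressions and the helper name `mr_sample_size` are assumptions made for illustration, not a statement of how the calculator is implemented.

```python
from math import ceil, sqrt
from statistics import NormalDist


def mr_sample_size(b_yx, b_ols, r2_xz, var_x, var_y, alpha=0.05, power=0.8):
    """Hypothetical helper: smallest N reaching the target power (two-sided test)."""
    # Assumption: covariance between X and Y that is due to confounding.
    con = (b_ols - b_yx) * var_x
    # Assumption: residual variance of Y used for the 2SLS estimator.
    var_e = var_y - b_yx * var_x * (2 * b_ols - b_yx)
    # Normal approximation to the non-centrality parameter needed for the power.
    std = NormalDist()
    target_ncp = (std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)) ** 2
    # Setting NCP(N) = (b_yx + con / (N r2))^2 * N r2 var_x / var_e equal to
    # target_ncp gives a quadratic a*N^2 + b*N + c = 0; take the larger root.
    a = (b_yx * r2_xz) ** 2
    b = r2_xz * (2 * b_yx * con - target_ncp * var_e / var_x)
    c = con ** 2
    return ceil((-b + sqrt(b * b - 4 * a * c)) / (2 * a))


# Parameters of the working example above.
n_required = mr_sample_size(b_yx=1.30, b_ols=1.41, r2_xz=0.01,
                            var_x=1.0, var_y=116.6)
```

With the working-example parameters, this sketch should land close to the \(N\) reported above. Doubling `r2_xz` roughly halves the result, which illustrates how strongly the required sample size depends on instrument strength.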

\(*\) \(\beta_{yx}\) refers to the unknown true causal association between \(X\) and \(Y\) (between BMI and blood pressure, in this example). Instead of \(1.30 \frac{mmHg}{SD}\), one could therefore use any value of \(\beta_{yx}\) deemed plausible or, for example, inspect the power/sample size calculations for a range of hypothetical values of \(\beta_{yx}\).

1. Lawlor DA, Benfield L, Logue J et al. Association between general and central adiposity in childhood, and change in these, with cardiovascular risk factors in adolescence: prospective cohort study. BMJ 2010; 341: c6224.

2. Frayling TM, Timpson NJ, Weedon MN et al. A Common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007; 316(5826): 889-894.

Estimating Power in Mendelian Randomization: Binary Outcomes

Previous equations for estimating power using the non-centrality parameter in the case of continuous outcomes\(^{[1]}\) were adapted for binary outcomes using an approximate linear model on the observed binary (0-1) scale. The calculations below are approximations and assume the absence of X-Y confounding.

Input parameters for power calculations:
\(K\) = proportion of cases in the (intended) study
\(N\) = total sample size
\(OR\) = True odds ratio of the outcome variable per standard deviation of the exposure variable
\(R^2_{xz}\) = proportion of variance in exposure variable explained by SNPs

A linear model on the 0-1 scale in the population:

\(y_{01} = K + b_{01}x + e\)

The probabilities of the binary outcomes (y = 0 or y = 1) at x = 0 (the mean) and at x = 1 (one standard deviation above the mean) are:

\(Prob(disease | x = 0) = K\)
\(Prob(control | x = 0) = 1 - K\)
\(Prob(disease | x = 1) = K + b_{01}\)
\(Prob(control | x = 1) = 1 - K - b_{01}\)

The odds ratio \(OR = \frac{\frac{K + b_{01}}{1 - K - b_{01}}}{\frac{K}{1 - K}}\)
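The algebra that turns this odds ratio into a slope is short. Writing \(p_1 = K + b_{01}\) for the disease probability at \(x = 1\) (a symbol introduced here only for convenience), the definition of the odds ratio gives:

\(\frac{p_1}{1 - p_1} = OR \cdot \frac{K}{1 - K}\)

\(p_1 = \frac{OR \cdot K}{1 - K + OR \cdot K} = \frac{OR \cdot K}{1 + K(OR - 1)}\)

so that \(b_{01} = p_1 - K\).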

With input variables OR and K, the regression coefficient is derived on the observed scale:

\(b_{01} = K\left(\frac{OR}{1 + K(OR - 1)} - 1\right)\)
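As a quick check of this expression, the minimal sketch below converts an odds ratio to \(b_{01}\) and back again. The function names and the input values (\(OR = 1.2\), \(K = 0.3\)) are illustrative assumptions, not values from the text.

```python
def b01_from_or(odds_ratio, k):
    """Observed-scale slope per SD of exposure, from the OR per SD and case proportion K."""
    return k * (odds_ratio / (1 + k * (odds_ratio - 1)) - 1)


def or_from_b01(b01, k):
    """Inverse mapping: odds of disease at x = 1 divided by odds at x = 0."""
    p1 = k + b01  # disease probability one SD above the mean
    return (p1 / (1 - p1)) / (k / (1 - k))


# Illustrative inputs (assumed): OR = 1.2 per SD, 30% cases.
b01 = b01_from_or(1.2, 0.3)
recovered_or = or_from_b01(b01, 0.3)  # should match the OR that went in
```

An odds ratio of 1 maps to a slope of 0, and odds ratios above 1 give a positive slope.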

The residual variance of this model, which determines the sampling variance of the estimate of \(b_{01}\), is approximately (with \(var(x) = 1\)):

\(var(e) = var(y_{01}) - b^2_{01}var(x) = K(1 - K) - b^2_{01}\)

So the mean and sampling variance of the MR estimator on the linear scale are:

\(b_{MR} = K\left(\frac{OR}{1 + K(OR - 1)} - 1\right)\)

\(var(b_{MR}) = \frac{var(e)}{N R^2_{xz}} = \frac{K(1 - K) - b^2_{01}}{N R^2_{xz}}\)

The corresponding non-centrality parameter (NCP) is:

\(NCP = \frac{b^2_{MR}}{var(b_{MR})} = N R^2_{xz} \frac{\left(K\left(\frac{OR}{1 + K(OR - 1)} - 1\right)\right)^2}{K(1 - K) - b^2_{01}}\)
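The sketch below turns the NCP into a power estimate. It assumes that the test statistic follows a chi-square distribution with one degree of freedom and that the test is two-sided. Under that assumption the non-central chi-square is the square of a normal variable with mean \(\sqrt{NCP}\), so only the standard normal distribution is needed. The function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binary_mr_power(n, k, odds_ratio, r2_xz, alpha=0.05):
    """Approximate power of an MR study with a binary outcome (no X-Y confounding)."""
    # Slope on the observed 0-1 scale per SD of exposure.
    b01 = k * (odds_ratio / (1 + k * (odds_ratio - 1)) - 1)
    # Residual variance on the 0-1 scale, with var(x) = 1.
    var_e = k * (1 - k) - b01 ** 2
    # Non-centrality parameter of the MR test.
    ncp = n * r2_xz * b01 ** 2 / var_e
    # Assumption: 1-df chi-square test, i.e. |Z + sqrt(ncp)| > z_crit.
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Illustrative inputs (assumed): 20,000 participants, 30% cases,
# OR = 1.2 per SD of exposure, instrument explaining 1% of the exposure variance.
example_power = binary_mr_power(n=20000, k=0.3, odds_ratio=1.2, r2_xz=0.01)
```

When the odds ratio is 1 the NCP is 0 and the function returns \(\alpha\), as expected for a test under the null hypothesis.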

1. Brion MJ, Shakhbazov K, Visscher PM. Calculating statistical power in Mendelian randomization studies. Int J Epidemiol 2013; 42(5): 1497-1501.
