Power analysis for logistic regression

Highlights

Designing an effective study requires balancing power, sample size, and cost. Whether you are analyzing customer conversion rates (purchased versus not purchased), patient recovery outcomes (recovered versus not recovered), or user engagement decisions (subscribed versus unsubscribed), Stata's new power logistic command helps you determine the optimal sample size for logistic regression models without wasting time or resources. This feature is a part of StataNow™.

Stata's power logistic command computes sample size, power, or effect size for the test of one coefficient of interest in a logistic regression model. As with all other power methods, power logistic allows you to specify multiple values of parameters and automatically produce tabular and graphical results.

Let's see it work

The power logistic command can calculate power, sample size, and effect size for logistic regression with one binary covariate, two binary covariates, or a mix of binary and continuous covariates.

One binary covariate

You are designing an online advertising campaign on a social media platform for a battery-powered tools retailer, and the overall click-through rate for their ads is 2%. You are interested in testing whether the click-through rate for electric vehicle (EV) owners differs from that for owners of traditional vehicles. You know that 90% of your target population owns traditional vehicles, and the click-through rate for owners of traditional vehicles is 1.8%, which is slightly lower than the overall click-through rate. Is this disparity due to chance, or are EV owners more receptive to the advertisements than owners of traditional vehicles? You design a study to find out.

To test whether the click-through rate differs between EV owners and owners of traditional vehicles, you design a marketing study where the binary outcome \((Y)\) is whether an individual clicks on the ad. The binary covariate \((X)\) is whether the individual owns an EV. You plan to fit a logistic regression model to test whether \(X\) has an effect on \(Y\). You can use power logistic with one binary covariate to determine how large a sample is necessary to achieve 80% power with a test of size \(\alpha =\) 0.05 (that is, a 5% type I error rate).
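
In symbols, using \(\beta_0\) and \(\beta_1\) as our own labels for the intercept and the coefficient on \(X\) (these labels are an assumption for illustration and do not appear in the command output), the model and the hypothesis being tested can be written as

\[
\operatorname{logit}\,\Pr(Y=1 \mid X) = \beta_0 + \beta_1 X, \qquad H_0\colon \beta_1 = 0 \;\Longleftrightarrow\; \mathrm{OR}_X = e^{\beta_1} = 1 .
\]

Here \(e^{\beta_0}\) is the odds of a click for owners of traditional vehicles, and \(\mathrm{OR}_X\) is the factor by which those odds change for EV owners.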

You specify power logistic with options px(0.1) to denote 10% EV ownership, py(0.02) to indicate the overall click-through rate of 2%, and pycondx0(0.018) to specify the conditional probability of \(Y\) for owners of traditional vehicles: \(Pr(Y=1 \: | \: X=0) = \) 1.8%. The default power and \(\alpha\) level are 80% and 0.05, respectively, so you do not need to specify the power() or alpha() option.

. power logistic, px(0.1) py(0.02) pycondx0(0.018)

Estimated sample size for logistic regression odds-ratio test
Likelihood-ratio test
H0: OR_X = 1  versus  Ha: OR_X != 1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    2.1550  (odds ratio)
      oratiox =    2.1550
           px =    0.1000
           py =    0.0200
     pycondx0 =    0.0180

Estimated sample size:

            N =     5,166

It would take a sample size of 5,166 to achieve 80% power in this situation.
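
The odds ratio of 2.155 in the output is not something we typed; it follows from the three probabilities we supplied. As a rough check, the law of total probability gives the click-through rate for EV owners, and the ratio of the two odds gives delta. The sketch below uses only display arithmetic, and the scalar names p1 and or_x are ours:

```stata
* Pr(Y=1 | X=1) from py = (1 - px)*Pr(Y=1 | X=0) + px*Pr(Y=1 | X=1)
scalar p1 = (0.02 - 0.9*0.018)/0.1
display %6.4f p1

* Odds ratio: odds of a click for EV owners over odds for traditional owners
scalar or_x = (p1/(1 - p1))/(0.018/(1 - 0.018))
display %6.4f or_x
```

The first value should be 0.0380, a 3.8% click-through rate for EV owners, and the second should match the 2.1550 reported as delta.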

We could have carried out the same computation by specifying the odds ratio for \(X\) (2.155 from the output above) as the command argument in place of the py() option, and the intercept() option in place of pycondx0(). The intercept was stored in r(intercept) but not displayed in the output. We save it in the local macro b0 and then specify `b0' in the intercept() option so that the same intercept value is used.

. local b0 = r(intercept)

. power logistic 2.155, px(0.1) intercept(`b0')

Estimated sample size for logistic regression odds-ratio test
Likelihood-ratio test
H0: OR_X = 1  versus  Ha: OR_X != 1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    2.1550  (odds ratio)
      oratiox =    2.1550
           px =    0.1000
    intercept =   -3.9992

Estimated sample size:

            N =     5,166
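
The intercept of −3.9992 is the log odds of a click when \(X=0\), so it can be recovered from pycondx0(0.018) with Stata's logit() function. The short check below is our own arithmetic, not part of the power logistic output; it also confirms that adding the log of the odds ratio returns the click-through rate implied for EV owners:

```stata
* Log odds of a click for traditional-vehicle owners (X = 0)
display %7.4f logit(0.018)

* Adding ln(odds ratio) and back-transforming gives Pr(Y=1 | X=1)
display %6.4f invlogit(logit(0.018) + ln(2.155))
```

These should print -3.9992 and 0.0380.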

Two binary covariates

To reduce costs, you start thinking about ways to reduce the sample size without sacrificing power or type I error. Looking through data from previous advertising campaigns, you notice that the click-through rate is higher for individuals who see the ad on a desktop than for those who see it on a mobile device. Only 30% of the population uses a desktop to view the ads, while the other 70% uses mobile devices. But among traditional vehicle owners, the click-through rate is 2.2% on desktops and just 1.6% on mobile devices.

To take advantage of this additional information, you plan to add a second binary covariate to the logistic regression model: nuisance covariate \(Z\), the binary indicator that a mobile device is used to view the ad. To calculate the required sample size for a test with 80% power and \(\alpha = \) 0.05, you use power logistic with two binary covariates. You specify px(0.1) and py(0.02) as before, but now you add pz(0.7) to indicate mobile device usage of 70%. You specify pycondx0z0(0.022) and pycondx0z1(0.016) to indicate \(Pr(Y=1 \: | \: X=0, \: Z=0) =\) 2.2% and \(Pr(Y=1 \: | \: X=0, \: Z=1) =\) 1.6%, respectively.

. power logistic, px(0.1) py(0.02) pz(0.7) pycondx0z0(0.022) pycondx0z1(0.016)

Estimated sample size for logistic regression odds-ratio test
Likelihood-ratio test
H0: OR_X = 1  versus  Ha: OR_X != 1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    2.2884  (odds ratio)
      oratiox =    2.2884
           px =    0.1000
           pz =    0.7000
           py =    0.0200
   pycondx0z1 =    0.0160
   pycondx0z0 =    0.0220

Estimated sample size:

            N =     4,326

After you include mobile device usage as a covariate in the model, the required sample size drops from 5,166 to 4,326, a reduction of 840 observations (about 16%), without sacrificing power or type I error. Impressed with your resourcefulness, your manager gives you a bonus, enough to upgrade to the latest phone and finally take that well-deserved vacation!
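
As a consistency check on the new odds ratio, we can plug 2.2884 back into the model and see whether it reproduces the overall click-through rate of 2%. This sketch assumes that \(X\) and \(Z\) are independent and that the model has no interaction between them; the page does not state either assumption, and the scalar names are ours:

```stata
* Assumptions (ours): X and Z independent, no X-by-Z interaction
scalar lnor = ln(2.2884)

* Implied click-through rates for EV owners on desktop (Z=0) and mobile (Z=1)
scalar p10 = invlogit(logit(0.022) + lnor)
scalar p11 = invlogit(logit(0.016) + lnor)

* Overall click-through rate, averaging over px = 0.1 and pz = 0.7
scalar py = 0.9*(0.3*0.022 + 0.7*0.016) + 0.1*(0.3*p10 + 0.7*p11)
display %6.4f py
```

The result should be very close to 0.0200, which suggests the reported odds ratio is consistent with these assumptions.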

Mix of binary and continuous covariates

You are thrilled that including a second binary covariate reduced the required sample size, but you wonder whether you can optimize it even further.

Looking deeper into previous similar advertising campaigns, you realize that time spent viewing the ad \((Z2)\) is another crucial factor. Studies show that users who linger on an ad for a longer time are more likely to click on it. Instead of dichotomizing viewing time into “short” versus “long” groups, you decide to treat it as a continuous variable in your logistic regression.

The first two power logistic specifications used a specialized syntax designed to handle the two special cases where there are exactly one or two binary covariates, respectively. Now that we are introducing a mix of binary and continuous covariates, we switch to the general case syntax, where we explicitly specify the distribution of each covariate. For instance, the first specification

. power logistic 2.155, px(0.1) intercept(`b0')

can be expressed using the general syntax as

. power logistic, x(distribution(bernoulli 0.1) oratio(2.155)) intercept(`b0')

Estimated sample size for logistic regression odds-ratio test
Likelihood-ratio test
H0: OR_X = 1  versus  Ha: OR_X != 1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    2.1550  (odds ratio)
    intercept =   -3.9992

Covariate of interest X: Bernoulli(px), bins = 2
      oratiox =    2.1550
           px =    0.1000

Estimated sample size:

            N =     5,166
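
The two-binary-covariate example could presumably be written in the general syntax as well. The sketch below is untested and rests on our own reading of the model: the intercept is taken to be the log odds of a click when \(X=0\) and \(Z=0\), and the odds ratio for the nuisance covariate is derived from the two conditional probabilities given earlier. The macro names b0z and orz are ours, and the result may differ slightly from 4,326 because 2.2884 is a rounded value:

```stata
* Log odds of a click for traditional-vehicle owners on desktop (X = 0, Z = 0)
local b0z = logit(0.022)

* Odds ratio for mobile versus desktop among traditional-vehicle owners
local orz = exp(logit(0.016) - logit(0.022))

power logistic 2.2884, x(distribution(bernoulli 0.1)) z1(distribution(bernoulli 0.7) oratio(`orz')) intercept(`b0z')
```

Here orz works out to roughly 0.72, meaning the odds of a click on a mobile device are about 28% lower than on a desktop.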

To illustrate, we assume that \(Z2\) is normally distributed with mean viewing time of seven seconds and a standard deviation of one second. Suppose you have prior knowledge about the odds ratios for \(X\), \(Z1\) (binary covariate \(Z\) from the previous section), and \(Z2\), but you know only that the intercept is somewhere in the range of −1 to −.25, so we perform a sensitivity analysis by specifying multiple values of option intercept().

. power logistic 2.3, x(distribution(bernoulli 0.1))
                      z1(distribution(bernoulli 0.7) oratio(0.8))
                      z2(distribution(normal 7 1) oratio(1.1))
                      intercept(-1 -.5 -.25)

Estimated sample size for logistic regression odds-ratio test
Likelihood-ratio test
H0: OR_X = 1  versus  Ha: OR_X != 1

Covariate of interest X: Bernoulli(px)
Nuisance covariates:
   Z1: Bernoulli(pz1)
   Z2: Normal(muz2, sigmaz2)

  alpha  power      N  delta  oratiox     px  oratioz1    pz1  oratioz2
    .05     .8    513    2.3      2.3     .1        .8     .7       1.1
    .05     .8    549    2.3      2.3     .1        .8     .7       1.1
    .05     .8    593    2.3      2.3     .1        .8     .7       1.1

  muz2  sigmaz2  intercept
     7        1         -1
     7        1        -.5
     7        1       -.25

power logistic displays a table when we provide a numlist of values for intercept(), with one row per intercept value. We see that, for example, a sample of 513 subjects would be necessary if the intercept were −1.
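
To get a feel for what these intercepts mean, recall that the intercept is the log odds of the outcome when every covariate equals zero. A quick back-transformation with invlogit(), which is our own addition and not part of the power logistic output, shows the corresponding baseline probabilities:

```stata
* Baseline Pr(Y=1) when X = 0, Z1 = 0, and Z2 = 0, for each intercept value
display %6.4f invlogit(-1)
display %6.4f invlogit(-.5)
display %6.4f invlogit(-.25)
```

These should print 0.2689, 0.3775, and 0.4378. Because \(Z2\) is measured in seconds, a value of zero lies far outside the assumed viewing-time distribution, so these baselines describe the scale of the intercept rather than a typical viewer.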

By leveraging the continuous covariate, \(Z2\), you made your study even more efficient while preserving statistical power. The required sample size dropped to somewhere between 513 and 593. Your marketing team is amazed—you have cracked the code for optimal ad campaign analysis. Maybe this time, your manager will throw in an electric car in addition to that vacation!
