
Teaching undergraduates with Stata

Teach statistics, not software. Use Stata to demonstrate statistical concepts easily while you focus on explaining them.

Why teach with Stata

Stata's point-and-click interface, intuitive and consistent syntax, and free learning resources let you focus on teaching statistics rather than on software implementation. Because of Stata's version control, materials you create in Stata keep working semester after semester. From basic to advanced statistics, Stata makes it easy to illustrate statistical concepts in your course.


Free student licenses

Give your undergraduate students access to reliable statistical software for free. Stata's gentle learning curve and reproducibility mean that undergraduates spend less time fighting code and more time analyzing data with the same tools that top researchers use. No barriers and no costs, just genuine tools to help them succeed from their very first assignment.

Examples

Adopt these ready-to-use examples for your classroom. Use our example datasets, or substitute your own variable names and run the examples with your data. Whether students type commands or use the point-and-click menus, Stata displays the corresponding command syntax as they go, so they learn the syntax while they work.

Datasets

Open native Stata .dta files, import text data and files from other software, download directly from the web, or explore our extensive library of built-in example datasets.

Load the built-in auto.dta example dataset.

. sysuse auto
(1978 automobile data)

To learn more, see [U] 1.2.2 Example datasets
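As a sketch of a first-day do-file (assuming students work in a do-file rather than the Command window), the lines below list the example datasets installed with Stata, reload auto.dta with clear so that any data already in memory are discarded, and confirm the size of what was loaded.

```stata
* List the example datasets shipped with Stata
sysuse dir

* Load auto.dta, replacing any data already in memory
sysuse auto, clear

* Confirm what was loaded: number of observations and variables
describe, short
```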

Summary statistics

Compute basic summary statistics for all variables in the current dataset.

. sysuse auto

. summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

To learn more, see [R] summarize
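Two rows of the summary table are worth pointing out to students: make reports 0 observations because it is a string variable, and rep78 reports 69 because 5 cars have no repair record. The sketch below checks both and then requests detailed statistics for a single variable; the scalar name n_missing is our own choice.

```stata
sysuse auto, clear

* make holds text, so summarize has no numeric observations to report
describe make

* rep78 is missing for 74 - 69 = 5 cars
count if missing(rep78)
scalar n_missing = r(N)

* detail adds percentiles, skewness, and kurtosis
summarize mpg, detail
```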

Histogram

Create a histogram of mpg overlaid with a normal density curve.

. sysuse auto

. histogram mpg, normal
(bin=8, start=12, width=3.625)

To learn more, see [R] histogram
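The note (bin=8, start=12, width=3.625) can be reproduced by hand. The sketch below assumes our reading of the default rule in [R] histogram, k = min(sqrt(N), 10 log10(N)) truncated to an integer, which gives 8 bins for N = 74; the width is then (41 - 12)/8 = 3.625. The scalar names are hypothetical.

```stata
sysuse auto, clear
quietly summarize mpg

* Default bin count (assumed rule): min(sqrt(N), 10*log10(N)), truncated
scalar nbins = floor(min(sqrt(r(N)), 10*log10(r(N))))

* The bins split the observed range evenly
scalar width = (r(max) - r(min)) / nbins
display "bins = " nbins ",  width = " width

* Compare with a hand-chosen binning
histogram mpg, width(5) start(10) normal
```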

Bar chart

Create a bar chart of the percentage of observations at each level of rep78, grouped by level of foreign.

. sysuse auto

. graph bar, over(rep78) over(foreign)

To learn more, see [G] graph bar
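The chart above counts observations. To plot a summary statistic instead, name it in parentheses before the variable. The sketch below draws mean price by car origin and then prints the same means as numbers so that students can check the bar heights.

```stata
sysuse auto, clear

* Bar heights are group means of price
graph bar (mean) price, over(foreign)

* The same means (and group sizes) as a table
tabstat price, by(foreign) statistics(mean n)

* Height of the Foreign bar on its own
summarize price if foreign == 1
```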

Scatterplot

Create a scatterplot of price versus mpg overlaid with the least-squares regression line of price on mpg.

. sysuse auto

. twoway (scatter price mpg) (lfit price mpg)

To learn more, see [G] graph twoway scatter
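The lfit line is the least-squares line from regressing price on mpg, so its slope equals the correlation times the ratio of standard deviations, r × s_price / s_mpg. The sketch below checks that identity; the scalar names are our own.

```stata
sysuse auto, clear

* Ingredients: the correlation and the two standard deviations
quietly correlate price mpg
scalar rho = r(rho)
quietly summarize price
scalar sd_price = r(sd)
quietly summarize mpg
scalar sd_mpg = r(sd)

* Slope of the fitted line two ways
regress price mpg
display "regress slope:      " _b[mpg]
display "rho * sd_y / sd_x:  " rho * sd_price / sd_mpg
```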

Correlation

Compute a correlation matrix for variables price, mpg, and weight.

. sysuse auto

. pwcorr price mpg weight

             |    price      mpg   weight
-------------+---------------------------
       price |   1.0000
         mpg |  -0.4686   1.0000
      weight |   0.5386  -0.8072   1.0000

To learn more, see [R] correlate
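To judge whether a correlation differs from 0, pwcorr's sig option adds p-values and obs adds the number of observations used for each pair. Those p-values rest on t = r × sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom; the sketch below computes it for price and mpg, using hypothetical scalar names.

```stata
sysuse auto, clear

* p-values and pairwise sample sizes alongside the coefficients
pwcorr price mpg weight, sig obs

* t statistic for H0: rho = 0, for the price-mpg pair
quietly correlate price mpg
scalar r_pm = r(rho)
scalar n_pm = r(N)
scalar t_pm = r_pm * sqrt((n_pm - 2) / (1 - r_pm^2))
display "t = " t_pm ",  two-sided p = " 2*ttail(n_pm - 2, abs(t_pm))
```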

Cross-tabulations

Create a two-way table of frequencies for rep78 and foreign with Pearson's χ² test and Fisher's exact test.

. sysuse auto

. tabulate rep78 foreign, chi2 exact

Enumerating sample-space combinations:
stage 5:  enumerations = 1
stage 4:  enumerations = 3
stage 3:  enumerations = 24
stage 2:  enumerations = 203
stage 1:  enumerations = 0

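    Repair |
    record |      Car origin
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
-----------+----------------------+----------
     Total |        48         21 |        69

          Pearson chi2(4) =  27.2640   Pr = 0.000
           Fisher's exact =                 0.000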

To learn more, see [R] tabulate twoway
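The table also shows why Fisher's exact test is included: several expected cell counts are well below 5, the usual rule of thumb for trusting the χ² approximation. The expected option prints the expected counts under independence; the smallest is 2 × 21 / 69 ≈ 0.61, for foreign cars with a repair record of 1.

```stata
sysuse auto, clear

* Observed and expected counts in each cell, with Pearson's chi2 test
tabulate rep78 foreign, expected chi2

* Smallest expected count: row total 2 times column total 21, over N = 69
display 2*21/69
```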

Test of proportions

Test that the proportion of 1s in foreign is equal to 0.5.

. sysuse auto

. prtest foreign==0.5

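One-sample test of proportion                   Number of obs      =        74

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |   .2972973   .0531331                      .1931583    .4014363
------------------------------------------------------------------------------
    p = proportion(foreign)                                       z =  -3.4874
H0: p = 0.5

     Ha: p < 0.5                 Ha: p != 0.5                 Ha: p > 0.5
 Pr(Z < z) = 0.0002         Pr(|Z| > |z|) = 0.0005          Pr(Z > z) = 0.9998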

Test proportions without a dataset by using the immediate command. With a sample size of 74 and an observed proportion of 0.3, test whether the population proportion equals the hypothesized value of 0.5.

. prtesti 74 0.3 .5

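One-sample test of proportion                      x: Number of obs =       74

------------------------------------------------------------------------------
             |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |         .3   .0532714                      .1955901    .4044099
------------------------------------------------------------------------------
    p = proportion(x)                                             z =  -3.4409
H0: p = 0.5

     Ha: p < 0.5                 Ha: p != 0.5                 Ha: p > 0.5
 Pr(Z < z) = 0.0003         Pr(|Z| > |z|) = 0.0006          Pr(Z > z) = 0.9997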

To learn more, see [R] prtest and [U] 19 Immediate commands
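The prtest output combines two standard errors. The z statistic uses the standard error implied by H0, sqrt(0.5 × 0.5 / n), whereas the Std. err. column and the confidence interval use the observed proportion, sqrt(p̂(1 - p̂)/n). The sketch below reproduces both by hand; the scalar names are hypothetical.

```stata
sysuse auto, clear
quietly summarize foreign
scalar phat = r(mean)
scalar nobs = r(N)

* z uses the standard error under H0: p = 0.5
scalar z = (phat - 0.5) / sqrt(0.5 * 0.5 / nobs)
display "z = " z ",  two-sided p = " 2*normal(-abs(z))

* The confidence interval uses the standard error at the observed proportion
scalar se_hat = sqrt(phat * (1 - phat) / nobs)
display "95% CI: " phat - invnormal(0.975)*se_hat "  to  " phat + invnormal(0.975)*se_hat
```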

z test

Test that the mean mpg is 20 miles per gallon, assuming that the population standard deviation is 6.

. sysuse auto

. ztest mpg==20, sd(6)

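One-sample z test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
     mpg |      74     21.2973    .6974858           6    19.93025    22.66434
------------------------------------------------------------------------------
    mean = mean(mpg)                                              z =   1.8600
H0: mean = 20

    Ha: mean < 20               Ha: mean != 20               Ha: mean > 20
 Pr(Z < z) = 0.9686         Pr(|Z| > |z|) = 0.0629         Pr(Z > z) = 0.0314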

Test means without a dataset by using the immediate command. With a sample size of 74, an observed mean of 21.3, and an assumed population standard deviation of 6, test whether the population mean equals the hypothesized value of 20.

. ztesti 74 21.3 6 20

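One-sample z test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       x |      74        21.3    .6974858           6    19.93295    22.66705
------------------------------------------------------------------------------
    mean = mean(x)                                                z =   1.8638
H0: mean = 20

    Ha: mean < 20               Ha: mean != 20               Ha: mean > 20
 Pr(Z < z) = 0.9688         Pr(|Z| > |z|) = 0.0623         Pr(Z > z) = 0.0312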

To learn more, see [R] ztest and [U] 19 Immediate commands
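The z statistic follows directly from z = (x̄ - μ0)/(σ/sqrt(n)). The sketch below reproduces the ztest output from summarize and the standard normal cumulative distribution function normal(); the scalar names are our own.

```stata
sysuse auto, clear
quietly summarize mpg

* Standard error of the mean with the assumed population SD of 6
scalar se = 6 / sqrt(r(N))
scalar z  = (r(mean) - 20) / se
display "se = " se ",  z = " z

* Upper-tail and two-sided p-values
display "Pr(Z > z) = " 1 - normal(z) ",  Pr(|Z| > |z|) = " 2*normal(-abs(z))
```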

t test

Test that the mean price is equal between the two groups defined by foreign.

. sysuse auto

. ttest price, by(foreign)

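Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
---------+--------------------------------------------------------------------
Combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
---------+--------------------------------------------------------------------
    diff |           -312.2587    754.4488               -1816.225    1191.708
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4139
H0: diff = 0                                     Degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3401         Pr(|T| > |t|) = 0.6802          Pr(T > t) = 0.6599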

Test means without a dataset by using the immediate command. Test whether two population means are equal, given only summary statistics: group 1 has 52 observations, an observed mean of 6072, and a standard deviation of 3097; group 2 has 22 observations, an observed mean of 6385, and a standard deviation of 2622.

. ttesti 52 6072 3097 22 6385 2622

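Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       x |      52        6072    429.4766        3097     5209.79     6934.21
       y |      22        6385    559.0123        2622     5222.47     7547.53
---------+--------------------------------------------------------------------
Combined |      74    6165.054    342.8675    2949.458     5481.72    6848.388
---------+--------------------------------------------------------------------
    diff |                -313    754.4348               -1816.938    1190.938
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =  -0.4149
H0: diff = 0                                     Degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3397         Pr(|T| > |t|) = 0.6795          Pr(T > t) = 0.6603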

To learn more, see [R] ttest and [U] 19 Immediate commands
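The equal-variance t statistic uses a pooled variance, s_p² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2). The sketch below rebuilds t from the results that ttest leaves in r() and then reruns the test with the unequal option, which drops the equal-variance assumption and uses Satterthwaite's degrees of freedom. The scalar names are hypothetical.

```stata
sysuse auto, clear
quietly ttest price, by(foreign)

* Pooled variance: weighted average of the two group variances
scalar sp2 = ((r(N_1)-1)*r(sd_1)^2 + (r(N_2)-1)*r(sd_2)^2) / (r(N_1) + r(N_2) - 2)

* t = difference in means / standard error of the difference
scalar t_hand = (r(mu_1) - r(mu_2)) / sqrt(sp2 * (1/r(N_1) + 1/r(N_2)))
display "hand-computed t = " t_hand ",  ttest reports t = " r(t)

* Same comparison without assuming equal variances
ttest price, by(foreign) unequal
```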

ANOVA

Fit a one-way ANOVA model of price for factor rep78.

. sysuse auto

. anova price rep78

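                         Number of obs =         69    R-squared     =  0.0145
                         Root MSE      =    2980.24    Adj R-squared = -0.0471

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  8360542.6          4   2090135.7      0.24  0.9174
                         |
                   rep78 |  8360542.6          4   2090135.7      0.24  0.9174
                         |
                Residual |  5.684e+08         64     8881819
              -----------+----------------------------------------------------
                   Total |  5.768e+08         68   8482308.2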

To learn more, see [R] anova
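In a one-way ANOVA, F is the model mean square divided by the residual mean square, here 2090135.7 / 8881819 ≈ 0.235. The sketch below first runs oneway, which fits the same model and also reports group means and Bartlett's test for equal variances, and then rebuilds F and its p-value from the sums of squares that anova stores in e().

```stata
sysuse auto, clear

* Same F test, plus a table of group means and Bartlett's test
oneway price rep78, tabulate

* Rebuild F from the anova sums of squares and degrees of freedom
quietly anova price rep78
display "F = " (e(mss)/e(df_m)) / (e(rss)/e(df_r))
display "p = " Ftail(e(df_m), e(df_r), e(F))
```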

Linear regression

Regress price on the continuous variables mpg and weight, the categorical variable foreign, and the interaction between weight and foreign.

. sysuse auto

. regress price mpg weight i.foreign c.weight#foreign

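      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(4, 69)        =     19.84
       Model |   339707110         4  84926777.6   Prob > F        =    0.0000
    Residual |   295358286        69  4280554.87   R-squared       =    0.5349
-------------+----------------------------------   Adj R-squared   =    0.5080
       Total |   635065396        73  8699525.97   Root MSE        =      2069

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |   71.05732   75.20168     0.94   0.348    -78.96593    221.0806
      weight |   3.419387   .6127688     5.58   0.000     2.196947    4.641827
             |
     foreign |
     Foreign |  -2830.393   2916.153    -0.97   0.335    -8647.959    2987.174
             |
    foreign# |
    c.weight |
     Foreign |   2.683494    1.17166     2.29   0.025     .3460963    5.020891
             |
       _cons |  -6678.926   3298.749    -2.02   0.047    -13259.75   -98.10259
------------------------------------------------------------------------------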

To learn more, see [R] regress
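R-squared and Root MSE in the header are simple functions of the sums of squares: 339707110 / 635065396 ≈ 0.5349 and sqrt(4280554.87) ≈ 2069. Because of the interaction, the slope of weight differs by origin; for foreign cars it is the main effect plus the interaction coefficient, 3.419387 + 2.683494 = 6.102881. The sketch below computes these from e() results and with lincom, assuming that the interaction coefficient is named 1.foreign#c.weight, as the output labels suggest.

```stata
sysuse auto, clear
quietly regress price mpg weight i.foreign c.weight#foreign

* R-squared = model SS / total SS; Root MSE = sqrt(residual mean square)
display "R-squared = " e(mss) / (e(mss) + e(rss))
display "Root MSE  = " sqrt(e(rss) / e(df_r))

* Slope of weight for foreign cars: main effect plus interaction
lincom weight + 1.foreign#c.weight
```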

Model interpretation

(Run the “Linear regression” example first.)

Obtain predicted values and residuals from the model.

. predict y_hat
(option xb assumed; fitted values)

. predict epsilon_hat, residual

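Interpret and visualize the results by using predictive margins: the average predicted price for domestic and foreign cars at weights from 2,000 to 4,500 pounds in steps of 500.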

. margins foreign, at(weight=(2000(500)4500)) plot

To learn more, see [R] regress postestimation
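Two properties of OLS residuals make a quick classroom check: with a constant in the model they average to zero, and plotting them against the fitted values reveals curvature or unequal spread. The sketch below refits the model so that it runs on its own, stores the predictions in double precision, and draws a residual-versus-fitted plot with rvfplot; the variable names follow the example above.

```stata
sysuse auto, clear
quietly regress price mpg weight i.foreign c.weight#foreign

* Fitted values and residuals, stored in double precision
predict double y_hat, xb
predict double epsilon_hat, residuals

* Residuals versus fitted values: look for curvature or a funnel shape
rvfplot, yline(0)

* With a constant in the model, OLS residuals average to zero
summarize epsilon_hat
```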

Power analysis

Estimate the sample size required for a one-sample t test of the null hypothesis that the mean is 0 to detect alternative means of 1, 2, and 3. The command uses the defaults of 80% power, a 0.05 significance level, and a standard deviation of 1.

. power onemean 0 (1 2 3), graph

To learn more, see [PSS] power onemean
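A normal approximation gives students a feel for the numbers: n ≈ ((z_(1-α/2) + z_(1-β)) σ/δ)², which for δ = 1 and σ = 1 is about 7.85. Because power onemean is based on the t distribution, it should require at least that many observations. The sketch below compares the two and also shows that specifying n() makes power compute the power achieved instead of the sample size.

```stata
* Normal approximation with delta = 1 and sd = 1
display ((invnormal(0.975) + invnormal(0.80)) * 1 / 1)^2

* Power achieved by the t test with n = 8
power onemean 0 1, n(8)

* Sample size for 80% power under the t test
power onemean 0 1
```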

Simulating distributions

Generate a normally distributed variable with mean 10 and standard deviation 2 and a binomially distributed variable with 20 trials and a 40% success probability.

. clear

. set obs 100
Number of observations (_N) was 0, now 100.

. generate normal = rnormal(10,2)

. generate binomial = rbinomial(20,.4)

To learn more, see [FN] Random-number functions
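Without a fixed seed, these draws change every time the do-file runs. The sketch below sets a seed (12345 is an arbitrary choice) and compares the sample moments with the theoretical ones: the binomial variable has mean 20 × 0.4 = 8 and standard deviation sqrt(20 × 0.4 × 0.6) ≈ 2.19.

```stata
clear
set seed 12345                 // arbitrary seed: the same draws on every run
set obs 100
generate normal   = rnormal(10, 2)
generate binomial = rbinomial(20, .4)

* Sample moments should be close to the theoretical ones
summarize normal binomial
display "theoretical binomial: mean = " 20*.4 ", sd = " sqrt(20*.4*.6)
```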

Reproducibility

Tired of updating your teaching materials to make sure your examples continue to run semester after semester?

Stata’s integrated version control ensures that the commands you teach today will produce the same results in the future. Simply include a version statement, such as version 19.5, at the beginning of your do-file or program.
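A minimal sketch of a do-file header, assuming materials written for Stata 19.5 (substitute the release you developed them in; the file name is hypothetical). Setting the seed as well keeps any simulated data identical from one semester to the next.

```stata
* week01.do  (hypothetical file name)
version 19.5        // interpret the commands below as Stata 19.5 would
clear all
set seed 12345      // make any random draws repeat exactly
sysuse auto
regress price mpg weight
```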

Stata Press

A variety of titles with tutorials and examples in Stata, plus day-one access for your students through Inclusive Access, so you can begin teaching immediately.

Affordable licensing options

Find the license that perfectly fits your needs.

Individual student pricing

Students can purchase a six-month or one-year license at an affordable rate.


Prof+ Plan pricing

Our best pricing for faculty and staff to purchase their own single-user license.


Undergraduate course license

A free six-month Stata/BE license for your students to use, available only to instructors actively teaching Stata at accredited, degree-granting institutions.

FAQs

Answers to the questions our user community asks most often.

Can I receive a license before my course starts?

Yes. Please include the course start date when you complete the undergraduate course license form. The license will be sent two weeks before the course starts.

Will my students be able to download Stata on their personal devices?

Yes. You will receive one classroom license along with download information to distribute to your students. The students may then download their own copy of Stata to their Windows, Mac, or Linux device.

Can my teaching assistants also use the license?

TAs can request a personal copy of Stata for use during their assistantships. Please have the TA email [email protected] with the following information for their own six-month Stata/BE license:

  • First and last names

  • Institution name

  • School-issued email address

  • Professor's name

  • Course description or syllabus URL

Does Stata integrate with Canvas, Blackboard, and other LMSs?

Stata does not integrate with LMSs. However, to complement your teaching, you can upload practice datasets from our documentation along with our handy cheat sheets and guides, and you can provide Stata Press books through VitalSource's Inclusive Access program.

Can students who enroll late in my course access the license as well?

Yes. All students actively enrolled in your undergraduate class will be able to access the license.

Who can I reach out to with questions?

Contact our customer service team at [email protected] or 1-979-696-4600.