Structural equation modeling (SEM)
Estimate mediation effects, analyze the relationship between an unobserved latent concept such as depression and the observed variables that measure depression, or fit a model with complex relationships among both latent and observed variables. Fit models with continuous, binary, count, and ordinal outcomes. Even fit hierarchical models with groups of correlated observations such as children within the same schools. Evaluate model fit. Compute indirect and total effects. Fit models by drawing a path diagram or using the straightforward command syntax.
Whether the groupings in your data arise in a nested fashion (students nested in classrooms and classrooms nested in schools) or in a nonnested fashion (regions crossed with occupations), you can fit a hierarchical model to account for the lack of independence within these groups. Fit models for continuous, binary, count, ordinal, and survival outcomes. Estimate variances of random intercepts and random coefficients. Compute intraclass correlations. Estimate relationships that are population averaged over the random effects. Predict random effects or use Bayesian multilevel models to obtain their entire distributions.
General linear models
Fit one- and two-way models. Or fit models with three, four, or even more factors. Analyze data with nested factors, with fixed and random factors, or with repeated measures. Use ANCOVA models when you have continuous covariates and MANOVA models when you have multiple outcome variables. Further explore the relationships between your outcome and predictors by estimating effect sizes and computing least-squares and marginal means. Perform contrasts and pairwise comparisons. Analyze and plot interactions.
Power, precision, and sample size
Before you conduct your experiment, determine the sample size needed to detect meaningful effects without wasting resources. Do you intend to compute CIs for means or variances, or perform tests for proportions or correlations? Do you plan to fit a Cox proportional hazards model or compare survivor functions using a log-rank test? Do you want to use a Cochran—Mantel—Haenszel test of association or a Cochran—Armitage trend test? Use Stata's power command to compute power and sample size, create customized tables, and automatically graph the relationships between power, sample size, and effect size for your planned study. Or, use the ciwidth command to do the same but for CIs instead of hypothesis tests by computing the required sample size for the desired CI precision. Instead of commands, use the interactive Control Panel to perform your analysis.
Linear, binary, and count regressions
Fit classical linear models of the relationship between a continuous outcome, such as weight, and the determinants of weight, such as height, diet, and levels of exercise. If your response is binary (for example, employed or unemployed), ordinal (education level), or count (number of children), don't worry. Stata has maximum likelihood estimators— logistic, ordered logistic, Poisson, and many others—that estimate the relationship between such outcomes and their determinants. A vast array of tools is available to analyze such models. Predict outcomes and their confidence intervals. Test equality of parameters. Compute linear and nonlinear combinations of parameters.
IRT (item response theory)
Explore the relationship between unobserved latent characteristics such as mathematical aptitude and the probability of correctly answering test questions (items). Or explore the relationship between unobserved health and self-reported responses to questions about mobility, independence, and other health-affected activities. IRT can be used to create measures of such unobserved traits or place individuals on a scale measuring the trait. It can also be used to select the best items for measuring a latent trait. IRT models are available for binary, graded, rated, partial-credit, and nominal response items. Visualize the relationships using item characteristic curves, and measure overall test performance using test information functions.
Contrasts, marginal means, and profile plots
Quickly and easily obtain contrasts for categorical variables and their interactions. A simple R.edlevel will give you all the contrasts of education level with a reference category. A.edlevel will give you each paired contrast with the next higher education level. There are many more named contrasts, and you can specify your own. If you don't like typing, use a dialog box to select your contrasts. Marginal means are just a simple command or mouse click away after almost any estimation command. Evaluating interaction effects, the effects of moderating variables, is just as easy. And this is not just for linear models but for models with binary, ordinal, and count outcomes. Even for hierarchical models with correct handling of random effects. A simple command or a few mouse clicks will get you a profile plot of any of these results.
Combine results of multiple studies to estimate an overall effect. Use forest plots to visualize results. Use subgroup analysis and meta-regression to explore study heterogeneity. Use funnel plots and formal tests to explore publication bias and small-study effects. Use trim-and-fill analysis to assess the impact of publication bias on results. Perform cumulative and leave-one-out meta-analysis. Perform univariate and multivariate meta-analysis. Use the meta suite, or let the Control Panel interface guide you through your entire meta-analysis.
Account for missing data in your sample using multiple imputation. Choose from univariate and multivariate methods to impute missing values in continuous, censored, truncated, binary, ordinal, categorical, and count variables. Then, in a single step, estimate parameters using the imputed datasets, and combine results. Fit a linear model, logit model, Poisson model, hierarchical model, survival model, or one of the many other supported models. Use the mi command, or let the Control Panel interface guide you through your entire MI analysis.
Use multivariate analyses to evaluate relationships among variables from many different perspectives. Perform multivariate tests of means, or fit multivariate regression and MANOVA models. Explore relationships between two sets of variables, such as aptitude measurements and achievement measurements, using canonical correlation. Examine the number and structure of latent concepts underlying a set of variables using exploratory factor analysis. Or use principal component analysis to find underlying structure or to reduce the number of variables used in a subsequent analysis. Discover groupings of observations in your data using cluster analysis. If you have known groups in your data, describe differences between them using discriminant analysis.
Model your discrete choice data. If your outcome is, for instance, high-school graduates' choices to attend college, attend a trade school, or to work, you can fit a conditional logit, multinomial probit, or mixed logit model. Is your outcome instead a ranking of prefered alternatives? Fit a rank-ordered probit or rank-ordered logit model. Regardless of the model fit, you can use the margins to easily interpret the results. Estimate how much distance to the nearest college affects the probability of enrolling in college and even the probability of going to a trade school.
Estimate experimental-style causal effects from observational data. With Stata's treatment-effect estimators, you can use a potential-outcomes (counterfactuals) framework to estimate, for instance, the effect of family structure on child development or the effect of unemployment on anxiety. Fit models for continuous, binary, count, fractional, and survival outcomes with binary or multivalued treatments using inverse-probability weighting (IPW), propensity-score matching, nearest-neighbor matching, regression adjustment, or doubly robust estimators. If the assignment to a treatment is not independent of the outcome, you can use an endogenous treatment-effects estimator. In the presence of group and time effects, you can use difference-in-differences (DID) and triple-differences (DDD) estimators. In the presence of high-dimensional covariates, you can use lasso.
Whether your data require a simple weighted adjustment because of differential sampling rates or you have data from a complex multistage survey, Stata's survey features can provide you with correct standard errors and confidence intervals for your inferences. Simply specify the relevant characteristics of your sampling design, such as sampling weights (including weights at multiple stages), clustering (at one, two, or more stages), stratification, and poststratification. After that, most of Stata's estimation commands can adjust their estimates to correct for your sampling design.
Fit Bayesian regression models using one of the Markov chain Monte Carlo (MCMC) methods. You can choose from a variety of supported models or even program your own. Extensive tools are available to check convergence, including multiple chains. Compute posterior mean estimates and credible intervals for model parameters and functions of model parameters. You can perform both interval- and model-based hypothesis testing. Compare models using Bayes factors. Compute model fit using posterior predictive values and generate predictions.
Automated reporting and customizable tables
Stata is designed for reproducible research, including the ability to create dynamic documents incorporating your analysis results. Create Word or PDF files, populate Excel worksheets with results and format them to your liking, and mix Markdown, HTML, Stata results, and Stata graphs, all from within Stata. Create tables that compare regression results or summary statistics, use default styles or apply your own, and export your tables to Word, PDF, HTML, LaTeX, Excel, or Markdown and include them in your reports.
Jupyter Notebook with Stata
Jupyter Notebook is widely used by researchers and scientists to share their ideas and results for collaboration and innovation. It is an easy-to-use web application that allows you to combine code, visualizations, mathematical formulas, narrative text, and other rich media in a single document (a "notebook") for interactive computing and developing. You can invoke Stata and Mata from Jupyter Notebook with the IPython (interactive Python) kernel. This means you can combine the capabilities of both Python and Stata in a single environment to make your work easily reproducible and shareable with others.
With my statistical training originating from psychology, there was a great emphasis on factorial designs, especially learning to understand interactions, both graphically and analytically. Stata allows you to graph the results of any interaction. Furthermore, Stata lets you easily dissect interactions into their components, easily obtaining simple effects, simple contrasts, partial interactions, and interaction contrasts. You can even easily graph the results of contrasts.
— Michael Mitchell
Senior statistician at the USC Children's Data Network, author of four Stata Press books, and former UCLA statistical consultant who envisioned and designed the UCLA Statistical Consulting Resources website
Intuitive and easy to use.
Once you learn the syntax of one estimator, graphics command, or data management tool, you will effortlessly understand the rest.
Accuracy and reliability.
Stata is extensively and continually tested. Stata's tests produce approximately 4 million lines of output. Each of those lines is compared against known-to-be-accurate results across editions of Stata and every operating system Stata supports to ensure accuracy and reproducibility.
One package. No modules.
When you buy Stata, you obtain everything for your statistical, graphical, and data analysis needs. You do not need to buy separate modules or import your data to specialized software.
Write your own Stata programs.
You can easily write your own Stata programs and commands. Share them with others or use them to simplify your work. Utilize Stata's do-files, ado-files, and Mata: Stata's own advanced programming language that adds direct support for matrix programming. You can also access and benefit from the thousands of existing Stata community-contributed programs.
Stata offers 33 manuals with more than 17,000 pages of PDF documentation containing detailed examples, in-depth discussions, references to relevant literature, and methods and formulas. Stata's documentation is a great place to learn about Stata and the statistics, graphics, data management, and data science tools you are using for your research.
Top-notch technical support.
Stata's technical support is known for their prompt, accurate, detailed, and clear responses. People answering your questions have master's and PhD degrees in relevant areas of research.
Stata's YouTube has over 350 videos with a dedicated playlist of methodologies important to behavioral scientists. And they are a convenient teaching aid in the classroom.
Get started quickly at using Stata effectively, or even learn how to perform rigorous time-series, panel-data, or survival analysis, all from the comfort of you home or office. NetCourses make it easy.
Stata Press offers books with clear, step-by-step examples that make teaching easier and that enable students to learn and behavioral scientists to implement the latest best practices in analysis.