


st: sample by() vs. bysort: sample; and some unexpected ssc install trouble with a Mata library

From   Gabi Huiber <>
Subject   st: sample by() vs. bysort: sample; and some unexpected ssc install trouble with a Mata library
Date   Fri, 19 Feb 2010 17:54:58 -0500

Hello Statalist,

I am trying to get a stratified random sample without replacement.
There are three ways that I can think of, and I am curious about the
differences between them.

Suppose I define a sample count as a local named sample_ct. Then in a
local named strata I list the variables that define my strata. The
three ways that I can think of go like this:

sample `sample_ct', count by(`strata')         // (1) as suggested in [D] sample
bysort `strata': sample `sample_ct', count     // (2) per
gsample `sample_ct', strata(`strata') replace  // (3) using gsample by Ben Jann, via ssc install
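For comparison, here is a minimal sketch of the same kind of stratified draw without replacement done by hand. It assumes lifeexp.dta, region as the only stratum variable, and country as a unique identifier; all three are assumptions for illustration. Each observation gets one uniform draw, and the observations with the lowest draws in each stratum are kept. Because the data are first sorted on a unique key, the result should depend only on the seed, not on the order in which the file happened to be saved.

```stata
* Sketch: stratified sample without replacement by hand (assumptions:
* lifeexp.dta, strata = region, country uniquely identifies observations)
sysuse lifeexp, clear
local sample_ct 1
isid country                        // stop if country is not a unique key
sort country                        // pin down a reproducible starting order
set seed 1234567
generate double u = runiform()      // one uniform draw per observation
bysort region (u): keep if _n <= `sample_ct'   // keep the lowest draws in each stratum
drop u
```

If `sample_ct' exceeds the size of a stratum, this keeps the whole stratum.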

You can replicate this setup with one of your own .dta files and a
variable list of your choice, side by side with one of the data sets
shipped with Stata. I chose lifeexp.dta; the do-file is below:

___do-file starts here___

local strata1    region lexp
local strata2    ${LAT_strata} // use your own varlist

local whichfile1 sysuse lifeexp
local whichfile2 use "${sub_file}" // use your own file

local sample_ct 1

local formulaz     "bysortsample sampleby gsample"
local formulaz     "bysortsample sampleby"   // overrides the line above: gsample left out

forvalues i=1/2 {
	local bysortsample "bysort `strata`i'': sample `sample_ct', count"
	local sampleby     "sample `sample_ct', count by(`strata`i'')"
	local gsample      "gsample `sample_ct', strata(`strata`i'')"
	foreach k in `formulaz' {
		tempfile `k'_file`i'
		drop _all
		`whichfile`i''                 // load the data
		set seed 1234567               // same seed for every method
		``k''                          // draw the sample with method `k'
		save "``k'_file`i''", replace
	}
	di ""
	di `"`whichfile`i''"'
	drop _all
	use "`sampleby_file`i''"
	cf _all using "`bysortsample_file`i''"
}

___and ends here___

The cf command turns up all sorts of discrepancies between the
files generated by (1) and (2), and I have no idea why. That is
my first question.
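One thing that may be worth ruling out is the observation order. This is an assumption, not a confirmed diagnosis. With a fixed seed, which observations are selected depends on the order of the data at the moment the random numbers are used, and a plain sort does not promise any particular order among ties within a stratum. The minimal sketch below assumes lifeexp.dta, region as the stratum, and country as a unique key. It pins down the order before setting the seed and then uses cf to check that two runs of approach (2) agree.

```stata
* Sketch: check that -by ...: sample- is reproducible once the sort
* order is fully determined (assumes country is a unique key in lifeexp.dta)
tempfile run1
forvalues r = 1/2 {
    sysuse lifeexp, clear
    sort region country              // unique order, so no ties left to break
    set seed 1234567
    by region: sample 1, count       // approach (2), with the sort done beforehand
    sort country                     // common order for the comparison
    if `r' == 1 {
        save "`run1'"
    }
}
cf _all using "`run1'"               // exits with an error if the two runs differ
```

The same pattern (fully determined sort, then set seed, then the draw) could be applied before each of (1) and (2) in the do-file above. This sketch does not establish whether the two syntaxes then agree with each other.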

But gsample, as applied in (3), causes further trouble. Here is the output:

mm_sample() from -moremata- is required; type ssc install moremata
end of do-file

Yet I do have moremata. I checked:

. ssc install moremata
checking moremata consistency and verifying not already installed...
all files already exist and are up to date.

Has anybody seen this kind of phantom ssc install before? How did you
fix it?
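A few checks that might narrow this down are sketched below. They assume that moremata puts its compiled functions in a Mata library named lmoremata.mlib somewhere on the adopath. The file name is an assumption; the package's actual file list can be checked with ssc describe moremata. The mata mlib index command rebuilds the list of libraries Mata searches, which can matter when a library was added or replaced during the current session.

```stata
* Diagnostic sketch (assumption: the library file is named lmoremata.mlib)
findfile lmoremata.mlib          // where, if anywhere, does Stata find the library?
mata: mata mlib index            // rebuild Mata's index of .mlib libraries
mata: mata which mm_sample()     // does Mata now resolve the function?
```

If findfile locates the library but mata which still cannot find mm_sample(), the library file itself would be the next thing to look at.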

Thank you,


