



st: sample by() vs. bysort: sample; and some unexpected ssc install trouble with a Mata library


From   Gabi Huiber <ghuiber@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: sample by() vs. bysort: sample; and some unexpected ssc install trouble with a Mata library
Date   Fri, 19 Feb 2010 17:54:58 -0500

Hello Statalist,

I am trying to get a stratified random sample without replacement.
There are three ways that I can think of, and I am curious about the
differences between them.

Suppose I define a sample count as a local named sample_ct. Then in a
local named strata I list the variables that define my strata. The
three ways that I can think of go like this:

sample `sample_ct', count by(`strata')         // (1) as suggested in [D] sample
bysort `strata': sample `sample_ct', count     // (2) per http://www.ats.ucla.edu/stat/Stata/faq/sample.htm
gsample `sample_ct', strata(`strata') replace  // (3) using gsample by Ben Jann, via ssc install

You can replicate this setup with one of your own .dta files and a
variable list of your choice, side by side with one of the example
data sets shipped with Stata. I chose lifeexp.dta, in the do-file below:

___do-file starts here___

local strata1    region lexp
local strata2    ${LAT_strata} // use your own varlist

local whichfile1 sysuse lifeexp, clear
local whichfile2 use "${sub_file}", clear // use your own file

local sample_ct 1

local formulaz     "bysortsample sampleby gsample"
local formulaz     "bysortsample sampleby"   // overrides the line above; delete this line to include gsample

forvalues i=1/2 {
	local bysortsample "bysort `strata`i'': sample `sample_ct', count"
	local sampleby     "sample `sample_ct', count by(`strata`i'')"
	local gsample      "gsample `sample_ct', strata(`strata`i'')"
	foreach k in `formulaz' {
		tempfile `k'_file`i'
		`whichfile`i''
		set seed 1234567
		``k''
		save "``k'_file`i''", replace
		count	
	}
	di ""
	di `"`whichfile`i''"'
	drop _all
	use "`sampleby_file`i''"
	cf _all using "`bysortsample_file`i''"
}	

___and ends here___

The cf command turns up many discrepancies between the files
generated by (1) and (2), and I have no idea why that would be so.
That is my first question.
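A sketch that may help separate the two possible sources of the cf discrepancies, assuming lifeexp.dta with the strata reduced to region alone (a simplification for illustration). It tags each observation with its original row number in a hypothetical helper variable obs_id, draws with (1) and then with (2) under the same seed, and sorts both results by obs_id before comparing. With row order neutralized this way, any difference cf still reports would mean the two forms kept different observations, not merely the same observations in a different order; whether that happens is exactly what the check is meant to reveal.

```stata
* Hypothetical diagnostic: strata reduced to region; obs_id is a new helper variable.
* Method (1): the by() option.
sysuse lifeexp, clear
generate long obs_id = _n            // original position of each observation
set seed 1234567
sample 1, count by(region)           // keep one observation per region
sort obs_id                          // remove any difference due to row order
tempfile by_option
save "`by_option'"

* Method (2): the bysort prefix, same seed.
sysuse lifeexp, clear
generate long obs_id = _n
set seed 1234567
bysort region: sample 1, count
sort obs_id

* Compare contents only; capture keeps the do-file running if cf finds differences.
capture noisily cf _all using "`by_option'"
display "cf return code: " _rc
```

A return code of 0 would mean both forms selected the same observations; a nonzero code would point at the sampling itself rather than at sort order.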

But gsample, as applied in (3), causes further trouble. Here's the output:

mm_sample() from -moremata- is required; type ssc install moremata
r(499);
end of do-file
r(499);

Yet I do have moremata. I checked:

. ssc install moremata
checking moremata consistency and verifying not already installed...
all files already exist and are up to date.
.

Has anybody seen this kind of phantom ssc install before? How did you
fix it?

Thank you,

Gabi


