
# st: Random sampling of subset of the data

From: Zovanga Kone | To: statalist@hsphsun2.harvard.edu | Subject: st: Random sampling of subset of the data | Date: Thu, 2 Aug 2012 13:04:02 +0100

```
Hi,
I am fairly new to Stata, so apologies in advance if my question seems trivial.

I am trying to obtain sets of coefficient estimates from randomly drawn samples of my dataset, with specific proportions taken from different subgroups of the population (e.g. only females, and, say, 90% of the observations from group 2).

I have looked at the bootstrap, sample, and bsample commands, and sample seemed the most appropriate (at least, it makes the most sense to me for what I want, but I could be wrong, so please feel free to comment on that too).
I would like to obtain estimates from, say, 200 randomly drawn samples, so I loop 200 times and build a matrix of the coefficient estimates from each regression. I am particularly interested in 6 coefficients. However, because the number of observations in some subgroups is fairly small, when too few observations are drawn from one subgroup I do not get all 6 coefficients.
The problem: to build my matrix of coefficients, I first take the coefficients I am interested in from the estimation results (e(Xmfx_dydx) in the code below) and put them into a row vector, call it matrix A; my final matrix, call it matrix B, is then all of these A rows stacked together. Therefore, whenever not all 6 coefficients are estimated, matrix A cannot be computed, and so neither can B.

I would be very grateful for help computing matrix A so that it is always 1 x 6, whether or not all 6 coefficients are estimated. I have in mind something that replaces the non-estimated coefficients with, for example, 99.

Here is my command:

forvalues i = 1/200 {
    use age35vali_check.dta, clear    // my dataset
    drop if sex==1                    // keep females only
    // Select a fixed proportion of observations from a particular subgroup;
    // this reduces the sample to the size it would have if all 3+ generations were dropped
    sample 89.09 if gnrtiong2!=0
    probit employment1 tea potexp tri_potexp sq_potexp qrpotexp _Igovreggb* i.year duk2 duk3 duk4 duk5 duk6 duk7, robust level(99)
    mfx compute, force level(99)
    mat FD = e(Xmfx_dydx)
    mat Female_corrupt_q13 = FD[1, 18..23]                        // the 6 effects of interest
    mat Fsmlsplq13 = nullmat(Fsmlsplq13) \ Female_corrupt_q13     // append this draw as a new row
}

This is the error Stata gives:
conformability error
r(503);

When I type mat list Fsmlsplq13, I get an 87 x 6 matrix, which I guess means the loop stopped at the 88th replication.

Ideally, I would also like to keep the significance level (p-value) of each coefficient, but I do not know how to do that.

Z

```
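One way to get the always-1 x 6 row asked for in the post is to look each wanted effect up by name rather than by position 18..23. This is a minimal sketch, assuming the columns of e(Xmfx_dydx) are labelled with the regressor names, so a regressor dropped in a given draw (e.g. because it predicts the outcome perfectly) simply has no column. It runs on the auto data shipped with Stata; the list `wanted` deliberately includes `length`, which is not in the model, to show the 99 placeholder. In the poster's loop, `wanted` would hold the names of the six effects in columns 18..23 (possibly duk2-duk7, but that is a guess; check with matrix list FD).

```stata
* Sketch: build a fixed 1 x k row of marginal effects, filling 99 for any
* regressor that has no column in e(Xmfx_dydx) (e.g. dropped in this draw).
sysuse auto, clear
probit foreign mpg weight
mfx compute
matrix FD = e(Xmfx_dydx)

local wanted mpg weight length          // length is not in the model -> 99
local k : word count `wanted'
matrix A = J(1, `k', 99)                // start from a row of placeholders
matrix colnames A = `wanted'

local j = 0
foreach v of local wanted {
    local ++j
    local c = colnumb(FD, "`v'")        // missing if `v' has no column in FD
    if !missing(`c') {
        matrix A[1, `j'] = FD[1, `c']
    }
}
matrix list A
* Inside the loop, append the padded row instead of FD[1, 18..23]:
* matrix Fsmlsplq13 = nullmat(Fsmlsplq13) \ A
```

Because A always has `k` columns, the nullmat() append can no longer hit a conformability error. Stata matrices can also hold missing values, so J(1, `k', .) instead of 99 would keep the placeholder from being mistaken for a real estimate later. If the columns of FD turn out to carry equation prefixes, colnumb() may need the full eq:name form.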
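For the significance levels mentioned at the end of the post, a possible approach, assuming mfx also leaves the standard errors of the effects in e(Xmfx_se_dydx) in the same column order, is to compute two-sided normal p-values, p = 2 * Pr(Z > |dy/dx / se|), which corresponds to the z test mfx prints. A sketch on the auto data:

```stata
* Sketch: two-sided p-values for mfx marginal effects, assuming
* e(Xmfx_dydx) and e(Xmfx_se_dydx) have the same columns in the same order.
sysuse auto, clear
probit foreign mpg weight
mfx compute
matrix D  = e(Xmfx_dydx)
matrix SE = e(Xmfx_se_dydx)

local k = colsof(D)
local names : colnames D
matrix P = J(1, `k', .)
matrix colnames P = `names'
forvalues j = 1/`k' {
    * z = effect / standard error; p = 2 * P(Z > |z|)
    matrix P[1, `j'] = 2 * normal(-abs(D[1, `j'] / SE[1, `j']))
}
matrix list P
```

Inside the loop, P could be appended to a second results matrix with nullmat(), after padding it to a fixed width by looking up each wanted name with colnumb(), which returns missing when a column is absent.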
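The sample line in the poster's loop relies on how sample treats an if qualifier: Stata's documentation states that observations not meeting the if condition are kept (sampled at 100%), so only the gnrtiong2!=0 group is thinned. A small check of that behaviour on the auto data shipped with Stata, where foreign stands in for the subgroup (it is not the poster's data), with an arbitrary set seed added so the random draw can be reproduced:

```stata
* Check: -sample ... if- thins only the observations meeting the condition
sysuse auto, clear          // 74 cars: 52 domestic, 22 foreign
set seed 20120802           // arbitrary seed; makes the draw reproducible
sample 50 if foreign == 1   // keep 50% of foreign cars, all domestic cars
count if foreign == 0       // domestic cars are untouched
count if foreign == 1       // roughly half of the 22 foreign cars remain
```

In a 200-replication loop, set seed would go once before forvalues rather than inside it; setting the same seed on every pass would make every replication draw the same sample.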