# Re: st: random sampling matching the characteristics of the sample

From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: random sampling matching the characteristics of the sample
Date: Tue, 1 May 2012 11:01:15 +0200

```
You are not going to gain anything, except that pleasing reviewers is
good for your career. The 10% dummy will mean that there is not as
much information in your data as you would hope, but no amount of
statistical trickery will create information that is not present in
your data... Conceptually, what the reviewer asked you to do seems to
correspond with propensity score matching, and there are tools in
Stata available for such an analysis, see: -findit propensity score-.
There are good reasons for using propensity score matching (and equally
good reasons for not doing so, it all depends on the exact nature of
your research question, your data, etc.) but a sparse dummy is not one
of them.

Hope this helps,
Maarten

On Tue, May 1, 2012 at 10:22 AM, Andrea Rispoli <andrea.rspl@gmail.com> wrote:
> Dear Stas,
> Thank you. This is the request of a reviewer. Would you recommend that
> I simply choose a random sample?
>
> On Tue, May 1, 2012 at 3:13 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> So why exactly do you want to do this? You will only lose in
>> precision, provided your model is OK; if it is badly misspecified,
>> then God only knows how your coefficients could jump around, so you
>> probably should not trust either specification, anyway.
>>
>> On Mon, Apr 30, 2012 at 6:44 PM, Andrea Rispoli <andrea.rspl@gmail.com> wrote:
>>> Dear Statalisters,
>>> I am running a regression model: y=f(x, age, size) where x is a dummy
>>> variable that can take value 1 or 0.
>>> Since in my sample x=1 for 10% of the sample and x=0 for 90% of the
>>> sample, I would like to identify a random subsample among the group
>>> x=0 so that it is more "comparable" in terms of size with the
>>> subsample for which x=1.
>>>
>>> My problem is that I would like the selected subsample (in which
>>> x=0) to match the characteristics of the first subsample (x=1) on the
>>> other dimensions (e.g., age and size).
>>> For instance, if I take the subsample x=1, mean of age = 37, mean of size=45.
>>> I would like to randomly select the second subsample (x=0), so that
>>> mean of age = 37, mean of size=45 as it is the case in the first
>>> subsample (x=1).
>>>
>>> Do you have any suggestions on how I could achieve such a result in Stata?
>>>
>>> Kind Regards
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>>
>

--
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

```
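To make the reviewer's request concrete, here is a minimal sketch of selecting an x=0 subsample that resembles the x=1 group on age and size. It is written in Python rather than Stata, and it uses greedy 1:1 nearest-neighbour matching on the standardized covariates, a simpler stand-in for the propensity score matching mentioned in the thread, not the method anyone in the thread used. The data are simulated, and the function `match_controls`, the helper `mean_of`, and all distribution parameters are assumptions made purely for illustration.

```python
import random
import statistics


def match_controls(treated, controls, keys):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    For each treated row, pick the closest still-unused control row,
    measuring distance on the covariates in `keys` after dividing each
    by its pooled standard deviation.
    """
    pooled = treated + controls
    scale = {k: statistics.pstdev([r[k] for r in pooled]) for k in keys}
    available = list(controls)
    matched = []
    for t in treated:
        def dist(c):
            return sum(((t[k] - c[k]) / scale[k]) ** 2 for k in keys)
        best = min(available, key=dist)
        available.remove(best)  # each control is used at most once
        matched.append(best)
    return matched


def mean_of(rows, key):
    return statistics.mean([r[key] for r in rows])


# Simulated data (hypothetical): 10% of rows have x=1, 90% have x=0,
# and the two groups differ in age and size.
random.seed(12345)
keys = ["age", "size"]
treated = [
    {"id": i, "x": 1, "age": random.gauss(37, 5), "size": random.gauss(45, 8)}
    for i in range(100)
]
controls = [
    {"id": 100 + i, "x": 0, "age": random.gauss(45, 10), "size": random.gauss(60, 15)}
    for i in range(900)
]

matched = match_controls(treated, controls, keys)

# Compare covariate means: x=1 group, all x=0 rows, matched x=0 rows.
for k in keys:
    print(k,
          round(mean_of(treated, k), 1),
          round(mean_of(controls, k), 1),
          round(mean_of(matched, k), 1))
```

The matched x=0 rows should sit much closer to the x=1 group on both covariates than the full x=0 group does. As both replies in the thread point out, this discards most of the x=0 observations, so it costs precision and does not add information about the sparse dummy.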