Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: how to do subsampling in stata

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: how to do subsampling in stata
Date	Fri, 16 Aug 2013 11:37:11 -0400

To make my comments on -bootstrap- more meaningful:

I simply followed Wasserman's blog post and thought I needed to turn the
quantiles of the estimates in the subsamples into a confidence interval by
modifying the corresponding part of _bs_sum.ado in the following way:

// this works only because I save this with the bootstrap replication
// data myself
GetIntChar _dta[size]
scalar `size' = r(val)
GetIntChar _dta[N]
scalar `obs' = r(val)
_pctile `x' if `touse', p(`=`p1'', `=`p2'')
scalar `p1' = `b_i' -(sqrt(`size')*(r(r2)-`b_i'))/sqrt(`obs')
scalar `p2' = `b_i' -(sqrt(`size')*(r(r1)-`b_i'))/sqrt(`obs')

Note that I had to use both the number of observations in the original
data and the size of the subsamples. Yes, these cancel out if the two
are equal. But as -bootstrap- allows the size to differ too, why doesn't
_bs_sum.ado need a similar adjustment?
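
To spell out the arithmetic behind the two scalar lines above, here is a sketch under the assumption that the estimator converges at the usual root-n rate (other rates would replace the square roots). The notation is introduced only for illustration: n is the number of observations in the original data (`obs'), m is the subsample size (`size'), \hat\theta_n is the full-sample estimate (`b_i'), \hat\theta^{*}_{m} is an estimate from one subsample, and q_{\alpha} is the \alpha-quantile of the subsample estimates.

```latex
\begin{gather*}
\sqrt{m}\,\bigl(\hat\theta^{*}_{m} - \hat\theta_n\bigr)
  \;\overset{d}{\approx}\;
  \sqrt{n}\,\bigl(\hat\theta_n - \theta\bigr) \\[4pt]
P\Bigl( \sqrt{m}\,\bigl(q_{\alpha/2} - \hat\theta_n\bigr)
  \le \sqrt{n}\,\bigl(\hat\theta_n - \theta\bigr)
  \le \sqrt{m}\,\bigl(q_{1-\alpha/2} - \hat\theta_n\bigr) \Bigr)
  \approx 1 - \alpha \\[4pt]
\mathrm{CI}_{1-\alpha} =
  \Bigl[\, \hat\theta_n - \sqrt{m/n}\,\bigl(q_{1-\alpha/2} - \hat\theta_n\bigr),\;
          \hat\theta_n - \sqrt{m/n}\,\bigl(q_{\alpha/2} - \hat\theta_n\bigr) \Bigr]
\end{gather*}
```

Solving the middle inequality for \theta flips the quantiles, so the upper quantile of the estimates ends up in the lower bound. With m = n the factor \sqrt{m/n} drops out and the interval becomes [2\hat\theta_n - q_{1-\alpha/2}, 2\hat\theta_n - q_{\alpha/2}], which coincides with the "middle 95%" [q_{\alpha/2}, q_{1-\alpha/2}] only when the estimates are distributed symmetrically about \hat\theta_n.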

It does look intuitive to my eye, without doing any of the math: if you
draw small samples, the estimates are noisier, so perhaps you want to
scale down the deviations of the quantiles. Or is this wrong even for
subsampling? Or is sampling with replacement what changes this for the
bootstrap? (As some texts on the bootstrap call sampling without
replacement a version of the bootstrap, I wonder if this would matter
that much.)

The third option is that I don't understand what the percentile CI is
supposed to be. E.g. I am pretty sure the higher quantile of the
statistic should matter for the lower bound, and that comes from the
higher quantile of the estimates, no? Though of course, this does not
give back my simplest intuition about the CI being the "middle 95%" of
the estimates as such.
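
For concreteness, the whole procedure can be sketched by hand without touching _bs_sum.ado. This is only an illustration under stated assumptions: the auto data, the regression of mpg on weight, the subsample size of 40, the 500 replications, and the scalar names theta_full, ci_lo and ci_hi are all made up for the example, and the rescaling assumes a root-n rate as in the modified lines above.

```stata
* Sketch of subsampling by hand. The data set, model, m, and reps are
* illustrative assumptions, not taken from the thread.
clear all
set seed 12345
sysuse auto, clear

local m    = 40        // subsample size (assumed; must be smaller than _N)
local reps = 500       // number of subsamples (assumed)
local n    = _N        // number of observations in the original data

* Full-sample estimate
quietly regress mpg weight
scalar theta_full = _b[weight]

* Collect the estimate from each subsample drawn WITHOUT replacement
tempname sims
tempfile results
postfile `sims' double b using `results'
forvalues r = 1/`reps' {
    preserve
    sample `m', count              // -sample- keeps m observations, no replacement
    quietly regress mpg weight
    post `sims' (_b[weight])
    restore
}
postclose `sims'

* Turn the quantiles of the subsample estimates into a rescaled interval:
* the upper percentile feeds the lower bound and vice versa
use `results', clear
_pctile b, p(2.5 97.5)
scalar ci_lo = theta_full - sqrt(`m') * (r(r2) - theta_full) / sqrt(`n')
scalar ci_hi = theta_full - sqrt(`m') * (r(r1) - theta_full) / sqrt(`n')
display "95% subsampling CI: [" ci_lo ", " ci_hi "]"
```

The explicit loop is used only to keep the -sample- step visible; wrapping the same steps in a small program and calling it through -simulate- would likely be tidier.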

Thanks for any thoughts,

Laszlo

On Fri, Aug 16, 2013 at 6:10 AM, Nick Cox <[email protected]> wrote:
> B wasn't well worded. Matching and subsampling are not equivalent or
> parallel. The matching example is intended to show the kind of user
> commitment that tends to change StataCorp's mind about what should be
> supported officially.
> Nick
> [email protected]
>
>
> On 16 August 2013 09:15, Nick Cox <[email protected]> wrote:
>> All your points are valid to me, but
>>
>> A. At any users' meeting or Stata conference, people will say "Stata
>> should support X, which is big in field Y, and that would be really
>> popular and people would buy Stata just to use that!" Meanwhile, one
>> is looking round the room and there are puzzled faces and people are
>> muttering to their friends "What's that? Never heard of it." Mostly,
>> everyone is right, but there is a long list of desires. (Often X is
>> really big, or an entire approach.)
>>
>> B. A big difference with matching is the evident volume of real
>> interest, shown as sustained activity over a period of years from the
>> Stata user community: major user-written programs downloaded
>> frequently, lots of papers and talks, numerous questions on Statalist.
>> That is a level of commitment not matched by evident interest in
>> subsampling. Whether everyone is looking in the wrong direction
>> remains a good question.
>>
>> C. StataCorp is very cautious and slow to react on big statistical
>> additions, arguably in the user community's best interests.
>> Statistical science, like anything else, is full of five-year fads,
>> things transiently popular but dropped abruptly when something else
>> becomes hot, or people see that they have been oversold. StataCorp
>> doesn't want to spend massive effort on implementing something that
>> will be quickly superseded in users' affections. Academics tend to
>> read papers and come to favourable views of something and come to
>> think "This is great and should be implemented now", but StataCorp
>> have a different time scale.
>>
>> Nick
>> [email protected]
>>
>>
>> On 16 August 2013 02:07, László Sándor <[email protected]> wrote:
>>> Stas, I am not sure I'm with you on this one.
>>>
>>> 1. Subsampling looks much, much easier to implement than other novelties.
>>> 2. Many if not most people use bootstrap not because they derived that
>>> their estimator is smooth but exactly because they worry that
>>> something is not exactly canonical in their problem or application,
>>> but hey, they can just bootstrap it. My admittedly limited
>>> understanding of the difference between the two methods suggests that
>>> subsampling is the safer bet.
>>> 3. The original (?) question on Statalist even mentioned that Abadie
>>> and Imbens tried to warn people that matching is exactly a problem
>>> where the bootstrap can be problematic, while they recommend
>>> subsampling. With version 13, Stata became a matching powerhouse. Why
>>> not support this simple thing, then?
>>> http://www.stata.com/statalist/archive/2009-04/msg00920.html
>>>
>>> Best,
>>>
>>> Laszlo
>>>
>>> On Thu, Aug 15, 2013 at 7:13 PM, Stas Kolenikov <[email protected]> wrote:
>>>> On Thu, Aug 15, 2013 at 12:12 PM, Phil Schumm <[email protected]> wrote:
>>>>> On Aug 15, 2013, at 11:45 AM, László Sándor <[email protected]> wrote:
>>>>>> Or of course, if StataCorp reading this is confident about how easy the transition from -bsample- to -sample- would be for a clone of -bootstrap-
>>>>>
>>>>> I'm not familiar with the literature on subsampling, so what I'm about to say may not entirely apply here.  However, it is worth noting that a lot of what StataCorp does is not simply implementing estimators and methods, but is making sure that the theory behind them is sound, and that the various things users might do once the method is implemented in Stata are reasonable.  Thus, even though it might be fairly simple for a user to patch an existing command to accommodate a specific situation (for which they are willing to take full responsibility), it might take StataCorp longer to verify for themselves that the enhancement is really something with which they feel comfortable.
>>>>
>>>> Of many other wonderful theoretical developments in statistics and
>>>> econometrics, why not (a) empirical likelihood and exponential
>>>> tilting? (b) block bootstrap for time series? (c) delete-k jackknife
>>>> for complex survey data? (d) degrees of freedom corrections in mixed
>>>> models? (e) tetrad analysis in latent variable models? and an endless
>>>> wish list follows. Each of these is well established in its
>>>> specific literature, but its use is required in a fairly limited
>>>> range of situations. It took Stata Corp about 10 years from seeing the
>>>> first user-written multiple imputation and generalized linear latent
>>>> variable and mixed model packages (-ice/mim- and -gllamm-, of course)
>>>> to the production versions of these (-mi-, -meglm- and -gsem-), and
>>>> these have three orders of magnitude greater generalizability and
>>>> potential user base than subsampling (which is really called for in
>>>> weird situations with non-smooth estimators, so one needs to put a lot
>>>> of work to even produce such an estimator) or empirical likelihood
>>>> (which is asymptotically equivalent to the existing -gmm-, anyway).
>>>>
>>>> That's a long introduction to say that I would not expect to see Stata
>>>> Corp working on this for the next three or so releases. If Laszlo's
>>>> needs are more urgent, he should start working on his own
>>>> implementation of subsampling. As I did with empirical likelihood :).
>>>>
>>>> -- Stas Kolenikov, PhD, PStat (ASA, SSC)
>>>> -- Senior Survey Statistician, Abt SRBI
>>>> -- Opinions stated in this email are mine only, and do not reflect the
>>>> position of my employer
>>>> -- http://stas.kolenikov.name
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/
