The Stata listserver

st: RE: RE: Using -collapse- extensively to find historical, irregular matches: Better way?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Using -collapse- extensively to find historical, irregular matches: Better way?
Date   Wed, 1 Oct 2003 13:27:44 +0100

Chih-Mao Hsieh

> The first suggestion that you mentioned, essentially the following:
>
> . egen cited2 = group(cited) ;
>
> . gen allcited = "" ;
> . tostring citing ;
> . tostring cited2 ;
>
> . bysort citing (cited2) : replace allcited = allcited[_n-1] + " " +
> cited2 ;
> . by citing : keep if _n == _N ;
> . bysort allcited (citing) : gen counter = _n - 1 ;
> . sort citing ;
>
> As can be expected, when it tries to do the first -bysort-, it returns
> the error message "no room to add more variables due to width".  My
> question is: Is there a best way to truncate the "concatenation" before
> it goes over the max (presumably 255?), preferably without any loops?

In general, as memory is short, -compress- the data and -drop-
any variables you don't need.
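
A minimal sketch of that housekeeping, on made-up data. The variable
-extra- is hypothetical and stands for anything not needed for the
matching:

```stata
clear
input long citing long cited double extra
1 101 0.5
1 102 0.7
2 101 0.1
end

* drop a variable that plays no part in the matching (hypothetical name)
drop extra

* store each remaining variable in the smallest type that loses nothing
compress
describe
```

-compress- never changes values, only storage types, so it is safe to
run before the -bysort- step.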

You are of course right that for this approach -cited2- needs
to be string. However, once you have -cited2- you do not
need -cited-, at least for the purpose of identifying which
groups match.  (-cited- is needed for identifying on which
patents they match.)

In addition, you could -drop- any observations for which
no patent is cited, although there may be none.
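
As a sketch, assuming that "no patent is cited" shows up as a missing
value of -cited- (the data here are made up):

```stata
clear
input long citing long cited
1 101
1 102
2 .
3 101
end

* remove observations that record no cited patent
drop if missing(cited)
```

-missing()- works for numeric and string variables alike, so the same
line applies if -cited- is held as a string.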

To keep the concatenated string within the width limit, you could
match on only the first so many patents, e.g. 7:

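* number the distinct cited patents 1, 2, ...
egen cited2 = group(cited)
gen allcited = ""
* within each citing patent, concatenate the first 7 cited ids
bysort citing (cited2) : replace allcited = allcited[_n-1] + " " + ///
	string(cited2) if _n <= 7
* carry the truncated string forward to the later observations
by citing : replace allcited = allcited[_n-1] if mi(allcited)
* keep one observation per citing patent
by citing : keep if _n == _N
* count earlier citing patents with the same (truncated) list
bysort allcited (citing) : gen counter = _n - 1
sort citing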
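
To see what the truncation does, here is the same sequence run on a
small made-up data set, matching on the first 2 patents rather than 7
so that the effect is visible. The local macro -k- and the data are
assumptions for illustration only:

```stata
clear
input long citing long cited
1 101
1 102
1 103
2 101
2 102
2 104
3 105
3 106
end

local k 2
egen cited2 = group(cited)
gen allcited = ""
* build the string from the first `k' cited ids only
bysort citing (cited2) : replace allcited = allcited[_n-1] + " " + ///
	string(cited2) if _n <= `k'
* later observations inherit the truncated string
by citing : replace allcited = allcited[_n-1] if mi(allcited)
by citing : keep if _n == _N
bysort allcited (citing) : gen counter = _n - 1
sort citing
list citing allcited counter
```

Note the price of truncating: citing patents 1 and 2 agree on their
first two cited patents but differ on the third, and they are still
counted as a match. The larger the cutoff, the fewer such false
matches, at the cost of a wider string.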

> P.S. I tried the second option with reshape that you suggested -- it is
> consuming much more computing time than this -bysort- method, so I will
> stick with this.

A pity, as I think that -reshape- offers a much cleaner
approach.
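
The earlier -reshape- suggestion is not reproduced in this message.
Purely as an illustration, and not necessarily what was proposed, one
way such an approach might look on made-up data is sketched here. The
names -j- and -pattern- are assumptions:

```stata
clear
input long citing long cited
1 101
1 102
2 101
2 102
3 105
end

egen cited2 = group(cited)
* position of each cited patent within its citing patent
bysort citing (cited2) : gen j = _n
drop cited

* one observation per citing patent, one variable per position
reshape wide cited2, i(citing) j(j)

* identical rows of cited ids get the same group number;
* -missing- lets shorter lists form groups too
egen pattern = group(cited2*), missing
bysort pattern (citing) : gen counter = _n - 1
sort citing
list
```

This compares the full list of cited patents with no string width
limit, but the wide data set has as many -cited2- variables as the
longest list, which may explain the extra computing time.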

Nick
[email protected]
