Statalist The Stata Listserver


st: Re: RE: Re: first program

From   "Rodrigo A. Alfaro" <>
To   <>
Subject   st: Re: RE: Re: first program
Date   Thu, 1 Jun 2006 11:16:58 -0400

Thanks Nick! This improves the program a lot. As Nick said, you need to
set up the panel with -tsset- to use -balance-. Anyway, let me remark that
Dirk should know that -balance- will drop the unwanted observations. Before
using it, you may want to make a copy of the original data. Rodrigo.
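
One way to act on the advice about keeping a copy is -preserve- and -restore-. This is only a sketch: it assumes -balance- is already defined and that the panel in memory uses the hypothetical variables id and year.

```stata
* Hypothetical variable names: id (panel identifier), year (time variable)
tsset id year

preserve        // hold a copy of the current data
balance         // drops the units that have fewer than the maximum number of observations
* ... work with the balanced panel here ...
restore         // bring back the original, unbalanced data
```

Saving the data under a new file name before running -balance- would serve the same purpose.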

----- Original Message ----- 
From: "Nick Cox" <>
To: <>
Sent: Thursday, June 01, 2006 10:52 AM
Subject: st: RE: Re: first program

Tweaking Rodrigo's program, which improves a lot
on the original:

1. This will fail if data are not sorted as desired.

2. Using -egen- is over the top here. You can and
should use _N directly.

3. As you just want the maximum, use -summarize, meanonly-.

4. This will fail if the data are not set up
as panel data.

Now find my bugs...

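program balance, sort
     version 8.2
     syntax [if] [in]
     marksample touse
     tempvar count
     qui {
         * panel identifier as recorded by -tsset-
         local ivar: char _dta[iis]
         if "`ivar'" == "" {
             di as err "no identifier set"
             exit 498
         }
         * number of selected observations for each identifier
         bysort `touse' `ivar': gen long `count' = _N
         sum `count' if `touse', meanonly
         * keep only identifiers observed the maximum number of times
         keep if `count' == r(max) & `touse'
     }
end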
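
A quick way to check the program is to run it on a tiny made-up panel. The data and variable names below are invented for illustration, and the sketch assumes that -balance- has been defined as above and that -tsset- records the panel identifier in _dta[iis], as the program expects.

```stata
* Toy unbalanced panel; all names and values are made up
clear
input id year y
1 2001 10
1 2002 11
1 2003 12
2 2001 20
2 2002 21
3 2001 30
3 2002 31
3 2003 32
end

tsset id year
balance
* id 2 has only two years, so it should be dropped,
* leaving ids 1 and 3 with three observations each
tabulate id
```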

(Also, contrary to this, -syntax id- is just illegal
-syntax- syntax.)
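
If the identifier is to be supplied by the user rather than read from -tsset-, one legal way to write it is -syntax varname-. What follows is a sketch only: the name balance2 is hypothetical and this variant does not come from the thread.

```stata
* Hypothetical variant: the identifier is passed as an argument
program balance2, sortpreserve
    version 8.2
    syntax varname [if] [in]
    * novarlist: do not mark out observations just because the identifier is missing
    marksample touse, novarlist
    tempvar count
    quietly {
        * -syntax- leaves the name of the variable in the local macro varlist
        bysort `touse' `varlist': gen long `count' = _N
        summarize `count' if `touse', meanonly
        keep if `count' == r(max) & `touse'
    }
end
```

It would be called as, for example, -balance2 id- or -balance2 id if year >= 2000-.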


Rodrigo A. Alfaro

> If you are planning to run this program on only one dataset (one that
> uses id as the cross-sectional identifier), you don't need to put it in
> the syntax. When you put id in the syntax you are creating a temporary
> variable called id that will be used later; for that reason you have to
> invoke it using `id' instead of id.
> You are not using [if] and [in]; putting these in the syntax line just
> allows them to be used. This means that users can condition the result,
> but your program still needs to apply the restrictions... read
> -marksample- to see the formal use of [if] and [in].
> You have to define a temporary variable count to guard against this
> variable already existing in the dataset. Again, you have to invoke it
> using `count' instead of count. After keep you will lose information;
> be careful with that.
> Finally, I don't understand what you meant by "compile". You just load
> the program with -do balance-, or, even better, you can save this
> program with the extension .ado in your personal folder and it will be
> your own command.
> Rodrigo.
> PS: This is my version of your program:
> program balance
>     version 8.2
>     syntax [if] [in]
>     tempvar count
>     qui     {
>         local ivar: char _dta[iis]
>         by `ivar': egen `count'=count(`ivar') `if' `in'
>         sum `count'
>         local max=r(max)
>         keep if `count'==`max'
>     }
> end
> I dropped rclass, because I don't need to save any value after reducing
> the panel. Also, I deleted id from the syntax because you can use
> char _dta[iis], which tells you which cross-sectional variable was
> defined using -tsset-.
> Note that local variables, as well as [if] and [in], are invoked using `'.

Dirk Nachbar

> I am trying to write my first Stata program and was wondering if someone
> could go through it and tell me if it's correct, how I should refer to id,
> and what rclass means (I just copied that).
> Another thing: I compiled it once and then wanted to recompile it. How do
> I do that?
> /*
> program to balance an unbalanced panel, keep only those individuals
> with the max duration
> */
> program balance, rclass
> version 8.2
> syntax id [if] [in]
> qui     {
>     sort id
>     by id: egen count=count(id)
>     sum count
>     local max=r(max)
>     keep if count==max
>     }
> end
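
On the "recompile" question: a program held in memory has to be dropped before it can be defined again. A minimal sketch, assuming the program lives in a do-file named balance.do (the file name is taken from the -do balance- suggestion in the thread):

```stata
* Drop the old definition, if there is one, then load the edited version
capture program drop balance
do balance

* If the program is saved as balance.ado instead, -discard- clears the
* automatically loaded programs so the edited file is read again on next use
discard
```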

*   For searches and help try:
