
Re: st: A loop for S.D calculation in paneldata set

From	"Austin Nichols" <[email protected]>
To	[email protected]
Subject	Re: st: A loop for S.D calculation in paneldata set
Date	Tue, 22 Apr 2008 09:46:22 -0400

Asgar Khademvatani <[email protected]> :
When you say "use observations numbers: 1, 45, 87, and 131 in
calculation of S.D for year 1958" it indicates to me that you have
variables stacked like so:

    obs   year      name  value
      1   1958   retail      10
      2   1959   retail      12
...
     43   2000   retail      16
     44   1958     FIRE       5
     45   1959     FIRE       9
...
     86   2000     FIRE      10
     87   1958    manuf      15
     88   1959    manuf      19
...
    129   2000    manuf      10
    130   1958   public       3
    131   1959   public       9
...
    172   2000   public      10

and you actually mean to use observation numbers 1, 44, 87, and 130
in calculations for 1958.
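Rather than locating 1958 by observation number at all, you can select it by value. A minimal sketch, assuming the variable names of the toy data below (year, name, v), which is an excerpt of the stacked layout above:

```stata
* Toy data: an excerpt of the stacked ("long") layout shown above
clear
input year str7 name v
1958 "retail" 10
1959 "retail" 12
1958 "FIRE"    5
1959 "FIRE"    9
1958 "manuf"  15
1959 "manuf"  19
1958 "public"  3
1959 "public"  9
end

* -if- picks out the 1958 rows wherever they sit in the dataset,
* so there is no need to know (or guess) their observation numbers
list name v if year == 1958, noobs clean
summarize v if year == 1958
```

Because the selection uses the value of year, it keeps working if the data are re-sorted or if observations are added or dropped.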

It may be the case that working in "wide" format is easier for some
downstream calculations, though -egen- is the simplest approach for
the question at hand (i.e. Maarten's advice is spot on for the
specific question you asked):

clear
input obs year str7 name v
 1  1958 "retail" 10
 2  1959 "retail" 12
 43 2000 "retail" 16
 44 1958 "FIRE" 5
 45 1959 "FIRE" 9
 86 2000 "FIRE" 10
 87 1958 "manuf" 15
 88 1959 "manuf" 19
129 2000 "manuf" 10
130 1958 "public" 3
131 1959 "public" 9
172 2000 "public" 10
end
li, noo clean
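* Long format: cross-sector mean and SD of v for each year
egen mn=mean(v), by(year)
egen sd=sd(v), by(year)
* Wide format: one row per year, with v1-v4 holding the four sectors
encode name, g(ind)
drop obs
reshape wide v name, i(year) j(ind)
g m=(v1+v2+v3+v4)/4
* sd4 uses divisor 4; sd3 rescales it to divisor 3, as -egen, sd()- does
g sd4=sqrt(((v1-m)^2+(v2-m)^2+(v3-m)^2+(v4-m)^2)/4)
g sd3=sd4*sqrt(4/3)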
li, noo
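Written out, with $v_{kt}$ denoting the value for sector $k$ in year $t$, the wide-format lines compute the following. The rescaling by $\sqrt{4/3}$ converts the divisor-$n$ SD into the divisor-$(n-1)$ SD, which is the one -egen, sd()- reports, so sd3 should agree with the sd variable:

```latex
\bar v_t = \frac{1}{4}\sum_{k=1}^{4} v_{kt}, \qquad
\mathrm{sd4}_t = \sqrt{\frac{1}{4}\sum_{k=1}^{4}\left(v_{kt}-\bar v_t\right)^2}, \qquad
\mathrm{sd3}_t = \sqrt{\frac{1}{3}\sum_{k=1}^{4}\left(v_{kt}-\bar v_t\right)^2}
             = \mathrm{sd4}_t\sqrt{\tfrac{4}{3}}
```

For 1958 in the toy data ($v = 10, 5, 15, 3$): $\bar v = 8.25$, the squared deviations sum to $86.75$, so $\mathrm{sd4} = \sqrt{86.75/4} \approx 4.657$ and $\mathrm{sd3} = \sqrt{86.75/3} \approx 5.377$.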

There are some calculations that are easier in wide format, though
mean and SD are not among them. If you decide to perform some
calculations in wide format, you can always -reshape- back to long
format when you want to.
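The round trip can be sketched as follows, again assuming the toy variable names (year, name, v). The string variable name is dropped first because it varies within year and is not needed once ind identifies the sector:

```stata
* Toy data in the stacked ("long") layout used above
clear
input year str7 name v
1958 "retail" 10
1958 "FIRE"    5
1958 "manuf"  15
1958 "public"  3
1959 "retail" 12
1959 "FIRE"    9
1959 "manuf"  19
1959 "public"  9
end
encode name, generate(ind)
drop name

* Long -> wide: one row per year, one column v1-v4 per sector
reshape wide v, i(year) j(ind)
generate m = (v1 + v2 + v3 + v4)/4

* Wide -> long again: a bare -reshape long- reverses the last
* -reshape wide- using the i() and j() settings Stata remembers;
* m is constant within year, so it is copied to each sector's row
reshape long
```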

On Tue, Apr 22, 2008 at 7:01 AM, Nick Cox <[email protected]> wrote:
> I agree with Maarten. No such loop seems to be called for.
>
>  Other possibilities include -collapse- for reduction to a new dataset
>  and -tabstat- to see summary statistics.
>
>  In any case, if you need to select (e.g.) the year 1958, using -if-
>  rather than -in- is generally a much better way to go.
>
>  Nick
>  [email protected]
>
>  Maarten buis
>
>
>
>  --- Asgar Khademvatani <[email protected]> wrote:
>  > I am using Stata 8.2 on Windows. I have loaded a panel data
>  > set in long format for 4 sectors, each spanning 1958 to 2000.
>  > Thus, I have 172 observations for each variable. My target is
>  > calculating the S.D. or mean across the 4 sectors for each year and
>  > graphing it over time. For more clarity, for instance, I would like to
>  > calculate the S.D. of the 4 sectors for year 1958. In doing so, I need to
>  > have a loop to use observations numbers: 1, 45, 87, and 131 in
>  > calculation of S.D for year 1958, and this continues for calculation
>  > of S.D cross-sectors for other years, and so on and so forth.
>
>  There is no need for a loop; you can use -by-. Say your year variable
>  is called year and you want the mean and standard deviation of the
>  variable called var:
>
>  bys year : egen mean = mean(var)
>  bys year : egen sd = sd(var)
>
>  twoway line mean year , sort
>  twoway line sd year , sort
>
>
>  Notice that each year occurs four times in your graph, once for each
>  sector, but you won't see this because all sectors share the same
>  cross-sector mean and standard deviation. The duplicate points are
>  still in your graph, though, and make it larger in terms of memory than
>  it needs to be. With four sectors that is probably not a big deal, but
>  in larger panels the following trick can save a lot of memory:
>
>  bys year : egen mean = mean(var)
>  bys year : egen sd = sd(var)
>
>  gen byte mis = missing(mean, sd)
>  bys year mis : gen byte mark = _n == 1 if mis == 0
>
>  twoway line mean year if mark == 1, sort
>  twoway line sd year if mark == 1, sort
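Nick's -tabstat- and -collapse- suggestions can be sketched as follows, assuming the same toy variable names (year, name, v). -tabstat- only displays the statistics; -collapse- replaces the data with one observation per year, so the graphs contain no duplicated points and the marking trick is unnecessary:

```stata
* Toy data in the stacked ("long") layout
clear
input year str7 name v
1958 "retail" 10
1958 "FIRE"    5
1958 "manuf"  15
1958 "public"  3
1959 "retail" 12
1959 "FIRE"    9
1959 "manuf"  19
1959 "public"  9
end

* Display cross-sector mean and SD by year without changing the data
tabstat v, by(year) statistics(mean sd)

* Reduce to one observation per year (this discards the sector rows;
* -preserve- first if you need them back); (sd) uses the n-1 divisor
collapse (mean) mean=v (sd) sd=v, by(year)
list, noobs clean

twoway line mean year, sort
twoway line sd year, sort
```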
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: A loop for S.D calculation in paneldata set
  - From: Asgar Khademvatani <[email protected]>
- Re: st: A loop for S.D calculation in paneldata set
  - From: Maarten buis <[email protected]>
- RE: st: A loop for S.D calculation in paneldata set
  - From: "Nick Cox" <[email protected]>
