Statalist The Stata Listserver



Re: st: Count non-missing


From   n j cox <[email protected]>
To   [email protected]
Subject   Re: st: Count non-missing
Date   Sun, 29 Apr 2007 17:59:43 +0100

In other terminology, Nikolaos wants to identify, and to give sequence
numbers to, spells of non-missing values. When his -x- is missing, Nikolaos wants the counter to be missing; when it is non-missing, he wants the counter to go 1, 2, 3, ... .

The principles of identifying spells will be discussed in Stata Journal
7(2), out in about 5 weeks' time. Alternatively, -tsspell- from SSC is one user-written tool in this territory. But you can get there directly in at most two lines of Stata.

1. Consider the first non-missing value in each spell. Then

!missing(x) & missing(x[_n-1]) (*)

will evaluate to 1 for the observation with that value. There are
two true-or-false conditions here:

    !missing(x)          this value of x is not missing

and

    missing(x[_n-1])     the previous value of x is missing.

Jointly, these two conditions define the first non-missing value in a spell.

Otherwise, (*) will evaluate to 0.

This criterion applies also to any non-missing value that is the first observed, as it then becomes

!missing(x[1]) & missing(x[0])

This is not problematic, as any varname[0] is evaluated as missing.
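As a minimal sketch of (*) in action, the lines below build a small made-up series (the values of x are hypothetical, chosen only to give two spells separated by one missing value) and compute the indicator in a new variable, here called -first-, a name introduced purely for illustration.

```stata
* Hypothetical toy data: two spells of non-missing x separated by a gap
clear
input x
1
0
.
1
1
end

* (*) is 1 on the first observation of each spell and 0 elsewhere;
* for observation 1, x[0] is evaluated as missing
gen byte first = !missing(x) & missing(x[_n-1])
list, noobs
```

With these values, -first- should be 1 in observations 1 and 4 and 0 otherwise.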

-!missing(x)- and -x < .- are equivalent for numeric x, as any non-missing value is deemed to be less than any missing value. Sometimes we write
-x < .- just for brevity.
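A quick check of how -x < .- behaves, assuming a numeric x; the values are made up, and the extended missing value .a is included because it sorts above the system missing value.

```stata
* Hypothetical numeric values, including system (.) and extended (.a) missing
clear
input x
1
0
.
.a
5
end

* x < . is true exactly when x is not missing
assert !missing(x) == (x < .)
```

If the -assert- completes silently, the two expressions agree on every observation.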

2. We should set up a count

gen seq = .

3. Now the key step is

replace seq = cond(!missing(x) & missing(x[_n-1]), 1, seq[_n-1] + 1) if x < .

That is

(a) if this observation contains the first non-missing value of a spell, set the count to 1

(b) otherwise, take the previous count and add 1.
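Rules (a) and (b) can be tried on a single series as follows. This is only a sketch on hypothetical values of x; it relies on -replace- working through the observations in order, so that seq[_n-1] has already been updated by the time it is used.

```stata
* Hypothetical single series: a spell of 3, a gap, then a new spell
clear
input x
1
0
1
.
1
end

gen seq = .
* (a) first value of a spell -> 1; (b) otherwise previous count + 1
replace seq = cond(!missing(x) & missing(x[_n-1]), 1, seq[_n-1] + 1) if x < .
list, noobs
```

Here -seq- should come out as 1, 2, 3, ., 1.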

4. Nikos wants to do this for panel data, but the generalisation is easy. Here it is:

----------------------------------------------- NJC solution
gen seq = .
bysort i (t) : replace seq = cond(!missing(x) & missing(x[_n-1]), 1, seq[_n-1] + 1) if x < .
-----------------------------------------------
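As a check, the two-line approach can be run on the example data given in Nikolaos's question (quoted at the end of this post), with his desired var4 entered alongside for comparison. This is a sketch, not a general test; under -by:-, subscripts such as [_n-1] are interpreted within each panel, so each panel's first observation sees a missing previous value.

```stata
* Example data from the question; var4 is the result Nikolaos asked for
clear
input i t x var4
1 1 1 1
1 2 0 2
1 3 1 3
1 4 . .
1 5 1 1
2 1 1 1
2 2 1 2
2 3 . .
2 4 0 1
2 5 0 2
end

gen seq = .
bysort i (t): replace seq = cond(!missing(x) & missing(x[_n-1]), 1, seq[_n-1] + 1) if x < .

* seq should reproduce var4 (missing == missing is true in Stata)
assert seq == var4
```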

Now compare Svend Juul's solution.

----------------------------------------------- Svend Juul solution
sort i t
gen var4=0
replace var4=1 if i>i[_n-1] & x<.
replace var4=1 if i==i[_n-1] & x[_n-1]==.
replace var4=1 if _n==1
replace var4=0 if x==.
replace var4=var4[_n-1]+1 if i==i[_n-1] & x<.
recode var4 (0=.)
------------------------------------------------

The principles are the same.

Nick
[email protected]

Nikolaos Kanellopoulos

I am trying to create a variable that counts the number of nonmissing
values for another variable, but starts counting from the beginning when
a missing value is found.

In the following example I have an individual identifier (i) and a time
variable (t). I want to create var4, which counts the non-missing
observations of x by i and t, but I want it to restart counting
when a missing value appears in x.

+------------------+
| i   t   x   var4 |
|------------------|
| 1   1   1      1 |
| 1   2   0      2 |
| 1   3   1      3 |
| 1   4   .      . |
| 1   5   1      1 |
|------------------|
| 2   1   1      1 |
| 2   2   1      2 |
| 2   3   .      . |
| 2   4   0      1 |
| 2   5   0      2 |
+------------------+
