The Stata listserver

RE: st: Some kind of count or tabulation?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Some kind of count or tabulation?
Date   Fri, 6 Jan 2006 21:28:58 -0000

I certainly and Scott probably overlooked the fact that 
you were using "." as a personal code for missing. 
By and large, Stata commands do not treat "." in a string 
variable as meaning missing. The main, and perhaps only, exception is -destring-, 
which is working on the assumption that a string variable 
is really a numeric variable trapped in a string body. 
(-compare- used to be another exception.) 
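A minimal sketch of the point above, assuming a recent Stata: the string "." is an ordinary one-character string, and only the empty string "" counts as missing, whereas -destring- reads "." as numeric missing. The toy variables -s- and -n- are hypothetical.

```stata
* "." inside a string is not missing; only the empty string "" is
display missing(".")    // displays 0
display missing("")     // displays 1

* -destring- is the exception: it converts "." to numeric missing
* (toy data; the names s and n are hypothetical)
clear
input str3 s
"1"
"."
"3"
end
destring s, generate(n)
count if missing(n)
```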

It follows that counting missings, whether using -egen- or my 
more direct approach, won't work for you until you re-code
"." as "", hence Svend's suggestion. Otherwise, Scott and Svend 
are suggesting complementary -egen- functions. 

I can't explain why Scott's and my suggestions give
different results unless you have other variables that 
are not captured by -Var*-. I used -*- as a wildcard, 
not -Var*-. 
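One way to check for variables that -*- picks up but -Var*- does not is to compare the two expanded lists with the macro list functions. This is a sketch on hypothetical toy data: the -obs- identifier and the extra string variable -comment- are assumptions standing in for whatever else is in the real dataset, and the local names are illustrative.

```stata
* Hypothetical toy data: obs and comment are not matched by Var*
clear
input obs str5 Var1 str5 Var2 str5 comment
1 "jacn" "clstr" "ok"
2 "bstr" "rct"   ""
3 ""     "rct"   "ok"
end

* Expand both wildcards and list the variables only * catches
unab allvars : *
unab vvars : Var*
local extra : list allvars - vvars
display "`extra'"

* Counting complete observations over each list shows the gap
local allc : subinstr local allvars " " ",", all
local vc : subinstr local vvars " " ",", all
count if !mi(`vc')
local n_v = r(N)
count if !mi(`allc')
local n_all = r(N)
display "Var* only: `n_v'   all variables: `n_all'"
```

Here the missing -comment- in observation 2 makes the two counts differ by one, which is the kind of gap seen between 78 and 77.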

In your code, the -sort- and the -by:- do no harm 
but are completely irrelevant. It would be easier to 
count "." rather than cycle through all the other 
values. With your previous set-up, 

gen nperiod = 0 
foreach v of var Var* { 
	replace nperiod = nperiod + (`v' == ".") 
} 

gives you a count of period missings, after which 

gen allpresent = nperiod == 0 

gives what I think you want. You could also
count occurrences != ".". 
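Counting values that are not "." might look like this, again on the toy data from the question. It is a sketch: Var4 stands in for Var(nth), and -npresent-, -vlist- and -k- are hypothetical names.

```stata
* Toy data from the question, with "." as a personal missing code
clear
input str5 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Count the values in each observation that are not the "." code
gen npresent = 0
foreach v of var Var* {
    replace npresent = npresent + (`v' != ".")
}

* An observation is complete if every Var* variable is present
unab vlist : Var*
local k : word count `vlist'
count if npresent == `k'
```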

For this and other reasons, -foreach- and -forval-
are strongly recommended. The usual searches
point to tutorials on those constructs. 

Nick
n.j.cox 

(much editing in this digest) 

barleywater is using Stata 8.2, and asked  

> My data set looks like this:
> 
> obs	Var1	Var2	Var3	Var(nth)
> 1	jacn	clstr	lnreg	pval
> 2	bstr	.	lgreg	nopval
> 3	.	rct	.	nopval
> 4	jacn	clstr	anova	.

> I want to find out how many observations contained all the variables. 
> In this example, only the first observation contained all the variables. 

Scott Merryman suggested 

> egen all_var = rmiss(Var*)
> 
> count if all_var == 0

Nick Cox commented 

> It can also be done without generating a new variable. 

> unab var : * 
> local var : subinstr local var " " ",", all
> count if !mi(`var')
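For readers less fluent with macros, the same idea can be spelled out step by step on the toy data from the question. This sketch assumes the "." codes are first recoded to "" as Svend suggests, and it uses -Var*- rather than -*- so that only the Var variables are checked; Var4 stands in for Var(nth).

```stata
* Toy data from the question, with "." as a personal missing code
clear
input str5 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Recode the personal "." code to Stata's string missing, ""
foreach v of varlist Var* {
    replace `v' = "" if `v' == "."
}

* -unab- expands Var* into the list Var1 Var2 Var3 Var4
unab var : Var*
display "`var'"

* Replace each space with a comma: Var1,Var2,Var3,Var4
local var : subinstr local var " " ",", all
display "`var'"

* mi() (short for missing()) is true if any argument is missing,
* so !mi(...) is true only for complete observations
count if !mi(`var')
```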

barleywater replied 
 
> I understand what Scott tried to do but looking at his commands made 
> me realise that perhaps he, and by extension Nick too, 
> misunderstood my question, which could be better expressed. 
> I have less understanding of Nick's commands which use macros 
> (afraid my Stata fluency doesn't go that far yet).
> 
> However, running Scott's command and Nick's showed a 
> difference of 1, 
> e.g. Scott's would return a value of 78 whilst Nick's 77. 
> I am not sure why that is the case. But 
> neither was what I was looking for. 
> Here's what I did to get what i want.
> 
> gen dumvar1=.
> gen dumvar2=.
> .
> .
> .
> sort var1
> by var1: replace dumvar1 = 1 if var1 == "jacn"
> by var1: replace dumvar1 = 1 if var1 == "bstr"
> sort dumvar1
> replace dumvar1 = 0 if dumvar1==.
> .
> .
> .
> sort var2
> by var2: replace dumvar2 = 1 if var2 == "clstr"
> by var2: replace dumvar2 = 1 if var2 == "rct"
> by var2: replace dumvar2 = 1 if var2 == "xovr"
> sort dumvar2
> replace dumvar2 = 0 if dumvar2==.
> .
> .
> gen total = dumvar1 + dumvar2 +...
> sort total
> l obs total
> 
> but...
> 
> 1. not elegant (not a problem since it does the job)
> 2. it loses the information the variables conveyed by 
> replacing with 1's 
> (not ideal)
> 
> I would appreciate further help/advice to shorten the do-file 
> if possible (I think it needed -foreach val- at the beginning).

Svend Juul suggested 

> I understand that your var1-varn are string variables. For strings, 
> the missing value typically is a blank, not a period, so I would 
> first:

> foreach V of varlist var1-varn {
> 	replace `V' = "" if `V' == "."
> }

> If you feel unsure about the above construct, you might instead give
> as many -replace- commands as you have variables:
>
> replace var1="" if var1=="."
> ...

> Now you can use egen's -robs()- function with the -strok- option:
>
> egen nonmiss = robs(var1-varn) , strok

> In Stata 9 the -robs()- function got the more telling name -rownonmiss()-.

> Now, the variable -nonmiss- tells the number of nonmissing (i.e.
> non-blank) values for each observation. 
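Turning -nonmiss- into a count of complete observations takes one more step. A sketch on the toy data from the question, written with the Stata 9 name -rownonmiss()-; under Stata 8.2 use -robs()- instead, as Svend notes. Var4 stands in for varn, and -vlist- and -k- are hypothetical names.

```stata
* Toy data from the question, with "." as a personal missing code
clear
input str5 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Recode "." to the string missing value ""
foreach V of varlist Var1-Var4 {
    replace `V' = "" if `V' == "."
}

* Number of non-blank values per observation (Stata 9+ name)
egen nonmiss = rownonmiss(Var1-Var4), strok

* Complete observations have a non-blank value in every variable
unab vlist : Var1-Var4
local k : word count `vlist'
count if nonmiss == `k'
```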

barleywater replied 

> Your -egen- suggestion worked. Earlier on, at the 
> stage of inputting the data, I indeed used many 
>
> replace var if ...
>
> in a do-file to replace blanks with a period before running my 
> small do-file to count, but I appreciate your -foreach- help suggestion.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


