The Stata listserver

Re: st: Some kind of count or tabulation?

Subject   Re: st: Some kind of count or tabulation?
Date   Fri, 6 Jan 2006 17:37:19 -0500

Nick wrote:
> You could also
> count occurrences != ".".
I think an -if- condition may apply here.

More generally, I do not fully understand what Barley wants.
If the goal is to see the number of missings, then -nmissing- and
-npresent- can do the job.

If he wants to count the number of missings by other characteristics,
he can also use "nmissing if...".

But if he wants some other summary statistics (?), then the suggestions
of Nick, Scott, Svend and others are fine. He can then use -summarize-,
-tabstat- or -tablemat- to get where he wants.

Apologies if I got it all wrong.

Warmest regards,



Nick J. Cox wrote:

I certainly and Scott probably overlooked the fact that
you were using "." as a personal code for missing.
By and large, Stata commands do not treat "." as meaning
missing. The main, and perhaps only, exception is -destring-,
which is working on the assumption that a string variable
is really a numeric variable trapped in a string body.
(-compare- used to be another exception.)

It follows that counting missings, whether using -egen- or my
more direct approach, won't work for you until you re-code
"." as "", hence Svend's suggestion. Otherwise, Scott and Svend
are suggesting complementary -egen- functions.

I can't explain why Scott's and my suggestions give
different results unless you have other variables that
are not captured by -Var*-. I used -*- as a wildcard,
not -Var*-.
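One way to track down such an observation is to apply the -!mi()- test twice, once to Var* and once to every variable, and list where the two disagree. A minimal sketch on hypothetical data: the four Var* variables plus a numeric variable -score- (an assumption, standing in for any variable that Var* does not match); -okVar- and -okAll- are illustrative names.

```stata
* Hypothetical data: Var1-Var4 plus -score-, which Var* does not match.
* score is missing in the second observation only.
clear
input str4 Var1 str5 Var2 str5 Var3 str6 Var4 score
"jacn" "clstr" "lnreg" "pval"   1
"jacn" "clstr" "anova" "nopval" .
end

* Complete on Var* only (analogous to Scott's Var* wildcard)
unab vv : Var*
local vv : subinstr local vv " " ",", all
gen byte okVar = !mi(`vv')

* Complete on every variable (analogous to Nick's * wildcard)
unab aa : *
local aa : subinstr local aa " " ",", all
gen byte okAll = !mi(`aa')

* Observations counted by the Var* check but not by the * check
list if okVar & !okAll
```

Whatever variable is missing in the listed observations is the one that -Var*- does not catch.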

In your code, the -sort- and the -by:- do no harm
but are completely irrelevant. It would be easier to
count "." rather than cycle through all the other
values. With your previous set-up,

gen nperiod = 0
foreach v of var Var* {
    replace nperiod = nperiod + (`v' == ".")
}

gives you a count of period missings, after which

gen allpresent = nperiod == 0

gives what I think you want. You could also
count occurrences != ".".
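Counting occurrences != "." might look like this. A minimal sketch on barleywater's example data, assuming Var(nth) is a hypothetical Var4; -nnonperiod-, -vars- and -nvars- are illustrative names, not part of the original posts.

```stata
* Example data from the thread; Var(nth) taken (hypothetically) as Var4
clear
input str4 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* For each observation, count the Var* values that are NOT "."
gen nnonperiod = 0
foreach v of var Var* {
    replace nnonperiod = nnonperiod + (`v' != ".")
}

* A row is complete when every Var* variable was counted
unab vars : Var*
local nvars : word count `vars'
gen allpresent = nnonperiod == `nvars'
count if allpresent
```

Counting the complement this way needs the number of variables, which is why -word count- appears; counting "." directly, as above, avoids that step.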

For this and other reasons, -foreach- and -forval-
are strongly recommended. The usual searches
point to tutorials on those constructs.
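When the variables are numbered consecutively, -forvalues- can do the same job as -foreach-. A sketch assuming, hypothetically, exactly four variables Var1 to Var4 with no gaps in the numbering:

```stata
* Example data from the thread; Var(nth) taken (hypothetically) as Var4
clear
input str4 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Loop over the numeric suffixes 1 to 4 and count period missings
gen nperiod = 0
forvalues i = 1/4 {
    replace nperiod = nperiod + (Var`i' == ".")
}

gen allpresent = nperiod == 0
count if allpresent
```

-foreach- is the safer choice if the names do not follow a strict numeric pattern.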


(much editing in this digest)

barleywater is using Stata 8.2, and asked

> My data set looks like this:
> obs  Var1  Var2   Var3   Var(nth)
> 1    jacn  clstr  lnreg  pval
> 2    bstr  .      lgreg  nopval
> 3    .     rct    .      nopval
> 4    jacn  clstr  anova  .

> I want to find out how many observations contained all the variables.
> In this example, only the first observation contained all the

Scott Merryman suggested

> egen all_var = rmiss(Var*)
> count if all_var == 0

Nick Cox commented

> It can also be done without generating a new variable.

> unab var : *
> local var : subinstr local var " " ",", all
> count if !mi(`var')
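For readers less at home with macros, here is the same approach broken into steps. A sketch on barleywater's example data with Var(nth) taken as a hypothetical Var4. Because -mi()- does not treat the string "." as missing, the sketch first recodes "." to "" (Svend's suggestion); the -display- lines are only there to show what the macro holds at each step.

```stata
clear
input str4 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Recode the personal missing code "." to Stata's string missing ""
foreach V of varlist Var* {
    replace `V' = "" if `V' == "."
}

* Step 1: expand the wildcard * into the list of all variable names
unab var : *
display "`var'"    // Var1 Var2 Var3 Var4

* Step 2: replace every space in that list with a comma
local var : subinstr local var " " ",", all
display "`var'"    // Var1,Var2,Var3,Var4

* Step 3: mi(a,b,...) is 1 if any argument is missing, so !mi()
* is true only for observations with no missing values at all
count if !mi(`var')
```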

barleywater replied

> I understand what Scott tried to do, but looking at his commands made
> me realise that perhaps he, and by extension Nick too,
> misunderstood my question, which could have been better expressed.
> I have less understanding of Nick's commands which use macros
> (afraid my Stata fluency doesn't go that far yet).
> However, running Scott's command and Nick's showed a
> difference of 1,
> e.g. Scott's would return a value of 78 whilst Nick's 77.
> I am not sure why that is the case. But
> neither was what I was looking for.
> Here's what I did to get what I want.
> gen dumvar1=.
> gen dumvar2=.
> .
> .
> .
> sort var1
> by var1: replace dumvar1 = 1 if var1 == "jacn"
> by var1: replace dumvar1 = 1 if var1 == "bstr"
> sort dumvar1
> replace dumvar1 = 0 if dumvar1==.
> .
> .
> .
> sort var2
> by var2: replace dumvar2 = 1 if var2 == "clstr"
> by var2: replace dumvar2 = 1 if var2 == "rct"
> by var2: replace dumvar2 = 1 if var2 == "xovr"
> sort dumvar2
> replace dumvar1 = 0 if dumvar1==.
> .
> .
> gen total = var1 + var2 +...
> sort total
> l obs total
> but...
> 1. not elegant (not a problem since it does the job)
> 2. it loses the information the variables conveyed by
> replacing with 1's
> (not ideal)
> I would appreciate further help/advice to shorten the do-file
> if possible (I think it needed -foreach val- at the beginning).

Svend Juul suggested

> I understand that your var1-varn are string variables. For strings,
> the missing value typically is a blank, not a period, so I would
> first:

> foreach V of varlist var1-varn {
> replace `V' = "" if `V' == "."
> }

> If you feel unsure about the above construct, you might instead type
> as many -replace- commands as you have variables:
> replace var1="" if var1=="."
> ...

> Now you can use egen's -robs()- function with the -strok- option:
> egen nonmiss = robs(var1-varn) , strok

> In Stata 9 the -robs()- function got the more telling name

> Now, the variable -nonmiss- tells the number of nonmissing (i.e.
> non-blank) values for each observation.
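To get from -nonmiss- to the count barleywater asked for, compare it with the number of variables. A minimal sketch on the example data, assuming four variables var1-var4 (var4 standing in for varn) and the Stata 8.2 -robs()- function used in this thread; -vlist- and -nvars- are illustrative macro names.

```stata
clear
input str4 var1 str5 var2 str5 var3 str6 var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Svend's recode: "." becomes "", the string missing value
foreach V of varlist var1-var4 {
    replace `V' = "" if `V' == "."
}

* Number of nonmissing (non-blank) values in each observation
egen nonmiss = robs(var1-var4), strok

* Complete observations have one nonmissing value per variable
unab vlist : var1-var4
local nvars : word count `vlist'
count if nonmiss == `nvars'
tabulate nonmiss
```

Unlike the 0/1 dummies, this keeps the original string values intact, and -tabulate nonmiss- shows how incomplete the other observations are.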

barleywater replied

> Your -egen- suggestion worked. Earlier on, at the
> stage of inputting the data, I indeed used many
> replace var if ...

> in a do-file to replace blanks with a period before running my
> small do-file to count but I appreciate your -foreach- help


© Copyright 1996–2021 StataCorp LLC