Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Miguel A. Duran" <maduran@uma.es>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Counting firms in a panel dataset
Date: Thu, 16 Jan 2014 13:29:17 +0100

Thanks, Nick. This simple (and elegant) solution works. And my intuition about the difference between the numbers of agents counted was right.

Best,
Miguel.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
Sent: Thursday, 16 January 2014 13:17
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Counting firms in a panel dataset

This helps; thanks.

You can identify individuals without data for their start dates by

egen start_info = total(date == startdate), by(id)

and then look at observations

... if start_info == 0

Nick
njcoxstata@gmail.com

On 16 January 2014 12:10, Miguel A. Duran <maduran@uma.es> wrote:
> Nick, thanks for your help. I will try to be clearer. There is no
> fallacy in your logical argument, but this is not the problem. In
> addition, what I am showing is a simplified version of the relevant
> part of my dataset (the whole dataset has 178,410 observations and
> about 40 variables), just to illustrate what I mean.
>
> These are the commands I am using. In this simplified version:
>
> -codebook id if mean_var1 != 11- counts both agents (408 in my dataset).
>
> -codebook id if mean_var1 != 11 & (var1 == 10/var2 | var1 == 11/var2)-
> counts 1 agent (id1) (397 in my dataset).
>
> But -codebook id if mean_var1 != 11 & !(var1 == 10/var2 | var1 ==
> 11/var2)- also counts both agents. The reason is that -(var1 ==
> 10/var2 | var1 == 11/var2)- selects any value of var1 equal to 10 or
> 11 if var2 == 1 (i.e., if startdate == date). In contrast, -!(var1 ==
> 10/var2 | var1 == 11/var2)- selects any observation where var1 is not
> equal to 10/var2 or 11/var2, whatever the value of var2. Therefore,
> observations 1, 2 and 4 for id1 and 5-8 for id2 are taken into
> account, i.e., both agents are counted.
>
> What I want to count is agents (i) whose mean_var1 is not equal to 11
> and (ii) that have no observation on the start date (e.g., for id2,
> startdate = 192, but there is no observation for that date). Please
> note that the latter requirement is not a missing value when
> startdate == date, but the absence of any observation.
>
> obs  id  startdate  date  var1  var2  mean_var1
>   1   1        189   187    10     .      10.75
>   2   1        189   188    11     .      10.75
>   3   1        189   189    11     1      10.75
>   4   1        189   190    11     .      10.75
>   5   2        192   189    10     .      10.5
>   6   2        192   190    10     .      10.5
>   7   2        192   191    11     .      10.5
>   8   2        192   193    11     .      10.5
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
> Sent: Thursday, 16 January 2014 12:31
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Counting firms in a panel dataset
>
> Sorry, but I am lost here. Clearly I don't have your data and you
> don't even show your code, nor do I understand in what sense any
> code used doesn't work.
>
> As I understand it, you want to identify the 11 observations that
> appear when 408 are selected but do not appear when 397 are selected.
> I am waving general logic at you, namely that
>
> the complement of A & B in A is A & !B
>
> and you don't identify a fallacy in that.
>
> What are you showing us? It's not 11 observations.
> Nick
> njcoxstata@gmail.com
>
>
> On 16 January 2014 11:10, Miguel A. Duran <maduran@uma.es> wrote:
>> Yes, Nick, I tried something quite similar, and I have just tried
>> what you propose. If I am not mistaken, the reason it doesn't work
>> is that -!(var1 == 10/var2 | var1 == 11/var2)- includes
>> observations 1, 2 and 4 for id1 and all observations of id2.
>> Therefore, both agents are taken into account under
>> -codebook id if...-
>>
>> obs  id  startdate  date  var1  var2  mean_var1
>>   1   1        189   187    10     .      10.75
>>   2   1        189   188    11     .      10.75
>>   3   1        189   189    11     1      10.75
>>   4   1        189   190    11     .      10.75
>>   5   2        192   189    10     .      10.5
>>   6   2        192   190    10     .      10.5
>>   7   2        192   191    11     .      10.5
>>   8   2        192   193    11     .      10.5
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
>> Sent: Thursday, 16 January 2014 11:51
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Counting firms in a panel dataset
>>
>> Did you try it? As I understand it, the complement of
>>
>> A & B
>>
>> in A is
>>
>> A & !B
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 16 January 2014 10:36, Miguel A. Duran <maduran@uma.es> wrote:
>>> Thanks, Nick, for your answer. I thought of something similar to
>>> what you propose, but if I am not mistaken it has a problem: I would
>>> be counting both id1 and id2, i.e., I would again get 408 (what I
>>> get just using -codebook id if mean_var1 != 11-).
>>>
>>> id  startdate  date  var1  var2  mean_var1
>>>  1        189   187    10     .      10.75
>>>  1        189   188    11     .      10.75
>>>  1        189   189    11     1      10.75
>>>  1        189   190    11     .      10.75
>>>  2        192   189    10     .      10.5
>>>  2        192   190    10     .      10.5
>>>  2        192   191    11     .      10.5
>>>  2        192   193    11     .      10.5
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
>>> Sent: Wednesday, 15 January 2014 20:28
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Counting firms in a panel dataset
>>>
>>> I'd look at data that satisfy
>>>
>>> if mean_var1 != 11 & !(var1 == 10/var2 | var1 == 11/var2)
>>>
>>> i.e. negating the second condition. Note that if -var1- and -var2-
>>> are both missing, then the second condition
>>>
>>> (var1 == 10/var2 | var1 == 11/var2)
>>>
>>> reduces to
>>>
>>> . == .
>>>
>>> which is always true.
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 15 January 2014 19:18, Miguel A. Duran <maduran@uma.es> wrote:
>>>> Hi, Statalisters. I am using -codebook- to count the number of
>>>> agents in a panel dataset under different criteria. Under one
>>>> criterion I get 408 agents and under another I get 397. I have an
>>>> intuition about the cause of this difference and I would like to
>>>> check it, but I do not know how to do it.
>>>>
>>>> To help make my point clear, (the relevant part of) my dataset
>>>> looks similar to this:
>>>>
>>>> id  startdate  date  var1  var2  mean_var1
>>>>  1        189   187    10     .      10.75
>>>>  1        189   188    11     .      10.75
>>>>  1        189   189    11     1      10.75
>>>>  1        189   190    11     .      10.75
>>>>  2        192   189    10     .      10.5
>>>>  2        192   190    10     .      10.5
>>>>  2        192   191    11     .      10.5
>>>>  2        192   193    11     .      10.5
>>>>
>>>> Using the command -codebook id if mean_var1 != 11- I get 408
>>>> agents, but using the command -codebook id if mean_var1 != 11 &
>>>> (var1 == 10/var2 | var1 == 11/var2)- I get 397 agents. My
>>>> intuition is that this happens because there are agents (like
>>>> agent 2) that do not have the observation corresponding to the
>>>> startdate. If I am right, adding this requirement to the command
>>>> -codebook id if mean_var1 != 11- should count 11 agents, but I do
>>>> not know how to include that requirement.
>>>>
>>>> Will anyone please help with this?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
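
A minimal sketch of the approach in Nick's 13:17 reply, run on the eight-row toy dataset posted in the thread. The variable name one_per_id and the use of -egen, tag()- with -count- in place of -codebook- are assumptions added here for illustration; they are not part of the original exchange.

```stata
* Toy data copied from the thread (8 observations, 2 agents)
clear
input id startdate date var1 var2 mean_var1
1 189 187 10 . 10.75
1 189 188 11 . 10.75
1 189 189 11 1 10.75
1 189 190 11 . 10.75
2 192 189 10 . 10.5
2 192 190 10 . 10.5
2 192 191 11 . 10.5
2 192 193 11 . 10.5
end

* Nick's suggestion: per agent, count observations dated on the start date
egen start_info = total(date == startdate), by(id)

* Hypothetical helper: flag one observation per agent so that
* agents, not observations, are counted
egen byte one_per_id = tag(id)

* Agents with mean_var1 != 11
count if one_per_id & mean_var1 != 11

* Agents with mean_var1 != 11 and no observation on their start date
count if one_per_id & mean_var1 != 11 & start_info == 0

* The observation-level negation still keeps rows from both agents
tabulate id if mean_var1 != 11 & !(var1 == 10/var2 | var1 == 11/var2)
```

On the toy data the first -count- returns 2 and the second returns 1 (id 2). The final -tabulate- lists rows for both ids, which matches Miguel's point that negating the condition observation by observation does not drop any agent, whereas the agent-level variable start_info does.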