Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Counting firms in a panel dataset
From: "Miguel A. Duran" <[email protected]>
To: <[email protected]>
Subject: RE: st: Counting firms in a panel dataset
Date: Thu, 16 Jan 2014 13:29:17 +0100
Thanks, Nick. This simple (and elegant) solution works. And my intuition
about the differences between the numbers of agents counted was right.
Best,
Miguel.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Nick Cox
Sent: Thursday, 16 January 2014 13:17
To: [email protected]
Subject: Re: st: Counting firms in a panel dataset
This helps; thanks. You can identify individuals without data for their
start dates by
egen start_info = total(date == startdate), by(id)
and then look at observations
... if start_info == 0
Nick
[email protected]
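
Nick's suggestion can be tried on the toy data quoted further down the thread. The following is only a minimal sketch: the `tag` variable and the use of -count- in place of -codebook- are illustrative assumptions, not part of the original exchange.

```stata
* Toy data copied from the thread; id 2 has no row with date == startdate
clear
input id startdate date var1 var2 mean_var1
1 189 187 10 . 10.75
1 189 188 11 . 10.75
1 189 189 11 1 10.75
1 189 190 11 . 10.75
2 192 189 10 . 10.5
2 192 190 10 . 10.5
2 192 191 11 . 10.5
2 192 193 11 . 10.5
end

* Number of observations per id that fall on the start date (Nick's line)
egen start_info = total(date == startdate), by(id)

* Assumption: flag one observation per id so that agents, not rows, are counted
egen byte tag = tag(id)

* Agents with mean_var1 != 11 and no observation on their start date
count if tag & mean_var1 != 11 & start_info == 0

* List the agents that satisfy both requirements
list id startdate if tag & mean_var1 != 11 & start_info == 0
```

On the toy data this should pick out id 2 only. Because start_info is constant within id, -codebook id if mean_var1 != 11 & start_info == 0- should report the same number of distinct agents.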
On 16 January 2014 12:10, Miguel A. Duran <[email protected]> wrote:
> Nick, thanks for your help. I will try to be clearer. There is no
> fallacy in your logic argument, but this is not the problem. In
> addition, what I am showing is a simplified version of the relevant
> part of my dataset (the whole dataset has 178,410 observations and
> about 40 variables), just to illustrate what I mean.
> These are the commands I am using in this simplified version:
> -codebook id if mean_var1 != 11- counts both agents (408 in my dataset).
> -codebook id if mean_var1 != 11 & (var1 == 10/var2 | var1 == 11/var2)-
> counts 1 agent (id1) (397 in my dataset).
> But -codebook id if mean_var1 != 11 & !(var1 == 10/var2 | var1 ==
> 11/var2)- also counts both agents. The reason is that -(var1 ==
> 10/var2 | var1 == 11/var2)- picks out observations where var1 equals
> 10 or 11 and var2 == 1 (ie, where startdate == date). However,
> -!(var1 == 10/var2 | var1 == 11/var2)- picks out every observation
> that fails that condition, whatever the value of var2. Therefore,
> observations 1, 2 and 4 for id1 and 5-8 for id2 are taken into
> account, ie, both agents are counted.
> What I want to count is agents (i) whose mean_var1 is not equal to 11
> and (ii) that have no observation on their startdate (eg, for
> id2, startdate = 192, but there is no observation for that date).
> Please note that the latter requirement is not a missing value
> when startdate == date, but the absence of any such observation.
>
> obs id startdate date var1 var2 mean_var1
> 1 1 189 187 10 . 10.75
> 2 1 189 188 11 . 10.75
> 3 1 189 189 11 1 10.75
> 4 1 189 190 11 . 10.75
> 5 2 192 189 10 . 10.5
> 6 2 192 190 10 . 10.5
> 7 2 192 191 11 . 10.5
> 8 2 192 193 11 . 10.5
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Nick Cox
> Sent: Thursday, 16 January 2014 12:31
> To: [email protected]
> Subject: Re: st: Counting firms in a panel dataset
>
> Sorry, but I am lost here. Clearly I don't have your data and you
> don't even show your code, nor do I understand in what sense what any
> code used doesn't work.
>
> As I understand it, you want to identify the 11 observations that
> appear when 408 are selected but do not appear when 397 are selected.
> I am waving general logic at you, namely that
>
> the complement of A & B in A is A & !B
>
> and you don't identify a fallacy in that.
>
> What are you showing us? It's not 11 observations.
> Nick
> [email protected]
>
>
> On 16 January 2014 11:10, Miguel A. Duran <[email protected]> wrote:
>> Yes, Nick, I tried something quite similar, and I have just tried
>> what you propose. If I am not mistaken, it doesn't work because
>> -!(var1 == 10/var2 | var1 == 11/var2)- includes observations 1, 2 and
>> 4 for id1 and all observations of id2. Therefore, both agents are
>> taken into account under -codebook id if...-
>>
>> obs id startdate date var1 var2 mean_var1
>> 1 1 189 187 10 . 10.75
>> 2 1 189 188 11 . 10.75
>> 3 1 189 189 11 1 10.75
>> 4 1 189 190 11 . 10.75
>> 5 2 192 189 10 . 10.5
>> 6 2 192 190 10 . 10.5
>> 7 2 192 191 11 . 10.5
>> 8 2 192 193 11 . 10.5
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Nick Cox
>> Sent: Thursday, 16 January 2014 11:51
>> To: [email protected]
>> Subject: Re: st: Counting firms in a panel dataset
>>
>> Did you try it? As I understand it, the complement of
>>
>> A & B
>>
>> in A is
>>
>> A & !B
>>
>> Nick
>> [email protected]
>>
>>
>> On 16 January 2014 10:36, Miguel A. Duran <[email protected]> wrote:
>>> Thanks, Nick, for your answer. I thought of something similar to
>>> what you propose, but if I am not mistaken it has a problem: I would
>>> be counting both id1 and id2, i.e., I would again get 408 (what I
>>> get just using -codebook id if mean_var1 != 11-).
>>>
>>> id startdate date var1 var2 mean_var1
>>> 1 189 187 10 . 10.75
>>> 1 189 188 11 . 10.75
>>> 1 189 189 11 1 10.75
>>> 1 189 190 11 . 10.75
>>> 2 192 189 10 . 10.5
>>> 2 192 190 10 . 10.5
>>> 2 192 191 11 . 10.5
>>> 2 192 193 11 . 10.5
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On behalf of Nick Cox
>>> Sent: Wednesday, 15 January 2014 20:28
>>> To: [email protected]
>>> Subject: Re: st: Counting firms in a panel dataset
>>>
>>> I'd look at data that satisfy
>>>
>>> if mean_var1 != 11 & !(var1 == 10/var2 | var1 == 11/var2)
>>>
>>> i.e. negating the second condition. Note that if -var1- and -var2-
>>> are both missing, then the second condition
>>>
>>> (var1 == 10/var2 | var1 == 11/var2)
>>>
>>> reduces to
>>>
>>> . == .
>>>
>>> which is always true.
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 15 January 2014 19:18, Miguel A. Duran <[email protected]> wrote:
>>>> Hi, Statalisters. I am using -codebook- to count the number of
>>>> agents in a panel dataset under different criteria. Under one
>>>> criterion I get 408 agents and under another I get 397. I have an
>>>> intuition about the cause of this difference and I would like to
>>>> check it, but I do not know how to do it.
>>>> To make my point clear, (the relevant part of) my dataset
>>>> looks similar to this,
>>>>
>>>> id startdate date var1 var2 mean_var1
>>>> 1 189 187 10 . 10.75
>>>> 1 189 188 11 . 10.75
>>>> 1 189 189 11 1 10.75
>>>> 1 189 190 11 . 10.75
>>>> 2 192 189 10 . 10.5
>>>> 2 192 190 10 . 10.5
>>>> 2 192 191 11 . 10.5
>>>> 2 192 193 11 . 10.5
>>>>
>>>> Using the command -codebook id if mean_var1 != 11- I get 408
>>>> agents, but using the command -codebook id if mean_var1 != 11 &
>>>> (var1 == 10/var2 | var1 == 11/var2)- I get 397 agents. My intuition
>>>> is that this happens because there are agents (like agent 2) that
>>>> do not have the observation corresponding to the startdate. If I am
>>>> right, adding this requirement to the command -codebook id if
>>>> mean_var1 != 11- should count 11 agents, but I do not know how to
>>>> include that requirement.
>>>> Will anyone please help with this?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/