Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Counting firms in a panel dataset


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Counting firms in a panel dataset
Date   Thu, 16 Jan 2014 12:37:45 +0000

Good; thanks for closure to the thread.
Nick
[email protected]


On 16 January 2014 12:29, Miguel A. Duran <[email protected]> wrote:
> Thanks, Nick. This simple (and elegant) solution works. And my intuition
> about the differences between the numbers of agents counted was right.
>
> Best,
> Miguel.
>
> -----Original message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Nick Cox
> Sent: Thursday, 16 January 2014 13:17
> To: [email protected]
> Subject: Re: st: Counting firms in a panel dataset
>
> This helps; thanks. You can identify individuals without data for their
> start dates by
>
> egen start_info = total(date == startdate), by(id)
>
> and then look at observations
>
> ... if start_info == 0
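>
> A minimal sketch of how that could be combined with the condition on
> mean_var1, assuming the simplified example data posted earlier in the
> thread. The variable name -one_per_id- is only illustrative; -egen,
> tag()- marks one observation per id so that agents rather than
> observations are counted.
>
> ```stata
> * simplified example data from the thread
> clear
> input id startdate date var1 var2 mean_var1
> 1 189 187 10 . 10.75
> 1 189 188 11 . 10.75
> 1 189 189 11 1 10.75
> 1 189 190 11 . 10.75
> 2 192 189 10 . 10.5
> 2 192 190 10 . 10.5
> 2 192 191 11 . 10.5
> 2 192 193 11 . 10.5
> end
>
> * number of observations per id that fall on the start date
> egen start_info = total(date == startdate), by(id)
>
> * mark exactly one observation per id (illustrative name)
> egen one_per_id = tag(id)
>
> * agents with mean_var1 != 11 and no observation on their start date
> count if one_per_id & mean_var1 != 11 & start_info == 0
> list id if one_per_id & mean_var1 != 11 & start_info == 0
> ```
>
> In this example only id 2 qualifies, because it has no observation
> with date == 192.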
>
> Nick
> [email protected]
>
>
> On 16 January 2014 12:10, Miguel A. Duran <[email protected]> wrote:
>> Nick, thanks for your help. I will try to be clearer. There is no
>> fallacy in your logical argument, but this is not the problem. In
>> addition, what I am showing is a simplified version of the relevant
>> part of my dataset (the whole dataset has 178,410 observations and
>> about 40 variables), just to illustrate what I mean.
>> These are the commands I am using. In this simplified version:
>> -codebook id if mean_var1 != 11- counts both agents (408 in my dataset).
>> -codebook id if mean_var1 != 11 & (var1 == 10/var2 | var1 == 11/var2)-
>> counts 1 agent (id1) (397 in my dataset).
>> But -codebook id if mean_var1 != 11 & !(var1 == 10/var2 | var1 ==
>> 11/var2)- also counts both agents. The reason is that -(var1 ==
>> 10/var2 | var1 == 11/var2)- selects observations where var1 equals
>> 10 or 11 and var2 == 1 (ie, where startdate == date). However,
>> -!(var1 == 10/var2 | var1 == 11/var2)- selects every observation
>> that fails that test, which includes every observation where var2 is
>> missing. Therefore, observations 1, 2 and 4 for id1 and 5-8 for id2
>> are taken into account, ie, both agents are counted.
>> What I want to count is agents (i) whose mean_var1 is not equal to 11
>> and (ii) that have no observation on the startdate (eg, for id2,
>> startdate = 192, but there is no observation for that date).
>> Please note that the latter requirement is not that a value is
>> missing when startdate == date, but that there is no observation for
>> that date at all.
>>
>> obs  id     startdate    date   var1      var2    mean_var1
>>  1      1           189          187     10           .         10.75
>>  2      1           189          188     11           .         10.75
>>  3      1           189          189     11           1        10.75
>>  4      1           189          190     11           .         10.75
>>  5      2           192          189     10           .         10.5
>>  6      2           192          190     10           .         10.5
>>  7      2           192          191     11           .         10.5
>>  8      2           192          193     11           .         10.5
>>
>> -----Original message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Nick Cox
>> Sent: Thursday, 16 January 2014 12:31
>> To: [email protected]
>> Subject: Re: st: Counting firms in a panel dataset
>>
>> Sorry, but I am lost here. Clearly I don't have your data and you
>> don't even show your code, nor do I understand in what sense any
>> code you used doesn't work.
>>
>> As I understand it, you want to identify the 11 observations that
>> appear when 408 are selected but do not appear when 397 are selected.
>> I am waving general logic at you, namely that
>>
>> the complement of A & B in A is A & !B
>>
>> and you don't identify a fallacy in that.
>>
>> What are you showing us? It's not 11 observations.
>> Nick
>> [email protected]
>>
>>
>> On 16 January 2014 11:10, Miguel A. Duran <[email protected]> wrote:
>>> Yes, Nick, I tried something quite similar, and I have just tried
>>> what you propose. If I am not mistaken, the reason it doesn't work
>>> is that -!(var1 == 10/var2 | var1 == 11/var2)- includes
>>> observations 1, 2 and 4 for id1 and all observations of id2.
>>> Therefore, both agents are taken into account under
>>> -codebook id if...-
>>>
>>> obs  id     startdate    date   var1      var2    mean_var1
>>>  1      1           189          187     10           .         10.75
>>>  2      1           189          188     11           .         10.75
>>>  3      1           189          189     11           1        10.75
>>>  4      1           189          190     11           .         10.75
>>>  5      2           192          189     10           .         10.5
>>>  6      2           192          190     10           .         10.5
>>>  7      2           192          191     11           .         10.5
>>>  8      2           192          193     11           .         10.5
>>>
>>> -----Original message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On behalf of Nick Cox
>>> Sent: Thursday, 16 January 2014 11:51
>>> To: [email protected]
>>> Subject: Re: st: Counting firms in a panel dataset
>>>
>>> Did you try it? As I understand it, the complement of
>>>
>>> A & B
>>>
>>> in A is
>>>
>>> A & !B
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 16 January 2014 10:36, Miguel A. Duran <[email protected]> wrote:
>>>> Thanks, Nick, for your answer. I thought of something similar to
>>>> what you propose, but if I am not mistaken it has a problem: I would
>>>> be counting both id1 and id2, i.e., I would again get 408 (what I
>>>> get just using -codebook id if mean_var1 != 11-).
>>>>
>>>> id     startdate    date   var1      var2       mean_var1
>>>>  1           189          187     10           .               10.75
>>>>  1           189          188     11           .               10.75
>>>>  1           189          189     11           1              10.75
>>>>  1           189          190     11           .               10.75
>>>>  2           192          189     10           .               10.5
>>>>  2           192          190     10           .               10.5
>>>>  2           192          191     11           .               10.5
>>>>  2           192          193     11           .               10.5
>>>>
>>>> -----Original message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On behalf of Nick Cox
>>>> Sent: Wednesday, 15 January 2014 20:28
>>>> To: [email protected]
>>>> Subject: Re: st: Counting firms in a panel dataset
>>>>
>>>> I'd look at data that satisfy
>>>>
>>>> if mean_var1 != 11 & !(var1 == 10/var2 | var1 == 11/var2)
>>>>
>>>> i.e. negating the second condition. Note that if -var1- and -var2-
>>>> are both missing, then the second condition
>>>>
>>>> (var1 == 10/var2 | var1 == 11/var2)
>>>>
>>>> reduces to
>>>>
>>>> . == .
>>>>
>>>> which is always true.
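>>>>
>>>> A quick way to check this arithmetic, as a sketch using only
>>>> -display- with literal values rather than the actual variables:
>>>>
>>>> ```stata
>>>> * dividing by missing yields missing
>>>> display 10/.
>>>> * missing compared with missing evaluates to true (1)
>>>> display (. == 10/.)
>>>> * a nonmissing value is never equal to missing, so this is false (0)
>>>> display (10 == 10/.)
>>>> ```
>>>>
>>>> So when -var2- is missing, the condition holds only if -var1- is
>>>> missing too.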
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 15 January 2014 19:18, Miguel A. Duran <[email protected]> wrote:
>>>>> Hi, Statalisters. I am using -codebook- to count the number of
>>>>> agents in a panel dataset under different criteria. Under one
>>>>> criterion I get 408 agents and under another I get 397. I have an
>>>>> intuition about the cause of this difference and I would like to
>>>>> check it, but I do not know how to do it.
>>>>> To help make my point clear, (the relevant part of) my dataset
>>>>> looks similar to this:
>>>>>
>>>>> id     startdate    date   var1      var2       mean_var1
>>>>> 1           189          187     10           .               10.75
>>>>> 1           189          188     11           .               10.75
>>>>> 1           189          189     11           1              10.75
>>>>> 1           189          190     11           .               10.75
>>>>> 2           192          189     10           .               10.5
>>>>> 2           192          190     10           .               10.5
>>>>> 2           192          191     11           .               10.5
>>>>> 2           192          193     11           .               10.5
>>>>>
>>>>> Using the command -codebook id if mean_var1 != 11- I get 408
>>>>> agents, but using the command -codebook id if mean_var1 != 11 &
>>>>> (var1 == 10/var2 | var1 == 11/var2)- I get 397 agents. My
>>>>> intuition is that this happens because there are agents (like
>>>>> agent 2) that do not have the observation corresponding to the
>>>>> startdate. If I am right, adding this requirement to the command
>>>>> -codebook id if mean_var1 != 11- should count 11 agents, but I do
>>>>> not know how to include that requirement.
>>>>> Will anyone please help with this?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/