Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


SV: st: Using values in an variable to save parts of an dataset


From   Andreas Dall Frøseth <Andreas.Froseth@stud.nhh.no>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   SV: st: Using values in an variable to save parts of an dataset
Date   Thu, 2 May 2013 09:10:10 +0000

I did a test run with the -count if- command, and it seems to be just what I need.

Thank you, Nick!

-Andreas
________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Nick Cox [njcoxstata@gmail.com]
Sent: 2 May 2013 10:47
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using values in an variable to save parts of an dataset

-keep- and -drop- are just complementary. Use whatever is easier to
think with. The complementarity means that

keep if foo

and

drop if !foo

are the same, and vice versa. Here -foo- could be a variable, or it
could be pseudocode for a condition.

So starting with

sysuse auto

(a)     drop if foreign == 1

(b)     keep if !(foreign == 1)

are the same. Now, why didn't I write there

(b')    keep if foreign == 0

? That _is_ equivalent _in this example_, but I want to emphasise that
-- as you get more and more conditions -- it can be helpful to
parenthesise a long compound condition using ( )

so the pair

drop if  (<long complicated condition, possibly compound>)

keep if !(<long complicated condition, possibly compound>)

are the same, as are the pair

drop if  !(<long complicated condition, possibly compound>)

keep if (<long complicated condition, possibly compound>)

Negate the whole expression, once.
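
One way to convince yourself of this is to count both halves of such a pair. The following is only a sketch using the auto data; the particular compound condition (foreign cars priced above 6000) is an arbitrary choice for illustration.

```stata
sysuse auto, clear

* A compound condition, wrapped once in parentheses
count if (foreign == 1 & price > 6000)
local nyes = r(N)

* Its negation: negate the whole expression, once
count if !(foreign == 1 & price > 6000)
local nno = r(N)

* Every observation satisfies exactly one of the two,
* so -drop if (...)- and -keep if !(...)- leave the same data
assert `nyes' + `nno' == _N
```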

Turning to your specific problem, it sounds as if any case with zero
observations is stopping your code. (You don't show us your exact
code!)

You might try something like this

count if <condition to be satisfied>

if r(N) > 0 {
       <actions if there are some data>
}
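
Applied to the loop discussed earlier in this thread, that might look roughly as follows. This is a sketch under assumptions, not tested on your data: -mydata-, -industryid- and -length- are the names used in the thread, -industryid- is assumed numeric, the cutoff of 8 is your "let's say" value, and excluding missing values of -length- is my assumption about what you want.

```stata
* Sketch only: names and the cutoff of 8 are taken from the thread
use mydata, clear
levelsof industryid, local(industryids)

foreach id of local industryids {
    * -count- leaves the number of matching observations in r(N)
    count if industryid == `id' & length >= 8 & !missing(length)

    if r(N) > 0 {
        * Work on a temporary copy so the full dataset survives the -keep-
        preserve
        keep if industryid == `id' & length >= 8 & !missing(length)
        save industry_`id', replace
        restore
    }
}
```

Industries with no qualifying observations are simply skipped, and no file is written for them.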

Alternatively, check out the -capture- command.
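
A minimal sketch of the -capture- route, under the same assumptions as before (-mydata-, -industryid- and -length- as named in the thread, numeric -industryid-, an illustrative cutoff of 8). The -assert- is added here only to force a nonzero return code when a subset is empty; whatever raises the error in your own code would play the same role.

```stata
* Sketch only: names and the cutoff of 8 are taken from the thread
use mydata, clear
levelsof industryid, local(industryids)

foreach id of local industryids {
    * Any error inside the block is trapped; its code is left in _rc
    capture noisily {
        use mydata if industryid == `id' & length >= 8 & !missing(length), clear
        assert _N > 0
        save industry_`id', replace
    }
    if _rc != 0 {
        display as text "industry `id' skipped (return code " _rc ")"
    }
}
```

Note that -capture- traps every error in the block, not just the one you expect, so it is worth looking at the return codes it reports.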

Nick
njcoxstata@gmail.com


On 2 May 2013 08:58, Andreas Dall Frøseth <Andreas.Froseth@stud.nhh.no> wrote:

> It looks like I need some help with this code again...
>
> After running this a couple of times, for different restrictions, I have run into some difficulties. In addition to -keep-ing if industryid==`industryid', I wish to drop if the value in the variable "length" is less than a certain value.
> But for some industryids there are no companies listed with a "length" over, let's say, 8, which leads to an error, and the do-file stops.
>
> How can I make the code ignore those industries, and keep on splitting my dataset?

Andreas Dall Frøseth [Andreas.Froseth@stud.nhh.no]

> I tried the approach where you exploited a feature of -use-, and it seems to work just fine.
> Thank you.

Nick Cox

> Your local references are the wrong way round.
>
> foreach industryid in `industryids' {
>         keep if industryid==`industryid'
>         save industry_`industryid'
> }
>
> You want each statement inside the loop to take on each individual value.
>
> This is all assuming that -industryid- is numeric.
>
> However, even with this fixed your loop won't work. Second time round,
> all you have in memory is the first subset resulting from -keep-.
>
> But then you can exploit a feature of -use-:
>
> levelsof industryid, local(industryids)
>
> foreach industryid in `industryids' {
>         use mydata if industryid==`industryid', clear
>         save industry_`industryid'
> }

Andreas Dall Frøseth

>> I'm trying to divide my dataset into pieces based on the values in a variable. My dataset is a large panel, containing a number of companies. Each company has a value in the variable "industryid", which allows me to identify what industry it operates in.
>> I am now trying to divide this large dataset into smaller sets for each single industry.
>>
>> The reason why I'm struggling is that I wish to apply this separation to a number of different datasets, which might contain different numbers of industries, without having to identify the values in the industry variable myself.
>> I have tried to make a macro with the values using the "levelsof" command, and then apply it with:
>>
>> foreach industryid in `industryids' {
>>         keep if industryid==`industryids'
>>         save industry_`industryids'
>> }
>>
>>
>> But this comes back as "invalid '10'".
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP