Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Using values in an variable to save parts of an dataset


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Using values in an variable to save parts of an dataset
Date   Thu, 2 May 2013 13:10:55 -0400

Andreas writes:

"for some industryids there are no companies listed with a "length"
over lets say 8, which leads to an error, and the do-file stops."

Andreas doesn't say which command is aborting with an error. I assume
it is -save-. This Stata command has an option -emptyok- which allows
saving a dataset even if there are no observations in it. For details
see the help for -save- here:
http://www.stata.com/help.cgi?save
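
As a rough sketch of how that option might fit into the loop quoted
further down: the names mydata, industryid and length are taken from
the thread, while the cutoff of 8 and the -replace- option are
assumptions made for illustration. Whether -emptyok- is enough depends
on which command is actually failing.

```stata
* Sketch only: the cutoff (8) and -replace- are assumptions.
use mydata, clear
levelsof industryid, local(industryids)

foreach industryid of local industryids {
    * Load one industry at a time, keeping only rows with length >= 8.
    * Note: Stata treats missing values as larger than any number,
    * so rows with missing length also satisfy this condition.
    use mydata if industryid == `industryid' & length >= 8, clear

    * -emptyok- asks -save- to write the file even if the subset is empty.
    save industry_`industryid', emptyok replace
}
```

This writes a file for every industry, including empty ones. If empty
files are unwanted, the -count- check quoted below skips them instead.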

Best, Sergiy

On Thu, May 2, 2013 at 5:10 AM, Andreas Dall Frøseth
<[email protected]> wrote:
> I did a test run with the -count if- command, and it seems to be just what I need.
>
> Thank you, Nick!
>
> -Andreas
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
> Sent: 2 May 2013 10:47
> To: [email protected]
> Subject: Re: st: Using values in an variable to save parts of an dataset
>
> -keep- and -drop- are just complementary. Use whatever is easier to
> think with. The complementarity means that
>
> keep if foo
>
> and
>
> drop if !foo
>
> are the same, and vice versa. Here -foo- could be a variable, or it
> could be pseudocode for a condition.
>
> So starting with
>
> sysuse auto
>
> (a)     drop if foreign == 1
>
> (b)     keep if !(foreign == 1)
>
> are the same. Now, why didn't I write there
>
> (b')    keep if foreign == 0
>
> ? That _is_ equivalent _in this example_, but I want to emphasise that
> -- as you get more and more conditions -- it can be helpful to
> parenthesise a long compound condition using ( )
>
> so the pair
>
> drop if  (<long complicated condition, possibly compound>)
>
> keep if !(<long complicated condition, possibly compound>)
>
> are the same, as are the pair
>
> drop if  !(<long complicated condition, possibly compound>)
>
> keep if (<long complicated condition, possibly compound>)
>
> Negate the whole expression, once.
>
> Turning to your specific problem, it sounds as if any case with zero
> observations is stopping your code. (You don't show us your exact
> code!)
>
> You might try something like this
>
> count if <condition to be satisfied>
>
> if r(N) > 0 {
>        <actions if there are some data>
> }
>
> Alternatively, check out the -capture- command.
>
> Nick
> [email protected]
>
>
> On 2 May 2013 08:58, Andreas Dall Frøseth <[email protected]> wrote:
>
>> It looks like I need some help with this code again...
>>
>> After running this a couple of times, for different restrictions, I experience some difficulties. In addition to -keep-ing if industryid==`industryid', I wish to drop if the value in the variable "length" is less than a certain value.
>> But for some industryids there are no companies listed with a "length" over, let's say, 8, which leads to an error, and the do-file stops.
>>
>> How can I make the code ignore those industries, and keep on splitting my dataset?
>
> Andreas Dall Frøseth [[email protected]]
>
>> I tried the approach where you exploited a feature of -use-, and it seems to work just fine.
>> Thank you.
>
> Nick Cox
>
>> Your local references are the wrong way round.
>>
>> foreach industryid in `industryids' {
>>         keep if industryid==`industryid'
>>         save industry_`industryid'
>> }
>>
>> You want each statement inside the loop to take on each individual value.
>>
>> This is all assuming that -industryid- is numeric.
>>
>> However, even with this fixed your loop won't work. Second time round,
>> all you have in memory is the first subset resulting from -keep-.
>>
>> But then you can exploit a feature of -use-:
>>
>> levelsof industryid, local(industryids)
>>
>> foreach industryid in `industryids' {
>>         use mydata if industryid==`industryid', clear
>>         save industry_`industryid'
>> }
>
> Andreas Dall Frøseth
>
>>> I'm trying to divide my dataset into pieces based on the values in a variable. My dataset is a large panel containing a number of companies. Each company has a value in the variable "industryid", which allows me to identify what industry it operates in.
>>> I am now trying to divide this large dataset into smaller sets, one for each industry.
>>>
>>> The reason why I'm struggling is that I wish to apply this separation to a number of different datasets, which might contain different numbers of industries, without having to identify the values in the industry variable myself.
>>> I have tried to make a macro with the values using the "levelsof" command, and then apply it with:
>>>
>>> foreach industryid in `industryids' {
>>>         keep if industryid==`industryids'
>>>         save industry_`industryids'
>>> }
>>>
>>>
>>> But this comes back with the error "invalid '10'".
>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
