
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Re: st: Re: st: Re: st: picking the closest source from multiple — nested cond fail


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Re: st: Re: st: Re: st: picking the closest source from multiple — nested cond fail
Date   Fri, 9 Aug 2013 02:57:44 +0100

Your original variables may be -double-, but the results of these
-egen- and -generate- commands will be -float- unless you have -set
type double-. For what you want, as soon as one variable is -float-,
testing for exact equality (always tricky with non-integers) becomes a
small nightmare.

Nick
[email protected]
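
A minimal sketch of the precision point above, assuming the default
-set type float- is in effect. The variable names d and f are made up
for illustration and do not come from the original code.

```stata
* Assumes the default storage type (float); d and f are hypothetical names
clear
set obs 1
generate double d = 0.1     // stored in double precision
generate f = d              // no type given, so stored as float
count if f == d             // 0: the float copy of 0.1 differs from the double
count if f == float(d)      // 1: compare at float precision instead
```

Rounding the -double- side with float() before testing equality, or
creating every intermediate variable as -double-, keeps both sides of
the comparison at the same precision.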


On 9 August 2013 01:18, László Sándor <[email protected]> wrote:

> Thanks, Sergiy, as always.
>
> My trouble is that I don't know which part of my code causes the
> problem. Then I could test it, or show a fail in a mock example.
>
> By the way, all my variables are double unless otherwise specified,
> e.g. for "closest."
>
> But the real problem is why the second nested cond fails. The first
> produces the "closest" integer values that seem to be correct, and
> they do look like integers, as I tried to verify. But the second nested
> cond, which assigns the values, falls all the way through to missing
> values, which I don't understand.
>
> I will let you know if I find the source of the problem.

On Thu, Aug 8, 2013 at 7:44 PM, Sergiy Radyakin <[email protected]> wrote:

>> See if this is the reason for problems:
>> ....cond(minmeanskvdiff==meanmmskvdiff & .....
>> Are your diffs doubles? Try abs(a-b)<epsilon for conditions of that type.
>>
>> Or rewrite your example based on auto.dta, and repost.
>>
>> Writing huge gens like that makes it really difficult to read the code.
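
The tolerance test suggested in the quoted reply can be sketched as
follows. The variables a and b and the tolerance 1e-12 are assumptions
chosen for illustration; a sensible tolerance depends on the scale of
the data.

```stata
* Hypothetical variables a and b; the tolerance 1e-12 is an assumption
clear
set obs 1
generate double a = 0.1 + 0.2
generate double b = 0.3
count if a == b                 // 0: the two doubles differ in the last bits
count if abs(a - b) < 1e-12     // 1: equal within an absolute tolerance
count if reldif(a, b) < 1e-12   // 1: reldif() scales the difference by abs(b)+1
```

An absolute tolerance suits values of similar magnitude; reldif() is
the safer choice when prices range over several orders of magnitude.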

On Thu, Aug 8, 2013 at 6:46 PM, László Sándor <[email protected]> wrote:

>>> Please let me ask for some help because I cannot figure out why my
>>> code works for some observations but not others. Maybe this is a
>>> -cond- or precision issue and we will all learn something.
>>>
>>> I need to use financial prices from various sources, but maddeningly,
>>> the sources don't line up completely (there is some ambiguity about
>>> the assets in question). I do have some reference prices for a few
>>> years for a part of the panel. I want to price at least those assets
>>> right.
>>>
>>> I think I have some code that calculates the average distance from each
>>> source, and fills a variable about which came closest. This could be
>>> subject to rounding errors, but this seems to fill in impressively
>>> many values. And manual checks verify that the indicated source is
>>> indeed the closest to the reference.
>>>
>>> The next round of -cond-s should pick the value from the right source
>>> but produces many missing values. The conditions are all the
>>> source-indicator being equal to a single-digit integer value, which
>>> seems to be true for the variable, e.g. the labels apply to it, I can
>>> tab it, etc.
>>>
>>> What is going on? I do have further details, but I will add them
>>> only if necessary, so as not to confuse you more.
>>>
>>> * Code to generate the closest source for an asset, with panel id isin
>>>
>>> foreach v in bb ds ms fs mm {
>>> gen fakerawpriceornav_`v'=cond(mi(rawpriceornav_`v'),2*rawpriceornav_skv,rawpriceornav_`v')
>>> // I need this to punish a source missing when "SKV" reference still
>>> // has value, otherwise a missing difference would benefit the source by
>>> // the logic of -egen mean-
>>> gen `v'skvdiff = abs(fakerawpriceornav_`v'-rawpriceornav_skv)
>>> bys isin: egen mean`v'skvdiff = mean(`v'skvdiff)
>>> }
>>> egen minmeanskvdiff = rowmin(mean*skvdiff)
>>>
>>> g byte closest = cond(minmeanskvdiff==meanbbskvdiff & !mi(minmeanskvdiff),1, ///
>>> cond(minmeanskvdiff==meanmmskvdiff & !mi(minmeanskvdiff),2, ///
>>> cond(minmeanskvdiff==meanmsskvdiff & !mi(minmeanskvdiff),3, ///
>>> cond(minmeanskvdiff==meanfsskvdiff & !mi(minmeanskvdiff),4, ///
>>> cond(minmeanskvdiff==meandsskvdiff & !mi(minmeanskvdiff),5, ///
>>> .)))))
>>>
>>> * So far not many missing values generated, though the vast majority of
>>> * "closest" is simply the first value, I think this is still correct
>>> * (There are many ties, and I break the tie in favor of Bloomberg.)
>>>
>>> la def closest 1 "Bloomberg" 3 "Morningstar" 5 "Datastream" 2 ///
>>> "MoneyMate" 4 "FactSet"
>>> la val closest closest
>>> * This verifies that labels are picked up by the integer values.
>>>
>>> g rawpriceornav_pick = ///
>>> cond(closest==1,rawpriceornav_bb, ///
>>> cond(closest==2,rawpriceornav_mm, ///
>>> cond(closest==3,rawpriceornav_ms, ///
>>> cond(closest==4,rawpriceornav_fs, ///
>>> cond(closest==5,rawpriceornav_ds, ///
>>> .)))))
>>> * But here I get back a ton of missing values. Yes, it can happen that
>>> * in my panel a source is the closest for an ISIN but has missing values
>>> * for a few years, so closest will always have fewer missing than this
>>> * variable. But only 20% or so of this variable gets filled here, which
>>> * is not reasonable.
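
One way to check how many of the missing picks come from the chosen
source itself being missing is sketched below on toy data. The data
values and the variable pick are invented for illustration; only the
variable names isin, closest, rawpriceornav_bb and rawpriceornav_mm
mirror the quoted code, and this is a diagnostic, not an explanation of
the reported problem.

```stata
* Toy data: values are invented; names mirror the quoted post
clear
input str4 isin byte closest double rawpriceornav_bb double rawpriceornav_mm
"A" 1 10 11
"A" 1 .  12
"B" 2 20 .
"B" 2 21 22
end
* Two-source version of the nested cond() from the post (pick is hypothetical)
generate double pick = cond(closest == 1, rawpriceornav_bb, ///
    cond(closest == 2, rawpriceornav_mm, .))
* Observations where a source was chosen but its price is missing
count if mi(pick) & !mi(closest)
* Break the missing picks down by chosen source
tabulate closest if mi(pick), missing
```

If the first count accounts for nearly all missing picks, the nested
cond() is behaving as written and the gaps come from the source data
rather than from the comparison.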


