



Re: st: Relative Comparision between Observations


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Relative Comparision between Observations
Date   Thu, 25 Aug 2011 15:52:23 +0100

With that number, "my" approach can't get its shoes on, let alone run
down the track -- although it is I think what you asked for.

I think you really need advice from people who do your kind of thing
in Stata, but unfortunately I am not one of them. I only have two
broad thoughts: think long not wide, and at some point -merge- will be
your friend.
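
A minimal sketch of what "long not wide" could look like, using only the four-observation example quoted further down. It pairs spreads with transactions via -cross- and keeps the pairs where the transaction falls inside the spread's window. The names `tr_id` and `trans` are made up for illustration, and this is not a tested recipe for the full dataset: forming every pair of 7 million spreads and 500,000 transactions is not feasible in one go, so a real solution would need to work in blocks (for example -joinby- on some grouping variable, if one exists).

```stata
* Toy data from the thread: ids 1 and 3 are spreads, 2 and 4 are transactions
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Put the transactions in their own file, renaming id so it will not clash
preserve
keep if isspread == 0
keep id transaction
rename id tr_id
tempfile trans
save `trans'
restore

* Keep the spreads and form every spread-transaction pair
keep if isspread == 1
keep id start end
cross using `trans'

* Long result: one observation per matching (spread, transaction) pair
keep if inrange(transaction, start, end)
sort id tr_id
list id tr_id start end transaction, sepby(id)
```

On the toy data this should leave three observations: spread 1 paired with transactions 2 and 4, and spread 3 paired with transaction 2. If the wide tr1, tr2 layout is really wanted, a within-spread counter plus -reshape wide- would get there from this structure, but the long form is usually easier to work with.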

Nick

On Thu, Aug 25, 2011 at 3:43 PM,  <chaiselongue@gmx.de> wrote:
> Hi Nick,
> thanks a lot.
> The dataset contains 500,000 transactions (in addition to the 7 million spreads), but I will use your approach as a starting point for an algorithm that can cope with this large dataset.
>
> Any suggestion to get this done quickly is still very welcome.
>
>
> Best regards and thanks again,
>
> Jens
>
>
>
>
>
>
>
> -------- Original Message --------
>> Date: Thu, 25 Aug 2011 15:20:58 +0100
>> From: Nick Cox <njcoxstata@gmail.com>
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Relative Comparision between Observations
>
>> For -transaction[2]- (e.g.) you can generate
>>
>> . gen within_2 = inrange(transaction[2], start, end) & isspread
>>
>> Is the number of transactions small enough to allow a variable for
>> every one of them?
>>
>> If so, this is crude but should work
>>
>> forval i = 1/`=_N' {
>>      * create an indicator only for observations that are transactions
>>      if isspread[`i'] == 0 {
>>           gen within_`i' = inrange(transaction[`i'], start, end) & isspread
>>      }
>> }
>>
>> A visceral reaction is that getting the wrong data structure is
>> horribly easy here, but people who work with this kind of data may be
>> able to advise constructively.
>>
>> Nick
>>
>> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <chaiselongue@gmx.de> wrote:
>> > Hi Nick,
>> > let's say the data looks like this:
>> >
>> > id____isspread____start____end____transaction
>> > 1_____1___________3________6______.
>> > 2_____0___________.________.______5
>> > 3_____1___________2________5______.
>> > 4_____0___________.________.______5.5
>> >
>> >
>> >
>> > Now what I want Stata to do is to tell me (for example by creating additional variables that contain the ids) that ids 2 and 4 occurred between the start and end date of observation 1 (5 and 5.5 are between 3 and 6) and that id 2 occurred between the start and end date of spread 3 (5 is weakly between 2 and 5).
>> > A perfect result of the procedure would look like this:
>> >
>> > id____isspread____start____end____transaction____tr1___tr2
>> > 1_____1___________3________6______.______________2_____4__
>> > 2_____0___________.________.______5______________._____.__
>> > 3_____1___________2________5______.______________2_____.__
>> > 4_____0___________.________.______5.5____________._____.__
>> >
>> >
>> > Best, Jens
>> >
>> >
>> >
>> >
>> > -------- Original Message --------
>> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
>> >> From: Nick Cox <njcoxstata@gmail.com>
>> >> To: statalist@hsphsun2.harvard.edu
>> >> Subject: Re: st: Relative Comparision between Observations
>> >
>> >> Please show a representative chunk of your data so that precisely what
>> >> are your variables and your observations becomes clear.
>> >>
>> >> Nick
>> >>
>> >> On Thu, Aug 25, 2011 at 2:09 PM,  <chaiselongue@gmx.de> wrote:
>> >>
>> >> > I want to perform the following task for a very large dataset (so writing a Mata loop is probably not the solution). The dataset consists of two sorts of data: spreads and transactions. Spreads have a start and an end date, while transactions have only a transaction date. Now I want to know whether some transaction happened between the start and end date of a spread. Ideally, I would like to have, for each spread, variables containing the ids of all transactions that occurred between the start and end date of that spread. Is there a way to use inexact matching or merging for this?
>> >> > This should be a familiar problem; however, I do not have a clue how to solve it.
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>
>


