Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | chaiselongue@gmx.de |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Relative Comparision between Observations |
Date | Thu, 25 Aug 2011 16:43:34 +0200 |
Hi Nick,

thanks a lot. The dataset contains 500,000 transactions (in addition to the 7 million spreads), but I will use your approach as a starting point for an algorithm that can cope with a dataset this large. Any suggestions for getting this done quickly are still very welcome.

Best regards and thanks again,
Jens

-------- Original Message --------
> Date: Thu, 25 Aug 2011 15:20:58 +0100
> From: Nick Cox <njcoxstata@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Relative Comparision between Observations

> For -transaction[2]- (e.g.) you can generate
>
> . gen within_2 = inrange(transaction[2], start, end) & isspread
>
> Is the number of transactions small enough to allow a variable for
> every one of them?
>
> If so, this is crude but should work
>
> forval i = 1/`=_N' {
>     if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
> }
>
> A visceral reaction is that getting the wrong data structure is
> horribly easy here, but people who work with this kind of data may be
> able to advise constructively.
>
> Nick
>
> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <chaiselongue@gmx.de> wrote:
> > Hi Nick,
> > let's say the data look like this:
> >
> > id    isspread    start    end    transaction
> > 1     1           3        6      .
> > 2     0           .        .      5
> > 3     1           2        5      .
> > 4     0           .        .      5.5
> >
> > Now what I want Stata to do is to tell me (for example, by creating
> > additional variables that contain the ids) that ids 2 and 4 occurred
> > between the start and end dates of observation 1 (5 and 5.5 are
> > between 3 and 6) and that id 2 occurred between the start and end
> > dates of spread 3 (5 is weakly between 2 and 5).
> > A perfect result of the procedure would look like this:
> >
> > id    isspread    start    end    transaction    tr1    tr2
> > 1     1           3        6      .              2      4
> > 2     0           .        .      5              .      .
> > 3     1           2        5      .              2      .
> > 4     0           .        .      5.5            .      .
> >
> > Best, Jens
> >
> > -------- Original Message --------
> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
> >> From: Nick Cox <njcoxstata@gmail.com>
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: Relative Comparision between Observations
> >
> >> Please show a representative chunk of your data so that precisely what
> >> are your variables and your observations becomes clear.
> >>
> >> Nick
> >>
> >> On Thu, Aug 25, 2011 at 2:09 PM, <chaiselongue@gmx.de> wrote:
> >>
> >> > I want to perform the following task for a very large dataset (so
> >> > writing a Mata loop is probably not the solution): the dataset
> >> > consists of two sorts of data: spreads and transactions. Spreads
> >> > have a start and an end date, while transactions only have a
> >> > transaction date. Now I want to know whether some transaction
> >> > happened between the start and end date of a spread. Ideally, I
> >> > would like to have variables containing all the ids of
> >> > transactions that occurred between the start and end date of the
> >> > spread, for each spread. Is there a way to use inexact matching or
> >> > merging for this?
> >> > This should be a familiar problem; however, I do not have a clue
> >> > how to solve it.
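
On the question of getting this done quickly for 500,000 transactions and 7 million spreads, one possibility is a sort-based sweep instead of one variable per transaction. What follows is only a minimal sketch and is not taken from any reply in the thread. It assumes the layout of the example data, no missing dates, and closed intervals (a transaction dated exactly at start or end counts as inside). The variables kind, time, ntr, tr_rank, tr_first and tr_last are hypothetical names introduced for illustration. Rather than listing every id in tr1, tr2, ..., each spread gets the first and last rank of the transactions that fall inside it, where transactions are ranked by date.

```stata
* Toy data from the thread (assumed layout: one row per spread or transaction)
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* One row per event: each spread contributes a start row and an end row
expand 2 if isspread == 1
bysort id: gen byte kind = cond(isspread == 0, 1, cond(_n == 1, 0, 2))
gen double time = cond(kind == 0, start, cond(kind == 1, transaction, end))

* Ties: starts (0) sort before transactions (1), which sort before ends (2),
* so both endpoints of a spread count as inside it
sort time kind id
gen long ntr = sum(kind == 1)              // transactions seen so far

gen long tr_rank  = ntr     if kind == 1   // rank of each transaction by date
gen long tr_first = ntr + 1 if kind == 0   // first rank at or after start
gen long tr_last  = ntr     if kind == 2   // last rank at or before end

* Back to one row per id; (max) ignores the missing values
collapse (max) tr_rank tr_first tr_last, by(id isspread)
list, noobs
```

On the four example rows this should give tr_first = 1 and tr_last = 2 for spread 1 (transactions 2 and 4), and tr_first = tr_last = 1 for spread 3 (transaction 2 only). A spread with no transactions inside it ends up with tr_first greater than tr_last. The work is dominated by a single sort, so it should scale far better than a loop that creates one variable per transaction, at the cost of storing a rank range rather than the ids themselves.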
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/