One important note: the number of transactions per spread should be small, probably fewer than 10 for virtually every spread.

Jens

-------- Original Message --------
> Date: Thu, 25 Aug 2011 15:52:23 +0100
> From: Nick Cox <njcoxstata@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Relative Comparision between Observations

> With that number, "my" approach can't get its shoes on, let alone run
> down the track -- although it is I think what you asked for.
>
> I think you really need advice from people who do your kind of thing
> in Stata, but unfortunately I am not one of them. I only have two
> broad thoughts: think long not wide, and at some point -merge- will be
> your friend.
>
> Nick
>
> On Thu, Aug 25, 2011 at 3:43 PM, <chaiselongue@gmx.de> wrote:
> > Hi Nick,
> > thanks a lot.
> > The dataset contains 500 000 transactions (in addition to the 7 million
> > spreads), but I will use your approach as a starting point for an
> > algorithm that can cope with this large dataset.
> >
> > Any suggestion to get this done quickly is still very welcome.
> >
> > Best regards and thanks again,
> >
> > Jens
> >
> > -------- Original Message --------
> >> Date: Thu, 25 Aug 2011 15:20:58 +0100
> >> From: Nick Cox <njcoxstata@gmail.com>
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: Relative Comparision between Observations
> >
> >> For -transaction[2]- (e.g.) you can generate
> >>
> >> . gen within_2 = inrange(transaction[2], start, end) & isspread
> >>
> >> Is the number of transactions small enough to allow a variable for
> >> every one of them?
> >>
> >> If so, this is crude but should work
> >>
> >> forval i = 1/`=_N' {
> >>     if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
> >> }
> >>
> >> A visceral reaction is that getting the wrong data structure is
> >> horribly easy here, but people who work with this kind of data may be
> >> able to advise constructively.
> >>
> >> Nick
> >>
> >> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <chaiselongue@gmx.de> wrote:
> >> > Hi Nick,
> >> > let's say the data looks like this:
> >> >
> >> > id____isspread____start____end____transaction
> >> > 1_____1___________3________6______.
> >> > 2_____0___________.________.______5
> >> > 3_____1___________2________5______.
> >> > 4_____0___________.________.______5.5
> >> >
> >> > Now what I want Stata to do is to tell me (for example, by creating
> >> > additional variables that contain the ids) that ids 2 and 4 occurred
> >> > between the start and end date of observation 1 (5 and 5.5 are between
> >> > 3 and 6) and that id 2 occurred between the start and end date of
> >> > spread 3 (5 is weakly between 2 and 5).
> >> > A perfect result of the procedure would look like this:
> >> >
> >> > id____isspread____start____end____transaction____tr1___tr2
> >> > 1_____1___________3________6______.______________2_____4__
> >> > 2_____0___________.________.______5______________._____.__
> >> > 3_____1___________2________5______.______________2_____.__
> >> > 4_____0___________.________.______5.5____________._____.__
> >> >
> >> > Best, Jens
> >> >
> >> > -------- Original Message --------
> >> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
> >> >> From: Nick Cox <njcoxstata@gmail.com>
> >> >> To: statalist@hsphsun2.harvard.edu
> >> >> Subject: Re: st: Relative Comparision between Observations
> >> >
> >> >> Please show a representative chunk of your data so that precisely what
> >> >> are your variables and your observations becomes clear.
> >> >>
> >> >> Nick
> >> >>
> >> >> On Thu, Aug 25, 2011 at 2:09 PM, <chaiselongue@gmx.de> wrote:
> >> >>
> >> >> > I want to perform the following task for a very large dataset (so
> >> >> > writing a Mata loop is probably not the solution): the dataset
> >> >> > consists of two sorts of data: spreads and transactions. Spreads
> >> >> > have a start and an end date, while transactions only have a
> >> >> > transaction date. Now I want to know whether any transaction
> >> >> > happened between the start and end date of a spread. Ideally, I
> >> >> > would like to have, for each spread, variables containing the ids
> >> >> > of all transactions that occurred between the start and end date
> >> >> > of that spread. Is there a way to use inexact matching or merging
> >> >> > for this?
> >> >> > This should be a familiar problem; however, I do not have a clue
> >> >> > how to solve it.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
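
Editorial sketch, not part of the original exchange: one way to read the "think long not wide" advice is to hold the spreads and the transactions in separate files, pair them up, and keep only the pairs where the transaction date falls inside the spread. The code below does this for the four-observation example from the thread. It is a minimal illustration under stated assumptions, not the method anyone in the thread proposed: the names tr_id, j and the tempfile trans are hypothetical, and -cross- forms every spread-transaction pair, so in this form it is only practical for small data. With 7 million spreads and 500 000 transactions the pairing would have to be done in blocks or by a different method.

```stata
* Toy data from the thread: ids 1 and 3 are spreads, ids 2 and 4 are transactions
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Put the transactions in their own (temporary) file; tr_id is a hypothetical name
preserve
keep if isspread == 0
keep id transaction
rename id tr_id
tempfile trans
save `trans'
restore

* Keep the spreads and form every spread-transaction pair (small data only)
keep if isspread == 1
keep id start end
cross using `trans'

* Keep the pairs whose transaction date lies in [start, end], endpoints included
keep if inrange(transaction, start, end)

* Long result: one observation per (spread, transaction) match,
* here the pairs (1,2), (1,4) and (3,2)
sort id tr_id
list id start end tr_id transaction, sepby(id) noobs

* Optional: one observation per spread, with tr_id1, tr_id2, ...
drop transaction
bysort id (tr_id): gen j = _n
reshape wide tr_id, i(id) j(j)
list, noobs
```

Run as a do-file in one go, since the tempfile is held in a local macro. Spreads with no matching transaction drop out at the -keep if- step; merging the result back onto the spread file would restore them with missing tr_id values.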

