

From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Relative Comparision between Observations

Date: Thu, 25 Aug 2011 15:52:23 +0100

With that number, "my" approach can't get its shoes on, let alone run down the track -- although it is I think what you asked for. I think you really need advice from people who do your kind of thing in Stata, but unfortunately I am not one of them. I only have two broad thoughts: think long not wide, and at some point -merge- will be your friend.

Nick

On Thu, Aug 25, 2011 at 3:43 PM, <chaiselongue@gmx.de> wrote:

> Hi Nick,
> thanks a lot.
> The dataset contains 500 000 transactions (in addition to the 7 million spreads), but I will use your approach as a starting point for an algorithm that can cope with this large dataset.
>
> Any suggestion to get this done quickly is still very welcome.
>
> Best regards and thanks again,
>
> Jens
>
> -------- Original Message --------
>> Date: Thu, 25 Aug 2011 15:20:58 +0100
>> From: Nick Cox <njcoxstata@gmail.com>
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Relative Comparision between Observations
>
>> For -transaction[2]- (e.g.) you can generate
>>
>> . gen within_2 = inrange(transaction[2], start, end) & isspread
>>
>> Is the number of transactions small enough to allow a variable for every one of them?
>>
>> If so, this is crude but should work
>>
>> forval i = 1/`=_N' {
>>     if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
>> }
>>
>> A visceral reaction is that getting the wrong data structure is horribly easy here, but people who work with this kind of data may be able to advise constructively.
>>
>> Nick
>>
>> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <chaiselongue@gmx.de> wrote:
>> > Hi Nick,
>> > let's say the data looks like this:
>> >
>> > id____isspread____start____end____transaction
>> > 1_____1___________3________6______.
>> > 2_____0___________.________.______5
>> > 3_____1___________2________5______.
>> > 4_____0___________.________.______5.5
>> >
>> > Now what I want Stata to do is to tell me (for example by creating additional variables that contain the ids) that ids 2 and 4 occurred between the start and end date of observation 1 (5 and 5.5 are between 3 and 6) and that id 2 occurred between the start and end date of spread 3 (5 is weakly between 2 and 5).
>> > A perfect result of the procedure would look like this:
>> >
>> > id____isspread____start____end____transaction____tr1___tr2
>> > 1_____1___________3________6______.______________2_____4__
>> > 2_____0___________.________.______5______________._____.__
>> > 3_____1___________2________5______.______________2_____.__
>> > 4_____0___________.________.______5.5____________._____.__
>> >
>> > Best, Jens
>> >
>> > -------- Original Message --------
>> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
>> >> From: Nick Cox <njcoxstata@gmail.com>
>> >> To: statalist@hsphsun2.harvard.edu
>> >> Subject: Re: st: Relative Comparision between Observations
>> >
>> >> Please show a representative chunk of your data so that precisely what are your variables and your observations becomes clear.
>> >>
>> >> Nick
>> >>
>> >> On Thu, Aug 25, 2011 at 2:09 PM, <chaiselongue@gmx.de> wrote:
>> >>
>> >> > I want to perform the following task for a very large dataset (so writing a Mata loop is probably not the solution): the dataset consists of two sorts of data: spreads and transactions. Spreads do have a start and an end date, while transactions only have a transaction date. Now I want to know whether some transaction happened between the start and end date of a spread. Ideally, I would like to have variables containing all the ids of transactions that occurred between the start and end date of the spread for each spread. Is there a way to use inexact matching or merging for this?
>> >> > This should be a familiar problem, however, I do not have a clue how to solve it.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
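One way to read the "think long not wide" advice is to hold one observation per (spread, transaction) match instead of one variable per transaction. The following is only a sketch on the four-observation example from the thread, not a method anyone in the thread proposed: it splits the data into spreads and transactions, forms all pairs with -cross-, and keeps the pairs where the transaction date lies weakly inside the spread. The names tr_id and tr_date are hypothetical. Forming all pairs would not be feasible at the sizes mentioned (7 million spreads, 500 000 transactions); some blocking key would be needed first, and whether such a key exists in the real data is an assumption.

```stata
* Toy data from the thread: spreads have start/end, transactions have a date
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Save the transactions as their own dataset
* (tr_id and tr_date are hypothetical names chosen for this sketch)
preserve
keep if isspread == 0
keep id transaction
rename id tr_id
rename transaction tr_date
tempfile trans
save `trans'
restore

* Keep the spreads and form every spread-transaction pair
keep if isspread == 1
keep id start end
cross using `trans'

* Keep only pairs where the transaction falls weakly within the spread
keep if inrange(tr_date, start, end)
sort id tr_id
list id start end tr_id tr_date, sepby(id)
```

Each remaining observation is one match: spread 1 with transactions 2 and 4, and spread 3 with transaction 2. If the wide layout (tr1, tr2, ...) is still wanted, the long result could be numbered within id and passed to -reshape wide-, then attached back to the original data with -merge-.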

