

From: Robert Picard <picard@netbox.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Relative Comparision between Observations

Date: Thu, 25 Aug 2011 15:54:53 -0400

A simple and direct Stata solution to this problem is to use -cross- to form all combinations of spreads and transactions and then drop observations where the transaction date is not within the spread dates. Obviously, this is not feasible given the size of the data in question: 7 million spreads times 500,000 transactions would be 3.5 trillion pairs.

The following solution expands each spread by the number of calendar days it covers and creates a date variable that can be used to match with the transaction dates. The -joinby- command is then used to form all combinations of spread and transaction that occur on each calendar day in the data. You might have to further split the problem by month or year.

Hope this helps,

Robert

*----------- begin example -------------
set seed 1234

* fake spread data, up to 3 days per spread
clear
set obs 10000
gen sid = _n
gen start = mdy(1,1,2010) + int(runiform() * 365)
gen end = start + int(runiform() * 3)
tempfile spreads
save "`spreads'"

* fake transaction data
clear
set obs 5000
gen tid = _n
gen tday = mdy(1,1,2010) + int(runiform() * 365)
sort tday
tempfile transactions
save "`transactions'"

* expand each spread by the number of days
use "`spreads'"
gen ndays = end - start + 1
expand ndays

* each observation targets a specific day
sort sid
by sid: gen tday = start + _n - 1
keep sid ndays tday

* form all combinations of spreads and transactions
sort tday
joinby tday using "`transactions'"
format %d tday
sort sid tday tid

* If desired, reshape to wide
by sid: gen n = _n
sum n, meanonly
local nmax = r(max)
qui forvalues i = 1/`nmax' {
    by sid: gen trans`i' = tid[`i']
}
by sid: keep if _n == 1
drop tday tid n
*------------ end example --------------

On Thu, Aug 25, 2011 at 10:56 AM, Jens Kruk <chaiselongue@gmx.de> wrote:
> One important note: the number of transactions per spread should be small, probably fewer than 10 for virtually every spread.
>
> Jens
>
> -------- Original Message --------
>> Date: Thu, 25 Aug 2011 15:52:23 +0100
>> From: Nick Cox <njcoxstata@gmail.com>
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Relative Comparision between Observations
>
>> With that number, "my" approach can't get its shoes on, let alone run
>> down the track, although I think it is what you asked for.
>>
>> I think you really need advice from people who do your kind of thing
>> in Stata, but unfortunately I am not one of them. I only have two
>> broad thoughts: think long not wide, and at some point -merge- will be
>> your friend.
>>
>> Nick
>>
>> On Thu, Aug 25, 2011 at 3:43 PM, <chaiselongue@gmx.de> wrote:
>>> Hi Nick,
>>> thanks a lot.
>>> The dataset contains 500,000 transactions (in addition to the 7 million
>>> spreads), but I will use your approach as a starting point for an algorithm
>>> that can cope with this large dataset.
>>>
>>> Any suggestion for getting this done quickly is still very welcome.
>>>
>>> Best regards and thanks again,
>>>
>>> Jens
>>>
>>> -------- Original Message --------
>>>> Date: Thu, 25 Aug 2011 15:20:58 +0100
>>>> From: Nick Cox <njcoxstata@gmail.com>
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Relative Comparision between Observations
>>>
>>>> For -transaction[2]- (e.g.) you can generate
>>>>
>>>> . gen within_2 = inrange(transaction[2], start, end) & isspread
>>>>
>>>> Is the number of transactions small enough to allow a variable for
>>>> every one of them?
>>>>
>>>> If so, this is crude but should work:
>>>>
>>>> forval i = 1/`=_N' {
>>>>     if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
>>>> }
>>>>
>>>> A visceral reaction is that getting the wrong data structure is
>>>> horribly easy here, but people who work with this kind of data may be
>>>> able to advise constructively.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <chaiselongue@gmx.de> wrote:
>>>>> Hi Nick,
>>>>> let's say the data look like this:
>>>>>
>>>>> id  isspread  start  end  transaction
>>>>> 1   1         3      6    .
>>>>> 2   0         .      .    5
>>>>> 3   1         2      5    .
>>>>> 4   0         .      .    5.5
>>>>>
>>>>> Now what I want Stata to do is to tell me (for example, by creating
>>>>> additional variables that contain the ids) that ids 2 and 4 occurred
>>>>> between the start and end date of observation 1 (5 and 5.5 are between
>>>>> 3 and 6) and that id 2 occurred between the start and end date of
>>>>> spread 3 (5 is weakly between 2 and 5).
>>>>> A perfect result of the procedure would look like this:
>>>>>
>>>>> id  isspread  start  end  transaction  tr1  tr2
>>>>> 1   1         3      6    .            2    4
>>>>> 2   0         .      .    5            .    .
>>>>> 3   1         2      5    .            2    .
>>>>> 4   0         .      .    5.5          .    .
>>>>>
>>>>> Best, Jens
>>>>>
>>>>> -------- Original Message --------
>>>>>> Date: Thu, 25 Aug 2011 14:22:19 +0100
>>>>>> From: Nick Cox <njcoxstata@gmail.com>
>>>>>> To: statalist@hsphsun2.harvard.edu
>>>>>> Subject: Re: st: Relative Comparision between Observations
>>>>>
>>>>>> Please show a representative chunk of your data so that it becomes
>>>>>> clear precisely what your variables and your observations are.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 2:09 PM, <chaiselongue@gmx.de> wrote:
>>>>>>
>>>>>>> I want to perform the following task for a very large dataset (so
>>>>>>> writing a Mata loop is probably not the solution): the dataset consists
>>>>>>> of two sorts of data: spreads and transactions.
Spreads do have a start and >> an >> >> end >> >> >> date, while transactions only have a transaction date. Now I want to >> >> know >> >> >> whether some transaction happend between the start and end date of a >> >> spread. >> >> >> Ideally, I would like to have variables containing all the ids of >> >> >> transactions that occured between the start and end data of the >> spread >> >> for each >> >> >> spread. Is there a way to use inexact matching or merging for this ? >> >> >> > This should be a familiar problem, however, I do not have a clue >> how >> >> to >> >> >> solve it. >> >> >> >> * >> >> * For searches and help try: >> >> * http://www.stata.com/help.cgi?search >> >> * http://www.stata.com/support/statalist/faq >> >> * http://www.ats.ucla.edu/stat/stata/ >> > >> > -- >> > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir >> > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de >> > * >> > * For searches and help try: >> > * http://www.stata.com/help.cgi?search >> > * http://www.stata.com/support/statalist/faq >> > * http://www.ats.ucla.edu/stat/stata/ >> > >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > > -- > NEU: FreePhone - 0ct/min Handyspartarif mit Geld-zurück-Garantie! > Jetzt informieren: http://www.gmx.net/de/go/freephone > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: Relative Comparision between Observations**, *From:* chaiselongue@gmx.de
- **Re: st: Relative Comparision between Observations**, *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Relative Comparision between Observations**, *From:* "Jens Kruk" <chaiselongue@gmx.de>
- **Re: st: Relative Comparision between Observations**, *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Relative Comparision between Observations**, *From:* chaiselongue@gmx.de
- **Re: st: Relative Comparision between Observations**, *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Relative Comparision between Observations**, *From:* "Jens Kruk" <chaiselongue@gmx.de>
