Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Relative Comparision between Observations

 From [email protected] To [email protected] Subject Re: st: Relative Comparision between Observations Date Thu, 25 Aug 2011 16:43:34 +0200

```Hi Nick,
thanks a lot.
The dataset contains 500 000 transactions (in addition to the 7 million spreads), but I will use your approach as a starting point for an algorithm that can cope with this large dataset.

Any suggestion to get this done quickly is still very welcome.

Best regards and thanks again,

Jens

-------- Original Message --------
> Date: Thu, 25 Aug 2011 15:20:58 +0100
> From: Nick Cox <[email protected]>
> To: [email protected]
> Subject: Re: st: Relative Comparision between Observations

> For -transaction[2]- (e.g.) you can generate
>
> . gen within_2 = inrange(transaction[2], start, end) & isspread
>
> Is the number of transactions small enough to allow a variable for
> every one of them?
>
> If so, this is crude but should work
>
> forval i = 1/`=_N' {
>      if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
> }
>
> A visceral reaction is that getting the wrong data structure is
> horribly easy here, but people who work with this kind of data may be
>
> Nick
>
> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <[email protected]> wrote:
> > Hi Nick,
> > let's say the data looks like this:
> >
> > 1_____1___________3________6______.
> > 2_____0___________.________.______5
> > 3_____1___________2________5______.
> > 4_____0___________.________.______5.5
> >
> >
> >
> > now what I want Stata to do is to tell me (for example by creating additional variables that contain the ids) that ids 2 and 4 occurred between the start and end date of observation 1 (5 and 5.5 are between 3 and 6) and that id 2 occurred between the start and end date of spread 3 (5 is weakly between 2 and 5).
> > A perfect result of the procedure would look like this:
> >
> > 1_____1___________3________6______.______________2_____4__
> > 2_____0___________.________.______5______________._____.__
> > 3_____1___________2________5______.______________2_____.__
> > 4_____0___________.________.______5.5____________._____.__
> >
> >
> > Best, Jens
> >
> >
> >
> >
> > -------- Original Message --------
> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
> >> From: Nick Cox <[email protected]>
> >> To: [email protected]
> >> Subject: Re: st: Relative Comparision between Observations
> >
> >> Please show a representative chunk of your data so that precisely what
> >>
> >> Nick
> >>
> >> On Thu, Aug 25, 2011 at 2:09 PM,  <[email protected]> wrote:
> >>
> >> > I want to perform the following task for a very large dataset (so writing a Mata loop is probably not the solution): the dataset consists of two sorts of data: spreads and transactions. Spreads have a start and an end date, while transactions only have a transaction date. Now I want to know whether some transaction happened between the start and end date of a spread. Ideally, I would like to have, for each spread, variables containing the ids of all transactions that occurred between the start and end date of the spread. Is there a way to use inexact matching or merging for this?
> >> > This should be a familiar problem; however, I do not have a clue how to solve it.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
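One alternative to a loop over transactions, sketched here only for the toy data from the thread, is to pair every spread with every transaction using `cross` and then keep the pairs where the transaction date falls within the spread's interval. The variable names `isspread`, `start`, `end`, and `transaction` follow Nick's code; `id`, `spread_id`, `trans_id`, and `j` are assumed names introduced for illustration. This is a minimal sketch, not a tested solution for the full dataset: `cross` forms all pairwise combinations, so with 7 million spreads and 500 000 transactions it would have to be applied to manageable blocks of the data rather than all at once.

```stata
* Toy data from the thread (id is an assumed variable name)
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Put the transactions in a temporary file of their own
preserve
keep if isspread == 0
keep id transaction
rename id trans_id
tempfile trans
save `trans'
restore

* Keep the spreads and form every spread x transaction pair
keep if isspread == 1
keep id start end
rename id spread_id
cross using `trans'

* Retain only transactions dated (weakly) inside the spread's interval
keep if inrange(transaction, start, end)

* Number the matches within each spread and go wide: one id variable per match
bysort spread_id (trans_id): gen j = _n
drop transaction
reshape wide trans_id, i(spread_id) j(j)
list, noobs
```

On the toy data this leaves spread 1 with transaction ids 2 and 4 and spread 3 with transaction id 2, which is the layout Jens describes as the perfect result. Spreads with no matching transaction drop out at the `keep if` step; merging the result back onto the spread records would restore them.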
