# Re: st: Relative Comparison between Observations

```
One important note: the number of transactions per spread should be small, probably smaller than 10 for virtually every spread.

Jens

> With that number, "my" approach can't get its shoes on, let alone run
> down the track -- although it is I think what you asked for.
> I think you really need advice from people who do your kind of thing
> in Stata, but unfortunately I am not one of them. I only have two
> broad thoughts: think long not wide, and at some point -merge- will be
> Nick
On Thu, Aug 25, 2011 at 3:43 PM,  <[email protected]> wrote:
> > Hi Nick,
> > thanks a lot.
> > The dataset contains 500 000 transactions (in addition to the 7 million
> > spreads), but I will use your approach as a starting point for an algorithm
> > that can cope with this large dataset.
> > Any suggestion to get this done quickly is still very welcome.
> >
> > Best regards and thanks again,
> >
> > Jens
> >> For -transaction[2]- (e.g.) you can generate
> >>
> >> . gen within_2 = inrange(transaction[2], start, end) & isspread
> >>
> >> Is the number of transactions small enough to allow a variable for
> >> every one of them?
> >>
> >> If so, this is crude but should work
> >>
> >> forval i = 1/`=_N' {
> >>      if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
> >> }
> >>
> >> A visceral reaction is that getting the wrong data structure is
> >> horribly easy here, but people who work with this kind of data may be
> >> able to advise constructively.
> >>
> >> Nick
> >>
On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <[email protected]> wrote:
> >> > Hi Nick,
> >> > let's say the data looks like this:
> >> >
> >> > 1_____1___________3________6______.
> >> > 2_____0___________.________.______5
> >> > 3_____1___________2________5______.
> >> > 4_____0___________.________.______5.5
> >> >
> >> >
> >> >
> >> > now what I want Stata to do is to tell me (for example by creating
> >> > additional variables that contain the ids) that ids 2 and 4 occurred
> >> > between the start and end date of observation 1 (5 and 5.5 are between 3
> >> > and 6) and that id 2 occurred between the start and end date of spread 3
> >> > (5 is weakly between 2 and 5).
> >> > A perfect result of the procedure would look like this:
> >> >
> >> > 1_____1___________3________6______.______________2_____4__
> >> > 2_____0___________.________.______5______________._____.__
> >> > 3_____1___________2________5______.______________2_____.__
> >> > 4_____0___________.________.______5.5____________._____.__
> >> >
> >> >
> >> > Best, Jens
> >> >
> >> >
> >> >
> >> >
> >> >> Please show a representative chunk of your data so that precisely
> what
> >> >> Nick
> >> >>
On Thu, Aug 25, 2011 at 2:09 PM,  <[email protected]> wrote:
> >> >> > I want to perform the following task for a very large dataset (so
> >> >> > writing a Mata loop is probably not the solution): the dataset consists
> >> >> > of two sorts of data: spreads and transactions. Spreads have a start and
> >> >> > an end date, while transactions only have a transaction date. Now I want
> >> >> > to know whether some transaction happened between the start and end date of a
> >> >> > Ideally, I would like to have variables containing all the ids of
> >> >> > transactions that occurred between the start and end date of the
> >> >> > for each spread. Is there a way to use inexact matching or merging for this?
> >> >> > This should be a familiar problem; however, I do not have a clue how to
> >> >> > solve it.
> >>
```
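Nick's advice to "think long not wide" can be illustrated on the four-observation example from the thread. The following is only a sketch, not the approach anyone in the thread settled on: it assumes the variable names `id`, `isspread`, `start`, `end` and `transaction` (the last four appear in Nick's code; `id` is inferred from the example), and the names `spread_id` and `trans_id` are made up for illustration. It splits the data into a spreads file and a transactions file, forms every spread-transaction pair with `cross`, and keeps the pairs where the transaction date falls weakly inside the spread's interval.

```stata
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Put the transactions in their own temporary file
preserve
keep if isspread == 0
keep id transaction
rename id trans_id
tempfile trans
save `trans'
restore

* Keep only the spreads in memory
keep if isspread == 1
keep id start end
rename id spread_id

* Form all spread-transaction pairs, then keep those where the
* transaction date lies in [start, end] (inrange() is inclusive)
cross using `trans'
keep if inrange(transaction, start, end)

* One row per (spread, transaction) match: long, not wide
sort spread_id trans_id
list spread_id trans_id, sepby(spread_id)
```

On the toy data this leaves three rows: spread 1 with transactions 2 and 4, and spread 3 with transaction 2, which matches the result Jens describes. Forming all pairs is not feasible for 7 million spreads and 500 000 transactions, so at full scale the pairing step would have to be restricted first, for example by joining only within a coarse date bucket with `joinby`; that part is left open here, as it is in the thread.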
