


Re: st: Going through each observation of a variable

From   Derya Karaci <>
To   "" <>
Subject   Re: st: Going through each observation of a variable
Date   Mon, 10 Jun 2013 07:26:14 -0700 (PDT)

Hi David, 
I do have a lot of data... That is why I was trying to avoid using -cross- on them to make 40 million observations. But that would do the job. I can compute the expression for each individual and price level, then collapse it over individuals to get the mean and the standard deviation. 

Instead of doing this, I tried to program it: if I can go through the price levels one by one in a loop, I can compute the mean as I go along. 

Currently they are all in one dataset. They are not stacked, but prices are added as additional variables. 
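
A loop of that kind might be sketched as follows. This is only an illustration under stated assumptions: Price1 and Price2 are non-missing in the first rows only (500 here), os1 and os2 are filled for every individual, and the names Ysum, Ysum2, Ymean and Ysd are hypothetical. It uses explicit subscripting (Price1[`j']) to pick out the value in row j, so no extra observations are created.

```stata
* Sketch only: Ysum, Ysum2, Ymean and Ysd are hypothetical names.
* Assumes Price1/Price2 are non-missing in rows 1..K and nowhere else.
quietly count if !missing(Price1, Price2)
local K = r(N)

generate double Ysum  = 0
generate double Ysum2 = 0

forvalues j = 1/`K' {
    * Price1[`j'] is the value of Price1 in row j, used as a constant
    quietly replace Ysum  = Ysum  + (os1*Price1[`j'] + os2*Price2[`j'])
    quietly replace Ysum2 = Ysum2 + (os1*Price1[`j'] + os2*Price2[`j'])^2
}

* Mean and standard deviation of Y across the K price vectors
generate double Ymean = Ysum/`K'
generate double Ysd   = sqrt(max(0, (Ysum2 - `K'*Ymean^2)/(`K' - 1)))
drop Ysum Ysum2
```

Each pass touches all individuals once, so memory use stays at the size of the original dataset; the cost is K passes over the data.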

"It may be better to store the prices data in a matrix or a virtual matrix using macros." This sounds promising but the prices are already stored as additional variables in the same data. I am wondering if I can use these variables...

Thanks again, David. Much appreciated.  


----- Original Message -----
From: David Kantor <>
Sent: Saturday, June 8, 2013 3:44 PM
Subject: Re: st: Going through each observation of a variable

Hello Derya,
See below.

At 02:52 PM 6/8/2013, you wrote:
>Hi David,
>Organization of the data is that I simply copy-pasted these prices 
>in the data as additional variables. The variables Price1 and Price2 
>have 500 observations, each row representing a price vector. os1 and 
>os2 are the expenditure shares for each individual and have 80,000 
>observations.
>I am computing Y for each individual as the expenditure share of 
>good1 for each individual (os1), multiplied by price of good 1 (P1) 
>plus the same for good 2. If I had only a single price vector, this 
>is straightforward to compute. I could just write 
>'gen Y=os1*P1+os2*P2'. But I have 500 different price vectors. I 
>would like to generate Y 500 times, and take the average across the 500.
>The program I posted chooses randomly from these price vectors. But I 
>don't want randomness at this stage. I would like to compute Y for 
>each price vector one by one...This is what I meant by replication.
>Here is an example with 3 price vectors and 10 individuals to show 
>what I am trying to do:
>Thanks again, greatly appreciated!

It seems that you have two datasets:
1, prices: 500 observations
2, individuals: 80000 observations

Or that in some virtual sense, this is what you have.
But it's still not clear how you have it organized.
Are all these observations packed into one dataset? If they 
are together, stacked on top of each other, then it is a 
meaningless "structure".
On the other hand, maybe you need a cross-product of the two 
datasets; maybe it is already in that form; it would have 40000000 
observations.
That's a lot of data, and redundant, but it might be the right 
shape to do the job. If it's not already in that shape, then you can 
combine them with -cross-.
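
A rough sketch of the -cross- route, under the assumption that everything currently sits in one dataset with os1 and os2 filled for every individual and Price1 and Price2 filled only in the price rows. The identifier id and the result names Ymean and Ysd are hypothetical, introduced here only for illustration.

```stata
* Sketch only: id, Ymean and Ysd are hypothetical names.
generate long id = _n               // assumed: no individual identifier yet

* Split the price vectors off into a temporary file
preserve
keep Price1 Price2
drop if missing(Price1, Price2)
tempfile prices
save `prices'
restore

* Pair every individual with every price vector
drop Price1 Price2
cross using `prices'

generate double Y = os1*Price1 + os2*Price2
collapse (mean) Ymean=Y (sd) Ysd=Y, by(id)
```

With 80000 individuals and 500 price vectors the crossed dataset holds 40000000 rows before the collapse, so this is only practical if memory allows.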

But again, that's a lot of observations. Your system might choke.
I would guess that your main dataset is the individuals, and each 
individual needs to be mated with the prices data.

It may be better to store the prices data in a matrix or a virtual 
matrix using macros. Maybe that's what you have in mind. It may be a 
situation that works well in Mata, but I am no expert in that.
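
One possible Mata sketch of the matrix idea, offered tentatively. It assumes Y is exactly os1*Price1 + os2*Price2, so that Y is linear in prices; in that case the mean and standard deviation of Y across the price vectors follow from the mean vector and covariance matrix of the prices, and the full cross-product never has to be formed. The result names Ymean and Ysd are hypothetical.

```stata
mata:
// Sketch only: Ymean and Ysd are hypothetical names for the results.
// Price rows with any missing value are skipped (selectvar = 0).
P = st_data(., ("Price1", "Price2"), 0)     // K x 2 price vectors
S = st_data(., ("os1", "os2"))              // N x 2 expenditure shares

// Y is linear in prices, so its mean and variance across the K price
// vectors come from the mean vector and covariance matrix of P.
m = mean(P)                                 // 1 x 2 column means
V = variance(P)                             // 2 x 2, divisor K-1

Ymean = S * m'
Ysd   = sqrt(rowsum((S * V) :* S))

// Write the results back as new Stata variables
st_store(., st_addvar("double", ("Ymean", "Ysd")), (Ymean, Ysd))
end
```

If Y were a nonlinear function of prices this shortcut would not apply, and a loop over the price rows would be needed instead.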

Other options:
         create wide data; still a lot of data.
         step through each individual grabbing one at a time; cross 
that with the prices, and output the result (or write it to a Stata data file).

We can best proceed if you clarify how each of these two datasets is 
stored -- and if they are together, how.
