



Re: st: Going through each observation of a variable


From   David Kantor <kantor.d@att.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Going through each observation of a variable
Date   Sat, 08 Jun 2013 17:44:00 -0400

Hello Derya,
See below.

At 02:52 PM 6/8/2013, you wrote:
Hi David,

The data are organized as follows: I simply copy-pasted these prices into the data as additional variables. The variables Price1 and Price2 have 500 observations, each row representing a price vector. os1 and os2 are the expenditure shares for each individual and have 80,000 observations.

I am computing Y for each individual as the expenditure share of good 1 (os1) multiplied by the price of good 1 (P1), plus the same for good 2. If I had only a single price vector, this would be straightforward to compute. I could just write 'gen Y=os1*P1+os2*P2'. But I have 500 different price vectors. I would like to generate Y 500 times, and take the average across the 500.

The program I posted chooses randomly from these price vectors. But I don't want randomness at this stage. I would like to compute Y for each price vector, one by one. This is what I meant by replication.

Here is an example with 3 price vectors and 10 individuals to show what I am trying to do: https://www.dropbox.com/s/boslxhpkyljcq45/Book1.xlsx

Thanks again, greatly appreciated!

Derya
[...]

It seems that you have two datasets:
1. prices: 500 observations
2. individuals: 80,000 observations

Or that in some virtual sense, this is what you have.
But it's still not clear how you have it organized.
Are all these observations packed into one dataset? If they are together, stacked on top of each other, then that is a meaningless "structure". On the other hand, maybe you need a cross-product of the two datasets; maybe it is already in that form. It would have 40,000,000 observations (80,000 x 500). That's a lot of data, and redundant, but it might be the right shape to do the job. If it's not already in that shape, then you can combine them with -cross-.

But again, that's a lot of observations. Your system might choke.
I would guess that your main dataset is the individuals, and each individual needs to be mated with the prices data.
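
A minimal sketch of the -cross- route, assuming the two sets of observations are saved as separate files. The file names individuals.dta and prices.dta and the identifier variable id are hypothetical; os1, os2, Price1 and Price2 are the names from your description. This is untested against your data and will need enough memory for 40,000,000 observations.

```stata
* Sketch only: individuals.dta, prices.dta and id are assumed names
use individuals, clear                 // 80,000 observations: id os1 os2
cross using prices                     // all pairwise combinations: 80,000 x 500
generate double Y = os1*Price1 + os2*Price2
collapse (mean) Ybar = Y, by(id)       // average Y over the 500 price vectors
```

After -collapse-, the data hold one observation per individual, with Ybar as the average; it can be merged back onto the individuals file by id if the other variables are needed.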

It may be better to store the prices data in a matrix or a virtual matrix using macros. Maybe that's what you have in mind. It may be a situation that works well in Mata, but I am no expert in that.
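
One way to avoid building the cross-product at all is sketched below. It assumes the layout you described, with Price1 and Price2 filled in only for observations 1 to 500 of the same dataset that holds os1 and os2 in all 80,000 observations. Rather than a matrix or macros, it uses explicit subscripting (Price1[j] is the value of Price1 in observation j) and accumulates a running sum. Ysum and Ybar are hypothetical names.

```stata
* Sketch only: assumes Price1 and Price2 sit in observations 1-500
* of the same dataset as os1 and os2
generate double Ysum = 0
forvalues j = 1/500 {
    * Price1[`j'] and Price2[`j'] are the j-th price vector,
    * applied to every individual at once
    quietly replace Ysum = Ysum + os1*Price1[`j'] + os2*Price2[`j']
}
generate double Ybar = Ysum/500        // average of Y over the 500 price vectors
```

Because Y is linear in the prices, Ybar should also equal os1 times the mean of Price1 plus os2 times the mean of Price2 (means taken over the 500 price rows), which gives a quick check on the loop.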

Other options:
        create wide data; still a lot of data.
        step through each individual, grabbing one at a time; cross that with the prices, and output the result (or write it to a Stata data file).

We can best proceed if you clarify how each of these two datasets is stored, and if they are together, how.
--David

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

