



st: exploratory data analysis for finding substitutes and complements


From   "Dimitriy V. Masterov" <dvmaster@gmail.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   st: exploratory data analysis for finding substitutes and complements
Date   Fri, 30 Sep 2011 11:34:50 -0400

I have a panel data set with store-level sales data for 125 items at a
chain restaurant. My variables are the quantity of each item sold in a
particular store and week. My data look like this: store_id, week,
hot_dogs, burgers, fries, and drinks. For each item, I would like to
figure out which other items are its substitutes or complements. For
example, I would expect hamburgers and fries, and hot dogs and fries,
to be complements, while hot dogs and hamburgers should be substitutes.
I would like to group items into clusters to make some time-series
graphs, since plotting all 125 items on the same graph is messy.

My first attempt at this involved calculating pairwise correlations
between items and keeping the pairs whose correlation exceeds some
threshold X in absolute value. This works reasonably well, but I don't
want to do this by hand for all the items, and my loop-over-items
approach is slow and inefficient.
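A minimal sketch of the thresholded-correlation step, written in Python rather than Stata and assuming the data are already in wide form (one row per store-week, one column per item, as described above). The function name `correlated_pairs` and the toy numbers are hypothetical. The point is that `numpy.corrcoef` returns the whole item-by-item correlation matrix in one call, so every pair above the threshold X can be pulled out without looping over items. The "complement"/"substitute" label just follows the post's sign intuition (positive correlation read as complements, negative as substitutes), so treat it as a rough screening heuristic.

```python
import numpy as np


def correlated_pairs(sales, names, threshold):
    """Return (item_a, item_b, r, label) for pairs with |r| >= threshold.

    sales: 2-D array, one row per store-week, one column per item.
    names: item labels in the same order as the columns of `sales`.
    """
    # One call builds the full item-by-item correlation matrix
    # (rowvar=False treats columns as variables): no loop over items.
    r = np.corrcoef(sales, rowvar=False)
    # Upper triangle without the diagonal, so each pair is reported once.
    i, j = np.triu_indices_from(r, k=1)
    keep = np.abs(r[i, j]) >= threshold
    pairs = []
    for a, b in zip(i[keep], j[keep]):
        rho = float(r[a, b])
        # Sign-based heuristic only: positive -> complement, negative -> substitute.
        label = "complement" if rho > 0 else "substitute"
        pairs.append((names[a], names[b], rho, label))
    return pairs


# Hypothetical toy data: columns follow the post's example items.
names = ["hot_dogs", "burgers", "fries", "drinks"]
sales = np.array([
    [7, 13, 20, 31],
    [11, 11, 22, 27],
    [12, 9, 21, 33],
    [10, 7, 17, 29],
])
for pair in correlated_pairs(sales, names, threshold=0.25):
    print(pair)
```

In the made-up data, fries is constructed as burgers plus hot_dogs and drinks is uncorrelated with the other three items, so drinks drops out at X = 0.25 while the other three pairs are kept.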

Is there a command that can accomplish this for me? Or is there a
better way of doing this using some sort of clustering algorithm?
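One clustering option, sketched in Python under the same wide-data assumption, is single-linkage hierarchical clustering on the distance 1 - |r|. Cutting that tree at height 1 - X gives exactly the connected components of the graph that links two items whenever |r| >= X, so a small union-find pass over the correlation matrix is enough. The name `cluster_items` and the toy data are hypothetical. Using |r| deliberately puts strong substitutes and complements in the same group, which suits plotting related items together; using r instead would group only items that move together.

```python
import numpy as np


def cluster_items(sales, names, threshold):
    """Single-linkage clusters on distance 1 - |r|, cut at 1 - threshold.

    Equivalent to the connected components of the graph that joins
    two items whenever |r| >= threshold. Largest clusters come first.
    """
    r = np.abs(np.corrcoef(sales, rowvar=False))
    n = len(names)
    parent = list(range(n))  # union-find forest, one node per item

    def find(k):
        # Walk up to the root, halving the path along the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Merge the clusters of every pair whose |r| clears the threshold.
    for a in range(n):
        for b in range(a + 1, n):
            if r[a, b] >= threshold:
                parent[find(a)] = find(b)

    # Collect item names by root, then order clusters by size.
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(names[k])
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy data: columns follow the post's example items.
names = ["hot_dogs", "burgers", "fries", "drinks"]
sales = np.array([
    [7, 13, 20, 31],
    [11, 11, 22, 27],
    [12, 9, 21, 33],
    [10, 7, 17, 29],
])
print(cluster_items(sales, names, threshold=0.5))
```

Single linkage can chain: in the toy data, hot_dogs and fries land in the same cluster at X = 0.5 through burgers, even though their own correlation is below 0.5. Average linkage (for example `scipy.cluster.hierarchy.linkage` with `method="average"`) is less prone to this if chaining turns out to matter with 125 items.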

DVM

