


Re: st: exploratory data analysis for finding substitutes and complements

From   Nick Cox
Subject   Re: st: exploratory data analysis for finding substitutes and complements
Date   Fri, 30 Sep 2011 18:31:05 +0100

If this were my problem I would restructure to fewer variables and do
something like a correspondence analysis.
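
As a rough sketch of what that could look like in Stata, the lines below run -ca- on a small made-up table of items by stores. The data values, the variable names (item, item_id, store_id, quantity) and the choice of frequency weights are all assumptions for illustration; they are not taken from the original data.

```stata
* Toy data, already in long form and summed over weeks (values are made up)
clear
input str8 item store_id quantity
"hot_dogs" 1 22
"hot_dogs" 2 12
"hot_dogs" 3 30
"burgers"  1 38
"burgers"  2 58
"burgers"  3 20
"fries"    1 52
"fries"    2 58
"fries"    3 45
"drinks"   1 63
"drinks"   2 81
"drinks"   3 40
end

* -ca- is run here on a numeric row variable, so encode the item names
encode item, generate(item_id)

* Simple correspondence analysis of items (rows) by stores (columns),
* treating the integer quantities as cell frequencies
ca item_id store_id [fweight = quantity]
```

Typing -cabiplot- afterwards should plot the items and stores in the same space, so items with similar profiles across stores appear close together. Frequency weights require integer quantities.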

Ecologists often have data for lots of sites and lots of species, and
sometimes lots of times too. At first stab, their problem is similar:
you want to see which species occur together. With over a hundred
items, some favourite methods such as scatter plot matrices and
correlation matrices pose as many problems as they solve. I think you
have to bite the multivariate bullet somehow.

I would collapse over week in the first instance.

In Stata terms, you would probably need to -reshape long- to get a
structure of item, store, week, expenditure.
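
A minimal sketch of that restructuring, on made-up data in the wide layout described in the question. The qty_ prefix and the variable name quantity are my own choices for illustration, and I use quantity rather than expenditure because that is what the question says the data hold.

```stata
* Toy data in the wide layout from the question (values are made up)
clear
input store_id week hot_dogs burgers fries drinks
1 1 10 20 25 30
1 2 12 18 27 33
2 1  5 30 28 40
2 2  7 28 30 41
end

* -reshape long- needs a common stub, so prefix every item variable
foreach v of varlist hot_dogs burgers fries drinks {
    rename `v' qty_`v'
}

* One row per store, week and item; item names go into the string variable item
reshape long qty_, i(store_id week) j(item) string
rename qty_ quantity

* Collapse over week: total quantity per store and item
collapse (sum) quantity, by(store_id item)
list, sepby(store_id)
```

With the real data, the -foreach- list would cover all 125 item variables; a variable range or wildcard could replace the explicit names if the items are stored contiguously.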

2011/9/30 Cameron McIntosh:
> Hi Dimitriy,
> This type of analysis might be a bit dicey without basket data (record per customer with a transaction date, along with items purchased), but I don't imagine ecological data is completely prohibitive, either -- this is discussed in the Nestorov and Jukić (2003) paper below. I don't know about Stata specifically...
> Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (September 19, 2011). Mining Association Rules and Frequent Itemsets: Package 'arules', Version 1.0-6.
> Hahsler, M., Chelluboina, S., Hornik, K., & Buchta, C. (2011). The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets. Journal of Machine Learning Research, 12, 2021-2025.
> Zhang, S., & Wu, X. (2011). Fundamentals of association rules in data mining and knowledge discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(2), 97-116.
> Ben Messaoud, R., Loudcher Rabaséda, S., Missaoui, R., & Boussaid, O. (2008). OLEMAR: an On-Line Environment for Mining Association Rules in Multidimensional Data. In D. Taniar (Ed.), Data Mining and Knowledge Discovery Technologies (pp. 1-35). IGI Global.
> Khan, A., Baharudin, B., & Khan, K. (2011). Mining customer data for decision-making using new hybrid classification algorithm. Journal of Theoretical and Applied Information Technology, 27(1), 54-61.
> Nestorov, S., & Jukić, N. (2003). Ad-Hoc Association-Rule Mining within the Data Warehouse. Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 8 - Volume 8. Washington, DC, USA: IEEE Computer Society.
> Cam
>> Date: Fri, 30 Sep 2011 11:34:50 -0400
>> Subject: st: exploratory data analysis for finding substitutes and complements
>> I have a panel data set with store-level sales data for 125 items at a
>> chain restaurant. My variables are quantity sold of that item in a
>> particular store and time. My data looks like this: store_id, week,
>> hot_dogs, burgers, fries, and drinks. For each item, I would like to
>> figure out which items are substitutes or complements. For example, I
>> would expect hamburgers and fries, and hot dogs and fries, to be
>> complements, and hot dogs and hamburgers to be substitutes. I would
>> like to group items into clusters to make some time-series graphs, but
>> plotting all 125 items on the same graph is messy.
>> My first attempt at this involved calculating pairwise correlations
>> between items, and grabbing those where the correlation is above some
>> threshold X in absolute value. This works reasonably well, but I don't
>> want to do this by hand for all the items and my loop-over-items
>> approach is slow and inefficient.
>> Is there a command that can accomplish this for me? Or is there a
>> better way of doing this using some sort of clustering algorithm?
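
For the pairwise-correlation screen described in the question, one possible way to avoid looping over items by hand is to compute the whole correlation matrix once and then scan its lower triangle. This is only a sketch on made-up data: the threshold of 0.5 and the local macro names are assumptions, and correlations pooled over stores and weeks may partly reflect store size rather than substitution or complementarity.

```stata
* Toy data in the wide layout from the question (values are made up)
clear
input store_id week hot_dogs burgers fries drinks
1 1 10 20 25 30
1 2 12 18 29 33
2 1  8 22 21 31
2 2 14 16 33 40
3 1  9 21 23 29
3 2 11 19 27 35
end

local threshold 0.5

* One call gives the full correlation matrix in r(C)
quietly correlate hot_dogs burgers fries drinks
matrix C = r(C)
local items : colnames C
local k = colsof(C)

* Scan the lower triangle and report pairs beyond the threshold
local npairs 0
forvalues i = 2/`k' {
    local imax = `i' - 1
    forvalues j = 1/`imax' {
        * missing() guards against constant items, whose correlation is missing
        if !missing(C[`i', `j']) & abs(C[`i', `j']) > `threshold' {
            local a : word `i' of `items'
            local b : word `j' of `items'
            display as text "`a' and `b': " as result %6.3f C[`i', `j']
            local ++npairs
        }
    }
}
display as text "pairs beyond threshold: " as result `npairs'
```

Negative correlations beyond the threshold would be candidate substitutes and positive ones candidate complements, subject to the caveat above.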

