
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: st: exploratory data analysis for finding substitutes and complements

From   Cameron McIntosh <[email protected]>
To   STATA LIST <[email protected]>
Subject   RE: st: exploratory data analysis for finding substitutes and complements
Date   Fri, 30 Sep 2011 13:11:16 -0400

Hi Dimitriy,
This type of analysis might be a bit dicey without basket data (one record per customer transaction, with the transaction date and the items purchased), but I don't think aggregated ("ecological") store-level data is completely prohibitive, either; the Nestorov and Jukić (2003) paper below discusses this. I'm not sure what is available in Stata specifically, but the references below cover association-rule mining in general, and the arules package implements it in R:
Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2011, September 19). Mining Association Rules and Frequent Itemsets: Package 'arules', Version 1.0-6.
Hahsler, M., Chelluboina, S., Hornik, K., & Buchta, C. (2011). The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets. Journal of Machine Learning Research, 12, 2021-2025.
Zhang, S., & Wu, X. (2011). Fundamentals of association rules in data mining and knowledge discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(2), 97-116.
Ben Messaoud, R., Loudcher Rabaséda, S., Missaoui, R., & Boussaid, O. (2008). OLEMAR: An On-Line Environment for Mining Association Rules in Multidimensional Data. In D. Taniar (Ed.), Data Mining and Knowledge Discovery Technologies (pp. 1-35). IGI Global.
Khan, A., Baharudin, B., & Khan, K. (2011). Mining customer data for decision-making using new hybrid classification algorithm. Journal of Theoretical and Applied Information Technology, 27(1), 54-61.
Nestorov, S., & Jukić, N. (2003). Ad-Hoc Association-Rule Mining within the Data Warehouse. Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03), Track 8, Volume 8. Washington, DC, USA: IEEE Computer Society.
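The core measures behind the association-rule references above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the arules API: it assumes basket data (each transaction a list of item names), which is exactly what the store-level panel lacks, and the baskets and the helper name `pair_rules` are made up. For each item pair it computes support P(A and B), confidence P(B | A) and lift P(B | A) / P(B).

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.1):
    """Support, confidence and lift of every rule A -> B between two items
    whose joint support is at least min_support (hypothetical helper)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count each unordered pair at most once per basket.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # P(A and B)
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Made-up baskets, for illustration only.
BASKETS = [
    ["burger", "fries"],
    ["hot_dog", "fries"],
    ["burger", "fries", "drink"],
    ["hot_dog", "drink"],
    ["burger", "drink"],
]

if __name__ == "__main__":
    for lhs, rhs, s, c, lift in sorted(pair_rules(BASKETS), key=lambda r: -r[4]):
        print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
```

A lift above 1 means two items are bought together more often than independent purchasing would predict, and below 1 less often; in this toy data burger -> fries has lift 10/9 (about 1.11) and hot_dog -> fries has lift 5/6 (about 0.83). Apriori-style miners extend the same idea to larger itemsets, using the minimum support to prune candidates.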
> Date: Fri, 30 Sep 2011 11:34:50 -0400
> Subject: st: exploratory data analysis for finding substitutes and complements
> From: [email protected]
> To: [email protected]
> I have a panel data set with store-level sales data for 125 items at a
> chain restaurant. My variables are the quantity of each item sold in a
> particular store and week. My data looks like this: store_id, week,
> hot_dogs, burgers, fries, and drinks. For each item, I would like to
> figure out which other items are its substitutes or complements. For
> example, I would expect hamburgers and fries, and hot dogs and fries, to
> be complements, while hot dogs and hamburgers would be substitutes. I
> would like to group items into clusters so I can make some time-series
> graphs, since plotting all 125 items on the same graph is messy.
> My first attempt at this involved calculating pairwise correlations
> between items and keeping the pairs whose correlation exceeds some
> threshold X in absolute value. This works reasonably well, but I don't
> want to do this by hand for all the items, and my loop-over-items
> approach is slow and inefficient.
> Is there a command that can accomplish this for me? Or is there a
> better way of doing this using some sort of clustering algorithm?
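For the correlation approach in the question, here is a minimal Python sketch. The thread contains no code, so the language, the row layout, the helper names (`demean_by_store`, `cluster_items`, `candidate_substitutes`) and the numbers are all assumptions for illustration. It builds the full pairwise correlation table once instead of looping item by item, then groups items by single-linkage clustering: two items land in the same cluster when a chain of pairs with correlation at or above the threshold connects them, which is the same as cutting a single-linkage tree on the distance 1 − r at 1 − threshold. It first subtracts each store's mean sales of each item, on the assumption that store size would otherwise make nearly every pair look like complements. Strongly negative pairs are listed separately as candidate substitutes; co-movement in sales is only suggestive of substitution, not proof of it.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

ITEMS = ["hot_dogs", "burgers", "fries", "drinks"]

# Hypothetical weekly sales for two stores of very different size (made up).
EXAMPLE = {
    "A": {"hot_dogs": [1, 2, 3, 4], "burgers": [4, 3, 2, 1],
          "fries": [8, 6, 4, 2], "drinks": [1, 3, 1, 3]},
    "B": {"hot_dogs": [11, 12, 13, 14], "burgers": [14, 13, 12, 11],
          "fries": [18, 16, 14, 12], "drinks": [11, 13, 11, 13]},
}
# One dict per store-week, mirroring the store_id, week, items... layout.
ROWS = [
    {"store_id": s, "week": w + 1, **{it: EXAMPLE[s][it][w] for it in ITEMS}}
    for s in EXAMPLE
    for w in range(4)
]


def demean_by_store(rows, items):
    """Subtract each store's mean of each item to remove store-size effects."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r in rows:
        counts[r["store_id"]] += 1
        for it in items:
            sums[r["store_id"]][it] += r[it]
    return [
        {it: r[it] - sums[r["store_id"]][it] / counts[r["store_id"]] for it in items}
        for r in rows
    ]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def correlation_pairs(rows, items):
    """Correlation for every unordered pair of items, computed once."""
    cols = {it: [r[it] for r in rows] for it in items}
    return {(a, b): pearson(cols[a], cols[b]) for a, b in combinations(items, 2)}


def cluster_items(corr, items, threshold=0.5):
    """Single-linkage clusters: connected components of the graph with an
    edge wherever r >= threshold, found with union-find."""
    parent = {it: it for it in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), r in corr.items():
        if r >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for it in items:
        groups[find(it)].append(it)
    return sorted(groups.values(), key=len, reverse=True)


def candidate_substitutes(corr, threshold=0.5):
    """Pairs whose correlation is at or below -threshold."""
    return sorted(pair for pair, r in corr.items() if r <= -threshold)


if __name__ == "__main__":
    corr = correlation_pairs(demean_by_store(ROWS, ITEMS), ITEMS)
    print(cluster_items(corr, ITEMS, threshold=0.5))
    print(candidate_substitutes(corr, threshold=0.5))
```

In this toy data, skipping the demeaning step gives hot_dogs and burgers a correlation of about 0.9, simply because store B sells more of everything; after demeaning it is −1. With 125 items the pairwise table has 7,750 entries, so computing it in one go is cheap.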

© Copyright 1996–2018 StataCorp LLC