Re: st: cluster analysis validation


From   "Dasinger, Lisa" <ldasinger@thezenith.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: cluster analysis validation
Date   Mon, 23 Apr 2012 11:45:04 -0700

Paul -  Thank you for your response.  I have been using -cluster
wardslinkage-, which produces the same result as long as the same
dataset is used.  It sounds like you are talking about kmeans.  In any
case, I will look into your program as it may also be helpful here.
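
The difference between the two methods can be checked directly. The
following is a minimal sketch, not anyone's actual analysis: the auto
dataset, the variables price, mpg, and weight, the choice of 3 groups,
and the seeds 1 and 2 are all illustrative assumptions.

```stata
* Sketch only: the auto data and these variables are assumptions,
* not the data discussed in this thread.
sysuse auto, clear

* Ward's linkage has no random component, so two runs on the same
* data should give the same grouping.
cluster wardslinkage price mpg weight, name(ward1)
cluster generate w1 = groups(3), name(ward1)
cluster wardslinkage price mpg weight, name(ward2)
cluster generate w2 = groups(3), name(ward2)
tabulate w1 w2

* kmeans with random starting centers may give different groupings
* from run to run; the seeds 1 and 2 are arbitrary.
cluster kmeans price mpg weight, k(3) name(km1) start(krandom(1))
cluster kmeans price mpg weight, k(3) name(km2) start(krandom(2))
tabulate km1 km2
```

If the first table has all observations on one diagonal pattern while
the second does not, the instability comes from the random start in
kmeans rather than from clustering as such.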

Lisa


Date: Thu, 19 Apr 2012 17:41:56 -0400
From: Paul Millar <paulmi@nipissingu.ca>
Subject: Re: st: cluster analysis validation

Lisa,

Cluster analysis is empirical - the group assignments are based on
minimizing the "distance" between cases, given the number of groups.
So a different sample entails different groups.  If you run cluster
analysis many times on the same data you will also get different
results for the same data (because the starting cases are chosen
randomly, which you can control with -set seed-).  I have written a
routine that tests the reliability of a group assignment by sampling
the group assignments and then estimating whether the probability of
assignment in a particular group is > 0.5 in the population of group
assignments.  See -help clustpop- after -ssc install clustpop-.

You can also do this after pooling the data, as suggested earlier.

- - Paul


Lisa Dasinger, Ph.D.
Data Reporting Manager
Claims Analytics
 
Zenith Insurance Company
Pleasanton Regional Office
4309 Hacienda Drive, Suite 200
Pleasanton, CA 94588
 
Phone: 925.416.5235
RightFax: 925.460.1235
Branch: 925.460.0600
ldasinger@thezenith.com
 
www.TheZenith.com
 






