st: Cluster Analysis: Probability, Reliability - Arbitrariness


From   Mathias Fürer <mathias.fuerer@students.unibe.ch>
To   statalist@hsphsun2.harvard.edu
Subject   st: Cluster Analysis: Probability, Reliability - Arbitrariness
Date   Sat, 2 Oct 2010 14:10:29 +0200

Dear Cluster-Analysis Specialists

I want to do a cluster analysis on certain variables. I chose to recode the ordinal variables into binary ones and to run the following cluster analysis:
type: hierarchical, method: average linkage, similarity: Kulczynski
I have 4,496 observations, from which I drew a random sample of 100 so that I could draw a dendrogram of the clusters. To check reliability, I drew several samples (without replacement), using "set seed" so that the results can be replicated. Done this way, Stata produces completely different clusters from the different samples, even though the procedure is the same. This is bad for reliability.
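
For reference, Stata's multivariate manual ([MV] measure_option) defines the Kulczynski similarity of two observations i and j on binary variables as follows, where a counts the variables on which both are 1, b those on which i is 1 and j is 0, and c those on which i is 0 and j is 1:

$$S_{ij} = \frac{1}{2}\left(\frac{a}{a+b} + \frac{a}{a+c}\right)$$

With one 0/1 indicator per category, as in the commands below, a respondent who has a 1 in each of the eight blocks (price, education, age, sex, other print media, internet, reading, free newspaper) has a + b = 8. For two such respondents the measure therefore reduces to a/8, the share of blocks in which they fall into the same category: agreeing on 5 of 8 gives a = 5, b = c = 3 and S = (5/8 + 5/8)/2 = 0.625. Since only a few distinct values are possible, tied similarities are likely, which may be one reason the tree changes from sample to sample.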

My deadline is very close, so any help is much appreciated. Perhaps one of you has done a similar cluster analysis and has a do-file to share that I could use as a guide.

Thanks a lot in advance and best regards,
Mathias

Here are my commands:

*---------- CONSTRUCTING BINARY VARIABLES FOR CLUSTER ANALYSES ----------

*Would have been easier with the following command:
*tab leseort, gen(readingplace)

recode preisabo (1=1) (2/8=0), gen(pa300)
la var pa300 "bis 300 Fr."
recode preisabo (2=1) (1=0) (3/8=0), gen(pa351)
la var pa351 "351-400 Fr."
recode preisabo (3=1) (1/2=0) (4/8=0), gen(pa401)
la var pa401 "401-450 Fr."
recode preisabo (4=1) (1/3=0) (5/8=0), gen(pa451)
la var pa451 "451-500 Fr."
recode preisabo (5=1) (1/4=0) (6/8=0), gen(pa501)
la var pa501 "501-550 Fr."
recode preisabo (6=1) (1/5=0) (7/8=0), gen(pa551)
la var pa551 "551-600 Fr."
recode preisabo (7=1) (1/6=0) (8=0), gen(pa600)
la var pa600 "über 600 Fr."
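
* Sketch (assumption, not part of the original do-file): as the comment at
* the top says, -tabulate, generate()- builds one 0/1 indicator per observed
* category in a single line, numbered in ascending order of the values
* (pa_1, pa_2, ...; the stubs pa_ and ed_ are hypothetical names). It also
* creates an indicator for category 8 of preisabo if that value occurs,
* which the recodes above leave out.
tabulate preisabo, generate(pa_)
tabulate educ, generate(ed_)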

recode educ (1=1) (2/10=0), gen(edos01)
la var edos01 "obligatorische Schulzeit"
recode educ (2=1) (1=0) (3/10=0), gen(edbl02)
la var edbl02 "2- bis 4-jährige Berufslehre"
recode educ (3=1) (1/2=0) (4/10=0), gen(eddm03)
la var eddm03 "Berufs- oder Diplom-Mittelschule"
recode educ (4=1) (1/3=0) (5/10=0), gen(edls04)
la var edls04 "LehrerInnen-Seminar"
recode educ (5=1) (1/4=0) (6/10=0), gen(edma05)
la var edma05 "Matur"
recode educ (6=1) (1/5=0) (7/10=0), gen(edfh06)
la var edfh06 "Fachhochschule (inkl. HTL, HWV usw.)"
recode educ (7=1) (1/6=0) (8/10=0), gen(edhf07)
la var edhf07 "Höhere Fachhochschule"
recode educ (8=1) (1/7=0) (9/10=0), gen(edmp08)
la var edmp08 "Meisterprüfung oder Äquivalent"
recode educ (9=1) (1/8=0) (10=0), gen(edun09)
la var edun09 "Universitäts- oder ETH-Studium"
recode educ (10=1) (1/9=0), gen(eddr10)
la var eddr10 "Promotion, Habilitation, Professur"

recode alter (10/19=1) (20/92=0), gen(age10)
recode alter (20/29=1) (10/19=0) (30/92=0), gen(age20)
recode alter (30/39=1) (10/29=0) (40/92=0), gen(age30)
recode alter (40/49=1) (10/39=0) (50/92=0), gen(age40)
recode alter (50/59=1) (10/49=0) (60/92=0), gen(age50)
recode alter (60/69=1) (10/59=0) (70/92=0), gen(age60)
recode alter (70/79=1) (10/69=0) (80/92=0), gen(age70)
recode alter (80/89=1) (10/79=0) (90/92=0), gen(age80)
recode alter (90/92=1) (10/89=0), gen(age90)

recode alter (10/19=1) (20/29=2) (30/39=3) (40/49=4) (50/59=5) ///
			 (60/69=6) (70/79=7) (80/89=8) (90/92=9), gen(dec_age)
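
* Sketch (assumption, not part of the original do-file): for whole-number
* ages from 10 to 92, floor(alter/10) gives the same decade codes 1-9 as the
* recode above (e.g. 47 -> 4, 92 -> 9). dec_age2 is a hypothetical name.
* Ages outside 10-92 become missing here, whereas recode keeps them as they are.
generate byte dec_age2 = floor(alter/10) if inrange(alter, 10, 92)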
				 
recode sex (0=1) (1=0), gen(sexma)
gen sexfe=sex

recode andereprintmedien (0=1) (1=0), gen(anprno)
gen anpryes=andereprintmedien

recode internet (0=1) (1=0), gen(intno)
gen intyes=internet

recode lektuere (1=1) (2/5=0), gen(lek10)
recode lektuere (2=1) (1=0) (3/5=0), gen(lek20)
recode lektuere (3=1) (1/2=0) (4/5=0), gen(lek30)
recode lektuere (4=1) (1/3=0) (5=0), gen(lek40)
recode lektuere (5=1) (1/4=0), gen(lek50)

recode gratiszeitung (0=1) (1=0), gen(gratisno)
gen gratisyes=gratiszeitung


*Sampling (S3)
set seed 36734638
sample 100, count
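
* Sketch (assumption, not part of the original do-file): -sample- permanently
* drops the unsampled observations, so a second -sample- in the same session
* (without reloading the data) draws from the 100 that are left. Flagging each
* draw keeps all 4,496 observations in memory; these lines would replace
* -sample 100, count- above. Three draws and the names draw1-draw3 are
* illustrative. The draws also depend on the sort order of the data when they
* are made, so sort on a unique identifier first if the data have one.
set seed 36734638
forvalues d = 1/3 {
    tempvar u
    generate double `u' = runiform()
    sort `u'
    generate byte draw`d' = (_n <= 100)
    drop `u'
}
* Each draw can then be analysed with an if qualifier, e.g.
* cluster averagelinkage <variables> if draw1, measure(Kulczynski) name(d1)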

*Cluster-Analysis
cluster averagelinkage pa300 pa351 pa401 pa451 pa501 ///
pa551 pa600 edos01 edbl02 eddm03 edls04 edma05 edfh06 ///
edhf07 edmp08 edun09 eddr10 age10 age20 age30 age40 ///
age50 age60 age70 age80 age90 sexma sexfe anprno ///
anpryes intno intyes lek10 lek20 lek30 lek40 lek50 ///
gratisno gratisyes, measure(Kulczynski)
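
* Sketch (assumption, not part of the original do-file): a subsample is not
* needed just to get a readable dendrogram. With all observations in memory
* (i.e. without the -sample- line above), the tree can be built on the full
* data, which may take a while with 4,496 observations, and
* -cluster dendrogram- told to draw only the top 30 branches via cutnumber().
* The names clvars, full, sub, grp_full, grp_sub and the choice of 4 groups
* are illustrative. If Stata reports that ties prevent exactly 4 groups,
* see the ties() option in -help cluster generate-.
local clvars pa300 pa351 pa401 pa451 pa501 pa551 pa600 edos01 edbl02 ///
    eddm03 edls04 edma05 edfh06 edhf07 edmp08 edun09 eddr10 age10 age20 ///
    age30 age40 age50 age60 age70 age80 age90 sexma sexfe anprno anpryes ///
    intno intyes lek10 lek20 lek30 lek40 lek50 gratisno gratisyes
cluster averagelinkage `clvars', measure(Kulczynski) name(full)
cluster dendrogram full, cutnumber(30)
cluster generate grp_full = groups(4), name(full)

* One way to judge stability: rerun the analysis on a random 100 and
* cross-tabulate its groups with grp_full for those 100 observations.
* Group numbers are arbitrary, so a stable result shows up as one large
* cell per row and column rather than as matching numbers.
set seed 36734638
tempvar u
generate double `u' = runiform()
sort `u'
cluster averagelinkage `clvars' in 1/100, measure(Kulczynski) name(sub)
cluster generate grp_sub = groups(4), name(sub)
tabulate grp_full grp_sub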


