Statalist The Stata Listserver



st: RE: crosstab with a large dataset


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: crosstab with a large dataset
Date   Thu, 2 Feb 2006 18:33:41 -0000

The main issue here seems to be getting Stata to 
be smart enough to recognise (for example) 
that "GRANT" and "SONYA GRANT" are the same 
person. You could try working in terms of 
last name only, which would be 

word(teacher, -1) 

-- but this might create the opposite problem 
of conflating different teachers who share a last name. 
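A minimal Stata sketch of the last-name idea, assuming the variables are called schirn and teacher as in the question. It also flags only those surnames that occur under more than one spelling within the same school, which keeps the list to check short. The toy rows are copied from the sample output in the question, and the new variable names lastname, onetag and multispell are hypothetical. Entries such as "FLEMING RACHE" appear to be entered surname first, so the last word would not be a surname there.

```stata
* Hedged sketch: toy rows copied from the question's output;
* with real data these would already be in memory.
clear
input int schirn str20 teacher
2758 "SMITH"
2758 "MILLER"
2766 "GRANT"
2766 "SONYA GRANT"
2766 "SONYA GRANT"
2766 "L SMITH"
2766 "CAMPBELL"
end

* Surname = last word after upper-casing and trimming (hypothetical name)
generate lastname = word(upper(trim(teacher)), -1)

* Mark one observation per distinct school-teacher spelling
egen byte onetag = tag(schirn teacher)

* Within school and surname, flag groups holding more than one spelling
bysort schirn lastname (teacher): generate byte multispell = teacher[1] != teacher[_N]

* List the competing spellings, one line per distinct spelling
list schirn teacher lastname if multispell & onetag, sepby(schirn lastname) noobs
```

Because comparisons are made within school, SMITH (school 2758) and L SMITH (school 2766) are kept apart; in the toy data only GRANT and SONYA GRANT are listed.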

Alternatively, -groups- on SSC has various options 
that might be useful. 
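For the single compact table asked about, one hedged possibility with official commands is to reduce the data to one row per school-teacher pair and list everything at once. The -groups- call appears only in a comment and is an assumption about that package's basic syntax; the toy rows (taken from the question's output) and the variable name nstudents are hypothetical.

```stata
* Hedged sketch: toy rows taken from the question's output; real data
* would already be in memory.
clear
input int schirn str20 teacher
2758 "MILLER"
2758 "SMITH"
2758 "SMITH"
2766 "GRANT"
2766 "SONYA GRANT"
2766 "SONYA GRANT"
end

* With -groups- installed from SSC (ssc install groups), the assumed call is
*     groups schirn teacher
* The lines below use only official commands.

preserve
* One row per distinct school-teacher pair, with a count of students
contract schirn teacher, freq(nstudents)
* All schools in one listing, with a separator line between schools
list schirn teacher nstudents, sepby(schirn) noobs
restore
```

-preserve- and -restore- leave the original student-level data in memory once the listing has been produced.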

Nick 
[email protected] 

Gushta, Matthew

> I have a dataset containing student test scores. Within these data are
> district, school, and teacher variables. I will be running a mixed
> model incorporating all of these variables. Unfortunately, the teacher
> variable is a manually entered string variable. This means that within
> school X there might be teachers A, B, and C; however, because of
> variations in data entry, the same teacher may appear under different
> names.
> 
> In order to QC this and recode teacher values where appropriate, I
> would like to cross-tabulate the school and teacher variables so that
> only the unique teacher values appear within each school. As the syntax
> and sample output below show, each school is presented in a separate
> table, and teacher "GRANT" appears twice in school 2766 (as GRANT and
> SONYA GRANT).
> 
> ...Given 2105 districts and 5262 teachers, this output is quite
> cumbersome.
> 
> Is there a simpler, more compressed format for such output, i.e., a
> single table?
> 
> bysort schirn: tab teacher
> 
> **************************************************
> OUTPUT
> --------------------------------------------------
> -> schirn = 2758
> 
>       TEACHER |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>      HANTHORX |         14       31.11       31.11
>        MILLER |         15       33.33       64.44
>         SMITH |         16       35.56      100.00
> --------------+-----------------------------------
>         Total |         45      100.00
> 
> --------------------------------------------------
> -> schirn = 2766
> 
>       TEACHER |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>      CAMPBELL |         24        7.50        7.50
>     DOLORESCO |         23        7.19       14.69
> FLEMING RACHE |         25        7.81       22.50
>         GRANT |          1        0.31       22.81
>          HAAS |         25        7.81       30.63
>      HARRISON |         25        7.81       38.44
>         JONES |         25        7.81       46.25
>       L SMITH |         25        7.81       54.06
>         LABUS |         25        7.81       61.88
>         OWENS |         25        7.81       69.69
>       SMIALEK |         22        6.88       76.56
>   SONYA GRANT |         25        7.81       84.38
>      STAUFFER |         25        7.81       92.19
>       WELLING |         25        7.81      100.00
> --------------+-----------------------------------
>         Total |        320      100.00
> 



