# st: RE: Re: Constructing a group level variable

From: "Martin Weiss"; Subject: st: RE: Re: Constructing a group level variable; Date: Mon, 15 Feb 2010 20:21:09 +0100

```<>

" was not optimal it created a dataset of means,
and not counts at the school level (unless I was doing something
incorrect....)"

In fairness to -collapse-: it can produce all kinds of statistics,
not just means, as you can see from its help file. Glad my
solution worked out for you, though :-)

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of amardeep@ucla.edu
Sent: Monday, 15 February 2010 20:05
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Constructing a group level variable

Thanks to Nick and Martin for their replies. Suggestions I received
and their results are:

1) Use -collapse-: this was not optimal, as it created a dataset of means,
and not counts, at the school level (unless I was doing something
incorrect...)

2) Use -contract-:

. contract timepub08 timefin08 schid

. sort schid

. l, sepby(schid)

     +-------------------------------------+
     | schid   timep~08   timef~08   _freq |
     |-------------------------------------|
  1. |     2          2          2      15 |
  2. |     2          .          .       8 |
  3. |     2          2          1       4 |
  4. |     2          2          3       1 |
  5. |     2          3          3       5 |
     |-------------------------------------|
  6. |     4          .          .       4 |
  7. |     4          1          3       1 |
  8. |     4          3          1       8 |

Again, this was not precisely what I was looking for.

3) Use -reshape-:

reshape wide time*, i(schid) j(studid)

forv i=1/3{
egen byte timep`i' = anycount(timepub0*), values(`i')
egen byte timef`i' = anycount(timefin0*), values(`i')
}

drop timepub0* timefin0*
order schid timep* timef*

l, noo

Note: I had to make minor changes in the code (to correct the varnames).
This worked like a charm! Although repeating it on my large dataset
will take quite a bit of time :-(

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     3530   ->      48
Number of variables                   4   ->    7061
j variable (3530 values)         studid   ->   (dropped)
xij variables:
                              timepub08   ->   timepub083779 timepub083780 ... timepub087567
                              timefin08   ->   timefin083779 timefin083780 ... timefin087567
-----------------------------------------------------------------------------

.
. forv i=1/3{
2.         egen byte timep`i' = anycount(timepub0*), values(`i')
3.         egen byte timef`i' = anycount(timefin0*), values(`i')
4. }

.
. drop timepub0* timefin0*

. order schid timep* timef*

.
. l, noo

+-------------------------------------------------------------+
| schid   timep1   timep2   timep3   timef1   timef2   timef3 |
|-------------------------------------------------------------|
|     2        0       20        5        4       15        6 |
|     4       19       70       17       37       60        3 |
|     6       19       65        4       43       42        3 |
|     7       10       60       20       35       46        7 |
|     8       24       79        7       47       59        6 |
|-------------------------------------------------------------|
|    10       15       61       15       35       52        3 |
|    11        4       38        1       10       30        2 |
|    12        0       35        5       14       26        0 |
|    16       30       26        2       38       16        2 |
|    18        5       53       19       31       40        3 |
|-------------------------------------------------------------|
|    20       17       53        6       31       41        3 |
|    27        2       35        8       16       26        2 |
|    28        8       59       17       19       55        9 |
|    32        4       42       14       27       27        5 |
|    33        0       23       11        6       23        4 |
|-------------------------------------------------------------|
|    34        7       60        1       14       51        2 |
|    36       20       33        2       29       22        3 |
|    38        8       68       18       59       32        2 |
|    40       10       18        1       20       10        0 |
|    42       44       95       15       57       79       13 |
|-------------------------------------------------------------|

Many thanks!

***************************************************************************
This is "so not elegant" :-(

*************
clear*

input byte schid   studid  byte timep08  byte timef08
2     6910          2          2
2     6911          2          2
2     6912          2          3
2     6913          3          3
4     7299          2          2
4     7300          2          2
4     7301          3          1
4     7302          2          2
4     7303          2          2
4     7304          2          1
4     7305          1          .

end

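* One row per school: each student's answers become separate variables
reshape wide time*, i(schid) j(studid)

* Count, per school, how many students chose each category 1-3
forv i=1/3{
    egen byte timep`i' = anycount(timep0*), values(`i')
    egen byte timef`i' = anycount(timef0*), values(`i')
}

* Keep only the school-level counts
drop timep0* timef0*
order schid timep* timef*

l, noo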
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
amardeep@ucla.edu
Sent: Wednesday, 10 February 2010 17:25
To: statalist@hsphsun2.harvard.edu
Cc: amardeep@ucla.edu
Subject: st: Constructing a group level variable

Hi all,

I have a dataset that consists of students (studid) in 49 schools
(schid) responding to a survey. They were asked their impressions of the
curriculum ("do you believe time devoted to subject xxx was ....") and
all responses were categorical (with 1 denoting 'not enough', 2 denoting
'just right', and 3 being 'too much'). A slice of the data is:

list    schid studid timepub08 timefin08 in 30/40

     +--------------------------------------+
     | schid   studid   timep~08   timef~08 |
     |--------------------------------------|
 30. |     2     6910          2          2 |
 31. |     2     6911          2          2 |
 32. |     2     6912          2          3 |
 33. |     2     6913          3          3 |
 34. |     4     7299          2          2 |
     |--------------------------------------|
 35. |     4     7300          2          2 |
 36. |     4     7301          3          1 |
 37. |     4     7302          2          2 |
 38. |     4     7303          2          2 |
 39. |     4     7304          2          1 |
     |--------------------------------------|
 40. |     4     7305          1          . |
     +--------------------------------------+

Question: Is there a way to generate a dataset (or collapse this one) to
get school-level variables? I am interested in school-level variables that
capture the number of responses in each category (1 'not enough', 2
'just right', and 3 'too much') for each question (timepub08, timefin08).

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
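The thread notes that -collapse- can compute statistics other than means, and that the -reshape- route is slow on the full dataset. One possible alternative is sketched below: build a 0/1 indicator for each response category and sum the indicators within school. This is not code posted by any participant. It assumes the variable names `timepub08` and `timefin08` from the original question and uses the eleven sample rows shown in the thread; the output names `timep1`-`timep3` and `timef1`-`timef3` are chosen to mirror the thread's result table.

```stata
clear
input byte schid int studid byte timepub08 byte timefin08
2 6910 2 2
2 6911 2 2
2 6912 2 3
2 6913 3 3
4 7299 2 2
4 7300 2 2
4 7301 3 1
4 7302 2 2
4 7303 2 2
4 7304 2 1
4 7305 1 .
end

* One 0/1 indicator per response category (1, 2, 3) and question.
* A missing answer equals none of 1/2/3, so it contributes 0 to every count.
forvalues i = 1/3 {
    generate byte timep`i' = (timepub08 == `i')
    generate byte timef`i' = (timefin08 == `i')
}

* Summing the indicators within school gives the count per category.
* The ? wildcard matches exactly one character, so timep? picks up
* timep1-timep3 but not timepub08.
collapse (sum) timep? timef?, by(schid)

list, noobs
```

On these sample rows, school 2 should show counts of 0, 3, 1 for `timepub08` and 0, 2, 2 for `timefin08`; school 4 should show 1, 5, 1 and 2, 4, 0. Because the data stay in long form until the final -collapse-, this approach never creates the thousands of wide variables that -reshape- produced above, so it may be noticeably faster on the full dataset.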