Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: -longshape- available from SSC
From: Nick Cox <[email protected]>
To: [email protected]
Date: Fri, 7 Oct 2011 18:45:20 +0100
Thanks to Kit Baum, a program -longshape- may now be downloaded from
SSC. Stata 9.2 is required.
-longshape- is a wrapper for -reshape long- that fixes a side-effect of
-reshape long- that bites only very occasionally. I had been a happy
user of -reshape- for many years until the problem bit me very
recently with a particular kind of data.
To make this concrete, consider ecological data that include
measurements of abundance for several taxa (usually but not
necessarily species) at several sites. This is the result of a
-describe- for a small dataset of this kind from
<http://www.cambridge.org/gb/knowledge/isbn/item5708032/>.
Contains data from dune.dta
  obs:            20
 vars:            36                          10 Aug 2011 18:17
 size:           860 (99.9% of memory free)
--------------------------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
id              byte   %9.0g
achmil          byte   %8.0g                  Achillea millefolium
agrsto          byte   %8.0g                  Agrostis stolonifera
airpra          byte   %8.0g                  Aira praecox
alogen          byte   %8.0g                  Alopecurus geniculatus
antodo          byte   %8.0g                  Anthoxanthum odoratum
belper          byte   %8.0g                  Bellis perennis
brohor          byte   %8.0g                  Bromus hordaceus
chealb          byte   %8.0g                  Chenopodium album
cirarv          byte   %8.0g                  Cirsium arvense
elepal          byte   %8.0g                  Eleocharis palustris
elyrep          byte   %8.0g                  Elymus repens
empnig          byte   %8.0g                  Empetrum nigrum
hyprad          byte   %8.0g                  Hypochaeris radicata
junart          byte   %8.0g                  Juncus articulatus
junbuf          byte   %8.0g                  Juncus bufonius
leoaut          byte   %8.0g                  Leontodon autumnalis
lolper          byte   %8.0g                  Lolium perenne
plalan          byte   %8.0g                  Plantago lanceolata
poapra          byte   %8.0g                  Poa pratensis
poatri          byte   %8.0g                  Poa trivialis
potpal          byte   %8.0g                  Potentilla palustris
ranfla          byte   %8.0g                  Ranunculus flammula
rumace          byte   %8.0g                  Rumex acetosa
sagpro          byte   %8.0g                  Sagina procumbens
salrep          byte   %8.0g                  Salix repens
tripra          byte   %8.0g                  Trifolium pratense
trirep          byte   %8.0g                  Trifolium repens
viclat          byte   %8.0g                  Vicia lathyroides
brarut          byte   %8.0g                  Brachythecium rutabulum
calcus          byte   %8.0g                  Calliergonella cuspidata
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  id
This is a natural data structure for recording data, but some analyses
require a long structure of taxon X site. Before such data can be
-reshape-d there is a minor problem: -reshape- needs the species
variable names to share a common prefix (stub). That is easily solved,
e.g. with -renvars- (from the Stata Journal) or with the much-improved
-rename- in Stata 12.
. renvars achmil-calcus, prefix(y)
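If neither -renvars- nor Stata 12 is to hand, a plain loop over the
built-in -rename- does the same job in Stata 9.2. A minimal sketch,
assuming a toy three-taxon dataset (invented values) standing in for
dune.dta:

```stata
* Toy data standing in for dune.dta (invented values, for illustration only)
clear
input byte id byte achmil byte agrsto byte calcus
1 1 0 4
2 3 2 0
end

* Give every taxon variable the common prefix y, as -renvars, prefix(y)- would.
* The varlist achmil-calcus is expanded once, before the loop starts,
* so renaming variables inside the loop is safe.
foreach v of varlist achmil-calcus {
    rename `v' y`v'
}
describe, simple
```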
However, a much bigger problem is evident when we do -reshape-: the
variable labels all disappear. They carry really valuable detail, and
typing them all in again does not appeal.
. reshape long y, i(id) j(species) string
(note: j = achmil agrsto airpra alogen antodo belper brarut brohor
calcus chealb cirarv elepal elyrep empnig hyprad junart junbuf leoaut
lolper plalan poapra poatri potpal ranfla rumace sagpro salrep tripra
trirep viclat)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       20   ->     600
Number of variables                  36   ->       8
j variable (30 values)                    ->   species
xij variables:
            yachmil yagrsto ... yviclat   ->   y
-----------------------------------------------------------------------------
Normally, with -reshape long-, that is immaterial: the variables to be
reshaped are typically something like -invest1975- to -invest2005-,
and their variable labels, if there are any, don't carry any important
information that is not otherwise available. The main aim of
-longshape- is to carry the variable labels across automatically. In
fact, _two_ extra variables are created to give the best of both
worlds: a new string variable holding the original variable names
(-_species- below) and a new numeric variable whose value labels are
the original variable labels (-species- below). Both can be useful for
subsequent graphs and tables.
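The idea of carrying labels through can be sketched by hand. This is
not the code of -longshape- itself, only a minimal illustration under
assumptions: the toy data are invented, the names -taxon- and -lbl_*-
are hypothetical, and only the numeric labelled variable is built.

```stata
* Toy data standing in for dune.dta (invented values, for illustration only)
clear
input byte id byte achmil byte agrsto
1 1 0
2 3 2
end
label variable achmil "Achillea millefolium"
label variable agrsto "Agrostis stolonifera"

* 1. Remember each variable label before -reshape- discards it,
*    and give the variables the common prefix y.
local names achmil agrsto
foreach v of local names {
    local lbl_`v' : variable label `v'
    rename `v' y`v'
}

* 2. Reshape as usual; species holds the old variable names as strings.
reshape long y, i(id) j(species) string

* 3. Build a numeric variable whose value labels are the old variable labels.
*    (taxon and lbl_* are hypothetical names used only in this sketch.)
generate byte taxon = .
local k = 0
foreach v of local names {
    local ++k
    replace taxon = `k' if species == "`v'"
    label define taxon `k' "`lbl_`v''", modify
}
label values taxon taxon
tabulate taxon
```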
. u dune, clear
. longshape achmil-calcus, i(id) j(species) y(abundance)
. d
Contains data
  obs:           600
 vars:             9
 size:        12,600 (99.9% of memory free)
--------------------------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
id              byte   %9.0g
species         byte   %24.0g      species
_species        str6   %9s
abundance       byte   %8.0g
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  id  species
     Note:  dataset has changed since last saved
. tab  species
                 species |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
    Achillea millefolium |         20        3.33        3.33
    Agrostis stolonifera |         20        3.33        6.67
            Aira praecox |         20        3.33       10.00
  Alopecurus geniculatus |         20        3.33       13.33
   Anthoxanthum odoratum |         20        3.33       16.67
         Bellis perennis |         20        3.33       20.00
 Brachythecium rutabulum |         20        3.33       23.33
        Bromus hordaceus |         20        3.33       26.67
Calliergonella cuspidata |         20        3.33       30.00
       Chenopodium album |         20        3.33       33.33
         Cirsium arvense |         20        3.33       36.67
    Eleocharis palustris |         20        3.33       40.00
           Elymus repens |         20        3.33       43.33
         Empetrum nigrum |         20        3.33       46.67
    Hypochaeris radicata |         20        3.33       50.00
      Juncus articulatus |         20        3.33       53.33
         Juncus bufonius |         20        3.33       56.67
    Leontodon autumnalis |         20        3.33       60.00
          Lolium perenne |         20        3.33       63.33
     Plantago lanceolata |         20        3.33       66.67
           Poa pratensis |         20        3.33       70.00
           Poa trivialis |         20        3.33       73.33
    Potentilla palustris |         20        3.33       76.67
     Ranunculus flammula |         20        3.33       80.00
           Rumex acetosa |         20        3.33       83.33
       Sagina procumbens |         20        3.33       86.67
            Salix repens |         20        3.33       90.00
      Trifolium pratense |         20        3.33       93.33
        Trifolium repens |         20        3.33       96.67
       Vicia lathyroides |         20        3.33      100.00
-------------------------+-----------------------------------
                   Total |        600      100.00
That's it really, except that there may be a question: is there, or
will there be, a -wideshape-? Yes and no. I wrote one as a test of the
reversibility of this process, but I won't be making it public. It
wouldn't be useful on its own unless your data happen to have
precisely the structure that -longshape- produces, which is unlikely.
Also, -longshape- won't willingly perform unless you have -save-d your
data, so unless you wilfully destroy the original dataset you should
have no need to reverse the process.
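For anyone curious about that reversibility test, the reverse
direction can be sketched roughly as follows. This is not the
unpublished -wideshape-; it assumes a long dataset laid out like the
-longshape- output above (-id-, a labelled numeric -species-, a string
-_species- and -abundance-), built here from invented toy values, and
-lbl_*- is a hypothetical macro name.

```stata
* Toy long data in the layout -longshape- produced above (invented values)
clear
input byte id byte species str6 _species byte abundance
1 1 "achmil" 1
1 2 "agrsto" 0
2 1 "achmil" 3
2 2 "agrsto" 2
end
label define species 1 "Achillea millefolium" 2 "Agrostis stolonifera"
label values species species

* 1. Recover each original variable label from the value labels of species.
levelsof _species, local(names)
foreach v of local names {
    summarize species if _species == "`v'", meanonly
    local lbl_`v' : label (species) `r(min)'
}

* 2. species varies within id, so it must be dropped before -reshape wide-.
drop species
reshape wide abundance, i(id) j(_species) string

* 3. Restore the original variable names and variable labels.
foreach v of local names {
    rename abundance`v' `v'
    label variable `v' "`lbl_`v''"
}
describe
```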
Nick