
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: AW: -longshape- available from SSC


From   "Kaulisch, Marc" <[email protected]>
To   <[email protected]>
Subject   st: AW: -longshape- available from SSC
Date   Sat, 8 Oct 2011 17:00:32 +0200

Hi Nick,

Thanks for this program; I have run into the same problem more than once.
One additional comment: I can think of a use case where -wideshape- might be a good complementary program (although I am not completely sure).

Example:
1. Reshape wide data into long data.
2. Create new variables in the long data, for example clusters after an optimal-matching analysis (see -sq- from SSC).
3. Reshape the long data back into wide data in order to use the clusters for further analysis.

Of course, I could merge the long data back into the wide data to add the new variables to the "old" wide dataset, but intuitively I would reach for a reshape...
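
For comparison, the merge route can be sketched roughly as follows. This is only an illustration on made-up toy data: the variables st1-st3, time and cluster are hypothetical, and the cluster rule is a trivial stand-in for whatever an optimal-matching analysis would produce. The -merge 1:1- syntax assumes Stata 11 or later.

```stata
* Toy wide data: one row per id, three time points (all names hypothetical)
clear
input byte id byte st1 byte st2 byte st3
1 1 1 2
2 2 2 2
3 1 2 2
end
tempfile wide
save `wide'

* Step 1: wide -> long
reshape long st, i(id) j(time)

* Step 2: stand-in for a cluster assignment computed on the long data
*         (here simply the first observed state of each id)
bysort id (time): gen byte cluster = st[1]

* Step 3: reduce to one row per id and merge onto the original wide data
keep id cluster
duplicates drop
merge 1:1 id using `wide', nogenerate

list
```

The merge works only because the new variable is constant within id; a variable that varies within id would have to be reshaped rather than merged.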


Marc


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Friday, 7 October 2011 19:45
To: [email protected]
Subject: st: -longshape- available from SSC

Thanks to Kit Baum, a program -longshape- may now be downloaded from SSC. Stata 9.2 is required.

-longshape- is a wrapper for -reshape long- to fix a side-effect of -reshape long- that bites very occasionally. I had been a happy user of -reshape- for many years until the problem bit me with a particular kind of data very recently.

To make this concrete, consider ecological data that include measurements of abundance for several taxa (usually but not necessarily species) at several sites. This is the result of a -describe- for a small dataset of this kind from <http://www.cambridge.org/gb/knowledge/isbn/item5708032/>.

Contains data from dune.dta
  obs:            20
 vars:            36                          10 Aug 2011 18:17
 size:           860 (99.9% of memory free)
--------------------------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
id              byte   %9.0g
achmil          byte   %8.0g                  Achillea millefolium
agrsto          byte   %8.0g                  Agrostis stolonifera
airpra          byte   %8.0g                  Aira praecox
alogen          byte   %8.0g                  Alopecurus geniculatus
antodo          byte   %8.0g                  Anthoxanthum odoratum
belper          byte   %8.0g                  Bellis perennis
brohor          byte   %8.0g                  Bromus hordaceus
chealb          byte   %8.0g                  Chenopodium album
cirarv          byte   %8.0g                  Cirsium arvense
elepal          byte   %8.0g                  Eleocharis palustris
elyrep          byte   %8.0g                  Elymus repens
empnig          byte   %8.0g                  Empetrum nigrum
hyprad          byte   %8.0g                  Hypochaeris radicata
junart          byte   %8.0g                  Juncus articulatus
junbuf          byte   %8.0g                  Juncus bufonius
leoaut          byte   %8.0g                  Leontodon autumnalis
lolper          byte   %8.0g                  Lolium perenne
plalan          byte   %8.0g                  Plantago lanceolata
poapra          byte   %8.0g                  Poa pratensis
poatri          byte   %8.0g                  Poa trivialis
potpal          byte   %8.0g                  Potentilla palustris
ranfla          byte   %8.0g                  Ranunculus flammula
rumace          byte   %8.0g                  Rumex acetosa
sagpro          byte   %8.0g                  Sagina procumbens
salrep          byte   %8.0g                  Salix repens
tripra          byte   %8.0g                  Trifolium pratense
trirep          byte   %8.0g                  Trifolium repens
viclat          byte   %8.0g                  Vicia lathyroides
brarut          byte   %8.0g                  Brachythecium rutabulum
calcus          byte   %8.0g                  Calliergonella cuspidata
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  id

This is a natural data structure for recording data, but some analyses require a long structure of taxon X site. Before such data can be -reshape-d there is a minor problem: the species variable names need a common prefix. That can be fixed with, e.g., -renvars- (SJ) or the much improved -rename- in Stata 12.

. renvars achmil-calcus, prefix(y)

However, a much bigger problem is evident when we do -reshape-: the variable labels all disappear. They are really valuable detail, and typing them all in again does not appeal.

. reshape long y, i(id) j(species) string
(note: j = achmil agrsto airpra alogen antodo belper brarut brohor calcus chealb cirarv elepal elyrep empnig hyprad junart junbuf leoaut lol
> per plalan poapra poatri potpal ranfla rumace sagpro salrep tripra 
> trirep viclat)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       20   ->     600
Number of variables                  36   ->       8
j variable (30 values)                    ->   species
xij variables:
            yachmil yagrsto ... yviclat   ->   y
-----------------------------------------------------------------------------

Normally with -reshape long- that is immaterial, as the bundle of variables to be reshaped is something like -invest1975- to -invest2005- and the variable labels, if there are any, don't carry any important information that is not otherwise available. The main aim of -longshape- is to carry the variable labels over automatically. In fact, _two_ extra variables are created to give the best of both worlds: a new string variable with the original variable names and a new numeric variable whose value labels are the original variable labels. (Both can be useful for subsequent graphs and tables.)
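
To show what carrying the labels over involves, here is a rough by-hand sketch of the same idea. It is not the code of -longshape- itself: the toy data stand in for dune.dta, the local macro names are made up, and numbering the taxa in variable order is an assumption for illustration only.

```stata
* Toy stand-in for dune.dta: two sites, three taxa
clear
input byte id byte achmil byte agrsto byte airpra
1 1 0 0
2 3 0 0
end
label variable achmil "Achillea millefolium"
label variable agrsto "Agrostis stolonifera"
label variable airpra "Aira praecox"

* Remember each variable label before -reshape- discards it,
* and add the common prefix that -reshape long- needs
local names
foreach v of varlist achmil-airpra {
    local lbl_`v' : variable label `v'
    local names `names' `v'
    rename `v' y`v'
}

* The string j variable keeps the original variable names
reshape long y, i(id) j(_species) string
rename y abundance

* Numeric companion variable whose value labels are the old variable labels
gen byte species = .
local k = 0
foreach v of local names {
    local ++k
    replace species = `k' if _species == "`v'"
    label define species `k' "`lbl_`v''", add
}
label values species species

tabulate species
```

The string variable is convenient for selecting taxa by short name, while the labelled numeric variable gives readable tables and graph legends.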

. u dune, clear

. longshape achmil-calcus, i(id) j(species) y(abundance)

. d

Contains data
  obs:           600
 vars:             9
 size:        12,600 (99.9% of memory free)
--------------------------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
id              byte   %9.0g
species         byte   %24.0g      species
_species        str6   %9s
abundance       byte   %8.0g
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  id  species
     Note:  dataset has changed since last saved

. tab  species

                 species |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
    Achillea millefolium |         20        3.33        3.33
    Agrostis stolonifera |         20        3.33        6.67
            Aira praecox |         20        3.33       10.00
  Alopecurus geniculatus |         20        3.33       13.33
   Anthoxanthum odoratum |         20        3.33       16.67
         Bellis perennis |         20        3.33       20.00
 Brachythecium rutabulum |         20        3.33       23.33
        Bromus hordaceus |         20        3.33       26.67
Calliergonella cuspidata |         20        3.33       30.00
       Chenopodium album |         20        3.33       33.33
         Cirsium arvense |         20        3.33       36.67
    Eleocharis palustris |         20        3.33       40.00
           Elymus repens |         20        3.33       43.33
         Empetrum nigrum |         20        3.33       46.67
    Hypochaeris radicata |         20        3.33       50.00
      Juncus articulatus |         20        3.33       53.33
         Juncus bufonius |         20        3.33       56.67
    Leontodon autumnalis |         20        3.33       60.00
          Lolium perenne |         20        3.33       63.33
     Plantago lanceolata |         20        3.33       66.67
           Poa pratensis |         20        3.33       70.00
           Poa trivialis |         20        3.33       73.33
    Potentilla palustris |         20        3.33       76.67
     Ranunculus flammula |         20        3.33       80.00
           Rumex acetosa |         20        3.33       83.33
       Sagina procumbens |         20        3.33       86.67
            Salix repens |         20        3.33       90.00
      Trifolium pratense |         20        3.33       93.33
        Trifolium repens |         20        3.33       96.67
       Vicia lathyroides |         20        3.33      100.00
-------------------------+-----------------------------------
                   Total |        600      100.00

That's it really, except that there may be a question: is there, or will there be, a -wideshape-? Yes and no. I wrote one as a test of the reversibility of this process, but I won't be making it public. It isn't useful independently unless you happen to use precisely the kind of structure that -longshape- produces, which is unlikely. Also, -longshape- won't willingly perform unless you have -save-d your data, so unless you wilfully destroy the original dataset you should have no need to reverse the process.
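
As an illustration of that kind of safeguard, a program can refuse to run when the data in memory have unsaved changes. The sketch below is an assumption about how such a check could look, not the actual test inside -longshape-; the program name needsaved is hypothetical.

```stata
capture program drop needsaved
program define needsaved
    * c(changed) is 1 when the data in memory have changed since last saved
    if c(changed) {
        error 4
    }
    display as text "data in memory match the saved copy"
end

* Freshly loaded data are unchanged, so the check passes
sysuse auto, clear
needsaved
```

After any change to the data, for example a -replace-, the same call stops with return code 4 until the data are saved again.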

Nick
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

