Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"Sarah Edgington" <sedging@ucla.edu> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: complete data import |

Date |
Wed, 9 Apr 2014 13:33:08 -0700 |

Doug, See below for example code that I think does what you want. One assumption I'm making here is that no record has more than 1 mem or tnd line. As long as that's true this should work. For this example I'm assuming that all the original variables ended up importing as strings, but it should work find even if that isn't true. I just didn't feel like figuring out which columns were pure numeric and which ended up being a mix of numeric and strings. There are probably many other ways to do this. This is just what occurred to me first. -Sarah clear input str15 v1 str15 v2 str15 v3 str15 v4 str15 v5 str15 v6 str15 v7 str15 v8 ITM S 0 1 1.35 5100001261 soup TND N 0 0 0 1.35 0 0 ITM S 0 1 1.35 5100001261 soup TND N 0 0 0 1.35 0 0 MEM N 280211400001 ITM S 0 1 1.35 5100001261 soup TND N 0 0 0 0 11.28 0 ITM S 0 1 1.2`i' 5000001011 milk TND N 0 0 0 1.2`i' 0 0 MEM N 28021140015`i' ITM S 0 1 1.2`i' 5000001011 milk ITM S 0 1 1.2`i' 5000001011 milk ITM S 0 1 1.2`i' 5000001011 milk TND N 4 0 0 0 0 0 end /*create a line number to be able to retain original ordering */ gen orgline=_n /*now create an identifier within the tnd line */ /*sort on v1 and within that orgline, assign id based on _n for tnd records only */ bysort v1 (orgline) : gen id=_n if v1=="TND" /*now back to original ordering */ sort orgline /* since tnd is always the end of the record, flip the order of the original data so that the last line is first */ /* this will make it easier to populate id to the next line until a new record is encountered */ /* now new record will start with tnd, so non-missing id is sign of new record. Only fill forward if missing id */ gsort -orgline replace id=id[_n-1] if missing(id) /*now create variables from the mem and tnd lines so that these can be distributed to all itm lines within that id */ /*first make a loop to create v9-v15 out of tnd vars */ forv i=2/7 { local j=7+`i' gen v`j'=v`i' if v1=="TND" replace v`i'="" if v1=="TND" } /* use same logic to create v15 & v16 out of mem vars */ forv i=2/3 { local j=13+`i' gen v`j'=v`i' if v1=="MEM" replace v`i'="" if v1=="MEM" } /* now we can sort on the id var and use similar strategy as with id to distribute to all lines values of v8-v15 */ /* create flags for the tnd and mem lines so this can be used in a sort */ mark tnd if v1=="TND" mark mem if v1=="MEM" gsort id -tnd forv i=9/14 { by id: replace v`i'=v`i'[_n-1] if missing(v`i') } gsort id -mem forv i=15/16 { by id: replace v`i'=v`i'[_n-1] if missing(v`i') } drop if inlist(v1,"TND","MEM") /*now look at the example data to see if it looks as expected*/ /*may need to use -order- command to get variables into desired final order */ sort orgline list * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**References**:**st: complete data import***From:*Douglas Levy <douglas_levy@post.harvard.edu>

**Re: st: complete data import***From:*Sergiy Radyakin <serjradyakin@gmail.com>

**Re: st: complete data import***From:*Douglas Levy <douglas_levy@post.harvard.edu>

- Prev by Date:
**Re: st: complete data import** - Next by Date:
**Re: st: complete data import** - Previous by thread:
**Re: st: complete data import** - Next by thread:
**Re: st: complete data import** - Index(es):