Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Changing 2 columns of repetitive ASCII data into many columnssorted by variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Changing 2 columns of repetitive ASCII data into many columnssorted by variable
Date   Fri, 12 May 2006 15:41:44 +0100

You have some very good advice, so this 
may well be redundant, but better three
solutions than none. 

I assume variables -var1- -var2- and 
that lines "***" are just separators. 

drop if var1 == "***"
gen col = var1 == "RECTYPE"
replace col = col[_n-1] + 1 if col == 0 
gen id = var1 == "RECTYPE"
replace id = sum(id) 
su col, meanonly 
forval i = 1/`r(max)' { 
	local names "`names' `=var1[`i']'" 
}
drop var1
reshape wide var2, i(id) j(col) 
local i = 1 
foreach v of var var2* { 
	rename `v' `: word `i' of `names''
	local ++i
}

Nick 
n.j.cox@durham.ac.uk 

Simon Wheeler
 
> I have recently purchased a dataset which contains food codes 
> in 7-bit 
> ASCII format from the UK Food Standards Agency.  However, 
> they are not 
> yet in a format that I can use for any meaningful analysis and their 
> technicial support is unable to assist.
> 
> The data as they stand fall into 2 columns: varname|value.  
> NUMB is the 
> food code - i.e. the variable that I want to sort the data by.
> 
> I would like to have all of the other variables along the top as 
> separate columns, so that I end up with individual food codes 
> and their 
> nutrient values - see below.  With these data I can then do dietary 
> analysis.
> 
> This format goes on for several thousand different food 
> codes, over and 
> over.  Codes are separated by the line ***
> 
> How it is now:
> 
> RECTYPE	1
> NUMB	14002
> CONST	50852
> PREV	50852
> GROUP	 FA
> NAME	 Apples, cooking, raw, peeled
> EDPOR	1
> WATER	87.7
> TOTNIT	0.05
> PROT	0.3
> FAT	0.1
> CHO	8.9
> KCALS	35
> KJ	151
> STAR	 Tr
> TOTSUG	8.9
> GLUC	2
> FRUCT	5.9
> SUCR	1
> MALT	0
> LACT	0
> SOUTHFIB	2.2
> ENGFIB	1.6
> CELL	0.6
> SOLNCP	0.6
> INSOLNCP	0.4
> LIGNIN	 Tr
> NA	2
> ***
> RECTYPE	1
> NUMB	14003
> CONST	50853
> PREV	50853
> GROUP	 FA
> NAME	 Apples, cooking, weighed with skin and core
> EDPOR	0.73
> WATER	63.1
> 
> etc
> 
> What I would like to have:
> 
> Code    | Description | Group | Water | Fat | CHO | Protein | 
> Sodium | 
> Potassium
> 
> 14001   |  Apples     | Fruit |  26   | 5.1 | 4.2 |   3.6   | 
> 0.1    |  0.04
> 14002   |  Oranges    | Fruit |  etc
> 
> 
> Does anyone have any suggestions as to how to arrange the data into 
> something I can use?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index