Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: AW: Changing 2 columns of repetitive ASCII data into many columns sorted by variable


From   Gawrich Stefan <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: AW: Changing 2 columns of repetitive ASCII data into many columns sorted by variable
Date   Fri, 12 May 2006 15:54:45 +0200

Hi Simon,

try this:

*** Import: As far as I see the two vars are tab-seperated. Put the data in
a single text file and use insheet

insheet using "C:\temp\new1.txt", tab clear

** As a result you get two string vars: v1 and v2


*** Generate a  group-variable for each food code  
*** I'm sorry, my approach is a bit complicated: First I compute the running
number (_n) of each "***"-case, 
*** then I give the running number to each case of the following data group
and at least I exchange with a group number
*** There'll be much more elegant ways, but it works. 
gen test = _n
gen stars = 0
replace stars = test if v1 =="***"
replace stars = stars[_n-1]  if v1 !="***" & test > 1
egen group = group(stars)
drop test stars

* get rid of unneccessary data
drop if v1 =="***" | v1 == "Rectype"

** reshape from long to wide
reshape wide v2, i(group) j(v1) string


*** Delete the v2 stub before every varname
renpfix v2

*The given dataset contains only string vars. Change all "numerical" vars
from string to numerical format 
for var  CELL- GLUC INSOLNCP -LACT MALT-NA NUMB-SOUTHFIB SUCR-WATER:
destring X, gen(numX) \ drop X 
renpfix num


* Now you can "drop", "rename" or "move" the vars as you need it.





> 
> Dear All
> 
> I have recently purchased a dataset which contains food codes 
> in 7-bit 
> ASCII format from the UK Food Standards Agency.  However, 
> they are not 
> yet in a format that I can use for any meaningful analysis and their 
> technicial support is unable to assist.
> 
> The data as they stand fall into 2 columns: varname|value.  
> NUMB is the 
> food code - i.e. the variable that I want to sort the data by.
> 
> I would like to have all of the other variables along the top as 
> separate columns, so that I end up with individual food codes 
> and their 
> nutrient values - see below.  With these data I can then do dietary 
> analysis.
> 
> This format goes on for several thousand different food 
> codes, over and 
> over.  Codes are separated by the line ***
> 
> How it is now:
> 
> RECTYPE	1
> NUMB	14002
> CONST	50852
> PREV	50852
> GROUP	 FA
> NAME	 Apples, cooking, raw, peeled
> EDPOR	1
> WATER	87.7
> TOTNIT	0.05
> PROT	0.3
> FAT	0.1
> CHO	8.9
> KCALS	35
> KJ	151
> STAR	 Tr
> TOTSUG	8.9
> GLUC	2
> FRUCT	5.9
> SUCR	1
> MALT	0
> LACT	0
> SOUTHFIB	2.2
> ENGFIB	1.6
> CELL	0.6
> SOLNCP	0.6
> INSOLNCP	0.4
> LIGNIN	 Tr
> NA	2
> ***
> RECTYPE	1
> NUMB	14003
> CONST	50853
> PREV	50853
> GROUP	 FA
> NAME	 Apples, cooking, weighed with skin and core
> EDPOR	0.73
> WATER	63.1
> 
> etc
> 
> What I would like to have:
> 
> Code    | Description | Group | Water | Fat | CHO | Protein | 
> Sodium | 
> Potassium
> 
> 14001   |  Apples     | Fruit |  26   | 5.1 | 4.2 |   3.6   | 
> 0.1    |  0.04
> 14002   |  Oranges    | Fruit |  etc
> 
> 
> Does anyone have any suggestions as to how to arrange the data into 
> something I can use?
> 
> Thanks,
> 
> Simon Wheeler
> 
> 



 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index