Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Elementary question


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Elementary question
Date   Thu, 5 Dec 2002 22:13:10 -0000

Wallace, John
>
> This sounds like a FAQ to me, but a quick look around didn't show me
> anything, so please redirect me if its covered somewhere...
>
> I have a 22,000 row x 50 column flat datasheet (Excel
> format) that I'd like
> to "transpose" so that rows become columns and vice versa.
> Excel clearly
> can't do it, and I'm certain that it would be simple to do
> it in stata - but
> I can't find an appropriate command.  I thought -reshape-
> would be the one,
> but the syntax for it seems more complex than would be
> necessary. Note-I
> already saved it as a tab-delimited text file and imported it using
> -insheet-.
>
> Also, when I bring the dataset into stata, the column
> headers are changed
> from the category names to v2, v3, v4, etc.  I tried to use
> -insheet using
> filename.txt, names-.  This resulted in one category (with
> a string name)
> getting the correct header, but all the other categories
> (which had numeric
> titles) were still called v2, v3 and so on.

Richard Goldstein has pointed to -xpose-.

I am not clear what form your column names take,
but the issue may be clarified by reference to

http://www.stata.com/support/faqs/data/newexcel.html#names

==== begin extract

4.4 Variable names
Stata limits variable names to 32 characters and does not
allow within such names any characters that it uses as
operators or delimiters. Note also that your variable names
should start with a letter. People who are Excel users first
and Stata users second are often quite creative with the
names they apply to columns. Stata converts illegal column
(field) names to labels and makes a best guess at a
sensible variable name. Stata's best guess, however, may
not be as good as the name a user would choose knowing
Stata's naming restrictions.

For example, Stata will make variable names using the first
32 characters of the variable name, and use the remainder
for a label. If the first 32 characters are not unique,
subsequent occurrences will be called var1, var2, etc. or
v1, v2, etc. (If you paste the data in, the variable stub
is var; if you use insheet, the stub is v, so be careful
writing do-files.)

=== end extract

Evidently your column names are informative and they
are being mangled by Stata. It may be that you
need to tackle this upstream. For example, suppose
your data set starts out (assume suitable separators)

v1,v2,v3,v4
some text,more text,whatever,another bit
1 2 3 4
5 6 7 8

then with

insheet v1 v2 v3 v4 using myfile.csv

Stata will read this as string variables because
the first and second observations certainly both
look like strings. It seems that you need _both_
to look like strings, as the first observation
otherwise will get ignored. Not quite the Bellman's
"What I tell you three times is true", but
close.

If you can get to this then you could e.g.

* get rid of v1, v2, v3, v4 etc.
drop in 1

* save the original second record as a series of macros
forval i = 1 / 50 {
	local text`i' = v`i'[1]
}

drop in 1
destring, replace
xpose, clear

* put the macros in a column
gen str1 text = ""
forval i = 1 / 50 {
	replace text = "`text`i''" in `i'
}

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index