Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Encoding and matching string values

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	st: RE: Encoding and matching string values
Date	Fri, 24 Sep 2010 23:53:00 +0200


I am not sure the description here is clear enough: -encode- forces you to -generate()- the new numeric variable, so the string and its -encode-d counterpart coexist afterwards. So it is hard to see a) how your dataset is supposed to decrease in size via -encode- and b) how the "original string values" are no longer there...
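
To make the point concrete, here is a minimal sketch on a toy dataset. The variable names (patent_number, patent_id) are made up for illustration and are not taken from the poster's files.

```stata
* Toy data; variable names are hypothetical
clear
input str12 patent_number
"US1234567"
"EP0456789"
"US1234567"
end

* -encode- requires generate(); the string variable stays in the dataset
encode patent_number, generate(patent_id)

* Both variables now coexist; the value label maps integer codes to strings
describe patent_number patent_id
label list patent_id

* The dataset only shrinks if the string variable is dropped explicitly
drop patent_number

* The strings can still be recovered from the value label
decode patent_id, generate(patent_number)
list
```

Note that the strings do not vanish after the -drop-: they live on in the value label, which also takes up space and can hold only a limited number of mappings (see -help limits-).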


How does Stata (_not STATA_) "...mess up the numerical values after appending the dataset"?
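
One plausible mechanism, offered as a guess rather than a diagnosis of the files in question: -encode- numbers the distinct strings 1, 2, ... in alphabetical order within whatever data are in memory, so the same integer can stand for different strings in different files, and -append- does not replace value-label definitions already in memory. The sketch below reproduces that on two toy files and then shows one possible workaround, namely appending the string versions first and assigning codes once. All names (citation, cit_id, cit_code, the tempfile names) are hypothetical.

```stata
* File A: strings only, then an encoded copy
clear
input str5 citation
"US222"
"US333"
end
tempfile fileA_str fileA_enc
save `fileA_str'
* Codes are assigned alphabetically within this file: US222 = 1, US333 = 2
encode citation, generate(cit_id)
save `fileA_enc'

* File B: strings only, then an encoded copy
clear
input str5 citation
"US111"
"US222"
end
tempfile fileB_str fileB_enc
save `fileB_str'
* Codes are assigned alphabetically within this file: US111 = 1, US222 = 2
encode citation, generate(cit_id)
save `fileB_enc'

* Appending the encoded files: the label already in memory (file A's) is kept,
* so file B's integer codes are read through file A's mapping
use `fileA_enc', clear
append using `fileB_enc'
decode cit_id, generate(decoded)
* Compare the original string with what the code decodes to
list citation cit_id decoded, nolabel

* Possible workaround: append the string files, then assign codes once
use `fileA_str', clear
append using `fileB_str'
* Identical strings receive identical codes across all appended files
egen long cit_code = group(citation)
list
```

In the first -list-, the two observations from file B decode to "US222" and "US333" even though their strings are "US111" and "US222". In the second, each distinct string gets one code across both files, and the string variable can be kept in a separate lookup file (one row per distinct string) if the main dataset has to be slimmed down.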

HTH
Martin


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Florian Seliger
Sent: Friday, 24 September 2010 20:57
To: [email protected]
Subject: st: Encoding and matching string values

Hi,
We have about 300 individual company files, each with up to 100,000 patents. Each patent is assigned up to 500 patent_numbers and citations (string values). In the next step, we would like to put all files together and match the values to each other.

First, we want to reduce the enormous size of the datasets by using the encode command on the strings.

However, after encoding the variables in each individual file and using the append command, the numerical values can no longer be decoded correctly, so the resulting string values are wrong.

The reason is that STATA messes up the numerical values after appending the dataset.
Therefore, we are looking for a way to use the encode command but still keep the original string values after appending the datasets, so that matching remains possible.

Thank you in advance,
Florian
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



Follow-Ups:
- Re: st: RE: Encoding and matching string values
  - From: Eric Booth <[email protected]>

References:
- st: Encoding and matching string values
  - From: "Florian Seliger" <[email protected]>
