
st: RE: follow up on Stata

From   "Prashant Shukla" <>
To   <>
Subject   st: RE: follow up on Stata
Date   Wed, 8 Oct 2008 23:07:56 +0100

Hi Martin,

Thanks again for the answers. I have been thinking and reading about those
commands too. While I can't give you sample data because of privacy
commitments to the data source, I can definitely explain my situation in
more detail.

Say we have some demographic data on the employees of the statistics
department of a university. It contains fields such as First Name, Last
Name, Date of Birth, Hire Date, various address fields, and so on. Also,
let's assume it has 30 variables.

Now say the department is switching payroll systems, and this data needs
to be in a "new" format so that the new system can accept it. The new
demographic file needs to have 40 variables: many of them come from our
old demographic file, but some must be created and left blank. Note that
the date formats differ and the address fields have different names, but
most of the data maps directly from the old file.
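A minimal Stata sketch of that mapping for a single department's file. The variable names (FirstName, Zipcode and so on in the old file; FIRST_NAME, ZIP, PAYROLL_ID and so on in the new one), the old date being stored as MM/DD/YYYY text, and the YYYY-MM-DD display format in the new layout are all hypothetical, since the real layouts are not given:

```stata
* Hypothetical old-layout data, built in memory for illustration
clear
input str10 FirstName str10 LastName str10 DOB str5 Zipcode
"Ada" "Lovelace" "12/10/1985" "02139"
"Alan" "Turing" "06/23/1982" "10001"
end

* 1. Rename fields whose content maps one-to-one (new names are assumed)
rename FirstName FIRST_NAME
rename LastName  LAST_NAME
rename Zipcode   ZIP

* 2. Parse the old MM/DD/YYYY text into a Stata daily date, then give it
*    the (assumed) display format of the new system, e.g. 1985-12-10
generate BIRTH_DATE = date(DOB, "MDY")
format BIRTH_DATE %tdCCYY-NN-DD
drop DOB

* 3. Create required fields that have no source, left blank (missing)
generate str1 MIDDLE_NAME = ""
generate PAYROLL_ID = .

* 4. Put the variables in the order the new system expects
order FIRST_NAME MIDDLE_NAME LAST_NAME BIRTH_DATE ZIP PAYROLL_ID
```

-date()- with the "MDY" mask parses the text; -format- only changes how the resulting daily date is displayed, not its stored value.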

As I said, we could just work on the original dataset and format it to
suit the new system. But if I am doing this not just for the statistics
department but for every department at the university, I would much
rather write one program that maps the data into the standard format of
the new payroll system, even when the old demographic data format differs
across departments.
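One way to get a single reusable program is to separate what varies (each department's old-to-new name pairs) from what is fixed (the standard layout). The program below, -map_to_payroll-, is a hypothetical sketch, not an existing command: it renames each old variable that exists, creates any missing standard variable as a blank string, and keeps only the standard layout, in order:

```stata
* Hypothetical helper program, not part of Stata or any SSC package
capture program drop map_to_payroll
program define map_to_payroll
    * mapping(): old/new name pairs, e.g. Zipcode ZIP LName LAST_NAME
    * target():  every variable the new system expects, in order
    syntax , Mapping(string) Target(namelist)
    local pairs `mapping'
    while "`pairs'" != "" {
        gettoken old pairs : pairs
        gettoken new pairs : pairs
        * Rename only if this department's file has the old variable
        capture confirm variable `old', exact
        if !_rc rename `old' `new'
    }
    * Standard variables with no source are created as blank strings
    foreach v of local target {
        capture confirm variable `v', exact
        if _rc generate str1 `v' = ""
    }
    keep `target'
    order `target'
end

* Usage with a made-up department file
clear
input str10 FName str10 LName str5 Zipcode
"Ada" "Lovelace" "02139"
end
map_to_payroll, mapping(FName FIRST_NAME LName LAST_NAME Zipcode ZIP) target(FIRST_NAME MIDDLE_NAME LAST_NAME ZIP)
```

Each department then needs only its own -mapping()- list; conversions that differ by department, such as date formats, would still need a few department-specific lines before the call.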

Does this explanation make more sense? I hope it does. Please let me
know if I can explain anything better. Thanks so much.

Answers from all are very much welcome!


-----Original Message-----
From: Martin Weiss [] 
Sent: Wednesday, October 08, 2008 3:12 PM
Cc: Prashant Shukla
Subject: Re: follow up on Stata 

Hello Prashant,

It is very good to hear from you, but you should use the list for any
questions so that the whole community can profit from the responses,
which also get archived and can be dug out by future Stata beginners.

Several commands come to mind, such as -joinby-, -merge-, and -append-,
but also -tempfile-s, in which you can store a dataset temporarily and
work on it later. But in the absence of a concrete example, it is hard to
know what "populating from another dataset" might actually mean. Give me a
hands-on sample of your data, and I will do my best to solve your problem.
----- Original Message ----- 
From: "Prashant Shukla" <>
To: "Martin Weiss" <>
Sent: Wednesday, October 08, 2008 10:30 PM
Subject: follow up on Stata

Hi Martin,

Thanks so much for your answers again. I am writing to you personally
because your answers were the most helpful to me. I think you are doing
a great service to all of us Stata users with all your prolific
suggestions. I hope you don't mind me contacting you this way.

I just wanted to ask you one more question, for now. Now that I have all
the variables generated and formatted correctly, how can I map variables
from other datasets into this new blank one? For instance, our new
dataset has a variable "ZIP" that needs to be populated with "Zipcode"
from another dataset. How would I do that? Of course, I could just rename
the variables in the old dataset and format them the way I want.
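For the ZIP example, one sketch (with made-up contents) is to rename the source variable to its new name and then -append- the records into a blank template of the new layout. -append- matches variables by name, so a field with no source, here a hypothetical PAYROLL_ID, simply stays missing:

```stata
* Hypothetical blank template in the new layout (0 observations)
clear
generate str5  ZIP = ""
generate str10 LAST_NAME = ""
generate PAYROLL_ID = .
tempfile template
save `template', emptyok

* Hypothetical old file that stores the ZIP code as -Zipcode-
clear
input str10 LastName str5 Zipcode
"Lovelace" "02139"
"Turing" "10001"
end
rename Zipcode  ZIP
rename LastName LAST_NAME
tempfile source
save `source'

* Load the template and stack the renamed records into it
use `template', clear
append using `source'
```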

But I want to write programs that build a dataset for me, populating its
variables from other datasets in any format. I have to do this very
often. I am sure you can see how this would be more efficient in the long
run than dealing with every dataset one at a time.

Thank you so much, Martin,



Welcome, and good to see you in the community! On the first question,
you can open as many instances of Stata as your computer allows, or you
can -append- or -merge- datasets. On the second, every time you open an
instance of Stata, "a new dataset" is already there: -generate- as many
variables as you like, -format- them, and -save- the dataset in .dta
format. If you want fast file transfers both into and out of Stata, I
would recommend investing in Stat/Transfer, which works well in
combination with the -stcmd- package (see -ssc describe stcmd-).
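A minimal sketch of building a new dataset from scratch in that way; the variable names and the file name new_payroll_layout.dta are hypothetical:

```stata
* Start from an empty dataset with room for three (blank) records
clear
set obs 3
generate long emp_id = _n
generate str20 LAST_NAME = ""
generate HIRE_DATE = .
* Display HIRE_DATE as a Stata daily date once it is filled in
format HIRE_DATE %td
save new_payroll_layout, replace
```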
BTW, the red columns are string variables. Stata is not particularly
fond of those, so most of the time you want to -encode- or -destring-
them. Afterwards they show up as numeric in the Data Editor and Data
Browser: in blue after -encode- (numeric with value labels), in black
after -destring-. Note that not all strings should be treated this way,
though. If you have a dataset whose 250 rows are the countries of the
world, you should probably leave the "country" variable as a string.
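A small sketch of the difference, with made-up data: -destring- is for numbers stored as text, while -encode- turns categorical text into a numeric variable with value labels:

```stata
* Hypothetical data: an age stored as text and a department name
clear
input str4 age_str str10 dept
"34" "Stats"
"51" "Math"
"29" "Stats"
end

* Numbers stored as text: -destring- creates a plain numeric variable
destring age_str, generate(age)

* Categories stored as text: -encode- creates a labeled numeric variable;
* codes follow the sorted order of the strings (Math = 1, Stats = 2)
encode dept, generate(dept_code)
```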


© Copyright 1996–2023 StataCorp LLC