
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: reshape long is changing j values


From   Tiffany Shih <tmshih@gmail.com>
To   statalist@hsphsun2.harvard.edu, Nick Cox <n.j.cox@durham.ac.uk>
Subject   Re: st: RE: reshape long is changing j values
Date   Fri, 27 Jan 2012 14:05:34 -0500

Thanks so much, Nick! That worked brilliantly, and I need the dates as strings eventually anyway, so it's perfect. I doubt I would have figured that out on my own!

Thanks again!

Tiffany  
-----------------

Tiffany Shih
Ph.D. Candidate
tmshih@gmail.com
tshih@berkeley.edu
Agriculture and Resource Economics
http://are.berkeley.edu
UC Berkeley




On Jan 27, 2012, at 1:58 PM, Nick Cox wrote:

> What is biting you here is that your date variable contains very large integers which are being put into a -float- variable. That doesn't have enough precision to hold every distinct value, so your dates are being mangled. 
> 
> We can see this directly 
> 
> . gen foo = 20080111
> 
> . di foo[1]
> 20080112
> 
> The default default [repetition intended] type for a new numeric variable is -float-, but for values of the order of 20 million only even integers can be held exactly; odd integers are approximated, as you have observed. 
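> 
> The rounding described above can be checked with a short do-file. This is only a sketch: the variable names foo, bar and baz are illustrative, and it relies on a -float- having a 24-bit significand, so that integers above 2^24 = 16777216 are spaced 2 apart.
> 
> ```stata
> * integers above 2^24 cannot all be held exactly in a float
> display 2^24
> 
> * float() rounds its argument to float precision
> * 20080111 becomes 20080112 and 20080101 becomes 20080100
> display %12.0f float(20080111)
> display %12.0f float(20080101)
> 
> * the same value stored as float, long and double
> clear
> set obs 1
> generate foo = 20080111
> generate long bar = 20080111
> generate double baz = 20080111
> format foo bar baz %12.0f
> list
> ```
> 
> Only foo, created with the default -float- type, should show the mangled value; -long- and -double- both hold an 8-digit integer exactly.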
> 
> One way to fix this is 
> 
> reshape long maxtempF, i(store_city_id threshold) j(date) string 
> destring date, replace 
> 
> Notes:
> 
> 1. The above insists on mapping the dates to a string variable, after which -destring- is smart enough to choose a numeric storage type that holds every date exactly, so nothing gets mangled. 
> 
> 2. You don't need to specify values in the -j()- option. 
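> 
> A self-contained way to try the fix is sketched below. The data values are made up purely for illustration (two stores, three days); only the variable names follow the original post.
> 
> ```stata
> * toy wide data: hypothetical values, names as in the original post
> clear
> input store_city_id threshold maxtempF20080101 maxtempF20080102 maxtempF20080103
> 1 90 55 57 60
> 2 90 61 63 59
> end
> 
> * keep the j values as strings so they never pass through a float
> reshape long maxtempF, i(store_city_id threshold) j(date) string
> 
> * convert back to numeric and check the storage type that was chosen
> destring date, replace
> describe date
> format date %12.0f
> list, sepby(store_city_id)
> ```
> 
> If the dates are wanted as strings in the end, the -destring- step can simply be skipped.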
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Tiffany Shih
> 
> I am trying to reshape wide weather data to long format and while the reshape command completes, the resulting long form data are incorrect.
> 
> In wide form, my variables are
> 
> . de
> 
> Contains data from tmpminmaxtempF.dta
> obs:            10                          
> vars:            14                          27 Jan 2012 09:16
> size:           240 (99.9% of memory free)
> 
> -------------------------------------------------------------------------------
>             storage  display     value
> variable name   type   format      label      variable label
> -------------------------------------------------------------------------------
> maxtem~20080101 byte   %10.0g                 20080101 maxtempF
> maxtem~20080102 byte   %10.0g                 20080102 maxtempF
> maxtem~20080103 byte   %10.0g                 20080103 maxtempF
> maxtem~20080104 byte   %10.0g                 20080104 maxtempF
> maxtem~20080105 byte   %10.0g                 20080105 maxtempF
> maxtem~20080106 byte   %10.0g                 20080106 maxtempF
> maxtem~20080107 byte   %10.0g                 20080107 maxtempF
> maxtem~20080108 byte   %10.0g                 20080108 maxtempF
> maxtem~20080109 byte   %10.0g                 20080109 maxtempF
> maxtem~20080110 byte   %10.0g                 20080110 maxtempF
> maxtem~20080111 byte   %10.0g                 20080111 maxtempF
> maxtem~20080112 byte   %10.0g                 20080112 maxtempF
> store_city_id   float  %9.0g                  
> threshold       float  %9.0g                  
> -------------------------------------------------------------------------------
> 
> 
> The command I am using is:
> 
> "reshape long maxtempF, i(store_city_id threshold) j(date 20080101 20080102 20080103 20080104 20080105 20080106 20080107 20080108 20080109 20080110 20080111 20080112)"
> 
> The resulting long form data has the correct variable names, but it is missing all the odd values in "date" and seems to have doubled them up into the even values of date. For example, it introduces a new value of date, 20080100, and there is no 20080101. It seems to be substituting in the value of date from maxtempF20080101 into the long form data for 20080100. In addition, there is no value "date" for 20080103, 20080105, 20080107, etc., and instead there are two entries for each of 20080102, 20080104,... for each store_city_id with values in maxtempF that should be in the corresponding odd numbers.
> 
> If I repeat the same command but leave out the j values, the reshape only reshapes the even values and treats the odd values like i variables. Same problem if I turn store_city_id and threshold into one variable.
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



