
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Converting count to dichotomous variable


From   Eric Booth <[email protected]>
To   [email protected]
Subject   Re: st: Converting count to dichotomous variable
Date   Sat, 10 Mar 2012 19:11:42 -0600

<>
Hi Aminu:

"Nothing works" doesn't help us help you. What, exactly, didn't work from my example? (What code did you run? What error did you get?)

The code example I provided in the previous email _does_ work on the fake data I generated in the example. Also, it _does_ plot a line-graph of monthly mortalities by pond.

However, the way you describe your dataset in your last email does not match how you initially described it. So the most obvious answer is that you need to reconcile the variable names between your two emails and adapt my example accordingly (e.g., in your initial email the daily mortality was "var1", now you say it is "var3"; in your initial email pond was "var3", now you say it is "var5"). [Giving your variables meaningful names is often a good way to avoid this issue: call your variable "pond" instead of "var5" or "var3" or whatever it is now.]

If making these adjustments to your variable names doesn't fix your error, then report exactly what you ran and what error you got.
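
As a rough sketch of that adaptation, assuming the layout in your latest email (var2 = sampling date, var3 = daily mortality, var5 = pond) and assuming var2 is stored as a numeric Stata daily date (if it is a string, convert it with date() first), something like the following should give you the monthly totals. The names "deaths", "pond" and "mt_deaths" are only my suggestions, not names in your data:

***************!
*--sketch only: untested on your data, names are illustrative
rename var3 deaths
rename var5 pond

*--month (1-12) from the daily date in var2
g month = month(var2)

*--monthly total of mortalities for each pond
bys pond month: egen mt_deaths = total(deaths)
lab var mt_deaths "monthly total mortalities for each pond"
***************!

From there, the by-pond -twoway- command in my earlier example should carry over if you use "pond" in place of var3 and "mt_deaths" in place of mt_var1.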

EAB
__
Eric A. Booth
Public Policy Research Institute 
Texas A&M University
[email protected]
+979.845.6754



On Mar 10, 2012, at 6:28 PM, Shittu, Aminu wrote:

> Hi Eric,
> 
> Nothing works, probably it was my fault. Below is the full description of my data set:
> 
> var1 = serial number (1 - 2783 records)
> var2 = date/day of sampling (1st Jan - 31 Dec for each of the 7 ponds)
> var3 = total mortality recorded daily (count)
> var4 = present total number of fish in the pond (count)
> var5 = pond number (1 - 7)
> 
> I am interested in plotting a single line-graph showing a trend of monthly mortality by pond number (var5).
> 
> Hope this one is clear and apologies for my previous vague description.
> 
> Aminu.
> 
> 
> 
> ----- Original Message -----
> From: Eric Booth <[email protected]>
> To: [email protected]
> Cc: 
> Sent: Saturday, March 10, 2012 9:57:40 PM
> Subject: Re: st: Converting count to dichotomous variable
> 
> On Mar 10, 2012, at 3:05 PM, Shittu, Aminu wrote:
>> 
>>   Is it possible to force them in a single graph with 7 lines each for a pond, showing monthly totals?
> 
> 
> 
> 
> Building on the previous example....
> 
> ***************!
> clear
> 
> *--fake data
> set obs 7
> g var3 = _n
> expand 365
> sort var3
> g var2 = int(30+runiform()*100)
> g var1 = int(runiform()*4)
> su var*
> by var3:  g day = _n
> 
> *---create fake month var:
> by var3: g month = int(runiform()*12+1)
> loc i = 1
> foreach m in `c(Mons)' { 
>     loc l `l' `i' `"`m'"' 
>     loc i `++i'
>     } //label.months
> lab def jj `l', modify
> lab val month jj
> ta month //use your real month data for real day counts
> 
> 
> 
> 
> *---graph: monthly totals for each pond (var3)
> 
> *--1. create monthly totals of mortalities(var1)
> bys month var3: egen mt_var1 = total(var1)
> lab var mt_var1 "monthly total mortalities for each pond"
> bys month: egen mt_var2 = total(var1)
> lab var mt_var2 "monthly total mortalities for ALL ponds"
> lab var var3 "Pond"
> 
> 
> *--2. graphs
> 
> **graph macro for options:
> loc opts `"name(g1, replace) xtitle(Month) xlabel(#12, labels labsize(small) angle(forty_five) valuelabel) title(Mortalities Totals each Month) scheme(sj) xsize(10) ysize(6)"' //watch for wrapping!!
> 
> 
> **a. Overall Totals
> twoway (line mt_var2 month,  ///
>      sort  cmissing(y)),  ///
>      subtitle({bf:All Ponds}) `opts' 
>     graph save g1 `"overall.gph"', replace
> 
> **b. by Pond
> *-build graph command:
>     levelsof var3, loc(ponds)
> foreach x in `ponds' {
>     loc p  `" `p'  (line mt_var1 month if var3 == `x') "' 
>     loc leg `" `leg' `x' `"Pond `x'"' "'
>     } //line for each pond
>     di `"`leg'"'
> twoway `p',  legend(on order(`"`leg'"') size(vsmall)) ///
>     subtitle({bf:By Ponds}) `opts'
>     graph save g1 `"ByPond.gph"', replace
> 
> ***************!
> 
> 
> 
> - Eric
> 
> __
> Eric A. Booth
> Public Policy Research Institute 
> Texas A&M University
> [email protected]
> +979.845.6754
> 
> On Mar 10, 2012, at 3:05 PM, Shittu, Aminu wrote:
> 
>>   Hi Eric and all,
>> 
>> Thank you very much, it works perfectly!
>> 
>> May I further ask how I could plot a single line graph showing the daily or monthly mortality totals in each of the 7 ponds? My hurdle is that mortality was recorded every day for 365 days (365*7), say from 1st - 31st of January ... 1st - 31st of December, for each of the 7 ponds. If I am to work with the monthly record, I will have 30*12*7 or 31*12*7 rows, except for February, which has 28 days in this case. Is it possible to force them into a single graph with 7 lines, one for each pond, showing monthly totals?
>> 
>> Aminu.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Eric Booth <[email protected]>
>> To: [email protected] 
>> Sent: Friday, March 9, 2012 10:23:31 PM
>> Subject: Re: st: Converting count to dichotomous variable
>> 
>> 
>> 
>> <>
>> 
>> Sorry, your mortalities were in var1, not var2, so the line:
>> 
>> bys day: egen var4_b = max(var2)
>> 
>> should have said:
>> 
>> bys day: egen var4_b = max(var1)
>> 
>> but the main point is the same.
>> 
>> EAB
>> 
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> Office: +979.845.6754
>> 
>> 
>> 
>> 
>> On Mar 9, 2012, at 3:57 PM, Eric Booth wrote:
>> 
>> <>
>> 
>> One strategy is to take the max number of mortalities (via -egen-) across all obs. in each day (or each pond/day -- I'm not sure which you need) and then recode anything greater than zero as a "1" indicating that mortalities occurred (leaving it as zero otherwise).  Here is a quick example:
>> 
>> 
>> ***************!
>> clear
>> 
>> *--fake data
>> set obs 7
>> g var3 = _n
>> expand 365
>> sort var3
>> g var2 = int(30+runiform()*100)
>> g var1 = int(runiform()*4)
>> su var*
>> 
>> 
>> 
>> **var4**
>> *--daily mortality
>> by var3:  g day = _n
>> 
>> *---1. for each pond/day
>> g var4_a = 1 if var1 != 0 //1 = mortalities occurred in that pond/day
>> replace var4_a = 0 if mi(var4_a)
>> 
>> *---2.  for each day only
>> bys day: egen var4_b = max(var2)
>> recode var4_b (1/max = 1) (0=0)
>> 
>> su var4*
>> 
>> 
>> ***************!
>> 
>> - Eric
>> 
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> Office: +979.845.6754
>> 
>> 
>> On Mar 9, 2012, at 3:30 PM, Aminu Shittu wrote:
>> 
>> Dear Statalisters,
>> 
>> I have var1 in my data set, which represents the number of deaths in a fish pond, and var2, which represents the existing number of fish. The daily mortality was recorded in 7 fish ponds (var3) for a period of 1 year (365x7 rows). I am interested in creating a dichotomous var4 to indicate whether a daily mortality had occurred or not, without taking the counts into consideration. Is it possible to do this in Stata or Excel?
>> 
>> Aminu.
>> *
>> *   For searches and help try:
>> *  http://www.stata.com/help.cgi?search
>> *  http://www.stata.com/support/statalist/faq
>> *  http://www.ats.ucla.edu/stat/stata/
>> 



