


st: Date: Wed, 22 Sep 2010 08:37:30 +1000

From	"John Morton" <[email protected]>
To	<[email protected]>
Subject	st: Date: Wed, 22 Sep 2010 08:37:30 +1000
Date	Tue, 21 Sep 2010 18:37:34 -0400 (EDT)

Hi,

I am seeking advice on analysis of a time series dataset in Stata. The same
site was visited irregularly 30 times over 3 years (median interval between
visits 35 days, range 18 to 68 days). At each visit, usually 5 tadpoles (but
sometimes 6 or 9) were sampled (numbers were limited because this is an
endangered species). Different tadpoles were sampled at each visit. Each
tadpole was tested and categorised as test positive or test negative.
Apparent prevalences were 1.00 at about half of the visits and 0.00 at about
25% of visits. 

The researcher's question is whether prevalence varies by month (i.e. Jan,
Feb, Mar etc.) or by season.

The features of these data that seem important are that the errors would be
expected to be serially correlated over time, the dependent variable is
binary, prevalences of 0 and 1 were common, very few tadpoles were sampled
at each visit, and these are not panel data (i.e. different tadpoles were
sampled at each visit).

I have done some exploratory modelling treating prevalence as a continuous
dependent variable (using -regress-) after declaring the data to be
time-series data (with sequential visit number rather than day number as the
time variable, using -tsset-). With a null model, tests for serial
correlation (the Durbin-Watson test (-estat dwatson-), Durbin's alternative
(h) test (-estat durbinalt-), the Breusch-Godfrey test (-estat bgodfrey,
lag(6)-), the portmanteau (Q) test (-wntestq-) and the autocorrelogram
(-ac-); all from Baum 2006) indicate serial correlation. In contrast, after
fitting month as a fixed effect, these tests do not support rejecting the
null hypothesis of no serial correlation. However, treating prevalence (a
proportion) as a continuous dependent variable (using -regress-) is
inappropriate.
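
For concreteness, the exploratory steps above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the variable
names (visit, month, ntested, npos, prev) are hypothetical, the data are
simulated placeholders rather than the tadpole data, and the portmanteau test
and autocorrelogram are assumed to be applied to the regression residuals.

```stata
* Placeholder data only (NOT the real dataset): 30 visits, 5 tadpoles each
clear
set seed 12345
set obs 30
generate visit   = _n                    // sequential visit number
generate month   = mod(_n - 1, 12) + 1   // hypothetical calendar month
generate ntested = 5
generate npos    = rbinomial(ntested, 0.6)
generate double prev = npos / ntested    // apparent prevalence per visit

* Declare time-series data with visit number as the time variable
tsset visit

* Null model, followed by the serial-correlation diagnostics
regress prev
estat dwatson
estat durbinalt
estat bgodfrey, lags(6)
predict double res0, residuals
wntestq res0
ac res0

* Month fitted as a fixed effect, then the same diagnostics
regress prev i.month
estat dwatson
estat durbinalt
estat bgodfrey, lags(6)
predict double res1, residuals
wntestq res1
ac res1
```

With real data, the simulation block would be replaced by loading the
visit-level dataset; the modelling and diagnostic commands would stay the
same.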

Any suggestions on approaches to answer the research question would be much
appreciated.

Many thanks for any help.

John

***************************************************************
Dr John Morton BVSc (Hons) PhD MACVSc (Veterinary Epidemiology)
Veterinary Epidemiological Consultant
Jemora Pty Ltd
PO Box 2277
Geelong 3220
Victoria Australia
Ph:  +61 (0)3 52 982 082
Mob: 0407 092 558
Email: [email protected]
***************************************************************


