Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: comparing xtdes-like patterns for variables
Date: Thu, 1 Nov 2012 13:30:31 +0000

I've done a quick hack of a program to show where the missings lie. Its effectiveness in showing structure seems likely to diminish as the dataset grows.

Example:

    sysuse nlsw88
    missingplot

    *! 1.0.0 NJC 1 November 2012
    program missingplot
        version 8.2
        // all: plot every variable, not only those with missing values
        // varnames: label rows with variable names, not variable labels
        syntax [varlist] [if] [in] [ , all varnames * ]

        quietly {
            // novarlist: do not drop observations with missing values
            marksample touse, novarlist
            count if `touse'
            if r(N) == 0 error 2000

            local y = 0
            tempvar obsno
            gen long `obsno' = _n if `touse'
            label variable `obsno' "observations"
            local toomany = 0

            foreach v of local varlist {
                // by default, skip variables with no missing values
                local include = 1
                if "`all'" == "" {
                    count if `touse' & missing(`v')
                    if r(N) == 0 local include = 0
                }

                if `include' {
                    local ++y
                    // plot at most 20 variables
                    if `y' > 20 {
                        local toomany = 1
                        continue, break
                    }
                    // marker at height y wherever v is missing
                    tempvar ynew
                    gen `ynew' = `y' if missing(`v')
                    if "`varnames'" != "" {
                        local which "`v'"
                    }
                    else {
                        local which : var label `v'
                        if `"`which'"' == "" local which "`v'"
                    }
                    // accumulate y-axis labels and the variables to plot
                    local call `call' `y' `"`which'"'
                    local Y `Y' `ynew'
                }
            }
        }

        if "`Y'" == "" {
            di as txt "nothing to plot!"
            exit 0
        }

        if `toomany' {
            di as txt "note: only first 20 variables plotted"
        }

        // any other options are passed through to -scatter-
        scatter `Y' `obsno' if `touse', ///
            yla(`call', ang(h) noticks) ytitle("") ///
            legend(off) mcolor(blue ..) ms(dh ..) `options'
    end

On Thu, Nov 1, 2012 at 12:46 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Sorry for the previous premature send.
>
> If you had several variables, you could try something like this:
>
>     local y = 0
>     gen long obsno = _n
>
>     qui foreach v of var <whatever> {
>         local ++y
>         gen y`y' = `y' if missing(`v')
>         local which : var label `v'
>         if "`which'" == "" local which "`v'"
>         local call `call' `y' "`which'"
>         local Y `Y' y`y'
>     }
>
>     scatter `Y' obsno, ms(dh ..) yla(`call', ang(h) noticks) legend(off)
>
>>
>> On Thu, Nov 1, 2012 at 1:10 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> You could create variables like
>>>
>>>     gen yxmiss = missing(y) - missing(x)
>>>     gen long obs = _n
>>>
>>>     scatter yxmiss obs if missing(y, x)
>>>
>>> On Wed, Oct 31, 2012 at 7:39 PM, László Sándor <sandorl@gmail.com> wrote:
>>>> Thanks, Nick.
>>>>
>>>> The values definitely don't line up that neatly, but that's a worry
>>>> for another day.
>>>>
>>>> Basically my problem is this: if I know I can expect differences between
>>>> the variables, is there a neat way to compare their missing-value patterns
>>>> (one always starting early, or one mistakenly having the years in
>>>> reverse order)?
>>>>
>>>> On Wed, Oct 31, 2012 at 3:15 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> If # different versions of the same data should be the same, there
>>>>> will be # duplicates of everything in a combined dataset.
>>>>>
>>>>> This applies to missings too.
>>>>>
>>>>> -duplicates- is therefore something that springs to mind. Panels are
>>>>> no problem, as panel identifiers are just other variables.
>>>>>
>>>>> Naturally, if the combined dataset is extremely large, this won't be
>>>>> very practical.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Wed, Oct 31, 2012 at 7:02 PM, László Sándor <sandorl@gmail.com> wrote:
>>>>>
>>>>>> I have a panel-data cleaning problem that probably has a neat
>>>>>> solution already out there. I am happy to try any solutions
>>>>>> for Stata 12.1 MP.
>>>>>>
>>>>>> Background: I had to look up supposedly the same data from
>>>>>> multiple sources. (Financial data for the same securities, but
>>>>>> different data sources were expected to cover different subsets of my
>>>>>> universe, or different time periods.)
>>>>>>
>>>>>> But now I have a panel where I would like to cross-check different
>>>>>> versions of the same data, and most crucially, I would like to verify
>>>>>> that I got the years right for each version. (FYI: financial data
>>>>>> sources can be opaque about how they handle missing data if you ask
>>>>>> for "end-of-year prices for the last 15 calendar years", and about
>>>>>> whether they give years in ascending or descending order.) For this,
>>>>>> I would like to compare the periods for which a family of variables,
>>>>>> say bloomberg_price and reuters_price, have non-missing values.
>>>>>>
>>>>>> Presumably, if I got the start and end years right, I could
>>>>>> -compare- those (e.g. -compare *_price_first-) and hope that the
>>>>>> patterns will be clear.
>>>>>>
>>>>>> That said, I'm afraid some more nuanced analysis of missing-value
>>>>>> patterns might be justified. What are good tools for that? (How can I
>>>>>> "xtdes by variable"? Or "misstable pattern in a panel"?)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
