Re: st: strgroup


From   Neil Shephard <nshephard@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: strgroup
Date   Tue, 22 Feb 2011 12:18:54 +0000

On Tue, Feb 22, 2011 at 11:58 AM, Emily Farchy
<emily.farchy@sant.ox.ac.uk> wrote:
> Dear All,
>
> This is a long shot, but does anyone know if it's possible to input data into Stata directly from PDFs? In the following, for example, data on district-level population is embedded in many different PDFs (e.g. Table 5), with one PDF for each district. A long shot I know, but I thought it worth throwing it out there.

Stata can't read PDFs (well, what I really mean is that there is no
way that I'm aware of to easily read such information).

My approach (on a GNU/Linux system) would be to use a tool like
pdf2html[1] to convert the PDFs en masse to HTML and then use some
combination of regular expression searching (either using sed/awk/grep
or doing it all in Python/Perl) to pull out the data.
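
As a rough illustration of the second step, here is a minimal Python
sketch. It assumes (and this is only an assumption, since I haven't
seen the converted files) that the tables survive conversion as
ordinary <tr>/<td> markup; the function names are made up for the
example. If the converter emits something else, the regular
expressions would need adapting.

```python
import csv
import html
import re

# Assumed structure: table rows as <tr>...</tr> containing <td>/<th> cells.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_rows(html_text):
    """Return a list of rows, each a list of cleaned cell strings."""
    rows = []
    for row_match in ROW_RE.finditer(html_text):
        cells = [
            # Drop nested tags, decode entities such as &amp;, trim whitespace.
            html.unescape(TAG_RE.sub("", cell)).strip()
            for cell in CELL_RE.findall(row_match.group(1))
        ]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows, handle):
    """Write the rows as CSV to an open text file handle."""
    csv.writer(handle).writerows(rows)
```

Running extract_rows() over each converted file and writing the result
with write_csv() would give one CSV file per district, which Stata can
then read with -insheet-.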

Whether this is worth the effort would depend on how many files there
are and your familiarity with such tools.

Neil

[1] http://atrey.karlin.mff.cuni.cz/~clock/twibright/pdf2html/

-- 
“Truth in science can be defined as the working hypothesis best suited
to open the way to the next better one.” - Konrad Lorenz

Email - nshephard@gmail.com
Website - http://kimura.no-ip.org/
Photos - http://www.flickr.com/photos/slackline/
