Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Loading long string variables (from SQL) into Stata

From	"Dimitriy V. Masterov" <[email protected]>
To	Statalist <[email protected]>
Subject	Re: st: Loading long string variables (from SQL) into Stata
Date	Fri, 30 Nov 2012 09:30:35 -0800

Jen,

You can split the long variable into substring using SQL. The function
may have a slightly different name depending what dialect of SQL you
have. For example:

odbc load, exec("select substr(longstringvar,1,240) as s1,
substr(longstringvar,241,480) as s2 from dbname.tablename") clear
dsn("dsn");

If you have lots of these, I don't know of a way to automate the
process of deciding how many pieces there should be. I would do an
initial query to get the length of the varchar variables in SQL. Then
you can maybe automate something if the need arises.

DVM

On Fri, Nov 30, 2012 at 6:01 AM, Jen Zhen <[email protected]> wrote:
> Coming back to this earlier post:
>
> Does anyone actually know how to split SQL text columns (string
> variables) after each 240 characters and save the resulting columns
> (variables) as a new table (dataset), using SQL browser (or maybe even
> from within Stata, but I guess that's probably not possible)?
>
> Thanks so much,
> JZ
>
>
> On Tue, Sep 4, 2012 at 6:25 PM, Nick Cox <[email protected]> wrote:
>> It's my understanding that
>>
>> 1. -odbc-, just like any Stata import command, expects a one-to-one
>> mapping between what it imports and the Stata variables that result.
>>
>> 2. Thus, as you say, it seems that you need to manipulate your data
>> outside Stata.
>>
>> 3. The only alternative is to read directly into Mata.
>>
>> Note that parts of long strings that are -str244- would not be much
>> use inside Stata any way, except in so far as you could re-combine
>> them in Mata.
>>
>> Nick
>>
>> On Tue, Sep 4, 2012 at 5:15 PM, Jen Zhen <[email protected]> wrote:
>>> The ideal thing would be if it was somehow possible to split the
>>> variables upon loading,
>>> or to load only the first x characters once and load only the second x
>>> characters the second time etc,
>>> but I guess if any splitting can only be done after loading then there
>>> seems to be no way around having to do the adjustment already in
>>> SQL...
>>>
>>>
>>> On Tue, Sep 4, 2012 at 4:53 PM, Nick Cox <[email protected]> wrote:
>>>> -compress- can do nothing to help and not just for the reason you
>>>> mention, that the variables in Stata were truncated on input.
>>>>
>>>> -compress- will only compress variables from one storage type to a
>>>> more compact storage type when that is possible without loss of
>>>> information. But the longer strings could not be variables to start
>>>> with and in any case there would be loss of information.
>>>>
>>>> Nick
>>>>
>>>> On Tue, Sep 4, 2012 at 2:41 PM, tashi lama <[email protected]> wrote:
>>>>> I would also like to find the answer. There is _compress but I am not sure it will be any help since the columns are already truncated before it loads. _compress seems to be helpful once the columns are successfully loaded and you want to save the memory.
>>>>
>>>> By the way, if you are running stata in unix, did you have any problem
>>>> loading sql columns with type varchar? I am having an issue while
>>>> loading columns with type varchar.
>>>>
>>>> From: [email protected]
>>>>
>>>>>> I`m loading some variables from an SQL database into Stata using the 2
>>>>>> lines of code given below.
>>>>>> One of them is a very long string variable that gets cut off when
>>>>>> loaded into Stata. Is there a way to either increase the maximum
>>>>>> length of string variables allowed in Stata, or to automatically split
>>>>>> each string at whatever is the maximum number of symbols and put the
>>>>>> rest into further string variables?
>>>>>> Thanks so much and best regards,
>>>>>> JZ
>>>>>>
>>>>>> local sql "SELECT var1, var2, var3 FROM database WHERE var1>5"
>>>>>> odbc load, exec("`sql'") conn("`db'")
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- Re: st: Loading long string variables (from SQL) into Stata
  - From: Jen Zhen <[email protected]>

Prev by Date: Re: st: Bootstrapping Malmquist Poductivity Index
Next by Date: Re: st: Reshaping long to wide data from complex experimental design
Previous by thread: Re: st: Loading long string variables (from SQL) into Stata
Next by thread: st: Reshaping long to wide data from complex experimental design
Index(es):
- Date
- Thread