Re: st: survey answers imported from google, checkbox type


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: survey answers imported from google, checkbox type
Date   Mon, 22 Apr 2013 08:48:14 +0100

What's wrong with the original data structure? If you want wide, but
something a bit different, it is likely to be easiest to return to
that and then change.

Nick
njcoxstata@gmail.com
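
[Editorial sketch, not part of the original thread.] One way to read the advice above is to go back to the original one-cell-per-person variable and build a 0/1 indicator per answer choice directly, with no reshape at all. The following is a minimal sketch under stated assumptions: the variable name answer and the new names heard1 to heard4 are hypothetical, the example rows are the ones quoted later in this thread, and the choices are assumed to be separated by a comma and a single space.

```stata
* Example data taken from the thread; "answer" is a hypothetical variable name
clear
input id str40 answer
1 "boy girl, boy boy, girl girl"
2 "boy girl, girl girl"
3 "boy boy"
4 "girl girl"
5 "none"
end

* One 0/1 indicator per possible choice (heard1..heard4 are hypothetical names).
* Wrapping both strings in delimiters means a choice only matches as a whole
* item, not as part of a longer one; lower() guards against "None" vs "none".
local i = 0
foreach choice in "boy girl" "boy boy" "girl girl" "none" {
    local ++i
    gen byte heard`i' = strpos(", " + lower(answer) + ",", ", `choice',") > 0
    label variable heard`i' "`choice'"
}

list id heard1-heard4, noobs
```

Because each indicator is tied to a fixed choice rather than to the order in which the choices appear in the cell, every id gets all four columns, including the one for "none". The FAQ and Stata Journal references cited further down discuss this and alternative structures at length.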


On 22 April 2013 05:58, Steven Young <youngazn@gmail.com> wrote:
> Ok I may have figured out how to .... re-arrange them.
>
> I used sort, by var: gen newj = 1 if _n == 1
> replace newj = sum(newj)
>
> Now I run into the problem of, trying to reshape back into Wide, it
> says that newj is not unique within id; there are multiple
> observations at the same newj within id. How can I persuade it to
> ignore that and just reshape it?
>
> I think maybe this solution may not work, because I created a newj
> data value of 4, and so when it tries to remake the columns based on
> newj, it will not know how to properly take that into account.
>
> I guess the same problem exists, because the original id does not have
> the 4th "option" of "none"...
>
> On Sun, Apr 21, 2013 at 6:40 PM, Steven Young <youngazn@gmail.com> wrote:
>> I did read them, and I just used reshape, but it is still somewhat lacking...
>>
>> After I used split, I wrote "gen id = _n" and reshaped based on the
>> stub from the split
>>
>> Now I have:
>>
>> id      _j          stub
>> 1       1           boy girl
>> 1       2           boy boy
>> 1       3           girl girl
>> 2       1           boy girl
>> 2       2           girl girl
>> 2       3
>> 3       1           boy boy
>> 3       2
>> 3       3
>> 4       1           girl girl
>> 4       2
>> 4       3
>>
>> I managed to assign a numerical value through gen and recode to each
>> stub's data, and relabeled it as well.
>>
>> Two last questions:
>>
>> There's an option for "none". That means someone did not pick "boy
>> girl", "boy boy" or "girl girl". As it is, right now that option would
>> show up under _j = 1.
>> id     _j     stub
>> 5      1      none
>> 5      2
>> 5      3
>>
>> What's the best way to take this into consideration/introduce this var
>> so that all id's have that 4th option??
>>
>> How do I now re-sort/reshape this back into wide so that it shows up as:
>>      Var1       Var2      Var3        Var4
>> 1    boy girl   boy boy   girl girl
>> 2    boy girl             girl girl
>> 3               boy boy
>> 4                         girl girl
>> 5                                     None
>>
>> On Sun, Apr 21, 2013 at 5:26 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Force is on your mind. Better to think of persuasion. Specific answers below.
>>>
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 21 April 2013 23:15, Steven Young <youngazn@gmail.com> wrote:
>>>> Thanks Nick for your reply.
>>>>
>>>> I used tabsplit from tab_chi (SSC). However it just lists the
>>>> tabulation. Is there a way to force it to create new variables based
>>>> on the splitting?
>>>
>>> -tabsplit- restructures the dataset temporarily to do what it does.
>>> The tabulation uses -tabulate-, but the original data structure is
>>> restored. That's deliberate. If you want something else, you are free
>>> to clone the program and rewrite it accordingly. But this is the same
>>> question as the next really, so see below.
>>>
>>>> I read through the Stata support, and I liked split. I can use it to
>>>> break the compound strings at the "," (comma).
>>>>
>>>> One thing I'm running into now, is that for instance in the original Var:
>>>> 1    "boy girl, boy boy, girl girl"
>>>> 2    "boy girl, girl girl"
>>>> 3    "boy boy"
>>>> 4    "girl girl"
>>>>
>>>> When using split, it of course makes 3 new vars called Var1, Var2, Var3.
>>>> It also splits the data in the order it sees it.
>>>>
>>>>     Var1       Var2       Var3
>>>> 1  boy girl  boy boy   girl girl
>>>> 2  boy girl  girl girl
>>>> 3  boy boy
>>>> 4  girl girl
>>>>
>>>> Is there a way to force split to appropriately split them so that they
>>>> are under the same var name?
>>>
>>> You want a -stack- or -reshape-. Advice at length was given in the
>>> references I gave earlier, so I have to guess you haven't read them.
>>>
>>> Nick
>>>
>>>> On Thu, Apr 18, 2013 at 1:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> See (for example)
>>>>>
>>>>> -tabsplit- in -tab_chi- (SSC)
>>>>>
>>>>> FAQ     . . . . . . . . . . . . . . . . . . .  Dealing with multiple responses
>>>>>         . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and U. Kohler
>>>>>         4/05    How do I deal with multiple responses?
>>>>>                 http://www.stata.com/support/faqs/data/multresp.html
>>>>>
>>>>> SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>>>         (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
>>>>>         Q1/05   SJ 5(1):92--122
>>>>>         introduces new commands for the computation of one- and
>>>>>         two-way tables of multiple responses
>>>>>
>>>>> SJ-3-1  pr0008   Speaking Stata: On structure & shape: the case of mult. resp.
>>>>>         . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox & U. Kohler
>>>>>         Q1/03   SJ 3(1):81--99                                   (no commands)
>>>>>         discussion of data manipulations for multiple response data
>>>>>
>>>>> Nick
>>>>> njcoxstata@gmail.com
>>>>>
>>>>> On 18 April 2013 08:29, Steven Young <youngazn@gmail.com> wrote:
>>>>>
>>>>>> So I have a survey with answers imported from Google.
>>>>>>
>>>>>> One of the questions asks "Which have you heard of" and lists 4 items below
>>>>>> in a checkbox fashion (tick all that you know).
>>>>>>
>>>>>> Google aggregated the data into one cell, so a person (each row) may answer
>>>>>> "a, b, d", a second may answer "a, b, c" and a third may answer "a, d".
>>>>>> Unfortunately each of these answers is quite long... not as short as a, b,
>>>>>> c, d. I also cannot change how Google "aggregates" this data into one cell.
>>>>>>
>>>>>> Now the issue I have is that when it's imported to Stata, it lists each of
>>>>>> the selected items in one cell, separated by commas.
>>>>>>
>>>>>> How do I go about making a "do" file that will go through this and find out
>>>>>> what each person answered, ie make sub-columns of answer choice a, b, c, d,
>>>>>> and then assigning a value of 1 to each column that the person answered?
>>>>>>
>>>>>> For instance if Joe answered a, b, d, then his answer columns will be 1, 1,
>>>>>> 0, 1.
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/

