
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: survey answers imported from google, checkbox type


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: survey answers imported from google, checkbox type
Date   Mon, 22 Apr 2013 19:19:57 +0100

Thanks for the closure report. For anyone following, the reference is to

FAQ . . . . . . . . . . . . . . . . . . . Dealing with multiple responses
. . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and U. Kohler
4/05 How do I deal with multiple responses?
http://www.stata.com/support/faqs/data/multresp.html


Nick
njcoxstata@gmail.com
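
A minimal sketch of one route to 0/1 indicator variables straight from the comma-separated string, without a round trip through -reshape-. The toy data mirror the example given further down the thread. The names padded, boygirl, boyboy, girlgirl and none are hypothetical, and the approach assumes the answer choices themselves contain no commas.

```stata
* toy data mirroring the thread's example (values are illustrative)
clear
input str40 var
"boy girl, boy boy, girl girl"
"boy girl, girl girl"
"boy boy"
"girl girl"
"none"
end
gen id = _n

* normalise to ",choice,choice," so that every choice is delimited by
* commas and a space after a comma no longer matters
gen padded = "," + subinstr(trim(var), ", ", ",", .) + ","

* one 0/1 indicator per answer choice (variable names are hypothetical)
gen byte boygirl  = strpos(padded, ",boy girl,")  > 0
gen byte boyboy   = strpos(padded, ",boy boy,")   > 0
gen byte girlgirl = strpos(padded, ",girl girl,") > 0
gen byte none     = strpos(padded, ",none,")      > 0

drop padded
list id boygirl boyboy girlgirl none
```

Searching for each choice with its surrounding commas, rather than for the bare text, guards against one choice matching as a substring of another, and stripping the space after each comma deals with the leading-space values in one step.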


On 22 April 2013 18:36, Steven Young <youngazn@gmail.com> wrote:
> I figured out how to format the data into what I want,
> gen id = _n
> split var, p(,)
> drop var
> reshape long var , i(id) string
>
> gen number = 1 if var == "string1"
> replace number = 1 if var == " string1"   // Unfortunately the dataset had
> // values with a leading space, so I had to take this into account
> ...
>
> drop if number == .
>
> drop var _j
>
> *encode numerical value of yes = 1
> gen byte one = 1
>
> reshape wide one, i(id) j(number)
>
> rename one1 string1
> rename one2 string2
> rename one3 string3
> rename one4 string4
>
>
> In this way I am able to shape the data set from split (and all the
> things are split by the comma, but not categorized under the same
> "variable name") and now have var names that correspond to the
> multiple response properly. Especially now I've taken into account the
> values for "none"
> So id1 will now show a 1 under string1 and string3, while id2 will now
> show a 1 under string1 string2 and string3, and they will all
> correspond to what the original data set had.
>
>
> FAQ 3.5 was very helpful from your link, after a very very thorough
> reading. I was tempted to use tabulate number, gen(g) as well, to
> create dummy variables, but saw how 3.5 was a little more efficient,
> especially when I dropped all of the missing observations and assigned
> everything a value of 1, and then reshaped.
>
> On Mon, Apr 22, 2013 at 12:48 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> What's wrong with the original data structure? If you want wide, but
>> something a bit different, it is likely to be easiest to return to
>> that and then change.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 22 April 2013 05:58, Steven Young <youngazn@gmail.com> wrote:
>>> Ok I may have figured out how to .... re-arrange them.
>>>
>>> I used by var, sort: gen newj = 1 if _n == 1
>>> replace newj = sum(newj)
>>>
>>> Now I run into the problem of, trying to reshape back into Wide, it
>>> says that newj is not unique within id; there are multiple
>>> observations at the same newj within id. How can I persuade it to
>>> ignore that and just reshape it?
>>>
>>> I think maybe this solution may not work, because I created a newj
>>> data value of 4, and so when it tries to remake the columns based on
>>> newj, it will not know how to properly take that into account.
>>>
>>> I guess the same problem exists, because the original id does not have
>>> the 4th "option" of "none"...
>>>
>>> On Sun, Apr 21, 2013 at 6:40 PM, Steven Young <youngazn@gmail.com> wrote:
>>>> I did read them, and I just used reshape, but it is still somewhat lacking...
>>>>
>>>> After I used split, I wrote "gen id = _n" and reshaped based on the
>>>> stub from the split
>>>>
>>>> Now I have:
>>>>
>>>> id      _j          stub
>>>> 1       1           boy girl
>>>> 1       2           boy boy
>>>> 1       3           girl girl
>>>> 2       1           boy girl
>>>> 2       2           girl girl
>>>> 2       3
>>>> 3       1           boy boy
>>>> 3       2
>>>> 3       3
>>>> 4       1           girl girl
>>>> 4       2
>>>> 4       3
>>>>
>>>> I managed to assign a numerical value through gen and recode to each
>>>> stub's data, and relabeled it as well.
>>>>
>>>> Two last questions:
>>>>
>>>> There's an option for "none". That means someone did not pick "boy
>>>> girl", "boy boy" or "girl girl". As it is, right now that option would
>>>> show up under _j = 1.
>>>> id     _j     stub
>>>> 5      1      none
>>>> 5      2
>>>> 5      3
>>>>
>>>> What's the best way to take this into consideration/introduce this var
>>>> so that all id's have that 4th option??
>>>>
>>>> How do I now re-sort/reshape this back into wide so that it shows up as:
>>>>       Var1        Var2        Var3        Var4
>>>> 1     boy girl    boy boy     girl girl
>>>> 2     boy girl                girl girl
>>>> 3                 boy boy
>>>> 4                             girl girl
>>>> 5                                         None
>>>>
>>>> On Sun, Apr 21, 2013 at 5:26 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> Force is on your mind. Better to think of persuasion. Specific answers below.
>>>>>
>>>>> Nick
>>>>> njcoxstata@gmail.com
>>>>>
>>>>>
>>>>> On 21 April 2013 23:15, Steven Young <youngazn@gmail.com> wrote:
>>>>>> Thanks Nick for your reply.
>>>>>>
>>>>>> I used tabsplit from tab_chi (SSC). However it just lists the
>>>>>> tabulation. Is there a way to force it to create new variables based
>>>>>> on the splitting?
>>>>>
>>>>> -tabsplit- restructures the dataset temporarily to do what it does.
>>>>> The tabulation uses -tabulate-, but the original data structure is
>>>>> restored. That's deliberate. If you want something else, you are free
>>>>> to clone the program and rewrite it accordingly. But this is the same
>>>>> question as the next really, so see below.
>>>>>
>>>>>> I read through the Stata support, and I liked split. I can use it to
>>>>>> break the compound strings at the "," (comma).
>>>>>>
>>>>>> One thing I'm running into now, is that for instance in the original Var:
>>>>>> 1    "boy girl, boy boy, girl girl"
>>>>>> 2    "boy girl, girl girl"
>>>>>> 3    "boy boy"
>>>>>> 4    "girl girl"
>>>>>>
>>>>>> When using split, it of course makes 3 new vars called Var1, Var2, Var3.
>>>>>> It also splits the data in the order it sees it.
>>>>>>
>>>>>>     Var1       Var2       Var3
>>>>>> 1  boy girl  boy boy   girl girl
>>>>>> 2  boy girl  girl girl
>>>>>> 3  boy boy
>>>>>> 4  girl girl
>>>>>>
>>>>>> Is there a way to force split to appropriately split them so that they
>>>>>> are under the same var name?
>>>>>
>>>>> You want a -stack- or -reshape-. Advice at length was given in the
>>>>> references I gave earlier, so I have to guess you haven't read them.
>>>>>
>>>>> Nick
>>>>>
>>>>>> On Thu, Apr 18, 2013 at 1:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>> See (for example)
>>>>>>>
>>>>>>> -tabsplit- in -tab_chi- (SSC)
>>>>>>>
>>>>>>> FAQ     . . . . . . . . . . . . . . . . . . .  Dealing with multiple responses
>>>>>>>         . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and U. Kohler
>>>>>>>         4/05    How do I deal with multiple responses?
>>>>>>>                 http://www.stata.com/support/faqs/data/multresp.html
>>>>>>>
>>>>>>> SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>>>>>         (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
>>>>>>>         Q1/05   SJ 5(1):92--122
>>>>>>>         introduces new commands for the computation of one- and
>>>>>>>         two-way tables of multiple responses
>>>>>>>
>>>>>>> SJ-3-1  pr0008   Speaking Stata: On structure & shape: the case of mult. resp.
>>>>>>>         . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox & U. Kohler
>>>>>>>         Q1/03   SJ 3(1):81--99                                   (no commands)
>>>>>>>         discussion of data manipulations for multiple response data
>>>>>
>>>>>>> Nick
>>>>>>> njcoxstata@gmail.com
>>>>>>>
>>>>>>> On 18 April 2013 08:29, Steven Young <youngazn@gmail.com> wrote:
>>>>>>>
>>>>>>>> So I have a survey with answers imported from Google.
>>>>>>>>
>>>>>>>> One of the questions asks "Which have you heard of" and lists 4 items below
>>>>>>>> in a checkbox fashion (tick all that you know).
>>>>>>>>
>>>>>>>> Google aggregated the data into one cell, so a person (each row) may answer
>>>>>>>> "a, b, d", a second may answer "a, b, c" and a third may answer "a, d".
>>>>>>>> Unfortunately each of these answers is quite long... not as short as a, b,
>>>>>>>> c, d. I also cannot change how Google "aggregates" this data into one cell.
>>>>>>>>
>>>>>>>> Now the issue I have is that when it's imported to Stata, it will list in
>>>>>>>> one cell, each of the selected items, separated by comma.
>>>>>>>>
>>>>>>>> How do I go about making a "do" file that will go through this and find out
>>>>>>>> what each person answered, ie make sub-columns of answer choice a, b, c, d,
>>>>>>>> and then assigning a value of 1 to each column that the person answered?
>>>>>>>>
>>>>>>>> For instance if Joe answered a, b, d, then his answer columns will be 1, 1,
>>>>>>>> 0, 1.
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index