



Re: st: survey answers imported from google, checkbox type


From   Steven Young <youngazn@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: survey answers imported from google, checkbox type
Date   Mon, 22 Apr 2013 10:36:14 -0700

I figured out how to format the data into what I want:
gen id = _n
split var, p(,)
drop var
reshape long var , i(id) string

gen number = 1 if var == "string1"
* Unfortunately the dataset had values with a leading space, so I had
* to take this into account
replace number = 1 if var == " string1"
...

drop if number == .

drop var _j

*encode numerical value of yes = 1
gen byte one = 1

reshape wide one, i(id) j(number)

rename one1 string1
rename one2 string2
rename one3 string3
rename one4 string4


In this way I am able to reshape the data set produced by split (where
everything is split at the commas, but the same answer does not end up
under the same variable name), and I now have variable names that
correspond properly to the multiple responses. In particular, I have
now taken the values for "none" into account.
So id 1 will now show a 1 under string1 and string3, while id 2 will
show a 1 under string1, string2 and string3, and they all correspond
to what the original data set had.
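
For anyone who wants to try the steps end to end, here is a minimal
sketch on the small "boy girl" example quoted further down the thread.
It is an illustration under assumptions, not the exact code I ran on
the survey: the four category strings and the new variable names are
taken from that toy example, trim() replaces the separate
leading-space comparisons, and the final mvencode line (turning
missing into 0) is an extra step that is not in the code above.

```stata
* Toy data standing in for the Google export (one cell per respondent)
clear
input str40 var
"boy girl, boy boy, girl girl"
"boy girl, girl girl"
"boy boy"
"girl girl"
"none"
end

gen id = _n
* split at the commas: creates var1 var2 var3
split var, parse(,)
drop var
* one row per respondent and answer slot; the slot index lands in _j
reshape long var, i(id)
* pieces after a comma keep a leading space, so remove it
replace var = trim(var)
* drop the unused slots
drop if var == ""

* give each answer a fixed category number
gen byte number = .
replace number = 1 if var == "boy girl"
replace number = 2 if var == "boy boy"
replace number = 3 if var == "girl girl"
replace number = 4 if var == "none"

drop var _j
gen byte one = 1
reshape wide one, i(id) j(number)

* assumption: 0 is wanted for "not ticked" instead of missing
mvencode one1-one4, mv(0)

rename one1 boy_girl
rename one2 boy_boy
rename one3 girl_girl
rename one4 none
list, noobs
```

Because number is fixed per answer rather than per position in the
cell, every respondent ends up with the same four columns, whatever
order the answers were ticked in.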


FAQ 3.5 from your link was very helpful, after a very thorough
reading. I was tempted to use tabulate number, gen(g) as well, to
create dummy variables, but saw that 3.5 was a little more efficient,
especially once I dropped all of the missing observations, assigned
everything a value of 1, and then reshaped.
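
For comparison, the tabulate route might look roughly like the sketch
below. It assumes the data are already in the long layout, with one
row per id and chosen answer and the answer coded 1 to 4 in number;
the toy values and the collapse step are my assumptions, not something
taken from the FAQ.

```stata
* Toy long data: one row per respondent (id) and ticked answer (number)
clear
input id number
1 1
1 2
1 3
2 1
2 3
3 2
4 3
5 4
end

* one 0/1 dummy per category: g1 g2 g3 g4
tabulate number, generate(g)
* keep one row per respondent; max is 1 if the answer was ticked at all
collapse (max) g1 g2 g3 g4, by(id)
list, noobs
```

This gives 0 rather than missing for answers that were not ticked, at
the cost of an extra collapse instead of a second reshape.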

On Mon, Apr 22, 2013 at 12:48 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> What's wrong with the original data structure? If you want wide, but
> something a bit different, it is likely to be easiest to return to
> that and then change.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 22 April 2013 05:58, Steven Young <youngazn@gmail.com> wrote:
>> Ok I may have figured out how to .... re-arrange them.
>>
>> I used by var, sort: gen newj = 1 if _n == 1
>> replace newj = sum(newj)
>>
>> Now I run into the problem of, trying to reshape back into Wide, it
>> says that newj is not unique within id; there are multiple
>> observations at the same newj within id. How can I persuade it to
>> ignore that and just reshape it?
>>
>> I think maybe this solution may not work, because I created a newj
>> data value of 4, and so when it tries to remake the columns based on
>> newj, it will not know how to properly take that into account.
>>
>> I guess the same problem exists, because the original id does not have
>> the 4th "option" of "none"...
>>
>> On Sun, Apr 21, 2013 at 6:40 PM, Steven Young <youngazn@gmail.com> wrote:
>>> I did read them, and I just used reshape, but it is still somewhat lacking...
>>>
>>> After I used split, I wrote "gen id = _n" and reshaped based on the
>>> stub from the split
>>>
>>> Now I have:
>>>
>>> id      _j          stub
>>> 1       1           boy girl
>>> 1       2           boy boy
>>> 1       3           girl girl
>>> 2       1           boy girl
>>> 2       2           girl girl
>>> 2       3
>>> 3       1           boy boy
>>> 3       2
>>> 3       3
>>> 4       1           girl girl
>>> 4       2
>>> 4       3
>>>
>>> I managed to assign a numerical value through gen and recode to each
>>> stub's data, and relabeled it as well.
>>>
>>> Two last questions:
>>>
>>> There's an option for "none". That means someone did not pick "boy
>>> girl", "boy boy" or "girl girl". As it is, right now that option would
>>> show up under _j = 1.
>>> id     _j     stub
>>> 5      1      none
>>> 5      2
>>> 5      3
>>>
>>> What's the best way to take this into consideration/introduce this var
>>> so that all id's have that 4th option??
>>>
>>> How do I now re-sort/reshape this back into wide so that it shows up as:
>>>       Var1      Var2     Var3       Var4
>>> 1     boy girl  boy boy  girl girl
>>> 2     boy girl           girl girl
>>> 3               boy boy
>>> 4                        girl girl
>>> 5                                   None
>>>
>>> On Sun, Apr 21, 2013 at 5:26 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Force is on your mind. Better to think of persuasion. Specific answers below.
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 21 April 2013 23:15, Steven Young <youngazn@gmail.com> wrote:
>>>>> Thanks Nick for your reply.
>>>>>
>>>>> I used tabsplit from tab_chi (SSC). However it just lists the
>>>>> tabulation. Is there a way to force it to create new variables based
>>>>> on the splitting?
>>>>
>>>> -tabsplit- restructures the dataset temporarily to do what it does.
>>>> The tabulation uses -tabulate-, but the original data structure is
>>>> restored. That's deliberate. If you want something else, you are free
>>>> to clone the program and rewrite it accordingly. But this is the same
>>>> question as the next really, so see below.
>>>>
>>>>> I read through the Stata support, and I liked split. I can use it to
>>>>> break the compound strings at the "," (comma).
>>>>>
>>>>> One thing I'm running into now, is that for instance in the original Var:
>>>>> 1    "boy girl, boy boy, girl girl"
>>>>> 2    "boy girl, girl girl"
>>>>> 3    "boy boy"
>>>>> 4    "girl girl"
>>>>>
>>>>> When using split, it of course makes 3 new vars called Var1, Var2, Var3.
>>>>> It also splits the data in the order it sees it.
>>>>>
>>>>>     Var1       Var2       Var3
>>>>> 1  boy girl  boy boy   girl girl
>>>>> 2  boy girl  girl girl
>>>>> 3  boy boy
>>>>> 4  girl girl
>>>>>
>>>>> Is there a way to force split to appropriately split them so that they
>>>>> are under the same var name?
>>>>
>>>> You want a -stack- or -reshape-. Advice at length was given in the
>>>> references I gave earlier, so I have to guess you haven't read them.
>>>>
>>>> Nick
>>>>
>>>>> On Thu, Apr 18, 2013 at 1:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> See (for example)
>>>>>>
>>>>>> -tabsplit- in -tab_chi- (SSC)
>>>>>>
>>>>>> FAQ     . . . . . . . . . . . . . . . . . . .  Dealing with multiple responses
>>>>>>         . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and U. Kohler
>>>>>>         4/05    How do I deal with multiple responses?
>>>>>>                 http://www.stata.com/support/faqs/data/multresp.html
>>>>>>
>>>>>> SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>>>>         (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
>>>>>>         Q1/05   SJ 5(1):92--122
>>>>>>         introduces new commands for the computation of one- and
>>>>>>         two-way tables of multiple responses
>>>>>>
>>>>>> SJ-3-1  pr0008   Speaking Stata: On structure & shape: the case of mult. resp.
>>>>>>         . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox & U. Kohler
>>>>>>         Q1/03   SJ 3(1):81--99                                   (no commands)
>>>>>>         discussion of data manipulations for multiple response data
>>>>
>>>>>> Nick
>>>>>> njcoxstata@gmail.com
>>>>>>
>>>>>> On 18 April 2013 08:29, Steven Young <youngazn@gmail.com> wrote:
>>>>>>
>>>>>>> So I have a survey with answers imported from Google.
>>>>>>>
>>>>>>> One of the questions asks "Which have you heard of" and lists 4 items below
>>>>>>> in a checkbox fashion (tick all that you know).
>>>>>>>
>>>>>>> Google aggregated the data into one cell, so a person (each row) may answer
>>>>>>> "a, b, d", a second may answer "a, b, c" and a third may answer "a, d".
>>>>>>> Unfortunately each of these answers are quite long... not as short as a, b,
>>>>>>> c, d. I also cannot change how Google "aggregates" this data into one cell.
>>>>>>>
>>>>>>> Now the issue I have is that when it's imported to stata, it will list in
>>>>>>> one cell, each of the selected items, separated by comma.
>>>>>>>
>>>>>>> How do I go about making a "do" file that will go through this and find out
>>>>>>> what each person answered, ie make sub-columns of answer choice a, b, c, d,
>>>>>>> and then assigning a value of 1 to each column that the person answered?
>>>>>>>
>>>>>>> For instance if Joe answered a, b, d, then his answer columns will be 1, 1,
>>>>>>> 0, 1.
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP