Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: RE: cases to variables with automated timestamp-based unitizing

From: A Loumiotis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: cases to variables with automated timestamp-based unitizing
Date: Tue, 8 Jun 2010 11:02:17 +0300

```
I came up with some code that I think does what you asked:

clear*

inp byte case str2 Rater Start End byte(Var1 Var2)
1 R1 17.54    123.29   4  2
2 R2 18.02    123.76   4  3
3 R1 128.43  171.53   2  1
4 R2 130.13  148.21   2  1
end

list, noo

reshape wide Var?, i(case) j(Rater) string

list, noo

sort Start End

gen start=.
gen end=.
forvalues j=1/`=_N-1' {
    forvalues i=0/`=_N-`j'-1' {
        * compare row _N-`i' with the earlier row `j';
        * both start and end must agree within 5 sec
        replace start=Start[`j'] in `=_N-`i'' if ///
            abs(Start[`=_N-`i'']-Start[`j'])<5 & abs(End[`=_N-`i'']-End[`j'])<5
        replace end=End[`j'] in `=_N-`i'' if ///
            abs(Start[`=_N-`i'']-Start[`j'])<5 & abs(End[`=_N-`i'']-End[`j'])<5
    }
}

replace start=Start if start==.
replace end=End if end==.
* one id per distinct (start, end) pair; a plain sum start+end could collide
egen uniqid=group(start end)
foreach var of varlist Var* {
bysort uniqid: egen byte n_`var'=total(`var')
}
drop Var*
list, noo
duplicates drop uniqid, force
list, noo

Antonis Loumiotis

On Mon, Jun 7, 2010 at 2:13 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> The "simple "cases to variables" procedure" is a -reshape wide- in Stata parlance:
>
>
>
> ***********
> clear*
>
> inp byte case str2 Rater Start End byte(Var1 Var2)
> 1 R1 17.54    123.29   4  2
> 2 R2 18.02    123.76   4  3
> 3 R1 128.43  171.53   2  1
> 4 R2 130.13  148.21   2  1
> end
>
> list, noo
>
> reshape wide Var?, i(case) j(Rater) string
>
> list, noo
> ***********
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of emrinke@web.de
> Sent: Monday, 7 June 2010 00:51
> To: statalist@hsphsun2.harvard.edu
> Subject: st: cases to variables with automated timestamp-based unitizing
>
> Dear all,
>
> I have a data transformation problem and would greatly appreciate any suggestions on how to solve it.
>
> I am analyzing data from a rating task with multiple raters. The ratings concerned audiovisual material, i.e. continuous data which had to be properly segmented (unitized) by the raters. Each segment coded by the raters has a timestamp attached to it in the "start" and "end" variables in which start and end times of the segment are recorded (in sec).
>
> The data look like this (this is a simplified version with only two raters when in actuality there are nine):
>
>
>            Rater    Start     End       Var1    Var2    ...
> case1      R1       17.54     123.29    4       2
> case2      R2       18.02     123.76    4       3
> case3      R1       128.43    171.53    2       1
> case4      R2       130.13    148.21    2       1
> .
> .
> .
>
>
>
> I now intend to run analyses for which the data need to be set up differently. The ratings of the separate judges for all variables should be represented in individual columns, while each row should correspond to a single observed unit. This means that a relatively simple "cases to variables" procedure is in order. However, the issue is complicated by the need to identify the units and match cases accordingly beforehand.
>
> I do not expect start and end times to agree exactly for segments to be considered a unit. Instead, raters should agree within a range of, say, 5 sec for both the start and the end time of the segment. That is, in the above data example, cases 1 and 2 should be counted as one unit and both ratings put into its single row, while for cases 3 and 4 the raters disagree too much on the end time of the segment. Therefore, cases 3 and 4 should each be kept as a separate unit within the dataset.
>
> Consequently, this is what the data should look like in the end:
>
>
>            Start     End       Var1_R1    Var1_R2    Var2_R1    Var2_R2    ...
> case1      17.54     123.29    4          4          2          3
> case2      128.43    171.53    2          .          1          .
> case3      130.13    148.21    .          2          .          1
> .
> .
> .
>
>
> For the intended analyses it does not matter much whether the start and end times of the cases (now "units") equal those set by the first rater (as in the example data matrix) or, more elegantly, the mean of all ratings subsumed under the case/unit.
>
> I am unsure how to go about solving this transformation task in an automated fashion in Stata, so any help is much appreciated.
>
> Eike
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
```
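
For readers who want the layout Eike describes (one row per unit, one column per rater and variable, cells left missing where a rater did not code the unit, unit times set to the mean over raters), the following is a minimal sketch and not the code posted in the thread. The names `unit`, `unit_start`, `unit_end` and the local `tol` are illustrative assumptions. It matches greedily: the earliest unassigned case opens a unit and collects every unassigned case within the tolerance on both start and end. It also assumes each rater contributes at most one case per unit; otherwise `reshape` will stop with an error.

```stata
* Sketch only: unit, unit_start, unit_end and the local tol are illustrative names.
clear
input byte case str2 Rater double(Start End) byte(Var1 Var2)
1 R1  17.54 123.29 4 2
2 R2  18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end

* assumed tolerance in seconds
local tol 5

sort Start End case
gen long unit = .

* Greedy pass: the first unassigned case opens a unit and collects every
* still-unassigned case whose start AND end lie within the tolerance.
forvalues i = 1/`=_N' {
    if missing(unit[`i']) {
        quietly replace unit = `i' if missing(unit) ///
            & abs(Start - Start[`i']) < `tol' ///
            & abs(End - End[`i']) < `tol'
    }
}

* Unit times are the mean over all cases subsumed under the unit.
bysort unit: egen double unit_start = mean(Start)
bysort unit: egen double unit_end   = mean(End)

* One column per variable and rater; raters who did not code a unit stay missing.
keep unit unit_start unit_end Rater Var1 Var2
rename Var1 Var1_
rename Var2 Var2_
reshape wide Var1_ Var2_, i(unit) j(Rater) string
sort unit
list, noobs
```

Because `reshape wide` fills absent rater/unit combinations with missing values, no summing step is needed, and a rater who did not code a unit is not recorded as a zero. With nine raters the greedy anchoring can depend on sort order when segments chain together (A close to B, B close to C, A not close to C), so the grouping rule may need tightening for real data.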