


st: cases to variables with automated timestamp-based unitizing

Subject   st: cases to variables with automated timestamp-based unitizing
Date   Mon, 7 Jun 2010 00:50:37 +0200 (CEST)

Dear all,

I have a data transformation problem and would greatly appreciate any suggestions on how to solve it.

I am analyzing data from a rating task with multiple raters. The material rated was audiovisual, i.e. continuous, so the raters first had to segment (unitize) it themselves. Each segment a rater coded carries a timestamp: the variables "start" and "end" record the start and end times of the segment (in sec).

The data look like this (a simplified version with only two raters; in reality there are nine):

         Rater   Start    End      Var1   Var2   ...
case1    R1      17.54    123.29   4      2
case2    R2      18.02    123.76   4      3
case3    R1      128.43   171.53   2      1
case4    R2      130.13   148.21   2      1

I now intend to do analyses for which the data need to be set up differently. The ratings of the separate judges on each variable should appear in individual columns, while each row should correspond to a single observed unit. In principle this calls for a relatively simple "cases to variables" procedure.

However, the issue is complicated by the need to identify the units first and match the cases accordingly. The raters' start and end times do not have to be exactly the same for their segments to be considered one unit. Instead, I expect agreement between raters within a range of, say, 5 sec for both the start and the end time of the segment.

In the data example above, cases 1 and 2 should therefore be counted as one unit, with both ratings put into a single row. For cases 3 and 4 the raters disagree too much on the end time of the segment, so cases 3 and 4 should each be kept as a separate unit in the dataset.
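To make the matching rule concrete, here is a rough sketch of it in Python rather than Stata. The helper name `same_unit`, the field names, and the exact tolerance of 5 sec are only assumptions for illustration; I have also assumed that two segments from the same rater never form one unit.

```python
# Assumed tolerance in seconds ("say, 5 sec" in the description above).
TOL = 5.0

# The four example cases from the table above (timestamps only).
cases = [
    {"case": 1, "rater": "R1", "start": 17.54, "end": 123.29},
    {"case": 2, "rater": "R2", "start": 18.02, "end": 123.76},
    {"case": 3, "rater": "R1", "start": 128.43, "end": 171.53},
    {"case": 4, "rater": "R2", "start": 130.13, "end": 148.21},
]


def same_unit(a, b, tol=TOL):
    """Hypothetical rule: different raters, and BOTH the start times and
    the end times differ by at most `tol` seconds."""
    return (
        a["rater"] != b["rater"]
        and abs(a["start"] - b["start"]) <= tol
        and abs(a["end"] - b["end"]) <= tol
    )


print(same_unit(cases[0], cases[1]))  # cases 1 and 2
print(same_unit(cases[2], cases[3]))  # cases 3 and 4
```

Cases 1 and 2 differ by less than half a second on both times, so they match; cases 3 and 4 agree on the start time but differ by more than 23 sec on the end time, so they do not.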

Consequently, this is what the data should look like in the end:

         Start    End      Var1_R1   Var1_R2   Var2_R1   Var2_R2   ...
case1    17.54    123.29   4         4         2         3
case2    128.43   171.53   2         .         1         .
case3    130.13   148.21   .         2         .         1

For the intended analyses it does not matter much whether the start and end times of a unit are those set by the first rater (as in the example data matrix above) or, more elegantly, the mean of the times from all ratings subsumed under that unit.

I am unsure how to solve this transformation task in an automated fashion in Stata, so any help is much appreciated.
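In case it helps to pin down what I am after, here is a rough sketch of the whole transformation, written in Python and not in Stata. Everything in it is an assumption on my part: the function names `build_units` and `to_wide` are made up, the 5 sec tolerance is just the example value, and the grouping is a simple greedy rule (each case joins the first unit whose first member is within tolerance on both times and which has no rating from the same rater yet). Unit times are taken as the mean over the ratings in the unit, and `None` stands in for a missing value.

```python
from statistics import mean

TOL = 5.0                 # assumed tolerance in seconds
RATERS = ["R1", "R2"]     # nine raters in the real data
VARS = ["Var1", "Var2"]

rows = [
    {"Rater": "R1", "Start": 17.54, "End": 123.29, "Var1": 4, "Var2": 2},
    {"Rater": "R2", "Start": 18.02, "End": 123.76, "Var1": 4, "Var2": 3},
    {"Rater": "R1", "Start": 128.43, "End": 171.53, "Var1": 2, "Var2": 1},
    {"Rater": "R2", "Start": 130.13, "End": 148.21, "Var1": 2, "Var2": 1},
]


def build_units(rows, tol=TOL):
    """Greedy grouping of cases into units, processed in order of start time."""
    units = []
    for row in sorted(rows, key=lambda r: (r["Start"], r["End"])):
        for unit in units:
            anchor = unit[0]  # first case assigned to this unit
            if (abs(row["Start"] - anchor["Start"]) <= tol
                    and abs(row["End"] - anchor["End"]) <= tol
                    and all(m["Rater"] != row["Rater"] for m in unit)):
                unit.append(row)
                break
        else:
            # No existing unit fits, so this case starts a unit of its own.
            units.append([row])
    return units


def to_wide(units, raters=RATERS, variables=VARS):
    """One record per unit, with one column per variable and rater."""
    wide = []
    for unit in units:
        record = {
            "Start": mean(m["Start"] for m in unit),
            "End": mean(m["End"] for m in unit),
        }
        by_rater = {m["Rater"]: m for m in unit}
        for var in variables:
            for rater in raters:
                member = by_rater.get(rater)
                record[f"{var}_{rater}"] = member[var] if member else None
        wide.append(record)
    return wide


wide = to_wide(build_units(rows))
for record in wide:
    print(record)
```

On the four example cases this gives three units, with the ratings laid out as in the target table above. With nine raters the greedy rule may depend on the order of the cases when several segments lie close together, so it is meant only as a description of the intended result, not as a recommended algorithm.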

Thanks in advance,