Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: cases to variables with automated timestamp-based unitizing

From	A Loumiotis <[email protected]>
To	[email protected]
Subject	Re: st: RE: cases to variables with automated timestamp-based unitizing
Date	Tue, 8 Jun 2010 11:02:17 +0300

I came up with some code that I think does what you asked:

clear*

inp byte case str2 Rater Start End byte(Var1 Var2)
1 R1 17.54    123.29   4  2
2 R2 18.02    123.76   4  3
3 R1 128.43  171.53   2  1
4 R2 130.13  148.21   2  1
end

list, noo

reshape wide Var?, i(case) j(Rater) string

list, noo

sort Start End

gen start=.
gen end=.
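* compare each row j with every later row; a later row within 5 sec of
* row j on both Start and End takes over row j's times
forvalues j=1/`=_N-1' {
forvalues i=0/`=_N-`j'-1' {
	replace start=Start[`j'] in `=_N-`i'' if ///
		abs(Start[`=_N-`i'']-Start[`j'])<5 & abs(End[`=_N-`i'']-End[`j'])<5
	replace end=End[`j'] in `=_N-`i'' if ///
		abs(Start[`=_N-`i'']-Start[`j'])<5 & abs(End[`=_N-`i'']-End[`j'])<5
}
}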

replace start=Start if start==.
replace end=End if end==.
* identify each unit by its (start, end) pair
egen uniqid=group(start end)
foreach var of varlist Var* {
bysort uniqid: egen byte n_`var'=total(`var'), missing
}
drop Var*
list, noo
duplicates drop uniqid, force
list, noo
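
For what it is worth, here is an alternative sketch that is not the approach above: it keeps the data long, assigns a unit id by comparing each segment with the earlier ones, and only then reshapes. The names unit, mStart, mEnd and the local tol are my own, and the 5-second tolerance is taken from the original question. It assumes that no rater has two segments in the same unit (otherwise -reshape- will complain), and it uses the mean start and end times of the matched segments as the unit's times.

```stata
clear
input byte case str2 Rater float(Start End) byte(Var1 Var2)
1 R1  17.54 123.29 4 2
2 R2  18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end

local tol 5                  // assumed tolerance in seconds
sort Start End
gen long unit = _n           // every segment starts as its own unit

* attach each segment to the unit of the first earlier segment that lies
* within the tolerance on both Start and End
forvalues r = 2/`=_N' {
    forvalues a = 1/`=`r'-1' {
        if abs(Start[`r'] - Start[`a']) < `tol' & abs(End[`r'] - End[`a']) < `tol' {
            quietly replace unit = unit[`a'] in `r'
            continue, break
        }
    }
}

* unit times as the mean over the matched segments
bysort unit: egen double mStart = mean(Start)
bysort unit: egen double mEnd   = mean(End)

* one row per unit, one column per variable and rater
drop case Start End
reshape wide Var1 Var2, i(unit) j(Rater) string
list, noobs
```

With nine raters the same code should apply unchanged, but because a segment joins the unit of the first earlier segment it matches, a chain of near-matches can stretch a unit beyond the tolerance, so it is worth checking the result with -list- before any analysis.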

Antonis Loumiotis

On Mon, Jun 7, 2010 at 2:13 AM, Martin Weiss <[email protected]> wrote:
>
> The "simple 'cases to variables' procedure" is a -reshape wide- in Stata parlance:
>
>
>
> ***********
> clear*
>
> inp byte case str2 Rater Start End byte(Var1 Var2)
> 1 R1 17.54    123.29   4  2
> 2 R2 18.02    123.76   4  3
> 3 R1 128.43  171.53   2  1
> 4 R2 130.13  148.21   2  1
> end
>
> list, noo
>
> reshape wide Var?, i(case) j(Rater) string
>
> list, noo
> ***********
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Monday, 7 June 2010 00:51
> To: [email protected]
> Subject: st: cases to variables with automated timestamp-based unitizing
>
> Dear all,
>
> I have a data transformation problem and would greatly appreciate any suggestions on how to solve it.
>
> I am analyzing data from a rating task with multiple raters. The ratings concern audiovisual material, i.e. continuous data that the raters first had to segment (unitize). Each segment coded by a rater carries a timestamp: the "start" and "end" variables record the start and end times of the segment (in sec).
>
> The data look like this (this is a simplified version with only two raters when in actuality there are nine):
>
>
>          Rater   Start    End      Var1   Var2   ...
> case1    R1      17.54    123.29   4      2
> case2    R2      18.02    123.76   4      3
> case3    R1      128.43   171.53   2      1
> case4    R2      130.13   148.21   2      1
> .
> .
> .
>
>
>
> I now intend to do analyses for which the data need to be set up differently. The ratings of the separate judges for all variables should be represented in individual columns, while each row should correspond to a single observed unit. This means that a relatively simple "cases to variables" procedure is in order. However, the issue is complicated by the need to identify the units and match cases accordingly beforehand.
>
> I do not require raters to agree exactly on the start and end times of a segment for their ratings to count as the same unit. Instead, what is expected is agreement within a range of, say, 5 sec for both the start and the end time of the segment. That is, in the above data example, cases 1 and 2 should be counted as a unit and both ratings put into its single row, while for cases 3 and 4 the raters disagree too much on the end time of the segment. Therefore, cases 3 and 4 should be kept as single units within the dataset.
>
> Consequently, this is what the data should look like in the end:
>
>
>          Start    End      Var1_R1   Var1_R2   Var2_R1   Var2_R2   ...
> case1    17.54    123.29   4         4         2         3
> case2    128.43   171.53   2         .         1         .
> case3    130.13   148.21   .         2         .         1
> .
> .
> .
>
>
> For the intended analyses it does not matter much whether the start and end times of the cases (now "units") equal those set by the first rater (as in the example data matrix) or, more elegantly, the mean of all ratings subsumed under the case/unit.
>
> I am unsure how to go about solving this transformation task in an automated fashion in Stata, so any help is much appreciated.
>
> Thanks in advance,
> Eike
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

