
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: problem with Stata on a multi-server cluster


From   <[email protected]>
To   <[email protected]>
Subject   st: problem with Stata on a multi-server cluster
Date   Wed, 18 Jul 2012 14:58:28 +0100

I seek help diagnosing a problem that arises when running jobs on my
workplace's multi-server cluster ("Abacus"). Each of the Abacus servers
has 32 GB RAM and two quad-core processors, plus a large disk array.
Abacus runs the Windows Server 2008 operating system and has Stata,
MATLAB, R, etc. installed. Stata is Stata/MP 11.2 for Windows (64-bit
x86-64), with a 20-user, 2-core network perpetual license. Total
physical memory: 33552328 KB. Available physical memory: 23654612 KB.

The problem:

I have been running multiple Monte-Carlo simulation jobs concurrently
(multiple instances of Stata).  Last Monday evening, all these jobs
ended prematurely.  I had previously run some other jobs to successful
completion using the same "driver" do file but with different arguments.

Specifics: within each job, I ask for R=1000 Monte-Carlo replications.
Each replication gets Stata to fit an -xtmelogit- regression equation
with the iter(250) option (which should very rarely bite). On Monday
night, after successfully completing some replications, the estimation
program simply gave up on completing the replications, and went directly
to the end of the replication cycle.

Below I give (1) an extract from the log file of a successful job; and
(2) an extract from the log file of one of the failed jobs. Look at the
odd sequence of "xxxxxxxxxx" in (2), and note that no error message was
produced at that stage. [Strangely, a -ds, varwidth(20)- command later
on led to an r(199) error; but no such error arose in job (1).]
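
For reference, -simulate- shows an "x" when the command returns an error
or when a requested statistic is missing, and by default it suppresses
the command's own output. A minimal sketch of how the underlying message
might be surfaced, assuming mc_silcp accepts its arguments exactly as
shown in the log extracts, is to rerun a handful of replications with
-simulate-'s -noisily- option. The reps(5) value is an arbitrary choice
for illustration, not something taken from the original jobs.

```stata
* Sketch only: rerun a few replications with the command's output shown,
* so that whatever lies behind each "x" appears in the log.
* mc_silcp and its three arguments are copied from the log extracts;
* reps(5) is an arbitrary small number chosen for illustration.
simulate _b _se converged = e(converged) logRLL = e(ll), ///
    reps(5) noisily : mc_silcp (.38) (.25) (.13)
```

Because the failures appeared only partway through a long run, a short
rerun may not reproduce them; adding -noisily- to the full job is the
alternative, at the cost of a much longer log file.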

The questions:

Has anyone ever experienced similar problems in a similar environment,
and can anyone offer diagnostic advice please?

A conversation this morning with an IT services person suggested that
the problem might be within-cluster interactions between MATLAB and
Stata. In short, the suggestion was that the MATLAB jobs being run by
other users were very memory hungry and took memory away from Stata,
so problems would arise if another user started such a MATLAB job while
mine was running. I didn't quite understand why this should cause what
was observed -- I thought Stata would have better "defences" -- but I
raise it in case it jogs someone's memory.  If it is something like this
that is causing the problem, I'd appreciate tips to pass on to our IT
people about how to avoid such unfortunate interactions.

Meanwhile, if there are people who'd like to leave their computer and
Stata running while they go on vacation, I'd love to give them a job to
run. (Each job uses little memory, creates its own data (around 500 KB)
and an output data set of around 500 KB, ... but takes a long time to
run.) Get in touch if you'd like to help! :)

Thanks 
Stephen

++++++++++++++++++++++++++++++++++++++++++++++
(1) successful run

============
. di "Time is: " c(current_time) " on " c(current_date)
Time is: 11:31:29 on 29 Jun 2012

. simulate _b _se converged = e(converged) logRLL = e(ll) ///
>         , reps(1000) saving(mc_partic_model3_v01_`Nc'_`C'_output.dta, replace double)  ///
>         : mc_silcp (`sig_u') (`sig_b3c') (`sig_b4c')

        command:  mc_silcp (.38) (.25) (.13)
[_eq9]converged:  e(converged)
   [_eq9]logRLL:  e(ll)
(note: file mc_partic_model3_v01_1000_10_output.dta not found)

Simulations (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
........................x.........................   450
..................................................   500
..................................................   550
..................................................   600
..................................x...............   650
..........................x.......................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
....x.............................................   950
..................................................  1000

. di "Time is: " c(current_time) " on " c(current_date)
Time is: 05:01:07 on 12 Jul 2012

============

(2) unsuccessful run

. di "Time is: " c(current_time) " on " c(current_date)
Time is: 11:31:53 on 29 Jun 2012

. simulate _b _se converged = e(converged) logRLL = e(ll) ///
>         , reps(1000) saving(mc_partic_model3_v01_`Nc'_`C'_output.dta, replace double)  ///
>         : mc_silcp (`sig_u') (`sig_b3c') (`sig_b4c')

        command:  mc_silcp (.38) (.25) (.13)
[_eq9]converged:  e(converged)
   [_eq9]logRLL:  e(ll)
(note: file mc_partic_model3_v01_1000_20_output.dta not found)

Simulations (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
...................xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   850
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   900
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   950
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  1000

. di "Time is: " c(current_time) " on " c(current_date)
Time is: 19:25:42 on 16 Jul 2012

============

Thank you

Stephen
------------------
Professor Stephen P. Jenkins <[email protected]>
Department of Social Policy and STICERD
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0)20 7955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP
2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata:
http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html


