[DRMAA-WG] Use Cases for Wait
Daniel Templeton
Dan.Templeton at Sun.COM
Thu Aug 13 10:45:50 CDT 2009
In the last working group meeting, we spent a good bit of time talking
about how to design the new wait call in the v2 spec. In the end, what
we decided is that we don't have a clear enough idea of what wait should
actually do to be able to define it. To that end, we vowed to generate
some use cases. This is my attempt at getting that thread going.
1) Job monitoring application
I have an application that submits jobs on behalf of the user and then
displays the jobs' status as they go from pending to running to
finished. The application submits the jobs and then waits for any job
to reach the running or finished state. When a transition occurs, the
application updates the UI with the new state information. If a
transition gets lost, then the UI will have stale data that could
mislead the user.
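To make this one concrete, here is a rough Python sketch. The wait_any function, the event queue, and the state names are all made up for illustration; none of it is a proposed DRMAA API. The only point is that every transition is delivered exactly once, so the UI table ends up current.

```python
import queue

# Hypothetical stand-in for the DRM: it pushes (job_id, new_state) events.
events = queue.Queue()

def wait_any(timeout=None):
    """Hypothetical wait: block until any job changes state."""
    return events.get(timeout=timeout)

def monitor(ui, expected_events):
    """Apply each reported transition to the UI table, in arrival order."""
    for _ in range(expected_events):
        job_id, state = wait_any(timeout=1.0)
        ui[job_id] = state
    return ui

# Simulate two jobs: both start running, then the first one finishes.
for ev in [("job1", "RUNNING"), ("job2", "RUNNING"), ("job1", "FINISHED")]:
    events.put(ev)

ui = monitor({"job1": "PENDING", "job2": "PENDING"}, expected_events=3)
```

If the last event were dropped, the table would keep showing job1 as running, which is exactly the stale-data problem described above.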
2) qsub
I want to reimplement qsub using DRMAAv2. qsub has two interesting
options. -now tells qsub not to return until the job has started.
-sync tells qsub not to return until the job has completed.
Both can be used in the same submission. qsub will submit a job and
then, based on the options, it might call wait to wait for job start or
job finish. It cannot afford to miss either of these transitions,
because it stays blocked until it sees them.
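A minimal sketch of that control flow, in Python. FakeJob and its wait(targets) method are hypothetical, as are the state names; the fake job just replays a fixed list of states. Treating a job that is already finished as "started" is my assumption, not something we have agreed on.

```python
class FakeJob:
    """Hypothetical job handle that replays a fixed list of states."""

    def __init__(self, states):
        self._states = iter(states)
        self.state = "PENDING"

    def wait(self, targets):
        """Hypothetical wait: block until the job is in one of `targets`."""
        while self.state not in targets:
            self.state = next(self._states)
        return self.state

def qsub(job, now=False, sync=False):
    """Return the states qsub blocked on, in the order it saw them."""
    seen = []
    if now:
        # Assumption: a job that has already finished counts as started.
        seen.append(job.wait({"RUNNING", "FINISHED"}))
    if sync:
        seen.append(job.wait({"FINISHED"}))
    return seen

both = qsub(FakeJob(["RUNNING", "FINISHED"]), now=True, sync=True)
sync_only = qsub(FakeJob(["RUNNING", "FINISHED"]), sync=True)
```

With both options set, qsub makes two consecutive wait calls on the same job, so whatever we define has to behave sensibly when the second call starts after the job has already moved on.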
3) suspend timeout
I have an application that submits tens of thousands of jobs and waits
for them to complete. The jobs are short, however, and if one gets
suspended, it's better to submit a second copy and then keep the winner
and kill the loser. I want to give my jobs a 30-second suspension grace
period. Once any job has been suspended for more than 30 seconds, I
want to submit a duplicate. The application would submit the jobs and
then wait for any job to enter or exit the suspended state. When a
transition happens, an in-memory time table gets updated and a timer
gets set. It then waits again for the next transition. Because of the
volume of jobs being submitted, the number of jobs being suspended or
resumed at any moment in time could be very large.
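The time table and timer could be sketched like this in Python. The (job_id, state, timestamp) tuple format, the state names, and all the function names are assumptions for illustration. Transitions are applied in batches, since a single wait might need to report many of them at once.

```python
GRACE_SECONDS = 30

def apply_transitions(suspended_since, transitions):
    """Update the time table from a batch of (job_id, state, timestamp)."""
    for job_id, state, when in transitions:
        if state == "SUSPENDED":
            # Keep the first suspension time if the event is repeated.
            suspended_since.setdefault(job_id, when)
        else:
            # Any other state means the job left the suspended state.
            suspended_since.pop(job_id, None)

def overdue(suspended_since, now):
    """Jobs suspended for more than the grace period."""
    return sorted(j for j, t in suspended_since.items()
                  if now - t > GRACE_SECONDS)

def next_timeout(suspended_since, now):
    """Seconds until the earliest grace period expires, or None."""
    if not suspended_since:
        return None
    return max(0, min(suspended_since.values()) + GRACE_SECONDS - now)

table = {}
apply_transitions(table, [("a", "SUSPENDED", 0),
                          ("b", "SUSPENDED", 5),
                          ("a", "RUNNING", 10)])
timeout_at_20 = next_timeout(table, now=20)
to_duplicate = overdue(table, now=36)
```

Here job a resumed within its grace period, so only job b needs a duplicate. The value from next_timeout is what the application would pass as the timeout on its next wait.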
4) state tracking
I want to write an application that submits a single job and then
records the time that every state transition occurred. It submits the
job and then waits for that job to have any state transition. When a
state transition occurs, it writes the information to a file and then
waits for the next transition. Because writing to a file is slow, there
could be a lag between calls to wait, but nonetheless, it cannot lose
any transitions.
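One way to picture the requirement is a buffer between the DRM and the slow writer. This Python sketch is only an illustration under that assumption: the drm function, the state names, and the None end marker are all made up, and an in-memory StringIO stands in for the file.

```python
import io
import queue
import threading
import time

def drm(events):
    """Hypothetical DRM: reports transitions faster than they get written."""
    for state in ["QUEUED", "RUNNING", "SUSPENDED", "RUNNING", "DONE"]:
        events.put((state, time.time()))
    events.put(None)  # made-up marker: no more transitions

def record(events, log):
    """Write each transition to the log; each write is deliberately slow."""
    while True:
        item = events.get()  # stands in for the wait call
        if item is None:
            break
        state, when = item
        time.sleep(0.01)  # simulate the slow file write
        log.write(f"{when:.3f} {state}\n")

events = queue.Queue()
log = io.StringIO()
producer = threading.Thread(target=drm, args=(events,))
producer.start()
record(events, log)
producer.join()

recorded = [line.split()[1] for line in log.getvalue().splitlines()]
```

All five transitions end up in the log even though the writer lags behind, because they queue up between calls. A wait that only reported the job's current state would have skipped some of them.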
Tag, you're it!
Daniel