OpenMP schedule(static) with no chunk size specified: chunk size and order
of assignment
I have a few questions regarding #pragma omp for schedule(static) where
the chunk size is not specified.
One way to parallelize a loop in OpenMP is to do it manually like this:
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread = omp_get_thread_num();
    const int start = ithread*N/nthreads;
    const int finish = (ithread+1)*N/nthreads;
    for(int i = start; i<finish; i++) {
        // loop body
    }
}
Is there a good reason not to manually parallelize a loop like this in
OpenMP? If I compare the values with #pragma omp for schedule(static) I
see that the chunk sizes for a given thread don't always agree, so OpenMP
(in GCC) is computing the chunk sizes differently than start and finish
do. Why is this?
The start and finish values I defined have several convenient properties.
1.) Each thread gets exactly one chunk
2.) The range of iteration values increases directly with thread
number (i.e. for 100 iterations with two threads the first thread will
process iterations 0-49 and the second thread 50-99 and not the
other way around).
3.) For two for loops over exactly the same range each thread will run
over exactly the same iterations.
Are all these properties guaranteed when using #pragma omp for
schedule(static)?
According to the OpenMP specification, "Programs that depend on which
thread executes a particular iteration under any other circumstances are
non-conforming." and "Different loop regions with the same schedule and
iteration count, even if they occur in the same parallel region, can
distribute iterations among threads differently. The only exception is for
the static schedule."
For schedule(static) the specification says "chunks are assigned to the
threads in the team in a round-robin fashion in the order of the thread
number."
Additionally, the specification says for schedule(static): "When no
chunk_size is specified, the iteration space is divided into chunks that
are approximately equal in size, and at most one chunk is distributed to
each thread".
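To illustrate why two splits can both be "approximately equal" yet disagree, here is a minimal sketch comparing the start/finish formula from my code with one other plausible scheme, in which the first N % nthreads threads each take one extra iteration. The function names are made up for illustration, and whether any particular GCC version uses exactly this second scheme is an assumption on my part, not something the specification states.

```c
#include <assert.h>
#include <stdio.h>

/* Split from the question: thread t gets [t*N/nthreads, (t+1)*N/nthreads). */
void manual_range(int t, int nthreads, int N, int *start, int *finish) {
    *start  = t * N / nthreads;
    *finish = (t + 1) * N / nthreads;
}

/* Hypothetical alternative: the first N % nthreads threads get one extra
   iteration, so the larger chunks go to the lowest thread numbers. */
void remainder_first_range(int t, int nthreads, int N, int *start, int *finish) {
    const int q = N / nthreads;   /* base chunk size */
    const int r = N % nthreads;   /* leftover iterations */
    *start  = t * q + (t < r ? t : r);
    *finish = *start + q + (t < r ? 1 : 0);
}

/* Print both splits side by side for every thread number. */
void print_splits(int nthreads, int N) {
    int s = 0, f = 0;
    assert(nthreads > 0);
    for (int t = 0; t < nthreads; t++) {
        manual_range(t, nthreads, N, &s, &f);
        printf("thread %d: manual [%d,%d)", t, s, f);
        remainder_first_range(t, nthreads, N, &s, &f);
        printf("  remainder-first [%d,%d)\n", s, f);
    }
    assert(f == N);  /* the last chunk must end at N */
}
```

For N = 10 and 4 threads the manual formula gives chunk sizes 2, 3, 2, 3, while the remainder-first scheme gives 3, 3, 2, 2. Both cover every iteration exactly once with one contiguous chunk per thread in increasing thread order.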
Finally, the specification says for schedule(static): "A compliant
implementation of the static schedule must ensure that the same assignment
of logical iteration numbers to threads will be used in two loop regions
if the following conditions are satisfied: 1) both loop regions have the
same number of loop iterations, 2) both loop regions have the same value
of chunk_size specified, or both loop regions have no chunk_size
specified, 3) both loop regions bind to the same parallel region"
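A rough way to check this guarantee empirically is to record which thread runs each iteration in two schedule(static) loops inside the same parallel region and compare the results. The sketch below does that; the function name same_static_assignment is hypothetical, and the serial fallback for omp_get_thread_num is only there so the code still builds when OpenMP is not enabled. A passing check on one machine is of course not proof, only a sanity test of what the specification promises.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption: built without -fopenmp, one thread). */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Returns 1 if two schedule(static) loops over n iterations, in the same
   parallel region and with no chunk size, give every iteration to the
   same thread; 0 if they differ; -1 on allocation failure. */
int same_static_assignment(int n) {
    assert(n >= 0);
    if (n == 0) return 1;

    int *first  = malloc((size_t)n * sizeof *first);
    int *second = malloc((size_t)n * sizeof *second);
    if (!first || !second) {
        free(first);
        free(second);
        return -1;
    }

    #pragma omp parallel
    {
        /* First loop: remember the owner of each iteration. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            first[i] = omp_get_thread_num();

        /* Second loop: same iteration count, same parallel region. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            second[i] = omp_get_thread_num();
    }

    int same = 1;
    for (int i = 0; i < n; i++)
        if (first[i] != second[i])
            same = 0;

    free(first);
    free(second);
    return same;
}
```

With GCC this would be compiled with -fopenmp; calling same_static_assignment(100) should return 1 if the implementation honors the quoted rule.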
So if I read this correctly, schedule(static) will have the same convenient
properties I listed for start and finish, even though my code relies on
which thread executes a particular iteration. Am I interpreting this
correctly? This seems to be a special case for schedule(static) when the
chunk size is not specified.
It's easier to just define start and finish as I did than to try to
interpret the specification for this case.