Ticket #9 (assigned defect)

Opened 3 years ago

Last modified 3 years ago

Valgrind error in TestInterCollObjWorld.testAllgather with openmpi 1.1 on FC6

Reported by: albertstrasheim
Assigned to: dalcinl (accepted)
Priority: major
Milestone:
Component: component1
Version:
Keywords:
Cc:

Description

I'm using Valgrind 3.2.1 to run test_collobj.py from mpi4py 0.4.0rc2 on Fedora Core 6 with the included openmpi 1.1-7. I use Valgrind as follows:

mpiexec -n 3 \
        valgrind \
        --tool=memcheck \
        --leak-check=yes \
        --error-limit=no \
        --suppressions=valgrind-python.supp \
        --num-callers=20 \
        --freelist-vol=536870912 \
        -v \
        python test_collobj.py -v

valgrind-python.supp can be found in the Python SVN repository. Some lines have to be uncommented to suppress most of the warnings caused by Python (see README.valgrind for more info).

The following error shows up when running TestInterCollObjWorld.testAllgather:

testAllgather (__main__.TestInterCollObjWorld) ...
==23157== Invalid read of size 4
==23157==    at 0x4BC15A7: MPI_Allgatherv (allgatherv.c:63)
==23157==    by 0x4B37D94: comm_allgather_string (mpi.c:6897)
==23157==    by 0xB2654C: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
...
==23157==  Address 0x5B0EF54 is 0 bytes after a block of size 4 alloc'd
==23157==    at 0x4005400: malloc (vg_replace_malloc.c:149)
==23157==    by 0xB278F0: PyMem_Malloc (in /usr/lib/libpython2.4.so.1.0)
==23157==    by 0x4B37B93: comm_allgather_string (mpi.c:6869)
==23157==    by 0xB2654C: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
...
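
As a reading aid for the trace above, here is a minimal sketch of the address arithmetic that Valgrind is describing. It is not part of mpi4py or Open MPI. The helper name `describe_overrun` is hypothetical, and the numbers are copied from the trace. It assumes the block is treated as an array of items the same size as the read (4 bytes).

```python
def describe_overrun(address, block_size, bytes_after, read_size):
    """Locate an invalid read relative to the heap block Valgrind names.

    Hypothetical helper for illustration only.
    """
    # Valgrind reports the faulting address as "bytes_after bytes after
    # a block of size block_size", so the block starts this far back.
    block_start = address - bytes_after - block_size
    block_end = block_start + block_size  # first byte past the block
    # Treat the block as an array of read_size-byte items (an assumption).
    index = (address - block_start) // read_size
    capacity = block_size // read_size
    return block_start, block_end, index, capacity


# Values from the trace: address 0x5B0EF54, 0 bytes after a block of
# size 4, invalid read of size 4.
start, end, index, capacity = describe_overrun(0x5B0EF54, 4, 0, 4)
print(hex(start), hex(end), index, capacity)
```

Under that assumption, the read touches element 1 of a buffer that has room for only one 4-byte item, that is, one element past the end of the block allocated at mpi.c:6869.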

The test runs on all three processes, but the error shows up only twice, so one of them (probably the root?) does not trigger it.

If this isn't an mpi4py issue, it may be a problem with openmpi. Unfortunately, openmpi 1.1.2 segfaulted when I tried to run Valgrind under mpiexec.

Change History

11/10/06 16:22:03 changed by dalcinl

  • status changed from new to assigned.
  • owner changed from somebody to dalcinl.

This is a bug in OMPI. I've reported it.