Getting wrong ELF class error when attempting to use library

I’m compiling a shared library on system A and moving it onto another system B with the same architecture (x86_64). When I then try to use some functions defined in this library, I get the error wrong ELF class: ELFCLASS64.

When I compile a similar program as a binary on system A and move it to system B, it runs fine. I’ve been trying to figure out why I’m getting this issue with the library and not the binary, but I can’t seem to figure anything out. Any word of advice would be greatly appreciated!

That is…interesting. Trying to get the dynamic loader (ld.so) to tell you more might be fruitful. Set LD_DEBUG=libs in the environment (LD_DEBUG=help lists the other values you can use for more debugging).
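For background, the loader prints that message when the ELF class of the file it was asked to load doesn't match its own, so ELFCLASS64 here means a 32-bit process rejected a 64-bit object. Here is a minimal sketch for checking the class of any file directly. The function name elf_class_of is made up for illustration, and it assumes a Linux system that ships <elf.h>:

```cpp
#include <elf.h>     // EI_NIDENT, EI_CLASS, ELFMAG, SELFMAG, ELFCLASS32, ELFCLASS64
#include <cstring>   // std::memcmp
#include <fstream>
#include <string>

// Hypothetical helper: reports the ELF class recorded in a file's header,
// or a short reason why it could not be determined.
std::string elf_class_of(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char ident[EI_NIDENT] = {};
    // The first EI_NIDENT bytes of every ELF file are the identification array.
    if (!in.read(reinterpret_cast<char*>(ident), EI_NIDENT)) {
        return "unreadable";
    }
    // Bytes 0..3 must be the magic "\177ELF".
    if (std::memcmp(ident, ELFMAG, SELFMAG) != 0) {
        return "not an ELF file";
    }
    // Byte EI_CLASS says whether the object is 32-bit or 64-bit.
    switch (ident[EI_CLASS]) {
        case ELFCLASS32: return "ELFCLASS32";
        case ELFCLASS64: return "ELFCLASS64";
        default:         return "unknown ELF class";
    }
}
```

Calling it on both your library and the executable that ends up loading it (the database process, in a stored procedure setup) should show whether the two disagree.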

So I’m writing this as an external stored procedure for a database, and I found out I was getting the issue because of how I defined the stored procedure itself, so I’ve resolved that particular issue. Now I’m running into an even stranger problem…

Not sure if it’s related to the stored procedure definition, or if it’s due to how I’m compiling stuff so I will elaborate more.

When I’m running the stored procedure which is using the library I defined, I’m getting crashes when I use particular parts of an external library I’ve imported.

For further context here is an example:

some_print_function_I_defined("start");

ObjectFromExternalLibrary objectA;
some_print_function_I_defined(objectA.attributeA);
some_print_function_I_defined(objectA.attributeB);

some_print_function_I_defined("end");

I will find that when I access attributeA the stored procedure continues to run fine, but as soon as I touch attributeB the procedure will stop with a crash. There are a multitude of other functions/properties/objects in the library which also crash the program (always the same ones), as well as a multitude of other functions/properties/objects that don’t seem to offend the procedure at all.

The strangest case I’ve encountered is that I will assign some value to an attribute for ObjectFromExternalLibrary, and this is the only thing I am doing functionally in the procedure with a print before and after the assignment. I’ll see the print statement at the end, but still get a report that the procedure crashed even though it reached the last print statement.

This is essentially the definition of my CMakeLists.txt

...
find_package(EXTERNAL_LIBRARY REQUIRED COMPONENTS COMPONENT)
...
add_library(${PROJECT_NAME} SHARED my_code.C ${HEADER_AND_SUCH})
target_link_libraries(${PROJECT_NAME} ${EXTERNAL_LIBRARY})

As mentioned before, when compiling similar code as a binary, the program works completely fine, but when I’m trying to use it as a library for a stored procedure I am hitting this issue.

Also I noticed that you posted in all my questions, so that was very appreciated!

This smells like an ABI problem. The program you’re using has some layout X while your library has some layout Y:

struct ObjectFromExternalLibrary {
    //               offset in X   offset in Y
    type1 field1; // 0             0
    type2 field2; // 8             16
};

So one library is expecting field2 to live at address &objectA + 8 while the other is looking at &objectA + 16. Assuming you can change the binary loading the library, I would dump out this for each field of the structure in question:

#include <cstddef>   // offsetof
#include <iostream>

// Streams "name,size,offset" for one member of the given type.
#define DUMP_FIELD(type, obj, name) #name << ',' << sizeof((obj).name) << ',' << offsetof(type, name)
type obj;
std::cerr << "Data for type TYPE" << std::endl;
std::cerr << DUMP_FIELD(type, obj, field1) << std::endl;
std::cerr << DUMP_FIELD(type, obj, field2) << std::endl;

Do the same in your library. If they match, something else is going wrong; valgrind or similar tools may be able to help. If these don’t match, you’ve got code expecting different things of the same memory; that never ends well.
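If constructing or touching an instance is itself a problem, the same data can be collected without ever creating an object, since sizeof and offsetof are evaluated at compile time. A minimal sketch, where Example and layout_report are hypothetical stand-ins (substitute the external library's type and its real member names). Note that offsetof on a type that isn't standard-layout is only conditionally supported, so the compiler may warn:

```cpp
#include <cstddef>   // offsetof
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the external library's type.
struct Example {
    std::uint8_t  tag;
    std::uint64_t value;
};

// Streams "name,size,offset" for one member; no object is ever constructed.
#define DUMP_FIELD(type, name) \
    #name << ',' << sizeof(type::name) << ',' << offsetof(type, name)

// Builds a report with one line for the whole type (name,size,alignment)
// followed by one line per member (name,size,offset).
std::string layout_report() {
    std::ostringstream out;
    out << "Example," << sizeof(Example) << ',' << alignof(Example) << '\n';
    out << DUMP_FIELD(Example, tag) << '\n';
    out << DUMP_FIELD(Example, value) << '\n';
    return out.str();
}
```

Compile this once with the flags used for the library and once with the flags used for the binary, then diff the two reports. Any line that differs marks a member the two sides disagree about.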

If there’s some “waterline” you can draw in the class declaration where members after it crash and those before it work, that can help pinpoint where the object layouts differ. If there is preprocessor code involved, you’ll need to make things look the same as the binary that is loading it.

Hey Ben,

So I had to do a somewhat different check than the one you suggested, as it seemed to crash every time I tried to access any attribute of the object in question. Instead I took just the sizeof the overall object, which returned the same value (776) for the library and the binary, so I’m not sure it is the ABI issue you suggested. I will try to dig a bit further and see if I can get the suggested values from a member that won’t crash the library.

The partner I am working with thinks the issue could be caused by a library collision. The external library I’m trying to use already exists on the system I’m trying to move this newly compiled library onto. If this is potentially the issue, is there a way to compile this shared library on system A through CMake and, when we move it onto our destination system B, have it link against the copy of the external library that is already installed on system B? I tried to fiddle with the CMake configuration, but can’t seem to find a way to do this.

Thanks again for the replies.

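Yes, a library collision can show up looking like an ABI mismatch. You can use RPATH entries to make the loader prefer the libraries you ship with your package. The specific values to use depend on the deployment layout, but $ORIGIN (which expands to the directory containing the library that carries the RPATH) is likely what you want in the end, and then you bundle the external library alongside your libraries. In CMake, the INSTALL_RPATH target property sets the RPATH of the installed library.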
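Whichever RPATH you end up with, it's worth confirming which copy of the external library actually gets mapped into the process. A rough sketch using glibc's dl_iterate_phdr; the names collect_path and loaded_objects are made up for illustration, and this assumes Linux with glibc:

```cpp
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded object; records the object's path.
static int collect_path(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* paths = static_cast<std::vector<std::string>*>(data);
    paths->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning non-zero would stop the iteration early
}

// Hypothetical helper: paths of every object currently loaded in this process.
// The main executable is typically reported with an empty name.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> paths;
    dl_iterate_phdr(collect_path, &paths);
    return paths;
}
```

Calling loaded_objects() from inside the stored procedure and printing each entry should show whether the external library came from the system location or from the directory you bundled it in. If it appears twice under different paths, that would point toward the collision your partner suspects.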