I have CI builds that occasionally fail in FetchContent when downloading a URL, with the error
status_string: “Timeout was reached”
which is unfortunate. I guess this machine is on an intermittently flaky network. I see that FetchContent is a wrapper around ExternalProject_Add, which calls FILE DOWNLOAD, which generates the error.
My feature requests:
Is there any documentation for what the error codes from FILE DOWNLOAD are? The documentation I see just says “0 is no error”
I see that download.cmake has a list of retry-able errors from FILE DOWNLOAD:
set(download_retry_codes 7 6 8 15)
Can we add "28" (timeout) to this list of retry-able error codes?
Are there any other error codes which should be retry-able?
Is it best practice to wrap FetchContent in a loop in this case, where we see rare but ongoing network failures?
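For anyone wanting to see the raw status outside of FetchContent, here is a minimal sketch of reading the code and message that file(DOWNLOAD) reports. The URL and destination path are placeholders I made up for illustration; the STATUS, TIMEOUT and INACTIVITY_TIMEOUT options are the documented ones.

```cmake
# Run in script mode: cmake -P inspect_status.cmake
# NOTE: url and dest are hypothetical placeholders, not taken from the post.
set(url "https://example.com/archive.tar.gz")
set(dest "${CMAKE_CURRENT_LIST_DIR}/archive.tar.gz")

# STATUS receives a two-element list: a numeric code and a message string.
# TIMEOUT bounds the whole transfer; INACTIVITY_TIMEOUT bounds idle periods.
file(DOWNLOAD "${url}" "${dest}"
  STATUS status
  TIMEOUT 60
  INACTIVITY_TIMEOUT 30
)

list(GET status 0 status_code)
list(GET status 1 status_string)

if(status_code EQUAL 0)
  message(STATUS "Download succeeded")
else()
  # A code of 28 here would correspond to the timeout described above.
  message(STATUS "Download failed: code=${status_code} message=${status_string}")
endif()
```

Running this on the flaky machine should show which numeric code accompanies the "Timeout was reached" string.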
No, I wouldn’t normally recommend that. We should be able to handle brief network outages within CMake’s implementation. See issue 24410 for a somewhat related discussion. If your infrastructure is fairly unreliable and network problems are common or take non-trivial time to resolve, looping to retry is probably just going to take longer and still fail.
The error codes probably correspond with curl exit codes.
6 Could not resolve host. The given remote host could not be resolved.
7 Failed to connect to host.
8 Weird server reply. The server sent data curl could not parse.
15 FTP cannot use host. Could not resolve the host IP we got in the 227-line.
which mostly look like host lookup-related issues. As for why 8 is included and not 13 or 14, which deal with strange FTP responses… no idea.
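To make the idea of a retry-code list concrete, here is a rough sketch of how such a check could be written in a standalone script. It is not the actual download.cmake source: the URL, destination, attempt count, and variable names other than the list contents are assumptions for illustration, and 28 is included only to show the proposed addition.

```cmake
# Run in script mode: cmake -P retry_download.cmake
cmake_minimum_required(VERSION 3.10)

# NOTE: url, dest and max_attempts are hypothetical values for this sketch.
set(url "https://example.com/archive.tar.gz")
set(dest "${CMAKE_CURRENT_LIST_DIR}/archive.tar.gz")
set(max_attempts 3)

# Codes treated as transient; 28 (timeout) is the proposed addition.
set(retry_codes 7 6 8 15 28)

set(succeeded FALSE)
foreach(attempt RANGE 1 ${max_attempts})
  file(DOWNLOAD "${url}" "${dest}" STATUS status TIMEOUT 60)
  list(GET status 0 status_code)
  list(GET status 1 status_string)

  if(status_code EQUAL 0)
    set(succeeded TRUE)
    break()
  endif()

  # Fail immediately on any code that is not in the retry list.
  list(FIND retry_codes "${status_code}" retry_index)
  if(retry_index EQUAL -1)
    message(FATAL_ERROR "Download failed: code=${status_code} message=${status_string}")
  endif()

  message(STATUS "Attempt ${attempt}: retryable error ${status_code} (${status_string})")
endforeach()

if(NOT succeeded)
  message(FATAL_ERROR "Download still failing after ${max_attempts} attempts")
endif()
```

This sketch has no delay between attempts, so it only helps with very brief outages; a real implementation would likely want some backoff.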
Thanks, Craig, especially for the related issue. I’m also curious where the “right” place in the stack for a retry should be. There’s already one in the curl file download code. It seems wrong to have a separate retry loop, perhaps with separate backoff times, in the git clone code.
I’ve added the smallest possible MR to add 28 (timeout) to the list of retryable curl errors. Do with that what you will.