Feature Request: More robust retry policy for FetchContent / ExternalProject_Add in download.cmake

I have CI builds that occasionally fail in FetchContent when downloading a URL, with the error:

status_code: 28
status_string: “Timeout was reached”

This is unfortunate. I suspect this machine is on an intermittently flaky network. As I understand it, FetchContent is a wrapper around ExternalProject_Add, which calls file(DOWNLOAD), and that is what generates the error.

My feature requests:

  • Is there any documentation of the error codes returned by file(DOWNLOAD)? The documentation I found only says that 0 means no error.
  • I see that download.cmake has a list of retryable errors from file(DOWNLOAD):

set(download_retry_codes 7 6 8 15)

 Can we add 28 (timeout) to this list of retryable error codes?
 Are there any other error codes that should be retryable?
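For context on the first question: file(DOWNLOAD) reports its result through the STATUS option as a two-element list, a numeric code (in practice the curl error code, e.g. 28 for a timeout) followed by a message. A minimal sketch of checking that code against a retry set is below. The function name `is_retryable_download_status` is hypothetical, not part of CMake, and the set assumes the download.cmake list plus the proposed 28.

```cmake
# Hypothetical helper (not part of CMake): sets <out_var> to TRUE when a
# file(DOWNLOAD ... STATUS <var>) result carries a curl code that looks like
# a transient network problem. ASSUMPTION: the retry set mirrors the
# download.cmake list (7 6 8 15) plus 28 (timeout), as proposed here.
function(is_retryable_download_status status out_var)
  # STATUS is a two-element list: "<numeric code>;<message>"
  list(GET status 0 code)
  set(retry_codes 6 7 8 15 28)
  list(FIND retry_codes "${code}" index)
  if(index GREATER -1)
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

# Example use:
#   file(DOWNLOAD "${url}" "${dest}" STATUS status)
#   is_retryable_download_status("${status}" should_retry)
```

Passing the status quoted ("${status}") keeps the code and message together as one argument, so `list(GET ... 0 ...)` inside the function sees the whole list.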

Is it best practice to wrap FetchContent in a retry loop in cases like this, where we see rare but ongoing network failures?

Thanks,

-greg


@ben.boeckel @brad.king Do either of you know the history behind why those particular error codes were added?

No, I wouldn’t normally recommend that. We should be able to handle brief network outages within CMake’s implementation. See issue 24410 for a somewhat related discussion. If your infrastructure is fairly unreliable and network problems are common or take non-trivial time to resolve, looping to retry is probably just going to take longer and still fail.

The error codes probably correspond to curl's exit codes:

       6      Could not resolve host. The given remote host could not be resolved.
       7      Failed to connect to host.
       8      Weird server reply. The server sent data curl could not parse.
…
       15     FTP cannot use host. Could not resolve the host IP we got in the 227-line.

These mostly look like host-lookup issues. As for why 8 is included but not 13 or 14, which deal with strange FTP responses: no idea.

Here is an example of using file(DOWNLOAD) to feed FetchContent, which gives more control over the download options: CMake FetchContent with manual download and optional retry (GitHub).
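The linked gist is not reproduced here. As a rough sketch of the same idea (not the gist's code), a hypothetical `download_with_retry()` function can call file(DOWNLOAD) in a loop and hand the resulting local archive to FetchContent, since ExternalProject, and therefore FetchContent, accepts a local filesystem path as its URL. The 60-second timeout, the linear 5-second backoff, and the example URL and names are all assumptions chosen for illustration.

```cmake
# Hypothetical helper (not part of CMake or FetchContent): download <url>
# to <dest>, retrying up to <max_attempts> times on ANY non-zero status.
# Stops with a fatal error if every attempt fails.
function(download_with_retry url dest max_attempts)
  foreach(attempt RANGE 1 ${max_attempts})
    # STATUS receives "<code>;<message>"; TIMEOUT is in seconds (assumed value).
    file(DOWNLOAD "${url}" "${dest}" STATUS status TIMEOUT 60)
    list(GET status 0 code)
    if(code EQUAL 0)
      return()  # success: leave the function
    endif()
    message(WARNING "Download attempt ${attempt}/${max_attempts} of ${url} failed: ${status}")
    if(attempt LESS max_attempts)
      # Simple linear backoff: wait 5 s, 10 s, ... before the next attempt.
      math(EXPR delay "${attempt} * 5")
      execute_process(COMMAND "${CMAKE_COMMAND}" -E sleep ${delay})
    endif()
  endforeach()
  message(FATAL_ERROR "Failed to download ${url} after ${max_attempts} attempts")
endfunction()

# Example use in a CMakeLists.txt (URL and names are hypothetical):
#   include(FetchContent)
#   set(archive "${CMAKE_BINARY_DIR}/_downloads/dep-1.0.tar.gz")
#   download_with_retry("https://example.com/dep-1.0.tar.gz" "${archive}" 3)
#   FetchContent_Declare(dep URL "${archive}")
#   FetchContent_MakeAvailable(dep)
```

This sketch retries on every failure, including ones that will never succeed (such as a 404), so in practice you may want to filter on the curl code first. It also keeps the caveat above in mind: on a network that is down for long stretches, extra attempts mostly add time before the same failure.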


Thanks, Craig, especially for the related issue. I'm also curious where the "right" place in the stack for a retry is: there is already one in the curl file-download code. It seems wrong to have a separate retry loop, possibly with different backoff times, in the git clone code.

I’ve added the smallest possible MR to add 28 (timeout) to the list of retryable curl errors. Do with that what you will.

For reference, that is CMake MR 8270.


Hi. Could we also retry on HTTP 500/503 responses? When downloading files from S3 I intermittently get these errors, and a retry resolves them.

Unfortunately, curl does not have a dedicated return code for these: all HTTP status codes >= 400 fall under error code 22, so this isn't as straightforward to implement.
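One possible workaround, sketched under assumptions: capture the transfer log with the LOG option of file(DOWNLOAD) and treat code 22 as retryable only when that log shows a 500 or 503 status line. This assumes the LOG text contains the received response headers (for example `HTTP/1.1 503 Service Unavailable`). That is an assumption about the log format rather than documented behaviour, so check it against your CMake version. The function name `is_retryable_http_failure` is hypothetical.

```cmake
# Hypothetical helper (not part of CMake): sets <out_var> to TRUE only when
# curl reported error 22 (HTTP status >= 400) AND the captured log shows a
# 500 or 503 status line. ASSUMPTION: file(DOWNLOAD ... LOG <var>) includes
# the received response headers in its text.
function(is_retryable_http_failure status log out_var)
  list(GET status 0 code)
  set(${out_var} FALSE PARENT_SCOPE)
  # Matches e.g. "HTTP/1.1 503" or "HTTP/2 500"
  if(code EQUAL 22 AND log MATCHES "HTTP/[0-9.]+ (500|503)")
    set(${out_var} TRUE PARENT_SCOPE)
  endif()
endfunction()

# Example use:
#   file(DOWNLOAD "${url}" "${dest}" STATUS status LOG log)
#   is_retryable_http_failure("${status}" "${log}" should_retry)
```

Matching on the log text keeps 4xx responses such as 404 non-retryable, which is the distinction that code 22 alone cannot make.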