First suggestion: use TCP.
TCP gives you reliable delivery. It gives you rate limiting, and it tries to use the maximum bandwidth it can. It works in high-speed environments and in low-speed environments. It might seem like it's not always 100% efficient (because of TCP header overhead), but forget about that. If you're going to write your own protocol, e.g. based on UDP, you'll be wasting a lot of time. You'll be trying to solve all the problems that TCP has already solved. It'll take you weeks or months to come up with something that works. In the end it'll look a lot like TCP, but it will work less well than TCP.
Second suggestion: make the receiver initiate the transfer
What you wrote kinda suggests that you have a lot (thousands) of programs working towards a partial solution. And once they are done, they will decide by themselves to send the result to the gatherer. Don't do that. Make the gatherer gather the information. That way the gatherer dictates the speed at which data comes in. And the gatherer can make sure it slows down if it gets overwhelmed.
E.g. when every process starts, let it set up a TCP connection to the gatherer. Then it sends a small message to the gatherer to say that it exists. Then it starts working on the problem. When it is done, and has the results ready, it sends a small message to the gatherer again. The gatherer waits for these notifications to come in. When the gatherer sees one notification come in, it sends a command to the node to request it to start sending data. You can do this in parallel (I assume I don't need to teach you about listening sockets, the select system call, non-blocking I/O, etc.). But even when you gather data in parallel, the gatherer has control over how many nodes are allowed to send at the same time. You might think this is a bit complicated to program, but I think it's not that bad. And if you really have a lot of nodes, and a lot of data, this will pay off in the end.
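A minimal sketch of that control flow in Python (the message names HELLO/READY/SEND and the concurrency limit are my own illustration, not a fixed format):

```python
import socket
import threading

MAX_CONCURRENT = 2  # gatherer-side limit on simultaneous senders (assumed value)

def gatherer(server_sock, results, done):
    slots = threading.Semaphore(MAX_CONCURRENT)
    def handle(conn):
        with conn:
            f = conn.makefile("rwb")
            assert f.readline().strip() == b"HELLO"   # node announces itself
            assert f.readline().strip() == b"READY"   # node's results are ready
            with slots:                    # admit at most MAX_CONCURRENT senders
                f.write(b"SEND\n"); f.flush()
                results.append(f.readline().strip())
        done.release()
    while True:
        try:
            conn, _ = server_sock.accept()
        except OSError:
            return                         # listening socket closed, stop
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def node(port, payload):
    with socket.create_connection(("127.0.0.1", port)) as s:
        f = s.makefile("rwb")
        f.write(b"HELLO\n"); f.flush()     # announce existence, then "work"
        f.write(b"READY\n"); f.flush()     # results are ready
        assert f.readline().strip() == b"SEND"  # wait for the gatherer's go-ahead
        f.write(payload + b"\n"); f.flush()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()
port = srv.getsockname()[1]
results, done = [], threading.Semaphore(0)
threading.Thread(target=gatherer, args=(srv, results, done), daemon=True).start()
for i in range(4):
    threading.Thread(target=node, args=(port, b"result-%d" % i), daemon=True).start()
for _ in range(4):
    done.acquire()                         # wait until all 4 results arrived
srv.close()
print(sorted(results))  # → [b'result-0', b'result-1', b'result-2', b'result-3']
```

The key point is in `with slots:` — nodes never decide to push data on their own; the gatherer hands out permission, two at a time here.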
Third suggestion: don't do a simple copy of all data over the connection
I'm not sure if you're going to use scp or ssh to copy data, or if you are just going to write your data over the TCP connection. The latter might be more efficient (no encryption overhead, which will have a big impact on big data sets). How much data are we talking about? A megabyte per node? A gigabyte per node? More? What you don't want is that all nodes start sending data, then the shit hits the fan and connections are reset, and then all nodes start their transfer all over again, from scratch.
Instead, make your own simple little protocol (over TCP, of course). You are already implementing a few messages in either direction in your custom protocol: one message so that a node can tell the gatherer it exists, and one message for the gatherer to tell the node to start sending. While you're at it, add another little message: a header for your data, that describes the offset and the length of the data in the file you are transferring. Say for every 1 MB or every 10 MB, you start that data with your little message telling the gatherer "here is data starting at 180 MB, to 190 MB". That way, if the TCP connection gets reset, the gatherer knows how many of those blocks it has received fully, and which block was in transmission but didn't finish. The gatherer can then request the node to resume sending halfway through the data set. First of all this can prevent a lot of problems, imho. But the nice thing is that it almost guarantees continuous progress. And hopefully at some point all data will have made it across.
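Here is one way the block headers could look (the field layout is my assumption, not a given wire format): each chunk of file data is preceded by a small fixed header carrying its offset and length, so after a reset the gatherer knows exactly where to resume.

```python
import struct

HDR = struct.Struct("!QQ")   # network byte order: 8-byte offset, 8-byte length

def frame(offset, payload):
    """Prefix a data block with its (offset, length) header."""
    return HDR.pack(offset, len(payload)) + payload

def parse(buf):
    """Yield (offset, payload) for each complete block; stop at a truncated one."""
    pos = 0
    while pos + HDR.size <= len(buf):
        offset, length = HDR.unpack_from(buf, pos)
        if pos + HDR.size + length > len(buf):
            break                        # block was cut off mid-transfer
        yield offset, buf[pos + HDR.size : pos + HDR.size + length]
        pos += HDR.size + length

def resume_offset(received):
    """First byte the node should resend from, given fully received blocks."""
    end = 0
    for offset, payload in sorted(received):
        if offset != end:
            break                        # gap: everything after it must be resent
        end = offset + len(payload)
    return end

# Simulate a connection reset partway through the third block:
data = frame(0, b"a" * 4) + frame(4, b"b" * 4) + frame(8, b"c" * 4)[:10]
got = list(parse(data))
print(resume_offset(got))   # → 8: the gatherer asks the node to resend from byte 8
```

In practice you would use much larger blocks (the 1 MB or 10 MB from above); the tiny 4-byte blocks here just keep the example readable.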
Fourth suggestion: prevent synchronization between your nodes
What you don't want is 4000 nodes being ready at exactly the same time, all starting to send packets at exactly the same time. Also, you don't want the gatherer to send one broadcast or multicast message, and all 4000 nodes responding to the gatherer at the same time. That is a guarantee that messages will be lost.
A good protocol has random delays in it. Whenever a host or router decides that it needs to send a message, don't send it immediately. Instead, start a timer, e.g. for 10 milliseconds or 100 milliseconds, and jitter the exact time. Finer granularity of the timer helps. Now suppose your protocol lets the gatherer broadcast a message to the nodes saying: "tell me who is ready and wants to send data to me". If all nodes reply immediately, that causes packet drops. But if they all wait between 100 milliseconds and a full second, then the gatherer will receive 4000 messages spread over a 900 millisecond timeframe. That's about 4 or 5 messages per millisecond. Maybe still a bit much, but much better than getting all 4000 messages within a few milliseconds.
Of course, if you use TCP, some of these issues will be less relevant. But keep in mind, if you design or implement a protocol, try to smooth out all actions over time, and try to prevent synchronization.
This should get you started.
I hope these ideas help. They are all old ideas that have been implemented already decades ago. But you can still learn from them.
If you have more questions, or ideas to discuss, let me know. I enjoy this kind of stuff.

Good luck.