

Hilary

Junior Member
May 30, 2021
2
0
6
I am trying to figure out an answer to this question:

Suppose process 1 sends a heartbeat message to process 2 every T units of time. Process 2 declares process 1 as failed if it does not receive a message from it within T+d time units. Considering the worst-case scenario, after how long can process 2 detect that process 1 has failed?

I would highly appreciate any relevant response to this question.

Thanks
 

damian101

Senior member
Aug 11, 2020
281
100
76
I guess that depends on the hardware and how much latency is allowed on the system by the kernel.
Not an expert though.
 

StefanR5R

Elite Member
Dec 10, 2016
4,180
4,684
136
I believe there is a simple and obvious answer to this question. Whether I believe this despite or because of not being a computer systems engineer, I don't know. The answer, as it occurs to me:

Take the latencies of the subsystems which perform process 1, pass the heartbeat message, and perform process 2, and there you have your answer. (Latencies should include all relevant effects, possibly starting at basic issues like clock drift.) More precisely:
  • In the special case that all of these subsystems have deterministic latencies (so-called hard realtime systems), determine the worst case total latency, and that's it.
    (If the subsystem latencies are independent of each other, then the total latency is the sum of subsystem latencies. If they are not independent, then the total may be less than the sum.)
  • In the more general case that one or more of these subsystems behave stochastically in time (so-called soft realtime systems), you cannot determine with certainty whether process 1 has failed. Instead, you determine the probability that process 1 has failed.
    • Obtain the probability measures of the latencies of the subsystems.
    • Calculate the overall probability measure of your system.
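
To make the two cases above a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the subsystem names, the millisecond values, the exponential delay model, and the helper names (`worst_case_total`, `prob_false_alarm`, `exponential_delay`) are all made up. The first function sums per-subsystem worst cases (the independent, deterministic case). The second uses a simple Monte Carlo simulation to estimate how often the timeout T+d would fire even though process 1 is still alive, which is one ingredient of the probability mentioned in the stochastic case.

```python
import random

# Hypothetical worst-case latencies in milliseconds (illustrative values only).
WORST_CASE_MS = {"sender": 2.0, "network": 10.0, "receiver": 3.0}


def worst_case_total(latencies):
    """Deterministic case: bound the total latency by the sum of the
    per-subsystem worst cases (assumes the subsystems are independent)."""
    return sum(latencies.values())


def exponential_delay(rng, mean_ms=5.0):
    """Assumed delay model: exponentially distributed end-to-end delay."""
    return rng.expovariate(1.0 / mean_ms)


def prob_false_alarm(period, slack, sample_delay, trials=100_000, seed=0):
    """Stochastic case: estimate the probability that the gap between two
    consecutive heartbeat arrivals exceeds period + slack although the
    sender is alive (i.e. the timeout T+d fires wrongly)."""
    rng = random.Random(seed)
    late = 0
    for _ in range(trials):
        # Heartbeats are sent at time 0 and time `period`;
        # they arrive at d0 and period + d1 respectively.
        d0 = sample_delay(rng)
        d1 = sample_delay(rng)
        gap = (period + d1) - d0
        if gap > period + slack:
            late += 1
    return late / trials


if __name__ == "__main__":
    print("worst-case total latency (ms):", worst_case_total(WORST_CASE_MS))
    print("estimated false-alarm probability:",
          prob_false_alarm(period=100.0, slack=5.0,
                           sample_delay=exponential_delay))
```

With real systems you would replace the made-up delay model with measured latency distributions, which is usually the hard part.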

Edit, PS: Often enough, when we are faced with stochastic systems, we nevertheless model them deterministically with reasonably good results. We do so because the math is hard, or simply because obtaining all of the information is hard.
 