# I am trying to figure out an answer to this question: suppose process 1 sends a heartbeat message to process 2 every T unit of time. Process 2 declar

#### Hilary

##### Junior Member
I am trying to figure out an answer to this question:

Suppose process 1 sends a heartbeat message to process 2 every T units of time. Process 2 declares process 1 failed if it does not receive a heartbeat from it within T+d time units. In the worst case, how long does it take process 2 to detect that process 1 has failed?

I would greatly appreciate any relevant response to this question.

Thanks

#### crashtech

##### Lifer
Hi, I think you might get a better response by posting in Programming:

#### Hilary

##### Junior Member
> Hi, I think you might get a better response by posting in Programming:

Actually, it's more of a theoretical question, so I thought to post it here.

#### damian101

##### Senior member
I guess that depends on the hardware and on how much latency the kernel allows on the system.
I'm not an expert, though.

#### StefanR5R

##### Elite Member
I believe there is a simple and obvious answer to this question. Whether I believe this despite or because of not being a computer systems engineer, I don't know. The answer, as it occurs to me:

Take the latencies of the subsystems which run process 1, carry the heartbeat message, and run process 2, and there you have your answer. (Latencies should include all relevant effects, possibly starting at basic issues like clock drift.) More precisely:
• In the special case that all of these subsystems have deterministic latencies (so-called hard real-time systems), determine the worst-case total latency, and that's it. (If the subsystem latencies are independent of each other, then the total latency is the sum of the subsystem latencies. If they are not independent, then the total may be less than the sum.)
• In the more general case that one or more of these subsystems behave stochastically in time (so-called soft real-time systems), you don't determine whether or not process 1 has failed to begin with. You determine the probability that process 1 has failed:
  • Obtain the probability measures of the latencies of the subsystems.
  • Calculate the overall probability measure of your system.
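
The two cases above could be sketched roughly as follows. This is only an illustration under assumptions of mine: the three subsystems, their latency distributions, and the names `worst_case_total`, `late_probability`, and `EXAMPLE_SAMPLERS` are all made up, and the latencies are treated as independent. The stochastic part estimates, by simple Monte Carlo sampling, how likely it is that a heartbeat from a healthy process 1 arrives more than `slack` (the d in the question) late.

```python
import random


def worst_case_total(latency_bounds):
    """Deterministic case: add up the per-subsystem worst-case latencies.

    Assumes the subsystems are independent, so their worst cases can coincide.
    """
    return sum(latency_bounds)


def late_probability(samplers, slack, trials=100_000, seed=0):
    """Stochastic case: Monte Carlo estimate of P(total latency > slack).

    `samplers` is a list of functions; each takes a random.Random instance
    and returns one latency sample for its subsystem.
    """
    rng = random.Random(seed)
    late = 0
    for _ in range(trials):
        total = sum(sample(rng) for sample in samplers)
        if total > slack:
            late += 1
    return late / trials


# Hypothetical latency models in milliseconds (assumptions, not measurements).
EXAMPLE_SAMPLERS = [
    lambda rng: rng.expovariate(1.0),    # sender scheduling delay, mean 1 ms
    lambda rng: rng.uniform(0.5, 3.0),   # network transit time
    lambda rng: rng.expovariate(1.0),    # receiver scheduling delay, mean 1 ms
]

if __name__ == "__main__":
    # Hard real-time style bound from assumed per-subsystem worst cases.
    print("worst-case total latency:", worst_case_total([2.0, 3.0, 1.5]))
    # Soft real-time style estimate for an assumed slack d of 5 ms.
    print("P(heartbeat more than d late):", late_probability(EXAMPLE_SAMPLERS, slack=5.0))
```

With real systems the samplers would have to come from measured latency data, which is exactly the hard part.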

Edit, PS: Often enough, when we are faced with stochastic systems, we nevertheless model them deterministically with reasonably good results. We do so because math is hard, or simply because obtaining all of the information is hard.
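
As one worked example of such a deterministic model, here is a sketch under assumptions that are not stated in the question: process 2 restarts its timer whenever a heartbeat arrives, the one-way message delay $\delta$ is bounded by some $\delta_{\max}$, no messages are lost, and the clocks do not drift. Say the last heartbeat is sent at time $t_s$ and process 1 crashes at time $t_c \ge t_s$. The heartbeat arrives at $t_s + \delta$, and the timer of process 2 expires $T + d$ later, so the time from crash to detection is

$$
t_{\text{detect}} - t_c = (t_s + \delta + T + d) - t_c \le \delta_{\max} + T + d ,
$$

with the bound approached when the crash happens right after sending ($t_c \to t_s$) and the last heartbeat is as slow as possible ($\delta = \delta_{\max}$). If $d$ was itself chosen as the delay bound, $d = \delta_{\max}$, the worst case under these assumptions is $T + 2d$.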
