I am trying to figure out an answer to this question: suppose process 1 sends a heartbeat message to process 2 every T unit of time. Process 2 declar

Hilary · May 30, 2021

I am trying to figure out an answer to this question:

Suppose process 1 sends a heartbeat message to process 2 every T units of time. Process 2 declares process 1 as failed if it does not receive a message from it within T+d time units. Considering the worst-case scenario, after how long can process 2 detect that process 1 has failed?

I would greatly appreciate any relevant response to this question.

Thanks

crashtech · May 30, 2021

Hi, I think you might get a better response by posting in Programming:

Programming

A forum dedicated to the dark art of computer programming.

forums.anandtech.com

Hilary · May 30, 2021

crashtech said:
Hi, I think you might get a better response by posting in Programming:

Actually, it's more of a theoretical question, so I thought I would post it here.

damian101 · May 30, 2021

I guess that depends on the hardware and how much latency is allowed on the system by the kernel.
Not an expert though.

StefanR5R · May 30, 2021

I believe there is a simple and obvious answer to this question. Whether I believe this despite or because of not being a computer systems engineer, I don't know. The answer, as it occurs to me:

Take the latencies of the subsystems which perform process 1, pass the heartbeat message, and perform process 2, and there you have your answer. (Latencies should include all relevant effects, possibly starting at basic issues like clock drift.) More precisely:

In the special case that all of these subsystems have deterministic latencies (so-called hard realtime systems), determine the worst case total latency, and that's it.
(If the subsystem latencies are independent of each other, then the total latency is the sum of subsystem latencies. If they are not independent, then the total may be less than the sum.)
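
To make the deterministic case concrete, here is a rough Python sketch. The model is my own assumption, not something stated in the question: heartbeats go out at 0, T, 2T, ..., each one arrives after a fixed total latency, and process 2 declares a failure once T + d has passed since the last heartbeat it received. The function names and the example numbers are made up for illustration.

```python
def detection_delay(crash_time, T, d, latency):
    """Time between the crash of process 1 and its detection by process 2.

    Assumed model: heartbeats are sent at 0, T, 2T, ..., each arrives after
    a fixed `latency`, and process 2 declares a failure once T + d has
    passed since the last heartbeat it received.
    """
    last_sent = (crash_time // T) * T          # last heartbeat sent at or before the crash
    declared_at = last_sent + latency + T + d  # timeout runs from the last arrival
    return declared_at - crash_time


def worst_case_detection_delay(T, d, subsystem_latencies):
    """Worst case under the same assumed model: process 1 crashes right after
    a send, and every independent subsystem is at its maximum latency."""
    return T + d + sum(subsystem_latencies)


# Hypothetical numbers: T = 10, d = 2, two subsystems with worst-case latency 1 and 3.
worst = worst_case_detection_delay(10, 2, [1, 3])
scanned = max(detection_delay(c, 10, 2, latency=4) for c in range(100))
```

Under these assumptions, scanning over crash times gives the same maximum as the closed form. A different timeout rule (for example, a timer that is not reset on each arrival) would give a different bound.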

In the more general case that one or more of these subsystems behave stochastically in time (so-called soft realtime systems), you cannot determine with certainty whether process 1 has failed in the first place. Instead, you determine the probability that process 1 has failed:
- Obtain the probability measures of the latencies of the subsystems.
- Calculate the overall probability measure of your system.
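
As a small illustration of these two steps, here is a Python sketch that assumes the subsystem latencies are independent and take only a few discrete values. The distributions (`network`, `scheduler`) and the budget are invented numbers, not measurements. For independent latencies, the distribution of the total is the convolution of the individual distributions.

```python
from fractions import Fraction
from itertools import product


def combine(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete latencies."""
    total = {}
    for (a, pa), (b, pb) in product(pmf_a.items(), pmf_b.items()):
        total[a + b] = total.get(a + b, 0) + pa * pb
    return total


def prob_exceeds(pmf, budget):
    """Probability that the latency is strictly greater than `budget`."""
    return sum(p for latency, p in pmf.items() if latency > budget)


# Hypothetical subsystem latency distributions (value -> probability).
network = {1: Fraction(1, 2), 2: Fraction(1, 2)}
scheduler = {0: Fraction(3, 4), 5: Fraction(1, 4)}

total = combine(network, scheduler)
late = prob_exceeds(total, budget=2)  # chance that the total latency exceeds 2
```

With these made-up numbers, `late` is the probability that a heartbeat from a healthy process 1 takes longer than the budget, which is one ingredient for estimating how likely it is that a missing heartbeat really means a failure.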

Edit, PS: Often enough, when we are faced with stochastic systems, we nevertheless model them deterministically with reasonably good results. We do so because the math is hard, or simply because obtaining all of the information is hard.
