Welcome to one of the thorniest problems in practical computer science.
I'll try to give a simple explanation, hopefully not leaving out (or forgetting) too many factors. I'm going to assume a single-level cache here (although size and associativity will be touched on in a minute), that you *must* read data into cache to do anything with it (you can't somehow read data directly into registers while bypassing cache), that you do not know the exact order of the data needed (if you did, there wouldn't be much of a problem), and with the assumption that there is some very large amount of slow storage out there somewhere for you to pull data from (read: hard disk).
One of the more important things to know when talking about cache is the size of the "working set" -- this is all the information that is going to be accessed by your system. If the working set is known in advance and is smaller than the cache, and the cache is fully associative, then it is possible to simply load the entire working set into cache and then you'll have a 100% hit ratio. Unfortunately, this is rarely the case -- due to the high cost of low-latency memory, you inevitably have only a limited amount at your disposal. With a fully associative cache, in theory the best you can do is to have a hit ratio of (cache size / working set size) -- that is, if you have 50MB of cache, and 100MB of data that you might want to access at any given time, then there's a 50/100 = 50% chance of your data being in cache when you go to look for it. This assumes you have no precise knowledge of the workload -- that the data accesses are all random.
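That (cache size / working set size) estimate is simple enough to put in a one-liner. Here's a tiny helper (the function name is mine, just for illustration) that captures it, including the case where the working set fits entirely in cache:

```python
def random_access_hit_ratio(cache_size, working_set_size):
    """Expected hit ratio for a fully associative cache under
    uniformly random accesses to a fixed working set.

    If the working set fits in cache, every access (after warm-up)
    is a hit, so the ratio is capped at 1.0.
    """
    return min(1.0, cache_size / working_set_size)
```

So 50MB of cache against a 100MB working set gives `random_access_hit_ratio(50, 100)`, i.e. 0.5, matching the 50% figure above.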
Associativity changes these probabilities in a fairly straightforward way -- if your cache is 8-way set associative, for example, then each particular piece of data can only live in one specific set of 8 slots, rather than anywhere in the cache. For this analysis, I'm going to assume a fully associative cache (one where any data can go into any space). It makes things much simpler.
The problem, of course, is that you rarely know the working set in advance; even if you do, it's almost always larger than the amount of cache you have available, and in any general-purpose computing system, it's constantly changing. Thus, you wind up with various algorithms for dealing with cache. There are basically two questions you're trying to answer: a) what data will be needed next, and b) if we have to kick something out of cache to make room for new data, what should be kicked out?
The first question is highly dependent on your system and what you're doing with it. For instance, your program might be reading a very large file sequentially. If it's possible for the operating system (or, in more recent years, the hard disk itself) to detect this series of reads, it can start spooling the next pieces of data into cache
before you even ask for them! You rarely have this exact luxury, but you can often make educated guesses about what's going to be read next depending on what has been read in the recent past. As this is more of an art than a science, and is usually done at an application level (rather than in hardware), I'm not going to talk much about it.
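Sequential-read detection of the sort described above can be sketched in a few lines. This is a toy illustration (the class name, threshold, and one-block-ahead policy are my own assumptions, not how any particular OS or disk firmware actually does it): if the last few requests were consecutive blocks, speculatively fetch the next one.

```python
class SequentialPrefetcher:
    """Toy read-ahead detector: after `threshold` consecutive block
    reads, start prefetching the next block in the sequence."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_block = None
        self.run_length = 0

    def on_read(self, block):
        """Record a read; return a block number to prefetch, or None."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1   # the sequential run continues
        else:
            self.run_length = 1    # run broken; start counting over
        self.last_block = block
        if self.run_length >= self.threshold:
            return block + 1       # guess: the next block will be wanted
        return None
```

A real implementation would also ramp up the read-ahead window as the run gets longer, but the basic idea is the same.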
The second question is that of a cache replacement algorithm -- if the cache is full, and you need a piece of data that's *not* in the cache currently, some data has to be removed to make room for it. If you had perfect information, you would simply remove the data whose next access is furthest in the future (this is known as Belady's optimal algorithm). But since you don't *know* exactly which data is going to be accessed next, you have to guess. The two approaches generally used are LFU (Least Frequently Used) and LRU (Least Recently Used). The idea is that you track usage of the data in cache, and try to get rid of data that hasn't been used in a while, or that hasn't been used very frequently. In practice (at least for physical cache memories), LFU schemes require too much overhead, and LRU (often an approximate LRU) is overwhelmingly used. Another piece of data that can be exploited here is "locality of reference" -- if you just looked at data block n, you're more likely to need data near n in the near future than other data. This requires too much overhead to be used in low-level cache memory (since it's dependent on the data itself), but is often used for swapping virtual memory pages and in relational database applications.
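An exact LRU policy is easy to express in software. Here's a minimal sketch (not what hardware does -- hardware typically uses cheap approximations -- but it shows the eviction rule): on a miss when the cache is full, throw out whatever was touched least recently.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry on a
    miss when full. OrderedDict keeps entries in recency order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None                      # miss
        self.store.move_to_end(key)          # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict the LRU entry
        self.store[key] = value
```

Every `get` or `put` "refreshes" an entry, so only genuinely cold data drifts to the eviction end of the queue.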
So... your question is, if a piece of data is in cache at time t, how likely is it to be in cache at time t + n? For a very loose first-order approximation, what you'd need to know is how many data accesses there are per time unit, how those accesses are distributed, and the relative sizes of the cache and working set (for sanity's sake, I'll assume a constant working set here). We'll use an LRU replacement method. Let's say we have 128 units of fully associative cache, and 1024 units of working set (an 8:1 ratio), and we read 16 pieces of data per time interval, uniformly distributed over the working set.
Let's say at time t=0, we just read in data block n. Block n will be removed once we read 128 items that *aren't* block n (strictly speaking, 128 *distinct* other blocks, but I'll ignore repeats for this rough estimate -- at that point, it will be the least recently used block). If we read block n again, then we "reset" back to time t=0.
In any particular time interval, we read 16 items. Each has a (1/1024) chance of being block n, and a (1023/1024) chance of *not* being block n. So, on average it should take 64 time intervals before we randomly hit block n again (1 / (16 * (1/1024)) = 64), but only about 8 before it gets removed from the cache (128 / (16 * (1023/1024)) ≈ 8).
In this case, our data block isn't going to last long, and we're going to have a sorry hit ratio. But since the data accesses are random, you can't do much better -- 7/8 of the data is just not going to be in cache at any given time. If we were using block n all the time (say, there was a 1 in 16 chance of any particular access being on block n), then it would be very unlikely to leave cache -- on average, we'd hit it once per time interval, and it would only be removed if we didn't hit it for 8 time intervals. This is a grossly simplified analysis -- a book on discrete mathematics and/or probability theory would be a good place to start if you want to get more complex.
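If you don't trust the back-of-the-envelope numbers, you can check them with a quick Monte Carlo run. This is a toy simulation (parameters match the worked example above; the function name is mine): uniform random reads into an exact LRU cache of 128 slots over a 1024-block working set should settle near the predicted 1/8 hit ratio.

```python
import random
from collections import OrderedDict

def simulate_hit_ratio(cache_size=128, working_set=1024,
                       reads=200_000, seed=1):
    """Estimate the steady-state hit ratio of an LRU cache under
    uniformly random block reads."""
    rng = random.Random(seed)
    cache = OrderedDict()   # keys in recency order, oldest first
    hits = 0
    for _ in range(reads):
        block = rng.randrange(working_set)
        if block in cache:
            hits += 1
            cache.move_to_end(block)         # refresh recency
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)    # evict least recently used
            cache[block] = True
    return hits / reads
```

With these parameters the result comes out close to 128/1024 = 0.125, as the analysis predicts -- under purely random access, LRU buys you essentially nothing over random replacement.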
In short, your question is dependent on everything.

Perhaps if you had some particular situations in mind, more precise answers could be provided.
Your second question -- "what's the probability something will be needed again, given that it is needed now, as a function of time?" -- is a function of the workload of the system. That's something you'd define as an input to a model like this, not a question you'd ask of it. Also, in real life, you often have *no idea* what this number will be -- if you knew this number for every piece of data, then you'd simply load in all the most-often-used pieces of data and be done with the whole cache management thing.
But hey, if this was easy, I wouldn't have a job.
