
How do YOU balance memory and processor usage?

This is something I think about pretty often when developing server side web applications. How do you balance memory and processor usage? A sample question to get you going: What's your threshold for storing something in memory that's used multiple times vs. reproducing it multiple times?
 
Originally posted by: joshsquall
This is something I think about pretty often when developing server side web applications. How do you balance memory and processor usage? A sample question to get you going: What's your threshold for storing something in memory that's used multiple times vs. reproducing it multiple times?

Way too little information to answer. How big is the item, how many times is it used, and how expensive is it to reproduce, and how often is the same item requested (the same item requested 1 time a day is different from the same item requested 1 time a second)...
 
Memory is cheap. CPU time is not. So short answer, I prefer wasting memory to free up the CPU :evil: See below for example.

When I write multi-core stuff I double buffer object state so that any CPU can access the read-only current state from the public copy while the object itself is running on another core updating its own internal private state, then I sync and swap.

Call it 'serial multithreading' or 'scatter gather' multithreading, because you control the program from a single thread, but scatter the work to be done across multiple threads, then gather the results back and synchronize. The main loop runs on one CPU, then all of a sudden there is an explosion of threadable work tasks to keep all available cores busy.

For example say you have 1000 objects:

for (i = 0; i < 1000; ++i)
    QueueJobTask(objects[i], DoPhysics); // #1
WaitForMultipleObjects(objects, 1000);   // #2
for (i = 0; i < 1000; ++i)
    objects[i].SwapStates();             // #3

#1) this loop runs extremely fast and does nothing more than perform a locked increment, add a job to the job queue, and return. In fact, by the time you are adding the second or third job, a worker thread has already woken up and started processing the first job on another CPU core.

#2) this waits for all jobs in the queue to be completed before moving on (sync), and also puts this control loop to sleep so that the CPU that issued the call can join in on the job queue; the main loop won't be awakened until all the work jobs finish. All 2, 4, or 8 cores are working on your queued jobs and nothing else.

#3) after all jobs are finished and no objects are updating, swap the public/private states so the next iteration of the main loop can observe the 'current' state that was just updated this frame.

This uses a technique known as thread pooling, where you aren't creating and destroying threads (extremely expensive) but instead create 1 worker thread per core at program start and put them in a loop that sleeps until a job is added to the queue. A 'job' is just a function pointer that the worker thread loop calls, returns from, retires, then grabs the next job, etc. Your main program is still serial in nature (which games are; no avoiding the fact that a game loop has to occur in order every time, and each step depends on the previous step), but it exploits parallelism at each step requiring extensive work. For example, you can't run AI until object positions and environmental factors have been decided (ie: physics). So you can't run AI and physics together, but you can thread the physics by creating threaded jobs to resolve the 100 collision pairs you just discovered... You can picture it like a Protoss carrier in Starcraft performing a 'scatter/gather' maneuver with its drones as cores are put to work at each step.
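The pool described above could be sketched like this with the C++ standard library. This is a minimal illustration, not the poster's actual implementation: `JobQueue`, `Queue`, and `WaitAll` are made-up names standing in for the `QueueJobTask` / `WaitForMultipleObjects` pair in the example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal thread pool: N workers sleep until a job is queued, run it,
// and notify waiters when the queue drains. The only locking in the
// whole scheme is around this queue, as described in the post.
class JobQueue {
public:
    explicit JobQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { WorkerLoop(); });
    }
    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lk(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void Queue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
            ++pending_;                       // "locked increment" of the job count
        }
        cv_.notify_one();
    }
    // Sync point: block until every queued job has finished (the
    // WaitForMultipleObjects step in the example).
    void WaitAll() {
        std::unique_lock<std::mutex> lk(m_);
        idle_.wait(lk, [this] { return pending_ == 0; });
    }
private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                            // run outside the lock
            {
                std::lock_guard<std::mutex> lk(m_);
                if (--pending_ == 0) idle_.notify_all();
            }
        }
    }
    std::mutex m_;
    std::condition_variable cv_, idle_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    unsigned pending_ = 0;
    bool done_ = false;
};
```

The lock is held only for the push/pop and counter bookkeeping, never while a job runs, which is what keeps the queue from becoming a bottleneck.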

The double buffer of data means that no locking is needed whatsoever, other than the locking needed for interfacing with the job queue when adding or removing jobs. So if object 103 needs to read the state of object 456 (a Cat AI tracking a Bird it wants to eat), the states of all objects are guaranteed consistent and unchanging for the duration of the frame. You don't stall your other cores entering and leaving critical sections, and at the same time you can read from another object that is currently updating itself (only its internal copy) without the risk of a torn read (old x mixed with new y and z). It also guarantees that two Cats won't see the same Bird at different locations (new and old) while the Bird's job runs in parallel and moves the Bird around. This is very similar to keeping x, oldx, y, oldy, etc. Double buffer everything that another thread might need to access while you are updating it.

So you don't waste time and resources on things like critical sections and locks, because the reads of other objects are read only. If you need to modify an object (ie: a bullet collision with the player avatar subtracts 50 life) you send it a message, and that message is processed internally on the next update. So adding and removing jobs from the queue are the only things that need to be in a critical section, and since it's simple incrementing and decrementing of a job count and adding/removing a job from a list, it locks and unlocks very fast. Basically I could run something like this on a 4 core CPU and see all 4 cores at 100% with none of the usual pitfalls associated with multithreading. This also means seeing double the performance when going from 4 cores to 8 cores, etc.
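A single object's double-buffered state plus the "modify via message" idea might look like the sketch below. The field names and the damage mailbox are invented for illustration; in a real multithreaded build the mailbox itself would need a lock or a lock-free queue, since other threads post to it.

```cpp
#include <vector>

// Two copies of state per object: a public "current" copy any thread may
// read, and a private "next" copy only the owning update job writes.
// Swap() flips them at the frame's sync point.
struct GameObject {
    struct State { float x = 0, y = 0, life = 100.0f; };
    State states[2];
    int current = 0;                        // index of the readable copy

    const State& Current() const { return states[current]; }
    State&       Next()          { return states[current ^ 1]; }

    // Other objects don't write our state directly; they post a message
    // that we apply on our next update (single-threaded demo mailbox).
    std::vector<float> damageMail;
    void PostDamage(float d) { damageMail.push_back(d); }

    void Update(float dt) {
        State& next = Next();
        next = Current();                   // start from the public snapshot
        next.x += 10.0f * dt;               // hypothetical movement
        for (float d : damageMail)          // apply queued messages internally
            next.life -= d;
        damageMail.clear();
    }
    void Swap() { current ^= 1; }           // run only after WaitAll()
};
```

Readers always see `Current()`, which nothing mutates during the frame, so no per-object lock is ever taken.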

So back to topic: the double buffering of all object states consumes much more memory (remember, even things like meshes need to be doubled, so that physics can access the 'current' mesh for interference detection while the AI is animating the 'soon to be current' copy), but memory is cheap, and this method allows multiple cores and thousands of objects to scale extremely well, as they were meant to. Double buffering states is the only method I am aware of that completely decouples threads from the shared data they need to access and update simultaneously, and allows uninhibited all-out thread crunching without the need for expensive locks and exclusion methods on shared data.
 
Originally posted by: bsobel
Originally posted by: joshsquall
This is something I think about pretty often when developing server side web applications. How do you balance memory and processor usage? A sample question to get you going: What's your threshold for storing something in memory that's used multiple times vs. reproducing it multiple times?

Way too little information to answer. How big is the item, how many times is it used, and how expensive is it to reproduce, and how often is the same item requested (the same item requested 1 time a day is different from the same item requested 1 time a second)...

This is an extremely general question. I'm not looking for a solution to a particular problem, just asking for what you usually consider when making these decisions, and where your limits are for storing vs. reprocessing.
 
This is really just a question of how to optimize code in general, since most data structures are a tradeoff of memory versus processor power. In most cases, write it in an easy-to-understand/implement/test/debug way first. Then see what's slow. Most of the code will be fast enough done the simple way: it's either not run that often, or is fast no matter how you do it, or is waiting on IO.

Optimizing a few percent of the code (counted in lines or functions or whatever) will get you the majority of the benefit. Finding that few percent of the code is half the battle, and often finding the slow spot will make it obvious what needs to be done.

So to answer your simple question, use extra memory when the naive but memory efficient way of doing it is too slow. And use less memory when the naive but memory intensive way of doing it is too slow.
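For the original server-side question, the "spend memory when recomputing is too slow" side of that rule is basically a cache. A minimal sketch, with made-up names (`RenderPage` stands in for any expensive product; a real server cache would also need eviction and, if shared across threads, locking):

```cpp
#include <map>
#include <string>

// Hypothetical expensive computation: templating, DB queries, etc.
std::string RenderPage(const std::string& name) {
    return "<html>" + name + "</html>";
}

std::map<std::string, std::string> cache;

// Pay the CPU cost once per key, then serve repeats from memory.
const std::string& GetPage(const std::string& name) {
    auto it = cache.find(name);
    if (it == cache.end())                               // miss: compute...
        it = cache.emplace(name, RenderPage(name)).first;
    return it->second;                                   // ...hit: reuse
}
```

Whether the memory is worth it comes back to bsobel's questions: item size, request rate, and cost to reproduce.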

 