
CISC instruction...vs...SQL Query

TaylorMade

Junior Member
I am trying to find the bridge between actual application-level calls down to the instruction set commands that the CPU processes.

Theoretically, I should be able to map one SQL query (for example), break it down into everything that a query actually does with a database, and map that into the number of instructions that the CPU actually processes.

If I were able to figure that out, I could accurately size CPU requirements with the figures to back it up, rather than "guesstimating".

I know about hardware, CISC vs. RISC, AND/OR gates, clock speed, cycles, and the classic equations of computing execution time, and I have done the research from the application side, but I'm missing the link where the two MATHEMATICALLY meet.
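For reference, the classic execution-time equation ("iron law" of performance) can be written as a one-line calculation. Every number below is an illustrative assumption, not a measurement:

```python
# "Iron law" of CPU performance: time = instructions * CPI / clock_rate.
# All figures are assumed for illustration only.
instructions = 50_000_000   # dynamic instruction count for one query (assumed)
cpi = 1.5                   # average cycles per instruction (assumed)
clock_hz = 3_000_000_000    # 3 GHz CPU (assumed)

exec_time_s = instructions * cpi / clock_hz
print(f"Estimated CPU time: {exec_time_s * 1000:.1f} ms")  # → 25.0 ms
```

The hard part, as the replies below explain, is that the instruction count and CPI for a real query are not fixed numbers you can look up.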

Any help?
 
Run the database software through a debugger and count the number of instructions executed/amount of CPU time used by the process (and on its behalf by the OS) between entering a query and getting the results back. Repeat a lot of times. That would be the most accurate way (unless you have access to the source code). A less sophisticated method would be to time run-throughs of a set database workload; if the entire database fits in RAM, then the instruction count will be close to the runtime divided by the frequency of the CPU.
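The "less sophisticated" timing method above can be sketched in a few lines using Python's built-in sqlite3 on an in-memory database (so disk I/O doesn't distort the timing). The 3 GHz figure is an assumption; substitute your own CPU's frequency:

```python
import sqlite3
import time

CLOCK_HZ = 3_000_000_000  # assumed 3 GHz CPU; use your actual clock frequency

# Small in-memory database so the whole workload stays in RAM.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
con.executemany("INSERT INTO t (val) VALUES (?)",
                [(i % 100,) for i in range(10_000)])

# Time many run-throughs of the same query and take the average.
runs = 200
start = time.perf_counter()
for _ in range(runs):
    con.execute("SELECT val, COUNT(*) FROM t GROUP BY val").fetchall()
elapsed = time.perf_counter() - start

per_query_s = elapsed / runs
# Rough estimate: instructions ~ runtime * clock frequency
# (only meaningful when the database fits in RAM, as noted above,
# and it ignores CPI, so treat it as an order-of-magnitude figure).
est_instructions = per_query_s * CLOCK_HZ
print(f"{per_query_s * 1e3:.2f} ms/query, ~{est_instructions:,.0f} instructions")
```

A hardware profiler (e.g. counting retired instructions) would be more accurate than this runtime-times-frequency shortcut, for the CPI reasons discussed below.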

It's difficult to give a simple number for the number of instructions, because the CPU may have to do more or less work depending on the complexity of the query and the size of the database (and whether or not stuff is in RAM or has to be fetched from disk, if you have to wait because someone is accessing the same database record, etc.). But the actual instructions that the CPU will execute are determined when the program is compiled.
 
This is really a very difficult thing to estimate.

The amount of work needed to run an SQL query depends on many things - the size and type of the data, availability of RAM, database server load, allowable optimisations (e.g. splitting a single query into multiple threads), ability to cache query plans, etc.

If you are working with large datasets - then disk speed and RAM availability may be more of an issue than CPU. Some more advanced databases will try and estimate CPU and disk times, and play one off against the other to get optimum performance. E.g. if you have fast CPU and slow disk - it may be better to build an ad hoc hashtable to run a query, whereas if you have slow CPU and fast disk it may be better to spool the entire table from disk and sort it.

If a server is busy (lots of concurrent queries / updates, etc.) then each query will require more work than on a lightly loaded server because of the increased workload of maintaining transactions, locks, etc.

Planning the query is also a major task - I've had queries that take 10 times as long to plan as to run. Some databases can cache the query plan, so that next time you run it, the old plan can be used. However, in a busy database with lots of data changes the plan has to be refreshed more often (different data may need a different plan for optimal performance).

If you really want to estimate performance then it is best done empirically (benchmark how many queries can be run per second) in a situation that represents the production environment as closely as possible.
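A minimal queries-per-second benchmark along those lines, again using sqlite3 in memory as a stand-in; the 80/20 read/update mix is an assumption you'd tune to match your production workload:

```python
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [(i, 1000) for i in range(1_000)])

random.seed(42)
ops = 5_000
start = time.perf_counter()
for _ in range(ops):
    i = random.randrange(1_000)
    if random.random() < 0.2:   # assumed mix: ~20% updates, ~80% reads
        con.execute("UPDATE accounts SET balance = balance + 1 WHERE id = ?",
                    (i,))
    else:
        con.execute("SELECT balance FROM accounts WHERE id = ?",
                    (i,)).fetchone()
elapsed = time.perf_counter() - start

qps = ops / elapsed
print(f"{qps:,.0f} queries/second")
```

Against a real server you would run this with production-sized data, realistic concurrency, and the actual query mix; the structure of the measurement stays the same.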
 
Also don't forget to factor in TLB misses/hits, cache misses/hits, pipeline length, number of execution units, the ILP of the code, and the disk cache. Unless you have access to the simulation models that Intel uses to design their chips, you're not going to know how fast each instruction will execute.
 
download mysql source code, compile it into assembly, find the file that contains the code for a particular query, then try and trace the code.. 2 years later, if you have it figured out.. feel free to let us know.
 
While I won't say what you are looking for can't be done (the above comments are excellent starting points, even the sarcastic ones), it's just not worth doing. The SQL plan optimizers of a good database engine are amongst the most difficult things to figure out, and the queries themselves vary so much in how they affect performance. A classic example is indexing large datasets: if an optimizer decides that the query will affect more than ~ 1/3 of 1 percent of the single largest table in the query, indexes are not used, but table scans - a vastly different proposition for CPU usage. To make matters worse, the optimizer looks at the historical performance of the database under measure, and tunes its SQL execution plans based upon what it learns (OK, Oracle does to some degree, not sure about the others). So a plan that works one way now may work slightly differently in 6 months time...
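The index-versus-table-scan decision is easy to see for yourself in a plan listing. A small sketch with sqlite3 (SQLite's optimizer, not Oracle's, and its selectivity thresholds differ, but the principle is the same): the identical query gets a different access strategy once an index exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders "
            "(id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
con.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [(i % 500, float(i)) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports the access strategy the optimizer chose;
    # the human-readable detail is the last column of each row.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT total FROM orders WHERE customer = 42"
no_index = plan(q)      # no index on customer: full table SCAN
con.execute("CREATE INDEX idx_customer ON orders(customer)")
with_index = plan(q)    # now the optimizer can SEARCH via the index
print(no_index, with_index)
```

The exact wording of the plan detail varies by SQLite version, but the first run reports a SCAN and the second a SEARCH using idx_customer - two very different CPU-cost profiles for the same SQL text, which is exactly why a static query-to-instruction mapping is so slippery.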

I have written some performance sizing software in years past using queueing networks, and we merely decided to pick some "average" CPU utilization for a given type of application call, and then tune it by empirical testing. It takes time, but it does eventually work well enough...

Future Shock
 