MySQL aggregate functions vs. Python?

Hey,

I'm setting up some stats/reporting to monitor our developer build process and performance. The makefiles keep track of build stats and kick data back to a database server (a couple hundred builds per day). Then Python and matplotlib fetch data from the DB and generate some pretty graphs, which I post on our internal wiki.

Managers love pretty graphs.

When doing various calculations, I'm wondering whether it's better in general to 1) have MySQL do AVG, COUNT, etc., to get as close as possible to the data I want, or 2) just retrieve the raw table data and have my scripts work it out?
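For concreteness, option 1 would look something like this. (The table and column names are made up just to illustrate the shape of the question, not our real schema.)

```
# Option 1: let MySQL do the aggregation. The schema (build_stats,
# started_at, build_time_sec) is invented for illustration.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()

# One summary row per day comes back instead of hundreds of raw rows.
cur.execute("""
    SELECT DATE(started_at)    AS build_date,
           COUNT(*)            AS builds,
           AVG(build_time_sec) AS avg_time
    FROM build_stats
    GROUP BY DATE(started_at)
    ORDER BY build_date
""")
for build_date, builds, avg_time in cur:
    print(build_date, builds, avg_time)

cur.close()
conn.close()
```

Option 2 would be a plain SELECT of the raw rows, with the math done in the script instead.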

Is there a preferred method or best practice here? Does it depend on the DB size?

I don't want the DBA Mafia to come throw rotten fruit at me.

Thanks for your advice! :-D

Cogman

Lifer
I was going to say it depends, but the more I think about it, the more I believe the right solution is to use the aggregate functions.

The cost of calculating the aggregates is pretty minimal all things considered, so why not do that rather than use network bandwidth and another machine's CPU to do the calculations? The only exception is when you're doing things like sorting, which can be very CPU intensive on the database.
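A rough sketch of what I mean, reusing the same made-up build-stats schema from above: do the aggregation in MySQL, but leave off the ORDER BY and sort client-side.

```
# Aggregate in MySQL (cheap, and it shrinks the result set), but sort
# in Python so the DB server's CPU isn't doing it. Schema is made up.
import mysql.connector

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()
cur.execute("""
    SELECT target, AVG(build_time_sec)
    FROM build_stats
    GROUP BY target
""")
rows = cur.fetchall()  # already only one row per build target

rows.sort(key=lambda r: r[1], reverse=True)  # slowest targets first
for target, avg_time in rows:
    print(f"{target}: {avg_time:.1f}s")
```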

That being said, if possible you may want to look into something like Graphite instead of storing stats in MySQL. Graphite was built for exactly this kind of time-series data and for generating pretty graphs and charts.
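Getting data into Graphite is simple, too. A minimal sketch, assuming the stock Carbon plaintext listener on its default port 2003 (the host name and metric path are placeholders):

```
# Push one datapoint to Graphite's Carbon daemon using its plaintext
# protocol: "<metric.path> <value> <unix-timestamp>\n".
import socket
import time

def send_metric(path, value, host="graphite.internal", port=2003):
    line = f"{path} {value} {int(time.time())}\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

send_metric("builds.myproject.build_time_sec", 183.4)
```

Graphite then handles the storage, rollups, and graphing for you, so the matplotlib step could go away entirely.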

Merad

Platinum Member
My general attitude is: don't do it on the database until you need the performance of the database. IMO the *only* advantage of doing it in SQL is performance, and that only matters if your data is large enough (at a few hundred items per day, it probably isn't). Most devs tend to be more comfortable with app code than with SQL, and depending on the setup at your company, it's often much easier to update, debug, etc. the app code than to make changes on the database.
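At a few hundred rows a day, you can just pull the raw numbers and do the math in the script. A sketch, using the same made-up schema as the earlier posts:

```
# Option 2: fetch raw rows and compute in Python. The stdlib
# statistics module gives you things like median, which MySQL
# has no built-in function for.
import statistics
import mysql.connector

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()
cur.execute("SELECT build_time_sec FROM build_stats")
times = [float(t) for (t,) in cur]

print("builds:", len(times))
print("avg:   ", statistics.mean(times))
print("median:", statistics.median(times))
```

When you decide next month that you actually wanted the 95th percentile instead of the average, that's a one-line change in the script rather than a new query to get reviewed and deployed.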

Scarpozzi

Lifer
The reports I run out of SQL are typically very slow due to the size of the tables and the lack of proper indexing. I usually export my queries in CSV format and then work them over to generate more complex data sets. I've even got a few scripts that read CSV files, do a bunch of calculations, then create rich HTML files as output, so I can view the end result as a web page. It makes them super portable, easy to format, small in size, etc.
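Roughly like this, to give the idea (the file name and columns are just examples, not my real reports):

```
# Read an exported CSV, compute a summary, and write a simple
# self-contained HTML report. Column names are placeholders.
import csv

with open("build_report.csv", newline="") as f:
    rows = list(csv.DictReader(f))
for r in rows:
    r["build_time_sec"] = float(r["build_time_sec"])

avg = sum(r["build_time_sec"] for r in rows) / len(rows)

with open("build_report.html", "w") as out:
    out.write("<html><body><h1>Build report</h1>\n")
    out.write(f"<p>{len(rows)} builds, average {avg:.1f}s</p>\n<table>\n")
    for r in rows:
        out.write(f"<tr><td>{r['target']}</td>"
                  f"<td>{r['build_time_sec']:.1f}</td></tr>\n")
    out.write("</table></body></html>\n")
```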

I could do the same kind of thing with Python, but I'd have more flexibility if I had more SQL access than I actually do. Between the indexing issues and calculated fields slowing the queries down, it's much quicker to do some of the processing line by line after I get the data out of the tables.