MySQL aggregate functions vs. Python?

Hey,

I'm setting up some stats/reporting to monitor our developer build process and performance. The makefiles keep track of build stats and kick data back to a database server (a couple hundred builds per day). Then Python and matplotlib fetch data from the DB and generate some pretty graphs, which I post on our internal wiki.

Managers love pretty graphs.

When doing various calculations, I'm wondering whether it's better in general to 1) have MySQL do AVG, COUNT, etc., to get as close as possible to the data I want, or 2) just retrieve the raw table data and have my scripts work it out?
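For concreteness, option 1 would look something like this. (The table and column names are made up just to illustrate the shape of the question, not our real schema.)

```
# Option 1: let MySQL do the aggregation. The schema (build_stats,
# started_at, build_time_sec) is invented for illustration.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()

# One summary row per day comes back instead of hundreds of raw rows.
cur.execute("""
    SELECT DATE(started_at)    AS build_date,
           COUNT(*)            AS builds,
           AVG(build_time_sec) AS avg_time
    FROM build_stats
    GROUP BY DATE(started_at)
    ORDER BY build_date
""")
for build_date, builds, avg_time in cur:
    print(build_date, builds, avg_time)

cur.close()
conn.close()
```

Option 2 would be a plain SELECT of the raw rows, with the math done in the script instead.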

Is there a preferred method or best practice here? Does it depend on the DB size?

I don't want the DBA Mafia to come throw rotten fruit at me.

Thanks for your advice! :-D

Cogman

Lifer
I was going to say it depends, but the more I think about it, the more I believe the right solution is to use the aggregate functions.

The cost of calculating the aggregates is pretty minimal all things considered, so why not do that rather than use network bandwidth and another machine's CPU to do the calculations? The only exception is when you're doing things like sorting, which can be very CPU intensive on the database.
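A rough sketch of what I mean, reusing the same made-up build-stats schema from above: do the aggregation in MySQL, but leave off the ORDER BY and sort client-side.

```
# Aggregate in MySQL (cheap, and it shrinks the result set), but sort
# in Python so the DB server's CPU isn't doing it. Schema is made up.
import mysql.connector

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()
cur.execute("""
    SELECT target, AVG(build_time_sec)
    FROM build_stats
    GROUP BY target
""")
rows = cur.fetchall()  # already only one row per build target

rows.sort(key=lambda r: r[1], reverse=True)  # slowest targets first
for target, avg_time in rows:
    print(f"{target}: {avg_time:.1f}s")
```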

That being said, if possible you may want to look into something like Graphite instead of storing stats in MySQL. Graphite was built for exactly this kind of time-series data and for generating pretty graphs and charts.
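Getting data into Graphite is simple, too. A minimal sketch, assuming the stock Carbon plaintext listener on its default port 2003 (the host name and metric path are placeholders):

```
# Push one datapoint to Graphite's Carbon daemon using its plaintext
# protocol: "<metric.path> <value> <unix-timestamp>\n".
import socket
import time

def send_metric(path, value, host="graphite.internal", port=2003):
    line = f"{path} {value} {int(time.time())}\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

send_metric("builds.myproject.build_time_sec", 183.4)
```

Graphite then handles the storage, rollups, and graphing for you, so the matplotlib step could go away entirely.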

Merad

Platinum Member
My general attitude is: don't do it on the database until you need the performance of the database. IMO the *only* advantage of doing it in SQL is performance, and that only matters if your data is large enough (at a few hundred items per day, it probably isn't). Most devs tend to be more comfortable with app code than with SQL, and depending on the setup at your company, it's often much easier to update, debug, etc. the app code than to make changes on the database.
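At a few hundred rows a day, you can just pull the raw numbers and do the math in the script. A sketch, using the same made-up schema as the earlier posts:

```
# Option 2: fetch raw rows and compute in Python. The stdlib
# statistics module gives you things like median, which MySQL
# has no built-in function for.
import statistics
import mysql.connector

conn = mysql.connector.connect(host="dbhost", database="builds",
                               user="report", password="not-my-real-password")
cur = conn.cursor()
cur.execute("SELECT build_time_sec FROM build_stats")
times = [float(t) for (t,) in cur]

print("builds:", len(times))
print("avg:   ", statistics.mean(times))
print("median:", statistics.median(times))
```

When you decide next month that you actually wanted the 95th percentile instead of the average, that's a one-line change in the script rather than a new query to get reviewed and deployed.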

Scarpozzi

Lifer
The reports I run out of SQL are typically very slow due to the size of the tables and the lack of proper indexing. I usually export my queries in CSV format and then work them over to generate more complex data sets. I've even got a few scripts that read CSV files, do a bunch of calculations, then create rich HTML files as output, so I can view the end result as a web page. It makes them super portable, easy to format, small in size, etc.
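Roughly like this, to give the idea (the file name and columns are just examples, not my real reports):

```
# Read an exported CSV, compute a summary, and write a simple
# self-contained HTML report. Column names are placeholders.
import csv

with open("build_report.csv", newline="") as f:
    rows = list(csv.DictReader(f))
for r in rows:
    r["build_time_sec"] = float(r["build_time_sec"])

avg = sum(r["build_time_sec"] for r in rows) / len(rows)

with open("build_report.html", "w") as out:
    out.write("<html><body><h1>Build report</h1>\n")
    out.write(f"<p>{len(rows)} builds, average {avg:.1f}s</p>\n<table>\n")
    for r in rows:
        out.write(f"<tr><td>{r['target']}</td>"
                  f"<td>{r['build_time_sec']:.1f}</td></tr>\n")
    out.write("</table></body></html>\n")
```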

I could do the same kind of thing with Python, but I'd have more flexibility if I had more SQL access than I actually do. Between the indexing issues and calculated fields slowing the queries down, it's much quicker to do some of the processing line by line after I get the data out of the tables.