SQL vs noSQL

talion83

Member
Mar 21, 2011
67
0
61
I am working on a project in which we want to move a number of constantly running 'rules' from our main SQL databases onto a new system designed specifically to process these items in a VERY fast manner.

What happens right now is we get in requests from vendors, which we then need to run through a series of rules to make sure they are good requests. Currently it is done through a mixture of T-SQL and scripts on MSSQL databases. Each one of our vendors will normally submit 400-1000 of these requests a month, each request averages about 20 lines, and we need to run each of those lines against these rules.

If we get in 700 requests, averaging 20 lines per request, from each of 10 vendors and it takes a second to process each line, it would take ~39 hours to do all 10 vendors' requests.

So what I am looking to do is get each of these processed faster than that. On the hardware side we are looking at moving this system over to SSD-based drives. But on the software side I am wondering if this would be an appropriate system for a noSQL type model.

The main caveat is that we would be looking for something running on Windows Server. I know there are many different types of noSQL systems, and I honestly am not sure what would be appropriate; this is my first delve into that environment.

A few other pieces of information that would probably be helpful:
The rules which are running would be looking at the data in the noSQL system. The data would have information such as First Name, Last Name, Date1, Date2, and a few other key datapoints. The rules would look at the requests and compare them to the information found, then make sure it is a valid request. The scripts are mostly if/then statements at the moment.

As the data itself is all stored in SQL, we aren't really concerned with redundancy within the noSQL system itself - if there was an issue we would just repopulate that main database, and rerun the items through it.

Thank you ahead of time for anyone who is able to help.
 

KLin

Lifer
Feb 29, 2000
29,500
125
106
What sort of request are we talking about? A purchase order?
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
What about caching or keeping all "rules" in memory?

Kind of vague what a "rule" is and how much data you are talking about.
 

KLin

Lifer
Feb 29, 2000
29,500
125
106
Definitely vague. Another option would be looking into indexing certain columns that are getting used as lookups in the rules for faster performance.
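Something along these lines, just guessing at table and column names from the OP's description (adjust to the real schema):

-- Nonclustered indexes on the columns the rules use as lookups
CREATE NONCLUSTERED INDEX IX_People_LastName_FirstName
    ON dbo.People (LastName, FirstName);

CREATE NONCLUSTERED INDEX IX_ValidCodes_Code
    ON dbo.ValidCodes (Code);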
 

aceO07

Diamond Member
Nov 6, 2000
4,491
0
76
Have you looked into other areas, such as which queries are slow (MySQL has a slow query log, maybe MSSQL has something similar), or checked whether it's currently limited by IO speed or memory?
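I don't think MSSQL has a slow query log as such, but I believe you can pull something similar out of the plan cache DMVs - roughly like this, just to show the idea:

-- Statements with the highest total elapsed time in the plan cache
SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;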
 

LokutusofBorg

Golden Member
Mar 20, 2001
1,065
0
76
To do real performance improvements you need to do proper analysis that has the singular goal of finding the performance bottlenecks. What if moving to SSDs, or putting a bunch of effort into moving to a NoSQL solution, yields no performance improvements? Or actually makes it worse?
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
A few other pieces of information that would probably be helpful:
The rules which are running would be looking at the data in the noSQL system. The data would have information such as First Name, Last Name, Date1, Date2, and a few other key datapoints. The rules would look at the requests and compare them to the information found, then make sure it is a valid request. The scripts are mostly if/then statements at the moment.

As the data itself is all stored in SQL, we aren't really concerned with redundancy within the noSQL system itself - if there was an issue we would just repopulate that main database, and rerun the items through it.

Still you would need to ensure that data in SQL and noSQL is up to date (synchronized).

As said indexing could help.

But IMHO the most important question is how much data you have and the distribution of the requests. If 80% of requests hit the same 20% of the data, that is basically asking for caching. If all data is requested more or less equally often, the effect of a cache is smaller but it can still improve performance. An option would then be to keep everything in memory, and with simple names and addresses I think that could be possible unless you have tens of millions of records.
 

dwell

pics?
Oct 9, 1999
5,189
2
0
Sounds like more of an analytics job than realtime requests. Something like Hadoop might lend itself better to what you're trying to do. If the rules validation is something you only do periodically (once a day, for instance) you can just spin up some Amazon Elastic MapReduce instances daily and let them crunch the data in parallel. Depending on how large your dataset is it should be fast and inexpensive.
 

WannaFly

Platinum Member
Jan 14, 2003
2,811
1
0
What happens right now is we get in requests from vendors, which we then need to run through a series of rules to make sure they are good requests. Currently it is done through a mixture of T-SQL and scripts on MSSQL databases. Each one of our vendors will normally submit 400-1000 of these requests a month, each request averages about 20 lines, and we need to run each of those lines against these rules.

If we get in 700 requests, averaging 20 lines per request, from each of 10 vendors and it takes a second to process each line, it would take ~39 hours to do all 10 vendors' requests.

Is this a system that is already in place? Is it taking 39 hours? I think you are assuming the server can only process them synchronously, whereas it can (probably) process multiple asynchronously. You need to do some serious performance testing and see where the real bottleneck is. 1 second is an eternity for a SQL request, IMHO. A properly written stored procedure and indexed tables should be able to fulfill your rules within milliseconds.
 

talion83

Member
Mar 21, 2011
67
0
61
Thank you for all the initial responses, I will try and add as much information as I can:

What about caching or keeping all "rules" in memory?

Kind of vague what a "rule" is and how much data you are talking about.
It does a lookup on the first/last name to make sure that person exists, looks up the dates to make sure those are in the correct range, then looks up the specific codes that are entered, etc. Basically it goes through each item in each line to make sure they are all valid. Right now it is all in a VBScript with a lot of if/then statements. As to memory - currently these rules are running on the primary database server for the application, so keeping them in memory would slow down everything else, which is why we are looking to offload it to another system.

The dataset right now is only about 250,000 records, but we are expecting that to grow by 5 times or more in the next two years, which also means the number of incoming requests will be growing by roughly the same amount.

How about a proper rules engine such as Drools?
I will look into this engine.

Have you looked into other areas, such as which queries are slow (MySQL has a slow query log, maybe MSSQL has something similar), or checked whether it's currently limited by IO speed or memory?
Still you would need to ensure that data in SQL and noSQL is up to date (synchronized).

As said indexing could help.

But IMHO the most important question is how much data you have and the distribution of the requests. If 80% of requests hit the same 20% of the data, that is basically asking for caching. If all data is requested more or less equally often, the effect of a cache is smaller but it can still improve performance. An option would then be to keep everything in memory, and with simple names and addresses I think that could be possible unless you have tens of millions of records.

MSSQL does have analytic tools. And these need to be run on an 'as soon as possible' basis (currently they are run hourly). As for the data staying up to date - only the specific columns of information from SQL that are necessary would be brought over, and the requests are always for previous dates (never same day), so having yesterday's data is fine. That is the only data that would need to be synchronized - after that, all of the other information is in the incoming requests.
---------------

Ultimately - we are going to be offloading these rules to a separate system. The question is should we be sticking with SQL or looking at a noSQL system. I have heard that in certain instances noSQL can be considerably faster than SQL - so I am trying to discover if this may be one of those instances.

If we stayed with SQL, then the plan would be to keep just the columns we need synchronized between them (which is the same as it would be with the noSQL) and those columns would be indexed.

As mentioned above we are expecting an increase of 5 times or more in the next two years, so we want to make sure that something which is 'fast enough' today will also be fast enough then.
 

talion83

Member
Mar 21, 2011
67
0
61
Is this a system that is already in place? Is it taking 39 hours? I think you are assuming the server can only process them synchronously, whereas it can (probably) process multiple asynchronously. You need to do some serious performance testing and see where the real bottleneck is. 1 second is an eternity for a SQL request, IMHO. A properly written stored procedure and indexed tables should be able to fulfill your rules within milliseconds.

Yes and no, my 1 second comment was just a comment - not really sure why I put it in to be honest (I am going to blame it being a hectic day yesterday). There is a system currently in place though I have not (as of yet) had a chance to delve into it - it was put into place 8 years ago by an employee who isn't here anymore. From what I've been told it hasn't been scaling well.

I was told we want to move the rules onto a new server, and we are looking into testing out different systems that can provide very high performance in a situation where redundancy and consistency are not a concern.

From my understanding, the place noSQL tends to do best is in a scenario in which consistency and redundancy are not high priorities - but pure performance is.
 

MrChad

Lifer
Aug 22, 2001
13,507
3
81
The dataset right now is only about 250,000 records but we are expecting that to grow by 5 times or more in the next two years. Which also means the number of incoming requests will be growing as well by the same amount.

250,000 records (even 1,250,000 records) is peanuts for a properly configured SQL-based DBMS. My guess (as others have suggested) is that you are missing some indexes or that your rules are being executed inefficiently. A rules engine will help you manage and maintain your rules, but ultimately you're going to have to look at the underlying database and queries to determine where optimization is needed.

NoSQL is very cool, but in general it's used for massive multi-TB datasets (think Facebook, Amazon). You aren't even close to approaching that size of data, so I think a NoSQL solution would be overkill for what you want to do.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Ok, VBScript, that rings certain alarm bells. How are the checks done? One select for the whole record, or a select for each rule? Are all selects done under the same connection (very important)? What about connection pooling?

Creating a connection to a database is pretty time-intensive, up to hundreds of milliseconds, and could explain the issues.

Besides that, go with a proper language (e.g. Java, C#) and a caching solution for the chosen language. As far as I understand you have very structured data, which makes it a perfect fit for a SQL/RDBMS system.
 

Markbnj

Elite Member
Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
Yeah, you're way overthinking this for the requirements you've outlined. Address the performance issues. You don't need a rules engine or a different dbms. The number of rows you're talking about isn't consequential even at your expected growth rates.

First, decide where to run the checks. As beginner99 alludes, VbScript implies some possible inefficiencies. If you want the maximum performance then the best way to go on your current platform is probably to write a stored procedure that takes the parameters of a request, performs the checks, and returns a boolean result. I'm not saying this is the best long-term place to have those rules, from a business perspective, but it will probably be fastest.
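A rough sketch of what I mean, with table and column names invented from the description earlier in the thread (person lookup, date range, code check) - the real rules would obviously go in their place:

-- Validates one request line and returns a pass/fail bit
CREATE PROCEDURE dbo.ValidateRequestLine
    @FirstName nvarchar(50),
    @LastName  nvarchar(50),
    @Date1     date,
    @Date2     date,
    @Code      varchar(20),
    @IsValid   bit OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    SET @IsValid = CASE WHEN
            EXISTS (SELECT 1 FROM dbo.People
                    WHERE FirstName = @FirstName AND LastName = @LastName)
        AND @Date1 <= @Date2   -- placeholder for the real date-range rule
        AND EXISTS (SELECT 1 FROM dbo.ValidCodes WHERE Code = @Code)
        THEN 1 ELSE 0 END;
END

The calling script just passes the line's values in and reads the bit back, so all the lookups stay inside the engine where the indexes can do their job.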
 

talion83

Member
Mar 21, 2011
67
0
61
Thank you all for your responses. I will stick with standard SQL. As far as the scripting portion of it - yes this process is going to be completely recreated from the top down.
 

BoberFett

Lifer
Oct 9, 1999
37,563
9
81
I find it hard to believe that there's much overhead in the VBScript portion. Unless your rules are incredibly complex, you'll likely find that most of the time is spent in calls to SQL, running individual queries and pulling back a single record at a time to check your rules. Moving to another language likely won't make as much of a difference as you'd like, so use the language you're most comfortable maintaining, not the one that has the highest theoretical performance when running world climate simulation calculations.
 

talion83

Member
Mar 21, 2011
67
0
61
I don't necessarily plan on changing the script language (though it may move to C# depending on whether our new programmer works on it) - I meant more the general design/concept of how it is working. Right now it isn't using any stored procedures, the information it is grabbing from SQL is in several different tables (and in some cases even different DBs), there is no indexing on those columns... basically the original design wasn't done very well.

So I get to figure out everything it is currently doing, map it all out, then figure out how to do it better. Just applying indexing to most of the tables has already increased the performance on many of these older scripts/applications significantly. So I expect much the same from this one.
 

Markbnj

Elite Member
Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
Unless your rules are incredibly complex, you'll likely find that most of the time is spent in calls to SQL, running individual queries and pulling back a single record at a time to check your rules.

Actually I assumed that was the primary implication of his doing it in VBScript: pulling the data back one record at a time and performing the rules check in the script.
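If that's what's happening, a set-based pass over the whole request would cut out most of those round trips. Very rough sketch, with the staging table and lookup tables made up for illustration:

-- Check every line of one request in a single statement
DECLARE @RequestID int = 1;

SELECT rl.RequestLineID,
       CASE WHEN p.PersonID IS NOT NULL      -- person exists
             AND c.Code IS NOT NULL          -- code is on the valid list
             AND rl.Date1 <= rl.Date2        -- placeholder date-range rule
            THEN 1 ELSE 0 END AS IsValid
FROM dbo.RequestLines AS rl
LEFT JOIN dbo.People AS p
       ON p.FirstName = rl.FirstName AND p.LastName = rl.LastName
LEFT JOIN dbo.ValidCodes AS c
       ON c.Code = rl.Code
WHERE rl.RequestID = @RequestID;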
 

KLin

Lifer
Feb 29, 2000
29,500
125
106
How are you accessing the sql server from vbscript? Are you using ADO or DAO recordsets? Are you opening up a new recordset for each line item for validation?
 

BoberFett

Lifer
Oct 9, 1999
37,563
9
81
Actually I assumed that was the primary implication of his doing it in VBScript: pulling the data back one record at a time and performing the rules check in the script.

Right, in which case rewriting it in a "proper language" as has been suggested would yield little in the way of results. VBScript is fine if all it's doing is launching stored procs where most of the work is done. I get weary of the language argument.
 

BoberFett

Lifer
Oct 9, 1999
37,563
9
81
I don't necessarily plan on changing the script language (though it may move to C# depending on whether our new programmer works on it) - I meant more the general design/concept of how it is working. Right now it isn't using any stored procedures, the information it is grabbing from SQL is in several different tables (and in some cases even different DBs), there is no indexing on those columns... basically the original design wasn't done very well.

So I get to figure out everything it is currently doing, map it all out, then figure out how to do it better. Just applying indexing to most of the tables has already increased the performance on many of these older scripts/applications significantly. So I expect much the same from this one.

Be cautious not to over-index, especially if you do a lot of writing to those tables. Indexes are like salt. A little is good, too much can ruin the dish.
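If you want to see which indexes are actually earning their keep, SQL Server keeps usage counters you can query - something like this (written from memory, double-check against the DMV docs):

-- Indexes that are maintained on writes but never used for reads
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_updates,
       s.user_seeks + s.user_scans + s.user_lookups AS total_reads
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC;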
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
I find it hard to believe that there's much overhead in the VBScript portion. Unless your rules are incredibly complex, you'll likely find that most of the time is spent in calls to SQL, running individual queries and pulling back a single record at a time to check your rules. Moving to another language likely won't make as much of a difference as you'd like, so use the language you're most comfortable maintaining, not the one that has the highest theoretical performance when running world climate simulation calculations.

BoberFett said:
Right, in which case rewriting it in a "proper language" as has been suggested would yield little in the way of results. VBScript is fine if all it's doing is launching stored procs where most of the work is done. I get weary of the language argument.

Well, I don't know much about VBScript, but I doubt it has any useful mechanisms for caching, which could be very helpful.
Also, if each request opens a new database connection, I bet most of the time will be spent creating connections. I agree that the language is somewhat irrelevant for this mistake, because it can be made in any language. However, I don't know if there is connection pooling in VBScript; that could help a bit too.

Also it is clear that it is calling more than just stored procedures. Besides that, I'm on the side of putting business logic in the application and not the database, but that is a whole other topic.
 

ringtail

Golden Member
Mar 10, 2012
1,030
34
91
I think for small databases, computer processing power is so fast that the entire db can be searched for 'whatever' quickly enough that it's really unnecessary to worry about it.

When you're dealing with, say, (arbitrarily) half a million or more records, then the noSQL approach starts losing to a properly structured RDBMS design.

OP, I lean toward thinking indexes are your solution.