Go Back   AnandTech Forums > Software > Programming

Old 04-30-2012, 01:53 PM   #1
talion83
Member
 
Join Date: Mar 2011
Posts: 65
Default SQL vs noSQL

I am working on a project in which we want to move a number of constantly running 'rules' from our main SQL databases onto a new system designed specifically to process these items in a VERY fast manner.

What happens right now is that we get requests in from vendors, which we then need to run through a series of rules to make sure they are good requests. Currently this is done through a mixture of T-SQL and scripts on MSSQL databases. Each of our vendors will normally submit 400-1000 of these requests a month, each request averages about 20 lines, and each of those lines we need to run against these rules.

If we get in 700 requests, averaging 20 lines per request, from 10 vendors, and it takes a second to process each line, it would take ~39 hours to do all 10 vendors' requests.
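
A quick sanity check on that estimate, using the numbers from the post:

```python
# Back-of-envelope check on the ~39-hour figure quoted above.
requests_per_vendor = 700
lines_per_request = 20
vendors = 10
seconds_per_line = 1.0

total_lines = requests_per_vendor * lines_per_request * vendors  # 140,000 lines
total_hours = total_lines * seconds_per_line / 3600

print(f"{total_lines} lines -> {total_hours:.1f} hours")  # prints: 140000 lines -> 38.9 hours
```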

So what I am looking to do is get each of these processed faster than that. On the hardware side we are looking at moving this system over to SSD-based drives, but on the software side I am wondering whether this would be an appropriate system for a noSQL-type model.

The main caveat is that we would need something running on Windows Server. I know there are many different types of noSQL systems, and I honestly am not sure which would be appropriate; this is my first delve into that environment.

A few other pieces of information that would probably be helpful:
The rules which are running would be looking at the data in the noSQL system. The data would have information such as First Name, Last Name, Date1, Date2, and a few other key datapoints. The rules would look at the requests and compare them to the information found, then make sure it is a valid request. The scripts are mostly if/then statements at the moment.

As the data itself is all stored in SQL, we aren't really concerned with redundancy within the noSQL system itself - if there was an issue we would just repopulate that main database, and rerun the items through it.

Thank you ahead of time for anyone who is able to help.
talion83 is offline   Reply With Quote
Old 04-30-2012, 03:18 PM   #2
KLin
Lifer
 
KLin's Avatar
 
Join Date: Feb 2000
Location: Phreaznaux
Posts: 28,772
Default

What sort of request are we talking about? A purchase order?
__________________
'L_'
KLin is online now   Reply With Quote
Old 04-30-2012, 03:26 PM   #3
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,248
Default

What about caching or keeping all "rules" in memory?

Kind of vague what a "rule" is and how much data you are talking about.
beginner99 is offline   Reply With Quote
Old 04-30-2012, 03:28 PM   #4
KLin
Lifer
 
KLin's Avatar
 
Join Date: Feb 2000
Location: Phreaznaux
Posts: 28,772
Default

Definitely vague. Another option would be looking into indexing certain columns that are getting used as lookups in the rules for faster performance.
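
A minimal sketch of the effect of such an index, with SQLite standing in for MSSQL (table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, dob TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?, ?)",
                [(f"fn{i}", f"ln{i}", "2000-01-01") for i in range(10000)])

query = "SELECT * FROM people WHERE last_name = ?"

# Without an index, the planner falls back to a full table scan.
plan = con.execute("EXPLAIN QUERY PLAN " + query, ("ln42",)).fetchone()
print(plan)  # detail column says e.g. 'SCAN people'

# Index the column the rules use as a lookup key.
con.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan = con.execute("EXPLAIN QUERY PLAN " + query, ("ln42",)).fetchone()
print(plan)  # detail column now mentions idx_people_last_name
```

The same idea applies in MSSQL: index the columns the rule lookups filter on and check the execution plan.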
KLin is online now   Reply With Quote
Old 04-30-2012, 03:30 PM   #5
MrChad
Lifer
 
MrChad's Avatar
 
Join Date: Aug 2001
Posts: 13,500
Default

How about a proper rules engine such as Drools?
MrChad is offline   Reply With Quote
Old 04-30-2012, 07:07 PM   #6
aceO07
Diamond Member
 
Join Date: Nov 2000
Posts: 4,489
Default

Have you looked into other areas, such as which queries are slow (MySQL has a slow query log; maybe MSSQL does too), or checked whether it's currently limited by I/O speed or memory?
aceO07 is offline   Reply With Quote
Old 04-30-2012, 10:43 PM   #7
LokutusofBorg
Golden Member
 
LokutusofBorg's Avatar
 
Join Date: Mar 2001
Location: Rocky Mtns, USA
Posts: 1,055
Default

To do real performance improvements you need to do proper analysis that has the singular goal of finding the performance bottlenecks. What if moving to SSDs, or putting a bunch of effort into moving to a NoSQL solution, yields no performance improvements? Or actually makes it worse?
LokutusofBorg is offline   Reply With Quote
Old 05-01-2012, 03:07 AM   #8
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,248
Default

Quote:
Originally Posted by talion83 View Post
A few other pieces of information that would probably be helpful:
The rules which are running would be looking at the data in the noSQL system. The data would have information such as First Name, Last Name, Date1, Date2, and a few other key datapoints. The rules would look at the requests and compare them to the information found, then make sure it is a valid request. The scripts are mostly if/then statements at the moment.

As the data itself is all stored in SQL, we aren't really concerned with redundancy within the noSQL system itself - if there was an issue we would just repopulate that main database, and rerun the items through it.
Still, you would need to ensure that the data in SQL and noSQL stays up to date (synchronized).

As said, indexing could help.

But IMHO the most important question is how much data you have and the distribution of the requests. If 80% of requests hit the same 20% of the data, that is basically asking for caching. If all data is requested more or less equally often, the effect of a cache is smaller, but it can still improve performance. An option would then be to keep everything in memory; with simple names and addresses I think that could be possible unless you have tens of millions of records.
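
A sketch of that caching idea in Python; the function, schema, and names are hypothetical, with SQLite standing in for the real database:

```python
import sqlite3
from functools import lru_cache

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
con.execute("INSERT INTO people VALUES ('John', 'Smith')")

@lru_cache(maxsize=100_000)
def person_exists(first_name: str, last_name: str) -> bool:
    # Only the first lookup for a given name hits the database;
    # repeats are served from memory.
    row = con.execute(
        "SELECT 1 FROM people WHERE first_name = ? AND last_name = ?",
        (first_name, last_name)).fetchone()
    return row is not None

person_exists("John", "Smith")     # database hit
person_exists("John", "Smith")     # served from cache
print(person_exists.cache_info())  # hits=1, misses=1
```

If the 80/20 skew beginner99 describes holds, a bounded cache like this keeps the hot 20% in memory without holding the whole dataset.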
beginner99 is offline   Reply With Quote
Old 05-01-2012, 08:04 AM   #9
dwell
pics?
 
Join Date: Oct 1999
Posts: 5,190
Default

Sounds like more of an analytics job than realtime requests. Something like Hadoop might lend itself better to what you're trying to do. If the rules validation is something you only do periodically (once a day, for instance) you can just spin up some Amazon Elastic MapReduce instances daily and let them crunch the data in parallel. Depending on how large your dataset is it should be fast and inexpensive.
dwell is offline   Reply With Quote
Old 05-01-2012, 09:49 AM   #10
WannaFly
Platinum Member
 
WannaFly's Avatar
 
Join Date: Jan 2003
Posts: 2,811
Default

Quote:
Originally Posted by talion83 View Post
What happens right now is we get in requests from vendors, which we then need to run these requests through a series of rules to make sure they are good requests. Currently it is done through a mixture of T-SQL and scripts on MSSQL Databases. Each one of our vendors will normally submit 400-1000 of these requests a month, each request averages about 20 lines, and each of those lines we need to run against these rules.

If we get in 700 requests, averaging 20 lines per request, from 10 vendors and it takes a second to process each one it would take ~39 hours to do all 10 vendors requests.
Is this a system that is already in place? Is it actually taking 39 hours? I think you are assuming the server can only process requests synchronously, whereas it can (probably) process multiple asynchronously. You need to do some serious performance testing and see where the real bottleneck is. One second is an eternity for a SQL request, IMHO. A properly written stored procedure with indexed tables, etc., should be able to fulfill your rules within milliseconds.
WannaFly is offline   Reply With Quote
Old 05-01-2012, 09:58 AM   #11
talion83
Member
 
Join Date: Mar 2011
Posts: 65
Default

Thank you for all the initial responses, I will try and add as much information as I can:

Quote:
What about caching or keeping all "rules" in memory?

Kind of vague what a "rule" is and how much data you are talking about.
It does a lookup on the first/last name to make sure that person exists, looks up the dates to make sure those are in the correct range, then there are specific codes in each line that it needs to look up, etc. Basically it goes through each item in each line to make sure they are all valid. Right now it is all in a VBScript with a lot of if/then statements. As to memory: currently these rules run on the primary database server for the application, so keeping them in memory would slow down everything else, which is why we are looking to offload them to another system.

The dataset right now is only about 250,000 records but we are expecting that to grow by 5 times or more in the next two years. Which also means the number of incoming requests will be growing as well by the same amount.

Quote:
How about a proper rules engine such as Drools?
I will look into this engine.

Quote:
Have you look into other areas such as looking into which queries are slow (MySQL has a slow queries log, maybe MSSQL does too), or seeing if it's currently limited by IO speed or memory?
Quote:
Still you would need to ensure that data in SQL and noSQL is up to date (synchronized).

As said indexing could help.

But IMHO the most important question is how much data you have and the distribution of the request. If 80% of request call 20% of the same data, that basically is asking for caching. If all data is requested more or less as often the effect of a cache is smaller but can still improve performance. An option would then be to keep everything in memory and with simple names and addresses i think that could be possible unless you have tens of millions of records.
MSSQL does have analytic tools. And these rules need to be run on an 'as soon as possible' basis (currently they are run hourly). As for the data staying up to date: only the specific columns we need would be carried over from SQL, and the requests are always for previous dates (never same day), so having yesterday's data is fine. That is the only data that would need to be synchronized; after that, all of the other information is in the incoming requests.
---------------

Ultimately - we are going to be offloading these rules to a separate system. The question is should we be sticking with SQL or looking at a noSQL system. I have heard that in certain instances noSQL can be considerably faster than SQL - so I am trying to discover if this may be one of those instances.

If we stayed with SQL, then the plan would be to keep just the columns we need synchronized between them (which is the same as it would be with the noSQL) and those columns would be indexed.

As mentioned above we are expecting an increase of 5 times or more in the next two years, so we want to make sure that something which is 'fast enough' today will also be fast enough then.
talion83 is offline   Reply With Quote
Old 05-01-2012, 10:21 AM   #12
talion83
Member
 
Join Date: Mar 2011
Posts: 65
Default

Quote:
Originally Posted by WannaFly View Post
Is this a system that is already in place? Is it taking 39 hours? I think you are assuming a server can only process them synchronously, whereas it can process multiple asynchronously (probably). You need to do some serious performance testing and see where the real bottleneck is. 1 second is an eternity for a SQL request, IMHO. A properly written stored procedure and indexed tables, etc should be able to fullfill your rules within milliseconds.
Yes and no; the 1-second figure was just an off-the-cuff number, and I'm not really sure why I put it in, to be honest (I am going to blame a hectic day yesterday). There is a system currently in place, though I have not yet had a chance to delve into it; it was put in place 8 years ago by an employee who isn't here anymore. From what I've been told it hasn't been scaling well.

I was told we want to move the rules onto a new server, and we were looking into testing out different systems that can provide very high performance in a situation where redundancy and consistency aren't a concern.

From my understanding, the place noSQL tends to do best is a scenario in which consistency and redundancy are not high priorities, but pure speed is.
talion83 is offline   Reply With Quote
Old 05-01-2012, 10:25 AM   #13
MrChad
Lifer
 
MrChad's Avatar
 
Join Date: Aug 2001
Posts: 13,500
Default

Quote:
Originally Posted by talion83 View Post
The dataset right now is only about 250,000 records but we are expecting that to grow by 5 times or more in the next two years. Which also means the number of incoming requests will be growing as well by the same amount.
250,000 records (even 1,250,000 records) is peanuts for a properly configured SQL-based DBMS. My guess (as others have suggested) is that you are missing some indexes or that your rules are being executed inefficiently. A rules engine will help you manage and maintain your rules, but ultimately you're going to have to look at the underlying database and queries to determine where optimization is needed.

NoSQL is very cool, but in general it's used for massive multi-TB datasets (think Facebook, Amazon). You aren't even close to approaching that size of data, so I think a NoSQL solution would be overkill for what you want to do.
MrChad is offline   Reply With Quote
Old 05-01-2012, 11:46 AM   #14
Zargon
Lifer
 
Join Date: Nov 2009
Location: IL
Posts: 11,930
Default

http://www.brentozar.com/archive/200...rmance-tuning/


Why don't you run some metrics first and see if something is lacking hardware-wise that's easy to fix?

__________________
heatware | steam | ebay
Reaumur(Gamer): Silverstone Raven02, Antec HCG900, Z77 Fata1ity/3770K, 16GB DDR3-1833, 2x7950@1Ghz, 27" IPS
Kelvin(3D-HTPC):Q9550 w/H50, P5G41, 4GB DDR2-1066, 5870 w accellero xtreme
Chill(Server): Thuban x6@3.3 wH50 8gb ddr3 4x1.5tb ESXi
Zargon is offline   Reply With Quote
Old 05-01-2012, 11:48 AM   #15
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,248
Default

OK, VBScript: that rings certain alarm bells. How are the checks done? One select for the whole record, or a select for each rule? Are all selects made over the same connection (very important)? What about connection pooling?

Creating a database connection is pretty time-intensive (up to hundreds of milliseconds) and could explain the issues.

Besides that, go with a proper language (e.g. Java or C#) and a caching solution for that language. As far as I understand, you have very structured data, which is a perfect fit for a SQL/RDBMS system.
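
The connection-reuse point can be sketched like this, with SQLite standing in for MSSQL and invented names; the first function is the anti-pattern:

```python
import os
import sqlite3
import tempfile

# A file-backed database so separate connections see the same data.
db_path = os.path.join(tempfile.mkdtemp(), "rules.db")
with sqlite3.connect(db_path) as con:
    con.execute("CREATE TABLE people (last_name TEXT)")
    con.execute("INSERT INTO people VALUES ('Smith')")

def check_per_line(names):
    # Anti-pattern: a fresh connection (the expensive part) for every line.
    results = []
    for name in names:
        con = sqlite3.connect(db_path)
        results.append(con.execute(
            "SELECT COUNT(*) FROM people WHERE last_name = ?",
            (name,)).fetchone()[0] > 0)
        con.close()
    return results

def check_shared(names):
    # One connection for the whole batch; same answers, far less setup cost.
    with sqlite3.connect(db_path) as con:
        return [con.execute(
            "SELECT COUNT(*) FROM people WHERE last_name = ?",
            (name,)).fetchone()[0] > 0 for name in names]

names = ["Smith", "Jones", "Smith"]
print(check_per_line(names) == check_shared(names))  # prints: True
```

The results are identical; only the per-line connection overhead differs, which is exactly the cost a connection pool amortizes.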
beginner99 is offline   Reply With Quote
Old 05-01-2012, 02:25 PM   #16
Markbnj
Moderator
Programming
 
Markbnj's Avatar
 
Join Date: Sep 2005
Posts: 13,422
Default

Yeah, you're way overthinking this for the requirements you've outlined. Address the performance issues. You don't need a rules engine or a different dbms. The number of rows you're talking about isn't consequential even at your expected growth rates.

First, decide where to run the checks. As beginner99 alluded to, VBScript implies some possible inefficiencies. If you want maximum performance, then the best way to go on your current platform is probably to write a stored procedure that takes the parameters of a request, performs the checks, and returns a boolean result. I'm not saying this is the best long-term place to keep those rules from a business perspective, but it will probably be fastest.
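
The shape of such a check (parameters in, boolean out, one round trip) can be sketched in Python with SQLite standing in for a T-SQL stored procedure; the schema and the eligibility rule are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, "
            "eligible_from TEXT, eligible_to TEXT)")
con.execute("INSERT INTO people VALUES ('John', 'Smith', '2012-01-01', '2012-12-31')")

def request_is_valid(first, last, service_date):
    # One round trip that encodes both rules: the person must exist and the
    # date must fall inside their eligibility window (a made-up rule).
    row = con.execute(
        """SELECT 1 FROM people
           WHERE first_name = ? AND last_name = ?
             AND ? BETWEEN eligible_from AND eligible_to""",
        (first, last, service_date)).fetchone()
    return row is not None

print(request_is_valid("John", "Smith", "2012-06-15"))  # True
print(request_is_valid("John", "Smith", "2013-06-15"))  # False
```

In MSSQL the same logic would live in a stored procedure, so each line costs one call rather than one query per rule.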
__________________
Everytime I try to tell you, the words just come out wrong

**
Some meaningless scribbling of no account

The 4th Realm

Arts and Letters Daily - Get some culture
Markbnj is offline   Reply With Quote
Old 05-02-2012, 09:12 AM   #17
talion83
Member
 
Join Date: Mar 2011
Posts: 65
Default

Thank you all for your responses. I will stick with standard SQL. As far as the scripting portion of it - yes this process is going to be completely recreated from the top down.
talion83 is offline   Reply With Quote
Old 05-02-2012, 02:43 PM   #18
BoberFett
Lifer
 
BoberFett's Avatar
 
Join Date: Oct 1999
Location: Minneapolis
Posts: 34,018
Default

I find it hard to believe there's much overhead in the VBScript portion. Unless your rules are incredibly complex, you'll likely find that most of the time is spent in calls to SQL running individual queries and pulling back a single record at a time to check your rules. Moving to another language likely won't make as much of a difference as you'd like, so use the language you're most comfortable maintaining, not the one with the highest theoretical performance when running world climate simulation calculations.
__________________
"You had me at Meat Tornado." -- "I was born ready. I'm Ron f***ing Swanson."
BoberFett is offline   Reply With Quote
Old 05-02-2012, 03:53 PM   #19
talion83
Member
 
Join Date: Mar 2011
Posts: 65
Default

I don't necessarily plan on changing the scripting language (though it may move to C#, depending on whether our new programmer works on it); I meant more the general design/concept of how it works. Right now it isn't using any stored procedures, the information it grabs from SQL is spread across several different tables (and in some cases even different DBs), and there is no indexing on those columns. Basically, the original design wasn't done very well.

So I get to figure out everything it is currently doing, map it all out, then figure out how to do it better. Just applying indexing to most of the tables has already increased the performance on many of these older scripts/applications significantly. So I expect much the same from this one.
talion83 is offline   Reply With Quote
Old 05-02-2012, 04:25 PM   #20
Markbnj
Moderator
Programming
 
Markbnj's Avatar
 
Join Date: Sep 2005
Posts: 13,422
Default

Quote:
Originally Posted by BoberFett View Post
Unless your rules are incredibly complex, you'll likely find that most of the time is calls to SQL running individual queries and pulling back a single record at a time to check your rules.
Actually I assumed that was the primary implication of his doing it in VBScript: pulling the data back one record at a time and performing the rules check in the script.
Markbnj is offline   Reply With Quote
Old 05-02-2012, 06:48 PM   #21
KLin
Lifer
 
KLin's Avatar
 
Join Date: Feb 2000
Location: Phreaznaux
Posts: 28,772
Default

How are you accessing the sql server from vbscript? Are you using ADO or DAO recordsets? Are you opening up a new recordset for each line item for validation?
KLin is online now   Reply With Quote
Old 05-02-2012, 08:15 PM   #22
BoberFett
Lifer
 
BoberFett's Avatar
 
Join Date: Oct 1999
Location: Minneapolis
Posts: 34,018
Default

Quote:
Originally Posted by Markbnj View Post
Actually I assumed that was the primary implication of his doing it in VBScript: pulling the data back one record at a time and performing the rules check in the script.
Right, in which case rewriting it in a "proper language" as has been suggested would yield little in the way of results. VBScript is fine if all it's doing is launching stored procs where most of the work is done. I get weary of the language argument.
BoberFett is offline   Reply With Quote
Old 05-02-2012, 08:17 PM   #23
BoberFett
Lifer
 
BoberFett's Avatar
 
Join Date: Oct 1999
Location: Minneapolis
Posts: 34,018
Default

Quote:
Originally Posted by talion83 View Post
I don't necessarily plan on changing the script language (though it may move to C# depending upon if our new programmer works on it) - I meant more the general design/concept of how it is working. Right now it isn't using any stored procedures, the information it is grabbing from SQL is in several different tables (and in some cases even different DB's), there is no indexing on those columns...basically the original design wasn't done very well.

So I get to figure out everything it is currently doing, map it all out, then figure out how to do it better. Just applying indexing to most of the tables has already increased the performance on many of these older scripts/applications significantly. So I expect much the same from this one.
Be cautious not to over-index, especially if you do a lot of writing to those tables. Indexes are like salt. A little is good, too much can ruin the dish.
BoberFett is offline   Reply With Quote
Old 05-03-2012, 03:36 AM   #24
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,248
Default

Quote:
Originally Posted by BoberFett View Post
I find it hard to believe that there's much overhead in the VBScript portion. Unless your rules are incredibly complex, you'll likely find that most of the time is calls to SQL running individual queries and pulling back a single record at a time to check your rules. Moving to another language likely won't make as much of a difference as you'd like, so use the language you're most comfortable with maintaining and not the one that has the highest theoretical performance when running world climate simulation calculations.
Quote:
Originally Posted by BoberFett
Right, in which case rewriting it in a "proper language" as has been suggested would yield little in the way of results. VBScript is fine if all it's doing is launching stored procs where most of the work is done. I get weary of the language argument.
Well, I don't know much about VBScript, but I doubt it has any useful mechanisms for caching, which could be very helpful.
Also, if each request opens a new database connection, I bet most of the time will be spent creating connections. I agree that the language is somewhat irrelevant to this mistake, because it can be made in any language. However, I don't know if there is connection pooling in VBScript; that could help a bit too.

Also, it is clear that it is doing more than just calling stored procedures. Besides that, I'm on the side of putting business logic in the application rather than the database, but that is a whole other topic.
beginner99 is offline   Reply With Quote
Old 05-03-2012, 08:37 PM   #25
ringtail
Senior Member
 
ringtail's Avatar
 
Join Date: Mar 2012
Location: USA left coast
Posts: 802
Default

I think that for small databases, computer processing power is so fast that the entire DB can be searched for 'whatever' quickly enough that it's really unnecessary to worry about it.

When you're dealing with, say (arbitrarily), half a million or more records, then the noSQL approach starts losing to a properly structured RDBMS design.

OP, I lean toward thinking indexes are your solution.
ringtail is offline   Reply With Quote
Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2014, vBulletin Solutions, Inc.