Parsing a text file and loading it into a database

rifken2

Member
Feb 1, 2010
140
0
71
It has been a while since I worked with building a program to parse a text file. Looking for some general direction here.

The scenario is that we receive a text file from an overnight process. That text file is currently processed using VBA in Excel. The data is parsed in the VBA and then output to Access. I know, I know... not the best solution, but those are the cards I have been dealt.

Long term, I will switch this over to a SQL database and query that instead of using Access. I have to spec out a server and await approval, and due to red tape that will take six weeks or more.

The text file contains around 1.6MM rows of data, and reading and parsing the file takes a couple of minutes. Writing the data to Access takes around 25 minutes: the VBA loads the data from an array into the Access database one record at a time.

A couple of thoughts: adding an index to the Access table should speed up the process, but what I would really like is a way to insert all the records at once. It has been quite some time since I have worked with VBA and Access in such a manner. I have converted the VBA to .NET with little hassle, but that hasn't sped up the database issue, obviously.

Is there a way to pass the array into a recordset and insert that recordset in one fell swoop? Or something along those lines? There has to be a way to bulk insert the array instead of one row of the array at a time.

Any ideas?
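To make the "one statement, whole array" idea concrete, here is a minimal sketch in Python, using sqlite3 purely as a stand-in for the Access MDB (table and column names are made up). The point is the pattern: one prepared INSERT, one transaction, and the entire array handed over in a single `executemany` call instead of one execution per row.

```python
import sqlite3

# Stand-in for the Access MDB; the pattern carries over to any DB API:
# one prepared statement + one transaction for the whole array,
# instead of one round trip per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE overnight (acct TEXT, amount REAL)")

# Hypothetical parsed file data (in the real job, 1.6MM rows from the overnight file)
rows = [("A001", 10.5), ("A002", 99.0), ("A003", 3.25)]

with conn:  # single transaction; commits once at the end
    conn.executemany("INSERT INTO overnight (acct, amount) VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM overnight").fetchone()[0])
```

Most of the 25-minute cost in a row-at-a-time loop is per-statement overhead and per-row commits, so collapsing it into one batched, single-transaction insert is where the win comes from.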
 

clamum

Lifer
Feb 13, 2003
26,256
406
126
I'm not quite sure I understand the issue completely, but you mentioned converting VBA to .NET.

Couldn't you write a .NET web service, command-line app, or Windows service that would parse the text file and insert into the Access DB (and it would be easy enough to switch to a SQL Server backend later if you did it right)? I'm just surprised at the times you listed: minutes for parsing and 25 minutes for inserts seems high for 1.6 million records (granted, that is quite a bit, but ~30 minutes for the whole operation still seems long to me).
 

rifken2

Member
Feb 1, 2010
140
0
71
I could be mistaken, but I believe the issue is that the logic inserts one row of the array into the MDB at a time, then loops to the next row of the array and inserts that one, and so on.

The long-term solution would be using SQL Server and SSIS to do all of this and cut out the crappy VBA and Access. Until I get that in place, however, I would like to rebuild this so it takes 5 minutes instead of 30. Several people use this as the first step in their job duties: if I can save 3 man-hours a day, that's 15 hours a week, and in a month I can give the company back 60 hours. Not a bad return for a simple rewrite.

I say simple but I think I am making this too hard!
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Is there a reason that you are inserting into Access instead of just appending the data record to a table? Are you updating the existing record?

If the records are new each time, consider outputting the array to a CSV file and then letting Access load it in by itself.

Get ahold of SQL Server Express and start your testing; that way you have everything ready when the approval comes in: install and go.
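The CSV step EagleKeeper describes is a one-pass dump of the in-memory array. A hedged sketch in Python (the real code is VBA, and the column names here are invented): write the header and then the whole array with `csv.writer`, and let Access import the resulting file itself.

```python
import csv
import io

# Hypothetical parsed data; in the real job this comes from the overnight file.
rows = [("A001", "2010-02-01", 10.5), ("A002", "2010-02-01", 99.0)]

buf = io.StringIO()  # swap for open("overnight.csv", "w", newline="") to write to disk
writer = csv.writer(buf)
writer.writerow(["acct", "run_date", "amount"])  # header row, matching the import spec
writer.writerows(rows)  # entire array in one call, no per-row database round trips

print(buf.getvalue())
```

Writing the file is cheap (rifken2 reports ~6 seconds for the full dataset); the database engine's own bulk loader then does the heavy lifting.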
 

KLin

Lifer
Feb 29, 2000
30,449
752
126
Can you post the code and some sample data from the text file?
 

rifken2

Member
Feb 1, 2010
140
0
71
Right now the quickest complete time is going to CSV, about 6 seconds, and then importing that into Access.

To save the users from having to import the CSV, I wanted to have the app do that step. It is a new MDB every time; the database is rebuilt on each run.

I have the project laid out in SQL Server already, just no funding to actually build it.

The only code I have at the moment has been heavily modified by someone else to send the output to a CSV file. I don't have the original code, so I don't know how it put the data in the MDB. All I know for a fact is that it added rows to the table one record at a time.
 

rifken2

Member
Feb 1, 2010
140
0
71
Upon further thought, I should be able to import the CSV into Access via code. That should speed up the process enough until I get the proper platform in place.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
Access is slow; there's not a whole lot you can do to increase speed there that I'm aware of. And adding indexes speeds up reads but greatly slows down writes.

When you get SQL Server, look into BULK INSERT. It's very fast, and you can do any post-processing in SQL rather than VBA.
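For reference, this is the shape of the T-SQL statement BoberFett means. Sketched here as a small Python helper that only builds the string (table name, file path, and delimiters are placeholder assumptions; you'd execute the result against a real SQL Server via ADO.NET or pyodbc):

```python
def bulk_insert_sql(table: str, path: str) -> str:
    """Build a T-SQL BULK INSERT for a comma-delimited file with a header row.

    FIRSTROW = 2 skips the header; terminators assume a plain CSV.
    """
    return (
        f"BULK INSERT {table} "
        f"FROM '{path}' "
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2);"
    )

print(bulk_insert_sql("dbo.Overnight", r"C:\feeds\overnight.csv"))
```

One statement loads the whole 1.6MM-row file server-side, which is exactly the set-based load the thread is circling around.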
 

rifken2

Member
Feb 1, 2010
140
0
71
On the same page with you... I cut my teeth on VBA, Excel, and Access, and have since graduated to proper programming languages and platforms. This new job opportunity is not mine yet, but I am offering advice to help them along and show them what I can do. They are so far in the stone age it isn't even funny: this office does around $200MM a year in revenue and everything is manual!

The last thing I want to do is say "hire me and give me $60K to build out a new system/platform," but that is exactly what I will have to do. So I am trying to show them there are quick wins as well as long-term solutions, and that is what I bring to the table.
 

ForumMaster

Diamond Member
Feb 24, 2005
7,792
1
0
if you want to increase performance, you have to change tactics.

you're using what is known in the RDBMS world as "row by agonizing row", and it is very slow.
you need to load the entire file at once.

in oracle, you can create what is known as an external table. the table is available in the database immediately, because the data isn't stored there; oracle just reads the file.

from there, you can quickly perform a bulk insert of all the rows at once.

i'm certain you can do this in sql server. would be much quicker.
 

KLin

Lifer
Feb 29, 2000
30,449
752
126
rifken2 said:
> Right now the quickest complete time is going to CSV, about 6 seconds, and then importing that into Access.
>
> To save the users from having to import the CSV, I wanted to have the app do that step. It is a new MDB every time; the database is rebuilt on each run.
>
> I have the project laid out in SQL Server already, just no funding to actually build it.
>
> The only code I have at the moment has been heavily modified by someone else to send the output to a CSV file. I don't have the original code, so I don't know how it put the data in the MDB. All I know for a fact is that it added rows to the table one record at a time.

Which is very inefficient, since you're basically making 1.6 million SQL command executions. I would have written an Access import specification and used it in conjunction with the TransferText method to import the data into a table.