
Easy way to bring Access DB files into MySQL on Linux

I have a few large Access database files, totaling something in the range of 84 GB (each db is limited to 2 GB, so every time one fills up we start a new db).

The current count is 150+ million records. I need to import them into MySQL so we can run queries and whatnot. I am having trouble bringing in such a large dataset.

To expand on this, we currently have over 80 Access db files (at least one for each US state) that will need to be merged into one master database. I want fast searching (MyISAM) and I want data reliability. We will be doing full-text searches on most columns.

Any suggestions on a front end? One option was to export everything to CSV and import it that way.

The DB schema is the same for all database files.

Once I import them, is there a way to use Access to query the db, or should I use something else? I don't want to use phpMyAdmin for this portion. Any suggestions? Will OpenTalend work?
 
I had to do something similar in the past, except on the order of 5000 database files. I found a set of Linux tools that gave me CLI access to the database files, and just wrote a script to do all the work.

Depending on how old your files are, you might be able to use http://mdbtools.sourceforge.net/

It was pretty easy to run some SELECT * queries against the tables I wanted, alter the data to fit the new formats, and then pipe INSERT INTO statements into mysql.
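For illustration, here is a minimal sketch of that kind of loop, assuming mdbtools is installed and the target MySQL database already exists. The paths, user, and database name are placeholders, and the -I option differs between mdbtools versions (newer ones take a backend name), so check mdb-export --help first:

#!/bin/sh
# Sketch: dump every table of every Access file as INSERT statements
# and pipe the whole stream into mysql in one go.
# /path/to/access, someuser and target_db are placeholders.
for f in /path/to/access/*.mdb; do
    for t in $(mdb-tables -1 "$f"); do   # -1 = one table name per line
        mdb-export -I mysql "$f" "$t"    # emit INSERT INTO statements
    done
done | mysql -u someuser -p target_db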
 
I guess for 80 files, a manual CSV export/import seems reasonable.

Is the data clean? One could argue this would be a good opportunity to also "clean" the data if needed.

Also, for the import into MySQL, disable indexes + constraints for better performance (though this contradicts the point above, since constraints are what catch dirty data).
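As a sketch of the disable/re-enable dance around a CSV load (the table name records and the file path are made up for illustration, and LOAD DATA LOCAL may need local_infile enabled on the server):

mysql -u someuser -p target_db <<'SQL'
-- Skip index/constraint bookkeeping during the bulk load,
-- then rebuild the indexes once at the end.
SET foreign_key_checks = 0;
SET unique_checks = 0;
ALTER TABLE records DISABLE KEYS;  -- MyISAM: defer non-unique index rebuild
LOAD DATA LOCAL INFILE '/tmp/records.csv'
    INTO TABLE records
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;                -- skip the CSV header row
ALTER TABLE records ENABLE KEYS;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SQL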

I'm not a MySQL expert, but when choosing a storage engine, the important thing is the kind of workload you have. I personally find MySQL a bit confusing because of the different storage engines, especially MyISAM: aren't transactions and referential integrity the main point of a relational database? So having to choose between a "real RDBMS" and some kind of relational store + text index seems suboptimal.

Do you have to use MySQL? What about PostgreSQL?

Besides that, you could also use InnoDB with an external full-text search engine like http://lucene.apache.org/java/docs/index.html
or
http://sphinxsearch.com/
 
I'm not a MySQL expert, but when choosing a storage engine, the important thing is the kind of workload you have. I personally find MySQL a bit confusing because of the different storage engines, especially MyISAM: aren't transactions and referential integrity the main point of a relational database? So having to choose between a "real RDBMS" and some kind of relational store + text index seems suboptimal.
This

I've only worked with databases a fraction of the size of yours, but to me this kind of setup always feels like something that was never properly designed from the start, but rather banged together on the fly like an old shack with a leaking roof.

PostgreSQL has been good to me, but again, I've never dealt with anything on this scale, so I can't say how it holds up.
Either way, you want to pick a solid DBMS from the start; otherwise you might just find yourself in the same dilemma later on.

Anyway, if the files all have the same schema, this shouldn't be too hard.
You could create the schema first, then write a simple script to batch SELECT -> INSERT the data.
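A sketch of that, reusing the mdbtools utilities mentioned earlier (paths and the database name are placeholders; the -I option spelling varies between mdbtools versions):

# Create the schema once from the first file (they all share it)...
mdb-schema /path/to/access/first.mdb postgres | psql target_db

# ...then stream every table of every file in as INSERTs.
for f in /path/to/access/*.mdb; do
    for t in $(mdb-tables -1 "$f"); do
        mdb-export -I postgres "$f" "$t"
    done
done | psql target_db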

There are some tools for converting Access to PostgreSQL and other engines here that you might find useful:
http://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#Microsoft_Access
 
You can work with VERY large tables; the secret is indexing the data, using those indexes when you query, and using temp tables inside your stored procedures to "hone down" your result sets. Your response times will be much quicker even with lots of data.
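For example, a sketch of that pattern (the table and column names are invented for illustration):

mysql -u someuser -p target_db <<'SQL'
-- Narrow the huge table down to a small, indexed working set first,
-- then run the expensive part of the query against that set only.
CREATE TEMPORARY TABLE hits AS
    SELECT id FROM records WHERE state = 'TX' AND yr = 2010;
ALTER TABLE hits ADD INDEX (id);
SELECT r.*
FROM records r
JOIN hits h ON h.id = r.id;
DROP TEMPORARY TABLE hits;
SQL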
 