Parsing my website into a database

cirrrocco

Golden Member
Sep 7, 2004
1,952
78
91
So guys, my friend and I have been writing down lyrics for many, many trance songs over the last 7 or 8 years. My website is pretty straightforward:

Letters from A to Z, then artist names, and when you click on an artist, the list of songs opens up. Behind that are a bunch of static HTML pages.

Now, to ease maintenance, I want to store all this info in a database and create a database-driven website. [In the meantime, update some programming skills as well.]

So I know:

1. I would first need a parser to look through all my HTML files and parse the data.
2. Post the data from the parser into a database.
3. Publish the data. [I guess I can use MySQL and PHP for that; my sis knows them and hopefully she can help me.]

What languages should I learn? Are there any automated tools that can do steps 1 and 2?
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
1. C++, C, C#, Python, Ruby, Pascal, Perl, etc. Pretty much any language can do this. I would probably pick one with regular expressions built in and use that.
2. I would use the same language that was used for parsing to do the posting. Most languages have the ability to work with ODBC drivers in one way or another.
3. Just make sure that you strip out all of the HTML from the song that you can. A database could technically store the entire webpage as a single entry; however, having static HTML in your database almost defeats the purpose of going to a database in the first place.
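
For what it's worth, here is a minimal sketch of steps 1 and 2 in Python. It is only an illustration under assumptions: the page layout (a `<title>` of the form "Artist - Song" and a `<div id="lyrics">`) is hypothetical, since the real pages aren't shown in this thread. It uses the standard library's html.parser instead of regular expressions, because regexes tend to be brittle on hand-written HTML, and SQLite instead of MySQL/ODBC so that it runs without a server.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page layout; the real pages will differ.
SAMPLE_PAGE = """<html><head><title>Example Artist - Example Song</title></head>
<body><div id="header">site header</div>
<div id="lyrics">First line<br>Second line &amp; more</div>
<div id="footer">site footer</div></body></html>"""


class LyricsExtractor(HTMLParser):
    """Collects the <title> text and the plain text inside <div id="lyrics">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.lyrics_depth = 0  # greater than 0 while inside the lyrics div
        self.title_parts = []
        self.lyric_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "div":
            if self.lyrics_depth:
                self.lyrics_depth += 1  # nested div inside the lyrics
            elif dict(attrs).get("id") == "lyrics":
                self.lyrics_depth = 1
        elif tag == "br" and self.lyrics_depth:
            self.lyric_parts.append("\n")  # keep line breaks, drop the tag

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "div" and self.lyrics_depth:
            self.lyrics_depth -= 1

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif self.lyrics_depth:
            self.lyric_parts.append(data)


def parse_page(html_text):
    """Return (artist, song, lyrics) with all markup stripped."""
    parser = LyricsExtractor()
    parser.feed(html_text)
    parser.close()
    artist, _, song = "".join(parser.title_parts).partition(" - ")
    return artist.strip(), song.strip(), "".join(parser.lyric_parts).strip()


def load_into_db(conn, pages):
    """Insert one row per parsed page, using bound parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs (artist TEXT, title TEXT, lyrics TEXT)"
    )
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?, ?)", [parse_page(p) for p in pages]
    )
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_into_db(db, [SAMPLE_PAGE])
    print(db.execute("SELECT artist, title FROM songs").fetchall())
```

In a real run you would read each file from disk and feed its text to parse_page; the table and column names here are placeholders.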
 

cirrrocco

Golden Member
Sep 7, 2004
1,952
78
91
Thanks.

Question about the third point. I want to separate data from design. Right now my pages contain common elements, and if I have to change one, I have to update all four to five thousand pages and re-upload them. So let's say I add a new element to the header: is it possible to change just one 'control' page and upload only that page, so that I don't have to re-upload all the others?

I don't want to use frames and make it a '90s design. I would be grateful if you could let me know if something like that exists.
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
CSS is what you are looking for.

Build the main 'page', let the database backend seed it and then use CSS to control the look and feel.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
CSS is only a small part of the picture. For dynamic content like he wants he's going to need to use a scripting language to query the database, insert the HTML markup around the dynamic content and then finally use CSS to make it look pretty.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Right, the database shouldn't be storing pages as such, it should store the lyrics and some metadata (artist, song, etc.).

Then either bare scripting (PHP, whatever) or some fancy content management system builds pages on the fly based on the content of that database record.

One example:

mysite.com/builder.php?song=I+got+a+feeling&artist=black+eyed+peas

That looks up the lyrics, writes out the page and inserts artist, title, lyrics into the proper spots on that page. It also references any script code and CSS pages needed.

Of course you have to sanitize the input, because someone will try passing in garbage to try to hack your site.
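
A rough sketch of what such a builder could look like, written in Python rather than PHP purely for illustration. The songs table, its columns, the lyrics.css filename and the template are all assumptions, not anything from the actual site. The two safeguards it shows are that the query-string values are bound as parameters rather than pasted into the SQL, and that everything written into the page is HTML-escaped.

```python
import html
import sqlite3
from urllib.parse import parse_qs

# Hypothetical page template; the stylesheet name is a placeholder.
PAGE_TEMPLATE = """<html><head><title>{artist} - {song}</title>
<link rel="stylesheet" href="lyrics.css"></head>
<body><h1>{song}</h1><h2>{artist}</h2><pre>{lyrics}</pre></body></html>"""


def build_page(conn, query_string):
    """Return (status, html) for a query string like 'song=...&artist=...'."""
    params = parse_qs(query_string)  # decodes '+' and %xx escapes
    song = params.get("song", [""])[0]
    artist = params.get("artist", [""])[0]
    row = conn.execute(
        "SELECT artist, title, lyrics FROM songs "
        "WHERE lower(title) = lower(?) AND lower(artist) = lower(?)",
        (song, artist),  # bound parameters, never string concatenation
    ).fetchone()
    if row is None:
        return 404, "<html><body>Song not found</body></html>"
    # Escape stored text before it goes into the markup.
    return 200, PAGE_TEMPLATE.format(
        artist=html.escape(row[0]),
        song=html.escape(row[1]),
        lyrics=html.escape(row[2]),
    )


def demo_connection():
    """In-memory database with one made-up row, for trying the builder."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (artist TEXT, title TEXT, lyrics TEXT)")
    conn.execute(
        "INSERT INTO songs VALUES (?, ?, ?)",
        ("Black Eyed Peas", "I Got a Feeling", "<placeholder lyrics>"),
    )
    return conn


if __name__ == "__main__":
    status, page = build_page(
        demo_connection(), "song=I+got+a+feeling&artist=black+eyed+peas"
    )
    print(status)
    print(page)
```

Garbage input simply fails to match a row and gets the "not found" page, instead of being interpreted as SQL or markup.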
 

Aluvus

Platinum Member
Apr 27, 2006
2,913
1
0
Thanks.

Question about the third point. I want to separate data from design. Right now my pages contain common elements, and if I have to change one, I have to update all four to five thousand pages and re-upload them. So let's say I add a new element to the header: is it possible to change just one 'control' page and upload only that page, so that I don't have to re-upload all the others?

I don't want to use frames and make it a '90s design. I would be grateful if you could let me know if something like that exists.

On most sites this is done with PHP. You create either a template file (and then content is dynamically shoved into it), or a simpler template plus supporting files to contain headers/footers/etc.

My suggestion would be to use an off-the-shelf content management system (CMS), figure out how it stores content, and then work on shoving your existing content into the database for the CMS. Using a "real" CMS will give you more features with less work, and probably fewer security problems.
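
To make the template idea concrete, here is a small sketch in Python (a PHP include would play the same role). The header, footer, download-link URL and page layout are invented placeholders. The point is only that the shared pieces live in one place, so editing HEADER or DOWNLOAD_LINK once changes every page that gets rendered.

```python
import html
from string import Template
from urllib.parse import quote_plus

# Shared pieces: edit these once and every rendered page picks up the change.
HEADER = '<div id="header">Trance Lyrics</div>'
FOOTER = '<div id="footer">footer text</div>'
# Placeholder URL; swap the store here instead of in thousands of files.
DOWNLOAD_LINK = Template(
    '<a href="https://example.com/search?q=$query">Find this track</a>'
)

PAGE = Template("""$header
$download_link
<h1>$album</h1>
<h2>$song</h2>
<pre>$lyrics</pre>
$footer""")


def render_song(album, song, lyrics):
    """Fill the shared template with one song's data."""
    link = DOWNLOAD_LINK.substitute(query=quote_plus(song))
    return PAGE.substitute(
        header=HEADER,
        download_link=link,
        album=html.escape(album),
        song=html.escape(song),
        lyrics=html.escape(lyrics),
        footer=FOOTER,
    )


if __name__ == "__main__":
    print(render_song("Example Album", "Song & Dance", "la la la"))
```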
 

manlymatt83

Lifer
Oct 14, 2005
10,051
44
91
CSS is only a small part of the picture. For dynamic content like he wants he's going to need to use a scripting language to query the database, insert the HTML markup around the dynamic content and then finally use CSS to make it look pretty.

Not necessarily needed (at least not with a huge footprint)... if he uses something like MongoDB he could take the JSON right from it, massage it a tad, and have JavaScript fetch it with AJAX and parse it...
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
The replies above from DaveSimmons and Aluvus pretty much cover what you will want to do.

The biggest issue you are going to run into is the fact that your 5000 hand-made static pages will probably vary from page to page. Parsing data like that isn't going to be fun and will almost require that you check each and every parse to make sure what you want comes out.
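
One way to cut down the manual checking is to run a sanity pass over the parsed records and only eyeball the ones that look off. A small Python sketch follows; the record shape (filename, artist, title, lyrics) and the specific checks are assumptions chosen for illustration, so tune them to whatever your parser actually produces.

```python
def find_suspect_records(records, min_lyric_lines=2):
    """Return (filename, reason) pairs for parses that deserve a manual look.

    Each record is assumed to be a (filename, artist, title, lyrics) tuple.
    """
    suspects = []
    for filename, artist, title, lyrics in records:
        if not artist or not title:
            suspects.append((filename, "missing artist or title"))
        elif "<" in lyrics or ">" in lyrics:
            suspects.append((filename, "leftover markup in lyrics"))
        elif len(lyrics.splitlines()) < min_lyric_lines:
            suspects.append((filename, "lyrics suspiciously short"))
    return suspects


if __name__ == "__main__":
    parsed = [
        ("a.html", "Artist A", "Song One", "line one\nline two"),
        ("b.html", "", "Song Two", "line one\nline two"),
        ("c.html", "Artist C", "Song Three", "line one<br>line two"),
    ]
    for name, reason in find_suspect_records(parsed):
        print(name, reason)
```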

After that, doing template work with PHP really isn't all that bad. MVC-type development environments (à la Ruby on Rails or ASP.NET) are really pretty handy for this sort of stuff as well.

Like Dave said, make sure that your data is all cleaned and scrubbed. On top of that, all user input destined for a SQL query should be bound to the query as a parameter, not concatenated into the query string. This is one of the main points of attack for hackers.
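
To illustrate the difference between adding input to a query and binding it, here is a small Python/SQLite sketch. The table and the hostile input are made up, and SQLite stands in for whatever database you end up using; the same idea applies to prepared statements in PHP with MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (artist TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("Artist A", "Song One"), ("Artist B", "Song Two")],
)


def find_unsafe(conn, artist):
    # DON'T do this: the input becomes part of the SQL text itself.
    sql = "SELECT title FROM songs WHERE artist = '" + artist + "'"
    return conn.execute(sql).fetchall()


def find_bound(conn, artist):
    # DO this: the input is passed separately and only ever treated as data.
    return conn.execute(
        "SELECT title FROM songs WHERE artist = ?", (artist,)
    ).fetchall()


hostile = "nobody' OR '1'='1"

if __name__ == "__main__":
    print(find_unsafe(conn, hostile))  # the injected OR clause matches every row
    print(find_bound(conn, hostile))   # no artist has that literal name
```

With the concatenated version the hostile string rewrites the WHERE clause; with the bound version it is just an artist name that doesn't exist.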
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
CSS is only a small part of the picture. For dynamic content like he wants he's going to need to use a scripting language to query the database, insert the HTML markup around the dynamic content and then finally use CSS to make it look pretty.

that was the part I mentioned about building the 'main' page.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Sure, but your post makes it seem like CSS is the main tool for the job when in fact it's not even needed, although highly recommended to style the pages.
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
only because you read it that way.

In reality all he needs is html/php/perl and a database.
 

OogyWaWa

Senior member
Jan 20, 2009
623
0
71
I wrote a Perl script that does exactly what you are talking about (extract from HTML and load into a DB). It really isn't that difficult, especially if your data is well organized (mine was not). PM me if you want the source.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
only because you read it that way.

In reality all he needs is html/php/perl and a database.

Then why didn't you mention that in your post? Your post was very vague and I didn't want the OP to go down a rabbit hole trying to learn CSS when it's not even the main tool for the job.
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
None of us knows exactly what the OP really needs; these are suggestions.

WTF is your malfunction?
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Then why didn't you mention that in your post? Your post was very vague and I didn't want the OP to go down a rabbit hole trying to learn CSS when it's not even the main tool for the job.

Just to be clear, alkemyst's reply directly addressed the OP's question in post 3 about separating data and design. I didn't find it vague at all when read as a response to that question. More could have been said, sure, but not everyone has tons of time, and anyone can Google 'CSS' and be off and running.

I encourage everyone to take the time to read an entire thread before replying to it. I know it's often hard to come up with the time to do this.
 

cirrrocco

Golden Member
Sep 7, 2004
1,952
78
91
Thanks oogywawa. I will PM you. My pages are pretty organized actually. I can easily find out exactly where in the html page the content starts and that is common for all 4-5K pages.

I have been using a program that does some complex find-and-replace, so whenever I wanted to replace a piece of text in all the pages, I ran it on the whole folder and then re-uploaded them.

Thanks again. I think I have some work ahead of me, prolly make my sis who just graduated from univ do that [no pics you fuckers :p]
 

cirrrocco

Golden Member
Sep 7, 2004
1,952
78
91
Alkemyst.. Thanks for your suggestions. I already use CSS to control how all the pages look: I have one CSS file for all the lyrics pages and one for the index and directory page [basically a-z.html].

I understand my question wasn't very clear, but it was actually about changing a part of the page.

So for example, let's suppose I have

header design
amazon search link
Album Name
Song name
Lyrics
Related links
Footer Design

Now suppose I wanted to change it to the following

header design
beatport Download Link
Album Name
Song name
Lyrics
Related links
Footer Design

I am having to do a find-and-replace across all pages to go from Amazon to Beatport. So I was thinking that there should be an easier way: keep the lyrics in their own data containers, and let the header design, download link, and footer design sit in just one HTML or PHP page which, when changed, applies to all the lyrics pages. I am sure that there should be something like that, and so I came to ATOT :p

anyway guys please don't fight..you have all been helpful. well i was writing down the lyrics to a song, with lots of ass and tits.

Hope you enjoy it

http://www.youtube.com/watch?v=8YC_DvarTlY [NSFW]
 

cirrrocco

Golden Member
Sep 7, 2004
1,952
78
91
On most sites this is done with PHP. You create either a template file (and then content is dynamically shoved into it), or a simpler template plus supporting files to contain headers/footers/etc.

My suggestion would be to use an off-the-shelf content management system (CMS), figure out how it stores content, and then work on shoving your existing content into the database for the CMS. Using a "real" CMS will give you more features with less work, and probably fewer security problems.

Aluvus.. I think that is exactly what I want. I don't want any fancy CMS solutions. Most of my HTML pages are under 8-9 KB. No BS crap on the page, pretty much text and related links and one search box for songs.

Can I clean up a CMS so that it is vanilla? I don't want users to have to open pages that are 50-60 KB each. I understand that isn't much, but many of my users are from Asia and Europe and I want to keep page sizes to a minimum. Plus I like pages to load super fast. There's something about minimalism that I love.
 

Aluvus

Platinum Member
Apr 27, 2006
2,913
1
0
The actual HTML sent down the wire is almost entirely up to you. The normal CMS approach basically amounts to having some "magic" PHP variables that you can plug into an HTML template of your own making. The only real bloat you are likely to see comes if you use the extra chrome that many CMSes provide (for instance, a link from each page to its parent category); then the CMS may stick in some extra formatting markup. Joomla, for example, seems to love using tables for layout, often for no real reason.

I've actually been meaning to make the same transition with a site I run, for some time now. I suspect you'll be done before I am :)
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
Alkemyst.. Thanks for your suggestions. I already use CSS to control how all the pages look: I have one CSS file for all the lyrics pages and one for the index and directory page [basically a-z.html].

I understand my question wasn't very clear, but it was actually about changing a part of the page.

so for example lets suppose I have

...snip...

Yeah, definitely what you should end up with is one or only a few pages that all pull their content from a database.

You make changes to those few pages and then all your content has those changes.

If you have all your look and feel in a CSS file (something I haven't had a need to do myself), then you can swap out your Holiday CSS file for your Grammy CSS file and totally change your site in a few seconds.

The trick, much like getting your data into a common repository, is to use common names in your pages.