Databases and dynamic web content

yottabit

Golden Member
Jun 5, 2008
1,607
718
146
I've been doing some passive research for the past couple of days and not getting very far, so I was hoping you guys could help paint a better picture of the current state of things.

TLDR version
I'm a non-programmer looking to understand which languages, etc., are best for creating a website that can take in user data, store it in a database (optionally), and then process that data with Windows applications automatically.

I'd like to understand which databases/languages to research for serving up the webpages (PHP? AJAX?), storing the data (SQL?), and then interfacing with Windows apps (VB? .NET?). And does any of this change if the data transactions need to be fast?

In the past I've messed around with some MySQL servers and PHP, a couple of Drupal installations, etc., but I've been getting more interested in what technologies and languages are out there now.

How does a website like Facebook store and serve its data? And how is it served dynamically (without a page refresh)? I get that PHP can serve data with a page refresh. Say I'm using a MySQL database: what languages are out there (and which are best) to pull/serve that data? Can you do that with AJAX/Javascript? Ruby/Rails (excuse my ignorance here)? What are the pros and cons?

What are the database options out there? I'm only really familiar with MS Access, MySQL, and Microsoft SQL Server. What about very fast communications? Sometimes it seems like these relational databases are too slow. What is the fastest reasonable response time you should expect from a relational database for a few queries?

What solutions do applications and games that rely on real-time or near-real-time data use? Are they using relational databases or something completely different? How does an MMO like WoW handle all that data? I would assume there's got to be a relational database somewhere that stores all the relatively static info (characters, specs, items, etc.), something faster that's calculating and streaming the real-time data (damage, enemy reactions, etc.), and then a number of things being calculated on the client side.

I'm trying to imagine how fast things like this work, and there's clearly a gap in my understanding even at a very high level. Is it proprietary code that's managing all that sort of data? Is it accomplished with a database like SQL or something completely different?

Then, among relational database options, which are best for connecting websites to offline applications/services? For instance, I believe it's possible to serve a webpage off a SQL server, have users input data, and then have that data processed by a Windows application that can access the SQL server through a VB script or something, right? I'm thinking of the type of website where you upload an image/video/3D model, it does some processing on it, and it automatically serves you back the processed version. How is that accomplished? Going old school, I know I could have a Java applet someone logs into that creates something like a TCP/IP connection to a Java program running on a server that could process the data. I doubt that's the recommended way to do it now LOL

Sorry for the wall of text. Any input to even point me in the correct directions to research would be hugely appreciated! Thanks everyone
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Dynamic updating of data is done using Javascript and either Comet or some kind of open connection between client and server (it can be done with plain old Javascript and a willing server port). The initial page will be served via something else like PHP, although you can use a ton of different languages and frameworks here; there are literally hundreds to choose from.

Postgres is another SQL database that is free, but unlike MySQL it's truly open source and not run by the devil (Oracle).

The problem with databases is that they run in a different process from your program, and typically you connect to them via sockets and network communication. There is thus quite a bit of transit time for the query and data across that link. You can eliminate it for small databases by using an embedded SQL implementation; H2 in Java can do this, for example. But embedded databases don't perform as well overall. Most performance issues with databases come from overcomplicated queries or pulling more data than you need; it's perfectly possible to get sub-10 ms round trips on most SQL queries, which, given how parallel they can be, is usually sufficient. There are other options; the term to search on is NoSQL.
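To make the embedded-database idea concrete, here is a minimal sketch using Python's stdlib sqlite3 as a stand-in for an embedded engine like H2: the whole database lives in the application's process, so a simple indexed lookup has no socket hop at all. The table and data are invented for illustration.

```python
import sqlite3
import time

# An embedded database runs in-process: no socket or network hop between
# the application and the DB engine. ":memory:" keeps the whole database
# in this process's RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])

start = time.perf_counter()
row = conn.execute("SELECT name FROM users WHERE id = ?", (2,)).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000

print(row[0])           # bob
print(elapsed_ms < 10)  # True: an in-memory primary-key lookup is far below 10 ms
```

The same query over a socket to a separate database server would pay the round-trip cost every time, which is why chatty query patterns hurt so much more there.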

Real-time data doesn't go down to storage, as that's too slow. They will use in-memory data only and likely replicate it to ensure they keep it in the event of machine failure. WoW most likely uses SQL databases, but only for the persistent information. Most of the general game-running state is most likely just in memory (the monster is at x,y,z, the player is at x,y,z, this animation is playing, etc.), and what they store is loot picked up and given to players, gear changes, and that sort of thing. But in order to make it look immediate, the client will start playing the animations and effects before the server has acknowledged that it can. There is a lot of latency hiding that goes into these sorts of games. The Halo developers did a wonderful video on it; you might want to track that down, as it explains a lot about the what and how of latency hiding.
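A toy sketch of that split, with all names invented for illustration: the hot-path game state lives in plain in-memory structures that are updated every tick, and only rare, durable events (loot, gear) touch the SQL database.

```python
import sqlite3

# Durable storage: only persistent events (loot awarded) go here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (player TEXT, item TEXT)")

# Hot-path state: just a dict, rewritten many times per second,
# never persisted -- if the server dies, positions are simply re-synced.
positions = {"player1": (0.0, 0.0, 0.0), "monster7": (5.0, 2.0, 0.0)}

def tick():
    # Runs every frame on the server: pure memory writes, no storage.
    x, y, z = positions["player1"]
    positions["player1"] = (x + 0.1, y, z)

def award_loot(player, item):
    # Rare, durable event: this is what actually hits the database.
    db.execute("INSERT INTO inventory VALUES (?, ?)", (player, item))

for _ in range(100):
    tick()
award_loot("player1", "Sword of a Thousand Truths")

count, = db.execute("SELECT COUNT(*) FROM inventory").fetchone()
print(count)  # 1 durable row, after 100 in-memory position updates
```

The ratio is the point: hundreds of in-memory updates per second versus the occasional database write for things that must survive a crash.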

Anything with a driver for the SQL database can access the data in it for offline processing. A service like YouTube will take the file uploads from the user, probably store them directly to disk, and then have many background services picking up the file and looking at it. Some of them are recompressing it, some will be checking copyright and other such things, and there is probably a whole pipeline of machines using some distributed file system that access the file before it's finally recompressed and published. The metadata about its status, who it belongs to, etc. most likely ends up in a SQL database, maybe a NoSQL database. But it's not normally very efficient to store big assets in a SQL database; they really aren't good at it (it's hard to split the really big DB files, back them up, distribute them, etc.).
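A hypothetical miniature of that pipeline, with invented names throughout: the upload handler records a metadata row and queues the raw bytes; a background worker later processes the asset and updates only the metadata. The big file itself never goes into the database.

```python
import hashlib
import queue
import sqlite3

# Metadata store: status and ownership only -- never the asset bytes.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE uploads (
    name TEXT PRIMARY KEY, owner TEXT, status TEXT, checksum TEXT)""")

work = queue.Queue()  # stands in for a real job queue between services

def receive_upload(name, owner, data):
    # Front end: store the raw asset (to disk in real life) and queue it.
    db.execute("INSERT INTO uploads VALUES (?, ?, 'pending', NULL)",
               (name, owner))
    work.put((name, data))

def worker():
    # Background service: picks each asset up, "processes" it (here, just
    # a checksum standing in for recompression), and updates the metadata.
    while not work.empty():
        name, data = work.get()
        checksum = hashlib.sha256(data).hexdigest()
        db.execute("UPDATE uploads SET status = 'published', checksum = ? "
                   "WHERE name = ?", (checksum, name))

receive_upload("cat.mp4", "alice", b"raw video bytes")
worker()
status, = db.execute(
    "SELECT status FROM uploads WHERE name = 'cat.mp4'").fetchone()
print(status)  # published
```

In a real system the queue, the workers, and the storage would be separate machines; the shape of the flow is the same.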

What you use to upload it is entirely your choice; Javascript is probably about the best option, as it's available everywhere. Then, in terms of backend processing, you can use literally any language you want that can connect to the database you are storing your metadata in, which is most likely almost anything.
 

yottabit

Golden Member
BrightCandle,

Thanks a lot for that detailed response post. That's exactly what I need to start some more research!

I think my fundamental mental block was that I didn't realize AJAX allowed two-way communication (much like my Java applet TCP port reference). I messed around with programming a bit in the late 90s/early 2000s, and that's about all the experience I have, so I still think of JavaScript as a very "client side" language. Obviously a lot has changed since then!
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
BrightCandle,

Thanks a lot for that detailed response post. That's exactly what I need to start some more research!

I think my fundamental mental block was that I didn't realize AJAX allowed two-way communication (much like my Java applet TCP port reference). I messed around with programming a bit in the late 90s/early 2000s, and that's about all the experience I have, so I still think of JavaScript as a very "client side" language. Obviously a lot has changed since then!

I don't know if I would call it "two-way" communication. Ajax allows script in a web page to make calls out to back-end APIs for data, or to execute transactions. HTML5 websockets are more of a two-way model.
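From the server's side, the Ajax model boils down to exposing small HTTP endpoints that return data instead of whole pages; the page's Javascript calls them and patches the result into the DOM. Here is a minimal, self-contained sketch of such an endpoint using Python's stdlib, with the URL and payload invented for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    # A tiny back-end API: returns JSON data, not an HTML page.
    # A browser's Javascript would fetch this and update the page in place.
    def do_GET(self):
        body = json.dumps({"unread": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stand-in for the browser's XMLHttpRequest/fetch call:
url = f"http://127.0.0.1:{server.server_port}/api/status"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
server.shutdown()
print(data["unread"])  # 3
```

Note this is still request/response: the client asks, the server answers. A websocket would keep one connection open so the server could push without being asked.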
 

yottabit

Golden Member
I don't know if I would call it "two-way" communication. Ajax allows script in a web page to make calls out to back-end APIs for data, or to execute transactions. HTML5 websockets are more of a two-way model.

That makes sense, poor choice of words on my part.

Is there a specific field of study that would relate to these kinds of applications? I get confused because things seem to have branched out so far from the 90s. Who designs systems like this that have A) dynamic web content, B) some database aspect, and C) offline processing?

Is it something you would go to school for "computer programming" for and then just acquire the skills in each area after the fact?

On a real project would there typically be one person/group that handles the web-end of things and one that handles the "offline processing" end of things? Or is it common for the same programmer(s) to do everything?

I imagine security would start to be a pretty big concern once you have basically those three levels of programming going on.

Still trying to round out my 30,000-foot view of all these things. I'd like to someday be able to create such systems, or at least understand enough about them to be able to properly define requirements and orchestrate the creation of one. It's just hard to figure out where to start when there is so much out there now.

I currently have a degree in Mechanical Engineering, and wouldn't be opposed to getting a second degree in programming or CS if it really added value. But if I can just as easily learn the fundamentals and then practice a few languages on my own then that would make more sense.

I'm also concerned about which languages are going to be used going forward vs obsoleted. If I spend a significant time investment getting up to speed on everything and then it becomes worthless that would be a disappointment.
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
That makes sense, poor choice of words on my part.

Is there a specific field of study that would relate to these kinds of applications? I get confused because things seem to have branched out so far from the 90s. Who designs systems like this that have A) dynamic web content, B) some database aspect, and C) offline processing?

Yes, indeed, things have changed quite a bit from the "90s." The early 90s were all about client-server, and the now-ubiquitous web-based protocols were just starting to get some attention.

Field of study? There are dozens, and I'm not really equipped to comment on the formal academic path. Programmers are people who know how to program, and among all my current and former colleagues and professional acquaintances there is a huge diversity of paths traveled.

Is it something you would go to school for "computer programming" for and then just acquire the skills in each area after the fact?

Yes, essentially. That probably serves as a good high-level description of the formal academic path.

On a real project would there typically be one person/group that handles the web-end of things and one that handles the "offline processing" end of things? Or is it common for the same programmer(s) to do everything?

There can be a natural division of labor between the tiers, but it really depends on the project and team size. Most of the teams I've worked on have had various responsibilities throughout the stack, so everyone knew something about all of it.

I imagine security would start to be a pretty big concern once you have basically those three levels of programming going on.

Yes, and it's a huge topic by itself.

Still trying to round out my 30,000-foot view of all these things. I'd like to someday be able to create such systems, or at least understand enough about them to be able to properly define requirements and orchestrate the creation of one. It's just hard to figure out where to start when there is so much out there now.

I currently have a degree in Mechanical Engineering, and wouldn't be opposed to getting a second degree in programming or CS if it really added value. But if I can just as easily learn the fundamentals and then practice a few languages on my own then that would make more sense.

I'm also concerned about which languages are going to be used going forward vs obsoleted. If I spend a significant time investment getting up to speed on everything and then it becomes worthless that would be a disappointment.

This question, or set of questions, comes up a lot, and I don't really know how to answer it. Your perception is correct: there is a lot involved. There has always been a lot involved, but today there is a huge proliferation of platforms, stacks, tools, languages, etc. I know if I were just beginning it would be daunting. The best advice I can give is that you can't really dive in from the top, imo. I think you need to swim up from the bottom. You need to understand programs and programming, and then how different programs cooperate on the same system, or on different systems. You need to get how web apps work at each layer (conceptually it's still a client-server model). It takes time, but I don't think the big-picture stuff can make a lot of sense until you're speaking the language.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
I'd like to understand which databases/languages to research for serving up the webpages (PHP? AJAX?), storing the data (SQL?), and then interfacing with Windows apps (VB? .NET?). And does any of this change if the data transactions need to be fast?
Without an ORM, SQL is going to be annoying, at best, for row/column/object storage. With an ORM that makes it easy, moving to sane DBs tends to be hard. Starting with a sane DB, you'll need an ORM that can work easily with it, or roll your own (if you've got a set schema, or custom views, it's easier than it first looks). Past that, anything will do. Make an ODBC/JDBC connection to it, and voila!
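The "roll your own" option really can be this small when the schema is fixed. Here is a hypothetical sketch in Python with sqlite3: a dataclass plus one finder function is the entire mapping layer between rows and objects. All names are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Hand-rolled object/row mapping against a set schema: no ORM library,
# just a thin translation layer between a table and a typed object.
@dataclass
class User:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

def find_user(user_id):
    # One query, one object -- the whole "ORM" for this table.
    row = conn.execute("SELECT id, name FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return User(*row) if row else None

user = find_user(1)
print(user)  # User(id=1, name='alice')
```

The cost of this approach is writing a finder per access pattern; the benefit is that you always know exactly what SQL runs.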

Ruby is the best language commonly used for web development, bar none, IMO. Rails can get you up and going quickly, but be aware that Rails is built to use some anti-patterns as normal operation (the solutions for which are generally Sinatra or Camping).

Python is better than PHP for arbitrary web apps, but the best frameworks are either tied to their own web stacks or are just building blocks. It's a much more limiting language, resulting in needlessly larger code than Ruby for little or no gain. But it has some add-ons that are unmatched by Ruby to date, like NumPy and MPI. Also, some of the frameworks that include web servers, like CherryPy, are very fast, very mature, and very well documented.

PHP is old, entrenched, and not going anywhere, but also not remotely interesting, and a very clunky language. It exists, and extending it is easiest with itself, but I wouldn't get started with it, today.

.NET is good for Windows, Windows, and only Windows. Yeah, there's Mono, but it's Mono :rolleyes:. For a vertically integrated Windows shop, .NET is awesome. But that generally costs tens of thousands per year just to start, and thousands per developer per year to keep up. The integration is fantastic, SQL Server's management tools are amazing, the ability to work in them while also messing around in Visual Studio is unmatched by anything else out there that I am aware of, etc. The price that comes at, however, is being locked into MS's ecosystem. When they change licensing in a way that negatively impacts you, if you decide to migrate to FOSS right then, it could take you several years to get off of MS, and those will be a dangerous several years.

How does a website like Facebook store and serve its data?
With $$$, and employees who think like engineers. They use basically every language and storage tech that's proven itself and that they can make use of: Memcache, MySQL, PostgreSQL, PHP, Python, Erlang, etc., and, where appropriate, directly compiled languages. They throw hardware and money at their problems now, then throw money at experts to refine things later. Since they are in a volatile market, they also keep numerous experimental projects going at all times, and their back ends are always going through transitions.

And how is it served dynamically (without a page refresh)?
That would be AJAX. AJAX allows for mimicking MVC on the web by polling from the client. Javascript can open connections at will and keep connections open waiting for responses. You don't strictly need AJAX, but AJAX is well known, common, and quite good.

What are the database options out there?
PostgreSQL is the best SQL DBMS for an attempt at a relational DB, if you don't have very deep pockets. MySQL is good for fast reading. SQLite is good for small data sets, tied to an application (it can be used with large data sets, too, if they're simple enough). Firebird is good for integrating, like SQLite. Various key-value, column, and object databases out there are good for tying directly to your application, with data laid out in the way your application is best suited to access it.

What about very fast communications? Sometimes it seems like these relational databases are too slow. What is the fastest reasonable response time you should expect from a relational database for a few queries?
Usually, that comes from programmers who do not want to learn about the tools they use daily (if you use Active Record, for example, it's not the DBMS's fault when reading thousands of objects is slow; your program just made thousands of requests to the DB for thousands of items, instead of one request for thousands of items). Response time will vary by DBMS and implementation, but in general, queries for small amounts of data should be dominated by rendering and communicating, not by the query itself: <100 microseconds for queries over small amounts of in-RAM data would not be atypical, though it grows with query complexity and with data set size and complexity. It's not uncommon, however, for a relational DB to be the back end, with a denormalized read-heavy DB on the front end, set up either as K/V stores or as views specific to application uses. A common clustering scenario, as well, is to have master and slave servers, where the slaves only serve reads and the masters handle writes, then push updates to the slaves.
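The Active Record pitfall above is easy to show concretely. A hypothetical sketch with sqlite3: both approaches below return identical data, but one issues a query per object while the other issues a single query for all of them, and over a network it is the round trips that dominate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item{i}",) for i in range(500)])
ids = list(range(1, 501))

# Naive: one query per object -- 500 round trips to the database.
slow = [conn.execute("SELECT name FROM items WHERE id = ?", (i,)).fetchone()[0]
        for i in ids]

# Better: one query for all of them -- a single round trip.
placeholders = ",".join("?" * len(ids))
fast = [name for (name,) in conn.execute(
    f"SELECT name FROM items WHERE id IN ({placeholders}) ORDER BY id", ids)]

print(slow == fast)  # True: same data, one round trip instead of 500
```

In-process the difference is small; add even 1 ms of network latency per round trip and the naive version costs half a second where the batched one costs one millisecond.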

Relational DBs are reasonably possible with Oracle, Postgres, MySQL 5+, and MS SQL Server. MySQL will be the hardest and slowest of them if you try to implement a relational DB with it (it lacks some constraint types, and check constraints are ignored, so you end up needing triggers that aren't needed with other SQL DBMSes). Relational and performance are orthogonal. Relational means the data is all in relations: tuples of named fields, with unique names per field; unique values within a relation (i.e., it's a set); no unknown values stored (i.e., no nulls in base tables); typed fields; and no multivalued fields (1NF).
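A small sketch of those constraint types doing their job, using SQLite (which does enforce CHECK constraints): the database itself rejects duplicates, nulls, and out-of-range values, with no triggers or application code needed. The table and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE characters (
        name  TEXT NOT NULL UNIQUE,                       -- set semantics, no nulls
        level INTEGER NOT NULL CHECK (level BETWEEN 1 AND 90)  -- typed + range
    )""")
conn.execute("INSERT INTO characters VALUES ('Thrall', 85)")

violations = 0
for bad_row in [("Thrall", 60),   # duplicate name -> UNIQUE fires
                ("Jaina", 999),   # out of range   -> CHECK fires
                (None, 10)]:      # null name      -> NOT NULL fires
    try:
        conn.execute("INSERT INTO characters VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError:
        violations += 1

print(violations)  # 3: every bad insert was rejected by the database itself
```

On a DBMS that ignores CHECK constraints, the second bad row would silently succeed, and you would need a trigger to get the same guarantee.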

Many threads are faster than one thread, if it can be done.
CPU cache is faster than RAM.
RAM is faster than SSDs.
SSDs are faster than the LAN.
The LAN is faster than HDDs.
HDDs in RAID 0 or 10 are faster than in RAID 5 or 6.
HDDs in RAID 5 or 6 are faster than client-side WANs.

But if you need to scale out to many machines yet still maintain low latency, nothing beats a custom-designed storage system tailored to the application(s) at hand. Whether an SQL DBMS like MySQL or PostgreSQL works, or Mongo, Couch, or Cassandra fits the bill better, or something totally different, it's going to be designed based on the types of inserts and retrievals made, not on good data-management principles (applying those is a huge undertaking for companies doing this sort of thing). Usually, such systems have either a back-end software layer that scrubs for correctness; or they are backed by a well-normalized database, with only reads being fast; or every single node duplicates enough data to tell if something might be screwed up. Any such system is going to need delicate care and feeding, and if it grows in utilization, constant rejiggering for performance. There is no one-size-fits-all solution, now or on the horizon.

What solutions do applications and games that rely on real-time or near-real-time data use? Are they using relational databases or something completely different? How does an MMO like WoW handle all that data? I would assume there's got to be a relational database somewhere that stores all the relatively static info (characters, specs, items, etc.), something faster that's calculating and streaming the real-time data (damage, enemy reactions, etc.), and then a number of things being calculated on the client side.
For that kind of system, it's going to be custom-built into the game server software, usually with no separation between layers. Economic features will usually hit a separate system that is better designed for managing the data at hand.

Then, among relational database options, which are best for connecting websites to offline applications/services? For instance, I believe it's possible to serve a webpage off a SQL server, have users input data, and then have that data processed by a Windows application that can access the SQL server through a VB script or something, right? I'm thinking of the type of website where you upload an image/video/3D model, it does some processing on it, and it automatically serves you back the processed version. How is that accomplished? Going old school, I know I could have a Java applet someone logs into that creates something like a TCP/IP connection to a Java program running on a server that could process the data. I doubt that's the recommended way to do it now LOL
You don't need a relational database to handle that, or even a SQL row database. The above can be done in dozens of ways equally well, even using the file system structure and metadata as a database (mtime checking and log files). I wouldn't recommend using files for it, but it's a basic enough task that I don't think there even is a best way to do it. Generally, the server doesn't care what connects to it, so long as it can authenticate and uses the right protocols.
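The file-system-as-database idea can be sketched in a few lines: the directory is the table, mtime is the "dirty" flag, and an in-memory log records what has already been handled. Everything here (directory layout, file names) is made up for illustration.

```python
import os
import tempfile

# The "database" is just a directory of uploaded files plus a record of
# the mtime each file had when we last processed it.
incoming = tempfile.mkdtemp()
done = {}  # filename -> mtime at the moment we processed that version

def process_new_files():
    # One pass of the background worker: handle anything new or changed.
    handled = []
    for name in sorted(os.listdir(incoming)):
        mtime = os.path.getmtime(os.path.join(incoming, name))
        if done.get(name) != mtime:  # new file, or modified since last pass
            done[name] = mtime       # "log" that this version was handled
            handled.append(name)
    return handled

with open(os.path.join(incoming, "model.obj"), "w") as f:
    f.write("v 0 0 0")

print(process_new_files())  # ['model.obj'] -- picked up once...
print(process_new_files())  # [] -- ...then skipped until it changes again
```

A real version would persist the `done` log to disk and watch for partially written files, which is exactly where this approach starts earning its "I wouldn't recommend it" caveat.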

Who designs systems like this that have A) Dynamic Web Content B) Some database aspect C) Offline processing?
One key aspect of the web, and now of more systems, is that everything should be treated like offline processing whenever possible. That is, if a given "client" doesn't make a request, nothing should happen, and what happens behind the curtain of the "server" doesn't matter. It's a generalization, but it works well. The non-obvious implications are that nothing is strictly pushed, and all state needs to be explicitly declared. There's been an attempt to codify methods of doing that as well: REST. Dynamic web content has been hammered out into basically actor-model processing over networks.

On a real project would there typically be one person/group that handles the web-end of things and one that handles the "offline processing" end of things? Or is it common for the same programmer(s) to do everything?
That's going to vary all over the place. Generally, a larger organization will have distinctly enforced groups, while a smaller one will have specialists based on their talent, experience, and interest, but with plenty of cross-training. Also, generally, the data itself is extremely important in larger organizations, so they need people to make sure other people don't naively screw any data up, make sure backups are being done and are functional, etc.--there's a real need to have people that just handle the data as its own important entity.

I'm also concerned about which languages are going to be used going forward vs obsoleted. If I spend a significant time investment getting up to speed on everything and then it becomes worthless that would be a disappointment.
Languages are always coming and going, and in a language, libraries and frameworks are coming and going. If you're going to learn programming, learn programming, not a language. Any given language is just one way to express what you're trying to express to the computers underneath it.

And, finally, +1 to everything Markbnj said.
 