How can I determine what LANGUAGE an HTML page is written in?

Superwormy

Golden Member
I want to write a computer program which determines what language a web page is written in.

If that's too hard, I'd just like to write a program which excludes any page that's not written in English.

Is there a QUICK way to do this? I see stuff about charset= in HTML, and Content-Language... I'll be using PHP / Perl if that helps... I just need somewhere to start, because right now I have no idea...

Anyone?
 
You're going to have to parse the page for <html lang=xx> and see if you can figure out the language that way. If the author of the page left that attribute out, you're going to have to look at the HTTP Content-Language header and see if that's set. If neither of those is set, you make your best guess.
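
Here's a rough sketch of that lookup order, in case it helps as a starting point. It's in Python rather than PHP or Perl, but the logic should port over easily. The names declared_language and is_english are made up for this example, and it assumes you already have the page body as a string and the response headers as a dict. It only reads what the page declares; it does not do the "best guess" step.

```python
from html.parser import HTMLParser


class _LangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag that has one."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(html_text, headers=None):
    """Return the declared language tag (e.g. 'en-us'), or None if undeclared.

    Hypothetical helper: html_text is the page body, headers is a dict of
    HTTP response headers (both assumed to be fetched elsewhere).
    """
    parser = _LangParser()
    parser.feed(html_text)
    if parser.lang and parser.lang.strip():
        return parser.lang.strip().lower()

    # Fall back to the HTTP Content-Language header (names are case-insensitive).
    for name, value in (headers or {}).items():
        if name.lower() == "content-language" and value.strip():
            # The header may list several languages; take the first one.
            return value.split(",")[0].strip().lower()

    return None  # neither source is set; the caller has to guess


def is_english(html_text, headers=None):
    """True only when the page explicitly declares an English language tag."""
    lang = declared_language(html_text, headers)
    return lang is not None and (lang == "en" or lang.startswith("en-"))
```

Note that is_english treats a page with no declaration as "not English", which may throw away a lot of pages. If that's too strict, that's where a guess based on the actual text would have to go.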
 