Language codes / tags in Northern Ireland?
Andy Dingley
05-18-2007, 02:55 PM
Can anyone please suggest (ideally with references) what the language
codes or tags (ISO or IETF) ought to be to represent:
* English, as spoken in Northern Ireland (by either or both
communities)
* Ulster Scots
Thanks.
Andreas Prilop
05-18-2007, 02:55 PM
On Fri, 18 May 2007, Andy Dingley wrote:
> Organization: http://groups.google.com
Google ignores any language markup. They have their own, unknown
rules to detect^W guess the language of a page.
> Can anyone please suggest (ideally with references) what the language
> codes or tags (ISO or IETF) ought to be to represent:
For what purpose? Isn't it enough to simply write "en"?
I'm not aware of any software - let alone search engines -
that recognizes, say, "en-GB" and "en-US" and treats them differently.
--
In memoriam Alan J. Flavell
http://groups.google.com/groups/search?q=author:Alan.J.Flavell
Jukka K. Korpela
05-18-2007, 08:02 PM
Andreas Prilop wrote:
> I'm not aware of any software - let alone search engines -
> that recognizes, say, "en-GB" and "en-US" and treats them differently.
I think IBM Home Page Reader recognized those codes and selected the
pronunciation accordingly. However, the software has been discontinued. Still,
it's _potentially_ useful to declare the language variant.
If you open an HTML file in (a sufficiently new version of) MS Word, the
lang attributes will be recognized and used for spell checking. I guess
Word would then accept "honour" in en-GB and reject it in en-US.
You could also write (as an author or as a user) a style sheet that uses
language selectors and, e.g., highlights any text in en-GB. Maybe someone
will find a use for such methods.
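A rule such as ":lang(en-GB) { background: yellow }" depends on how the selector matches tags. As I understand the Selectors Level 3 rule, the element's language matches if it equals the selector's value or starts with it followed by a hyphen, compared case-insensitively, so ":lang(en)" also covers "en-GB". The Python function below is only a sketch imitating that rule for experimenting; the name lang_matches is made up and is not part of any browser or library.

```python
def lang_matches(element_lang: str, selector_lang: str) -> bool:
    """Hypothetical helper imitating the CSS :lang() rule (Selectors Level 3):
    exact match, or a prefix match that ends at a hyphen, ignoring case."""
    tag = element_lang.lower()
    wanted = selector_lang.lower()
    return tag == wanted or tag.startswith(wanted + "-")


if __name__ == "__main__":
    # Which declared languages would a :lang(en-GB) rule highlight?
    for declared in ["en-GB", "en-gb", "en-US", "en", "sco"]:
        print(declared, lang_matches(declared, "en-GB"))
```

So a generic "en" rule would still catch text marked "en-GB", whereas an "en-GB" rule ignores text marked only "en".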
To summarize, the tangible benefits of using language markup in general and
refined markup in particular are rather small, but the idea is not
completely unrealistic. Using very refined codes means taking the risk that
software capable of recognizing simple codes like "en" doesn't grok your
"en-foobar" at all.
If someone really wants to spend time with such issues, RFC 4646 is probably
the best starting point.
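On the "en-foobar" risk: whether a refined tag degrades gracefully depends on how the program matches tags. RFC 4647, the companion to RFC 4646, defines a "lookup" scheme that drops subtags from the right until it reaches a language the program supports, so "en-foobar" would end up as plain "en"; a program that merely compares strings gets no such fallback. Below is a simplified Python sketch of that lookup. The function name and the supported sets are made up for illustration, and parts of RFC 4647 such as wildcards and priority lists are omitted.

```python
from __future__ import annotations


def lookup(tag: str, supported: set[str], default: str | None = None) -> str | None:
    """Simplified sketch of RFC 4647 'lookup' (hypothetical helper):
    truncate the requested tag from the right until a supported tag matches."""
    by_lower = {s.lower(): s for s in supported}  # tags compare case-insensitively
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # Don't leave a dangling single-character subtag such as "x".
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


if __name__ == "__main__":
    print(lookup("en-foobar", {"en", "fi"}))          # falls back to "en"
    print(lookup("en-GB", {"en-US", "en-GB"}))        # exact match
    print(lookup("sco", {"en", "fi"}, default="en"))  # no match, default used
```

With lookup, a refined tag costs little; the risk remains with software that compares tags as plain strings.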
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
Jukka K. Korpela
05-19-2007, 03:12 AM
Andreas Prilop wrote:
> Google ignores any language markup. They have their own, unknown
> rules to detect^W guess the language of a page.
This is not so surprising or irrational if we consider how rare language
markup is on web pages and how often it is wrong.
I just checked the site of the Ministry of Social Affairs and Health in
Finland (http://www.stm.fi). The main page, which is of course in Finnish,
claims - in its language markup - to be in American English (en-US), whereas the
Swedish-language version of that page is marked up as being in Finnish
(actually, fi_FI), and so is the English-language version.
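For the record, "fi_FI" is not merely the wrong language: it isn't a well-formed tag at all, since RFC 4646 separates subtags with hyphens (the underscore form is the POSIX locale naming style). Below is a simplified Python sketch that checks and splits only the common part of the grammar (language, optional script, optional region, variants). Extlang, extensions, private use and grandfathered tags are left out, so treat it as an illustration rather than a validator; parse_langtag is a made-up name.

```python
from __future__ import annotations

import re

# Simplified subset of the RFC 4646 grammar (hypothetical sketch):
# language ["-" script] ["-" region] *("-" variant)
LANGTAG = re.compile(
    r"""(?P<language>[A-Za-z]{2,3})                # e.g. en, fi, sco
        (?:-(?P<script>[A-Za-z]{4}))?              # e.g. Latn
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?     # e.g. GB, FI, 419
        (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)""",
    re.VERBOSE,
)


def parse_langtag(tag: str) -> dict | None:
    """Split a tag into subtags, or return None if it isn't well-formed under
    this simplified grammar. Case is normalized the conventional way."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    script, region = m.group("script"), m.group("region")
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in m.group("variants").split("-") if v],
    }


if __name__ == "__main__":
    for t in ["fi-FI", "fi_FI", "en-GB", "sco", "en-foobar"]:
        print(t, parse_langtag(t))
```

Here "fi-FI" splits into language "fi" and region "FI", while "fi_FI" is rejected. Note also that RFC 4646 region subtags are two-letter ISO 3166-1 country codes or three-digit UN M.49 area codes, so there is no region subtag for Northern Ireland by itself; "GB" is as specific as that field gets.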
So if you were Google, would you really honor language markup?
AlanEdgey@aol.com
05-20-2007, 07:10 AM
On May 18, 4:01 pm, Andy Dingley <ding...@codesmiths.com> wrote:
> Can anyone please suggest (ideally with references) what the language
> codes or tags (ISO or IETF) ought to be to represent:
>
> * Ulster Scots
ISO 639-3: sco, i.e. Scots (http://en.wikipedia.org/wiki/ISO_639:s)
Alan