Hacky content negotiation for my home page
Post #11 published on by Tobias Fedder
When I started rebuilding this site using a static site generator a little over a year ago, I already knew I wanted it to deal with two languages: English and German. But only a few pages should exist in both languages, mostly for legal reasons, so I decided to deal with it by hand, creating separate pages and serving them from distinct URLs.
One issue remained though: I want both the home page in English and the equivalent Startseite in German served from /, depending on the user's preferred languages. I need content negotiation. But the web server I'm using, Caddy, doesn't support content negotiation based on language yet, at least not out of the box. I wrote everything up to here already in my first blog post about generating a static blog, which also contains the following sentence: "After weighing up my options I decide to put the German homepage at / and will leave it to far future me to patch that unfortunate circumstance somehow."
Well, that far future me of the past seems to be me of today. The first step is serving the German home page from /de/, analogous to the English home page under /en/. That way, both of them continue to be served from distinct URLs regardless of users' preferences. The second step is an internal URI rewrite on the server to one of the two home pages. So in case the requested path is /, I try to figure out whether or not the user may prefer German over English content. That's where it gets hacky. I run a regular expression against the value of the request's Accept-Language header, looking for de or en. If it matches at all, I check whether the first match is de; if so, I infer that German is preferred over English. Here is how that looks in my Caddyfile.
⋮
@home-de `path('/') && header_regexp('Accept-Language', '\\b(en|de)\\b') && {re.1} == 'de'`
@home path /
rewrite @home-de /de/
rewrite @home /en/
header @home Vary Accept-Language
⋮
That last line sets the Vary header with the value Accept-Language, you know, because the content now varies based on that request header. That's my language-based content negotiation workaround, and it is hacky, because it could fail in so many ways. I'm not saying in most cases, just in many ways. RFC 9110 HTTP Semantics 12.5.4. Accept-Language states that many user agents are "listing [language tags] in order of decreasing quality" in the Accept-Language request header. That is because "some recipients treat the order in which language tags are listed as an indication of descending priority, particularly for tags that are assigned equal quality values (no value is the same as q=1)." My server is one such recipient now, which isn't ideal because "this behavior cannot be relied upon." Whoopsie. If a user agent sends the language tags in any other order, or in no order at all, this hack might fail.
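To make the behavior of the hack easier to reason about, here is a small JavaScript sketch that mimics what the matcher and the two rewrites do. It is only an illustration: Caddy evaluates the expression itself, with Go's regular expression engine, and the function name pickHomePath is made up for this example.

```javascript
// Illustration only: mimics the Caddyfile logic above in plain JavaScript.
// pickHomePath is a hypothetical name, not part of Caddy or of this site.
function pickHomePath(acceptLanguage) {
  // Same pattern as in the Caddyfile: the first standalone "en" or "de".
  const match = /\b(en|de)\b/.exec(acceptLanguage ?? "");
  // Only a first match of "de" selects the German page;
  // no match at all falls through to the English page.
  return match !== null && match[1] === "de" ? "/de/" : "/en/";
}

console.log(pickHomePath("de-DE,de;q=0.9,en;q=0.8")); // "/de/"
console.log(pickHomePath("en-US,de;q=0.7,en;q=0.3")); // "/en/"
console.log(pickHomePath("fr-FR,fr;q=0.9"));          // "/en/"
```

Note that the quality values never enter the decision. Only the position of the first en or de in the header does.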
It would even fail in descending order if a user's language preferences are weird around so-called extended language ranges. Imagine a user who prefers English as communicated in the US over generic German, and both of those over generic English, and therefore sends the following header:
Accept-Language: en-US,de;q=0.7,en;q=0.3
Since I only differentiate between English and German, my regular expression would match on the en in en-US, and the server would respond with the English home page, although that page isn't in en-US.
But I assume it would serve most people well whose user agents send something like en-GB,fr-FR;q=0.7,de-DE;q=0.3, that is, extended language tags only. RFC 9110 12.5.4. ends with the following note:
User agents ought to provide guidance to users when setting a preference, since users are rarely familiar with the details of language matching as described above. For example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest, in such a case, to add "en" to the list for better matching behavior.
If you know any user agent that does, let me know.
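For comparison with the regular expression hack, here is a rough sketch of what a quality-aware negotiation could look like. It assumes basic filtering as described in RFC 4647, section 3.3.1, where a range such as en-US matches en-US and anything more specific, but not plain en. Other matching schemes would give different results. All function names here are hypothetical and nothing like this runs on the server.

```javascript
// Sketch under assumptions: parses Accept-Language and applies basic filtering.
// Function names are made up for this illustration.

// Turn a header value into language ranges sorted by descending quality.
// Ranges with equal quality keep the order in which they were listed.
function parseAcceptLanguage(header) {
  return (header ?? "")
    .split(",")
    .map((part, index) => {
      const [range, ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam === undefined ? 1 : Number.parseFloat(qParam.slice(2));
      return { range: range.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.range !== "" && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index);
}

// Basic filtering: the range equals the tag or is a prefix followed by "-".
function matchesBasic(range, tag) {
  if (range === "*") return true;
  const lowerTag = tag.toLowerCase();
  return lowerTag === range || lowerTag.startsWith(range + "-");
}

// Return the first available tag matched by the most preferred range.
function negotiateLanguage(header, available, fallback) {
  for (const { range } of parseAcceptLanguage(header)) {
    const hit = available.find((tag) => matchesBasic(range, tag));
    if (hit !== undefined) return hit;
  }
  return fallback;
}

console.log(negotiateLanguage("en-US,de;q=0.7,en;q=0.3", ["en", "de"], "en"));       // "de"
console.log(negotiateLanguage("en-GB,fr-FR;q=0.7,de-DE;q=0.3", ["en", "de"], "en")); // "en"
```

In the second call none of the ranges matches en or de under basic filtering, so the result comes from the fallback alone. That is the situation the RFC's note about adding "en" to the list describes.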
On to side effects. In the last blog post I set up caching for the HTML. To do that I added a named matcher to my Caddyfile that checked whether or not a requested path with /index.html appended leads to an actual file on the server. That no longer works for / because there is no file at that path anymore; instead, the request is rewritten to /de/ or /en/. The new matcher looks like this:
@html `file('{path}/index.html') || path('/')`
⋮
header @html {
    Cache-Control "public, max-age=300"
}
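Since the response for / is now both publicly cacheable and dependent on Accept-Language, the Vary header set earlier is what keeps a shared cache from handing the German page to an English reader. The following is a simplified JavaScript sketch of how a cache might fold the request headers named in Vary into its cache key. The function name and key format are hypothetical; real caches follow RFC 9111 and may normalize header values in other ways.

```javascript
// Simplified illustration of a Vary-aware cache key; not a real cache.
// requestHeaders is assumed to use lowercase header names as keys.
function cacheKey(url, requestHeaders, vary) {
  const names = (vary ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name !== "")
    .sort();
  // Each header named in Vary contributes its request value to the key.
  const secondary = names.map((name) => `${name}=${requestHeaders[name] ?? ""}`);
  return [url, ...secondary].join("|");
}

const german = cacheKey("/", { "accept-language": "de-DE,de;q=0.9" }, "Accept-Language");
const english = cacheKey("/", { "accept-language": "en-GB,en;q=0.9" }, "Accept-Language");
console.log(german !== english); // true: two separate cache entries for /
```

Without the Vary header both requests would map to the same key, and whichever language was cached first would be served to everyone for the next 300 seconds.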
Speaking of serving the exact same content from different places, I put a <link rel="canonical" href="… in the <head> of both home pages, referring to their respective distinct URL. A small filter function in my eleventy.config.js allows me to use front matter in both Markdown files for that.
export default function(conf) {
⁝
conf.addFilter("generateCanonicalLink", generate_canonical_link)
⁝
function generate_canonical_link(canonicalLink) {
    if(!canonicalLink) return
    if(!canonicalLink.includes("://")) {
        canonicalLink = prepend_proto_and_fqdn(canonicalLink)
    }
    return `<link rel="canonical" href="${canonicalLink}">\n`
}
⁝
Then that filter is used in my base template, base.njk.
⁝
{{ canonical | generateCanonicalLink | safe }}
</head>
⁝
That's it. Finally, I can serve the home page to visitors in English or German from /, depending on my interpretation of their language preferences.