Deploying my website

Post #3 published on by Tobias Fedder

It is live. 🎉 I've done it.
2023 is the year of this website's start‐over — yes, that was close.

Now I'm starting to overthink this: while I am writing this, the website isn't live yet. It will be when you are reading it. But if that's the standard, then I could have claimed the same in any of the blog posts I've written so far.
Okay, let me rephrase that.

In this blog post I go through the final steps on my way to making this website publicly available. It is therefore the last post before launch. (🎉)

All that's left to do is

  1. copying the files I created to a Linux server I already have,
  2. pointing my domain to the server's IP address, and
  3. serving the files from a web server that is already running.

A piece of cake, but…
the reason a Linux server is already there is that I've been running some web services, used by my family, for quite some time. For the same reason a webserver, namely Caddy, is installed and running. I needed a reverse proxy to make those web services available over HTTPS, and Caddy was the easiest to configure as such, especially because Caddy can grab TLS certificates from Let's Encrypt just by putting the domain name in the configuration.

Trying the webserver config locally

The web services already running on the server shall continue working just fine, while the webserver, from now on, also serves a bunch of static files that I call my website. Instead of messing with the Caddy instance on the server, let me try that out locally first. I'll use docker compose for that. My docker-compose.yml looks like this:

version: "3.9"

services:
  caddy:
    container_name: caddy
    image: caddy:2.7.6-alpine
    volumes:
      - "./server/config/Caddyfile:/etc/caddy/Caddyfile:ro"
      - "./server/config/certs:/etc/certs:ro"
      - "./_site:/srv/tfedder:ro"
    ports:
      - "80:80"
      - "443:443"
    environment:
      TLDOMAIN: localhost
      NON_PROD_TLS: "tls /etc/certs/tfedder.localhost+4.pem /etc/certs/tfedder.localhost+4-key.pem"
      TZ: Europe/Berlin
      CADDY_DATE_STRING: 2023-13-32
      NON_PROD_DOMAINS: >
        dev.tfedder.localhost {
          tls /etc/certs/tfedder.localhost+4.pem /etc/certs/tfedder.localhost+4-key.pem
          reverse_proxy gateway.docker.internal:8080
        }
    extra_hosts:
      - "gateway.docker.internal:host-gateway"

That's a mouthful, so let's focus. I use the most recent Alpine‐based Caddy image and map the default ports for HTTP and HTTPS, 80 and 443, from the host machine to the same ports on the container. Furthermore, one single file and two directories from the host machine are mounted read‐only into the container's file system. The /etc/certs directory mounts certificates that were created with the tool mkcert — a story for another blog post maybe — enabling me to use HTTPS on localhost. The _site directory contains the generated files to be served. Lastly, the Caddyfile is the configuration file for Caddy. It will be parsed into JSON on webserver start. I could write the config in JSON myself, but although it is a custom format with its own syntax, I find it easier to work with the Caddyfile. Let's have a look at mine before we come back to the things in the docker-compose.yml I skipped over.

tfedder.{$TLDOMAIN:de} {
  {$NON_PROD_TLS}
  root * /srv/tfedder
  file_server
  
  header {
    -Server
    -Last-Modified
  }
  
  log {
    output file /var/log/caddy/tfedder_{$TLDOMAIN:de}/tfedder_{$TLDOMAIN:de}_{$CADDY_DATE_STRING}.log 
    format filter {
      wrap json
      fields {
        request>remote_ip delete
        request>remote_port delete
      }
    } 
  }
}

{$NON_PROD_DOMAINS}

service-one.tfedder.{$TLDOMAIN:de} {
  {$NON_PROD_TLS}
  header {
    -Server
  }
  reverse_proxy localhost:11111
}

service-two.tfedder.{$TLDOMAIN:de} {
  {$NON_PROD_TLS}
  header {
    -Server
  }
  reverse_proxy localhost:22222
}
⋮

With a quick glance at the non‐indented lines we see the different host names being served. The two at the bottom show the use as a reverse proxy. That's already in place. The domains look a bit odd. That's because instead of writing the top‐level domain (TLD) directly into the Caddyfile, there is an environment variable (ENV).
I don't want two Caddyfiles, one that I test locally and an untested one for the server; I want the same Caddyfile for both, so that I deploy what I've tested. Obviously, the configuration cannot be exactly the same, so the ENVs set what differs from the outside. tfedder.{$TLDOMAIN:de} means: replace the curly braces with the value of the ENV TLDOMAIN, and if that ENV doesn't exist, fall back on de. Thereby it resolves to tfedder.de on the server, but when I start it with my docker-compose.yml, tfedder.localhost will be the host name.

As you saw, the TLDOMAIN isn't the only ENV in that file. The first line of configuration for each host is {$NON_PROD_TLS}. This will also not be set on the server and therefore resolve to nothing. Locally the value will amount to the tls directive telling Caddy which certificate files to use. On the server I want the default behaviour, getting certificates from Let's Encrypt. Something that isn't going to fly for localhost.
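
To make that concrete, here is roughly what the top of the site block boils down to in both environments (my own illustration of the substitution, not output from Caddy):

# On the server, with no ENVs set: the TLS line resolves to nothing,
# so Caddy falls back to its automatic certificates from Let's Encrypt.
tfedder.de {
  root * /srv/tfedder
  file_server
}

# Locally, with the ENVs from the docker-compose.yml:
tfedder.localhost {
  tls /etc/certs/tfedder.localhost+4.pem /etc/certs/tfedder.localhost+4-key.pem
  root * /srv/tfedder
  file_server
}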

The next directive is root * /srv/tfedder. It sets the directory that the URL's path is appended to when looking up the requested files. The file_server directive does⸺ well, take a guess. In the directive after that I prevent Caddy from sending two headers in its responses. I have no qualms telling you, dear reader, that I use Caddy for this, and anyone targeting this server would find ways to figure that out anyway. But not every script kiddie scraping the web needs to know. Regarding the removal of the Last-Modified header: that one can be used for caching, and so can the ETag. Only one of them is needed for caching and I prefer the latter. I might have to switch that in the future, because at a glance Caddy seems to generate its ETags from the file's modification time and size rather than from a hash of the content. I think that's unfortunate. What's the effective difference to Last-Modified then? Anyway, caching is something I can optimise after going live.
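
With the local setup from the docker-compose.yml running, a quick curl shows whether the header stripping behaves as intended: Server and Last-Modified should be gone, the ETag should still be there (assuming the mkcert CA is trusted by the system; add --insecure otherwise).

# Request only the response headers and pick out the interesting ones.
curl -sI https://tfedder.localhost/ | grep -iE '^(server|last-modified|etag):'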

The last directive in my Caddyfile, for this website at least, is a bit convoluted. It tells Caddy to write an access log in its default format, which is JSON, but without logging the remote_ip and the remote_port. I just don't need to know the port. The IP address, on the other hand, is something I want to know. Since somewhat recently, Caddy logs not only the remote_ip but also the client_ip. My understanding is that in my setup both would always be the same. But if I had yet another reverse proxy in front of it, or something similar, then the remote_ip would be that of the proxy. So they are always the same in my case, but if they differed, I'd always want the client_ip, therefore the other one has to go.
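
From memory, and heavily abbreviated, a filtered log entry then looks something like this; the field names besides client_ip, remote_ip and remote_port are paraphrased from Caddy's default JSON access log and might differ in detail:

{"level":"info","msg":"handled request",
 "request":{"client_ip":"203.0.113.7","proto":"HTTP/2.0","method":"GET",
            "host":"tfedder.de","uri":"/"},
 "status":200,"size":5123}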

The last thing about that log directive has to do with ENVs again, namely the CADDY_DATE_STRING in the filename. The reason for it is log rotation. I'll get to that in a bit.

Before I do, let me explain the last non‐indented line: {$NON_PROD_DOMAINS}. If you scroll up to the docker-compose.yml, you'll see that I am declaring a reverse proxy for the host dev.tfedder.localhost, proxying it to gateway.docker.internal:8080. gateway.docker.internal resolves to the localhost of the host machine that runs the containers. If I typed localhost instead, it would point to the container itself, and there is nothing there except Caddy. On the host machine at port 8080, on the other hand, there is an 11ty dev‐server running — at least occasionally. It will never be running on the actual server though, hence hiding the whole declaration in an ENV.
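
That dev‐server is just Eleventy's built‐in one, which listens on port 8080 by default; in my case it's started with something along these lines (possibly wrapped in an npm script):

# Build the site and serve it with live reload on the default port 8080.
npx @11ty/eleventy --serve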

Why serve the dev‐server over HTTPS, you ask? It's cool, I guess, innit? To make up a reason: there are some web platform features that require HTTPS for security reasons. But to make it easier to try out these features, they are often allowed over HTTP on localhost. Also, some security measures are off or more lenient over HTTP.
Now imagine you build something based on such a platform feature, and everything works great, and then you push it to stage or prod, and what you built encounters HTTPS for the first time, and it breaks, and you cry. No, thank you. Not that I have run into a situation like that with this bare‐bones website.
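
Trying it all out locally then boils down to generating the certificates once and starting the container. The mkcert call looks roughly like this (the exact list of names is my reconstruction from the +4 in the filename; it doesn't matter much, as long as it covers the hosts from the Caddyfile):

# One-time: create a local CA and certificates for the local host names.
mkcert -install
mkcert -cert-file server/config/certs/tfedder.localhost+4.pem \
       -key-file server/config/certs/tfedder.localhost+4-key.pem \
       tfedder.localhost dev.tfedder.localhost \
       service-one.tfedder.localhost service-two.tfedder.localhost localhost

# Then start the local Caddy and browse to https://tfedder.localhost.
docker compose up -d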

Desperately rolling the log myself

Returning to the CADDY_DATE_STRING ENV and log rotation. There are very well‐established tools for the rotation of logs, for example logrotate. But the last two or three times I tried something with it I was too dense — or its man page too terse for me — to comprehend it. Caddy is capable of rolling its logs on its own, but only based on file size, not based on time. If you read the previous blog post or — even crazier — the privacy notice, you'll know that I log your IP address, but that I am going to delete it within 24 hours. Therefore I have to have log rotation. I decided to roll my own. The filename of the log contains the date, which will be set through the ENV. Next day → new date → new filename → new log file, easy — well, almost.

My Linux distro came with systemd. Because I installed Caddy through the distro's package manager, it also set up a Caddy systemd service. That means I can control Caddy without knowing much about it, just using generic systemd commands with the service name at the end. For example, to restart Caddy I run systemctl restart caddy. Another benefit is that Caddy starts automatically on boot, given the service has been enabled via systemctl enable caddy.

Having that service already configured made it pretty simple to provide the ENVs to it. In the caddy.service file I added the line EnvironmentFile=/etc/caddy/caddy.envs. Once, after that change, I need to run systemctl daemon-reload, otherwise the ENV wouldn't be passed to the service yet. As long as I only change the content of the file, I don't have to do that again. Speaking of which, that file has to contain the name of the ENV and the current date: CADDY_DATE_STRING=YYYY-mm-dd. So I write a script to (re‐)create the file's content with the current date. I call it caddy-script.sh and put it right next to the Caddyfile in /etc/caddy/. I only have to run that script once every day for caddy.envs to contain the current date. However, Caddy resolves the placeholders at start, so it needs to be reloaded, which in contrast to a restart allows updating the configuration without downtime. For this purpose the line systemctl reload caddy is added to the script. To execute that script I throw it into the root user's crontab (sudo crontab -e) as the cronjob 0 0 * * * /etc/caddy/caddy-script.sh, running at 0:00 every day.
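
Put together, the supporting pieces look roughly like this; the unit file path depends on the distro, and a drop-in created with systemctl edit caddy would work just as well as editing the unit directly:

# Excerpt from the caddy systemd unit (or a drop-in):
[Service]
EnvironmentFile=/etc/caddy/caddy.envs

# /etc/caddy/caddy.envs, rewritten by the script every day with the current date:
CADDY_DATE_STRING=2023-12-31

# Root crontab entry (sudo crontab -e), running at 0:00 every day:
0 0 * * * /etc/caddy/caddy-script.sh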

Good, the log is rotating now, I'm done. No, wait, there was a reason for all that: I still have to anonymize the log from the day before. In addition to getting the current date, the script also gets the date from one day earlier. Then it puts the known fixed parts of the file path and filename together with yesterday's date, pipes that through a tool to replace the IP addresses with UUIDs, writes the anonymized version to another place, and deletes the old log. I'm sure people more familiar with the Linux ecosystem could do the replacing with awk, sed or whatever scripting language; I wrote a little something to be compiled to a binary instead and called it log-anonymizer — creative, right? This is what the monstrosity looks like.

#! /bin/bash -e

# The ENV file read by the caddy systemd service (EnvironmentFile=).
envFile=/etc/caddy/caddy.envs
# Today's and yesterday's date, matching the date format in the log filenames.
now=$(date '+%Y-%m-%d')
before=$(date -d '-1 day' '+%Y-%m-%d')

# Where the live logs are written and where the anonymized ones are archived.
logFilePath=/var/log/caddy/tfedder_de/
archiveDir=archive/
logArchivePath=$logFilePath$archiveDir

logFilenamePrefix=tfedder_de_
logFilenameSuffix=.log

# Yesterday's log filename, i.e. tfedder_de_YYYY-mm-dd.log
beforeLogFilename=$logFilenamePrefix$before$logFilenameSuffix

beforeLogPathAndFilename=$logFilePath$beforeLogFilename
beforeArchivePathAndFilename=$logArchivePath$beforeLogFilename

# Write today's date into the ENV file …
cat > $envFile <<EOF
CADDY_DATE_STRING=$now
EOF

# … and reload Caddy so it starts a new log file for today.
systemctl reload caddy

# Replace the IP addresses in yesterday's log and append the result to the archive, …
/usr/local/bin/log-anonymizer < $beforeLogPathAndFilename >> $beforeArchivePathAndFilename

# … then delete the un-anonymized original.
rm $beforeLogPathAndFilename

Finally, everybody's privacy will be respected. I generate the files with 11ty once again and copy them, as well as the new Caddyfile, to the server. One last — but very important — step before I reload the config: logging in to my dashboard at the service provider managing my domain and changing the DNS A record's IP from that old, cheap VPS to this server.
🎉🎉🎉 Here we go.
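
Whether the record change has actually propagated can be checked from any machine; once dig returns the new server's address, the site is reachable under the domain:

# Ask the DNS for the current A record of the domain.
dig +short tfedder.de A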

Every blog post from here on out can be as little as a single markdown file and a few shell commands — let's see how many months that takes.