Introduction

Systemd can trigger actions based on the exit status of services. This makes it possible, for example, to monitor services by sending a notification when a service succeeds or fails.

To do this, we will use a templated systemd service that reports to Healthchecks, with Gotify handling the notifications.

Creating the healthcheck service

Create the file .config/systemd/user/healthchecks-ping@.service with the following content (GitHub):

[Unit]
Description=Submit monitored EXIT_STATUS to healthchecks.io
Documentation=https://github.com/bbx0/systemd-healthchecks

# Usage:
# - Set ping key as service override: systemctl edit healthchecks-ping@.service
#   [Service]
#   Environment=HC_PING_KEY=<YourKey>
# - Create slug with same name as the monitored service in your healthchecks.io project
#   The prefix 'hc-' will be trimmed from the monitored service name and is not part of the slug.
# - Configure the monitored service unit to ping its result OnSuccess
#   [Unit]
#   OnSuccess=healthchecks-ping@%N.service
# - Optional: For explicit tests set the service unit execution result to success. (The test itself executed successfully.)
#   [Service]
#   SuccessExitStatus=0 1
# - Optional: Configure explicit transmission of failures (add suffix '-fail' to the instance name)
#   [Unit]
#   OnFailure=healthchecks-ping@%N-fail.service

[Service]
Type=oneshot
# Set via `systemctl edit healthchecks-ping@.service`
Environment=HC_PING_KEY=
Environment=HC_BASE_URL=https://hc-ping.com

# Slug is taken from the instance name of this service healthchecks-ping@<myslug>.service
# Example: healthchecks-ping@hc-diskspace-usage.service results in slug 'diskspace-usage'
# - A prefix 'hc-' is removed from the instance name (for easier use with dedicated healthcheck service units 'hc-%N')
# - A suffix '-fail' is removed from the instance name (to allow usage in OnFailure requiring a different instance name '%N-fail')
Environment=HC_SLUG=%i

# Skip submission if no ping key is configured 
ExecCondition=test -n "${HC_PING_KEY}"

# Only systemd version 251 and above provide information about monitored services
# ($MONITOR_* values are not available in ExecCondition, so we test it in ExecStartPre)
# Also check whether it's a "start" invocation, where there will obviously be no $MONITOR_*
# ([[ ... =~ ... ]] is a bash construct, so bash is called explicitly instead of /bin/sh)
#ExecStartPre=test -n "${MONITOR_EXIT_STATUS}"
#ExecStartPre=test -n "${MONITOR_INVOCATION_ID}"
ExecStartPre=/bin/bash -c "[[ -n \"${MONITOR_EXIT_STATUS}\" || \"${HC_SLUG}\" =~ -start$ ]]"
ExecStartPre=/bin/bash -c "[[ -n \"${MONITOR_INVOCATION_ID}\" || \"${HC_SLUG}\" =~ -start$ ]]"

# Ping EXIT_STATUS of the monitored service to healthchecks.io
# (Disable variable substitution by systemd as this is executed within a shell with env var support by its own.)
ExecStart=:/bin/bash -c "\
slug=$(echo \"${HC_SLUG}\" | sed --expression='s/^hc-//' --expression='s/-fail$//' --expression='s/-start$//') || exit 2; \
data=$(journalctl _SYSTEMD_INVOCATION_ID=${MONITOR_INVOCATION_ID} --output=cat | head --bytes=100kB) || exit 2; \
status=$(if [[ \"${HC_SLUG}\" =~ -start$ ]]; then echo \"start\"; else echo \"${MONITOR_EXIT_STATUS}\"; fi) || exit 2; \
curl --fail --silent --show-error --max-time 10 --retry 5 --output /dev/null --request POST --data-raw \"${data}\" --url \"${HC_BASE_URL}/${HC_PING_KEY}/${slug}/${status}\""
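
To make the instance-name handling easier to follow, here is a minimal sketch of the same logic as standalone POSIX shell. The function names derive_slug and derive_status are hypothetical and are not part of the unit file. They only mirror the sed expression and the start/exit-status choice made in ExecStart.

```shell
#!/bin/sh
# Hypothetical helper mirroring the sed expression used in ExecStart:
# strips a leading 'hc-' and a trailing '-fail' or '-start'.
derive_slug() {
    printf '%s\n' "$1" | sed -e 's/^hc-//' -e 's/-fail$//' -e 's/-start$//'
}

# Hypothetical helper: "start" for a '-start' instance, otherwise the
# exit status passed as second argument (MONITOR_EXIT_STATUS in the unit).
derive_status() {
    case "$1" in
        *-start) printf 'start\n' ;;
        *)       printf '%s\n' "$2" ;;
    esac
}

derive_slug "hc-diskspace-usage"        # diskspace-usage
derive_slug "restic-backup-fail"        # restic-backup
derive_status "restic-backup-start" ""  # start
derive_status "restic-backup-fail" 1    # 1
```

The success, failure and start instances of one monitored service all resolve to the same slug, so they all report to the same check.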

Edit the service variables:

systemctl --user edit healthchecks-ping@.service

Add at the top of the file (see section XX):

[Service]
Environment=HC_PING_KEY=<YourKey>
Environment=HC_BASE_URL=https://healthchecks.h.gued.in/ping
Environment=HC_SLUG=ackbar-%i

Then reload the list of services:

systemctl --user daemon-reload 
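
As a rough illustration of what the service will send with the override above, the sketch below assembles the ping URL following the pattern of the curl call in ExecStart. The helper name ping_url, the placeholder key YourKey and the restic-backup instance name are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: join base URL, ping key, slug and status the same
# way the curl call in ExecStart does.
ping_url() {
    base=$1
    key=$2
    slug=$3
    status=$4
    printf '%s/%s/%s/%s\n' "$base" "$key" "$slug" "$status"
}

# With HC_SLUG=ackbar-%i, the instance 'restic-backup' gives the slug
# 'ackbar-restic-backup'. 'YourKey' is a placeholder, not a real key.
ping_url "https://healthchecks.h.gued.in/ping" "YourKey" "ackbar-restic-backup" "0"
```

The slug created in Healthchecks therefore has to include the ackbar- prefix set in the override.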

Adding it to the monitored services

To trigger the healthcheck service created above from the services to monitor, add the OnSuccess and OnFailure directives to the [Unit] section of each service (this setup requires systemd 251 or later):

OnSuccess=healthchecks-ping@%N.service
OnFailure=healthchecks-ping@%N-fail.service

Also add an ExecStartPre directive in the [Service] section to run the healthcheck service when the monitored service starts. This lets Healthchecks measure the execution time of the monitored service.

ExecStartPre=systemctl --user start healthchecks-ping@%N-start.service

For example, for Restic:

[Unit]
Description=Restic backup service
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=300
StartLimitBurst=5
OnSuccess=healthchecks-ping@%N.service
OnFailure=healthchecks-ping@%N-fail.service
[Service]
Type=oneshot
Restart=on-failure
RestartSec=30s
ExecStartPre=systemctl --user start healthchecks-ping@%N-start.service
ExecStart=restic backup --verbose --one-file-system --tag systemd.timer $BACKUP_EXCLUDES $BACKUP_PATHS
ExecStartPost=restic forget --verbose --tag systemd.timer --group-by "paths,tags" --keep-daily $RETENTION_DAYS --keep-weekly $RETENTION_WEEKS --keep-monthly $RETENTION_MONTHS --keep-yearly $RETENTION_YEARS
EnvironmentFile=%h/.config/restic/restic-backup.conf
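
As an alternative to editing each unit file by hand, the same directives can be placed in a drop-in file. The sketch below is only one possible approach: the function name write_healthcheck_dropin and the file name healthchecks.conf are hypothetical, and the directory argument is assumed to be the systemd user configuration directory.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that wires a user unit to
# healthchecks-ping@. $1 = unit name, $2 = systemd user config directory.
write_healthcheck_dropin() {
    unit=$1
    dir="$2/$unit.d"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter so that %N is written literally.
    cat > "$dir/healthchecks.conf" <<'EOF'
[Unit]
OnSuccess=healthchecks-ping@%N.service
OnFailure=healthchecks-ping@%N-fail.service

[Service]
ExecStartPre=systemctl --user start healthchecks-ping@%N-start.service
EOF
}

# Example call, left commented out so the sketch has no side effects:
# write_healthcheck_dropin restic-backup.service "$HOME/.config/systemd/user"
```

A drop-in keeps the monitoring wiring separate from the unit itself, which can help when the unit file is shipped by a package.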

After making these changes, reload the list of services:

systemctl --user daemon-reload 

Healthchecks

Installing Healthchecks

Healthchecks will be installed through a Docker stack (with Traefik as a reverse proxy; see the details here). Remember to adjust the variables:

---
version: "2.1"
services:
  healthchecks:
    image: lscr.io/linuxserver/healthchecks:latest
    container_name: healthchecks
    environment:
      - PUID=1000
      - PGID=1000
      - SITE_ROOT=https://healthchecks.h.gued.in
      - SITE_NAME=Healthchecks
      - DEFAULT_FROM_EMAIL=
      - SUPERUSER_EMAIL=
      - SUPERUSER_PASSWORD=
      - APPRISE_ENABLED=True
    volumes:
      - /opt/healthchecks:/config
    restart: unless-stopped
    labels:
      - traefik.enable=true
      - traefik.http.routers.healthchecks.rule=Host(`healthchecks.h.gued.in`)
      - traefik.http.routers.healthchecks.entrypoints=websecure
      - traefik.http.routers.healthchecks.tls=true
      - traefik.http.routers.healthchecks.tls.certresolver=default
    networks:
      - web
      - default
networks:
  web:
    external: true

Creating the check

Add the check in Healthchecks:

[Image: surveillance-services-systemd-add-check]

Fill in at least the Name and the Slug, and adjust the Period and Grace Time if needed.

The next time the service runs, successful or failed pings should appear on the page of the newly created check:

[Image: surveillance-services-systemd-check]

Notifications

Healthchecks integrates with many notification services, including Gotify.

First, add an Application in Gotify:

[Image: surveillance-services-systemd-gotify-create-app]

Enter the name of the Application, here “Healthchecks”.

Once it is created, its token can be retrieved from the list of Applications: [Image: surveillance-services-systemd-gotify-view-app]

Then go back to Healthchecks and, in the Integrations section, add a Gotify integration: [Image: surveillance-services-systemd-integrations]

Then enter the address of the Gotify server and the Application token:

[Image: surveillance-services-systemd-add-gotify]

Other options?

Ask HN: How do you monitor your systemd services?