Compare commits

10 Commits

Author SHA1 Message Date
Kevin Veen-Birkenbach
f79bac2927 Release version 1.0.0 2026-05-12 23:38:38 +02:00
Kevin Veen-Birkenbach
88e9127f9b Use $SERVER env var in README instead of YOUR_SERVER_IP placeholder
`$SERVER` is set once at the top of the setup flow, then reused across
all five setup sections and the debugging chapter. Avoids 24× repetition
of the placeholder and lets the reader paste the blocks directly after
`export SERVER=...`.

`$SERVER` (not `$HOST`) because zsh sets `$HOST` to the local hostname
as a built-in parameter, and many tools use `$HOST` conventionally — name
collision would be confusing.

While here: drop the "(recommended)" hedge on `hal fix static-ip` in
section 3 of the install flow — given Hetzner's DHCP fragility (the bug
that caused this whole debugging session), the static cmdline IP belongs
in the standard install path, not as an optional extra.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:21:24 +02:00
Kevin Veen-Birkenbach
3cf66640b5 Reorganized hal CLI into subcommand groups + MIT licensed
CLI structure now:
  hal {status,diagnose,unlock,forget} HOST
  hal connect {rescue,chroot,server} HOST [CMD]
  hal setup   {image,dropbear,grub,encrypt-root} HOST
  hal fix     {boot,network,grub,kernel,static-ip,upgrade,expand-fs} HOST

Added subcommands cover the previously-manual sections of the README:
  setup image       — upload autosetup + run installimage
  setup dropbear    — install dropbear + mkinitcpio plugins + patch HOOKS
  setup grub        — initial grub install for LUKS boot
  setup encrypt-root — full LUKS conversion of installed root
  connect server    — SSH to booted Arch (vs rescue/chroot)
  unlock            — cryptroot-unlock via dropbear with passphrase from keyring
  fix expand-fs     — lvresize + btrfs resize

Renames (breaking):
  upgrade-system    -> fix upgrade
  expand-fs         -> fix expand-fs
  forget-passphrase -> forget
  reinstall-grub    -> fix grub
  downgrade-kernel  -> fix kernel
  use-static-ip     -> fix static-ip
  fix-{boot,network} -> fix {boot,network}
  install-{image,grub} -> setup {image,grub}
  setup-dropbear    -> setup dropbear
  encrypt-root      -> setup encrypt-root

Removed downgrade-initramfs (never verified, narrow use case).

README rewritten to reference only hal commands; raw bash blocks for
pacman/cryptsetup/grub-install/mount/chroot are gone. Added autosetup.example
as a template for `hal setup image --autosetup PATH`.

Licensed under MIT (LICENSE file added). Author and homepage shown in
hal --version, hal --help, pyproject.toml, and README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:10:06 +02:00
Kevin Veen-Birkenbach
181240eae7 Added hal Python CLI
Wraps the rescue/chroot/diagnose/fix workflows in a single tool with
LUKS-passphrase keyring caching. Subcommands: status, connect rescue,
connect chroot, diagnose, fix-boot, fix-network, downgrade-kernel,
downgrade-initramfs, reinstall-grub, use-static-ip, upgrade-system,
forget-passphrase.

connect subcommands accept an optional remote command after the host
for non-interactive execution.

README updated to reference hal instead of the previous shell scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:03:59 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
841a974123 Removed 2020-10-15 13:56:40 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
8c54e4d02e Implemented volume expand 2020-10-14 14:47:51 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
0513150142 Optimized for final draft 2020-10-05 14:09:08 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
6992e4b01d Solved mistake 2020-10-04 21:06:22 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
a79551d867 Activated sshd 2020-05-15 09:38:17 +02:00
Kevin Veen-Birkenbach [aka. Frantz]
9341ab6cdd Deleted unnecessary code 2020-04-27 19:13:29 +02:00
26 changed files with 2176 additions and 317 deletions

29
.claude/settings.json Normal file

@@ -0,0 +1,29 @@
{
"permissions": {
"allow": [
"Edit",
"Write",
"Bash(*)",
"WebFetch(domain:pypi.org)",
"WebFetch(domain:files.pythonhosted.org)",
"Bash(python3 -c ' *)",
"WebFetch(domain:api.github.com)"
],
"ask": [
"Bash(*hal *)",
"Bash(*hetzner_arch_luks *)",
"Bash(ssh *)",
"Bash(scp *)",
"Bash(sftp *)"
]
},
"sandbox": {
"enabled": true,
"autoAllowBashIfSandboxed": true,
"network": {
"allowedDomains": [
"*"
]
}
}
}

0
.codex Normal file

39
.gitignore vendored Normal file

@@ -0,0 +1,39 @@
# Python build / runtime artifacts
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
*.egg
.eggs/
build/
dist/
wheels/
pip-wheel-metadata/
# Virtual environments
.venv/
venv/
env/
ENV/
# Tooling caches
.pytest_cache/
.mypy_cache/
.ruff_cache/
.tox/
.coverage
.coverage.*
htmlcov/
# Editor / IDE
.idea/
.vscode/
*.swp
*~
.DS_Store
# Claude Code: personal overrides (settings.json itself is checked in)
.claude/settings.local.json
# Diagnostic output from `hal diagnose ... | tee diagnose-*.log`
diagnose-*.log

4
CHANGELOG.md Normal file

@@ -0,0 +1,4 @@
## [1.0.0] - 2026-05-12
* Official Release 🥳

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Kevin Veen-Birkenbach <kevin@veen.world>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

4
MIRRORS Normal file

@@ -0,0 +1,4 @@
git@github.com:kevinveenbirkenbach/hetzner-arch-luks.git
ssh://git@code.infinito.nexus:2201/kevinveenbirkenbach/hetzner-arch-luks.git
ssh://git@git.veen.world:2201/kevinveenbirkenbach/hetzner-arch-luks.git
https://pypi.org/project/hetzner-arch-luks/

39
Makefile Normal file

@@ -0,0 +1,39 @@
# Top-level targets for the hetzner-arch-luks helper package.
#
# Usage:
#   make install     # editable install for the current user
#   make uninstall
#   make clean       # remove Python build artifacts
#   make check       # quick smoke tests (imports + --help)

PYTHON ?= python3
PIP ?= $(PYTHON) -m pip

.DEFAULT_GOAL := help
.PHONY: help install install-system uninstall clean check

help:
	@echo "Targets:"
	@echo "  install         pip install --user -e ."
	@echo "  install-system  pip install -e . (system-wide; needs sudo or venv)"
	@echo "  uninstall       remove the installed package"
	@echo "  clean           remove __pycache__, *.egg-info, build/, dist/"
	@echo "  check           run package smoke tests"

install:
	$(PIP) install --user -e .

install-system:
	$(PIP) install -e .

uninstall:
	$(PIP) uninstall -y hetzner-arch-luks

clean:
	rm -rf build dist
	find . -type d -name '__pycache__' -prune -exec rm -rf {} +
	find . -type d -name '*.egg-info' -prune -exec rm -rf {} +

check:
	$(PYTHON) -m hetzner_arch_luks --help >/dev/null
	$(PYTHON) -c "from hetzner_arch_luks import cli, ssh, probe, remote; print('imports OK')"

441
README.md

@@ -1,357 +1,164 @@
# Arch Linux with LUKS and btrfs on a Hetzner server (DRAFT)
# Arch Linux with LUKS and btrfs on a Hetzner server
## Software
This guide shows how to set up the following software composition:
* [Arch Linux](https://www.archlinux.de/)
* [btrfs](https://en.wikipedia.org/wiki/Btrfs)
* [LUKS](https://wiki.archlinux.org/index.php/Dm-crypt)
A small Python CLI (`hal`) that wraps every step of installing, encrypting, and
maintaining an [Arch Linux](https://www.archlinux.de/) server on
[Hetzner](https://www.hetzner.com/) Dedicated hardware with software RAID,
[LUKS](https://wiki.archlinux.org/index.php/Dm-crypt) full-disk encryption,
[btrfs](https://en.wikipedia.org/wiki/Btrfs) on top of LVM, and remote unlock
via [dropbear](https://wiki.archlinux.org/title/Dm-crypt/Specialties#busybox-based_initramfs_(built_with_mkinitcpio))
in the initramfs.
## Requirements
Written for a [Dedicated](https://de.wikipedia.org/wiki/Server#Dedizierte_Server) [Hetzner](https://www.hetzner.com/) server with the following hardware specifications:
```
CPU1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (Cores 8)
Memory: 15973 MB
Disk /dev/sda: 3000 GB (=> 2794 GiB)
Disk /dev/sdb: 3000 GB (=> 2794 GiB)
Total capacity 5589 GiB with 2 Disks
```
**Author:** Kevin Veen-Birkenbach &lt;[kevin@veen.world](mailto:kevin@veen.world)&gt; — [veen.world](https://veen.world)
**License:** MIT — see [LICENSE](./LICENSE)
## Legend
The following symbols show in which environment the code is executed:
* :computer: Client
* :ambulance: [Hetzner Rescue System](https://wiki.hetzner.de/index.php/Hetzner_Rescue-System/en)
* :ghost: Chroot from Rescue System into Arch
* :minidisc: Arch OS
## Guide
### 1. Configure and Install Image
#### 1.1 Login to Hetzner Rescue System
:computer: :
```bash
ssh root@your_server_ip
```
#### 1.2 Create the /autosetup
:ambulance: :
## Install the CLI
```bash
nano /autosetup
make install # → pip install --user -e .
hal --help
```
Save the following content into this file:
After install, every step below is a single `hal` subcommand.
```
## Hetzner Online GmbH - installimage - config
## Subcommand reference
DRIVE1 /dev/sda
DRIVE2 /dev/sdb
Run `hal --help`, `hal <group> --help`, or `hal <group> <target> --help` for the live reference.
## SOFTWARE RAID:
## activate software RAID? < 0 | 1 >
SWRAID 1
### Top-level
## Choose the level for the software RAID < 0 | 1 | 10 >
SWRAIDLEVEL 1
| Command | What it does |
|---|---|
| `hal status <host>` | Ping + port scan + SSH banner. No login. |
| `hal diagnose <host>` | Rescue → chroot, runs a fixed inspection script. Pipe with `tee` to save. |
| `hal unlock <host>` | Send the LUKS passphrase from the keyring to dropbear (`cryptroot-unlock`). |
| `hal forget <host>` | Clear the cached LUKS passphrase from libsecret. |
## BOOTLOADER:
BOOTLOADER grub
### `hal connect <target> <host> [cmd]`
## HOSTNAME:
HOSTNAME hetzner-arch-luks
#Adapt the hostname to your needs
Open a shell, or run `cmd` non-interactively.
## PARTITIONS / FILESYSTEMS:
PART /boot btrfs 512M
PART lvm vg0 all
LV vg0 swap swap swap 8G
LV vg0 root / btrfs 10G
| Target | Where it goes |
|---|---|
| `rescue` | Hetzner Rescue OS |
| `server` | Booted Arch system |
| `chroot` | Rescue → chroot of installed Arch (LUKS-unlocks + mounts first) |
## OPERATING SYSTEM IMAGE:
IMAGE /root/.oldroot/nfs/install/../images/archlinux-latest-64-minimal.tar.gz
```
#### 1.3 Install Image
:ambulance: :
```bash
installimage
```
#### 1.4 Restart
:ambulance: :
```bash
reboot
```
### `hal setup <target> <host>` — one-time install operations
### 2. Setup System
#### 2.1 Login to server
:computer: :
```bash
ssh-keygen -f "$HOME/.ssh/known_hosts" -R your_server_ip
ssh root@your_server_ip
```
#### 2.2 Update the system
:minidisc: :
```bash
pacman -Syyu
```
#### 2.3 Install administration tools:
:minidisc: :
```bash
pacman -S nano
```
| Target | What it does |
|---|---|
| `image --autosetup PATH` | In rescue: upload autosetup, run `installimage`. **Destructive.** |
| `dropbear` | Booted Arch: install dropbear + mkinitcpio plugins, copy authorized_keys, patch HOOKS. |
| `grub` | Rescue → chroot: install grub package, write LUKS-aware `/etc/default/grub`, grub-install on every boot disk. |
| `encrypt-root` | Rescue: LUKS-encrypt `/dev/md1`, preserve data via `/oldroot` copy. **Destructive on `/dev/md1`. Confirms before format.** |
### 3. Prepare System for Unlocking via SSH
#### 3.1 Install software
:minidisc: :
```bash
pacman -S busybox mkinitcpio-dropbear mkinitcpio-utils mkinitcpio-netconf
```
#### 3.2 Copy authorized keys to dropbear
> :warning: I don't know if the following step is correct. Later during executing ***mkinitcpio -p linux*** the following error appears:
```bash
-> Running build hook: [dropbear]
Error: Unrecognised key type
Error reading key from '/etc/ssh/ssh_host_rsa_key'
Error: Unrecognised key type
Error reading key from '/etc/ssh/ssh_host_dsa_key'
Error: Unrecognised key type
Error reading key from '/etc/ssh/ssh_host_ecdsa_key'
```
I assume this is connected to this.
The following links may help to solve the problem:
* https://github.com/grazzolini/mkinitcpio-dropbear/issues/8
* https://www.reddit.com/r/archlinux/comments/a8pcff/remote_unlock_encrypted_archlinux_with/
### `hal fix <target> <host>` — recovery + maintenance operations
| Target | What it does |
|---|---|
| `boot` | Patch `PermitRootLogin`, enable persistent journald. |
| `network` | Rewrite `.network` files to match by MACAddress= instead of interface name. |
| `grub` | Refresh Stage1 + core.img in MBR (Arch doesn't do this automatically after grub upgrades). |
| `kernel` | Roll the `linux` package back to the previous version (cache or archive.archlinux.org). |
| `static-ip` | Replace `ip=dhcp` in `/etc/default/grub` with a static cmdline IP derived from `/etc/systemd/network/*.network`. |
| `upgrade` | Full `pacman -Syyu` + initramfs rebuild + grub-install on every boot disk. |
| `expand-fs` | On booted Arch: `lvresize -l +100%FREE /dev/vg0/root && btrfs filesystem resize max /`. |
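The `network` row above can be pictured as a small text rewrite of a systemd `.network` file. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the function name `match_by_mac`, the sample interface name, and the sample MAC address are all hypothetical, and only the `[Match]` keys `Name=` and `MACAddress=` are taken from systemd itself.

```python
def match_by_mac(network_text: str, mac: str) -> str:
    """Replace Name= with MACAddress= inside the [Match] section only."""
    out = []
    section = None
    for line in network_text.splitlines():
        stripped = line.strip()
        # Track which INI-style section we are currently in.
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped
        if section == "[Match]" and stripped.startswith("Name="):
            # Interface names can drift between kernels; the MAC does not.
            out.append(f"MACAddress={mac}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample file content, for illustration only.
sample = "[Match]\nName=enp0s31f6\n\n[Network]\nDHCP=ipv4\n"
print(match_by_mac(sample, "aa:bb:cc:dd:ee:ff"))
```

Keys outside `[Match]` are left untouched, so a `Name=` that happens to appear in another section would not be rewritten.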
The LUKS passphrase is prompted (hidden) on first use and cached in the libsecret keyring per host — subsequent runs against the same host don't prompt.
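The prompt-once, cache-per-host behaviour described above might look roughly like the sketch below. It is an illustration under assumptions: a plain dictionary stands in for the libsecret keyring so the example runs anywhere, and the names `get_passphrase`, `forget`, and `_CACHE` are hypothetical rather than taken from the package.

```python
import getpass
from typing import Callable, Dict

# Stand-in for the libsecret keyring (assumption: the real tool stores
# the secret in the desktop keyring, not in process memory).
_CACHE: Dict[str, str] = {}


def get_passphrase(host: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return the cached passphrase for `host`; prompt (hidden) only on first use."""
    if host not in _CACHE:
        _CACHE[host] = prompt(f"LUKS passphrase for {host}: ")
    return _CACHE[host]


def forget(host: str) -> bool:
    """Drop the cached entry for `host`; True if something was removed."""
    return _CACHE.pop(host, None) is not None
```

Passing the prompt function as a parameter keeps the cache logic testable without a terminal; `getpass.getpass` is only the default.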
## Setup flow
Each section is a small handful of `hal` commands. See the corresponding
table row above for what each one does.
Set the server IP/hostname once per shell — every block below uses `$SERVER`:
```bash
cp -v ~/.ssh/authorized_keys /etc/dropbear/root_key
export SERVER=your_server_ip # e.g. 46.4.224.77 or boot.echoserver
```
```bash
rm /etc/ssh/ssh_host_*
ssh-keygen -A -m PEM
```
or
### 1. Install Arch via installimage
```bash
ssh-keygen -m PEM -p -b 8192 -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
ssh-keygen -m PEM -p -b 8192 -t rsa -f /etc/ssh/ssh_host_rsa_key
ssh-keygen -m PEM -p -b 8192 -t dsa -f /etc/ssh/ssh_host_dsa_key
ssh-keygen -y -f /etc/ssh/ssh_host_ecdsa_key > /etc/ssh/ssh_host_ecdsa_key.pub
ssh-keygen -y -f /etc/ssh/ssh_host_rsa_key > /etc/ssh/ssh_host_rsa_key.pub
ssh-keygen -y -f /etc/ssh/ssh_host_dsa_key > /etc/ssh/ssh_host_dsa_key.pub
hal connect rescue "$SERVER" # verify rescue is up
hal setup image "$SERVER" --autosetup autosetup # see autosetup.example
hal connect rescue "$SERVER" reboot
```
```bash
dropbearconvert openssh dropbear /etc/ssh/ssh_host_rsa_key /etc/dropbear/dropbear_rsa_host_key
dropbearconvert openssh dropbear /etc/ssh/ssh_host_dsa_key /etc/dropbear/dropbear_dss_host_key
```
#### 3.3 Modify /etc/mkinitcpio.conf
:minidisc: :
```bash
nano /etc/mkinitcpio.conf
```
##### Replace
**Old:**
```
HOOKS=(base udev autodetect modconf block mdadm_udev lvm2 filesystems keyboard fsck)
```
**New:**
```
HOOKS=(base udev autodetect modconf block mdadm_udev lvm2 netconf dropbear encryptssh filesystems keyboard fsck)
```
> :warning: In [one of the guides](http://daemons-point.com/blog/2019/10/20/hetzner-verschluesselt/#etcinitramfs-toolsinitramfsconf-anpassen) the ***/etc/initramfs-tools/initramfs.conf*** get modified. Don't know how to implement this for ***mkinitcpio***.<br>
**Old:**
```
BUSYBOX=auto
```
**New:**
```
BUSYBOX=y
```
Tip: copy `autosetup.example` to `autosetup`, edit `DRIVE1`/`DRIVE2`/`HOSTNAME`,
then run `setup image`.
### 4. Activate Encryption
#### 4.1 Activate Rescue System
Activate the rescue system https://robot.your-server.de/server
#### 4.2 Reboot
:minidisc: :
```bash
reboot
```
#### 4.3 Login to the rescue system
:computer: :
```bash
ssh-keygen -f "$HOME/.ssh/known_hosts" -R your_server_ip
ssh root@your_server_ip
```
#### 4.4 Mount the "system"
:ambulance: :
```bash
vgscan -v
vgchange -a y
mount /dev/mapper/vg0-root /mnt
```
#### 4.5 Copy "system"
:ambulance: :
```bash
echo 0 >/proc/sys/dev/raid/speed_limit_max
mkdir /oldroot
cp -va /mnt/. /oldroot/.
echo 200000 >/proc/sys/dev/raid/speed_limit_max
```
#### 4.6 Unmount the "system"
:ambulance: :
```bash
umount /mnt
```
#### 4.7 Delete decrypted LVM-Volume-Group
:ambulance: :
```bash
vgremove vg0
```
#### 4.8 Check drive state
:ambulance: :
```bash
cat /proc/mdstat
```
#### 4.9 Encrypt MD1 by executing
:ambulance: :
```bash
cryptsetup --cipher aes-xts-plain64 --key-size 256 --hash sha256 --iter-time=10000 luksFormat /dev/md1
cryptsetup luksOpen /dev/md1 cryptroot
pvcreate /dev/mapper/cryptroot
vgcreate vg0 /dev/mapper/cryptroot
lvcreate -n swap -L8G vg0
lvcreate -n root -L10G vg0
mkfs.btrfs /dev/vg0/root
mkswap /dev/vg0/swap
```
#### 4.10 Mount encrypted
:ambulance: :
```bash
mount /dev/vg0/root /mnt
```
#### 4.12 Copy "system"
:ambulance: :
```bash
echo 0 >/proc/sys/dev/raid/speed_limit_max
cp -av /oldroot/. /mnt/.
echo 200000 >/proc/sys/dev/raid/speed_limit_max
```
#### 4.13 Integrate Finale Installation
:ambulance: :
```bash
mount /dev/md0 /mnt/boot
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
chroot /mnt
```
#### 4.14
:ghost: :
```bash
echo "cryptroot /dev/md1 none luks" >> /etc/crypttab
```
#### 4.15 Create an initial ramdisk
:ghost: :
```bash
mkinitcpio -p linux
```
### 5 Grub
#### 5.1 Install Grub
:ghost: :
```bash
pacman -S grub
```
#### 5.2 Configure /etc/default/grub
:ghost: :
### 2. Boot Arch, install the dropbear stack
```bash
nano /etc/default/grub
```
> :warning: I'm not shure if the following is correct. Please check out this [link](https://wiki.archlinux.org/index.php/Dm-crypt/Specialties#Remote_unlocking_(hooks:_netconf,_dropbear,_tinyssh,_ppp)) . I appreciate feedback :two_hearts:
> :warning: I don't know if the raid also needs to be configured in the GRUB_CMDLINE_LINUX parameter.
Change the following parameters:
```bash
GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:root ip=dhcp"
GRUB_ENABLE_CRYPTODISK=y # Not secure if necessary
```
:information_source: Further [information](https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Configuring_GRUB).
#### 5.3 Make and Install on Hard-drives
:ghost: :
```bash
grub-mkconfig -o /boot/grub/grub.cfg
grub-install /dev/sda
grub-install /dev/sdb
hal connect server "$SERVER" # verify SSH works
hal connect server "$SERVER" pacman -Syyu # bring system current
hal setup dropbear "$SERVER" # dropbear + mkinitcpio plugins + HOOKS
```
#### 5.4 Restart System
:ghost: :ambulance: :
### 3. Convert root to LUKS
Activate Rescue in the Hetzner Robot UI, then:
```bash
exit
umount /mnt/boot /mnt/proc /mnt/sys /mnt/dev
umount /mnt
sync
reboot
```
### 6. Encryption Procedure
#### 6.1 Decrypt server
:computer: :
```bash
ssh -o UserKnownHostsFile=/dev/null root@your_server_ip
cryptroot-unlock
exit
```
#### 6.2 Login to server
:computer: :
```bash
ssh-keygen -f "$HOME/.ssh/known_hosts" -R your_server_ip
ssh root@your_server_ip
```
## 7. Debugging
### 7.1 Login to System from Rescue System
:ambulance: :
```bash
cryptsetup luksOpen /dev/md1 cryptroot
mount /dev/vg0/root /mnt
mount /dev/md0 /mnt/boot
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
chroot /mnt
```
### 7.2 Logout from chroot environment
:ghost: :ambulance: :
```bash
exit
umount /mnt/boot /mnt/proc /mnt/sys /mnt/dev
umount /mnt
sync
reboot
hal connect server "$SERVER" reboot # boots back into rescue
hal connect rescue "$SERVER" # verify rescue is up
hal setup encrypt-root "$SERVER" # LUKS conversion — DESTRUCTIVE
hal setup grub "$SERVER" # initial GRUB for LUKS boot
hal fix static-ip "$SERVER" # static initramfs IP — Hetzner DHCP is fragile
```
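To make the `static-ip` step less opaque, here is a sketch of how a static kernel command-line `ip=` value could be derived from an `Address=`/`Gateway=` pair. It follows the kernel's documented `ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>` layout; the helper name `kernel_ip_param`, the `eth0` default, and the documentation-range addresses are assumptions for illustration, not what `hal` necessarily emits.

```python
import ipaddress


def kernel_ip_param(address_cidr: str, gateway: str,
                    device: str = "eth0", hostname: str = "") -> str:
    """Build a static `ip=` kernel parameter from a CIDR address and a gateway."""
    iface = ipaddress.ip_interface(address_cidr)
    # The server-ip field is left empty; "none" disables autoconfiguration.
    return f"ip={iface.ip}::{gateway}:{iface.netmask}:{hostname}:{device}:none"


print(kernel_ip_param("192.0.2.10/26", "192.0.2.1"))
# prints: ip=192.0.2.10::192.0.2.1:255.255.255.192::eth0:none
```

The interface name seen inside the initramfs can differ from the name on the booted system, so the `device` field deserves a check on real hardware.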
### 7.3 Regenerate GRUB and Arch
:ghost: :
Deactivate Rescue in the Hetzner Robot UI, then:
```bash
mkinitcpio -p linux
grub-mkconfig -o /boot/grub/grub.cfg
grub-install /dev/sda
grub-install /dev/sdb
hal connect rescue "$SERVER" reboot # final reboot into encrypted system
```
### 4. Day-to-day use
After every reboot the system blocks at dropbear in initramfs waiting for the
LUKS passphrase. From your client:
```bash
hal status "$SERVER" # wait for dropbear / sshd
hal unlock "$SERVER" # send passphrase to dropbear
hal connect server "$SERVER" # normal SSH after unlock
```
### 5. Expand the root filesystem later
If the autosetup gave you a small root LV and the rest is free LVM space:
```bash
hal fix expand-fs "$SERVER"
```
## Debugging an unresponsive server
The server isn't booting / SSH never comes up:
```bash
# 1. Reach the server's chroot
hal connect rescue "$SERVER" # via Hetzner Robot → Rescue first
hal diagnose "$SERVER" | tee "diagnose-$(date +%F-%H%M).log"
# 2. Apply best-guess fixes in roughly this order
hal fix boot "$SERVER" # sshd config + journald
hal fix network "$SERVER" # interface naming drift
hal fix grub "$SERVER" # stale MBR after grub upgrades
hal fix static-ip "$SERVER" # DHCP-in-initramfs fragility
# 3. Last-resort kernel rollback (if a kernel bump is the suspect)
hal fix kernel "$SERVER"
# 4. Or, after fixing whatever was broken, upgrade everything cleanly
hal fix upgrade "$SERVER"
```
Every `hal` chroot command makes its own backups (`<file>.hal-backup`)
before mutating anything, so individual fixes can be reverted by hand.
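The backup-before-mutate convention can be sketched as follows. This is a hedged illustration only: the helper name `backup_then_write` and the choice to keep the oldest backup rather than overwrite it are assumptions, and the real commands run remotely inside the chroot rather than on local paths.

```python
import shutil
import tempfile
from pathlib import Path


def backup_then_write(path: Path, new_text: str, suffix: str = ".hal-backup") -> Path:
    """Copy `path` to `<path><suffix>` (once), then write the new content."""
    backup = path.with_name(path.name + suffix)
    if not backup.exists():
        # Assumption: keep the first backup so repeated fixes never lose the original.
        shutil.copy2(path, backup)
    path.write_text(new_text)
    return backup


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "grub"
    cfg.write_text('GRUB_CMDLINE_LINUX="ip=dhcp"\n')
    backup = backup_then_write(cfg, 'GRUB_CMDLINE_LINUX="ip=192.0.2.10::192.0.2.1"\n')
    backup_name = backup.name
    backup_text = backup.read_text()
    current_text = cfg.read_text()

print(backup_name)
```

Reverting by hand then amounts to copying the `.hal-backup` file back over the modified one.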
## Sources
The code is adapted from the following guides:
* http://daemons-point.com/blog/2019/10/20/hetzner-verschluesselt/
* https://www.howtoforge.com/using-the-btrfs-filesystem-with-raid1-with-ubuntu-12.10-on-a-hetzner-server

31
autosetup.example Normal file

@@ -0,0 +1,31 @@
## Hetzner Online GmbH - installimage - config
## Copy to a working file, adjust DRIVE / HOSTNAME / sizes to your box,
## then upload via: hal setup image <host> --autosetup <path>
## Adjust DRIVE1 / DRIVE2 to your actual disks. Typical values:
## - /dev/sda /dev/sdb (SATA/SAS auction boxes)
## - /dev/nvme0n1 /dev/nvme1n1 (NVMe-based servers)
DRIVE1 /dev/sda
DRIVE2 /dev/sdb
## SOFTWARE RAID:
## activate software RAID? < 0 | 1 >
SWRAID 1
## Choose the level for the software RAID < 0 | 1 | 10 >
SWRAIDLEVEL 1
## BOOTLOADER:
BOOTLOADER grub
## HOSTNAME: adapt to your needs
HOSTNAME hetzner-arch-luks
## PARTITIONS / FILESYSTEMS:
## /boot must be its own partition (btrfs/ext4); root and swap on LVM.
PART /boot btrfs 512M
PART lvm vg0 all
LV vg0 swap swap swap 8G
LV vg0 root / btrfs 10G
## OPERATING SYSTEM IMAGE:
IMAGE /root/.oldroot/nfs/install/../images/archlinux-latest-64-minimal.tar.gz

33
pyproject.toml Normal file

@@ -0,0 +1,33 @@
[build-system]
# 77+ for PEP 639 SPDX `license = "MIT"` + `license-files`.
requires = ["setuptools>=77"]
build-backend = "setuptools.build_meta"
[project]
name = "hetzner-arch-luks"
version = "1.0.0"
description = "End-to-end CLI (`hal`) for installing, encrypting, debugging and maintaining an Arch Linux server on Hetzner Dedicated hardware with software RAID, LUKS full-disk encryption, btrfs on LVM, and remote unlock via dropbear in the initramfs."
readme = "README.md"
requires-python = ">=3.9"
authors = [{ name = "Kevin Veen-Birkenbach", email = "kevin@veen.world" }]
maintainers = [{ name = "Kevin Veen-Birkenbach", email = "kevin@veen.world" }]
license = "MIT"
license-files = ["LICENSE"]
urls = { Homepage = "https://veen.world", Repository = "https://github.com/kevinveenbirkenbach/hetzner-arch-luks" }
classifiers = [
"Environment :: Console",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3",
]
[project.scripts]
hal = "hetzner_arch_luks.cli:main"
[tool.setuptools]
package-dir = { "" = "src" }
[tool.setuptools.packages.find]
where = ["src"]
[tool.setuptools.package-data]
hetzner_arch_luks = ["resources/**/*.sh"]


@@ -0,0 +1 @@
__version__ = "1.0.0"


@@ -0,0 +1,4 @@
from .cli import main

if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,291 @@
"""Command-line interface for the hetzner-arch-luks helpers.

Top-level structure:

    hal status HOST
    hal diagnose HOST
    hal unlock HOST
    hal forget HOST
    hal connect {rescue,chroot,server} HOST [CMD...]
    hal setup {image,dropbear,grub,encrypt-root} HOST [...]
    hal fix {boot,network,grub,kernel,static-ip,upgrade,expand-fs} HOST

For commands that need the LUKS passphrase, the prompt happens *first*,
before any network IO. The passphrase is cached per-host in the libsecret
keyring so subsequent runs against the same host don't prompt.
"""
from __future__ import annotations

import argparse
import sys

from . import __version__, probe, remote

_AUTHOR = "Kevin Veen-Birkenbach <kevin@veen.world>"
_HOMEPAGE = "https://veen.world"


def _add_passphrase_flag(p: argparse.ArgumentParser) -> None:
    p.add_argument(
        "--no-passphrase-prompt",
        action="store_true",
        help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
    )


def _add_host(p: argparse.ArgumentParser) -> None:
    p.add_argument("host")


def _build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="hal",
        description=(
            "End-to-end CLI for installing, encrypting, debugging and maintaining "
            "an Arch Linux server on Hetzner Dedicated hardware with software RAID, "
            "LUKS full-disk encryption, btrfs on LVM, and remote unlock via dropbear "
            "in the initramfs."
        ),
        epilog=f"Author: {_AUTHOR} | {_HOMEPAGE} | License: MIT",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "--version",
        action="version",
        version=(
            f"hal {__version__}\n"
            f"Author: {_AUTHOR}\n"
            f"Homepage: {_HOMEPAGE}\n"
            f"License: MIT"
        ),
    )
    sub = parser.add_subparsers(dest="cmd", required=True, metavar="COMMAND")

    # -------------------- Top-level commands --------------------
    p = sub.add_parser(
        "status",
        help="Probe reachability of a host (ping + ports + SSH banner). No login.",
    )
    _add_host(p)

    p = sub.add_parser(
        "diagnose",
        help="Collect a fixed inspection report from inside the installed system via rescue.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = sub.add_parser(
        "unlock",
        help="Send the LUKS passphrase from the keyring to dropbear (cryptroot-unlock). Use after a reboot.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = sub.add_parser(
        "forget",
        help="Drop the cached LUKS passphrase for a host from the libsecret keyring.",
    )
    _add_host(p)

    # -------------------- `connect` group --------------------
    p_connect = sub.add_parser(
        "connect",
        help="Open a remote shell on rescue / chroot / server, or run a one-off command there.",
    )
    p_connect_sub = p_connect.add_subparsers(
        dest="target", required=True, metavar="TARGET"
    )

    p = p_connect_sub.add_parser(
        "rescue",
        help="SSH into the Hetzner rescue system. Append a command for non-interactive use.",
    )
    _add_host(p)
    p.add_argument(
        "command", nargs=argparse.REMAINDER,
        help="Optional command + args to run on the rescue instead of an interactive shell.",
    )

    p = p_connect_sub.add_parser(
        "chroot",
        help="Unlock LUKS via rescue, mount, and drop into `chroot /mnt /bin/bash`. Append a command for non-interactive use.",
    )
    _add_host(p)
    _add_passphrase_flag(p)
    p.add_argument(
        "command", nargs=argparse.REMAINDER,
        help="Optional command + args to run inside the chroot instead of an interactive shell.",
    )

    p = p_connect_sub.add_parser(
        "server",
        help="SSH into the booted Arch system. Append a command for non-interactive use.",
    )
    _add_host(p)
    p.add_argument(
        "command", nargs=argparse.REMAINDER,
        help="Optional command + args to run on the server instead of an interactive shell.",
    )

    # -------------------- `setup` group (one-time install) --------------------
    p_setup = sub.add_parser(
        "setup",
        help="One-time install operations: image / dropbear / grub / encrypt-root.",
    )
    p_setup_sub = p_setup.add_subparsers(
        dest="target", required=True, metavar="TARGET"
    )

    p = p_setup_sub.add_parser(
        "image",
        help="In rescue: upload an autosetup file and run `installimage`. DESTRUCTIVE.",
    )
    _add_host(p)
    p.add_argument(
        "--autosetup", required=True,
        help="Path to a local autosetup config file (uploaded to /autosetup on rescue).",
    )

    p = p_setup_sub.add_parser(
        "dropbear",
        help="On the booted system: install dropbear + mkinitcpio plugins, copy authorized_keys, patch HOOKS. MUTATES.",
    )
    _add_host(p)

    p = p_setup_sub.add_parser(
        "grub",
        help="In chroot (initial install): install grub package, write LUKS-aware /etc/default/grub, grub-install on every boot disk. MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_setup_sub.add_parser(
        "encrypt-root",
        help="In rescue: full LUKS conversion of an installed Arch (sections 4.4 to 4.15 of the previous README). DESTRUCTIVE: confirms before format.",
    )
    _add_host(p)

    # -------------------- `fix` group (recovery operations) --------------------
    p_fix = sub.add_parser(
        "fix",
        help="Recovery + maintenance operations: boot / network / grub / kernel / static-ip / upgrade / expand-fs.",
    )
    p_fix_sub = p_fix.add_subparsers(
        dest="target", required=True, metavar="TARGET"
    )

    p = p_fix_sub.add_parser(
        "boot",
        help="In chroot: patch PermitRootLogin to prohibit-password, enable persistent journald. MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "network",
        help="In chroot: rewrite /etc/systemd/network/*.network to match by MACAddress= instead of interface name. MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "grub",
        help="In chroot: re-run grub-install on every disk backing /boot. MUTATES the MBR.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "kernel",
        help="In chroot: roll the `linux` package back to the previous version (cache or archive.archlinux.org). MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "static-ip",
        help="In chroot: replace `ip=dhcp` in /etc/default/grub with a static kernel-cmdline IP derived from the .network file. MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "upgrade",
        help="In chroot: full `pacman -Syyu` + rebuild initramfs + grub-install on every boot disk. MUTATES.",
    )
    _add_host(p)
    _add_passphrase_flag(p)

    p = p_fix_sub.add_parser(
        "expand-fs",
        help="On the booted system: `lvresize -l +100%%FREE /dev/vg0/root && btrfs filesystem resize max /`. MUTATES.",
    )
    _add_host(p)

    return parser
def main(argv: list[str] | None = None) -> int:
args = _build_parser().parse_args(argv)
pp = not getattr(args, "no_passphrase_prompt", False)
# Top-level
if args.cmd == "status":
return probe.status(args.host)
if args.cmd == "diagnose":
return remote.diagnose(args.host, ask_passphrase=pp)
if args.cmd == "unlock":
return remote.unlock(args.host, ask_passphrase=pp)
if args.cmd == "forget":
return remote.forget_passphrase(args.host)
# connect group
if args.cmd == "connect":
cmd_list = getattr(args, "command", None) or None
if args.target == "rescue":
return remote.connect_rescue(args.host, command=cmd_list)
if args.target == "chroot":
return remote.connect_chroot(args.host, ask_passphrase=pp, command=cmd_list)
if args.target == "server":
return remote.connect_server(args.host, command=cmd_list)
# setup group
if args.cmd == "setup":
if args.target == "image":
return remote.install_image(args.host, args.autosetup)
if args.target == "dropbear":
return remote.setup_dropbear(args.host)
if args.target == "grub":
return remote.install_grub(args.host, ask_passphrase=pp)
if args.target == "encrypt-root":
return remote.encrypt_root(args.host)
# fix group
if args.cmd == "fix":
if args.target == "boot":
return remote.fix_boot(args.host, ask_passphrase=pp)
if args.target == "network":
return remote.fix_network(args.host, ask_passphrase=pp)
if args.target == "grub":
return remote.reinstall_grub(args.host, ask_passphrase=pp)
if args.target == "kernel":
return remote.downgrade_kernel(args.host, ask_passphrase=pp)
if args.target == "static-ip":
return remote.use_static_ip(args.host, ask_passphrase=pp)
if args.target == "upgrade":
return remote.upgrade_system(args.host, ask_passphrase=pp)
if args.target == "expand-fs":
return remote.expand_fs(args.host)
return 2
if __name__ == "__main__":
sys.exit(main())
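
A minimal sketch of how the two-level dispatch in `main()` could be wired with `argparse`: `cmd` selects the group and `target` selects the action inside it. The `dest` names match what `main()` reads; the helper name `build_parser` and the reduced set of groups are illustrative assumptions, not the project's actual `_build_parser`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down parser with the same cmd/target shape."""
    parser = argparse.ArgumentParser(prog="hal")
    sub = parser.add_subparsers(dest="cmd", required=True)

    # Top-level commands take only a host.
    for name in ("status", "diagnose", "unlock", "forget"):
        sub.add_parser(name).add_argument("host")

    # Each group gets a nested subparser whose choice lands in `target`.
    groups = {
        "connect": ("rescue", "chroot", "server"),
        "fix": ("boot", "network", "grub", "kernel",
                "static-ip", "upgrade", "expand-fs"),
    }
    for group, targets in groups.items():
        group_sub = sub.add_parser(group).add_subparsers(
            dest="target", required=True
        )
        for target in targets:
            group_sub.add_parser(target).add_argument("host")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["fix", "static-ip", "example-host"])
    print(args.cmd, args.target, args.host)
```

With `required=True` on both levels, `hal fix` without a target is rejected by argparse itself, so the trailing `return 2` in `main()` only catches combinations the dispatcher does not handle.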


@@ -0,0 +1,55 @@
"""Client-side reachability probes that need no SSH credentials."""
from __future__ import annotations
import shutil
import socket
import subprocess
def _have(cmd: str) -> bool:
return shutil.which(cmd) is not None
def _ssh_banner(host: str, port: int = 22, timeout: float = 3) -> str:
"""Read the first line the SSH server emits on connect.
Distinguishes Hetzner rescue (Debian OpenSSH banner) from installed Arch
(Arch OpenSSH banner) from Dropbear (Dropbear banner).
"""
try:
with socket.create_connection((host, port), timeout=timeout) as s:
s.settimeout(2)
data = s.recv(256)
return data.decode("utf-8", errors="replace").splitlines()[0] if data else ""
except OSError:  # covers socket.timeout; decode(errors="replace") never raises
return ""
def status(host: str) -> int:
"""Print a reachability report for `host`. Returns 0 always."""
print(f"==> ping (ICMP) {host}")
try:
subprocess.run(["ping", "-c", "2", "-W", "2", host], check=False)
except FileNotFoundError:
print("(ping not available)")
print()
print(f"==> ports 22, 222 on {host}")
if _have("nmap"):
subprocess.run(["nmap", "-Pn", "-p", "22,222", host], check=False)
else:
print("(nmap not installed; falling back to TCP probes)")
for port in (22, 222):
ok = False
try:
with socket.create_connection((host, port), timeout=3):
ok = True
except OSError:  # socket.timeout is an OSError subclass
pass
print(f" {port}: {'reachable' if ok else 'not reachable (filtered/closed/timeout)'}")
print()
print(f"==> SSH banner on {host}:22")
banner = _ssh_banner(host, 22)
print(banner if banner else "(no banner)")
return 0
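
The `_ssh_banner` docstring says the banner distinguishes rescue, Dropbear, and the installed system. A rough sketch of that classification follows. The function name `classify_banner` is hypothetical, and the matching rules are assumptions: that the rescue banner carries a `Debian` suffix, that Dropbear identifies itself by name, and that a plain OpenSSH banner means the installed Arch system.

```python
def classify_banner(banner: str) -> str:
    """Map an SSH identification string to a coarse boot stage.

    The substrings checked here are assumptions about typical banners,
    not values taken from the project.
    """
    text = banner.lower()
    if not text.startswith("ssh-"):
        return "unknown"
    if "dropbear" in text:
        return "initramfs (dropbear)"
    if "debian" in text:
        return "rescue"
    if "openssh" in text:
        return "installed system"
    return "unknown"


if __name__ == "__main__":
    for sample in (
        "SSH-2.0-dropbear_2022.83",
        "SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u3",
        "SSH-2.0-OpenSSH_9.9",
        "",
    ):
        print(repr(sample), "->", classify_banner(sample))
```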


@@ -0,0 +1,413 @@
"""Orchestrates the rescue / chroot / diagnose flows over an SshSession.
Key UX choices:
- The LUKS passphrase is prompted *before* we touch the network, so the
user enters it once and can step away while the rest runs.
- On first prompt the passphrase is cached in the libsecret keyring
(GNOME Keyring / KWallet via secret-tool) so subsequent runs against
the same host skip the prompt entirely.
"""
from __future__ import annotations
import getpass
import importlib.resources
import shlex
import shutil
import subprocess
import sys
from .ssh import SshSession, wait_for_port
# Pre-LUKS step: assemble the RAID arrays. Idempotent (mdadm returns non-zero
# when arrays are already assembled — we swallow that).
_ASSEMBLE = "mdadm --assemble --scan 2>/dev/null || true"
# Post-LUKS step: activate LVM, mount root + boot, bind /dev /proc /sys /run.
# Idempotent: every mount is guarded with `mountpoint -q`.
_MOUNT = r"""
set -e
vgchange -ay >/dev/null
if ! mountpoint -q /mnt; then
mount /dev/vg0/root /mnt
mkdir -p /mnt/boot
mount /dev/md0 /mnt/boot
fi
for d in dev proc sys run; do
mountpoint -q "/mnt/$d" || mount --rbind "/$d" "/mnt/$d"
done
"""
# Schema for libsecret entries:
# service = hetzner-arch-luks
# host = <host>
_KEYRING_SERVICE = "hetzner-arch-luks"
# ---- keyring helpers (libsecret via secret-tool) ---------------------------
def _have_secret_tool() -> bool:
return shutil.which("secret-tool") is not None
def _keyring_load(host: str) -> str | None:
"""Look up the cached LUKS passphrase for `host`. None if not stored."""
if not _have_secret_tool():
return None
r = subprocess.run(
["secret-tool", "lookup", "service", _KEYRING_SERVICE, "host", host],
capture_output=True, text=True,
)
if r.returncode == 0 and r.stdout:
# secret-tool prints the secret raw, without trailing newline
return r.stdout
return None
def _keyring_store(host: str, passphrase: str) -> None:
"""Persist `passphrase` in libsecret under (service, host)."""
if not _have_secret_tool():
return
label = f"hetzner-arch-luks LUKS passphrase for {host}"
subprocess.run(
[
"secret-tool", "store", "--label", label,
"service", _KEYRING_SERVICE, "host", host,
],
input=passphrase, text=True, check=False,
)
def _keyring_clear(host: str) -> bool:
"""Drop the cached passphrase for `host`. Returns True if anything was deleted."""
if not _have_secret_tool():
return False
if _keyring_load(host) is None:
return False
subprocess.run(
["secret-tool", "clear", "service", _KEYRING_SERVICE, "host", host],
check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
return True
# ---- passphrase prompt -----------------------------------------------------
def _prompt_passphrase(host: str, *, force_prompt: bool = False) -> str:
"""Get the LUKS passphrase for `host`.
Order:
1. Try the libsecret keyring (skipped if force_prompt=True or
secret-tool isn't installed).
2. Hidden prompt via getpass. On success, store to the keyring for
next time.
Empty input aborts the whole command.
"""
if not force_prompt:
cached = _keyring_load(host)
if cached:
print(f"(passphrase from keyring for {host})", file=sys.stderr)
return cached
p = getpass.getpass(f"LUKS passphrase for {host}: ")
if not p:
print("Empty passphrase — aborting.", file=sys.stderr)
sys.exit(1)
_keyring_store(host, p)
return p
# ---- session helpers -------------------------------------------------------
def _wait_rescue(host: str, timeout: int = 300) -> None:
print(f"==> Waiting for {host}:22 ...")
if not wait_for_port(host, 22, timeout=timeout):
print(f"Timeout: {host}:22 not reachable after {timeout}s", file=sys.stderr)
sys.exit(1)
def _luks_is_open(ssh: SshSession) -> bool:
r = ssh.run("test -e /dev/mapper/cryptroot", check=False, capture=True)
return r.returncode == 0
def _ensure_unlocked(ssh: SshSession, host: str, passphrase: str | None) -> None:
"""Open LUKS if needed. Retries once with a fresh prompt if the cached
passphrase from the keyring is rejected by cryptsetup.
cryptsetup reads the passphrase from stdin (via --key-file=-) and stops
at EOF. We send raw bytes with no trailing newline.
"""
if _luks_is_open(ssh):
print("==> LUKS already open.")
return
if passphrase is None:
passphrase = _prompt_passphrase(host)
print("==> Opening LUKS ...")
try:
ssh.run(
"cryptsetup luksOpen --key-file=- /dev/md1 cryptroot",
input_=passphrase.encode(),
)
except subprocess.CalledProcessError:
# Most likely: wrong passphrase. If we got it from the keyring,
# clear the bad entry and re-prompt once.
if _keyring_clear(host):
print(
"==> cryptsetup rejected the cached passphrase. Cleared keyring; re-prompting.",
file=sys.stderr,
)
passphrase = _prompt_passphrase(host, force_prompt=True)
ssh.run(
"cryptsetup luksOpen --key-file=- /dev/md1 cryptroot",
input_=passphrase.encode(),
)
else:
raise
def _setup(ssh: SshSession, host: str, passphrase: str | None) -> None:
"""Full sequence: assemble + LUKS + LVM + mount + binds."""
print("==> Assembling RAID ...")
ssh.run(_ASSEMBLE)
_ensure_unlocked(ssh, host, passphrase)
print("==> Activating LVM + mounting + binding ...")
ssh.run(_MOUNT)
# ---- public entry points (called by cli.py) --------------------------------
def _connect_simple(host: str, label: str, command: list[str] | None) -> int:
"""Shared body of `connect_rescue` and `connect_server`: wait for SSH,
then either drop into an interactive shell or run `command` and print
its output. The remote exit status is not propagated; this always returns 0.
"""
_wait_rescue(host)
with SshSession(host) as ssh:
if command:
cmd_str = " ".join(shlex.quote(c) for c in command)
print(f"==> Running on {label}: {cmd_str}")
ssh.run(cmd_str, check=False)
else:
print(f"==> Connected to {label}. Type 'exit' to leave.")
ssh.run("exec bash -l", tty=True, check=False)
return 0
def connect_rescue(host: str, *, command: list[str] | None = None) -> int:
"""Wait for rescue to come up, then open a shell or run `command`."""
return _connect_simple(host, "rescue", command)
def connect_server(host: str, *, command: list[str] | None = None) -> int:
"""Wait for the booted Arch system to come up, then open a shell or
run `command`. Same SSH plumbing as `connect_rescue`; named differently
for clarity in the docs."""
return _connect_simple(host, "server", command)
def connect_chroot(
host: str,
*,
ask_passphrase: bool = True,
command: list[str] | None = None,
) -> int:
"""Unlock LUKS via rescue, mount, then either open an interactive chroot
shell or run `command` inside the chroot non-interactively and print
its output."""
passphrase = _prompt_passphrase(host) if ask_passphrase else None
_wait_rescue(host)
with SshSession(host) as ssh:
_setup(ssh, host, passphrase)
if command:
# Pipe the command into chroot's bash via stdin — avoids all the
# quoting layers of `bash -c '<cmd>'` and is identical to how the
# diagnose/fix scripts are streamed in.
cmd_str = " ".join(shlex.quote(c) for c in command)
print(f"==> Running in chroot: {cmd_str}")
ssh.run("chroot /mnt /bin/bash", input_=(cmd_str + "\n").encode())
else:
print("==> Entering chroot. Type 'exit' to leave.")
ssh.run("chroot /mnt /bin/bash", tty=True, check=False)
return 0
def diagnose(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + run the chrooted diagnose script. Output goes to stdout."""
return _run_chroot_script(host, "diagnose/inside.sh", "diagnose", ask_passphrase)
def fix_boot(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + apply boot/SSH fixes inside chroot. MUTATES the system."""
return _run_chroot_script(host, "fix/boot.sh", "fix-boot", ask_passphrase)
def fix_network(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + rewrite .network files to use MACAddress= match. MUTATES."""
return _run_chroot_script(host, "fix/network.sh", "fix-network", ask_passphrase)
def downgrade_kernel(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + downgrade linux to the previous cached version. MUTATES."""
return _run_chroot_script(host, "fix/kernel.sh", "downgrade-kernel", ask_passphrase)
def reinstall_grub(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + grub-install on every disk backing /boot's RAID. MUTATES MBR."""
return _run_chroot_script(host, "fix/grub.sh", "reinstall-grub", ask_passphrase)
def use_static_ip(host: str, *, ask_passphrase: bool = True) -> int:
"""Replace ip=dhcp in /etc/default/grub with a static spec parsed from
the existing systemd-networkd .network file. Regenerates grub.cfg. MUTATES."""
return _run_chroot_script(host, "fix/static_ip.sh", "use-static-ip", ask_passphrase)
def upgrade_system(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + full `pacman -Syu` + rebuild initramfs + refresh GRUB
(config + MBR on all boot disks). Uses --disable-sandbox because the
Hetzner Rescue kernel lacks Landlock. MUTATES."""
return _run_chroot_script(host, "maintain/upgrade.sh", "upgrade-system", ask_passphrase)
def unlock(host: str, *, ask_passphrase: bool = True) -> int:
"""Pipe the LUKS passphrase to `cryptroot-unlock` on the dropbear that
is listening from initramfs. Use after a reboot, before the main sshd
is reachable. Uses a throwaway known_hosts to avoid host-key conflicts
between the dropbear and the real sshd (different host keys, same port).
"""
passphrase = _prompt_passphrase(host) if ask_passphrase else None
if passphrase is None:
print("Need a passphrase to send to cryptroot-unlock.", file=sys.stderr)
return 1
_wait_rescue(host) # really just "wait for port 22"
print(f"==> Sending passphrase to dropbear on {host} ...")
cmd = [
"ssh",
"-o", "UserKnownHostsFile=/dev/null",
"-o", "StrictHostKeyChecking=accept-new",
"-o", "GlobalKnownHostsFile=/dev/null",
"-o", "ConnectTimeout=10",
f"root@{host}",
"cryptroot-unlock",
]
r = subprocess.run(cmd, input=(passphrase + "\n").encode(), check=False)
if r.returncode == 0:
print("==> Passphrase accepted; system continues boot.")
else:
print(f"==> ssh/cryptroot-unlock exited with code {r.returncode}",
file=sys.stderr)
return r.returncode
def expand_fs(host: str) -> int:
"""Run `lvresize -l +100%FREE /dev/vg0/root && btrfs filesystem resize max /`
on the booted system. No LUKS passphrase needed — server is already up."""
_wait_rescue(host)
with SshSession(host) as ssh:
print("==> Expanding LVM root + btrfs filesystem ...")
ssh.run("lvresize -l +100%FREE /dev/vg0/root && btrfs filesystem resize max /")
return 0
def setup_dropbear(host: str) -> int:
"""Install dropbear + supporting packages, configure SSH keys, patch
/etc/mkinitcpio.conf HOOKS. Runs on the booted system. MUTATES."""
inside = (
importlib.resources
.files("hetzner_arch_luks")
.joinpath("resources/setup/dropbear.sh")
.read_bytes()
)
_wait_rescue(host)
with SshSession(host) as ssh:
print("==> Running setup-dropbear on the booted system ...")
ssh.run("bash -s", input_=inside)
return 0
def install_grub(host: str, *, ask_passphrase: bool = True) -> int:
"""Inside chroot: install grub package, write /etc/default/grub for
LUKS-encrypted root, grub-install on every boot disk, grub-mkconfig.
Used during the initial encryption setup. MUTATES."""
return _run_chroot_script(host, "setup/grub.sh", "install-grub", ask_passphrase)
def install_image(host: str, autosetup_path: str) -> int:
"""Upload an autosetup config to the rescue and run `installimage`.
DESTRUCTIVE — formats the disks per the autosetup contents."""
import pathlib
p = pathlib.Path(autosetup_path)
if not p.exists():
print(f"autosetup file not found: {autosetup_path}", file=sys.stderr)
return 1
content = p.read_bytes()
_wait_rescue(host)
with SshSession(host) as ssh:
print(f"==> Uploading {autosetup_path} → /autosetup on rescue ...")
ssh.run("cat > /autosetup", input_=content)
print("==> Running installimage (DESTRUCTIVE — this formats the disks!)")
ssh.run("installimage", tty=True)
return 0
def encrypt_root(host: str) -> int:
"""In rescue (NOT chroot): re-format /dev/md1 with LUKS, preserve the
installed root by copying through /oldroot, then mkinitcpio inside chroot.
Interactive: cryptsetup prompts for the new LUKS passphrase via the rescue
TTY. We upload the script to /root/_encrypt_root.sh and execute it with
a TTY allocated so cryptsetup's prompts work. DESTRUCTIVE on /dev/md1."""
content = (
importlib.resources
.files("hetzner_arch_luks")
.joinpath("resources/setup/encrypt_root.sh")
.read_bytes()
)
_wait_rescue(host)
with SshSession(host) as ssh:
print("==> Uploading encrypt-root script to rescue:/root/_encrypt_root.sh")
ssh.run("cat > /root/_encrypt_root.sh && chmod +x /root/_encrypt_root.sh",
input_=content)
print("==> Running encrypt-root (interactive — answer cryptsetup prompts)")
ssh.run("/root/_encrypt_root.sh", tty=True, check=False)
ssh.run("rm -f /root/_encrypt_root.sh", check=False)
return 0
def forget_passphrase(host: str) -> int:
"""Drop the stored LUKS passphrase for `host` from the libsecret keyring."""
if not _have_secret_tool():
print("secret-tool not installed — no keyring backend; nothing to clear.",
file=sys.stderr)
return 1
if _keyring_clear(host):
print(f"Cleared cached LUKS passphrase for {host}.")
return 0
print(f"No cached LUKS passphrase for {host}.")
return 0
def _run_chroot_script(host: str, resource: str, label: str, ask_passphrase: bool) -> int:
"""Shared driver: unlock + mount + pipe a packaged script into chrooted bash.
The script is streamed as stdin to `chroot /mnt /bin/bash`; bash reads its
program from stdin, so it runs inside the chroot without leaving any file
on the target.
"""
passphrase = _prompt_passphrase(host) if ask_passphrase else None
_wait_rescue(host)
inside = (
importlib.resources
.files("hetzner_arch_luks")
.joinpath(f"resources/{resource}")
.read_bytes()
)
with SshSession(host) as ssh:
_setup(ssh, host, passphrase)
print(f"==> Running {label} inside chroot ...")
ssh.run("chroot /mnt /bin/bash", input_=inside)
return 0
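
The retry logic in `_ensure_unlocked` (try the cached passphrase, and on rejection clear the cache and prompt once more) can be isolated from SSH and the keyring. The sketch below is a simplified stand-in, not the module's code: `unlock_with_retry`, `Rejected`, and the injected callables are hypothetical names, and unlike the original it retries only when the passphrase really came from the cache.

```python
from typing import Callable, Optional


class Rejected(Exception):
    """Raised by `open_volume` when the passphrase is wrong (hypothetical)."""


def unlock_with_retry(
    cached: Optional[str],
    prompt: Callable[[], str],
    open_volume: Callable[[str], None],
    clear_cache: Callable[[], None],
) -> str:
    """Return the passphrase that opened the volume.

    A rejected cached passphrase is cleared and replaced by one fresh
    prompt; a rejected prompted passphrase propagates the error.
    """
    passphrase = cached if cached is not None else prompt()
    try:
        open_volume(passphrase)
        return passphrase
    except Rejected:
        if cached is None:
            raise  # the user just typed it; do not loop
        clear_cache()
        passphrase = prompt()
        open_volume(passphrase)  # a second failure propagates
        return passphrase


if __name__ == "__main__":
    def fake_open(p: str) -> None:
        if p != "correct":
            raise Rejected(p)

    print(unlock_with_retry("stale", lambda: "correct", fake_open,
                            lambda: print("cache cleared")))
```

Injecting the four collaborators keeps the control flow testable without a keyring daemon or a remote host.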


@@ -0,0 +1,155 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Prints diagnostics
# grouped by banner. Read-only — no state changes.
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "uname / os-release"
uname -a
cat /etc/os-release
banner "package versions (boot/storage/net/ssh)"
pacman -Q linux mkinitcpio openssh systemd device-mapper lvm2 grub \
cryptsetup mdadm dropbear 2>&1
pacman -Q mkinitcpio-utils mkinitcpio-dropbear mkinitcpio-netconf 2>&1 || true
banner "recent upgrades of boot/network/sshd components (last 60 matches)"
# Focused on the packages that most often break a Hetzner Arch+LUKS boot.
# The fallback sits inside the group: `tail` exits 0 even on empty input,
# so an `|| echo` after the pipeline would never fire.
{ grep -E '\[ALPM\] (upgraded|installed|removed) (linux( |$)|systemd( |$)|mkinitcpio( |$)|openssh( |$)|dropbear( |$)|glibc( |$)|cryptsetup( |$)|lvm2( |$)|mdadm( |$)|grub( |$)|iproute2( |$)|nftables( |$)|iptables( |$)|firewalld( |$)|fail2ban( |$)|mkinitcpio-utils( |$)|mkinitcpio-dropbear( |$)|mkinitcpio-netconf( |$))' /var/log/pacman.log 2>/dev/null \
|| echo "(no matches)"; } \
| tail -60
banner "last full-system upgrade transactions"
{ grep -nE 'starting full system upgrade|transaction completed' /var/log/pacman.log 2>/dev/null \
|| echo "(no matches)"; } | tail -10
banner "initcpio udev rules shipped on disk"
ls -l /usr/lib/initcpio/udev/ 2>&1
banner "is the historically broken file present?"
ls -l /usr/lib/initcpio/udev/11-dm-initramfs.rules 2>&1 || echo "absent"
banner "encryptssh install hook still references it?"
grep -n "11-dm-initramfs.rules" \
/usr/lib/initcpio/install/encryptssh \
/etc/initcpio/install/encryptssh 2>/dev/null || echo "no match"
banner "mkinitcpio.conf (HOOKS, MODULES, BINARIES, FILES, COMPRESSION)"
grep -E '^(HOOKS|MODULES|BINARIES|FILES|COMPRESSION)=' /etc/mkinitcpio.conf 2>&1
banner "/etc/crypttab"
cat /etc/crypttab 2>&1 || true
banner "/etc/fstab"
cat /etc/fstab 2>&1 || true
banner "/boot contents and free space"
ls -lh /boot 2>&1
df -h /boot 2>&1
banner "GRUB config + bootloader state"
ls -lh /boot/grub/ 2>&1
echo
if [ -f /boot/grub/grub.cfg ]; then
if command -v grub-script-check >/dev/null 2>&1; then
grub-script-check /boot/grub/grub.cfg 2>&1 && echo "grub.cfg: syntax OK"
else
echo "grub-script-check not available — skipping syntax check"
fi
echo
echo "-- menuentry / linux / initrd lines (first 40):"
grep -nE '^\s*(linux|initrd|menuentry)' /boot/grub/grub.cfg 2>&1 | head -40
echo
echo "-- referenced kernel/initramfs files exist?"
for p in $(grep -hE '^\s*(linux|initrd)\b' /boot/grub/grub.cfg 2>/dev/null \
| awk '{print $2}' | sort -u); do
if [ -e "$p" ]; then echo "EXISTS $p"
elif [ -e "/boot${p}" ]; then echo "EXISTS /boot${p} (grub.cfg path: $p)"
else echo "MISSING $p"
fi
done
else
echo "/boot/grub/grub.cfg NOT FOUND"
fi
echo
echo "-- grubenv:"
grub-editenv /boot/grub/grubenv list 2>/dev/null \
|| { [ -f /boot/grub/grubenv ] && head -5 /boot/grub/grubenv; } \
|| echo "(no grubenv)"
banner "initramfs contents — key tools actually packed in?"
if command -v lsinitcpio >/dev/null 2>&1; then
echo "-- matches in /boot/initramfs-linux.img:"
lsinitcpio /boot/initramfs-linux.img 2>/dev/null \
| grep -E '(cryptsetup|dropbear|encryptssh|netconf|mdadm|lvm|/init$|hooks/)' \
| sort -u | head -50
else
echo "lsinitcpio not available"
fi
banner "network: which service manages it?"
for u in systemd-networkd NetworkManager netctl-auto dhcpcd; do
printf " %-22s %s\n" "$u" "$(systemctl is-enabled "$u" 2>&1)"
done
# dhcpcd@interface units (Arch default for static-ish setups)
systemctl list-unit-files 'dhcpcd@*' --no-pager 2>/dev/null | grep -E 'dhcpcd@' || true
banner "network: config files present"
echo "-- /etc/systemd/network/"
{ ls -la /etc/systemd/network/ 2>/dev/null || echo "(empty/missing)"; } | head -20
echo
echo "-- /etc/NetworkManager/system-connections/"
{ ls -la /etc/NetworkManager/system-connections/ 2>/dev/null || echo "(empty/missing)"; } | head -20
echo
echo "-- /etc/netctl/"
{ ls -la /etc/netctl/ 2>/dev/null || echo "(empty/missing)"; } | head -20
echo
echo "-- /etc/hostname / /etc/hosts"
cat /etc/hostname 2>&1 || true
echo "---"
cat /etc/hosts 2>&1 || true
banner "firewall units (would persist across reboots)"
for u in nftables iptables ip6tables firewalld ufw fail2ban docker; do
printf " %-12s %s\n" "$u" "$(systemctl is-enabled "$u" 2>&1)"
done
echo
if [ -f /etc/nftables.conf ]; then
echo "-- /etc/nftables.conf (first 60 lines):"
head -60 /etc/nftables.conf
fi
[ -f /etc/iptables/iptables.rules ] && { echo "-- /etc/iptables/iptables.rules (head 40):"; head -40 /etc/iptables/iptables.rules; }
banner "sshd state + drop-ins"
sshd -t 2>&1
systemctl is-enabled sshd 2>&1
grep -nE '^Port|^ListenAddress|^PermitRootLogin' /etc/ssh/sshd_config 2>&1 || true
echo
echo "-- sshd_config.d/ drop-ins (can override main config!):"
ls -la /etc/ssh/sshd_config.d/ 2>&1 || echo "(no drop-ins dir)"
for f in /etc/ssh/sshd_config.d/*.conf; do
[ -e "$f" ] || continue
echo
echo "-- $f:"
cat "$f"
done
banner "journal: which boots are actually recorded?"
journalctl --list-boots --no-pager 2>&1 | tail -15
banner "last recorded boot (-b 0): all errors"
journalctl -b 0 -p err --no-pager 2>&1 | head -100 || true
banner "last recorded boot (-b 0): sshd"
journalctl -b 0 -u sshd --no-pager 2>&1 | head -40 || true
banner "last recorded boot (-b 0): cryptsetup / dropbear / network units"
journalctl -b 0 \
-u 'systemd-cryptsetup*' -u 'dropbear*' \
-u 'systemd-networkd*' -u 'NetworkManager*' -u 'dhcpcd*' \
--no-pager 2>&1 | head -80 || true
banner "previous boot (-b -1): errors (only if a previous boot is recorded)"
journalctl -b -1 -p err --no-pager 2>&1 | head -50 || true
banner "failed units of last boot"
systemctl --failed --no-pager 2>&1 || true


@@ -0,0 +1,55 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Applies the recommended
# boot / SSH fixes:
#
# 1. PermitRootLogin: rewrite a literal "no" line to "prohibit-password"
# in /etc/ssh/sshd_config AND any drop-in under /etc/ssh/sshd_config.d/.
# Backups are kept once as *.hal-backup.
# 2. Persistent journald: create /var/log/journal so journal logs persist
# across reboots (from the next boot onwards). Helps catch the next
# failure if there is one.
#
# Idempotent: re-running is safe — no-op on already-fixed configs.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "PermitRootLogin (before)"
grep -rn '^PermitRootLogin' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/ 2>/dev/null \
|| echo "(no explicit setting found)"
changed=0
for f in /etc/ssh/sshd_config /etc/ssh/sshd_config.d/*.conf; do
[ -e "$f" ] || continue
if grep -q '^PermitRootLogin no$' "$f"; then
[ -f "$f.hal-backup" ] || cp -a "$f" "$f.hal-backup"
sed -i 's/^PermitRootLogin no$/PermitRootLogin prohibit-password/' "$f"
echo "==> Patched: $f (backup at $f.hal-backup)"
changed=1
fi
done
[ "$changed" -eq 0 ] && echo "==> Nothing to patch — PermitRootLogin is not 'no' anywhere."
banner "PermitRootLogin (after)"
grep -rn '^PermitRootLogin' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/ 2>/dev/null \
|| echo "(no explicit setting found)"
banner "sshd_config syntax check"
sshd -t && echo "syntax OK"
banner "persistent journald"
if [ ! -d /var/log/journal ]; then
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal 2>&1 || true
echo "==> Created /var/log/journal. journald will persist from next boot onwards."
else
echo "/var/log/journal already exists — journald is already persistent."
fi
banner "/boot space"
df -h /boot
ls -lh /boot
banner "summary"
echo "Done. The changes take effect on the NEXT boot of the installed system."
echo "Exit the chroot and reboot out of rescue when ready."


@@ -0,0 +1,92 @@
#!/bin/bash
# Re-install GRUB stage1 + core.img to the MBR of every physical disk that
# backs /boot's RAID array. Needed when a `pacman -Syu` updated the grub
# package but grub-install was never re-run afterwards, leaving stale
# Stage1 code in the MBR that may not understand the new modules in
# /boot/grub/i386-pc/.
#
# Also regenerates /boot/grub/grub.cfg.
#
# Boot disks are auto-detected from the components of /dev/md0.
# Targets BIOS GRUB (--target=i386-pc); the existing /boot/grub/i386-pc/
# directory confirms this is a BIOS setup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "current /boot/grub state"
ls -lh /boot/grub/
echo
echo "-- /boot/grub/i386-pc/ — most recent files:"
ls -lt /boot/grub/i386-pc/ 2>/dev/null | head -8
banner "identifying boot disks (members of md0)"
if [ ! -e /dev/md0 ]; then
echo "ERROR: /dev/md0 does not exist. Was the RAID assembled before chroot?"
exit 1
fi
echo "-- mdadm --detail /dev/md0 (member partitions):"
mdadm --detail /dev/md0 | awk '/active sync/ {print " " $NF}'
# Convert a partition path to its parent disk. lsblk fails inside our chroot
# (can't resolve PKNAME against the rescue-bound /sys), so use the standard
# Linux device naming conventions instead.
parent_disk() {
local part="$1"
case "$part" in
/dev/nvme[0-9]*n[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/mmcblk[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/loop[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/sd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
/dev/vd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
/dev/hd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
*)
# Last resort: try lsblk; may return empty in chroot. Always return 0 so
# that `set -e` does not abort the caller before it can print its WARN.
local d
d=$(lsblk -no PKNAME "$part" 2>/dev/null | head -1)
[ -z "$d" ] || echo "/dev/$d"
;;
esac
}
BOOT_DISKS=()
for part in $(mdadm --detail /dev/md0 2>/dev/null | awk '/active sync/ {print $NF}'); do
disk=$(parent_disk "$part")
[ -z "$disk" ] && { echo "WARN: cannot resolve parent disk for $part"; continue; }
already=0
for d in "${BOOT_DISKS[@]}"; do [ "$d" = "$disk" ] && already=1; done
[ "$already" -eq 0 ] && BOOT_DISKS+=("$disk")
done
if [ "${#BOOT_DISKS[@]}" -eq 0 ]; then
echo "ERROR: could not detect any boot disks."
exit 1
fi
echo
echo "Will run grub-install on: ${BOOT_DISKS[*]}"
banner "regenerating /boot/grub/grub.cfg"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "reinstalling GRUB to each boot disk"
for disk in "${BOOT_DISKS[@]}"; do
echo
echo "-- grub-install --target=i386-pc --recheck $disk"
grub-install --target=i386-pc --recheck "$disk"
done
banner "post-install state"
echo "-- /boot/grub/i386-pc/ — newest files now:"
ls -lt /boot/grub/i386-pc/ 2>/dev/null | head -6
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If the system boots normally:
→ root cause confirmed = stale MBR after grub package upgrades
(grub-install was never re-run after a pacman -Syu touched grub).
→ To prevent recurrence, add a pacman hook (Arch wiki: "GRUB").
3. If still unbootable:
→ GRUB stage1 was not the cause. Next bisection: downgrade systemd.
EOF
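
For comparison, the naming-convention logic of `parent_disk` can be expressed as two regular expressions. This is a sketch in Python rather than a replacement for the shell function: the patterns mirror the `case` arms above (nvme/mmcblk/loop use a `p<N>` suffix, sd/vd/hd use a bare number), and returning `None` for anything else is an assumption standing in for the `lsblk` fallback.

```python
import re
from typing import Optional

# Devices whose partitions are named <disk>p<N>.
_P_SUFFIX = re.compile(r"^(/dev/(?:nvme\d+n\d+|mmcblk\d+|loop\d+))p\d+$")
# Devices whose partitions are named <disk><N>.
_NUM_SUFFIX = re.compile(r"^(/dev/(?:sd|vd|hd)[a-z]+)\d+$")


def parent_disk(part: str) -> Optional[str]:
    """Return the whole-disk path for a partition path, or None if unknown."""
    for pattern in (_P_SUFFIX, _NUM_SUFFIX):
        match = pattern.match(part)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    for sample in ("/dev/nvme0n1p2", "/dev/sda1", "/dev/mmcblk0p1", "/dev/md0"):
        print(sample, "->", parent_disk(sample))
```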


@@ -0,0 +1,110 @@
#!/bin/bash
# Runs INSIDE the chroot. Downgrades the linux kernel to the previous
# version (the one installed BEFORE the most recent `[ALPM] upgraded linux`
# entry in /var/log/pacman.log). Looks in /var/cache/pacman/pkg/ first; if
# the package is not there, fetches it from https://archive.archlinux.org/.
#
# After downgrade: regenerates initramfs + grub.cfg.
#
# Use case: a `pacman -Syu` bumped the kernel to a version that fails to
# boot on this hardware. Rolling the kernel back leaves every other
# package on the new version, so this isolates the kernel as a variable.
#
# Idempotent: if already on the previous version, exits as a no-op.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "determining previous kernel version from pacman.log"
PREV=$(grep -E '\[ALPM\] upgraded linux \(' /var/log/pacman.log 2>/dev/null \
| tail -1 \
| sed -E 's/.*upgraded linux \(([^ ]+) -> [^)]+\).*/\1/')
CURR=$(pacman -Q linux | awk '{print $2}')
if [ -z "$PREV" ]; then
echo "FATAL: Could not parse a previous kernel version from /var/log/pacman.log."
echo " Pacman log entries for 'linux' upgrades:"
grep -E '\[ALPM\] (installed|upgraded) linux \(' /var/log/pacman.log 2>/dev/null \
| tail -5 || echo " (none found)"
exit 1
fi
echo "Currently installed: linux-$CURR"
echo "Previous version: linux-$PREV"
if [ "$PREV" = "$CURR" ]; then
echo "Already on the previous version. Nothing to do."
exit 0
fi
PKG_NAME="linux-${PREV}-x86_64.pkg.tar.zst"
CACHE_PATH="/var/cache/pacman/pkg/${PKG_NAME}"
banner "locating package"
TARGET=""
if [ -e "$CACHE_PATH" ]; then
echo "Found in cache: $CACHE_PATH"
TARGET="$CACHE_PATH"
else
echo "Not in cache. Fetching from archive.archlinux.org ..."
URL="https://archive.archlinux.org/packages/l/linux/${PKG_NAME}"
echo "URL: $URL"
if curl -fsSL --connect-timeout 15 -o "/tmp/${PKG_NAME}" "$URL"; then
TARGET="/tmp/${PKG_NAME}"
echo "Downloaded: $TARGET ($(du -h "$TARGET" | cut -f1))"
else
cat <<EOF >&2
Download failed from $URL.
Reasons might be:
- chroot has no working DNS / no outbound network
- the specific version is no longer on archive.archlinux.org
- upstream temporarily unavailable
Workarounds:
1. Test network from chroot:
curl -v https://archive.archlinux.org/
2. Manually download on your client:
curl -O $URL
and SCP into rescue, then place at:
/mnt/tmp/${PKG_NAME}
(Inside the chroot it appears as /tmp/${PKG_NAME}.)
3. Pick a different version — list at:
https://archive.archlinux.org/packages/l/linux/
EOF
exit 1
fi
fi
banner "/boot space before"
df -h /boot
ls -lh /boot
banner "downgrading kernel (pacman -U)"
pacman -U --noconfirm "$TARGET"
banner "regenerating initramfs"
mkinitcpio -P
banner "regenerating GRUB config"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "/boot space after"
df -h /boot
ls -lh /boot
banner "result"
pacman -Q linux
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If system boots and SSH works:
→ root cause confirmed = linux $CURR incompatible on this hardware.
Pin the kernel by adding to /etc/pacman.conf:
IgnorePkg = linux
OR install linux-lts and switch to it as the primary kernel.
3. If still unbootable:
→ kernel was not the cause. Next bisection target: systemd.
EOF
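
The `grep | tail -1 | sed` pipeline that derives `PREV` amounts to "take the old version from the last `upgraded linux (old -> new)` entry". A small Python sketch of the same parse follows; the function name `previous_kernel` and the sample log lines (timestamps and version strings) are made up for illustration.

```python
import re
from typing import Optional

# Matches only the `linux` package: the literal " (" right after the name
# keeps entries such as linux-firmware or linux-lts from matching.
_UPGRADE = re.compile(r"\[ALPM\] upgraded linux \((\S+) -> (\S+)\)")


def previous_kernel(log_text: str) -> Optional[str]:
    """Return the version that was replaced by the most recent linux upgrade."""
    matches = _UPGRADE.findall(log_text)
    return matches[-1][0] if matches else None


if __name__ == "__main__":
    sample = (
        "[2026-05-01T10:00:00+0200] [ALPM] upgraded linux (6.14.4.arch1-1 -> 6.14.5.arch1-1)\n"
        "[2026-05-10T10:00:00+0200] [ALPM] upgraded linux-firmware (20260410-1 -> 20260501-1)\n"
        "[2026-05-10T10:00:01+0200] [ALPM] upgraded linux (6.14.5.arch1-1 -> 6.14.6.arch1-2)\n"
    )
    print(previous_kernel(sample))
```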


@@ -0,0 +1,69 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Rewrites every
# systemd-networkd *.network file's [Match] block to use MACAddress= instead
# of Name=. This makes the network config survive kernel / systemd upgrades
# that may rename the interface (predictable naming changes, driver enum).
#
# The MAC is auto-detected via `ip link show` (visible because the chroot
# shares the rescue system's network namespace: same physical NIC, same MAC).
#
# Idempotent: a .network file that already uses MACAddress= is skipped.
# Backups are kept once at <file>.hal-backup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "detecting NIC MAC"
# Pick the first non-loopback link with a colon-formatted MAC.
MAC=$(ip -br link show 2>/dev/null \
| awk '$1 != "lo" && $1 != "" && $3 ~ /^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$/ {print $3; exit}')
if [ -z "$MAC" ]; then
echo "Could not auto-detect a non-loopback MAC. Aborting." >&2
exit 1
fi
echo "Detected MAC: $MAC"
banner ".network files (before)"
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
echo "-- $f:"
cat "$f"
echo
done
banner "patching"
changed=0
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
if grep -qE '^[[:space:]]*MACAddress[[:space:]]*=' "$f"; then
echo "$f: already uses MACAddress= — skipping"
continue
fi
if ! grep -qE '^[[:space:]]*Name[[:space:]]*=' "$f"; then
echo "$f: no Name= match — skipping"
continue
fi
[ -f "$f.hal-backup" ] || cp -a "$f" "$f.hal-backup"
awk -v mac="$MAC" '
BEGIN { replaced=0 }
/^[[:space:]]*Name[[:space:]]*=/ && !replaced { print "MACAddress=" mac; replaced=1; next }
{ print }
' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
echo "$f: patched (backup at $f.hal-backup)"
changed=1
done
[ "$changed" -eq 0 ] && echo "Nothing to patch: no .network file has a Name= match left to rewrite."
banner ".network files (after)"
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
echo "-- $f:"
cat "$f"
echo
done
banner "summary"
echo "Done. The change takes effect on the NEXT boot of the installed system."
echo "Backups (if any) are at /etc/systemd/network/*.network.hal-backup."
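
The patch step (skip files that already match by MAC, otherwise replace the first `Name=` line) can be sketched as a pure text transformation. The function name `match_by_mac` and the sample file content are assumptions for illustration; like the awk program above, the sketch replaces the first `Name=` line wherever it appears and does not parse sections.

```python
import re

_NAME_LINE = re.compile(r"^\s*Name\s*=")
_MAC_LINE = re.compile(r"^\s*MACAddress\s*=")


def match_by_mac(text: str, mac: str) -> str:
    """Return `text` with the first Name= line replaced by MACAddress=<mac>.

    Text that already contains a MACAddress= line, or has no Name= line,
    is returned unchanged, which makes the rewrite idempotent.
    """
    lines = text.splitlines()
    if any(_MAC_LINE.match(line) for line in lines):
        return text
    out = []
    replaced = False
    for line in lines:
        if not replaced and _NAME_LINE.match(line):
            out.append(f"MACAddress={mac}")
            replaced = True
        else:
            out.append(line)
    return ("\n".join(out) + "\n") if replaced else text


if __name__ == "__main__":
    sample = "[Match]\nName=enp0s31f6\n\n[Network]\nAddress=192.0.2.10/32\n"
    print(match_by_mac(sample, "aa:bb:cc:dd:ee:ff"), end="")
```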


@@ -0,0 +1,124 @@
#!/bin/bash
# Replaces `ip=dhcp` in /etc/default/grub with a static kernel-cmdline
# network spec derived from the existing /etc/systemd/network/*.network file.
#
# Why: Dropbear-in-initramfs relies on a working network for remote LUKS
# unlock. On Hetzner Dedicated, `ip=dhcp` is fragile — Hetzner's own docs
# recommend static configuration for FDE+Dropbear setups. A kernel/iproute2
# upgrade can subtly change the DHCP request format and break the
# previously-working DHCP path.
#
# The .network file already has the correct values (IP, gateway). This
# script reuses them in the kernel cmdline so dropbear has network in
# initramfs without depending on Hetzner DHCP.
#
# Resulting cmdline format (Linux kernel `ip=` documented form):
# ip=<client>:<server>:<gateway>:<netmask>:<hostname>:<device>:<protocol>
# We use:
# ip=46.4.224.77::46.4.224.65:255.255.255.255:echoserver:eth0:none
#
# Idempotent: re-running won't double-patch.
# Reversible: original /etc/default/grub backed up to .hal-backup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "locating systemd-networkd config"
NETFILE=""
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
NETFILE="$f"
break
done
if [ -z "$NETFILE" ]; then
echo "ERROR: no /etc/systemd/network/*.network file found."
echo " Cannot derive static IP/gateway."
exit 1
fi
echo "Using: $NETFILE"
echo
cat "$NETFILE"
banner "parsing"
# IPv4 address: first Address= or [Address]/Address= line without colon.
IPV4=$(awk '
/^[[:space:]]*Address[[:space:]]*=/ {
sub(/^[[:space:]]*Address[[:space:]]*=[[:space:]]*/, "")
if ($0 !~ /:/) { print; exit }
}
' "$NETFILE")
IPV4_BARE="${IPV4%%/*}"
# Gateway: first IPv4 Gateway= line.
GATEWAY=$(awk '
/^[[:space:]]*Gateway[[:space:]]*=/ {
sub(/^[[:space:]]*Gateway[[:space:]]*=[[:space:]]*/, "")
if ($0 !~ /:/) { print; exit }
}
' "$NETFILE")
HOST="$(cat /etc/hostname 2>/dev/null | head -1 | tr -d ' \t\n' || true)"
[ -z "$HOST" ] && HOST="host"
# Device: 'eth0' matches the kernel pre-udev naming of the first ethernet
# interface and is what Hetzner uses in their FDE-static-IP docs.
DEVICE="eth0"
echo " IPv4: $IPV4_BARE"
echo " Gateway: $GATEWAY"
echo " Hostname: $HOST"
echo " Device: $DEVICE"
if [ -z "$IPV4_BARE" ] || [ -z "$GATEWAY" ]; then
echo "ERROR: could not parse IPv4 address or gateway from $NETFILE."
exit 1
fi
IPSPEC="ip=${IPV4_BARE}::${GATEWAY}:255.255.255.255:${HOST}:${DEVICE}:none"
echo
echo "Will set kernel cmdline param: $IPSPEC"
banner "current /etc/default/grub"
cat /etc/default/grub
banner "patching /etc/default/grub"
if grep -qE 'ip=dhcp' /etc/default/grub; then
[ -f /etc/default/grub.hal-backup ] || cp -a /etc/default/grub /etc/default/grub.hal-backup
# Replace just the ip=dhcp token (leaves all other kernel params untouched)
sed -i -E "s|ip=dhcp|${IPSPEC}|g" /etc/default/grub
echo "Replaced ip=dhcp → $IPSPEC"
echo "Backup: /etc/default/grub.hal-backup"
elif grep -qE "ip=${IPV4_BARE//./\\.}::" /etc/default/grub; then
echo "Static ip= already configured for $IPV4_BARE — no change."
elif grep -qE 'ip=' /etc/default/grub; then
echo "WARNING: /etc/default/grub has an ip= directive that's neither dhcp"
echo " nor the expected static spec. Manual review needed:"
grep -nE 'ip=' /etc/default/grub
echo "Aborting — won't blindly overwrite an unknown ip= value."
exit 1
else
echo "No ip= directive found in GRUB_CMDLINE_LINUX. Manual edit may be needed."
exit 1
fi
banner "patched /etc/default/grub"
cat /etc/default/grub
banner "regenerating /boot/grub/grub.cfg"
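# pipefail (scoped to a subshell) so a grub-mkconfig failure is not hidden
# by tail's exit status and `set -e` can abort the script.
( set -o pipefail; grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10 )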
banner "verifying"
echo "-- ip= lines in new grub.cfg:"
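# grep -m5 instead of `| head -5`: with a pipe, head's exit status (always 0)
# would keep the fallback message from ever printing.
grep -nE -m5 '\bip=' /boot/grub/grub.cfg || echo "(no ip= line found, unexpected)"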
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If system boots and SSH works:
→ Root cause was DHCP-in-initramfs fragility (Hetzner side / iproute2
behavior change). Static cmdline IP is the recommended permanent fix.
3. To revert (if anything goes wrong):
cp /etc/default/grub.hal-backup /etc/default/grub
grub-mkconfig -o /boot/grub/grub.cfg
EOF
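
To make the parsing and the resulting `ip=` format easier to follow, here is a small Python sketch of the same derivation. It is an illustration under assumptions, not the script's implementation: `first_ipv4_value` and `build_ipspec` are hypothetical names, and the netmask and device defaults are simply the values the script hard-codes.

```python
from __future__ import annotations

import re


def first_ipv4_value(network_text: str, key: str) -> str | None:
    """Return the first `key=` value that contains no colon (i.e. not IPv6)."""
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=\s*(.*)$")
    for line in network_text.splitlines():
        match = pattern.match(line)
        if match and ":" not in match.group(1):
            return match.group(1).strip()
    return None


def build_ipspec(network_text: str, hostname: str, device: str = "eth0") -> str:
    """Build ip=<client>:<server>:<gateway>:<netmask>:<hostname>:<device>:<protocol>."""
    address = first_ipv4_value(network_text, "Address")
    gateway = first_ipv4_value(network_text, "Gateway")
    if not address or not gateway:
        raise ValueError("could not parse IPv4 address or gateway")
    client = address.split("/", 1)[0]  # drop the /prefix, like ${IPV4%%/*}
    # <server> stays empty; protocol "none" disables autoconfiguration.
    return f"ip={client}::{gateway}:255.255.255.255:{hostname}:{device}:none"
```

IPv6 `Address=` and `Gateway=` lines are skipped by the colon check, which is the same heuristic the awk snippets use.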


@@ -0,0 +1,95 @@
#!/bin/bash
# Runs INSIDE the chroot. Full pacman -Syu + initramfs rebuild + GRUB refresh
# (config + MBR on every disk backing /boot's RAID).
#
# CRITICAL: pacman 7.x uses Linux Landlock for its sandbox protection. The
# Hetzner Rescue kernel does NOT enable Landlock, so pacman -Syu inside the
# chroot would fail at the database-sync step with:
# error: restricting filesystem access failed because Landlock is not supported
# error: switching to sandbox user 'alpm' failed!
# The --disable-sandbox flag works around this. Outside the rescue context
# (e.g. on the live system later) the flag is unnecessary.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
# Convert a partition path to its parent disk. lsblk fails inside our chroot
# (can't resolve PKNAME against the rescue-bound /sys), so use standard
# Linux device-naming conventions instead. (Same helper as fix/grub.sh.)
parent_disk() {
local part="$1"
case "$part" in
/dev/nvme[0-9]*n[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/mmcblk[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/loop[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/sd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
/dev/vd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
/dev/hd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
*)
local d
d=$(lsblk -no PKNAME "$part" 2>/dev/null | head -1)
[ -n "$d" ] && echo "/dev/$d"
;;
esac
}
banner "pre-upgrade state"
echo "-- key packages BEFORE:"
pacman -Q linux mkinitcpio systemd openssh dropbear cryptsetup mdadm lvm2 grub 2>&1 | head -15
echo
echo "-- /boot space BEFORE:"
df -h /boot
banner "running pacman -Syyu (with --disable-sandbox for Rescue kernel)"
pacman --disable-sandbox -Syyu --noconfirm
banner "rebuilding initramfs"
mkinitcpio -P
banner "identifying boot disks (members of md0)"
if [ ! -e /dev/md0 ]; then
echo "ERROR: /dev/md0 not present. RAID not assembled? Aborting GRUB step."
exit 1
fi
BOOT_DISKS=()
for part in $(mdadm --detail /dev/md0 2>/dev/null | awk '/active sync/ {print $NF}'); do
disk=$(parent_disk "$part")
[ -z "$disk" ] && { echo "WARN: cannot resolve parent disk for $part"; continue; }
already=0
for d in "${BOOT_DISKS[@]}"; do [ "$d" = "$disk" ] && already=1; done
[ "$already" -eq 0 ] && BOOT_DISKS+=("$disk")
done
echo "Boot disks: ${BOOT_DISKS[*]}"
banner "refreshing GRUB on all boot disks"
for disk in "${BOOT_DISKS[@]}"; do
echo
echo "-- grub-install --target=i386-pc --recheck $disk"
grub-install --target=i386-pc --recheck "$disk"
done
banner "regenerating /boot/grub/grub.cfg"
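# pipefail (scoped to a subshell) so a grub-mkconfig failure is not hidden
# by tail's exit status and `set -e` can abort the script.
( set -o pipefail; grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10 )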
banner "post-upgrade state"
echo "-- key packages AFTER:"
pacman -Q linux mkinitcpio systemd openssh dropbear cryptsetup mdadm lvm2 grub 2>&1 | head -15
echo
echo "-- /boot space AFTER:"
df -h /boot
banner "summary"
cat <<EOF
System fully upgraded. Boot stack refreshed:
- All packages on current state from Arch repos
- initramfs rebuilt for the current kernel
- GRUB stage1 + core.img re-written on all boot disks
- grub.cfg regenerated
Recommended next steps:
1. (Optional but recommended) Run \`hal fix static-ip <host>\` afterwards to
harden the initramfs network against future DHCP issues.
2. Exit chroot, umount -R /mnt, reboot, disable Rescue in Hetzner Robot.
3. Watch with: hal status <host>
EOF
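
The partition-to-disk mapping and the de-duplication of boot disks can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the regular expressions are stricter than the shell globs in `parent_disk()`, there is no `lsblk` fallback, and the sample `mdadm --detail` lines used to exercise it are invented.

```python
from __future__ import annotations

import re

# nvme0n1p2, mmcblk0p1, loop0p1: partition suffix is "p<N>"
_P_SUFFIX = re.compile(r"^(/dev/(?:nvme\d+n\d+|mmcblk\d+|loop\d+))p\d+$")
# sda1, vdb2, hdc3: partition suffix is just "<N>"
_NUM_SUFFIX = re.compile(r"^(/dev/(?:sd|vd|hd)[a-z]+)\d+$")


def parent_disk(part: str) -> str | None:
    """Map a partition path to its parent disk using naming conventions only."""
    for pattern in (_P_SUFFIX, _NUM_SUFFIX):
        match = pattern.match(part)
        if match:
            return match.group(1)
    return None  # unknown naming scheme; the shell version falls back to lsblk


def boot_disks(mdadm_detail: str) -> list[str]:
    """Collect unique parent disks of all 'active sync' members, in order."""
    disks: list[str] = []
    for line in mdadm_detail.splitlines():
        if "active sync" not in line:
            continue
        disk = parent_disk(line.split()[-1])
        if disk and disk not in disks:
            disks.append(disk)
    return disks
```

Keeping the result ordered and unique matters because `grub-install` is then run once per disk.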


@@ -0,0 +1,59 @@
#!/bin/bash
# Runs on the BOOTED Arch system (post-installimage, pre-encryption).
# Wires up dropbear + encryptssh + netconf for later remote-LUKS-unlock.
#
# Performs sections 3.1-3.5 of the README:
# - install busybox / mkinitcpio-{dropbear,utils,netconf}
# - copy authorized_keys to /etc/dropbear/root_key
# - regenerate OpenSSH host keys in PEM format
# - convert RSA host key to dropbear format
# - replace the HOOKS line in /etc/mkinitcpio.conf
#
# Re-running is safe, but every run regenerates the SSH host keys, so clients
# will see new fingerprints. A backup of /etc/mkinitcpio.conf is taken at
# first patch as /etc/mkinitcpio.conf.hal-backup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "installing dropbear + mkinitcpio plugins"
pacman -S --noconfirm --needed \
busybox mkinitcpio-dropbear mkinitcpio-utils mkinitcpio-netconf
banner "copying authorized_keys to /etc/dropbear/root_key"
install -d -m 0755 /etc/dropbear
install -m 0600 /root/.ssh/authorized_keys /etc/dropbear/root_key
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
banner "enabling sshd"
systemctl enable sshd
banner "regenerating OpenSSH host keys (PEM format)"
rm -f /etc/ssh/ssh_host_*
ssh-keygen -A -m PEM
banner "importing RSA host key into dropbear"
dropbearconvert openssh dropbear \
/etc/ssh/ssh_host_rsa_key /etc/dropbear/dropbear_rsa_host_key
banner "patching HOOKS in /etc/mkinitcpio.conf"
[ -f /etc/mkinitcpio.conf.hal-backup ] \
|| cp -a /etc/mkinitcpio.conf /etc/mkinitcpio.conf.hal-backup
# Replace any existing HOOKS=(...) line with the encryptssh-enabled set.
sed -i -E \
's|^HOOKS=.*|HOOKS=(base udev autodetect modconf block mdadm_udev lvm2 netconf dropbear encryptssh filesystems keyboard fsck)|' \
/etc/mkinitcpio.conf
echo "HOOKS line is now:"
grep '^HOOKS=' /etc/mkinitcpio.conf
banner "done"
cat <<EOF
Next steps:
1. Activate Hetzner Rescue in the Robot, then reboot the server.
2. From your client: hal connect rescue <host>
3. Inside rescue: hal setup encrypt-root <host>
4. After that: hal setup grub <host>
EOF
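
The `sed` step that rewrites the `HOOKS=` line can be sketched in Python to show what it does and does not touch. This is a sketch only; `patch_hooks` is a hypothetical name, and the hook list is copied from the script above.

```python
import re

HOOKS_LINE = (
    "HOOKS=(base udev autodetect modconf block mdadm_udev lvm2 "
    "netconf dropbear encryptssh filesystems keyboard fsck)"
)


def patch_hooks(conf_text: str) -> str:
    """Replace every line that starts with HOOKS= by the encryptssh hook set.

    Commented lines such as "# HOOKS=(...)" do not start with HOOKS= and are
    left alone, matching the anchored ^HOOKS= pattern in the sed call.
    """
    # A lambda avoids any backslash/group interpretation in the replacement.
    return re.sub(r"^HOOKS=.*$", lambda _m: HOOKS_LINE, conf_text, flags=re.MULTILINE)
```

Because the replacement line itself matches `^HOOKS=`, applying the function a second time leaves the text unchanged.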


@@ -0,0 +1,106 @@
#!/bin/bash
# Runs IN HETZNER RESCUE (NOT in chroot). Re-creates the root LV stack on
# top of LUKS, preserving the installed Arch by copying it through /oldroot.
#
# Performs sections 4.4-4.15 of the README in one go:
# 4.4 mount the unencrypted /dev/mapper/vg0-root
# 4.5 cp -va /mnt → /oldroot with RAID resync throttled (speed_limit_max=0)
# 4.6 umount /mnt
# 4.7 vgremove vg0
# 4.8 cat /proc/mdstat (display)
# 4.9 luksFormat /dev/md1 (prompts for NEW passphrase!)
# luksOpen + recreate LVM (vg0 with swap + root)
# mkfs.btrfs / mkswap
# 4.10 mount the encrypted root at /mnt
# 4.12 cp -va /oldroot back into /mnt with RAID resync throttled
# 4.13 bind /dev /sys /proc, mount /boot
# 4.14 echo cryptroot line into /mnt/etc/crypttab
# 4.15 chroot + mkinitcpio -P
#
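# DESTRUCTIVE: /dev/md1 will be re-formatted with LUKS. Any data not under
# /mnt (vg0-root) is lost. Confirmation is prompted before the format step,
# but vgremove (4.7) has already run by then: aborting at the prompt leaves
# the system only in /oldroot.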
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "4.4 mount existing unencrypted root"
vgscan -v
vgchange -a y
mount /dev/mapper/vg0-root /mnt
banner "4.5 copy current system to /oldroot (RAID resync throttled during copy)"
mkdir -p /oldroot
echo 0 > /proc/sys/dev/raid/speed_limit_max
cp -va /mnt/. /oldroot/.
echo 200000 > /proc/sys/dev/raid/speed_limit_max
banner "4.6 unmount original root"
umount /mnt
banner "4.7 remove unencrypted VG (frees /dev/md1)"
vgremove -f vg0
banner "4.8 RAID state"
cat /proc/mdstat
banner "CONFIRMATION REQUIRED"
echo "About to luksFormat /dev/md1. This is DESTRUCTIVE for /dev/md1."
echo "Type 'YES' to continue (anything else aborts):"
read -r confirm
if [ "$confirm" != "YES" ]; then
echo "Aborted by user before luksFormat. /oldroot still has your data;"
echo "you can re-create the original LVM by hand from there if needed."
exit 1
fi
banner "4.9 LUKS format /dev/md1 (you will be prompted for the NEW passphrase)"
cryptsetup --cipher aes-xts-plain64 --key-size 256 --hash sha256 \
--iter-time 10000 luksFormat /dev/md1
banner "4.9b open the LUKS volume (re-enter the same passphrase)"
cryptsetup luksOpen /dev/md1 cryptroot
banner "4.9c recreate LVM on top of /dev/mapper/cryptroot"
pvcreate /dev/mapper/cryptroot
vgcreate vg0 /dev/mapper/cryptroot
lvcreate -n swap -L 8G vg0
lvcreate -n root -l 100%FREE vg0
mkfs.btrfs /dev/vg0/root
mkswap /dev/vg0/swap
banner "4.10 mount the encrypted root"
mount /dev/vg0/root /mnt
banner "4.12 copy system back into the encrypted root"
echo 0 > /proc/sys/dev/raid/speed_limit_max
cp -va /oldroot/. /mnt/.
echo 200000 > /proc/sys/dev/raid/speed_limit_max
banner "4.13 bind-mount /dev /sys /proc, mount /boot"
mount /dev/md0 /mnt/boot
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
banner "4.14 append cryptroot line to /etc/crypttab"
if ! grep -qE '^cryptroot[[:space:]]' /mnt/etc/crypttab 2>/dev/null; then
echo "cryptroot /dev/md1 none luks" >> /mnt/etc/crypttab
fi
grep cryptroot /mnt/etc/crypttab
banner "4.15 regenerate initramfs inside chroot"
chroot /mnt /bin/bash -c "mkinitcpio -P"
banner "done"
cat <<EOF
Encryption setup complete. /oldroot can be deleted manually after you've
confirmed the encrypted boot works.
Recommended next steps:
hal setup grub <host> # configures GRUB for LUKS-encrypted root
hal connect rescue <host> reboot
# Disable rescue in Hetzner Robot
hal status <host> # poll for dropbear / sshd
hal unlock <host> # send LUKS passphrase to dropbear
EOF
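
Step 4.14 appends the `cryptroot` entry only if it is missing. A minimal Python sketch of that guard follows; `ensure_cryptroot` is a hypothetical name, and unlike the `echo >>` in the script it also inserts a newline when the existing file does not end with one.

```python
import re

CRYPTTAB_LINE = "cryptroot /dev/md1 none luks"


def ensure_cryptroot(crypttab_text: str) -> str:
    """Append the cryptroot entry unless a line already starts with it."""
    if re.search(r"^cryptroot\s", crypttab_text, flags=re.MULTILINE):
        return crypttab_text  # entry present; do not add a duplicate
    # Assumption beyond the shell version: repair a missing final newline.
    needs_newline = crypttab_text != "" and not crypttab_text.endswith("\n")
    separator = "\n" if needs_newline else ""
    return crypttab_text + separator + CRYPTTAB_LINE + "\n"
```

The check is anchored at line start, so a commented-out `# cryptroot ...` line does not count as an existing entry.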


@@ -0,0 +1,78 @@
#!/bin/bash
# Runs INSIDE the chroot. Initial GRUB install for the LUKS-encrypted root.
# Performs sections 5.1-5.3 of the README:
# - install the grub package
# - write /etc/default/grub with the LUKS cmdline + GRUB_ENABLE_CRYPTODISK=y
# - grub-mkconfig
# - grub-install on every disk backing /boot's RAID
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
# Convert a partition path to its parent disk. (Same helper as fix/grub.sh.)
parent_disk() {
local part="$1"
case "$part" in
/dev/nvme[0-9]*n[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/mmcblk[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
/dev/sd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
/dev/vd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
*)
local d
d=$(lsblk -no PKNAME "$part" 2>/dev/null | head -1)
[ -n "$d" ] && echo "/dev/$d"
;;
esac
}
banner "installing grub package"
pacman -S --noconfirm --needed grub
banner "writing /etc/default/grub for LUKS boot"
[ -f /etc/default/grub.hal-backup ] || cp -a /etc/default/grub /etc/default/grub.hal-backup
cat > /etc/default/grub <<'GRUBEOF'
# hetzner-arch-luks default grub config
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0"
GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=dhcp"
GRUB_PRELOAD_MODULES="part_gpt part_msdos"
GRUB_ENABLE_CRYPTODISK=y
GRUB_TIMEOUT_STYLE=menu
GRUB_TERMINAL_INPUT=console
GRUB_GFXMODE=auto
GRUB_GFXPAYLOAD_LINUX=keep
GRUB_DISABLE_RECOVERY=true
GRUBEOF
echo "Wrote /etc/default/grub. Showing relevant lines:"
grep -E '^GRUB_(CMDLINE_LINUX|ENABLE_CRYPTODISK|PRELOAD_MODULES)=' /etc/default/grub
banner "identifying boot disks (members of md0)"
BOOT_DISKS=()
for part in $(mdadm --detail /dev/md0 2>/dev/null | awk '/active sync/ {print $NF}'); do
disk=$(parent_disk "$part")
[ -z "$disk" ] && continue
already=0
for d in "${BOOT_DISKS[@]}"; do [ "$d" = "$disk" ] && already=1; done
[ "$already" -eq 0 ] && BOOT_DISKS+=("$disk")
done
echo "Boot disks: ${BOOT_DISKS[*]}"
banner "grub-mkconfig"
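# pipefail (scoped to a subshell) so a grub-mkconfig failure is not hidden
# by tail's exit status and `set -e` can abort the script.
( set -o pipefail; grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10 )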
banner "grub-install on each boot disk"
for disk in "${BOOT_DISKS[@]}"; do
echo "-- grub-install --target=i386-pc --recheck $disk"
grub-install --target=i386-pc --recheck "$disk"
done
banner "done"
cat <<EOF
GRUB installed for LUKS-encrypted boot.
Recommended next step: hal fix static-ip <host> (replaces ip=dhcp with a
static kernel-cmdline IP, making the initramfs network independent of DHCP).
EOF
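
A quick way to check what the generated `/etc/default/grub` puts on the kernel command line is to parse `GRUB_CMDLINE_LINUX` into key/value pairs. The following Python sketch does that; `grub_cmdline_params` is a hypothetical helper and handles only the simple single-line, quoted form written by the script above.

```python
import shlex


def grub_cmdline_params(grub_default_text: str, key: str = "GRUB_CMDLINE_LINUX") -> dict:
    """Parse KEY="a=b c=d" from /etc/default/grub text into {"a": "b", "c": "d"}."""
    for line in grub_default_text.splitlines():
        line = line.strip()
        if not line.startswith(key + "="):
            continue  # also skips GRUB_CMDLINE_LINUX_DEFAULT= and comments
        value = line.split("=", 1)[1]
        # shlex strips the surrounding quotes; then split on whitespace.
        params = " ".join(shlex.split(value)).split()
        return {p.partition("=")[0]: p.partition("=")[2] for p in params}
    return {}
```

With the file written by the script, the `ip` entry is `dhcp` until the static-IP fix replaces it.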


@@ -0,0 +1,145 @@
"""SSH helpers using OpenSSH ControlMaster for connection reuse.

The `SshSession` context manager opens a single SSH connection on enter
(interactive: password / host key accept happens here once) and then runs
follow-up commands over the same multiplexed channel without re-auth.

We deliberately wrap the OpenSSH client rather than using a library like
paramiko so the user's existing config (~/.ssh/config, agent, key files,
known_hosts) just works.
"""
from __future__ import annotations

import os
import shutil
import socket
import subprocess
import tempfile
import time


def remove_stale_known_hosts(host: str) -> None:
    """Drop any cached host key for `host`.

    Each Hetzner rescue activation generates a fresh host key, so a stale
    entry would otherwise block the connection with a MITM warning.
    """
    known = os.path.expanduser("~/.ssh/known_hosts")
    if not os.path.exists(known):
        return
    subprocess.run(
        ["ssh-keygen", "-f", known, "-R", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def tcp_reachable(host: str, port: int, timeout: float = 3) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, socket.timeout):
        return False


def wait_for_port(host: str, port: int = 22, timeout: int = 300, interval: int = 2) -> bool:
    """Block until host:port accepts TCP or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if tcp_reachable(host, port, timeout=2):
            return True
        time.sleep(interval)
    return False


class SshSession:
    """Persistent SSH connection to one host via OpenSSH ControlMaster.

    Use as a context manager. The master is opened by running a no-op remote
    command during __enter__; this is where interactive prompts (password,
    host key acceptance) happen. Subsequent `run()` calls reuse the cached
    connection.

    Example:
        with SshSession("rescue.example.com") as ssh:
            ssh.run("uname -a")
            ssh.run("cat", input_=b"hello")
            ssh.run("/bin/bash", tty=True)  # interactive shell
    """

    def __init__(self, host: str, user: str = "root"):
        self.host = host
        self.user = user
        self._tmpdir: str | None = None
        self._sock: str | None = None

    # ---- context management -------------------------------------------------

    def __enter__(self) -> "SshSession":
        self._tmpdir = tempfile.mkdtemp(prefix="hal-ssh-")
        self._sock = os.path.join(self._tmpdir, "ctl")
        remove_stale_known_hosts(self.host)
        # Open the master with a quick no-op. Auth (and any TTY prompts) happen
        # right here. After this returns, the socket at self._sock is live and
        # follow-up ssh invocations reusing it skip auth entirely.
        cmd = [
            "ssh",
            "-o", "ControlMaster=auto",
            "-o", f"ControlPath={self._sock}",
            "-o", "ControlPersist=10m",
            "-o", "StrictHostKeyChecking=accept-new",
            "-o", "ServerAliveInterval=30",
            f"{self.user}@{self.host}",
            "true",
        ]
        subprocess.run(cmd, check=True)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if self._sock and os.path.exists(self._sock):
            subprocess.run(
                [
                    "ssh", "-o", f"ControlPath={self._sock}",
                    "-O", "exit", f"{self.user}@{self.host}",
                ],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        if self._tmpdir and os.path.isdir(self._tmpdir):
            shutil.rmtree(self._tmpdir, ignore_errors=True)

    # ---- remote execution ---------------------------------------------------

    def run(
        self,
        remote_cmd: str,
        *,
        tty: bool = False,
        input_: bytes | None = None,
        check: bool = True,
        capture: bool = False,
    ) -> subprocess.CompletedProcess:
        """Run `remote_cmd` on the remote host over the multiplexed channel.

        remote_cmd : Shell command(s) as a single string. Newlines are OK: the
                     remote shell parses them as multiple statements.
        tty        : Allocate a remote pseudo-tty (needed for interactive
                     tools like `bash` or things using /dev/tty).
        input_     : Bytes to feed to the remote command's stdin. Mutually
                     exclusive with tty (no terminal if stdin is a pipe).
        check      : Raise CalledProcessError on non-zero exit.
        capture    : Capture stdout/stderr in the returned CompletedProcess
                     instead of inheriting the parent's.
        """
        if tty and input_ is not None:
            raise ValueError("tty=True is incompatible with feeding stdin via input_")
        cmd = ["ssh", "-o", f"ControlPath={self._sock}"]
        if tty:
            cmd += ["-t"]
        cmd += [f"{self.user}@{self.host}", remote_cmd]
        kwargs: dict = {"check": check}
        if input_ is not None:
            kwargs["input"] = input_
        if capture:
            kwargs["capture_output"] = True
        return subprocess.run(cmd, **kwargs)
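
To see how the master connection and the follow-up commands differ on the command line, the two argument lists can be built by pure functions. This is a sketch for illustration only: `master_argv` and `run_argv` are hypothetical names that mirror the lists assembled in `__enter__` and `run()`, and nothing is executed.

```python
from __future__ import annotations


def master_argv(sock: str, user: str, host: str) -> list[str]:
    """Argument list that opens the ControlMaster with a no-op `true`."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={sock}",
        "-o", "ControlPersist=10m",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", "ServerAliveInterval=30",
        f"{user}@{host}",
        "true",
    ]


def run_argv(sock: str, user: str, host: str, remote_cmd: str, tty: bool = False) -> list[str]:
    """Argument list for a follow-up command that reuses the master socket."""
    argv = ["ssh", "-o", f"ControlPath={sock}"]
    if tty:
        argv.append("-t")  # request a remote pseudo-tty
    argv += [f"{user}@{host}", remote_cmd]
    return argv
```

Only the master carries the `ControlMaster`, `ControlPersist` and host-key options; follow-up calls need just the `ControlPath`, which is why they skip authentication.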