Added hal Python CLI

Wraps the rescue/chroot/diagnose/fix workflows in a single tool with
LUKS-passphrase keyring caching. Subcommands: status, connect rescue,
connect chroot, diagnose, fix-boot, fix-network, downgrade-kernel,
downgrade-initramfs, reinstall-grub, use-static-ip, upgrade-system,
forget-passphrase.

connect subcommands accept an optional remote command after the host
for non-interactive execution.

README updated to reference hal instead of the previous shell scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kevin Veen-Birkenbach
2026-05-12 17:03:59 +02:00
parent 841a974123
commit 181240eae7
20 changed files with 1718 additions and 5 deletions

.claude/settings.json Normal file

@@ -0,0 +1,29 @@
{
  "permissions": {
    "allow": [
      "Edit",
      "Write",
      "Bash(*)",
      "WebFetch(domain:pypi.org)",
      "WebFetch(domain:files.pythonhosted.org)",
      "Bash(python3 -c ' *)",
      "WebFetch(domain:api.github.com)"
    ],
    "ask": [
      "Bash(*hal *)",
      "Bash(*hetzner_arch_luks *)",
      "Bash(ssh *)",
      "Bash(scp *)",
      "Bash(sftp *)"
    ]
  },
  "sandbox": {
    "enabled": true,
    "autoAllowBashIfSandboxed": true,
    "network": {
      "allowedDomains": [
        "*"
      ]
    }
  }
}

.codex Normal file (empty)

.gitignore vendored Normal file

@@ -0,0 +1,39 @@
# Python build / runtime artifacts
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
*.egg
.eggs/
build/
dist/
wheels/
pip-wheel-metadata/
# Virtual environments
.venv/
venv/
env/
ENV/
# Tooling caches
.pytest_cache/
.mypy_cache/
.ruff_cache/
.tox/
.coverage
.coverage.*
htmlcov/
# Editor / IDE
.idea/
.vscode/
*.swp
*~
.DS_Store
# Claude Code: personal overrides (settings.json itself is checked in)
.claude/settings.local.json
# Diagnostic output from `hal diagnose ... | tee diagnose-*.log`
diagnose-*.log

Makefile Normal file

@@ -0,0 +1,39 @@
# Top-level targets for the hetzner-arch-luks helper package.
#
# Usage:
#   make install     # editable install for the current user
#   make uninstall
#   make clean       # remove Python build artifacts
#   make check       # quick smoke tests (imports + --help)

PYTHON ?= python3
PIP ?= $(PYTHON) -m pip

.DEFAULT_GOAL := help
.PHONY: help install install-system uninstall clean check

help:
	@echo "Targets:"
	@echo "  install         pip install --user -e ."
	@echo "  install-system  pip install -e . (system-wide; needs sudo or venv)"
	@echo "  uninstall       remove the installed package"
	@echo "  clean           remove __pycache__, *.egg-info, build/, dist/"
	@echo "  check           run package smoke tests"

install:
	$(PIP) install --user -e .

install-system:
	$(PIP) install -e .

uninstall:
	$(PIP) uninstall -y hetzner-arch-luks

clean:
	rm -rf build dist
	find . -type d -name '__pycache__' -prune -exec rm -rf {} +
	find . -type d -name '*.egg-info' -prune -exec rm -rf {} +

check:
	$(PYTHON) -m hetzner_arch_luks --help >/dev/null
	$(PYTHON) -c "from hetzner_arch_luks import cli, ssh, probe, remote; print('imports OK')"


@@ -23,12 +23,30 @@ The following symbols show in which environment the code is executed:
* :ghost: Chroot from Rescue System into Arch
* :minidisc: Arch OS
## CLI helper (`hal`)
This repo ships a small Python CLI (`hal`) that wraps the recurring SSH / LUKS / chroot dances. Install it once on your client:
```bash
pip install --user -e .
```
After that, `hal` is on your `$PATH`, provided pip's user script directory (typically `~/.local/bin`) is listed there. Subcommands used throughout the guide:
| Command | What it does |
|---|---|
| `hal status <host>` | Probe reachability (ping, ports 22/222, SSH banner). No login. |
| `hal connect rescue <host>` | Wait for rescue, drop known_hosts entry, SSH in as root. |
| `hal connect chroot <host>` | Prompt LUKS passphrase **first** (hidden), then via rescue: assemble RAID → unlock LUKS → mount → drop into `chroot /mnt /bin/bash`. |
| `hal diagnose <host>` | Same setup as `connect chroot`, then runs a fixed diagnostic script inside the chroot and prints the report to stdout. |
The passphrase prompt happens *before* the SSH connection is established, so you can type it once, walk away, and the rest runs unattended.
## Guide
### 1. Configure and Install Image
#### 1.1 Login to Hetzner Rescue System
:computer: :
```bash
-ssh root@your_server_ip
+hal connect rescue your_server_ip
```
#### 1.2 Create the /autosetup
@@ -154,8 +172,7 @@ reboot
#### 4.3 Login to the rescue system
:computer: :
```bash
-ssh-keygen -f "$HOME/.ssh/known_hosts" -R your_server_ip
-ssh root@your_server_ip
+hal connect rescue your_server_ip
```
#### 4.4 Mount the "system"
@@ -301,6 +318,26 @@ btrfs filesystem resize max /
## 8. Debugging
### 8.1 Login to System from Rescue System
With the rescue system already activated and running, drop straight into the chroot from your client:
:computer: :
```bash
hal connect chroot your_server_ip
```
You'll be prompted for the LUKS passphrase first (hidden input). The CLI then waits for rescue, assembles the RAID, opens LUKS, activates LVM, mounts `/mnt` + `/mnt/boot` + the pseudo-filesystems, and drops you into `chroot /mnt /bin/bash`. Idempotent — re-running while already mounted just re-enters the chroot.
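The "waits for rescue" step can be pictured as a TCP poll loop. The following is a minimal sketch of such a wait, not the code shipped in this commit: the function name `wait_for_port_sketch` and its `interval` parameter are assumptions made for illustration.

```python
import socket
import time


def wait_for_port_sketch(host: str, port: int, timeout: float = 300.0,
                         interval: float = 2.0) -> bool:
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass.

    Hypothetical helper for illustration; not the project's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            # Sleep briefly before retrying, but never past the deadline.
            time.sleep(min(interval, remaining))
```

A monotonic clock is used so that wall-clock adjustments during a long wait do not shorten or extend the deadline.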
### 8.2 Collect diagnostics in one shot
If you want a non-interactive snapshot of the installed system's state (package versions, last-boot journal errors, sshd status, `/boot` contents, etc.):
:computer: :
```bash
hal diagnose your_server_ip | tee "diagnose-$(date +%F-%H%M).log"
```
The CLI runs the same setup as `connect chroot` and then a fixed inspection script inside the chroot. Output goes to stdout (and the log file via `tee`).
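Because the inspection script prints each group under a banner of the form `========== title ==========`, a saved log can be split into sections afterwards. The sketch below shows one possible way to do that; `split_sections` is a hypothetical helper and not part of `hal`.

```python
import re
from typing import Dict, List, Optional

# Matches the banner lines printed by the diagnose script:
# ten '=' characters, a space, the title, a space, ten '=' characters.
_BANNER = re.compile(r"^={10} (.+) ={10}$")


def split_sections(log_text: str) -> Dict[str, List[str]]:
    """Group non-empty log lines under the banner title that precedes them.

    Lines before the first banner (for example CLI progress messages)
    are ignored.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        match = _BANNER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections
```

Reading a log with `split_sections(open(path).read())` then allows looking up a single section by its banner title.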
<details>
<summary>Manual equivalent of the unlock + mount sequence</summary>
:ambulance: :
```bash
cryptsetup luksOpen /dev/md1 cryptroot
@@ -311,7 +348,8 @@ mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
chroot /mnt
```
-### 8.2 Logout from chroot environment
+</details>
+### 8.3 Logout from chroot environment
:ghost: :ambulance: :
```bash
exit
@@ -321,7 +359,7 @@ sync
reboot
```
-### 8.3 Regenerate GRUB and Arch
+### 8.4 Regenerate GRUB and Arch
:ghost: :
```bash
mkinitcpio -p linux

pyproject.toml Normal file

@@ -0,0 +1,29 @@
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"
[project]
name = "hetzner-arch-luks"
version = "0.1.0"
description = "CLI helpers for the hetzner-arch-luks setup: connect to rescue, drop into the encrypted chroot, probe reachability, collect diagnostics."
readme = "README.md"
requires-python = ">=3.9"
authors = [{ name = "Kevin Veen-Birkenbach" }]
license = { text = "Proprietary" }
classifiers = [
"Environment :: Console",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3",
]
[project.scripts]
hal = "hetzner_arch_luks.cli:main"
[tool.setuptools]
package-dir = { "" = "src" }
[tool.setuptools.packages.find]
where = ["src"]
[tool.setuptools.package-data]
hetzner_arch_luks = ["resources/**/*.sh"]


@@ -0,0 +1 @@
__version__ = "0.1.0"


@@ -0,0 +1,4 @@
from .cli import main

if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,212 @@
"""Command-line interface for the hetzner-arch-luks helpers.
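
Entry point: hal <subcommand> <host>

Subcommands:
    status <host>               client-side reachability probe (no login)
    connect rescue <host>       SSH into the rescue system
    connect chroot <host>       LUKS unlock + mount + interactive chroot shell
    diagnose <host>             LUKS unlock + mount + collect diagnostics
    fix-boot <host>             apply boot/SSH fixes inside the chroot (mutates)
    fix-network <host>          rewrite .network files to match on MACAddress= (mutates)
    downgrade-kernel <host>     roll linux back to the previous cached version (mutates)
    downgrade-initramfs <host>  downgrade the initramfs toolchain and rebuild (mutates)
    reinstall-grub <host>       re-run grub-install on every disk backing /boot (mutates)
    use-static-ip <host>        replace ip=dhcp with a static kernel-cmdline spec (mutates)
    upgrade-system <host>       full upgrade + initramfs rebuild + GRUB refresh (mutates)
    forget-passphrase <host>    drop the cached LUKS passphrase from the keyring

For commands that need the LUKS passphrase, the prompt happens *first*, before
any network I/O, so you can type the passphrase, walk away, and the rest runs
unattended.
"""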
from __future__ import annotations
import argparse
import sys
from . import probe, remote
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
prog="hal",
description="Helper CLI for the hetzner-arch-luks workflow.",
)
sub = parser.add_subparsers(dest="cmd", required=True)
p_status = sub.add_parser(
"status",
help="Probe reachability of a host (ping + ports + SSH banner). No login.",
)
p_status.add_argument("host")
p_connect = sub.add_parser(
"connect",
help="Open an interactive remote shell.",
)
p_connect_sub = p_connect.add_subparsers(dest="target", required=True)
p_rescue = p_connect_sub.add_parser(
"rescue",
help="SSH into the Hetzner rescue system (waits for port 22 to come up). "
"Pass extra args after the host to run them non-interactively.",
)
p_rescue.add_argument("host")
p_rescue.add_argument(
"command",
nargs=argparse.REMAINDER,
help="Optional command + args to run on the rescue instead of opening "
"an interactive shell. Example: hal connect rescue HOST reboot",
)
p_chroot = p_connect_sub.add_parser(
"chroot",
help="Unlock LUKS via rescue, mount, and drop into chroot /mnt /bin/bash. "
"Pass extra args after the host to run them inside the chroot.",
)
p_chroot.add_argument("host")
p_chroot.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_chroot.add_argument(
"command",
nargs=argparse.REMAINDER,
help="Optional command + args to run inside the chroot instead of "
"opening an interactive shell. Example: hal connect chroot HOST pacman -Q linux",
)
p_diag = sub.add_parser(
"diagnose",
help="Collect diagnostics from inside the installed system via rescue.",
)
p_diag.add_argument("host")
p_diag.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_fix = sub.add_parser(
"fix-boot",
help="Apply boot/SSH fixes inside the chroot. MUTATES the installed system.",
)
p_fix.add_argument("host")
p_fix.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_fixnet = sub.add_parser(
"fix-network",
help="Rewrite systemd-networkd .network files to use MACAddress= match. MUTATES.",
)
p_fixnet.add_argument("host")
p_fixnet.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_dk = sub.add_parser(
"downgrade-kernel",
help="Roll the linux package back to the previous cached version. MUTATES. "
"Use after a kernel-bump pacman -Syu made the system unbootable.",
)
p_dk.add_argument("host")
p_dk.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_fp = sub.add_parser(
"forget-passphrase",
help="Drop the cached LUKS passphrase for a host from the libsecret keyring.",
)
p_fp.add_argument("host")
p_rg = sub.add_parser(
"reinstall-grub",
help="Re-run grub-install on every disk backing /boot. MUTATES the MBR. "
"Use after a grub-package upgrade that didn't refresh the bootloader.",
)
p_rg.add_argument("host")
p_rg.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_di = sub.add_parser(
"downgrade-initramfs",
help="Downgrade mkinitcpio + dropbear + cryptsetup + mdadm + lvm2 to the "
"version before the last pacman -Syu, then rebuild initramfs. MUTATES.",
)
p_di.add_argument("host")
p_di.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_si = sub.add_parser(
"use-static-ip",
help="Replace ip=dhcp in /etc/default/grub with a static kernel-cmdline "
"network spec (derived from /etc/systemd/network/*.network). MUTATES.",
)
p_si.add_argument("host")
p_si.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
p_us = sub.add_parser(
"upgrade-system",
help="Full pacman -Syyu + initramfs rebuild + grub-install on every boot disk "
"+ grub.cfg regen, all in one chroot session. Uses --disable-sandbox "
"to work around the Hetzner Rescue kernel's missing Landlock. MUTATES.",
)
p_us.add_argument("host")
p_us.add_argument(
"--no-passphrase-prompt",
action="store_true",
help="Skip the early LUKS prompt (use when LUKS is already open from a prior run).",
)
return parser
def main(argv: list[str] | None = None) -> int:
args = _build_parser().parse_args(argv)
if args.cmd == "status":
return probe.status(args.host)
if args.cmd == "connect" and args.target == "rescue":
return remote.connect_rescue(args.host, command=args.command or None)
if args.cmd == "connect" and args.target == "chroot":
return remote.connect_chroot(
args.host,
ask_passphrase=not args.no_passphrase_prompt,
command=args.command or None,
)
if args.cmd == "diagnose":
return remote.diagnose(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "fix-boot":
return remote.fix_boot(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "fix-network":
return remote.fix_network(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "downgrade-kernel":
return remote.downgrade_kernel(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "forget-passphrase":
return remote.forget_passphrase(args.host)
if args.cmd == "reinstall-grub":
return remote.reinstall_grub(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "downgrade-initramfs":
return remote.downgrade_initramfs(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "use-static-ip":
return remote.use_static_ip(args.host, ask_passphrase=not args.no_passphrase_prompt)
if args.cmd == "upgrade-system":
return remote.upgrade_system(args.host, ask_passphrase=not args.no_passphrase_prompt)
return 2
if __name__ == "__main__":
sys.exit(main())
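The optional trailing command on the `connect` subcommands relies on `argparse.REMAINDER`, and the collected words are later re-quoted with `shlex.quote` before they reach the remote shell. The following reduced sketch isolates that pattern; the parser name `hal-sketch` and the helper `to_remote_string` are illustrative names, not part of the package.

```python
import argparse
import shlex
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hal-sketch")
    sub = parser.add_subparsers(dest="cmd", required=True)
    connect = sub.add_parser("connect")
    targets = connect.add_subparsers(dest="target", required=True)
    rescue = targets.add_parser("rescue")
    rescue.add_argument("host")
    # REMAINDER collects every argument after the host, including
    # dash-prefixed ones such as "-Q", without parsing them as options.
    rescue.add_argument("command", nargs=argparse.REMAINDER)
    return parser


def to_remote_string(command: List[str]) -> str:
    """Quote each argument so the remote shell sees the original words."""
    return " ".join(shlex.quote(c) for c in command)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["connect", "rescue", "example-host", "pacman", "-Q", "linux"]
    )
    print(args.host, to_remote_string(args.command))
```

When no command follows the host, `args.command` is an empty list, which is why the dispatch code can use `args.command or None` to select the interactive path.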


@@ -0,0 +1,55 @@
"""Client-side reachability probes that need no SSH credentials."""

from __future__ import annotations

import shutil
import socket
import subprocess


def _have(cmd: str) -> bool:
    return shutil.which(cmd) is not None


def _ssh_banner(host: str, port: int = 22, timeout: float = 3) -> str:
    """Read the first line the SSH server emits on connect.

    Distinguishes Hetzner rescue (Debian OpenSSH banner) from installed Arch
    (Arch OpenSSH banner) from Dropbear (Dropbear banner).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(2)
            data = s.recv(256)
        return data.decode("utf-8", errors="replace").splitlines()[0] if data else ""
    except (OSError, socket.timeout, UnicodeDecodeError):
        return ""


def status(host: str) -> int:
    """Print a reachability report for `host`. Returns 0 always."""
    print(f"==> ping (ICMP) {host}")
    try:
        subprocess.run(["ping", "-c", "2", "-W", "2", host], check=False)
    except FileNotFoundError:
        print("(ping not available)")
    print()

    print(f"==> ports 22, 222 on {host}")
    if _have("nmap"):
        subprocess.run(["nmap", "-Pn", "-p", "22,222", host], check=False)
    else:
        print("(nmap not installed; falling back to TCP probes)")
        for port in (22, 222):
            ok = False
            try:
                with socket.create_connection((host, port), timeout=3):
                    ok = True
            except (OSError, socket.timeout):
                pass
            print(f" {port}: {'reachable' if ok else 'not reachable (filtered/closed/timeout)'}")
    print()

    print(f"==> SSH banner on {host}:22")
    banner = _ssh_banner(host, 22)
    print(banner if banner else "(no banner)")
    return 0
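
The banner printed by `status` is left for the reader to interpret. As a rough illustration of how the three cases named in the `_ssh_banner` docstring could be told apart, here is a small sketch. The substrings it checks are assumptions about typical identification strings (a `dropbear` token for Dropbear, a `Debian` suffix on Debian's OpenSSH builds); `classify_banner` is a hypothetical helper and not part of this commit.

```python
def classify_banner(banner: str) -> str:
    """Guess which SSH server answered, based on its identification string.

    The substring checks are assumptions about typical banners, not a
    specification; treat the result as a hint only.
    """
    text = banner.strip()
    if not text:
        return "none"
    lowered = text.lower()
    if "dropbear" in lowered:
        # Likely the initramfs unlock shell.
        return "dropbear"
    if "openssh" in lowered and "debian" in lowered:
        # Likely the Debian-based rescue system.
        return "debian-openssh"
    if "openssh" in lowered:
        # Likely the installed system.
        return "openssh"
    return "unknown"
```

A caller could pass the return value of `_ssh_banner(host)` straight into this function and print the label next to the raw banner.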


@@ -0,0 +1,303 @@
"""Orchestrates the rescue / chroot / diagnose flows over an SshSession.
Key UX choices:
- The LUKS passphrase is prompted *before* we touch the network, so the
user enters it once and can step away while the rest runs.
- On first prompt the passphrase is cached in the libsecret keyring
(GNOME Keyring / KWallet via secret-tool) so subsequent runs against
the same host skip the prompt entirely.
"""
from __future__ import annotations
import getpass
import importlib.resources
import shlex
import shutil
import subprocess
import sys
from .ssh import SshSession, wait_for_port
# Pre-LUKS step: assemble the RAID arrays. Idempotent (mdadm returns non-zero
# when arrays are already assembled — we swallow that).
_ASSEMBLE = "mdadm --assemble --scan 2>/dev/null || true"
# Post-LUKS step: activate LVM, mount root + boot, bind /dev /proc /sys /run.
# Idempotent: every mount is guarded with `mountpoint -q`.
_MOUNT = r"""
set -e
vgchange -ay >/dev/null
if ! mountpoint -q /mnt; then
mount /dev/vg0/root /mnt
mkdir -p /mnt/boot
mount /dev/md0 /mnt/boot
fi
for d in dev proc sys run; do
mountpoint -q "/mnt/$d" || mount --rbind "/$d" "/mnt/$d"
done
"""
# Schema for libsecret entries:
# service = hetzner-arch-luks
# host = <host>
_KEYRING_SERVICE = "hetzner-arch-luks"
# ---- keyring helpers (libsecret via secret-tool) ---------------------------
def _have_secret_tool() -> bool:
return shutil.which("secret-tool") is not None
def _keyring_load(host: str) -> str | None:
"""Look up the cached LUKS passphrase for `host`. None if not stored."""
if not _have_secret_tool():
return None
r = subprocess.run(
["secret-tool", "lookup", "service", _KEYRING_SERVICE, "host", host],
capture_output=True, text=True,
)
if r.returncode == 0 and r.stdout:
# secret-tool prints the secret raw, without trailing newline
return r.stdout
return None
def _keyring_store(host: str, passphrase: str) -> None:
"""Persist `passphrase` in libsecret under (service, host)."""
if not _have_secret_tool():
return
label = f"hetzner-arch-luks LUKS passphrase for {host}"
subprocess.run(
[
"secret-tool", "store", "--label", label,
"service", _KEYRING_SERVICE, "host", host,
],
input=passphrase, text=True, check=False,
)
def _keyring_clear(host: str) -> bool:
"""Drop the cached passphrase for `host`. Returns True if anything was deleted."""
if not _have_secret_tool():
return False
if _keyring_load(host) is None:
return False
subprocess.run(
["secret-tool", "clear", "service", _KEYRING_SERVICE, "host", host],
check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
return True
# ---- passphrase prompt -----------------------------------------------------
def _prompt_passphrase(host: str, *, force_prompt: bool = False) -> str:
"""Get the LUKS passphrase for `host`.
Order:
1. Try the libsecret keyring (skipped if force_prompt=True or
secret-tool isn't installed).
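2. Hidden prompt via getpass. Any non-empty input is stored in the
keyring right away, before cryptsetup has verified it;
_ensure_unlocked() clears the entry again if it is rejected.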
Empty input aborts the whole command.
"""
if not force_prompt:
cached = _keyring_load(host)
if cached:
print(f"(passphrase from keyring for {host})", file=sys.stderr)
return cached
p = getpass.getpass(f"LUKS passphrase for {host}: ")
if not p:
print("Empty passphrase — aborting.", file=sys.stderr)
sys.exit(1)
_keyring_store(host, p)
return p
# ---- session helpers -------------------------------------------------------
def _wait_rescue(host: str, timeout: int = 300) -> None:
print(f"==> Waiting for {host}:22 ...")
if not wait_for_port(host, 22, timeout=timeout):
print(f"Timeout: {host}:22 not reachable after {timeout}s", file=sys.stderr)
sys.exit(1)
def _luks_is_open(ssh: SshSession) -> bool:
r = ssh.run("test -e /dev/mapper/cryptroot", check=False, capture=True)
return r.returncode == 0
def _ensure_unlocked(ssh: SshSession, host: str, passphrase: str | None) -> None:
"""Open LUKS if needed. Retries once with a fresh prompt if the cached
passphrase from the keyring is rejected by cryptsetup.
cryptsetup reads the passphrase from stdin (via --key-file=-) and stops
at EOF. We send raw bytes with no trailing newline.
"""
if _luks_is_open(ssh):
print("==> LUKS already open.")
return
if passphrase is None:
passphrase = _prompt_passphrase(host)
print("==> Opening LUKS ...")
try:
ssh.run(
"cryptsetup luksOpen --key-file=- /dev/md1 cryptroot",
input_=passphrase.encode(),
)
except subprocess.CalledProcessError:
# Most likely: wrong passphrase. If we got it from the keyring,
# clear the bad entry and re-prompt once.
if _keyring_clear(host):
print(
"==> cryptsetup rejected the cached passphrase. Cleared keyring; re-prompting.",
file=sys.stderr,
)
passphrase = _prompt_passphrase(host, force_prompt=True)
ssh.run(
"cryptsetup luksOpen --key-file=- /dev/md1 cryptroot",
input_=passphrase.encode(),
)
else:
raise
def _setup(ssh: SshSession, host: str, passphrase: str | None) -> None:
"""Full sequence: assemble + LUKS + LVM + mount + binds."""
print("==> Assembling RAID ...")
ssh.run(_ASSEMBLE)
_ensure_unlocked(ssh, host, passphrase)
print("==> Activating LVM + mounting + binding ...")
ssh.run(_MOUNT)
# ---- public entry points (called by cli.py) --------------------------------
def connect_rescue(host: str, *, command: list[str] | None = None) -> int:
"""Wait for rescue to come up, then either open an interactive SSH shell
or run `command` non-interactively and print its output.
No passphrase prompt — rescue itself isn't encrypted.
"""
_wait_rescue(host)
with SshSession(host) as ssh:
if command:
cmd_str = " ".join(shlex.quote(c) for c in command)
print(f"==> Running on rescue: {cmd_str}")
ssh.run(cmd_str, check=False)
else:
print("==> Connected to rescue. Type 'exit' to leave.")
ssh.run("exec bash -l", tty=True, check=False)
return 0
def connect_chroot(
host: str,
*,
ask_passphrase: bool = True,
command: list[str] | None = None,
) -> int:
"""Unlock LUKS via rescue, mount, then either open an interactive chroot
shell or run `command` inside the chroot non-interactively and print
its output."""
passphrase = _prompt_passphrase(host) if ask_passphrase else None
_wait_rescue(host)
with SshSession(host) as ssh:
_setup(ssh, host, passphrase)
if command:
# Pipe the command into chroot's bash via stdin — avoids all the
# quoting layers of `bash -c '<cmd>'` and is identical to how the
# diagnose/fix scripts are streamed in.
cmd_str = " ".join(shlex.quote(c) for c in command)
print(f"==> Running in chroot: {cmd_str}")
ssh.run("chroot /mnt /bin/bash", input_=(cmd_str + "\n").encode())
else:
print("==> Entering chroot. Type 'exit' to leave.")
ssh.run("chroot /mnt /bin/bash", tty=True, check=False)
return 0
def diagnose(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + run the chrooted diagnose script. Output goes to stdout."""
return _run_chroot_script(host, "diagnose/inside.sh", "diagnose", ask_passphrase)
def fix_boot(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + apply boot/SSH fixes inside chroot. MUTATES the system."""
return _run_chroot_script(host, "fix/boot.sh", "fix-boot", ask_passphrase)
def fix_network(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + rewrite .network files to use MACAddress= match. MUTATES."""
return _run_chroot_script(host, "fix/network.sh", "fix-network", ask_passphrase)
def downgrade_kernel(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + downgrade linux to the previous cached version. MUTATES."""
return _run_chroot_script(host, "fix/kernel.sh", "downgrade-kernel", ask_passphrase)
def reinstall_grub(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + grub-install on every disk backing /boot's RAID. MUTATES MBR."""
return _run_chroot_script(host, "fix/grub.sh", "reinstall-grub", ask_passphrase)
def downgrade_initramfs(host: str, *, ask_passphrase: bool = True) -> int:
"""Downgrade mkinitcpio+dropbear+cryptsetup+mdadm+lvm2, rebuild initramfs. MUTATES."""
return _run_chroot_script(host, "fix/initramfs.sh", "downgrade-initramfs", ask_passphrase)
def use_static_ip(host: str, *, ask_passphrase: bool = True) -> int:
"""Replace ip=dhcp in /etc/default/grub with a static spec parsed from
the existing systemd-networkd .network file. Regenerates grub.cfg. MUTATES."""
return _run_chroot_script(host, "fix/static_ip.sh", "use-static-ip", ask_passphrase)
def upgrade_system(host: str, *, ask_passphrase: bool = True) -> int:
"""Unlock + mount + full `pacman -Syu` + rebuild initramfs + refresh GRUB
(config + MBR on all boot disks). Uses --disable-sandbox because the
Hetzner Rescue kernel lacks Landlock. MUTATES."""
return _run_chroot_script(host, "maintain/upgrade.sh", "upgrade-system", ask_passphrase)
def forget_passphrase(host: str) -> int:
"""Drop the stored LUKS passphrase for `host` from the libsecret keyring."""
if not _have_secret_tool():
print("secret-tool not installed — no keyring backend; nothing to clear.",
file=sys.stderr)
return 1
if _keyring_clear(host):
print(f"Cleared cached LUKS passphrase for {host}.")
return 0
print(f"No cached LUKS passphrase for {host}.")
return 0
def _run_chroot_script(host: str, resource: str, label: str, ask_passphrase: bool) -> int:
"""Shared driver: unlock + mount + pipe a packaged script into chrooted bash.
The script is streamed as stdin to `chroot /mnt /bin/bash`; bash reads its
program from stdin, so it runs inside the chroot without leaving any file
on the target.
"""
passphrase = _prompt_passphrase(host) if ask_passphrase else None
_wait_rescue(host)
inside = (
importlib.resources
.files("hetzner_arch_luks")
.joinpath(f"resources/{resource}")
.read_bytes()
)
with SshSession(host) as ssh:
_setup(ssh, host, passphrase)
print(f"==> Running {label} inside chroot ...")
ssh.run("chroot /mnt /bin/bash", input_=inside)
return 0
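
The "cached secret first, one fresh prompt on rejection" flow used by `_ensure_unlocked` can be reduced to a few lines once the side effects are injected. The sketch below is a simplified illustration under that assumption: all four parameters are hypothetical stand-ins (`try_unlock` for the cryptsetup call, `clear_cache` for the keyring cleanup), and it retries only when the rejected secret came from the cache.

```python
from typing import Callable, Optional


def unlock_with_retry(
    cached: Optional[str],
    prompt: Callable[[], str],
    try_unlock: Callable[[str], bool],
    clear_cache: Callable[[], None],
) -> bool:
    """Try a cached secret first; on rejection, clear it and prompt once."""
    secret = cached if cached else prompt()
    if try_unlock(secret):
        return True
    if cached:
        # The stored secret was wrong: drop it so it is not reused,
        # then give the user exactly one fresh attempt.
        clear_cache()
        return try_unlock(prompt())
    return False
```

Passing the prompt and unlock steps as callables keeps the retry logic testable without a keyring or a LUKS device.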


@@ -0,0 +1,155 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Prints diagnostics
# grouped by banner. Read-only — no state changes.
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "uname / os-release"
uname -a
cat /etc/os-release
banner "package versions (boot/storage/net/ssh)"
pacman -Q linux mkinitcpio openssh systemd device-mapper lvm2 grub \
cryptsetup mdadm dropbear 2>&1
pacman -Q mkinitcpio-utils mkinitcpio-dropbear mkinitcpio-netconf 2>&1 || true
banner "recent upgrades of boot/network/sshd components (last 60 matches)"
# Focused on the packages that most often break a Hetzner Arch+LUKS boot.
# The fallback hangs off grep, not tail: tail exits 0 even on empty input.
{ grep -E '\[ALPM\] (upgraded|installed|removed) (linux( |$)|systemd( |$)|mkinitcpio( |$)|openssh( |$)|dropbear( |$)|glibc( |$)|cryptsetup( |$)|lvm2( |$)|mdadm( |$)|grub( |$)|iproute2( |$)|nftables( |$)|iptables( |$)|firewalld( |$)|fail2ban( |$)|mkinitcpio-utils( |$)|mkinitcpio-dropbear( |$)|mkinitcpio-netconf( |$))' /var/log/pacman.log 2>/dev/null \
|| echo "(no matches)"; } \
| tail -60
banner "last full-system upgrade transactions"
{ grep -nE 'starting full system upgrade|transaction completed' /var/log/pacman.log 2>/dev/null \
|| echo "(no matches)"; } | tail -10
banner "initcpio udev rules shipped on disk"
ls -l /usr/lib/initcpio/udev/ 2>&1
banner "is the historically broken file present?"
ls -l /usr/lib/initcpio/udev/11-dm-initramfs.rules 2>&1 || echo "absent"
banner "encryptssh install hook still references it?"
grep -n "11-dm-initramfs.rules" \
/usr/lib/initcpio/install/encryptssh \
/etc/initcpio/install/encryptssh 2>/dev/null || echo "no match"
banner "mkinitcpio.conf (HOOKS, MODULES, BINARIES, FILES, COMPRESSION)"
grep -E '^(HOOKS|MODULES|BINARIES|FILES|COMPRESSION)=' /etc/mkinitcpio.conf 2>&1
banner "/etc/crypttab"
cat /etc/crypttab 2>&1 || true
banner "/etc/fstab"
cat /etc/fstab 2>&1 || true
banner "/boot contents and free space"
ls -lh /boot 2>&1
df -h /boot 2>&1
banner "GRUB config + bootloader state"
ls -lh /boot/grub/ 2>&1
echo
if [ -f /boot/grub/grub.cfg ]; then
if command -v grub-script-check >/dev/null 2>&1; then
grub-script-check /boot/grub/grub.cfg 2>&1 && echo "grub.cfg: syntax OK"
else
echo "grub-script-check not available — skipping syntax check"
fi
echo
echo "-- menuentry / linux / initrd lines (first 40):"
grep -nE '^\s*(linux|initrd|menuentry)' /boot/grub/grub.cfg 2>&1 | head -40
echo
echo "-- referenced kernel/initramfs files exist?"
for p in $(grep -hE '^\s*(linux|initrd)\b' /boot/grub/grub.cfg 2>/dev/null \
| awk '{print $2}' | sort -u); do
if [ -e "$p" ]; then echo "EXISTS $p"
elif [ -e "/boot${p}" ]; then echo "EXISTS /boot${p} (grub.cfg path: $p)"
else echo "MISSING $p"
fi
done
else
echo "/boot/grub/grub.cfg NOT FOUND"
fi
echo
echo "-- grubenv:"
grub-editenv /boot/grub/grubenv list 2>/dev/null \
|| { cat /boot/grub/grubenv 2>/dev/null || echo "(no grubenv)"; } | head -5
banner "initramfs contents — key tools actually packed in?"
if command -v lsinitcpio >/dev/null 2>&1; then
echo "-- matches in /boot/initramfs-linux.img:"
lsinitcpio /boot/initramfs-linux.img 2>/dev/null \
| grep -E '(cryptsetup|dropbear|encryptssh|netconf|mdadm|lvm|/init$|hooks/)' \
| sort -u | head -50
else
echo "lsinitcpio not available"
fi
banner "network: which service manages it?"
for u in systemd-networkd NetworkManager netctl-auto dhcpcd; do
printf " %-22s %s\n" "$u" "$(systemctl is-enabled "$u" 2>&1)"
done
# dhcpcd@interface units (Arch default for static-ish setups)
systemctl list-unit-files 'dhcpcd@*' --no-pager 2>/dev/null | grep -E 'dhcpcd@' || true
banner "network: config files present"
echo "-- /etc/systemd/network/"
{ ls -la /etc/systemd/network/ 2>&1 || echo "(missing)"; } | head -20
echo
echo "-- /etc/NetworkManager/system-connections/"
{ ls -la /etc/NetworkManager/system-connections/ 2>&1 || echo "(missing)"; } | head -20
echo
echo "-- /etc/netctl/"
{ ls -la /etc/netctl/ 2>&1 || echo "(missing)"; } | head -20
echo
echo "-- /etc/hostname / /etc/hosts"
cat /etc/hostname 2>&1 || true
echo "---"
cat /etc/hosts 2>&1 || true
banner "firewall units (would persist across reboots)"
for u in nftables iptables ip6tables firewalld ufw fail2ban docker; do
printf " %-12s %s\n" "$u" "$(systemctl is-enabled "$u" 2>&1)"
done
echo
if [ -f /etc/nftables.conf ]; then
echo "-- /etc/nftables.conf (first 60 lines):"
head -60 /etc/nftables.conf
fi
[ -f /etc/iptables/iptables.rules ] && { echo "-- /etc/iptables/iptables.rules (head 40):"; head -40 /etc/iptables/iptables.rules; }
banner "sshd state + drop-ins"
sshd -t 2>&1
systemctl is-enabled sshd 2>&1
grep -nE '^Port|^ListenAddress|^PermitRootLogin' /etc/ssh/sshd_config 2>&1 || true
echo
echo "-- sshd_config.d/ drop-ins (can override main config!):"
ls -la /etc/ssh/sshd_config.d/ 2>&1 || echo "(no drop-ins dir)"
for f in /etc/ssh/sshd_config.d/*.conf; do
[ -e "$f" ] || continue
echo
echo "-- $f:"
cat "$f"
done
banner "journal: which boots are actually recorded?"
journalctl --list-boots --no-pager 2>&1 | tail -15
banner "last recorded boot (-b 0): all errors"
journalctl -b 0 -p err --no-pager 2>&1 | head -100 || true
banner "last recorded boot (-b 0): sshd"
journalctl -b 0 -u sshd --no-pager 2>&1 | head -40 || true
banner "last recorded boot (-b 0): cryptsetup / dropbear / network units"
journalctl -b 0 \
-u 'systemd-cryptsetup*' -u 'dropbear*' \
-u 'systemd-networkd*' -u 'NetworkManager*' -u 'dhcpcd*' \
--no-pager 2>&1 | head -80 || true
banner "previous boot (-b -1): errors (only if a previous boot is recorded)"
journalctl -b -1 -p err --no-pager 2>&1 | head -50 || true
banner "failed units of last boot"
systemctl --failed --no-pager 2>&1 || true


@@ -0,0 +1,55 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Applies the recommended
# boot / SSH fixes:
#
# 1. PermitRootLogin: rewrite a literal "no" line to "prohibit-password"
# in /etc/ssh/sshd_config AND any drop-in under /etc/ssh/sshd_config.d/.
# Backups are kept once as *.hal-backup.
# 2. Persistent journald: create /var/log/journal so journald survives
# reboot (next boot onwards). Helps catch the next failure if there is one.
#
# Idempotent: re-running is safe — no-op on already-fixed configs.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "PermitRootLogin (before)"
grep -rn '^PermitRootLogin' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/ 2>/dev/null \
|| echo "(no explicit setting found)"
changed=0
for f in /etc/ssh/sshd_config /etc/ssh/sshd_config.d/*.conf; do
[ -e "$f" ] || continue
if grep -q '^PermitRootLogin no$' "$f"; then
[ -f "$f.hal-backup" ] || cp -a "$f" "$f.hal-backup"
sed -i 's/^PermitRootLogin no$/PermitRootLogin prohibit-password/' "$f"
echo "==> Patched: $f (backup at $f.hal-backup)"
changed=1
fi
done
[ "$changed" -eq 0 ] && echo "==> Nothing to patch — PermitRootLogin is not 'no' anywhere."
banner "PermitRootLogin (after)"
grep -rn '^PermitRootLogin' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/ 2>/dev/null \
|| echo "(no explicit setting found)"
banner "sshd_config syntax check"
sshd -t && echo "syntax OK"
banner "persistent journald"
if [ ! -d /var/log/journal ]; then
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal 2>&1 || true
echo "==> Created /var/log/journal. journald will persist from next boot onwards."
else
echo "/var/log/journal already exists — journald is already persistent."
fi
banner "/boot space"
df -h /boot
ls -lh /boot
banner "summary"
echo "Done. The changes take effect on the NEXT boot of the installed system."
echo "Exit the chroot and reboot out of rescue when ready."
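The "patch once, back up once, no-op on re-run" behaviour described in the header of this script can be sketched in Python as follows. This is only an illustration under stated assumptions: `patch_permit_root_login` and `demo` are hypothetical names that are not part of hal, and the sketch works on a throwaway temp file rather than the real /etc/ssh/sshd_config.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Matches only a literal "PermitRootLogin no" line, like the sed pattern above.
PATTERN = re.compile(r"^PermitRootLogin no$", re.MULTILINE)


def patch_permit_root_login(path: Path) -> bool:
    """Rewrite 'PermitRootLogin no' in place; return True if the file changed."""
    text = path.read_text()
    if not PATTERN.search(text):
        return False  # already fixed or never set: nothing to do
    backup = path.with_name(path.name + ".hal-backup")
    if not backup.exists():
        shutil.copy2(path, backup)  # keep only the first (pristine) backup
    path.write_text(PATTERN.sub("PermitRootLogin prohibit-password", text))
    return True


def demo() -> tuple:
    """Patch a throwaway config twice and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "sshd_config"
        cfg.write_text("Port 22\nPermitRootLogin no\n")
        first = patch_permit_root_login(cfg)   # changes the file
        second = patch_permit_root_login(cfg)  # no-op on the fixed file
        backup_text = (Path(tmp) / "sshd_config.hal-backup").read_text()
        return first, second, cfg.read_text(), backup_text
```

The second call returning `False` is the idempotency property the script relies on: the backup always holds the pre-patch content, even after repeated runs.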

@@ -0,0 +1,92 @@
#!/bin/bash
# Re-install GRUB stage1 + core.img to the MBR of every physical disk that
# backs /boot's RAID array. Needed when a `pacman -Syu` updated the grub
# package but grub-install was never re-run afterwards, leaving stale
# Stage1 code in the MBR that may not understand the new modules in
# /boot/grub/i386-pc/.
#
# Also regenerates /boot/grub/grub.cfg.
#
# Boot disks are auto-detected from the components of /dev/md0.
# Targets BIOS GRUB (--target=i386-pc); the existing /boot/grub/i386-pc/
# directory confirms this is a BIOS setup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "current /boot/grub state"
ls -lh /boot/grub/
echo
echo "-- /boot/grub/i386-pc/ — most recent files:"
ls -lt /boot/grub/i386-pc/ 2>/dev/null | head -8
banner "identifying boot disks (members of md0)"
if [ ! -e /dev/md0 ]; then
echo "ERROR: /dev/md0 does not exist. Was the RAID assembled before chroot?"
exit 1
fi
echo "-- mdadm --detail /dev/md0 (member partitions):"
mdadm --detail /dev/md0 | awk '/active sync/ {print " " $NF}'
# Convert a partition path to its parent disk. lsblk fails inside our chroot
# (can't resolve PKNAME against the rescue-bound /sys), so use the standard
# Linux device naming conventions instead.
parent_disk() {
    local part="$1"
    case "$part" in
        /dev/nvme[0-9]*n[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/mmcblk[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/loop[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/sd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        /dev/vd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        /dev/hd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        *)
            # Last resort: try lsblk; may return empty in chroot.
            # Use `if` instead of `[ -n ... ] && echo` so that an empty result
            # leaves the function's exit status at 0. Otherwise the caller's
            # `disk=$(parent_disk ...)` would abort the script under `set -e`
            # before the WARN branch is reached.
            local d
            d=$(lsblk -no PKNAME "$part" 2>/dev/null | head -1)
            if [ -n "$d" ]; then echo "/dev/$d"; fi
            ;;
    esac
}
BOOT_DISKS=()
for part in $(mdadm --detail /dev/md0 2>/dev/null | awk '/active sync/ {print $NF}'); do
disk=$(parent_disk "$part")
[ -z "$disk" ] && { echo "WARN: cannot resolve parent disk for $part"; continue; }
already=0
for d in "${BOOT_DISKS[@]}"; do [ "$d" = "$disk" ] && already=1; done
[ "$already" -eq 0 ] && BOOT_DISKS+=("$disk")
done
if [ "${#BOOT_DISKS[@]}" -eq 0 ]; then
echo "ERROR: could not detect any boot disks."
exit 1
fi
echo
echo "Will run grub-install on: ${BOOT_DISKS[*]}"
banner "regenerating /boot/grub/grub.cfg"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "reinstalling GRUB to each boot disk"
for disk in "${BOOT_DISKS[@]}"; do
echo
echo "-- grub-install --target=i386-pc --recheck $disk"
grub-install --target=i386-pc --recheck "$disk"
done
banner "post-install state"
echo "-- /boot/grub/i386-pc/ — newest files now:"
ls -lt /boot/grub/i386-pc/ 2>/dev/null | head -6
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If the system boots normally:
→ root cause confirmed = stale MBR after grub package upgrades
(grub-install was never re-run after a pacman -Syu touched grub).
→ To prevent recurrence, add a pacman hook (Arch wiki: "GRUB").
3. If still unbootable:
→ GRUB stage1 was not the cause. Next bisection: downgrade systemd.
EOF
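The partition-to-disk mapping and the de-duplication loop in this script can be sketched in Python. This is a hedged illustration only: the regular expressions are an assumption that approximates (and is slightly stricter than) the shell globs in `parent_disk`, and `boot_disks` is a hypothetical helper name, not part of hal.

```python
from __future__ import annotations

import re

# Devices whose partitions are named "<disk>p<N>" (nvme0n1p2, mmcblk0p1, loop0p1).
_P_STYLE = re.compile(r"^(/dev/(?:nvme\d+n\d+|mmcblk\d+|loop\d+))p\d+$")
# Devices whose partitions are named "<disk><N>" (sda1, vdb2, hda3).
_DIGIT_STYLE = re.compile(r"^(/dev/(?:sd|vd|hd)[a-z]+)\d+$")


def parent_disk(part: str) -> str | None:
    """Return the parent disk of a partition path, or None if unknown."""
    for pattern in (_P_STYLE, _DIGIT_STYLE):
        match = pattern.match(part)
        if match:
            return match.group(1)
    return None


def boot_disks(parts: list[str]) -> list[str]:
    """Map RAID member partitions to unique parent disks, keeping order."""
    disks: list[str] = []
    for part in parts:
        disk = parent_disk(part)
        if disk and disk not in disks:
            disks.append(disk)
    return disks
```

For example, `boot_disks(["/dev/sda2", "/dev/sdb2", "/dev/sda3"])` yields one entry per physical disk, which is the list the script would hand to `grub-install`.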

@@ -0,0 +1,119 @@
#!/bin/bash
# Runs INSIDE the chroot. Downgrades the 5 packages that determine how the
# initramfs is built AND what binaries end up inside it, to the version
# they had before the most recent `pacman -Syu`.
#
# The 5 packages:
# mkinitcpio — build tool. mkinitcpio 41 changed hook handling and may
# silently break setups using older third-party hooks
# (mkinitcpio-utils / -dropbear / -netconf).
# dropbear — SSH daemon in initramfs for remote LUKS unlock. The
# 2025.89 → 2026.90 jump may have changed key/config format.
# cryptsetup — LUKS open in initramfs.
# mdadm — RAID assemble in initramfs.
# lvm2 — LVM activate in initramfs.
#
# Source: /var/log/pacman.log tells us the exact previous versions.
# Files: prefer /var/cache/pacman/pkg/, fall back to archive.archlinux.org.
# After: rebuilds initramfs and regenerates grub.cfg.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
PKGS=(mkinitcpio dropbear cryptsetup mdadm lvm2)
# Arch convention for package-file naming.
pkg_arch() {
case "$1" in
mkinitcpio|mkinitcpio-utils|mkinitcpio-dropbear|mkinitcpio-netconf) echo "any" ;;
*) echo "x86_64" ;;
esac
}
# Extract previous version from the most recent
# "[ALPM] upgraded <pkg> (OLD -> NEW)" line in pacman.log.
prev_version() {
local pkg="$1"
grep -E "\[ALPM\] upgraded $pkg \(" /var/log/pacman.log 2>/dev/null \
| tail -1 \
| sed -E "s/.*upgraded $pkg \(([^ ]+) -> [^)]+\).*/\1/"
}
banner "discovering previous versions from pacman.log"
declare -A FNAMES
TARGETS=()
for pkg in "${PKGS[@]}"; do
prev=$(prev_version "$pkg")
curr=$(pacman -Q "$pkg" 2>/dev/null | awk '{print $2}')
if [ -z "$prev" ]; then
echo " $pkg: no 'upgraded' entry in pacman.log — SKIP"
continue
fi
if [ "$prev" = "$curr" ]; then
echo " $pkg: already at previous version $curr — skip"
continue
fi
arch=$(pkg_arch "$pkg")
fname="${pkg}-${prev}-${arch}.pkg.tar.zst"
echo " $pkg: $curr → $prev ($fname)"
FNAMES[$pkg]="$fname"
TARGETS+=("$pkg")
done
if [ "${#TARGETS[@]}" -eq 0 ]; then
echo "Nothing to downgrade."
exit 0
fi
banner "fetching packages"
FILES=()
for pkg in "${TARGETS[@]}"; do
fname="${FNAMES[$pkg]}"
cache="/var/cache/pacman/pkg/$fname"
if [ -e "$cache" ]; then
echo " $pkg: cached → $cache"
FILES+=("$cache")
continue
fi
first_letter="${pkg:0:1}"
url="https://archive.archlinux.org/packages/${first_letter}/${pkg}/${fname}"
out="/tmp/$fname"
echo " $pkg: fetching"
echo " URL: $url"
if curl -fsSL --connect-timeout 15 -o "$out" "$url"; then
size=$(du -h "$out" | cut -f1)
echo " OK ($size)"
FILES+=("$out")
else
echo " FAILED — cannot continue without all packages"
exit 1
fi
done
banner "downgrading (single transaction)"
pacman -U --noconfirm "${FILES[@]}"
banner "rebuilding initramfs (with downgraded mkinitcpio + tools)"
mkinitcpio -P
banner "regenerating GRUB config"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "result"
for pkg in "${PKGS[@]}"; do
pacman -Q "$pkg" 2>/dev/null || true
done
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If the system boots and SSH works:
→ root cause is in one of {mkinitcpio, dropbear, cryptsetup, mdadm, lvm2}.
Pin them so the next pacman -Syu does not re-upgrade:
IgnorePkg = ${PKGS[*]}
in /etc/pacman.conf. Bisect later to find the exact culprit.
3. If still unbootable:
→ not the initramfs stack. Remaining suspects: glibc, systemd, iproute2.
Next attempt would be a full rollback of all May-11 package upgrades.
EOF
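The way `prev_version` pulls the old version out of pacman.log can be sketched in Python. The log lines in `SAMPLE_LOG` are invented for illustration (timestamps and version numbers are assumptions, not taken from a real system), and `pkg_filename` is a hypothetical helper that mirrors the `fname=` line above.

```python
from __future__ import annotations

import re


def prev_version(log_text: str, pkg: str) -> str | None:
    """Return the OLD side of the most recent 'upgraded <pkg> (OLD -> NEW)'."""
    pattern = re.compile(
        r"\[ALPM\] upgraded " + re.escape(pkg) + r" \((\S+) -> [^)]+\)"
    )
    matches = pattern.findall(log_text)
    return matches[-1] if matches else None  # last entry = most recent upgrade


def pkg_filename(pkg: str, version: str, arch: str = "x86_64") -> str:
    """Build the Arch package file name for a given version."""
    return f"{pkg}-{version}-{arch}.pkg.tar.zst"


# Invented sample data, for illustration only.
SAMPLE_LOG = """\
[2026-01-02T10:00:00+0100] [ALPM] upgraded mdadm (4.3-1 -> 4.3-2)
[2026-05-11T09:30:00+0200] [ALPM] upgraded mdadm (4.3-2 -> 4.4-1)
[2026-05-11T09:30:01+0200] [ALPM] upgraded lvm2 (2.03.30-1 -> 2.03.31-1)
"""
```

With this sample, `prev_version(SAMPLE_LOG, "mdadm")` picks the old side of the last matching line, and a package with no `upgraded` entry yields `None`, which corresponds to the SKIP branch in the script.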

@@ -0,0 +1,110 @@
#!/bin/bash
# Runs INSIDE the chroot. Downgrades the linux kernel to the previous
# version, i.e. the OLD side of the most recent "[ALPM] upgraded linux"
# entry in /var/log/pacman.log. Looks in /var/cache/pacman/pkg/ first; if
# the package is not there, fetches it from https://archive.archlinux.org/.
#
# After downgrade: regenerates initramfs + grub.cfg.
#
# Use case: a `pacman -Syu` bumped the kernel to a version that fails to
# boot on this hardware. Rolling the kernel back leaves every other
# package on the new version, so this isolates the kernel as a variable.
#
# Idempotent: if already on the previous version, exits as a no-op.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "determining previous kernel version from pacman.log"
PREV=$(grep -E '\[ALPM\] upgraded linux \(' /var/log/pacman.log 2>/dev/null \
| tail -1 \
| sed -E 's/.*upgraded linux \(([^ ]+) -> [^)]+\).*/\1/')
CURR=$(pacman -Q linux | awk '{print $2}')
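if [ -z "$PREV" ]; then
    echo "FATAL: Could not parse a previous kernel version from /var/log/pacman.log."
    echo " Pacman log entries for 'linux' installs/upgrades:"
    # `tail` always exits 0, so a trailing `|| echo` would never fire.
    # Capture the output and substitute the placeholder when it is empty.
    entries=$(grep -E '\[ALPM\] (installed|upgraded) linux \(' /var/log/pacman.log 2>/dev/null | tail -5)
    echo "${entries:- (none found)}"
    exit 1
fi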
echo "Currently installed: linux-$CURR"
echo "Previous version: linux-$PREV"
if [ "$PREV" = "$CURR" ]; then
echo "Already on the previous version. Nothing to do."
exit 0
fi
PKG_NAME="linux-${PREV}-x86_64.pkg.tar.zst"
CACHE_PATH="/var/cache/pacman/pkg/${PKG_NAME}"
banner "locating package"
TARGET=""
if [ -e "$CACHE_PATH" ]; then
echo "Found in cache: $CACHE_PATH"
TARGET="$CACHE_PATH"
else
echo "Not in cache. Fetching from archive.archlinux.org ..."
URL="https://archive.archlinux.org/packages/l/linux/${PKG_NAME}"
echo "URL: $URL"
if curl -fsSL --connect-timeout 15 -o "/tmp/${PKG_NAME}" "$URL"; then
TARGET="/tmp/${PKG_NAME}"
echo "Downloaded: $TARGET ($(du -h "$TARGET" | cut -f1))"
else
cat <<EOF >&2
Download failed from $URL.
Reasons might be:
- chroot has no working DNS / no outbound network
- the specific version is no longer on archive.archlinux.org
- upstream temporarily unavailable
Workarounds:
1. Test network from chroot:
curl -v https://archive.archlinux.org/
2. Manually download on your client:
curl -O $URL
and SCP into rescue, then place at:
/mnt/tmp/${PKG_NAME}
(Inside the chroot it appears as /tmp/${PKG_NAME}.)
3. Pick a different version — list at:
https://archive.archlinux.org/packages/l/linux/
EOF
exit 1
fi
fi
banner "/boot space before"
df -h /boot
ls -lh /boot
banner "downgrading kernel (pacman -U)"
pacman -U --noconfirm "$TARGET"
banner "regenerating initramfs"
mkinitcpio -P
banner "regenerating GRUB config"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "/boot space after"
df -h /boot
ls -lh /boot
banner "result"
pacman -Q linux
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If system boots and SSH works:
→ root cause confirmed = linux $CURR is incompatible with this hardware.
Pin the kernel by adding to /etc/pacman.conf:
IgnorePkg = linux
OR install linux-lts and switch to it as the primary kernel.
3. If still unbootable:
→ kernel was not the cause. Next bisection target: systemd.
EOF
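The "cache first, archive second" lookup used by this script can be sketched in Python. The archive URL layout follows the `URL=` line above; `locate`, `archive_url`, and the version string in `demo` are hypothetical and exist only for this illustration. Nothing is downloaded here; the sketch only decides where the package would come from.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

ARCHIVE = "https://archive.archlinux.org/packages"


def archive_url(pkg: str, fname: str) -> str:
    """Archive layout: /packages/<first letter>/<pkg>/<file name>."""
    return f"{ARCHIVE}/{pkg[0]}/{pkg}/{fname}"


def locate(pkg: str, fname: str, cache_dir: Path) -> tuple[str, str]:
    """Return ("cache", path) if the file is cached, else ("download", url)."""
    cached = cache_dir / fname
    if cached.exists():
        return ("cache", str(cached))
    return ("download", archive_url(pkg, fname))


def demo() -> tuple:
    """Show a cache miss followed by a cache hit in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        fname = "linux-1.2.3.arch1-1-x86_64.pkg.tar.zst"  # made-up version
        miss = locate("linux", fname, cache)
        (cache / fname).touch()  # simulate a package left in the pacman cache
        hit_kind = locate("linux", fname, cache)[0]
        return miss, hit_kind
```

Separating the decision from the download keeps the fallback testable without network access, which matters in a chroot where DNS may not work.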

@@ -0,0 +1,69 @@
#!/bin/bash
# Runs INSIDE the chroot of the installed Arch system. Rewrites every
# systemd-networkd *.network file's [Match] block to use MACAddress= instead
# of Name=. This makes the network config survive kernel / systemd upgrades
# that may rename the interface (predictable naming changes, driver enum).
#
# The MAC is auto-detected via `ip link show` (visible because /sys is bind-
# mounted from rescue — same physical NIC, same MAC).
#
# Idempotent: a .network file that already uses MACAddress= is skipped.
# Backups are kept once at <file>.hal-backup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "detecting NIC MAC"
# Pick the first non-loopback link with a colon-formatted MAC.
MAC=$(ip -br link show 2>/dev/null \
| awk '$1 != "lo" && $1 != "" && $3 ~ /^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$/ {print $3; exit}')
if [ -z "$MAC" ]; then
echo "Could not auto-detect a non-loopback MAC. Aborting." >&2
exit 1
fi
echo "Detected MAC: $MAC"
banner ".network files (before)"
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
echo "-- $f:"
cat "$f"
echo
done
banner "patching"
changed=0
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
if grep -qE '^[[:space:]]*MACAddress[[:space:]]*=' "$f"; then
echo "$f: already uses MACAddress= — skipping"
continue
fi
if ! grep -qE '^[[:space:]]*Name[[:space:]]*=' "$f"; then
echo "$f: no Name= match — skipping"
continue
fi
[ -f "$f.hal-backup" ] || cp -a "$f" "$f.hal-backup"
awk -v mac="$MAC" '
BEGIN { replaced=0 }
/^[[:space:]]*Name[[:space:]]*=/ && !replaced { print "MACAddress=" mac; replaced=1; next }
{ print }
' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
echo "$f: patched (backup at $f.hal-backup)"
changed=1
done
[ "$changed" -eq 0 ] && echo "Nothing to patch: no .network file needed a Name= to MACAddress= rewrite."
banner ".network files (after)"
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
echo "-- $f:"
cat "$f"
echo
done
banner "summary"
echo "Done. The change takes effect on the NEXT boot of the installed system."
echo "Backups (if any) are at /etc/systemd/network/*.network.hal-backup."
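The awk rewrite in this script (replace only the first `Name=` line, skip files that already carry `MACAddress=`) can be sketched in Python. `patch_network` is a hypothetical name, and the interface name, address, and MAC in the usage note are placeholder values, not taken from a real host.

```python
import re

NAME_LINE = re.compile(r"^\s*Name\s*=")
MAC_LINE = re.compile(r"^\s*MACAddress\s*=")


def patch_network(text: str, mac: str) -> str:
    """Replace the first Name= line with MACAddress=<mac>; otherwise no-op."""
    lines = text.splitlines()
    if any(MAC_LINE.match(line) for line in lines):
        return text  # already matches by MAC: leave untouched
    out = []
    replaced = False
    for line in lines:
        if not replaced and NAME_LINE.match(line):
            out.append(f"MACAddress={mac}")
            replaced = True
        else:
            out.append(line)
    # Only rewrite when something actually changed.
    return "\n".join(out) + "\n" if replaced else text
```

Given `"[Match]\nName=enp0s31f6\n\n[Network]\nAddress=192.0.2.10/32\n"` and the MAC `aa:bb:cc:dd:ee:ff`, only the `Name=` line changes; running the function again on the result returns it unchanged.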

@@ -0,0 +1,124 @@
#!/bin/bash
# Replaces `ip=dhcp` in /etc/default/grub with a static kernel-cmdline
# network spec derived from the existing /etc/systemd/network/*.network file.
#
# Why: Dropbear-in-initramfs relies on a working network for remote LUKS
# unlock. On Hetzner Dedicated, `ip=dhcp` is fragile — Hetzner's own docs
# recommend static configuration for FDE+Dropbear setups. A kernel/iproute2
# upgrade can subtly change the DHCP request format and break the
# previously-working DHCP path.
#
# The .network file already has the correct values (IP, gateway). This
# script reuses them in the kernel cmdline so dropbear has network in
# initramfs without depending on Hetzner DHCP.
#
# Resulting cmdline format (Linux kernel `ip=` documented form):
# ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# We use:
# ip=46.4.224.77::46.4.224.65:255.255.255.255:echoserver:eth0:none
#
# Idempotent: re-running won't double-patch.
# Reversible: original /etc/default/grub backed up to .hal-backup.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
banner "locating systemd-networkd config"
NETFILE=""
for f in /etc/systemd/network/*.network; do
[ -e "$f" ] || continue
NETFILE="$f"
break
done
if [ -z "$NETFILE" ]; then
echo "ERROR: no /etc/systemd/network/*.network file found."
echo " Cannot derive static IP/gateway."
exit 1
fi
echo "Using: $NETFILE"
echo
cat "$NETFILE"
banner "parsing"
# IPv4 address: first Address= or [Address]/Address= line without colon.
IPV4=$(awk '
/^[[:space:]]*Address[[:space:]]*=/ {
sub(/^[[:space:]]*Address[[:space:]]*=[[:space:]]*/, "")
if ($0 !~ /:/) { print; exit }
}
' "$NETFILE")
IPV4_BARE="${IPV4%%/*}"
# Gateway: first IPv4 Gateway= line.
GATEWAY=$(awk '
/^[[:space:]]*Gateway[[:space:]]*=/ {
sub(/^[[:space:]]*Gateway[[:space:]]*=[[:space:]]*/, "")
if ($0 !~ /:/) { print; exit }
}
' "$NETFILE")
HOST="$(cat /etc/hostname 2>/dev/null | head -1 | tr -d ' \t\n' || true)"
[ -z "$HOST" ] && HOST="host"
# Device: 'eth0' matches the kernel pre-udev naming of the first ethernet
# interface and is what Hetzner uses in their FDE-static-IP docs.
DEVICE="eth0"
echo " IPv4: $IPV4_BARE"
echo " Gateway: $GATEWAY"
echo " Hostname: $HOST"
echo " Device: $DEVICE"
if [ -z "$IPV4_BARE" ] || [ -z "$GATEWAY" ]; then
echo "ERROR: could not parse IPv4 address or gateway from $NETFILE."
exit 1
fi
IPSPEC="ip=${IPV4_BARE}::${GATEWAY}:255.255.255.255:${HOST}:${DEVICE}:none"
echo
echo "Will set kernel cmdline param: $IPSPEC"
banner "current /etc/default/grub"
cat /etc/default/grub
banner "patching /etc/default/grub"
if grep -qE 'ip=dhcp' /etc/default/grub; then
[ -f /etc/default/grub.hal-backup ] || cp -a /etc/default/grub /etc/default/grub.hal-backup
# Replace just the ip=dhcp token (leaves all other kernel params untouched)
sed -i -E "s|ip=dhcp|${IPSPEC}|g" /etc/default/grub
echo "Replaced ip=dhcp → $IPSPEC"
echo "Backup: /etc/default/grub.hal-backup"
elif grep -qE "ip=${IPV4_BARE//./\\.}::" /etc/default/grub; then
echo "Static ip= already configured for $IPV4_BARE — no change."
elif grep -qE 'ip=' /etc/default/grub; then
echo "WARNING: /etc/default/grub has an ip= directive that's neither dhcp"
echo " nor the expected static spec. Manual review needed:"
grep -nE 'ip=' /etc/default/grub
echo "Aborting — won't blindly overwrite an unknown ip= value."
exit 1
else
echo "No ip= directive found in /etc/default/grub. Manual edit may be needed."
exit 1
fi
banner "patched /etc/default/grub"
cat /etc/default/grub
banner "regenerating /boot/grub/grub.cfg"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "verifying"
echo "-- ip= lines in new grub.cfg:"
# `head` always exits 0, so test the captured output instead of using `||`.
matches=$(grep -nE '\bip=' /boot/grub/grub.cfg | head -5)
echo "${matches:-(no ip= line found, unexpected)}"
banner "next steps"
cat <<EOF
1. Exit chroot, umount -R /mnt, reboot.
2. If system boots and SSH works:
→ Root cause was DHCP-in-initramfs fragility (Hetzner side / iproute2
behavior change). Static cmdline IP is the recommended permanent fix.
3. To revert (if anything goes wrong):
cp /etc/default/grub.hal-backup /etc/default/grub
grub-mkconfig -o /boot/grub/grub.cfg
EOF
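The parsing step of this script (first IPv4 `Address=` and `Gateway=`, then assembling the `ip=` token) can be sketched in Python. The `.network` content in `SAMPLE` is invented for illustration; the IPv4 address, gateway, and hostname reuse the example from the header comment above. `first_ipv4_value` and `build_ipspec` are hypothetical names, not part of hal.

```python
from __future__ import annotations

import re


def first_ipv4_value(text: str, key: str) -> str | None:
    """Return the first '<key>=' value without a colon (i.e. not IPv6)."""
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=\s*(.*)$")
    for line in text.splitlines():
        match = pattern.match(line)
        if match and ":" not in match.group(1):
            return match.group(1).strip()
    return None


def build_ipspec(network_text: str, hostname: str, device: str = "eth0") -> str:
    """Assemble the static kernel cmdline ip= token from .network content."""
    address = first_ipv4_value(network_text, "Address")
    gateway = first_ipv4_value(network_text, "Gateway")
    if not address or not gateway:
        raise ValueError("could not parse IPv4 address or gateway")
    client = address.split("/", 1)[0]  # drop the /prefix, like ${IPV4%%/*}
    return f"ip={client}::{gateway}:255.255.255.255:{hostname}:{device}:none"


# Invented sample file; IPv6 lines come first to show they are skipped.
SAMPLE = """\
[Match]
Name=eth0

[Network]
Address=2a01:4f8::2/64
Gateway=fe80::1
Address=46.4.224.77/32
Gateway=46.4.224.65
"""
```

Calling `build_ipspec(SAMPLE, "echoserver")` reproduces the token shown in the header comment, with the IPv6 entries ignored because they contain a colon.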

@@ -0,0 +1,95 @@
#!/bin/bash
# Runs INSIDE the chroot. Full pacman -Syyu + initramfs rebuild + GRUB refresh
# (config + MBR on every disk backing /boot's RAID).
#
# CRITICAL: pacman 7.x uses Linux Landlock for its sandbox protection. The
# Hetzner Rescue kernel does NOT enable Landlock, so pacman -Syu inside the
# chroot would fail at the database-sync step with:
# error: restricting filesystem access failed because Landlock is not supported
# error: switching to sandbox user 'alpm' failed!
# The --disable-sandbox flag works around this. Outside the rescue context
# (e.g. on the live system later) the flag is unnecessary.
set -e
banner() { printf "\n========== %s ==========\n" "$1"; }
# Convert a partition path to its parent disk. lsblk fails inside our chroot
# (can't resolve PKNAME against the rescue-bound /sys), so use standard
# Linux device-naming conventions instead. (Same helper as fix/grub.sh.)
parent_disk() {
    local part="$1"
    case "$part" in
        /dev/nvme[0-9]*n[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/mmcblk[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/loop[0-9]*p[0-9]*) echo "${part%p[0-9]*}" ;;
        /dev/sd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        /dev/vd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        /dev/hd[a-z]*[0-9]*) echo "$part" | sed -E 's/[0-9]+$//' ;;
        *)
            # Use `if` instead of `[ -n ... ] && echo` so that an empty lsblk
            # result does not make the function return non-zero and abort the
            # caller's `disk=$(parent_disk ...)` under `set -e`.
            local d
            d=$(lsblk -no PKNAME "$part" 2>/dev/null | head -1)
            if [ -n "$d" ]; then echo "/dev/$d"; fi
            ;;
    esac
}
banner "pre-upgrade state"
echo "-- key packages BEFORE:"
pacman -Q linux mkinitcpio systemd openssh dropbear cryptsetup mdadm lvm2 grub 2>&1 | head -15
echo
echo "-- /boot space BEFORE:"
df -h /boot
banner "running pacman -Syyu (with --disable-sandbox for Rescue kernel)"
pacman --disable-sandbox -Syyu --noconfirm
banner "rebuilding initramfs"
mkinitcpio -P
banner "identifying boot disks (members of md0)"
if [ ! -e /dev/md0 ]; then
echo "ERROR: /dev/md0 not present. RAID not assembled? Aborting GRUB step."
exit 1
fi
BOOT_DISKS=()
for part in $(mdadm --detail /dev/md0 2>/dev/null | awk '/active sync/ {print $NF}'); do
disk=$(parent_disk "$part")
[ -z "$disk" ] && { echo "WARN: cannot resolve parent disk for $part"; continue; }
already=0
for d in "${BOOT_DISKS[@]}"; do [ "$d" = "$disk" ] && already=1; done
[ "$already" -eq 0 ] && BOOT_DISKS+=("$disk")
done
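# Without this check the script would skip grub-install silently and still
# report a refreshed boot stack in the summary.
if [ "${#BOOT_DISKS[@]}" -eq 0 ]; then
    echo "ERROR: could not detect any boot disks. Aborting GRUB step."
    exit 1
fi
echo "Boot disks: ${BOOT_DISKS[*]}"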
banner "refreshing GRUB on all boot disks"
for disk in "${BOOT_DISKS[@]}"; do
echo
echo "-- grub-install --target=i386-pc --recheck $disk"
grub-install --target=i386-pc --recheck "$disk"
done
banner "regenerating /boot/grub/grub.cfg"
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tail -10
banner "post-upgrade state"
echo "-- key packages AFTER:"
pacman -Q linux mkinitcpio systemd openssh dropbear cryptsetup mdadm lvm2 grub 2>&1 | head -15
echo
echo "-- /boot space AFTER:"
df -h /boot
banner "summary"
cat <<EOF
System fully upgraded. Boot stack refreshed:
- All packages on current state from Arch repos
- initramfs rebuilt for the current kernel
- GRUB stage1 + core.img re-written on all boot disks
- grub.cfg regenerated
Recommended next steps:
1. (Optional but recommended) Run \`hal use-static-ip <host>\` afterwards to
harden the initramfs network against future DHCP issues.
2. Exit chroot, umount -R /mnt, reboot, disable Rescue in Hetzner Robot.
3. Watch with: hal status <host>
EOF

@@ -0,0 +1,145 @@
"""SSH helpers using OpenSSH ControlMaster for connection reuse.

The `SshSession` context manager opens a single SSH connection on enter
(interactive: password / host key accept happens here once) and then runs
follow-up commands over the same multiplexed channel without re-auth.

We deliberately wrap the OpenSSH client rather than using a library like
paramiko so the user's existing config (~/.ssh/config, agent, key files,
known_hosts) just works.
"""
from __future__ import annotations

import os
import shutil
import socket
import subprocess
import tempfile
import time


def remove_stale_known_hosts(host: str) -> None:
    """Drop any cached host key for `host`.

    Each Hetzner rescue activation generates a fresh host key, so a stale
    entry would otherwise block the connection with a MITM warning.
    """
    known = os.path.expanduser("~/.ssh/known_hosts")
    if not os.path.exists(known):
        return
    subprocess.run(
        ["ssh-keygen", "-f", known, "-R", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def tcp_reachable(host: str, port: int, timeout: float = 3) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, socket.timeout):
        return False


def wait_for_port(host: str, port: int = 22, timeout: int = 300, interval: int = 2) -> bool:
    """Block until host:port accepts TCP or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if tcp_reachable(host, port, timeout=2):
            return True
        time.sleep(interval)
    return False


class SshSession:
    """Persistent SSH connection to one host via OpenSSH ControlMaster.

    Use as a context manager. The master is opened by running a no-op remote
    command during __enter__; this is where interactive prompts (password,
    host key acceptance) happen. Subsequent `run()` calls reuse the cached
    connection.

    Example:
        with SshSession("rescue.example.com") as ssh:
            ssh.run("uname -a")
            ssh.run("cat", input_=b"hello")
            ssh.run("/bin/bash", tty=True)  # interactive shell
    """

    def __init__(self, host: str, user: str = "root"):
        self.host = host
        self.user = user
        self._tmpdir: str | None = None
        self._sock: str | None = None

    # ---- context management -------------------------------------------------

    def __enter__(self) -> "SshSession":
        self._tmpdir = tempfile.mkdtemp(prefix="hal-ssh-")
        self._sock = os.path.join(self._tmpdir, "ctl")
        remove_stale_known_hosts(self.host)
        # Open the master with a quick no-op. Auth (and any TTY prompts) happen
        # right here. After this returns, the socket at self._sock is live and
        # follow-up ssh invocations reusing it skip auth entirely.
        cmd = [
            "ssh",
            "-o", "ControlMaster=auto",
            "-o", f"ControlPath={self._sock}",
            "-o", "ControlPersist=10m",
            "-o", "StrictHostKeyChecking=accept-new",
            "-o", "ServerAliveInterval=30",
            f"{self.user}@{self.host}",
            "true",
        ]
        subprocess.run(cmd, check=True)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if self._sock and os.path.exists(self._sock):
            subprocess.run(
                [
                    "ssh", "-o", f"ControlPath={self._sock}",
                    "-O", "exit", f"{self.user}@{self.host}",
                ],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        if self._tmpdir and os.path.isdir(self._tmpdir):
            shutil.rmtree(self._tmpdir, ignore_errors=True)

    # ---- remote execution ---------------------------------------------------

    def run(
        self,
        remote_cmd: str,
        *,
        tty: bool = False,
        input_: bytes | None = None,
        check: bool = True,
        capture: bool = False,
    ) -> subprocess.CompletedProcess:
        """Run `remote_cmd` on the remote host over the multiplexed channel.

        remote_cmd : Shell command(s) as a single string. Newlines OK; the
                     remote shell parses them as multiple statements.
        tty        : Allocate a remote pseudo-tty (needed for interactive
                     tools like `bash` or things using /dev/tty).
        input_     : Bytes to feed to the remote command's stdin. Mutually
                     exclusive with tty (no terminal if stdin is a pipe).
        check      : Raise CalledProcessError on non-zero exit.
        capture    : Capture stdout/stderr in the returned CompletedProcess
                     instead of inheriting the parent's.
        """
        if tty and input_ is not None:
            raise ValueError("tty=True is incompatible with feeding stdin via input_")
        cmd = ["ssh", "-o", f"ControlPath={self._sock}"]
        if tty:
            cmd += ["-t"]
        cmd += [f"{self.user}@{self.host}", remote_cmd]
        kwargs: dict = {"check": check}
        if input_ is not None:
            kwargs["input"] = input_
        if capture:
            kwargs["capture_output"] = True
        return subprocess.run(cmd, **kwargs)
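
The two ssh command lines that `SshSession` builds (one to open the master, one per multiplexed follow-up command) can be looked at in isolation. The sketch below is a hypothetical factoring, not part of the module: `master_argv` and `run_argv` are assumed names that only assemble the argument lists and never spawn ssh, which makes the multiplexing setup easy to inspect.

```python
from __future__ import annotations


def master_argv(user: str, host: str, sock: str) -> list[str]:
    """Argument list that opens the ControlMaster with a no-op command."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={sock}",
        "-o", "ControlPersist=10m",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", "ServerAliveInterval=30",
        f"{user}@{host}",
        "true",
    ]


def run_argv(
    user: str, host: str, sock: str, remote_cmd: str, tty: bool = False
) -> list[str]:
    """Argument list for a follow-up command reusing the control socket."""
    argv = ["ssh", "-o", f"ControlPath={sock}"]
    if tty:
        argv.append("-t")  # request a remote pseudo-tty
    argv += [f"{user}@{host}", remote_cmd]
    return argv
```

Only the master invocation carries the `ControlMaster`/`ControlPersist` options; follow-up calls need nothing but the same `ControlPath`, which is why they skip authentication.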