WIP: Implement firewall rules #32
Reference
albert/shepherd-launcher!32
Fixes #4
Replaces the Wayland-required shell-based firewall test (scripts/integration-tests/test-firewall.sh + run-activity.sh) with two #[ignore] tests in crates/shepherd-e2e/tests/firewall.rs that run in CI under the existing e2e job:

- firewall_unsupported_path_runs_activity: points SHEPHERD_FIREWALL_HELPER at a nonexistent path to force the probe to Unsupported, then verifies activities with [entries.firewall] still launch (regression guard against the silent-no-op bug fixed in the "Make firewall enforcement failures explicit" commit).
- firewall_supported_path_invokes_helper_with_expected_argv: drops stub pkcheck/pkexec/shepherd-firewall-helper executables in PATH so the chain runs unprivileged in CI, then asserts the recorded helper argv contains apply-process, --scope-name, --uid/--gid, --default deny, --allow 127.0.0.0/8, --allow ::1/128, and the activity command after `--`.

Adds HarnessBuilder::shepherdd_env so a test can inject env vars (SHEPHERD_FIREWALL_HELPER + an augmented PATH) past the harness's env_clear(). No other test changes.

Real BPF cgroup filter enforcement still requires CAP_NET_ADMIN, the system systemd manager, and a working polkit, none of which exist in CI. Manual validation continues to use ./scripts/integration-tests/setup-firewall-dev.sh.

For hosts that have run setup-firewall-dev.sh and re-logged in: launches an activity through the real shepherd-firewall-helper / pkexec / system systemd-run chain, has a probe script connect to one allowed and one denied TCP target via bash /dev/tcp, and asserts the BPF address filter is actually attached (allow=OPEN, deny=BLOCKED).

- crates/shepherd-e2e/tests/firewall_real.rs is a separate test binary so it can run in isolation. It self-skips with [SKIP] on hosts where the helper isn't installed or polkit doesn't grant, so CI's --include-ignored sweep stays a no-op pass.
- scripts/integration-tests/run-firewall-probe.sh is the inside-activity probe (atomic log write so the orchestrator never reads a partial file).
- scripts/integration-tests/test-firewall.sh pre-checks preconditions, builds, then execs the cargo test with --nocapture.

Verified on a configured host: allow=OPEN, deny=BLOCKED, real BPF filter attached. This is the regression guard against shepherdd ever drifting back to the silent-no-op `systemd-run --user --scope` path.

Replaces the silent-no-op `systemctl --user --runtime set-property` path (per-user systemd lacks CAP_NET_ADMIN/CAP_BPF, can't attach cgroup_skb) with a new `apply-cgroup` helper subcommand that loads a cgroup_skb/egress program via aya and attaches it directly to the runtime's scope cgroup using the legacy bpf(BPF_PROG_ATTACH) syscall (so the program persists past helper exit until cgroup destruction).

- crates/shepherd-firewall-bpf/ is a new out-of-workspace crate targeting bpfel-unknown-none on nightly. ~110 lines: 4 LPM tries (v4/v6 × allow/deny), a DEFAULT array, one cgroup_skb program with deny-then-allow-then-default match order matching systemd's.
- shepherd-firewall-helper grows aya as a dep + build.rs that compiles the BPF crate via `rustup run nightly cargo build` (env scrubbed so the parent cargo doesn't override the BPF crate's pinned toolchain). The .o is embedded via include_bytes!.
- The new apply-cgroup subcommand validates the cgroup path is under /sys/fs/cgroup/user.slice/user-<PKEXEC_UID>.slice/user@<UID>.service/, loads + populates maps via aya, then BPF_PROG_ATTACH manually so we get the legacy "attach lives until cgroup dies" semantics instead of aya's link-fd-bound attach.
- adapter.rs's apply_firewall_to_existing_scope now invokes the new subcommand via pkexec.
- New manual test crates/shepherd-e2e/tests/firewall_real_snap.rs + scripts/integration-tests/test-firewall-snap.sh that `snap try --classic`-installs a tiny probe snap, launches it through the API, and asserts allow=OPEN deny=BLOCKED.

Verified on a configured host: real BPF filter attached to the snap's runtime-created scope cgroup, loopback reachable, 8.8.8.8:53 dropped. Caveats documented in docs/ai/history/2026-05-02 004 firewall snap bpf via aya.md, notably: CI doesn't yet have the bpf-linker / nightly / LLVM-dev toolchain so this build won't pass CI as-is; flatpak parity is still pending.

`ip route` instead of strtonum to find the DinD gateway

`firewall_real.rs` exercises the full process-firewall enforcement chain: shepherdd launches a [entries.kind=process] activity through the *system* systemd manager, pkexecs the privileged helper, the helper attaches a cgroup_skb BPF program to the activity's scope, and a probe inside the scope confirms loopback succeeds while 8.8.8.8:53 is dropped.

Until now the test self-skipped in CI because the prerequisites (systemd as PID1, polkit + dbus running, the helper installed setuid'd, root in the shepherd-firewall group, --privileged + --cgroupns=host for cgroup_skb attach) weren't there.

Two pieces:

- .ci/Dockerfile: ~70 MB layer for systemd + systemd-sysv + dbus + polkit + sudo, plus a `systemctl mask` pass for the units that fail noisily inside a container (udev, resolved, networkd, NetworkManager, getty, etc.). Other jobs override the entrypoint and don't boot systemd, so they're unaffected.
- .github/workflows/ci.yml: new `firewall` job. Rather than ask the user to flip `container.privileged: true` globally on the runner (which would erode isolation for every other job), the job stays a regular non-privileged Forgejo job and uses the dind sidecar it already talks to to launch its *own* private privileged container with `--entrypoint /sbin/init`.
The workspace copies in via `tar | docker exec` (the runner's job container and the dind daemon don't share a filesystem, so plain --volume mounts the wrong path), then a single `docker exec` runs `cargo build` of the helper, the project's `scripts/shepherd install firewall --debug` to drop the helper + polkit assets + group, `usermod -aG shepherd-firewall root`, and `sg shepherd-firewall -c "cargo test ... firewall_real"`. The sidecar is torn down on job exit via trap.

Snap and flatpak variants stay manual: snapd doesn't run reliably in containers, and the snap/flatpak adapter code is a thin wrapper over the same primitive this test exercises, so a green process test catches ~all of the same regressions.

Doc records the design choice (privileged sidecar vs. global runner privileged), the image growth, and a follow-up to add a target/ cache once things land green.

Prior to merge, this looks like it needs some more manual validation and a README mention (though maybe that should wait until the browser integration #10)
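As a reading aid for the match order the shepherd-firewall-bpf commit describes (deny first, then allow, then the default), here is a minimal userspace sketch in plain Rust. It is not the BPF program from this PR: the real crate is described as using LPM trie maps and a DEFAULT array inside a cgroup_skb program, whereas this model scans plain vectors and covers IPv4 only. The names `Verdict`, `Cidr4`, `Policy`, and `loopback_only` are hypothetical and exist only for illustration.

```rust
use std::net::Ipv4Addr;

/// Outcome for one packet. Hypothetical name, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
}

/// An IPv4 CIDR prefix such as 127.0.0.0/8. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct Cidr4 {
    pub network: Ipv4Addr,
    pub prefix_len: u8,
}

impl Cidr4 {
    /// True when `addr` falls inside this prefix.
    pub fn contains(&self, addr: Ipv4Addr) -> bool {
        let len = u32::from(self.prefix_len.min(32));
        // A /0 prefix matches everything; shifting a u32 by 32 would overflow.
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        (u32::from(addr) & mask) == (u32::from(self.network) & mask)
    }
}

/// Userspace stand-in for the allow/deny maps plus the default verdict.
pub struct Policy {
    pub allow: Vec<Cidr4>,
    pub deny: Vec<Cidr4>,
    pub default: Verdict,
}

impl Policy {
    /// Deny-then-allow-then-default, as the commit message states it.
    pub fn decide(&self, dst: Ipv4Addr) -> Verdict {
        if self.deny.iter().any(|c| c.contains(dst)) {
            return Verdict::Deny;
        }
        if self.allow.iter().any(|c| c.contains(dst)) {
            return Verdict::Allow;
        }
        self.default
    }
}

/// Policy modelled on the helper flags `--default deny --allow 127.0.0.0/8`.
pub fn loopback_only() -> Policy {
    Policy {
        allow: vec![Cidr4 { network: Ipv4Addr::new(127, 0, 0, 0), prefix_len: 8 }],
        deny: Vec::new(),
        default: Verdict::Deny,
    }
}
```

Under this model, `loopback_only().decide(...)` allows 127.0.0.1 and denies 8.8.8.8, which corresponds to the allow=OPEN / deny=BLOCKED result the manual tests report. Because deny is checked first, a deny entry wins over an overlapping allow entry regardless of prefix length.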
Looks like we might be missing something in the Rust setup -- I'm seeing the following when building on a fresh machine: