I realized a critical detail about my setup: the standard vfs-cache strategy is only a good starting point if the cache is faster than the S3 backend. With this theory in mind, it was time to put it to the test.
Category: Incident Response
ZFS Disaster Recovery: Rebuilding and Mirroring a Pool After Top-Level Vdev Error
I recently learned a hard lesson about ZFS vdev architecture while converting a single-disk pool into a mirror. By mistake, I added the new disk as a top-level vdev instead of attaching it as a mirror. Because both zpool remove and zpool detach failed on the top-level vdev, I was forced to destroy the pool and restore the data from a snapshot.
This post outlines how I recovered the data and then built a proper mirror configuration.
Nextcloud S3 Workaround: Multi-User Rclone Mounts with Systemd Templates
I ran into trouble with Nextcloud’s built-in S3 connector: it corrupted photos during auto-upload from the Android client. Dedicated FUSE tools such as S3FS or Goofys were not ideal either, so I settled on a reliable alternative: letting rclone manage the mounts. This decouples Nextcloud from its unreliable S3 implementation, while rclone handles the underlying object storage.
StrongSwan VPN: Mastering IKEv2 EAP-TLS and ChromeOS Client Integration
StrongSwan is a complete IPsec solution that secures communication between servers and clients through mutual certificate-based authentication and encryption. This guide documents the implementation steps for a highly secure setup, IKEv2 with EAP-TLS authentication, and focuses on the workarounds needed for seamless ChromeOS integration.
Suricata Performance: Resolving eBPF Bypass Failure via Manual Kernel Filter Compilation
Enabling eBPF (extended Berkeley Packet Filter) bypass is one of the most effective steps in Suricata performance tuning. It lets the kernel skip traffic that Suricata has already classified as safe to ignore (e.g., encrypted TLS data) before those packets reach the resource-intensive userspace engine. However, this functionality often does not work out of the box.
I found a bug report confirming that the pre-compiled .bpf files shipped with my distribution were incompatible with the current libbpf library (version > 1.0). Without a successful .bpf load, the kernel bypass mechanism is completely inactive.
Part I: Diagnosis of the Bypass Failure
To confirm the failure, I checked Suricata’s internal statistics via suricatasc. The initial output showed that no eBPF bypass was occurring, even though bypass was enabled in suricata.yaml.
Initial Failure Metrics
The metrics show zero packets being bypassed (ipv4_success: 0):
>>> ebpf-bypassed-stat
Success:
{
    "ens3": {
        "ipv4_fail": 0,
        "ipv4_maps_count": 0,
        "ipv4_success": 0,
        "ipv6_fail": 0,
        "ipv6_maps_count": 0,
        "ipv6_success": 0
    },
    "ens5": {
        "ipv4_fail": 0,
        "ipv4_maps_count": 78,
        "ipv4_success": 0,
        "ipv6_fail": 0,
        "ipv6_maps_count": 0,
        "ipv6_success": 0
    }
}
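When there are several interfaces, checking these counters by eye is tedious. A small script can flag every interface that has never bypassed a packet. The following is a minimal Python sketch. It assumes you paste in the JSON object that suricatasc prints after "Success:". The function name interfaces_without_bypass is hypothetical.

```python
import json

# Sample copied from the ebpf-bypassed-stat output above (before the fix).
EBPF_STAT = """
{
    "ens3": {"ipv4_fail": 0, "ipv4_maps_count": 0, "ipv4_success": 0,
             "ipv6_fail": 0, "ipv6_maps_count": 0, "ipv6_success": 0},
    "ens5": {"ipv4_fail": 0, "ipv4_maps_count": 78, "ipv4_success": 0,
             "ipv6_fail": 0, "ipv6_maps_count": 0, "ipv6_success": 0}
}
"""


def interfaces_without_bypass(stats: dict) -> list:
    """Return the interfaces on which no IPv4 or IPv6 bypass has succeeded yet."""
    return sorted(
        iface
        for iface, counters in stats.items()
        if counters.get("ipv4_success", 0) == 0
        and counters.get("ipv6_success", 0) == 0
    )


if __name__ == "__main__":
    print(interfaces_without_bypass(json.loads(EBPF_STAT)))  # ['ens3', 'ens5']
```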
The interface statistics confirmed the failure (0 bypassed packets). They also revealed a separate checksum issue that needs further attention:
>>> iface-stat ens3
Success:
{
    "bypassed": 0,
    "drop": 0,
    "invalid-checksums": 11510,
    "pkts": 21704175
}
The attempt to load the default .bpf file resulted in a fatal error:
Error: ebpf: Unable to load eBPF objects in '/usr/lib/suricata/ebpf/bypass_filter.bpf': Operation not supported
Part II: Manual Kernel Filter Compilation
The solution is to compile the .bpf files manually from the Suricata source code, building them against the host system’s current libbpf library. This resolves the version incompatibility.
The Compilation Process
I grab the Suricata source code and configure the build process specifically to include eBPF support:
# Install dependencies as explained in Suricata installation documentation
./scripts/bundle.sh
./autogen.sh
./configure --enable-ebpf-build
# Change into the eBPF directory and compile the kernel filters
cd ebpf
make
Deployment
The newly compiled files are copied to the correct path, replacing the broken distribution files.
cp *.bpf /usr/lib/suricata/ebpf/
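Before copying, you can run a quick sanity check to confirm that the freshly built files really are eBPF objects. It reads the ELF header and compares the e_machine field with EM_BPF (247). This is a minimal Python sketch. It is only a plausibility check: it does not prove libbpf compatibility, which only a successful load by Suricata can confirm. The function name is_bpf_object is hypothetical.

```python
import struct
from pathlib import Path

EM_BPF = 247  # e_machine value for eBPF objects (EM_BPF in <elf.h>)


def is_bpf_object(header: bytes) -> bool:
    """Return True if `header` starts with an ELF header whose e_machine is EM_BPF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = {1: "<", 2: ">"}.get(header[5])
    if endian is None:
        return False
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF headers.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine == EM_BPF


if __name__ == "__main__":
    # Assumption: run from the ebpf/ build directory after "make".
    for path in sorted(Path(".").glob("*.bpf")):
        with path.open("rb") as fh:
            ok = is_bpf_object(fh.read(64))
        print(f"{path}: {'eBPF ELF object' if ok else 'NOT an eBPF ELF object'}")
```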
Once the corrected filter is loaded, the logs show success:
Info: ebpf: Successfully loaded eBPF file '/usr/lib/suricata/ebpf/bypass_filter.bpf' on 'ens3'
Info: ebpf: Successfully loaded eBPF file '/usr/lib/suricata/ebpf/bypass_filter.bpf' on 'ens5'
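After a restart, you can check the per-interface result by scanning the Suricata log for the two messages quoted in this post. This is a minimal Python sketch. It assumes the log lines follow exactly the format shown above, which is typically written to /var/log/suricata/suricata.log. The helper name ebpf_load_report is hypothetical.

```python
import re

# Patterns for the two log messages quoted in this post.
LOADED = re.compile(r"Successfully loaded eBPF file '([^']+)' on '([^']+)'")
FAILED = re.compile(r"Unable to load eBPF objects in '([^']+)'")


def ebpf_load_report(lines):
    """Return (interfaces with a loaded filter, .bpf files that failed to load)."""
    loaded, failed = [], []
    for line in lines:
        if m := LOADED.search(line):
            loaded.append(m.group(2))
        elif m := FAILED.search(line):
            failed.append(m.group(1))
    return loaded, failed


if __name__ == "__main__":
    sample = [
        "Error: ebpf: Unable to load eBPF objects in "
        "'/usr/lib/suricata/ebpf/bypass_filter.bpf': Operation not supported",
        "Info: ebpf: Successfully loaded eBPF file "
        "'/usr/lib/suricata/ebpf/bypass_filter.bpf' on 'ens3'",
        "Info: ebpf: Successfully loaded eBPF file "
        "'/usr/lib/suricata/ebpf/bypass_filter.bpf' on 'ens5'",
    ]
    print(ebpf_load_report(sample))
```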
Part III: Verification
The successful load confirms that Suricata now uses the kernel to bypass traffic before it reaches the userspace engine, which should noticeably reduce CPU load.
Final Success Metrics (Post-Compilation)
The metrics now show tens of thousands of successful bypasses per interface, validating the fix:
>>> ebpf-bypassed-stat
Success:
{
    "ens3": {
        "ipv4_fail": 0,
        "ipv4_maps_count": 32,
        "ipv4_success": 32292,
        "ipv6_fail": 0,
        "ipv6_maps_count": 0,
        "ipv6_success": 0
    },
    "ens5": {
        "ipv4_fail": 0,
        "ipv4_maps_count": 78,
        "ipv4_success": 32290,
        "ipv6_fail": 0,
        "ipv6_maps_count": 0,
        "ipv6_success": 0
    }
}
The interface statistics now display the successfully bypassed packets:
>>> iface-stat ens5
Success:
{
    "bypassed": 807883,
    "drop": 0,
    "invalid-checksums": 0,
    "pkts": 316991330
}
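To put the bypassed counter in relation to total traffic, you can compare two iface-stat snapshots. This is a minimal Python sketch. It assumes the counters are cumulative since Suricata started, so the share over a time window is the difference between two snapshots. The zero baseline and the name bypass_share are hypothetical.

```python
# Values copied from the iface-stat ens5 output above.
BEFORE = {"bypassed": 0, "pkts": 0}  # hypothetical baseline snapshot
AFTER = {"bypassed": 807883, "pkts": 316991330}


def bypass_share(before: dict, after: dict) -> float:
    """Fraction of packets bypassed between two iface-stat snapshots."""
    pkts = after["pkts"] - before["pkts"]
    bypassed = after["bypassed"] - before["bypassed"]
    return bypassed / pkts if pkts > 0 else 0.0


if __name__ == "__main__":
    print(f"{bypass_share(BEFORE, AFTER):.2%}")  # 0.25%
```

Measured against the cumulative totals shown above, the share is small. Two snapshots taken a few minutes apart under typical load give a more meaningful picture.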
Note: The initial ens3 statistics showed a high invalid-checksums count (11510). This is a separate, critical issue (often related to offloading) that still needs to be addressed, but the eBPF bypass itself now works.
Sources / See Also
- Suricata Documentation. Working with eBPF and XDP.
https://docs.suricata.io/en/latest/install/ebpf-xdp.html
- Suricata Documentation. Suricata 7 Changelog (Note new policy behavior).
https://suricata.io/changelog/
- Suricata Documentation. FAQ: Traffic gets blocked after upgrading to Suricata 7.
https://suricata-update.readthedocs.io/en/latest/faq.html#my-traffic-gets-blocked-after-upgrading-to-suricata-7
- Libvirt Documentation. VirtIO Device Configuration (Driver Offload Parameters).
https://libvirt.org/formatdomain.html#elementsNICS
- GitHub Repository libbpf. eBPF library source and version compatibility issues.
https://github.com/libbpf/libbpf
- Linux Networking. Understanding the eBPF framework and its application in networking.
https://www.kernel.org/doc/html/latest/networking/filter.html
Suricata IPS: Fixing Legitimate Traffic Drops by Disabling drop-invalid
I encountered a peculiar issue where my WordPress instance was unable to reach wordpress.org, and DokuWiki could not access its plugin repository. All standard network checks (wget, curl, DNS) worked fine, and the standard firewall rules registered no drops.
However, logging revealed a problem deep within the Intrusion Prevention System (IPS) layer.
The Diagnostic: Stream Errors
I noticed an unusually high number of dropped packets related to stream errors in stats.log:
ips.drop_reason.flow_drop | Total | 837
ips.drop_reason.rules | Total | 3398
ips.drop_reason.stream_error | Total | 19347
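To see how the drop reasons compare, you can parse the counter rows of stats.log and compute each reason's share. This is a minimal Python sketch. It assumes the three-column "counter | thread | value" layout shown above and only uses the "Total" rows. When stats.log contains several snapshots, the last one wins. The helper names are hypothetical.

```python
SAMPLE = """\
ips.drop_reason.flow_drop                     | Total                     | 837
ips.drop_reason.rules                         | Total                     | 3398
ips.drop_reason.stream_error                  | Total                     | 19347
"""


def parse_stats(text: str) -> dict:
    """Parse 'counter | Total | value' rows of stats.log into {counter: value}."""
    counters = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip headers, separators and per-thread rows; later snapshots overwrite earlier ones.
        if len(parts) == 3 and parts[1] == "Total" and parts[2].isdigit():
            counters[parts[0]] = int(parts[2])
    return counters


def drop_reason_shares(counters: dict) -> dict:
    """Return each ips.drop_reason.* counter as a fraction of all IPS drops."""
    prefix = "ips.drop_reason."
    reasons = {k[len(prefix):]: v for k, v in counters.items() if k.startswith(prefix)}
    total = sum(reasons.values())
    return {k: v / total for k, v in reasons.items()} if total else {}


if __name__ == "__main__":
    for reason, share in drop_reason_shares(parse_stats(SAMPLE)).items():
        print(f"{reason:15} {share:6.1%}")
```

With the numbers above, stream_error accounts for roughly 82% of all IPS drops. The same parser also picks up the stream.* counters discussed next.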
This indicated that Suricata’s TCP stream engine was classifying legitimate traffic as invalid, which stalled connections before the application layer could proceed. With 19347 packets, stream_error was by far the largest drop reason.
Further investigation into Suricata’s internal statistics revealed details about the nature of the errors:
stream.fin_but_no_session | Total | 12508
stream.rst_but_no_session | Total | 2577
stream.pkt_spurious_retransmission | Total | 14735
These specific counters (FINs/RSTs without an active session, spurious retransmissions) point to common issues in asymmetric routing or session tracking in complex bridged/virtualized environments.
The Workaround: Disabling Strict Stream Enforcement
Based on community discussions about unexpected drops in IPS mode, I tested a key stream configuration setting.
The default setting drop-invalid: yes instructs Suricata to immediately drop packets it deems invalid according to its internal state machine (often due to out-of-sync sequence numbers or timing issues).
The Fix: I set this directive to no.
stream:
  memcap: 64mb
  memcap-policy: ignore
  drop-invalid: no  # Set to 'no' to fix legitimate traffic drops
  checksum-validation: yes
  midstream-policy: ignore
  inline: auto
  reassembly:
As soon as I applied this change, traffic to wordpress.org and the DokuWiki repository resumed normally.
Conclusion: The Security Trade-off
While this workaround immediately solved the connectivity problem, I am consciously accepting a security trade-off. Disabling drop-invalid lets the IPS pass potentially ambiguous or invalid packets.
- Risk: This allows a low-volume attacker to potentially use malformed packets to bypass the stream state-tracking.
- Benefit: It ensures service availability for crucial application updates and connections that the IPS was incorrectly flagging due to subtleties of the virtualization or network environment.
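The following toy model makes the trade-off concrete. It is a deliberately simplified sketch in Python and not Suricata's stream engine: a FIN or RST for a flow the tracker has never seen counts as "invalid", similar to the situations counted by stream.fin_but_no_session and stream.rst_but_no_session. The drop_invalid switch then decides the verdict. The Packet type, the flow-key layout and verdict are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow: tuple  # hypothetical flow key: (src_ip, src_port, dst_ip, dst_port)
    flags: str   # TCP flags as letters, e.g. "S", "SA", "FA", "R"


def verdict(pkt: Packet, sessions: set, drop_invalid: bool) -> str:
    """Toy IPS verdict: a FIN or RST for an untracked flow counts as invalid."""
    if pkt.flags == "S":  # a plain SYN opens a tracked session
        sessions.add(pkt.flow)
        return "pass"
    if pkt.flow not in sessions and ("F" in pkt.flags or "R" in pkt.flags):
        # Roughly the case behind stream.fin_but_no_session / rst_but_no_session.
        return "drop" if drop_invalid else "pass"
    return "pass"


if __name__ == "__main__":
    flow = ("10.0.0.5", 44321, "198.51.100.7", 443)
    late_fin = Packet(flow, "FA")  # FIN arriving after the tracker lost the session
    print(verdict(late_fin, set(), drop_invalid=True))   # drop
    print(verdict(late_fin, set(), drop_invalid=False))  # pass
```

If the session tracker loses state (for example through asymmetric paths), the strict setting drops a legitimate connection teardown. The relaxed setting lets it through, and it equally lets through a crafted packet.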
My next step will be to investigate the root cause of the high stream_error count, to see whether it is caused by a kernel-level configuration or a misaligned network path.
Sources / See Also
- Suricata Documentation. Stream Configuration and Settings (specifically drop-invalid).
https://docs.suricata.io/en/latest/configuration/stream.html
- Suricata Documentation. Understanding and Analyzing the Stats Log.
https://docs.suricata.io/en/latest/output/stats/stats-log.html
- Suricata Documentation. IPS Mode and Traffic Drop Reasons.
https://docs.suricata.io/en/latest/performance/ips-mode.html
- OISF Community Forum. Discussion on high stream errors/spurious retransmissions and network offloading. (This kind of discussion is the primary place to find such workarounds.)
- Linux Manpage: ethtool. Documentation on Network Offloading (TSO, GSO, LRO) which often causes Suricata Stream issues.
Suricata AF-Packet: Resolving VirtIO Non-Functionality via Checksum Offload Disablement
This article documents a two-part process: upgrading Suricata to version 7 on Debian Bookworm, and solving a critical issue that prevented AF-Packet IPS mode from working with high-performance VirtIO NICs in a virtual machine. Without this specific configuration, the IPS did not function at all.
Part I: Suricata 7 Upgrade and Policy Changes
A much newer Suricata version can be installed from Debian’s bookworm-backports repository, which gives access to recent security features and performance enhancements.
The Backports Installation
- Ensure the backports repository is configured in your /etc/apt/sources.list:
deb https://ftp.debian.org/debian/ bookworm-backports contrib main non-free non-free-firmware
- Install Suricata from backports by specifying the target release:
apt-get install -t bookworm-backports suricata
Post-Upgrade Security Alert (Critical)
After upgrading to Suricata 7, you may experience immediate traffic blocking. This is not a bug, but a deliberate change in the application’s default security posture.
- Reason: Suricata 7 introduced new exception policies (settings such as midstream-policy and memcap-policy) that can default to dropping traffic.
- Action: You must review your new suricata.yaml configuration. The recommended approach is to install the new configuration files, compare them with your old setup, and set unwanted policies to ignore.
Reference: This new behavior is explicitly documented in the official Suricata 7 Changelog. Consult the Suricata FAQ for troubleshooting details on blocking issues.
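When comparing the old and new configuration, it helps to list every *-policy setting that is not yet set to ignore. This is a minimal Python sketch based on a plain regular expression rather than a YAML parser. It reports key names without their parent section and skips commented-out lines. The function name policies_not_ignored is hypothetical.

```python
import re

# Matches uncommented "<something>policy: <value>" lines, e.g. "  midstream-policy: drop-flow".
POLICY_LINE = re.compile(
    r"^[ \t]*(?P<key>[\w.-]*policy)[ \t]*:[ \t]*(?P<value>[\w-]+)", re.MULTILINE
)


def policies_not_ignored(yaml_text: str) -> list:
    """Return (key, value) pairs for every *-policy setting not set to 'ignore'."""
    return [
        (m["key"], m["value"])
        for m in POLICY_LINE.finditer(yaml_text)
        if m["value"] != "ignore"
    ]


if __name__ == "__main__":
    sample = """\
exception-policy: auto
stream:
  memcap: 64mb
  memcap-policy: ignore
  midstream-policy: drop-flow
"""
    print(policies_not_ignored(sample))
```

To check a real system, pass it the contents of your suricata.yaml (on Debian usually /etc/suricata/suricata.yaml). Then decide for each reported policy whether to keep it or set it to ignore.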
Part II: The VirtIO and AF-Packet Critical Failure Fix
When using Suricata in IPS mode with the high-performance AF-Packet acquisition method, VirtIO NICs are the preferred choice. However, without a specific Libvirt configuration, the IPS fails to process bridged traffic at all.
The Problematic Default VirtIO Config
If the VirtIO NIC is defined simply with <model type='virtio'/> in the Libvirt XML, AF-Packet fails to initialize or to process traffic correctly.
The Solution: Disabling Guest Checksum Offload
The fix requires overriding the default driver settings by adding a <driver> block and explicitly setting checksum (csum) offloading to off for the guest system.
I found this solution while troubleshooting similar packet loss issues in a thread about XDP drivers in RHEL environments. This suggests a common kernel/driver interaction problem with aggressive offloading features.
The minimal required working Libvirt XML configuration looks like this:
<interface type='bridge'>
  <mac address='..:..:..:..:..:..'/>
  <source bridge='ovs-guests'/>
  <virtualport type='openvswitch'>
  </virtualport>
  <model type='virtio'/>
  <driver name='vhost'>
    <host csum='off' gso='off' tso4='off' tso6='off' ecn='off' ufo='off' mrg_rxbuf='off'/>
    <guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
  </driver>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
Crucial Insight: The key fix is the parameter csum='off' within the <guest/> tag. If checksum offloading is left enabled (csum='on'), bridged traffic does not pass at all.
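To check existing VMs for this setting, you can parse the domain XML and list interfaces that lack <guest csum='off'/>. This is a minimal Python sketch using only the standard library's ElementTree. It assumes you feed it the output of virsh dumpxml for the Suricata VM. The function name interfaces_without_guest_csum_off is hypothetical.

```python
import xml.etree.ElementTree as ET


def interfaces_without_guest_csum_off(domain_xml: str) -> list:
    """Return MAC addresses of <interface> elements lacking <guest csum='off'/>."""
    root = ET.fromstring(domain_xml)
    missing = []
    # iter() also matches the root itself, so a bare <interface> snippet works too.
    for iface in root.iter("interface"):
        guest = iface.find("driver/guest")
        if guest is None or guest.get("csum") != "off":
            mac = iface.find("mac")
            missing.append(mac.get("address") if mac is not None else "<no mac>")
    return missing


if __name__ == "__main__":
    snippet = """
    <interface type='bridge'>
      <mac address='52:54:00:00:00:01'/>
      <model type='virtio'/>
    </interface>"""
    print(interfaces_without_guest_csum_off(snippet))  # ['52:54:00:00:00:01']
```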
Part III: The Deep Dive: Why Checksum Offload Causes Complete Failure
Here is the rationale for why Checksum Offload (CSUM) leads to complete non-functionality:
1. The CSUM Optimization Paradigm (csum='on')
When you set csum='on', you enable a performance optimization aimed at saving CPU cycles:
- The Host/Hypervisor receives packets and passes them to the VirtIO Driver (Vhost).
- The Vhost Driver passes the packets into the VirtIO Ring in the Guest System, but marks them with a special flag in the skb (socket buffer) metadata; in Linux this state is called CHECKSUM_PARTIAL. The flag signals to the Guest Kernel: “Attention, the L3/L4 checksum is invalid/missing and must be corrected or calculated before further processing up the stack.”
- This is a performance trick: the CPU-intensive checksum calculation is delegated to the Guest Kernel, but only when it is truly necessary.
2. The Collision Point: AF-Packet Bypass
Suricata using AF-Packet now bypasses precisely this process:
- AF-Packet is a very low-level packet capture method. It hooks into the kernel right after the driver, at the device layer, and receives raw L2 frames before the IP and TCP/UDP layers process them.
- Suricata receives the packet at a point before the standard kernel stack has performed the checksum finalization.
- Suricata’s Deep Packet Inspection (DPI) engine relies on the integrity of the Layer 3/Layer 4 headers (e.g., to check the TCP segment length, track the TCP state machine, or evaluate the validity of IP headers).
- The Non-Functionality: Since Suricata receives a packet whose checksum has not been finalized yet, it interprets this not as an optimization instruction, but as a critical error in the packet itself (corrupted packet).
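The effect can be reproduced with the Internet checksum from RFC 1071, the ones' complement sum used by IPv4, TCP and UDP. A receiver that validates a header containing a finished checksum gets 0. A header whose checksum field was never finalized fails the check. This is a minimal Python sketch. The IPv4 header is only a convenient, well-known test vector. In the VirtIO case, the unfinalized field is the L4 (TCP/UDP) checksum, which holds only a partial value instead of 0.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header that includes a correct checksum field sums to 0."""
    return internet_checksum(header) == 0


if __name__ == "__main__":
    # IPv4 header with its checksum field (bytes 10-11) still zero, i.e. not finalized.
    unfinalized = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    csum = internet_checksum(unfinalized)
    finalized = unfinalized[:10] + csum.to_bytes(2, "big") + unfinalized[12:]
    print(hex(csum))                     # 0xb861
    print(header_is_valid(unfinalized))  # False: looks corrupted to a validator
    print(header_is_valid(finalized))    # True
```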
3. The Resolution (csum='off')
By explicitly setting <guest csum='off'>, we force the Host/Vhost Driver to deliver the packets to the Guest as if they were ‘normal’ Ethernet frames that already contain all checksums. Suricata therefore only sees complete, consistent packets and can apply the DPI logic without error.
Sources / See Also
- Suricata Documentation. Suricata 7 Changelog (Note new policy behavior).
https://suricata.io/changelog/
- Suricata Documentation. FAQ: Traffic gets blocked after upgrading to Suricata 7.
https://suricata-update.readthedocs.io/en/latest/faq.html#my-traffic-gets-blocked-after-upgrading-to-suricata-7
- Suricata Documentation. Working with AF-Packet.
https://docs.suricata.io/en/latest/install/af-packet.html
- Libvirt Documentation. VirtIO Device Configuration (Driver Offload Parameters).
https://libvirt.org/formatdomain.html#elementsNICS
- Debian Wiki. Instructions for using Debian Backports.
https://wiki.debian.org/Backports
- Suricata Community Forums. Troubleshooting references for XDP/Packet Loss (Context for driver tuning).
https://forum.suricata.io/
- Linux Networking. Understanding the Checksum Offload Mechanism.
https://www.kernel.org/doc/Documentation/networking/checksum-offloads.txt
Automated Defense: Building a Central Log Hub for Fail2ban and External Firewall Integration
A very lightweight and efficient way to consolidate logs centrally is rsyslog. My virtual machines all use rsyslog to forward their logs to a dedicated internal virtual machine that acts as the central log hub. A fail2ban instance on this hub checks all incoming logs and sends block commands to an external firewall, which automates the defense against attackers.
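The ban decision on such a hub boils down to counting failures per source IP within a time window. The following toy model in Python illustrates that idea. It is not fail2ban's actual implementation. The parameter names maxretry and findtime mirror fail2ban's jail options. The class name BanTracker is hypothetical, and in the real setup a ban would trigger the command sent to the external firewall.

```python
from collections import defaultdict, deque


class BanTracker:
    """Toy sliding-window ban logic in the spirit of fail2ban's maxretry/findtime."""

    def __init__(self, maxretry: int = 5, findtime: float = 600.0):
        self.maxretry = maxretry
        self.findtime = findtime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip: str, ts: float) -> bool:
        """Record a failed attempt; return True only when the IP gets newly banned."""
        if ip in self.banned:
            return False
        window = self.failures[ip]
        window.append(ts)
        # Forget failures older than findtime seconds.
        while window and ts - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned.add(ip)
            window.clear()
            return True  # here the hub would send the block command to the firewall
        return False


if __name__ == "__main__":
    tracker = BanTracker(maxretry=3, findtime=60)
    for t in (0, 10, 20):
        print(tracker.record_failure("203.0.113.9", t))  # False, False, True
```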
Nextcloud Client on Chromebook (ARM/aarch64): Solving Two-Way Sync
A short explanation of how to get the Nextcloud Linux desktop client working reliably on a Chromebook. This is necessary because the official Nextcloud Android client does not offer true two-way synchronization, a critical feature for managing files across systems.
Nextcloud and MinIO Integration: Why Direct S3 Fails and the Filesystem Abstraction Workaround
MinIO is a fantastic object storage solution, and I intended to use my distributed MinIO system as the primary external storage for Nextcloud. This distributed setup, which uses Sidekick as a load balancer for seamless node access, worked in principle but revealed a critical stability flaw, particularly with mobile uploads.