KeyDB Performance: Switching from Redis to a Multi-Threaded Drop-in Alternative

I just read a few pages with benchmarks about Redis vs. KeyDB, and my curiosity was piqued. KeyDB, a multi-threaded fork of Redis, promised significant performance and memory usage improvements. This guide shows how I switched three different Redis use cases to KeyDB.


A Note on Risks and Support

What I show here is not officially recommended by the developers of Nextcloud or Paperless-NGX. KeyDB is designed to be a drop-in replacement, but conflicts can occur. I highly recommend you know what you are doing, thoroughly test the migration, and expect potential operational challenges.

Part I: Installation and Preparation

The installation uses a dedicated repository to ensure the latest version is available.

# Add the KeyDB repository and key
echo "deb https://download.keydb.dev/open-source-dist $(lsb_release -sc) main" | tee /etc/apt/sources.list.d/keydb.list
wget -O /etc/apt/trusted.gpg.d/keydb.gpg https://download.keydb.dev/open-source-dist/keyring.gpg
apt-get update
apt-get install keydb

Part II: Switching Multi-Instance Services (systemd)

I had previously configured my multi-instance setup using systemd unit templates (e.g., the redis-server@amavis instance with its per-instance redis-amavis.conf). Both Redis and KeyDB ship with comparable template units (redis-server@ and keydb-server@), which makes the switch simple.

Migration Procedure

The migration involves copying the old Redis configuration, adapting it for KeyDB, and switching the Systemd service.

  1. Copy and Adapt Configuration: I copy the old Redis configuration and use sed (or a text editor) to change paths and database directories; a sed sketch follows this procedure.
cp /etc/redis/redis-amavis.conf /etc/keydb/keydb-amavis.conf

# Example of the final KeyDB configuration
# cat /etc/keydb/keydb-amavis.conf 
include /etc/keydb/keydb.conf

port 6377

pidfile /var/run/keydb-amavis/keydb-server.pid
logfile /var/log/keydb/keydb-server-amavis.log
dbfilename amavis.rdb
dir /var/lib/keydb

maxmemory 300M
  2. Switch the systemd Service: I stop the old service and enable the new keydb-server@ unit.
systemctl disable redis-server@amavis
systemctl stop redis-server@amavis
systemctl enable keydb-server@amavis
systemctl start keydb-server@amavis
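
If you want to script the adaptation from step 1, a blunt global substitution gets you most of the way; review the result against the example configuration above before starting the service:

# Rough sketch: rewrite redis paths to their keydb counterparts
sed -i 's/redis/keydb/g' /etc/keydb/keydb-amavis.conf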

This was a successful drop-in replacement for my Amavis and SpamAssassin instances.

Part III: Switching Application Backends

1. Nextcloud (Unix Socket Integration)

My Nextcloud setup relies on a local Unix Socket for communication, which is faster and safer than TCP over localhost. The switch requires changing the socket path in two places:

  1. KeyDB Configuration: Set KeyDB to listen on the Unix socket and adjust permissions.
# /etc/keydb/keydb.conf
port 0
unixsocket /var/run/keydb/keydb-server.sock
unixsocketperm 770
  2. Application Configuration: Update config.php and php.ini to point to the new socket path.
# Nextcloud config.php excerpt
'host' => '/var/run/keydb/keydb-server.sock',

# PHP.ini for session storage
session.save_path = "unix:///var/run/keydb/keydb-server.sock"
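
Before pointing Nextcloud at the socket, a quick ping confirms that KeyDB is actually listening there (keydb-cli is the bundled, redis-cli-compatible client). Keep in mind that with unixsocketperm 770 the web server user (typically www-data) must be a member of the keydb group to open the socket.

keydb-cli -s /var/run/keydb/keydb-server.sock ping
# Expected reply: PONG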

2. Paperless-NGX (Docker Compose)

Switching the caching and broker service to KeyDB in the Docker Compose stack was the easiest migration.

  1. Update docker-compose.yml: I changed the image: tag to the KeyDB image (eqalpha/keydb:latest) and renamed the volume.
# docker-compose.yml (Excerpt)

services:
  broker:
    image: eqalpha/keydb:latest
    restart: unless-stopped
    volumes:
      - keydbdata:/data

volumes:
  ...
  keydbdata: # Renamed from redisdata
  2. Execute the Update: Running docker compose down, pull, and up -d (shown below) replaced the backend, and the stateless Paperless services picked it up immediately.
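
For reference, the sequence is nothing more exotic than this, run from the directory containing docker-compose.yml:

docker compose down
docker compose pull
docker compose up -d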

Conclusion

KeyDB provided a successful drop-in replacement for Redis across all my use cases (Systemd, Nextcloud, Docker Compose). It’s a compelling alternative for users seeking multi-threaded performance gains without changing their application logic.

Sources / See Also

  1. KeyDB Documentation. Official Installation and Configuration Guides. https://docs.keydb.dev/
  2. KeyDB Documentation. KeyDB vs Redis Benchmarks (Multi-Threading). https://docs.keydb.dev/blog/2021/01/18/keydb-benchmarks
  3. Redis Documentation. Redis persistence explained. https://redis.io/topics/persistence/
  4. systemd Documentation. Using Templates and Instances (Service Management). https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Templates
  5. EQ Alpha Labs. Technical blog posts on KeyDB’s architecture and multi-threading. https://docs.keydb.dev/blog/

Container Storage Architecture: Deploying Rclone S3 Mounts via Docker Volume Plugin

I rely on S3 for central storage. Since some tools do not support native S3 yet, I use rclone. This article details how I implement a persistent S3 mount directly into a Docker container (paperless-ngx) using the rclone Docker Volume Plugin, which is a superior method to traditional host-level mounts.

Part I: Plugin Setup and Secure Credentials

1. Installation of the Volume Plugin

The key to this architecture is the Docker Volume Plugin, which allows Docker to manage the entire mount process, including the underlying FUSE execution, transparently. This is cleaner than managing FUSE mounts via the host’s fstab.

# Install FUSE dependency
apt-get -y install fuse

# Create the necessary config directory for the plugin
mkdir -p /var/lib/docker-plugins/rclone/config

# Install and activate the rclone Docker Volume Plugin
docker plugin install rclone/docker-volume-rclone:amd64 args="-v" --alias rclone --grant-all-permissions
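
A quick check confirms the plugin is installed and enabled under its alias; the output should list rclone:latest with ENABLED set to true:

docker plugin ls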

2. Rclone Configuration

The rclone.conf must be placed in a directory accessible to the plugin and defines the S3 endpoint and access keys.

# /var/lib/docker-plugins/rclone/config/rclone.conf

[minio]
type = s3
region = somewhere-over-the-rainbow
endpoint = https://your-s3:9000
provider = Minio
env_auth = false
access_key_id = ...
secret_access_key = ...
acl = bucket-owner-full-control
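
Before wiring it into Docker, it is worth verifying that this exact config file can reach the S3 endpoint. A quick sanity check with the standalone rclone binary (if it is installed on the host) lists the available buckets:

rclone --config /var/lib/docker-plugins/rclone/config/rclone.conf lsd minio: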

Part II: Volume Architecture and Data Migration

1. Modifying docker-compose.yml (The Architecture)

The goal is to replace the old, local media volume with a new named volume (s3) that uses the rclone driver. This requires two critical changes in docker-compose.yml.

1. Define the Volume Driver (End of File): Define the named volume s3 and specify its driver and driver options.

volumes:
  data:
  #media: <-- COMMENT THIS OUT
  pgdata:
  redisdata:
  s3:
    driver: rclone
    driver_opts:
      remote: "minio:paperless-jean"
      allow_other: "true"
      vfs_cache_mode: "full" # Critical for consistency

Note: The remote value “minio:paperless-jean” references the configuration section [minio] in rclone.conf and the target bucket name (paperless-jean).
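
The same driver options can be tried outside Compose first with a throwaway named volume (rclone-test here is just an example name), reusing the bucket from the configuration above:

docker volume create -d rclone \
  -o remote=minio:paperless-jean \
  -o allow_other=true \
  -o vfs_cache_mode=full \
  rclone-test

docker run --rm -v rclone-test:/mnt alpine ls /mnt
docker volume rm rclone-test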

2. Update the Mountpoint: In the webserver service definition, the old local mount (media) must be commented out, and the new s3 volume must be assigned the correct path (/usr/src/paperless/media/documents).

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    volumes:
      - s3:/usr/src/paperless/media/documents
      - data:/usr/src/paperless/data 
      #- media:/usr/src/paperless/media <-- REMOVE THIS LINE
      ...

2. Initial Data Migration

After updating docker-compose.yml and running docker compose pull / up, the new s3 volume will be created and mounted. Since the old documents are not yet in S3, a migration is required.

The Strategy: Export all existing documents from the local database and immediately import them back into the new S3-backed volume.

# 1. Export documents from the local database
docker compose exec -T webserver document_exporter -d -c ../export

# 2. Re-import into the new S3-backed volume
docker compose exec -T webserver document_importer ../export

Verification: The log output confirms the successful migration, showing the number of objects copied to the MinIO backend.

Checking the manifest
Installed 1299 object(s) from 1 fixture(s)
Copy files into paperless...
100%|██████████| 1020/1020 [00:56<00:00, 17.96it/s]

Sources / See Also

  1. Rclone Documentation. Mount Options and Usage (VFS Cache Modes). https://rclone.org/commands/rclone_mount/
  2. Nextcloud Documentation. External Storage Configuration (S3 as Primary Storage). https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage/s3.html
  3. FUSE Project Documentation. Understanding FUSE Filesystems and Permissions (allow-other, umask). https://github.com/libfuse/libfuse
  4. systemd Documentation. Using Templates and Instances (rclone@.service). https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Templates
  5. MinIO Documentation. Reference Guide for S3 Configuration and Endpoints. https://min.io/docs/minio/linux/deployment/distributed-deployment/

Nextcloud S3 Workaround: Multi-User Rclone Mounts with Systemd Templates

I experienced trouble with Nextcloud’s built-in S3 connector, as it would corrupt photos during auto-upload from the Android client. Since dedicated FUSE tools like s3fs and Goofys were not ideal either, I decided on a reliable alternative: using rclone to manage the mounts. This strategy lets me decouple the unreliable Nextcloud S3 implementation from the underlying object storage.

Note: While a native S3 implementation is preferable, this method abstracts the object storage as a block-like device. This compromise is necessary for stability.

Part I: Rclone Setup and S3 Configuration

1. Installation and Credentials

Installing rclone is straightforward. The challenge lies in secure credential management and defining the S3 endpoint.



apt-get install rclone
mkdir /etc/rclone

# Example rclone.conf for MinIO credentials (stored per user as /etc/rclone/rclone-<user>.conf, matching the systemd template below)
[minio]
type = s3
region = somewhere-over-the-rainbow
endpoint = http://127.0.0.1:9000
provider = Minio 
env_auth = false
access_key_id = ...
secret_access_key = ...
acl = bucket-owner-full-control

2. The Mounting Challenge

Mounting the entire Nextcloud data directory (/var/www/nextcloud/data) to S3 is suboptimal, as folders like appdata_... and updater often contain volatile data that should not reside on object storage. The cleanest solution is to mount the S3 bucket specifically to the user’s files directory (/var/www/nextcloud/data/user/files), which excludes the trashbin and appdata folders.

Part II: Multi-Instance Management with systemd

To manage mounts for multiple users or multiple S3 buckets without redundant service files, I implemented a systemd multi-instance unit template. This demonstrates efficient Configuration Management and scalable deployment.

Unit Definition (rclone@.service)

The unit template uses the %i placeholder, which corresponds to the username (e.g., rclone@jean).

[Unit]
Description=rclone - s3 mount for nextcloud %i data
Documentation=https://rclone.org/ man:rclone(1)
AssertPathExists=/etc/rclone/rclone-%i.conf
RequiresMountsFor=/var/www/nextcloud/data
Before=nginx.service
After=network-online.target
Wants=network-online.target

Note: The RequiresMountsFor and Before=nginx.service directives are crucial for guaranteeing that the mount point is ready before the webserver attempts to serve files.

Service Execution

The service uses the %i placeholder to reference the correct configuration and the user-specific mount path (/var/www/nextcloud/data/%i/files). The environment variable RCLONE_CONFIG ensures the service loads the correct credential file.

[Service]
Type=notify
Environment=RCLONE_CONFIG=/etc/rclone/rclone-%i.conf
ExecStart=/usr/bin/rclone \
    mount minio:nextcloud-%i /var/www/nextcloud/data/%i/files \
    --allow-other \
    --vfs-cache-mode writes \
    --log-level INFO \
    --log-file /var/log/rclone/rclone-%i.log \
    --umask 002
ExecStop=/bin/fusermount -uz /var/www/nextcloud/data/%i/files
Restart=on-failure
User=www-data
Group=www-data
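
Two details the snippets above rely on but do not show: systemctl enable needs an [Install] section in the template, and running rclone with --allow-other as www-data requires user_allow_other in /etc/fuse.conf. A minimal sketch of both:

# added to the rclone@.service template (needed for "systemctl enable rclone@<user>")
[Install]
WantedBy=multi-user.target

# /etc/fuse.conf (needed for --allow-other when the mount runs as a non-root user)
user_allow_other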

Part III: Cache Tuning and Verification

VFS Cache and Performance Tuning

I am still experimenting with the optimal mount options, but these settings have proven stable and reasonably fast. The VFS cache settings are critical for preventing corruption and for bridging the gap between POSIX file semantics and object storage.

   # --vfs-cache-mode writes is recommended for Nextcloud so writes are handled reliably
   --allow-other \
   --vfs-cache-mode writes \
   --umask 002 \
   --use-server-modtime \
   --transfers 16 \
   --vfs-fast-fingerprint \
   --vfs-cache-max-age 168h \
   --vfs-cache-max-size 15G

Verification and Observability

Verification involves ensuring the mount is persistent and checking the rclone status log.

systemctl enable rclone@jean
systemctl start rclone@jean

# Check status after a few hours
root@nc:~# systemctl status rclone@jean
...
     Active: active (running) since Sun 2024-04-07 04:30:55 CEST; 13h ago
       Docs: https://rclone.org/
             man:rclone(1)
   Main PID: 4105 (rclone)
     Status: "[17:53] vfs cache: objects 21 (was 21) in use 0, to upload 0, uploading 0, total size 895.986Mi (was 895.986Mi)"
...

The log confirms that the mount is active and the VFS cache is managing objects correctly.
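
A simple end-to-end test is to write a file through the mount as the web server user and check that it shows up in the bucket (the file name here is arbitrary, and the object may appear with a short delay because of the write-back VFS cache):

sudo -u www-data touch /var/www/nextcloud/data/jean/files/rclone-mount-test.txt
rclone --config /etc/rclone/rclone-jean.conf ls minio:nextcloud-jean | grep rclone-mount-test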

Sources / See Also

  1. Rclone Documentation. Mount Options and Usage (VFS Cache Modes). https://rclone.org/commands/rclone_mount/
  2. Nextcloud Documentation. External Storage Configuration (S3 as Primary Storage). https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage/s3.html
  3. FUSE Project Documentation. Understanding FUSE Filesystems and Permissions (allow-other, umask). https://github.com/libfuse/libfuse
  4. systemd Documentation. Using Templates and Instances (rclone@.service). https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Templates
  5. MinIO Documentation. Reference Guide for S3 Configuration and Endpoints. https://min.io/docs/minio/linux/reference/minio-cli/minio-mc-admin-config.html

Docker Update Automation: Advanced Bash Pipelining for paperless-ngx

This article documents a reliable update script for the Paperless-NGX stack, which minimizes the risk of container failures during automated maintenance. The focus here is not just on simple automation, but on ensuring the integrity of the process—especially handling logs and exit codes within complex Bash pipelines.

Part I: Defining the Problem (The Log and Exit Code Dilemma)

The initial simple script worked, but it suffered from two critical flaws that make it unsuitable for production cron jobs:

  1. Inaccurate Timestamps: The logged start and end times were identical, because the $DATE variable was defined only once, at the start of the script.
  2. Broken Exit Codes (The Fatal Flaw): Commands inside a pipe (|) run in subshells. If docker compose down fails, the pipeline’s exit code ($?) reflects only the status of the final command (e.g., while read), hiding the initial failure. This means the script might proceed with docker compose pull even if the service failed to stop.

Part II: Solution – Hardening the Bash Pipeline

To create a production-ready script, I implement advanced Bash features to guarantee reliable command execution and accurate logging.

1. The wlog Function (Adding Timestamps and Centralizing Output)

The wlog function is introduced to wrap commands, timestamp the output of every line, and consolidate stdout and stderr (2>&1), enabling central logging.

wlog () {
  # Run the given command, merge stderr into stdout, and prefix every output line with a timestamp
  "$@" 2>&1 | while read -r l; do echo "$(date): $l"; done
}

2. Resolving Exit Codes and Pipeline Integrity

The failure of the initial script to correctly capture the exit code is solved by enabling two shell options (lastpipe requires Bash 4.2 or newer; pipefail has been available much longer):

# Required for reliable pipelines
shopt -s lastpipe
shopt -so pipefail

  • shopt -s lastpipe: Runs the last segment of the pipe (the while read loop) in the current shell instead of a subshell, so the loop is not detached from the rest of the script.
  • shopt -so pipefail: Makes the pipeline’s exit code that of the first command that failed (critical for safe automation).
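
A tiny test script makes the difference visible; without the two options, the pipeline below would report success even though its first command failed:

#!/bin/bash
shopt -s lastpipe
shopt -so pipefail
# 'false' fails, the while loop succeeds: pipefail makes the failure win
false | while read -r l; do :; done
echo "pipeline exit code: $?"   # prints 1 (would print 0 without pipefail)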

Part III: The Final Automation Script

The final script applies these techniques, ensuring that docker compose pull only executes if docker compose down was successful (&& operator).

#!/bin/bash

set -e
shopt -s lastpipe
shopt -so pipefail

PDIR=/opt/paperless/paperless-ngx
LOG=/opt/paperless/docker-compose-cron.log

wlog () {
  # Same helper as above: timestamp every line of the command's combined output
  "$@" 2>&1 | while read -r l; do echo "$(date): $l"; done
}

wlog echo "Starting Docker Compose Update" >> $LOG
cd $PDIR

# 1. Stop and pull only if successful
wlog /usr/bin/docker compose down >> $LOG && wlog /usr/bin/docker compose pull >> $LOG

# 2. Start all containers
wlog /usr/bin/docker compose up --wait -d >> $LOG

wlog echo "Finished Docker Compose Update" >> $LOG

Verification (Log Output)

The log output now provides precise timestamps for every step of the Docker Compose operation, fulfilling the Observability requirement.

Wed Feb 21 21:29:45 CET 2024: Container paperless-webserver-1  Stopping
Wed Feb 21 21:29:53 CET 2024: Container paperless-webserver-1  Stopped
...
Wed Feb 21 21:30:03 CET 2024: Network paperless_default  Removed
...

Part IV: Conclusion and Alternatives

This solution provides reliable automation using pure Bash. However, be aware that solutions like Docker Watchtower may offer a simpler, container-native approach if complex exit code logic is not required.

Sources / See Also

  1. GNU Bash Reference Manual. Shell Options for Pipeline Management (shopt -s lastpipe, pipefail). https://www.gnu.org/software/bash/manual/bash.html
  2. Docker Documentation. Docker Compose Upgrade and Maintenance. https://docs.docker.com/compose/compose-file/08-upgrade/
  3. Docker Documentation. Reference for Docker Compose CLI commands (down, pull, up). https://docs.docker.com/compose/reference/overview/
  4. Linux Manpage: date. Usage of the date command for precise timestamping.
  5. Linux Manpage: cron. Syntax and execution environment for automated job scheduling.

Paperless-NGX Maintenance: Routine Updates and Major Stack Upgrades

This article documents the process for updating and upgrading the Paperless-NGX stack. This covers everything from simple container image updates to complex major version upgrades of backend services like PostgreSQL.

Part I: Routine Maintenance

Updating Paperless-NGX itself and its stateless dependencies is simple. I installed it to /opt/paperless, so I always execute the following commands under the dedicated unprivileged user.

Stopping, Pulling, and Restarting

The process involves stopping all containers, pulling new images, and bringing the stack back up.

# 1. Stop all containers
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose down
[+] Running 6/6
 ✔ Container paperless-webserver-1  Removed                              6.2s 
 ✔ Container paperless-db-1         Removed                              0.3s 
... (remaining container removal output)
 ✔ Network paperless_default        Removed                              0.2s 

# 2. Update/Pull latest images
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose pull   
[+] Pulling 15/15
 ✔ db Pulled                                                             1.0s 
... (Detailed output confirming image downloads)
 ✔ gotenberg 10 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled          48.6s 
...

# 3. Starting the stack (The -d detaches the process)
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose up -d
[+] Running 5/5
 ✔ Container paperless-db-1         Started                              0.5s 
... (Output confirming all services started)

Part II: Major Version Upgrades

Upgrading major services (e.g., PostgreSQL or Gotenberg) requires changing the container image tag in docker-compose.yml and executing a specific data migration procedure.

Database Upgrade Strategy

For PostgreSQL (e.g., v15 to v16), I identified three primary variants for handling the underlying database file format change:

  1. Variant 1 (Dumps): Dump the old database, upgrade the image, and import the dump.
  2. Variant 2 (App Export/Import): Use Paperless-NGX’s built-in exporter/importer. (My preferred clean strategy).
  3. Variant 3 (pg_upgrade): Use the official PostgreSQL tool. (Complex, but efficient for very massive databases).

Recommendation: Before proceeding with any major version upgrade, check if the new versions are officially supported by Paperless-NGX, and create a full snapshot or backup.
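
For completeness, Variant 1 boils down to a dump before the image switch and a restore afterwards. A rough sketch, assuming the Compose service is called db and the stack uses the default paperless database user:

# before changing the image tag: dump everything from the old PostgreSQL
docker compose exec -T db pg_dumpall -U paperless > paperless-pg15.sql

# after switching to postgres:16 and starting with a fresh pgdata volume:
docker compose exec -T db psql -U paperless -d postgres < paperless-pg15.sql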

Variant 2: Application Export/Import

This variant is often the cleanest way to upgrade, as the application handles the data transfer logic.

1. Create Backup and Export Documents

# Export documents via the webserver container
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose exec -T webserver document_exporter ../export 
100%|██████████| 1004/1004 [00:07<00:00, 126.43it/s]

root@paperless:/opt/paperless/paperless-ngx# du -sh export 
986M    export

2. Modify Stack Versions

I update the image: tags in docker-compose.yml to the desired new versions:

# Example modifications in docker-compose.yml
image: docker.io/library/postgres:16
image: docker.io/gotenberg/gotenberg:8

3. CRITICAL STEP: Volume Isolation

To ensure the new PostgreSQL container initializes a clean, compatible database, the old volume data must be isolated. This is a manual volume-management step; make sure the stack is stopped (docker compose down) before touching anything under /var/lib/docker/volumes.

# Manually rename the old volumes for a clean start:
cd /var/lib/docker/volumes
mv paperless_media paperless_media_backup
mkdir -p paperless_media/_data
mv paperless_pgdata paperless_pgdata_backup
mkdir -p paperless_pgdata/_data

4. Start and Import Data

The new environment starts up with the fresh PostgreSQL 16 database.

# Start the new environment
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose up -d
... (Success output)

# Import the data into the clean database
~:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose exec -T webserver document_importer ../export
Checking the manifest
Installed 1278 object(s) from 1 fixture(s)
Copy files into paperless...
100%|██████████| 1004/1004 [00:16<00:00, 62.37it/s]
Updating search index...
100%|██████████| 1004/1004 [00:24<00:00, 41.64it/s]

Part III: Automation and Operational Considerations

Automation Script (Example Cronjob)

For simple minor version updates (where latest tags are used and volumes are untouched), this script provides basic automation.

#!/bin/bash

set -e 

# Simple script to stop, pull, and restart the stack
PDIR=/opt/paperless/paperless-ngx
LOG=/opt/paperless/docker-compose-cron.log

# Change into the compose project directory, then stop containers and pull only if the stop succeeded
cd $PDIR
/usr/bin/docker compose down >> $LOG 2>&1 && /usr/bin/docker compose pull >> $LOG 2>&1

# Start all containers
/usr/bin/docker compose up --wait -d >> $LOG 2>&1
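
A matching crontab entry could look like this (the script path and schedule are just examples; the user running it needs access to the Docker socket):

# /etc/cron.d/paperless-update
30 3 * * 0   paperless   /opt/paperless/update-stack.sh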

Note: Automating major version upgrades (requiring volume management) must always be performed manually.

Operational Pitfalls

It is crucial to verify the integrity of the stack after any upgrade. Issues like compatibility problems between new versions of supporting services (e.g., Gotenberg) and Paperless-NGX confirm that manual verification after a major upgrade is mandatory.

Sources / See Also

  1. Paperless-NGX Documentation. Upgrade Guide. https://docs.paperless-ngx.com/upgrade/
  2. Docker Documentation. Docker Compose Upgrade and Maintenance. https://docs.docker.com/compose/compose-file/08-upgrade/
  3. PostgreSQL Documentation. Major PostgreSQL Version Upgrades. https://www.postgresql.org/docs/current/upgrading.html
  4. GitHub Repository paperless-ngx/paperless-ngx. Discussions on Gotenberg Compatibility and Database Schema. https://github.com/paperless-ngx/paperless-ngx/

Paperless-NGX Setup: Installation, Security, and NGINX Integration

When I read about paperless-ngx, I was immediately drawn to the idea of having all my documents indexed (via OCR) and centrally stored. With a proper tagging system, exporting my documents for my annual tax declaration should only take seconds.

The installation procedure is straightforward but contains several critical security pitfalls that must be addressed, especially when integrating a reverse proxy. Here are my notes on setting up Paperless-NGX on Debian 12 Bookworm.

Part I: Installation and Secure User Setup

1. Install Docker Engine

Please consult the official Docker documentation for the installation of the Docker Engine.

2. Add a Dedicated, Unprivileged User

The safest approach is to use a dedicated system user. This ensures the application does not run with root privileges, even if the installation script or containers were ever compromised.

# 1. Create dedicated system user 'paperless'
adduser paperless --system --home /opt/paperless --group

# 2. Grant the user permissions to use Docker
usermod -aG docker paperless

3. Run the Install Script Securely

Execute the official install script using the newly created, unprivileged paperless user by leveraging sudo -Hu paperless.

sudo -Hu paperless bash -c "$(curl --location --silent --show-error https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

My Configuration Settings during the script:

  • URL: https://documents.example.com (necessary for reverse-proxy and SSL configuration)
  • Database backend: postgres (recommended for production; better performance than SQLite)
  • Enable Apache Tika?: yes (required for indexing complex document types such as Word, Excel, and PowerPoint)
  • OCR language: deu+eng+fra+ara (caution: each language increases resource usage, so choose only what is necessary)

Part II: Configuration and Container Management (Beginner Guide)

1. Modifying Configuration (docker-compose.env)

The environment variables are managed via the docker-compose.env file located in the installation directory (/opt/paperless/paperless-ngx/).

I recommend immediately setting the following variables, which are essential for security and functionality:

PAPERLESS_URL=https://documents.example.com
PAPERLESS_SECRET_KEY=------------USE-A-LONG-CRYPTIC-RANDOM-KEY----------------
PAPERLESS_OCR_LANGUAGE=ara+deu+eng+fra
PAPERLESS_OCR_LANGUAGES=ara deu eng fra # Note: space vs. plus sign syntax
PAPERLESS_CONSUMER_RECURSIVE=true
PAPERLESS_PORT=8000
  • OCR Note: Be sure to set both variables (_LANGUAGE and _LANGUAGES) as the syntax requirements for the Tesseract engine and the Docker Compose files differ.
  • CONSUMER_RECURSIVE: Set to true to allow dropping folders into the consume directory.

2. Container Management: Start, Stop, and Update

For users new to Docker, knowing the exact commands for managing the environment after configuration changes is essential.

First, navigate to the directory containing the configuration files:

# cd /opt/paperless/paperless-ngx/

Stop and Restart (After configuration change):

root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose down
[+] Running 6/6
 ✔ Container paperless-webserver-1  Removed                                                   6.9s 
...
root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose up -d
[+] Running 6/6
 ✔ Network paperless_default        Created                                                   0.1s 
...
 ✔ Container paperless-webserver-1  Started                                                   0.0s

Update (Pulling new container images):

root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose down
root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose pull
[+] Pulling 35/22
...

Part III: Critical Security Fix and NGINX Integration

1. CRITICAL SECURITY FLAW: Port Exposure Fix

The default installation (as of writing this article: 17 December 2023) does not bind the Paperless-NGX webserver (port 8000) to localhost (127.0.0.1). This means that without a strict host firewall, the Paperless login page is reachable from the internet on port 8000.

Proof of Exposure: A netstat check shows global listening:

tcp        0      0 0.0.0.0:8000            0.0.0.0:* LISTEN

The Fix: You must edit the ports directive in the docker-compose.yml to explicitly set the binding to 127.0.0.1.

# /opt/paperless/paperless-ngx/docker-compose.yml (webserver section)
    ports:
      # CRITICAL: Only the localhost can reach Port 8000 on the host.
      - "127.0.0.1:8000:8000" 

2. NGINX SSL/TLS Basic Hardening

Since Paperless-NGX handles sensitive personal documents, a strong TLS configuration is mandatory. I suggest using the Mozilla SSL Configuration Generator as a reference for modern best practices.

Recommendations:

  • ECDSA Certificates: Use ECDSA certificates (e.g., secp384r1) over legacy RSA keys for better performance and security.
  • HSTS: Implement Strict-Transport-Security (HSTS) to force browsers to always use HTTPS.
  • TLS Protocol: Use ssl_protocols TLSv1.3; to ensure only the most current and secure protocol is allowed.

3. Header Management and Inheritance Logic

A common pitfall with NGINX is the add_header directive. If you use even one add_header directive within a location {} block, it overrides/disables all header inheritance from the parent server {} block.

This means if you add the Referrer-Policy header in your location / {} block, you must re-declare all other global headers (like HSTS and other security headers) there as well.
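
In practice that means repeating (or re-including) the global headers inside the location block; a sketch based on the final configuration below:

location / {
  # Any add_header here disables inheritance from the server {} block,
  # so the global headers must be declared again at this level:
  add_header Strict-Transport-Security "max-age=63072000" always;
  add_header Referrer-Policy "strict-origin-when-cross-origin";
  include conf.d/headers.conf;
  ...
}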

4. Essential Security Headers

To ensure defense against common web attacks, I use a separate headers.conf file:

# headers.conf in /etc/nginx/conf.d/
add_header X-Frame-Options "SAMEORIGIN" always;         # Clickjacking Defense
add_header X-Content-Type-Options "nosniff" always;    # MIME-Sniffing Defense
add_header X-XSS-Protection "0" always;                # Disables obsolete browser protection
add_header Permissions-Policy "camera=(), microphone=()" always; # Prevents browser access to peripherals

5. Content Security Policy (CSP)

CSP is the most crucial defense against Cross-Site Scripting (XSS). Paperless-NGX’s UI uses inline scripts and styles, which complicate the policy.

The following CSP is a working compromise, allowing essential inline elements while blocking common injection points. I strongly suggest using the developer console to check for any blocked resources after implementation.

# Functional CSP for paperless-ngx
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src data: 'self'; upgrade-insecure-requests" always;

Note: Using 'unsafe-inline' is often necessary for applications that have not fully adopted modern CSP practices.

6. Blocking Search Engine Indexers (robots.txt)

Since this is a system for private documents, we must prevent all search engines and indexing services from crawling or indexing the instance, regardless of the login protection.

This is easily achieved in NGINX without creating a file on the disk:

location = /robots.txt {
  add_header Content-Type text/plain;
  return 200 "User-agent: AdsBot-Google\nUser-agent: *\nDisallow: /\n";
}

Part IV: Final Site Configuration

The final NGINX site configuration combines all security requirements (HSTS, Headers, robots.txt) and correctly proxies to the secure loopback address.

server {
  server_name documents.example.com;

  add_header Strict-Transport-Security "max-age=63072000" always;
  add_header Referrer-Policy "strict-origin-when-cross-origin";
  include conf.d/headers.conf; # Includes basic security headers

  location = /robots.txt {
    add_header Content-Type text/plain;
    return 200 "User-agent: AdsBot-Google\nUser-agent: *\nDisallow: /\n";
  }

  location / {
    proxy_pass http://localhost:8000/;

    # Required headers for secure proxying and WebSockets
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # ... other proxy settings ...
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_redirect off;
  }
  
  # TLS Configuration
  listen 443 ssl; 
  ssl_certificate /etc/letsencrypt/live/.../fullchain.pem; 
  ssl_certificate_key /etc/letsencrypt/live/.../privkey.pem; 
  ssl_trusted_certificate /etc/letsencrypt/live/.../chain.pem;
}

# HTTP Redirect
server {
  listen 80;
  server_name documents.example.com;
  return 301 https://$host$request_uri;
}

Part V: Further Hardening Suggestions

To move beyond the basic secure setup, I suggest investigating these advanced hardening techniques:

  • Authentication: Implement an external authentication layer such as Authelia or Keycloak to enforce Multi-Factor Authentication (MFA) before the Paperless-NGX login page. Goal: Zero Trust, stopping brute-force attacks before they reach the application.
  • Rate Limiting: Configure Fail2ban to monitor the NGINX access logs for login failures and automatically block the source IP. Goal: brute-force defense at the network/IP layer.
  • Protocol Security: If all client devices are modern, disable TLSv1.2 completely to enforce TLSv1.3 only. Goal: eliminate older, potentially vulnerable crypto protocols.
  • Security Headers: Implement strict CORS (Cross-Origin Resource Sharing) headers to prevent the Paperless instance from serving resources to unauthorized external domains. Goal: defense against cross-origin attacks.

Sources / See Also

  1. Paperless-NGX Documentation. Installation Guide. https://docs.paperless-ngx.com/setup/
  2. Paperless-NGX Documentation. Advanced Tasks: Fail2ban. https://docs.paperless-ngx.com/advanced_tasks/#fail2ban
  3. Docker Documentation. Install Docker Engine. https://docs.docker.com/engine/install/debian/
  4. Mozilla SSL Configuration Generator. A reference tool for modern TLS configurations. https://ssl-config.mozilla.org/
  5. Scott Helme. Hardening Your HTTP Response Headers (X-Frame-Options, X-Content-Type-Options). https://scotthelme.co.uk/hardening-your-http-response-headers/
  6. Scott Helme. Content Security Policy – An Introduction. https://scotthelme.co.uk/content-security-policy-an-introduction/
  7. NGINX Documentation. Understanding the NGINX add_header Directive. http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header

Distributed MinIO on AWS Lightsail: Multi-Node Setup

MinIO is a high-performance, S3-compatible object storage solution. This article provides a blueprint for deploying a distributed MinIO stack using Amazon Lightsail, covering the critical steps for multi-node setup, networking, and Systemd.


Nextcloud and MinIO Integration: Why Direct S3 Fails and the Filesystem Abstraction Workaround

MinIO is a fantastic Object Storage solution, and I intended to use my distributed MinIO system as the primary external storage for Nextcloud. This distributed setup, which uses Sidekick as a load balancer for seamless node access, proved functional but revealed a critical stability flaw, particularly with mobile uploads.


Nextcloud Migration and Database Performance: Solving Deadlocks with PostgreSQL

Getting the famous “1213 Deadlock found when trying to get lock; try restarting transaction” error in Nextcloud can be frustrating. This issue affected many users and was discussed in bug reports like the Nextcloud Deadlock Issue. The community frequently recommends switching the backend database to PostgreSQL. While I was initially skeptical, the migration proved to be the definitive solution for this recurring issue in my setup.

This guide outlines the streamlined procedure for migrating Nextcloud from MariaDB/MySQL to PostgreSQL. The process is uncomplicated and can drastically improve system stability.


Nextcloud Performance Tuning: PHP, Redis, and Database Optimization

Just a quick guide on how I install Nextcloud. This covers Nextcloud 25.0.1 with PHP 8.1 on Debian Bullseye, optimized with Redis, APCu, and MariaDB.
