Paperless-ngx with S3 using rclone

It’s easier for me to access all my data through some sort of central storage, which is why I decided on S3 a long time ago. Some cool tools I use do not have native S3 support (yet), but rclone helps with that. In this article I’ll show you how to use the rclone docker volume plugin with a MinIO S3 storage, configured in docker-compose.yml for use with paperless-ngx.

Backups. That’s how we shall always start. Once you are sure you have a backup, I would advise updating first if possible, so that nothing stops you from re-importing your documents later. Once you have updated your paperless-ngx docker compose installation, continue by exporting all documents:

docker compose exec -T webserver document_exporter -d -c ../export
100%|██████████| 1020/1020 [00:35<00:00, 29.08it/s]
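
The exporter writes your documents plus a manifest.json into the target directory. Before touching anything else, a quick sanity check doesn’t hurt (the ./export path is the bind mount from the compose snippet below):

ls -lh export/manifest.json
find export -type f | wc -l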

The next step is to install the docker volume plugin on the host:

# the rclone volume plugin needs FUSE on the host
apt-get -y install fuse
# default directories where the plugin looks for its rclone.conf and keeps its VFS cache
mkdir -p /var/lib/docker-plugins/rclone/config
mkdir -p /var/lib/docker-plugins/rclone/cache
docker plugin install rclone/docker-volume-rclone:amd64 args="-v" --alias rclone --grant-all-permissions
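
You can quickly verify that the plugin is installed and enabled:

docker plugin ls | grep rclone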

Then edit your docker-compose.yml. Here are the relevant snippets. Basically, add the named volume s3 to the volumes list of the webserver. The mount point is /usr/src/paperless/media/documents; I used that directory so that logs, locks and other stuff do not end up in the S3 bucket. Comment out the old media volume (you could also keep referencing the new volume as media, but then you may need to docker volume rm the current media volume first or it won’t work; see the sketch after the following snippet).

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    [..]
    volumes:
      - s3:/usr/src/paperless/media/documents
      - data:/usr/src/paperless/data 
      #- media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    [..]
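
If you would rather keep the name media for the new rclone volume, removing the old local volume could look like this (a hedged sketch: paperless_media assumes the compose project is called paperless, so check docker volume list for the actual name):

docker compose down
docker volume rm paperless_media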

Then, at the end of this file, there is a section for all the volumes. Comment out the media section and add the s3 section:

volumes:
  data:
  #media:
  pgdata:
  redisdata:
  s3:
    driver: rclone
    driver_opts:
      remote: "minio:paperless-jean"
      allow_other: "true"
      vfs_cache_mode: "full"
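
The driver_opts are passed through to rclone, with underscores in place of dashes in the option names. If you want to cap the local VFS cache, for example, something like this should work (just a sketch, the 10G is an arbitrary value):

  s3:
    driver: rclone
    driver_opts:
      remote: "minio:paperless-jean"
      allow_other: "true"
      vfs_cache_mode: "full"
      vfs_cache_max_size: "10G"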

The name of my bucket is paperless-jean, and I reference the remote using “minio”. Create an rclone.conf in /var/lib/docker-plugins/rclone/config; it could look like this (just an example):

[minio]
type = s3
region = somewhere-over-the-rainbow
endpoint = https://your-s3:9000
provider = Minio
env_auth = false
access_key_id = 
secret_access_key = 
acl = bucket-owner-full-control
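
With the rclone binary on the host you can test the remote and create the bucket using the plugin’s config before starting the stack (bucket name and paths as above):

rclone --config /var/lib/docker-plugins/rclone/config/rclone.conf lsd minio:
rclone --config /var/lib/docker-plugins/rclone/config/rclone.conf mkdir minio:paperless-jean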

Then do the usual docker compose pull and docker compose up -d. You should now also see the S3 mount on the host, e.g. df -hT should contain something like:

df -hT | grep minio
minio:paperless-jean fuse.rclone  1.0P     0  1.0P   0% /var/lib/docker/plugins/[..]/propagated-mount/paperless_s3

docker volume list should contain paperless_s3:

docker volume list | grep s3
rclone:latest   paperless_s3
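
If you want to double-check which driver options the volume was created with, docker volume inspect shows them:

docker volume inspect paperless_s3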

Now re-import all documents using:

docker compose exec -T webserver document_importer ../export
Found existing user(s), this might indicate a non-empty installation
Found existing documents(s), this might indicate a non-empty installation
Checking the manifest
Installed 1299 object(s) from 1 fixture(s)
Copy files into paperless...
100%|██████████| 1020/1020 [00:56<00:00, 17.96it/s]
Updating search index...
100%|██████████| 1020/1020 [00:41<00:00, 24.52it/s]
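
To cross-check what actually landed in the bucket, rclone can count the objects for you (run on the host, with the plugin’s config):

rclone --config /var/lib/docker-plugins/rclone/config/rclone.conf size minio:paperless-jean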

According to the above, 1020 files were imported. According to my S3, the bucket now holds 2961 objects, presumably because paperless stores thumbnails and archived versions alongside the originals. Now that I have all my paperless-ngx documents in S3 too, I can easily access them from other places as well.
