When I read about paperless-ngx, I liked the idea of keeping all my documents in central storage so that I could access them from all my devices. Furthermore, those documents are indexed (including OCR), so I can run a full-text search across all of them. And thanks to the tagging system (if done correctly), exporting all my documents for my yearly tax declaration should take just seconds…
Installation
Installing paperless-ngx is easy. However, there are a few stumbling blocks, because the installation guide currently does not work as-is. If you try the bare-metal installation within a VM (Debian Bookworm), you’ll run into trouble with some dependencies like python-ipware. I did not try the installation on Debian Bullseye or Ubuntu, so maybe it works for you, but it did not for me.
The installation procedure is described here: https://docs.paperless-ngx.com/setup/.
I think the easiest installation method is the Docker install script. However, this one will not just work as-is either. If you run the script as root, it will correctly tell you that you should not do that. And yes, you should not (blindly) run any script found on the internet as root unless you have checked it and fully understand what it does. However, you probably also do not want to install paperless-ngx into your normal user account. So you need to add a user first.
Here are my notes on how to set up paperless-ngx on Debian 12 Bookworm.
1. Install Docker-Engine
Please consult the Docker documentation (hint: I use Docker Engine, not Docker Desktop, but your requirements might be different). There is also a page describing the installation procedure for Debian.
2. Add a user for paperless
adduser --system --group --home /opt/paperless paperless
# give the user paperless docker permissions
usermod -aG docker paperless
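To double-check that the user really ended up in the docker group, a tiny helper like the following could be used. This is only a sketch; user_in_group is a name I made up, and it relies on id -nG listing all groups of a user. Since sudo sets up the target user’s groups when it starts, commands run via sudo -Hu paperless should already see the new membership.

```bash
# Hypothetical helper: succeeds if user $1 is a member of group $2.
user_in_group() {
  local user="$1" group="$2" g
  # id -nG prints all group names of the user, including the primary group
  for g in $(id -nG "$user" 2>/dev/null); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}
```

Usage: user_in_group paperless docker && echo "paperless may use docker"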
3. Run the install script using the previously created user
I used the command from the official documentation and only prefixed it with “sudo -Hu paperless”:
sudo -Hu paperless bash -c "$(curl --location --silent --show-error https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
The installation script will ask you a few things, here is what I set:
# Set the URL this will run on later, e.g.:
URL []: https://documents.example.com
# I suggest leaving the default port. Later I use NGINX as a reverse proxy, which
# will forward to port 8000
Port [8000]:
# Not much to say about the timezone I guess...
Current time zone [Europe/Berlin]:
# Use sqlite on a low-memory system; otherwise I would recommend postgres.
Database backend (postgres sqlite mariadb) [postgres]:
# Do you also want to feed paperless-ngx documents like Word, Excel, PowerPoint...?
Enable Apache Tika? (yes no) [no]: yes
# Every language you add needs more resources, so only choose those you really
# need. I have documents in German, English, French and Arabic.
OCR language [eng]: deu+eng+fra+ara
# don't touch.
User ID [107]:
Group ID [115]:
# The user account needs to have access to this directory.
Target folder [/opt/paperless]:
For the remaining settings just pick the defaults unless you know better.
4. Modify the installation
Something I completely missed in this short part of the documentation is how to modify the configuration of paperless-ngx if you go with the Docker install script. I assume this is obvious to people who are used to Docker, but for everyone else it is unclear.
If you follow the steps above, there will be a docker-compose.env in /opt/paperless/paperless-ngx:
root@paperless:/opt/paperless/paperless-ngx# ls
consume docker-compose.env docker-compose.yml export
You can modify the configuration of paperless-ngx using this file. I added a few settings to it:
PAPERLESS_URL=https://documents.example.com
USERMAP_UID=107
USERMAP_GID=115
PAPERLESS_TIME_ZONE=Europe/Berlin
PAPERLESS_OCR_LANGUAGE=ara+deu+eng+fra
PAPERLESS_SECRET_KEY=------------CHANGEME----------------
PAPERLESS_OCR_LANGUAGES=ara deu eng fra
PAPERLESS_CONSUMER_RECURSIVE=true
PAPERLESS_PORT=8000
You can see a list of the possible values here. I would recommend first running paperless-ngx WITHOUT touching any of these settings to get a feeling for what they do. However, I set:
PAPERLESS_OCR_LANGUAGES and PAPERLESS_OCR_LANGUAGE: mind that the former separates multiple languages with spaces, while the latter uses a +. Also mind that multiple languages require more resources.
PAPERLESS_CONSUMER_RECURSIVE to true, because I also want to throw folders into the consumer directory.
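If you prefer to change these values from the shell instead of an editor, something like the following sketch could work. set_env_var and set_ocr_languages are hypothetical helpers (not part of paperless-ngx). The second one derives both OCR variables from a single +-separated list, so the two formats cannot drift apart. Writing through cat instead of mv keeps the owner and permissions of the existing file.

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# (uncommented) line for KEY or appending a new one.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # keep every line except the current definition of KEY
  grep -v -E "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # overwrite in place so ownership and permissions stay as they were
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Hypothetical helper: "ara+deu" -> OCR_LANGUAGE=ara+deu, OCR_LANGUAGES=ara deu
set_ocr_languages() {
  local file="$1" langs="$2"
  set_env_var "$file" PAPERLESS_OCR_LANGUAGE "$langs"
  set_env_var "$file" PAPERLESS_OCR_LANGUAGES "${langs//+/ }"
}
```

Usage: set_ocr_languages /opt/paperless/paperless-ngx/docker-compose.env "ara+deu+eng+fra", then restart the environment.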
5. Run, Update, Stop
If you change the settings in docker-compose.env, you need to restart the environment: run docker compose down, then docker compose up -d. The -d switch detaches the containers; without it, they run in the foreground.
First, switch to the folder containing docker-compose.env:
# cd /opt/paperless/paperless-ngx/
Then you can stop the environment using docker compose down:
root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose down
[+] Running 6/6
✔ Container paperless-webserver-1 Removed 6.9s
✔ Container paperless-db-1 Removed 0.3s
✔ Container paperless-tika-1 Removed 0.4s
✔ Container paperless-gotenberg-1 Removed 10.2s
✔ Container paperless-broker-1 Removed 0.4s
✔ Network paperless_default Removed 0.3s
To update paperless-ngx, first stop it with docker compose down as above, then run docker compose pull:
root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose pull
[+] Pulling 35/22
✔ webserver 17 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 10.8s
✔ tika Pulled 0.5s
✔ gotenberg Pulled 1.0s
✔ db 13 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 15.7s
✔ broker Pulled 1.4s
And to start:
root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose up -d
[+] Running 6/6
✔ Network paperless_default Created 0.1s
✔ Container paperless-tika-1 Started 0.1s
✔ Container paperless-broker-1 Started 0.1s
✔ Container paperless-gotenberg-1 Started 0.1s
✔ Container paperless-db-1 Started 0.1s
✔ Container paperless-webserver-1 Started 0.0s
You can also force the recreation of the containers by adding --force-recreate to the docker compose up -d command.
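The stop/pull/start sequence above can be wrapped in a few shell functions. This is a sketch, not part of paperless-ngx: run_compose, paperless_restart, paperless_update, PAPERLESS_DIR and the DRY_RUN switch are my own hypothetical names. With DRY_RUN=1 the functions only print the commands they would run.

```bash
# Hypothetical helpers wrapping the docker compose commands shown above.
# PAPERLESS_DIR defaults to the folder used in this article.
PAPERLESS_DIR="${PAPERLESS_DIR:-/opt/paperless/paperless-ngx}"

run_compose() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # only show what would be executed
    echo "docker compose $*"
  else
    # subshell, so the caller's working directory stays unchanged
    (cd "$PAPERLESS_DIR" && sudo -Hu paperless docker compose "$@")
  fi
}

# apply changes made in docker-compose.env
paperless_restart() {
  run_compose down && run_compose up -d
}

# stop, fetch new images, start again
paperless_update() {
  run_compose down && run_compose pull && run_compose up -d
}
```

Usage (as root): DRY_RUN=1 paperless_update to preview, then paperless_update.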
Running paperless-ngx behind NGINX
The documentation shows how to use NGINX as a reverse proxy for paperless-ngx. This is a good starting point. However, if you deploy paperless-ngx on the Internet you may want to do a little bit more.
If possible, I suggest using WireGuard, OpenVPN or an IPsec tunnel with strongSwan and making such a connection a requirement for reaching the paperless-ngx instance, because you will usually have sensitive data and documents in paperless-ngx.
Port 8000 not bound to localhost – Attention!
I followed the guide on the paperless-ngx website as written and read it multiple times. Unless I am blind and missed something important, port 8000 is by default not bound to localhost on the container’s host but to all interfaces. An nmap scan from outside shows this:
root@fw2:/var/log/suricata# nmap xx.xx.xx.xx
Starting Nmap 7.93 ( https://nmap.org ) at 2023-12-17 15:37 CET
Nmap scan report for xx.xx.xx.xx
Host is up (0.0016s latency).
Not shown: 996 closed tcp ports (reset)
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
443/tcp open https
8000/tcp open http-alt
MAC Address: xx:xx:xx:xx:xx:xx (Mathtech)
Nmap done: 1 IP address (1 host up) scanned in 0.36 seconds
Netstat also shows this:
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 861/docker-proxy
tcp6 0 0 :::8000 :::* LISTEN 867/docker-proxy
A wget/curl from outside to port 8000 also returns the paperless-ngx login page. Obviously, we don’t want this: NGINX should be the only system that reaches paperless-ngx.
Edit the docker-compose.yml in /opt/paperless/paperless-ngx and, below webserver:, add the loopback address to the port mapping
ports:
  - "8000:8000"
so that it looks like this:
ports:
  - "127.0.0.1:8000:8000"
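If you prefer not to edit the file by hand, the change can be made with GNU sed (as shipped with Debian). This is a sketch assuming the mapping is written exactly as "8000:8000", including the double quotes, as shown above; bind_to_localhost is a made-up name. Running it twice leaves the file unchanged the second time, because the already-bound mapping no longer matches the pattern.

```bash
# Hypothetical helper: turn "PORT:PORT" into "127.0.0.1:PORT:PORT" in a
# compose file (GNU sed -i edits the file in place).
bind_to_localhost() {
  local file="$1" port="$2"
  sed -i "s/\"${port}:${port}\"/\"127.0.0.1:${port}:${port}\"/" "$file"
}
```

Usage: bind_to_localhost /opt/paperless/paperless-ngx/docker-compose.yml 8000, then restart the environment.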
Then stop and start the Docker environment and re-check using nmap / netstat:
root@paperless:/opt/paperless/paperless-ngx# netstat -apn | grep :8000
tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN 3578/docker-proxy
Now your paperless-ngx will only be accessible through a reverse-proxy which you configure to use 127.0.0.1:8000.
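This check can also be scripted. exposed_listeners is a hypothetical helper that reads netstat or ss output on stdin and prints every listening address for the given port that is not a loopback address. It assumes the local address is in the fourth column, which is the case for both netstat -tln and ss -ltn; header lines simply do not match.

```bash
# Hypothetical helper: print non-loopback listen addresses for port $1.
exposed_listeners() {
  local port="$1"
  awk -v port="$port" '
    {
      addr = $4
      # the port is whatever follows the last colon
      n = split(addr, parts, ":")
      if (parts[n] != port) next
      host = substr(addr, 1, length(addr) - length(port) - 1)
      # skip IPv4 and IPv6 loopback addresses
      if (host == "127.0.0.1" || host == "::1" || host == "[::1]") next
      print addr
    }'
}
```

Usage: ss -ltn | exposed_listeners 8000. Empty output means the port is only reachable via loopback.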
SSL/TLS
I suggest using a rather strong TLS configuration: as strong as the browsers and devices you access paperless-ngx with allow. I added a tls.conf in /etc/nginx/conf.d/ with the following content:
#
# get a more up2date / better configuration from
# https://ssl-config.mozilla.org/
#
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;
ssl_protocols TLSv1.3;
ssl_prefer_server_ciphers off;
# TLSv1.3 only: the ssl_protocols line above already disables TLSv1.2.
# Mind that some of the online scan tools won't work to test your
# configuration because they simply do not support TLSv1.3, yet.
# If older devices need TLSv1.2, use "ssl_protocols TLSv1.2 TLSv1.3;"
# and restrict the TLSv1.2 ciphers by uncommenting ssl_ciphers below.
# ssl_conf_command pins the TLSv1.3 cipher suites.
#ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384;
#ssl_conf_command Ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256;
add_header Strict-Transport-Security "max-age=63072000" always;
# OCSP stapling
ssl_stapling on;
ssl_stapling_verify on;
# use your own resolver if possible.
resolver 127.0.0.1;
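To verify that the server really refuses TLSv1.2 after this change, you could probe it with openssl s_client. tls_accepts is a hypothetical helper; it assumes an OpenSSL version that knows the -tls1_2 and -tls1_3 options (1.1.1 or newer) and that s_client exits non-zero when the connection or the handshake fails.

```bash
# Hypothetical helper: succeeds if $1 (host:port) completes a handshake
# with the single protocol version $2 (tls1_2 or tls1_3).
tls_accepts() {
  local target="$1" version="$2"
  case "$version" in
    tls1_2|tls1_3) ;;
    *) echo "unsupported version: $version" >&2; return 2 ;;
  esac
  # -servername sends SNI; </dev/null closes the session after the handshake
  openssl s_client -connect "$target" -servername "${target%:*}" \
    "-$version" </dev/null >/dev/null 2>&1
}
```

Usage: tls_accepts documents.example.com:443 tls1_2 && echo "TLSv1.2 is still accepted"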
If you let certbot manage your NGINX configuration files, you may notice that it also adds some configuration to your site. Those settings are fine, but weaker than the settings above. So you may want to compare them and perhaps comment them out in your site configuration:
#include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
#ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
Speaking of SSL/TLS hardening, you may also want a stronger Let’s Encrypt certificate, for example an ECDSA certificate with a stronger curve:
certbot renew --key-type ecdsa --elliptic-curve secp384r1 --cert-name documents.example.com --force-renewal
Did you know that you can check the key type of your certificates using the command certbot certificates?
root@paperless:/etc/letsencrypt/live# certbot certificates | grep "Key Type"
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Key Type: ECDSA
Key Type: ECDSA
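With several certificates, the grep above does not tell you which key type belongs to which certificate. A small awk filter can pair them up. cert_key_types is a made-up name, and it assumes that certbot certificates prints a "Certificate Name:" line before the "Key Type:" line of each certificate.

```bash
# Hypothetical helper: read `certbot certificates` output on stdin and
# print "<certificate name> <key type>" per certificate.
cert_key_types() {
  awk -F': ' '
    /Certificate Name:/ { name = $2 }
    /Key Type:/         { print name, $2 }'
}
```

Usage: certbot certificates 2>/dev/null | cert_key_types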
Headers
One thing you should be aware of is inheritance in NGINX when using add_header, because the configuration snippet in the paperless-ngx documentation adds the Referrer-Policy:
add_header Referrer-Policy "strict-origin-when-cross-origin";
in the location / {} block of the site. NGINX inherits add_header directives from the enclosing level only if the current block defines no add_header directive of its own. As soon as a block contains even a single add_header, you need to re-declare all the headers in that block. Hence, if you use my tls.conf, you also need to add the Strict-Transport-Security header in that location:
add_header Referrer-Policy "strict-origin-when-cross-origin";
add_header Strict-Transport-Security "max-age=63072000" always;
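Because of this inheritance rule, it is worth checking which headers actually arrive. missing_headers is a hypothetical helper that reads response headers (e.g. from curl -sI) on stdin and prints every expected header name that is not present, comparing case-insensitively.

```bash
# Hypothetical helper: print each header name given as argument that is
# missing from the response headers read on stdin.
missing_headers() {
  local headers name
  headers="$(cat)"
  for name in "$@"; do
    if ! grep -qi "^${name}:" <<< "$headers"; then
      echo "$name"
    fi
  done
}
```

Usage: curl -sI https://documents.example.com/ | missing_headers Strict-Transport-Security Referrer-Policy. Empty output means all listed headers were sent.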
While we’re talking about headers, you may also want to take a look at the following headers.conf I’m using. Maybe you want to use it, too; check the linked resources:
#
# securityheaders
#
# see: https://scotthelme.co.uk/hardening-your-http-response-headers/#x-frame-options
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options
# https://infosec.mozilla.org/guidelines/web_security#x-frame-options
#add_header X-Frame-Options "SAMEORIGIN" always;
# see: https://scotthelme.co.uk/hardening-your-http-response-headers/#x-content-type-options
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options
#add_header X-Content-Type-Options "nosniff" always;
# see: https://scotthelme.co.uk/a-new-security-header-referrer-policy/
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy
#add_header Referrer-Policy "no-referrer" always;
# see: https://scotthelme.co.uk/goodbye-feature-policy-and-hello-permissions-policy/
# https://github.com/w3c/webappsec-permissions-policy/blob/main/permissions-policy-explainer.md
# https://github.com/w3c/webappsec-permissions-policy/blob/main/features.md
add_header Permissions-Policy "camera=(), microphone=()" always;
# see: https://scotthelme.co.uk/a-new-security-header-feature-policy/
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Feature-Policy
# For compatibility reasons, also include Feature-Policy (the former name
# of Permissions-Policy).
add_header Feature-Policy "microphone 'none'" always;
#
# mozilla observatory
#
# A setting of 0 disables this, and currently the observatory will reduce
# your points if you disable it. However, read the GitHub issue: you want
# it disabled.
# see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection
# https://github.com/mozilla/http-observatory/issues/432
add_header X-XSS-Protection 0 always;
# see: https://scotthelme.co.uk/content-security-policy-an-introduction/
# https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
# https://developers.google.com/web/fundamentals/security/csp
# you REALLY want to check what this is doing BEFORE using it. 2nd link.
# also you may want to add some.
#add_header Content-Security-Policy "frame-ancestors 'self'; base-uri 'self'; form-action 'self'" always;
Include these headers in your site configuration, for example like this:
add_header Strict-Transport-Security "max-age=63072000" always;
add_header Referrer-Policy "strict-origin-when-cross-origin";
include conf.d/headers.conf;
CSP
Now let’s talk about CSP. If you took a look at my headers.conf above you probably saw the following:
# see: https://scotthelme.co.uk/content-security-policy-an-introduction/
# https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
# https://developers.google.com/web/fundamentals/security/csp
# you REALLY want to check what this is doing BEFORE using it. 2nd link.
# also you may want to add some.
#add_header Content-Security-Policy "frame-ancestors 'self'; base-uri 'self'; form-action 'self'" always;
This is CSP (short for Content-Security-Policy). It lets the server send a header that tells the browser which content is allowed and which is not. Assuming someone somehow managed to inject inline JavaScript into paperless-ngx, your browser would block this JavaScript if your CSP does not allow inline scripts.
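In the CSP, the fetch directives (script-src, style-src, img-src, font-src, connect-src and so on) fall back to default-src; a few directives such as frame-ancestors, base-uri and form-action do not. So if you set default-src to 'none', you effectively block every resource type you did not explicitly allow. The other way around works as well: set default-src to e.g. 'self' and explicitly restrict everything you do not want.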
For paperless-ngx, however, you can’t simply use default-src 'self'; and ignore everything else. The paperless-ngx Web UI (currently) produces at least 23 errors (refused to load) due to inline scripts and inline styles. What worked for me was:
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src data: 'self'; upgrade-insecure-requests" always;
You could also use your browser’s developer console to look up the hashes of all the inline elements and add them to the CSP. But you’ll have to redo this whenever the Web UI changes (which might happen with every update).
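If you go the hash route, the CSP source expression is the base64-encoded SHA-256 digest of the exact content between the opening and closing script or style tag. csp_hash is a hypothetical helper that computes it using only coreutils; openssl dgst -sha256 -binary | openssl base64 -A would produce the same digest.

```bash
# Hypothetical helper: read inline script/style content on stdin and print
# the matching CSP source expression, e.g. 'sha256-...'.
csp_hash() {
  local hex b64
  hex="$(sha256sum | cut -c1-64)"
  # turn the hex digest back into raw bytes, then base64-encode them
  b64="$(printf '%b' "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | base64 -w0)"
  printf "'sha256-%s'\n" "$b64"
}
```

Usage: printf '%s' 'exact inline script content' | csp_hash. The content must match byte for byte, including whitespace and line breaks.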
The paperless-ngx documentation shows a CSP in its Apache2 example. You may try to adapt that one. And when you work with it, I strongly suggest using the developer tools and the network tab of your favorite browser to verify that nothing important is blocked.
Robots.txt
This is a document management system. We really do not want search engines to index anything here. Paperless-ngx already includes the HTML tags that forbid indexing; however, there is no robots.txt. Whether a robots.txt still makes sense nowadays is beyond the scope of this article, and crawlers may ignore our wish not to be indexed anyway. But defining a robots.txt does not hurt. Here is an example of how to serve it directly from NGINX without creating a file:
location = /robots.txt {
    # default_type sets the Content-Type; add_header would send a second one
    default_type text/plain;
    return 200 "User-agent: AdsBot-Google\nUser-agent: *\nDisallow: /\n";
}
So the site configuration may look like this:
server {
    server_name documents.example.com;
    add_header Strict-Transport-Security "max-age=63072000" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin";
    include conf.d/headers.conf;
    location = /robots.txt {
        # default_type sets the Content-Type; add_header would send a second one
        default_type text/plain;
        return 200 "User-agent: AdsBot-Google\nUser-agent: *\nDisallow: /\n";
    }
    location / {
        # Adjust host and port as required. 127.0.0.1 matches the
        # port binding in docker-compose.yml.
        proxy_pass http://127.0.0.1:8000/;
        # These configuration options are required for WebSockets to work.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
        # required for PAPERLESS_PROXY_SSL_HEADER (see below)
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    listen 443 ssl;
    ssl_certificate /etc/letsencrypt/live/documents.example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/documents.example.com/privkey.pem; # managed by Certbot
    ssl_trusted_certificate /etc/letsencrypt/live/documents.example.com/chain.pem;
    #include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    #ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
    if ($host = documents.example.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    listen 80;
    server_name documents.example.com;
    return 404; # managed by Certbot
}
Paperless-ngx settings
There are a few more settings you can use to further secure your paperless-ngx installation. In /opt/paperless/paperless-ngx, edit docker-compose.env and set a long, random secret key:
PAPERLESS_SECRET_KEY=
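One way to generate such a key is to base64-encode random bytes. gen_secret_key is a hypothetical helper; 48 random bytes give a 64-character key that contains only letters, digits, + and /, so it needs no quoting in the env file.

```bash
# Hypothetical helper: print a random 64-character secret key on one line.
gen_secret_key() {
  # 48 bytes encode to exactly 64 base64 characters without padding
  head -c 48 /dev/urandom | base64 -w0
}
```

Usage: echo "PAPERLESS_SECRET_KEY=$(gen_secret_key)" and paste the result into docker-compose.env.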
Now set the URL of paperless-ngx:
PAPERLESS_URL=https://documents.example.com
Next, set the IP of your NGINX reverse proxy and 127.0.0.1 here:
PAPERLESS_TRUSTED_PROXIES=1.2.3.4,127.0.0.1
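All communication from outside to the reverse proxy is forced to use HTTPS, so I can set the following. This only works if NGINX passes the corresponding header on to paperless-ngx (proxy_set_header X-Forwarded-Proto $scheme;). Check the documentation before using it.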
PAPERLESS_PROXY_SSL_HEADER=["HTTP_X_FORWARDED_PROTO", "https"]
Since we use a reverse proxy here, we should also let paperless-ngx use the X-Forwarded-Host header that NGINX sets:
PAPERLESS_USE_X_FORWARD_HOST=true
Further securing
The documentation shows how to use fail2ban for further securing the stack. I’d suggest you follow that.