Rclone’s VFS Cache: A Deep Dive into Optimizing for a Local MinIO S3 Backend

I realized a critical detail about my setup: the standard vfs-cache strategy I picked previously is only a good starting point if the cache is actually faster than the S3 backend itself. With this hypothesis in mind, it was time to put it to the test.

In my case, the cache lives on a single rotating disk, while my MinIO S3 backend is powered by a multi-disk array. That makes the cache a potential bottleneck: it could well be slower than a direct request to my MinIO cluster, which can leverage parallel I/O across multiple drives.

This led me to re-evaluate my approach and configure the mounts without a read cache, instead tuning the other settings to let MinIO handle the load directly.

Optimized Settings for Read Performance (without a Read Cache)

I chose this approach to rely on MinIO’s parallel disk access, which is crucial for streaming large files. The settings below represent the best balance I found between efficiency and performance; a sample mount command follows the list.

  • --vfs-cache-mode writes: I’ll only cache write operations. Read operations will stream directly from MinIO. This avoids my single-disk cache and fully utilizes MinIO’s multi-disk read speeds.
  • --vfs-read-chunk-size 64M: I’m loading large files in efficient chunks that leverage my fast internal network.
  • --vfs-read-ahead 64M: This is meant to keep streaming smooth by prefetching the next part of the audiobook so my player doesn’t buffer. (Note: rclone documents --vfs-read-ahead as extra read-ahead over --buffer-size when using --vfs-cache-mode full, so in writes mode the prefetching is effectively governed by --buffer-size.)
  • --vfs-read-chunk-size-limit 2G: I’ll let Rclone progressively double the chunk size on sequential reads, up to 2 GB, so massive files are fetched with fewer requests and less overhead.
  • --vfs-read-chunk-streams 8: I’ll maintain 8 parallel streams to fully utilize the disk I/O of my MinIO setup.
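
Put together, a mount command with these flags looks roughly like the following sketch (the remote name minio:, the bucket audiobookshelf, and the mount point are placeholders for my actual naming):

    rclone mount minio:audiobookshelf /mnt/audiobookshelf \
      --vfs-cache-mode writes \
      --vfs-read-chunk-size 64M \
      --vfs-read-ahead 64M \
      --vfs-read-chunk-size-limit 2G \
      --vfs-read-chunk-streams 8 \
      --daemon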

For Paperless-ngx, the documents are consistently small, so my goal is to fetch each file as quickly as possible without relying on a single-disk cache that lacks parallelization. Again, a sample mount command follows the list.

  • --vfs-cache-mode writes: I’m also only caching writes here. Reads will go directly to MinIO.
  • --vfs-read-chunk-size 512K: This is perfectly sized for my documents, ensuring most files are retrieved in a single request.
  • --vfs-read-ahead 0: No prefetching is needed for random document access, saving system resources.
  • --vfs-read-chunk-size-limit 0: I’ll disable dynamic sizing as my files are consistently small.
  • --vfs-read-chunk-streams 8: I’ll keep 8 parallel streams to handle multiple small document requests efficiently, for example, during indexing.
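
The corresponding mount command, again as a sketch with placeholder remote, bucket, and mount point names:

    rclone mount minio:paperless /mnt/paperless \
      --vfs-cache-mode writes \
      --vfs-read-chunk-size 512K \
      --vfs-read-ahead 0 \
      --vfs-read-chunk-size-limit 0 \
      --vfs-read-chunk-streams 8 \
      --daemon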

Test Methodology and Results

To compare the performance of both caching strategies, I ran a series of controlled tests using the dd command, which is perfect for measuring sequential read speed.

My Test Setup:

  • A test file of 864 MB from my Audiobookshelf bucket.
  • The Rclone mount was unmounted and the cache was cleared before each test to ensure a fair comparison.
  • The same dd command was used for all runs: dd if=/path/to/my/testfile of=/dev/null bs=1M
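
Sketched as a shell sequence, a single run looked roughly like this (the cache location is rclone’s default, the mount flags are whichever variant was under test, and dropping the kernel page cache is an extra precaution so a repeated read isn’t served from RAM):

    fusermount -u /mnt/audiobookshelf                     # unmount the existing rclone mount
    rm -rf ~/.cache/rclone/vfs ~/.cache/rclone/vfsMeta    # wipe rclone's VFS cache (default location)
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches    # also drop the kernel page cache
    rclone mount minio:audiobookshelf /mnt/audiobookshelf \
      --vfs-cache-mode writes --daemon                    # or --vfs-cache-mode full for Test 2
    dd if=/path/to/my/testfile of=/dev/null bs=1M         # testfile lives under the mount point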

Test 1: VFS Cache Off (--vfs-cache-mode writes)

This test measured the baseline performance of reading directly from my multi-disk MinIO setup.

  • Result: The average read speed was 56 MB/s.
  • Analysis: This speed represents the maximum performance I can achieve by leveraging MinIO’s parallel disk I/O. It confirms that the bottleneck is indeed my disk speed and not the network.

Test 2: VFS Cache On (--vfs-cache-mode full)

This test had two parts: the initial read and the subsequent cached read.

  • Result (First Run – Reading from MinIO, writing to cache): The average read speed was 30.76 MB/s.
  • Analysis: As expected, this speed was significantly slower than the “cache off” run. The single-disk cache became a bottleneck, limiting the speed as data had to be written to disk while it was being read from MinIO.
  • Result (Second Run – Reading from local cache): The average read speed was 82.52 MB/s.
  • Analysis: This speed was close to the raw read speed of my single cache disk. The cached read is capped by that disk’s throughput, yet it still clearly beats what MinIO can deliver on a first read.

The Final Verdict: Why a Cache Was the Right Choice All Along

After extensive testing and a deep dive into my system’s architecture, I’ve come to a definitive conclusion: the local cache, despite my initial skepticism, is the most performant solution for my specific setup.

My initial assumption was that MinIO’s multi-disk setup would be inherently faster than a single-disk cache. The real-world tests, however, told a different story. The “no-cache” approach, which streamed data directly from my MinIO cluster, only achieved a speed of around 56 MB/s. In contrast, the second read from my local HDD cache consistently hit 82.52 MB/s.

This result shed light on my MinIO instance’s internal architecture. While my MinIO backend has four disks, my tests showed a highly imbalanced load: MinIO appears to read from only two of the disks to serve a single request, while the remaining drives presumably hold parity data and handle background tasks related to cluster integrity. This turns the multi-disk array into a speed bottleneck for my specific read-heavy use case, despite the overall number of disks.

The Path Forward: Scaling for Performance

This conclusion led me to a final thought experiment. Could a different MinIO architecture make the “no-cache” approach viable? The answer is yes, but it requires scaling the hardware.

My tests revealed that my MinIO setup delivers a combined read throughput of 56 MB/s. Given that each individual disk can achieve a raw read speed of around 80 MB/s, and my MinIO setup reads from only two of them (simplified), the theoretical maximum should have been 160 MB/s.

This means my MinIO configuration is operating at an efficiency of only 35% (56 MB/s observed / 160 MB/s theoretical).
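
For completeness, the same arithmetic as a shell one-liner (56 MB/s observed against two data disks at roughly 80 MB/s raw each):

    echo "scale=2; 56 / (2 * 80)" | bc    # prints .35, i.e. ~35% efficiency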

A Note on Efficiency: The efficiency percentage in this article is a simplified model, not a scientifically precise measurement. It serves as a practical indicator to explain the performance difference between a complex, highly redundant system and a straightforward local cache. It makes tangible why a simple cache on a hard drive, which delivers close to 100% of its theoretical throughput, can be faster than a complex, distributed system that only reaches about 35% of its ideal capacity under this load.

Based on MinIO’s automatic parity selection, adding more disks to the cluster should increase the number of parallel reads. For example, by growing my setup to six drives, MinIO would automatically switch to an EC:3 scheme (three data and three parity shards), reading from three disks at once.

This should increase my expected speed to around 84 MB/s (3 disks × 80 MB/s raw speed × 0.35 observed efficiency). While this would be a slight improvement over my current 82.52 MB/s cache, it’s clear that the easiest and most effective way to guarantee peak performance is to use a dedicated SSD as a cache. It provides the high speed and high IOPS needed to outperform the complexities and bottlenecks of a distributed storage system, without requiring a change in the core architecture.
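
Should I go that route, the change is small: point the cache at the SSD and switch back to full caching. A minimal sketch, assuming the SSD is mounted at /mnt/ssd:

    rclone mount minio:audiobookshelf /mnt/audiobookshelf \
      --cache-dir /mnt/ssd/rclone-cache \
      --vfs-cache-mode full \
      --vfs-read-chunk-size 64M \
      --vfs-read-ahead 64M \
      --vfs-read-chunk-streams 8 \
      --daemon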

Important Note: A Homelab Scenario

These test results and conclusions are derived from a specific home lab environment and are not directly comparable to a professional cloud or data center setup. In my case, I have a limited number of rotating drives, which, when combined with MinIO’s distributed architecture, results in a performance bottleneck on the S3 side.

However, the methodology of challenging assumptions with real-world tests and data-driven analysis is universally applicable. The key takeaway for any similar setup is to measure the performance of each component of your stack to identify bottlenecks, rather than relying on theoretical assumptions.

Fun Fact: I used Gemini

I enjoy challenging my thoughts and configurations with my digital co-pilot, Gemini. To give you a glimpse into the process, here are some stats from our conversation:

  • Total Turns: The entire conversation consisted of 76 inputs and outputs from Gemini and me.
  • Words Written: I wrote a total of 2,584 words, while Gemini contributed 4,275 words.
  • Word Ratio: Gemini wrote approximately 1.65 times more words than I did.
  • Calculations Performed: During our discussion, we performed 12 distinct calculations to analyze hardware performance, efficiency percentages, and theoretical scaling.
  • Corrections and Adjustments Made by Me: I asked for corrections or adjustments 8 times, which led to a more precise and accurate answer according to Gemini.
  • Complete Re-evaluations: My new information and test results led Gemini to completely re-evaluate the original assumptions 4 times. Each of these moments was a turning point in the discussion, according to Gemini.
