We have a few Sidekiq workers handling our background jobs. Recently, one of them, the one processing the default queue, started getting OOM-killed frequently.
Debugging the jobs didn't reveal anything unusual. I moved a few of them to different queues to see whether a particular job was responsible, but found nothing.
Eventually, I came across suggestions that it might be related to how Ruby handles memory allocation: once the process has allocated memory, it tends not to give it back to the operating system.
I installed sidekiq-prometheus to get more data into our monitoring. Its allocated_objects metric looked strange: it never went down.
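To see the same kind of numbers from inside a Ruby process, here is a small sketch built on the standard `GC.stat` counters plus the resident set size from `/proc` (Linux only, which fits a Docker container). The helper names are made up for illustration, and whether the exporter's allocated_objects is derived from a cumulative counter like `GC.stat[:total_allocated_objects]` is an assumption. If it is, that counter only ever grows by design, so `heap_live_slots` and RSS are the more telling values to watch.

```ruby
# Snapshot of Ruby's GC counters plus the process RSS (resident memory).
# memory_snapshot and rss_kb are hypothetical helper names.
def memory_snapshot
  stat = GC.stat
  {
    # Cumulative since process start: this counter only ever grows.
    total_allocated_objects: stat[:total_allocated_objects],
    total_freed_objects: stat[:total_freed_objects],
    # Live object slots on the Ruby heap: this one can go up and down.
    live_objects: stat[:heap_live_slots],
    gc_runs: stat[:count],
    rss_kb: rss_kb
  }
end

# Resident set size in kB read from /proc (Linux only); nil if /proc is missing.
def rss_kb
  line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
rescue Errno::ENOENT
  nil
end

before = memory_snapshot
Array.new(100_000) { |i| "record #{i}" } # short-lived garbage
GC.start
after = memory_snapshot
p before
p after
```

If live_objects drops back after a GC run while rss_kb stays high, the memory is being retained by the process rather than used by Ruby objects, which is the behavior an allocator change like jemalloc aims at.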
One of the fixes I stumbled upon online was to switch the memory allocator to jemalloc.
Happy to have found a solution, I went to implement it. However, it turned out that our Docker image is based on the Alpine variant of the Ruby image, and jemalloc was not readily available on Alpine.
Fortunately, the internet had a solution for that as well.
After adding the following lines to our Dockerfile, Ruby started to use jemalloc as its memory allocator:
```dockerfile
# Build stage: compile jemalloc from source
FROM ruby:2.7-alpine AS builder
RUN apk add build-base
RUN wget -O - https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2 | tar -xj && \
    cd jemalloc-5.2.1 && \
    ./configure && \
    make && \
    make install

# Final stage: copy only the shared library and preload it into every process
FROM ruby:2.7-alpine
COPY --from=builder /usr/local/lib/libjemalloc.so.2 /usr/local/lib/
ENV LD_PRELOAD=/usr/local/lib/libjemalloc.so.2
```
You can then run

```sh
MALLOC_CONF=stats_print:true ruby -e "exit"
```

inside your container to check that it works: if jemalloc is in use, it prints its allocator statistics when the process exits.
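If you would rather check from inside a running Ruby process (for example, from a Sidekiq console), a rough sketch is to look for the jemalloc library in `/proc/self/maps`, which also covers the `LD_PRELOAD` approach above. This is Linux-only, and the method names are hypothetical. The second check uses `RbConfig::CONFIG["MAINLIBS"]`, which only mentions jemalloc when Ruby itself was compiled with `--with-jemalloc`, so with `LD_PRELOAD` alone it is expected to be false.

```ruby
require "rbconfig"

# True if a jemalloc shared library is mapped into this process (e.g. via
# LD_PRELOAD). Linux-only: reads /proc/self/maps, returns false without it.
def jemalloc_mapped?
  File.foreach("/proc/self/maps").any? { |line| line.include?("libjemalloc") }
rescue Errno::ENOENT
  false
end

# True if this Ruby was built against jemalloc (./configure --with-jemalloc),
# in which case the recorded linker flags mention it.
def jemalloc_linked_at_build?
  RbConfig::CONFIG.fetch("MAINLIBS", "").include?("jemalloc")
end

puts "LD_PRELOAD=#{ENV['LD_PRELOAD'].inspect}"
puts "jemalloc mapped: #{jemalloc_mapped?}"
puts "jemalloc linked at build time: #{jemalloc_linked_at_build?}"
```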
I haven't had time to deploy it yet, but I'll update the article to document whether it worked.
Update: it turned out that the culprit was a job that does work on a large number of records. Optimizing that job resolved the issue with the dying worker.
Still, using jemalloc was beneficial, as it reduced memory usage somewhat.
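The post doesn't say how that job was optimized. One common approach for jobs that touch many records, shown here purely as an assumption, is to process them in fixed-size batches using keyset pagination, so only one batch is held in memory at a time. In a Rails app, ActiveRecord's `find_each` and `in_batches` implement this pattern. The plain-Ruby sketch below uses an in-memory array as a hypothetical stand-in for the database table.

```ruby
# Keyset pagination: fetch rows with id > last seen id, ordered by id,
# at most batch_size at a time, so only one batch is held in memory.
def each_in_batches(fetch, batch_size: 1000)
  last_id = 0
  loop do
    batch = fetch.call(last_id, batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

# Hypothetical stand-in for a table and a query like
#   SELECT * FROM records WHERE id > ? ORDER BY id LIMIT ?
TABLE = (1..2_500).map { |id| { id: id, value: id * 2 } }.freeze

fetch = lambda do |after_id, limit|
  TABLE.lazy.select { |row| row[:id] > after_id }.first(limit)
end

batch_sizes = []
total = 0
each_in_batches(fetch, batch_size: 1_000) do |batch|
  batch_sizes << batch.size
  total += batch.sum { |row| row[:value] }
end

p batch_sizes # => [1000, 1000, 500]
p total       # => 6252500
```

Paginating on the last seen id rather than an OFFSET keeps each query cheap even deep into a large table.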