Debugging Sidekiq workers OOM killed

We have a few Sidekiq workers that handle our background jobs. Recently, one of them, the one serving the default queue, started getting OOM killed a lot.

Debugging the jobs didn't show anything special. I moved a few of them to different queues to see whether any particular job was responsible, but found nothing.

Eventually, I found information suggesting it might be related to how Ruby handles memory allocation: once memory has been allocated, the process tends not to give it back to the operating system.

I installed sidekiq-prometheus to get more data into our monitoring solution. The allocated_objects metric looked strange: it never went down.
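To put a metric like that in context, it can help to look at Ruby's own GC counters. The sketch below is only an illustration: it assumes the exported metric is derived from `GC.stat`, which I have not verified for sidekiq-prometheus, and `gc_snapshot` is a hypothetical helper name. In `GC.stat`, `total_allocated_objects` is a cumulative counter that only grows over the life of the process, while `heap_live_slots` reflects the objects currently alive.

```ruby
# Hypothetical helper: take a snapshot of a few GC counters for this process.
def gc_snapshot
  stat = GC.stat
  {
    total_allocated: stat[:total_allocated_objects], # cumulative, only grows
    total_freed: stat[:total_freed_objects],         # cumulative, only grows
    heap_live_slots: stat[:heap_live_slots]          # objects alive right now
  }
end

before = gc_snapshot

# Create a lot of short-lived garbage, then force a full GC run.
100_000.times { "x" * 10 }
GC.start

after = gc_snapshot

puts "allocated delta: #{after[:total_allocated] - before[:total_allocated]}"
puts "freed delta:     #{after[:total_freed] - before[:total_freed]}"
puts "live slots:      #{before[:heap_live_slots]} -> #{after[:heap_live_slots]}"
```

If the dashboard plots a cumulative counter, a line that never goes down is expected and does not by itself indicate a leak; the live-slot count or the process RSS is usually the more telling number.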

One of the fixes I came across online is to switch the memory allocator to jemalloc.

Happy with the solution, I went to implement it. However, it turned out that our Docker image is based on the Alpine variant of the Ruby image, and Alpine does not support jemalloc.

Fortunately, the internet had a solution for that as well.

After adding the following lines to our Dockerfile, Ruby started to use jemalloc as its memory allocator.

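# Build stage: compile jemalloc from source.
FROM ruby:2.7-alpine AS builder

# Compiler toolchain needed to build jemalloc.
RUN apk add --no-cache build-base

# Download, build and install jemalloc 5.2.1 into /usr/local.
RUN wget -O - https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2 | tar -xj && \
    cd jemalloc-5.2.1 && \
    ./configure && \
    make && \
    make install

# Final stage: only the compiled shared library is carried over.
FROM ruby:2.7-alpine

COPY --from=builder /usr/local/lib/libjemalloc.so.2 /usr/local/lib/
# Preload jemalloc so it replaces the default malloc for every process.
ENV LD_PRELOAD=/usr/local/lib/libjemalloc.so.2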

You can then run MALLOC_CONF=stats_print:true ruby -e "exit" to check if it works in your container. If jemalloc is active, it prints its allocator statistics when the process exits.
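The same check can be done from inside a running Ruby process, for example from a console in the container. This is a minimal sketch that assumes a Linux container where /proc/self/maps is available; `jemalloc_loaded?` is a hypothetical helper name, not part of Ruby or Sidekiq.

```ruby
# Hypothetical helper: returns true if the memory map at maps_path lists
# a shared object whose path contains "jemalloc".
# Returns false when the file is missing or unreadable (e.g. on non-Linux hosts).
def jemalloc_loaded?(maps_path = "/proc/self/maps")
  return false unless File.readable?(maps_path)

  File.foreach(maps_path).any? { |line| line.include?("jemalloc") }
end

puts "LD_PRELOAD:      #{ENV['LD_PRELOAD'].inspect}"
puts "jemalloc mapped: #{jemalloc_loaded?}"
```

Checking the memory map rather than only the LD_PRELOAD variable shows whether the library was actually loaded, which also catches a wrong path in the environment variable.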

I haven't had time to deploy it yet, but I'll update the article once I know whether it worked.

EDIT 20.04.20201

Actually, it turned out that the culprit was a job that does some work on many records. Optimizing it resolved the issue with the dying worker.
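I'm not going into the exact change here, so the following is only a generic sketch of one common way to keep memory flat in such a job: handle the records in fixed-size batches instead of loading everything at once. `process_in_batches` and the integer range standing in for records are hypothetical; in a Rails app the equivalent would typically be ActiveRecord's `find_each` or `in_batches`.

```ruby
# Hypothetical helper: walks any enumerable in fixed-size slices and yields
# each record to the caller, so only one batch is held in memory at a time.
# Returns the number of records processed.
def process_in_batches(records, batch_size: 1_000)
  processed = 0
  records.each_slice(batch_size) do |batch|
    batch.each { |record| yield record }
    processed += batch.size
  end
  processed
end

# Stand-in for a large set of records: a range is iterated lazily,
# so it is never materialized as one big array.
ids = (1..10_000)

sum = 0
count = process_in_batches(ids, batch_size: 500) { |id| sum += id }

puts "processed #{count} records"
```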

Still, using jemalloc was beneficial because it reduced memory usage a bit.
