
Introduction
I recently ran into a weird error in one of our services that uses socket.io with a Redis adapter. The problem was that whenever a Redis cluster node in our k8s environment restarted, socket.io would just give up and disconnect instead of trying to reconnect. This ended up causing pods to crash and a whole mess of issues.
Now, I’ve always been against “vibe coding”. But a good friend recently convinced me to give it a real try. So I did.
Here’s a look at the “vibe-coded” solution from an AI assistant versus the actual fix that finally worked.
The Vibe-Coded Solution
So, I gave vibe-coding a real shot. I fed the AI all the details: error logs, my code, everything. Here's the gist of the solution it gave me.
import { Cluster } from 'redis';
The AI’s idea was to use a reconnectStrategy (which is actually the redis client’s default behavior anyway) and then implement an ensureConnection method. The plan was to call this method before every single Redis operation.
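To make that pattern concrete, here's a rough sketch of what "call ensureConnection before every operation" tends to look like. This is my own reconstruction for illustration, not the AI's actual output: RedisLike and GuardedRedis are hypothetical names, and the interface is a stand-in rather than the real redis client API.

```typescript
// Hypothetical minimal client shape, used only for this sketch.
// It is NOT the real redis client interface.
interface RedisLike {
  isOpen: boolean;
  connect(): Promise<void>;
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Wrapper in the spirit of the suggestion: check the connection
// before every single operation and reconnect if it is closed.
class GuardedRedis {
  private readonly client: RedisLike;

  constructor(client: RedisLike) {
    this.client = client;
  }

  // Reconnect only when the client reports that it is closed.
  private async ensureConnection(): Promise<void> {
    if (!this.client.isOpen) {
      await this.client.connect();
    }
  }

  async get(key: string): Promise<string | null> {
    await this.ensureConnection();
    return this.client.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    await this.ensureConnection();
    await this.client.set(key, value);
  }
}
```

Every method needs the same guard, and the guard only covers calls that go through the wrapper. Any code that holds the raw client never touches it.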
This had two huge problems right off the bat:
It added tons of complexity. Sprinkling ensureConnection before every call would make the code a nightmare to maintain.
It was literally impossible. I can’t just go into the socket.io library’s underlying code and force it to call my method before it makes its own Redis calls! The idea is almost funny in hindsight.
So, this wasn’t just a bad solution—it was a dead end. It was never going to fix the bug.
That was my cue to stop being lazy. I closed the AI chat and finally did what I should have done from the start: I actually read the socket.io documentation to find the real solution.
The Real Solution
The adapters section of the socket.io documentation is where I found the golden ticket.
Turns out, there’s a feature called “Connection state recovery.” The classic Redis adapter doesn’t support it, but the newer Redis Streams adapter does. The docs even call it out:
unlike the adapter based on Redis PUB/SUB mechanism, this adapter will properly handle any temporary disconnection to the Redis server and resume the stream
Redis Adapter:
Redis Streams Adapter:
Boom. Just switching from @socket.io/redis-adapter to @socket.io/redis-streams-adapter fixed the main temporary-disconnect issue.
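For reference, here's a minimal sketch of what the switch can look like, based on the setup shown in the adapter docs. The Redis URL and port are placeholders I picked, and it uses a single-node client to stay short. A cluster setup like the one in this post would build the client differently.

```typescript
import { createClient } from "redis";
import { Server } from "socket.io";
// Before: import { createAdapter } from "@socket.io/redis-adapter";
import { createAdapter } from "@socket.io/redis-streams-adapter";

async function main(): Promise<void> {
  // Placeholder URL; point this at your own Redis endpoint.
  const redisClient = createClient({ url: "redis://localhost:6379" });
  await redisClient.connect();

  // Before (Pub/Sub adapter): two clients were needed,
  //   createAdapter(pubClient, subClient)
  // After (Streams adapter): a single client is passed in.
  const io = new Server({
    adapter: createAdapter(redisClient),
  });

  io.listen(3000); // placeholder port
}

main().catch((err) => {
  console.error("failed to start the socket.io server", err);
});
```

The rest of the socket.io code stays the same, since the adapter is only wired in when the server is created.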
But I didn’t stop there. I dug a little deeper into the socket.io source code and found a small catch. Some operations, like the fetchSockets() method, need Redis to be connected at that exact moment because they interact with all the other service pods. If it fails, it throws an error, even with the new streams adapter.
This isn’t mentioned in the docs; you only find it by looking at the code. So, I just added a simple wrapper to retry the fetchSockets() method a couple of times if it failed.
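A wrapper like that can be sketched in a few lines. This is a generic version rather than my exact code: withRetry is a name I made up, and the attempt count and delay are assumed defaults you'd want to tune.

```typescript
// Retry an async operation a fixed number of times,
// pausing briefly between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts: number = 3, // assumed default
  delayMs: number = 200, // assumed pause between attempts
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait before the next try, but not after the final failure.
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Usage with socket.io, where io is your Server instance:
//   const sockets = await withRetry(() => io.fetchSockets());
```

Keeping the retry in one helper means only the calls that need Redis at that exact moment get wrapped, instead of guarding every operation.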
And that was it. With some deep knowledge, reading the official docs, and peeking at the source code, I fixed the bug for good. No unnecessary complexity, just a solid understanding of the tools I was using. That’s the real power that “vibe coding” can never give you.