
We are developing a bot with BotKit and are trying to minimize downtime during deployment.

We have one server with a Docker container running on it. Inside the container, a bot-app instance is connected to the Slack RTM server. When I deploy a new version (v2) of bot-app, I want zero downtime: users should never see "bot is offline".

[Figure: deployment timeline]

The deploy script starts a second Docker container with the new version of bot-app, which also connects to the RTM server. As a result, for a few seconds both apps are running, connected to the RTM server, and responding to user commands, so a user sees two answers to a single command.

What is the best approach if, on the one hand, we want zero downtime and, on the other hand, we want to prevent a user from interacting with two instances at the same time?

Decision 1: Accept a small chance of a collision, in which both instances respond to the same user command.

Decision 2: Abandon zero-downtime deployment. In this case, the deploy script first stops the current Docker container and then starts the new one. The app will not respond to user commands sent between stopping the current version and the new version being fully started.

Decision 3: Coordinate the current and new versions of the app while they run in parallel, for example with a handoff signal or a mutex. General scheme:
1) The current version of the app is running.
2) The deploy script starts the new version of the app.
3) When the new version is almost up and ready to connect to the RTM server, it sends the current version a command to close its RTM connection.
4) The current version closes its RTM connection.
5) The new version opens its RTM connection.
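
The handoff in Decision 3 could be sketched roughly as follows. This is a simplified in-process simulation in Python, not BotKit code: `BotInstance`, `open_rtm`, `close_rtm`, and `hand_off` are hypothetical names, and in a real deployment step 3 would be a signal or network call between containers. It only illustrates that at most one instance is connected at any time, at the cost of a short gap between steps 4 and 5.

```python
class BotInstance:
    """Hypothetical bot process; `connected` models the RTM connection."""

    def __init__(self, version):
        self.version = version
        self.connected = False

    def open_rtm(self):
        self.connected = True

    def close_rtm(self):
        self.connected = False


def hand_off(current, new, log):
    """Steps 3-5: close the old RTM connection, then open the new one."""
    # Step 3: the new version is ready and asks the current one to disconnect.
    log.append(f"{new.version} ready, asking {current.version} to disconnect")
    # Step 4: the current version closes its RTM connection.
    current.close_rtm()
    log.append(f"{current.version} disconnected")
    # Between steps 4 and 5 nobody is connected, so this gap should be short.
    # Step 5: the new version opens its RTM connection.
    new.open_rtm()
    log.append(f"{new.version} connected")


v1, v2 = BotInstance("v1"), BotInstance("v2")
v1.open_rtm()             # step 1: the current version is running
events = []
hand_off(v1, v2, events)  # step 2 (starting v2) happened before this call
print(events, v1.connected, v2.connected)
```

Because the old connection closes before the new one opens, this variant trades duplicate replies for a brief window in which no instance is connected.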

I think there are other good solutions.

How would you have solved this problem in your application?

1 Answer


If you want to keep both instances running during the deploy, you need some external way to signal that a given message has already been processed.

For example, using Redis, have the bot run the following command whenever a message comes in:

SET message:<message timestamp> 1 NX PX 30000

The NX option means the command succeeds only if the key does not already exist.

See: https://redis.io/commands/set

The first bot instance to execute this command will succeed, and the other instance will fail. A bot should process the message and respond only if its command succeeded.

PX 30000 sets a 30-second expiration on the key (the value is in milliseconds), so the marker keys clean themselves up. Together, this lets you do zero-downtime upgrades by overlapping the running bot instances without having to worry about a message being processed twice.
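
A minimal sketch of this check in Python, assuming a redis-py-style client whose `set(name, value, nx=True, px=...)` returns a truthy value on success and `None` when the key already exists. The `InMemoryStore` class is a hypothetical stand-in so the example runs without a Redis server; `claim_message` and the sample timestamp are illustrative names as well, not part of BotKit or the Slack API.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in that mimics Redis `SET key value NX PX ms`."""

    def __init__(self):
        self._expires_at = {}  # key -> monotonic deadline in seconds

    def set(self, name, value, nx=False, px=None):
        now = time.monotonic()
        deadline = self._expires_at.get(name)
        exists = deadline is not None and deadline > now
        if nx and exists:
            return None  # key already present: the claim fails
        ttl = px / 1000 if px is not None else float("inf")
        self._expires_at[name] = now + ttl
        return True


def claim_message(store, message_ts, ttl_ms=30000):
    """Return True only for the first instance that claims this message."""
    return bool(store.set(f"message:{message_ts}", 1, nx=True, px=ttl_ms))


# Both bot versions share one store during the deployment overlap.
store = InMemoryStore()
ts = "1500000000.000042"  # illustrative Slack-style message timestamp
v1_handles = claim_message(store, ts)  # first claim succeeds
v2_handles = claim_message(store, ts)  # second claim is rejected
print(v1_handles, v2_handles)
```

With a real server, `store` would instead be a redis-py client, whose `set` method accepts the same `nx` and `px` keyword arguments; each bot instance would call `claim_message` before replying and stay silent when it returns `False`.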

Note: with this scheme, a message can still be dropped altogether if a bot is shut down non-gracefully, for example after it has claimed a message but before it has responded.
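
One way to reduce that risk is to let the old instance finish any message it has already taken before it exits. The sketch below is an assumption-laden illustration in Python, not BotKit code: `Worker`, `request_stop`, and `install_sigterm_handler` are hypothetical names. It relies on the fact that `docker stop` sends SIGTERM first and only kills the container after a grace period.

```python
import signal


class Worker:
    """Hypothetical message loop that finishes in-flight work before exiting."""

    def __init__(self, handle):
        self.handle = handle  # callback that processes one message
        self.stopping = False

    def request_stop(self, signum=None, frame=None):
        # Signal-handler signature; only sets a flag so the loop exits cleanly.
        self.stopping = True

    def run(self, messages):
        done = []
        for message in messages:
            if self.stopping:
                break             # stop taking new messages
            self.handle(message)  # a message already taken is always completed
            done.append(message)
        return done


def install_sigterm_handler(worker):
    """Call once at startup from the main thread of a real process."""
    signal.signal(signal.SIGTERM, worker.request_stop)


# Simulate a stop request arriving while the second message is being handled.
replies = []


def handle(message):
    replies.append(f"reply to {message}")
    if message == "m2":
        worker.request_stop()


worker = Worker(handle)
processed = worker.run(["m1", "m2", "m3"])
print(processed, replies)
```

In this simulation the worker still replies to `m2`, which was in flight when the stop request arrived, and never takes `m3`, leaving it for the other instance to claim.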
