From 2417a6b7603524dc5cd30d29b153f91024b9443d Mon Sep 17 00:00:00 2001
From: Mitja Felicijan
Date: Wed, 1 Nov 2023 22:54:27 +0100
Subject: Move to Jekyll
---
 ...nge-case-of-elasticsearch-allocation-failure.md | 108 --------------------
 1 file changed, 108 deletions(-)
 delete mode 100644 content/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md

diff --git a/content/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md b/content/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
deleted file mode 100644
index efe88fa..0000000
--- a/content/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
+++ /dev/null
@@ -1,108 +0,0 @@

---
title: The strange case of Elasticsearch allocation failure
url: the-strange-case-of-elasticsearch-allocation-failure.html
date: 2020-03-29T12:00:00+02:00
type: post
draft: false
---

I've been using Elasticsearch in production for 5 years now and never had a
single problem with it. Hell, I never even knew there could be a problem. It
just worked. All this time. The first node I deployed is still being used in
production, never updated, upgraded, or touched in any way.

All this bliss came to an abrupt end this Friday when I got a notification
that the Elasticsearch cluster went warm. Well, warm is not that bad, right?
Wrong! Quickly after that I got another email which sent chills down my spine.
The cluster was now red. RED! Now shit had really hit the fan!

I tried googling what the problem could be, and after running the allocation
explain API I noticed that some shards were unassigned and 5 allocation
attempts had already been made (which, just my luck, is the maximum by
default). That meant I was basically fucked. The advice I found was to wait
for the cluster to re-balance itself. So I waited. One hour, two hours,
several hours. Nothing, still RED.

The strangest thing about it all was that queries were still being fulfilled.
Data was coming out. From the outside it looked like nothing was wrong, but
anybody who looked at the cluster would know immediately that something was
very, very wrong and that we were living on borrowed time.

> **Please, DO NOT do what I did.** Seriously! Please ask someone on the
> official forums, or if you know an expert, consult them. There could be a
> million reasons, and these solutions fit my problem. Maybe in your case they
> would be disastrous. I had all the data backed up, so even if I failed
> spectacularly I would be able to restore it. It would have been a huge pain
> and I would have lost a couple of days, but I had a plan B.

Running the allocation explain API told me what the problem was, but offered
no clear solution yet.

```yaml
GET /_cluster/allocation/explain
```

I got a message saying `ALLOCATION_FAILED`, with the additional info `failed
to create shard, failure ioexception[failed to obtain in-memory shard lock]`.
Well, splendid! I must also say that our cluster has more than enough capacity
to handle the traffic, and JVM memory pressure was never an issue. So what
really happened then?

I also tried re-routing the failed shards, with no success, due to the
restrictions AWS puts on managed Elasticsearch clusters (they lock down some
of the functions).

```yaml
POST /_cluster/reroute?retry_failed=true
```

I got a message that significantly reduced my options.

```json
{
  "Message": "Your request: '/_cluster/reroute' is not allowed."
}
```

After that I went on a hunt again.
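While a hunt like this is going on, it helps to at least keep an eye on the
cluster. A minimal sketch of the kind of read-only requests I mean (the
`_cat/shards` columns here are just a useful selection, pick your own):

```yaml
# Overall cluster status (green/yellow/red) plus node and shard counts.
GET /_cluster/health

# One row per shard; unassigned.reason shows why a shard is not allocated.
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
```

The `unassigned.reason` column points you at the `ALLOCATION_FAILED` shards
without digging through the full allocation explain output every time.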
I won't bother you with all the details, because hours and days went by until
I was finally able to re-index the problematic index and hope for the best.
Until that moment even re-indexing was giving me errors.

```yaml
POST _reindex
{
  "source": {
    "index": "myindex"
  },
  "dest": {
    "index": "myindex-new"
  }
}
```

I needed to do this multiple times to get all the documents re-indexed. Then I
dropped the original index with the following command.

```yaml
DELETE /myindex
```

And then re-indexed the new one back into the original one (well, by name
only).

```yaml
POST _reindex
{
  "source": {
    "index": "myindex-new"
  },
  "dest": {
    "index": "myindex"
  }
}
```

On the surface it looks like everything is working, but I have a long road
ahead of me to get all the things working again. The cluster now shows green,
but I am also getting a notification that the cluster has a processing status,
which could mean a million things.

Godspeed!
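P.S. If you ever have to do the same reindex dance, a sanity check I'd suggest
before the `DELETE` step is watching the reindex task and comparing document
counts. Roughly like this, with the same hypothetical `myindex` names as
above:

```yaml
# Watch running reindex tasks and how many documents they have processed.
GET _tasks?detailed=true&actions=*reindex

# Compare document counts between the old and the new index before deleting.
GET /_cat/count/myindex?v
GET /_cat/count/myindex-new?v
```

Only drop the original once the counts match.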