commit 8697555125c57ae64a0c9b78514b4aac4fd523de (patch)
parent 33b2615a5038bc85036081e8b5e0da8584d88097
author/committer: Mitja Felicijan <mitja.felicijan@gmail.com>, 2023-06-27 14:50:20 +0200

Massive formatting and added figcaption

content/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md | 74 +-
1 file changed, 37 insertions(+), 37 deletions(-)
date: 2020-03-29T12:00:00+02:00
draft: false
---

I've been using Elasticsearch in production for five years now and never had a
single problem with it. Hell, I never even knew there could be a problem. It
just worked, all this time. The first node I deployed is still being used in
production, never updated, upgraded, or touched in any way.

All this bliss came to an abrupt end this Friday, when I got a notification
that the Elasticsearch cluster had gone warm. Well, warm is not that bad,
right? Wrong! Quickly after that I got another email which sent chills down my
spine. The cluster was now red. RED! Now shit really hit the fan!

I tried googling what the problem could be, and after running the allocation
query I noticed that some shards were unassigned and 5 allocation attempts had
already been made (which, just my luck, is the maximum), and that meant I was
basically fucked. The advice also implied that one should wait for the cluster
to re-balance itself. So I waited. One hour, two hours, several hours. Nothing,
still RED.

The strangest thing about it all was that queries were still being fulfilled.
Data was coming out. From the outside it looked like nothing was wrong, but
anybody who looked at the cluster would know immediately that something was
very, very wrong and that we were living on borrowed time.

> **Please, DO NOT do what I did.** Seriously! Please ask someone on the
> official forums, or if you know an expert, consult them. There could be a
> million reasons, and this solution fit my problem. Maybe in your case it
> would be disastrous. I had all the data backed up, so even if I failed
> spectacularly I would be able to restore it. It would have been a huge pain
> and I would have lost a couple of days, but I had a plan B.

Running the allocation query told me what the problem was, but offered no clear
solution yet.

```yaml
GET /_cat/allocation?format=json
```

I got an `ALLOCATION_FAILED` message with the additional info `failed to create
shard, failure ioexception[failed to obtain in-memory shard lock]`. Well,
splendid! I must also say that our cluster is more than capable of handling the
traffic. JVM memory pressure was never an issue either. So what really
happened, then?

I also tried re-routing the failed shards, with no success, due to AWS
restrictions on managed Elasticsearch clusters (they lock some of the
functions).

```yaml
POST /_cluster/reroute?retry_failed=true
```

I got a message that significantly reduced my options.

```yaml
{
  …
}
```

After that I went on the hunt again. I won't bother you with all the details,
because hours and days went by until I was finally able to re-index the
problematic index and hope for the best. Until that moment, even re-indexing
was giving me errors.

```yaml
POST _reindex
{
  …
}
```

I needed to do this multiple times to get all the documents re-indexed. Then I
dropped the original index with the following command.

```yaml
DELETE /myindex
```

…

```yaml
POST _reindex
{
  …
}
```

On the surface it looks like everything is working, but I have a long road
ahead of me to get everything working again. The cluster now shows Green, but I
am also getting a notification that the cluster has a processing status, which
could mean a million things.
| 105 | 105 | ||
Godspeed!
