Diffstat (limited to 'public/the-strange-case-of-elasticsearch-allocation-failure.html')
-rwxr-xr-xpublic/the-strange-case-of-elasticsearch-allocation-failure.html64
1 files changed, 64 insertions, 0 deletions
diff --git a/public/the-strange-case-of-elasticsearch-allocation-failure.html b/public/the-strange-case-of-elasticsearch-allocation-failure.html
new file mode 100755
index 0000000..fdb32a0
--- /dev/null
+++ b/public/the-strange-case-of-elasticsearch-allocation-failure.html
@@ -0,0 +1,64 @@
1<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>The strange case of Elasticsearch allocation failure</title><meta name=description content="I&amp;#39;ve been using Elasticsearch in production for 5 years now and never had asingle problem with it."><link rel=alternate type=application/rss+xml title="Mitja Felicijan's 
posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>body{padding:1rem;max-width:760px;background:#fff;font-family:times new roman,Times,serif;line-height:1.35rem}hr{margin-block-start:1.5rem}h1,h2,h3{line-height:initial}footer{margin-block-start:3rem}table{max-width:100%;border-collapse:separate;border-spacing:2px;border:1px solid #000;border-left:1px solid #999;border-top:1px solid #999}blockquote{font-style:italic}table thead{background:#eee}td,th{border:1px solid #000;padding:4px;border-right:1px solid #999;border-bottom:1px solid #999;text-align:left}pre{text-wrap:nowrap;overflow-x:auto;margin-block-start:1.5rem;margin-block-end:1.5rem;padding:.5rem 0;border-top:1px solid #000;border-bottom:1px solid #000}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace;font-size:initial!important}img,video,audio{max-width:100%}header{display:flex;flex-direction:row;gap:3rem}nav{display:flex;gap:.75rem}.pstatus-orange{background:gold}.pstatus-green{background:#9acd32}.pstatus-red{background:#cd5c5c}@media only screen and (max-width:600px){header{flex-direction:column;gap:1rem}a{word-wrap:break-word}}</style><header><nav class=main><a href=/>Home</a>
<a href=https://git.mitjafelicijan.com/ target=_blank>Git</a>
<a href=https://files.mitjafelicijan.com/ target=_blank>Files</a>
<a href=/mitjafelicijan.pgp.pub.txt target=_blank>PGP</a>
<a href=/curriculum-vitae.html>CV</a>
<a href=/index.xml target=_blank>RSS</a></nav></header><main><div><h1>The strange case of Elasticsearch allocation failure</h1><p>Mar 29, 2020<div><p>I've been using Elasticsearch in production for 5 years now and never had a
single problem with it. Hell, I never even knew there could be a problem. It just
worked. All this time. The first node that I deployed is still being used in
production, never updated, upgraded, or touched in any way.<p>All this bliss came to an abrupt end this Friday when I got a notification that
the Elasticsearch cluster went yellow. Well, yellow is not that bad, right? Wrong!
Quickly after that I got another email which sent chills down my spine. The cluster
is now red. RED! Now shit really hit the fan!<p>I tried googling what could be the problem, and after executing the allocation
request I noticed that some shards were unassigned and that 5 allocation attempts had already
been made (which, BTW, to my luck is the maximum), and that meant I was basically fucked.
The posts I found also implied that one should wait for the cluster to re-balance itself. So, I
waited. One hour, two hours, several hours. Nothing, still RED.<p>The strangest thing about it all was that queries were still being fulfilled.
Data was coming out. From the outside it looked like nothing was wrong, but
anybody who looked at the cluster would know immediately that something
was very, very wrong and that we were living on borrowed time.<blockquote><p><strong>Please, DO NOT do what I did.</strong> Seriously! Please ask someone on the official
forums, or if you know an expert, consult them. There could be a million
reasons for this, and this solution fit my problem. In your case it might be
disastrous. I had all the data backed up, so even if I failed spectacularly
I would have been able to restore it. It would have been a huge pain and I would have lost
a couple of days, but I had a plan B.</blockquote><p>Executing the allocation request told me what the problem was, but gave me no clear solution yet.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>GET /_cat/allocation?format=json
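
# Hedged addition, not from the original post: the failure details quoted
# below (ALLOCATION_FAILED and the shard lock error) are the kind of output
# the cluster allocation explain API returns. Called without a request body,
# it explains an arbitrary unassigned shard.
GET /_cluster/allocation/explain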
</span></span></code></pre><p>I got a message saying <code>ALLOCATION_FAILED</code> with the additional info <code>failed to create shard, failure ioexception[failed to obtain in-memory shard lock]</code>. Well,
splendid! I must also say that our cluster is more than capable of handling
the traffic. JVM memory pressure was never an issue either. So what really
happened then?<p>I also tried re-routing the failed shards, with no success, because AWS restricts
what you can do on a managed Elasticsearch cluster (they lock some of the functions).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST /_cluster/reroute?retry_failed=true
</span></span></code></pre><p>I got a message that significantly reduced my options.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;Message&#34;: <span style=color:#a31515>&#34;Your request: &#39;/_cluster/reroute&#39; is not allowed.&#34;</span>
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>After that I went on a hunt again. I won't bother you with all the details,
because hours and days went by until I was finally able to re-index the problematic
index and hope for the best. Until that moment even re-indexing was giving me
errors.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
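
# Hedged sketch, not part of the original commands: a variant for the
# repeated runs described below. The op_type and conflicts settings are an
# assumption about what could help. They make a re-run create only the
# documents still missing from myindex-new and skip the ones already copied,
# instead of aborting on version conflicts.
POST _reindex
{
  &#34;conflicts&#34;: &#34;proceed&#34;,
  &#34;source&#34;: {
    &#34;index&#34;: &#34;myindex&#34;
  },
  &#34;dest&#34;: {
    &#34;index&#34;: &#34;myindex-new&#34;,
    &#34;op_type&#34;: &#34;create&#34;
  }
}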
</span></span></code></pre><p>I needed to do this multiple times to get all the documents re-indexed. Then I
dropped the original index with the following command.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>DELETE /myindex
</span></span></code></pre><p>And then re-indexed the new index back into the original one (well, original by name only).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
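
# Hedged sketch, not from the original post: two read-only checks one could
# run at this point. The first lists health and document counts for both
# indices so they can be compared, the second reports the overall cluster
# status (green, yellow or red).
GET /_cat/indices/myindex,myindex-new?v&amp;h=index,health,docs.count
GET /_cluster/health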
</span></span></code></pre><p>On the surface it looks like everything is working, but I have a long road in front of
me to get all the things working again. The cluster now reports green
status, but I am also getting a notification that the cluster has a processing status,
which could mean a million things.<p>Godspeed!</div></div></main><footer><hr><div><h3>Want to comment or have something to add?</h3>You can write me an email at
<a href=mailto:m@mitjafelicijan.com>m@mitjafelicijan.com</a> or catch up
with me
<a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.</div><hr><p>This website does not track you. Content is made available under
the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless specified
otherwise. Blog feed is available as <a href=/index.xml target=_blank>RSS feed</a>.</footer><script src=https://cdn.usefathom.com/script.js data-site=XHQARKXP defer></script> \ No newline at end of file