Diffstat (limited to 'public/the-strange-case-of-elasticsearch-allocation-failure.html')
| -rwxr-xr-x | public/the-strange-case-of-elasticsearch-allocation-failure.html | 64 |
1 files changed, 64 insertions, 0 deletions
diff --git a/public/the-strange-case-of-elasticsearch-allocation-failure.html b/public/the-strange-case-of-elasticsearch-allocation-failure.html new file mode 100755 index 0000000..fdb32a0 --- /dev/null +++ b/public/the-strange-case-of-elasticsearch-allocation-failure.html | |||
| @@ -0,0 +1,64 @@ | |||
| 1 | <!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>The strange case of Elasticsearch allocation failure</title><meta name=description content="I&#39;ve been using Elasticsearch in production for 5 years now and never had asingle problem with it."><link rel=alternate type=application/rss+xml title="Mitja Felicijan's 
posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>body{padding:1rem;max-width:760px;background:#fff;font-family:times new roman,Times,serif;line-height:1.35rem}hr{margin-block-start:1.5rem}h1,h2,h3{line-height:initial}footer{margin-block-start:3rem}table{max-width:100%;border-collapse:separate;border-spacing:2px;border:1px solid #000;border-left:1px solid #999;border-top:1px solid #999}blockquote{font-style:italic}table thead{background:#eee}td,th{border:1px solid #000;padding:4px;border-right:1px solid #999;border-bottom:1px solid #999;text-align:left}pre{text-wrap:nowrap;overflow-x:auto;margin-block-start:1.5rem;margin-block-end:1.5rem;padding:.5rem 0;border-top:1px solid #000;border-bottom:1px solid #000}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace;font-size:initial!important}img,video,audio{max-width:100%}header{display:flex;flex-direction:row;gap:3rem}nav{display:flex;gap:.75rem}.pstatus-orange{background:gold}.pstatus-green{background:#9acd32}.pstatus-red{background:#cd5c5c}@media only screen and (max-width:600px){header{flex-direction:column;gap:1rem}a{word-wrap:break-word}}</style><header><nav class=main><a href=/>Home</a> | ||
| 2 | <a href=https://git.mitjafelicijan.com/ target=_blank>Git</a> | ||
| 3 | <a href=https://files.mitjafelicijan.com/ target=_blank>Files</a> | ||
| 4 | <a href=/mitjafelicijan.pgp.pub.txt target=_blank>PGP</a> | ||
| 5 | <a href=/curriculum-vitae.html>CV</a> | ||
| 6 | <a href=/index.xml target=_blank>RSS</a></nav></header><main><div><h1>The strange case of Elasticsearch allocation failure</h1><p>Mar 29, 2020<div><p>I've been using Elasticsearch in production for 5 years now and never had a | ||
| 7 | single problem with it. Hell, I never even knew there could be a problem. Just | ||
| 8 | worked. All this time. The first node that I deployed is still being used in | ||
| 9 | production, never updated, upgraded, or touched in any way.<p>All this bliss came to an abrupt end this Friday when I got a notification that the | ||
| 10 | Elasticsearch cluster went yellow. Well, yellow is not that bad, right? Wrong! | ||
| 11 | Shortly after that I got another email which sent chills down my spine. The cluster | ||
| 12 | is now red. RED! Now shit had really hit the fan!<p>I tried googling what the problem could be, and after querying the allocation | ||
| 13 | API I noticed that some shards were unassigned and that 5 allocation attempts had already | ||
| 14 | been made (which, just my luck, is the maximum), and that meant I was basically fucked. | ||
| 15 | The search results also implied that one should wait for the cluster to rebalance itself. So, I | ||
| 16 | waited. One hour, two hours, several hours. Nothing, still RED.<p>The strangest thing about it all was that queries were still being fulfilled. | ||
| 17 | Data was coming out. From the outside it looked like nothing was wrong, but | ||
| 18 | anybody who looked at the cluster would know immediately that something | ||
| 19 | was very, very wrong and we were living on borrowed time here.<blockquote><p><strong>Please, DO NOT do what I did.</strong> Seriously! Please ask someone on the official | ||
| 20 | forums, or if you know an expert, please consult them. There could be a million | ||
| 21 | reasons for a failure like this, and this solution fit my problem. In your case it could be | ||
| 22 | disastrous. I had all the data backed up, so even if I failed spectacularly | ||
| 23 | I would be able to restore the data. It would be a huge pain and I would lose | ||
| 24 | a couple of days, but I had a plan B.</blockquote><p>Querying the allocation API told me what the problem was, but offered no clear solution yet.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>GET /_cat/allocation?format=json | ||
| 25 | </span></span></code></pre><p>I got the message <code>ALLOCATION_FAILED</code> with the additional info <code>failed to create shard, failure ioexception[failed to obtain in-memory shard lock]</code>. Well, | ||
| 26 | splendid! I must also say that our cluster is more than capable of handling | ||
| 27 | the traffic. JVM memory pressure was never an issue either. So what really | ||
| 28 | happened, then?<p>I also tried rerouting the failed shards, with no success, because AWS restricts | ||
| 29 | what you can do on its managed Elasticsearch clusters (they lock down some of the APIs).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST /_cluster/reroute?retry_failed=true | ||
| 30 | </span></span></code></pre><p>I got a message that significantly reduced my options.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{ | ||
| 31 | </span></span><span style=display:flex><span> "Message": <span style=color:#a31515>"Your request: '/_cluster/reroute' is not allowed."</span> | ||
| 32 | </span></span><span style=display:flex><span>} | ||
| 33 | </span></span></code></pre><p>After that I went on a hunt again. I won't bore you with all the details, | ||
| 34 | because hours, even days, went by until I was finally able to re-index the problematic | ||
| 35 | index and hope for the best. Until that moment, even re-indexing was giving me | ||
| 36 | errors.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex | ||
| 37 | </span></span><span style=display:flex><span>{ | ||
| 38 | </span></span><span style=display:flex><span> "source": { | ||
| 39 | </span></span><span style=display:flex><span> "index": <span style=color:#a31515>"myindex"</span> | ||
| 40 | </span></span><span style=display:flex><span> }, | ||
| 41 | </span></span><span style=display:flex><span> "dest": { | ||
| 42 | </span></span><span style=display:flex><span> "index": <span style=color:#a31515>"myindex-new"</span> | ||
| 43 | </span></span><span style=display:flex><span> } | ||
| 44 | </span></span><span style=display:flex><span>} | ||
| 45 | </span></span></code></pre><p>I needed to do this multiple times to get all the documents re-indexed. Then I | ||
| 46 | dropped the original one with the following command.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>DELETE /myindex | ||
| 47 | </span></span></code></pre><p>Then I re-indexed the new index back into one with the original name (so it is the original in name only).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex | ||
| 48 | </span></span><span style=display:flex><span>{ | ||
| 49 | </span></span><span style=display:flex><span> "source": { | ||
| 50 | </span></span><span style=display:flex><span> "index": <span style=color:#a31515>"myindex-new"</span> | ||
| 51 | </span></span><span style=display:flex><span> }, | ||
| 52 | </span></span><span style=display:flex><span> "dest": { | ||
| 53 | </span></span><span style=display:flex><span> "index": <span style=color:#a31515>"myindex"</span> | ||
| 54 | </span></span><span style=display:flex><span> } | ||
| 55 | </span></span><span style=display:flex><span>} | ||
| 56 | </span></span></code></pre><p>On the surface it looks like everything is working, but I have a long road ahead of | ||
| 57 | me to get everything working again. The cluster now shows that it is in Green | ||
| 58 | mode, but I am also getting a notification that the cluster has a processing status, | ||
| 59 | which could mean a million things.<p>Godspeed!</div></div></main><footer><hr><div><h3>Want to comment or have something to add?</h3>You can write me an email at | ||
| 60 | <a href=mailto:m@mitjafelicijan.com>m@mitjafelicijan.com</a> or catch up | ||
| 61 | with me | ||
| 62 | <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.</div><hr><p>This website does not track you. Content is made available under | ||
| 63 | the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless specified | ||
| 64 | otherwise. Blog feed is available as <a href=/index.xml target=_blank>RSS feed</a>.</footer><script src=https://cdn.usefathom.com/script.js data-site=XHQARKXP defer></script> \ No newline at end of file | ||
