author Mitja Felicijan <mitja.felicijan@gmail.com> 2023-10-29 14:41:39 +0100
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2023-10-29 14:41:39 +0100
commit 2836163e54e3b94342113314e70ee564c456c43e (patch)
tree 59b82fc69e83cc6d92846a8e9f510b0bb865cf3b /public/the-strange-case-of-elasticsearch-allocation-failure.html
parent d50ea4053ea04abb3a455606d4591a8283af0677 (diff)
download mitjafelicijan.com-2836163e54e3b94342113314e70ee564c456c43e.tar.gz
Added public folder to git so it can get deployed on vercel
Diffstat (limited to 'public/the-strange-case-of-elasticsearch-allocation-failure.html')
-rwxr-xr-x public/the-strange-case-of-elasticsearch-allocation-failure.html 78
1 files changed, 78 insertions, 0 deletions
diff --git a/public/the-strange-case-of-elasticsearch-allocation-failure.html b/public/the-strange-case-of-elasticsearch-allocation-failure.html
new file mode 100755
index 0000000..bc5863a
--- /dev/null
+++ b/public/the-strange-case-of-elasticsearch-allocation-failure.html
@@ -0,0 +1,78 @@
1<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>The strange case of Elasticsearch allocation failure</title><meta name=description content="I&amp;#39;ve been using Elasticsearch in production for 5 years now and never had asingle problem with it."><link rel=alternate type=application/rss+xml title="Mitja Felicijan's 
posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>body{padding:1rem;max-width:760px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px;margin:0 auto}hr{margin-block-start:1.5rem}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}table{max-width:100%;width:100%;border-collapse:separate;border-spacing:2px;border:1px solid #000;border-left:1px solid #999;border-top:1px solid #999}blockquote{font-style:italic}table thead{background:#eee}ul.list li{padding:.2em 0}ul{line-height:1.4em}td,th{border:1px solid #000;padding:4px;border-right:1px solid #999;border-bottom:1px solid #999;text-align:left}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:1px solid #dcdcdc}code{padding:0 3px;font-size:14px;border:0}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{text-align:center}figcaption p{margin:.3em 0 0}img,video,audio{max-width:100%}header{display:flex;flex-direction:row;gap:3rem}nav{display:flex;gap:.75rem}nav.main{flex-grow:1}.pstatus-orange{background:gold}.pstatus-green{background:#9acd32}.pstatus-red{background:#cd5c5c}@media only screen and (max-width:600px){body{padding:15px}header{flex-direction:column;gap:1rem}a{word-wrap:break-word}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=toolbar><a href=/>Home</a>
<a href=https://git.mitjafelicijan.com/ target=_blank>Git</a>
<a href=https://files.mitjafelicijan.com/ target=_blank>Files</a>
<a href=/radio.pls target=_blank>Radio</a>
<a href=/mitjafelicijan.pgp.pub.txt target=_blank>PGP</a>
<a href=/curriculum-vitae.html>CV</a>
<a href=/index.xml target=_blank>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>The strange case of Elasticsearch allocation failure</h1><p><cap>post</cap>, Mar 29, 2020 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><p>I've been using Elasticsearch in production for 5 years now and never had a
single problem with it. Hell, I never even knew there could be a problem. It just
worked. All this time. The first node that I deployed is still being used in
production, never updated, upgraded, or touched in any way.<p>All this bliss came to an abrupt end this Friday when I got a notification that
the Elasticsearch cluster went yellow. Well, yellow is not that bad, right? Wrong!
Shortly after that I got another email which sent chills down my spine. The cluster
is now red. RED! Now shit really hit the fan!<p>I tried googling what could be the problem and, after executing the allocation
command, noticed that some shards were unassigned and that 5 attempts had already
been made (which, BTW, to my luck is the maximum), and that meant I was basically fucked.
The search results also implied that one should wait for the cluster to re-balance itself. So, I
waited. One hour, two hours, several hours. Nothing, still RED.<p>The strangest thing about it all was that queries were still being fulfilled.
Data was coming out. On the outside it looked like nothing was wrong, but
anybody who looked at the cluster would know immediately that something
was very, very wrong and that we were living on borrowed time.<blockquote><p><strong>Please, DO NOT do what I did.</strong> Seriously! Please ask someone on the official
forums or, if you know an expert, please consult them. There could be a million
reasons and this solution fit my problem. Maybe in your case it would be
disastrous. I had all the data backed up, and even if I failed spectacularly
I would be able to restore the data. It would be a huge pain and I would lose
a couple of days, but I had a plan B.</blockquote><p>Executing the allocation command told me what the problem was, but offered no clear solution yet.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>GET /_cat/allocation?format=json
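</span></span><span style=display:flex><span># Sketch, not part of the original steps: these standard endpoints report why a
</span></span><span style=display:flex><span># shard is unassigned. Whether a managed service allows them is an assumption.
</span></span><span style=display:flex><span>GET /_cluster/allocation/explain
</span></span><span style=display:flex><span>GET /_cat/shards?v&amp;h=index,shard,prirep,state,unassigned.reason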
</span></span></code></pre><p>I got a message saying <code>ALLOCATION_FAILED</code> with the additional info <code>failed to create shard, failure ioexception[failed to obtain in-memory shard lock]</code>. Well,
splendid! I must also say that our cluster is more than capable of handling
the traffic. Also, JVM memory pressure was never an issue. So what really
happened then?<p>I also tried re-routing the failed shards, with no success due to AWS restrictions on
its managed Elasticsearch clusters (they lock some of the functions).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST /_cluster/reroute?retry_failed=true
</span></span></code></pre><p>I got a message that significantly reduced my options.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;Message&#34;: <span style=color:#a31515>&#34;Your request: &#39;/_cluster/reroute&#39; is not allowed.&#34;</span>
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>After that I went on a hunt again. I won't bother you with all the details,
because hours/days went by until I was finally able to re-index the problematic
index and hoped for the best. Until that moment even re-indexing was giving me
errors.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
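</span></span><span style=display:flex><span># Sketch, not part of the original steps: with op_type set to create and
</span></span><span style=display:flex><span># conflicts set to proceed, a repeated run only adds documents that are still
</span></span><span style=display:flex><span># missing from myindex-new instead of overwriting the ones already copied.
</span></span><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;conflicts&#34;: <span style=color:#a31515>&#34;proceed&#34;</span>,
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>,
</span></span><span style=display:flex><span>    &#34;op_type&#34;: <span style=color:#a31515>&#34;create&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}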
</span></span></code></pre><p>I needed to do this multiple times to get all the documents re-indexed. Then I
dropped the original index with the following command.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>DELETE /myindex
</span></span></code></pre><p>Then I re-indexed the new index back into the original one (well, original by name only).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
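</span></span><span style=display:flex><span># Sketch, not part of the original steps: a quick check that both indices
</span></span><span style=display:flex><span># report the same docs.count and that the index health is green again.
</span></span><span style=display:flex><span>GET /_cat/indices/myindex,myindex-new?v&amp;h=index,health,docs.count
</span></span><span style=display:flex><span>GET /_cluster/health/myindex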
</span></span></code></pre><p>On the surface it looks like all is working, but I have a long road in front of
me to get everything working again. The cluster now shows that it is in the green
state, but I am also getting a notification that the cluster has a processing status,
which could mean a million things.<p>Godspeed!</div></article></main><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email
at <a href=mailto:m@mitjafelicijan.com>m@mitjafelicijan.com</a> or
catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under
the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless specified
otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer>