author Mitja Felicijan <mitja.felicijan@gmail.com> 2023-11-01 22:54:27 +0100
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2023-11-01 22:54:27 +0100
commit 2417a6b7603524dc5cd30d29b153f91024b9443d
tree 9be5ea8e5baba96dd9159217da6badf6157fb595 /public/the-strange-case-of-elasticsearch-allocation-failure.html
parent 89ba3497f07a8ea43d209b583f39fcc286acc923
Move to Jekyll
Diffstat (limited to 'public/the-strange-case-of-elasticsearch-allocation-failure.html')
-rwxr-xr-x public/the-strange-case-of-elasticsearch-allocation-failure.html 97
1 files changed, 0 insertions, 97 deletions
diff --git a/public/the-strange-case-of-elasticsearch-allocation-failure.html b/public/the-strange-case-of-elasticsearch-allocation-failure.html
deleted file mode 100755
index 5c380a4..0000000
--- a/public/the-strange-case-of-elasticsearch-allocation-failure.html
+++ /dev/null
@@ -1,97 +0,0 @@
1<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=generator content="JBMAFP - github.com/mitjafelicijan/jbmafp"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>The strange case of Elasticsearch allocation failure</title><meta name=description content="I&amp;#39;ve been using Elasticsearch in production for 5 years now and never had asingle problem with 
it."><meta name=author content="Mitja Felicijan"><link rel=alternate type=application/rss+xml title="Mitja Felicijan's posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>:root{--border-color:gainsboro;--border-size:2px;--link-color:blue;--bg-color:#eee}*::selection{background:var(--link-color);color:#fff}*::-moz-selection{background:var(--link-color);color:#fff}*::-webkit-selection{background:var(--link-color);color:#fff}body{padding:2.5rem;max-width:1900px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px}hr{border:0;border-bottom:var(--border-size)solid var(--border-color);margin-block-start:1.5rem}a{color:var(--link-color);text-decoration:none}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}blockquote{font-style:italic}table{max-width:100%;border:var(--border-size)solid var(--border-color);border-collapse:separate;border-spacing:0}table thead tr th{border-bottom:var(--border-size)solid var(--border-color);text-align:left}table th,table td{padding:.5em .8em}ul.list li{padding:.2em 0}ul{line-height:1.35em}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:var(--border-size)solid var(--border-color)}code{padding:0 3px;font-size:14px;border:0;background:var(--bg-color)}pre code{line-height:1.3em;background:#fff}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{width:800px;max-width:100%;text-align:center}figcaption p{margin:.3em 0 1.5em;font-style:italic}img,video,audio{width:800px;max-width:100%;border:var(--border-size)solid var(--border-color);padding:.5em}header nav{display:flex;gap:.9rem}article iframe{margin:0!important}audio::-webkit-media-controls-enclosure{border-radius:0}@media only screen and (max-width:600px){body{padding:.5em;word-wrap:break-word}header nav{gap:.7rem}header nav 
.hob{display:none}a{word-wrap:break-word}img,video,audio{padding:0}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=navigation aria-label="Main navigation"><a href=/>Home</a>
<a href=/#posts>Posts</a>
<a href=/#notes>Notes</a>
<a href=/#sideprojects class=hob>Side Projects</a>
<a href=/vault.html>Vault</a>
<a href=https://github.com/mitjafelicijan target=_blank>Code</a>
<a href=/mitjafelicijan.pgp.pub.txt target=_blank class=hob>PGP</a>
<a href=/curriculum-vitae.html>CV</a>
<a href=/index.xml target=_blank class=hob>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>The strange case of Elasticsearch allocation failure</h1><p><cap>post</cap>, Mar 29, 2020 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><p>I've been using Elasticsearch in production for 5 years now and never had a
single problem with it. Hell, I never even knew there could be a problem. It just
worked. All this time. The first node that I deployed is still being used in
production, never updated, upgraded, or touched in any way.<p>All this bliss came to an abrupt end this Friday when I got a notification that
the Elasticsearch cluster went warm. Well, warm is not that bad, right? Wrong!
Quickly after that I got another email which sent chills down my spine. The cluster
is now red. RED! Now, shit really hit the fan!<p>I tried googling what could be the problem and, after executing the allocation
function, noticed that some shards were unassigned and 5 attempts had already been
made (which is BTW, to my luck, the maximum), and that meant I was basically fucked.
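<p>As a rough illustration of that check (not the exact commands from this incident), here is a minimal Python sketch. It assumes the data comes from Elasticsearch's <code>GET /_cluster/allocation/explain</code> endpoint, and the <code>SAMPLE</code> body is a hypothetical, trimmed stand-in for a real response. The helper name <code>summarize_unassigned</code> is made up for this example. The limit of 5 is the default of the <code>index.allocation.max_retries</code> index setting.</p>

```python
import json

# Default of the index setting `index.allocation.max_retries`.
DEFAULT_MAX_RETRIES = 5


def summarize_unassigned(explain_response, max_retries=DEFAULT_MAX_RETRIES):
    """Pull the interesting fields out of an allocation-explain response.

    Hypothetical helper for illustration; it only reads fields and never
    talks to a cluster.
    """
    info = explain_response.get("unassigned_info", {})
    attempts = info.get("failed_allocation_attempts", 0)
    return {
        "index": explain_response.get("index"),
        "shard": explain_response.get("shard"),
        "reason": info.get("reason"),
        "attempts": attempts,
        # Once the retry budget is used up, the shard stays unassigned
        # until someone intervenes.
        "retries_exhausted": attempts >= max_retries,
    }


# Hypothetical, trimmed response body; real responses carry more fields.
SAMPLE = json.loads("""
{
  "index": "myindex",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "failed_allocation_attempts": 5,
    "details": "failed to create shard, failure ioexception[failed to obtain in-memory shard lock]"
  }
}
""")

summary = summarize_unassigned(SAMPLE)
print(summary)
```

<p>With the retry budget exhausted, Elasticsearch stops trying to allocate the shard on its own, so waiting alone is unlikely to bring the cluster back.</p>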
The search results also implied that one should wait for the cluster to re-balance itself. So, I
waited. One hour, two hours, several hours. Nothing, still RED.<p>The strangest thing about it all was that queries were still being fulfilled.
Data was coming out. On the outside it looked like nothing was wrong, but
anybody who looked at the cluster would know immediately that something
was very, very wrong and we were living on borrowed time here.<blockquote><p><strong>Please, DO NOT do what I did.</strong> Seriously! Please ask someone on the official
forums or, if you know an expert, please consult them. There could be a million
reasons and this solution fit my problem. Maybe in your case it would be
disastrous. I had all the data backed up and even if I failed spectacularly
I would be able to restore the data. It would be a huge pain and I would lose
a couple of days, but I had a plan B.</blockquote><p>Executing the allocation query told me what the problem was, but there was no clear solution yet.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>GET /_cat/allocation?format=json
</span></span></code></pre><p>I got an <code>ALLOCATION_FAILED</code> message with the additional info <code>failed to create shard, failure ioexception[failed to obtain in-memory shard lock]</code>. Well,
splendid! I must also say that our cluster is more than capable of handling
the traffic. Also, JVM memory pressure was never an issue. So what really
happened then?<p>I also tried re-routing the failed shards, with no success due to AWS restrictions on
managed Elasticsearch clusters (they lock some of the functions).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST /_cluster/reroute?retry_failed=true
</span></span></code></pre><p>I got a message that significantly reduced my options.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;Message&#34;: <span style=color:#a31515>&#34;Your request: &#39;/_cluster/reroute&#39; is not allowed.&#34;</span>
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>After that I went on a hunt again. I won't bother you with all the details,
because hours/days went by until I was finally able to re-index the problematic
index and hope for the best. Until that moment even re-indexing was giving me
errors.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>I needed to do this multiple times to get all the documents re-indexed. Then I
dropped the original index with the following command.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>DELETE /myindex
</span></span></code></pre><p>And then I re-indexed the new one back into the original one (well, in name only).<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>POST _reindex
</span></span><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;source&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex-new&#34;</span>
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;dest&#34;: {
</span></span><span style=display:flex><span>    &#34;index&#34;: <span style=color:#a31515>&#34;myindex&#34;</span>
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>On the surface it looks like everything is working, but I have a long road ahead of
me to get all the things working again. The cluster now shows that it is in Green
mode, but I am also getting a notification that the cluster has a processing status,
which could mean a million things.<p>Godspeed!</div></article></main><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email
at <a href=mailto:mitja.felicijan@gmail.com>mitja.felicijan@gmail.com</a> or
catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless
specified otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer><script>
 window.va = window.va || function () { (window.vaq = window.vaq || []).push(arguments); };
 </script><script defer src=/_vercel/insights/script.js></script> \ No newline at end of file