| author | Mitja Felicijan <m@mitjafelicijan.com> | 2023-07-12 18:35:08 +0200 |
|---|---|---|
| committer | Mitja Felicijan <m@mitjafelicijan.com> | 2023-07-12 18:35:08 +0200 |
| commit | 23a56bd50b04211da3cab45f72c3390711b2416b (patch) | |
| tree | ab9a4a0136b4cce06dba7d853e296f682f807dbb /content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md | |
| parent | cecb4b48a39a3558979b9c4b50e45bf605a3684e (diff) | |
| download | mitjafelicijan.com-23a56bd50b04211da3cab45f72c3390711b2416b.tar.gz | |
Moved notes and posts into subfolders
Diffstat (limited to 'content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md')
| -rw-r--r-- | content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md | 331 |
1 files changed, 331 insertions, 0 deletions
diff --git a/content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md b/content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
new file mode 100644
index 0000000..d2fa558
--- /dev/null
+++ b/content/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
@@ -0,0 +1,331 @@

---
title: Using DigitalOcean Spaces Object Storage with FUSE
url: using-digitalocean-spaces-object-storage-with-fuse.html
date: 2018-01-16T12:00:00+02:00
type: post
draft: false
---

A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced a
new product called
[Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), an
object storage service very similar to Amazon's S3. This really piqued my
interest, because it was something I had been missing, and the thought of going
over the internet to another provider for such functionality held no interest
for me. In keeping with their previous pricing, it is also very cheap, and the
pricing page is a no-brainer compared to AWS or GCE. [Prices are clearly and
precisely defined and outlined](https://www.digitalocean.com/pricing/). You must
love them for that :)

## Initial requirements

* Is it possible to use them as a mounted drive with FUSE? (tl;dr YES)
* Will the performance degrade over time and over different sizes of objects?
  (tl;dr NO&YES)
* Can storage be mounted on multiple machines at the same time and be writable?
  (tl;dr YES)

> Let me be clear: the scripts I use here are made just for benchmarking and are
> not intended for real-life situations. Beyond that, I am looking into using
> these approaches with a caching service in front of them, and then dumping
> everything as an object to storage. That could be an interesting post of its
> own. But if you need real-time data without eventual consistency, please take
> these scripts for what they are: not usable in such situations.

## Is it possible to use them as a mounted drive with FUSE?

Yes, they can be used in this manner. Because Spaces exposes an API compatible
with [AWS S3](https://aws.amazon.com/s3/), many tools are available, and you can
find plenty of articles and [Stack Overflow
questions](https://stackoverflow.com/search?q=s3+fuse) on the topic.

To follow along you will need a DigitalOcean account; without one you will not
be able to test this code. If you have an account, go and [create a new
Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent).
This link preselects Debian 9 with the smallest VM option.

* Be sure to add your SSH key, because we will log in to this machine remotely.
* If you change the region, remember which one you chose, because we will need
  it when we mount the Space on the machine.

Instructions on how to set up and use SSH keys are available in the article [How
To Use SSH Keys with DigitalOcean
Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).

Once the Droplet is created, it's time to create a new Space. Click the
[Create](https://cloud.digitalocean.com/spaces/new) button (top right corner)
and select Spaces. Choose a pronounceable `Unique name`, because we will use it
in the examples below. You can choose either Private or Public; it doesn't
matter in our case, and you can always change it later.

With the Space created, [generate an access
key](https://cloud.digitalocean.com/settings/api/tokens). This link takes you to
the page where you can generate the key. After you create one, save the provided
Key and Secret, because the Secret will not be shown again.

Now that we have a Space and an access key, we can SSH into the machine.

```bash
# replace IP with the IP address of your newly created Droplet
ssh root@IP

# install the utilities for mounting object storage via FUSE
apt update && apt install s3fs

# provide the credentials (the access key we created earlier)
# replace KEY and SECRET with your own credentials but keep the colon between them
# s3fs reads ~/.passwd-s3fs by default and requires strict permissions on it
echo "KEY:SECRET" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# mount the Space on our machine
# replace UNIQUE-NAME with the name you chose earlier
# if you chose a different region for your Space, change the -ourl option (ams3) accordingly
s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp

# create a test file
# right after mounting it may take a couple of seconds to retrieve data
echo "Hello cruel world" > /mnt/hello.txt
```
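
The same setup steps (credentials file, region endpoint, mount command) can also be scripted. Below is a minimal Python sketch. It assumes the Spaces endpoint follows the `https://REGION.digitaloceanspaces.com` pattern used in the `-ourl` option above. The helper names `write_passwd_file` and `s3fs_command` are hypothetical, and the sketch only builds the `s3fs` command rather than running it.

```python
import os
import stat
from pathlib import Path


def write_passwd_file(path: Path, key: str, secret: str) -> None:
    """Write KEY:SECRET to an s3fs credentials file readable only by its owner."""
    path.write_text(f"{key}:{secret}\n")
    # Equivalent of `chmod 600`: s3fs rejects credential files other users can read.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def s3fs_command(space: str, mountpoint: str, region: str = "ams3",
                 cache_dir: str = "/tmp") -> list[str]:
    """Build (but do not run) the s3fs invocation for a Space in `region`."""
    # Assumption: the endpoint is https://<region>.digitaloceanspaces.com
    endpoint = f"https://{region}.digitaloceanspaces.com"
    return ["s3fs", space, mountpoint,
            "-o", f"url={endpoint}",
            "-o", f"use_cache={cache_dir}"]


if __name__ == "__main__":
    print(" ".join(s3fs_command("UNIQUE-NAME", "/mnt/", region="fra1")))
```

Running it prints `s3fs UNIQUE-NAME /mnt/ -o url=https://fra1.digitaloceanspaces.com -o use_cache=/tmp` (`fra1` is just an example region). Passing the list to `subprocess.run(cmd, check=True)` would perform the actual mount. Keeping the region as a parameter makes it harder to forget the `-ourl` change when the Space is not in `ams3`.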

After all this, return to your browser, open [DigitalOcean
Spaces](https://cloud.digitalocean.com/spaces) and click on the Space you
created. If the file hello.txt is present, you have successfully mounted the
Space on your machine and written data to it.
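
Besides the web console, you can also check from the Droplet itself. Below is a minimal Python sketch, assuming the Space is mounted at `/mnt/` as above. `check_mount` is a hypothetical helper. `os.path.ismount` catches the case where `s3fs` failed and `/mnt/` is just an ordinary local directory; the `echo` above would then silently write to the Droplet's own disk. A write/read round trip then confirms the mount is writable.

```python
import os
import uuid
from pathlib import Path


def check_mount(mountpoint: str, require_mount: bool = True) -> bool:
    """Return True if mountpoint is mounted (unless skipped) and a probe file round-trips."""
    if require_mount and not os.path.ismount(mountpoint):
        return False
    # A unique name keeps checks from several machines from colliding.
    probe = Path(mountpoint) / f"probe-{uuid.uuid4().hex}.txt"
    payload = "Hello cruel world\n"
    try:
        probe.write_text(payload)
        return probe.read_text() == payload
    finally:
        probe.unlink(missing_ok=True)  # clean up even if the comparison fails
```

Call it as `check_mount("/mnt/")`. With `use_cache` enabled, the read-back may be served from the local cache, so the web console remains the definitive proof that the object reached Spaces.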

I chose the same region for my Droplet and my Space, but you don't have to; they
can be in different regions. What that does to performance, I don't know.

Additional information on FUSE:

* [GitHub project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse)
* [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)

## Will the performance degrade over time and over different sizes of objects?

For this task I didn't want to just read and write text files or upload images.
I actually wanted to find out whether using something like SQLite is viable in
this case.

### Measurement experiment 1: File copy

```bash
# first we create some dummy files of different sizes
dd if=/dev/zero of=10KB.dat bs=1024 count=10     # 10KB
dd if=/dev/zero of=100KB.dat bs=1024 count=100   # 100KB
dd if=/dev/zero of=1MB.dat bs=1024 count=1024    # 1MB
dd if=/dev/zero of=10MB.dat bs=1024 count=10240  # 10MB

# make the bash time keyword report only real (wall-clock) seconds
TIMEFORMAT=%R

# now let's test a single copy
(time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt

# and now we automate
# each loop performs the same copy 100 times
# results go into separate files based on object size
n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done
```
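
If you prefer not to rely on bash's `time` keyword and `|&`, the same measurement can be sketched in Python. This is an illustration under assumptions, not the script behind the published numbers. `benchmark_copies` and `append_results` are hypothetical helpers. `time.perf_counter` measures wall-clock time, which is roughly what `TIMEFORMAT=%R` reports.

```python
import shutil
import time
from pathlib import Path


def benchmark_copies(src: Path, dest_dir: Path, n: int = 100) -> list[float]:
    """Copy src into dest_dir n times (NAME.1.dat, NAME.2.dat, ...) and time each copy."""
    timings = []
    for i in range(1, n + 1):
        target = dest_dir / f"{src.stem}.{i}{src.suffix}"
        start = time.perf_counter()
        shutil.copyfile(src, target)  # includes opening, writing and closing the target
        timings.append(time.perf_counter() - start)
    return timings


def append_results(timings: list[float], results_file: Path) -> None:
    """Append one timing per line, like `tee -a SIZE.results.txt` in the bash loops."""
    with results_file.open("a") as fp:
        for t in timings:
            fp.write(f"{t:.3f}\n")
```

For example, `append_results(benchmark_copies(Path("10KB.dat"), Path("/mnt"), n=100), Path("10KB.results.txt"))` mirrors the first loop above. Because s3fs appears to upload a file when it is closed, each timing should cover the upload as well as the local write.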

Copies of 100MB files did not succeed and ended with an error (`cp: failed to
close '/mnt/100MB.1.dat': Operation not permitted`).

As I suspected, within the tested range (10KB to 10MB) object size is not that
important. Sadly, I don't have the time to test performance over longer periods.
If any of you do, please send me your data; I would be interested in seeing the
results.

**Here are the plotted results**

You can download the [raw results here](/assets/do-fuse/copy-benchmarks.tsv).
Measurements are in seconds.

<script src="//cdn.plot.ly/plotly-latest.min.js"></script>
<div id="copy-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/copy-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      var copy = Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } });
    } else { }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

As far as these tests show, performance is quite stable and predictable, which
is fantastic. But this is a small test spanning only a couple of hours, so you
should not trust the results completely.
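
To put a number on "stable", one option is to compute the mean, standard deviation and coefficient of variation (standard deviation divided by mean) for each column of the raw TSV. Below is a minimal sketch. It assumes the layout the chart script above expects: a header row with one column per object size, tab-separated values, and possibly a comma as the decimal separator. `summarize_tsv` is a hypothetical helper.

```python
import csv
import io
import statistics


def summarize_tsv(text: str) -> dict[str, dict[str, float]]:
    """Return mean, sample stdev and coefficient of variation for each TSV column."""
    rows = list(csv.reader(io.StringIO(text.strip()), delimiter="\t"))
    header, data = rows[0], rows[1:]
    summary = {}
    for idx, name in enumerate(header):
        # Like the chart script, accept "0,12" as 0.12.
        values = [float(row[idx].replace(",", ".")) for row in data]
        mean = statistics.mean(values)
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        cv = stdev / mean if mean else float("nan")
        summary[name] = {"mean": mean, "stdev": stdev, "cv": cv}
    return summary
```

Usage: `summarize_tsv(open("copy-benchmarks.tsv").read())`. A small `cv` means the copy times cluster tightly around their mean, which is what "predictable" amounts to here.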

### Measurement experiment 2: SQLite performance

I was unable to use the database file directly from the mounted drive, so this
is a no-go, as I suspected. Instead, I ran the code below on a local disk just
to get some benchmarks. Each iteration inserts 1000 records and times
DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL and COMMIT; I repeated this 1000
times (NUM_RECORDS=1000, REPEAT=1000) to generate statistics. As you can see,
SQLite's performance is quite amazing. You could then potentially just copy the
file to the mounted drive and be done with it.

```python
import sqlite3
import sys
import time

if len(sys.argv) < 4:
    print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
    sys.exit(1)

db_path = sys.argv[1]
num_records = int(sys.argv[2])
repeat = int(sys.argv[3])


def data_iter(x):
    # yields x rows such as ("m3", "f9")
    for i in range(x):
        yield "m" + str(i), "f" + str(i * i)


# one column per timed step; every iteration appends one row
header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
with open("sqlite-benchmarks.tsv", "w") as fp:
    fp.write(header_line)

start_time = time.perf_counter()
conn = sqlite3.connect(db_path)
c = conn.cursor()
end_time = time.perf_counter()
CONNECT = end_time - start_time
print("CONNECT: %g seconds" % CONNECT)

# trade durability for speed: WAL journal, temporary tables in memory, no fsync
start_time = time.perf_counter()
c.execute("PRAGMA journal_mode=WAL")
c.execute("PRAGMA temp_store=MEMORY")
c.execute("PRAGMA synchronous=OFF")
end_time = time.perf_counter()
PRAGMA = end_time - start_time
print("PRAGMA: %g seconds" % PRAGMA)

for i in range(repeat):
    print("#%i" % i)

    start_time = time.perf_counter()
    c.execute("drop table if exists test")
    end_time = time.perf_counter()
    DROPTABLE = end_time - start_time
    print("DROPTABLE: %g seconds" % DROPTABLE)

    start_time = time.perf_counter()
    c.execute("create table if not exists test(a, b)")
    end_time = time.perf_counter()
    CREATETABLE = end_time - start_time
    print("CREATETABLE: %g seconds" % CREATETABLE)

    start_time = time.perf_counter()
    c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(num_records))
    end_time = time.perf_counter()
    INSERTMANY = end_time - start_time
    print("INSERTMANY: %g seconds" % INSERTMANY)

    start_time = time.perf_counter()
    c.execute("select count(*) from test")
    res = c.fetchall()
    end_time = time.perf_counter()
    FETCHALL = end_time - start_time
    print("FETCHALL: %g seconds" % FETCHALL)

    start_time = time.perf_counter()
    conn.commit()
    end_time = time.perf_counter()
    COMMIT = end_time - start_time
    print("COMMIT: %g seconds" % COMMIT)

    log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
    with open("sqlite-benchmarks.tsv", "a") as fp:
        fp.write(log_line)

start_time = time.perf_counter()
conn.close()
end_time = time.perf_counter()
CLOSE = end_time - start_time
print("CLOSE: %g seconds" % CLOSE)
```

You can download the [raw results here](/assets/do-fuse/sqlite-benchmarks.tsv).
Again, these results were measured on local block storage and do not represent
the capabilities of object storage. With my current approach and the current
state of the test code, such measurements cannot be done on Spaces; I would need
to make the Python code much more robust and handle locking, etc.

<div id="sqlite-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/sqlite-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      var sqlite = Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } });
    } else { }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>
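
Since the database cannot be used directly from the mount, copying the file to the mounted drive is the practical route. Below is a minimal Python sketch, assuming the Space is mounted as in the setup section. `publish_snapshot` is a hypothetical helper. It uses `sqlite3.Connection.backup` rather than a plain file copy. With `journal_mode=WAL` (as in the benchmark), recently committed data can still sit in the separate `-wal` file, and a raw copy of the main file would miss it. The snapshot is then written under a temporary name on the mount and renamed into place. On s3fs that rename is not a cheap local operation, because S3 has no native rename.

```python
import os
import shutil
import sqlite3
import tempfile
from pathlib import Path


def publish_snapshot(db_path: str, dest: Path) -> None:
    """Copy a consistent snapshot of the SQLite database at db_path to dest."""
    dest = Path(dest)
    with tempfile.TemporaryDirectory() as tmp:
        local_copy = Path(tmp) / dest.name
        src = sqlite3.connect(db_path)
        snap = sqlite3.connect(local_copy)
        try:
            # Online backup API: includes committed data still in the -wal file.
            src.backup(snap)
            # Make the snapshot a self-contained file that needs no -wal/-shm.
            snap.execute("PRAGMA journal_mode=DELETE")
        finally:
            snap.close()
            src.close()
        # Upload under a temporary name, then rename into place, so nobody
        # opens a half-written file under the final name.
        partial = dest.with_name(dest.name + ".partial")
        shutil.copyfile(local_copy, partial)
        os.replace(partial, dest)
```

For example, `publish_snapshot("local.db", Path("/mnt/snapshot.db"))` (both paths are placeholders) could run from cron on the single machine that writes.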

## Can storage be mounted on multiple machines at the same time and be writable?

This one didn't take long to test, and the answer is **YES**. I mounted the
Space on two machines and measured the same performance on both. However,
because a file is downloaded before it is written and uploaded again once the
write completes, problems could arise if another process tries to access the
same file at the same time.
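
The risk with two writers is the classic lost update. Below is a toy model, assuming (as described above) that each machine downloads the whole object, changes it locally and uploads the whole object again. A dictionary stands in for the Space, and the steps of two "machines" are interleaved by hand. It is a simplified illustration, not a test against real s3fs.

```python
def download(store: dict, key: str) -> str:
    """Fetch the whole object (simplified model of s3fs reading a file)."""
    return store[key]


def upload(store: dict, key: str, body: str) -> None:
    """Replace the whole object; whichever upload happens last wins."""
    store[key] = body


def interleaved_increments(store: dict, key: str) -> None:
    """Machines A and B each add 1 to a counter file, with their steps interleaved."""
    a = download(store, key)             # A reads "0"
    b = download(store, key)             # B reads "0" before A has uploaded
    upload(store, key, str(int(a) + 1))  # A uploads "1"
    upload(store, key, str(int(b) + 1))  # B uploads "1" too, overwriting A's update


space = {"counter.txt": "0"}  # stands in for the Space
interleaved_increments(space, "counter.txt")
print(space["counter.txt"])  # 1, although two increments were made
```

Two approaches avoid this particular race: writing to separate objects per machine (the benchmark loops above already used numbered file names), or letting only one machine write.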

## Observations and conclusion

Using Spaces this way makes it easier to access and manage files. Beyond that,
though, you would need to write additional code to make it play nicely with your
applications.

Nevertheless, this was extremely simple to set up and use, and Spaces is another
excellent product in the DigitalOcean line. I found this exercise very valuable
and am thinking about implementing some sort of mechanism for SQLite, so that
data can be stored on Spaces and accessed by many VMs. For a project where data
doesn't need to be accessible in real time and can be a couple of minutes old,
this would be very interesting. If any of you find this proposal interesting,
please write in the comment box below or shoot me an email, and I will keep you
posted.
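
For the read side of that idea, each VM could keep a local copy of a database snapshot published to the Space. It would only refresh that copy when it is older than an acceptable staleness limit. Below is a minimal sketch under those assumptions. `refresh_if_stale`, `count_rows` and the five-minute `max_age` default are hypothetical. The staleness check uses the local copy's modification time, and `remote` would be a path on the s3fs mount.

```python
import shutil
import sqlite3
import time
from contextlib import closing
from pathlib import Path


def refresh_if_stale(remote: Path, local: Path, max_age: float = 300.0) -> bool:
    """Copy remote over local when local is missing or older than max_age seconds.

    Returns True if a fresh copy was fetched.
    """
    if local.exists() and time.time() - local.stat().st_mtime < max_age:
        return False
    tmp = local.with_name(local.name + ".tmp")
    shutil.copyfile(remote, tmp)  # copyfile does not keep the source mtime
    tmp.replace(local)            # atomic rename on a local filesystem
    return True


def count_rows(remote: Path, local: Path) -> int:
    """Answer a read-only query from a local copy that may be minutes old."""
    refresh_if_stale(remote, local)
    with closing(sqlite3.connect(local)) as conn:
        return conn.execute("select count(*) from test").fetchone()[0]
```

With the default `max_age`, each VM reads the snapshot from Spaces at most once per five minutes. Every query runs against local disk, which is where the benchmark above showed SQLite performing well.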
