---
title: Using DigitalOcean Spaces Object Storage with FUSE
url: using-digitalocean-spaces-object-storage-with-fuse.html
date: 2018-01-16T12:00:00+02:00
draft: false
---

A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced
a new product called
[Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), an
object storage service very similar to Amazon's S3. This really piqued my
interest, because it was something I was missing, and going outside
DigitalOcean for such functionality held no appeal for me. In keeping with
their previous pricing, Spaces is also very cheap, and the pricing page is a
no-brainer compared to AWS or GCE. [Prices are clearly and precisely
outlined](https://www.digitalocean.com/pricing/). You have to love them for
that :)

## Initial requirements

* Is it possible to use them as a mounted drive with FUSE? (tl;dr YES)
* Will the performance degrade over time and over different sizes of objects?
  (tl;dr NO&YES)
* Can storage be mounted on multiple machines at the same time and be writable?
  (tl;dr YES)

> Let me be clear: the scripts I use here were made just for benchmarking and
> are not intended for real-life use. That said, I am looking into using these
> approaches with a caching service in front and then dumping everything as an
> object to storage. That could be an interesting post of its own. But if you
> need real-time data without eventual consistency, please take these scripts
> for what they are: not usable in such situations.

## Is it possible to use them as a mounted drive with FUSE?

Well, actually they can be used in such a manner. Because Spaces is similar to
[AWS S3](https://aws.amazon.com/s3/), many tools are available and you can find
many articles and [Stackoverflow items](https://stackoverflow.com/search?q=s3+fuse).

To make this work you will need a DigitalOcean account. If you don't have one
you will not be able to test this code. But if you do have an account, go and
[create a new
Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent).
If you click on this link you will already have Debian 9 preselected with the
smallest VM option.

* Please be sure to add your SSH key, because we will log in to this machine
  remotely.
* If you change your region, remember which one you chose because we will need
  this information when we mount the Space on our machine.

Instructions on how to set up and use SSH keys are available in the article
[How To Use SSH Keys with DigitalOcean
Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).

![DigitalOcean Droplets](/assets/do-fuse/fuse-droplets.png)

After we have created the Droplet it's time to create a new Space. This is done
by clicking the [Create](https://cloud.digitalocean.com/spaces/new) button (top
right corner) and selecting Spaces. Choose a pronounceable `Unique name`
because we will use it in the examples below. You can choose either Private or
Public; it doesn't matter in our case, and you can always change it later.

Once you have created the new Space, we should [generate an Access
key](https://cloud.digitalocean.com/settings/api/tokens). This link will take
you to the page where you can generate the key. After you create a new one,
save the provided Key and Secret because the Secret will not be shown again.

![DigitalOcean Spaces](/assets/do-fuse/fuse-spaces.png)

Now that we have a new Space and an Access key, we can SSH into our machine.

```bash
# replace IP with the IP address of your newly created Droplet
ssh root@IP

# this will install the utility for mounting storage objects via FUSE
apt install s3fs

# we now need to provide credentials (the access key we created earlier)
# replace KEY and SECRET with your own credentials but keep the colon between them
# we also need to set proper permissions on the file
echo "KEY:SECRET" > .passwd-s3fs
chmod 600 .passwd-s3fs

# now we mount the Space on our machine
# replace UNIQUE-NAME with the name you chose earlier
# if you chose a different region for your Space, adjust the -ourl option (ams3)
s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp

# now we try to create a file
# once mounted it may take a couple of seconds to retrieve data
echo "Hello cruel world" > /mnt/hello.txt
```

After all this you can return to your browser, go to [DigitalOcean
Spaces](https://cloud.digitalocean.com/spaces) and click on your created
Space. If the file `hello.txt` is present, you have successfully mounted the
Space on your machine and written data to it.
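
This write-then-read check can also be scripted. A minimal sketch in Python
(the `roundtrip_ok` helper is my own invention, not part of s3fs; a temporary
directory stands in for `/mnt` so the snippet runs anywhere, but you can point
`mount_point` at the real mount):

```python
import pathlib
import tempfile

def roundtrip_ok(mount_point, name="hello.txt", payload="Hello cruel world\n"):
    """Write a file to the mount point and read it back to verify the mount."""
    target = pathlib.Path(mount_point) / name
    target.write_text(payload)
    return target.read_text() == payload

# a temporary directory stands in for /mnt in this demo
with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip_ok(tmp))  # True
```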

I chose the same region for my Droplet and my Space, but you don't have to;
they can be in different regions. What this actually does to performance I
don't know.

Additional information on FUSE:

* [Github project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse)
* [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)

## Will the performance degrade over time and over different sizes of objects?

For this task I didn't want to just read and write text files or upload images.
I actually wanted to figure out whether using something like SQLite is viable
in this case.

### Measurement experiment 1: File copy

```bash
# first we create some dummy files of different sizes
dd if=/dev/zero of=10KB.dat bs=1024 count=10 #10KB
dd if=/dev/zero of=100KB.dat bs=1024 count=100 #100KB
dd if=/dev/zero of=1MB.dat bs=1024 count=1024 #1MB
dd if=/dev/zero of=10MB.dat bs=1024 count=10240 #10MB

# now we set the time command to only report real time
TIMEFORMAT=%R

# now let's test it
(time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt

# and now we automate
# this will perform the same operation 100 times
# this will output results into separate files based on object size
n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done
```

Files of size 100MB were not transferred successfully and ended with an error
(`cp: failed to close '/mnt/100MB.1.dat': Operation not permitted`).

As I suspected, object size is not really that important. Sadly I don't have
the time to test performance over longer periods, but if any of you do, please
send me your data; I would be interested in seeing the results.

**Here are plotted results**

You can download the [raw results here](/assets/do-fuse/copy-benchmarks.tsv).
Measurements are in seconds.

<script src="//cdn.plot.ly/plotly-latest.min.js"></script>
<div id="copy-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/copy-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } });
    }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

As far as these tests show, performance is quite stable and predictable, which
is fantastic. But this is a small test spanning only a couple of hours, so you
should not trust it completely.

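To put numbers behind "quite stable", the result files can be summarized with a
few lines of Python. A sketch only, assuming one `%R` value per line with a
possible comma decimal separator (the same assumption the plotting code below
makes):

```python
import statistics

def summarize(lines):
    """Summarize a column of `time` (TIMEFORMAT=%R) measurements in seconds."""
    # some locales print the decimal separator as a comma
    values = [float(line.strip().replace(",", ".")) for line in lines if line.strip()]
    return {
        "runs": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        "max": max(values),
    }

# e.g.: with open("10KB.results.txt") as fp: print(summarize(fp))
print(summarize(["0,110", "0,130", "0,120"]))
```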
### Measurement experiment 2: SQLite performance

I was unable to use a database file directly from the mounted drive, so this is
a no-go, as I suspected. Instead I executed the code below on a local disk just
to get some benchmarks: 1000 iterations, each inserting 1000 records and timing
DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL and COMMIT to generate statistics.
As you can see, the performance of SQLite is quite amazing. You could then
potentially just copy the file to the mounted drive and be done with it.

```python
import time
import sqlite3
import sys

if len(sys.argv) < 4:
    print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
    sys.exit(1)

def data_iter(x):
    for i in range(x):
        yield "m" + str(i), "f" + str(i*i)

header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
with open("sqlite-benchmarks.tsv", "w") as fp:
    fp.write(header_line)

start_time = time.time()
conn = sqlite3.connect(sys.argv[1])
c = conn.cursor()
end_time = time.time()
result_time = CONNECT = end_time - start_time
print("CONNECT: %g seconds" % (result_time))

start_time = time.time()
c.execute("PRAGMA journal_mode=WAL")
c.execute("PRAGMA temp_store=MEMORY")
c.execute("PRAGMA synchronous=OFF")
end_time = time.time()
result_time = PRAGMA = end_time - start_time
print("PRAGMA: %g seconds" % (result_time))

for i in range(int(sys.argv[3])):
    print("#%i" % (i))

    start_time = time.time()
    c.execute("drop table if exists test")
    end_time = time.time()
    result_time = DROPTABLE = end_time - start_time
    print("DROPTABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("create table if not exists test(a,b)")
    end_time = time.time()
    result_time = CREATETABLE = end_time - start_time
    print("CREATETABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(int(sys.argv[2])))
    end_time = time.time()
    result_time = INSERTMANY = end_time - start_time
    print("INSERTMANY: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("select count(*) from test")
    res = c.fetchall()
    end_time = time.time()
    result_time = FETCHALL = end_time - start_time
    print("FETCHALL: %g seconds" % (result_time))

    start_time = time.time()
    conn.commit()
    end_time = time.time()
    result_time = COMMIT = end_time - start_time
    print("COMMIT: %g seconds" % (result_time))

    print()
    log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
    with open("sqlite-benchmarks.tsv", "a") as fp:
        fp.write(log_line)

start_time = time.time()
conn.close()
end_time = time.time()
result_time = CLOSE = end_time - start_time
print("CLOSE: %g seconds" % (result_time))
```

You can download the [raw results here](/assets/do-fuse/sqlite-benchmarks.tsv).
Again, these results were produced on local block storage and do not represent
the capabilities of object storage. With my current approach and the state of
the test code, that measurement cannot be done; I would need to make the Python
code much more robust and check locking etc.
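
The "copy the file to the mounted drive" idea can be sketched with Python's
SQLite backup API. This is a sketch under my own assumptions, not a tested
recipe for s3fs: the `snapshot_to` helper and paths are hypothetical, a
temporary directory stands in for the mount, and the write-then-rename step
assumes the FUSE layer honours rename semantics:

```python
import os
import sqlite3
import tempfile

def snapshot_to(conn, mount_dir, name="data.sqlite"):
    """Snapshot a live SQLite database into mount_dir as a single object.

    Writes to a temporary name first, then renames, so readers never see a
    half-uploaded file.
    """
    tmp_path = os.path.join(mount_dir, name + ".tmp")
    dst = sqlite3.connect(tmp_path)
    conn.backup(dst)  # consistent copy of the committed state
    dst.close()
    os.replace(tmp_path, os.path.join(mount_dir, name))

# demo against a temporary directory standing in for /mnt
conn = sqlite3.connect(":memory:")
conn.execute("create table t(a)")
conn.execute("insert into t values (42)")
conn.commit()
with tempfile.TemporaryDirectory() as mnt:
    snapshot_to(conn, mnt)
    check = sqlite3.connect(os.path.join(mnt, "data.sqlite"))
    print(check.execute("select a from t").fetchall())  # [(42,)]
```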

<div id="sqlite-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/sqlite-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } });
    }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

## Can storage be mounted on multiple machines at the same time and be writable?

Well, this one didn't take long to test, and the answer is **YES**. I mounted
the Space on both machines and measured the same performance on both. But
because a file is downloaded before writing and uploaded when the write
completes, there could be problems if another process tries to access the same
file at the same time.
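
One crude way to reduce that risk is a lock file created atomically with
`O_EXCL`. This is a sketch only, with helpers I made up for illustration:
`O_EXCL` lock files are merely advisory, and on an eventually consistent object
store behind FUSE they are not a reliable primitive, so treat this as
best-effort (the demo uses a temporary directory standing in for the mount):

```python
import os
import tempfile

def try_lock(path):
    """Best-effort advisory lock: atomically create a lock file, fail if it exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return path

def unlock(path):
    os.remove(path)

# demo against a temporary directory standing in for the mount point
with tempfile.TemporaryDirectory() as mnt:
    lock = os.path.join(mnt, "hello.txt.lock")
    print(try_lock(lock) is not None)  # True: lock acquired
    print(try_lock(lock) is not None)  # False: second attempt fails
    unlock(lock)
```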

## Observations and conclusion

Using Spaces in this way makes it easier to access and manage files. But beyond
that, you would need to write additional code to make it play nice with your
applications.

Nevertheless, this was extremely simple to set up and use, and it is just
another excellent product in the DigitalOcean line-up. I found this exercise
very valuable and am thinking about implementing some sort of mechanism for
SQLite, so data can be stored on Spaces and accessed by many VMs. For a project
where data doesn't need to be accessible in real time and can be a couple of
minutes stale, this would be very interesting. If any of you find this proposal
interesting, please write in the comment box below or shoot me an email and I
will keep you posted.