Diffstat (limited to 'posts')
-rw-r--r-- posts/2015-03-07-curriculum-vitae.md 72
-rw-r--r-- posts/2017-03-07-golang-profiling-simplified.md 113
-rw-r--r-- posts/2017-04-17-what-i-ve-learned-developing-ad-server.md 136
-rw-r--r-- posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md 187
-rw-r--r-- posts/2017-08-11-simple-iot-application.md 489
-rw-r--r-- posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md 263
-rw-r--r-- posts/2019-01-03-encoding-binary-data-into-dna-sequence.md 348
-rw-r--r-- posts/2019-10-14-simplifying-and-reducing-clutter.md 24
-rw-r--r-- posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md 88
-rw-r--r-- posts/2020-03-22-simple-sse-based-pubsub-server.md 398
-rw-r--r-- posts/2020-03-27-create-placeholder-images-with-sharp.md 85
-rw-r--r-- posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md 78
-rw-r--r-- posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md 43
-rwxr-xr-x posts/2020-05-05-remote-work.md 39
-rw-r--r-- posts/2020-08-15-systemd-disable-wake-onmouse.md 49
-rw-r--r-- posts/2020-09-06-esp-and-micropython.md 205
-rw-r--r-- posts/2020-09-08-bind-warning-on-login.md 42
-rw-r--r-- posts/2020-09-09-digitalocean-sync.md 66
18 files changed, 2725 insertions, 0 deletions
diff --git a/posts/2015-03-07-curriculum-vitae.md b/posts/2015-03-07-curriculum-vitae.md
new file mode 100644
index 0000000..bb082e7
--- /dev/null
+++ b/posts/2015-03-07-curriculum-vitae.md
@@ -0,0 +1,72 @@
1---
2Title: Curriculum Vitae
3Description: Curriculum Vitae
4Slug: curriculum-vitae
5Listing: false
6Created: ""
7Tags: []
8---
9
10**Mitja Felicijan**
11
12*[m@mitjafelicijan.com](mailto:m@mitjafelicijan.com?subject=Website+CV+Contact)*
13
14*Slovenia, EU*
15
16## Technical experience
17
18- **Key languages:** Golang, Python, C, Bash.
19- **Platforms:** GNU/Linux, macOS.
20- **Interests:** Zigbee, KNX, Modbus, Machine to Machine, Embedded systems, Operating systems, Distributed systems, IOT, RDBMS, Algorithms, Database engine design, SQL, NoSQL, NewSQL, Big data analytics, Machine learning, Prediction algorithms, Realtime analytics, Systems automation, Natural language processing, Bioinformatics.
21
22## Major projects
23
24- SMS marketing system (2007)
25- Yacht management software (2008)
26- Smart Home Gateway (2009)
27- Moxa UPort 1130 USB to RS485 Universal Linux driver (2009)
28- Remote management of electricity meter (2009)
29- Remote management of blood pressure monitor (2010)
30- Infomat automation system (2010)
31- GPS Tourist - GIS Software (2011)
32- Minimal GNU/Linux distribution for embedded platforms (2011)
33- Digital Jukebox system (2012)
34- NanoCloudLogger - Machine to Machine (2012)
35- Street Lighting System (2012)
36- Smart cabins with hardware sensor management (2013)
37- Contextual advertising server (2015)
38- Network accessible database engine for caching and in-memory storage (2016)
39- Tick database engine specifically designed for storing and processing large amounts of sensor data with high write throughput (2016)
40- Wireless industrial lighting management system - hardware and software (2016)
41- Minimal configuration reverse proxy (2017)
42- Industrial IOT platform for on-premise deployment (2018)
43- Custom Platform as a service based on Docker Swarm (2018)
44- Toolkit for encoding binary data into DNA sequence (2019)
45- Minimal configuration reverse proxy with load balancing and rate limiting (2019)
46- E-ink conference room occupancy display, hardware and software solution (2019)
47
48## Employment history
49
50- Freelancer (2001 – Present)
51- Software developer at Mobinia (2005 – 2007)
52- CTO at Milk (2007 – 2009)
53- Co-Founder of UTS (2009 – 2014)
54- Senior Software Engineer at TSmedia (2015 – 2017)
55- Senior Software Engineer at Renderspace (2017 – 2019)
56- IT Consultant (2017 – Present)
57
58## Awards
59
60- Regional Award for Innovation by Chamber of Commerce and Industry of Slovenia for project Intelligent system management and regulation of Street Lighting, 2010
61- National Award for Innovation by Chamber of Commerce and Industry of Slovenia for project Intelligent system management and regulation of Street Lighting, 2010
62
63## Key responsibilities
64
65- Embedded platform development.
66- Hardware design and driver development.
67- Designing, developing and testing systems.
68- Implementation of the systems.
69- Writing and maintaining user and technical documents.
70- Development and maintenance of the project.
71- Code revision, testing and output.
72- Working on enhancements suggested by customers and fixing reported bugs.
diff --git a/posts/2017-03-07-golang-profiling-simplified.md b/posts/2017-03-07-golang-profiling-simplified.md
new file mode 100644
index 0000000..8059aec
--- /dev/null
+++ b/posts/2017-03-07-golang-profiling-simplified.md
@@ -0,0 +1,113 @@
1---
2Title: Golang profiling simplified
3Description: Golang profiling simplified
4Slug: golang-profiling-simplified
5Listing: true
6Created: 2017, March 7
7Tags: []
8---
9
10Many posts have been written about profiling in Golang, but I haven’t found a proper tutorial on the subject. Almost all of them are missing some important piece of information, which gets pretty frustrating when you have a deadline and can’t find a simple, distilled solution.
11
12Nevertheless, after searching and experimenting I have found a solution that works for me and should probably work for you too.
13
14## Where are my pprof files?
15
16By default, pprof files are generated in the /tmp/ folder. You can override the folder where these files are generated programmatically in your Golang code, as the examples below show.
17
18## Why is my CPU profile empty?
19
20I have found that the CPU profile is sometimes empty because the program did not run long enough. In my experience, programs that finish too quickly don’t produce a useful pprof file: the file is generated but contains only 4KB of information.
21
22## Profiling
23
24As you can see from the examples, we execute a dummy_benchmark function to make sure the program runs long enough. Memory profiling can be done without such a “complex” function, but CPU profiling needs it.
25
26The memory and CPU profiling examples are almost the same; only the parameters passed to profile.Start in the main function differ. Setting profile.ProfilePath(".") tells the profiler to store pprof files in the current working directory, which is the program's own folder when you run it from there.
27
28### Memory profiling
29
30```go
31package main
32
33import (
34 "fmt"
35 "time"
36 "github.com/pkg/profile"
37)
38
39func dummy_benchmark() {
40
41 fmt.Println("first set ...")
42 for i := 0; i < 918231333; i++ {
43 i *= 2
44 i /= 2
45 }
46
47 <-time.After(time.Second*3)
48
49 fmt.Println("second set ...")
50 for i := 0; i < 9182312232; i++ {
51 i *= 2
52 i /= 2
53 }
54}
55
56func main() {
57 defer profile.Start(profile.MemProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
58 dummy_benchmark()
59}
60```
61
62### CPU profiling
63
64```go
65package main
66
67import (
68 "fmt"
69 "time"
70 "github.com/pkg/profile"
71)
72
73func dummy_benchmark() {
74
75 fmt.Println("first set ...")
76 for i := 0; i < 918231333; i++ {
77 i *= 2
78 i /= 2
79 }
80
81 <-time.After(time.Second*3)
82
83 fmt.Println("second set ...")
84 for i := 0; i < 9182312232; i++ {
85 i *= 2
86 i /= 2
87 }
88}
89
90func main() {
91 defer profile.Start(profile.CPUProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
92 dummy_benchmark()
93}
94```
95
96### Generating profiling reports
97
98```bash
99# memory profiling
100go build mem.go
101./mem
102go tool pprof -pdf ./mem mem.pprof > mem.pdf
103
104# cpu profiling
105go build cpu.go
106./cpu
107go tool pprof -pdf ./cpu cpu.pprof > cpu.pdf
108```
109
110This generates a PDF document with the visualized profile. The -pdf output requires Graphviz to be installed.
111
112- [Memory PDF profile example](/assets/go-profiling/golang-profiling-mem.pdf)
113- [CPU PDF profile example](/assets/go-profiling/golang-profiling-cpu.pdf)
diff --git a/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
new file mode 100644
index 0000000..90fe238
--- /dev/null
+++ b/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
@@ -0,0 +1,136 @@
1---
2Title: What I've learned developing ad server
3Description: Lessons I learned developing contextual ad server
4Slug: what-i-ve-learned-developing-ad-server
5Listing: true
6Created: 2017, April 17
7Tags: []
8---
9
10For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. The project grew from serving thousands of ads per day to millions.
11
12The system is made up of a couple of core components:
13
14- API for serving ads,
15- Utils - cronjobs and queue management tools,
16- Dashboard UI.
17
18The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and better search performance. This also gave us access to many other useful features of [Elasticsearch](https://www.elastic.co/). You should check it out if you do any search-related work.
19
20Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page, but it also makes the JavaScript client library quite complex.
21
22So now that you know the basics of the product, let's get into the lessons we learned.
23
24## Aggregate everything
25
26After the beta version was released, everything (impressions, clicks, etc.) was written to the database in nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/) and the database quickly grew way above 200GB of disk space. That was problematic: statistics took a disturbingly long time to aggregate, and indexes on the stats table were no help once we reached 500 million datapoints.
27
28> There is marketing product information and there is real-life experience. And they tend to be quite the opposite.
29
30This is why everything is now aggregated on a daily basis and fed to Elastic in the form of a daily summary. With this in place we can track many more dimensions, such as zone, channel and platform information, and use that information to adapt how often ads appear in specific places more precisely.
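
A minimal sketch of that daily roll-up, written for Python 3. The event tuples, the dimension names and the `aggregate_daily` helper are my assumptions for illustration, not the code we run in production.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate_daily(events):
    """Collapse raw events into per-day counters keyed by (day, kind, zone)."""
    summary = Counter()
    for ts, kind, zone in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        summary[(day, kind, zone)] += 1
    return summary


# hypothetical raw events: (unix timestamp, event type, zone)
events = [
    (1492387200, "impression", "sidebar"),
    (1492387260, "impression", "sidebar"),
    (1492390800, "click", "sidebar"),
    (1492473600, "impression", "footer"),
]

daily = aggregate_daily(events)
for (day, kind, zone), count in sorted(daily.items()):
    print(day, kind, zone, count)
```

Each key in `daily` is one row of the summary, so adding a dimension such as channel or platform only means widening the tuple.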
31
32We have also adopted [Redis](https://redis.io/) as a first-class citizen in our stack. Because Redis also persists data to the local disk, we have some sort of backup if the server suffers a failure.
33
34All real-time statistics for ad serving and redirecting are kept as counters in a Redis instance, then extracted daily and pushed to Elastic.
35
36## Measure everything
37
38The thing about software is that we really don't know how well it performs under load until that load arrives. When testing locally everything is fine, but in production things tend to fall apart.
39
40As a solution we measure everything we can: function execution time (by wrapping functions with timers), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information, and we store all of it for later analysis.
41
42**Example of function execution time**
43
44```json
45{
46 "get_final_filtered_ads": {
47 "counter": 1931250,
48 "avg": 0.0066143431,
49 "elapsed": 12773.9500310003
50 },
51 "store_keywords_statistics": {
52 "counter": 1931011,
53 "avg": 0.0004605267,
54 "elapsed": 889.2821669996
55 },
56 "match_by_context": {
57 "counter": 1931011,
58 "avg": 0.0055960716,
59 "elapsed": 10806.0758889999
60 },
61 "match_by_high_performance": {
62 "counter": 262,
63 "avg": 0.0152770229,
64 "elapsed": 4.00258
65 },
66 "store_impression_stats": {
67 "counter": 1931250,
68 "avg": 0.0006189991,
69 "elapsed": 1195.4419869999
70 }
71}
72```
73
74We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and then visualizing with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more detailed look into code execution.
75
76## Cache control is your friend
77
78Because we use a JavaScript library for rendering ads, we rely on this script extensively and need to be able to change its behavior quickly.
79
80In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a ?ver=xxx parameter. This makes rapid deployment and testing very difficult and time consuming, and there is a limit to how much you can test locally.
81
82We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of websites are developed on the ASP.NET platform, which has some problems with Tag Manager. With the solution below we are certain that we are serving the latest version of the script.
83
84It only takes one mistake for users to end up with a cached script, and if it was cached for 1 year you probably know where the problem is.
85
86```nginx
87# nginx ➜ /etc/nginx/sites-available/default
88location /static/ {
89 alias /path-to-static-content/;
90 autoindex off;
91 charset utf-8;
92 gzip on;
93 gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
94 location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
95 expires 1y;
96 add_header Pragma public;
97 add_header Cache-Control "public";
98 }
99 location ~* \.(css|js|txt)$ {
100 expires 3600s;
101 add_header Pragma public;
102 add_header Cache-Control "public, must-revalidate";
103 }
104}
105```
106
107Also be careful when redirecting to a URL in your Python code. We noticed that if we didn't set the cache control and expires headers precisely in the response, the request never reached the server (the redirect was answered from cache) and we couldn't measure clicks. So when redirecting, do as follows and there will be no problems.
108
109```python
110# python ➜ bottlepy web micro-framework
111response = bottle.HTTPResponse(status=302)
112response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
113response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
114response.set_header("Location", url)
115return response
116```
117
118> Cache control in browsers is quite aggressive and you need to be precise to avoid future problems. We learned that lesson the hard way.
119
120## Learn NGINX
121
122When deciding on a web server we went with Nginx as a reverse proxy for our applications. We adopted a micro-service oriented architecture early in the project so that we can easily add servers to our cluster when we scale. Nginx was crucial for load balancing and static content delivery.
123
124At first our config file was quite simple and later grew larger. After patching and adding new settings I sat down and learned more about the guts of Nginx. This proved to be very useful and we were able to squeeze much more out of our setup. So I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/). This saved us a lot of headache. Googling for solutions only goes so far.
125
126## Use Redis/Memcached
127
128As explained above, we use caching for basically everything. It is the cornerstone of our services. At first we were very careful about how much we stored in [Redis](https://redis.io/), but we later found that the memory footprint is very low even when storing large amounts of data.
129
130So we gradually increased our usage, up to caching whole HTML outputs of the dashboard. This improved our performance by an order of magnitude, and the native TTL support goes hand in hand with our needs.
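
The pattern is simple enough to sketch. To keep the example self-contained I use a tiny in-memory stand-in instead of a real Redis connection; the `TTLCache` class, `render_dashboard` and the key format are hypothetical names for illustration only.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def set(self, key, value, ttl):
        # remember the value together with the moment it expires
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value


def render_dashboard(cache, user_id, render, ttl=60):
    """Return cached HTML when present, otherwise render and cache it."""
    key = "dashboard:{}".format(user_id)
    html = cache.get(key)
    if html is None:
        html = render(user_id)
        cache.set(key, html, ttl)
    return html


cache = TTLCache()
print(render_dashboard(cache, 7, lambda uid: "<h1>user {}</h1>".format(uid)))
```

With a real Redis server the store handles expiry itself, so the application code only needs the get-or-render part.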
131
132We chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) because of how Redis scales out of the box, although all of this can be achieved with Memcached as well.
133
134## Conclusion
135
136There are a lot more details that could have been written, and every single topic here deserves its own post, but you probably got the idea about the problems we faced.
diff --git a/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md b/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
new file mode 100644
index 0000000..af2c65a
--- /dev/null
+++ b/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
@@ -0,0 +1,187 @@
1---
2Title: Profiling Python web applications with visual tools
3Description: Missing link when debugging and profiling python web application
4Slug: profiling-python-web-applications-with-visual-tools
5Listing: true
6Created: 2017, April 21
7Tags: []
8---
9
10I have been profiling my software with KCachegrind for a long time now, and I missed this option when developing APIs or other web services. I always knew it was possible but never really took the time to dive into it.
11
12Before we begin there are some requirements. We will need to:
13
14- implement [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) into our web app,
15- convert output to [callgrind](http://valgrind.org/docs/manual/cl-manual.html) format with [pyprof2calltree](https://pypi.python.org/pypi/pyprof2calltree/),
16- visualize data with [KCachegrind](http://kcachegrind.sourceforge.net/html/Home.html) or [Profiling Viewer](http://www.profilingviewer.com/).
17
18
19If you are using MacOS you should check out [Profiling Viewer](http://www.profilingviewer.com/) or [MacCallGrind](http://www.maccallgrind.com/).
20
21![KCachegrind](/assets/python-profiling/kcachegrind.png)
22
23We will be dividing this post into two main categories:
24
25- writing simple web-service,
26- visualizing the profile of this web-service.
27
28## Simple web-service
29
30Let's use virtualenv so we won't pollute our base system. If you don't have virtualenv installed on your system you can install it with pip command.
31
32```bash
33# let's install virtualenv globally
34$ sudo pip install virtualenv
35
36# let's also install pyprof2calltree globally
37$ sudo pip install pyprof2calltree
38
39# now we create project
40$ mkdir demo-project
41$ cd demo-project/
42
43# now let's create folder where we will store profiles
44$ mkdir prof
45
46# now we create empty virtualenv in venv/ folder
47$ virtualenv --no-site-packages venv
48
49# we now need to activate virtualenv
50$ source venv/bin/activate
51
52# you can check if virtualenv was correctly initialized by
53# checking where your python interpreter is located
54# if command below points to your created directory and not some
55# system dir like /usr/bin/python then everything is fine
56$ which python
57
58# we can check now if all is good ➜ if ok couple of
59# lines will be displayed
60$ pip freeze
61# appdirs==1.4.3
62# packaging==16.8
63# pyparsing==2.2.0
64# six==1.10.0
65
66# now we are ready to install bottlepy ➜ web micro-framework
67$ pip install bottle
68
69# you can deactivate virtualenv later but you will then go
70# under system domain ➜ for now don't deactivate
71# $ deactivate
72```
73
74We are now ready to write a simple web service. Let's create the file app.py and paste the code below into it.
75
76```python
77# -*- coding: utf-8 -*-
78
79import bottle
80import random
81import cProfile
82
83app = bottle.Bottle()
84
85# this function is a decorator and encapsulates function
86# and performs profiling and then saves it to subfolder
87# prof/function-name.prof
88# in our example only awesome_random_number function will
89# be profiled because it has do_cprofile defined
90def do_cprofile(func):
91 def profiled_func(*args, **kwargs):
92 profile = cProfile.Profile()
93 try:
94 profile.enable()
95 result = func(*args, **kwargs)
96 profile.disable()
97 return result
98 finally:
99 profile.dump_stats("prof/" + str(func.__name__) + ".prof")
100 return profiled_func
101
102
103# we use profiling over specific function with including
104# @do_cprofile above function declaration
105@app.route("/")
106@do_cprofile
107def awesome_random_number():
108 awesome_random_number = random.randint(0, 100)
109 return "awesome random number is " + str(awesome_random_number)
110
111@app.route("/test")
112def test():
113 return "dummy test"
114
115if __name__ == '__main__':
116 bottle.run(
117 app = app,
118 host = "0.0.0.0",
119 port = 4000
120 )
121
122# run with 'python app.py'
123# open browser 'http://0.0.0.0:4000'
124```
125
126When the browser hits the awesome\_random\_number() function, a profile is created in the prof/ subfolder.
127
128## Visualize profile
129
130Now let's create callgrind format from this cProfile output.
131
132```bash
133$ cd prof/
134$ pyprof2calltree -i awesome_random_number.prof
135# this creates 'awesome_random_number.prof.log' file in the same folder
136```
137
138This file can be opened with the visualizing tools listed above. In this case we will be using Profiling Viewer under macOS. You can open the image in a new tab. As you can see from this example, it shows the hierarchy and execution order of your code.
139
140![Profiling Viewer](/assets/python-profiling/profiling-viewer.png)
141
142> Make sure you convert the cProfile output every time you want to refresh and look at possible optimizations, because cProfile rewrites the .prof file every time the browser hits the function.
143
144This is just a simple example but when you are developing real-life applications this can be very illuminating, especially to see which parts of your code are bottlenecks and need to be optimized.
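
If you only want a quick look without a GUI, the standard library's pstats module can read the same .prof files. The sketch below is written for Python 3 and profiles a stand-alone copy of the function into a temporary folder (my assumption, so that it runs without the web app); in the demo project you would point pstats at prof/awesome_random_number.prof instead.

```python
import cProfile
import os
import pstats
import random
import tempfile


def awesome_random_number():
    return "awesome random number is " + str(random.randint(0, 100))


# profile one call and dump it, similar to what the do_cprofile decorator does
profile = cProfile.Profile()
profile.enable()
awesome_random_number()
profile.disable()

path = os.path.join(tempfile.mkdtemp(), "awesome_random_number.prof")
profile.dump_stats(path)

# load the dump and print the five most expensive entries by cumulative time
stats = pstats.Stats(path)
stats.sort_stats("cumulative").print_stats(5)
```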
145
146## Update 2017-04-22
147
148Reddit user [mvt](https://www.reddit.com/user/mvt) also recommended this awesome web based profile visualizer [SnakeViz](https://jiffyclub.github.io/snakeviz/) that directly takes output from [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) module.
149
150<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="583880c1-002e-41ed-a373-020a0ef2cff9" data-embed-created="2017-04-22T19:46:54.810Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dgljhsb/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>
151
152```bash
153# let's install it globally as well
154$ sudo pip install snakeviz
155
156# now let's visualize
157$ cd prof/
158$ snakeviz awesome_random_number.prof
159# this automatically opens browser window and
160# shows visualized profile
161```
162
163![SnakeViz](/assets/python-profiling/snakeviz.png)
164
165Reddit user [ccharles](https://www.reddit.com/user/ccharles) suggested a better way for installing pip software by targeting user level instead of using sudo.
166
167<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="f4f0459e-684d-441e-bebe-eb49b2f0a31d" data-embed-created="2017-04-22T19:46:10.874Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dglpzkx/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>
168
169```bash
170# now we need to add this path to our $PATH variable
171# we do this by adding this line at the end of your
172# ~/.bashrc file
173PATH=$PATH:$HOME/.local/bin/
174
175# in order to use this new configuration you can close
176# and reopen terminal or reload .bashrc file
177$ source ~/.bashrc
178
179# now let's test if new directory is present in $PATH
180$ echo $PATH
181
182# now we can install on user level by adding --user
183# without use of sudo
184$ pip install snakeviz --user
185```
186
187Or as suggested by [mvt](https://www.reddit.com/user/mvt) you can use [pipsi](https://github.com/mitsuhiko/pipsi).
diff --git a/posts/2017-08-11-simple-iot-application.md b/posts/2017-08-11-simple-iot-application.md
new file mode 100644
index 0000000..dee5e74
--- /dev/null
+++ b/posts/2017-08-11-simple-iot-application.md
@@ -0,0 +1,489 @@
1---
2Title: Simple IOT application supported by real-time monitoring and data history
3Description: Develop simple IOT application with Arduino MKR1000 and Python
4Slug: simple-iot-application
5Listing: true
6Created: 2017, August 11
7Tags: []
8---
9
10## Initial thoughts
11
12I have been developing this kind of application for the better part of the last 5 years, and people keep asking me how to approach it, so I will try to explain it here.
13
14IOT applications are really no different from any other kind of application. We have data that needs to be collected and visualized in some form of tables or charts. The main difference is that most of the time this data is collected by some kind of device that is foreign to a developer who mainly operates in the web domain. But fear not, it's not that different from writing some JavaScript.
15
16There are many devices able to transmit data via wireless or wired network by default, but for the sake of example we will be using the commonly known Arduino with a wireless module already on the board → [Arduino MKR1000](https://store.arduino.cc/arduino-mkr1000).
17
18To make this little project as accessible as possible I will try to keep it inexpensive. By this I mean that I will avoid hosted virtual servers and use my own laptop as a server. You do need to buy an Arduino MKR1000 to follow the steps below. If you want to deploy this software I would suggest [DigitalOcean](https://www.digitalocean.com) → smallest VPS is only per month making this one of the most affordable options out there. Please note that this software will not run on stock web hosting that only supports LAMP (Linux, Apache, MySQL, and PHP).
19
20_But before we begin please take notice that this is strictly experimental code and not well optimized and there are much better ways in handling some aspects of the application but that requires much deeper knowledge of technology that is not needed for an example like this._
21
22**Development steps**
23
241. Simple Python API that will receive and store incoming data.
252. Prototype C++ code that will read "sensor data" and transmit it to API.
263. Data visualization with charts → extends Python web application.
27
28Steps 1 and 3 will share the same web application. One route will be dedicated to the API and another to serving HTML with the chart.
29
30The schema below represents what we will try to achieve and how the different parts relate to each other.
31
32![Overview](/assets/iot-application/simple-iot-application-overview.svg)
33
34## Simple Python API
35
36I have always been a fan of simplicity so we will be using [Bottle: Python Web Framework](https://bottlepy.org/docs/dev/). It is a single file web framework that seriously simplifies working with routes, templating and has built-in web server that satisfies our need in this case.
37
38First we need to install the bottle package. This can be done by downloading ```bottle.py``` and placing it in the root of your application, or with pip: ```pip install bottle --user```.
39
40If you are using Linux or macOS then Python is already installed. If you want to test this on Windows, please install [Python for Windows](https://www.python.org/downloads/windows/). There may be some problems with the path when you try to launch ```python webapp.py```, so please take care of this before you continue.
41
42### Basic web application
43
44The most basic bottle application is quite simple. Paste the code below into the ```webapp.py``` file and save.
45
46```python
47# -*- coding: utf-8 -*-
48
49import bottle
50
51# initializing bottle app
52app = bottle.Bottle()
53
54# triggered when / is accessed from browser
55# only accepts GET → no POST allowed
56@app.route("/", method=["GET"])
57def route_default():
58 return "howdy from python"
59
60# starting server on http://0.0.0.0:5000
61if __name__ == "__main__":
62 bottle.run(
63 app = app,
64 host = "0.0.0.0",
65 port = 5000,
66 debug = True,
67 reloader = True,
68 catchall = True,
69 )
70```
71
72To run this simple application, open a command prompt or terminal on your machine, go to the folder containing your file and type ```python webapp.py```. If everything goes OK, open your web browser and point it to ```http://0.0.0.0:5000``` (or ```http://localhost:5000``` if your browser does not accept 0.0.0.0).
73
74If you would like to change the port of your application (for example to port 80) without running your app as root, this will present a problem. TCP/IP port numbers below 1024 are privileged ports → this is a security feature. So for the sake of simplicity and security, use a port number above 1024, like port 5000 used here.
75
76If this fails at any time please fix it before you continue, because nothing below will work otherwise.
77
78We use 0.0.0.0 as the default host so that this app is available over your local network. If you find your local IP with ```ifconfig``` and try accessing this site with your phone (if on the same network/router as your machine), this should work as well (example of such an address: ```http://192.168.1.15:5000```). This is a must-have because the Arduino will be accessing this application to send its data.
79
80### Web application security
81
82There is a lot to be said about security; it is the topic of many books. Not all of it can be covered here, but to establish some basic security → you should always use SSL with your application. Fantastic free certificates are available from [Let's Encrypt - Free SSL/TLS Certificates](https://letsencrypt.org). With an SSL certificate installed you should then make use of HTTP headers and send your "API key" via a header. A key sent in a header is encrypted by SSL as it travels over the network. Never send your API keys as a GET parameter like ```http://example.com/?api_key=somekeyvalue```. The problem with this approach is that the key ends up in server and proxy logs and, without SSL, is visible to network sniffers.
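
To make the difference concrete, here is a small Python 3 sketch of a client that carries the key in a header and keeps it out of the URL. The endpoint, the key value and the `build_request` helper are placeholders I made up; the header name matches the one read by the API later in this post. The request is only built, not sent, so the sketch runs offline.

```python
import urllib.request

API_URL = "https://example.com/api"  # hypothetical endpoint
API_KEY = "replace-with-your-key"    # hypothetical key


def build_request(value):
    """Build a POST request that carries the key in a header, not in the URL."""
    return urllib.request.Request(
        API_URL,
        data=str(value).encode("utf-8"),
        headers={"Api_Key": API_KEY},
        method="POST",
    )


req = build_request(21.5)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it
```

Because the URL stays free of secrets, it can be logged safely on both ends.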
83
84There is a fantastic article describing some aspects about security: [11 Web Application Security Best Practices](https://www.keycdn.com/blog/web-application-security-best-practices/). Please check it out.
85
86### Simple API for writing data-points
87
88We will now take the boilerplate code from the example above and extend it to write data received by the API to local storage. For this example I will use SQLite3 because it plays well with Python and can store quite large amounts of data. I have been using it to collect gigabytes of data in a single database without any corruption or problems → your experience may vary.
89
90To avoid learning SQLite I will be using [Dataset: databases for lazy people](https://dataset.readthedocs.io/en/latest/index.html). This package abstracts SQL and simplifies writing and reading data from database. You should install this package with pip software ```pip install dataset --user```.
91
92Because API will use POST method I will be testing if code works correctly by using [Restlet Client for Google Chrome](https://chrome.google.com/webstore/detail/restlet-client-rest-api-t/aejoelaoggembcahagimdiliamlcdmfm). This software also allows you to set headers → for basic security with API_KEY.
93
94To quickly generate passwords or API keys I usually use this nifty website [RandomKeygen](https://randomkeygen.com/).
95
96Copy and paste code below over your previous code in file ```webapp.py```.

```python
# -*- coding: utf-8 -*-

import time
import bottle
import dataset

# initializing bottle app
app = bottle.Bottle()

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["dsn"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when /api is accessed
# only accepts POST, no GET allowed
@app.route("/api", method=["POST"])
def route_default():
    status = 400
    ts = int(time.time())  # current timestamp
    value = bottle.request.body.read().decode("utf-8").strip()  # data from device
    api_key = bottle.request.get_header("Api-Key")  # api key from header

    # outputs received data to console for debugging
    print(">>> {} :: {}".format(value, api_key))

    # if api_key is correct and value is present
    # then writes the data-point to the point table
    if api_key == app.config["api_key"] and value:
        app.config["dsn"]["point"].insert(dict(ts=ts, value=value))
        status = 200

    # we only need to return status
    return bottle.HTTPResponse(status=status, body="")

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app=app,
        host="0.0.0.0",
        port=5000,
        debug=True,
        reloader=True,
        catchall=True,
    )
```

To run this simply go to the folder containing the Python file and run ```python webapp.py``` from a terminal. If everything goes well you should have a simple API available via the POST method on the ```/api``` route.

After testing the service with Restlet Client you should be able to view your data in the database file ```data.db```.

![REST settings example](/assets/iot-application/iot-rest-example.png)

You can also check the contents of the new database file with a desktop client for SQLite such as [DB Browser for SQLite](http://sqlitebrowser.org/).

![SQLite database example](/assets/iot-application/iot-sqlite-db.png)

The table structure is as simple as it can be. We have ts (timestamp) and value (value from Arduino). As you can see, the timestamp is generated on the API side. If you happen to have an accurate clock on the Arduino, it would be better to generate the timestamp there and send it with the value. This would be particularly useful if we were collecting sensor data at a higher frequency and then sending it to the API in bulk.

If you deploy this app with uWSGI in multi-threaded mode, use a DSN (Data Source Name) URL with ```?check_same_thread=False```.
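
To illustrate what that flag changes, here is a small sketch that uses Python's built-in ```sqlite3``` module directly instead of Dataset. The table mirrors the ```point``` table described above (Dataset also adds its own ```id``` column, which I left out), and the function names ```open_db``` and ```write_from_threads``` are made up for this example. Keep in mind that the flag only disables the same-thread check. It does not make writes safe by itself, which is why the sketch guards them with a lock.

```python
import sqlite3
import threading
import time


def open_db(path=":memory:"):
    """Open a connection that may be used from threads other than its creator."""
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("CREATE TABLE IF NOT EXISTS point (ts INTEGER, value TEXT)")
    return conn


def write_from_threads(conn, values):
    """Insert every value from its own thread and return the number of rows."""
    lock = threading.Lock()

    def write_point(value):
        # only one thread at a time may touch the shared connection
        with lock:
            conn.execute(
                "INSERT INTO point (ts, value) VALUES (?, ?)",
                (int(time.time()), str(value)),
            )
            conn.commit()

    threads = [threading.Thread(target=write_point, args=(v,)) for v in values]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return conn.execute("SELECT COUNT(*) FROM point").fetchone()[0]


if __name__ == "__main__":
    print(write_from_threads(open_db(), [933, 743, 512]))
```

Without ```check_same_thread=False``` the worker threads would fail with a ```sqlite3.ProgrammingError``` as soon as they touch the connection.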

Ok, now that we have a working API with some basic security, so unwanted people can not post data to your database, we can proceed further and program the Arduino to send data to the API.

## Sending data to API with Arduino MKR1000

First of all you need an MKR1000 module and a microUSB cable to proceed. If you have ever done any work with Arduino you will know that you also need the [Arduino IDE](https://www.arduino.cc/en/Main/Software), which you can download and install from the provided link. Once that is done and you have successfully run the blink example you can proceed to the next step.

In order to use the wireless capabilities of the MKR1000 you first need to install the [WiFi101 library](https://www.arduino.cc/en/Reference/WiFi101) in the Arduino IDE. Check before you install it, because you may already have it.

The code below is a working example that sends data to the API. Before you test it make sure the Python web application is running. Then change the settings for wifi, API endpoint and api_key. If for some reason the code below doesn't work for you please leave a comment and I'll try to help.

Once you have opened the IDE and copied this code, try to compile and upload it. Then open the "Serial monitor" to see if the Arduino prints any output.

```c
#include <WiFi101.h>

// wifi settings
char ssid[] = "ssid-name";
char pass[] = "ssid-password";

// api server endpoint
char server[] = "192.168.6.22";
int port = 5000;

// api key that must be the same as the one in Python code
String api_key = "JtF2aUE5SGHfVJBCG5SH";

// how often data is sent, in ms (every 5 seconds)
int timeout = 1000 * 5;

int status = WL_IDLE_STATUS;

void setup() {

  // initialize serial and give the port a moment to open
  Serial.begin(9600);
  delay(1000);

  // check for the presence of the shield
  if (WiFi.status() == WL_NO_SHIELD) {
    Serial.println("WiFi shield not present");
    while (true);
  }

  // attempt to connect to wifi network
  while (status != WL_CONNECTED) {
    Serial.print("Attempting to connect to SSID: ");
    Serial.println(ssid);
    status = WiFi.begin(ssid, pass);
    // wait 10 seconds for connection
    delay(10000);
  }

  // output wifi status to serial monitor
  Serial.print("SSID: ");
  Serial.println(WiFi.SSID());

  IPAddress ip = WiFi.localIP();
  Serial.print("IP Address: ");
  Serial.println(ip);

  long rssi = WiFi.RSSI();
  Serial.print("signal strength (RSSI):");
  Serial.print(rssi);
  Serial.println(" dBm");
}

void loop() {

  WiFiClient client;

  if (client.connect(server, port)) {

    // I use random number generator for this example
    // but you can use analog or digital inputs from arduino
    String content = String(random(1000));

    client.println("POST /api HTTP/1.1");
    // HTTP/1.1 requests must include a Host header
    client.println("Host: " + String(server));
    client.println("Connection: close");
    client.println("Api-Key: " + api_key);
    client.println("Content-Length: " + String(content.length()));
    client.println();
    // print without newline so the body matches Content-Length
    client.print(content);

    delay(100);
    client.stop();
    Serial.println("Data sent successfully ...");

  } else {
    Serial.println("Problem sending data ...");
  }

  // waits for x seconds and continues looping
  delay(timeout);

}
```

As you can see from the example, the Arduino generates a random integer between 0 and 999 (```random(1000)``` excludes the upper bound). You can easily replace this with a temperature sensor or any other kind of sensor.

Now that we have the API in place and the Arduino is sending demo data, we can focus on data visualization.

## Data visualization

Before we continue we should examine our project folder structure. Currently we only have two files in our project:

_simple-iot-app/_

* _webapp.py_
* _data.db_

We will now add an HTML template that contains the CSS and JavaScript code inline, for simplicity. For the bottle framework to be able to scan the root application folder for templates we will add ```bottle.TEMPLATE_PATH.insert(0, "./")``` to ```webapp.py```. By default bottle looks for templates in the ```views/``` subfolder. Overriding this is not ideal, and if you use bottle to develop web applications you should stick to the native behavior and store templates in its predefined folder. But for the sake of the example we will override it. Be careful to fully replace your code with the new code provided below and avoid partially replacing code in the file :) The new code for reading data-points is also included in the Python example below.

First we add a new route to our web application. It is triggered when the browser hits the root of the application, ```http://0.0.0.0:5000/```. This route does nothing more than render the ```frontend.html``` template, which is done by ```return bottle.template("frontend.html")```. Check the code below to see exactly how this is done.

Now we will expand the ```/api``` route and use different methods to write or read data-points. For writing a data-point we will use the POST method and for reading points we will use the GET method. The GET method returns a JSON array with all stored readings, both the latest and the historical ones.

There is a fantastic JavaScript library for plotting time-series charts called [MetricsGraphics.js](https://www.metricsgraphicsjs.org) that is based on the [D3.js](https://d3js.org/) library for visualizing data.

MetricsGraphics.js requires a specific data schema, so we need to transform the data from the database into this format:

```json
[
  {
    "date": "2017-08-11 01:07:20",
    "value": 933
  },
  {
    "date": "2017-08-11 01:07:30",
    "value": 743
  }
]
```
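
The transformation itself is small enough to sketch in isolation. The snippet below is only an illustration under a few assumptions: the helper name ```to_chart_points``` is made up, the rows are plain dictionaries with ```ts``` and ```value``` keys, readings are whole numbers (the API stores whatever the device sent, so they may come back as text), and the time zone is pinned to UTC so the result is reproducible. The web application in this post formats timestamps in the server's local time instead.

```python
import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_chart_points(rows, tz=datetime.timezone.utc):
    """Turn rows with 'ts' and 'value' keys into MetricsGraphics-style dicts."""
    points = []
    for row in rows:
        moment = datetime.datetime.fromtimestamp(int(row["ts"]), tz=tz)
        points.append({
            "date": moment.strftime(DATE_FORMAT),
            "value": int(row["value"]),  # assumes integer readings
        })
    return points


if __name__ == "__main__":
    print(to_chart_points([{"ts": 1502413640, "value": "933"}]))
```

If your sensor produces decimal values, swap ```int``` for ```float``` in the conversion.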

With the code below the web application is complete, and the only thing missing is ```frontend.html```, which we will develop next. If you start the web app now and open its root URL, it will return an error because ```frontend.html``` doesn't exist yet.

```python
# -*- coding: utf-8 -*-

import time
import bottle
import json
import datetime
import dataset

# initializing bottle app
app = bottle.Bottle()

# adds root directory as template folder
bottle.TEMPLATE_PATH.insert(0, "./")

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["db"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when / is accessed from browser
# only accepts GET, no POST allowed
@app.route("/", method=["GET"])
def route_default():
    return bottle.template("frontend.html")

# triggered when /api is accessed
# accepts POST and GET
@app.route("/api", method=["GET", "POST"])
def route_api():

    # if method is POST then we write datapoint
    if bottle.request.method == "POST":
        status = 400
        ts = int(time.time())  # current timestamp
        value = bottle.request.body.read().decode("utf-8").strip()  # data from device
        api_key = bottle.request.get_header("Api-Key")  # api key from header

        # outputs received data to console for debugging
        print(">>> {} :: {}".format(value, api_key))

        # if api_key is correct and value is present
        # then writes the data-point to the point table
        if api_key == app.config["api_key"] and value:
            app.config["db"]["point"].insert(dict(ts=ts, value=value))
            status = 200

        # we only need to return status
        return bottle.HTTPResponse(status=status, body="")

    # if method is GET then we read datapoints
    else:
        response = []
        datapoints = app.config["db"]["point"].all()

        for point in datapoints:
            response.append({
                "date": datetime.datetime.fromtimestamp(int(point["ts"])).strftime("%Y-%m-%d %H:%M:%S"),
                "value": point["value"]
            })

        bottle.response.content_type = "application/json"
        return json.dumps(response)

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app=app,
        host="0.0.0.0",
        port=5000,
        debug=True,
        reloader=True,
        catchall=True,
    )
```

And now finally we can implement ```frontend.html```. Create a file with this name and copy the code below into it. When you are done you can start the web application. The steps for this part are listed below the code.

```html
<!DOCTYPE html>
<html>

  <head>
    <meta charset="utf-8">
    <title>Simple IOT application</title>
  </head>

  <body>

    <h1>Simple IOT application</h1>

    <div class="chart-placeholder">
      <div id="chart"></div>
    </div>

    <!-- application main script -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/metrics-graphics/2.11.0/metricsgraphics.min.js"></script>
    <script>
      function fetch_and_render() {
        d3.json("/api", function(data) {
          data = MG.convert.date(data, "date", "%Y-%m-%d %H:%M:%S");
          MG.data_graphic({
            data: data,
            chart_type: "line",
            full_width: true,
            height: 270,
            target: document.getElementById("chart"),
            x_accessor: "date",
            y_accessor: "value"
          });
        });
      }
      window.onload = function() {
        // initial call for rendering
        fetch_and_render();

        // updates chart every 5 seconds
        setInterval(function() {
          fetch_and_render();
        }, 5000);
      }
    </script>

    <!-- application styles -->
    <style>
      body {
        font: 13px sans-serif;
        padding: 20px 50px;
      }
      .chart-placeholder {
        border: 2px solid #ccc;
        width: 100%;
        user-select: none;
      }
      /* chart styles */
      .mg-line1-color {
        stroke: red;
        stroke-width: 2;
      }
      .mg-main-area, .mg-main-line {
        fill: #fff;
      }
      .mg-x-axis line, .mg-y-axis line {
        stroke: #b3b2b2;
        stroke-width: 1px;
      }
    </style>

  </body>

</html>
```

Now the folder structure should look like:

_simple-iot-app/_

* _webapp.py_
* _data.db_
* _frontend.html_

Ok, let's now start the application and start feeding it data.

1. ```python webapp.py```
2. connect the Arduino MKR1000 to a power source
3. open a browser and go to ```http://0.0.0.0:5000```

If everything goes well you should see new data-points rendered on the chart every 5 seconds, as shown in the picture below.

![Application output](/assets/iot-application/iot-app-output.png)

Complete application with all the code is available for [download](/assets/iot-application/simple-iot-application.zip).

## Conclusion

I hope this clarifies some aspects of IOT application development. Of course this is a minimal example and it is far from what can be done in real life once you dive further into other technologies.

If you would like to continue exploring the IOT world, here are some interesting resources for you to examine:

* [Reading Sensors with an Arduino](https://www.allaboutcircuits.com/projects/reading-sensors-with-an-arduino/)
* [MQTT 101 – How to Get Started with the lightweight IoT Protocol](http://www.hivemq.com/blog/how-to-get-started-with-mqtt)
* [Stream Updates with Server-Sent Events](https://www.html5rocks.com/en/tutorials/eventsource/basics/)
* [Internet of Things (IoT) Tutorials](http://www.tutorialspoint.com/internet_of_things/)

Any comments or additional ideas are welcome in the comments below.
diff --git a/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md b/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
new file mode 100644
index 0000000..ae895f7
--- /dev/null
+++ b/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
@@ -0,0 +1,263 @@
---
Title: Using DigitalOcean Spaces Object Storage with FUSE
Description: Using DigitalOcean Spaces Object Storage with FUSE
Slug: using-digitalocean-spaces-object-storage-with-fuse
Listing: true
Created: 2018, January 16
Tags: []
---

A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced a new product called [Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), which is Object Storage very similar to Amazon's S3. This really piqued my interest, because this was something I was missing, and the thought of going over the internet to another provider for such functionality did not appeal to me. In keeping with their previous pricing it is also very cheap, and the pricing page is a no-brainer compared to AWS or GCE. [Prices are clearly and precisely defined and outlined](https://www.digitalocean.com/pricing/). You must love them for that :)

### Initial requirements

* Is it possible to use them as a mounted drive with FUSE? (tl;dr YES)
* Will the performance degrade over time and over different sizes of objects? (tl;dr NO & YES)
* Can storage be mounted on multiple machines at the same time and be writable? (tl;dr YES)

> Let me be clear. The scripts I use are made just for benchmarking and are not intended to be used in real-life situations. Besides that, I am looking into using this approach with a caching service in front of it and then dumping everything as an object to storage. This could potentially be an interesting post in itself. But in case you need real-time data without eventual consistency, please take these scripts as they are: not usable in such situations.

## Is it possible to use them as a mounted drive with FUSE?

Well, actually they can be used in such a manner. Because they are similar to [AWS S3](https://aws.amazon.com/s3/), many tools are available and you can find many articles and [Stackoverflow items](https://stackoverflow.com/search?q=s3+fuse).

To make this work you will need a DigitalOcean account. If you don't have one you will not be able to test this code. If you do, go and [create a new Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent). If you click on this link you will already have Debian 9 preselected with the smallest VM option.

* Please be sure to add your SSH key, because we will log in to this machine remotely.
* If you change your region, please remember which one you chose, because we will need this information when we mount the Space on our machine.

Instructions on how to set up and use SSH keys are available in the article [How To Use SSH Keys with DigitalOcean Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).

![DigitalOcean Droplets](/assets/do-fuse/fuse-droplets.png)

After we have created the Droplet it's time to create a new Space. This is done by clicking the [Create](https://cloud.digitalocean.com/spaces/new) button (top right corner) and selecting Spaces. Choose a pronounceable ```Unique name``` because we will use it in the examples below. You can choose either Private or Public, it doesn't matter in our case, and you can always change that in the future.

Once you have created the new Space we should [generate an Access key](https://cloud.digitalocean.com/settings/api/tokens). This link will take you to the page where you can generate the key. After you create a new one, please save the provided Key and Secret because the Secret will not be shown again.

![DigitalOcean Spaces](/assets/do-fuse/fuse-spaces.png)

Now that we have a new Space and an Access key we should SSH into our machine.

```bash
# replace IP with the ip of your newly created droplet
ssh root@IP

# this will install utilities for mounting storage objects as FUSE
apt install s3fs

# we now need to provide credentials (access key we created earlier)
# replace KEY and SECRET with your own credentials but leave the colon between them
# we also need to set proper permissions
echo "KEY:SECRET" > .passwd-s3fs
chmod 600 .passwd-s3fs

# now we mount space to our machine
# replace UNIQUE-NAME with the name you chose earlier
# if you chose a different region for your space be careful about -ourl option (ams3)
s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp

# now we try to create a file
# once you mount it may take a couple of seconds to retrieve data
echo "Hello cruel world" > /mnt/hello.txt
```

After all this you can return to your browser, go to [DigitalOcean Spaces](https://cloud.digitalocean.com/spaces) and click on the Space you created. If the file hello.txt is present you have successfully mounted the Space on your machine and written data to it.

I chose the same region for my Droplet and my Space but you don't have to. You can have different regions. What this actually does to performance I don't know.

Additional information on FUSE:

* [Github project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse)
* [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)

## Will the performance degrade over time and over different sizes of objects?

For this task I didn't want to just read and write text files or upload images. I actually wanted to figure out if using something like SQLite is viable in this case.

### Measurement experiment 1: File copy

```bash
# first we create some dummy files at different sizes
dd if=/dev/zero of=10KB.dat bs=1024 count=10 #10KB
dd if=/dev/zero of=100KB.dat bs=1024 count=100 #100KB
dd if=/dev/zero of=1MB.dat bs=1024 count=1024 #1MB
dd if=/dev/zero of=10MB.dat bs=1024 count=10240 #10MB

# now we set time command to only return real
TIMEFORMAT=%R

# now lets test it
(time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt

# and now we automate
# this will perform the same operation 100 times
# this will output results into separate files based on object size
n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done
```
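
If you want a quick summary of the result files before plotting them, something like the sketch below could help. It is not part of my original benchmark, the function names are made up, and it assumes that every non-empty line holds a single elapsed time in seconds (any error output from ```cp``` that ended up in the file would have to be removed first). Some locales print a decimal comma, so the parser accepts both forms.

```python
import statistics


def parse_seconds(line):
    """Parse one line of `time` output, accepting a decimal point or comma."""
    return float(line.strip().replace(",", "."))


def summarize(lines):
    """Return run count, min, median, mean and max of the elapsed times."""
    samples = [parse_seconds(line) for line in lines if line.strip()]
    return {
        "runs": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # made-up sample values, only to show the shape of the output
    print(summarize(["0.215", "0,198", "0.301", ""]))
```

For a real file you would pass the open file object, for example ```summarize(open("10KB.results.txt"))```.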

Files of size 100MB were not transferred successfully and ended with an error (cp: failed to close '/mnt/100MB.1.dat': Operation not permitted).

As I suspected, object size is not really that important. Sadly I don't have the time to test performance over longer periods of time. If any of you do, please send me your data. I would be interested in seeing the results.

**Here are plotted results**

You can download the [raw results here](/assets/do-fuse/copy-benchmarks.tsv). Measurements are in seconds.

<script src="//cdn.plot.ly/plotly-latest.min.js"></script>
<div id="copy-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/copy-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      var copy = Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } });
    } else { }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

As far as these tests show, performance is quite stable and predictable, which is fantastic. But this is a small test that spans only a couple of hours, so you should not trust the results completely.

### Measurement experiment 2: SQLite performance

I was unable to use a database file directly from the mounted drive, so this is a no-go, as I suspected. Instead I executed the code below on a local disk just to get some benchmarks. Each run performs DROPTABLE, CREATETABLE, INSERTMANY (1000 records), FETCHALL and COMMIT, and I repeated it 1000 times to generate statistics. As you can see, the performance of SQLite is quite amazing. You could then potentially just copy the file to the mounted drive and be done with it.

```python
import time
import sqlite3
import sys

# DB_PATH, NUM_RECORDS and REPEAT are all required
if len(sys.argv) < 4:
    print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
    sys.exit(1)

def data_iter(x):
    for i in range(x):
        yield "m" + str(i), "f" + str(i*i)

header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
with open("sqlite-benchmarks.tsv", "w") as fp:
    fp.write(header_line)

start_time = time.time()
conn = sqlite3.connect(sys.argv[1])
c = conn.cursor()
end_time = time.time()
result_time = CONNECT = end_time - start_time
print("CONNECT: %g seconds" % (result_time))

start_time = time.time()
c.execute("PRAGMA journal_mode=WAL")
c.execute("PRAGMA temp_store=MEMORY")
c.execute("PRAGMA synchronous=OFF")
end_time = time.time()
result_time = PRAGMA = end_time - start_time
print("PRAGMA: %g seconds" % (result_time))

for i in range(int(sys.argv[3])):
    print("#%i" % (i))

    start_time = time.time()
    c.execute("drop table if exists test")
    end_time = time.time()
    result_time = DROPTABLE = end_time - start_time
    print("DROPTABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("create table if not exists test(a,b)")
    end_time = time.time()
    result_time = CREATETABLE = end_time - start_time
    print("CREATETABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(int(sys.argv[2])))
    end_time = time.time()
    result_time = INSERTMANY = end_time - start_time
    print("INSERTMANY: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("select count(*) from test")
    res = c.fetchall()
    end_time = time.time()
    result_time = FETCHALL = end_time - start_time
    print("FETCHALL: %g seconds" % (result_time))

    start_time = time.time()
    conn.commit()
    end_time = time.time()
    result_time = COMMIT = end_time - start_time
    print("COMMIT: %g seconds" % (result_time))

    print()
    log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
    with open("sqlite-benchmarks.tsv", "a") as fp:
        fp.write(log_line)

start_time = time.time()
conn.close()
end_time = time.time()
result_time = CLOSE = end_time - start_time
print("CLOSE: %g seconds" % (result_time))
```

You can download the [raw results here](/assets/do-fuse/sqlite-benchmarks.tsv). And again, these measurements were made on local block storage and do not represent the capabilities of object storage. With my current approach and the state of the test code, that measurement can not be done. I would need to make the Python code much more robust, check locking, etc.
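
The "work locally, then copy the file to the mounted drive" idea could look roughly like the sketch below. This is not something I benchmarked. It assumes Python 3.7 or newer (for ```Connection.backup```), the function name ```snapshot_to``` is made up, and the destination is just a directory path, which in our case would be the s3fs mount such as ```/mnt```. Taking a snapshot first means the copied file is consistent even if the application is still writing to the original database.

```python
import os
import shutil
import sqlite3
import tempfile


def snapshot_to(db_path, dest_dir):
    """Copy a consistent snapshot of the database at db_path into dest_dir."""
    fd, tmp_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(tmp_path)
        try:
            src.backup(dst)  # SQLite online backup API, Python 3.7+
        finally:
            dst.close()
            src.close()
        # one sequential copy onto the mount instead of many small writes
        dest_path = os.path.join(dest_dir, os.path.basename(db_path))
        shutil.copyfile(tmp_path, dest_path)
        return dest_path
    finally:
        os.remove(tmp_path)
```

Calling ```snapshot_to("data.db", "/mnt")``` from a cron job every couple of minutes would be one way to try it out.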

<div id="sqlite-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/sqlite-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      var sqlite = Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } });
    } else { }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

## Can storage be mounted on multiple machines at the same time and be writable?

Well, this one didn't take long to test. And the answer is **YES**. I mounted the Space on two machines and measured the same performance on both. But because a file is downloaded before a write and uploaded again once the write completes, there could potentially be problems if another process tries to access the same file.

## Observations and conclusion

Using Spaces in this way makes it easier to access and manage files. But beyond that you would need to write additional code to make this play nicely with your applications.

Nevertheless, this was extremely simple to set up and use, and it is just another excellent product in the DigitalOcean product line. I found this exercise very valuable and am thinking about implementing some sort of mechanism for SQLite, so data can be stored on Spaces and accessed by many VMs. For a project where data doesn't need to be accessible in real time and can be a couple of minutes old, this would be very interesting. If any of you find this proposal interesting please write in the comment box below or shoot me an email and I will keep you posted.
diff --git a/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md b/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
new file mode 100644
index 0000000..1bf39ea
--- /dev/null
+++ b/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
@@ -0,0 +1,348 @@
---
Title: Encoding binary data into DNA sequence
Description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
Slug: encoding-binary-data-into-dna-sequence
Listing: true
Created: 2019, January 3
Tags: []
---

## Initial thoughts

Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it. Well, this is all possible now. It has not been done on a large scale because it is quite expensive to create DNA strands, but it's possible.

Encoding data into a DNA sequence is a relatively simple process once you understand the relationship between binary data and nucleotides. Scientists have been making large leaps in this field in order to provide a viable long-term storage solution for our data that could potentially survive our species in case of a global disaster. We could imprint all the world's knowledge into plants and ensure the survival of our knowledge.

A more optimistic use of this technology would be easier storage of the ever growing data we produce every day. Once machines for sequencing DNA become fast and cheap enough, this could mean the next evolution of storing data and abandoning classical hard and solid state drives in data warehouses.

As things currently stand this is still not viable, but it is quite an amazing and cool technology.

My interests in this field are purely in encoding processes and experimental testing, mainly because I don't have access to these expensive machines. My initial goal was to create a toolkit that can be used by everybody to encode their data into a proper DNA sequence.
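
As a rough illustration of that relationship between binary data and nucleotides, here is a minimal sketch that packs two bits into each base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are arbitrary assumptions made for this example. Real encoding schemes also have to deal with things like long runs of the same base and error correction, which this sketch ignores.

```python
# assumed mapping: two bits per nucleotide (an illustrative choice)
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data):
    """Encode bytes as a nucleotide string, four bases per byte."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence):
    """Decode a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    sequence = encode(b"Hi")
    print(sequence)
    print(decode(sequence))
```

Every byte becomes exactly four bases, so the sequence is always four times as long as the input.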
21
22## Glossary
23
24**deoxyribose**
25: A five-carbon sugar molecule with a hydrogen atom rather than a hydroxyl group in the 2′ position; the sugar component of DNA nucleotides.
26
27**double helix**
28: The molecular shape of DNA in which two strands of nucleotides wind around each other in a spiral shape.
29
30**nitrogenous base**
31: A nitrogen-containing molecule that acts as a base; often referring to one of the purine or pyrimidine components of nucleic acids.
32
33**phosphate group**
34: A molecular group consisting of a central phosphorus atom bound to four oxygen atoms.
35
36**RGB**
37: The RGB color model is an additive color model in which red, green and blue light are added together in various ways to reproduce a broad array of colors.
38
39**GCC**
40: The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages.
41
42## Data encoding
43
44**TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process.
45
46Encoding is the process of converting data into a format required for a number of information processing needs, including:
47
48- Program compiling and execution
49- Data transmission, storage and compression/decompression
50- Application data processing, such as file conversion
51
52Encoding can have two meanings:
53
54- In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher.
55- In electronics, encoding refers to analog to digital conversion.
56
57## Quick history of DNA
58
59- **1869** - Friedrich Miescher identifies "nuclein".
60- **1900s** - The Eugenics Movement.
61- **1900** - Mendel's theories are rediscovered by researchers.
62- **1944** - Oswald Avery identifies DNA as the 'transforming principle'.
63- **1952** - Rosalind Franklin photographs crystallized DNA fibres.
64- **1953** - James Watson and Francis Crick discover the double helix structure of DNA.
65- **1965** - Marshall Nirenberg is the first person to sequence the bases in each codon.
66- **1983** - Huntington's disease is the first mapped genetic disease.
67- **1990** - The Human Genome Project begins.
68- **1995** - Haemophilus Influenzae is the first bacterium genome sequenced.
69- **1996** - Dolly the sheep is cloned.
70- **1999** - First human chromosome is decoded.
71- **2000** - Genetic code of the fruit fly is decoded.
72- **2002** - Mouse is the first mammal to have its genome decoded.
73- **2003** - The Human Genome Project is completed.
74- **2013** - DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup.
75
76## What is DNA?
77
78Deoxyribonucleic acid (DNA) is a self-replicating material which is **present in nearly all living organisms** as the main constituent of chromosomes. It is the **carrier of genetic information**.
79
80> The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.
81>
82> **-- Carl Sagan, Cosmos**
83
84A nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine bases. The sugar and the base together are called a nucleoside.
85
86![DNA](/assets/dna-sequence/dna-basics.jpg#center)
87
88*DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)*
89
90## Encode binary data into DNA sequence
91
92As an input file you can use any file you want:
93- ASCII files,
94- Compiled programs,
95- Multimedia files (MP3, MP4, MKV, etc),
96- Images,
97- Database files,
98- etc.
99
100Note: If you copy bytes from RAM into a file, or pipe data into a file, you can encode that data as well, as long as you provide a file pointer to the encoder.
101
102### Basic Encoding
103
104Basic encoding is based on a simple mapping. DNA is composed of 4 nucleotides (Adenine, Cytosine, Guanine, Thymine; usually referred to by their first letter), so a single nucleotide can encode
105
106$$ \log_2(4) = \log_2(2^2) = 2 \text{ bits} $$
107
108of data. In this way, we are able to use the 4 bases that compose the DNA strand to encode each byte of data as four nucleotides.
109
110| Two bits | Nucleotides |
111| -------- | ---------------- |
112| 00 | **A** (Adenine) |
113| 01 | **G** (Guanine) |
114| 10 | **C** (Cytosine) |
115| 11 | **T** (Thymine) |
116
117With this in mind we can encode any data by converting each pair of bits into a nucleotide.
118
119```python
120{ Algorithm 1: Naive byte array to DNA encode }
121procedure EncodeToDNASequence(f) string
122begin
123 enc string
124 while not eof(f) do
125 c byte := buffer[0] { Read 1 byte from buffer }
126 bin string := sprintf('%08b', c) { Convert to 8-character binary string }
127 for e in pairs(bin) do { Two-character slices at offsets 0, 2, 4, 6 }
128 if e[0] == 48 and e[1] == 48 then { '00' - A (Adenine); 48 is ASCII '0' }
129 enc += 'A'
130 else if e[0] == 48 and e[1] == 49 then { '01' - G (Guanine); 49 is ASCII '1' }
131 enc += 'G'
132 else if e[0] == 49 and e[1] == 48 then { '10' - C (Cytosine) }
133 enc += 'C'
134 else if e[0] == 49 and e[1] == 49 then { '11' - T (Thymine) }
135 enc += 'T'
136 return enc { Return DNA sequence }
137end
138```
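
The pseudocode above can be sketched as runnable Python. This is a minimal sketch, not the toolkit's actual implementation: the function name `encode_to_dna` is hypothetical, and it assumes the same mapping as Algorithm 1 (00 → A, 01 → G, 10 → C, 11 → T) with the high bits of each byte encoded first.

```python
# Sketch of Algorithm 1 in Python; the function name is hypothetical
# and is not taken from the dna-encoding toolkit.
NUCLEOTIDES = "AGCT"  # index = two-bit value: 00->A, 01->G, 10->C, 11->T


def encode_to_dna(data: bytes) -> str:
    """Encode bytes as nucleotides, two bits at a time, high bits first."""
    sequence = []
    for byte in data:
        # Shift each bit pair down to the lowest two bits and mask it out.
        for shift in (6, 4, 2, 0):
            sequence.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(sequence)


print(encode_to_dna(b"How "))  # GACAGCTTGTGTACAA
```

The printed sequence matches the first 16 nucleotides of the `quote.fa` output shown later in this post, whose input starts with the same four characters.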
139
140Another option is **Goldman encoding**. It helps with nonsense mutations (where an amino acid codon is replaced by a stop codon), which are the most problematic during translation because they lead to truncated amino acid sequences and, in turn, truncated proteins.
141
142[Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU)
143
144### FASTA file format
145
146In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics.
147
148The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored).
149
150```text
151;LCBO - Prolactin precursor - Bovine
152; a sample sequence in FASTA format
153MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
154EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
155VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
156ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*
157
158>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
159ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
160FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
161DIDGDGQVNYEEFVQMMTAK*
162
163>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
164LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
165EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
166LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
167GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
168IENY
169```
170
171FASTA format was extended by [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) format from the [Sanger Centre](https://www.sanger.ac.uk/) in Cambridge.
172
173### PNG encoded DNA sequence
174
175| Nucleotides | RGB | Color name |
176| ------------- | ----------- | ---------- |
177| A -> Adenine | (0,0,255) | Blue |
178| G -> Guanine | (0,100,0) | Green |
179| C -> Cytosine | (255,0,0) | Red |
180| T -> Thymine | (255,255,0) | Yellow |
181
182With this in mind we can create a simple algorithm to create PNG representation of a DNA sequence.
183
184```python
185{ Algorithm 2: Naive DNA to PNG encode from FASTA file }
186procedure EncodeDNASequenceToPNG(f)
187begin
188 i image
189 while not eof(f) do
190 c char := buffer[0] { Read 1 char from buffer }
191 case c of
192 'A': color := RGB(0, 0, 255) { Blue }
193 'G': color := RGB(0, 100, 0) { Green }
194 'C': color := RGB(255, 0, 0) { Red }
195 'T': color := RGB(255, 255, 0) { Yellow }
196 drawRect(i, [x, y], color)
197 save(i) { Save PNG image }
198end
199```
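
The colour-mapping step of Algorithm 2 can be sketched with the standard library alone. The helper names `parse_fasta` and `sequence_to_pixels` are hypothetical, the 60-column layout is an assumption, and writing the actual PNG file is left to whichever imaging library you prefer.

```python
# Sketch of the colour-mapping step of Algorithm 2 (standard library only).
# parse_fasta and sequence_to_pixels are hypothetical helper names.
COLORS = {
    "A": (0, 0, 255),    # blue
    "G": (0, 100, 0),    # green
    "C": (255, 0, 0),    # red
    "T": (255, 255, 0),  # yellow
}


def parse_fasta(text: str) -> str:
    """Join sequence lines, skipping '>' headers and ';' comments."""
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and line[0] not in ">;")


def sequence_to_pixels(sequence: str, columns: int = 60):
    """Return rows of RGB tuples, one tuple per nucleotide."""
    colors = [COLORS[base] for base in sequence]
    return [colors[i:i + columns] for i in range(0, len(colors), columns)]


rows = sequence_to_pixels(parse_fasta(">SEQ1\nGACA\nGCTT\n"), columns=4)
print(rows[0])  # [(0, 100, 0), (0, 0, 255), (255, 0, 0), (0, 0, 255)]
```

Each tuple would then be drawn as one square of the image, row by row.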
200
201## Encoding text file in practice
202
203In this example we will take a simple text file as our input stream for encoding. The file contains a quote from Niels Bohr and is saved as a txt file.
204
205> How wonderful that we have met with a paradox. Now we have some hope of making progress.
206> ― Niels Bohr
207
208First we encode text file into FASTA file.
209
210```bash
211./dnae-encode -i quote.txt -o quote.fa
2122019/01/10 00:38:29 Gathering input file stats
2132019/01/10 00:38:29 Starting encoding ...
214 106 B / 106 B [==================================] 100.00% 0s
2152019/01/10 00:38:29 Saving to FASTA file ...
2162019/01/10 00:38:29 Output FASTA file length is 438 B
2172019/01/10 00:38:29 Process took 987.263µs
2182019/01/10 00:38:29 Done ...
219```
220
221The `quote.fa` output file contains the encoded DNA sequence in ASCII format.
222
223```text
224>SEQ1
225GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
226GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
227ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
228ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
229GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
230GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
231AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
232AACC
233```
234
235Then we encode the FASTA file from the previous step into a PNG image.
236
237```bash
238./dnae-png -i quote.fa -o quote.png
2392019/01/10 00:40:09 Gathering input file stats ...
2402019/01/10 00:40:09 Deconstructing FASTA file ...
2412019/01/10 00:40:09 Compositing image file ...
242 424 / 424 [==================================] 100.00% 0s
2432019/01/10 00:40:09 Saving output file ...
2442019/01/10 00:40:09 Output image file length is 1.1 kB
2452019/01/10 00:40:09 Process took 19.036117ms
2462019/01/10 00:40:09 Done ...
247```
248
249After encoding into PNG format this file looks like this.
250
251![Encoded Quote in PNG format](/assets/dna-sequence/quote.png)
252
253The larger the input stream, the larger the PNG file.
254
255A basic Hello World C program compiled with [GCC](https://www.gnu.org/software/gcc/) would [look like this](/assets/dna-sequence/sample.png).
256
257```c
258// gcc -O3 -o sample sample.c
259#include <stdio.h>
260
261int main(void) {
262 printf("Hello, world!\n");
263 return 0;
264}
265```
266
267## Toolkit for encoding data
268
269I have created a toolkit with two main programs:
270- dnae-encode (encodes file into FASTA file)
271- dnae-png (encodes FASTA file into PNG)
272
273Toolkit with full source code is available on [github.com/mitjafelicijan/dna-encoding](https://github.com/mitjafelicijan/dna-encoding).
274
275### dnae-encode
276
277```bash
278> ./dnae-encode --help
279usage: dnae-encode --input=INPUT [<flags>]
280
281A command-line application that encodes file into DNA sequence.
282
283Flags:
284 --help Show context-sensitive help (also try --help-long and --help-man).
285 -i, --input=INPUT Input file (ASCII or binary) which will be encoded into DNA sequence.
286 -o, --output="out.fa" Output file which stores DNA sequence in FASTA format.
287 -s, --sequence=SEQ1 The description line (defline) or header/identifier line, gives a name and/or a unique identifier for the sequence.
288 -c, --columns=60 Row characters length (no more than 120 characters). Devices preallocate fixed line sizes in software.
289 --version Show application version.
290```
291
292### dnae-png
293
294```bash
295> ./dnae-png --help
296usage: dnae-png --input=INPUT [<flags>]
297
298A command-line application that encodes FASTA file into PNG image.
299
300Flags:
301 --help Show context-sensitive help (also try --help-long and --help-man).
302 -i, --input=INPUT Input FASTA file which will be encoded into PNG image.
303 -o, --output="out.png" Output file in PNG format that represents DNA sequence in graphical way.
304 -s, --size=10 Size of pairings of DNA bases on image in pixels (lower resolution lower file size).
305 --version Show application version.
306```
307
308## Benchmarks
309
310First we generate some binary sample data with dd.
311
312```bash
313dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock
314```
315
316Our freshly generated 1KB file looks something like this (it's full of garbage data, as intended).
317
318![Sample binary file 1KB](/assets/dna-sequence/sample-binary-file.png)
319
320We create the following binary files:
321- 1KB.bin
322- 10KB.bin
323- 100KB.bin
324- 1MB.bin
325- 10MB.bin
326- 100MB.bin
327
328After this we create FASTA files for all the binary files by encoding them into DNA sequence.
329
330```bash
331./dnae-encode -i 100MB.bin -o 100MB.fa
332```
333
334Then we GZIP all the FASTA files to see how much they can be compressed.
335
336```bash
337gzip -9 < 10MB.fa > 10MB.fa.gz
338```
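
The same measurement can be sketched in Python for a quick check without the toolkit. Every nucleotide is stored as an 8-bit ASCII character but carries only 2 bits of information, so the FASTA file is a little over four times the size of its input, and gzip can win back part of that overhead. The helper `to_fasta` below is hypothetical and assumes the two-bit mapping from Algorithm 1; `os.urandom` stands in for the generated `.bin` files.

```python
import gzip
import os

NUCLEOTIDES = "AGCT"  # 00->A, 01->G, 10->C, 11->T


def to_fasta(data: bytes, columns: int = 60) -> bytes:
    """Encode bytes as a single-record FASTA document (hypothetical helper)."""
    seq = "".join(NUCLEOTIDES[(b >> s) & 3] for b in data for s in (6, 4, 2, 0))
    rows = [seq[i:i + columns] for i in range(0, len(seq), columns)]
    return (">SEQ1\n" + "\n".join(rows) + "\n").encode("ascii")


payload = os.urandom(10_000)                    # stand-in for a *.bin sample file
fasta = to_fasta(payload)                       # what dnae-encode would write
packed = gzip.compress(fasta, compresslevel=9)  # same level as `gzip -9`

# Sizes in bytes: input, FASTA, gzipped FASTA.
print(len(payload), len(fasta), len(packed))
```

Because the input is random, the compressed FASTA cannot get meaningfully smaller than the original binary file; real-world files with more structure will behave differently.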
339
340[Download ODS file with benchmarks](/assets/dna-sequence/benchmarks.ods).
341
342## References
343
344- https://www.techopedia.com/definition/948/encoding
345- https://www.dna-worldwide.com/resource/160/history-dna-timeline
346- https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/
347- https://arxiv.org/abs/1801.04774
348- https://en.wikipedia.org/wiki/FASTA_format
diff --git a/posts/2019-10-14-simplifying-and-reducing-clutter.md b/posts/2019-10-14-simplifying-and-reducing-clutter.md
new file mode 100644
index 0000000..24c55c6
--- /dev/null
+++ b/posts/2019-10-14-simplifying-and-reducing-clutter.md
@@ -0,0 +1,24 @@
1---
2Title: Simplifying and reducing clutter in my life and work
3Description: Simplifying and reducing clutter in my life and work
4Slug: simplifying-and-reducing-clutter
5Listing: true
6Created: 2019, October 14
7Tags: []
8---
9
10I recently moved my main working machine back from Hackintosh to Linux. Well, the experiment was interesting and I have done some great work on macOS, but it was time to move back.
11
12I actually really missed Linux. The simplicity of `apt-get` or just the amount of software that exists for Linux should be a no-brainer. I spent most of my time on macOS finding solutions to make things work. Using [Brew](https://brew.sh/) was just a horrible experience and far from package managers of Linux. At least they managed to get that `sudo` debacle sorted.
13
14Not all was bad. macOS in general was a perfectly good environment. Things like Docker and similar tooling worked without any hiccups. My normal tools like my coding IDE worked flawlessly and the whole look and feel is just superb. I have been using a MacBook Air for a couple of years so I was used to the system, but never as a daily driver.
15
16One of the things I did after I installed Linux back on my machine was cleaning up my Dropbox folder. I have everything on Dropbox. Even my projects folder. I write code for a living so my whole life revolves around a couple of megs of code (with assets). So it's not like I have huge files on my machine. I don't have movies or music or pictures on my PC. All of that stuff is in the cloud. I use Google Music and I have a Netflix account, which is more than enough for me.
17
18I also went and deleted some of the repositories on my GitHub account. I have deleted more code than I have deployed. People find this strange, but for me deleting something feels so cathartic and also forces me to write better code the next time I am faced with a similar problem. That was a huge relief if I am being totally honest.
19
20The next step was to do something with my webpage. I have been using some scripts I wrote a while ago to generate static pages from markdown source posts. I kept on adding and adding stuff on top of them and it became a source of frustration. And this is just a simple blog, and I was using gulp and npm. Anyway, after a couple of hours of searching and testing static generators I found an interesting one, [https://github.com/piranha/gostatic](https://github.com/piranha/gostatic), and I just decided to use it. It was the only one that had a simple templating engine, not that I really need one. The others had this convoluted way of trying to solve everything and in the end just required a much bigger learning curve than I was ready for. So I deleted a couple of old posts, simplified the HTML, trashed most of the CSS and went with [https://motherfuckingwebsite.com/](https://motherfuckingwebsite.com/) aesthetics. Yeah, the previous site was more visually stimulating, but all I really care about at this point is the content. And the Times New Roman font is kind of awesome.
21
22I stopped working on most of the projects in the past couple of months because the overhead was just too insane. There comes a point when you stretch yourself too much and then you stop progressing and with that comes dissatisfaction.
23
24So that's about it. Moving forward minimal style.
diff --git a/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
new file mode 100644
index 0000000..b975828
--- /dev/null
+++ b/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
@@ -0,0 +1,88 @@
1---
2Title: Using sentiment analysis for click&#8209;bait detection in RSS feeds
3Description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait
4Slug: using-sentiment-analysis-for-click-bait-detection-in-rss-feeds
5Listing: true
6Created: 2019, October 19
7Tags: []
8---
9
10## Initial thoughts
11
12One of the things that has interested me for a while now is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions.
13
14The goal is to see how article titles and the actual content of articles differ from each other, and whether the titles are click-bait.
15
16## Preparing and cleaning data
17
18For this example I opted to just use the RSS feed from a news website and decided to go with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and the description (the actual content) is trimmed, so it doesn't really reflect the actual article contents.
19
20To get better content I could use web scraping and use RSS as link list and fetch contents directly from website, but for this simple example this will suffice.
21
22There are a couple of requirements we need to install before we continue:
23
24- `pip3 install feedparser` (parses RSS feed from url)
25- `pip3 install vaderSentiment` (does sentiment polarity analysis)
26- `pip3 install matplotlib` (plots chart of results)
27
28So first we need to fetch RSS data and sanitize HTML content from description.
29
30```python
31import re
32import feedparser
33
34feed_url = "https://www.theguardian.com/world/rss"
35feed = feedparser.parse(feed_url)
36
37# sanitize html
38for item in feed.entries:
39 item.description = re.sub('<[^<]+?>', '', item.description)
40```
41
42## Perform sentiment analysis
43
44Since we now have cleaned up data in our `feed.entries` object we can start with performing sentiment analysis.
45
46There are many sentiment analysis libraries available that range from rule-based sentiment analysis up to machine learning supported analysis. To keep things simple I decided to use rule-based analysis library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). Really nice library and quite easy to use.
47
48```python
49from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
50analyser = SentimentIntensityAnalyzer()
51
52sentiment_results = []
53for item in feed.entries:
54 sentiment_title = analyser.polarity_scores(item.title)
55 sentiment_description = analyser.polarity_scores(item.description)
56 sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
57```
58
59Now that we have this data in a shape that is compatible with matplotlib we can plot results to see the difference between title and description sentiment of an article.
60
61```python
62import matplotlib.pyplot as plt
63
64plt.rcParams['figure.figsize'] = (15, 3)
65plt.plot(sentiment_results, drawstyle='steps')
66plt.title('Sentiment analysis relationship between title and description (Guardian World News)')
67plt.legend(['title', 'description'])
68plt.show()
69```
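
The `[title, description]` pairs collected above can also be compared numerically rather than only by eye. The sketch below is a hypothetical follow-up, not part of the original notebook: the function name `sentiment_gaps` and the 0.5 threshold are assumptions, and the sample scores are made up to show the input shape.

```python
# Hypothetical follow-up: rank feed items by the gap between the title and
# description compound scores. The 0.5 threshold is an arbitrary assumption.
def sentiment_gaps(results, threshold=0.5):
    """Return (index, gap) pairs where |title - description| >= threshold."""
    gaps = [(i, abs(title - desc)) for i, (title, desc) in enumerate(results)]
    flagged = [(i, gap) for i, gap in gaps if gap >= threshold]
    # Largest disagreement between title and description first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up scores in the same [title, description] shape as sentiment_results.
sample = [[0.6, 0.5], [-0.8, 0.1], [0.0, 0.75]]
for index, gap in sentiment_gaps(sample):
    print(f"item {index}: gap {gap:.2f}")
```

A large gap only means the title and the description read differently to the rule-based analyser; it is a candidate for closer inspection, not proof of click-bait.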
70
71## Results and assets
72
731. Because of the small sample size further conclusions are impossible to make.
742. Rule-based approach may not be the best way of doing this. By using deep learning we would be able to get better insights.
753. **Next step would be to** periodically fetch RSS items and store them over a longer period of time and then perform analysis again and use either machine learning or deep learning on top of it.
76
77![Relationship between title and description](/assets/sentiment-analysis/guardian-sa-title-desc-relationship.png)
78
79The figure above displays the difference between title and description sentiment for each RSS feed item. 1 means positive and -1 means negative sentiment.
80
81[» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)
82
83## Going further
84
85- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood)
86- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
87- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
88- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)
diff --git a/posts/2020-03-22-simple-sse-based-pubsub-server.md b/posts/2020-03-22-simple-sse-based-pubsub-server.md
new file mode 100644
index 0000000..56a7dfa
--- /dev/null
+++ b/posts/2020-03-22-simple-sse-based-pubsub-server.md
@@ -0,0 +1,398 @@
1---
2Title: Simple Server-Sent Events based PubSub Server
3Description: Simple Server-Sent Events based PubSub Server
4Slug: simple-server-sent-events-based-pubsub-server
5Listing: true
6Created: 2020, March 22
7Tags: []
8---
9
10## Before we continue ...
11
12The publisher/subscriber model is nothing new and there are many amazing solutions out there, so writing a new one would be a waste of time if the other solutions didn't have such complex install procedures and weren't so hard to maintain. But to be fair, comparing this simple server with something like [Kafka](https://kafka.apache.org/) or [RabbitMQ](https://www.rabbitmq.com/) is laughable at the least. Those solutions are enterprise grade and have many mechanisms to ensure messages aren't lost, and much more. Regardless of these drawbacks, this method has been tested on a large website and has worked so far without any problems. So now that we've got that cleared up, let's continue.
13
14***Wiki definition:** Publish/subscribe messaging, or pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all the subscribers to the topic.*
15
16## General goals
17
18- provide a simple server that relays messages to all the connected clients,
19- messages can be posted on specific topics,
20- messages get sent via [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) to all the subscribers.
21
22## How exactly does the pub/sub model work?
23
24The easiest way to explain this is with the diagram below. The basic function is simple. We have subscribers that receive messages, and we have publishers that create and post messages. A similar, well-known pattern works on the premise of consumers and producers, and they take similar roles.
25
26![How PubSub works](/assets/simple-pubsub-server/pubsub-overview.png)
27
28**These are some naive characteristics we want to achieve:**
29
30- producer publishes messages to a topic,
31- consumer receives messages from a subscribed topic,
32- server is also known as the broker,
33- broker does not store messages or track delivery success,
34- broker uses [FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) method for delivering messages,
35- if consumer wants to receive messages from a topic, producer and consumer topics must match,
36- consumer can subscribe to multiple topics,
37- producer can publish to multiple topics,
38- each message has a messageId.
39
40**Known drawbacks:**
41
42- messages are not stored in a persistent queue, and there is no store for unreceived messages such as a [DeadLetterQueue](https://en.wikipedia.org/wiki/Dead_letter_queue), so old messages could be lost on server restart,
43- [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) opens a long-running connection between the client and the server, so if your setup is load balanced make sure the load balancer allows long-lived open connections,
44- no system moderation due to the dynamic nature of creating queues.
45
46## Server-Sent Events
47
48Read more about it on [official specification page](https://html.spec.whatwg.org/multipage/server-sent-events.html).
49
50### Current browser support
51
52![Browser support](../assets/simple-pubsub-server/caniuse.png)
53
54Check [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource) for latest information about browser support.
55
56### Known issues
57
58- Firefox 52 and below do not support EventSource in web/shared workers
59- In Firefox prior to version 36 server-sent events do not reconnect automatically in case of a connection interrupt (bug)
60- Reportedly, CORS in EventSource is currently supported in Firefox 10+, Opera 12+, Chrome 26+, Safari 7.0+.
61- Antivirus software may block the event streaming data chunks.
62
63Source: [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource)
64
65### Message format
66
67The simplest message that can be sent contains only the data attribute:
68
69```bash
70data: this is a simple message
71<blank line>
72```
73
74You can send message IDs to be used if the connection is dropped:
75
76```bash
77id: 33
78data: this is line one
79data: this is line two
80<blank line>
81```
82
83And you can specify your own event types (the above messages will all trigger the message event):
84
85```bash
86id: 36
87event: price
88data: 103.34
89<blank line>
90```
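
The three message shapes above can be produced by one small serialiser. This is a minimal sketch of the wire format only; the function name `format_sse` and its parameters are hypothetical and unrelated to the `sse-pubsub` library used later.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               message_id: Optional[str] = None) -> str:
    """Serialise one Server-Sent Events message (hypothetical helper)."""
    lines = []
    if message_id is not None:
        lines.append(f"id: {message_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


print(format_sse("103.34", event="price", message_id="36"), end="")
```

Calling `format_sse("this is line one\nthis is line two", message_id="33")` yields the second example above, with two `data:` lines sharing one id.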
91
92### Server requirements
93
94The important part is which headers the server sends, because they are what triggers the browser to treat the response as an EventStream.
95
96Headers responsible for this are:
97
98```bash
99Content-Type: text/event-stream
100Cache-Control: no-cache
101Connection: keep-alive
102```
103
104### Debugging with Google Chrome
105
106Google Chrome provides a built-in debugging and exploration tool for [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events), which is quite nice and is available from Developer Tools under the Network tab.
107
108> You can debug only client side events that get received and not the server ones. For debugging server events add `console.log` to `server.js` code and print out events.
109
110![Google Chrome Developer Tools EventStream](../assets/simple-pubsub-server/chrome-debugging.png)
111
112## Server implementation
113
114For the sake of this example we will use [Node.js](https://nodejs.org/en/) with [Express](https://expressjs.com) as our router, since this is the easiest way to get started, and we will use an existing SSE library for Node, [sse-pubsub](https://www.npmjs.com/package/sse-pubsub), so we don't reinvent the wheel.
115
116```bash
117npm init --yes
118
119npm install express
120npm install body-parser
121npm install sse-pubsub
122```
123
124Basic implementation of a server (`server.js`):
125
126```js
127const express = require('express');
128const bodyParser = require('body-parser');
129const SSETopic = require('sse-pubsub');
130
131const app = express();
132const port = process.env.PORT || 4000;
133
134// topics container
135const sseTopics = {};
136
137app.use(bodyParser.json());
138
139// open for all cors
140app.all('*', (req, res, next) => {
141 res.header('Access-Control-Allow-Origin', '*');
142 res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type');
143 next();
144});
145
146// preflight request error fix
147app.options('*', async (req, res) => {
148 res.header('Access-Control-Allow-Origin', '*');
149 res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type');
150 res.send('OK');
151});
152
153// serve the event streams
154app.get('/stream/:topic', async (req, res, next) => {
155 const topic = req.params.topic;
156
157 if (!(topic in sseTopics)) {
158 sseTopics[topic] = new SSETopic({
159 pingInterval: 0,
160 maxStreamDuration: 15000,
161 });
162 }
163
164 // subscribing client to topic
165 sseTopics[topic].subscribe(req, res);
166});
167
168// accepts new messages into topic
169app.post('/publish', async (req, res) => {
170 let body = req.body;
171 let status = 200;
172
173 console.log('Incoming message:', req.body);
174
175 if (
176 body.hasOwnProperty('topic') &&
177 body.hasOwnProperty('event') &&
178 body.hasOwnProperty('message')
179 ) {
180 const topic = req.body.topic;
181 const event = req.body.event;
182 const message = req.body.message;
183
184 if (topic in sseTopics) {
185 // sends message to all the subscribers
186 sseTopics[topic].publish(message, event);
187 }
188 } else {
189 status = 400;
190 }
191
192 res.status(status).send({
193 status,
194 });
195});
196
197// returns JSON object of all opened topics
198app.get('/status', async (req, res) => {
199 res.send(sseTopics);
200});
201
202// health-check endpoint
203app.get('/', async (req, res) => {
204 res.send('OK');
205});
206
207// return a 404 if no routes match
208app.use((req, res, next) => {
209 res.set('Cache-Control', 'private, no-store');
210 res.status(404).end('Not found');
211});
212
213// starts the server
214app.listen(port, () => {
215 console.log(`PubSub server running on http://localhost:${port}`);
216});
217```
218
219### Our custom message format
220
221Each message posted to the server must be in the specific format that our server accepts. Having a structure like this allows us to have multiple separate types of events on each topic.
222
223With this we can separate streams and only receive events that belong to the topic.
224
225For example, say we have an index page and we want to receive messages about new upvotes or new subscribers, but we don't want to follow events for other pages. This reduces clutter and overall network traffic. And the structure is much nicer and more maintainable.
226
227```json
228{
229 "topic": "sample-topic",
230 "event": "sample-event",
231 "message": { "name": "John" }
232}
233```
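
A publish request in this format could be built from a script as well as from the browser. The sketch below uses only the Python standard library; the helper name `build_publish_request` is hypothetical, and the `/publish` path and port 4000 are taken from `server.js` above. It builds the request without sending it, since sending requires the server to be running.

```python
import json
import urllib.request


def build_publish_request(server, topic, event, message):
    """Build (but do not send) a POST request for the /publish endpoint."""
    payload = json.dumps({"topic": topic, "event": event, "message": message})
    return urllib.request.Request(
        f"{server}/publish",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_publish_request(
    "http://localhost:4000", "sample-topic", "sample-event", {"name": "John"}
)
print(req.get_method(), req.full_url)

# With the server running, the message would be sent with:
# urllib.request.urlopen(req)
```

The JSON content type matters here because the server parses the body with `bodyParser.json()` and answers with status 400 when `topic`, `event` or `message` is missing.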
234
235## Publisher and subscriber clients
236
237### Publisher and subscriber in action
238
239<video src="/assets/simple-pubsub-server/clients.mp4" controls></video>
240
241You can download [the code](../assets/simple-pubsub-server/sse-pubsub-server.zip) and follow along.
242
243### Publisher
244
245As discussed above, the publisher is the one that sends messages to the broker/server. The message inside the payload can be whatever you want (string, object, array). I would, however, personally avoid sending large chunks of data like blobs and such.
246
247```html
248<!DOCTYPE html>
249<html lang="en">
250
251 <head>
252 <meta charset="UTF-8">
253 <meta name="viewport" content="width=device-width, initial-scale=1.0">
254 <title>Publisher</title>
255 </head>
256
257 <body>
258
259 <h1>Publisher</h1>
260
261 <fieldset>
262 <p>
263 <label>Server:</label>
264 <input type="text" id="server" value="http://localhost:4000">
265 </p>
266 <p>
267 <label>Topic:</label>
268 <input type="text" id="topic" value="sample-topic">
269 </p>
270 <p>
271 <label>Event:</label>
272 <input type="text" id="event" value="sample-event">
273 </p>
274 <p>
275 <label>Message:</label>
276 <input type="text" id="message" value='{"name": "John"}'>
277 </p>
278 <p>
279 <button type="button" id="button">Publish message to topic</button>
280 </p>
281 </fieldset>
282
283 <script>
284
285 const button = document.querySelector('#button');
286 const server = document.querySelector('#server');
287 const topic = document.querySelector('#topic');
288 const event = document.querySelector('#event');
289 const message = document.querySelector('#message');
290
291 button.addEventListener('click', async (evt) => {
292 const req = await fetch(`${server.value}/publish`, {
293 method: 'post',
294 headers: {
295 'Accept': 'application/json',
296 'Content-Type': 'application/json',
297 },
298 body: JSON.stringify({
299 topic: topic.value,
300 event: event.value,
301 message: JSON.parse(message.value),
302 }),
303 });
304
305 const res = await req.json();
306 console.log(res);
307 });
308
309 </script>
310
311 </body>
312
313</html>
314
315```
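
One thing to keep in mind: the publisher above passes the message field through `JSON.parse`, so a plain string like `hello` throws before anything is sent. Below is a small sketch of a more forgiving parser, assuming you want non-JSON input to be published as a plain string. `parseMessage` is a hypothetical helper and not part of the downloadable code.

```js
// Hypothetical helper: accept JSON (objects, arrays, numbers, quoted strings)
// and fall back to the raw text when the input is not valid JSON.
const parseMessage = (raw) => {
  try {
    return JSON.parse(raw);
  } catch (err) {
    return raw;
  }
};

console.log(parseMessage('{"name": "John"}'));
console.log(parseMessage('hello'));
```

In the publisher you would then use `message: parseMessage(message.value)` instead of calling `JSON.parse` directly.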
316
317### Subscriber
318
319The subscriber is responsible for receiving new messages that come from the server via the publisher. The code below is very rudimentary, but it works and follows the implementation guidelines for EventSource.
320
321You can either use the Developer Tools Console to see incoming messages or refer to the Debugging with Google Chrome section above to see all EventStream messages.
322
323> Don't be alarmed if the subscriber gets disconnected from the server every so often. The code we have here resets the connection every 15s, but the client automatically reconnects and fetches the messages it missed since the last received message id. This setting can be adjusted in the `server.js` file; search for the `maxStreamDuration` variable.
324
325```html
326<!DOCTYPE html>
327<html lang="en">
328
329 <head>
330 <meta charset="UTF-8">
331 <meta name="viewport" content="width=device-width, initial-scale=1.0">
332 <title>Subscriber</title>
333 <link rel="stylesheet" href="style.css">
334 </head>
335
336 <body>
337
338 <h1>Subscriber</h1>
339
340 <fieldset>
341 <p>
342 <label>Server:</label>
343 <input type="text" id="server" value="http://localhost:4000">
344 </p>
345 <p>
346 <label>Topic:</label>
347 <input type="text" id="topic" value="sample-topic">
348 </p>
349 <p>
350 <label>Event:</label>
351 <input type="text" id="event" value="sample-event">
352 </p>
353 <p>
354 <button type="button" id="button">Subscribe to topic</button>
355 </p>
356 </fieldset>
357
358 <script>
359
360 const button = document.querySelector('#button');
361 const server = document.querySelector('#server');
362 const topic = document.querySelector('#topic');
363 const event = document.querySelector('#event');
364
365 button.addEventListener('click', async (evt) => {
366
367 let es = new EventSource(`${server.value}/stream/${topic.value}`);
368
369 es.addEventListener(event.value, function (evt) {
370 console.log(`incoming message`, JSON.parse(evt.data));
371 });
372
373 es.addEventListener('open', function (evt) {
374 console.log('connected', evt);
375 });
376
377 es.addEventListener('error', function (evt) {
378 console.log('error', evt);
379 });
380
381 });
382
383 </script>
384
385 </body>
386
387</html>
388
389```
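
For reference, this is roughly what travels over the wire. The sketch below formats one message as a `text/event-stream` frame using the standard `id`, `event` and `data` fields. The helper name `formatSseFrame` is an assumption for illustration and may not match how `server.js` builds its frames.

```js
// Hypothetical helper: serializes one message into the text/event-stream wire format.
const formatSseFrame = ({ id, event, message }) => {
  const lines = [
    `id: ${id}`, // sent back by the browser as Last-Event-ID on reconnect
    `event: ${event}`, // must match the name passed to es.addEventListener()
    `data: ${JSON.stringify(message)}`, // JSON.stringify never emits raw newlines
  ];
  // A blank line terminates the frame.
  return lines.join('\n') + '\n\n';
};

console.log(formatSseFrame({ id: 1, event: 'sample-event', message: { name: 'John' } }));
```

The `event` field is why the subscriber registers its listener under `event.value`: frames with a different event name on the same topic are simply not delivered to that listener.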
390
391## Reading further
392
393- [Using server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events)
394- [Using SSE Instead Of WebSockets For Unidirectional Data Flow Over HTTP/2](https://www.smashingmagazine.com/2018/02/sse-websockets-data-flow-http2/)
395- [What is Server-Sent Events?](https://apifriends.com/api-streaming/server-sent-events/)
396- [An HTTP/2 extension for bidirectional messaging communication](https://tools.ietf.org/id/draft-xie-bidirectional-messaging-01.html)
397- [Introduction to HTTP/2](https://developers.google.com/web/fundamentals/performance/http2)
398- [The WebSocket API (WebSockets)](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)
diff --git a/posts/2020-03-27-create-placeholder-images-with-sharp.md b/posts/2020-03-27-create-placeholder-images-with-sharp.md
new file mode 100644
index 0000000..ef035c9
--- /dev/null
+++ b/posts/2020-03-27-create-placeholder-images-with-sharp.md
@@ -0,0 +1,85 @@
1---
2Title: Create placeholder images with sharp Node.js image processing library
3Description: Create placeholder images with sharp Node.js image processing library
4Slug: create-placeholder-images-with-sharp
5Listing: true
6Created: 2020, March 27
7Tags: []
8---
9
10I have been searching for a solution to pre-generate some placeholder images for an image server I needed to develop that resizes images on S3. I thought this would be a 15min job and quickly found out how very mistaken I was.
11
12Even though Node.js is not really the best way to do this kind of thing (surely something written in C or Rust or even Golang would be the correct way to do this, but we didn't need the speed in our case), I found an excellent library: [sharp - High performance Node.js image processing](https://github.com/lovell/sharp).
13
14Getting things running was a breeze.
15
16## Fetch image from S3 and save resized
17
18```js
19const sharp = require('sharp');
20const aws = require('aws-sdk');
21
22const x = 100, y = 100;
23
24aws.config.update({
25 secretAccessKey: 'secretAccessKey',
26 accessKeyId: 'accessKeyId',
27 region: 'region'
28});
29// Create the client after the config update so it picks up the credentials.
30const s3 = new aws.S3({});
31const originalImage = await s3.getObject({
32 Bucket: 'some-bucket-name',
33 Key: 'image.jpg',
34}).promise();
35
36const resizedImage = await sharp(originalImage.Body)
37 .resize(x, y)
38 .jpeg({ progressive: true })
39 .toBuffer();
40
41await s3.putObject({
42 Bucket: 'some-bucket-name',
43 Key: `optimized/${x}x${y}/image.jpg`,
44 Body: resizedImage,
45 ContentType: 'image/jpeg',
46 ACL: 'public-read'
47}).promise();
48```
49
50All this code was wrapped inside a web service with some additional security checks and defensive coding to detect if a key is missing on S3.
51
52At that point I needed to return placeholder images as a response in case the key is missing, x,y are not allowed by the server, etc. I could have created PNGs in Gimp and just served them, but I wanted to respect the aspect ratio and I didn't want to return mangled images.
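
A minimal sketch of the kind of size check meant here, assuming the server keeps a fixed whitelist of dimensions. The list `ALLOWED_SIZES` and the helper `isAllowedSize` are hypothetical and only illustrate the idea.

```js
// Hypothetical whitelist of sizes the server is willing to generate.
const ALLOWED_SIZES = [[100, 100], [200, 150], [640, 480]];

// Returns true only for integer dimensions that appear in the whitelist.
const isAllowedSize = (x, y, allowed = ALLOWED_SIZES) =>
  Number.isInteger(x) && Number.isInteger(y) &&
  allowed.some(([width, height]) => width === x && height === y);

console.log(isAllowedSize(100, 100)); // true
console.log(isAllowedSize(101, 100)); // false
```

When the check fails, the service can respond with a placeholder of the requested aspect ratio instead of resizing anything.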
53
54> Finding a clean solution I could copy, paste and change a bit was quite a task. The API is changing constantly and there weren't clear examples, or I was unable to find them.
55
56## Generating placeholder images using SVG
57
58What I ended up doing was using SVG to generate the text, creating the image with sharp and using composition to combine both layers. The response returned by this function is a buffer you can either upload to S3 or save to a local file.
59
60```js
61const generatePlaceholderImageWithText = async (width, height, message) => {
62 const overlay = `<svg width="${width - 20}" height="${height - 20}">
63 <text x="50%" y="50%" font-family="sans-serif" font-size="16" text-anchor="middle">${message}</text>
64 </svg>`;
65
66 return await sharp({
67 create: {
68 width: width,
69 height: height,
70 channels: 4,
71 background: { r: 230, g: 230, b: 230, alpha: 1 }
72 }
73 })
74 .composite([{
75 input: Buffer.from(overlay),
76 gravity: 'center',
77 }])
78 .jpeg()
79 .toBuffer();
80}
81```
82
83That is about it. Nothing more to it. You can change the color of the image by changing `background` and if you want to change text styling you can adapt SVG to your needs.
84
85> Also be careful about the length of the text. This function positions the text at the center and, because the SVG overlay is `20px` smaller than the image in each dimension, leaves `10px` of padding on each side. If the text is longer than the image, it will get cut.
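
Because the message is interpolated straight into the SVG markup, characters such as `&` or `<` would produce invalid SVG. A small sketch of escaping the text first, where `escapeXml` is a hypothetical helper and not part of sharp:

```js
// Hypothetical helper: escapes characters that have a special meaning in XML/SVG.
const escapeXml = (text) => String(text)
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;')
  .replace(/'/g, '&apos;');

console.log(escapeXml('Image <not> found & "missing"'));
```

Inside the function above you would then interpolate `${escapeXml(message)}` instead of `${message}`.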
diff --git a/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md b/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
new file mode 100644
index 0000000..7d70a7d
--- /dev/null
+++ b/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
@@ -0,0 +1,78 @@
1---
2Title: The strange case of Elasticsearch allocation failure
3Description: Elasticsearch allocation failure on some indices while reporting domain processing
4Slug: the-strange-case-of-elasticsearch-allocation-failure
5Listing: true
6Created: 2020, March 29
7Tags: []
8---
9
10I've been using Elasticsearch in production for 5 years now and never had a single problem with it. Hell, I never even knew there could be a problem. It just worked. All this time. The first node that I deployed is still being used in production, never updated, upgraded or touched in any way.
11
12All this bliss came to an abrupt end this Friday when I got a notification that the Elasticsearch cluster went yellow. Well, yellow is not that bad, right? Wrong! Quickly after that I got another email which sent chills down my spine. Cluster is now red. RED! Now, shit really hit the fan!
13
14I tried googling what could be the problem and, after executing the allocation request, noticed that some shards were unassigned and 5 attempts had already been made (which is, to my luck, the maximum), and that meant I am basically fucked. The answers I found also advised waiting for the cluster to re-balance itself. So, I waited. One hour, two hours, several hours. Nothing, still RED.
15
16The strangest thing about it all was that queries were still being fulfilled. Data was coming out. On the outside it looked like nothing was wrong, but anybody who looked at the cluster would know immediately that something was very, very wrong and we were living on borrowed time here.
17
18> **Please, DO NOT do what I did.** Seriously! Please ask someone on the official forums or, if you know an expert, please consult them. There could be a million reasons and this solution fit my problem. Maybe in your case it would be disastrous. I had all the data backed up and even if I failed spectacularly I would be able to restore the data. It would be a huge pain and I would lose a couple of days, but I had a plan B.
19
20Executing the allocation request told me what the problem was, but gave no clear solution yet.
21
22```yaml
23GET /_cat/allocation?format=json
24```
25
26I got `ALLOCATION_FAILED` with the additional info `failed to create shard, failure ioexception[failed to obtain in-memory shard lock]`. Well, splendid! I must also say that our cluster is more than capable of handling the traffic. Also, JVM memory pressure was never an issue. So what really happened then?
27
28I also tried re-routing the failed shards, with no success due to AWS restrictions on managed Elasticsearch clusters (they lock some of the functions).
29
30```yaml
31POST /_cluster/reroute?retry_failed=true
32```
33
34I got a message that significantly reduced my options.
35
36```json
37{
38 "Message": "Your request: '/_cluster/reroute' is not allowed."
39}
40```
41
42After that I went on a hunt again. I won't bother you with all the details because hours/days went by until I was finally able to re-index the problematic index and hoped for the best. Until that moment even re-indexing was giving me errors.
43
44```yaml
45POST _reindex
46{
47 "source": {
48 "index": "myindex"
49 },
50 "dest": {
51 "index": "myindex-new"
52 }
53}
54```
55
56I needed to do this multiple times to get all the documents re-indexed. Then I dropped the original index with the following command.
57
58```yaml
59DELETE /myindex
60```
61
62And then I re-indexed the new index back into the original one (well, original by name only).
63
64```yaml
65POST _reindex
66{
67 "source": {
68 "index": "myindex-new"
69 },
70 "dest": {
71 "index": "myindex"
72 }
73}
74```
75
76On the surface it looks like all is working, but I have a long road in front of me to get everything working again. The cluster now shows that it is in Green mode, but I am also getting a notification that the cluster has processing status, which could mean a million things.
77
78Godspeed!
diff --git a/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md b/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md
new file mode 100644
index 0000000..70e0f51
--- /dev/null
+++ b/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md
@@ -0,0 +1,43 @@
1---
2Title: My love and hate relationship with Node.js
3Description: How I found a way to love and hate Node.js with a passion
4Slug: my-love-and-hate-relationship-with-nodejs
5Listing: true
6Created: 2020, March 30
7Tags: []
8---
9
10The previous project I was working on was coded in [Golang](https://golang.org/). It was also my first project using it. And damn, that was an awesome experience. The whole thing is just superb. From how errors are handled. The C-like way you handle compiling. The way the language is structured, making it incredibly versatile and easy to learn.
11
12It may cause some pain for somebody who is not used to using interfaces to map JSON and recompiling all the time. But we have tools like [entr](http://eradman.com/entrproject/) and [make](https://www.gnu.org/software/make/) to fix that.
13
14But we are not here to talk about my undying love for **Golang**. Only in some way we probably should. It is an excellent example of how modern language should be designed. And because I have used it extensively in the last couple of years this probably taints my views of other languages. And is doing me a great disservice. Nevertheless, here we are.
15
16About two years ago I started flirting with [Node.js](https://nodejs.org/en/) for a project I started working on. What I wanted was to have things written in a language that is widely used and that we could get additional developers for. As much as **Golang** is amazing, it's really hard to get developers for it. Even now. And after playing around with it for a week I fell in love with the speed of iteration and massive package ecosystem. Do you want SSO? You got it! Do you want some esoteric library for something? There is a strong chance somebody wrote it. It is so extensive that you find yourself evaluating packages based on **GitHub stars** and number of contributors. You get swallowed by the vanity metrics and that potentially will become the downfall of Node.js.
17
18Because of the sheer amount of choice I often got anxiety when choosing libraries. Will I choose the correct one? Is this library something that will be supported for the foreseeable future or not? I am used to using libraries that have been in development for 10-plus years (Python, C) and that gave me some sort of comfort. And it is probably unfair to Node.js and its community to expect the same dedication.
19
20Moving forward ... Work started and things were great. **Speed of iteration was insane**. A feature that would take me a day in Golang only took me an hour or two. I became lazy! Using packages all over the place. Falling into the same trap as others. Packages on top of packages. And [npm](https://www.npmjs.com/) didn't help at all. The way that the package manager works is just horrendous. And not allowing node_modules to live outside the project is also the stupidest idea ever.
21
22So at that point I started feeling the technical debt that comes with Node.js and the whole ecosystem. What nobody tells you is that **structuring large Node.js apps** is more problematic than one would think. And going microservice for every single thing is also a bad idea. The amount of networking you introduce with that approach always ends up being a pain in the ass. And I don't even want to go into system administration here. The overhead is insane. Package-lock.json made many days feel like living hell for me. And I would eat the cost of all this if it meant a better development experience. Well, it didn't.
23
24The **lack of Typescript** support in the interpreter is still mind-boggling to me. Why they haven't added native support for this yet is beyond me?! That would have solved so many problems. Lack of type safety became a problem somewhere in the middle of the project, when the codebase was large enough to present problems. We started adding arguments to functions and there was **no way to explicitly define argument types**. And because at that point there were a lot of functions, it became impossible to know what each one accepts, and development became more and more trial-and-error based.
25
26I tried **implementing Typescript**, but that would present a large refactor that we were not willing to do at that point. The benefits were not enough. I also tried [Flow - static type checker](https://flow.org/) but implementation was also horrible. What Typescript and Flow force you to do is have a src folder and then **transpile** your code into a dist folder and run it with node. WTH is that all about. Why can't this be done in memory or some virtual file system? Why? I see no reason why this couldn't be done like this. But it is what it is. I abandoned all hope for static type checking.
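
For what it's worth, one approach that avoids the `src`/`dist` split is to keep plain JavaScript and describe argument types in JSDoc comments, which the TypeScript compiler can check without emitting any files (for example with its `checkJs` and `noEmit` options). This is only a sketch of the idea; the function `buildQuery` and its parameters are made up for illustration.

```js
// @ts-check

/**
 * Builds a search query for a page of results.
 * @param {string} index - name of the index to query
 * @param {number} page - zero-based page number
 * @param {number} [size] - results per page
 * @returns {{ index: string, from: number, size: number }}
 */
const buildQuery = (index, page, size = 20) => {
  return { index, from: page * size, size };
};

console.log(buildQuery('myindex', 2));
```

The file still runs directly with node; the annotations are ordinary comments, so nothing has to be transpiled.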
27
28One of the problems that resulted from not having interfaces or types was inability to model out our data from **Elasticsearch**. I could have done a **pedestrian implementation** of it, but there must be a better way of doing this without resorting to some hack basically. Or maybe I haven't found a solution, which is also a possibility. I have looked, though. No juice!
29
30**Error handling?** Is that a joke?
31
32Thank god for **await/async**. Without it, I would have probably just abandoned the whole thing and gone with something else like Python. That's all I am going to say about this :)
33
34I started asking myself whether Node.js is actually ready to be used in **large scale applications**. And this was totally the wrong question. What I should have been asking myself was how to use Node.js in a large scale application. And you don't get this in **marketing material** for Express or Koa etc. They never tell you this. Making Node.js scale on infrastructure or in codebase is really **more of an art than a science**. And just like with the whole JavaScript ecosystem:
35- impossible to master,
36- half of your time you work on your tooling,
37- just accept transpilers that convert one code into another (holy smokes),
38- error handling is a joke,
39- standards? What standards?
40
41But on the other hand. As I did, you will also learn to love it. Learn to use it quickly and do impossible things in crazy limited time.
42
43I hate to admit it. But I love Node.js. Dammit, I love it :)
diff --git a/posts/2020-05-05-remote-work.md b/posts/2020-05-05-remote-work.md
new file mode 100755
index 0000000..1588dbe
--- /dev/null
+++ b/posts/2020-05-05-remote-work.md
@@ -0,0 +1,39 @@
1---
2Title: Remote work and how it affects the daily lives of people
3Description: Remote work and how it affects the daily lives of people
4Slug: remote-work
5Listing: true
6Created: 2020, May 5
7Tags: []
8---
9
10I have been working remotely for the past 5 years. I love it. I love the freedom and the whole make-your-own-schedule thing.
11
12## You work more not less
13
14I've heard things from people like: "Oh, you are so lucky, working from home, having all the free time you want". It was obvious they had no clue what working remotely means. They had this romantic idea of remote work. You can watch TV whenever you like, you can go outside for a picnic if you want and stuff like that.
15
16This may be true if you work a day or two a week from home. But if you go completely remote, all of this changes completely. It takes some time to acclimate, but then you start feeling the consequences of going fully remote. And it's not all rainbows and unicorns. Rather the opposite.
17
18## Feeling lost
19
20At first, I remember, I felt lost. I was not used to this kind of environment. I felt disoriented, and the part of you that is used to procrastinating turns on. You start thinking of a workday as a whole day. And soon this idea of "I can do this later" starts creeping in. Well, I have the whole day ahead of me. I can do this a bit later.
21
22## Hyper-performance
23
24As a direct result, you become more focused on your work since you don't have all the interruptions common in the workplace. And you can quickly get used to this hyper-performance. But this mode also requires a lot of peace and quiet.
25
26And here we come to the ugly parts of all this. **People rarely have the self-control** to not waste other people's time. It is paralyzing when people start calling you, sending you chat messages, etc. The thing is that when I achieve this hyper-performance mode I am completely absorbed in the problem I am solving, and this kind of interruption messes with your head. I need an hour at least to get back in the zone. Sometimes I don't reach the same focus for the rest of the day.
27
28I know that life is not always how you want it to be and takes its own route, but from what I've learned this kind of interruption can be easily avoided in 90% of cases just by closing any chat programs and putting your phone in a drawer.
29
30## Suggestion to all the new remote workers
31
32- Stop wasting other people's time. You don't bother people at their desks in the office either.
33- Do not replace daily chats in the hallways with instant messaging software. It will only interrupt people. Nothing good will come of it.
34- Set your working hours, try not to let work bleed outside these boundaries, and maintain your routine.
35- Be prepared that hours will be longer regardless of your good intentions and your well-thought-out routine.
36- Try to be hyper-focused and do only one thing at a time. Multitasking is the enemy of progress.
37- Avoid long meetings and, if possible, eliminate them. Rather, take the time to write things out and allow others to respond in their own time. Meetings are usually a large waste of time and most of the people attending them are there just because the manager said so.
38- Software will not solve your problems. And neither will throwing money at them.
39- If you are in a managerial position, don't supervise every single minute of your workers' time. They are probably giving you more hours anyway. Track progress weekly, not daily. You hired them, so give them the benefit of the doubt that they will deliver what you agreed upon.
diff --git a/posts/2020-08-15-systemd-disable-wake-onmouse.md b/posts/2020-08-15-systemd-disable-wake-onmouse.md
new file mode 100644
index 0000000..f4ac0ee
--- /dev/null
+++ b/posts/2020-08-15-systemd-disable-wake-onmouse.md
@@ -0,0 +1,49 @@
1---
2Title: Disable mouse wake from suspend with systemd service
3Description: Disable mouse wake from suspend with systemd service
4Slug: disable-mouse-wake-from-suspend-with-systemd-service
5Listing: true
6Created: 2020, August 15
7Tags: []
8---
9
10I recently bought a [ThinkPad X220](https://www.laptopmag.com/reviews/laptops/lenovo-thinkpad-x220) on eBay, just as a joke, to test Linux distributions and play around with things without destroying my main machine. Little did I know I would fall in love with it. Man, they really made awesome machines back then.
11
12After swapping the disk that came with it for an SSD and installing Ubuntu to test if everything works, I noticed that even a single touch of my external mouse would wake the system from sleep, even though the lid was closed.
13
14I wouldn't even have noticed it if the laptop didn't have an [LED sleep indicator](https://support.lenovo.com/lk/en/solutions/~/media/Images/ContentImages/p/pd025386_x1_status_03.ashx?w=426&h=262). I already had a bad experience with Linux and its power management. I had a [Dell Inspiron 7537](https://www.pcmag.com/reviews/dell-inspiron-15-7537) laptop with a touchscreen, and while traveling it decided to wake up and started cooking in my backpack to the point that the digitizer responsible for touch actually came unglued and the whole screen got wrecked. So, I am a bit touchy about this.
15
16I went solution hunting and, to my surprise, there is no easy way to stop specific devices from waking the machine. Why this is not under the power management tab in settings is really strange.
17
18After googling for a solution I found [this nice article describing the solution](https://codetrips.com/2020/03/18/ubuntu-disable-mouse-wake-from-suspend/) that worked for me. The only problem was that he added his solution to `.bashrc`, which triggers `sudo` and asks for a password each time a new terminal is opened. That gets annoying quickly since I open a lot of terminals all the time.
19
20I followed his instructions and arrived at the command `sudo sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup"`.
21
22I created a systemd service file with `sudo nano /etc/systemd/system/disable-mouse-wakeup.service`, removed `sudo`, replaced `sh` with `/usr/bin/sh` and pasted all that into `ExecStart`.
23
24```ini
25[Unit]
26Description=Disables wakeup on mouse event
27After=network.target
28StartLimitIntervalSec=0
29
30[Service]
31Type=simple
32Restart=always
33RestartSec=1
34User=root
35ExecStart=/usr/bin/sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup"
36
37[Install]
38WantedBy=multi-user.target
39```
40
41After that I enabled and started the service, and checked its status.
42
43```sh
44sudo systemctl enable disable-mouse-wakeup.service
45sudo systemctl start disable-mouse-wakeup.service
46sudo systemctl status disable-mouse-wakeup.service
47```
48
49This will permanently stop that device from waking up your computer, since the service runs on every boot. If you have many devices you would like to suppress from waking up your machine, I would create a shell script and call that instead of directly doing it in the service file.
diff --git a/posts/2020-09-06-esp-and-micropython.md b/posts/2020-09-06-esp-and-micropython.md
new file mode 100644
index 0000000..1052795
--- /dev/null
+++ b/posts/2020-09-06-esp-and-micropython.md
@@ -0,0 +1,205 @@
1---
2Title: Getting started with MicroPython and ESP8266
3Description: Getting started with MicroPython and ESP8266
4Slug: esp8266-and-micropython-guide
5Listing: true
6Created: 2020, September 6
7Tags: []
8---
9
10**Table of contents**
11
121. [Introduction](#introduction)
132. [Flashing the SOC](#flashing-the-soc)
143. [Install better tooling](#install-better-tooling)
154. [Additional resources](#additional-resources)
16
17
18## Introduction
19
20A while ago I bought some [ESP8266](https://www.espressif.com/en/products/socs/esp8266) and [ESP32](https://www.espressif.com/en/products/socs/esp32) dev boards to play around with and I finally found a project to try them out.
21
22For my project, I used [ESP32](https://www.espressif.com/en/products/socs/esp32) but I could just as easily have chosen [ESP8266](https://www.espressif.com/en/products/socs/esp8266). This guide describes which tools I use and how I prepared my workspace to code for [ESP8266](https://www.espressif.com/en/products/socs/esp8266).
23
24![ESP8266 and ESP32 boards](/assets/esp8366-micropython/boards.jpg)
25
26This guide covers:
27- flashing the SOC
28- installing proper tooling
29- deploying a simple script
30
31> Make sure that you are using **a good USB cable**. I had some problems with mine and once I replaced it everything started to work.
32
33## Flashing the SOC
34
35Plug your ESP8266 into a USB port and check if the device was recognized by executing `dmesg | grep ch341-uart`.
36
37Then check if the device is available under `/dev/` by running `ls /dev/ttyUSB*`.
38
39> **Linux users**: if the device is not available, make sure you are in the `dialout` group. You can check this by executing `groups $USER`. You can add a user to the `dialout` group with `sudo adduser $USER dialout`.
40
41After these conditions are met, navigate to [https://micropython.org/download/esp8266/](https://micropython.org/download/esp8266/) and download `esp8266-20200902-v1.13.bin`.
42
43```sh
44mkdir esp8266-test
45cd esp8266-test
46
47wget https://micropython.org/resources/firmware/esp8266-20200902-v1.13.bin
48```
49
50After obtaining firmware we will need some tooling to flash the firmware to the board.
51
52```sh
53sudo pip3 install esptool
54```
55
56You can read more about `esptool` at [https://github.com/espressif/esptool/](https://github.com/espressif/esptool/).
57
58Before flashing the firmware we need to erase the flash on device. Substitute `USB0` with the device listed in output of `ls /dev/ttyUSB*`.
59
60```sh
61esptool.py --port /dev/ttyUSB0 erase_flash
62```
63
64If flash was successfully erased it is now time to flash the new firmware to it.
65
66```sh
67esptool.py --port /dev/ttyUSB0 --baud 460800 write_flash --flash_size=detect 0 esp8266-20200902-v1.13.bin
68```
69
70If everything went ok you can try accessing MicroPython REPL with `screen /dev/ttyUSB0 115200` or `picocom /dev/ttyUSB0 -b115200`.
71
72> Sometimes you will need to press `ENTER` in `screen` or `picocom` to access REPL.
73
74When you are in the REPL you can test if everything is working properly with the following steps.
75
76```py
77>>> import machine
78>>> machine.freq()
79```
80
81This should output a number representing a frequency of the CPU (mine was `80000000`).
82
83When you are in `screen` or `picocom`, these key bindings can help you a bit.
84
85| Key | Command |
86| -------- | -------------------- |
87| CTRL+d | performs soft reboot |
88| CTRL+a x | exits picocom |
89| CTRL+a \ | exits screen |
90
91
92## Install better tooling
93
94Now, to make our lives a little bit easier, there are a couple of additional tools that will make this whole experience a little more bearable.
95
96There are two cool ways of uploading local files to SOC flash.
97
98- ampy → [https://github.com/scientifichackers/ampy](https://github.com/scientifichackers/ampy)
99- rshell → [https://github.com/dhylands/rshell](https://github.com/dhylands/rshell)
100
101### ampy
102
103```bash
104# installing ampy
105sudo pip3 install adafruit-ampy
106```
107
108Listed below are some common commands I used.
109
110```bash
111
112# uploads file to flash
113ampy --delay 2 --port /dev/ttyUSB0 put boot.py
114
115# lists file on flash
116ampy --delay 2 --port /dev/ttyUSB0 ls
117
118# outputs contents of file on flash
119ampy --delay 2 --port /dev/ttyUSB0 cat boot.py
120```
121
122> I added `delay` of 2 seconds because I had problems with executing commands.
123
124### rshell
125
126Even though `ampy` is a cool tool, I opted for `rshell` in the end since it's much more polished and feature rich.
127
128```bash
129# installing rshell
130sudo pip3 install rshell
131```
132
133Now that `rshell` is installed we can connect to the board.
134
135```bash
136rshell --buffer-size=30 -p /dev/ttyUSB0 -a
137```
138
139This will open a shell inside bash and from here you can execute multiple commands. You can check what is supported with `help` once you are inside of a shell.
140
141```bash
142m@turing ~/Junk/esp8266-test
143$ rshell --buffer-size=30 -p /dev/ttyUSB0 -a
144
145Using buffer-size of 30
146Connecting to /dev/ttyUSB0 (buffer-size 30)...
147Trying to connect to REPL connected
148Testing if ubinascii.unhexlify exists ... Y
149Retrieving root directories ... /boot.py/
150Setting time ... Sep 06, 2020 23:54:28
151Evaluating board_name ... pyboard
152Retrieving time epoch ... Jan 01, 2000
153Welcome to rshell. Use Control-D (or the exit command) to exit rshell.
154/home/m/Junk/esp8266-test> help
155
156Documented commands (type help <topic>):
157========================================
158args cat connect date edit filesize help mkdir rm shell
159boards cd cp echo exit filetype ls repl rsync
160
161Use Control-D (or the exit command) to exit rshell.
162```
163
164> Inside the shell, `ls` displays the list of files on your machine. The flash is mapped to the `/pyboard` folder inside the shell, so to list files on flash you must run `ls /pyboard`.
165
166#### Moving files to flash
167
168To avoid copying files all the time I used `rsync` function from the inside of `rshell`.
169
170```bash
171rsync . /pyboard
172```
173
174#### Executing scripts
175
176It is a pain to continuously reboot the device to trigger `/pyboard/boot.py` and there is a better way of testing local scripts on remote device.
177
178Let's assume we have a `src/freq.py` file that displays the CPU frequency of a remote device.
179
180```py
181# src/freq.py
182
183import machine
184print(machine.freq())
185```
186
187Now let's upload this and execute it.
188
189```bash
190# syncs files to remote device
191rsync ./src /pyboard
192
193# goes into REPL
194repl
195
196# importing the file without the .py extension runs the script
197> import freq
198
199# CTRL+x will exit REPL
200```
201
202## Additional resources
203
204- [https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/](https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/)
205- [http://docs.micropython.org/en/latest/esp8266/quickref.html](http://docs.micropython.org/en/latest/esp8266/quickref.html)
diff --git a/posts/2020-09-08-bind-warning-on-login.md b/posts/2020-09-08-bind-warning-on-login.md
new file mode 100644
index 0000000..2ccc3c6
--- /dev/null
+++ b/posts/2020-09-08-bind-warning-on-login.md
@@ -0,0 +1,42 @@
1---
2Title: Fix bind warning in .profile on login in Ubuntu
3Description: Fix bind warning in .profile on login in Ubuntu
4Slug: bind-warning-on-login-in-ubuntu
5Listing: true
6Created: 2020, September 8
7Tags: []
8---
9
10Recently I moved back to [bash](https://www.gnu.org/software/bash/) as my default shell. I was previously using [fish](https://fishshell.com/) and got used to the cool features it has. But, regardless of that, I wanted to move to a more standard shell because I was hopping back and forth with exporting variables and stuff like that which got pretty annoying.
11
12So I embarked on a mission to make [bash](https://www.gnu.org/software/bash/) more like [fish](https://fishshell.com/) and in the process found that I really missed autosuggest with TAB on changing directories.
13
14I found a nice alternative that emulates [zsh](http://zsh.sourceforge.net/) like autosuggestion and autocomplete so I added the following to my `.bashrc` file.
15
16```bash
17bind "TAB:menu-complete"
18bind "set show-all-if-ambiguous on"
19bind "set completion-ignore-case on"
20bind "set menu-complete-display-prefix on"
21bind '"\e[Z":menu-complete-backward'
22```
23
24I didn't notice anything wrong with this and everything worked fine until I restarted my machine and got this error.
25
26![Profile bind error](/assets/profile-bind-error/error.jpg)
27
28When I pressed OK, I got into the [Gnome shell](https://wiki.gnome.org/Projects/GnomeShell) and all was working fine, but the error was still bugging me. I started looking for the reason why this was happening and found a solution in [Remote SSH Commands - bash bind warning: line editing not enabled](https://superuser.com/a/892682).
29
30So I added a simple `if [ -t 1 ]` around the `bind` statements. The test succeeds only when standard output is attached to a terminal, which avoids running commands that presume the session is interactive when it isn't.
31
32```bash
33if [ -t 1 ]; then
34 bind "TAB:menu-complete"
35 bind "set show-all-if-ambiguous on"
36 bind "set completion-ignore-case on"
37 bind "set menu-complete-display-prefix on"
38 bind '"\e[Z":menu-complete-backward'
39fi
40```
41
42After logging out and back in the problem was gone.
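
A slightly stricter variant of the same guard, as a minimal sketch: it also checks the shell's own option flags in `$-`, which contain `i` when the shell is interactive. The helper name `is_interactive` is hypothetical and not part of bash.

```bash
# Sketch: succeeds only when the shell is interactive AND stdout is a terminal.
# `is_interactive` is a made-up helper name used for illustration.
is_interactive() {
  case $- in
    *i*) [ -t 1 ] ;;   # interactive shell: additionally require a terminal on stdout
    *) return 1 ;;     # non-interactive shell (scripts, display manager login)
  esac
}

if is_interactive; then
  bind "TAB:menu-complete"
fi
```

The remaining `bind` lines would go inside the same `if` block.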
diff --git a/posts/2020-09-09-digitalocean-sync.md b/posts/2020-09-09-digitalocean-sync.md
new file mode 100644
index 0000000..eeaf096
--- /dev/null
+++ b/posts/2020-09-09-digitalocean-sync.md
@@ -0,0 +1,66 @@
1---
2Title: Using Digitalocean Spaces to sync between computers
3Description: Using Digitalocean Spaces to sync between computers
4Slug: digitalocean-spaces-to-sync-between-computers
5Listing: true
6Created: 2020, September 9
7Tags: []
8---
9
10I've been using [Dropbox](https://www.dropbox.com/) for probably **10+ years** now and I've become so used to it running in the background that I can't even imagine a world without it. But it's not without problems.
11
12At first I had problems with `.venv` environments for Python, and the only way to exclude this folder from synchronization was to exclude each specific folder manually, which is not really scalable. FYI, my whole project folder is synced on [Dropbox](https://www.dropbox.com/). This of course introduced a lot of syncing of files and folders that are not needed or even break things on other machines. In the case of **Python**, I couldn't use the synced environment on my second machine. I needed to delete the `.venv` folder and install the packages with pip again, which synced the files back to the main machine. This was very frustrating. **Nodejs** handles this much more nicely and I can just run the scripts without deleting `node_modules` and reinstalling. However, `node_modules` is a beast of its own. It creates so many files that the OS has a problem counting them when you check the folder contents for size.
13
14I wanted something similar to Dropbox. I could do without the instant syncing, but it would need to be fast and give me the option to exclude folders like `node_modules`, `.venv`, `.git` and the like.
15
16I went on a hunt for an alternative to [Dropbox](https://www.dropbox.com/) and found:
17
18- [Tresorit](https://tresorit.com/)
19- [Sync.com](https://sync.com)
20- [Box](https://www.box.com/)
21
22You know, the usual list of suspects. I didn't include [Google drive](https://drive.google.com) or [One drive](https://onedrive.live.com/) since they are even more draconian than Dropbox.
23
24> All this does not stem from me being paranoid, but recently these companies have become more and more aggressive and they keep violating our privacy when they share our data with 3rd party services. It is getting out of control.
25
26So, my main problem was still there. No way of excluding a specific folder from syncing. And before we go into "*But you have git, isn't that enough?*", I must say, that many of the files (PDFs, spreadsheets, etc) I have in a `git` repo don't get pushed upstream to Git and I still want to have them synced across my computers.
27
28I initially wanted to use [rsync](https://linux.die.net/man/1/rsync) but I would need to then have a remote VPS or transfer between my computers directly. I wanted a solution where all my files could be accessible to me without my machine.
29
30> **WARNING: This solution will cost you money!** DigitalOcean Spaces are $5 per month and there are some bandwidth limitations and if you go beyond that you get billed additionally.
31
32Then I remembered that I could use something like [S3](https://en.wikipedia.org/wiki/Amazon_S3) since it has versioning and is fully managed. I didn't want to go down the AWS rabbit hole with this so I chose [DigitalOcean Spaces](https://www.digitalocean.com/products/spaces/).
33
34Then I needed a command-line tool to sync between source and target. I found this nice tool [s3cmd](https://s3tools.org/s3cmd) and it is in the Ubuntu repositories.
35
36```bash
37sudo apt install s3cmd
38```
39
40After installation I created a new Spaces bucket on DigitalOcean. Remember the region you choose because you will need it when you configure `s3cmd`.
41
42Then I visited [DigitalOcean Applications & API](https://cloud.digitalocean.com/account/api/tokens) and generated **Spaces access keys**. Save both the key and the secret somewhere safe, because once you leave the page the secret will no longer be shown and you will need to regenerate it.
43
44```bash
45# enter your key and secret and correct endpoint
46# my endpoint is ams3.digitaloceanspaces.com because
47# I created my bucket in the Amsterdam region
48s3cmd --configure
49```
50After that I played around with options for `s3cmd` and got to the following command.
51
52```bash
53# I executed this command from my projects folder
54cd projects
55s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' ./ s3://my-bucket-name/projects/
56```
57
58When syncing in the other direction you will need to swap the order of the `SOURCE` and `TARGET` to `s3://my-bucket-name/projects/` and `./`.
59
60> Be sure that all the paths have a trailing slash so that sync knows that these are directories.
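
Both directions can be wrapped in one small function, sketched here under a few assumptions: the function name `sync_projects`, the `REMOTE` variable and the `S3CMD` override are hypothetical, while the bucket path and the excludes are copied from the command above.

```bash
# Hypothetical wrapper around the sync command shown above.
REMOTE="s3://my-bucket-name/projects/"

sync_projects() {
  # $1 is the direction: "push" (local -> Space) or "pull" (Space -> local)
  case "$1" in
    push) src="./"; dst="$REMOTE" ;;
    pull) src="$REMOTE"; dst="./" ;;
    *) echo "usage: sync_projects push|pull" >&2; return 1 ;;
  esac
  # S3CMD can be overridden (e.g. S3CMD=echo) to print the arguments instead of syncing
  ${S3CMD:-s3cmd} sync --delete-removed \
    --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' \
    "$src" "$dst"
}
```

Run `sync_projects push` or `sync_projects pull` from the projects folder. On a pull, `--delete-removed` deletes local files that are missing from the Space, so adding `--dry-run` to the `s3cmd` call for a first test run may be worthwhile.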
61
62I am planning to implement some sort of `.ignore` file that will enable me to have project-specific exclude options.
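
One possible starting point, as a rough sketch rather than the planned implementation: `s3cmd` can read exclude globs from a file with `--exclude-from`. The file name `.s3ignore`, the function name `sync_with_ignore` and the `S3CMD` override are assumptions made for this example, and a single file in the sync root is not yet truly project-specific.

```bash
# Hypothetical ignore file: one glob per line, e.g. node_modules/*
IGNORE_FILE=".s3ignore"

sync_with_ignore() {
  # $1 = source, $2 = target (both with trailing slashes)
  if [ -f "$IGNORE_FILE" ]; then
    # Read the exclude globs from the ignore file instead of listing them inline
    ${S3CMD:-s3cmd} sync --delete-removed --exclude-from="$IGNORE_FILE" "$1" "$2"
  else
    # No ignore file present: sync everything
    ${S3CMD:-s3cmd} sync --delete-removed "$1" "$2"
  fi
}
```

It would be called as `sync_with_ignore ./ s3://my-bucket-name/projects/` from the projects folder.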
63
64I am currently running this every hour as a cronjob, which is perfectly fine for now while I am testing how this whole thing works and how it will all turn out.
65
66I have also created a small Gnome extension which is still very unstable, but when/if this whole experiment pays off I will share it on GitHub.