Diffstat (limited to 'content')
-rw-r--r--  content/2015-03-07-curriculum-vitae copy.md                                       |  69
-rw-r--r--  content/2017-03-07-golang-profiling-simplified.md                                 | 111
-rw-r--r--  content/2017-04-17-what-i-ve-learned-developing-ad-server.md                      | 134
-rw-r--r--  content/2017-04-21-profiling-python-web-applications-with-visual-tools.md         | 185
-rw-r--r--  content/2017-08-11-simple-iot-application.md                                      | 487
-rw-r--r--  content/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md          | 261
-rw-r--r--  content/2019-01-03-encoding-binary-data-into-dna-sequence.md                      | 346
-rw-r--r--  content/2019-10-14-simplifying-and-reducing-clutter.md                            |  22
-rw-r--r--  content/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md |  86
-rw-r--r--  content/2020-03-22-simple-sse-based-pubsub-server.md                              | 396
-rw-r--r--  content/2020-03-27-create-placeholder-images-with-sharp.md                        |  83
-rw-r--r--  content/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md        |  76
-rw-r--r--  content/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md              |  41
-rw-r--r--  content/2020-04-05-remote-work.md                                                 |  37
-rw-r--r--  content/2020-08-15-systemd-disable-wake-onmouse.md                                |  47
-rw-r--r--  content/2020-09-06-esp-and-micropython.md                                         | 203
-rw-r--r--  content/2020-09-08-bind-warning-on-login.md                                       |  40
-rw-r--r--  content/2020-09-09-digitalocean-sync.md                                           |  65
-rw-r--r--  content/2020-12-25-weekly-newsletter.md                                           |  10
19 files changed, 0 insertions, 2699 deletions
diff --git a/content/2015-03-07-curriculum-vitae copy.md b/content/2015-03-07-curriculum-vitae copy.md
deleted file mode 100644
index 1b2fddf..0000000
--- a/content/2015-03-07-curriculum-vitae copy.md
+++ /dev/null
@@ -1,69 +0,0 @@
title: Curriculum Vitae
slug: /curriculum-vitae.html
date: 2018-01-16
template: page
hide: true

**Mitja Felicijan**

*[mitja.felicijan@gmail.com](mailto:mitja.felicijan@gmail.com?subject=Website+CV+Contact)*

*Slovenia, EU*

## Technical experience

- **Key languages:** Golang, Python, C, Bash.
- **Platforms:** GNU/Linux, macOS.
- **Interests:** Zigbee, KNX, Modbus, Machine to Machine, Embedded systems, Operating systems, Distributed systems, IOT, RDBMS, Algorithms, Database engine design, SQL, NoSQL, NewSQL, Big data analytics, Machine learning, Prediction algorithms, Realtime analytics, Systems automation, Natural language processing, Bioinformatics.

## Major projects

- SMS marketing system (2007)
- Yacht management software (2008)
- Smart Home Gateway (2009)
- Moxa UPort 1130 USB to RS485 Universal Linux driver (2009)
- Remote management of electricity meters (2009)
- Remote management of blood pressure monitors (2010)
- Infomat automation system (2010)
- GPS Tourist - GIS software (2011)
- Minimal GNU/Linux distribution for embedded platforms (2011)
- Digital Jukebox system (2012)
- NanoCloudLogger - Machine to Machine (2012)
- Street Lighting System (2012)
- Smart cabins with hardware sensor management (2013)
- Contextual advertising server (2015)
- Network accessible database engine for caching and in-memory storage (2016)
- Tick database engine designed for storing and processing large amounts of sensor data with high write throughput (2016)
- Wireless industrial lighting management system - hardware and software (2016)
- Minimal configuration reverse proxy (2017)
- Industrial IOT platform for on-premise deployment (2018)
- Custom Platform as a Service based on Docker Swarm (2018)
- Toolkit for encoding binary data into DNA sequences (2019)
- Minimal configuration reverse proxy with load balancing and rate limiting (2019)
- E-ink conference room occupancy display, hardware and software solution (2019)

## Employment history

- Freelancer (2001 – Present)
- Software developer at Mobinia (2005 – 2007)
- CTO at Milk (2007 – 2009)
- Co-Founder of UTS (2009 – 2014)
- Senior Software Engineer at TSmedia (2015 – 2017)
- Senior Software Engineer at Renderspace (2017 – 2019)
- IT Consultant (2017 – Present)

## Awards

- Regional Award for Innovation by the Chamber of Commerce and Industry of Slovenia for the project Intelligent System Management and Regulation of Street Lighting, 2010
- National Award for Innovation by the Chamber of Commerce and Industry of Slovenia for the project Intelligent System Management and Regulation of Street Lighting, 2010

## Key responsibilities

- Embedded platform development.
- Hardware design and driver development.
- Designing, developing and testing systems.
- Implementation of systems.
- Writing and maintaining user and technical documentation.
- Development and maintenance of projects.
- Code review, testing and output.
- Working on enhancements suggested by customers and fixing reported bugs.
diff --git a/content/2017-03-07-golang-profiling-simplified.md b/content/2017-03-07-golang-profiling-simplified.md
deleted file mode 100644
index d5a9541..0000000
--- a/content/2017-03-07-golang-profiling-simplified.md
+++ /dev/null
@@ -1,111 +0,0 @@
title: Golang profiling simplified
description: Golang profiling demystified
slug: /golang-profiling-simplified.html
date: 2017-03-07
template: post
hide: false

Many posts have been written about profiling in Golang, but I haven't found a proper tutorial on the subject. Almost all of them are missing some important piece of information, and it gets pretty frustrating when you have a deadline and can't find a simple, distilled solution.

Nevertheless, after searching and experimenting I have found a solution that works for me and will probably work for you as well.

## Where are my pprof files?

By default, pprof files are generated in the /tmp/ folder. You can override the folder where these files are generated programmatically in your Golang code, as we will see in the examples below.

## Why is my CPU profile empty?

I have found that sometimes the CPU profile is empty because the program was not executing long enough. In my experience, programs that finish too quickly don't produce a useful pprof file. Well, the file is generated, but it only contains 4KB of information.

## Profiling

As you can see from the examples, we execute a dummy_benchmark function to ensure some amount of work is actually done. Memory profiling can be done without such a "complex" function, but CPU profiling needs it.

The memory and CPU profiling examples are almost the same. Only the parameters passed to profile.Start in the main function differ. By setting profile.ProfilePath(".") we tell the profiler to store pprof files in the same folder as our program.

### Memory profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU cycles so the profiler has something to record
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(time.Second * 3)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// store the memory profile in the current folder
	defer profile.Start(profile.MemProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### CPU profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU cycles so the profiler has something to record
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(time.Second * 3)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// store the CPU profile in the current folder
	defer profile.Start(profile.CPUProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### Generating profiling reports

```bash
# memory profiling
go build mem.go
./mem
go tool pprof -pdf ./mem mem.pprof > mem.pdf

# cpu profiling
go build cpu.go
./cpu
go tool pprof -pdf ./cpu cpu.pprof > cpu.pdf
```

This will generate a PDF document with the visualized profile.

- [Memory PDF profile example](/assets/go-profiling/golang-profiling-mem.pdf)
- [CPU PDF profile example](/assets/go-profiling/golang-profiling-cpu.pdf)
diff --git a/content/2017-04-17-what-i-ve-learned-developing-ad-server.md b/content/2017-04-17-what-i-ve-learned-developing-ad-server.md
deleted file mode 100644
index d5f06b3..0000000
--- a/content/2017-04-17-what-i-ve-learned-developing-ad-server.md
+++ /dev/null
@@ -1,134 +0,0 @@
title: What I've learned developing an ad server
description: Lessons I learned developing a contextual ad server
slug: /what-i-ve-learned-developing-ad-server.html
date: 2017-04-17
template: post
hide: false

For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. The project grew from serving thousands of ads per day to millions.

The system is made of a couple of core components:

- API for serving ads,
- Utils - cronjobs and queue management tools,
- Dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and better search performance. This gave us access to many amazing Elasticsearch features. You should check it out if you do any search-related operations.

Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page. It also makes the JavaScript client library quite complex.

So now that you know the basics of the product, let's get into the lessons we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was written to the database in nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/), and the database quickly grew way above 200GB of disk space. And that was problematic. Statistics took a disturbingly long time to aggregate. Indexes on the stats table were no help either after we reached 500 million datapoints.

> There is marketing product information and there is real-life experience. And they tend to be quite the opposite.

This is the reason everything is now aggregated on a daily basis and fed to Elastic in the form of a daily summary. With this we can track many more dimensions, such as zone, channel and platform information, and use that information to adapt the occurrence of ads in specific places more precisely.

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our stack. Because Redis also stores its data on local disk, we have some sort of backup if the server should suffer a failure.

All real-time statistics for ad serving and redirecting are kept as counters in a Redis instance, extracted daily and pushed to Elastic.

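A minimal sketch of this pattern, assuming the `redis` and `elasticsearch` Python packages and a hypothetical key layout (this is an illustration, not our production code):

```python
import datetime

import redis
from elasticsearch import Elasticsearch

r = redis.Redis()
es = Elasticsearch()

def track_impression(ad_id, zone):
    # hypothetical key layout: one counter per day/ad/zone
    day = datetime.date.today().isoformat()
    r.incr("impressions:%s:%s:%s" % (day, ad_id, zone))

def flush_daily_counters(day):
    # daily cronjob: push each counter to Elastic as part of the
    # daily summary, then drop the counter from Redis
    for key in r.scan_iter("impressions:%s:*" % day):
        _, _, ad_id, zone = key.decode().split(":")
        es.index(index="stats-daily", body={
            "date": day,
            "ad_id": ad_id,
            "zone": zone,
            "impressions": int(r.get(key)),
        })
        r.delete(key)
```
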
## Measure everything

The thing about software is that we really don't know how well it performs under load until such load is present. When testing locally everything is fine, but in production things tend to fall apart.

As a solution we measure everything we can: function execution time (by wrapping functions with timers), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information. And we store all of it for later analysis.

**Example of function execution time**

```json
{
    "get_final_filtered_ads": {
        "counter": 1931250,
        "avg": 0.0066143431,
        "elapsed": 12773.9500310003
    },
    "store_keywords_statistics": {
        "counter": 1931011,
        "avg": 0.0004605267,
        "elapsed": 889.2821669996
    },
    "match_by_context": {
        "counter": 1931011,
        "avg": 0.0055960716,
        "elapsed": 10806.0758889999
    },
    "match_by_high_performance": {
        "counter": 262,
        "avg": 0.0152770229,
        "elapsed": 4.00258
    },
    "store_impression_stats": {
        "counter": 1931250,
        "avg": 0.0006189991,
        "elapsed": 1195.4419869999
    }
}
```

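A sketch of how such measurements can be collected with a timing decorator. This is a simplified in-process version; in reality you would persist the counters somewhere (ours end up in Redis):

```python
import functools
import time

stats = {}

def timed(func):
    # wrap a function and accumulate call count, total and average time
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats.setdefault(func.__name__, {"counter": 0, "elapsed": 0.0})
            entry["counter"] += 1
            entry["elapsed"] += time.time() - start
            entry["avg"] = entry["elapsed"] / entry["counter"]
    return wrapper

@timed
def match_by_context(keywords):
    return [kw.lower() for kw in keywords]  # placeholder body
```
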
We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and visualizing the results with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more detailed look into code execution.

## Cache control is your friend

Because we use a JavaScript library for rendering ads, we rely on this script extensively, and we need to be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a ?ver=xxx attribute. This makes rapid deployment and testing very difficult and time consuming. There is a limit to how much you can test locally.

We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of the websites are built on the ASP.net platform and have some problems with Tag Manager. With the solution below we are certain that we are serving the latest version of the script.

It only takes one mistake for users to end up with the script cached, and if it is cached for a year you probably see where the problem is.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that if we didn't precisely set up the cache control and expire headers in the response, we didn't get the request on the server and therefore couldn't measure clicks. So when redirecting, do as follows and there will be no problems.

```python
# python ➜ bottlepy web micro-framework
response = bottle.HTTPResponse(status=302)
response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
response.set_header("Location", url)
return response
```

> Cache control in browsers is quite aggressive and you need to be precise to avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When deciding on a web server we went with Nginx as a reverse proxy for our applications. We adopted a micro-service oriented architecture early in the project to ensure that when we scale we can easily add additional servers to our cluster. And Nginx was crucial for load balancing and static content delivery.

At first our config file was quite simple, but it later grew larger. After much patching and adding of new settings, I sat down and learned more about the guts of Nginx. This proved to be very useful and we were able to squeeze much more out of our setup. So I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/). It saved us a lot of headache. Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything. It is the cornerstone of our services. At first we were very careful about the quantity of things we stored in [Redis](https://redis.io/), but we later found out that the memory footprint is very low even when storing large amounts of data.

So we gradually expanded our usage to caching whole HTML outputs of the dashboard. This improved our performance by an order of magnitude. And native TTL support goes hand in hand with our needs.

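As a sketch of the idea (a hypothetical helper, assuming the `redis` package and that the rendered HTML only needs to be fresh within its TTL):

```python
import redis

r = redis.Redis()

def cached_dashboard(render, page_id, ttl=300):
    # return cached HTML when present, otherwise render
    # the page and store it with a TTL (in seconds)
    key = "html:dashboard:%s" % page_id
    html = r.get(key)
    if html is None:
        html = render(page_id)
        r.setex(key, ttl, html)
    return html
```
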
The reason we chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) was the out-of-the-box scalability of Redis. But all of this can be achieved with Memcached as well.

## Conclusion

There are many more details that could have been written down, and every single topic here deserves its own post, but you probably got the idea of the problems we faced.
diff --git a/content/2017-04-21-profiling-python-web-applications-with-visual-tools.md b/content/2017-04-21-profiling-python-web-applications-with-visual-tools.md
deleted file mode 100644
index 98a1971..0000000
--- a/content/2017-04-21-profiling-python-web-applications-with-visual-tools.md
+++ /dev/null
@@ -1,185 +0,0 @@
title: Profiling Python web applications with visual tools
description: The missing link when debugging and profiling Python web applications
slug: /profiling-python-web-applications-with-visual-tools.html
date: 2017-04-21
template: post
hide: false

I have been profiling my software with KCachegrind for a long time now, and I was missing this option when developing APIs or other web services. I always knew this was possible but never really took the time to dive into it.

Before we begin there are some requirements. We will need to:

- implement [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) in our web app,
- convert the output to [callgrind](http://valgrind.org/docs/manual/cl-manual.html) format with [pyprof2calltree](https://pypi.python.org/pypi/pyprof2calltree/),
- visualize the data with [KCachegrind](http://kcachegrind.sourceforge.net/html/Home.html) or [Profiling Viewer](http://www.profilingviewer.com/).


If you are using macOS you should check out [Profiling Viewer](http://www.profilingviewer.com/) or [MacCallGrind](http://www.maccallgrind.com/).

![KCachegrind](/assets/python-profiling/kcachegrind.png)

We will divide this post into two main parts:

- writing a simple web service,
- visualizing the profile of this web service.

## Simple web service

Let's use virtualenv so we won't pollute our base system. If you don't have virtualenv installed on your system you can install it with the pip command.

```bash
# let's install virtualenv globally
$ sudo pip install virtualenv

# let's also install pyprof2calltree globally
$ sudo pip install pyprof2calltree

# now we create the project
$ mkdir demo-project
$ cd demo-project/

# now let's create the folder where we will store profiles
$ mkdir prof

# now we create an empty virtualenv in the venv/ folder
$ virtualenv --no-site-packages venv

# we now need to activate the virtualenv
$ source venv/bin/activate

# you can check if the virtualenv was correctly initialized by
# checking where your python interpreter is located
# if the command below points to your created directory and not some
# system dir like /usr/bin/python then everything is fine
$ which python

# we can check now if all is good ➜ if ok a couple of
# lines will be displayed
$ pip freeze
# appdirs==1.4.3
# packaging==16.8
# pyparsing==2.2.0
# six==1.10.0

# now we are ready to install bottlepy ➜ web micro-framework
$ pip install bottle

# you can deactivate the virtualenv but you will then go
# back to the system domain ➜ for now don't deactivate
$ deactivate
```

We are now ready to write a simple web service. Create a file app.py and paste the code below into it.

```python
# -*- coding: utf-8 -*-

import bottle
import random
import cProfile

app = bottle.Bottle()

# this decorator encapsulates a function, profiles its
# execution and saves the result to prof/function-name.prof
# in our example only the awesome_random_number function will
# be profiled because it has do_cprofile applied
def do_cprofile(func):
    def profiled_func(*args, **kwargs):
        profile = cProfile.Profile()
        try:
            profile.enable()
            result = func(*args, **kwargs)
            profile.disable()
            return result
        finally:
            profile.dump_stats("prof/" + str(func.__name__) + ".prof")
    return profiled_func


# we enable profiling for a specific function by adding
# @do_cprofile above the function declaration
@app.route("/")
@do_cprofile
def awesome_random_number():
    awesome_random_number = random.randint(0, 100)
    return "awesome random number is " + str(awesome_random_number)

@app.route("/test")
def test():
    return "dummy test"

if __name__ == '__main__':
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 4000
    )

# run with 'python app.py'
# open browser 'http://0.0.0.0:4000'
```

When the browser hits the awesome\_random\_number() function, a profile is created in the prof/ subfolder.

## Visualizing the profile

Now let's create the callgrind format from this cProfile output.

```bash
$ cd prof/
$ pyprof2calltree -i awesome_random_number.prof
# this creates the 'awesome_random_number.prof.log' file in the same folder
```

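The same conversion can also be done from Python, since pyprof2calltree ships a small API (a sketch, based on the package as of this writing):

```python
from pyprof2calltree import convert

# convert a cProfile dump into a callgrind-format file
convert("awesome_random_number.prof", "awesome_random_number.prof.log")
```
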
This file can be opened with the visualizing tools listed above. In this case we will be using Profiling Viewer under macOS. You can open the image in a new tab. As you can see from this example, there is a hierarchy showing the execution order of your code.

![Profiling Viewer](/assets/python-profiling/profiling-viewer.png)

> Make sure you convert the cProfile output every time you want to refresh and take a look at your possible optimizations, because cProfile updates the .prof file every time the browser hits the function.

This is just a simple example, but when you are developing real-life applications this can be very illuminating, especially for seeing which parts of your code are bottlenecks and need to be optimized.

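If you would rather profile every route instead of decorating individual functions, one option is a small WSGI middleware. This is a sketch of my own (the class name and file naming scheme are made up for illustration), not something bottle provides:

```python
import cProfile

class ProfilerMiddleware(object):
    # wraps any WSGI app and dumps one .prof file per request path
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        profile = cProfile.Profile()
        try:
            return profile.runcall(self.wsgi_app, environ, start_response)
        finally:
            name = environ.get("PATH_INFO", "/").strip("/").replace("/", "-") or "root"
            profile.dump_stats("prof/%s.prof" % name)

# usage: wrap the app before running it
# bottle.run(app=ProfilerMiddleware(app), host="0.0.0.0", port=4000)
```
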
## Update 2017-04-22

Reddit user [mvt](https://www.reddit.com/user/mvt) also recommended an awesome web-based profile visualizer, [SnakeViz](https://jiffyclub.github.io/snakeviz/), which directly takes the output of the [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) module.

<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="583880c1-002e-41ed-a373-020a0ef2cff9" data-embed-created="2017-04-22T19:46:54.810Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dgljhsb/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>

```bash
# let's install it globally as well
$ sudo pip install snakeviz

# now let's visualize
$ cd prof/
$ snakeviz awesome_random_number.prof
# this automatically opens a browser window and
# shows the visualized profile
```

![SnakeViz](/assets/python-profiling/snakeviz.png)

Reddit user [ccharles](https://www.reddit.com/user/ccharles) suggested a better way of installing pip software by targeting the user level instead of using sudo.

<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="f4f0459e-684d-441e-bebe-eb49b2f0a31d" data-embed-created="2017-04-22T19:46:10.874Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dglpzkx/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>

```bash
# first we need to add this path to our $PATH variable
# we do this by adding this line at the end of your
# ~/.bashrc file
PATH=$PATH:$HOME/.local/bin/

# in order to use this new configuration you can close
# and reopen the terminal or reload the .bashrc file
$ source ~/.bashrc

# now let's test if the new directory is present in $PATH
$ echo $PATH

# now we can install at the user level by adding --user
# without the use of sudo
$ pip install snakeviz --user
```

Or, as suggested by [mvt](https://www.reddit.com/user/mvt), you can use [pipsi](https://github.com/mitsuhiko/pipsi).
diff --git a/content/2017-08-11-simple-iot-application.md b/content/2017-08-11-simple-iot-application.md
deleted file mode 100644
index 1b99eb1..0000000
--- a/content/2017-08-11-simple-iot-application.md
+++ /dev/null
@@ -1,487 +0,0 @@
title: Simple IOT application supported by real-time monitoring and data history
description: Develop a simple IOT application with Arduino MKR1000 and Python
slug: /simple-iot-application.html
date: 2017-08-11
template: post
hide: false

## Initial thoughts

I have been developing these kinds of applications for the better part of the last 5 years, and people keep asking me how to approach developing one, so I will give it a try and explain it here.

IOT applications are really no different from any other kind of application. We have data that needs to be collected and visualized in some form of tables or charts. The main difference is that most of the time this data is collected by some kind of device that is foreign to a developer who mainly operates in the web domain. But fear not, it's not that different from writing some JavaScript.

There are many devices able to transmit data via wireless or wired networks out of the box, but for the sake of example we will be using the commonly known Arduino with a wireless module already on the board → [Arduino MKR1000](https://store.arduino.cc/arduino-mkr1000).

In order to make this little project as accessible as possible I will try to make it as inexpensive as possible. By this I mean that I will avoid using hosted virtual servers and will use my own laptop as a server. But you must buy an Arduino MKR1000 to follow the steps below. If you want to deploy this software I would suggest [DigitalOcean](https://www.digitalocean.com) → the smallest VPS is cheap enough to make this one of the most affordable options out there. Please note that this software will not run on stock web hosting that only supports LAMP (Linux, Apache, MySQL, and PHP).

_But before we begin, please note that this is strictly experimental code, not well optimized. There are much better ways of handling some aspects of the application, but those require a much deeper knowledge of technologies that are not needed for an example like this._

**Development steps**

1. Simple Python API that will receive and store incoming data.
2. Prototype C++ code that will read "sensor data" and transmit it to the API.
3. Data visualization with charts → extends the Python web application.

Steps 1 and 3 share the same web application. One route will be dedicated to the API and another to serving the HTML with the chart.

The schema below represents what we will try to achieve and how the different parts relate to each other.

![Overview](/assets/iot-application/simple-iot-application-overview.svg)

## Simple Python API

I have always been a fan of simplicity, so we will be using [Bottle: Python Web Framework](https://bottlepy.org/docs/dev/). It is a single-file web framework that seriously simplifies working with routes and templating, and has a built-in web server that satisfies our needs in this case.

First we need to install the bottle package. This can be done by downloading ```bottle.py``` and placing it in the root of your application, or by using pip: ```pip install bottle --user```.

If you are using Linux or macOS then Python is already installed. If you want to test this on Windows, please install [Python for Windows](https://www.python.org/downloads/windows/). There may be some problems with the path when you try to launch ```python webapp.py```, so please take care of this before you continue.

### Basic web application

The most basic bottle application is quite simple. Paste the code below into a ```webapp.py``` file and save it.

```python
# -*- coding: utf-8 -*-

import bottle

# initializing bottle app
app = bottle.Bottle()

# triggered when / is accessed from browser
# only accepts GET → no POST allowed
@app.route("/", method=["GET"])
def route_default():
    return "howdy from python"

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```

To run this simple application, open a command prompt or terminal, go to the folder containing the file and type ```python webapp.py```. If everything goes OK, open your web browser and point it to ```http://0.0.0.0:5000```.

If you would like to change the port of your application (to, say, port 80) without running your app as root, this presents a problem. TCP/IP port numbers below 1024 are privileged ports → this is a security feature. So for simplicity and security, use a port number above 1024, as I have with port 5000.

If this fails at any point, please fix it before you continue, because nothing below will work otherwise.

We use 0.0.0.0 as the default host so that the app is available over your local network. If you find your local IP with ```ifconfig``` and try accessing the site with your phone (on the same network/router as your machine), it should work as well (an example of such an address is ```http://192.168.1.15:5000```). This is a must-have, because the Arduino will be accessing this application to send its data.

### Web application security

There is a lot to be said about security; it is the topic of many books. Of course all of it cannot be covered here, but to establish some basic security → you should always use SSL with your application. Fantastic free certificates are available from [Let's Encrypt - Free SSL/TLS Certificates](https://letsencrypt.org). With an SSL certificate installed, you should then make use of HTTP headers and send your "API key" via a header. If the key is sent via a header, it is encrypted by SSL before it goes over the network. Never send your API keys in a GET parameter like ```http://example.com/?api_key=somekeyvalue```. The problem with this kind of sending is that the key is visible in logs and to network sniffers.

There is a fantastic article describing some aspects of security: [11 Web Application Security Best Practices](https://www.keycdn.com/blog/web-application-security-best-practices/). Please check it out.

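One more small, cheap habit worth adopting when you later compare the received key against the expected one: use a constant-time comparison, as sketched below with Python's standard hmac module (an extra precaution, not something the code in this post depends on).

```python
import hmac

def api_key_valid(received, expected):
    # constant-time comparison avoids leaking information
    # about the key through response timing
    return received is not None and hmac.compare_digest(received, expected)
```
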
### Simple API for writing data-points

We will now take the boilerplate code from the example above and extend it so data received by the API is written to local storage. For this example I will use SQLite3 because it plays well with Python and can store quite large amounts of data. I have been using it to collect gigabytes of data in a single database without any corruption or problems → your experience may vary.

To avoid learning SQLite I will be using [Dataset: databases for lazy people](https://dataset.readthedocs.io/en/latest/index.html). This package abstracts SQL and simplifies writing and reading data from the database. You should install it with pip: ```pip install dataset --user```.

Because the API will use the POST method, I will test whether the code works correctly using the [Restlet Client for Google Chrome](https://chrome.google.com/webstore/detail/restlet-client-rest-api-t/aejoelaoggembcahagimdiliamlcdmfm). This software also allows you to set headers → for basic security with an API key.

To quickly generate passwords or API keys I usually use this nifty website, [RandomKeygen](https://randomkeygen.com/).

Copy the code below over your previous code in ```webapp.py```.

```python
# -*- coding: utf-8 -*-

import time
import bottle
import dataset

# initializing bottle app
app = bottle.Bottle()

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["dsn"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when /api is accessed
# only accepts POST → no GET allowed
@app.route("/api", method=["POST"])
def route_default():
    status = 400
    ts = int(time.time())  # current timestamp
    value = bottle.request.body.read()  # data from device
    api_key = bottle.request.get_header("Api-Key")  # api key from header

    # outputs received data to console for debugging
    print ">>> {} :: {}".format(value, api_key)

    # if api_key is correct and value is present
    # then writes the datapoint to the point table
    if api_key == app.config["api_key"] and value:
        app.config["dsn"]["point"].insert(dict(ts=ts, value=value))
        status = 200

    # we only need to return the status
    return bottle.HTTPResponse(status=status, body="")

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```

To run this, simply go to the folder containing the Python file and run ```python webapp.py``` from a terminal. If everything goes OK you should have a simple API available via the POST method on the /api route.

After testing the service with the Restlet Client you should be able to view your data in the database file ```data.db```.

![REST settings example](/assets/iot-application/iot-rest-example.png)

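If you prefer testing from code rather than a browser extension, a quick sketch with the requests package (```pip install requests --user```) does the same job:

```python
import requests

# post one dummy value, with the API key in a header
resp = requests.post(
    "http://0.0.0.0:5000/api",
    data="42",
    headers={"Api-Key": "JtF2aUE5SGHfVJBCG5SH"},
)
print(resp.status_code)  # 200 on success, 400 otherwise
```
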
You can also check the contents of the new database file using a desktop client for SQLite → [DB Browser for SQLite](http://sqlitebrowser.org/).

![SQLite database example](/assets/iot-application/iot-sqlite-db.png)

The table structure is as simple as it can be. We have ts (timestamp) and value (the value from the Arduino). As you can see, the timestamp is generated on the API side. If you happened to have an accurate clock on the Arduino, it would be better to generate and send the timestamp along with the value. This would be particularly useful if we were collecting sensor data at a higher frequency and then sending it in bulk to the API, as sketched below.

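A sketch of what such a bulk route could look like (hypothetical, not part of the application we build here; it would sit next to the existing route in ```webapp.py``` and reuse its ```app``` and ```bottle``` objects). The device would post a JSON array of points, each carrying its own timestamp:

```python
import json

# hypothetical bulk endpoint, expects a body like:
# [{"ts": 1502406440, "value": 42}, {"ts": 1502406441, "value": 43}]
@app.route("/api/bulk", method=["POST"])
def route_bulk():
    points = json.loads(bottle.request.body.read())
    for p in points:
        app.config["dsn"]["point"].insert(dict(ts=int(p["ts"]), value=p["value"]))
    return bottle.HTTPResponse(status=200, body="")
```
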
If you deploy this app with uWSGI in multi-threaded mode, use a DSN (Data Source Name) URL with ```?check_same_thread=False```.

OK, now that we have some sort of working API with some basic security, so unwanted people cannot post data to your database, we can proceed further and program the Arduino to send data to the API.

## Sending data to API with Arduino MKR1000

First of all, you need the MKR1000 module and a microUSB cable to proceed. If you have ever done any work with Arduino, you know that you also need the [Arduino IDE](https://www.arduino.cc/en/Main/Software). At the provided link you can download and install the IDE. Once that task is completed and you have successfully run the blink example, proceed to the next step.

In order to use the wireless capabilities of the MKR1000 you first need to install the [WiFi101 library](https://www.arduino.cc/en/Reference/WiFi101) in the Arduino IDE. Check before you install; you may already have it.

The code below is a working example that sends data to the API. Before you test it, make sure the Python web application is running. Then change the settings for WiFi, the API endpoint and the api_key. If for some reason the code below doesn't work for you, please leave a comment and I'll try to help.

Once you have opened the IDE and copied this code, try to compile and upload it. Then open the "Serial monitor" to see if any output is produced by the Arduino.

```c
#include <WiFi101.h>

// wifi settings
char ssid[] = "ssid-name";
char pass[] = "ssid-password";

// api server endpoint
char server[] = "192.168.6.22";
int port = 5000;

// api key that must be the same as the one in the Python code
String api_key = "JtF2aUE5SGHfVJBCG5SH";

// frequency data is sent in ms - every 5 seconds
int timeout = 1000 * 5;

int status = WL_IDLE_STATUS;

void setup() {

  // initialize serial and wait for port to open:
  Serial.begin(9600);
  delay(1000);

  // check for the presence of the shield
  if (WiFi.status() == WL_NO_SHIELD) {
    Serial.println("WiFi shield not present");
    while (true);
  }

  // attempt to connect to wifi network
  while (status != WL_CONNECTED) {
    Serial.print("Attempting to connect to SSID: ");
    Serial.println(ssid);
    status = WiFi.begin(ssid, pass);
    // wait 10 seconds for connection
    delay(10000);
  }

  // output wifi status to serial monitor
  Serial.print("SSID: ");
  Serial.println(WiFi.SSID());

  IPAddress ip = WiFi.localIP();
  Serial.print("IP Address: ");
  Serial.println(ip);

  long rssi = WiFi.RSSI();
  Serial.print("signal strength (RSSI):");
  Serial.print(rssi);
  Serial.println(" dBm");
}

void loop() {

  WiFiClient client;

  if (client.connect(server, port)) {

    // I use a random number generator for this example
    // but you can use analog or digital inputs from the arduino
    String content = String(random(1000));

    client.println("POST /api HTTP/1.1");
    client.println("Connection: close");
    client.println("Api-Key: " + api_key);
    client.println("Content-Length: " + String(content.length()));
    client.println();
    client.println(content);

    delay(100);
    client.stop();
    Serial.println("Data sent successfully ...");

  } else {
    Serial.println("Problem sending data ...");
  }

  // waits for x seconds and continues looping
  delay(timeout);

}
```

As you can see from the example, the Arduino generates a random integer between 0 and 999 (random(1000) excludes the upper bound). You can easily replace this with a temperature sensor or any other kind of sensor.

Now that we have the API under the hood and the Arduino is sending demo data, we can focus on data visualization.

## Data visualization

Before we continue, let's examine our project folder structure. Currently we have only two files in the project:

_simple-iot-app/_

* _webapp.py_
* _data.db_

We will now add an HTML template that contains the CSS and JavaScript code inline, for simplicity's sake. For the bottle framework to scan the root application folder for templates, we add ```bottle.TEMPLATE_PATH.insert(0, "./")``` to ```webapp.py```. By default the bottle framework stores templates in the ```views/``` subfolder. This is not the ideal situation, and if you use bottle to develop web applications you should stick to the native behavior and store templates in the predefined folder. But for the sake of the example we will override this. Be careful to fully replace your code with the new code provided below; avoid partially replacing code in the file :) New code for reading data-points is also included in the Python example below.

First we add a new route to our web application. It is triggered when the browser hits the root of the application, ```http://0.0.0.0:5000/```. This route does nothing more than render the ```frontend.html``` template. This is done with ```return bottle.template("frontend.html")```. Check the code below to see exactly how this is done.

Next we expand the ```/api``` route and use different methods to write or read data-points. For writing a data-point we use the POST method, and for reading points we use the GET method. The GET method returns a JSON object with the latest readings and historical data.

There is a fantastic JavaScript library for plotting time-series charts called [MetricsGraphics.js](https://www.metricsgraphicsjs.org), based on the [D3.js](https://d3js.org/) data visualization library.

MetricsGraphics.js requires the following data schema → to achieve this we need to transform the data from the database into this format:

```json
[
    {
        "date": "2017-08-11 01:07:20",
        "value": 933
    },
    {
        "date": "2017-08-11 01:07:30",
        "value": 743
    }
]
```

The web application is now complete, and we only need the ```frontend.html``` that we will develop next. If you started the web app now and went to the app root, it would return an error because we don't have frontend.html yet.

```python
# -*- coding: utf-8 -*-

import time
import bottle
import json
import datetime
import dataset

# initializing bottle app
app = bottle.Bottle()

# adds root directory as template folder
bottle.TEMPLATE_PATH.insert(0, "./")

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["db"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when / is accessed from browser
# only accepts GET → no POST allowed
@app.route("/", method=["GET"])
def route_default():
    return bottle.template("frontend.html")

# triggered when /api is accessed
# accepts POST and GET
@app.route("/api", method=["GET", "POST"])
def route_api():

    # if method is POST then we write a datapoint
    if bottle.request.method == "POST":
        status = 400
        ts = int(time.time())  # current timestamp
        value = bottle.request.body.read()  # data from device
        api_key = bottle.request.get_header("Api-Key")  # api key from header

        # outputs received data to console for debugging
        print ">>> {} :: {}".format(value, api_key)

        # if api_key is correct and value is present
        # then writes the datapoint to the point table
        if api_key == app.config["api_key"] and value:
            app.config["db"]["point"].insert(dict(ts=ts, value=value))
            status = 200

        # we only need to return the status
        return bottle.HTTPResponse(status=status, body="")

    # if method is GET then we read the datapoints
    else:
        response = []
        datapoints = app.config["db"]["point"].all()

        for point in datapoints:
            response.append({
                "date": datetime.datetime.fromtimestamp(int(point["ts"])).strftime("%Y-%m-%d %H:%M:%S"),
                "value": point["value"]
            })

        bottle.response.content_type = "application/json"
        return json.dumps(response)

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```

And now, finally, we can implement ```frontend.html```. Create a file with this name and copy the code below into it. When you are done you can start the web application. The steps for this part are listed below the code.

```html
<!DOCTYPE html>
<html>

  <head>
    <meta charset="utf-8">
    <title>Simple IOT application</title>
  </head>

  <body>

    <h1>Simple IOT application</h1>

    <div class="chart-placeholder">
      <div id="chart"></div>
    </div>

    <!-- application main script -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/metrics-graphics/2.11.0/metricsgraphics.min.js"></script>
    <script>
      function fetch_and_render() {
        d3.json("/api", function(data) {
          data = MG.convert.date(data, "date", "%Y-%m-%d %H:%M:%S");
          MG.data_graphic({
            data: data,
            chart_type: "line",
            full_width: true,
            height: 270,
            target: document.getElementById("chart"),
            x_accessor: "date",
            y_accessor: "value"
          });
        });
      }
      window.onload = function() {
        // initial call for rendering
        fetch_and_render();

        // updates chart every 5 seconds
        setInterval(function() {
          fetch_and_render();
        }, 5000);
      }
    </script>

    <!-- application styles -->
    <style>
      body {
        font: 13px sans-serif;
        padding: 20px 50px;
      }
      .chart-placeholder {
        border: 2px solid #ccc;
        width: 100%;
        user-select: none;
      }
      /* chart styles */
      .mg-line1-color {
        stroke: red;
        stroke-width: 2;
      }
      .mg-main-area, .mg-main-line {
        fill: #fff;
      }
      .mg-x-axis line, .mg-y-axis line {
        stroke: #b3b2b2;
        stroke-width: 1px;
      }
    </style>

  </body>

</html>
```

Now the folder structure should look like this:

_simple-iot-app/_

* _webapp.py_
* _data.db_
* _frontend.html_

OK, let's now start the application and start feeding it data.

1. ```python webapp.py```
2. connect the Arduino MKR1000 to a power source
3. open a browser and go to ```http://0.0.0.0:5000```

If everything goes well, you should see new data-points rendered on the chart every 5 seconds.

If you navigate to ```http://0.0.0.0:5000``` you should see the rendered chart shown in the picture below.

![Application output](/assets/iot-application/iot-app-output.png)

The complete application with all the code is available for [download](/assets/iot-application/simple-iot-application.zip).

## Conclusion

I hope this clarifies some aspects of IOT application development. Of course this is a minimal example and is far from what can be done in real life with a deeper dive into other technologies.

If you would like to continue exploring the IOT world, here are some interesting resources to examine:

* [Reading Sensors with an Arduino](https://www.allaboutcircuits.com/projects/reading-sensors-with-an-arduino/)
* [MQTT 101 – How to Get Started with the lightweight IoT Protocol](http://www.hivemq.com/blog/how-to-get-started-with-mqtt)
* [Stream Updates with Server-Sent Events](https://www.html5rocks.com/en/tutorials/eventsource/basics/)
* [Internet of Things (IoT) Tutorials](http://www.tutorialspoint.com/internet_of_things/)

Any comments or additional ideas are welcome in the comments below.
diff --git a/content/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md b/content/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
deleted file mode 100644
index 93a167e..0000000
--- a/content/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
+++ /dev/null
@@ -1,261 +0,0 @@
title: Using DigitalOcean Spaces Object Storage with FUSE
description: Using DigitalOcean Spaces Object Storage with FUSE
slug: /using-digitalocean-spaces-object-storage-with-fuse.html
date: 2018-01-16
template: post
hide: false

A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced a new product called [Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), an object storage service very similar to Amazon's S3. This really piqued my interest, because it was something I was missing, and the thought of going elsewhere on the internet for such functionality did not appeal to me. Also, in keeping with their previous pricing, this too is very cheap, and the pricing page is a no-brainer compared to AWS or GCE. [Prices are clearly and precisely defined and outlined](https://www.digitalocean.com/pricing/). You must love them for that :)

### Initial requirements

* Is it possible to use them as a mounted drive with FUSE? (tl;dr YES)
* Will performance degrade over time and over different object sizes? (tl;dr NO&YES)
* Can storage be mounted on multiple machines at the same time and be writable? (tl;dr YES)

> Let me be clear: the scripts I use here are made just for benchmarking and are not intended for real-life situations. That said, I am looking into using these approaches with a caching service in front, dumping everything to storage as objects afterwards. That could potentially be an interesting post of its own. But if you need real-time data without eventual consistency, take these scripts for what they are: not usable in such situations.

## Is it possible to use them as a mounted drive with FUSE?

Well, they actually can be used in such a manner. Because Spaces is similar to [AWS S3](https://aws.amazon.com/s3/), many tools are available, and you can find many articles and [Stackoverflow items](https://stackoverflow.com/search?q=s3+fuse).

To make this work you need a DigitalOcean account. If you don't have one, you will not be able to test this code. If you do, go and [create a new Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent). If you click this link you will already have Debian 9 with the smallest VM option preselected.

* Please be sure to add your SSH key, because we will log in to this machine remotely.
* If you change your region, remember which one you chose, because we will need this information when we mount the space on our machine.

Instructions on how to use SSH keys and how to set them up are available in the article [How To Use SSH Keys with DigitalOcean Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).

![DigitalOcean Droplets](/assets/do-fuse/fuse-droplets.png)

After creating the Droplet it's time to create a new Space. This is done by clicking the [Create](https://cloud.digitalocean.com/spaces/new) button (top right corner) and selecting Spaces. Choose a pronounceable ```Unique name``` because we will use it in the examples below. You can choose either Private or Public; it doesn't matter in our case, and you can always change it in the future.

Once you have created the new Space, you should [generate an Access key](https://cloud.digitalocean.com/settings/api/tokens). This link takes you to the page where you can generate the key. After you create a new one, save the provided Key and Secret, because the Secret will not be shown again.

![DigitalOcean Spaces](/assets/do-fuse/fuse-spaces.png)

Now that we have a new Space and an Access key, we can SSH into our machine.

```bash
# replace IP with the ip of your newly created droplet
ssh root@IP

# this installs the utilities for mounting object storage over FUSE
apt install s3fs

# we now need to provide credentials (the access key we created earlier)
# replace KEY and SECRET with your own credentials but leave the colon between them
# we also need to set proper permissions
echo "KEY:SECRET" > .passwd-s3fs
chmod 600 .passwd-s3fs

# now we mount the space on our machine
# replace UNIQUE-NAME with the name you chose earlier
# if you chose a different region for your space be careful about the -ourl option (ams3)
s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp

# now we try to create a file
# once mounted it may take a couple of seconds to retrieve data
echo "Hello cruel world" > /mnt/hello.txt
```

After all this, you can return to your browser, go to [DigitalOcean Spaces](https://cloud.digitalocean.com/spaces) and click on your created space. If the file hello.txt is present, you have successfully mounted the space on your machine and written data to it.

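For completeness: since Spaces speaks the S3 protocol, you can also skip FUSE entirely and talk to it with a standard S3 client. A sketch with boto3 (the bucket name, region and credentials below are the same placeholders as in the shell session above):

```python
import boto3

# Spaces is S3-compatible, so the standard S3 client works;
# KEY/SECRET/UNIQUE-NAME are placeholders, as above
session = boto3.session.Session()
client = session.client(
    "s3",
    region_name="ams3",
    endpoint_url="https://ams3.digitaloceanspaces.com",
    aws_access_key_id="KEY",
    aws_secret_access_key="SECRET",
)

client.put_object(Bucket="UNIQUE-NAME", Key="hello.txt", Body=b"Hello cruel world")
print(client.get_object(Bucket="UNIQUE-NAME", Key="hello.txt")["Body"].read())
```
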
I chose the same region for my Droplet and my Space, but you don't have to; you can use different regions. What that actually does to performance I don't know.

Additional information on FUSE:

* [Github project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse)
* [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)

## Will the performance degrade over time and over different sizes of objects?

For this task I didn't want to just read and write text files or upload images. I actually wanted to figure out whether using something like SQLite is viable in this case.

### Measurement experiment 1: File copy

```bash
# first we create some dummy files of different sizes
dd if=/dev/zero of=10KB.dat bs=1024 count=10 #10KB
dd if=/dev/zero of=100KB.dat bs=1024 count=100 #100KB
dd if=/dev/zero of=1MB.dat bs=1024 count=1024 #1MB
dd if=/dev/zero of=10MB.dat bs=1024 count=10240 #10MB

# now we set the time command to only return real time
TIMEFORMAT=%R

# now let's test it
(time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt

# and now we automate
# this performs the same operation 100 times
# and outputs results into separate files based on object size
n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done
n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done
```

Files of 100MB were not transferred successfully and ended up producing an error (cp: failed to close '/mnt/100MB.1.dat': Operation not permitted).

As I suspected, object size is not really that important. Sadly, I don't have the time to test performance over longer periods. But if any of you do, please send me your data; I would be interested in seeing the results.

**Here are the plotted results**

You can download the [raw results here](/assets/do-fuse/copy-benchmarks.tsv). Measurements are in seconds.

<script src="//cdn.plot.ly/plotly-latest.min.js"></script>
<div id="copy-benchmarks"></div>
<script>
(function(){
  var request = new XMLHttpRequest();
  request.open("GET", "/assets/do-fuse/copy-benchmarks.tsv", true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
      var payload = request.responseText.trim();
      var tsv = payload.split("\n");
      for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
      var traces = [];
      var headers = tsv[0];
      tsv.shift();
      Array.prototype.forEach.call(headers, function(el, idx) {
        var x = [];
        var y = [];
        for (var j=0; j<tsv.length; j++) {
          x.push(j);
          y.push(parseFloat(tsv[j][idx].replace(",", ".")));
        }
        traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
      });
      var copy = Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } });
    } else { }
  };
  request.onerror = function() { };
  request.send(null);
})();
</script>

As far as these tests show, performance is quite stable and predictable, which is fantastic. But this is a small test spanning only a couple of hours, so you should not trust it completely.

140### Measurement experiment 2: SQLite performanse
141
142I was unable to use database file directly from mounted drive so this is a no-go as I suspected. So I executed code below on a local disk just to get some benchmarks. I inserted 1000 records with DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT for 1000 times to generate statistics. As you can see performance of SQLite is quite amazing. You could then potentially just copy file to mounted drive and be done with it.
143
144```python
145import time
146import sqlite3
147import sys
148
149if len(sys.argv) < 3:
150 print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
151 exit()
152
153def data_iter(x):
154 for i in range(x):
155 yield "m" + str(i), "f" + str(i*i)
156
157header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
158with open("sqlite-benchmarks.tsv", "w") as fp:
159 fp.write(header_line)
160
161start_time = time.time()
162conn = sqlite3.connect(sys.argv[1])
163c = conn.cursor()
164end_time = time.time()
165result_time = CONNECT = end_time - start_time
166print("CONNECT: %g seconds" % (result_time))
167
168start_time = time.time()
169c.execute("PRAGMA journal_mode=WAL")
170c.execute("PRAGMA temp_store=MEMORY")
171c.execute("PRAGMA synchronous=OFF")
172result_time = PRAGMA = time.time() - start_time
173print("PRAGMA: %g seconds" % (result_time))
174
175for i in range(int(sys.argv[3])):
176 print("#%i" % (i))
177
178 start_time = time.time()
179 c.execute("drop table if exists test")
180 end_time = time.time()
181 result_time = DROPTABLE = end_time - start_time
182 print("DROPTABLE: %g seconds" % (result_time))
183
184 start_time = time.time()
185 c.execute("create table if not exists test(a,b)")
186 end_time = time.time()
187 result_time = CREATETABLE = end_time - start_time
188 print("CREATETABLE: %g seconds" % (result_time))
189
190 start_time = time.time()
191 c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(int(sys.argv[2])))
192 end_time = time.time()
193 result_time = INSERTMANY = end_time - start_time
194 print("INSERTMANY: %g seconds" % (result_time))
195
196 start_time = time.time()
197 c.execute("select count(*) from test")
198 res = c.fetchall()
199 end_time = time.time()
200 result_time = FETCHALL = end_time - start_time
201 print("FETCHALL: %g seconds" % (result_time))
202
203 start_time = time.time()
204 conn.commit()
205 end_time = time.time()
206 result_time = COMMIT = end_time - start_time
207 print("COMMIT: %g seconds" % (result_time))
208
209 print()
210 log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
211 with open("sqlite-benchmarks.tsv", "a") as fp:
212 fp.write(log_line)
213
214start_time = time.time()
215conn.close()
216end_time = time.time()
217result_time = CLOSE = end_time - start_time
218print("CLOSE: %g seconds" % (result_time))
219```
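
The script takes the database path, the batch size and the repeat count on the command line, so the run described above would be invoked as `python sqlite-benchmark.py bench.db 1000 1000` (the database file name here is mine).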
220
221You can download the [raw results here](/assets/do-fuse/sqlite-benchmarks.tsv). And again, these results were produced on local block storage and do not represent the capabilities of object storage. With my current approach and the state of the test code, that test cannot be done; I would need to make the Python code much more robust and handle locking etc.
222
223<div id="sqlite-benchmarks"></div>
224<script>
225(function(){
226 var request = new XMLHttpRequest();
227 request.open("GET", "/assets/do-fuse/sqlite-benchmarks.tsv", true);
228 request.onload = function() {
229 if (request.status >= 200 && request.status < 400) {
230 var payload = request.responseText.trim();
231 var tsv = payload.split("\n");
232 for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
233 var traces = [];
234 var headers = tsv[0];
235 tsv.shift();
236 Array.prototype.forEach.call(headers, function(el, idx) {
237 var x = [];
238 var y = [];
239 for (var j=0; j<tsv.length; j++) {
240 x.push(j);
241 y.push(parseFloat(tsv[j][idx].replace(",", ".")));
242 }
243 traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
244 });
245 var sqlite = Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } });
246 } else { }
247 };
248 request.onerror = function() { };
249 request.send(null);
250})();
251</script>
252
253## Can storage be mounted on multiple machines at the same time and be writable?
254
255Well, this one didn't take long to test. And the answer is **YES**. I mounted the Space on both machines and measured the same performance on both. But because a file is downloaded before a write and uploaded on completion, there could be problems if another process tries to access the same file at the same time.
256
257## Observations and conclusion
258
259Using Spaces in this way makes it easier to access and manage files. But beyond that, you would need to write additional code to make it play nice with your applications.
260
261Nevertheless, this was extremely simple to set up and use, and it is just another excellent product in the DigitalOcean line. I found this exercise very valuable and am thinking about implementing some sort of mechanism for SQLite, so data can be stored on Spaces and accessed by many VMs. For a project where data doesn't need to be accessible in real time and can be a couple of minutes stale, this would be very interesting. If any of you find this proposal interesting, please write in the comment box below or shoot me an email and I will keep you posted.
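
A minimal sketch of what such a mechanism could look like, assuming the Space is mounted via FUSE at `/mnt` and that a couple of minutes of staleness is acceptable (the paths and interval are my placeholders, not a finished design):

```python
import os
import shutil
import sqlite3
import time

SPACES_COPY = "/mnt/app.db"  # the database file living on the mounted Space
LOCAL_COPY = "/tmp/app.db"   # queries always run against a local copy
MAX_AGE = 120                # seconds of staleness we are willing to accept

def fresh_connection():
    # pull the database from the Space if the local copy is missing or too old
    stale = (not os.path.exists(LOCAL_COPY)
             or time.time() - os.path.getmtime(LOCAL_COPY) >= MAX_AGE)
    if stale:
        shutil.copyfile(SPACES_COPY, LOCAL_COPY)
    return sqlite3.connect(LOCAL_COPY)

def publish(conn):
    # commit locally, then push the whole file back to the Space
    conn.commit()
    conn.close()
    shutil.copyfile(LOCAL_COPY, SPACES_COPY)
```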
diff --git a/content/2019-01-03-encoding-binary-data-into-dna-sequence.md b/content/2019-01-03-encoding-binary-data-into-dna-sequence.md
deleted file mode 100644
index 335b868..0000000
--- a/content/2019-01-03-encoding-binary-data-into-dna-sequence.md
+++ /dev/null
@@ -1,346 +0,0 @@
1~ title: Encoding binary data into DNA sequence
2~ description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
3~ slug: /encoding-binary-data-into-dna-sequence.html
4~ date: 2019-01-03
5~ template: post
6~ hide: false
7
8## Initial thoughts
9
10Imagine a world where you could go outside, take a leaf from a tree, put it through your personal DNA sequencer and get data like music, videos or computer programs from it. Well, this is all possible now. It has not been done on a large scale because creating DNA strands is quite expensive, but it's possible.
11
12Encoding data into a DNA sequence is a relatively simple process once you understand the relationship between binary data and nucleotides. Scientists have been making large leaps in this field in order to provide a viable long-term storage solution for our data, one that would potentially survive our species in case of a global disaster. We could imprint all the world's knowledge into plants and ensure the survival of our knowledge.
13
14A more optimistic use for this technology would be easier storage of the ever-growing data we produce every day. Once machines for sequencing DNA become fast enough and cheap enough, this could mean the next evolution of data storage, abandoning classical hard drives and solid-state drives in data warehouses.
15
16As things currently stand this is still not viable, but it is quite an amazing and cool technology.
17
18My interest in this field is purely in the encoding process and experimental testing, mainly because I don't have access to these expensive machines. My initial goal was to create a toolkit that can be used by everybody to encode their data into a proper DNA sequence.
19
20## Glossary
21
22**deoxyribose**
23: A five-carbon sugar molecule with a hydrogen atom rather than a hydroxyl group in the 2′ position; the sugar component of DNA nucleotides.
24
25**double helix**
26: The molecular shape of DNA in which two strands of nucleotides wind around each other in a spiral shape.
27
28**nitrogenous base**
29: A nitrogen-containing molecule that acts as a base; often referring to one of the purine or pyrimidine components of nucleic acids.
30
31**phosphate group**
32: A molecular group consisting of a central phosphorus atom bound to four oxygen atoms.
33
34**RGB**
35: The RGB color model is an additive color model in which red, green and blue light are added together in various ways to reproduce a broad array of colors.
36
37**GCC**
38: The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages.
39
40## Data encoding
41
42**TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process.
43
44Encoding is the process of converting data into a format required for a number of information processing needs, including:
45
46- Program compiling and execution
47- Data transmission, storage and compression/decompression
48- Application data processing, such as file conversion
49
50Encoding can have two meanings:
51
52- In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher.
53- In electronics, encoding refers to analog to digital conversion.
54
55## Quick history of DNA
56
57- **1869** - Friedrich Miescher identifies "nuclein".
58- **1900s** - The Eugenics Movement.
59- **1900** – Mendel's theories are rediscovered by researchers.
60- **1944** - Oswald Avery identifies DNA as the 'transforming principle'.
61- **1952** - Rosalind Franklin photographs crystallized DNA fibres.
62- **1953** - James Watson and Francis Crick discover the double helix structure of DNA.
63- **1965** - Marshall Nirenberg is the first person to sequence the bases in each codon.
64- **1983** - Huntington's disease is the first mapped genetic disease.
65- **1990** - The Human Genome Project begins.
66- **1995** - Haemophilus Influenzae is the first bacterium genome sequenced.
67- **1996** - Dolly the sheep is cloned.
68- **1999** - First human chromosome is decoded.
69- **2000** – Genetic code of the fruit fly is decoded.
70- **2002** – Mouse is the first mammal to have its genome decoded.
71- **2003** – The Human Genome Project is completed.
72- **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup.
73
74## What is DNA?
75
76Deoxyribonucleic acid, a self-replicating material which is **present in nearly all living organisms** as the main constituent of chromosomes. It is the **carrier of genetic information**.
77
78> The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.
79>
80> **-- Carl Sagan, Cosmos**
81
82The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine bases. The sugar and the base together are called a nucleoside.
83
84![DNA](/assets/dna-sequence/dna-basics.jpg#center)
85
86*DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)*
87
88## Encode binary data into DNA sequence
89
90As an input file you can use any file you want:
91- ASCII files,
92- Compiled programs,
93- Multimedia files (MP3, MP4, MKV, etc.),
94- Images,
95- Database files,
96- etc.
97
98Note: If you copy all the bytes from RAM to a file, or pipe data to a file, you can encode that data as well, as long as you provide a file pointer to the encoder.
99
100### Basic Encoding
101
102As already mentioned, the Basic Encoding is based on a simple mapping. DNA is composed of 4 nucleotides (Adenine, Cytosine, Guanine, Thymine; usually referred to by their first letter), so using this technique we can encode
103
104$$ \log_2(4) = \log_2(2^2) = 2 \text{ bits} $$
105
106using a single nucleotide. In this way, we are able to use the 4 bases that compose the DNA strand to encode each byte of data.
107
108| Two bits | Nucleotides |
109| -------- | ---------------- |
110| 00 | **A** (Adenine) |
111| 01       | **G** (Guanine)  |
112| 10       | **C** (Cytosine) |
113| 11 | **T** (Thymine) |
114
115With this in mind we can encode any data with a simple two-bits-to-nucleotide conversion (a runnable Python sketch):
116
117```python
118# Algorithm 1: Naive byte array to DNA encode
119NUCLEOTIDES = {"00": "A", "01": "G", "10": "C", "11": "T"}
120
121def encode_to_dna_sequence(f):
122    enc = ""
123    while True:
124        c = f.read(1)  # read 1 byte from the stream
125        if not c:  # end of file
126            break
127        bits = format(c[0], "08b")  # byte as an 8-character binary string
128        for e in range(0, 8, 2):  # two bits per nucleotide
129            enc += NUCLEOTIDES[bits[e:e + 2]]
130    return enc  # return the DNA sequence
131
132# usage (open the input in binary mode):
133# with open("quote.txt", "rb") as fp:
134#     print(encode_to_dna_sequence(fp))
135
136```
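
Decoding is simply the inverse mapping: two bits per nucleotide, eight bits per byte. A minimal sketch (my illustration, not the toolkit's actual decoder):

```python
# Inverse of the table above: each nucleotide carries two bits.
BITS = {"A": "00", "G": "01", "C": "10", "T": "11"}

def decode_dna_sequence(seq):
    bits = "".join(BITS[n] for n in seq)
    # regroup the bit string into bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```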
137
138Another encoding would be **Goldman encoding**. Using it helps with nonsense mutations (an amino acid replaced by a stop codon), which are the most problematic kind during translation because they lead to truncated amino acid sequences, which in turn result in truncated proteins.
139
140[Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU)
141
142### FASTA file format
143
144In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics.
145
146Originally, the first line in a FASTA file starting either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored).
147
148```text
149;LCBO - Prolactin precursor - Bovine
150; a sample sequence in FASTA format
151MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
152EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
153VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
154ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*
155
156>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
157ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
158FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
159DIDGDGQVNYEEFVQMMTAK*
160
161>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
162LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
163EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
164LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
165GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
166IENY
167```
168
169FASTA format was extended by [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) format from the [Sanger Centre](https://www.sanger.ac.uk/) in Cambridge.
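
Producing this format from an encoded sequence is straightforward: write one defline, then wrap the sequence at a fixed column width. A quick sketch (mirroring the 60-column default the toolkit uses below; not its actual code):

```python
def write_fasta(path, sequence, defline="SEQ1", columns=60):
    # one ">" header line, then the sequence wrapped at a fixed width
    with open(path, "w") as fp:
        fp.write(">%s\n" % defline)
        for i in range(0, len(sequence), columns):
            fp.write(sequence[i:i + columns] + "\n")
```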
170
171### PNG encoded DNA sequence
172
173| Nucleotides | RGB | Color name |
174| ------------- | ----------- | ---------- |
175| A -> Adenine | (0,0,255) | Blue |
176| G -> Guanine | (0,100,0) | Green |
177| C -> Cytosine | (255,0,0) | Red |
178| T -> Thymine | (255,255,0) | Yellow |
179
180With this in mind we can create a simple algorithm that turns a DNA sequence into a PNG representation (sketched below as runnable Python, using Pillow for image output).
181
182```python
183# Algorithm 2: Naive DNA to PNG encode from a FASTA file
184# (runnable sketch using Pillow; colors follow the table above)
185from PIL import Image
186
187COLORS = {"A": (0, 0, 255), "G": (0, 100, 0), "C": (255, 0, 0), "T": (255, 255, 0)}
188
189def encode_dna_sequence_to_png(f, out, size=10, columns=60):
190    seq = [c for line in f if not line.startswith(">") for c in line.strip()]
191    img = Image.new("RGB", (columns * size, -(-len(seq) // columns) * size))
192    for n, c in enumerate(seq):  # one colored square per base
193        x, y = (n % columns) * size, (n // columns) * size
194        for dx in range(size):
195            for dy in range(size): img.putpixel((x + dx, y + dy), COLORS[c])
196    img.save(out)  # save the PNG image
197```
198
199## Encoding text file in practice
200
201In this example we will take a simple text file as our input stream for encoding. The file contains a quote from Niels Bohr and is saved as a .txt file.
202
203> How wonderful that we have met with a paradox. Now we have some hope of making progress.
204> ― Niels Bohr
205
206First we encode the text file into a FASTA file.
207
208```bash
209./dnae-encode -i quote.txt -o quote.fa
2102019/01/10 00:38:29 Gathering input file stats
2112019/01/10 00:38:29 Starting encoding ...
212 106 B / 106 B [==================================] 100.00% 0s
2132019/01/10 00:38:29 Saving to FASTA file ...
2142019/01/10 00:38:29 Output FASTA file length is 438 B
2152019/01/10 00:38:29 Process took 987.263µs
2162019/01/10 00:38:29 Done ...
217```
218
219The resulting `quote.fa` file contains the encoded DNA sequence in ASCII format.
220
221```text
222>SEQ1
223GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
224GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
225ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
226ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
227GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
228GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
229AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
230AACC
231```
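
The numbers check out: every input byte becomes four nucleotides, and the FASTA overhead is the defline plus one newline per 60-character row:

$$ 106 \times 4 = 424 \text{ nucleotides}, \qquad 424 + 6 + 8 = 438 \text{ B} $$

(6 bytes for the `>SEQ1` line including its newline, and 8 newlines for the 8 rows of sequence.)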
232
233Then we take the FASTA file from the previous operation and encode that data into a PNG.
234
235```bash
236./dnae-png -i quote.fa -o quote.png
2372019/01/10 00:40:09 Gathering input file stats ...
2382019/01/10 00:40:09 Deconstructing FASTA file ...
2392019/01/10 00:40:09 Compositing image file ...
240 424 / 424 [==================================] 100.00% 0s
2412019/01/10 00:40:09 Saving output file ...
2422019/01/10 00:40:09 Output image file length is 1.1 kB
2432019/01/10 00:40:09 Process took 19.036117ms
2442019/01/10 00:40:09 Done ...
245```
246
247After encoding into PNG format, the file looks like this.
248
249![Encoded Quote in PNG format](/assets/dna-sequence/quote.png)
250
251The larger the input stream, the larger the PNG file will be.
252
253A basic Hello World C program compiled with [GCC](https://www.gnu.org/software/gcc/) would [look like this](/assets/dna-sequence/sample.png).
254
255```c
256// gcc -O3 -o sample sample.c
257#include <stdio.h>
258
259int main(void) {
260 printf("Hello, world!\n");
261 return 0;
262}
263```
264
265## Toolkit for encoding data
266
267I have created a toolkit with two main programs:
268- dnae-encode (encodes file into FASTA file)
269- dnae-png (encodes FASTA file into PNG)
270
271Toolkit with full source code is available on [github.com/mitjafelicijan/dna-encoding](https://github.com/mitjafelicijan/dna-encoding).
272
273### dnae-encode
274
275```bash
276> ./dnae-encode --help
277usage: dnae-encode --input=INPUT [<flags>]
278
279A command-line application that encodes file into DNA sequence.
280
281Flags:
282 --help Show context-sensitive help (also try --help-long and --help-man).
283 -i, --input=INPUT Input file (ASCII or binary) which will be encoded into DNA sequence.
284 -o, --output="out.fa" Output file which stores DNA sequence in FASTA format.
285 -s, --sequence=SEQ1 The description line (defline) or header/identifier line, gives a name and/or a unique identifier for the sequence.
286 -c, --columns=60 Row characters length (no more than 120 characters). Devices preallocate fixed line sizes in software.
287 --version Show application version.
288```
289
290### dnae-png
291
292```bash
293> ./dnae-png --help
294usage: dnae-png --input=INPUT [<flags>]
295
296A command-line application that encodes FASTA file into PNG image.
297
298Flags:
299 --help Show context-sensitive help (also try --help-long and --help-man).
300 -i, --input=INPUT Input FASTA file which will be encoded into PNG image.
301 -o, --output="out.png" Output file in PNG format that represents DNA sequence in graphical way.
302 -s, --size=10 Size of pairings of DNA bases on image in pixels (lower resolution lower file size).
303 --version Show application version.
304```
305
306## Benchmarks
307
308First we generate some binary sample data with dd.
309
310```bash
311dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock
312```
313
314Our freshly generated 1KB file looks something like this (it's full of garbage data, as intended).
315
316![Sample binary file 1KB](/assets/dna-sequence/sample-binary-file.png)
317
318We create following binary files:
319- 1KB.bin
320- 10KB.bin
321- 100KB.bin
322- 1MB.bin
323- 10MB.bin
324- 100MB.bin
325
326After this we create FASTA files for all the binary files by encoding them into DNA sequence.
327
328```bash
329./dnae-encode -i 100MB.bin -o 100MB.fa
330```
331
332Then we gzip all the FASTA files to see how much they can be compressed.
333
334```bash
335gzip -9 < 10MB.fa > 10MB.fa.gz
336```
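
Each character in the FASTA file occupies eight bits of ASCII but, as shown above, carries only two bits of information, so a good compressor should approach

$$ \frac{2}{8} = 25\% $$

of the FASTA size, i.e. roughly the size of the original binary. (A back-of-the-envelope expectation, not a measured result; see the ODS file below for the actual numbers.)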
337
338[Download ODS file with benchmarks](/assets/dna-sequence/benchmarks.ods).
339
340## References
341
342- https://www.techopedia.com/definition/948/encoding
343- https://www.dna-worldwide.com/resource/160/history-dna-timeline
344- https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/
345- https://arxiv.org/abs/1801.04774
346- https://en.wikipedia.org/wiki/FASTA_format
diff --git a/content/2019-10-14-simplifying-and-reducing-clutter.md b/content/2019-10-14-simplifying-and-reducing-clutter.md
deleted file mode 100644
index 580839b..0000000
--- a/content/2019-10-14-simplifying-and-reducing-clutter.md
+++ /dev/null
@@ -1,22 +0,0 @@
1~ title: Simplifying and reducing clutter in my life and work
2~ description: Simplifying and reducing clutter in my life and work
3~ slug: /simplifying-and-reducing-clutter.html
4~ date: 2019-10-14
5~ template: post
6~ hide: false
7
8I recently moved my main working machine back from a Hackintosh to Linux. The experiment was interesting and I did some great work on macOS, but it was time to move back.
9
10I actually really missed Linux. The simplicity of `apt-get`, or just the amount of software that exists for Linux, makes it a no-brainer. I spent most of my time on macOS finding workarounds to make things work. Using [Brew](https://brew.sh/) was just a horrible experience and far from the package managers of Linux. At least they managed to get that `sudo` debacle sorted.
11
12Not all was bad. macOS in general was a perfectly good environment. Things like Docker and similar tooling worked without any hiccups. My usual tools, like my coding IDE, worked flawlessly, and the whole look and feel is just superb. I had been using a MacBook Air for a couple of years, so I was used to the system, but never as a daily driver.
13
14One of the things I did after I installed Linux back on my machine was cleaning up my Dropbox folder. I have everything on Dropbox, even my projects folder. I write code for a living, so my whole life revolves around a couple of megs of code (with assets). It's not like I have huge files on my machine. I don't keep movies, music or pictures on my PC; all of that is in the cloud. I use Google Music and I have a Netflix account, which is more than enough for me.
15
16I also went and deleted some of the repositories on my GitHub account. I have deleted more code than I have deployed. People find this strange, but for me deleting something feels cathartic and forces me to write better code the next time I face a similar problem. That was a huge relief, if I am being totally honest.
17
18The next step was to do something with my webpage. I had been using some scripts I wrote a while ago to generate static pages from markdown sources. I kept adding stuff on top of them and they became a source of frustration. And this is just a simple blog, yet I was using gulp and npm. Anyway, after a couple of hours of searching and testing static generators I found an interesting one, [https://github.com/piranha/gostatic](https://github.com/piranha/gostatic), and decided to use it. It was the only one with a simple templating engine, not that I really need one. The others had convoluted ways of trying to solve everything and in the end required a bigger learning curve than I was ready to accept. So I deleted a couple of old posts, simplified the HTML, trashed most of the CSS and went with the [https://motherfuckingwebsite.com/](https://motherfuckingwebsite.com/) aesthetic. Yes, the previous site was more visually stimulating, but all I really care about at this point is the content. And the Times New Roman font is kind of awesome.
19
20I stopped working on most of my projects in the past couple of months because the overhead was just insane. There comes a point when you stretch yourself too thin: you stop progressing, and with that comes dissatisfaction.
21
22So that's about it. Moving forward minimal style.
diff --git a/content/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/content/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
deleted file mode 100644
index 33e35cb..0000000
--- a/content/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
+++ /dev/null
@@ -1,86 +0,0 @@
1~ title: Using sentiment analysis for click&#8209;bait detection in RSS feeds
2~ description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait
3~ slug: /using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.html
4~ date: 2019-10-19
5~ template: post
6~ hide: false
7
8## Initial thoughts
9
10One of the things that has interested me for a while now is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions.
11
12The goal is to see how article titles and the actual content of articles differ from each other, and whether the titles are click-baited.
13
14## Preparing and cleaning data
15
16For this example I opted to just use the RSS feed of a news website and decided to go with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and the description (the actual content) is trimmed, so it doesn't fully reflect the article contents.
17
18To get better content I could use the RSS feed as a link list and scrape the contents directly from the website, but for this simple example this will suffice.
19
20There are a couple of requirements we need to install before we continue:
21
22- `pip3 install feedparser` (parses RSS feed from url)
23- `pip3 install vaderSentiment` (does sentiment polarity analysis)
24- `pip3 install matplotlib` (plots chart of results)
25
26So first we need to fetch the RSS data and sanitize the HTML content of the descriptions.
27
28```python
29import re
30import feedparser
31
32feed_url = "https://www.theguardian.com/world/rss"
33feed = feedparser.parse(feed_url)
34
35# sanitize html
36for item in feed.entries:
37 item.description = re.sub('<[^<]+?>', '', item.description)
38```
39
40## Perform sentiment analysis
41
42Since we now have cleaned-up data in our `feed.entries` object, we can start performing sentiment analysis.
43
44There are many sentiment analysis libraries available, ranging from rule-based analysis up to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). A really nice library and quite easy to use.
45
46```python
47from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
48analyser = SentimentIntensityAnalyzer()
49
50sentiment_results = []
51for item in feed.entries:
52 sentiment_title = analyser.polarity_scores(item.title)
53 sentiment_description = analyser.polarity_scores(item.description)
54 sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
55```
56
57Now that we have this data in a shape compatible with matplotlib, we can plot the results to see the difference between the title and description sentiment of each article.
58
59```python
60import matplotlib.pyplot as plt
61
62plt.rcParams['figure.figsize'] = (15, 3)
63plt.plot(sentiment_results, drawstyle='steps')
64plt.title('Sentiment analysis relationship between title and description (Guardian World News)')
65plt.legend(['title', 'description'])
66plt.show()
67```
68
69## Results and assets
70
711. Because of the small sample size, further conclusions are impossible to draw.
722. A rule-based approach may not be the best way of doing this. Using deep learning, we would be able to get better insights.
733. **The next step would be** to periodically fetch RSS items, store them over a longer period of time, and then perform the analysis again using machine learning or deep learning on top of it.
74
75![Relationship between title and description](/assets/sentiment-analysis/guardian-sa-title-desc-relationship.png)
76
77The figure above displays the difference between title and description sentiment for each RSS feed item; 1 means positive and -1 means negative sentiment.
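
A crude way to turn this into an actual click-bait flag is to look at the gap between the two compound scores. A sketch that builds on the objects above (the 0.5 threshold is an arbitrary assumption, not a validated value):

```python
CLICKBAIT_GAP = 0.5  # assumed threshold, tune on real data

for item, (title_score, desc_score) in zip(feed.entries, sentiment_results):
    # a title much more emotional than its body is suspicious
    if abs(title_score - desc_score) >= CLICKBAIT_GAP:
        print("possible click-bait:", item.title)
```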
78
79[» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)
80
81## Going further
82
83- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood)
84- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
85- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
86- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)
diff --git a/content/2020-03-22-simple-sse-based-pubsub-server.md b/content/2020-03-22-simple-sse-based-pubsub-server.md
deleted file mode 100644
index d7bd451..0000000
--- a/content/2020-03-22-simple-sse-based-pubsub-server.md
+++ /dev/null
@@ -1,396 +0,0 @@
1~ title: Simple Server-Sent Events based PubSub Server
2~ description: PubSub server made with Server-Sent Events
3~ slug: /simple-server-sent-events-based-pubsub-server.html
4~ date: 2020-03-22
5~ template: post
6~ hide: false
7
8## Before we continue ...
9
10The Publisher/Subscriber model is nothing new, and there are many amazing solutions out there, so writing a new one would be a waste of time if the existing solutions didn't have quite complex install procedures and weren't so hard to maintain. To be fair, comparing this simple server with something like [Kafka](https://kafka.apache.org/) or [RabbitMQ](https://www.rabbitmq.com/) is laughable to say the least. Those solutions are enterprise grade and have many mechanisms in place to ensure messages aren't lost, and much more. Regardless of these drawbacks, this method has been tested on a large website and has worked so far without any problems. So now that we have that cleared up, let's continue.
11
12***Wiki definition:** Publish/subscribe messaging, or pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all the subscribers to the topic.*
13
14## General goals
15
16- provide a simple server that relays messages to all the connected clients,
17- messages can be posted on specific topics,
18- messages get sent via [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) to all the subscribers.
19
20## How exactly does the pub/sub model work?
21
22The easiest way to explain this is with the diagram below. The basic function is simple: we have subscribers that receive messages, and we have publishers that create and post messages. A similar, well-known pattern works on the premise of consumers and producers, which take on similar roles.
23
24![How PubSub works](/assets/simple-pubsub-server/pubsub-overview.png)
25
26**These are some naive characteristics we want to achieve:**
27
28- a producer publishes messages to a topic,
29- a consumer receives messages from its subscribed topics,
30- the server is also known as a Broker,
31- the broker does not store messages or track delivery success,
32- the broker uses the [FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) method for delivering messages,
33- for a consumer to receive messages from a topic, the producer and consumer topics must match,
34- a consumer can subscribe to multiple topics,
35- a producer can publish to multiple topics,
36- each message has a messageId.
37
38**Known drawbacks:**
39
40- messages are not stored in a persistent queue, and there is no [DeadLetterQueue](https://en.wikipedia.org/wiki/Dead_letter_queue) for unreceived messages, so old messages can be lost on a server restart,
41- [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) opens a long-running connection between the client and the server, so if your setup is load balanced make sure the load balancer can keep long-lived connections open,
42- no system moderation, due to the dynamic nature of creating queues.
43
44## Server-Sent Events
45
46Read more about it on [official specification page](https://html.spec.whatwg.org/multipage/server-sent-events.html).
47
48### Current browser support
49
50![Browser support](../assets/simple-pubsub-server/caniuse.png)
51
52Check [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource) for latest information about browser support.
53
54### Known issues
55
56- Firefox 52 and below do not support EventSource in web/shared workers
57- In Firefox prior to version 36 server-sent events do not reconnect automatically in case of a connection interrupt (bug)
58- Reportedly, CORS in EventSource is currently supported in Firefox 10+, Opera 12+, Chrome 26+, Safari 7.0+.
59- Antivirus software may block the event streaming data chunks.
60
61Source: [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource)
62
63### Message format
64
65The simplest message that can be sent contains only a data attribute:
66
67```bash
68data: this is a simple message
69<blank line>
70```
71
72You can send message IDs to be used if the connection is dropped:
73
74```bash
75id: 33
76data: this is line one
77data: this is line two
78<blank line>
79```
80
81And you can specify your own event types (the above messages will all trigger the message event):
82
83```bash
84id: 36
85event: price
86data: 103.34
87<blank line>
88```
89
90### Server requirements
91
92The important thing is which headers the server sends, since these are what trigger the browser to treat the response as an EventStream.
93
94Headers responsible for this are:
95
96```bash
97Content-Type: text/event-stream
98Cache-Control: no-cache
99Connection: keep-alive
100```
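
To see these headers in action before touching Node, here is a bare-bones endpoint using only Python's standard library (an illustration of the header handshake, not the server we build below):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # the three headers that make the browser treat this as an EventStream
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Connection", "keep-alive")
        self.end_headers()
        for i in range(5):  # stream a few demo events
            self.wfile.write(("id: %d\ndata: tick %d\n\n" % (i, i)).encode())
            self.wfile.flush()
            time.sleep(1)

HTTPServer(("localhost", 8001), SSEHandler).serve_forever()
```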
101
102### Debugging with Google Chrome
103
104Google Chrome provides a built-in debugging and exploration tool for [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events), which is quite nice and available from Developer Tools under the Network tab.
105
106> You can only debug the client-side events that get received, not the server-side ones. For debugging server events, add `console.log` calls to the `server.js` code and print the events out.
107
108![Google Chrome Developer Tools EventStream](../assets/simple-pubsub-server/chrome-debugging.png)
109
110## Server implementation
111
112For the sake of this example we will use [Node.js](https://nodejs.org/en/) with [Express](https://expressjs.com) as our router, since this is the easiest way to get started, and we will use an existing SSE library for Node, [sse-pubsub](https://www.npmjs.com/package/sse-pubsub), so we don't reinvent the wheel.
113
114```bash
115npm init --yes
116
117npm install express
118npm install body-parser
119npm install sse-pubsub
120```
121
122Basic implementation of a server (`server.js`):
123
124```js
125const express = require('express');
126const bodyParser = require('body-parser');
127const SSETopic = require('sse-pubsub');
128
129const app = express();
130const port = process.env.PORT || 4000;
131
132// topics container
133const sseTopics = {};
134
135app.use(bodyParser.json());
136
137// open for all cors
138app.all('*', (req, res, next) => {
139 res.header('Access-Control-Allow-Origin', '*');
140 res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type');
141 next();
142});
143
144// preflight request error fix
145app.options('*', async (req, res) => {
146 res.header('Access-Control-Allow-Origin', '*');
147 res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type');
148 res.send('OK');
149});
150
151// serve the event streams
152app.get('/stream/:topic', async (req, res, next) => {
153 const topic = req.params.topic;
154
155 if (!(topic in sseTopics)) {
156 sseTopics[topic] = new SSETopic({
157 pingInterval: 0,
158 maxStreamDuration: 15000,
159 });
160 }
161
162 // subscribing client to topic
163 sseTopics[topic].subscribe(req, res);
164});
165
166// accepts new messages into topic
167app.post('/publish', async (req, res) => {
168 let body = req.body;
169 let status = 200;
170
171 console.log('Incoming message:', req.body);
172
173 if (
174 body.hasOwnProperty('topic') &&
175 body.hasOwnProperty('event') &&
176 body.hasOwnProperty('message')
177 ) {
178 const topic = req.body.topic;
179 const event = req.body.event;
180 const message = req.body.message;
181
182 if (topic in sseTopics) {
183 // sends message to all the subscribers
184 sseTopics[topic].publish(message, event);
185 }
186 } else {
187 status = 400;
188 }
189
190 res.status(status).send({
191 status,
192 });
193});
194
195// returns JSON object of all opened topics
196app.get('/status', async (req, res) => {
197 res.send(sseTopics);
198});
199
200// health-check endpoint
201app.get('/', async (req, res) => {
202 res.send('OK');
203});
204
205// return a 404 if no routes match
206app.use((req, res, next) => {
207 res.set('Cache-Control', 'private, no-store');
208 res.status(404).end('Not found');
209});
210
211// starts the server
212app.listen(port, () => {
213 console.log(`PubSub server running on http://localhost:${port}`);
214});
215```
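
With the dependencies installed, start the broker with `node server.js`; the `/status` endpoint then returns a JSON object of all currently opened topics, which is handy while following along.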
216
217### Our custom message format
218
219Each message posted to the server must be in a specific format that our server accepts. Having a structure like this allows us to have multiple separate types of events on each topic.
220
221With this we can separate streams and only receive events that belong to the topic.
222
223One example would be that we have an index page and we want to receive messages about new upvotes or new subscribers, but we don't want to follow events for other pages. This reduces clutter and overall network traffic, and the structure is much nicer and more maintainable.
224
225```json
226{
227 "topic": "sample-topic",
228 "event": "sample-event",
229 "message": { "name": "John" }
230}
231```
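
Publishing such a message does not require a browser; any backend can do a plain HTTP POST. A sketch using Python's requests library (the URL assumes the default port from `server.js`):

```python
import requests

resp = requests.post("http://localhost:4000/publish", json={
    "topic": "sample-topic",
    "event": "sample-event",
    "message": {"name": "John"},
})
# 200 if the payload had topic, event and message; 400 otherwise
print(resp.status_code)
```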
232
233## Publisher and subscriber clients
234
235### Publisher and subscriber in action
236
237<video src="/assets/simple-pubsub-server/clients.mp4" controls></video>
238
239You can download [the code](../assets/simple-pubsub-server/sse-pubsub-server.zip) and follow along.
240
241### Publisher
242
243As discussed above, the publisher is the one that sends messages to the broker/server. The message inside the payload can be whatever you want (string, object, array). I would, however, personally avoid sending large chunks of data like blobs.
244
245```html
246<!DOCTYPE html>
247<html lang="en">
248
249 <head>
250 <meta charset="UTF-8">
251 <meta name="viewport" content="width=device-width, initial-scale=1.0">
252 <title>Publisher</title>
253 </head>
254
255 <body>
256
257 <h1>Publisher</h1>
258
259 <fieldset>
260 <p>
261 <label>Server:</label>
262 <input type="text" id="server" value="http://localhost:4000">
263 </p>
264 <p>
265 <label>Topic:</label>
266 <input type="text" id="topic" value="sample-topic">
267 </p>
268 <p>
269 <label>Event:</label>
270 <input type="text" id="event" value="sample-event">
271 </p>
272 <p>
273 <label>Message:</label>
274 <input type="text" id="message" value='{"name": "John"}'>
275 </p>
276 <p>
277 <button type="button" id="button">Publish message to topic</button>
278 </p>
279 </fieldset>
280
281 <script>
282
283 const button = document.querySelector('#button');
284 const server = document.querySelector('#server');
285 const topic = document.querySelector('#topic');
286 const event = document.querySelector('#event');
287 const message = document.querySelector('#message');
288
289 button.addEventListener('click', async (evt) => {
290 const req = await fetch(`${server.value}/publish`, {
291 method: 'post',
292 headers: {
293 'Accept': 'application/json',
294 'Content-Type': 'application/json',
295 },
296 body: JSON.stringify({
297 topic: topic.value,
298 event: event.value,
299 message: JSON.parse(message.value),
300 }),
301 });
302
303 const res = await req.json();
304 console.log(res);
305 });
306
307 </script>
308
309 </body>
310
311</html>
312
313```
314
315### Subscriber
316
317The subscriber is responsible for receiving new messages that come from the server via the publisher. The code below is very rudimentary, but it works and follows the implementation guidelines for EventSource.
318
319You can either use the Developer Tools Console to see incoming messages, or refer to the Debugging with Google Chrome section above to see all EventStream messages.
320
321> Don't be alarmed if the subscriber gets disconnected from the server every so often. The code we have here resets the connection every 15s, but it automatically reconnects and fetches all messages since the last received message id. This setting can be adjusted in the `server.js` file; search for the `maxStreamDuration` variable.
322
323```html
324<!DOCTYPE html>
325<html lang="en">
326
327 <head>
328 <meta charset="UTF-8">
329 <meta name="viewport" content="width=device-width, initial-scale=1.0">
330 <title>Subscriber</title>
331 <link rel="stylesheet" href="style.css">
332 </head>
333
334 <body>
335
336 <h1>Subscriber</h1>
337
338 <fieldset>
339 <p>
340 <label>Server:</label>
341 <input type="text" id="server" value="http://localhost:4000">
342 </p>
343 <p>
344 <label>Topic:</label>
345 <input type="text" id="topic" value="sample-topic">
346 </p>
347 <p>
348 <label>Event:</label>
349 <input type="text" id="event" value="sample-event">
350 </p>
351 <p>
352 <button type="button" id="button">Subscribe to topic</button>
353 </p>
354 </fieldset>
355
356 <script>
357
358 const button = document.querySelector('#button');
359 const server = document.querySelector('#server');
360 const topic = document.querySelector('#topic');
361 const event = document.querySelector('#event');
362
363 button.addEventListener('click', async (evt) => {
364
365 let es = new EventSource(`${server.value}/stream/${topic.value}`);
366
367 es.addEventListener(event.value, function (evt) {
368 console.log(`incoming message`, JSON.parse(evt.data));
369 });
370
371 es.addEventListener('open', function (evt) {
372 console.log('connected', evt);
373 });
374
375 es.addEventListener('error', function (evt) {
376 console.log('error', evt);
377 });
378
379 });
380
381 </script>
382
383 </body>
384
385</html>
386
387```
388
389## Reading further
390
391- [Using server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events)
392- [Using SSE Instead Of WebSockets For Unidirectional Data Flow Over HTTP/2](https://www.smashingmagazine.com/2018/02/sse-websockets-data-flow-http2/)
393- [What is Server-Sent Events?](https://apifriends.com/api-streaming/server-sent-events/)
394- [An HTTP/2 extension for bidirectional messaging communication](https://tools.ietf.org/id/draft-xie-bidirectional-messaging-01.html)
395- [Introduction to HTTP/2](https://developers.google.com/web/fundamentals/performance/http2)
396- [The WebSocket API (WebSockets)](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)
diff --git a/content/2020-03-27-create-placeholder-images-with-sharp.md b/content/2020-03-27-create-placeholder-images-with-sharp.md
deleted file mode 100644
index e53a7de..0000000
--- a/content/2020-03-27-create-placeholder-images-with-sharp.md
+++ /dev/null
@@ -1,83 +0,0 @@
1~ title: Create placeholder images with sharp Node.js image processing library
2~ description: Create placeholder images with sharp Node.js image processing library
3~ slug: /create-placeholder-images-with-sharp.html
4~ date: 2020-03-27
5~ template: post
6~ hide: false
7
8I had been searching for a solution to pre-generate some placeholder images for an image server I needed to develop that resizes images on S3. I thought this would be a 15-minute job and quickly found out how very mistaken I was.
9
10Even though Node.js is not really the best tool for this kind of thing (surely something written in C or Rust or even Golang would be the correct way to do it, but we didn't need the speed in our case), I found an excellent library: [sharp - High performance Node.js image processing](https://github.com/lovell/sharp).
11
12Getting things running was a breeze.
13
14## Fetch image from S3 and save resized
15
16```js
17const sharp = require('sharp');
18const aws = require('aws-sdk');
19
20const x = 100, y = 100; // target width and height
21const s3 = new aws.S3({});
22
23aws.config.update({
24 secretAccessKey: 'secretAccessKey',
25 accessKeyId: 'accessKeyId',
26 region: 'region'
27});
28
29const originalImage = await s3.getObject({
30 Bucket: 'some-bucket-name',
31 Key: 'image.jpg',
32}).promise();
33
34const resizedImage = await sharp(originalImage.Body)
35 .resize(x, y)
36 .jpeg({ progressive: true })
37 .toBuffer();
38
39s3.putObject({
40 Bucket: 'some-bucket-name',
41 Key: `optimized/${x}x${y}/image.jpg`,
42 Body: resizedImage,
43 ContentType: 'image/jpeg',
44 ACL: 'public-read'
45}).promise();
46```
47
48All this code was wrapped inside a web service with some additional security checks and defensive coding to detect whether a key is missing on S3.
49
50And at that point I needed to return placeholder images as a response in case the key is missing or x,y are not allowed by the server, etc. I could have created PNGs in Gimp and just served them, but I wanted to respect the aspect ratio and didn't want to return mangled images.
51
52> The main problem was finding a clean solution I could copy, paste and tweak a bit. The API changes constantly, and there weren't clear examples, or I was unable to find them.
53
54## Generating placeholder images using SVG
55
56What I ended up doing was using SVG to generate the text, creating the image with sharp, and using composition to combine both layers. The response returned by this function is a buffer you can either upload to S3 or save to a local file.
57
58```js
59const generatePlaceholderImageWithText = async (width, height, message) => {
60 const overlay = `<svg width="${width - 20}" height="${height - 20}">
61 <text x="50%" y="50%" font-family="sans-serif" font-size="16" text-anchor="middle">${message}</text>
62 </svg>`;
63
64 return await sharp({
65 create: {
66 width: width,
67 height: height,
68 channels: 4,
69 background: { r: 230, g: 230, b: 230, alpha: 1 }
70 }
71 })
72 .composite([{
73 input: Buffer.from(overlay),
74 gravity: 'center',
75 }])
76 .jpeg()
77 .toBuffer();
78}
79```
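
From there, a handler can return something like `await generatePlaceholderImageWithText(300, 200, 'image not found')` as the response body whenever the key lookup fails (the dimensions and message here are just examples).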
80
81That is about it. You can change the color of the image by changing `background`, and if you want different text styling you can adapt the SVG to your needs.
82
83> Also be careful about the length of the text. This function positions the text at the center and leaves `10px` of padding on each side. If the text is wider than the image it will get cut.
diff --git a/content/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md b/content/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
deleted file mode 100644
index db4f377..0000000
--- a/content/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md
+++ /dev/null
@@ -1,76 +0,0 @@
1~ title: The strange case of Elasticsearch allocation failure
2~ description: Elasticsearch allocation failure on some indices while reporting domain processing
3~ slug: /the-strange-case-of-elasticsearch-allocation-failure.html
4~ date: 2020-03-29
5~ template: post
6~ hide: false
7
8I've been using Elasticsearch in production for 5 years now and never had a single problem with it. Hell, I never even knew there could be a problem. It just worked. All this time. The first node I deployed is still in production, never updated, upgraded or touched in any way.
9
10All this bliss came to an abrupt end this Friday when I got a notification that the Elasticsearch cluster went warm. Well, warm is not that bad, right? Wrong! Quickly after that I got another email which sent chills down my spine. The cluster was now red. RED! Now shit really hit the fan!
11
12I tried googling what the problem could be, and after executing the allocation query I noticed that some shards were unassigned and 5 allocation attempts had already been made (which is, to my luck, the maximum). That meant I was basically fucked. The advice was that one should wait for the cluster to re-balance itself. So I waited. One hour, two hours, several hours. Nothing, still RED.
13
14The strangest thing about it all was that queries were still being fulfilled. Data was coming out. From the outside it looked like nothing was wrong, but anybody who looked at the cluster would know immediately that something was very, very wrong and that we were living on borrowed time.
15
16> **Please, DO NOT do what I did.** Seriously! Ask someone on the official forums, or if you know an expert, consult them. There could be a million reasons, and these solutions fit my problem; in your case they might be disastrous. I had all the data backed up, so even if I failed spectacularly I would be able to restore it. It would be a huge pain and I would lose a couple of days, but I had a plan B.
17
18Executing the allocation query told me what the problem was, but offered no clear solution yet.
19
20```yaml
21GET /_cat/allocation?format=json
22```
23
24I got a message that allocation status was `ALLOCATION_FAILED`, with the additional info `failed to create shard, failure ioexception[failed to obtain in-memory shard lock]`. Well, splendid! I must also say that our cluster is more than capable of handling the traffic, and JVM memory pressure was never an issue. So what really happened then?
25
26I also tried re-routing the failed shards, with no success due to AWS restrictions on managed Elasticsearch clusters (they lock some of the functions).
27
28```yaml
29POST /_cluster/reroute?retry_failed=true
30```
31
32I got a message that significantly reduced my options.
33
34```json
35{
36 "Message": "Your request: '/_cluster/reroute' is not allowed."
37}
38```
39
40After that I went hunting again. I won't bother you with all the details, because hours and days went by until I was finally able to re-index the problematic index and hope for the best. Until that moment even re-indexing was giving me errors.
41
42```yaml
43POST _reindex
44{
45 "source": {
46 "index": "myindex"
47 },
48 "dest": {
49 "index": "myindex-new"
50 }
51}
52```
53
54I needed to do this multiple times to get all the documents re-indexed. Then I dropped the original index with the following command.
55
56```yaml
57DELETE /myindex
58```
59
60And then re-indexed the new index back into the original one (well, original by name only).
61
62```yaml
63POST _reindex
64{
65 "source": {
66 "index": "myindex-new"
67 },
68 "dest": {
69 "index": "myindex"
70 }
71}
72```
73
74On the surface it looks like everything is working, but I have a long road ahead of me to get all things working again. The cluster now shows Green, but I am also getting a notification that the cluster has a processing status, which could mean a million things.
75
76Godspeed!
diff --git a/content/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md b/content/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md
deleted file mode 100644
index a57e4a6..0000000
--- a/content/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md
+++ /dev/null
@@ -1,41 +0,0 @@
1~ title: My love and hate relationship with Node.js
2~ description: How I found a way to love and hate Node.js with a passion
3~ slug: /my-love-and-hate-relationship-with-nodejs.html
4~ date: 2020-03-30
5~ template: post
6~ hide: false
7
8The previous project I worked on was coded in [Golang](https://golang.org/). It was also my first project using it. And damn, that was an awesome experience. The whole thing is just superb: how errors are handled, the C-like way compiling works, the way the language is structured, making it incredibly versatile and easy to learn.
9
10It may cause some pain for somebody who is not used to mapping JSON with interfaces and recompiling all the time. But we have tools like [entr](http://eradman.com/entrproject/) and [make](https://www.gnu.org/software/make/) to fix that.
11
12But we are not here to talk about my undying love for **Golang**. Though in some way we probably should. It is an excellent example of how a modern language should be designed. And because I have used it extensively over the last couple of years, it probably taints my view of other languages, doing me a great disservice. Nevertheless, here we are.
13
14About two years ago I started flirting with [Node.js](https://nodejs.org/en/) for a project I was starting. What I wanted was to have things written in a language that is widely used and that we could find additional developers for. As amazing as **Golang** is, it's really hard to find developers for it, even now. After playing around with Node.js for a week I fell in love with the speed of iteration and the massive package ecosystem. Do you want SSO? You got it! Do you want some esoteric library for something? There is a strong chance somebody wrote it. It is so extensive that you find yourself evaluating packages based on **GitHub stars** and the number of contributors. You get swallowed by the vanity metrics, and that could potentially become the downfall of Node.js.
15
16Because of the sheer amount of choice, I often got anxiety when choosing libraries. Will I choose the correct one? Will this library be supported for the foreseeable future or not? I am used to libraries that have been in development for ten-plus years (Python, C), and that gave me some sort of comfort. It is probably unfair to Node.js and its community to expect the same dedication.
17
18Moving forward ... Work started and things were great. **The speed of iteration was insane**. A feature that would take me a day in Golang only took an hour or two. I became lazy! Using packages all over the place. Falling into the same trap as others. Packages on top of packages. And [npm](https://www.npmjs.com/) didn't help at all. The way the package manager works is just horrendous. And not allowing node_modules outside the project is also the stupidest idea ever.
19
20So at that point I started feeling the technical debt that comes with Node.js and the whole ecosystem. What nobody tells you is that **structuring large Node.js apps** is more problematic than one would think. And going microservice for every single thing is also a bad idea: the amount of networking you introduce with that approach always ends up being a pain in the ass. And I don't even want to go into system administration here; the overhead is insane. Package-lock.json made many of my days feel like living hell. I would eat the cost of all this if it meant a better development experience. Well, it didn't.
21
22The **lack of TypeScript** support in the interpreter is still mind-boggling to me. Why they haven't added native support for this yet is beyond me! It would have solved so many problems. The lack of type safety became a problem somewhere in the middle of the project, where the codebase was large enough to cause trouble. We kept adding arguments to functions, and there was **no way to explicitly define argument types**. Because at that point there were a lot of functions, it became impossible to know what each one accepts, and development became more and more trial-and-error based.
23
24I tried **implementing TypeScript**, but that would have meant a large refactor we were not willing to do at that point; the benefits were not enough. I also tried [Flow - static type checker](https://flow.org/), but that implementation was also horrible. What TypeScript and Flow force you to do is keep a src folder, **transpile** your code into a dist folder and run that with node. What is that all about? Why can't this be done in memory or in some virtual file system? I see no reason why it couldn't. But it is what it is. I abandoned all hope for static type checking.
25
26One of the problems that resulted from not having interfaces or types was the inability to model our data from **Elasticsearch**. I could have done a **pedestrian implementation** of it, but there must be a better way of doing this without resorting to what is basically a hack. Or maybe I just haven't found a solution, which is also a possibility. I have looked, though. No juice!
27
28**Error handling?** Is that a joke?
29
30Thank god for **await/async**. Without it, I would probably have abandoned the whole thing and gone with something else like Python. That's all I am going to say about this :)
31
32I started asking myself whether Node.js is actually ready to be used in **large scale applications**. And that was totally the wrong question. What I should have been asking myself was how to use Node.js in a large scale application. And you don't get this in the **marketing material** for Express or Koa etc. They never tell you this. Making Node.js scale, on infrastructure or in the codebase, is really **more of an art than a science**. And just like the whole JavaScript ecosystem:
33- impossible to master,
34- half of your time goes into working on your tooling,
35- you just accept transpilers that convert one kind of code into another (holy smokes),
36- error handling is a joke,
37- standards? What standards?
38
39But on the other hand, as I did, you will also learn to love it. You learn to use it quickly and to do impossible things in crazy limited time.
40
41I hate to admit it. But I love Node.js. Dammit, I love it :)
diff --git a/content/2020-04-05-remote-work.md b/content/2020-04-05-remote-work.md
deleted file mode 100644
index e7a1ff0..0000000
--- a/content/2020-04-05-remote-work.md
+++ /dev/null
@@ -1,37 +0,0 @@
1~ title: Remote work and how it affects the daily lives of people
2~ description: Remote work and how it affects the daily lives of people
3~ slug: /remote-work.html
4~ date: 2020-05-05
5~ template: post
6~ hide: false
7
8I have been working remotely for the past 5 years. I love it. Love the freedom and the make-your-own-schedule thing.
9
10## You work more not less
11
12I've heard things from people like: "Oh, you are so lucky, working from home, having all the free time you want". It was obvious they had no clue what working remotely means. They had this romantic idea of remote work: you can watch TV whenever you like, go outside for a picnic if you want, and so on.
13
14This may be true if you work from home a day or two a week. But if you go completely remote, all of this changes. It takes some time to acclimate, but then you start feeling the consequences of going fully remote. And it's not all rainbows and unicorns. Rather the opposite.
15
16## Feeling lost
17
18At first, I remember feeling lost. I was not used to this kind of environment. I felt disoriented, and the part of you that is used to procrastinating switches on. You start thinking of a workday as a whole day. And soon the idea of "I can do this later" starts creeping in. Well, I have the whole day ahead of me. I can do this a bit later.
19
20## Hyper-performance
21
22As a direct result, you become more focused on your work since you don't have all the interruptions common in the workplace. And you can quickly get used to this hyper-performance. But this mode also requires a lot of peace and quiet.
23
24And here we come to the ugly parts of all this. **People rarely have the self-control** not to waste other people's time. It is paralyzing when people start calling you, sending you chat messages, etc. The thing is, when I achieve this hyper-performance mode I am completely embroiled in the problem I am solving, and these kinds of interruptions mess with your head. I need at least an hour to get back in the zone. Sometimes I don't regain the same focus for the whole day.
25
26I know that life is not how you want it to be and takes its own route, but from what I've learned, these kinds of interruptions can easily be avoided in 90% of cases just by closing any chat programs and putting your phone in a drawer.
27
28## Suggestion to all the new remote workers
29
30- Stop wasting other people's time. You don't bother people at their desks in the office either.
31- Do not replace daily chats in the hallways with instant messaging software. It will only interrupt people. Nothing good will come of it.
32- Set your working hours, try not to let them bleed outside these boundaries, and maintain your routine.
33- Be prepared that hours will be longer regardless of your good intentions and your well-thought-out routine.
34- Try to be hyper-focused and do only one thing at a time. Multitasking is the enemy of progress.
35- Avoid long meetings and, if possible, eliminate them. Rather, take the time to write things out and allow others to respond in their own time. Meetings are usually a large waste of time, and most of the people attending are there just because the manager said so.
36- Software will not solve your problems. Neither will throwing money at them.
37- If you are in a managerial position, don't supervise workers' every minute. They are probably giving you more hours anyway. Track progress weekly, not daily. You hired them; give them the benefit of the doubt that they will deliver what you agreed upon.
diff --git a/content/2020-08-15-systemd-disable-wake-onmouse.md b/content/2020-08-15-systemd-disable-wake-onmouse.md
deleted file mode 100644
index dc22220..0000000
--- a/content/2020-08-15-systemd-disable-wake-onmouse.md
+++ /dev/null
@@ -1,47 +0,0 @@
1~ title: Disable mouse wake from suspend with systemd service
2~ description: Disable mouse wake from suspend with systemd service
3~ slug: /disable-mouse-wake-from-suspend-with-systemd-service.html
4~ date: 2020-08-15
5~ template: post
6~ hide: false
7
8I recently bought a [ThinkPad X220](https://www.laptopmag.com/reviews/laptops/lenovo-thinkpad-x220) on eBay, half as a joke, to test Linux distributions and play around with things without destroying my main machine. Little did I know I would fall in love with it. Man, they really made awesome machines back then.
9
10After swapping the disk that came with it for an SSD and installing Ubuntu to test that everything works, I noticed that a single touch of my external mouse would wake the system from sleep even though the lid was shut.
11
12I wouldn't even have noticed it if the laptop didn't have an [LED sleep indicator](https://support.lenovo.com/lk/en/solutions/~/media/Images/ContentImages/p/pd025386_x1_status_03.ashx?w=426&h=262). I already had a bad experience with Linux and its power management. I had a [Dell Inspiron 7537](https://www.pcmag.com/reviews/dell-inspiron-15-7537) laptop with a touchscreen, and while I was traveling it decided to wake up and started cooking in my backpack, to the point that the digitizer responsible for touch actually came unglued and the whole screen got wrecked. So, I am a bit touchy about this.
13
14I went solution hunting, and to my surprise there is no easy way to stop specific devices from waking the machine. Why this is not under the power management tab in settings is really strange.
15
16After googling for a solution I found [this nice article describing the fix](https://codetrips.com/2020/03/18/ubuntu-disable-mouse-wake-from-suspend/) that worked for me. The only problem was that the author added his solution to `.bashrc`, and this triggers a `sudo` password prompt each time a new terminal is opened, which gets annoying quickly since I open a lot of terminals all the time.
17
18I followed his instructions and arrived at the command `sudo sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup"`.
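
The device path (`2-1.1` above) differs from machine to machine. A quick sketch for finding the right one, assuming the standard sysfs layout:

```sh
# Print the product name of every connected USB device to spot your mouse.
grep . /sys/bus/usb/devices/*/product

# Then check the current wakeup setting of the candidate device.
cat /sys/bus/usb/devices/2-1.1/power/wakeup
```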
19
20I created a system service file with `sudo nano /etc/systemd/system/disable-mouse-wakeup.service`, removed `sudo`, replaced `sh` with `/usr/bin/sh`, and pasted all that into `ExecStart`.
21
22```ini
23[Unit]
24Description=Disables wakeup on mouse event
25
26[Service]
27# oneshot: apply the sysfs setting once at boot instead of
28# endlessly restarting the short-lived echo in a loop
29Type=oneshot
30# keep the unit reported as active after the command exits
31RemainAfterExit=yes
32User=root
33ExecStart=/usr/bin/sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup"
34
35[Install]
36WantedBy=multi-user.target
37```
38
39After that I enabled, started, and checked the status of the service.
40
41```sh
42sudo systemctl enable disable-mouse-wakeup.service
43sudo systemctl start disable-mouse-wakeup.service
44sudo systemctl status disable-mouse-wakeup.service
45```
46
47This will permanently prevent that device from waking up your computer (the service re-applies the setting at every boot). If you have many devices you would like to suppress from waking up your machine, I would create a shell script and call that from the service file instead of doing it directly in `ExecStart`, as sketched below.
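
A minimal sketch of such a script; the device IDs are placeholders you would replace with your own:

```sh
#!/bin/sh
# Disable wakeup for every listed USB device.
# NOTE: 2-1.1 and 2-1.2 are hypothetical IDs; adjust them for your machine.
for dev in 2-1.1 2-1.2; do
    echo disabled > "/sys/bus/usb/devices/$dev/power/wakeup"
done
```

Point `ExecStart` at this script and the unit file stays unchanged as devices come and go.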
diff --git a/content/2020-09-06-esp-and-micropython.md b/content/2020-09-06-esp-and-micropython.md
deleted file mode 100644
index b5002ce..0000000
--- a/content/2020-09-06-esp-and-micropython.md
+++ /dev/null
@@ -1,203 +0,0 @@
1~ title: Getting started with MicroPython and ESP8266
2~ description: Getting started with MicroPython and ESP8266
3~ slug: /esp8266-and-micropython-guide.html
4~ date: 2020-09-06
5~ template: post
6~ hide: false
7
8**Table of contents**
9
101. [Introduction](#introduction)
112. [Flashing the SOC](#flashing-the-soc)
123. [Install better tooling](#install-better-tooling)
134. [Additional resources](#additional-resources)
14
15
16## Introduction
17
18A while ago I bought some [ESP8266](https://www.espressif.com/en/products/socs/esp8266) and [ESP32](https://www.espressif.com/en/products/socs/esp32) dev boards to play around with and I finally found a project to try it out.
19
20For my project I used [ESP32](https://www.espressif.com/en/products/socs/esp32), but I could just as easily have chosen [ESP8266](https://www.espressif.com/en/products/socs/esp8266). This guide covers which tools I use and how I prepared my workspace to code for [ESP8266](https://www.espressif.com/en/products/socs/esp8266).
21
22![ESP8266 and ESP32 boards](/assets/esp8366-micropython/boards.jpg)
23
24This guide covers:
25- flashing the SOC
26- installing proper tooling
27- deploying a simple script
28
29> Make sure that you are using **a good USB cable**. I had some problems with mine and once I replaced it everything started to work.
30
31## Flashing the SOC
32
33Plug your ESP8266 into a USB port and check that the device was recognized by executing `dmesg | grep ch341-uart`.
34
35Then check if the device is available under `/dev/` by running `ls /dev/ttyUSB*`.
36
37> **Linux users**: if the device is not available, make sure you are in the `dialout` group. You can check this by executing `groups $USER`. You can add a user to the `dialout` group with `sudo adduser $USER dialout`.
38
39After these conditions are met, navigate to [https://micropython.org/download/esp8266/](https://micropython.org/download/esp8266/) and download `esp8266-20200902-v1.13.bin`.
40
41```sh
42mkdir esp8266-test
43cd esp8266-test
44
45wget https://micropython.org/resources/firmware/esp8266-20200902-v1.13.bin
46```
47
48After obtaining the firmware we need some tooling to flash it to the board.
49
50```sh
51sudo pip3 install esptool
52```
53
54You can read more about `esptool` at [https://github.com/espressif/esptool/](https://github.com/espressif/esptool/).
55
56Before flashing the firmware we need to erase the flash on the device. Substitute `USB0` with the device listed in the output of `ls /dev/ttyUSB*`.
57
58```sh
59esptool.py --port /dev/ttyUSB0 erase_flash
60```
61
62If the flash was successfully erased, it is now time to write the new firmware.
63
64```sh
65esptool.py --port /dev/ttyUSB0 --baud 460800 write_flash --flash_size=detect 0 esp8266-20200902-v1.13.bin
66```
67
68If everything went OK you can try accessing the MicroPython REPL with `screen /dev/ttyUSB0 115200` or `picocom /dev/ttyUSB0 -b115200`.
69
70> Sometimes you will need to press `ENTER` in `screen` or `picocom` to access REPL.
71
72When you are in the REPL you can test that everything is working properly with the following steps.
73
74```py
75>>> import machine
76>>> machine.freq()
77```
78
79This should output a number representing the CPU frequency (mine was `80000000`).
80
81When you are in `screen` or `picocom`, these shortcuts can help you a bit.
82
83| Key | Command |
84| -------- | -------------------- |
85| CTRL+d   | performs soft reboot |
86| CTRL+a x | exits picocom |
87| CTRL+a \ | exits screen |
88
89
90## Install better tooling
91
92Now, to make our lives a little bit easier, there are a couple of additional tools that will make this whole experience a little more bearable.
93
94There are two cool ways of uploading local files to the SOC's flash.
95
96- ampy → [https://github.com/scientifichackers/ampy](https://github.com/scientifichackers/ampy)
97- rshell → [https://github.com/dhylands/rshell](https://github.com/dhylands/rshell)
98
99### ampy
100
101```bash
102# installing ampy
103sudo pip3 install adafruit-ampy
104```
105
106Listed below are some common commands I used.
107
108```bash
109
110# uploads file to flash
111ampy --delay 2 --port /dev/ttyUSB0 put boot.py
112
113# lists file on flash
114ampy --delay 2 --port /dev/ttyUSB0 ls
115
116# outputs contents of file on flash
117ampy --delay 2 --port /dev/ttyUSB0 cat boot.py
118```
119
120> I added `delay` of 2 seconds because I had problems with executing commands.
121
122### rshell
123
124Even though `ampy` is a cool tool, I opted for `rshell` in the end since it's much more polished and feature-rich.
125
126```bash
127# installing rshell
128sudo pip3 install rshell
129```
130
131Now that `rshell` is installed we can connect to the board.
132
133```bash
134rshell --buffer-size=30 -p /dev/ttyUSB0 -a
135```
136
137This will open a shell inside bash, and from here you can execute multiple commands. You can check what is supported with `help` once you are inside the shell.
138
139```bash
140m@turing ~/Junk/esp8266-test
141$ rshell --buffer-size=30 -p /dev/ttyUSB0 -a
142
143Using buffer-size of 30
144Connecting to /dev/ttyUSB0 (buffer-size 30)...
145Trying to connect to REPL connected
146Testing if ubinascii.unhexlify exists ... Y
147Retrieving root directories ... /boot.py/
148Setting time ... Sep 06, 2020 23:54:28
149Evaluating board_name ... pyboard
150Retrieving time epoch ... Jan 01, 2000
151Welcome to rshell. Use Control-D (or the exit command) to exit rshell.
152/home/m/Junk/esp8266-test> help
153
154Documented commands (type help <topic>):
155========================================
156args cat connect date edit filesize help mkdir rm shell
157boards cd cp echo exit filetype ls repl rsync
158
159Use Control-D (or the exit command) to exit rshell.
160```
161
162> Inside the shell, `ls` displays the files on your local machine. The board's flash is remapped to the `/pyboard` folder inside the shell, so to list the files on flash you must run `ls /pyboard`.
163
164#### Moving files to flash
165
166To avoid copying files one at a time, I used the `rsync` command from inside `rshell`.
167
168```bash
169rsync . /pyboard
170```
171
172#### Executing scripts
173
174It is a pain to continuously reboot the device to trigger `/pyboard/boot.py`, but there is a better way of testing local scripts on the remote device.
175
176Let's assume we have a `src/freq.py` file that prints the CPU frequency of the remote device.
177
178```py
179# src/freq.py
180
181import machine
182print(machine.freq())
183```
184
185Now let's upload this and execute it.
186
187```bash
188# syncs files to the remote device
189rsync ./src /pyboard
190
191# goes into REPL
192repl
193
194# import the module (filename without the .py extension); importing runs the script
195>>> import freq
196
197# CTRL+x will exit REPL
198```
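
To close the loop on the "deploying a simple script" item from the start of this guide, here is a minimal blink sketch. I am assuming the on-board LED sits on GPIO2 and is active-low, which is the case on many ESP8266 boards but not all:

```py
# src/blink.py

import time
import machine

# On many ESP8266 dev boards the on-board LED is wired to GPIO2 (active-low).
led = machine.Pin(2, machine.Pin.OUT)

while True:
    led.value(not led.value())  # toggle the LED
    time.sleep(0.5)
```

Upload it with `rsync ./src /pyboard` and run it with `import blink` from the REPL, just like `freq.py` above.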
199
200## Additional resources
201
202- [https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/](https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/)
203- [http://docs.micropython.org/en/latest/esp8266/quickref.html](http://docs.micropython.org/en/latest/esp8266/quickref.html)
diff --git a/content/2020-09-08-bind-warning-on-login.md b/content/2020-09-08-bind-warning-on-login.md
deleted file mode 100644
index 51f59c7..0000000
--- a/content/2020-09-08-bind-warning-on-login.md
+++ /dev/null
@@ -1,40 +0,0 @@
1~ title: Fix bind warning in .profile on login in Ubuntu
2~ description: Fix bind warning in .profile on login in Ubuntu
3~ slug: /bind-warning-on-login-in-ubuntu.html
4~ date: 2020-09-08
5~ template: post
6~ hide: false
7
8Recently I moved back to [bash](https://www.gnu.org/software/bash/) as my default shell. I was previously using [fish](https://fishshell.com/) and got used to the cool features it has. But regardless of that, I wanted to move to a more standard shell, because hopping back and forth with exported variables and stuff like that got pretty annoying.
9
10So I embarked on a mission to make [bash](https://www.gnu.org/software/bash/) more like [fish](https://fishshell.com/), and in the process found that I really missed TAB autosuggestions when changing directories.
11
12I found a nice alternative that emulates [zsh](http://zsh.sourceforge.net/)-like autosuggestion and autocompletion, so I added the following to my `.bashrc` file.
13
14```bash
15bind "TAB:menu-complete"
16bind "set show-all-if-ambiguous on"
17bind "set completion-ignore-case on"
18bind "set menu-complete-display-prefix on"
19bind '"\e[Z":menu-complete-backward'
20```
21
22I hadn't noticed anything wrong with this and all was working fine, until I restarted my machine and got this error.
23
24![Profile bind error](/assets/profile-bind-error/error.jpg)
25
26When I pressed OK, I got into the [Gnome shell](https://wiki.gnome.org/Projects/GnomeShell) and everything worked fine, but the error was still bugging me. I started looking for the reason why this was happening and found a solution on [Remote SSH Commands - bash bind warning: line editing not enabled](https://superuser.com/a/892682).
27
28So I added a simple `if [ -t 1 ]` guard around the `bind` statements. The `-t 1` test is true only when stdout is attached to a terminal, so this avoids running commands that presume the session is interactive when it isn't.
29
30```bash
31if [ -t 1 ]; then
32 bind "TAB:menu-complete"
33 bind "set show-all-if-ambiguous on"
34 bind "set completion-ignore-case on"
35 bind "set menu-complete-display-prefix on"
36 bind '"\e[Z":menu-complete-backward'
37fi
38```
39
40After logging out and back in, the problem was gone.
diff --git a/content/2020-09-09-digitalocean-sync.md b/content/2020-09-09-digitalocean-sync.md
deleted file mode 100644
index 7c5dbbd..0000000
--- a/content/2020-09-09-digitalocean-sync.md
+++ /dev/null
@@ -1,65 +0,0 @@
1~ title: Using Digitalocean Spaces to sync between computers
2~ description: Using Digitalocean Spaces to sync between computers
3~ slug: /digitalocean-spaces-to-sync-between-computers.html
4~ date: 2020-09-09
5~ template: post
6~ hide: false
7
8
9I've been using [Dropbox](https://www.dropbox.com/) for probably **10+ years** now, and I've become so used to it running in the background that I can't even imagine a world without it. But it's not without problems.
10
11At first I had problems with `.venv` environments for Python: the only way to exclude this folder from synchronization was to manually exclude each specific folder, which does not really scale. FYI, my whole projects folder is synced to [Dropbox](https://www.dropbox.com/). This of course introduced a lot of syncing of files and folders that are not needed or that even break things on other machines. In the case of **Python**, I couldn't use a synced `.venv` on my second machine; I needed to delete the `.venv` folder and pip-install again, which then synced the files back to the main machine. This was very frustrating. **Node.js** handles this much more gracefully, and I can just run my scripts without deleting and reinstalling `node_modules`. However, `node_modules` is a beast of its own: it creates so many files that the OS struggles to count them when you check the folder size.
12
13I wanted something similar to Dropbox. I could do without the instant syncing, but it would need to be fast and have an option to exclude folders like `node_modules`, `.venv` and `.git`.
14
15I went on a hunt for an alternative to [Dropbox](https://www.dropbox.com/) and found:
16
17- [Tresorit](https://tresorit.com/)
18- [Sync.com](https://sync.com)
19- [Box](https://www.box.com/)
20
21You know, the usual list of suspects. I didn't include [Google Drive](https://drive.google.com) or [OneDrive](https://onedrive.live.com/) since they are even more draconian than Dropbox.
22
23> All this does not stem from me being paranoid, but recently these companies have become more and more aggressive, and they keep violating our privacy by sharing our data with 3rd-party services. It is getting out of control.
24
25So, my main problem was still there: no way of excluding a specific folder from syncing. And before we go into "*But you have git, isn't that enough?*", I must say that many of the files (PDFs, spreadsheets, etc.) I keep in a `git` repo don't get pushed upstream, and I still want to have them synced across my computers.
26
27I initially wanted to use [rsync](https://linux.die.net/man/1/rsync), but I would then need a remote VPS or have to transfer between my computers directly. I wanted a solution where all my files are accessible to me even without any of my machines.
28
29> **WARNING: This solution will cost you money!** DigitalOcean Spaces is $5 per month, there are some bandwidth limitations, and if you go beyond them you get billed additionally.
30
31Then I remembered that I could use something like [S3](https://en.wikipedia.org/wiki/Amazon_S3), since it has versioning and is fully managed. I didn't want to go down the AWS rabbit hole with this, so I chose [DigitalOcean Spaces](https://www.digitalocean.com/products/spaces/).
32
33Then I needed a command-line tool to sync between source and target. I found the nice tool [s3cmd](https://s3tools.org/s3cmd), which is in the Ubuntu repositories.
34
35```bash
36sudo apt install s3cmd
37```
38
39After installation I created a new Spaces bucket on DigitalOcean. Remember the region you choose because you will need it when configuring `s3cmd`.
40
41Then I visited [Digitalocean Applications & API](https://cloud.digitalocean.com/account/api/tokens) and generated **Spaces access keys**. Save both the key and the secret somewhere safe, because once you leave the page the secret will not be shown again and you will need to re-generate it.
42
43```bash
44# enter your key and secret and correct endpoint
45# my endpoint is ams3.digitaloceanspaces.com because
46# I created my bucket in the Amsterdam region
47s3cmd --configure
48```
49After that I played around with the options for `s3cmd` and arrived at the following command.
50
51```bash
52# I executed this command from my projects folder
53cd projects
54s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' ./ s3://my-bucket-name/projects/
55```
56
57When syncing in the other direction you need to swap the `SOURCE` and `TARGET`, i.e. use `s3://my-bucket-name/projects/` and `./`, as sketched below.
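A sketch of the reverse command, with the same flags and the endpoints swapped:

```bash
# pull changes down from the bucket to the current machine
# (careful: --delete-removed now deletes local files that are gone from the bucket)
s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' s3://my-bucket-name/projects/ ./
```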
58
59> Be sure that all the paths have a trailing slash so that sync knows these are directories.
60
61I am planning to implement some sort of `.ignore` file that will give me project-specific exclude options.
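
In the meantime, `s3cmd` can already read exclude patterns from a file via `--exclude-from`, so a rough sketch of that idea (the file name and location are my own choice) would be:

```bash
# ~/.s3cmd-ignore contains one glob pattern per line, e.g.:
#   node_modules/*
#   .git/*
#   .venv/*

s3cmd sync --delete-removed --exclude-from ~/.s3cmd-ignore ./ s3://my-bucket-name/projects/
```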
62
63I am currently running this every hour as a cronjob, which is perfectly fine for now while I am testing how this whole thing works and how it all turns out.
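
For reference, the crontab entry for such an hourly sync could look something like this (the local path is a placeholder; cron wants an absolute one):

```bash
# run at the top of every hour (edit with: crontab -e)
0 * * * * s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' /home/me/projects/ s3://my-bucket-name/projects/
```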
64
65I have also created a small Gnome extension, which is still very unstable, but when/if this whole experiment pays off I will share it on GitHub.
diff --git a/content/2020-12-25-weekly-newsletter.md b/content/2020-12-25-weekly-newsletter.md
deleted file mode 100644
index 7f6e873..0000000
--- a/content/2020-12-25-weekly-newsletter.md
+++ /dev/null
@@ -1,10 +0,0 @@
1~ title: Weekly newsletter
2~ slug: /weekly-newsletter.html
3~ date: 2020-12-25
4~ template: page
5~ hide: true
6
7🕵🏼 [You can check the archive of previously sent emails here.](/weekly-newsletter-archive/)
8
9
10<iframe src="https://cdn.forms-content.sg-form.com/b9a1dea3-465a-11eb-af61-0e1900a266f8" width="610px" height="700" frameborder="0" scrolling="no"></iframe>