authorMitja Felicijan <mitja.felicijan@gmail.com>2019-02-17 21:53:36 +0100
committerMitja Felicijan <mitja.felicijan@gmail.com>2019-02-17 21:53:36 +0100
commit8e9ef5ba62b8bee028428384ad5666e245eb854c (patch)
treeb382c5b40f122b2a152da2226006abab34abe105 /_posts
parentad974810d43e1d5f70bca269665c25230e6a3221 (diff)
downloadmitjafelicijan.com-8e9ef5ba62b8bee028428384ad5666e245eb854c.tar.gz
content update
Diffstat (limited to '_posts')
-rw-r--r--_posts/2015-11-10-software-development-pitfalls.md85
-rw-r--r--_posts/2016-10-14-how-we-successfully-destroyed-the-joy-of-product-development.md39
-rw-r--r--_posts/2017-01-12-gce-aws-docker-and-why-i-choose-classic-vms-and-digitalocean.md50
-rw-r--r--_posts/2017-03-07-golang-profiling-simplified.md121
-rw-r--r--_posts/2017-04-10-what-its-like-to-be-a-software-developer.md33
-rw-r--r--_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md144
-rw-r--r--_posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md192
-rw-r--r--_posts/2017-08-11-simple-iot-application.md499
-rw-r--r--_posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md271
-rw-r--r--_posts/2018-08-05-the-bullshit-web-developments-pov.md113
-rw-r--r--_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md415
11 files changed, 0 insertions, 1962 deletions
diff --git a/_posts/2015-11-10-software-development-pitfalls.md b/_posts/2015-11-10-software-development-pitfalls.md
deleted file mode 100644
index ffee159..0000000
--- a/_posts/2015-11-10-software-development-pitfalls.md
+++ /dev/null
@@ -1,85 +0,0 @@
1---
2
3layout: post
4title: Software development and my favorite pitfalls
5description: Couple of observations regarding project management.
6
7---
8
9**Table of contents**
10
111. [Initial thoughts](#initial-thoughts)
122. [Ping emails](#ping-emails)
133. [Everybody is a project manager](#everybody-is-a-project-manager)
144. [We are never wrong](#we-are-never-wrong)
155. [Micromanaging](#micromanaging)
166. [Human contact - no need for it!](#human-contact---no-need-for-it)
177. [MVP is killing innovation](#mvp-is-killing-innovation)
188. [Pressure wasteland](#pressure-wasteland)
199. [Conclusion](#conclusion)
20
21## Initial thoughts
22
23Over the years I have had the privilege of working on some very exciting projects, both in software development and in electronics, and every experience taught me invaluable lessons about how NOT TO approach development. In this post I will try to point out some of the absurd, outdated techniques I find the most annoying and damaging during a development cycle. There will be swearing, because this topic really gets on my nerves and I have never coherently tried to explain it in writing. So if I get heated up, please bear with me :)
24
25As new methods of project management emerge, the underlying processes stay old and outdated. This is mainly because we as people are unable to completely shift away from these approaches.
26
27I have always struggled with communication, and many times that cost me a relationship or two because I was not on the ball all the time. Through every experience I became more convinced that I was the problem, and I never doubted that the real problem may be that communication never evolved a single step beyond email. If you think about it for a second, not many things have changed around this topic. We just have different representations of email (message boards, chats, project management tools). And I believe this is the real problem we are facing now.
28
29There are many articles written about hyper-connectivity and the effects that are a direct result of it. But the mainstream does nothing about it. We are just putting out fires and doing nothing to prevent them. I am certain this will be a major source of grief in the coming years. What we can all do to avoid it is change our mindset and experiment with our communication skills and development approaches. We need to maximize the possible output a person can give. And to achieve this we need to listen to them and encourage them. I know that not everybody is a natural-born leader, but everybody has an opinion.
30
31There is a lot of talk right now about methodologies such as Scrum, Kanban and Cleanroom, and they all fucking piss me off :). These are all boxes that imprison people and take away their freedom of thought. This is a straightforward mindfuck / amputation of creativity.
32
33Let me list a couple of things that I find really destructive and bad for a project and, in the long run, for the company.
34
35## Ping emails
36
37Ping emails are emails you have to write as soon as you receive an email. Their sole purpose is to inform the sender that you received their email and are working on it. Their only result is to calm the sender down that their task is being dealt with. The intent is basically: I did my job by sending you this email, so I am in the clear. I categorize this as a fuck-you email. It is one of the most irritating types of emails I have to write. It is the ultimate control-freak show you can experience, and it gives the sender a false feeling of control. Newsflash: we do not live in 1982, when there was a real possibility that an email never reached its destination. I really fucking hate this from the bottom of my heart :)
38
39They should read like: "Yes, I am fucking alive and I am at your service, my liege!". I guess if I replied like that, I wouldn't have to write any more messages of this kind :)
40
41## Everybody is a project manager
42
43Well, this is a tough one. I have noticed that as soon as you let people give their suggestions, you are basically fucked. There is truth in the saying: "Set low expectations and deliver a little more than you promised.".
44
45People tend to take on the role of a manager as soon as they are presented with an opportunity. And by getting angry at them you only provoke yourself. They are not at fault. You just need to tell them at the beginning that they are only giving suggestions, not tasks, and everything will be alright. But if you give them the feeling that they are in control, you will have immense problems explaining why their features are not in the current release.
46
47The project mission must always sit at the top of the project requirements, and any deviation from it will result in major project butchering. By this I mean that the project will take on a path of its own and you will be left with half-done software that helps nobody. A clear mission goal and clean execution will allow you to develop software with clear intent.
48
49## We are never wrong
50
51I find this type of arrogance the worst. We must always conduct ourselves as if we were infallible and could not make mistakes. As soon as a procedure or process is established, there is no room for changes or improvements. This is the most idiotic thing someone can say or think. I think that processes need to evolve and change over time. This is imperative, and a must-have in your organization if you want to improve and develop the company. We all need to grow balls and change everything in order to adapt to current situations. Being a prisoner of predefined processes kills creativity.
52
53I am constantly trying new software for project management and communication. I believe every team has its own dynamic, and it needs to be discovered organically and naturally through many experiments. By putting a team in a box you are amputating their creativity and therefore minimizing their potential. But if you talk to an executive, you will mostly find archetypal thinking and a strong need to compartmentalize everything from business processes to resource management. This type of management, which often displays micromanagement techniques, only works for short periods (a couple of years), and then employees either leave the company or become basically retarded drones on autopilot.
54
55## Micromanaging
56
57This basically implies that everybody on the team is a fucking idiot who needs a todo list that they cannot write themselves. How about spoon-feeding the team at lunch, because besides the team leader everybody must be a retarded idiot at best.
58
59I prefer milestones, as they give developers much more freedom and creativity in development, instead of wasting their time checking some bizarre todo list that was not even thought through. A project always changes through the development cycle, and all you are left with at the end is a list of unchecked tasks and the wrath of management asking why they are not completed. Best WTF moment!
60
61## Human contact - no need for it!
62
63We are vigorously trying to eliminate physical contact by replacing short meetings with software, with no regard for the fact that we are not machines. Many times a simple five-minute meeting in the morning can solve most of the problems. In rapid development, short bursts of face-to-face communication are possibly the best way to go.
64
65We now have all this software available, and all we get out of it is a huge clusterfuck. An obstacle, not a solution. So why do we still use it? Because we strive to better ourselves.
66
67## MVP is killing innovation
68
69Many will disagree with me on this one, but I stand firmly by this statement. What I have noticed in my experience is that all these buzzwords surrounding us only mislead you and trap you in a circle of solving a problem that already has a solution, which we are unable to see without using some fancy word for it. The toughest thing for a developer to do is to minimize requirements. Well, it is tough only for bad developers. Yes, I said it. There are many types of developers out there. And those unable to minimize feature scope are the ones you don't need on your team. Their only goal is to solve problems that exist only in their fucking heads. And then you have to argue with them and waste energy on them instead of developing your awesome product. They are a cancer and I suggest you cut them off.
70
71MVP as an idea is great, but sadly people don't understand the underlying philosophy, and they spend too much time focusing and fixating on something that every sane person with a normal IQ would understand without some made-up acronym. The result is a lot of talking and barely any execution.
72
73Well, MVP is not directly killing innovation, but stupid people do when they try to understand it.
74
75## Pressure wasteland
76
77You must never allow yourself to be pressured into confirming a deadline if you are not sure. We often feel a need to be in the service of others, which is true to some extent. But it is also true that others are in service to us to some extent. And we forget this. We are all pressured all the time to make decisions just to calm other people down. And when they leave your office you experience a WTF moment :) How the hell did they manage to fuck me up again :)
78
79People need to realize that the more pressure you put on somebody, the less they will be able to do. So 5-minute update email requests will only result in a mental breakdown and an inability to work that day. Constant poking is probably the one thing that makes me lose my mind instantly. For all of you who are doing this: "We are not fucking idiots, so stop bothering us with your own insecurities and let us do our job. We will do it quicker and better without you, moron, breathing down our necks."
80
81If this happens to me, I end up with no energy at the end of the day. Don't you get it? You will get much more from me and out of me if you ask me like a human being and not like your personal butler. In the long run you are destroying your relationships, and nobody will want to work with you. Your schizophrenic approach will only damage you in the long run. Nobody is anybody's property.
82
83## Conclusion
84
85I am guilty of many of the things described in this post. And I sometimes find it hard to acknowledge this. I lie to myself and vigorously try to find some explanation for why I do these things. There is always space for growth. And maybe you will also find some of yourself in this post and realize what needs to change in order to evolve.
diff --git a/_posts/2016-10-14-how-we-successfully-destroyed-the-joy-of-product-development.md b/_posts/2016-10-14-how-we-successfully-destroyed-the-joy-of-product-development.md
deleted file mode 100644
index 45028ad..0000000
--- a/_posts/2016-10-14-how-we-successfully-destroyed-the-joy-of-product-development.md
+++ /dev/null
@@ -1,39 +0,0 @@
1---
2
3layout: post
4title: How we successfully destroyed the joy of product development
5description: My take on project development.
6
7---
8
9No matter how hard we try to reinvent processes in software development, we still haven't found a perfect solution. And to dismiss SDLC just because it is something old is as ridiculous as the concept of designers being user-experience gurus. As I have written a couple of times before, designers have their place, and it is not in the UX community. Most of them have probably never heard of Jakob Nielsen, and that says a lot. Don't get me wrong. There are designers out there who are absolutely amazing at what they do, but most of them are not. Good design has little to do with how things look, in my opinion. It has very much to do with how a product behaves. And taking a chance on looks alone is scary to me.
10
11I have this huge beef with so-called UX "experts". I really do. From the bottom of my heart. I almost hate them. Well, not the pure ghetto ones. There are many of them out there, I am sure. I just have not had the pleasure of working with such a person.
12
13A good UX expert should have a programming background and an eye for design. Being a UX expert requires you to be analytical and precise. These are not really qualities of designers. Design is much more about feeling and emotional perception. And the two don't dance well together.
14
15Natural progression of project focused on user should be:
16
17- detailed requirements and fantastic prototypes/wireframes with detailed user journey diagrams,
18- design focused and restricted to serve requirements,
19- code written just to fulfill design and requirements → nothing more and nothing less → no additional dead code should be allowed,
20- testing should be done on all targeted devices → avoid bugs and you will avoid brand failure.
21
22A designer should never be allowed a blank canvas. Good software is written because there are many restrictions, either in the requirements or in the real world. And most importantly → good software solves only one problem at a time. I don't see why this shouldn't apply to design as well.
23
24Yes, yes, we get it, but we don't have the time or the money to do project development like that. Well, you had better find them, or you will slowly decline into the abyss of mediocre companies that have nothing to show for themselves. Clients are not dumb, and they need quality products and services. It is no longer enough for a product to just work. It has to be technically precise and functionally on the spot.
25
26When developers and designers are forced to think and work from scratch, many new doors open. New ideas are born about how to solve problems in ways that were previously not possible, because people were living in a box of limited thought and patterns. If you always solve problems only with the knowledge you already have, nothing new can be invented. When there is no room for experimentation, there is no room for improvement. If you want your developers and designers to be a fountain of innovation but don't really let them innovate, you are just slowly closing the front doors of your company. Good developers and designers are hard to find and even easier to lose.
27
28Being agile does not mean being a slave to constant change. It does not mean that project managers can constantly change requirements at will. And it surely does not mean that a clear vision of the product's direction is something we have said goodbye to. We have perverted the initial intention of the Manifesto for Agile Software Development, as we always do. We have taken it so far that we have all become slaves of advertisement by consulting companies trying to cash in on this "new - but old" concept.
29
30Manifesto for Agile Software Development states:
31
32- individuals and interactions over processes and tools,
33- working software over comprehensive documentation,
34- customer collaboration over contract negotiation,
35- responding to change over following a plan.
36
37This was written in times when software was developed very differently from how we do it now. We have eliminated many of the problems of the old days simply by listening to reason and not to trendy, hyped words that are just tools for marketing strategists to avoid the real issues. Being flat, being agile, being stupid is what I say.
38
39Development and design should be about improving yourself and, consequently, the product you are working on. When this becomes a chore, you should probably start thinking about changing companies. People make products, not management.
diff --git a/_posts/2017-01-12-gce-aws-docker-and-why-i-choose-classic-vms-and-digitalocean.md b/_posts/2017-01-12-gce-aws-docker-and-why-i-choose-classic-vms-and-digitalocean.md
deleted file mode 100644
index c55af32..0000000
--- a/_posts/2017-01-12-gce-aws-docker-and-why-i-choose-classic-vms-and-digitalocean.md
+++ /dev/null
@@ -1,50 +0,0 @@
1---
2
3layout: post
4title: GCE, AWS, Docker and why I choose classic VMs and DigitalOcean for my current project
5description: Reasons why I choose DigitalOcean for my project
6
7---
8
9**Table of contents**
10
111. [Docker tools and complexity that comes with it](#docker-tools-and-complexity-that-comes-with-it)
122. [Lack of real life examples of Docker in action](#lack-of-real-life-examples-of-docker-in-action)
133. [Ease of deployment](#ease-of-deployment)
144. [Where to go from here](#where-to-go-from-here)
15
16I have been developing a product for the past few months, and one of the product's requirements is the ability to scale automatically and quickly on demand.
17
18As most of you probably know, system design is much more important than the actual code that will drive the product. And this was my main concern when developing this product. I read everything I could get my hands on about Docker, as it had been hyped so much in the media for the past two years. At first glance, Docker was an ideal fit for this platform. But as I started to seriously experiment with it and develop around it, several problems surfaced. Well, it would be unfair to call them problems, but let's say drawbacks when developing rapidly.
19
20To put it in perspective: this project is basically an MVP that needs to scale automatically when new customers sign up. These customers send metrics to my system, which are later visualized and analyzed. There were some basic questions I needed answered before choosing a technology.
21
22- Pricing involving hardware and infrastructure.
23- Ease of implementation/deployment and scaling.
24- How much will this cost me per customer?
25
26The way I envisioned the architecture was straightforward → simple nodes in a cluster, each taking care of x number of customers (1 node ~ 10 customers). I found that pricing on GCE and AWS is very hard to predict → what the cost will be when the system scales. And I needed to know this in order to make a financial projection of costs. This is the most important thing for me at this time, as I am deciding on the prices we should charge future customers and establishing a healthy revenue model and, subsequently, a business model. I want this product to scale organically and fuel its future development with money made by the product itself → very little startup capital (10 nodes for six months & capital for company expenses). I ran many simulations but could not figure out with any certainty what that cost would be. Based on this, both providers are currently not suited for me. So I chose DigitalOcean. They have a really straightforward pricing model, and this allowed me to build a pretty accurate cost matrix for my infrastructure.
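As a back-of-the-envelope illustration of such a cost matrix — a minimal sketch assuming a flat $5/month per node (DigitalOcean's smallest droplet) and the 1 node ~ 10 customers ratio from above; the numbers are placeholders, not my actual pricing:

```python
import math

NODE_PRICE = 5.0         # assumed flat price per node per month (USD)
CUSTOMERS_PER_NODE = 10  # capacity ratio from the architecture above

def monthly_cost(customers):
    """Infrastructure cost for a given number of customers."""
    nodes = max(1, int(math.ceil(customers / float(CUSTOMERS_PER_NODE))))
    return nodes * NODE_PRICE

for customers in (10, 50, 250, 1000):
    cost = monthly_cost(customers)
    print("{} customers -> {} USD/month ({} USD per customer)".format(
        customers, cost, round(cost / customers, 3)))
```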
27
28I love hard metrics. By this I mean metrics I can test now and trust to hold in the future. This was the reason I found Docker too volatile: containers are spawned and halted, and there is really no way of predicting these numbers. I have no problem with spawning multiple VMs and not using them, but having such limited control over that is, at this time, unacceptable for me.
29
30## Docker tools and complexity that comes with it
31
32Some of you will probably correct me on this one, but I find all these management tools like Kubernetes, Swarm, etc. a bit of an overkill for a startup project. All these tools can scale really massively, but they all require extensive DevOps knowledge. When you are a one-man band trying to push a product out, there just is no time to learn these tools and concepts in enough depth to really take advantage of their features. It is much easier to use the internal metrics of your app (the uWSGI stats server, Golang middleware stats), fetch them to one server and visualize them. That task alone took me a couple of hours, and I had a simple metrics system in place that, in combination with the DigitalOcean API, enabled me to auto-spawn new VMs on demand when the number of users exceeded what the current nodes could support. There is something to be said for the simplicity of this solution. And I love simple solutions.
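To give an idea of what that auto-spawning glue can look like, here is a minimal sketch against DigitalOcean's v2 droplet-creation endpoint; the threshold logic, region/size/image slugs and token handling are illustrative assumptions, not my production code:

```python
import os
import requests

DO_API = "https://api.digitalocean.com/v2/droplets"
TOKEN = os.environ["DO_TOKEN"]   # personal access token
CUSTOMERS_PER_NODE = 10          # capacity assumption from above

def spawn_node(name):
    """Create one new droplet via the DigitalOcean API."""
    payload = {
        "name": name,
        "region": "fra1",             # assumed region slug
        "size": "512mb",              # assumed size slug
        "image": "ubuntu-16-04-x64",  # assumed image slug
    }
    resp = requests.post(DO_API, json=payload,
                         headers={"Authorization": "Bearer " + TOKEN})
    resp.raise_for_status()
    return resp.json()["droplet"]["id"]

def maybe_scale(active_customers, active_nodes):
    """Spawn a node when current capacity is exhausted."""
    if active_customers > active_nodes * CUSTOMERS_PER_NODE:
        return spawn_node("node-{}".format(active_nodes + 1))
    return None
```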
33
34## Lack of real life examples of Docker in action
35
36I found many HelloWorld examples and tutorials showing how to spawn containers and deploy simple Python apps, but I haven't found a really clear example showing how to handle persistent storage with containers, load balancing, disk management, or IP and port management.
37
38This is not Docker's nor the community's fault, to be absolutely clear. It just shows that it is not that simple to deploy a real-world application with Docker. Maybe my software architecture is simply not designed with Docker in mind.
39
40## Ease of deployment
41
42What I really love about Docker is the ease of deploying your application code via a container. The multilayered architecture of Docker images also adds to the pros list. And the fact that containers sit on top of the host OS makes it very intriguing. But if you use the container engine from Google, you basically spawn VMs and run containers in those machines, which takes the bare-metal approach out of the equation. So in the end you still use a hypervisor. I guess if I had my own hardware servers, I would be able to take full advantage of containers.
43
44Because most of the code on my nodes is written in Golang and C++, deployment is pretty easy. All I have to do is replace the binaries on a node, and that's that. To avoid downtime I run two instances of each node and load-balance between them. So when I am updating software, I first update node1.A and then, if the first one is successful, node1.B.
45
46## Where to go from here
47
48Docker is an amazing technology. But the unpredictable pricing model and the steep learning curve for deploying a real live application are, at this time, too much of a hassle for me. I am sure I could lower costs with the Docker approach, but it would just take too much time at this stage to implement it properly.
49
50I am currently trying to adapt my project to fit Docker, and I believe this would be an interesting solution. The idea is to use one container per customer. I would just need to find a solution for auto-spawning containers on demand for a specific customer. I would then need a flexible load balancer to correctly forward traffic to the container designated for that customer. The problem I have is that I need a very flexible storage solution, because the amount of aggregated data will scale exponentially and I need to store it permanently on disk. The VM approach allows me to calculate precisely how much disk I need per customer per VM. Maybe one of you has a better solution.
diff --git a/_posts/2017-03-07-golang-profiling-simplified.md b/_posts/2017-03-07-golang-profiling-simplified.md
deleted file mode 100644
index 4c7266c..0000000
--- a/_posts/2017-03-07-golang-profiling-simplified.md
+++ /dev/null
@@ -1,121 +0,0 @@
1---
2
3layout: post
4title: Golang profiling simplified
5description: Golang profiling made easy
6
7---
8
9**Table of contents**
10
111. [Where are my pprof files?](#where-are-my-pprof-files)
122. [Why is my cpu profile empty?](#why-is-my-cpu-profile-empty)
133. [Profiling](#profiling)
14 1. [Memory profiling](#memory-profiling)
15 2. [CPU profiling](#cpu-profiling)
16 3. [Generating profiling reports](#generating-profiling-reports)
17
18Many posts have been written about profiling in Golang, yet I haven't found a proper tutorial on it. Almost all of them are missing some important piece of information, and it gets pretty frustrating when you have a deadline and cannot find a simple, distilled solution.
19
20Nevertheless, after searching and experimenting I have found a solution that works for me and should probably work for you as well.
21
22## Where are my pprof files?
23
24By default, pprof files are generated in the /tmp/ folder. You can override the folder where these files are generated programmatically in your Golang code, as we will see in the examples below.
25
26## Why is my cpu profile empty?
27
28I have found that the CPU profile is sometimes empty because the program did not execute long enough. In my case, programs that execute too quickly don't produce a usable pprof file. Well, the file is generated, but it only contains 4KB of information.
29
30## Profiling
31
32As you can see from the examples, we execute a dummy_benchmark function to ensure there is some work to profile. Memory profiling can be done without such a "complex" function, but CPU profiling needs it.
33
34The memory and CPU profiling examples are almost the same. Only the parameters passed to profile.Start in the main function differ. When we set profile.ProfilePath("."), we tell the profiler to store pprof files in the same folder as our program.
35
36### Memory profiling
37
38```go
39package main
40
41import (
42 "fmt"
43 "time"
44 "github.com/pkg/profile"
45)
46
47func dummy_benchmark() {
48
49 fmt.Println("first set ...")
50 for i := 0; i < 918231333; i++ {
51 i *= 2
52 i /= 2
53 }
54
55 <-time.After(time.Second*3)
56
57 fmt.Println("second set ...")
58 for i := 0; i < 9182312232; i++ {
59 i *= 2
60 i /= 2
61 }
62}
63
64func main() {
65 defer profile.Start(profile.MemProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
66 dummy_benchmark()
67}
68```
69
70### CPU profiling
71
72```go
73package main
74
75import (
76 "fmt"
77 "time"
78 "github.com/pkg/profile"
79)
80
81func dummy_benchmark() {
82
83 fmt.Println("first set ...")
84 for i := 0; i < 918231333; i++ {
85 i *= 2
86 i /= 2
87 }
88
89 <-time.After(time.Second*3)
90
91 fmt.Println("second set ...")
92 for i := 0; i < 9182312232; i++ {
93 i *= 2
94 i /= 2
95 }
96}
97
98func main() {
99 defer profile.Start(profile.CPUProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
100 dummy_benchmark()
101}
102```
103
104### Generating profiling reports
105
106```bash
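# note: pprof's -pdf output is rendered with Graphviz, so the 'dot' binary must be installed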
107# memory profiling
108go build mem.go
109./mem
110go tool pprof -pdf ./mem mem.pprof > mem.pdf
111
112# cpu profiling
113go build cpu.go
114./cpu
115go tool pprof -pdf ./cpu cpu.pprof > cpu.pdf
116```
117
118This will generate a PDF document with the visualized profile.
119
120- [Memory PDF profile example](/files/golang-profiling-mem.pdf)
121- [CPU PDF profile example](/files/golang-profiling-cpu.pdf)
diff --git a/_posts/2017-04-10-what-its-like-to-be-a-software-developer.md b/_posts/2017-04-10-what-its-like-to-be-a-software-developer.md
deleted file mode 100644
index f18f5dd..0000000
--- a/_posts/2017-04-10-what-its-like-to-be-a-software-developer.md
+++ /dev/null
@@ -1,33 +0,0 @@
1---
2
3layout: post
4title: What it's like to be a software developer
5description: Couple of observations regarding project management
6
7---
8
9I get asked a lot what the hell I actually do. I find it funny, but I guess in most cases it is my fault. I try not to be the kind of man who is always talking about his work. I live in a small village and most of my neighbours probably have no idea what I actually do. And I am OK with that. I prefer it this way. But on some occasions I find it disturbing how people judge other people just because they don't understand what they are all about. Many of them probably think I am some strange kind of loser who is awake all the time and works from home. He probably plays games and types on a computer :) What kind of a job is that? That is no job at all! :) You work for eight hours, then you go home, drink a beer and go work in your workshop. That is what real men do!
10
11Well, you know. It's just the way it is. And it takes time for people to understand. Being home after many years of living elsewhere really grounded me in some ways. Coming back to the place where you grew up brings some sort of humility back into your life. And this is OK. Nobody wants to be Icarus anyway.
12
13What I mean to say is: if you are in a similar situation to mine, it will take time for people to start understanding you. Don't get discouraged by this. Take it as it is. People judge what they don't understand.
14
15I have this saying that sleeping is for pussies and we will sleep when we die. I am 32 years old now and I haven't slowed down my work hours. I have stepped up the pace. I usually work about 16-18 hours a day, every day. It doesn't matter if it's Monday or Saturday. Work needs to be done.
16
17I know there are other ways. But if you want to be good, there really is no other way. There are no shortcuts. There is no easier way to get to the point where you really know what the hell you are doing. The myth of the genius programmer truly is one huge piece of bullshit. Without putting in the hours, nothing can be achieved. There is no success without dedication.
18
19My friends and coworkers often ask me how the hell I learned so much stuff. Where do I find the time to go through all this material? And I have a simple response for them: "When you go to sleep, I begin reading and prototyping. When you go on a trip, I build prototype projects just for the sake of learning. When you take your time fucking around, I read articles and books, hunting for that single small piece of information that will help me one day." And often they don't believe me. They think I am just that smart and everything is easy for me. They have this misguided belief that I had all this knowledge implanted in me at birth. And this is not the case. I have read so much in my lifetime, and most of that information became useful later in my life. The fact that I had no immediate use for it didn't stop me. This is probably the main difference between me and my friends. I don't learn because I need to, but because I am piecing together this huge puzzle, and I treat it like a game. This amazing game of enlightenment.
20
21I have had many burnouts in my career. Most of them come around New Year. I guess around that time things slow down a bit, and right when you relax for a minute or two, things get real :). They say that when you enter retirement you should never, ever park your ass on a couch. You will die there :) When my burnout happens, I fall into this huge depression and start questioning my sanity. I question my decisions. I question my progress in life. I question everything. I try to understand whether all this is worth it. And every time this happens, I struggle with these kinds of questions. And by the time it is all over, I come to the same conclusion every single time: yes, it fucking is worth it. Over the years I have noticed that this is some sort of reset for me. It helps me maintain my sanity in the long run :) I love it when things get tough. It gets me to the next level. It teaches me that progress is life.
22
23I don't even count anymore how many programming languages I have learned. I have even stopped noticing projects. They just fly by. It's like I am hunting this revelation that is set out for me. And this drives me. It helps me step up my game every day. With every single problem I solve, I come a little closer to my goal. My never-to-be-reached goal. And it's OK with me if I never reach it.
24
25The only problem I have now is time. There just isn't enough time to learn everything the day has to offer. It's like I am on a quest to become this mini search engine :).
26
27This obsession with learning has come to the point where I have stopped watching TV and the news altogether. I find them noise that clutters your mind. The whole point of the news is to frighten you and put your mind into a dangerous loop where you think that nothing matters anyway → the world is going to shit. And the truth is so far from this. We are living in times where all these amazing possibilities are at hand. We just need to take control of our mindset, and everything starts to look possible again.
28
29What else can I say after more than 10 years in this space? What else can be said anyway? I still love what I do as much as I did 10 years ago. I love it even more. And if I had a single suggestion for all of you, it would be to stop worrying about immediate benefits and focus on the long run. Learn, prototype, experiment and have fun. We all get frustrated at times, but that doesn't mean we should stop. Doing this kind of work is a privilege. We are making and creating. In the purest sense, we are creators. And there really is no better way to live your life.
30
31> A life without challenge, a life without hardship, a life without purpose, seems pale and pointless. With challenge come perseverance and gumption. With hardship come resilience and resolve. With purpose come strength and understanding.
32>
33> — Terry Fallis, The High Road
diff --git a/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
deleted file mode 100644
index bfacd6f..0000000
--- a/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
+++ /dev/null
@@ -1,144 +0,0 @@
1---
2
3layout: post
4title: What I've learned developing ad server
5description: Lessons I learned developing contextual ad server
6
7---
8
9**Table of contents**
10
111. [Aggregate everything](#aggregate-everything)
122. [Measure everything](#measure-everything)
133. [Cache control is your friend](#cache-control-is-your-friend)
144. [Learn NGINX](#learn-nginx)
155. [Use Redis/Memcached](#use-redismemcached)
166. [Conclusion](#conclusion)
17
18For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. This project grew from serving thousands of ads per day to millions.
19
20The system is made up of a couple of core components:
21
22- API for serving ads,
23- Utils - cronjobs and queue management tools,
24- Dashboard UI.
25
26The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and better search performance. This also gave us access to many of Elasticsearch's amazing features. You should check it out if you do any search-related operations.
27
28Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page. And this makes the JavaScript client library quite complex.
29
30Now that you know the basics about the product, let's get into the lessons we learned.
31
32## Aggregate everything
33
34After the beta version was released, everything (impressions, clicks, etc.) was written to the database at nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/), and the database quickly grew way above 200GB of disk space. And that was problematic. Statistics took a disturbingly long time to aggregate. Even indexes on the stats table were no help after we reached 500 million datapoints.
35
36> There is marketing product information and there is real-life experience. And the two tend to be quite the opposite.
37
38This is the reason everything is now aggregated on a daily basis, and this data is then fed to Elasticsearch in the form of a daily summary. With this we can now track many more dimensions, such as zone, channel and platform. And with this information we can adapt the occurrence of ads in specific places much more precisely.
39
40We have also adopted [Redis](https://redis.io/) as a full-time citizen in our stack. Because Redis also persists its data to the local disk, we have some sort of backup if the server were to suffer a failure.
41
42All the real-time statistics for ad serving and redirecting are kept as counters in a Redis instance, then extracted daily and pushed to Elasticsearch.
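A minimal sketch of that counter pattern, assuming the redis-py client and hypothetical key names (impressions:&lt;ad&gt;:&lt;day&gt;); our production key layout differs:

```python
import datetime
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def track_impression(ad_id):
    """Bump a per-ad, per-day counter; INCR is atomic in Redis."""
    day = datetime.date.today().isoformat()
    r.incr("impressions:{}:{}".format(ad_id, day))

def extract_daily(day):
    """Read back one day's counters for the daily push to Elasticsearch."""
    keys = r.keys("impressions:*:{}".format(day))  # acceptable in a daily batch job
    return {key.decode(): int(r.get(key)) for key in keys}
```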
43
44## Measure everything
45
46The thing about software is that we really don't know how well it performs under load until such load actually arrives. When testing locally everything is fine, but in production things tend to fall apart.
47
48As a solution, we measure everything we can: function execution times (by wrapping functions with timers), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information, and we store all of it for later analysis. A minimal sketch of such a timer follows the example below.
49
50**Example of function execution time**
51
52```json
53{
54 "get_final_filtered_ads": {
55 "counter": 1931250,
56 "avg": 0.0066143431,
57 "elapsed": 12773.9500310003
58 },
59 "store_keywords_statistics": {
60 "counter": 1931011,
61 "avg": 0.0004605267,
62 "elapsed": 889.2821669996
63 },
64 "match_by_context": {
65 "counter": 1931011,
66 "avg": 0.0055960716,
67 "elapsed": 10806.0758889999
68 },
69 "match_by_high_performance": {
70 "counter": 262,
71 "avg": 0.0152770229,
72 "elapsed": 4.00258
73 },
74 "store_impression_stats": {
75 "counter": 1931250,
76 "avg": 0.0006189991,
77 "elapsed": 1195.4419869999
78 }
79}
80```
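The promised sketch — a plain decorator accumulating call counts and elapsed time in an in-process dict; the match_by_context stub below only marks where the real logic would sit, and our actual timers are more involved:

```python
import time
from collections import defaultdict

stats = defaultdict(lambda: {"counter": 0, "elapsed": 0.0, "avg": 0.0})

def timed(func):
    """Accumulate call count, total and average elapsed time per function."""
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["counter"] += 1
            entry["elapsed"] += time.time() - start
            entry["avg"] = entry["elapsed"] / entry["counter"]
    return wrapper

@timed
def match_by_context(request):
    pass  # the actual matching logic would live here
```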
81
82We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and then visualizing the results with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more detailed look into code execution.
83
84## Cache control is your friend
85
86Because we use a JavaScript library for rendering ads, we rely on this script extensively, and when the need arises we have to be able to change its behavior quickly.
87
88In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a ?ver=xxx attribute. This makes rapid deployment and testing very difficult and time-consuming. There is a limit to how much you can test locally.
89
90We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of the websites are built on the ASP.NET platform and have some problems with Tag Manager. With the solution below we can be certain we are serving the latest version of the script.
91
92It only takes one mistake for users to end up with the script cached, and if it is cached for a year, you can probably see where the problem is.
93
94```nginx
95# nginx ➜ /etc/nginx/sites-available/default
96location /static/ {
97 alias /path-to-static-content/;
98 autoindex off;
99 charset utf-8;
100 gzip on;
101 gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
102 location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
103 expires 1y;
104 add_header Pragma public;
105 add_header Cache-Control "public";
106 }
107 location ~* \.(css|js|txt)$ {
108 expires 3600s;
109 add_header Pragma public;
110 add_header Cache-Control "public, must-revalidate";
111 }
112}
113```
114
115Also be careful when redirecting to a URL in your Python code. We noticed that if we didn't precisely set the Cache-Control and Expires headers in the response, we didn't get the request on the server and therefore couldn't measure clicks. So when redirecting, do as follows and there will be no problems.
116
117```python
118# python ➜ bottlepy web micro-framework
119response = bottle.HTTPResponse(status=302)
120response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
121response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
122response.set_header("Location", url)
123return response
124```
125
126> Cache control in browsers is quite aggressive and you need to be precise to avoid future problems. We learned that lesson the hard way.
127
128## Learn NGINX
129
130When deciding on a web server, we went with Nginx as a reverse proxy for our applications. We adopted a micro-service-oriented architecture early in the project to ensure that when we scale we can easily add additional servers to the cluster. And Nginx was crucial for load balancing and static content delivery.
131
132At first our config file was quite simple, but it later grew larger. After much patching and adding of new settings, I sat down and learned more about the guts of Nginx. This proved very useful, and we were able to squeeze much more out of our setup. So I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/). It saved us a lot of headaches. Googling for solutions only goes so far.
133
134## Use Redis/Memcached
135
136As explained above, we use caching for basically everything. It is the cornerstone of our services. At first we were very careful about how much we stored in [Redis](https://redis.io/), but we later found that the memory footprint is very low, even when storing large amounts of data.
137
138So we gradually increased our usage, up to caching the whole HTML output of the dashboard. This improved our performance by an order of magnitude. And Redis's native TTL support goes hand in hand with our needs.
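A minimal sketch of that pattern, assuming redis-py and a hypothetical render_dashboard() helper; SETEX stores the value and its TTL in a single call:

```python
import redis

r = redis.StrictRedis(host="localhost", port=6379)
CACHE_TTL = 300  # seconds; tune per page

def cached_dashboard(user_id):
    """Serve rendered dashboard HTML from Redis, rebuilding it on a miss."""
    key = "html:dashboard:{}".format(user_id)
    html = r.get(key)
    if html is None:
        html = render_dashboard(user_id)  # hypothetical slow render
        r.setex(key, CACHE_TTL, html)     # value expires automatically
    return html
```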
139
140The reason we chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) was Redis's out-of-the-box scalability. But all of this can be achieved with Memcached as well.
141
142## Conclusion
143
144There are many more details that could have been written down, and every single topic here deserves its own post, but you probably get the idea of the problems we faced.
diff --git a/_posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md b/_posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
deleted file mode 100644
index dcd8ce1..0000000
--- a/_posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
+++ /dev/null
@@ -1,192 +0,0 @@
1---
2
3layout: post
4title: Profiling Python web applications with visual tools
5description: Missing link when debugging and profiling python web applications
6
7---
8
9**Table of contents**
10
111. [Simple web-service](#simple-web-service)
122. [Visualize profile](#visualize-profile)
133. [Update 2017-04-22](#update-2017-04-22)
14
15I have been profiling my software with KCachegrind for a long time now, and I was missing this option when developing APIs and other web services. I always knew this was possible but never really took the time to dive into it.
16
17Before we begin there are some requirements. We will need to:
18
19- implement [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) into our web app,
20- convert output to [callgrind](http://valgrind.org/docs/manual/cl-manual.html) format with [pyprof2calltree](https://pypi.python.org/pypi/pyprof2calltree/),
21- visualize data with [KCachegrind](http://kcachegrind.sourceforge.net/html/Home.html) or [Profiling Viewer](http://www.profilingviewer.com/).
22
23
24If you are using MacOS you should check out [Profiling Viewer](http://www.profilingviewer.com/) or [MacCallGrind](http://www.maccallgrind.com/).
25
26![KCachegrind](/files/kcachegrind.png)
27
28We will be dividing this post into two main categories:
29
30- writing simple web-service,
31- visualize profile of this web-service.
32
33## Simple web-service
34
35Let's use virtualenv so we don't pollute our base system. If you don't have virtualenv installed on your system, you can install it with pip.
36
37```bash
38# let's install virtualenv globally
39$ sudo pip install virtualenv
40
41# let's also install pyprof2calltree globally
42$ sudo pip install pyprof2calltree
43
44# now we create project
45$ mkdir demo-project
46$ cd demo-project/
47
48# now let's create folder where we will store profiles
49$ mkdir prof
50
51# now we create empty virtualenv in venv/ folder
52$ virtualenv --no-site-packages venv
53
54# we now need to activate virtualenv
55$ source venv/bin/activate
56
57# you can check if virtualenv was correctly initialized by
58# checking where your python interpreter is located
59# if the command below points to your created directory and not some
60# system dir like /usr/bin/python then everything is fine
61$ which python
62
63# we can now check if all is good ➜ if OK, a couple of
64# lines will be displayed
65$ pip freeze
66# appdirs==1.4.3
67# packaging==16.8
68# pyparsing==2.2.0
69# six==1.10.0
70
71# now we are ready to install bottlepy ➜ web micro-framework
72$ pip install bottle
73
74# you can deactivate virtualenv, but you will then fall back
75# to the system environment ➜ for now, don't deactivate
76$ deactivate
77```
78
79We are now ready to write a simple web service. Let's create a file app.py and paste the code below into this newly created file.
80
81```python
82# -*- coding: utf-8 -*-
83
84import bottle
85import random
86import cProfile
87
88app = bottle.Bottle()
89
90# this function is a decorator and encapsulates function
91# and performs profiling and then saves it to subfolder
92# prof/function-name.prof
93# in our example only awesome_random_number function will
94# be profiled because it is decorated with do_cprofile
95def do_cprofile(func):
96 def profiled_func(*args, **kwargs):
97 profile = cProfile.Profile()
98 try:
99 profile.enable()
100 result = func(*args, **kwargs)
101 profile.disable()
102 return result
103 finally:
104 profile.dump_stats("prof/" + str(func.__name__) + ".prof")
105 return profiled_func
106
107
108# we use profiling over specific function with including
109# @do_cprofile above function declaration
110@app.route("/")
111@do_cprofile
112def awesome_random_number():
113 awesome_random_number = random.randint(0, 100)
114 return "awesome random number is " + str(awesome_random_number)
115
116@app.route("/test")
117def test():
118 return "dummy test"
119
120if __name__ == '__main__':
121 bottle.run(
122 app = app,
123 host = "0.0.0.0",
124 port = 4000
125 )
126
127# run with 'python app.py'
128# open browser 'http://0.0.0.0:4000'
129```
130
131When the browser hits the awesome\_random\_number() function, a profile is created in the prof/ subfolder.
132
133## Visualize profile
134
135Now let's convert this cProfile output into the callgrind format.
136
137```bash
138$ cd prof/
139$ pyprof2calltree -i awesome_random_number.prof
140# this creates 'awesome_random_number.prof.log' file in the same folder
141```
142
143This file can be opened with the visualizing tools listed above. In this case we will be using Profiling Viewer on MacOS. You can open the image in a new tab. As you can see from this example, it shows the hierarchy and execution order of your code.
144
145![Profiling Viewer](/files/profiling-viewer.png)
146
147> Make sure you re-convert the cProfile output every time you want to refresh and take a look at possible optimizations, because cProfile updates the .prof file every time the browser hits the function.
148
149This is just a simple example, but when you are developing real-life applications this can be very illuminating, especially for seeing which parts of your code are bottlenecks and need to be optimized.
150
151## Update 2017-04-22
152
153Reddit user [mvt](https://www.reddit.com/user/mvt) also recommended [SnakeViz](https://jiffyclub.github.io/snakeviz/), an awesome web-based profile visualizer that directly takes the output of the [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) module.
154
155<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="583880c1-002e-41ed-a373-020a0ef2cff9" data-embed-created="2017-04-22T19:46:54.810Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dgljhsb/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>
156
157```bash
158# let's install it globally as well
159$ sudo pip install snakeviz
160
161# now let's visualize
162$ cd prof/
163$ snakeviz awesome_random_number.prof
164# this automatically opens browser window and
165# shows visualized profile
166```
167
168![SnakeViz](/files/snakeviz.png)
169
170Reddit user [ccharles](https://www.reddit.com/user/ccharles) suggested a better way of installing pip packages: targeting the user level instead of using sudo.
171
172<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="f4f0459e-684d-441e-bebe-eb49b2f0a31d" data-embed-created="2017-04-22T19:46:10.874Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dglpzkx/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>
173
174```bash
175# now we need to add this path to our $PATH variable
176# we do this by adding this line at the end of your
177# ~/.bashrc file
178PATH=$PATH:$HOME/.local/bin/
179
180# in order to use this new configuration you can close
181# and reopen terminal or reload .bashrc file
182$ source ~/.bashrc
183
184# now let's test if new directory is present in $PATH
185$ echo $PATH
186
187# now we can install on user level by adding --user
188# without use of sudo
189$ pip install snakeviz --user
190```
191
192Or as suggested by [mvt](https://www.reddit.com/user/mvt) you can use [pipsi](https://github.com/mitsuhiko/pipsi).
diff --git a/_posts/2017-08-11-simple-iot-application.md b/_posts/2017-08-11-simple-iot-application.md
deleted file mode 100644
index cd0179e..0000000
--- a/_posts/2017-08-11-simple-iot-application.md
+++ /dev/null
@@ -1,499 +0,0 @@
1---
2
3layout: post
4title: Simple IOT application supported by real-time monitoring and data history
5description: Develop simple IOT application with Arduino MKR1000 and Python
6
7---
8
9**Table of contents**
10
111. [Initial thoughts](#initial-thoughts)
122. [Simple Python API](#simple-python-api)
13 1. [Basic web application](#basic-web-application)
14 2. [Web application security](#web-application-security)
15 3. [Simple API for writing data-points](#simple-api-for-writing-data-points)
163. [Sending data to API with Arduino MKR1000](#sending-data-to-api-with-arduino-mkr1000)
174. [Data visualization](#data-visualization)
185. [Conclusion](#conclusion)
19
20## Initial thoughts
21
22I have been developing this kind of application for the better part of the last 5 years, and people keep asking me how to approach developing such applications, so I will take a shot at explaining it here.
23
24IOT applications are really no different from any other kind of application. We have data that needs to be collected and visualized in some form of tables or charts. The main difference here is that most of the time this data is collected by some kind of device that is foreign to a developer who mainly operates in the web domain. But fear not, it's not that different from writing some JavaScript.
25
26There are many devices able to transmit data over a wireless or wired network out of the box, but for the sake of this example we will be using the commonly known Arduino with a wireless module already on the board → [Arduino MKR1000](https://store.arduino.cc/arduino-mkr1000).
27
28In order to make this little project as accessible to others as possible, I will try to make it as inexpensive as possible. By this I mean that I will avoid using hosted virtual servers and will use my own laptop as a server. You must, however, buy an Arduino MKR1000 to follow the steps below. If you want to deploy this software, I would suggest using [DigitalOcean](https://www.digitalocean.com) → the smallest VPS is only $5 per month, making it one of the most affordable options out there. Please note that this software will not run on stock web hosting that only supports LAMP (Linux, Apache, MySQL, and PHP).
29
30_But before we begin, please note that this is strictly experimental code, not well optimized; there are much better ways of handling some aspects of the application, but they require a much deeper knowledge of the technology that is not needed for an example like this._
31
32**Development steps**
33
341. Simple Python API that will receive and store incoming data.
352. Prototype C++ code that will read "sensor data" and transmit it to API.
363. Data visualization with charts → extends Python web application.
37
38Steps 1 and 3 will share the same web application. One route will be dedicated to the API and another to serving HTML with the chart.
39
40The schema below represents what we will try to achieve and how the different parts relate to each other.
41
42![Overview](/files/simple-iot-application-overview.svg)
43
44## Simple Python API
45
46I have always been a fan of simplicity, so we will be using [Bottle: Python Web Framework](https://bottlepy.org/docs/dev/). It is a single-file web framework that seriously simplifies working with routes and templating, and it has a built-in web server that satisfies our needs in this case.
47
48First we need to install the bottle package. This can be done by downloading ```bottle.py``` and placing it in the root of your application, or by using pip: ```pip install bottle --user```.
49
50If you are using Linux or MacOS, then Python is already installed. If you want to try this on Windows, please install [Python for Windows](https://www.python.org/downloads/windows/). There may be some problems with PATH when you try to launch ```python webapp.py```, so please take care of this before you continue.
51
52### Basic web application
53
54The most basic bottle application is quite simple. Paste the code below into a ```webapp.py``` file and save it.
55
56```python
57# -*- coding: utf-8 -*-
58
59import bottle
60
61# initializing bottle app
62app = bottle.Bottle()
63
64# triggered when / is accessed from browser
65# only accepts GET → no POST allowed
66@app.route("/", method=["GET"])
67def route_default():
68 return "howdy from python"
69
70# starting server on http://0.0.0.0:5000
71if __name__ == "__main__":
72 bottle.run(
73 app = app,
74 host = "0.0.0.0",
75 port = 5000,
76 debug = True,
77 reloader = True,
78 catchall = True,
79 )
80```
81
82To run this simple application, open a command prompt or terminal on your machine, go to the folder containing your file and type ```python webapp.py```. If everything goes OK, open your web browser and point it to ```http://0.0.0.0:5000```.
83
84If you would like to change the port of your application (to, say, port 80) without running your app as root, this will present a problem. TCP/IP port numbers below 1024 are privileged ports → this is a security feature. So for the sake of simplicity and security, use a port number above 1024, as I have done with port 5000.
85
86If this fails at any time please fix it before you continue, because nothing below will work otherwise.
87
88> We use 0.0.0.0 as the default host so that this app is available over your local network. If you find your local IP (```ifconfig```) and try accessing this site with your phone (if it is on the same network/router as your machine), it should work as well (an example of such an IP: ```http://192.168.1.15:5000```). This is a must-have, because the Arduino will be accessing this application to send its data.
89
90### Web application security
91
92There is a lot to be said about security; it is the topic of many books. Of course all of it cannot be covered here, but to establish some basic security → you should always use SSL with your application. Some fantastic free certificates are available from [Let's Encrypt - Free SSL/TLS Certificates](https://letsencrypt.org). With an SSL certificate installed, you should then make use of HTTP headers and send your "API key" via a header. If your key is sent via a header, it is encrypted by SSL and travels encrypted over the network. Never send your API keys via a GET parameter like ```http://example.com/?api_key=somekeyvalue```. The problem with sending the key this way is that it is visible in logs and to network sniffers.
93
94There is a fantastic article describing some aspects about security: [11 Web Application Security Best Practices](https://www.keycdn.com/blog/web-application-security-best-practices/). Please check it out.
95
96### Simple API for writing data-points
97
98We will now take the boilerplate code from the example above and extend it to write data received by the API to local storage. For this example I will use SQLite3 because it plays well with Python and can store quite a large amount of data. I have been using it to collect gigabytes of data in a single database without any corruption or problems → your experience may vary.
99
100To avoid learning SQL, I will be using [Dataset: databases for lazy people](https://dataset.readthedocs.io/en/latest/index.html). This package abstracts SQL away and simplifies writing and reading data from the database (see the short sketch below for a feel of the API). You should install this package with pip: ```pip install dataset --user```.
101
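To get a quick feel for the dataset API before wiring it into the web app, here is a minimal sketch (it creates a throwaway ```demo.db``` in the working directory; the table name and values are just placeholders):

```python
# dataset_demo.py - minimal sketch of the dataset API
import dataset

db = dataset.connect("sqlite:///demo.db")

# inserting creates the table and its columns on the fly
db["point"].insert(dict(ts=1502406000, value="42"))

# reading everything back
for row in db["point"].all():
    print row["ts"], row["value"]
```
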
102Because the API will use the POST method, I will be testing whether the code works correctly with [Restlet Client for Google Chrome](https://chrome.google.com/webstore/detail/restlet-client-rest-api-t/aejoelaoggembcahagimdiliamlcdmfm). This software also allows you to set headers → needed for basic security with the API key.
103
104To quickly generate passwords or API keys I usually use this nifty website [RandomKeygen](https://randomkeygen.com/).
105
106Copy and paste the code below over your previous code in the ```webapp.py``` file.
107
108```python
109# -*- coding: utf-8 -*-
110
111import time
112import bottle
113import random
114import dataset
115
116# initializing bottle app
117app = bottle.Bottle()
118
119# connects to sqlite database
120# check_same_thread=False allows using it in multi-threaded mode
121app.config["dsn"] = dataset.connect("sqlite:///data.db?check_same_thread=False")
122
123# api key that will be used in Arduino code
124app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"
125
126# triggered when /api is accessed from browser
127# only accepts POST → no GET allowed
128@app.route("/api", method=["POST"])
129def route_default():
130 status = 400
131 ts = int(time.time()) # current timestamp
132 value = bottle.request.body.read() # data from device
133    api_key = bottle.request.get_header("Api-Key") # api key from header (matches the Arduino sketch)
134
135 # outputs to console received data for debug reason
136 print ">>> {} :: {}".format(value, api_key)
137
138 # if api_key is correct and value is present
139 # then writes attribute to point table
140 if api_key == app.config["api_key"] and value:
141 app.config["dsn"]["point"].insert(dict(ts=ts, value=value))
142 status = 200
143
144 # we only need to return status
145 return bottle.HTTPResponse(status=status, body="")
146
147# starting server on http://0.0.0.0:5000
148if __name__ == "__main__":
149 bottle.run(
150 app = app,
151 host = "0.0.0.0",
152 port = 5000,
153 debug = True,
154 reloader = True,
155 catchall = True,
156 )
157```
158
159To run this, simply go to the folder containing the Python file and run ```python webapp.py``` from the terminal. If everything goes OK, you should have a simple API available via the POST method on the /api route. You can also exercise it straight from Python, as sketched below.
160
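If you prefer the terminal to a browser extension, here is a minimal sketch that posts a single test value from Python (it assumes the server above is running locally and uses the same API key):

```python
# test_api.py - minimal sketch, posts one test value to /api
import urllib2

req = urllib2.Request("http://127.0.0.1:5000/api", data="42")  # data= makes it a POST
req.add_header("Api-Key", "JtF2aUE5SGHfVJBCG5SH")

try:
    print urllib2.urlopen(req).getcode()  # 200 -> point accepted
except urllib2.HTTPError as e:
    print e.code                          # 400 -> wrong key or empty body
```
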
161After testing the service with Restlet Client, you should be able to see your data in the database file ```data.db```.
162
163![REST settings example](/files/iot-rest-example.png)
164
165You can also check the contents of the new database file by using a desktop client for SQLite → [DB Browser for SQLite](http://sqlitebrowser.org/).
166
167![SQLite database example](/files/iot-sqlite-db.png)
168
169The table structure is as simple as it can be. We have ts (timestamp) and value (the value from the Arduino). As you can see, the timestamp is generated on the API side. If you happened to have an accurate clock on the Arduino, it would be better to generate the timestamp there and send it along with the value. This would be particularly useful if we were collecting sensor data at a higher frequency and then sending it in bulk to the API.
170
171> If you deploy this app with uWSGI in multi-threaded mode, use a DSN (Data Source Name) URL with ```?check_same_thread=False```.
172
173OK, now that we have a working API with some basic security, so unwanted people cannot post data to your database, we can proceed further and program the Arduino to send data to the API.
174
175## Sending data to API with Arduino MKR1000
176
177First of all, you need an MKR1000 module and a micro-USB cable to proceed. If you have ever done any work with Arduino, you know that you also need the [Arduino IDE](https://www.arduino.cc/en/Main/Software). From the provided link you should be able to download and install the IDE. Once that task is completed and you have successfully run the blink example, proceed to the next step.
178
179In order to use the wireless capabilities of the MKR1000, you first need to install the [WiFi101 library](https://www.arduino.cc/en/Reference/WiFi101) in the Arduino IDE. Please check before you install → you may already have it.
180
181The code below is a working example that sends data to the API. Before you test it, make sure the Python web application is running. Then change the settings for wifi, the API endpoint and api_key. If for some reason the code below doesn't work for you, please leave a comment and I'll try to help.
182
183Once you have opened the IDE and copied this code, try to compile and upload it. Then open the "Serial Monitor" to see whether the Arduino prints any output.
184
185```c
186#include <WiFi101.h>
187
188// wifi settings
189char ssid[] = "ssid-name";
190char pass[] = "ssid-password";
191
192// api server endpoint
193char server[] = "192.168.6.22";
194int port = 5000;
195
196// api key that must be the same as the one in Python code
197String api_key = "JtF2aUE5SGHfVJBCG5SH";
198
199// frequency data is sent in ms - every 5 seconds
200int timeout = 1000 * 5;
201
202int status = WL_IDLE_STATUS;
203
204void setup() {
205
206 // initialize serial and wait for port to open:
207 Serial.begin(9600);
208 delay(1000);
209
210 // check for the presence of the shield
211 if (WiFi.status() == WL_NO_SHIELD) {
212 Serial.println("WiFi shield not present");
213 while (true);
214 }
215
216 // attempt to connect to wifi network
217 while (status != WL_CONNECTED) {
218 Serial.print("Attempting to connect to SSID: ");
219 Serial.println(ssid);
220 status = WiFi.begin(ssid, pass);
221 // wait 10 seconds for connection
222 delay(10000);
223 }
224
225 // output wifi status to serial monitor
226 Serial.print("SSID: ");
227 Serial.println(WiFi.SSID());
228
229 IPAddress ip = WiFi.localIP();
230 Serial.print("IP Address: ");
231 Serial.println(ip);
232
233 long rssi = WiFi.RSSI();
234 Serial.print("signal strength (RSSI):");
235 Serial.print(rssi);
236 Serial.println(" dBm");
237}
238
239void loop() {
240
241 WiFiClient client;
242
243 if (client.connect(server, port)) {
244
245 // I use random number generator for this example
246 // but you can use analog or digital inputs from arduino
247 String content = String(random(1000));
248
249    client.println("POST /api HTTP/1.1");
    client.println("Host: " + String(server)); // Host header is required by HTTP/1.1
250    client.println("Connection: close");
251 client.println("Api-Key: " + api_key);
252 client.println("Content-Length: " + String(content.length()));
253 client.println();
254    client.print(content); // print, not println, so the body is exactly Content-Length bytes
255
256 delay(100);
257 client.stop();
258 Serial.println("Data sent successfully ...");
259
260 } else {
261 Serial.println("Problem sending data ...");
262 }
263
264 // waits for x seconds and continue looping
265 delay(timeout);
266
267}
268```
269
270As you can see from the example, the Arduino generates a random integer in the range [0..999] (```random(1000)``` excludes the upper bound). You can easily replace this with a temperature sensor or any other kind of sensor. If you don't have the hardware at hand, the sketch below simulates the device from Python.
271
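A minimal Python stand-in for the Arduino, useful while the hardware is on order (it assumes the web application above is running and reachable at localhost; swap in your machine's local IP when testing across the network):

```python
# simulate_device.py - minimal sketch, a stand-in for the Arduino
import time
import random
import urllib2

while True:
    # same payload shape as the Arduino sketch: a bare random value
    req = urllib2.Request("http://127.0.0.1:5000/api", data=str(random.randint(0, 999)))
    req.add_header("Api-Key", "JtF2aUE5SGHfVJBCG5SH")
    try:
        urllib2.urlopen(req)
        print "data sent"
    except urllib2.HTTPError as e:
        print "problem sending data, status", e.code
    time.sleep(5)  # every 5 seconds, like the Arduino timeout
```
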
272Now that we have the API under the hood and the Arduino is sending demo data, we can focus on data visualization.
273
274## Data visualization
275
276Before we continue we should examine our project folder structure. Currently we only have two files in our project:
277
278_simple-iot-app/_
279
280* _webapp.py_
281* _data.db_
282
283We will now add an HTML template that contains the CSS and JavaScript code inline, for simplicity's sake. For the Bottle framework to scan the root application folder for templates, we add ```bottle.TEMPLATE_PATH.insert(0, "./")``` to ```webapp.py```. By default the Bottle framework uses the ```views/``` subfolder to store templates. This is not the ideal situation, and if you use Bottle to develop web applications you should stick with the native behavior and store templates in its predefined folder. But for the sake of the example we will override this. Be careful to fully replace your code with the new code provided below → avoid partially replacing code in the file :) New code for reading data-points is also included in the Python example below.
284
285First we add a new route to our web application. It is triggered when the browser hits the root of the application, ```http://0.0.0.0:5000/```. This route does nothing more than render the ```frontend.html``` template, which is done by ```return bottle.template("frontend.html")```. Check the code below to examine how exactly this is done.
286
287Now we expand the ```/api``` route to use different methods for writing and reading data-points. For writing a data-point we use the POST method, and for reading points we use the GET method. The GET method returns a JSON array with the latest readings and historical data.
288
289There is a fantastic JavaScript library for plotting time-series charts called [MetricsGraphics.js](https://www.metricsgraphicsjs.org) that is based on [D3.js](https://d3js.org/) library for visualizing data.
290
291MetricsGraphics.js requires a particular data schema → we need to transform the data from the database into this format:
292
293```json
294[
295 {
296 "date": "2017-08-11 01:07:20",
297 "value": 933
298 },
299 {
300 "date": "2017-08-11 01:07:30",
301 "value": 743
302 }
303]
304```
305
306The web application is now complete and we only need ```frontend.html```, which we will develop next. If you started the web app now and went to the root of the app, it would return an error because frontend.html doesn't exist yet.
307
308```python
309# -*- coding: utf-8 -*-
310
311import time
312import bottle
313import json
314import datetime
315import random
316import dataset
317
318# initializing bottle app
319app = bottle.Bottle()
320
321# adds root directory as template folder
322bottle.TEMPLATE_PATH.insert(0, "./")
323
324# connects to sqlite database
325# check_same_thread=False allows using it in multi-threaded mode
326app.config["db"] = dataset.connect("sqlite:///data.db?check_same_thread=False")
327
328# api key that will be used in Arduino code
329app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"
330
331# triggered when / is accessed from browser
332# only accepts GET → no POST allowed
333@app.route("/", method=["GET"])
334def route_default():
335 return bottle.template("frontend.html")
336
337# triggered when /api is accessed from browser
338# accepts POST and GET
339@app.route("/api", method=["GET", "POST"])
340def route_api():
341
342 # if method is POST then we write datapoint
343 if bottle.request.method == "POST":
344 status = 400
345 ts = int(time.time()) # current timestamp
346 value = bottle.request.body.read() # data from device
347 api_key = bottle.request.get_header("Api-Key") # api key from header
348
349        # outputs to console received data for debug reasons
350 print ">>> {} :: {}".format(value, api_key)
351
352 # if api_key is correct and value is present
353 # then writes attribute to point table
354 if api_key == app.config["api_key"] and value:
355 app.config["db"]["point"].insert(dict(ts=ts, value=value))
356 status = 200
357
358 # we only need to return status
359 return bottle.HTTPResponse(status=status, body="")
360
361 # if method is GET then we read datapoint
362 else:
363 response = []
364 datapoints = app.config["db"]["point"].all()
365
366 for point in datapoints:
367 response.append({
368 "date": datetime.datetime.fromtimestamp(int(point["ts"])).strftime("%Y-%m-%d %H:%M:%S"),
369 "value": point["value"]
370 })
371
372 bottle.response.content_type = "application/json"
373 return json.dumps(response)
374
375# starting server on http://0.0.0.0:5000
376if __name__ == "__main__":
377 bottle.run(
378 app = app,
379 host = "0.0.0.0",
380 port = 5000,
381 debug = True,
382 reloader = True,
383 catchall = True,
384 )
385```
386
387And now we can finally implement ```frontend.html```. Create a file with this name and copy the code below into it. When you are done, you can start the web application. The steps for this part are listed below the code.
388
389```html
390<!DOCTYPE html>
391<html>
392
393 <head>
394 <meta charset="utf-8">
395 <title>Simple IOT application</title>
396 </head>
397
398 <body>
399
400 <h1>Simple IOT application</h1>
401
402 <div class="chart-placeholder">
403 <div id="chart"></div>
404 </div>
405
406 <!-- application main script -->
407 <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
408 <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>
409 <script src="https://cdnjs.cloudflare.com/ajax/libs/metrics-graphics/2.11.0/metricsgraphics.min.js"></script>
410 <script>
411 function fetch_and_render() {
412 d3.json("/api", function(data) {
413 data = MG.convert.date(data, "date", "%Y-%m-%d %H:%M:%S");
414 MG.data_graphic({
415 data: data,
416 chart_type: "line",
417 full_width: true,
418 height: 270,
419 target: document.getElementById("chart"),
420 x_accessor: "date",
421 y_accessor: "value"
422 });
423 });
424 }
425 window.onload = function() {
426 // initial call for rendering
427 fetch_and_render();
428
429 // updates chart every 5 seconds
430 setInterval(function() {
431 fetch_and_render();
432 }, 5000);
433 }
434 </script>
435
436 <!-- application styles -->
437 <style>
438 body {
439 font: 13px sans-serif;
440 padding: 20px 50px;
441 }
442 .chart-placeholder {
443 border: 2px solid #ccc;
444 width: 100%;
445 user-select: none;
446 }
447 /* chart styles */
448 .mg-line1-color {
449 stroke: red;
450 stroke-width: 2;
451 }
452 .mg-main-area, .mg-main-line {
453 fill: #fff;
454 }
455 .mg-x-axis line, .mg-y-axis line {
456 stroke: #b3b2b2;
457 stroke-width: 1px;
458 }
459 </style>
460
461 </body>
462
463</html>
464```
465
466Now the folder structure should look like:
467
468_simple-iot-app/_
469
470* _webapp.py_
471* _data.db_
472* _frontend.html_
473
474OK, let's now start the application and start feeding it data.
475
4761. ```python webapp.py```
4772. connect Arduino MKR1000 to power source
4783. open browser and go to ```http://0.0.0.0:5000```
479
480If everything goes well, new data-points should be rendered on the chart every 5 seconds.
481
482When you navigate to ```http://0.0.0.0:5000```, you should see the rendered chart as shown in the picture below.
483
484![Application output](/files/iot-app-output.png)
485
486Complete application with all the code is available for [download](/files/simple-iot-application.zip).
487
488## Conclusion
489
490I hope this clarifies some aspects of IOT application development. Of course, this is a minimal example and is far from what can be done in real life with a deeper dive into other technologies.
491
492If you would like to continue exploring the IOT world, here are some interesting resources to examine:
493
494* [Reading Sensors with an Arduino](https://www.allaboutcircuits.com/projects/reading-sensors-with-an-arduino/)
495* [MQTT 101 – How to Get Started with the lightweight IoT Protocol](http://www.hivemq.com/blog/how-to-get-started-with-mqtt)
496* [Stream Updates with Server-Sent Events](https://www.html5rocks.com/en/tutorials/eventsource/basics/)
497* [Internet of Things (IoT) Tutorials](http://www.tutorialspoint.com/internet_of_things/)
498
499Any comments or additional ideas are welcome in the comment section below.
diff --git a/_posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md b/_posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
deleted file mode 100644
index 4eccd25..0000000
--- a/_posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md
+++ /dev/null
@@ -1,271 +0,0 @@
1---
2
3layout: post
4title: Using DigitalOcean Spaces Object Storage with FUSE
5description: Using DigitalOcean Spaces Object Storage with FUSE
6
7---
8
9**Table of contents**
10
111. [Is it possible to use them as a mounted drive with FUSE?](#is-it-possible-to-use-them-as-a-mounted-drive-with-fuse)
122. [Will the performance degrade over time and over different sizes of objects?](#will-the-performance-degrade-over-time-and-over-different-sizes-of-objects)
13 1. [Measurement experiment 1: File copy](#measurement-experiment-1-file-copy)
13 2. [Measurement experiment 2: SQLite performance](#measurement-experiment-2-sqlite-performance)
153. [Can storage be mounted on multiple machines at the same time and be writable?](#can-storage-be-mounted-on-multiple-machines-at-the-same-time-and-be-writable)
164. [Observations and conclusion](#observations-and-conclusion)
17
18A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced a new product called [Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), which is object storage very similar to Amazon's S3. This really piqued my interest, because it was something I was missing, and the thought of going elsewhere for such functionality held no appeal for me. In keeping with their previous pricing, this too is very cheap, and the pricing page is a no-brainer compared to AWS or GCE. [Prices are clearly and precisely defined and outlined](https://www.digitalocean.com/pricing/). You must love them for that :)
19
20### Initial requirements
21
22* Is it possible to use them as a mounted drive with FUSE? (tl;dr YES)
23* Will the performance degrade over time and over different sizes of objects? (tl;dr NO&YES)
24* Can storage be mounted on multiple machines at the same time and be writable? (tl;dr YES)
25
26> Let me be clear. The scripts I use here are made just for benchmarking and are not intended to be used in real-life situations. That said, I am looking into using these approaches with a caching service in front, dumping everything to storage as objects. That could potentially be an interesting post of its own. But in case you need real-time data without eventual consistency, please take these scripts for what they are: not usable in such situations.
27
28## Is it possible to use them as a mounted drive with FUSE?
29
30Well, they actually can be used in such a manner. Because Spaces is similar to [AWS S3](https://aws.amazon.com/s3/), many tools are available and you can find many articles and [Stackoverflow items](https://stackoverflow.com/search?q=s3+fuse).
31
32To make this work you will need a DigitalOcean account. If you don't have one, you will not be able to test this code. If you do have an account, go and [create a new Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent). If you click on this link, Debian 9 with the smallest VM option will already be preselected.
33
34* Please be sure to add your SSH key, because we will log in to this machine remotely.
35* If you change your region, please remember which one you chose, because we will need this information when we mount the Space on our machine.
36
37Instructions on how to use SSH keys and how to set them up are available in the article [How To Use SSH Keys with DigitalOcean Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).
38
39![DigitalOcean Droplets](/files/fuse-droplets.png)
40
41After we have created the Droplet, it's time to create a new Space. This is done by clicking the [Create](https://cloud.digitalocean.com/spaces/new) button (top right corner) and selecting Spaces. Choose a pronounceable ```Unique name``` because we will use it in the examples below. You can choose either Private or Public; it doesn't matter in our case, and you can always change it in the future.
42
43When you have created the new Space, we should [generate an Access key](https://cloud.digitalocean.com/settings/api/tokens). This link will guide you to the page where you can generate this key. After you create a new one, please save the provided Key and Secret, because the Secret will not be shown again.
44
45![DigitalOcean Spaces](/files/fuse-spaces.png)
46
47Now that we have a new Space and an Access key, we can SSH into our machine.
48
49```bash
50# replace IP with the ip of your newly created droplet
51ssh root@IP
52
53# this will install utilities for mounting storage objects as FUSE
54apt install s3fs
55
56# we now need to provide credentials (access key we created earlier)
57# replace KEY and SECRET with your own credentials but leave the colon between them
58# we also need to set proper permissions
59echo "KEY:SECRET" > .passwd-s3fs
60chmod 600 .passwd-s3fs
61
62# now we mount space to our machine
63# replace UNIQUE-NAME with the name you choose earlier
64# if you choose different region for your space be careful about -ourl option (ams3)
65s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp
66
67# now we try to create a file
68# once you mount it may take a couple of seconds to retrieve data
69echo "Hello cruel world" > /mnt/hello.txt
70```
71
72After all this, you can return to your browser, go to [DigitalOcean Spaces](https://cloud.digitalocean.com/spaces) and click on the Space you created. If the file hello.txt is present, you have successfully mounted the Space on your machine and written data to it.
73
74I chose the same region for my Droplet and my Space, but you don't have to → you can use different regions. What that actually does to performance I don't know.
75
76Additional information on FUSE:
77
78* [Github project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse)
79* [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)
80
81## Will the performance degrade over time and over different sizes of objects?
82
83For this task I didn't want to just read and write text files or upload images. I actually wanted to figure out whether using something like SQLite is viable in this case.
84
85### Measurement experiment 1: File copy
86
87```bash
88# first we create some dummy files at different sizes
89dd if=/dev/zero of=10KB.dat bs=1024 count=10 #10KB
90dd if=/dev/zero of=100KB.dat bs=1024 count=100 #100KB
91dd if=/dev/zero of=1MB.dat bs=1024 count=1024 #1MB
92dd if=/dev/zero of=10MB.dat bs=1024 count=10240 #10MB
93
94# now we set time command to only return real
95TIMEFORMAT=%R
96
97# now lets test it
98(time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt
99
100# and now we automate
101# this will perform the same operation 100 times
102# this will output results into separate files based on object size
103n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done
104n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done
105n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done
106n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done
107```
108
109Files of size 100MB were not transferred successfully and ended up with an error (cp: failed to close '/mnt/100MB.1.dat': Operation not permitted).
110
111As I suspected, object size is not really that important. Sadly, I don't have the time to test performance over longer periods. But if any of you do it, please send me your data → I would be interested in seeing the results. A small sketch for summarizing the timing files is included below.
112
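The ```*.results.txt``` files produced above contain one ```real``` time per line, so a tiny script is enough to summarize them. A minimal sketch (it assumes the file name is passed as the first argument and tolerates both comma and dot decimal separators):

```python
# summarize.py - minimal sketch: prints basic stats for one results file
import sys

times = [float(line.replace(",", ".")) for line in open(sys.argv[1]) if line.strip()]
times.sort()

print "samples :", len(times)
print "min/max : %.3f / %.3f s" % (times[0], times[-1])
print "median  : %.3f s" % times[len(times) // 2]
print "mean    : %.3f s" % (sum(times) / len(times))
```

Run it as ```python summarize.py 10KB.results.txt```.
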
113**Here are plotted results**
114
115You can download [raw result here](/files/copy-benchmarks.tsv). Measurements are in seconds.
116
117<script src="/assets/plotly-latest.min.js"></script>
118<div id="copy-benchmarks"></div>
119<script>
120(function(){
121 var request = new XMLHttpRequest();
122 request.open("GET", "/files/copy-benchmarks.tsv", true);
123 request.onload = function() {
124 if (request.status >= 200 && request.status < 400) {
125 var payload = request.responseText.trim();
126 var tsv = payload.split("\n");
127 for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
128 var traces = [];
129 var headers = tsv[0];
130 tsv.shift();
131 Array.prototype.forEach.call(headers, function(el, idx) {
132 var x = [];
133 var y = [];
134 for (var j=0; j<tsv.length; j++) {
135 x.push(j);
136 y.push(parseFloat(tsv[j][idx].replace(",", ".")));
137 }
138 traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
139 });
140 var copy = Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } });
141 } else { }
142 };
143 request.onerror = function() { };
144 request.send(null);
145})();
146</script>
147
148As far as these tests show, performance is quite stable and predictable, which is fantastic. But this is a small test and it spans only a couple of hours, so you should not trust it completely.
149
150### Measurement experiment 2: SQLite performance
151
152I was unable to use a database file directly from the mounted drive, so this is a no-go, as I suspected. Instead I executed the code below on a local disk just to get some benchmarks (```python sqlite-benchmark.py data.db 1000 1000```). Each of the 1000 iterations runs DROPTABLE, CREATETABLE, INSERTMANY (1000 records), FETCHALL and COMMIT to generate statistics. As you can see, the performance of SQLite is quite amazing. You could then potentially just copy the file to the mounted drive and be done with it.
153
154```python
155import time
156import sqlite3
157import sys
158
159if len(sys.argv) < 3:
160 print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
161 exit()
162
163def data_iter(x):
164 for i in range(x):
165 yield "m" + str(i), "f" + str(i*i)
166
167header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
168with open("sqlite-benchmarks.tsv", "w") as fp:
169 fp.write(header_line)
170
171start_time = time.time()
172conn = sqlite3.connect(sys.argv[1])
173c = conn.cursor()
174end_time = time.time()
175result_time = CONNECT = end_time - start_time
176print("CONNECT: %g seconds" % (result_time))
177
178start_time = time.time()
179c.execute("PRAGMA journal_mode=WAL")
180c.execute("PRAGMA temp_store=MEMORY")
181c.execute("PRAGMA synchronous=OFF")
end_time = time.time()
182result_time = PRAGMA = end_time - start_time
183print("PRAGMA: %g seconds" % (result_time))
184
185for i in range(int(sys.argv[3])):
186 print("#%i" % (i))
187
188 start_time = time.time()
189 c.execute("drop table if exists test")
190 end_time = time.time()
191 result_time = DROPTABLE = end_time - start_time
192 print("DROPTABLE: %g seconds" % (result_time))
193
194 start_time = time.time()
195 c.execute("create table if not exists test(a,b)")
196 end_time = time.time()
197 result_time = CREATETABLE = end_time - start_time
198 print("CREATETABLE: %g seconds" % (result_time))
199
200 start_time = time.time()
201 c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(int(sys.argv[2])))
202 end_time = time.time()
203 result_time = INSERTMANY = end_time - start_time
204 print("INSERTMANY: %g seconds" % (result_time))
205
206 start_time = time.time()
207 c.execute("select count(*) from test")
208 res = c.fetchall()
209 end_time = time.time()
210 result_time = FETCHALL = end_time - start_time
211 print("FETCHALL: %g seconds" % (result_time))
212
213 start_time = time.time()
214 conn.commit()
215 end_time = time.time()
216 result_time = COMMIT = end_time - start_time
217 print("COMMIT: %g seconds" % (result_time))
218
219    print("")
220 log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
221 with open("sqlite-benchmarks.tsv", "a") as fp:
222 fp.write(log_line)
223
224start_time = time.time()
225conn.close()
226end_time = time.time()
227result_time = CLOSE = end_time - start_time
228print("CLOSE: %g seconds" % (result_time))
229```
230
231You can download the [raw results here](/files/sqlite-benchmarks.tsv). And again, these results were produced on local block storage and do not represent the capabilities of object storage. With my current approach and the state of the test code, that cannot be measured → I would need to make the Python code much more robust and check locking etc.
232
233<div id="sqlite-benchmarks"></div>
234<script>
235(function(){
236 var request = new XMLHttpRequest();
237 request.open("GET", "/files/sqlite-benchmarks.tsv", true);
238 request.onload = function() {
239 if (request.status >= 200 && request.status < 400) {
240 var payload = request.responseText.trim();
241 var tsv = payload.split("\n");
242 for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); }
243 var traces = [];
244 var headers = tsv[0];
245 tsv.shift();
246 Array.prototype.forEach.call(headers, function(el, idx) {
247 var x = [];
248 var y = [];
249 for (var j=0; j<tsv.length; j++) {
250 x.push(j);
251 y.push(parseFloat(tsv[j][idx].replace(",", ".")));
252 }
253 traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } });
254 });
255 var sqlite = Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } });
256 } else { }
257 };
258 request.onerror = function() { };
259 request.send(null);
260})();
261</script>
262
263## Can storage be mounted on multiple machines at the same time and be writable?
264
265Well, this one didn't take long to test. And the answer is **YES**. I mounted the Space on two machines and measured the same performance on both. But because a file is downloaded before a write and then uploaded on completion, there could potentially be problems if another process tries to access the same file at the same time.
266
267## Observations and conclusion
268
269Using Spaces in this way makes it easier to access and manage files. But beyond that, you would need to write additional code to make it play nice with your applications.
270
271Nevertheless, this was extremely simple to set up and use, and Spaces is just another excellent product in the DigitalOcean product line. I found this exercise very valuable and am thinking about implementing some sort of mechanism for SQLite, so data can be stored on Spaces and accessed by many VMs. For a project where data doesn't need to be accessible in real time and can be a couple of minutes old, this would be very interesting. If any of you find this proposal interesting, please write in the comment box below or shoot me an email and I will keep you posted.
diff --git a/_posts/2018-08-05-the-bullshit-web-developments-pov.md b/_posts/2018-08-05-the-bullshit-web-developments-pov.md
deleted file mode 100644
index b8346f6..0000000
--- a/_posts/2018-08-05-the-bullshit-web-developments-pov.md
+++ /dev/null
@@ -1,113 +0,0 @@
1---
2
3layout: post
4title: The Bullshit Web - Development's Point of View
5description: State of front-end development and what this does to the future of web
6
7---
8
9**Table of contents**
10
111. [Initial thoughts](#initial-thoughts)
122. [Front-end frameworks](#front-end-frameworks)
133. [Obsolescence to the rescue](#obsolescence-to-the-rescue)
144. [Unnecessary complexity](#unnecessary-complexity)
155. [Speed of development trumps code quality](#speed-of-development-trumps-code-quality)
166. [Load times of most popular websites](#load-times-of-most-popular-websites)
17
18## Initial thoughts
19
20I have recently read an amazing essay by Nick Heer on the web called [The Bullshit Web](https://pxlnv.com/blog/bullshit-web/) and it got me thinking about the future of the web as it is today.
21
22> The average internet connection in the United States is about six times as fast as it was just ten years ago, but instead of making it faster to browse the same types of websites, we’re simply occupying that extra bandwidth with more stuff.
23>
24> **-- Nick Heer**
25
26I really try to stay away from front-end development as much as possible. The reason is not that I hold any bad opinions about it, but that having to work with clients on visual stuff drains me to the point of sheer horror.
27
28I have silently observed the progress made in this field because I thought things would get better with time. I was so wrong. So wrong. Not only did things get extremely complicated to work with, the whole stack became so massive that even simple pages have an insanely large footprint.
29
30The Bullshit Web essay concentrates mostly on page sizes and AMP but I would like to address tooling and technologies for development in this post.
31
32Currently we have two types of websites:
33
34- informational websites,
35- web applications.
36
37The problem is that more and more websites are treated as web applications where a simple web page would suffice. And this, in my opinion, adds insult to injury.
38
39We talk about progressive web applications, AMP, and other technologies that solve the problems of bandwidth and usability and in general make the web faster, but in reality they rarely get applied in real-life scenarios. Most of the time these are just demos at conferences.
40
41## Front-end frameworks
42
43I am not one of those purists who deny the use of JavaScript frameworks or SASS, but there are limits to how far this obsession should go. In order to use these technologies properly, one should ask oneself where exactly they are needed, and not treat every problem as a nail for the framework hammer.
44
45Whenever I need to do front-end UI, I usually check the specification before embarking on the journey of coding. And most of the time I really don't need frameworks. Most of the JavaScript I need to write is done in a couple of hundred lines of code and does exactly what the specification requires. And the developer who works on this code after me doesn't need to learn a new framework, tooling, etc. Just pure vanilla JavaScript. In all my years as a developer, I can count on the fingers of one hand the times I used some sort of framework. And even in these exceptions we later rewrote the code in vanilla JavaScript, because maintaining complex code was just too time-consuming.
46
47There is an argument to be made for using frameworks in cases where multiple people are working on a project, code must be easily transferable, and the on-boarding process must be swift. But in reality this is often just another bullshit excuse to stick with what is "cool". I stand by function over form. This also conflicts with the notion that frameworks never change. Frameworks evolve and adapt to market needs, and most of the time they become massive and hard to maintain. And we get stuck with a massive codebase built on hacks and workarounds, because the framework didn't support some feature at the time of development. I personally hate workarounds and smart-ass code that is intentionally harder to read. I find frameworks similar to the story of Cain and Abel: either you get murdered or the framework does. Most of the time the framework dies and leaves a legacy nobody wants.
48
49Huge strides have been made to address this problem, many fantastic frameworks have emerged, and some of them are absolutely amazing. But there needs to be a strong case for using them in a project. We should never use them blindly, regardless of the problem we are trying to solve.
50
51I must admit that the tooling around front-end is getting better and better and we are slowly getting there, but there is still a long road ahead.
52
53## Obsolescence to the rescue
54
55We can all agree that frameworks or libraries are usually there to fill the gap between what developers need and what is currently widely supported by the standards. Most of these so-called frameworks are just libraries that paper over browser compatibility. The prime example of this is jQuery. There was a time when almost everybody was using jQuery. But over time the HTML5 specs were updated to include ideas from jQuery, and this closed the browser compatibility gap. There is an awesome article on this: [The Rise and Fall of jQuery](https://www.evolutionjobs.com/uk/media/the-rise-and-fall-of-jquery-117981/).
56
57Don't get me wrong. Yes, I dislike jQuery, but I find it indispensable, and without it our web would be very different → for the worse, in my opinion. It was a huge stepping stone for front-end development. But there comes a time when technologies become obsolete and standards catch up with the requirements of the field.
58
59And because libraries and frameworks have a short lifespan, I try to stay away from them and, if possible, use vanilla code. There is a wonderful article, [The Brutal Lifecycle of JavaScript Frameworks](https://stackoverflow.blog/2018/01/11/brutal-lifecycle-javascript-frameworks/), that explains how quickly they pop up and become obsolete.
60
61> JavaScript UI frameworks and libraries work in cycles. Every six months or so, a new one pops up, claiming that it has revolutionized UI development. Thousands of developers adopt it into their new projects, blog posts are written, Stack Overflow questions are asked and answered, and then a newer (and even more revolutionary) framework pops up to usurp the throne.
62>
63> **-- Ian Allen**
64
65## Unnecessary complexity
66
67Libraries tend to speed up development, which is OK, but there are huge drawbacks down the road. Most of the time we work on simple projects. Not everybody is working on Facebook, Google or that kind of mammoth app, and by using libraries provided by these companies we introduce complexity that these companies need in order to build their apps. These libraries usually include edge-case functionality that only applies to them, and because the libraries are easy to adopt, very complex approaches get implemented where they are not needed.
68
69Another reason for me not to use frameworks and libraries is that there is usually a team behind such a project, and when working on a feature on your own it takes too much time to read through the documentation and properly understand the reasoning behind a feature in a library. Most of the stuff (dashboards, tables, widgets) that I work on is done much faster in pure JS. The codebase footprint is smaller and other developers are not required to learn a completely new framework.
70
71These frameworks are heavily opinionated. No question about it. By using them you accept their dogma, and by doing so you put yourself in a weird position when a new "disruptive" framework comes to life. If we think about it, these frameworks should rather be called "approaches".
72
73> *Just to be completely honest*
74>
75> There are use-cases for such frameworks, and there are situations where they are indispensable. I am not saying that they don't make sense. All I am saying is that in my line of work I have noticed that not every project is fit for a framework, and it's better not to use one in such cases.
76
77An awesome talk about [Learning from JavaScript Libraries by Trevor Landau](https://www.youtube.com/watch?v=u2PgPWj8KrM).
78
79## Speed of development trumps code quality
80
81I have found that most of these frameworks or libraries can no longer be understood in a matter of hours. In the past this was somehow different. You could learn jQuery in a matter of hours and use it the next day like you were a pro. I know it's not fair to compare a framework and a library, but for our purposes this is acceptable.
82
83Every developer should bring knowledge and experience to the decision of selecting (or not selecting) a framework. I always stay true to [Occam's razor](https://en.wikipedia.org/wiki/Occam%27s_razor), and when prototyping I always use as barebones a setup as I can. I see no problem with completely dumping a block of code and replacing it with something more complex if that makes sense → but there needs to be a strong reason behind this decision.
84
85Workarounds are one of the necessary evils, particularly when dealing with frameworks → either because of a lack of time or simply because the framework doesn't support something. And this is my main problem with them. In real life we don't have the time to properly implement the ideas behind a framework. When shit hits the fan, we butcher the code and mix different ideas just to catch a deadline. And this contradicts the whole idea of using a framework.
86
87The impact this has on the quality and readability of code is massive. And treating it as just a symptom is probably the worst thing you can do. Over time this hacked-up code becomes legacy, and additional code is molded onto what is already in the codebase. In doing so, our code drifts further and further from the initial concept.
88
89Code quality and readability should come first, regardless of frameworks and libraries. Code should be as close to bare metal as possible, so that when frameworks change, our code is still usable and can be refreshed by any developer with basic knowledge of the programming language in question.
90
91## Load times of most popular websites
92
93All of this directly impacts performance. Terabytes of bandwidth are wasted because of a decision made early in the development cycle. Laggy performance, slow loading, and bad experience, just because the development team was not cautious enough.
94
95Here are some examples of loading times. It's up to you to decide whether this really is the best way to do the web.
96
97| URL                | # req | Transferred | Finish  | DOM Content Loaded | Load   |
98| ------------------ | ----- | ---------- | ------- | ------------------ | ------ |
99| cnn.com | 134 | 3.22 MB | 4.7 s | 575 ms | 3.60 s |
100| youtube.com | 61 | 1.8 MB | 5.13 s | 1.78 s | 1.97 s |
101| wikipedia.com | 11 | 64.5 KB | 642 ms | 531 ms | 573 ms |
102| reddit.com | 177 | 12.9 MB | 7.65 s | 2.03 s | 3.74 s |
103| amazon.com         | 278   | 8.0 MB     | 5.20 s  | 1.15 s             | 2.99 s |
104| twitter.com | 202 | 5.1 MB | 23.48 s | 3.20 s | 4.55 s |
105| twitch.tv | 177 | 4.4 MB | 5.08 s | 579 ms | 798 ms |
106| microsoft.com | 77 | 1.1 MB | 3.96 s | 1.01 s | 1.26 s |
107| huffingtonpost.com | 134 | 2.9 MB | 2.30 s | 789 ms | 1.47 s |
108| nytimes.com | 240 | 2.9 MB | 4.64 s | 1.30 s | 4.29 s |
109| foxnews.com | 195 | 1.7 MB | 4.42 s | 1.25 s | 3.86 s |
110| theguardian.com | 203 | 2.8 MB | 2.75 s | 784 ms | 2.43 s |
111| bbc.com | 127 | 1.3 MB | 3.44 s | 1.24 s | 2.65 s |
112
113Chrome Developer Tools was used to measure the load times.
diff --git a/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md b/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
deleted file mode 100644
index 56e96dd..0000000
--- a/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
+++ /dev/null
@@ -1,415 +0,0 @@
1---
2
3layout: post
4title: Encoding binary data into DNA sequence
5description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it.
6
7---
8
9**Table of contents**
10
111. [Initial thoughts](#initial-thoughts)
122. [Glossary](#glossary)
133. [Data encoding](#data-encoding)
144. [Quick history of DNA](#quick-history-of-dna)
155. [What is DNA?](#what-is-dna)
166. [Encode binary data into DNA sequence](#encode-binary-data-into-dna-sequence)
17 1. [Basic Encoding](#basic-encoding)
18 2. [FASTA file format](#fasta-file-format)
19 3. [PNG encoded DNA sequence](#png-encoded-dna-sequence)
207. [Encoding text file in practice](#encoding-text-file-in-practice)
218. [Toolkit for encoding data](#toolkit-for-encoding-data)
22 1. [dnae-encode](#dnae-encode)
23 2. [dnae-png](#dnae-png)
249. [Benchmarks](#benchmarks)
2510. [References](#references)
26
27## Initial thoughts
28
29Imagine a world where you could go outside, take a leaf from a tree, put it through your personal DNA sequencer, and get data like music, videos or computer programs from it. Well, this is all possible now. It has not been done on a large scale because it is quite expensive to synthesize DNA strands, but it's possible.
30
31Encoding data into a DNA sequence is a relatively simple process once you understand the relationship between binary data and nucleotides. Scientists have been making large leaps in this field in order to provide a viable long-term storage solution for our data, one that could potentially outlive our species in case of a global disaster. We could imprint all the world's knowledge into plants and ensure its survival.
32
33A more optimistic use for this technology would be easier storage of the ever-growing data we produce every day. Once machines for sequencing DNA become fast and cheap enough, this could mean the next evolution of data storage and the abandonment of classical hard drives and solid-state drives in data warehouses.
34
35As things currently stand, this is still not viable, but it is quite an amazing and cool technology.
36
37My interests in this field are purely in the encoding processes and experimental testing, mainly because I don't have access to these expensive machines. My initial goal was to create a toolkit that can be used by everybody to encode their data into a proper DNA sequence.
38
39## Glossary
40
41**deoxyribose**
42: A five-carbon sugar molecule with a hydrogen atom rather than a hydroxyl group in the 2′ position; the sugar component of DNA nucleotides.
43
44**double helix**
45: The molecular shape of DNA in which two strands of nucleotides wind around each other in a spiral shape.
46
47**nitrogenous base**
48: A nitrogen-containing molecule that acts as a base; often referring to one of the purine or pyrimidine components of nucleic acids.
49
50**phosphate group**
51: A molecular group consisting of a central phosphorus atom bound to four oxygen atoms.
52
53**RGB**
54: The RGB color model is an additive color model in which red, green and blue light are added together in various ways to reproduce a broad array of colors.
55
56**GCC**
57: The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages.
58
59## Data encoding
60
61**TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process [^1].
62
63Encoding is the process of converting data into a format required for a number of information processing needs, including:
64
65- Program compiling and execution
66- Data transmission, storage and compression/decompression
67- Application data processing, such as file conversion
68
69Encoding can have two meanings[^1]:
70
71- In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher.
72- In electronics, encoding refers to analog to digital conversion.
73
74## Quick history of DNA
75
76- **1869** - Friedrich Miescher identifies "nuclein".
77- **1900s** - The Eugenics Movement.
78- **1900** – Mendel's theories are rediscovered by researchers.
79- **1944** - Oswald Avery identifies DNA as the 'transforming principle'.
80- **1952** - Rosalind Franklin photographs crystallized DNA fibres.
81- **1953** - James Watson and Francis Crick discover the double helix structure of DNA.
82- **1965** - Marshall Nirenberg is the first person to sequence the bases in each codon.
83- **1983** - Huntington's disease is the first mapped genetic disease.
84- **1990** - The Human Genome Project begins.
85- **1995** - Haemophilus Influenzae is the first bacterium genome sequenced.
86- **1996** - Dolly the sheep is cloned.
87- **1999** - First human chromosome is decoded.
88- **2000** – Genetic code of the fruit fly is decoded.
89- **2002** – Mouse is the first mammal to have its genome decoded.
90- **2003** – The Human Genome Project is completed.
91- **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup [^2].
92
93## What is DNA?
94
95Deoxyribonucleic acid, a self-replicating material which is **present in nearly all living organisms** as the main constituent of chromosomes. It is the **carrier of genetic information**.
96
97> The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.
98>
99> **-- Carl Sagan, Cosmos**
100
101The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine bases. The sugar and the base together are called a nucleoside.
102
103![DNA](/files/dna-sequence/dna-basics.jpg#center)
104
105*DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts) [^3]*
106
107## Encode binary data into DNA sequence
108
109As an input file you can use any file you want:
110- ASCII files,
111- Compiled programs,
112- Multimedia files (MP3, MP4, MVK, etc),
113- Images,
114- Database files,
115- etc.
116
117Note: If you copied all the bytes from RAM to a file, or piped data into a file, you could encode that data as well, as long as you provide a file pointer to the encoder.
118
119### Basic Encoding
120
121As already mentioned, Basic Encoding is based on a simple mapping. DNA is composed of 4 nucleotides (Adenine, Cytosine, Guanine, Thymine; usually referred to by their first letter). Using this technique we can encode
122
123$$ \log_2(4) = \log_2(2^2) = 2 \text{ bits} $$
124
125per nucleotide. In this way, we are able to use the 4 bases that compose the DNA strand to encode each byte of data (four nucleotides per byte). [^4]
126
127| Two bits | Nucleotides |
128| -------- | ---------------- |
129| 00 | **A** (Adenine) |
130| 01       | **G** (Guanine)  |
131| 10       | **C** (Cytosine) |
132| 11 | **T** (Thymine) |
133
134With this in mind, we can encode any data by converting two-bit pairs to nucleotides (a runnable Python sketch follows the pseudocode below):
135
136```pascal
137{ Algorithm 1: Naive byte array to DNA encode }
138procedure EncodeToDNASequence(f) string
139begin
140 enc string
141 while not eof(f) do
142 c byte := buffer[0] { Read 1 byte from buffer }
143    bin string := sprintf('%08b', c) { Convert to binary string, e.g. '01001000' }
144    for e in [0, 2, 4, 6] do { Walk the four 2-bit pairs }
145      if bin[e] = 48 and bin[e+1] = 48 then { 00 - A (Adenine); 48/49 are ASCII '0'/'1' }
146        enc += 'A'
147      else if bin[e] = 48 and bin[e+1] = 49 then { 01 - G (Guanine) }
148        enc += 'G'
149      else if bin[e] = 49 and bin[e+1] = 48 then { 10 - C (Cytosine) }
150        enc += 'C'
151      else if bin[e] = 49 and bin[e+1] = 49 then { 11 - T (Thymine) }
152        enc += 'T'
153 return enc { Return DNA sequence }
154end
155```
156
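Here is the same algorithm as a minimal, runnable Python sketch (an illustration of the mapping above, not the actual ```dnae-encode``` source). Encoding ```"How"``` reproduces the first twelve nucleotides of the quote.fa example further down:

```python
# dna_encode.py - minimal sketch of Algorithm 1 (not the actual dnae-encode tool)
MAPPING = {"00": "A", "01": "G", "10": "C", "11": "T"}

def encode_to_dna(data):
    out = []
    for byte in bytearray(data):
        bits = format(byte, "08b")   # one byte -> e.g. "01001000"
        for i in (0, 2, 4, 6):       # walk the four 2-bit pairs
            out.append(MAPPING[bits[i:i + 2]])
    return "".join(out)

print encode_to_dna("How")           # GACAGCTTGTGT
```
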
157Another encoding is **Goldman encoding**. Using it helps with nonsense mutations (an amino acid replaced by a stop codon), which are the most problematic mutations during translation because they lead to truncated amino acid sequences, which in turn result in truncated proteins. [^4]
158
159[Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU)
160
161### FASTA file format
162
163In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics. [^5]
164
165The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored).
166
167```
168;LCBO - Prolactin precursor - Bovine
169; a sample sequence in FASTA format
170MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
171EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
172VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
173ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*
174
175>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
176ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
177FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
178DIDGDGQVNYEEFVQMMTAK*
179
180>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
181LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
182EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
183LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
184GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
185IENY
186```
187
188FASTA format was extended by [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) format from the [Sanger Centre](https://www.sanger.ac.uk/) in Cambridge.
189
190### PNG encoded DNA sequence
191
192| Nucleotides | RGB | Color name |
193| ------------ | ----------- | ---------- |
194| A (Adenine) | (0,0,255) | Blue |
195| G (Guanine) | (0,100,0) | Green |
196| C (Cytosine) | (255,0,0) | Red |
197| T (Thymine) | (255,255,0) | Yellow |
198
199With this in mind, we can create a simple algorithm that produces a PNG representation of a DNA sequence (a runnable Python counterpart follows the pseudocode).
200
201```pascal
202{ Algorithm 2: Naive DNA to PNG encode from FASTA file }
203procedure EncodeDNASequenceToPNG(f)
204begin
205 i image
206 while not eof(f) do
207 c char := buffer[0] { Read 1 char from buffer }
208 case c of
209 'A': color := RGB(0, 0, 255) { Blue }
210 'G': color := RGB(0, 100, 0) { Green }
211 'C': color := RGB(255, 0, 0) { Red }
212 'T': color := RGB(255, 255, 0) { Yellow }
213    drawRect(i, [x, y], color) { x, y advance one cell per nucleotide }
214 save(i) { Save PNG image }
215end
216```
217
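And as a minimal, runnable Python counterpart (an illustration only, not the actual ```dnae-png``` tool; it assumes the Pillow imaging library and draws one pixel per nucleotide instead of the tool's larger rectangles):

```python
# dna_png.py - minimal sketch of Algorithm 2 (not the actual dnae-png tool)
from PIL import Image  # assumes Pillow: pip install Pillow

COLORS = {"A": (0, 0, 255), "G": (0, 100, 0), "C": (255, 0, 0), "T": (255, 255, 0)}

def sequence_to_png(seq, path, width=64):
    height = (len(seq) + width - 1) // width      # enough rows for the whole sequence
    img = Image.new("RGB", (width, height))
    for i, base in enumerate(seq):
        img.putpixel((i % width, i // width), COLORS[base])
    img.save(path)

sequence_to_png("GACAGCTTGTGT" * 32, "out.png")   # a repeating demo sequence
```
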
218## Encoding text file in practice
219
220In this example we will take a simple text file as our input stream for encoding. This file will contain a quote from Niels Bohr, saved as a txt file.
221
222> How wonderful that we have met with a paradox. Now we have some hope of making progress.
223> ― Niels Bohr
224
226First we encode the text file into a FASTA file.
226
227```bash
228./dnae-encode -i quote.txt -o quote.fa
2292019/01/10 00:38:29 Gathering input file stats
2302019/01/10 00:38:29 Starting encoding ...
231 106 B / 106 B [==================================] 100.00% 0s
2322019/01/10 00:38:29 Saving to FASTA file ...
2332019/01/10 00:38:29 Output FASTA file length is 438 B
2342019/01/10 00:38:29 Process took 987.263µs
2352019/01/10 00:38:29 Done ...
236```
237
239The resulting `quote.fa` file contains the encoded DNA sequence in ASCII format.
239
240```
241>SEQ1
242GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
243GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
244ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
245ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
246GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
247GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
248AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
249AACC
250```
251
252Then we take the FASTA file from the previous operation and encode it into a PNG.
253
254```bash
255./dnae-png -i quote.fa -o quote.png
2562019/01/10 00:40:09 Gathering input file stats ...
2572019/01/10 00:40:09 Deconstructing FASTA file ...
2582019/01/10 00:40:09 Compositing image file ...
259 424 / 424 [==================================] 100.00% 0s
2602019/01/10 00:40:09 Saving output file ...
2612019/01/10 00:40:09 Output image file length is 1.1 kB
2622019/01/10 00:40:09 Process took 19.036117ms
2632019/01/10 00:40:09 Done ...
264```
265
266After encoding into PNG format, the file looks like this.
267
268![Encoded Quote in PNG format](/files/dna-sequence/quote.png)
269
270The larger the input stream, the larger the PNG file will be: every byte becomes four nucleotides, so the 106 B quote above produces 424 colored blocks.
271
272A basic Hello World C program compiled with [GCC](https://www.gnu.org/software/gcc/) would [look like this](/files/dna-sequence/sample.png).
273
274```c
275// gcc -O3 -o sample sample.c
276#include <stdio.h>
277
278int main(void) {
279 printf("Hello, world!\n");
280 return 0;
281}
282```
283
284## Toolkit for encoding data
285
286I have created a toolkit with two main programs:
287- dnae-encode (encodes file into FASTA file)
288- dnae-png (encodes FASTA file into PNG)
289
290Toolkit with full source code is available on [github.com/mitjafelicijan/dna-encoding](https://github.com/mitjafelicijan/dna-encoding).

### dnae-encode

```bash
> ./dnae-encode --help
usage: dnae-encode --input=INPUT [<flags>]

A command-line application that encodes file into DNA sequence.

Flags:
  --help                 Show context-sensitive help (also try --help-long and --help-man).
  -i, --input=INPUT      Input file (ASCII or binary) which will be encoded into DNA sequence.
  -o, --output="out.fa"  Output file which stores DNA sequence in FASTA format.
  -s, --sequence=SEQ1    The description line (defline) or header/identifier line, gives a name and/or a unique identifier for the sequence.
  -c, --columns=60       Row characters length (no more than 120 characters). Devices preallocate fixed line sizes in software.
  --version              Show application version.
```

### dnae-png

```bash
> ./dnae-png --help
usage: dnae-png --input=INPUT [<flags>]

A command-line application that encodes FASTA file into PNG image.

Flags:
  --help                  Show context-sensitive help (also try --help-long and --help-man).
  -i, --input=INPUT       Input FASTA file which will be encoded into PNG image.
  -o, --output="out.png"  Output file in PNG format that represents DNA sequence in graphical way.
  -s, --size=10           Size of pairings of DNA bases on image in pixels (lower resolution, lower file size).
  --version               Show application version.
```

## Benchmarks

First we generate some binary sample data with `dd`.

```bash
dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock
```

Our freshly generated 1KB file looks something like this (it is full of pseudo-random garbage, as intended).

![Sample binary file 1KB](/files/dna-sequence/sample-binary-file.png)

We create the following binary files:
- 1KB.bin
- 10KB.bin
- 100KB.bin
- 1MB.bin
- 10MB.bin
- 100MB.bin

After this we create FASTA files for all the binary files by encoding them into DNA sequences.

```bash
./dnae-encode -i 100MB.bin -o 100MB.fa
```

Then we GZIP all the FASTA files to see how well they compress. Since each base carries only two bits of payload inside an eight-bit ASCII character, a good compressor should be able to claw back most of that 4x overhead.

```bash
gzip -9 < 10MB.fa > 10MB.fa.gz
```

<script src="/assets/plotly-latest.min.js"></script>

**Speed of encoding binary files into FASTA format.**

<div id="encoding-benchmarks"></div>
<script>
(function(){
  var trace1 = {
    x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'],
    y: [5.625224, 32.679975, 112.864416, 872.887675, 8472.693202, 85525.178217],
    type: 'scatter',
  };
  var data = [trace1];
  Plotly.newPlot("encoding-benchmarks", data, {
    legend: {"orientation": "h"},
    height: 300,
    margin: { l: 50, r: 0, b: 50, t: 30, pad: 0 },
    yaxis: { title: "execution time in milliseconds", titlefont: { size: 12 } },
  });
})();
</script>

**File sizes of encoded FASTA files and their GZIP-ed variants.**

<div id="size-benchmarks"></div>
<script>
(function(){
  var trace1 = {
    x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'],
    y: [4.1, 40.7, 406.7, 4100, 40700, 406700],
    name: 'FASTA file size',
    type: 'bar',
  };
  var trace2 = {
    x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'],
    y: [1.4, 13, 121, 1200, 12000, 118000],
    name: 'FASTA GZIPPED file size',
    type: 'bar',
  };
  var data = [trace1, trace2];
  Plotly.newPlot("size-benchmarks", data, {
    legend: {"orientation": "h"},
    height: 300,
    margin: { l: 50, r: 0, b: 50, t: 30, pad: 0 },
    yaxis: { title: "size in kilobytes", titlefont: { size: 12 } },
    barmode: 'group'
  });
})();
</script>

As expected, the FASTA output is roughly four times the input size, and GZIP shrinks it to about a third of that. [Download the ODS file with benchmarks.](/files/dna-sequence/benchmarks.ods)