diff options

| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2021-01-24 01:42:03 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2021-01-24 01:42:03 +0100 |
| commit | e07ab67bf95ea7e65828e373c731b6cdf984a7de (patch) | |
| tree | 4fe471a1a8492149bb0b3e6ec726184e3bcf1647 /posts | |
| parent | 36fb49bbef11294a93a53c363d32c2134f6b19b4 (diff) | |
| download | mitjafelicijan.com-e07ab67bf95ea7e65828e373c731b6cdf984a7de.tar.gz | |

Moved to altenator and DO

Diffstat (limited to 'posts'): 18 files changed, 2725 insertions, 0 deletions
diff --git a/posts/2015-03-07-curriculum-vitae.md b/posts/2015-03-07-curriculum-vitae.md
new file mode 100644
index 0000000..bb082e7
--- /dev/null
+++ b/posts/2015-03-07-curriculum-vitae.md
@@ -0,0 +1,72 @@
---
Title: Curriculum Vitae
Description: Curriculum Vitae
Slug: curriculum-vitae
Listing: false
Created: ""
Tags: []
---

**Mitja Felicijan**

*[m@mitjafelicijan.com](mailto:m@mitjafelicijan.com?subject=Website+CV+Contact)*

*Slovenia, EU*

## Technical experience

- **Key languages:** Golang, Python, C, Bash.
- **Platforms:** GNU/Linux, macOS.
- **Interests:** Zigbee, KNX, Modbus, machine-to-machine, embedded systems, operating systems, distributed systems, IoT, RDBMS, algorithms, database engine design, SQL, NoSQL, NewSQL, big data analytics, machine learning, prediction algorithms, real-time analytics, systems automation, natural language processing, bioinformatics.

## Major projects

- SMS marketing system (2007)
- Yacht management software (2008)
- Smart Home Gateway (2009)
- Moxa UPort 1130 USB to RS485 universal Linux driver (2009)
- Remote management of electricity meters (2009)
- Remote management of blood pressure monitors (2010)
- Infomat automation system (2010)
- GPS Tourist - GIS software (2011)
- Minimal GNU/Linux distribution for embedded platforms (2011)
- Digital Jukebox system (2012)
- NanoCloudLogger - machine to machine (2012)
- Street Lighting System (2012)
- Smart cabins with hardware sensor management (2013)
- Contextual advertising server (2015)
- Network-accessible database engine for caching and in-memory storage (2016)
- Tick database engine designed for storing and processing large amounts of sensor data with high write throughput (2016)
- Wireless industrial lighting management system - hardware and software (2016)
- Minimal-configuration reverse proxy (2017)
- Industrial IoT platform for on-premise deployment (2018)
- Custom platform as a service based on Docker Swarm (2018)
- Toolkit for encoding binary data into DNA sequences (2019)
- Minimal-configuration reverse proxy with load balancing and rate limiting (2019)
- E-ink conference room occupancy display, hardware and software solution (2019)

## Employment history

- Freelancer (2001 – Present)
- Software developer at Mobinia (2005 – 2007)
- CTO at Milk (2007 – 2009)
- Co-Founder of UTS (2009 – 2014)
- Senior Software Engineer at TSmedia (2015 – 2017)
- Senior Software Engineer at Renderspace (2017 – 2019)
- IT Consultant (2017 – Present)

## Awards

- Regional Award for Innovation by the Chamber of Commerce and Industry of Slovenia for the project Intelligent system management and regulation of Street Lighting, 2010
- National Award for Innovation by the Chamber of Commerce and Industry of Slovenia for the project Intelligent system management and regulation of Street Lighting, 2010

## Key responsibilities

- Embedded platform development.
- Hardware design and driver development.
- Designing, developing and testing systems.
- Implementation of the systems.
- Writing and maintaining user and technical documentation.
- Development and maintenance of projects.
- Code review, testing and delivery.
- Working on enhancements suggested by customers and fixing reported bugs.
diff --git a/posts/2017-03-07-golang-profiling-simplified.md b/posts/2017-03-07-golang-profiling-simplified.md
new file mode 100644
index 0000000..8059aec
--- /dev/null
+++ b/posts/2017-03-07-golang-profiling-simplified.md
@@ -0,0 +1,113 @@
---
Title: Golang profiling simplified
Description: Golang profiling simplified
Slug: golang-profiling-simplified
Listing: true
Created: 2017, March 7
Tags: []
---

Many posts have been written about profiling in Go, yet I haven't found a proper tutorial on the subject. Almost all of them are missing some important piece of information, which gets pretty frustrating when you have a deadline and can't find a simple, distilled solution.

Nevertheless, after some searching and experimenting I have found a solution that works for me and probably will for you too.

## Where are my pprof files?

By default, pprof files are generated in the /tmp/ folder. You can override the folder where these files are generated programmatically in your Go code, as we will see in the example below.

## Why is my CPU profile empty?

I have found that sometimes the CPU profile is empty because the program did not run long enough. In my experience, programs that execute too quickly don't produce a useful pprof file; the file is generated, but it only contains 4KB of information.

## Profiling

As you can see from the examples, we execute a dummy_benchmark function to ensure the program actually does some work. Memory profiling can be done without such a "complex" function, but CPU profiling needs it.

The memory and CPU profiling examples are almost identical; only the parameters passed to profile.Start in the main function differ. With profile.ProfilePath(".") we tell the profiler to store the pprof files in the same folder as our program.

### Memory profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU time and waits a bit so the
// profiler has something meaningful to record.
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(time.Second * 3)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// write mem.pprof next to the binary when the program exits
	defer profile.Start(profile.MemProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### CPU profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU time and waits a bit so the
// profiler has something meaningful to record.
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(time.Second * 3)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// write cpu.pprof next to the binary when the program exits
	defer profile.Start(profile.CPUProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### Generating profiling reports

```bash
# memory profiling
go build mem.go
./mem
go tool pprof -pdf ./mem mem.pprof > mem.pdf

# cpu profiling
go build cpu.go
./cpu
go tool pprof -pdf ./cpu cpu.pprof > cpu.pdf
```

This will generate a PDF document with the visualized profile.

- [Memory PDF profile example](/assets/go-profiling/golang-profiling-mem.pdf)
- [CPU PDF profile example](/assets/go-profiling/golang-profiling-cpu.pdf)
diff --git a/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
new file mode 100644
index 0000000..90fe238
--- /dev/null
+++ b/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
@@ -0,0 +1,136 @@
---
Title: What I've learned developing ad server
Description: Lessons I learned developing contextual ad server
Slug: what-i-ve-learned-developing-ad-server
Listing: true
Created: 2017, April 17
Tags: []
---

For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. This project grew from serving thousands of ads per day to millions.

The system is made of a couple of core components:

- API for serving ads,
- utils: cron jobs and queue management tools,
- dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and search performance. This gave us access to many great Elasticsearch features; you should check it out if you do any search-related work.

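The contextual matching that Elasticsearch enables can be sketched as a query. This is a hypothetical illustration: the `keywords` field and the idea of matching a page's extracted keywords against ad keywords are assumptions, not the system's real schema; only the match-query shape is standard Elasticsearch.

```python
# Hypothetical sketch of a contextual ad lookup against Elasticsearch.
# Field and index names are invented for illustration.
def build_context_query(page_keywords, size=5):
    # A match query scores ads by keyword overlap with the page,
    # so the best contextual fits come back first.
    return {
        "size": size,
        "query": {"match": {"keywords": " ".join(page_keywords)}},
    }

query = build_context_query(["sailing", "yacht", "charter"])
```

With the official Python client, such a body would be passed to something roughly like `es.search(index="ads", body=query)` (index name assumed).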
Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page, which makes the JavaScript client library quite complex.

Now that you know the basics of the product, let's get into the lessons we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was written to the database at nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/), and the database quickly grew well above 200GB of disk space. That was problematic: statistics took a disturbingly long time to aggregate, and indexes on the stats table were no help after we reached 500 million datapoints.

> There is marketing product information and there is real-life experience. And they tend to be quite the opposite.

This is why everything is now aggregated on a daily basis, and this data is then fed to Elastic in the form of a daily summary. With this we can track many more dimensions, such as zone, channel and platform, and with that information we can adapt the occurrence of ads in specific places more precisely.

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our stack. Because Redis also stores its data on the local disk, we have some sort of backup if the server were to suffer a failure.

All the real-time statistics for ad serving and redirecting are kept as counters in a Redis instance, extracted daily and pushed to Elastic.

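The counter-and-daily-rollup flow described above can be sketched like this. A plain `Counter` stands in for the Redis counters, and while the dimensions (zone, channel, platform) come from the post, the exact key layout and summary format are assumptions.

```python
from collections import Counter
from datetime import date

# In-process stand-in for the Redis counters described above.
counters = Counter()

def track_impression(zone, channel, platform, day=None):
    # Increment a counter keyed by day and the tracked dimensions,
    # the way INCR on a composed Redis key would.
    day = day or date.today().isoformat()
    counters[(day, zone, channel, platform)] += 1

def daily_summary(day):
    # The daily extract that would be pushed to Elasticsearch.
    return [
        {"day": d, "zone": z, "channel": c, "platform": p, "impressions": n}
        for (d, z, c, p), n in counters.items()
        if d == day
    ]

track_impression("sidebar", "news", "mobile", day="2017-04-17")
track_impression("sidebar", "news", "mobile", day="2017-04-17")
track_impression("footer", "sports", "desktop", day="2017-04-17")
summary = daily_summary("2017-04-17")
```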
## Measure everything

The thing about software is that we don't really know how well it performs under load until such load arrives. When testing locally everything is fine, but in production things tend to fall apart.

As a solution, we measure everything we can: function execution times (by wrapping functions with timers), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information, and we store all of it for later analysis.

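The function-timer idea can be sketched as a decorator that accumulates the same counter/avg/elapsed triple shown in the dump below. This is a hypothetical sketch, not the production implementation.

```python
import time
from functools import wraps

# Accumulated stats, keyed by function name.
timings = {}

def timed(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Update counter, total elapsed time and running average.
            stats = timings.setdefault(
                func.__name__, {"counter": 0, "avg": 0.0, "elapsed": 0.0}
            )
            stats["counter"] += 1
            stats["elapsed"] += time.perf_counter() - start
            stats["avg"] = stats["elapsed"] / stats["counter"]
    return wrapper

@timed
def match_by_context(keywords):
    # placeholder for the real matching work
    return sorted(keywords)

match_by_context(["b", "a"])
match_by_context(["c"])
```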
**Example of function execution time**

```json
{
  "get_final_filtered_ads": {
    "counter": 1931250,
    "avg": 0.0066143431,
    "elapsed": 12773.9500310003
  },
  "store_keywords_statistics": {
    "counter": 1931011,
    "avg": 0.0004605267,
    "elapsed": 889.2821669996
  },
  "match_by_context": {
    "counter": 1931011,
    "avg": 0.0055960716,
    "elapsed": 10806.0758889999
  },
  "match_by_high_performance": {
    "counter": 262,
    "avg": 0.0152770229,
    "elapsed": 4.00258
  },
  "store_impression_stats": {
    "counter": 1931250,
    "avg": 0.0006189991,
    "elapsed": 1195.4419869999
  }
}
```

We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and visualizing the results with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more detailed look into code execution.

## Cache control is your friend

Because we rely on the JavaScript library for rendering ads, we need to be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a ?ver=xxx attribute, which makes rapid deployment and testing very difficult and time-consuming. There is a limit to how much you can test locally.

We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of the websites are built on the ASP.NET platform and have some problems with Tag Manager. With the solution below we are certain that we are serving the latest version of the script.

It only takes one mistake for users to end up with a stale script cached, and if it is cached for a whole year you can see where the problem is.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;

    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that if we didn't set the cache-control and expires headers precisely in the response, the request never reached the server and we therefore couldn't measure clicks. So when redirecting, do as follows and there will be no problems.

```python
# python ➜ bottlepy web micro-framework
response = bottle.HTTPResponse(status=302)
response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
response.set_header("Location", url)
return response
```

> Cache control in browsers is quite aggressive and you need to be precise to avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When deciding on a web server we went with Nginx as a reverse proxy for our applications. We adopted a micro-service-oriented architecture early in the project to ensure that when we scale we can easily add servers to our cluster, and Nginx was crucial for load balancing and static content delivery.

At first our config file was quite simple, but it grew over time. After repeatedly patching it and adding new settings, I sat down and learned more about the guts of Nginx. This proved very useful and we were able to squeeze much more out of our setup. So I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/); it saved us a lot of headaches. Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything; it is the cornerstone of our services. At first we were very careful about how much we stored in [Redis](https://redis.io/), but we later found that its memory footprint is very low even when storing large amounts of data.

So we gradually increased our usage, up to caching whole HTML outputs of the dashboard. This improved our performance by an order of magnitude, and Redis's native TTL support goes hand in hand with our needs.

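A minimal sketch of the TTL idea, with an in-process dict standing in for Redis; with redis-py the equivalent would be roughly `r.set(key, html, ex=60)`. The render function and cache key are invented for illustration.

```python
import time

# Dict standing in for a Redis instance: key -> (value, expiry time).
_cache = {}

def cache_set(key, value, ttl_seconds):
    _cache[key] = (value, time.monotonic() + ttl_seconds)

def cache_get(key):
    entry = _cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:
        del _cache[key]  # expired, mimic Redis TTL eviction
        return None
    return value

def render_dashboard():
    # Serve cached HTML when fresh; otherwise re-render and cache for 60 s.
    html = cache_get("dashboard:html")
    if html is None:
        html = "<h1>Dashboard</h1>"  # the expensive render would happen here
        cache_set("dashboard:html", html, ttl_seconds=60)
    return html
```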
The reason we chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) was that Redis scales well out of the box, but all of this can be achieved with Memcached too.

## Conclusion

There are many more details that could have been written, and every single topic here deserves its own post, but you probably got the idea of the problems we faced.
diff --git a/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md b/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
new file mode 100644
index 0000000..af2c65a
--- /dev/null
+++ b/posts/2017-04-21-profiling-python-web-applications-with-visual-tools.md
@@ -0,0 +1,187 @@
---
Title: Profiling Python web applications with visual tools
Description: Missing link when debugging and profiling python web application
Slug: profiling-python-web-applications-with-visual-tools
Listing: true
Created: 2017, April 21
Tags: []
---

I have been profiling my software with KCachegrind for a long time, and I was missing this option when developing APIs and other web services. I always knew it was possible but never really took the time to dive into it.

Before we begin, there are some requirements. We will need to:

- implement [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) in our web app,
- convert the output to the [callgrind](http://valgrind.org/docs/manual/cl-manual.html) format with [pyprof2calltree](https://pypi.python.org/pypi/pyprof2calltree/),
- visualize the data with [KCachegrind](http://kcachegrind.sourceforge.net/html/Home.html) or [Profiling Viewer](http://www.profilingviewer.com/).

If you are using macOS you should check out [Profiling Viewer](http://www.profilingviewer.com/) or [MacCallGrind](http://www.maccallgrind.com/).

![Workflow](/assets/python-profiling/workflow.png)

We will be dividing this post into two main parts:

- writing a simple web service,
- visualizing the profile of this web service.

## Simple web service

Let's use virtualenv so we won't pollute the base system. If you don't have virtualenv installed, you can install it with pip.

```bash
# let's install virtualenv globally
$ sudo pip install virtualenv

# let's also install pyprof2calltree globally
$ sudo pip install pyprof2calltree

# now we create the project
$ mkdir demo-project
$ cd demo-project/

# and a folder where we will store profiles
$ mkdir prof

# now we create an empty virtualenv in the venv/ folder
$ virtualenv --no-site-packages venv

# we now need to activate the virtualenv
$ source venv/bin/activate

# you can check that the virtualenv was correctly initialized by
# checking where your python interpreter is located;
# if the command below points to your created directory and not some
# system dir like /usr/bin/python then everything is fine
$ which python

# we can check now if all is good ➜ if ok, a couple of
# lines will be displayed
$ pip freeze
# appdirs==1.4.3
# packaging==16.8
# pyparsing==2.2.0
# six==1.10.0

# now we are ready to install bottle ➜ web micro-framework
$ pip install bottle

# you can deactivate the virtualenv, but you will then go
# back to the system environment ➜ for now, don't deactivate
$ deactivate
```

We are now ready to write a simple web service. Create a file app.py and paste the code below into it.

```python
# -*- coding: utf-8 -*-

import bottle
import random
import cProfile

app = bottle.Bottle()

# this function is a decorator: it wraps a function,
# profiles it and saves the result to
# prof/function-name.prof
# in our example only the awesome_random_number function will
# be profiled because only it is decorated with do_cprofile
def do_cprofile(func):
    def profiled_func(*args, **kwargs):
        profile = cProfile.Profile()
        try:
            profile.enable()
            result = func(*args, **kwargs)
            profile.disable()
            return result
        finally:
            profile.dump_stats("prof/" + str(func.__name__) + ".prof")
    return profiled_func


# we enable profiling for a specific function by putting
# @do_cprofile above the function declaration
@app.route("/")
@do_cprofile
def awesome_random_number():
    awesome_random_number = random.randint(0, 100)
    return "awesome random number is " + str(awesome_random_number)

@app.route("/test")
def test():
    return "dummy test"

if __name__ == '__main__':
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 4000
    )

# run with 'python app.py'
# open browser 'http://0.0.0.0:4000'
```

When the browser hits the awesome\_random\_number() function, a profile is created in the prof/ subfolder.

## Visualize profile

Now let's create the callgrind format from this cProfile output.

```bash
$ cd prof/
$ pyprof2calltree -i awesome_random_number.prof
# this creates 'awesome_random_number.prof.log' file in the same folder
```

This file can be opened with the visualizing tools listed above. In this case we will be using Profiling Viewer under macOS. You can open the image in a new tab. As you can see from this example, it shows the hierarchy and execution order of your code.

![Visualized profile](/assets/python-profiling/kcachegrind.png)

> Make sure you convert the cProfile output every time you want to refresh and look at possible optimizations, because cProfile updates the .prof file every time the browser hits the function.

This is just a simple example, but when you are developing real-life applications this can be very illuminating, especially for seeing which parts of your code are bottlenecks and need to be optimized.

## Update 2017-04-22

Reddit user [mvt](https://www.reddit.com/user/mvt) also recommended [SnakeViz](https://jiffyclub.github.io/snakeviz/), an awesome web-based profile visualizer that directly takes the output of the [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile) module.

<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="583880c1-002e-41ed-a373-020a0ef2cff9" data-embed-created="2017-04-22T19:46:54.810Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dgljhsb/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>

```bash
# let's install it globally as well
$ sudo pip install snakeviz

# now let's visualize
$ cd prof/
$ snakeviz awesome_random_number.prof
# this automatically opens a browser window and
# shows the visualized profile
```

![SnakeViz](/assets/python-profiling/snakeviz.png)

Reddit user [ccharles](https://www.reddit.com/user/ccharles) suggested a better way of installing pip packages: target the user level instead of using sudo.

<div class="reddit-embed" data-embed-media="www.redditmedia.com" data-embed-parent="false" data-embed-live="false" data-embed-uuid="f4f0459e-684d-441e-bebe-eb49b2f0a31d" data-embed-created="2017-04-22T19:46:10.874Z"><a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/dglpzkx/">Comment</a> from discussion <a href="https://www.reddit.com/r/Python/comments/66v373/profiling_python_web_applications_with_visual/">Profiling Python web applications with visual tools</a>.</div><script async src="https://www.redditstatic.com/comment-embed.js"></script>

```bash
# we need to add this path to our $PATH variable;
# we do this by adding this line at the end of your
# ~/.bashrc file
PATH=$PATH:$HOME/.local/bin/

# in order to use this new configuration you can close
# and reopen the terminal or reload the .bashrc file
$ source ~/.bashrc

# now let's test if the new directory is present in $PATH
$ echo $PATH

# now we can install at the user level by adding --user,
# without the use of sudo
$ pip install snakeviz --user
```

Or, as suggested by [mvt](https://www.reddit.com/user/mvt), you can use [pipsi](https://github.com/mitsuhiko/pipsi).
diff --git a/posts/2017-08-11-simple-iot-application.md b/posts/2017-08-11-simple-iot-application.md
new file mode 100644
index 0000000..dee5e74
--- /dev/null
+++ b/posts/2017-08-11-simple-iot-application.md
@@ -0,0 +1,489 @@
---
Title: Simple IOT application supported by real-time monitoring and data history
Description: Develop simple IOT application with Arduino MKR1000 and Python
Slug: simple-iot-application
Listing: true
Created: 2017, August 11
Tags: []
---

## Initial thoughts

I have been developing this kind of application for the better part of the last 5 years, and people keep asking me how to approach developing one, so I will give explaining it here a try.

IOT applications are really no different from any other kind of application. We have data that needs to be collected and visualized in some form of tables or charts. The main difference is that most of the time this data is collected by some kind of device that is foreign to a developer who mainly operates in the web domain. But fear not, it's not that different from writing some JavaScript.

There are many devices able to transmit data over a wireless or wired network by default, but for the sake of the example we will be using the commonly known Arduino with a wireless module already on the board → [Arduino MKR1000](https://store.arduino.cc/arduino-mkr1000).

In order to make this little project as accessible to others as possible, I will try to make it as inexpensive as possible. By this I mean that I will avoid using hosted virtual servers and will use my own laptop as the server; you must, however, buy an Arduino MKR1000 to follow the steps below. If you want to deploy this software, I would suggest [DigitalOcean](https://www.digitalocean.com) → the smallest VPS is inexpensive, making it one of the most affordable options out there. Please note that this software will not run on stock web hosting that only supports LAMP (Linux, Apache, MySQL, and PHP).

_But before we begin, please note that this is strictly experimental code: it is not well optimized, and there are much better ways of handling some aspects of the application, but those require much deeper knowledge of the technology, which is not needed for an example like this._

**Development steps**

1. Simple Python API that will receive and store incoming data.
2. Prototype C++ code that will read "sensor data" and transmit it to the API.
3. Data visualization with charts → extends the Python web application.

Steps 1 and 3 will share the same web application. One route will be dedicated to the API and another to serving HTML with the chart.

The schema below represents what we will try to achieve and how the different parts relate to each other.

![Schema](/assets/developing-iot-applications/iot-schema.png)

| 34 | ## Simple Python API | ||
| 35 | |||
| 36 | I have always been a fan of simplicity so we will be using [Bottle: Python Web Framework](https://bottlepy.org/docs/dev/). It is a single file web framework that seriously simplifies working with routes, templating and has built-in web server that satisfies our need in this case. | ||
| 37 | |||
| 38 | First we need to install bottle package. This can be done by downloading ```bottle.py``` and placing it in the root of your application or by using pip software ```pip install bottle --user```. | ||
| 39 | |||
If you are using Linux or macOS, Python is already installed. If you want to test this on Windows, please install [Python for Windows](https://www.python.org/downloads/windows/). There may be some problems with your PATH when you try to launch ```python webapp.py```, so please take care of this before you continue.
| 41 | |||
| 42 | ### Basic web application | ||
| 43 | |||
The most basic Bottle application is quite simple. Paste the code below into a file named ```webapp.py``` and save it.
| 45 | |||
```python
# -*- coding: utf-8 -*-

import bottle

# initializing bottle app
app = bottle.Bottle()

# triggered when / is accessed from browser
# only accepts GET → no POST allowed
@app.route("/", method=["GET"])
def route_default():
    return "howdy from python"

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```
| 71 | |||
To run this simple application, open a command prompt or terminal, go to the folder containing your file, and type ```python webapp.py```. If everything goes OK, open your web browser and point it to ```http://0.0.0.0:5000```.
| 73 | |||
If you would like to change the port of your application (to port 80, for example) without running your app as root, this presents a problem: TCP/IP port numbers below 1024 are privileged ports → this is a security feature. So, for both simplicity and security, use a port number above 1024, as I have done with port 5000.
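You can check this behavior for yourself from Python before starting the server. This is a small sketch using only the standard library; the ```can_bind``` helper is my own, not part of Bottle:

```python
import socket

def can_bind(port):
    """Return True if we can bind a TCP socket to the given port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))
        return True
    except OSError:  # PermissionError on privileged ports without root
        return False
    finally:
        s.close()

# port 80 usually requires root, port 5000 does not
print(can_bind(80), can_bind(5000))
```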
| 75 | |||
| 76 | If this fails at any time please fix it before you continue, because nothing below will work otherwise. | ||
| 77 | |||
We use 0.0.0.0 as the default host so that this app is available over your local network. If you find your local IP with ```ifconfig``` and try accessing the site from your phone (on the same network/router as your machine), it should work as well (an example of such an IP: ```http://192.168.1.15:5000```). This is a must-have, because the Arduino will access this application to send its data.
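If you prefer not to parse ```ifconfig``` output, Python can guess the LAN address for you. This is a common trick, not anything specific to Bottle: connecting a UDP socket sends no packets, it only asks the OS which local address would be used for that route.

```python
import socket

def local_ip():
    """Best-effort guess of this machine's LAN address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # no packets are sent, this only selects a route
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no network available
    finally:
        s.close()

print(local_ip())
```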
| 79 | |||
| 80 | ### Web application security | ||
| 81 | |||
There is a lot to be said about security; it is the topic of many books and cannot all be covered here. But to establish some basic security → you should always use SSL with your application. Fantastic free certificates are available from [Let's Encrypt - Free SSL/TLS Certificates](https://letsencrypt.org). With an SSL certificate installed, you should then send your "API key" via an HTTP header. If the key is sent via a header, it is encrypted by SSL before it travels over the network. Never send your API keys as a GET parameter like ```http://example.com/?api_key=somekeyvalue```. The problem with that approach is that the key ends up visible in server logs and browser history.
| 83 | |||
There is a fantastic article describing some aspects of security: [11 Web Application Security Best Practices](https://www.keycdn.com/blog/web-application-security-best-practices/). Please check it out.
| 85 | |||
| 86 | ### Simple API for writing data-points | ||
| 87 | |||
We will now take the boilerplate code from the example above and extend it to write data received by the API to local storage. For this example I will use SQLite3, because it plays well with Python and can store quite a large amount of data. I have used it to collect gigabytes of data in a single database without any corruption or problems → your experience may vary.
| 89 | |||
To avoid learning SQL for SQLite, I will be using [Dataset: databases for lazy people](https://dataset.readthedocs.io/en/latest/index.html). This package abstracts the SQL away and simplifies writing to and reading from the database. Install it with pip: ```pip install dataset --user```.
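For reference, the two dataset calls we will use (```insert``` and ```all```) map to plain SQL roughly as sketched below with the standard-library ```sqlite3``` module; this is an approximation of what dataset does for us, in case you want to skip the extra dependency:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # use "data.db" for a file on disk
conn.execute("CREATE TABLE IF NOT EXISTS point (ts INTEGER, value TEXT)")

# roughly what dataset's table.insert(dict(ts=..., value=...)) does
conn.execute("INSERT INTO point (ts, value) VALUES (?, ?)",
             (int(time.time()), "42"))
conn.commit()

# roughly what dataset's table.all() does
rows = conn.execute("SELECT ts, value FROM point").fetchall()
print(rows)
```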
| 91 | |||
Because the API will use the POST method, I will test whether the code works correctly using the [Restlet Client for Google Chrome](https://chrome.google.com/webstore/detail/restlet-client-rest-api-t/aejoelaoggembcahagimdiliamlcdmfm). This tool also lets you set headers → needed for our basic API key security.
| 93 | |||
| 94 | To quickly generate passwords or API keys I usually use this nifty website [RandomKeygen](https://randomkeygen.com/). | ||
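If you'd rather generate the key locally, Python's standard ```secrets``` module does the job in one line:

```python
import secrets

# 20 random bytes, URL-safe base64 encoded (~27 characters)
api_key = secrets.token_urlsafe(20)
print(api_key)
```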
| 95 | |||
Copy and paste the code below over your previous code in the ```webapp.py``` file.
| 97 | |||
```python
# -*- coding: utf-8 -*-

import time
import bottle
import dataset

# initializing bottle app
app = bottle.Bottle()

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["dsn"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when /api is accessed
# only accepts POST → no GET allowed
@app.route("/api", method=["POST"])
def route_default():
    status = 400
    ts = int(time.time())  # current timestamp
    value = bottle.request.body.read().decode("utf-8")  # data from device
    api_key = bottle.request.get_header("Api-Key")  # api key from header

    # outputs received data to console for debugging
    print(">>> {} :: {}".format(value, api_key))

    # if api_key is correct and value is present
    # then writes attribute to point table
    if api_key == app.config["api_key"] and value:
        app.config["dsn"]["point"].insert(dict(ts=ts, value=value))
        status = 200

    # we only need to return status
    return bottle.HTTPResponse(status=status, body="")

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```
| 148 | |||
To run this, simply go to the folder containing the Python file and run ```python webapp.py``` from the terminal. If everything goes OK, you should have a simple API available via the POST method on the /api route.
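If you prefer the command line over a browser extension, a few lines of Python can play the role of the Restlet Client. This sketch assumes the web app is running locally on port 5000, and uses only the standard library:

```python
import urllib.request
import urllib.error

req = urllib.request.Request(
    "http://127.0.0.1:5000/api",
    data=b"42",  # the "sensor value" goes in the request body
    headers={"Api-Key": "JtF2aUE5SGHfVJBCG5SH"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.status)  # 200 when the key and value are accepted
except urllib.error.URLError:
    print("could not connect, is webapp.py running?")
```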
| 150 | |||
After testing the service with the Restlet Client, you should be able to see your data in the database file ```data.db```.
| 152 | |||
| 153 |  | ||
| 154 | |||
You can also inspect the contents of the new database file using a desktop client for SQLite → [DB Browser for SQLite](http://sqlitebrowser.org/).
| 156 | |||
| 157 |  | ||
| 158 | |||
The table structure is as simple as it can be: we have ts (timestamp) and value (the value from the Arduino). As you can see, the timestamp is generated on the API side. If you happened to have an accurate clock on the Arduino, it would be better to generate the timestamp on the device and send it along with the value. This would be particularly useful if we were collecting sensor data at a higher frequency and sending it to the API in bulk.
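To illustrate the bulk idea: a device with its own clock would batch readings, each with its own timestamp, and send them as a single JSON payload, which the API would then unpack. This payload format is hypothetical and not implemented in this article's code:

```python
import json

# hypothetical bulk payload from a device with its own clock
payload = json.dumps([
    {"ts": 1502406440, "value": 933},
    {"ts": 1502406450, "value": 743},
])

# the API side would trust the device timestamps instead of time.time()
points = [(p["ts"], p["value"]) for p in json.loads(payload)]
print(points)
```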
| 160 | |||
If you deploy this app with uWSGI in multi-threaded mode, keep ```?check_same_thread=False``` in the DSN (Data Source Name) URL.
| 162 | |||
OK, now that we have a working API with some basic security, so that unwanted visitors cannot post data to your database, we can proceed and program the Arduino to send data to the API.
| 164 | |||
| 165 | ## Sending data to API with Arduino MKR1000 | ||
| 166 | |||
First of all, you need an MKR1000 module and a micro USB cable. If you have ever done any work with Arduino, you know that you also need the [Arduino IDE](https://www.arduino.cc/en/Main/Software). At that link you can download and install the IDE. Once that is done and you have successfully run the blink example, proceed to the next step.
| 168 | |||
In order to use the wireless capabilities of the MKR1000, you first need to install the [WiFi101 library](https://www.arduino.cc/en/Reference/WiFi101) in the Arduino IDE. Check before you install → you may already have it.
| 170 | |||
The code below is a working example that sends data to the API. Before you test it, make sure the Python web application is running. Then change the settings for WiFi, the API endpoint and the api_key. If for some reason the code below doesn't work for you, please leave a comment and I'll try to help.
| 172 | |||
Once you have opened the IDE and copied this code, try to compile and upload it. Then open the Serial Monitor to see the output from the Arduino.
| 174 | |||
| 175 | ```c | ||
| 176 | #include <WiFi101.h> | ||
| 177 | |||
| 178 | // wifi settings | ||
| 179 | char ssid[] = "ssid-name"; | ||
| 180 | char pass[] = "ssid-password"; | ||
| 181 | |||
// api server endpoint
| 183 | char server[] = "192.168.6.22"; | ||
| 184 | int port = 5000; | ||
| 185 | |||
| 186 | // api key that must be the same as the one in Python code | ||
| 187 | String api_key = "JtF2aUE5SGHfVJBCG5SH"; | ||
| 188 | |||
| 189 | // frequency data is sent in ms - every 5 seconds | ||
| 190 | int timeout = 1000 * 5; | ||
| 191 | |||
| 192 | int status = WL_IDLE_STATUS; | ||
| 193 | |||
| 194 | void setup() { | ||
| 195 | |||
| 196 | // initialize serial and wait for port to open: | ||
| 197 | Serial.begin(9600); | ||
| 198 | delay(1000); | ||
| 199 | |||
| 200 | // check for the presence of the shield | ||
| 201 | if (WiFi.status() == WL_NO_SHIELD) { | ||
| 202 | Serial.println("WiFi shield not present"); | ||
| 203 | while (true); | ||
| 204 | } | ||
| 205 | |||
| 206 | // attempt to connect to wifi network | ||
| 207 | while (status != WL_CONNECTED) { | ||
| 208 | Serial.print("Attempting to connect to SSID: "); | ||
| 209 | Serial.println(ssid); | ||
| 210 | status = WiFi.begin(ssid, pass); | ||
| 211 | // wait 10 seconds for connection | ||
| 212 | delay(10000); | ||
| 213 | } | ||
| 214 | |||
| 215 | // output wifi status to serial monitor | ||
| 216 | Serial.print("SSID: "); | ||
| 217 | Serial.println(WiFi.SSID()); | ||
| 218 | |||
| 219 | IPAddress ip = WiFi.localIP(); | ||
| 220 | Serial.print("IP Address: "); | ||
| 221 | Serial.println(ip); | ||
| 222 | |||
| 223 | long rssi = WiFi.RSSI(); | ||
| 224 | Serial.print("signal strength (RSSI):"); | ||
| 225 | Serial.print(rssi); | ||
| 226 | Serial.println(" dBm"); | ||
| 227 | } | ||
| 228 | |||
| 229 | void loop() { | ||
| 230 | |||
| 231 | WiFiClient client; | ||
| 232 | |||
| 233 | if (client.connect(server, port)) { | ||
| 234 | |||
| 235 | // I use random number generator for this example | ||
| 236 | // but you can use analog or digital inputs from arduino | ||
| 237 | String content = String(random(1000)); | ||
| 238 | |||
client.println("POST /api HTTP/1.1");
client.println("Host: " + String(server)); // required by HTTP/1.1
client.println("Connection: close");
| 241 | client.println("Api-Key: " + api_key); | ||
| 242 | client.println("Content-Length: " + String(content.length())); | ||
| 243 | client.println(); | ||
| 244 | client.println(content); | ||
| 245 | |||
| 246 | delay(100); | ||
| 247 | client.stop(); | ||
| 248 | Serial.println("Data sent successfully ..."); | ||
| 249 | |||
| 250 | } else { | ||
| 251 | Serial.println("Problem sending data ..."); | ||
| 252 | } | ||
| 253 | |||
| 254 | // waits for x seconds and continue looping | ||
| 255 | delay(timeout); | ||
| 256 | |||
| 257 | } | ||
| 258 | ``` | ||
| 259 | |||
As you can see from the example, the Arduino generates a random integer between 0 and 999. You can easily replace this with a temperature sensor or any other kind of sensor.
| 261 | |||
Now that we have the API in place and the Arduino sending demo data, we can focus on data visualization.
| 263 | |||
| 264 | ## Data visualization | ||
| 265 | |||
Before we continue, we should examine our project folder structure. Currently we have only two files in our project:
| 267 | |||
| 268 | _simple-iot-app/_ | ||
| 269 | |||
| 270 | * _webapp.py_ | ||
| 271 | * _data.db_ | ||
| 272 | |||
We will now add an HTML template that contains the CSS and JavaScript inline, for simplicity. For Bottle to scan the root application folder for templates, we add ```bottle.TEMPLATE_PATH.insert(0, "./")``` to ```webapp.py```. By default, Bottle looks for templates in the ```views/``` subfolder. This is not the ideal setup, and if you use Bottle to develop real web applications you should stick with the native behavior and store templates in the predefined folder. But for the sake of the example we will override it. Be careful to fully replace your code with the new code provided below → avoid partially replacing code in the file :) The new code for reading data-points is also included in the Python example below.
| 274 | |||
First we add a new route to our web application. It is triggered when the browser hits the root of the application, ```http://0.0.0.0:5000/```. This route does nothing more than render the ```frontend.html``` template, via ```return bottle.template("frontend.html")```. Check the code below to examine exactly how this is done.
| 276 | |||
Next we expand the ```/api``` route to use different methods for writing and reading data-points. For writing a data-point we use the POST method, and for reading points we use the GET method. The GET method returns a JSON array with all stored readings.
| 278 | |||
There is a fantastic JavaScript library for plotting time-series charts called [MetricsGraphics.js](https://www.metricsgraphicsjs.org), built on the [D3.js](https://d3js.org/) data visualization library.
| 280 | |||
MetricsGraphics.js requires a specific data schema → to achieve this, we need to transform the data from the database into the following format:
| 282 | |||
| 283 | ```json | ||
| 284 | [ | ||
| 285 | { | ||
| 286 | "date": "2017-08-11 01:07:20", | ||
| 287 | "value": 933 | ||
| 288 | }, | ||
| 289 | { | ||
| 290 | "date": "2017-08-11 01:07:30", | ||
| 291 | "value": 743 | ||
| 292 | } | ||
| 293 | ] | ||
| 294 | ``` | ||
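The transformation itself is only a few lines. Given (ts, value) rows as stored by the API, we format each timestamp as a string MetricsGraphics.js can parse:

```python
import datetime
import json

# (ts, value) rows as stored by the API; sample data for illustration
rows = [(1502406440, 933), (1502406450, 743)]

points = [
    {
        "date": datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S"),
        "value": value,
    }
    for ts, value in rows
]
print(json.dumps(points, indent=2))
```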
| 295 | |||
The web application is now complete; we only need the ```frontend.html``` template, which we will develop next. If you started the web app now and opened the root URL, it would return an error because ```frontend.html``` does not exist yet.
| 297 | |||
```python
# -*- coding: utf-8 -*-

import time
import bottle
import json
import datetime
import dataset

# initializing bottle app
app = bottle.Bottle()

# adds root directory as template folder
bottle.TEMPLATE_PATH.insert(0, "./")

# connects to sqlite database
# check_same_thread=False allows using it in multi-threaded mode
app.config["db"] = dataset.connect("sqlite:///data.db?check_same_thread=False")

# api key that will be used in Arduino code
app.config["api_key"] = "JtF2aUE5SGHfVJBCG5SH"

# triggered when / is accessed from browser
# only accepts GET → no POST allowed
@app.route("/", method=["GET"])
def route_default():
    return bottle.template("frontend.html")

# triggered when /api is accessed
# accepts POST and GET
@app.route("/api", method=["GET", "POST"])
def route_api():

    # if method is POST then we write a datapoint
    if bottle.request.method == "POST":
        status = 400
        ts = int(time.time())  # current timestamp
        value = bottle.request.body.read().decode("utf-8")  # data from device
        api_key = bottle.request.get_header("Api-Key")  # api key from header

        # outputs received data to console for debugging
        print(">>> {} :: {}".format(value, api_key))

        # if api_key is correct and value is present
        # then writes attribute to point table
        if api_key == app.config["api_key"] and value:
            app.config["db"]["point"].insert(dict(ts=ts, value=value))
            status = 200

        # we only need to return status
        return bottle.HTTPResponse(status=status, body="")

    # if method is GET then we read datapoints
    else:
        response = []
        datapoints = app.config["db"]["point"].all()

        for point in datapoints:
            response.append({
                "date": datetime.datetime.fromtimestamp(int(point["ts"])).strftime("%Y-%m-%d %H:%M:%S"),
                "value": point["value"]
            })

        bottle.response.content_type = "application/json"
        return json.dumps(response)

# starting server on http://0.0.0.0:5000
if __name__ == "__main__":
    bottle.run(
        app = app,
        host = "0.0.0.0",
        port = 5000,
        debug = True,
        reloader = True,
        catchall = True,
    )
```
| 376 | |||
And now, finally, we can implement ```frontend.html```. Create a file with this name and copy in the code below. When you are done, you can start the web application. The steps for this part are listed below the code.
| 378 | |||
| 379 | ```html | ||
| 380 | <!DOCTYPE html> | ||
| 381 | <html> | ||
| 382 | |||
| 383 | <head> | ||
| 384 | <meta charset="utf-8"> | ||
| 385 | <title>Simple IOT application</title> | ||
| 386 | </head> | ||
| 387 | |||
| 388 | <body> | ||
| 389 | |||
| 390 | <h1>Simple IOT application</h1> | ||
| 391 | |||
| 392 | <div class="chart-placeholder"> | ||
| 393 | <div id="chart"></div> | ||
| 394 | </div> | ||
| 395 | |||
| 396 | <!-- application main script --> | ||
| 397 | <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script> | ||
| 398 | <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script> | ||
| 399 | <script src="https://cdnjs.cloudflare.com/ajax/libs/metrics-graphics/2.11.0/metricsgraphics.min.js"></script> | ||
| 400 | <script> | ||
| 401 | function fetch_and_render() { | ||
| 402 | d3.json("/api", function(data) { | ||
| 403 | data = MG.convert.date(data, "date", "%Y-%m-%d %H:%M:%S"); | ||
| 404 | MG.data_graphic({ | ||
| 405 | data: data, | ||
| 406 | chart_type: "line", | ||
| 407 | full_width: true, | ||
| 408 | height: 270, | ||
| 409 | target: document.getElementById("chart"), | ||
| 410 | x_accessor: "date", | ||
| 411 | y_accessor: "value" | ||
| 412 | }); | ||
| 413 | }); | ||
| 414 | } | ||
| 415 | window.onload = function() { | ||
| 416 | // initial call for rendering | ||
| 417 | fetch_and_render(); | ||
| 418 | |||
| 419 | // updates chart every 5 seconds | ||
| 420 | setInterval(function() { | ||
| 421 | fetch_and_render(); | ||
| 422 | }, 5000); | ||
| 423 | } | ||
| 424 | </script> | ||
| 425 | |||
| 426 | <!-- application styles --> | ||
| 427 | <style> | ||
| 428 | body { | ||
| 429 | font: 13px sans-serif; | ||
| 430 | padding: 20px 50px; | ||
| 431 | } | ||
| 432 | .chart-placeholder { | ||
| 433 | border: 2px solid #ccc; | ||
| 434 | width: 100%; | ||
| 435 | user-select: none; | ||
| 436 | } | ||
| 437 | /* chart styles */ | ||
| 438 | .mg-line1-color { | ||
| 439 | stroke: red; | ||
| 440 | stroke-width: 2; | ||
| 441 | } | ||
| 442 | .mg-main-area, .mg-main-line { | ||
| 443 | fill: #fff; | ||
| 444 | } | ||
| 445 | .mg-x-axis line, .mg-y-axis line { | ||
| 446 | stroke: #b3b2b2; | ||
| 447 | stroke-width: 1px; | ||
| 448 | } | ||
| 449 | </style> | ||
| 450 | |||
| 451 | </body> | ||
| 452 | |||
| 453 | </html> | ||
| 454 | ``` | ||
| 455 | |||
The folder structure should now look like this:
| 457 | |||
| 458 | _simple-iot-app/_ | ||
| 459 | |||
| 460 | * _webapp.py_ | ||
| 461 | * _data.db_ | ||
| 462 | * _frontend.html_ | ||
| 463 | |||
OK, let's now start the application and begin feeding it data.
| 465 | |||
1. run ```python webapp.py```
2. connect the Arduino MKR1000 to a power source
3. open a browser and go to ```http://0.0.0.0:5000```
| 469 | |||
If everything goes well, you should see new data-points rendered on the chart every 5 seconds.
| 471 | |||
If you navigate to ```http://0.0.0.0:5000```, you should see the rendered chart, as shown in the picture below.
| 473 | |||
| 474 |  | ||
| 475 | |||
| 476 | Complete application with all the code is available for [download](/assets/iot-application/simple-iot-application.zip). | ||
| 477 | |||
| 478 | ## Conclusion | ||
| 479 | |||
I hope this clarifies some aspects of IOT application development. Of course, this is a minimal example, far from what can be done in real life with a deeper dive into other technologies.
| 481 | |||
If you would like to continue exploring the IOT world, here are some interesting resources to examine:
| 483 | |||
| 484 | * [Reading Sensors with an Arduino](https://www.allaboutcircuits.com/projects/reading-sensors-with-an-arduino/) | ||
| 485 | * [MQTT 101 – How to Get Started with the lightweight IoT Protocol](http://www.hivemq.com/blog/how-to-get-started-with-mqtt) | ||
| 486 | * [Stream Updates with Server-Sent Events](https://www.html5rocks.com/en/tutorials/eventsource/basics/) | ||
| 487 | * [Internet of Things (IoT) Tutorials](http://www.tutorialspoint.com/internet_of_things/) | ||
| 488 | |||
Any comments or additional ideas are welcome in the comments below.
diff --git a/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md b/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md new file mode 100644 index 0000000..ae895f7 --- /dev/null +++ b/posts/2018-01-16-using-digitalocean-spaces-object-storage-with-fuse.md | |||
| @@ -0,0 +1,263 @@ | |||
| 1 | --- | ||
| 2 | Title: Using DigitalOcean Spaces Object Storage with FUSE | ||
| 3 | Description: Using DigitalOcean Spaces Object Storage with FUSE | ||
| 4 | Slug: using-digitalocean-spaces-object-storage-with-fuse | ||
| 5 | Listing: true | ||
| 6 | Created: 2018, January 16 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
A couple of months ago [DigitalOcean](https://www.digitalocean.com) introduced a new product called [Spaces](https://blog.digitalocean.com/introducing-spaces-object-storage/), which is object storage very similar to Amazon's S3. This really piqued my interest, because it was something I was missing, and going outside DigitalOcean for such functionality held no interest for me. In keeping with their usual pricing, this too is very cheap, and the pricing page is a no-brainer compared to AWS or GCE. [Prices are clearly and precisely defined and outlined](https://www.digitalocean.com/pricing/). You must love them for that :)
| 11 | |||
| 12 | ### Initial requirements | ||
| 13 | |||
| 14 | * Is it possible to use them as a mounted drive with FUSE? (tl;dr YES) | ||
| 15 | * Will the performance degrade over time and over different sizes of objects? (tl;dr NO&YES) | ||
| 16 | * Can storage be mounted on multiple machines at the same time and be writable? (tl;dr YES) | ||
| 17 | |||
> Let me be clear: the scripts I use here were made just for benchmarking and are not intended for real-life use. That said, I am looking into using this approach with a caching service in front of it, dumping everything as objects to storage → that could be an interesting post in itself. But if you need real-time data without eventual consistency, take these scripts for what they are: not usable in such situations.
| 19 | |||
| 20 | ## Is it possible to use them as a mounted drive with FUSE? | ||
| 21 | |||
Well, actually, they can be used in such a manner. Because Spaces is similar to [AWS S3](https://aws.amazon.com/s3/), many tools are available, and you can find plenty of articles and [Stack Overflow threads](https://stackoverflow.com/search?q=s3+fuse).
| 23 | |||
To make this work you will need a DigitalOcean account; without one you will not be able to test this code. If you have an account, go and [create a new Droplet](https://cloud.digitalocean.com/droplets/new?size=s-1vcpu-1gb&region=ams3&distro=debian&distroImage=debian-9-x64&options=private_networking,install_agent). This link preselects Debian 9 with the smallest VM option.
| 25 | |||
* Please be sure to add your SSH key, because we will log in to this machine remotely.
* If you change the region, remember which one you chose, because we will need this information when we mount the Space on our machine.
| 28 | |||
Instructions on how to use SSH keys and how to set them up are available in the article [How To Use SSH Keys with DigitalOcean Droplets](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-keys-with-digitalocean-droplets).
| 30 | |||
| 31 |  | ||
| 32 | |||
After creating the Droplet, it's time to create a new Space. This is done by clicking the [Create](https://cloud.digitalocean.com/spaces/new) button (top right corner) and selecting Spaces. Choose a pronounceable ```Unique name```, because we will use it in the examples below. You can choose either Private or Public; it doesn't matter in our case, and you can always change it later.
| 34 | |||
Once you have created the new Space, we should [generate an Access key](https://cloud.digitalocean.com/settings/api/tokens). That link takes you to the page where you can generate the key. After you create a new one, save the provided Key and Secret, because the Secret will not be shown again.
| 36 | |||
| 37 |  | ||
| 38 | |||
Now that we have a new Space and an Access key, we can SSH into our machine.
| 40 | |||
| 41 | ```bash | ||
| 42 | # replace IP with the ip of your newly created droplet | ||
| 43 | ssh root@IP | ||
| 44 | |||
| 45 | # this will install utilities for mounting storage objects as FUSE | ||
| 46 | apt install s3fs | ||
| 47 | |||
| 48 | # we now need to provide credentials (access key we created earlier) | ||
| 49 | # replace KEY and SECRET with your own credentials but leave the colon between them | ||
| 50 | # we also need to set proper permissions | ||
| 51 | echo "KEY:SECRET" > .passwd-s3fs | ||
| 52 | chmod 600 .passwd-s3fs | ||
| 53 | |||
| 54 | # now we mount space to our machine | ||
| 55 | # replace UNIQUE-NAME with the name you choose earlier | ||
| 56 | # if you choose different region for your space be careful about -ourl option (ams3) | ||
| 57 | s3fs UNIQUE-NAME /mnt/ -ourl=https://ams3.digitaloceanspaces.com -ouse_cache=/tmp | ||
| 58 | |||
| 59 | # now we try to create a file | ||
| 60 | # once you mount it may take a couple of seconds to retrieve data | ||
| 61 | echo "Hello cruel world" > /mnt/hello.txt | ||
| 62 | ``` | ||
| 63 | |||
After all this, you can return to your browser, go to [DigitalOcean Spaces](https://cloud.digitalocean.com/spaces) and click on the Space you created. If the file hello.txt is present, you have successfully mounted the Space on your machine and written data to it.
| 65 | |||
I chose the same region for my Droplet and my Space, but you don't have to; they can differ. What that actually does to performance, I don't know.
| 67 | |||
| 68 | Additional information on FUSE: | ||
| 69 | |||
| 70 | * [Github project page for s3fs](https://github.com/s3fs-fuse/s3fs-fuse) | ||
| 71 | * [FUSE - Filesystem in Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) | ||
| 72 | |||
| 73 | ## Will the performance degrade over time and over different sizes of objects? | ||
| 74 | |||
For this task I didn't want to just read and write text files or upload images. I actually wanted to figure out whether using something like SQLite is viable in this case.
| 76 | |||
| 77 | ### Measurement experiment 1: File copy | ||
| 78 | |||
| 79 | ```bash | ||
| 80 | # first we create some dummy files at different sizes | ||
| 81 | dd if=/dev/zero of=10KB.dat bs=1024 count=10 #10KB | ||
| 82 | dd if=/dev/zero of=100KB.dat bs=1024 count=100 #100KB | ||
| 83 | dd if=/dev/zero of=1MB.dat bs=1024 count=1024 #1MB | ||
| 84 | dd if=/dev/zero of=10MB.dat bs=1024 count=10240 #10MB | ||
| 85 | |||
| 86 | # now we set time command to only return real | ||
| 87 | TIMEFORMAT=%R | ||
| 88 | |||
| 89 | # now lets test it | ||
| 90 | (time cp 10KB.dat /mnt/) |& tee -a 10KB.results.txt | ||
| 91 | |||
| 92 | # and now we automate | ||
| 93 | # this will perform the same operation 100 times | ||
# this will output results into separate files based on object size
| 95 | n=0; while (( n++ < 100 )); do (time cp 10KB.dat /mnt/10KB.$n.dat) |& tee -a 10KB.results.txt; done | ||
| 96 | n=0; while (( n++ < 100 )); do (time cp 100KB.dat /mnt/100KB.$n.dat) |& tee -a 100KB.results.txt; done | ||
| 97 | n=0; while (( n++ < 100 )); do (time cp 1MB.dat /mnt/1MB.$n.dat) |& tee -a 1MB.results.txt; done | ||
| 98 | n=0; while (( n++ < 100 )); do (time cp 10MB.dat /mnt/10MB.$n.dat) |& tee -a 10MB.results.txt; done | ||
| 99 | ``` | ||
| 100 | |||
Files of size 100MB were not transferred successfully and ended with an error (cp: failed to close '/mnt/100MB.1.dat': Operation not permitted).
| 102 | |||
As I suspected, object size is not really that important. Sadly, I don't have the time to test performance over longer periods, but if some of you do, please send me your data → I would be interested in the results.
| 104 | |||
**Here are the plotted results**

You can download the [raw results here](/assets/do-fuse/copy-benchmarks.tsv). Measurements are in seconds.
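To summarize one of the ```*.results.txt``` files yourself, the standard-library ```statistics``` module is enough. A sketch, with a few sample timings in place of a real results file:

```python
import statistics

# replace with real data, e.g.:
# timings = [float(x) for x in open("10KB.results.txt").read().split()]
timings = [0.52, 0.48, 0.61, 0.50, 0.55]

print("mean:   %.3f s" % statistics.mean(timings))
print("median: %.3f s" % statistics.median(timings))
print("stdev:  %.3f s" % statistics.stdev(timings))
```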
| 108 | |||
| 109 | <script src="//cdn.plot.ly/plotly-latest.min.js"></script> | ||
| 110 | <div id="copy-benchmarks"></div> | ||
| 111 | <script> | ||
| 112 | (function(){ | ||
| 113 | var request = new XMLHttpRequest(); | ||
| 114 | request.open("GET", "/assets/do-fuse/copy-benchmarks.tsv", true); | ||
| 115 | request.onload = function() { | ||
| 116 | if (request.status >= 200 && request.status < 400) { | ||
| 117 | var payload = request.responseText.trim(); | ||
| 118 | var tsv = payload.split("\n"); | ||
| 119 | for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); } | ||
| 120 | var traces = []; | ||
| 121 | var headers = tsv[0]; | ||
| 122 | tsv.shift(); | ||
| 123 | Array.prototype.forEach.call(headers, function(el, idx) { | ||
| 124 | var x = []; | ||
| 125 | var y = []; | ||
| 126 | for (var j=0; j<tsv.length; j++) { | ||
| 127 | x.push(j); | ||
| 128 | y.push(parseFloat(tsv[j][idx].replace(",", "."))); | ||
| 129 | } | ||
| 130 | traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } }); | ||
| 131 | }); | ||
| 132 | var copy = Plotly.newPlot("copy-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 40, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } }, xaxis: { title: "fn(i)", titlefont: { size: 12 } } }); | ||
| 133 | } else { } | ||
| 134 | }; | ||
| 135 | request.onerror = function() { }; | ||
| 136 | request.send(null); | ||
| 137 | })(); | ||
| 138 | </script> | ||
| 139 | |||
As far as these tests show, performance is quite stable and predictable, which is fantastic. But this is a small test spanning only a couple of hours, so you should not trust the results completely.
| 141 | |||
### Measurement experiment 2: SQLite performance
| 143 | |||
I was unable to use a database file directly from the mounted drive, so this is a no-go, as I suspected. Instead I executed the code below on a local disk just to get some benchmarks. I ran DROPTABLE, CREATETABLE, INSERTMANY (1000 records), FETCHALL and COMMIT for 1000 iterations to generate statistics. As you can see, the performance of SQLite is quite amazing. You could then potentially just copy the file to the mounted drive and be done with it.
| 145 | |||
```python
import time
import sqlite3
import sys

if len(sys.argv) < 4:
    print("usage: python sqlite-benchmark.py DB_PATH NUM_RECORDS REPEAT")
    sys.exit(1)

def data_iter(x):
    for i in range(x):
        yield "m" + str(i), "f" + str(i*i)

header_line = "%s\t%s\t%s\t%s\t%s\n" % ("DROPTABLE", "CREATETABLE", "INSERTMANY", "FETCHALL", "COMMIT")
with open("sqlite-benchmarks.tsv", "w") as fp:
    fp.write(header_line)

start_time = time.time()
conn = sqlite3.connect(sys.argv[1])
c = conn.cursor()
end_time = time.time()
result_time = CONNECT = end_time - start_time
print("CONNECT: %g seconds" % (result_time))

start_time = time.time()
c.execute("PRAGMA journal_mode=WAL")
c.execute("PRAGMA temp_store=MEMORY")
c.execute("PRAGMA synchronous=OFF")
end_time = time.time()
result_time = PRAGMA = end_time - start_time
print("PRAGMA: %g seconds" % (result_time))

for i in range(int(sys.argv[3])):
    print("#%i" % (i))

    start_time = time.time()
    c.execute("drop table if exists test")
    end_time = time.time()
    result_time = DROPTABLE = end_time - start_time
    print("DROPTABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("create table if not exists test(a,b)")
    end_time = time.time()
    result_time = CREATETABLE = end_time - start_time
    print("CREATETABLE: %g seconds" % (result_time))

    start_time = time.time()
    c.executemany("INSERT INTO test VALUES (?, ?)", data_iter(int(sys.argv[2])))
    end_time = time.time()
    result_time = INSERTMANY = end_time - start_time
    print("INSERTMANY: %g seconds" % (result_time))

    start_time = time.time()
    c.execute("select count(*) from test")
    res = c.fetchall()
    end_time = time.time()
    result_time = FETCHALL = end_time - start_time
    print("FETCHALL: %g seconds" % (result_time))

    start_time = time.time()
    conn.commit()
    end_time = time.time()
    result_time = COMMIT = end_time - start_time
    print("COMMIT: %g seconds" % (result_time))

    log_line = "%f\t%f\t%f\t%f\t%f\n" % (DROPTABLE, CREATETABLE, INSERTMANY, FETCHALL, COMMIT)
    with open("sqlite-benchmarks.tsv", "a") as fp:
        fp.write(log_line)

start_time = time.time()
conn.close()
end_time = time.time()
result_time = CLOSE = end_time - start_time
print("CLOSE: %g seconds" % (result_time))
```
| 222 | |||
You can download the [raw results here](/assets/do-fuse/sqlite-benchmarks.tsv). And again, these results were obtained on local block storage and do not represent the capabilities of object storage. With my current approach and the state of the test code, that cannot be measured; I would need to make the Python code much more robust and check locking, etc.
| 224 | |||
| 225 | <div id="sqlite-benchmarks"></div> | ||
| 226 | <script> | ||
| 227 | (function(){ | ||
| 228 | var request = new XMLHttpRequest(); | ||
| 229 | request.open("GET", "/assets/do-fuse/sqlite-benchmarks.tsv", true); | ||
| 230 | request.onload = function() { | ||
| 231 | if (request.status >= 200 && request.status < 400) { | ||
| 232 | var payload = request.responseText.trim(); | ||
| 233 | var tsv = payload.split("\n"); | ||
| 234 | for (var i=0; i<tsv.length; i++) { tsv[i] = tsv[i].split("\t"); } | ||
| 235 | var traces = []; | ||
| 236 | var headers = tsv[0]; | ||
| 237 | tsv.shift(); | ||
| 238 | Array.prototype.forEach.call(headers, function(el, idx) { | ||
| 239 | var x = []; | ||
| 240 | var y = []; | ||
| 241 | for (var j=0; j<tsv.length; j++) { | ||
| 242 | x.push(j); | ||
| 243 | y.push(parseFloat(tsv[j][idx].replace(",", "."))); | ||
| 244 | } | ||
| 245 | traces.push({ x: x, y: y, type: "scatter", name: el, line: { width: 1, shape: "spline" } }); | ||
| 246 | }); | ||
| 247 | var sqlite = Plotly.newPlot("sqlite-benchmarks", traces, { legend: {"orientation": "h"}, height: 400, margin: { l: 50, r: 0, b: 20, t: 30, pad: 0 }, yaxis: { title: "execution time in seconds", titlefont: { size: 12 } } }); | ||
| 248 | } else { } | ||
| 249 | }; | ||
| 250 | request.onerror = function() { }; | ||
| 251 | request.send(null); | ||
| 252 | })(); | ||
| 253 | </script> | ||
| 254 | |||
| 255 | ## Can storage be mounted on multiple machines at the same time and be writable? | ||
| 256 | |||
Well, this one didn't take long to test. And the answer is **YES**. I mounted the Space on both machines and measured the same performance on each. But because the file is downloaded before a write and then uploaded on completion, there could potentially be problems if another process tries to access the same file.
| 258 | |||
## Observations and conclusion
| 260 | |||
Using Spaces in this way makes it easier to access and manage files. But beyond that, you would need to write additional code to make this play nice with your applications.
| 262 | |||
Nevertheless, this was extremely simple to set up and use, and it is just another excellent product in the DigitalOcean product line. I found this exercise very valuable and am thinking about implementing some sort of mechanism for SQLite so data can be stored on Spaces and accessed by many VMs. For a project where data doesn't need to be accessible in real time and can tolerate being a couple of minutes old, this would be very interesting. If any of you find this proposal interesting, please write in the comment box below or shoot me an email and I will keep you posted.
diff --git a/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md b/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md new file mode 100644 index 0000000..1bf39ea --- /dev/null +++ b/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md | |||
| @@ -0,0 +1,348 @@ | |||
| 1 | --- | ||
| 2 | Title: Encoding binary data into DNA sequence | ||
Description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
| 4 | Slug: encoding-binary-data-into-dna-sequence | ||
| 5 | Listing: true | ||
| 6 | Created: 2019, January 3 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | ## Initial thoughts | ||
| 11 | |||
Imagine a world where you could go outside, take a leaf from a tree, put it through your personal DNA sequencer and get data like music, videos or computer programs from it. Well, this is all possible now. It has not been done on a large scale because creating DNA strands is quite expensive, but it is possible.
| 13 | |||
Encoding data into a DNA sequence is a relatively simple process once you understand the relationship between binary data and nucleotides. Scientists have been making large leaps in this field in order to provide a viable long-term storage solution for our data that would potentially survive our species in case of a global disaster. We could imprint all the world's knowledge into plants and ensure the survival of our knowledge.
| 15 | |||
A more optimistic use for this technology would be easier storage of the ever-growing data we produce every day. Once machines for sequencing DNA become fast and cheap enough, this could mean the next evolution in storing data, abandoning classical hard and solid-state drives in data warehouses.
| 17 | |||
As things currently stand, this is still not viable, but it is quite an amazing and cool technology.
| 19 | |||
My interests in this field are purely in the encoding processes and experimental testing, mainly because I don't have access to these expensive machines. My initial goal was to create a toolkit that can be used by everybody to encode their data into a proper DNA sequence.
| 21 | |||
| 22 | ## Glossary | ||
| 23 | |||
| 24 | **deoxyribose** | ||
| 25 | : A five-carbon sugar molecule with a hydrogen atom rather than a hydroxyl group in the 2′ position; the sugar component of DNA nucleotides. | ||
| 26 | |||
| 27 | **double helix** | ||
| 28 | : The molecular shape of DNA in which two strands of nucleotides wind around each other in a spiral shape. | ||
| 29 | |||
| 30 | **nitrogenous base** | ||
| 31 | : A nitrogen-containing molecule that acts as a base; often referring to one of the purine or pyrimidine components of nucleic acids. | ||
| 32 | |||
| 33 | **phosphate group** | ||
| 34 | : A molecular group consisting of a central phosphorus atom bound to four oxygen atoms. | ||
| 35 | |||
| 36 | **RGB** | ||
| 37 | : The RGB color model is an additive color model in which red, green and blue light are added together in various ways to reproduce a broad array of colors. | ||
| 38 | |||
| 39 | **GCC** | ||
| 40 | : The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. | ||
| 41 | |||
| 42 | ## Data encoding | ||
| 43 | |||
| 44 | **TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process. | ||
| 45 | |||
| 46 | Encoding is the process of converting data into a format required for a number of information processing needs, including: | ||
| 47 | |||
| 48 | - Program compiling and execution | ||
| 49 | - Data transmission, storage and compression/decompression | ||
| 50 | - Application data processing, such as file conversion | ||
| 51 | |||
| 52 | Encoding can have two meanings: | ||
| 53 | |||
| 54 | - In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher. | ||
| 55 | - In electronics, encoding refers to analog to digital conversion. | ||
| 56 | |||
| 57 | ## Quick history of DNA | ||
| 58 | |||
| 59 | - **1869** - Friedrich Miescher identifies "nuclein". | ||
| 60 | - **1900s** - The Eugenics Movement. | ||
| 61 | - **1900** – Mendel's theories are rediscovered by researchers. | ||
| 62 | - **1944** - Oswald Avery identifies DNA as the 'transforming principle'. | ||
| 63 | - **1952** - Rosalind Franklin photographs crystallized DNA fibres. | ||
| 64 | - **1953** - James Watson and Francis Crick discover the double helix structure of DNA. | ||
| 65 | - **1965** - Marshall Nirenberg is the first person to sequence the bases in each codon. | ||
| 66 | - **1983** - Huntington's disease is the first mapped genetic disease. | ||
| 67 | - **1990** - The Human Genome Project begins. | ||
- **1995** - *Haemophilus influenzae* is the first bacterium genome sequenced.
| 69 | - **1996** - Dolly the sheep is cloned. | ||
| 70 | - **1999** - First human chromosome is decoded. | ||
| 71 | - **2000** – Genetic code of the fruit fly is decoded. | ||
| 72 | - **2002** – Mouse is the first mammal to have its genome decoded. | ||
| 73 | - **2003** – The Human Genome Project is completed. | ||
| 74 | - **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup. | ||
| 75 | |||
| 76 | ## What is DNA? | ||
| 77 | |||
| 78 | Deoxyribonucleic acid, a self-replicating material which is **present in nearly all living organisms** as the main constituent of chromosomes. It is the **carrier of genetic information**. | ||
| 79 | |||
| 80 | > The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff. | ||
| 81 | > | ||
| 82 | > **-- Carl Sagan, Cosmos** | ||
| 83 | |||
| 84 | The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine bases. The sugar and the base together are called a nucleoside. | ||
| 85 | |||
| 86 |  | ||
| 87 | |||
| 88 | *DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)* | ||
| 89 | |||
| 90 | ## Encode binary data into DNA sequence | ||
| 91 | |||
| 92 | As an input file you can use any file you want: | ||
| 93 | - ASCII files, | ||
| 94 | - Compiled programs, | ||
| 95 | - Multimedia files (MP3, MP4, MVK, etc), | ||
| 96 | - Images, | ||
| 97 | - Database files, | ||
| 98 | - etc. | ||
| 99 | |||
Note: If you copy all the bytes from RAM to a file, or pipe data to a file, you can encode that data as well, as long as you provide a file pointer to the encoder.
| 101 | |||
| 102 | ### Basic Encoding | ||
| 103 | |||
As already mentioned, the Basic Encoding is based on a simple mapping. DNA is composed of 4 nucleotides (Adenine, Cytosine, Guanine, Thymine; usually referred to by their first letter), so using this technique we can encode
| 105 | |||
$$ \log_2(4) = \log_2(2^2) = 2 \text{ bits} $$
| 107 | |||
| 108 | using a single nucleotide. In this way, we are able to use the 4 bases that compose the DNA strand to encode each byte of data. | ||
| 109 | |||
| 110 | | Two bits | Nucleotides | | ||
| 111 | | -------- | ---------------- | | ||
| 112 | | 00 | **A** (Adenine) | | ||
| 113 | | 10 | **G** (Guanine) | | ||
| 114 | | 01 | **C** (Cytosine) | | ||
| 115 | | 11 | **T** (Thymine) | | ||
| 116 | |||
With this in mind we can simply encode any data using a two-bit to nucleotide conversion.
| 118 | |||
```python
{ Algorithm 1: Naive byte array to DNA encode }
procedure EncodeToDNASequence(f) string
begin
  enc string
  while not eof(f) do
    c byte := buffer[0]                 { Read 1 byte from buffer }
    bin string := sprintf('%08b', c)    { Convert byte to a binary string }
    for e in [bin[0:2], bin[2:4], bin[4:6], bin[6:8]] do
      if e = '00' then                  { 00 - A (Adenine) }
        enc += 'A'
      else if e = '10' then             { 10 - G (Guanine) }
        enc += 'G'
      else if e = '01' then             { 01 - C (Cytosine) }
        enc += 'C'
      else if e = '11' then             { 11 - T (Thymine) }
        enc += 'T'
  return enc                            { Return DNA sequence }
end
```
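As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the same two-bit mapping, together with the inverse operation for decoding. The helper names are made up for illustration; this is not the actual toolkit implementation.

```python
# Hypothetical sketch of the two-bit-to-nucleotide mapping from the table
# above; not the actual dnae-encode implementation.
ENCODE = {"00": "A", "10": "G", "01": "C", "11": "T"}
DECODE = {v: k for k, v in ENCODE.items()}

def encode_to_dna(data: bytes) -> str:
    """Encode bytes into a DNA sequence, two bits per nucleotide."""
    out = []
    for byte in data:
        bits = format(byte, "08b")            # e.g. 72 -> "01001000"
        out.extend(ENCODE[bits[i:i + 2]] for i in range(0, 8, 2))
    return "".join(out)

def decode_from_dna(sequence: str) -> bytes:
    """Inverse mapping: four nucleotides back into one byte."""
    bits = "".join(DECODE[c] for c in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(encode_to_dna(b"Hi"))                   # CAGACGGC
print(decode_from_dna(encode_to_dna(b"Hi")))  # b'Hi'
```

One byte always maps to exactly four nucleotides, so the sequence is 4× the size of the input.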
| 139 | |||
Another encoding is **Goldman encoding**. Using it helps with nonsense mutations (an amino acid replaced by a stop codon), which are the most problematic during translation because they lead to truncated amino acid sequences, which in turn result in truncated proteins.
| 141 | |||
| 142 | [Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU) | ||
| 143 | |||
| 144 | ### FASTA file format | ||
| 145 | |||
| 146 | In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics. | ||
| 147 | |||
The first line in a FASTA file starts either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon), which was originally taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and not to use ";" comments (which would otherwise be ignored).
| 149 | |||
| 150 | ```text | ||
| 151 | ;LCBO - Prolactin precursor - Bovine | ||
| 152 | ; a sample sequence in FASTA format | ||
| 153 | MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS | ||
| 154 | EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL | ||
| 155 | VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED | ||
| 156 | ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC* | ||
| 157 | |||
| 158 | >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken | ||
| 159 | ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID | ||
| 160 | FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA | ||
| 161 | DIDGDGQVNYEEFVQMMTAK* | ||
| 162 | |||
| 163 | >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] | ||
| 164 | LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV | ||
| 165 | EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG | ||
| 166 | LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL | ||
| 167 | GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX | ||
| 168 | IENY | ||
| 169 | ``` | ||
| 170 | |||
| 171 | FASTA format was extended by [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) format from the [Sanger Centre](https://www.sanger.ac.uk/) in Cambridge. | ||
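Wrapping an encoded sequence into this format is straightforward. Here is a minimal sketch; the `columns` default mirrors `dnae-encode`'s `-c 60` default shown later in the post, and the function name is hypothetical:

```python
def to_fasta(name: str, sequence: str, columns: int = 60) -> str:
    """Render a sequence as FASTA: a '>' header line followed by
    fixed-width rows of nucleotides."""
    rows = [sequence[i:i + columns] for i in range(0, len(sequence), columns)]
    return "\n".join([">" + name] + rows) + "\n"

print(to_fasta("SEQ1", "GACAGCTT" * 20))
```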
| 172 | |||
| 173 | ### PNG encoded DNA sequence | ||
| 174 | |||
| 175 | | Nucleotides | RGB | Color name | | ||
| 176 | | ------------- | ----------- | ---------- | | ||
| 177 | | A -> Adenine | (0,0,255) | Blue | | ||
| 178 | | G -> Guanine | (0,100,0) | Green | | ||
| 179 | | C -> Cytosine | (255,0,0) | Red | | ||
| 180 | | T -> Thymine | (255,255,0) | Yellow | | ||
| 181 | |||
With this in mind we can create a simple algorithm to produce a PNG representation of a DNA sequence.
| 183 | |||
| 184 | ```python | ||
| 185 | { Algorithm 2: Naive DNA to PNG encode from FASTA file } | ||
| 186 | procedure EncodeDNASequenceToPNG(f) | ||
| 187 | begin | ||
| 188 | i image | ||
| 189 | while not eof(f) do | ||
| 190 | c char := buffer[0] { Read 1 char from buffer } | ||
| 191 | case c of | ||
| 192 | 'A': color := RGB(0, 0, 255) { Blue } | ||
| 193 | 'G': color := RGB(0, 100, 0) { Green } | ||
| 194 | 'C': color := RGB(255, 0, 0) { Red } | ||
| 195 | 'T': color := RGB(255, 255, 0) { Yellow } | ||
| 196 | drawRect(i, [x, y], color) | ||
| 197 | save(i) { Save PNG image } | ||
| 198 | end | ||
| 199 | ``` | ||
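A runnable sketch of the same idea, building the pixel grid in pure Python; a real implementation would hand the grid to an image library (for example Pillow) to write the actual PNG. The helper name is hypothetical:

```python
# Colour mapping from the table above; one nucleotide becomes one cell.
COLORS = {
    "A": (0, 0, 255),    # Blue
    "G": (0, 100, 0),    # Green
    "C": (255, 0, 0),    # Red
    "T": (255, 255, 0),  # Yellow
}

def dna_to_pixel_grid(sequence: str, width: int):
    """Lay out nucleotide colours row by row into a width-column grid,
    skipping any non-nucleotide characters such as newlines."""
    pixels = [COLORS[c] for c in sequence if c in COLORS]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

grid = dna_to_pixel_grid("GACATTGC", 4)  # two rows of four cells
```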
| 200 | |||
| 201 | ## Encoding text file in practice | ||
| 202 | |||
In this example we will take a simple text file as our input stream for encoding. The file contains a quote from Niels Bohr and is saved as a txt file.
| 204 | |||
| 205 | > How wonderful that we have met with a paradox. Now we have some hope of making progress. | ||
| 206 | > ― Niels Bohr | ||
| 207 | |||
| 208 | First we encode text file into FASTA file. | ||
| 209 | |||
| 210 | ```bash | ||
| 211 | ./dnae-encode -i quote.txt -o quote.fa | ||
| 212 | 2019/01/10 00:38:29 Gathering input file stats | ||
| 213 | 2019/01/10 00:38:29 Starting encoding ... | ||
| 214 | 106 B / 106 B [==================================] 100.00% 0s | ||
| 215 | 2019/01/10 00:38:29 Saving to FASTA file ... | ||
| 216 | 2019/01/10 00:38:29 Output FASTA file length is 438 B | ||
| 217 | 2019/01/10 00:38:29 Process took 987.263µs | ||
| 218 | 2019/01/10 00:38:29 Done ... | ||
| 219 | ``` | ||
| 220 | |||
The output file `quote.fa` contains the encoded DNA sequence in ASCII format.
| 222 | |||
| 223 | ```text | ||
| 224 | >SEQ1 | ||
| 225 | GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA | ||
| 226 | GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA | ||
| 227 | ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA | ||
| 228 | ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT | ||
| 229 | GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT | ||
| 230 | GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC | ||
| 231 | AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC | ||
| 232 | AACC | ||
| 233 | ``` | ||
| 234 | |||
Then we take the FASTA file from the previous step and encode it into a PNG.
| 236 | |||
| 237 | ```bash | ||
| 238 | ./dnae-png -i quote.fa -o quote.png | ||
| 239 | 2019/01/10 00:40:09 Gathering input file stats ... | ||
| 240 | 2019/01/10 00:40:09 Deconstructing FASTA file ... | ||
| 241 | 2019/01/10 00:40:09 Compositing image file ... | ||
| 242 | 424 / 424 [==================================] 100.00% 0s | ||
| 243 | 2019/01/10 00:40:09 Saving output file ... | ||
| 244 | 2019/01/10 00:40:09 Output image file length is 1.1 kB | ||
| 245 | 2019/01/10 00:40:09 Process took 19.036117ms | ||
| 246 | 2019/01/10 00:40:09 Done ... | ||
| 247 | ``` | ||
| 248 | |||
| 249 | After encoding into PNG format this file looks like this. | ||
| 250 | |||
| 251 |  | ||
| 252 | |||
The larger the input stream, the larger the PNG file.
| 254 | |||
A basic Hello World C program compiled with [GCC](https://www.gnu.org/software/gcc/) would [look like this](/assets/dna-sequence/sample.png).
| 256 | |||
| 257 | ```c | ||
| 258 | // gcc -O3 -o sample sample.c | ||
| 259 | #include <stdio.h> | ||
| 260 | |||
int main(void) {
| 262 | printf("Hello, world!\n"); | ||
| 263 | return 0; | ||
| 264 | } | ||
| 265 | ``` | ||
| 266 | |||
| 267 | ## Toolkit for encoding data | ||
| 268 | |||
| 269 | I have created a toolkit with two main programs: | ||
| 270 | - dnae-encode (encodes file into FASTA file) | ||
| 271 | - dnae-png (encodes FASTA file into PNG) | ||
| 272 | |||
| 273 | Toolkit with full source code is available on [github.com/mitjafelicijan/dna-encoding](https://github.com/mitjafelicijan/dna-encoding). | ||
| 274 | |||
| 275 | ### dnae-encode | ||
| 276 | |||
| 277 | ```bash | ||
| 278 | > ./dnae-encode --help | ||
| 279 | usage: dnae-encode --input=INPUT [<flags>] | ||
| 280 | |||
| 281 | A command-line application that encodes file into DNA sequence. | ||
| 282 | |||
| 283 | Flags: | ||
| 284 | --help Show context-sensitive help (also try --help-long and --help-man). | ||
| 285 | -i, --input=INPUT Input file (ASCII or binary) which will be encoded into DNA sequence. | ||
| 286 | -o, --output="out.fa" Output file which stores DNA sequence in FASTA format. | ||
| 287 | -s, --sequence=SEQ1 The description line (defline) or header/identifier line, gives a name and/or a unique identifier for the sequence. | ||
| 288 | -c, --columns=60 Row characters length (no more than 120 characters). Devices preallocate fixed line sizes in software. | ||
| 289 | --version Show application version. | ||
| 290 | ``` | ||
| 291 | |||
| 292 | ### dnae-png | ||
| 293 | |||
| 294 | ```bash | ||
| 295 | > ./dnae-png --help | ||
| 296 | usage: dnae-png --input=INPUT [<flags>] | ||
| 297 | |||
| 298 | A command-line application that encodes FASTA file into PNG image. | ||
| 299 | |||
| 300 | Flags: | ||
| 301 | --help Show context-sensitive help (also try --help-long and --help-man). | ||
| 302 | -i, --input=INPUT Input FASTA file which will be encoded into PNG image. | ||
| 303 | -o, --output="out.png" Output file in PNG format that represents DNA sequence in graphical way. | ||
| 304 | -s, --size=10 Size of pairings of DNA bases on image in pixels (lower resolution lower file size). | ||
| 305 | --version Show application version. | ||
| 306 | ``` | ||
| 307 | |||
| 308 | ## Benchmarks | ||
| 309 | |||
| 310 | First we generate some binary sample data with dd. | ||
| 311 | |||
| 312 | ```bash | ||
| 313 | dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock | ||
| 314 | ``` | ||
| 315 | |||
Our freshly generated 1KB file looks something like this (it's full of garbage data, as intended).
| 317 | |||
| 318 |  | ||
| 319 | |||
We create the following binary files:
| 321 | - 1KB.bin | ||
| 322 | - 10KB.bin | ||
| 323 | - 100KB.bin | ||
| 324 | - 1MB.bin | ||
| 325 | - 10MB.bin | ||
| 326 | - 100MB.bin | ||
| 327 | |||
| 328 | After this we create FASTA files for all the binary files by encoding them into DNA sequence. | ||
| 329 | |||
| 330 | ```bash | ||
| 331 | ./dnae-encode -i 100MB.bin -o 100MB.fa | ||
| 332 | ``` | ||
| 333 | |||
Then we GZIP all the FASTA files to see how much they can be compressed.
| 335 | |||
| 336 | ```bash | ||
| 337 | gzip -9 < 10MB.fa > 10MB.fa.gz | ||
| 338 | ``` | ||
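Since a DNA-encoded file spells each byte with four characters drawn from a four-letter alphabet, it carries only 2 bits of entropy per 8-bit character, so gzip should recover close to a 4:1 ratio even on incompressible input. A quick standard-library sketch of that expectation (random bytes stand in for the `*.bin` files, using the Basic Encoding bit pairs):

```python
import gzip
import os

# Random bytes stand in for the *.bin files; encoding quadruples the size.
raw = os.urandom(25_000)

# Bit-pair values per the Basic Encoding table: 00->A, 01->C, 10->G, 11->T.
sequence = "".join("ACGT"[(b >> s) & 3] for b in raw for s in (6, 4, 2, 0)).encode()

compressed = gzip.compress(sequence, compresslevel=9)
print("encoded: %d B, gzipped: %d B" % (len(sequence), len(compressed)))
# The gzipped size lands close to the original binary size again.
```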
| 339 | |||
| 340 | [Download ODS file with benchmarks](/assets/dna-sequence/benchmarks.ods). | ||
| 341 | |||
| 342 | ## References | ||
| 343 | |||
| 344 | - https://www.techopedia.com/definition/948/encoding | ||
| 345 | - https://www.dna-worldwide.com/resource/160/history-dna-timeline | ||
| 346 | - https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/ | ||
| 347 | - https://arxiv.org/abs/1801.04774 | ||
| 348 | - https://en.wikipedia.org/wiki/FASTA_format | ||
diff --git a/posts/2019-10-14-simplifying-and-reducing-clutter.md b/posts/2019-10-14-simplifying-and-reducing-clutter.md new file mode 100644 index 0000000..24c55c6 --- /dev/null +++ b/posts/2019-10-14-simplifying-and-reducing-clutter.md | |||
| @@ -0,0 +1,24 @@ | |||
| 1 | --- | ||
| 2 | Title: Simplifying and reducing clutter in my life and work | ||
| 3 | Description: Simplifying and reducing clutter in my life and work | ||
| 4 | Slug: simplifying-and-reducing-clutter | ||
| 5 | Listing: true | ||
| 6 | Created: 2019, October 14 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
I recently moved my main working machine back from Hackintosh to Linux. Well, the experiment was interesting and I did some great work on macOS, but it was time to move back.
| 11 | |||
I actually really missed Linux. The simplicity of `apt-get`, or just the amount of software that exists for Linux, makes it a no-brainer. I spent most of my time on macOS finding workarounds to make things work. Using [Brew](https://brew.sh/) was just a horrible experience and far from the package managers on Linux. At least they managed to get that `sudo` debacle sorted.
| 13 | |||
Not all was bad. macOS in general was a perfectly good environment. Things like Docker and similar tooling worked without any hiccups. My usual tools, like my coding IDE, worked flawlessly, and the whole look and feel is just superb. I had been using a MacBook Air for a couple of years, so I was used to the system, but never as a daily driver.
| 15 | |||
One of the things I did after I installed Linux back on my machine was cleaning up my Dropbox folder. I have everything on Dropbox, even my projects folder. I write code for a living, so my whole life revolves around a couple of megs of code (with assets). It's not like I have huge files on my machine. I don't have movies or music or pictures on my PC; all of that stuff is in the cloud. I use Google Music and have a Netflix account, which is more than enough for me.
| 17 | |||
I also went and deleted some of the repositories on my GitHub account. I have deleted more code than I have deployed. People find this strange, but for me deleting something feels cathartic and also forces me to write better code the next time I face a similar problem. That was a huge relief, if I am being totally honest.
| 19 | |||
The next step was to do something with my webpage. I had been using some scripts I wrote a while ago to generate static pages from markdown source posts. I kept adding stuff on top of them and they became a source of frustration. And this is just a simple blog, yet I was using gulp and npm. Anyway, after a couple of hours of searching and testing static generators I found an interesting one, [https://github.com/piranha/gostatic](https://github.com/piranha/gostatic), and decided to use it. It was the only one that had a simple templating engine, not that I really need one. The others had this convoluted way of trying to solve everything and in the end required a much bigger learning curve than I was ready to take on. So I deleted a couple of old posts, simplified the HTML, trashed most of the CSS and went with the [https://motherfuckingwebsite.com/](https://motherfuckingwebsite.com/) aesthetic. Yeah, the previous site was more visually stimulating, but all I really care about at this point is the content. And the Times New Roman font is kind of awesome.
| 21 | |||
Over the past couple of months I stopped working on most of my projects because the overhead was just too insane. There comes a point when you stretch yourself too thin, you stop progressing, and with that comes dissatisfaction.
| 23 | |||
| 24 | So that's about it. Moving forward minimal style. | ||
diff --git a/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md new file mode 100644 index 0000000..b975828 --- /dev/null +++ b/posts/2019-10-19-using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md | |||
| @@ -0,0 +1,88 @@ | |||
| 1 | --- | ||
Title: Using sentiment analysis for click-bait detection in RSS feeds
| 3 | Description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait | ||
| 4 | Slug: using-sentiment-analysis-for-click-bait-detection-in-rss-feeds | ||
| 5 | Listing: true | ||
| 6 | Created: 2019, October 19 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | ## Initial thoughts | ||
| 11 | |||
One of the things that has interested me for a while now is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions.
| 13 | |||
The goal is to see how article titles and the actual content of articles differ from each other, and whether the titles are click-bait.
| 15 | |||
| 16 | ## Preparing and cleaning data | ||
| 17 | |||
For this example I opted to just use an RSS feed from a news website and decided to go with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and the description (actual content) is trimmed, so it doesn't really reflect the full article contents.
| 19 | |||
To get better content I could use web scraping, treating the RSS feed as a link list and fetching contents directly from the website, but for this simple example this will suffice.
| 21 | |||
There are a couple of requirements we need to install before we continue:
| 23 | |||
| 24 | - `pip3 install feedparser` (parses RSS feed from url) | ||
| 25 | - `pip3 install vaderSentiment` (does sentiment polarity analysis) | ||
| 26 | - `pip3 install matplotlib` (plots chart of results) | ||
| 27 | |||
| 28 | First we need to fetch the RSS data and strip the HTML from each entry's description. | ||
| 29 | |||
| 30 | ```python | ||
| 31 | import re | ||
| 32 | import feedparser | ||
| 33 | |||
| 34 | feed_url = "https://www.theguardian.com/world/rss" | ||
| 35 | feed = feedparser.parse(feed_url) | ||
| 36 | |||
| 37 | # sanitize html | ||
| 38 | for item in feed.entries: | ||
| 39 | item.description = re.sub('<[^<]+?>', '', item.description) | ||
| 40 | ``` | ||
| 41 | |||
| 42 | ## Perform sentiment analysis | ||
| 43 | |||
| 44 | Now that we have cleaned-up data in our `feed.entries` object, we can start performing sentiment analysis. | ||
| 45 | |||
| 46 | There are many sentiment analysis libraries available, ranging from rule-based analysis up to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). It's a really nice library and quite easy to use. | ||
| 47 | |||
| 48 | ```python | ||
| 49 | from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer | ||
| 50 | analyser = SentimentIntensityAnalyzer() | ||
| 51 | |||
| 52 | sentiment_results = [] | ||
| 53 | for item in feed.entries: | ||
| 54 | sentiment_title = analyser.polarity_scores(item.title) | ||
| 55 | sentiment_description = analyser.polarity_scores(item.description) | ||
| 56 | sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']]) | ||
| 57 | ``` | ||
| 58 | |||
| 59 | Now that we have this data in a shape that is compatible with matplotlib, we can plot the results to see the difference between the title and description sentiment of each article. | ||
| 60 | |||
| 61 | ```python | ||
| 62 | import matplotlib.pyplot as plt | ||
| 63 | |||
| 64 | plt.rcParams['figure.figsize'] = (15, 3) | ||
| 65 | plt.plot(sentiment_results, drawstyle='steps') | ||
| 66 | plt.title('Sentiment analysis relationship between title and description (Guardian World News)') | ||
| 67 | plt.legend(['title', 'description']) | ||
| 68 | plt.show() | ||
| 69 | ``` | ||
| 70 | |||
| 71 | ## Results and assets | ||
| 72 | |||
| 73 | 1. Because of the small sample size, no further conclusions can be drawn. | ||
| 74 | 2. A rule-based approach may not be the best way of doing this. By using deep learning we would be able to get better insights. | ||
| 75 | 3. **The next step would be to** periodically fetch RSS items, store them over a longer period of time, then perform the analysis again with machine learning or deep learning on top of it. | ||
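The storage step sketched in point 3 above can be done with the standard library alone. This is a minimal sketch under assumptions of my own (the `items` table layout and dict-shaped entries are illustrative, not part of the original script; real `feedparser` entries also support this dict-style access):

```python
import sqlite3

def store_entries(db_path, entries):
    """Persist RSS items so sentiment can be analysed over a longer period."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS items (
        link TEXT PRIMARY KEY,
        title TEXT,
        description TEXT,
        fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    for item in entries:
        # ignore duplicates so the script can run periodically (e.g. from cron)
        con.execute(
            "INSERT OR IGNORE INTO items (link, title, description) VALUES (?, ?, ?)",
            (item["link"], item["title"], item["description"]))
    con.commit()
    return con

con = store_entries(":memory:", [
    {"link": "https://example.com/a", "title": "Shocking news", "description": "Mild news."},
])
print(con.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # -> 1
```

Running this on a schedule against the real feed would slowly build the larger sample the analysis needs.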
| 76 | |||
| 77 |  | ||
| 78 | |||
| 79 | The figure above displays the difference between title and description sentiment for each RSS feed item; 1 means positive and -1 means negative sentiment. | ||
| 80 | |||
| 81 | [» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb) | ||
| 82 | |||
| 83 | ## Going further | ||
| 84 | |||
| 85 | - [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood) | ||
| 86 | - [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment) | ||
| 87 | - [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis) | ||
| 88 | - [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis) | ||
diff --git a/posts/2020-03-22-simple-sse-based-pubsub-server.md b/posts/2020-03-22-simple-sse-based-pubsub-server.md new file mode 100644 index 0000000..56a7dfa --- /dev/null +++ b/posts/2020-03-22-simple-sse-based-pubsub-server.md | |||
| @@ -0,0 +1,398 @@ | |||
| 1 | --- | ||
| 2 | Title: Simple Server-Sent Events based PubSub Server | ||
| 3 | Description: Simple Server-Sent Events based PubSub Server | ||
| 4 | Slug: simple-server-sent-events-based-pubsub-server | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, March 22 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | ## Before we continue ... | ||
| 11 | |||
| 12 | The publish/subscribe model is nothing new and there are many amazing solutions out there, so writing a new one would be a waste of time if the existing solutions didn't have quite complex install procedures and weren't so hard to maintain. To be fair, comparing this simple server with something like [Kafka](https://kafka.apache.org/) or [RabbitMQ](https://www.rabbitmq.com/) is laughable at best. Those solutions are enterprise grade and have many mechanisms to ensure messages aren't lost, and much more. Regardless of these drawbacks, this method has been tested on a large website and has worked without any problems so far. Now that we have that cleared up, let's continue. | ||
| 13 | |||
| 14 | ***Wiki definition:** Publish/subscribe messaging, or pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all the subscribers to the topic.* | ||
| 15 | |||
| 16 | ## General goals | ||
| 17 | |||
| 18 | - provide a simple server that relays messages to all the connected clients, | ||
| 19 | - messages can be posted on specific topics, | ||
| 20 | - messages get sent via [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) to all the subscribers. | ||
| 21 | |||
| 22 | ## How exactly does the pub/sub model work? | ||
| 23 | |||
| 24 | The easiest way to explain this is with the diagram below. The basic function is simple: we have subscribers that receive messages, and publishers that create and post messages. A similar, well-known pattern works on the premise of consumers and producers, which take on equivalent roles. | ||
| 25 | |||
| 26 |  | ||
| 27 | |||
| 28 | **These are some naive characteristics we want to achieve:** | ||
| 29 | |||
| 30 | - the producer publishes messages to a topic, | ||
| 31 | - the consumer receives messages from its subscribed topics, | ||
| 32 | - the server is also known as a broker, | ||
| 33 | - the broker does not store messages or track delivery success, | ||
| 34 | - the broker uses the [FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) method for delivering messages, | ||
| 35 | - for a consumer to receive messages from a topic, the producer's and consumer's topics must match, | ||
| 36 | - a consumer can subscribe to multiple topics, | ||
| 37 | - a producer can publish to multiple topics, | ||
| 38 | - each message has a messageId. | ||
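The characteristics above can be condensed into a language-agnostic sketch (Python here purely for brevity; the actual server in this post is Node.js). This is not the real broker, just the topic-matching and "store nothing, track nothing" behaviour in a few lines:

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory broker: relays messages, stores nothing, tracks no delivery."""
    def __init__(self):
        self.topics = defaultdict(list)  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.topics[topic].append(callback)

    def publish(self, topic, message):
        # only consumers whose subscribed topic matches receive the message
        for callback in self.topics[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("sample-topic", received.append)
broker.publish("sample-topic", {"messageId": 1, "name": "John"})
broker.publish("other-topic", {"messageId": 2})  # no subscribers: silently dropped
print(received)  # only the matching topic's message arrives
```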
| 39 | |||
| 40 | **Known drawbacks:** | ||
| 41 | |||
| 42 | - messages are not stored in a persistent queue, and there is no [dead letter queue](https://en.wikipedia.org/wiki/Dead_letter_queue) for unreceived messages, so old messages can be lost on server restart, | ||
| 43 | - [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) opens a long-running connection between the client and the server, so if your setup is load balanced, make sure the load balancer can keep connections open for a long time, | ||
| 44 | - no system moderation due to the dynamic nature of creating queues. | ||
| 45 | |||
| 46 | ## Server-Sent Events | ||
| 47 | |||
| 48 | Read more about it on [official specification page](https://html.spec.whatwg.org/multipage/server-sent-events.html). | ||
| 49 | |||
| 50 | ### Current browser support | ||
| 51 | |||
| 52 |  | ||
| 53 | |||
| 54 | Check [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource) for latest information about browser support. | ||
| 55 | |||
| 56 | ### Known issues | ||
| 57 | |||
| 58 | - Firefox 52 and below do not support EventSource in web/shared workers | ||
| 59 | - In Firefox prior to version 36 server-sent events do not reconnect automatically in case of a connection interrupt (bug) | ||
| 60 | - Reportedly, CORS in EventSource is currently supported in Firefox 10+, Opera 12+, Chrome 26+, Safari 7.0+. | ||
| 61 | - Antivirus software may block the event streaming data chunks. | ||
| 62 | |||
| 63 | Source: [https://caniuse.com/#feat=eventsource](https://caniuse.com/#feat=eventsource) | ||
| 64 | |||
| 65 | ### Message format | ||
| 66 | |||
| 67 | The simplest message that can be sent contains only the data attribute: | ||
| 68 | |||
| 69 | ```bash | ||
| 70 | data: this is a simple message | ||
| 71 | <blank line> | ||
| 72 | ``` | ||
| 73 | |||
| 74 | You can send message IDs to be used if the connection is dropped: | ||
| 75 | |||
| 76 | ```bash | ||
| 77 | id: 33 | ||
| 78 | data: this is line one | ||
| 79 | data: this is line two | ||
| 80 | <blank line> | ||
| 81 | ``` | ||
| 82 | |||
| 83 | And you can specify your own event types (the above messages will all trigger the message event): | ||
| 84 | |||
| 85 | ```bash | ||
| 86 | id: 36 | ||
| 87 | event: price | ||
| 88 | data: 103.34 | ||
| 89 | <blank line> | ||
| 90 | ``` | ||
| 91 | |||
| 92 | ### Server requirements | ||
| 93 | |||
| 94 | The important thing is which headers the server sends; they are what trigger the browser to treat the response as an EventStream. | ||
| 95 | |||
| 96 | Headers responsible for this are: | ||
| 97 | |||
| 98 | ```bash | ||
| 99 | Content-Type: text/event-stream | ||
| 100 | Cache-Control: no-cache | ||
| 101 | Connection: keep-alive | ||
| 102 | ``` | ||
| 103 | |||
| 104 | ### Debugging with Google Chrome | ||
| 105 | |||
| 106 | Google Chrome provides a built-in debugging and exploration tool for [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events), which is quite nice and available from Developer Tools under the Network tab. | ||
| 107 | |||
| 108 | > You can only debug client-side events that get received, not the server-side ones. For debugging server events add `console.log` calls to the `server.js` code and print out the events. | ||
| 109 | |||
| 110 |  | ||
| 111 | |||
| 112 | ## Server implementation | ||
| 113 | |||
| 114 | For the sake of this example we will use [Node.js](https://nodejs.org/en/) with [Express](https://expressjs.com) as our router, since this is the easiest way to get started, and we will use the already-written SSE library for Node, [sse-pubsub](https://www.npmjs.com/package/sse-pubsub), so we don't reinvent the wheel. | ||
| 115 | |||
| 116 | ```bash | ||
| 117 | npm init --yes | ||
| 118 | |||
| 119 | npm install express | ||
| 120 | npm install body-parser | ||
| 121 | npm install sse-pubsub | ||
| 122 | ``` | ||
| 123 | |||
| 124 | Basic implementation of a server (`server.js`): | ||
| 125 | |||
| 126 | ```js | ||
| 127 | const express = require('express'); | ||
| 128 | const bodyParser = require('body-parser'); | ||
| 129 | const SSETopic = require('sse-pubsub'); | ||
| 130 | |||
| 131 | const app = express(); | ||
| 132 | const port = process.env.PORT || 4000; | ||
| 133 | |||
| 134 | // topics container | ||
| 135 | const sseTopics = {}; | ||
| 136 | |||
| 137 | app.use(bodyParser.json()); | ||
| 138 | |||
| 139 | // open for all cors | ||
| 140 | app.all('*', (req, res, next) => { | ||
| 141 | res.header('Access-Control-Allow-Origin', '*'); | ||
| 142 | res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type'); | ||
| 143 | next(); | ||
| 144 | }); | ||
| 145 | |||
| 146 | // preflight request error fix | ||
| 147 | app.options('*', async (req, res) => { | ||
| 148 | res.header('Access-Control-Allow-Origin', '*'); | ||
| 149 | res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type'); | ||
| 150 | res.send('OK'); | ||
| 151 | }); | ||
| 152 | |||
| 153 | // serve the event streams | ||
| 154 | app.get('/stream/:topic', async (req, res, next) => { | ||
| 155 | const topic = req.params.topic; | ||
| 156 | |||
| 157 | if (!(topic in sseTopics)) { | ||
| 158 | sseTopics[topic] = new SSETopic({ | ||
| 159 | pingInterval: 0, | ||
| 160 | maxStreamDuration: 15000, | ||
| 161 | }); | ||
| 162 | } | ||
| 163 | |||
| 164 | // subscribing client to topic | ||
| 165 | sseTopics[topic].subscribe(req, res); | ||
| 166 | }); | ||
| 167 | |||
| 168 | // accepts new messages into topic | ||
| 169 | app.post('/publish', async (req, res) => { | ||
| 170 | let body = req.body; | ||
| 171 | let status = 200; | ||
| 172 | |||
| 173 | console.log('Incoming message:', req.body); | ||
| 174 | |||
| 175 | if ( | ||
| 176 | body.hasOwnProperty('topic') && | ||
| 177 | body.hasOwnProperty('event') && | ||
| 178 | body.hasOwnProperty('message') | ||
| 179 | ) { | ||
| 180 | const topic = req.body.topic; | ||
| 181 | const event = req.body.event; | ||
| 182 | const message = req.body.message; | ||
| 183 | |||
| 184 | if (topic in sseTopics) { | ||
| 185 | // sends message to all the subscribers | ||
| 186 | sseTopics[topic].publish(message, event); | ||
| 187 | } | ||
| 188 | } else { | ||
| 189 | status = 400; | ||
| 190 | } | ||
| 191 | |||
| 192 | res.status(status).send({ | ||
| 193 | status, | ||
| 194 | }); | ||
| 195 | }); | ||
| 196 | |||
| 197 | // returns JSON object of all opened topics | ||
| 198 | app.get('/status', async (req, res) => { | ||
| 199 | res.send(sseTopics); | ||
| 200 | }); | ||
| 201 | |||
| 202 | // health-check endpoint | ||
| 203 | app.get('/', async (req, res) => { | ||
| 204 | res.send('OK'); | ||
| 205 | }); | ||
| 206 | |||
| 207 | // return a 404 if no routes match | ||
| 208 | app.use((req, res, next) => { | ||
| 209 | res.set('Cache-Control', 'private, no-store'); | ||
| 210 | res.status(404).end('Not found'); | ||
| 211 | }); | ||
| 212 | |||
| 213 | // starts the server | ||
| 214 | app.listen(port, () => { | ||
| 215 | console.log(`PubSub server running on http://localhost:${port}`); | ||
| 216 | }); | ||
| 217 | ``` | ||
| 218 | |||
| 219 | ### Our custom message format | ||
| 220 | |||
| 221 | Each message posted to the server must be in a specific format that our server accepts. A structure like this allows us to have multiple separate event types on each topic. | ||
| 222 | |||
| 223 | With this we can separate streams and only receive events that belong to the topic. | ||
| 224 | |||
| 225 | For example, on the index page we might want to receive messages about new upvotes or new subscribers, but not follow events for other pages. This reduces clutter and overall network traffic, and the structure is much nicer and more maintainable. | ||
| 226 | |||
| 227 | ```json | ||
| 228 | { | ||
| 229 | "topic": "sample-topic", | ||
| 230 | "event": "sample-event", | ||
| 231 | "message": { "name": "John" } | ||
| 232 | } | ||
| 233 | ``` | ||
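The `/publish` handler above accepts a payload only when all three keys are present. The same validation, expressed as a standalone sketch (Python for illustration; the server performs this check with `hasOwnProperty`):

```python
def valid_message(payload):
    """A message is accepted only if topic, event and message are all present."""
    return all(key in payload for key in ("topic", "event", "message"))

print(valid_message({"topic": "sample-topic",
                     "event": "sample-event",
                     "message": {"name": "John"}}))  # -> True
print(valid_message({"topic": "sample-topic"}))      # -> False (server answers 400)
```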
| 234 | |||
| 235 | ## Publisher and subscriber clients | ||
| 236 | |||
| 237 | ### Publisher and subscriber in action | ||
| 238 | |||
| 239 | <video src="/assets/simple-pubsub-server/clients.mp4" controls></video> | ||
| 240 | |||
| 241 | You can download [the code](../assets/simple-pubsub-server/sse-pubsub-server.zip) and follow along. | ||
| 242 | |||
| 243 | ### Publisher | ||
| 244 | |||
| 245 | As discussed above, the publisher is the one that sends messages to the broker/server. The message inside the payload can be whatever you want (string, object, array). I would, however, personally avoid sending large chunks of data like blobs and such. | ||
| 246 | |||
| 247 | ```html | ||
| 248 | <!DOCTYPE html> | ||
| 249 | <html lang="en"> | ||
| 250 | |||
| 251 | <head> | ||
| 252 | <meta charset="UTF-8"> | ||
| 253 | <meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
| 254 | <title>Publisher</title> | ||
| 255 | </head> | ||
| 256 | |||
| 257 | <body> | ||
| 258 | |||
| 259 | <h1>Publisher</h1> | ||
| 260 | |||
| 261 | <fieldset> | ||
| 262 | <p> | ||
| 263 | <label>Server:</label> | ||
| 264 | <input type="text" id="server" value="http://localhost:4000"> | ||
| 265 | </p> | ||
| 266 | <p> | ||
| 267 | <label>Topic:</label> | ||
| 268 | <input type="text" id="topic" value="sample-topic"> | ||
| 269 | </p> | ||
| 270 | <p> | ||
| 271 | <label>Event:</label> | ||
| 272 | <input type="text" id="event" value="sample-event"> | ||
| 273 | </p> | ||
| 274 | <p> | ||
| 275 | <label>Message:</label> | ||
| 276 | <input type="text" id="message" value='{"name": "John"}'> | ||
| 277 | </p> | ||
| 278 | <p> | ||
| 279 | <button type="button" id="button">Publish message to topic</button> | ||
| 280 | </p> | ||
| 281 | </fieldset> | ||
| 282 | |||
| 283 | <script> | ||
| 284 | |||
| 285 | const button = document.querySelector('#button'); | ||
| 286 | const server = document.querySelector('#server'); | ||
| 287 | const topic = document.querySelector('#topic'); | ||
| 288 | const event = document.querySelector('#event'); | ||
| 289 | const message = document.querySelector('#message'); | ||
| 290 | |||
| 291 | button.addEventListener('click', async (evt) => { | ||
| 292 | const req = await fetch(`${server.value}/publish`, { | ||
| 293 | method: 'post', | ||
| 294 | headers: { | ||
| 295 | 'Accept': 'application/json', | ||
| 296 | 'Content-Type': 'application/json', | ||
| 297 | }, | ||
| 298 | body: JSON.stringify({ | ||
| 299 | topic: topic.value, | ||
| 300 | event: event.value, | ||
| 301 | message: JSON.parse(message.value), | ||
| 302 | }), | ||
| 303 | }); | ||
| 304 | |||
| 305 | const res = await req.json(); | ||
| 306 | console.log(res); | ||
| 307 | }); | ||
| 308 | |||
| 309 | </script> | ||
| 310 | |||
| 311 | </body> | ||
| 312 | |||
| 313 | </html> | ||
| 314 | |||
| 315 | ``` | ||
| 316 | |||
| 317 | ### Subscriber | ||
| 318 | |||
| 319 | The subscriber is responsible for receiving new messages that come from the server via the publisher. The code below is very rudimentary, but it works and follows the implementation guidelines for EventSource. | ||
| 320 | |||
| 321 | You can either use the Developer Tools console to see incoming messages, or refer to the Debugging with Google Chrome section above to see all EventStream messages. | ||
| 322 | |||
| 323 | > Don't be alarmed if the subscriber gets disconnected from the server every so often. The code we have here resets the connection every 15 s, but it automatically reconnects and fetches all messages since the last received message id. This setting can be adjusted in the `server.js` file; search for the `maxStreamDuration` option. | ||
| 324 | |||
| 325 | ```html | ||
| 326 | <!DOCTYPE html> | ||
| 327 | <html lang="en"> | ||
| 328 | |||
| 329 | <head> | ||
| 330 | <meta charset="UTF-8"> | ||
| 331 | <meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
| 332 | <title>Subscriber</title> | ||
| 333 | <link rel="stylesheet" href="style.css"> | ||
| 334 | </head> | ||
| 335 | |||
| 336 | <body> | ||
| 337 | |||
| 338 | <h1>Subscriber</h1> | ||
| 339 | |||
| 340 | <fieldset> | ||
| 341 | <p> | ||
| 342 | <label>Server:</label> | ||
| 343 | <input type="text" id="server" value="http://localhost:4000"> | ||
| 344 | </p> | ||
| 345 | <p> | ||
| 346 | <label>Topic:</label> | ||
| 347 | <input type="text" id="topic" value="sample-topic"> | ||
| 348 | </p> | ||
| 349 | <p> | ||
| 350 | <label>Event:</label> | ||
| 351 | <input type="text" id="event" value="sample-event"> | ||
| 352 | </p> | ||
| 353 | <p> | ||
| 354 | <button type="button" id="button">Subscribe to topic</button> | ||
| 355 | </p> | ||
| 356 | </fieldset> | ||
| 357 | |||
| 358 | <script> | ||
| 359 | |||
| 360 | const button = document.querySelector('#button'); | ||
| 361 | const server = document.querySelector('#server'); | ||
| 362 | const topic = document.querySelector('#topic'); | ||
| 363 | const event = document.querySelector('#event'); | ||
| 364 | |||
| 365 | button.addEventListener('click', async (evt) => { | ||
| 366 | |||
| 367 | let es = new EventSource(`${server.value}/stream/${topic.value}`); | ||
| 368 | |||
| 369 | es.addEventListener(event.value, function (evt) { | ||
| 370 | console.log(`incoming message`, JSON.parse(evt.data)); | ||
| 371 | }); | ||
| 372 | |||
| 373 | es.addEventListener('open', function (evt) { | ||
| 374 | console.log('connected', evt); | ||
| 375 | }); | ||
| 376 | |||
| 377 | es.addEventListener('error', function (evt) { | ||
| 378 | console.log('error', evt); | ||
| 379 | }); | ||
| 380 | |||
| 381 | }); | ||
| 382 | |||
| 383 | </script> | ||
| 384 | |||
| 385 | </body> | ||
| 386 | |||
| 387 | </html> | ||
| 388 | |||
| 389 | ``` | ||
| 390 | |||
| 391 | ## Reading further | ||
| 392 | |||
| 393 | - [Using server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) | ||
| 394 | - [Using SSE Instead Of WebSockets For Unidirectional Data Flow Over HTTP/2](https://www.smashingmagazine.com/2018/02/sse-websockets-data-flow-http2/) | ||
| 395 | - [What is Server-Sent Events?](https://apifriends.com/api-streaming/server-sent-events/) | ||
| 396 | - [An HTTP/2 extension for bidirectional messaging communication](https://tools.ietf.org/id/draft-xie-bidirectional-messaging-01.html) | ||
| 397 | - [Introduction to HTTP/2](https://developers.google.com/web/fundamentals/performance/http2) | ||
| 398 | - [The WebSocket API (WebSockets)](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) | ||
diff --git a/posts/2020-03-27-create-placeholder-images-with-sharp.md b/posts/2020-03-27-create-placeholder-images-with-sharp.md new file mode 100644 index 0000000..ef035c9 --- /dev/null +++ b/posts/2020-03-27-create-placeholder-images-with-sharp.md | |||
| @@ -0,0 +1,85 @@ | |||
| 1 | --- | ||
| 2 | Title: Create placeholder images with sharp Node.js image processing library | ||
| 3 | Description: Create placeholder images with sharp Node.js image processing library | ||
| 4 | Slug: create-placeholder-images-with-sharp | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, March 27 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | I had been searching for a solution to pre-generate some placeholder images for an image server I needed to develop that resizes images on S3. I thought this would be a 15-minute job and quickly found out how very mistaken I was. | ||
| 11 | |||
| 12 | Even though Node.js is not really the best fit for this kind of thing (surely something written in C, Rust, or even Golang would be the correct way to do it, but we didn't need the speed in our case), I found an excellent library: [sharp - High performance Node.js image processing](https://github.com/lovell/sharp). | ||
| 13 | |||
| 14 | Getting things running was a breeze. | ||
| 15 | |||
| 16 | ## Fetch image from S3 and save resized | ||
| 17 | |||
| 18 | ```js | ||
| 19 | const sharp = require('sharp'); | ||
| 20 | const aws = require('aws-sdk'); | ||
| 21 | |||
| 22 | const x = 100, y = 100; // target dimensions; note the awaits below must run inside an async function | ||
| 23 | const s3 = new aws.S3({}); | ||
| 24 | |||
| 25 | aws.config.update({ | ||
| 26 | secretAccessKey: 'secretAccessKey', | ||
| 27 | accessKeyId: 'accessKeyId', | ||
| 28 | region: 'region' | ||
| 29 | }); | ||
| 30 | |||
| 31 | const originalImage = await s3.getObject({ | ||
| 32 | Bucket: 'some-bucket-name', | ||
| 33 | Key: 'image.jpg', | ||
| 34 | }).promise(); | ||
| 35 | |||
| 36 | const resizedImage = await sharp(originalImage.Body) | ||
| 37 | .resize(x, y) | ||
| 38 | .jpeg({ progressive: true }) | ||
| 39 | .toBuffer(); | ||
| 40 | |||
| 41 | s3.putObject({ | ||
| 42 | Bucket: 'some-bucket-name', | ||
| 43 | Key: `optimized/${x}x${y}/image.jpg`, | ||
| 44 | Body: resizedImage, | ||
| 45 | ContentType: 'image/jpeg', | ||
| 46 | ACL: 'public-read' | ||
| 47 | }).promise(); | ||
| 48 | ``` | ||
| 49 | |||
| 50 | All this code was wrapped inside a web service with some additional security checks and defensive coding to detect if a key is missing on S3. | ||
| 51 | |||
| 52 | At that point I needed to return placeholder images as a response in case the key was missing, the x,y dimensions weren't allowed by the server, etc. I could have created PNGs in GIMP and just served them, but I wanted to respect the aspect ratio and didn't want to return mangled images. | ||
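For reference, "respecting the aspect ratio" boils down to scaling both dimensions by the smaller of the two ratios. A quick sketch of the arithmetic (Python here, but it translates directly to the Node.js service):

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Largest size that fits inside max_w x max_h while keeping the source aspect ratio."""
    # scale by the tighter constraint so neither dimension overflows
    scale = min(max_w / src_w, max_h / src_h)
    return round(src_w * scale), round(src_h * scale)

print(fit_within(1920, 1080, 100, 100))  # -> (100, 56)
```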
| 53 | |||
| 54 | > The main problem was finding a clean solution I could copy, paste, and change a bit. The API is changing constantly and there weren't clear examples, or I was unable to find them. | ||
| 55 | |||
| 56 | ## Generating placeholder images using SVG | ||
| 57 | |||
| 58 | What I ended up doing was using SVG to generate the text, creating an image with sharp, and using composition to combine both layers. The value returned by this function is a buffer you can either upload to S3 or save to a local file. | ||
| 59 | |||
| 60 | ```js | ||
| 61 | const generatePlaceholderImageWithText = async (width, height, message) => { | ||
| 62 | const overlay = `<svg width="${width - 20}" height="${height - 20}"> | ||
| 63 | <text x="50%" y="50%" font-family="sans-serif" font-size="16" text-anchor="middle">${message}</text> | ||
| 64 | </svg>`; | ||
| 65 | |||
| 66 | return await sharp({ | ||
| 67 | create: { | ||
| 68 | width: width, | ||
| 69 | height: height, | ||
| 70 | channels: 4, | ||
| 71 | background: { r: 230, g: 230, b: 230, alpha: 1 } | ||
| 72 | } | ||
| 73 | }) | ||
| 74 | .composite([{ | ||
| 75 | input: Buffer.from(overlay), | ||
| 76 | gravity: 'center', | ||
| 77 | }]) | ||
| 78 | .jpeg() | ||
| 79 | .toBuffer(); | ||
| 80 | } | ||
| 81 | ``` | ||
| 82 | |||
| 83 | That is about it. You can change the color of the image by changing `background`, and if you want different text styling you can adapt the SVG to your needs. | ||
| 84 | |||
| 85 | > Also be careful about the length of the text. This function positions the text at the center and adds `20px` of padding on all sides. If the text is wider than the image it will get cut off. | ||
diff --git a/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md b/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md new file mode 100644 index 0000000..7d70a7d --- /dev/null +++ b/posts/2020-03-29-the-strange-case-of-elasticsearch-allocation-failure.md | |||
| @@ -0,0 +1,78 @@ | |||
| 1 | --- | ||
| 2 | Title: The strange case of Elasticsearch allocation failure | ||
| 3 | Description: Elasticsearch allocation failure on some indices while reporting domain processing | ||
| 4 | Slug: the-strange-case-of-elasticsearch-allocation-failure | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, March 29 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | I've been using Elasticsearch in production for 5 years now and never had a single problem with it. Hell, I never even knew there could be a problem. It just worked. All this time. The first node that I deployed is still being used in production; never updated, upgraded, or touched in any way. | ||
| 11 | |||
| 12 | All this bliss came to an abrupt end this Friday when I got a notification that the Elasticsearch cluster went warm. Well, warm is not that bad, right? Wrong! Quickly after that I got another email which sent chills down my spine. The cluster was now red. RED! Now shit had really hit the fan! | ||
| 13 | |||
| 14 | I tried googling what the problem could be and, after executing the allocation query, noticed that some shards were unassigned and 5 allocation attempts had already been made (which is, BTW, to my luck the maximum), and that meant I was basically fucked. The advice was also that one should wait for the cluster to re-balance itself. So, I waited. One hour, two hours, several hours. Nothing, still RED. | ||
| 15 | |||
| 16 | The strangest thing about it all was that queries were still being fulfilled. Data was coming out. On the outside it looked like nothing was wrong, but anybody who looked at the cluster would know immediately that something was very, very wrong and we were living on borrowed time. | ||
| 17 | |||
| 18 | > **Please, DO NOT do what I did.** Seriously! Please ask someone on the official forums or, if you know an expert, consult them. There could be a million reasons, and these solutions fit my problem. Maybe in your case it would be disastrous. I had all the data backed up, so even if I failed spectacularly I would be able to restore it. It would have been a huge pain and I would have lost a couple of days, but I had a plan B. | ||
| 19 | |||
| 20 | Executing the allocation query told me what the problem was, but offered no clear solution yet. | ||
| 21 | |||
| 22 | ```yaml | ||
| 23 | GET /_cat/allocation?format=json | ||
| 24 | ``` | ||
| 25 | |||
| 26 | I got a message that said `ALLOCATION_FAILED` with the additional info `failed to create shard, failure ioexception[failed to obtain in-memory shard lock]`. Well, splendid! I must also say that our cluster is more than capable of handling the traffic. JVM memory pressure was never an issue either. So what really happened, then? | ||
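To see which shards are actually stuck, the output of `GET /_cat/shards?format=json` can be filtered client-side. A hedged sketch: the field names follow the `_cat/shards` API, but the sample payload below is made up for illustration:

```python
def unassigned_shards(shards):
    """Filter the _cat/shards JSON for shards that failed to allocate."""
    return [(s["index"], s["shard"]) for s in shards if s["state"] == "UNASSIGNED"]

# sample payload in the shape returned by GET /_cat/shards?format=json
shards = [
    {"index": "myindex", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "myindex", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
]
print(unassigned_shards(shards))  # -> [('myindex', '1')]
```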
| 27 | |||
| 28 | I also tried re-routing the failed shards, with no success due to AWS restrictions on the managed Elasticsearch cluster (they lock some of the functions). | ||
| 29 | |||
| 30 | ```yaml | ||
| 31 | POST /_cluster/reroute?retry_failed=true | ||
| 32 | ``` | ||
| 33 | |||
| 34 | I got a message that significantly reduced my options. | ||
| 35 | |||
| 36 | ```json | ||
| 37 | { | ||
| 38 | "Message": "Your request: '/_cluster/reroute' is not allowed." | ||
| 39 | } | ||
| 40 | ``` | ||
| 41 | |||
| 42 | After that I went on the hunt again. I won't bother you with all the details, because hours and days went by until I was finally able to re-index the problematic index and hope for the best. Until that moment even re-indexing was giving me errors. | ||
| 43 | |||
| 44 | ```yaml | ||
| 45 | POST _reindex | ||
| 46 | { | ||
| 47 | "source": { | ||
| 48 | "index": "myindex" | ||
| 49 | }, | ||
| 50 | "dest": { | ||
| 51 | "index": "myindex-new" | ||
| 52 | } | ||
| 53 | } | ||
| 54 | ``` | ||
| 55 | |||
| 56 | I needed to do this multiple times to get all the documents re-indexed. Then I dropped the original index with the following command. | ||
| 57 | |||
| 58 | ```yaml | ||
| 59 | DELETE /myindex | ||
| 60 | ``` | ||
| 61 | |||
| 62 | And then re-indexed the new one back into the original one (well, by name only). | ||
| 63 | |||
| 64 | ```yaml | ||
| 65 | POST _reindex | ||
| 66 | { | ||
| 67 | "source": { | ||
| 68 | "index": "myindex-new" | ||
| 69 | }, | ||
| 70 | "dest": { | ||
| 71 | "index": "myindex" | ||
| 72 | } | ||
| 73 | } | ||
| 74 | ``` | ||
| 75 | |||
| 76 | On the surface it looks like everything is working, but I have a long road ahead of me to get all the things working again. The cluster now shows Green, but I am also getting a notification that the cluster has a processing status, which could mean a million things. | ||
| 77 | |||
| 78 | Godspeed! | ||
diff --git a/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md b/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md new file mode 100644 index 0000000..70e0f51 --- /dev/null +++ b/posts/2020-03-30-my-love-and-hate-relationship-with-nodejs copy.md | |||
| @@ -0,0 +1,43 @@ | |||
| 1 | --- | ||
| 2 | Title: My love and hate relationship with Node.js | ||
| 3 | Description: How I found a way to love and hate Node.js with a passion | ||
| 4 | Slug: my-love-and-hate-relationship-with-nodejs | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, March 30 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | The previous project I was working on was coded in [Golang](https://golang.org/). It was also my first project using it. And damn, that was an awesome experience. The whole thing is just superb: from how errors are handled, to the C-like way you handle compiling, to the way the language is structured, making it incredibly versatile and easy to learn. | ||
| 11 | |||
| 12 | It may cause some pain for somebody who is not used to mapping JSON with interfaces and recompiling all the time. But we have tools like [entr](http://eradman.com/entrproject/) and [make](https://www.gnu.org/software/make/) to fix that. | ||
| 13 | |||
| 14 | But we are not here to talk about my undying love for **Golang**. Though in some ways we probably should: it is an excellent example of how a modern language should be designed. And because I have used it extensively in the last couple of years, it probably taints my views of other languages and does me a great disservice. Nevertheless, here we are. | ||
| 15 | |||
| 16 | About two years ago I started flirting with [Node.js](https://nodejs.org/en/) for a project I started working on. What I wanted was to have things written in a language that is widely used and that we could get additional developers for. As amazing as **Golang** is, it's really hard to get developers for it. Even now. And after playing around with Node.js for a week I fell in love with the speed of iteration and the massive package ecosystem. Do you want SSO? You got it! Do you want some esoteric library for something? There is a strong chance somebody wrote it. It is so extensive that you find yourself evaluating packages based on **GitHub stars** and the number of contributors. You get swallowed by the vanity metrics, and that could potentially become the downfall of Node.js. | ||
| 17 | |||
| 18 | Because of the sheer amount of choice I often got anxious when choosing libraries. Will I choose the correct one? Is this library something that will be supported for the foreseeable future or not? I am used to libraries that have been in development for 10-plus years (Python, C), and that gave me some sort of comfort. It is probably unfair to Node.js and its community to expect the same dedication. | ||
| 19 | |||
| 20 | Moving forward ... Work started and things were great. **Speed of iteration was insane**. A feature that would take me a day in Golang only took an hour or two. I became lazy! Using packages all over the place. Falling into the same trap as others. Packages on top of packages. And [npm](https://www.npmjs.com/) didn't help at all. The way the package manager works is just horrendous. And not allowing node_modules outside the project is also the stupidest idea ever. | ||
| 21 | |||
| 22 | So at that point I started feeling the technical debt that comes with Node.js and the whole ecosystem. What nobody tells you is that **structuring large Node.js apps** is more problematic than one would think. And going microservice for every single thing is also a bad idea. The amount of networking you introduce with that approach always ends up being a pain in the ass. And I don't even want to go into system administration here. The overhead is insane. Package-lock.json made many days feel like living hell for me. And I would eat the cost of all this if it meant a better development experience. Well, it didn't. | ||
| 23 | |||
| 24 | The **lack of TypeScript** support in the interpreter is still mind-boggling to me. Why they haven't added native support for this yet is beyond me! It would have solved so many problems. Lack of type safety became a problem somewhere in the middle of the project, where the codebase was sufficiently large to present problems. We kept adding arguments to functions and there was **no way to explicitly declare argument types**. And because at that point there were a lot of functions, it became impossible to know what each one accepts; development became more and more trial-and-error based. | ||
| 25 | |||
| 26 | I tried **implementing TypeScript**, but that would have required a large refactor that we were not willing to do at that point. The benefits were not enough. I also tried [Flow - static type checker](https://flow.org/), but that implementation was also horrible. What TypeScript and Flow force you to do is keep a src folder, **transpile** your code into a dist folder, and run that with node. WTH is that all about? Why can't this be done in memory or in some virtual file system? I see no reason why it couldn't. But it is what it is. I abandoned all hope for static type checking. | ||
| 27 | |||
| 28 | One of the problems that resulted from not having interfaces or types was the inability to model out our data from **Elasticsearch**. I could have done a **pedestrian implementation** of it, but there must be a better way of doing this without basically resorting to some hack. Or maybe I just haven't found a solution, which is also a possibility. I have looked, though. No juice! | ||
| 29 | |||
| 30 | **Error handling?** Is that a joke? | ||
| 31 | |||
| 32 | Thank god for **await/async**. Without it, I would have probably just abandoned the whole thing and gone with something else like Python. That's all I am going to say about this :) | ||
| 33 | |||
| 34 | I started asking myself whether Node.js is actually ready to be used in **large-scale applications**. And that was totally the wrong question. What I should have been asking myself was how to use Node.js in a large-scale application. And you don't get this in the **marketing material** for Express or Koa etc. They never tell you this. Making Node.js scale, on infrastructure or in the codebase, is really **more of an art than a science**. And just like with the whole JavaScript ecosystem: | ||
| 35 | - impossible to master, | ||
| 36 | - you spend half your time working on your tooling, | ||
| 37 | - just accept transpilers that convert one code into another (holy smokes), | ||
| 38 | - error handling is a joke, | ||
| 39 | - standards? What standards? | ||
| 40 | |||
| 41 | But on the other hand, as I did, you will also learn to love it. Learn to use it quickly and do impossible things in crazy limited time. | ||
| 42 | |||
| 43 | I hate to admit it. But I love Node.js. Dammit, I love it :) | ||
diff --git a/posts/2020-05-05-remote-work.md b/posts/2020-05-05-remote-work.md new file mode 100755 index 0000000..1588dbe --- /dev/null +++ b/posts/2020-05-05-remote-work.md | |||
| @@ -0,0 +1,39 @@ | |||
| 1 | --- | ||
| 2 | Title: Remote work and how it affects the daily lives of people | ||
| 3 | Description: Remote work and how it affects the daily lives of people | ||
| 4 | Slug: remote-work | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, May 5 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | I have been working remotely for the past 5 years. I love it. I love the freedom and the make-your-own-schedule thingy. | ||
| 11 | |||
| 12 | ## You work more not less | ||
| 13 | |||
| 14 | I've heard people say things like: "Oh, you are so lucky, working from home, having all the free time you want". It was obvious they had no clue what working remotely means. They had this romantic idea of remote work. You can watch TV whenever you like, you can go outside for a picnic if you want, and stuff like that. | ||
| 15 | |||
| 16 | This may be true if you work a day or two a week from home. But if you go completely remote, all of this changes. It takes some time to acclimate, and then you start feeling the consequences of going fully remote. And it's not all rainbows and unicorns. Rather the opposite. | ||
| 17 | |||
| 18 | ## Feeling lost | ||
| 19 | |||
| 20 | At first, I remember, I felt lost. I was not used to this kind of environment. I felt disoriented, and the part of you that is used to procrastinating turns on. You start thinking of a workday as a whole day. And soon the idea of "I can do this later" starts creeping in. Well, I have the whole day ahead of me. I can do this a bit later. | ||
| 21 | |||
| 22 | ## Hyper-performance | ||
| 23 | |||
| 24 | As a direct result, you become more focused on your work since you don't have all the interruptions common in the workplace. And you can quickly get used to this hyper-performance. But this mode also requires a lot of peace and quiet. | ||
| 25 | |||
| 26 | And here we come to the ugly parts of all this. **People rarely have the self-control** to not waste other people's time. It is paralyzing when people start calling you, sending you chat messages, etc. The thing is, when I achieve this hyper-performance mode I am completely embroiled in the problem I am solving, and these kinds of interruptions mess with your head. I need at least an hour to get back in the zone. Sometimes I don't reach the same focus for the rest of the day. | ||
| 27 | |||
| 28 | I know that life is not how you want it to be and takes its own route, but from what I've learned, these kinds of interruptions can easily be avoided in 90% of cases just by closing any chat programs and putting your phone in a drawer. | ||
| 29 | |||
| 30 | ## Suggestion to all the new remote workers | ||
| 31 | |||
| 32 | - Stop wasting other people's time. You don't bother people at their desks in the office either. | ||
| 33 | - Do not replace daily chats in the hallways with instant messaging software. It will only interrupt people. Nothing good will come of it. | ||
| 34 | - Set your working hours, try not to let work bleed outside these boundaries, and maintain your routine. | ||
| 35 | - Be prepared that hours will be longer regardless of your good intentions and your well-thought-out routine. | ||
| 36 | - Try to be hyper-focused and do only one thing at a time. Multitasking is the enemy of progress. | ||
| 37 | - Avoid long meetings and, if possible, eliminate them. Rather, take the time to write things out and allow others to respond in their own time. Meetings are usually a large waste of time, and most of the people attending are there just because the manager said so. | ||
| 38 | - Software will not solve your problems. And neither will throwing money at them. | ||
| 39 | - If you are in a managerial position, don't supervise workers' every single minute. They are probably giving you more hours anyway. Track progress weekly, not daily. You hired them, so give them the benefit of the doubt that they will deliver what you agreed upon. | ||
diff --git a/posts/2020-08-15-systemd-disable-wake-onmouse.md b/posts/2020-08-15-systemd-disable-wake-onmouse.md new file mode 100644 index 0000000..f4ac0ee --- /dev/null +++ b/posts/2020-08-15-systemd-disable-wake-onmouse.md | |||
| @@ -0,0 +1,49 @@ | |||
| 1 | --- | ||
| 2 | Title: Disable mouse wake from suspend with systemd service | ||
| 3 | Description: Disable mouse wake from suspend with systemd service | ||
| 4 | Slug: disable-mouse-wake-from-suspend-with-systemd-service | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, August 15 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | I recently bought a [ThinkPad X220](https://www.laptopmag.com/reviews/laptops/lenovo-thinkpad-x220) on eBay, just as a joke, to test Linux distributions and play around with things without destroying my main machine. Little did I know I would fall in love with it. Man, they really made awesome machines back then. | ||
| 11 | |||
| 12 | After swapping the disk that came with it for an SSD and installing Ubuntu to test if everything works, I noticed that even a single touch of my external mouse would wake the system from sleep, even though the lid was shut. | ||
| 13 | |||
| 14 | I wouldn't have even noticed it if the laptop didn't have an [LED sleep indicator](https://support.lenovo.com/lk/en/solutions/~/media/Images/ContentImages/p/pd025386_x1_status_03.ashx?w=426&h=262). I already had a bad experience with Linux and its power management. I had a [Dell Inspiron 7537](https://www.pcmag.com/reviews/dell-inspiron-15-7537) laptop with a touchscreen, and while I was traveling it decided to wake up and started cooking in my backpack, to the point that the digitizer responsible for touch actually came unglued and the whole screen got wrecked. So, I am a bit touchy about this. | ||
| 15 | |||
| 16 | I went solution hunting and, to my surprise, there is no easy way to prevent specific devices from waking the machine. Why this is not under the power management tab in settings is really strange. | ||
| 17 | |||
| 18 | After googling for a solution I found [this nice article describing the solution](https://codetrips.com/2020/03/18/ubuntu-disable-mouse-wake-from-suspend/) that worked for me. The only problem was that the author added his solution to `.bashrc`, and this triggers `sudo`, which asks for a password each time a new terminal is opened. That gets annoying quickly, since I open a lot of terminals all the time. | ||
| 19 | |||
| 20 | I followed his instructions and arrived at the solution `sudo sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup"`. | ||
| 21 | |||
| 22 | I created a systemd service file with `sudo nano /etc/systemd/system/disable-mouse-wakeup.service`, removed `sudo`, replaced `sh` with `/usr/bin/sh`, and pasted it all into `ExecStart`. | ||
| 23 | |||
| 24 | ```ini | ||
| 25 | [Unit] | ||
| 26 | Description=Disables wakeup on mouse event | ||
| 27 | After=network.target | ||
| 28 | StartLimitIntervalSec=0 | ||
| 29 | |||
| 30 | [Service] | ||
| 31 | Type=simple | ||
| 32 | Restart=always | ||
| 33 | RestartSec=1 | ||
| 34 | User=root | ||
| 35 | ExecStart=/usr/bin/sh -c "echo 'disabled' > /sys/bus/usb/devices/2-1.1/power/wakeup" | ||
| 36 | |||
| 37 | [Install] | ||
| 38 | WantedBy=multi-user.target | ||
| 39 | ``` | ||
| 40 | |||
| 41 | After that I enabled, started, and checked the status of the service. | ||
| 42 | |||
| 43 | ```sh | ||
| 44 | sudo systemctl enable disable-mouse-wakeup.service | ||
| 45 | sudo systemctl start disable-mouse-wakeup.service | ||
| 46 | sudo systemctl status disable-mouse-wakeup.service | ||
| 47 | ``` | ||
| 48 | |||
| 49 | This permanently prevents that device from waking up your computer. If you have many devices you would like to suppress from waking up your machine, I would create a shell script and call that from the service file instead of doing it directly. | ||
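Such a script could look something like this. A sketch only: the device IDs are examples from my machine, and you can find yours by looking at the files under `/sys/bus/usb/devices/*/power/wakeup`.

```shell
#!/bin/sh
# disable_usb_wakeup BASE DEV...: writes "disabled" into each device's
# power/wakeup file under BASE. The device IDs passed in are examples;
# adjust them to your own system.
disable_usb_wakeup() {
    base="$1"
    shift
    for dev in "$@"; do
        wakeup="$base/$dev/power/wakeup"
        [ -f "$wakeup" ] && echo disabled > "$wakeup"
    done
}

# On a real system, run as root:
# disable_usb_wakeup /sys/bus/usb/devices 2-1.1 2-1.2
```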
diff --git a/posts/2020-09-06-esp-and-micropython.md b/posts/2020-09-06-esp-and-micropython.md new file mode 100644 index 0000000..1052795 --- /dev/null +++ b/posts/2020-09-06-esp-and-micropython.md | |||
| @@ -0,0 +1,205 @@ | |||
| 1 | --- | ||
| 2 | Title: Getting started with MicroPython and ESP8266 | ||
| 3 | Description: Getting started with MicroPython and ESP8266 | ||
| 4 | Slug: esp8266-and-micropython-guide | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, September 6 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | **Table of contents** | ||
| 11 | |||
| 12 | 1. [Introduction](#introduction) | ||
| 13 | 2. [Flashing the SOC](#flashing-the-soc) | ||
| 14 | 3. [Install better tooling](#install-better-tooling) | ||
| 15 | 4. [Additional resources](#additional-resources) | ||
| 16 | |||
| 17 | |||
| 18 | ## Introduction | ||
| 19 | |||
| 20 | A while ago I bought some [ESP8266](https://www.espressif.com/en/products/socs/esp8266) and [ESP32](https://www.espressif.com/en/products/socs/esp32) dev boards to play around with and I finally found a project to try it out. | ||
| 21 | |||
| 22 | For my project I used the [ESP32](https://www.espressif.com/en/products/socs/esp32), but I could just as easily have chosen the [ESP8266](https://www.espressif.com/en/products/socs/esp8266). This guide describes which tools I use and how I prepared my workspace to code for the [ESP8266](https://www.espressif.com/en/products/socs/esp8266). | ||
| 23 | |||
| 24 |  | ||
| 25 | |||
| 26 | This guide covers: | ||
| 27 | - flashing the SOC | ||
| 28 | - installing proper tooling | ||
| 29 | - deploying a simple script | ||
| 30 | |||
| 31 | > Make sure that you are using **a good USB cable**. I had some problems with mine and once I replaced it everything started to work. | ||
| 32 | |||
| 33 | ## Flashing the SOC | ||
| 34 | |||
| 35 | Plug your ESP8266 into a USB port and check if the device was recognized by executing `dmesg | grep ch341-uart`. | ||
| 36 | |||
| 37 | Then check if the device is available under `/dev/` by running `ls /dev/ttyUSB*`. | ||
| 38 | |||
| 39 | > **Linux users**: if the device is not available, make sure you are in the `dialout` group. You can check this by executing `groups $USER`. You can add a user to the `dialout` group with `sudo adduser $USER dialout`. | ||
| 40 | |||
| 41 | After these conditions are met, navigate to [https://micropython.org/download/esp8266/](https://micropython.org/download/esp8266/) and download `esp8266-20200902-v1.13.bin`. | ||
| 42 | |||
| 43 | ```sh | ||
| 44 | mkdir esp8266-test | ||
| 45 | cd esp8266-test | ||
| 46 | |||
| 47 | wget https://micropython.org/resources/firmware/esp8266-20200902-v1.13.bin | ||
| 48 | ``` | ||
| 49 | |||
| 50 | After obtaining the firmware, we need some tooling to flash it to the board. | ||
| 51 | |||
| 52 | ```sh | ||
| 53 | sudo pip3 install esptool | ||
| 54 | ``` | ||
| 55 | |||
| 56 | You can read more about `esptool` at [https://github.com/espressif/esptool/](https://github.com/espressif/esptool/). | ||
| 57 | |||
| 58 | Before flashing the firmware we need to erase the flash on the device. Substitute `USB0` with the device listed in the output of `ls /dev/ttyUSB*`. | ||
| 59 | |||
| 60 | ```sh | ||
| 61 | esptool.py --port /dev/ttyUSB0 erase_flash | ||
| 62 | ``` | ||
| 63 | |||
| 64 | If the flash was successfully erased, it is now time to write the new firmware. | ||
| 65 | |||
| 66 | ```sh | ||
| 67 | esptool.py --port /dev/ttyUSB0 --baud 460800 write_flash --flash_size=detect 0 esp8266-20200902-v1.13.bin | ||
| 68 | ``` | ||
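Optionally, you can sanity-check the result. This is a sketch using the same port and firmware file as above: `flash_id` prints the chip's flash details (a quick connectivity check), and `verify_flash` compares the flash contents against the image.

```shell
# print the flash manufacturer and device ID (confirms the connection works)
esptool.py --port /dev/ttyUSB0 flash_id

# compare the flash contents at offset 0 with the firmware image
esptool.py --port /dev/ttyUSB0 verify_flash 0 esp8266-20200902-v1.13.bin
```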
| 69 | |||
| 70 | If everything went OK, you can try accessing the MicroPython REPL with `screen /dev/ttyUSB0 115200` or `picocom /dev/ttyUSB0 -b115200`. | ||
| 71 | |||
| 72 | > Sometimes you will need to press `ENTER` in `screen` or `picocom` to access REPL. | ||
| 73 | |||
| 74 | When you are in the REPL you can test if everything is working properly with the following steps. | ||
| 75 | |||
| 76 | ```py | ||
| 77 | >>> import machine | ||
| 78 | >>> machine.freq() | ||
| 79 | ``` | ||
| 80 | |||
| 81 | This should output a number representing the frequency of the CPU (mine was `80000000`). | ||
| 82 | |||
| 83 | When you are in `screen` or `picocom`, these shortcuts can help you a bit. | ||
| 84 | |||
| 85 | | Key | Command | | ||
| 86 | | -------- | -------------------- | | ||
| 87 | | CTRL+d | performs soft reboot | | ||
| 88 | | CTRL+a x | exits picocom | | ||
| 89 | | CTRL+a \ | exits screen | | ||
| 90 | |||
| 91 | |||
| 92 | ## Install better tooling | ||
| 93 | |||
| 94 | Now, to make our lives a little bit easier, there are a couple of additional tools that will make this whole experience a little more bearable. | ||
| 95 | |||
| 96 | There are two cool ways of uploading local files to the SOC flash. | ||
| 97 | |||
| 98 | - ampy → [https://github.com/scientifichackers/ampy](https://github.com/scientifichackers/ampy) | ||
| 99 | - rshell → [https://github.com/dhylands/rshell](https://github.com/dhylands/rshell) | ||
| 100 | |||
| 101 | ### ampy | ||
| 102 | |||
| 103 | ```bash | ||
| 104 | # installing ampy | ||
| 105 | sudo pip3 install adafruit-ampy | ||
| 106 | ``` | ||
| 107 | |||
| 108 | Listed below are some common commands I used. | ||
| 109 | |||
| 110 | ```bash | ||
| 111 | |||
| 112 | # uploads file to flash | ||
| 113 | ampy --delay 2 --port /dev/ttyUSB0 put boot.py | ||
| 114 | |||
| 115 | # lists file on flash | ||
| 116 | ampy --delay 2 --port /dev/ttyUSB0 ls | ||
| 117 | |||
| 118 | # outputs contents of file on flash | ||
| 119 | ampy --delay 2 --port /dev/ttyUSB0 cat boot.py | ||
| 120 | ``` | ||
| 121 | |||
| 122 | > I added a `delay` of 2 seconds because I had problems executing commands without it. | ||
| 123 | |||
| 124 | ### rshell | ||
| 125 | |||
| 126 | Even though `ampy` is a cool tool, I opted for `rshell` in the end since it's much more polished and feature-rich. | ||
| 127 | |||
| 128 | ```bash | ||
| 129 | # installing rshell | ||
| 130 | sudo pip3 install rshell | ||
| 131 | ``` | ||
| 132 | |||
| 133 | Now that `rshell` is installed we can connect to the board. | ||
| 134 | |||
| 135 | ```bash | ||
| 136 | rshell --buffer-size=30 -p /dev/ttyUSB0 -a | ||
| 137 | ``` | ||
| 138 | |||
| 139 | This will open a shell inside bash, and from there you can execute multiple commands. You can check what is supported with `help` once you are inside the shell. | ||
| 140 | |||
| 141 | ```bash | ||
| 142 | m@turing ~/Junk/esp8266-test | ||
| 143 | $ rshell --buffer-size=30 -p /dev/ttyUSB0 -a | ||
| 144 | |||
| 145 | Using buffer-size of 30 | ||
| 146 | Connecting to /dev/ttyUSB0 (buffer-size 30)... | ||
| 147 | Trying to connect to REPL connected | ||
| 148 | Testing if ubinascii.unhexlify exists ... Y | ||
| 149 | Retrieving root directories ... /boot.py/ | ||
| 150 | Setting time ... Sep 06, 2020 23:54:28 | ||
| 151 | Evaluating board_name ... pyboard | ||
| 152 | Retrieving time epoch ... Jan 01, 2000 | ||
| 153 | Welcome to rshell. Use Control-D (or the exit command) to exit rshell. | ||
| 154 | /home/m/Junk/esp8266-test> help | ||
| 155 | |||
| 156 | Documented commands (type help <topic>): | ||
| 157 | ======================================== | ||
| 158 | args cat connect date edit filesize help mkdir rm shell | ||
| 159 | boards cd cp echo exit filetype ls repl rsync | ||
| 160 | |||
| 161 | Use Control-D (or the exit command) to exit rshell. | ||
| 162 | ``` | ||
| 163 | |||
| 164 | > Inside the shell, `ls` will display the list of files on your machine. The flash storage is remapped to the `/pyboard` folder inside the shell, so to list files on the flash you must run `ls /pyboard`. | ||
| 165 | |||
| 166 | #### Moving files to flash | ||
| 167 | |||
| 168 | To avoid copying files all the time, I used the `rsync` command from inside `rshell`. | ||
| 169 | |||
| 170 | ```bash | ||
| 171 | rsync . /pyboard | ||
| 172 | ``` | ||
| 173 | |||
| 174 | #### Executing scripts | ||
| 175 | |||
| 176 | It is a pain to continuously reboot the device to trigger `/pyboard/boot.py`, so here is a better way of testing local scripts on the remote device. | ||
| 177 | |||
| 178 | Let's assume we have a `src/freq.py` file that prints the CPU frequency of the remote device. | ||
| 179 | |||
| 180 | ```py | ||
| 181 | # src/freq.py | ||
| 182 | |||
| 183 | import machine | ||
| 184 | print(machine.freq()) | ||
| 185 | ``` | ||
| 186 | |||
| 187 | Now let's upload it and execute it. | ||
| 188 | |||
| 189 | ```bash | ||
| 190 | # syncs files to remote device | ||
| 191 | rsync ./src /pyboard | ||
| 192 | |||
| 193 | # goes into REPL | ||
| 194 | repl | ||
| 195 | |||
| 196 | # importing the module (file name without the .py extension) runs the script | ||
| 197 | > import freq | ||
| 198 | |||
| 199 | # CTRL+x will exit REPL | ||
| 200 | ``` | ||
| 201 | |||
| 202 | ## Additional resources | ||
| 203 | |||
| 204 | - [https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/](https://randomnerdtutorials.com/getting-started-micropython-esp32-esp8266/) | ||
| 205 | - [http://docs.micropython.org/en/latest/esp8266/quickref.html](http://docs.micropython.org/en/latest/esp8266/quickref.html) | ||
diff --git a/posts/2020-09-08-bind-warning-on-login.md b/posts/2020-09-08-bind-warning-on-login.md new file mode 100644 index 0000000..2ccc3c6 --- /dev/null +++ b/posts/2020-09-08-bind-warning-on-login.md | |||
| @@ -0,0 +1,42 @@ | |||
| 1 | --- | ||
| 2 | Title: Fix bind warning in .profile on login in Ubuntu | ||
| 3 | Description: Fix bind warning in .profile on login in Ubuntu | ||
| 4 | Slug: bind-warning-on-login-in-ubuntu | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, September 8 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | Recently I moved back to [bash](https://www.gnu.org/software/bash/) as my default shell. I was previously using [fish](https://fishshell.com/) and got used to the cool features it has. But, regardless of that, I wanted to move to a more standard shell because I was hopping back and forth when exporting variables and stuff like that, which got pretty annoying. | ||
| 11 | |||
| 12 | So I embarked on a mission to make [bash](https://www.gnu.org/software/bash/) more like [fish](https://fishshell.com/) and in the process found that I really missed TAB autosuggestions when changing directories. | ||
| 13 | |||
| 14 | I found a nice alternative that emulates [zsh](http://zsh.sourceforge.net/)-like autosuggestion and autocomplete, so I added the following to my `.bashrc` file. | ||
| 15 | |||
| 16 | ```bash | ||
| 17 | bind "TAB:menu-complete" | ||
| 18 | bind "set show-all-if-ambiguous on" | ||
| 19 | bind "set completion-ignore-case on" | ||
| 20 | bind "set menu-complete-display-prefix on" | ||
| 21 | bind '"\e[Z":menu-complete-backward' | ||
| 22 | ``` | ||
| 23 | |||
| 24 | I hadn't noticed anything wrong with this, and all was working fine until I restarted my machine, and then I got this error. | ||
| 25 | |||
| 26 |  | ||
| 27 | |||
| 28 | When I pressed OK, I got into the [Gnome shell](https://wiki.gnome.org/Projects/GnomeShell) and all was working fine, but the error was still bugging me. I started looking for the reason why this was happening and found a solution in [Remote SSH Commands - bash bind warning: line editing not enabled](https://superuser.com/a/892682). | ||
| 29 | |||
| 30 | So I added a simple `if [ -t 1 ]` guard around the `bind` statements; it tests whether stdout is attached to a terminal, which avoids running commands that presume the session is interactive when it isn't. | ||
| 31 | |||
| 32 | ```bash | ||
| 33 | if [ -t 1 ]; then | ||
| 34 | bind "TAB:menu-complete" | ||
| 35 | bind "set show-all-if-ambiguous on" | ||
| 36 | bind "set completion-ignore-case on" | ||
| 37 | bind "set menu-complete-display-prefix on" | ||
| 38 | bind '"\e[Z":menu-complete-backward' | ||
| 39 | fi | ||
| 40 | ``` | ||
| 41 | |||
| 42 | After logging out and back in, the problem was gone. | ||
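The guard is easy to see in isolation: `[ -t 1 ]` succeeds only when file descriptor 1 (stdout) is attached to a terminal.

```shell
# prints different messages depending on whether stdout is a terminal;
# when piped (e.g. into cat), the test fails and the else branch runs
if [ -t 1 ]; then
    echo "stdout is a terminal"
else
    echo "stdout is not a terminal"
fi
```

This is exactly why the `bind` calls were skipped in non-interactive login sessions once wrapped in the guard.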
diff --git a/posts/2020-09-09-digitalocean-sync.md b/posts/2020-09-09-digitalocean-sync.md new file mode 100644 index 0000000..eeaf096 --- /dev/null +++ b/posts/2020-09-09-digitalocean-sync.md | |||
| @@ -0,0 +1,66 @@ | |||
| 1 | --- | ||
| 2 | Title: Using Digitalocean Spaces to sync between computers | ||
| 3 | Description: Using Digitalocean Spaces to sync between computers | ||
| 4 | Slug: digitalocean-spaces-to-sync-between-computers | ||
| 5 | Listing: true | ||
| 6 | Created: 2020, September 9 | ||
| 7 | Tags: [] | ||
| 8 | --- | ||
| 9 | |||
| 10 | I've been using [Dropbox](https://www.dropbox.com/) for probably **10+ years** now and I've become so used to it running in the background that I can't even imagine a world without it. But it's not without problems. | ||
| 11 | |||
| 12 | At first I had problems with `.venv` environments for Python: the only way to exclude this folder from synchronization was to manually exclude each specific folder, which is not really scalable. FYI, my whole projects folder is synced to [Dropbox](https://www.dropbox.com/). This of course introduced a lot of syncing of files and folders that are not needed or that even break things on other machines. In the case of **Python**, I couldn't use a synced `.venv` on my second machine; I needed to delete the `.venv` folder and pip-install it again, which synced the files back to the main machine. This was very frustrating. **Node.js** handles this much more nicely and I can just run the scripts without deleting `node_modules` and reinstalling. However, `node_modules` is a beast of its own. It creates so many files that the OS has a problem counting them when you check the folder's size. | ||
| 13 | |||
| 14 | I wanted something similar to Dropbox. I could do without the instant syncing, but it would need to be fast and have the option to exclude folders like `node_modules`, `.venv`, `.git`, and the like. | ||
| 15 | |||
| 16 | I went on a hunt for an alternative to [Dropbox](https://www.dropbox.com/) and found: | ||
| 17 | |||
| 18 | - [Tresorit](https://tresorit.com/) | ||
| 19 | - [Sync.com](https://sync.com) | ||
| 20 | - [Box](https://www.box.com/) | ||
| 21 | |||
| 22 | You know, the usual list of suspects. I didn't include [Google drive](https://drive.google.com) or [One drive](https://onedrive.live.com/) since they are even more draconian than Dropbox. | ||
| 23 | |||
| 24 | > All this does not stem from me being paranoid, but recently these companies have become more and more aggressive, and they keep violating our privacy by sharing our data with 3rd-party services. It is getting out of control. | ||
| 25 | |||
| 26 | So, my main problem was still there. No way of excluding a specific folder from syncing. And before we go into "*But you have git, isn't that enough?*", I must say, that many of the files (PDFs, spreadsheets, etc) I have in a `git` repo don't get pushed upstream to Git and I still want to have them synced across my computers. | ||
| 27 | |||
| 28 | I initially wanted to use [rsync](https://linux.die.net/man/1/rsync), but I would then need a remote VPS or to transfer between my computers directly. I wanted a solution where all my files would be accessible to me without my own machines. | ||
| 29 | |||
| 30 | > **WARNING: This solution will cost you money!** DigitalOcean Spaces is $5 per month and there are some bandwidth limitations; if you go beyond them, you get billed additionally. | ||
| 31 | |||
| 32 | Then I remembered that I could use something like [S3](https://en.wikipedia.org/wiki/Amazon_S3) since it has versioning and is fully managed. I didn't want to go down the AWS rabbit hole with this, so I chose [DigitalOcean Spaces](https://www.digitalocean.com/products/spaces/). | ||
| 33 | |||
| 34 | Then I needed a command-line tool to sync between source and target. I found this nice tool [s3cmd](https://s3tools.org/s3cmd) and it is in the Ubuntu repositories. | ||
| 35 | |||
| 36 | ```bash | ||
| 37 | sudo apt install s3cmd | ||
| 38 | ``` | ||
| 39 | |||
| 40 | After installation I created a new Spaces bucket on DigitalOcean. Remember the region you choose, because you will need it when you configure `s3cmd`. | ||
| 41 | |||
| 42 | Then I visited [DigitalOcean Applications & API](https://cloud.digitalocean.com/account/api/tokens) and generated **Spaces access keys**. Save both the key and the secret somewhere safe, because once you leave the page the secret will no longer be available to you and you will need to regenerate it. | ||
| 43 | |||
| 44 | ```bash | ||
| 45 | # enter your key and secret and correct endpoint | ||
| 46 | # my endpoint is ams3.digitaloceanspaces.com because | ||
| 47 | # I created my bucket in the Amsterdam region | ||
| 48 | s3cmd --configure | ||
| 49 | ``` | ||
| 50 | After that I played around with the options for `s3cmd` and arrived at the following command. | ||
| 51 | |||
| 52 | ```bash | ||
| 53 | # I executed this command from my projects folder | ||
| 54 | cd projects | ||
| 55 | s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' ./ s3://my-bucket-name/projects/ | ||
| 56 | ``` | ||
| 57 | |||
| 58 | When syncing in the other direction you will need to swap the `SOURCE` and `TARGET`: `s3://my-bucket-name/projects/` first, then `./`. | ||
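In other words, the pull direction would look roughly like this (same placeholder bucket name as above):

```shell
# pull changes down from Spaces into the local projects folder
s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' s3://my-bucket-name/projects/ ./
```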
| 59 | |||
| 60 | > Be sure that all the paths have a trailing slash so that sync knows these are directories. | ||
| 61 | |||
| 62 | I am planning to implement some sort of an `.ignore` file that will let me have project-specific exclude options. | ||
| 63 | |||
| 64 | I am currently running this every hour as a cronjob, which is perfectly fine for now while I am testing how this whole thing works and how it all will turn out. | ||
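For reference, the hourly crontab entry looks roughly like this (the local path, bucket name, and log file are placeholders):

```shell
# m h dom mon dow command: run the push sync at the top of every hour
0 * * * * s3cmd sync --delete-removed --exclude 'node_modules/*' --exclude '.git/*' --exclude '.venv/*' /home/m/projects/ s3://my-bucket-name/projects/ >> /home/m/.s3sync.log 2>&1
```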
| 65 | |||
| 66 | I have also created a small Gnome extension which is still very unstable, but when/if this whole experiment pays off I will share it on GitHub. | ||
