Diffstat (limited to 'src/notes')
-rw-r--r--  src/notes/golang-profiling-simplified.md            110
-rw-r--r--  src/notes/simplifying-and-reducing-clutter.md        21
-rw-r--r--  src/notes/what-i-ve-learned-developing-ad-server.md  133
3 files changed, 264 insertions, 0 deletions
diff --git a/src/notes/golang-profiling-simplified.md b/src/notes/golang-profiling-simplified.md
new file mode 100644
index 0000000..a49de67
--- /dev/null
+++ b/src/notes/golang-profiling-simplified.md
@@ -0,0 +1,110 @@
title: Golang profiling simplified
date: 2017-03-07
tags: blog
hide: false
----

Many posts have been written about profiling in Go, but I haven't found a proper tutorial on the subject. Almost all of them are missing some important piece of information, and that gets pretty frustrating when you have a deadline and can't find a simple, distilled solution.

Nevertheless, after some searching and experimenting I have found a solution that works for me and will probably work for you too.

## Where are my pprof files?

By default pprof files are generated in the /tmp/ folder. You can override the folder where these files are generated programmatically in your Go code, as we will see in the examples below.

## Why is my CPU profile empty?

I have found that sometimes the CPU profile is empty because the program did not run long enough. In my experience, programs that finish too quickly don't produce a useful pprof file. Well, the file is generated, but it only contains 4KB of information.

## Profiling

As you can see from the examples, we execute a dummy_benchmark function to ensure some amount of work gets done. Memory profiling can be done without such a "complex" function, but CPU profiling needs it.

The memory and CPU profiling examples are almost identical. Only the parameters passed to profile.Start in the main function differ. When we set profile.ProfilePath("."), we tell the profiler to store pprof files in the same folder as our program.

### Memory profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU in two loops with a pause in between,
// so the profiler has something meaningful to record.
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(3 * time.Second)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// Writes mem.pprof into the current folder when the program exits.
	defer profile.Start(profile.MemProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### CPU profiling

```go
package main

import (
	"fmt"
	"time"

	"github.com/pkg/profile"
)

// dummy_benchmark burns CPU in two loops with a pause in between,
// so the profiler has something meaningful to record.
func dummy_benchmark() {
	fmt.Println("first set ...")
	for i := 0; i < 918231333; i++ {
		i *= 2
		i /= 2
	}

	<-time.After(3 * time.Second)

	fmt.Println("second set ...")
	for i := 0; i < 9182312232; i++ {
		i *= 2
		i /= 2
	}
}

func main() {
	// Writes cpu.pprof into the current folder when the program exits.
	defer profile.Start(profile.CPUProfile, profile.ProfilePath("."), profile.NoShutdownHook).Stop()
	dummy_benchmark()
}
```

### Generating profiling reports

```bash
# memory profiling
go build mem.go
./mem
go tool pprof -pdf ./mem mem.pprof > mem.pdf

# cpu profiling
go build cpu.go
./cpu
go tool pprof -pdf ./cpu cpu.pprof > cpu.pdf
```

This will generate a PDF document with the visualized profile.

- [Memory PDF profile example](/files/go-profiling/golang-profiling-mem.pdf)
- [CPU PDF profile example](/files/go-profiling/golang-profiling-cpu.pdf)
diff --git a/src/notes/simplifying-and-reducing-clutter.md b/src/notes/simplifying-and-reducing-clutter.md
new file mode 100644
index 0000000..b435834
--- /dev/null
+++ b/src/notes/simplifying-and-reducing-clutter.md
@@ -0,0 +1,21 @@
title: Simplifying and reducing clutter in my life and work
date: 2019-10-14
tags: blog
hide: false
----

I recently moved my main working machine back from a Hackintosh to Linux. The experiment was interesting and I did some great work on macOS, but it was time to move back.

I actually really missed Linux. The simplicity of `apt-get` and the sheer amount of software that exists for Linux make it a no-brainer. I spent most of my time on macOS finding workarounds to make things work. Using [Brew](https://brew.sh/) was just a horrible experience and far from the package managers on Linux. At least they managed to get that `sudo` debacle sorted.

Not all of it was bad. macOS in general was a perfectly good environment. Things like Docker and similar tooling worked without any hiccups. My usual tools like my coding IDE worked flawlessly, and the whole look and feel is just superb. I had been using a MacBook Air for a couple of years, so I was used to the system, just never as a daily driver.

One of the things I did after installing Linux back on my machine was cleaning up my Dropbox folder. I have everything on Dropbox, even my projects folder. I write code for a living, so my whole life revolves around a couple of megs of code (with assets). It's not like I have huge files on my machine. I don't keep movies or music or pictures on my PC; all of that stuff is in the cloud. I use Google Music and have a Netflix account, which is more than enough for me.

I also went and deleted some of the repositories on my GitHub account. I have deleted more code than I have deployed. People find this strange, but for me deleting something feels cathartic and forces me to write better code the next time I am faced with a similar problem. It was a huge relief, if I am being totally honest.

The next step was to do something with my webpage. I had been using some scripts I wrote a while ago to generate static pages from markdown source posts. I kept adding stuff on top of them and they became a source of frustration, and this is just a simple blog using gulp and npm. After a couple of hours of searching and testing static generators I found an interesting one, [https://github.com/piranha/gostatic](https://github.com/piranha/gostatic), and decided to use it. It was the only one with a simple templating engine, not that I really need one. The others had convoluted ways of trying to solve everything and in the end required a bigger learning curve than I was ready to take on. So I deleted a couple of old posts, simplified the HTML, trashed most of the CSS and went with the [https://motherfuckingwebsite.com/](https://motherfuckingwebsite.com/) aesthetic. Yeah, the previous site was more visually stimulating, but all I really care about at this point is the content. And the Times New Roman font is kind of awesome.

I have stopped working on most of my projects in the past couple of months because the overhead was just insane. There comes a point when you stretch yourself too thin, you stop progressing, and with that comes dissatisfaction.

So that's about it. Moving forward, minimal style.
diff --git a/src/notes/what-i-ve-learned-developing-ad-server.md b/src/notes/what-i-ve-learned-developing-ad-server.md
new file mode 100644
index 0000000..527f9d0
--- /dev/null
+++ b/src/notes/what-i-ve-learned-developing-ad-server.md
@@ -0,0 +1,133 @@
title: What I've learned developing ad server
date: 2017-04-17
tags: blog
hide: false
----

For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. This project grew from serving thousands of ads per day to millions.

The system is made up of a couple of core components:

- an API for serving ads,
- utils: cronjobs and queue management tools,
- a dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and better search performance. The switch also gave us access to many other great Elasticsearch features. You should check it out if you do any search-related work.

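Purely as an illustration of the kind of full-text matching Elasticsearch handles, here is a minimal sketch using the official Python client; the `ads` index and `keywords` field are made up for the example, and the exact client API varies by version.

```python
from elasticsearch import Elasticsearch

# Hypothetical index and field names, for illustration only.
es = Elasticsearch(["http://localhost:9200"])

result = es.search(
    index="ads",
    body={"query": {"match": {"keywords": "electric cars"}}},
    size=10,
)

for hit in result["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```
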
Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page. It also makes the JavaScript client library quite complex.

Now that you know the basics about the product, let's get into the lessons we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was written to the database at nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/), and the database quickly grew well above 200GB of disk space. That was problematic: statistics took a disturbingly long time to aggregate, and indexes on the stats table were no help once we reached 500 million datapoints.

> There is the marketing product information and there is real-life experience. And they tend to be quite the opposite.

This is the reason everything is now aggregated on a daily basis and fed to Elasticsearch in the form of a daily summary. With this we can track many more dimensions such as zone, channel and platform, and with that information we can tune how often ads appear in specific places much more precisely.

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our stack. Because Redis also persists data to local disk, we have some sort of backup if the server accidentally suffers a failure.

All the real-time statistics for ad serving and redirecting are kept as counters in a Redis instance, then extracted daily and pushed to Elasticsearch.

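As a rough sketch of that pattern, assuming the `redis` and `elasticsearch` Python clients: counters are incremented on every impression, and a daily job reads them out, pushes a summary document to Elasticsearch and deletes the counters. The function names, key format and `stats-daily` index are illustrative only, and the exact client APIs vary by version.

```python
import datetime

import redis
from elasticsearch import Elasticsearch

r = redis.StrictRedis(host="localhost", port=6379, db=0)
es = Elasticsearch(["http://localhost:9200"])


def track_impression(ad_id, zone, platform):
    """Called on every ad impression; cheap O(1) counter increment."""
    today = datetime.date.today().isoformat()
    r.hincrby("impressions:%s" % today, "%s:%s:%s" % (ad_id, zone, platform), 1)


def push_daily_summary():
    """Daily cronjob: move yesterday's counters from Redis into Elasticsearch."""
    day = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    key = "impressions:%s" % day
    for field, count in r.hgetall(key).items():
        ad_id, zone, platform = field.decode().split(":")
        es.index(
            index="stats-daily",
            body={
                "date": day,
                "ad_id": ad_id,
                "zone": zone,
                "platform": platform,
                "impressions": int(count),
            },
        )
    r.delete(key)  # counters are no longer needed once aggregated
```
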
## Measure everything

The thing about software is that we really don't know how well it performs under load until that load actually arrives. When testing locally everything is fine, but in production things tend to fall apart.

Our solution is to measure everything we can: function execution times (by wrapping functions with timers; a sketch of such a wrapper follows the example output below), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information, and we store all of it for later analysis.

**Example of function execution times**

```json
{
    "get_final_filtered_ads": {
        "counter": 1931250,
        "avg": 0.0066143431,
        "elapsed": 12773.9500310003
    },
    "store_keywords_statistics": {
        "counter": 1931011,
        "avg": 0.0004605267,
        "elapsed": 889.2821669996
    },
    "match_by_context": {
        "counter": 1931011,
        "avg": 0.0055960716,
        "elapsed": 10806.0758889999
    },
    "match_by_high_performance": {
        "counter": 262,
        "avg": 0.0152770229,
        "elapsed": 4.00258
    },
    "store_impression_stats": {
        "counter": 1931250,
        "avg": 0.0006189991,
        "elapsed": 1195.4419869999
    }
}
```

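A minimal sketch of the kind of timing wrapper that can produce counters like the ones above; the decorator name and the in-memory `timings` dict are illustrative (in practice the numbers would be flushed to Redis or a log), not our exact implementation.

```python
import time
from functools import wraps

# Accumulated per-function stats: call count, total elapsed seconds, average.
timings = {}


def timed(func):
    """Wrap a function and record how long each call takes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - start
            stats = timings.setdefault(func.__name__, {"counter": 0, "elapsed": 0.0})
            stats["counter"] += 1
            stats["elapsed"] += elapsed
            stats["avg"] = stats["elapsed"] / stats["counter"]
    return wrapper


@timed
def match_by_context(keywords):
    # ... the actual matching logic would live here ...
    return []
```
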
We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and visualizing the results with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more detailed look into code execution.

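For reference, a small sketch of collecting a cProfile dump around a hot code path; `serve_some_ads` and the output file name are placeholders, and turning the dump into something KCachegrind can open typically goes through a converter such as pyprof2calltree.

```python
import cProfile
import pstats


def serve_some_ads():
    # Stand-in for the real request handling we want to profile.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
serve_some_ads()
profiler.disable()

# Dump raw stats; pyprof2calltree can convert this file for KCachegrind.
profiler.dump_stats("ad_server.prof")

# Quick look in the terminal without KCachegrind:
pstats.Stats("ad_server.prof").sort_stats("cumulative").print_stats(10)
```
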
## Cache control is your friend

Because we use a JavaScript library for rendering ads, we rely on this script extensively and need to be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a ?ver=xxx attribute, which makes rapid deployment and testing very difficult and time-consuming. There is a limit to how much you can test locally.

We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of the websites are built on the ASP.NET platform and have some problems with Tag Manager. With the solution below we can be certain that we are serving the latest version of the script.

It only takes one mistake for users to end up with the script cached, and if it is cached for a year you can see where the problem is.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that if we didn't set the cache-control and expires headers precisely in the response, the request never reached the server and we therefore couldn't measure clicks. When redirecting, do it as follows and there will be no problems.

```python
# python ➜ bottlepy web micro-framework
response = bottle.HTTPResponse(status=302)
response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
response.set_header("Location", url)
return response
```

> Cache control in browsers is quite aggressive and you need to be precise to avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When deciding on a web server we went with Nginx as a reverse proxy for our applications. We adopted a microservice-oriented architecture early in the project to ensure that when we scale, we can easily add servers to our cluster, and Nginx was crucial for load balancing and static content delivery.

At first our config file was quite simple, but it later grew larger. After repeatedly patching it and adding new settings, I sat down and learned more about the guts of Nginx. This proved very useful and we were able to squeeze much more out of our setup. I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/); it saved us a lot of headaches. Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything. It is the cornerstone of our services. At first we were very careful about how much we stored in [Redis](https://redis.io/), but we later found out that the memory footprint stays very low even when storing large amounts of data in it.

So we gradually increased our usage to caching whole HTML outputs of the dashboard. This improved performance by an order of magnitude, and Redis's native TTL support goes hand in hand with our needs.

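A minimal sketch of that dashboard caching pattern, assuming the `redis` Python client; the key name, TTL value and `render_dashboard` function are illustrative only.

```python
import redis

r = redis.StrictRedis(host="localhost", port=6379, db=0)

CACHE_TTL = 300  # seconds; illustrative value


def render_dashboard(user_id):
    # Stand-in for the expensive template rendering / DB queries.
    return "<html>dashboard for user %s</html>" % user_id


def get_dashboard_html(user_id):
    key = "dashboard:html:%s" % user_id
    cached = r.get(key)
    if cached is not None:
        return cached.decode("utf-8")

    html = render_dashboard(user_id)
    # SETEX stores the value with a TTL, so stale pages expire on their own.
    r.setex(key, CACHE_TTL, html)
    return html
```
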
The reason we chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) was the scalability Redis offers out of the box, but all of this can be achieved with Memcached as well.

## Conclusion

There are many more details that could have been written, and every single topic here deserves its own post, but you probably get the idea about the problems we faced.