---
title: What I've learned developing ad server
permalink: /what-i-ve-learned-developing-ad-server.html
date: 2017-04-17T12:00:00+02:00
layout: post
type: post
draft: false
---

For the past year and a half I have been developing a native advertising server
that contextually matches ads and displays them in different template forms on
a variety of websites. The project grew from serving thousands of ads per day
to millions.

The system is made up of a couple of core components:

- API for serving ads,
- Utils - cronjobs and queue management tools,
- Dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text
search, but we later replaced it with [Elasticsearch](https://www.elastic.co/)
for better CPU utilization and better search performance. The switch also gave
us access to many other useful features of
[Elasticsearch](https://www.elastic.co/). You should check it out if you do any
search-related work.

Because the premise of the server is to provide a native ad experience, ads are
rendered on the client side via a simple templating engine. This ensures that
ads can be displayed in a number of different ways based on the visual style of
the page. It also makes the JavaScript client library quite complex.

So now that you know the basics about the product, let's get into the lessons
we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was
written to the database in nanosecond resolution. At that time we were using
[PostgreSQL](https://www.postgresql.org/), and the database quickly grew way
above 200 GB of disk space. That was problematic: statistics took a
disturbingly long time to aggregate, and indexes on the stats table were no
help once we reached 500 million data points.

> There is marketing product information and there is real-life experience.
And they tend to be quite the opposite.

This is why everything is now aggregated on a daily basis, and the data is then
fed to Elastic in the form of a daily summary. With this in place we can track
many more dimensions, such as zone, channel and platform information, and use
that information to adjust more precisely where ads appear.

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our
stack. Because Redis also stores its data on the local disk, we have some sort
of backup if the server accidentally suffers a failure.

All the real-time statistics for ad serving and redirecting are kept as
counters in a Redis instance, extracted daily and pushed to Elastic.
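The daily roll-up described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the event fields (`ts`, `zone`, `channel`, `platform`, `kind`) and the function names are assumptions made for the example, and an in-memory `Counter` stands in for the Redis counters.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_summary(events):
    """Count events per (day, zone, channel, platform, kind)."""
    summary = Counter()
    for event in events:
        # Truncate the timestamp to a UTC calendar day.
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        key = (day, event["zone"], event["channel"], event["platform"], event["kind"])
        summary[key] += 1
    return summary


def to_documents(summary):
    """Turn the counters into flat records, one per dimension combination."""
    fields = ("day", "zone", "channel", "platform", "kind")
    return [dict(zip(fields, key), count=count) for key, count in sorted(summary.items())]


# Hypothetical raw events; the field names are assumptions for this example.
events = [
    {"ts": 1492423200, "zone": "sidebar", "channel": "news", "platform": "mobile", "kind": "impression"},
    {"ts": 1492423260, "zone": "sidebar", "channel": "news", "platform": "mobile", "kind": "impression"},
    {"ts": 1492423270, "zone": "sidebar", "channel": "news", "platform": "mobile", "kind": "click"},
    {"ts": 1492477200, "zone": "footer", "channel": "sport", "platform": "desktop", "kind": "impression"},
]

summary = daily_summary(events)
documents = to_documents(summary)
```

Each record in `documents` is one row of the daily summary, so the stored volume grows with the number of dimension combinations per day rather than with the number of raw events.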

## Measure everything

The thing about software is that we don't really know how well it performs
under load until that load arrives. When testing locally everything is fine,
but in production things tend to fall apart.

As a solution we measure everything we can: function execution time (by
wrapping functions with timers), server performance (CPU, memory, disk, etc.),
and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We
sacrifice a bit of performance for the sake of this information, and we store
all of it for later analysis.

**Example of function execution time**

```json
{
  "get_final_filtered_ads": {
    "counter": 1931250,
    "avg": 0.0066143431,
    "elapsed": 12773.9500310003
  },
  "store_keywords_statistics": {
    "counter": 1931011,
    "avg": 0.0004605267,
    "elapsed": 889.2821669996
  },
  "match_by_context": {
    "counter": 1931011,
    "avg": 0.0055960716,
    "elapsed": 10806.0758889999
  },
  "match_by_high_performance": {
    "counter": 262,
    "avg": 0.0152770229,
    "elapsed": 4.00258
  },
  "store_impression_stats": {
    "counter": 1931250,
    "avg": 0.0006189991,
    "elapsed": 1195.4419869999
  }
}
```
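A timer wrapper that collects statistics in the shape shown above might look like the sketch below. It is an illustration only, not our actual implementation: the `timed` decorator and the `STATS` dictionary are names invented for this example, the body of `match_by_context` is a stand-in, and the plain dictionary is not safe to share across threads or processes.

```python
import functools
import time

# Per-function statistics: call counter, total elapsed seconds, average seconds.
STATS = {}


def timed(func):
    """Wrap func so that every call updates its counter, elapsed and avg."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even when func raises, so failed calls are measured too.
            duration = time.perf_counter() - start
            entry = STATS.setdefault(
                func.__name__, {"counter": 0, "avg": 0.0, "elapsed": 0.0}
            )
            entry["counter"] += 1
            entry["elapsed"] += duration
            entry["avg"] = entry["elapsed"] / entry["counter"]
    return wrapper


@timed
def match_by_context(keywords):
    # Stand-in body; the real matching logic is not shown in this post.
    return sorted(keywords)


for _ in range(3):
    match_by_context(["b", "a"])
```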

We have also started profiling with [cProfile](https://pymotw.com/2/profile/)
and visualizing the results with
[KCachegrind](http://kcachegrind.sourceforge.net/). This gives a much more
detailed look into code execution.
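For readers who have not used cProfile before, a minimal profiling session could look like this. The `busy_work` function is a made-up workload for the example; only standard library calls are used.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Stand-in workload for the example.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

To inspect the data in KCachegrind, the raw profile can be written to a file with `profiler.dump_stats(...)`. As far as I know KCachegrind expects the callgrind format, so the file has to be converted first; `pyprof2calltree` is one tool commonly used for that, but check its documentation for the exact invocation.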

## Cache control is your friend

Because we use a JavaScript library for rendering ads, we rely on this script
extensively, and when needed we have to be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It
usually takes a day or two for the people who maintain the sites to change the
code or add a `?ver=xxx` query parameter. This makes rapid deployment and
testing very difficult and time-consuming, and there is a limit to how much you
can test locally.

We are now in the process of integrating [Google Tag
Manager](https://www.google.com/analytics/tag-manager/), but a couple of
websites are built on the ASP.NET platform, which has some problems with Tag
Manager. With the solution below we are certain that we are serving the latest
version of the script.

It only takes one mistake: once users have the script cached, and it was cached
for 1 year, you probably know where the problem is.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that
if we didn't precisely set up the cache-control and expiry headers in the
response, the request never reached the server and therefore we couldn't
measure clicks. So when redirecting, do as follows and there will be no
problems.

```python
# python ➜ bottlepy web micro-framework
import bottle


def redirect_uncached(url):  # helper name is illustrative
    # Forbid caching so that every click reaches the server and is counted.
    response = bottle.HTTPResponse(status=302)
    response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
    response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
    response.set_header("Location", url)
    return response
```

> Cache control in browsers is quite aggressive and you need to be precise to
avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When deciding on a web server we went with Nginx as a reverse proxy for our
applications. We adopted a microservice-oriented architecture early in the
project so that when we scale we can easily add servers to our cluster. Nginx
was crucial for load balancing and static content delivery.

At first our config file was quite simple, and later it grew larger. After
patching and adding new settings I sat down and learned more about the guts of
Nginx. This proved to be very useful and we were able to squeeze much more out
of our setup. So I advise you to take your time and read through the
[documentation](https://nginx.org/en/docs/). This saved us a lot of headaches.
Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything. It is the
cornerstone of our services. At first we were very careful about how much we
stored in [Redis](https://redis.io/), but we later found out that the memory
footprint is very low even when storing large amounts of data in it.

So we gradually increased our usage, up to caching the whole HTML output of the
dashboard. This improved our performance by an order of magnitude, and the
native TTL support goes hand in hand with our needs.

The reason we chose [Redis](https://redis.io/) over
[Memcached](https://memcached.org/) was the out-of-the-box scalability of
Redis. But all of this can be achieved with Memcached as well.
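The idea of caching rendered HTML with a TTL can be sketched as below. This is a simplified illustration under stated assumptions rather than our dashboard code: `FakeRedis` is an in-memory stand-in so the example runs without a server, and `cached_html`, `render_dashboard` and the key name are hypothetical. The stand-in mimics only the `get(key)` and `set(key, value, ex=seconds)` calls of the redis-py client.

```python
import time


class FakeRedis:
    """In-memory stand-in so the example runs without a Redis server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            # The entry has outlived its TTL; drop it like Redis would.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ex=None):
        expires_at = time.monotonic() + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)


def cached_html(client, key, render, ttl_seconds=60):
    """Return cached HTML for key, rendering and storing it on a miss."""
    html = client.get(key)
    if html is None:
        html = render()
        client.set(key, html, ex=ttl_seconds)
    return html


calls = []


def render_dashboard():
    # Hypothetical expensive render; records how often it actually runs.
    calls.append(1)
    return "<h1>Dashboard</h1>"


client = FakeRedis()
first = cached_html(client, "dashboard:overview", render_dashboard)
second = cached_html(client, "dashboard:overview", render_dashboard)
```

The second call is served from the cache, so `render_dashboard` runs only once. With a real redis-py client, keep in mind that `get` returns bytes unless the client is created with `decode_responses=True`.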

## Conclusion

There are a lot more details that could have been written, and every single
topic here deserves its own post, but you probably got the idea about the
problems we faced.