author Mitja Felicijan <mitja.felicijan@gmail.com> 2019-02-17 21:53:36 +0100
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2019-02-17 21:53:36 +0100
commit 8e9ef5ba62b8bee028428384ad5666e245eb854c
tree b382c5b40f122b2a152da2226006abab34abe105 /content/2017-04-17-what-i-ve-learned-developing-ad-server.md
parent ad974810d43e1d5f70bca269665c25230e6a3221
content update
Diffstat (limited to 'content/2017-04-17-what-i-ve-learned-developing-ad-server.md')
-rw-r--r-- content/2017-04-17-what-i-ve-learned-developing-ad-server.md 144
1 files changed, 144 insertions, 0 deletions
diff --git a/content/2017-04-17-what-i-ve-learned-developing-ad-server.md b/content/2017-04-17-what-i-ve-learned-developing-ad-server.md
new file mode 100644
index 0000000..77b396d
--- /dev/null
+++ b/content/2017-04-17-what-i-ve-learned-developing-ad-server.md
@@ -0,0 +1,144 @@
---
layout: post
title: What I've learned developing an ad server
description: Lessons I learned developing a contextual ad server
slug: what-i-ve-learned-developing-ad-server
date: 2017-04-17
---

**Table of contents**

1. [Aggregate everything](#aggregate-everything)
2. [Measure everything](#measure-everything)
3. [Cache control is your friend](#cache-control-is-your-friend)
4. [Learn NGINX](#learn-nginx)
5. [Use Redis/Memcached](#use-redismemcached)
6. [Conclusion](#conclusion)

For the past year and a half, I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites. The project grew from serving thousands of ads per day to millions.

The system is made up of a couple of core components:

- API for serving ads,
- Utils: cron jobs and queue management tools,
- Dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text search, but it was later replaced by [Elasticsearch](https://www.elastic.co/) for better CPU utilization and better search performance. Elasticsearch also brought many other useful features, so it is worth checking out if you do any search-related work.

Because the premise of the server is to provide a native ad experience, ads are rendered on the client side via a simple templating engine. This ensures that ads can be displayed in a number of different ways based on the visual style of the page, but it also makes the JavaScript client library quite complex.

Now that you know the basics of the product, let's get into the lessons we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was written to the database at nanosecond resolution. At that time we were using [PostgreSQL](https://www.postgresql.org/), and the database quickly grew well beyond 200GB on disk. That was problematic: statistics took a disturbingly long time to aggregate, and indexes on the stats table stopped helping once we reached 500 million data points.

> There is marketing product information and there is real-life experience, and they tend to be quite the opposite.

This is why everything is now aggregated on a daily basis, and this data is then fed to Elasticsearch in the form of a daily summary. As a result, we can now track many more dimensions, such as zone, channel and platform. With this information we can adjust more precisely how often ads appear in specific placements.
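
A minimal sketch of the daily roll-up idea in Python: instead of one row per impression or click, keep one row per day and per combination of dimensions, so storage grows with the number of combinations rather than with traffic. The event fields (`ts`, `type`, `ad_id`, `zone`, `channel`, `platform`) and the function name `aggregate_daily` are assumptions for illustration, not the server's actual schema.

```python
from datetime import datetime, timezone

# Hypothetical event dimensions; the post names zone, channel and platform.
DIMENSIONS = ("ad_id", "zone", "channel", "platform")


def aggregate_daily(events):
    """Roll raw events up into one summary row per UTC day and dimension combination.

    Each event is assumed to be a dict with a Unix timestamp ``ts``, an event
    ``type`` ("impression" or "click") and one value per DIMENSIONS field.
    """
    summaries = {}
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        key = (day,) + tuple(event[d] for d in DIMENSIONS)
        row = summaries.get(key)
        if row is None:
            # First event for this day/combination: start a fresh summary row.
            row = dict(zip(("day",) + DIMENSIONS, key))
            row.update(impressions=0, clicks=0)
            summaries[key] = row
        if event["type"] == "impression":
            row["impressions"] += 1
        elif event["type"] == "click":
            row["clicks"] += 1
        # Other event types are ignored in this sketch.
    return [summaries[k] for k in sorted(summaries)]
```

Each returned row is small enough to be indexed as a single document, which is the kind of daily summary described above.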

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our stack. Because Redis also persists data to local disk, we have some sort of backup if the server suffers an unexpected failure.

All real-time statistics for ad serving and redirects are kept as counters in a Redis instance, then extracted daily and pushed to Elasticsearch.
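
One way the Redis counters and the daily extraction could fit together, sketched under assumptions: the client is expected to behave like redis-py's `redis.Redis` (only its `hincrby` and `hgetall` methods are used), while the key layout `stats:<day>` with `<ad_id>:<event>` fields, the index name `ad-stats-daily` and both function names are hypothetical.

```python
from datetime import date


def stats_key(day):
    # One Redis hash per day, e.g. "stats:2017-04-17" (hypothetical naming scheme).
    return "stats:" + day.isoformat()


def record_event(client, ad_id, event, day=None):
    """Increment the counter for one event (impression, click, ...) in the day's hash."""
    day = day or date.today()
    client.hincrby(stats_key(day), "%s:%s" % (ad_id, event), 1)


def daily_summary_actions(client, day, index="ad-stats-daily"):
    """Read one day's counters and turn them into bulk-index actions.

    The dicts follow the action format accepted by elasticsearch-py's
    helpers.bulk(); the index name is an assumption.
    """
    per_ad = {}
    for field, value in client.hgetall(stats_key(day)).items():
        # redis-py returns bytes unless the client uses decode_responses=True.
        if isinstance(field, bytes):
            field = field.decode()
        ad_id, event = field.rsplit(":", 1)
        per_ad.setdefault(ad_id, {})[event] = int(value)
    return [
        {
            "_index": index,
            "_id": "%s-%s" % (day.isoformat(), ad_id),
            "_source": {"day": day.isoformat(), "ad_id": ad_id, **counts},
        }
        for ad_id, counts in sorted(per_ad.items())
    ]
```

In production `client` would be something like `redis.Redis()`, and the returned list could be handed to `elasticsearch.helpers.bulk`. Because each document `_id` is derived from the day and the ad, re-running the daily job overwrites the same documents instead of duplicating them.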

## Measure everything

The thing about software is that we don't really know how well it performs under load until that load actually arrives. When testing locally everything is fine, but in production things tend to fall apart.

As a solution, we measure everything we can: function execution time (by wrapping functions with timers), server performance (CPU, memory, disk, etc.), and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We sacrifice a bit of performance for the sake of this information, and we store all of it for later analysis.

**Example of function execution time**

```json
{
  "get_final_filtered_ads": {
    "counter": 1931250,
    "avg": 0.0066143431,
    "elapsed": 12773.9500310003
  },
  "store_keywords_statistics": {
    "counter": 1931011,
    "avg": 0.0004605267,
    "elapsed": 889.2821669996
  },
  "match_by_context": {
    "counter": 1931011,
    "avg": 0.0055960716,
    "elapsed": 10806.0758889999
  },
  "match_by_high_performance": {
    "counter": 262,
    "avg": 0.0152770229,
    "elapsed": 4.00258
  },
  "store_impression_stats": {
    "counter": 1931250,
    "avg": 0.0006189991,
    "elapsed": 1195.4419869999
  }
}
```
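
Counters in the shape of the JSON above can be collected with a small timing decorator. This is a minimal sketch rather than the server's actual code: it assumes `elapsed` and `avg` are wall-clock seconds, keeps the numbers in a hypothetical in-process `TIMINGS` dictionary, and is not thread-safe.

```python
import functools
import json
import time

# Hypothetical in-process store, keyed by function name as in the JSON above.
TIMINGS = {}


def timed(func):
    """Record call count, total elapsed time and average time of ``func``."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed as well.
            duration = time.perf_counter() - start
            stats = TIMINGS.setdefault(func.__name__, {"counter": 0, "avg": 0.0, "elapsed": 0.0})
            stats["counter"] += 1
            stats["elapsed"] += duration
            stats["avg"] = stats["elapsed"] / stats["counter"]
    return wrapper


@timed
def match_by_context(keywords):
    # Stand-in for real matching work.
    return sorted(keywords)


match_by_context(["b", "a"])
print(json.dumps(TIMINGS, indent=1))
```

In a multi-process uWSGI setup each worker would hold its own `TIMINGS`, so the numbers would have to be pushed somewhere shared (Redis counters, for example) before they could be combined.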

We have also started profiling with [cProfile](https://pymotw.com/2/profile/) and visualizing the results with [KCachegrind](http://kcachegrind.sourceforge.net/). This gives a much more detailed look into code execution.
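
A minimal way to capture such a profile with the standard library, assuming you want to inspect a single code path. `profile_to_file` is a hypothetical helper name and `slow_sum` a stand-in workload.

```python
import cProfile
import pstats


def profile_to_file(func, path, *args, **kwargs):
    """Run ``func`` under cProfile and dump the raw stats to ``path``."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
        profiler.dump_stats(path)
    return result


def slow_sum(n):
    # Stand-in workload; replace with the code path you want to inspect.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    profile_to_file(slow_sum, "ads.prof", 100_000)
    # Quick text summary of the ten most expensive calls by cumulative time.
    pstats.Stats("ads.prof").sort_stats("cumulative").print_stats(10)
```

KCachegrind reads the callgrind format rather than cProfile's dump, so the file needs converting first. The third-party `pyprof2calltree` tool does this, for example `pyprof2calltree -i ads.prof -k` to convert the file and open it in KCachegrind.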

## Cache control is your friend

Because we use a JavaScript library for rendering ads, we rely on this script extensively, and when needed we must be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It usually takes a day or two for the people who maintain the sites to change the code or add a `?ver=xxx` query parameter, which makes rapid deployment and testing very difficult and time-consuming. And there is a limit to how much you can test locally.

We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/), but a couple of the websites are built on the ASP.NET platform and have some problems with Tag Manager. With the solution below, we can be certain that we are serving the latest version of the script.

It only takes one mistake for users to end up with a cached copy of the script, and if it was cached for a year, you can probably see the problem.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;

    # Images and fonts: let browsers keep them for a year.
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    # Scripts, styles and text: cache for one hour, then revalidate with the server.
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```
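
To make the "one mistake" risk concrete: for the `.js` rule above, nginx's `expires 3600s` emits an `Expires` header plus `Cache-Control: max-age=3600`, and the `add_header` line adds a second `Cache-Control: public, must-revalidate` header. A browser may reuse its copy for `max-age` seconds before it has to ask the server again. The helper below is a simplified sketch of that freshness rule (the name `max_age` is hypothetical, and it ignores `s-maxage`, heuristic freshness and `Expires`, which `max-age` overrides anyway).

```python
def max_age(cache_control_values):
    """Return the max-age in seconds from one or more Cache-Control header values.

    Repeated Cache-Control headers are treated as one comma-separated list,
    which is how HTTP combines them. Returns 0 when no-store or no-cache is
    present (the copy must not be reused without asking the server) and
    None when no max-age is given.
    """
    directives = {}
    for value in cache_control_values:
        for part in value.split(","):
            name, _, arg = part.strip().partition("=")
            if name:
                directives[name.lower()] = arg.strip('"')
    if "no-store" in directives or "no-cache" in directives:
        return 0
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


# Cache-Control headers nginx sends for the two rules in the config above.
script_headers = ["max-age=3600", "public, must-revalidate"]
font_headers = ["max-age=31536000", "public"]

print(max_age(script_headers))  # 3600
print(max_age(font_headers))    # 31536000
```

A script accidentally served under the one-year rule would stay fresh in users' browsers for 31,536,000 seconds, and without a versioned URL there is no way to force them to fetch it again.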

Also, be careful when redirecting to a URL in your Python code. We noticed that if we didn't set the cache control and expiry headers precisely in the response, the request never reached our server, so we couldn't count the click. When redirecting, set the headers as follows and the redirect will not be cached.
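
```python
# python ➜ bottlepy web micro-framework
import bottle


def no_cache_redirect(url):
    """Illustrative helper: a 302 redirect that browsers must not cache, so every click reaches the server."""
    response = bottle.HTTPResponse(status=302)
    response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
    # A date in the past marks the response as already expired for older caches.
    response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
    response.set_header("Location", url)
    return response
```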

> Cache control in browsers is quite aggressive, and you need to be precise to avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When choosing a web server, we went with Nginx as a reverse proxy for our applications. We adopted a microservice-oriented architecture early in the project so that, as we scale, we can easily add servers to our cluster, and Nginx was crucial for load balancing and static content delivery.

At first our config file was quite simple, but it grew over time. After a round of patching and adding new settings, I sat down and learned more about the internals of Nginx. This proved very useful, and we were able to squeeze much more out of our setup. So I advise you to take your time and read through the [documentation](https://nginx.org/en/docs/). It saved us a lot of headaches; googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything; it is the cornerstone of our services. At first we were very careful about how much we stored in [Redis](https://redis.io/), but we later found that its memory footprint stayed very low even with large amounts of data stored.

So we gradually expanded our usage to caching the whole HTML output of the dashboard. This improved performance by an order of magnitude, and Redis's native TTL support fits our needs well.
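
A minimal sketch of caching rendered HTML with a TTL, assuming a client that behaves like redis-py's `redis.Redis` (`get`, and `set` with the `ex` expiry argument). The decorator name `cache_html`, the `html:` key prefix and the default TTL are hypothetical.

```python
import functools


def cache_html(client, ttl=60, prefix="html:"):
    """Cache a view's rendered HTML in Redis for ``ttl`` seconds.

    ``client`` is assumed to offer redis-py style ``get(name)`` and
    ``set(name, value, ex=seconds)``.
    """
    def decorator(render):
        @functools.wraps(render)
        def wrapper(*args):
            # Key built from the view name and its positional arguments.
            key = prefix + render.__name__ + ":" + ":".join(map(str, args))
            cached = client.get(key)
            if cached is not None:
                # redis-py returns bytes unless decode_responses=True is set.
                return cached.decode() if isinstance(cached, bytes) else cached
            html = render(*args)
            # ex= sets the TTL, so Redis expires the entry on its own.
            client.set(key, html, ex=ttl)
            return html
        return wrapper
    return decorator
```

Because each entry expires by itself, there is no separate invalidation job to maintain; the trade-off is that users may see output up to `ttl` seconds old.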

We chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) because of the scalability Redis offers out of the box, but all of this can also be achieved with Memcached.

## Conclusion

There are a lot more details that could have been written, and every single topic here deserves its own post, but you probably get the idea of the problems we faced.