author: Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
committer: Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
commit: cd6644ea4ddc78597934ab0ef5ba50e3c3daa927
tree: 03de331a8db6386dfd6fa75155bfbcea6b4feaf3 /content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
parent: 84ed124529ffeee1590295b8de3a8faf51848680
Moved to a simpler SSG
Diffstat (limited to 'content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md')
-rw-r--r-- content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md | 198
1 files changed, 0 insertions, 198 deletions
diff --git a/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
deleted file mode 100644
index bb98efd..0000000
--- a/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
+++ /dev/null
@@ -1,198 +0,0 @@
---
title: What I've learned developing ad server
url: what-i-ve-learned-developing-ad-server.html
date: 2017-04-17T12:00:00+02:00
draft: false
---

For the past year and a half I have been developing a native advertising server
that contextually matches ads and displays them in different template forms on a
variety of websites. This project grew from serving thousands of ads per day to
serving millions.

The system is made up of a few core components:

- API for serving ads,
- Utils - cron jobs and queue management tools,
- Dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text
search, but we later replaced it with [Elasticsearch](https://www.elastic.co/)
for better CPU utilization and better search performance. The switch also gave
us access to many other useful features of
[Elasticsearch](https://www.elastic.co/). You should check it out if you do any
search-related work.

Because the premise of the server is to provide a native ad experience, ads are
rendered on the client side via a simple templating engine. This ensures that
ads can be displayed in a number of different ways based on the visual style of
the page. It also makes the JavaScript client library quite complex.

So now that you know the basics about the product, let's get into the lessons
we learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was
written to the database at nanosecond resolution. At that time we were using
[PostgreSQL](https://www.postgresql.org/), and the database quickly grew well
above 200 GB on disk. That was a problem: statistics took a disturbingly long
time to aggregate, and indexes on the stats table were no help once we reached
500 million data points.

> There is marketing product information and there is real-life experience.
And they tend to be quite the opposite.

This is why everything is now aggregated on a daily basis, and the data is then
fed to Elastic in the form of a daily summary. With this in place we can track
many more dimensions, such as zone, channel and platform information. And with
this information we can adjust more precisely how often ads appear in specific
places.
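
The daily roll-up described above can be sketched roughly as follows. This is
a minimal illustration only: the event fields (`ts`, `kind`, `zone`, `channel`,
`platform`) and the function name `aggregate_daily` are assumptions made for
the example, not the production schema.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate_daily(events):
    """Collapse raw events into one summary row per day and dimension set.

    Each event is a dict with hypothetical keys: "ts" (Unix seconds),
    "kind" ("impression" or "click"), "zone", "channel" and "platform".
    """
    rows = {}
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        dims = (day, event["zone"], event["channel"], event["platform"])
        # One Counter per dimension tuple; missing kinds read back as 0.
        rows.setdefault(dims, Counter())[event["kind"]] += 1
    return [
        {
            "day": day,
            "zone": zone,
            "channel": channel,
            "platform": platform,
            "impressions": counts["impression"],
            "clicks": counts["click"],
        }
        for (day, zone, channel, platform), counts in sorted(rows.items())
    ]
```

Each returned row is small and self-describing, so it could be indexed as one
document per day and dimension combination instead of one row per raw event.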

We have also adopted [Redis](https://redis.io/) as a first-class citizen in our
stack. Because Redis also persists data to the local disk, we have some sort of
backup if the server accidentally suffers a failure.

All the real-time statistics for ad serving and redirecting are kept as
counters in a Redis instance, then extracted daily and pushed to Elastic.
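
One possible shape for such counters is a Redis hash per day, bumped with
`HINCRBY` and read back with `HGETALL` once the day is over. The sketch below
assumes that layout; the key format `stats:<day>`, the field format
`<ad_id>:<kind>` and both function names are hypothetical. So that it runs
without a server, it uses a tiny in-memory stand-in that mimics the two
redis-py methods involved (`hincrby` and `hgetall`).

```python
class InMemoryCounters:
    """Stand-in exposing the two redis-py methods used below."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        bucket = self._hashes.setdefault(name, {})
        bucket[key] = bucket.get(key, 0) + amount
        return bucket[key]

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))


def record_event(client, day, ad_id, kind):
    """Bump the counter for one ad and event kind in that day's hash."""
    return client.hincrby(f"stats:{day}", f"{ad_id}:{kind}", 1)


def daily_summary(client, day):
    """Read one day's counters back as {ad_id: {kind: count}}."""
    summary = {}
    for field, value in client.hgetall(f"stats:{day}").items():
        if isinstance(field, bytes):
            # redis-py returns bytes unless decode_responses=True is set.
            field = field.decode()
        ad_id, kind = field.rsplit(":", 1)
        summary.setdefault(ad_id, {})[kind] = int(value)
    return summary
```

Keying the hash by day means the extraction job only ever reads a hash that no
longer receives writes, which avoids racing against live increments.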

## Measure everything

The thing about software is that we really don't know how well it performs
under load until that load arrives. When testing locally everything is fine,
but in production things tend to fall apart.

As a solution, we measure everything we can: function execution time (by
wrapping functions with timers), server performance (CPU, memory, disk, etc.),
and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We
sacrifice a bit of performance for the sake of this information, and we store
all of it for later analysis.

**Example of function execution time**

```json
{
  "get_final_filtered_ads": {
    "counter": 1931250,
    "avg": 0.0066143431,
    "elapsed": 12773.9500310003
  },
  "store_keywords_statistics": {
    "counter": 1931011,
    "avg": 0.0004605267,
    "elapsed": 889.2821669996
  },
  "match_by_context": {
    "counter": 1931011,
    "avg": 0.0055960716,
    "elapsed": 10806.0758889999
  },
  "match_by_high_performance": {
    "counter": 262,
    "avg": 0.0152770229,
    "elapsed": 4.00258
  },
  "store_impression_stats": {
    "counter": 1931250,
    "avg": 0.0006189991,
    "elapsed": 1195.4419869999
  }
}
```
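
In the sample above, `avg` equals `elapsed` divided by `counter` for every
function, so a timer wrapper that could produce this shape might look like the
sketch below. The decorator name `timed`, the module-level `STATS` dict and the
choice of seconds as the unit are assumptions; the function body is only a
placeholder.

```python
import functools
import time

# function name -> {"counter": calls, "avg": seconds per call, "elapsed": total seconds}
STATS = {}


def timed(func):
    """Wrap func so that every call updates its entry in STATS."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even when func raises, so failed calls are counted too.
            entry = STATS.setdefault(
                func.__name__, {"counter": 0, "avg": 0.0, "elapsed": 0.0}
            )
            entry["counter"] += 1
            entry["elapsed"] += time.perf_counter() - start
            entry["avg"] = entry["elapsed"] / entry["counter"]

    return wrapper


@timed
def match_by_context(keywords):
    # Placeholder body; the real matching logic is not shown in this post.
    return sorted(keywords)
```

The dict updates here are not synchronized, so a multi-threaded server would
need a lock or per-worker stats that are merged later.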

We have also started profiling with [cProfile](https://pymotw.com/2/profile/)
and then visualizing the results with
[KCachegrind](http://kcachegrind.sourceforge.net/). This provides a much more
detailed look into code execution.
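
For readers who have not used cProfile before, a minimal sketch of profiling a
single call with the standard library looks like this. The helper
`profile_call` and the workload `build_index` are hypothetical names used only
for the example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def build_index(n):
    # Hypothetical workload standing in for real request handling.
    return {i: str(i) for i in range(n)}


result, report = profile_call(build_index, 10_000)
print(report)
```

To inspect a profile in KCachegrind, the raw data can be written to a file with
`profiler.dump_stats(...)` and then converted to the callgrind format with a
separate tool; that conversion step is not part of this sketch.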

## Cache control is your friend

Because we use a JavaScript library for rendering ads, we rely on this script
extensively, and when needed we have to be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It
usually takes a day or two for the people who maintain the sites to change the
code or add a `?ver=xxx` parameter. This makes rapid deployment and testing
very difficult and time consuming, and there is a limit to how much you can
test locally.

We are now in the process of integrating [Google Tag
Manager](https://www.google.com/analytics/tag-manager/), but a couple of
websites are built on the ASP.NET platform and have some problems with Tag
Manager. With the solution below we are certain that we are serving the latest
version of the script.

It only takes one mistake for users to end up with a stale script in their
cache, and if it was cached for 1 year you probably know where the problem is.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that
if we didn't set the cache control and expires headers precisely in the
response, we didn't get the request on the server and therefore couldn't
measure clicks. So when redirecting, set the headers explicitly as follows.

```python
# python ➜ bottlepy web micro-framework
import bottle


def redirect_no_cache(url):  # hypothetical helper name
    # Build a 302 that tells the browser not to cache the redirect.
    response = bottle.HTTPResponse(status=302)
    response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
    response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
    response.set_header("Location", url)
    return response
```

> Cache control in browsers is quite aggressive and you need to be precise to
avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When deciding on a web server we went with Nginx as a reverse proxy for our
applications. We adopted a microservice-oriented architecture early in the
project to ensure that when we scale we can easily add servers to our cluster.
Nginx was crucial for load balancing and static content delivery.

At first our config file was quite simple, and later it grew larger. After a
while of patching it and adding new settings, I sat down and learned more about
the guts of Nginx. This proved to be very useful and we were able to squeeze
much more out of our setup. So I advise you to take your time and read through
the [documentation](https://nginx.org/en/docs/). This saved us a lot of
headaches. Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything. It is the
cornerstone of our services. At first we were very careful about the quantity
of things we stored in [Redis](https://redis.io/), but we later found out that
the memory footprint is very low even when storing a large amount of data in
it.

So we gradually increased our usage, up to caching whole HTML outputs of the
dashboard. This improved our performance by an order of magnitude. And native
TTL support goes hand in hand with our needs.
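
Caching rendered HTML with a TTL can be sketched as a small read-through
helper. Everything named here (`cached_html`, the key format, the 60 second
TTL used in the usage note) is an assumption for illustration. The in-memory
class only mimics the `get` and `setex` methods of a redis-py client so the
example runs without a server.

```python
import time


class InMemoryTTLCache:
    """Stand-in with the same get/setex shape as a redis-py client."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        value, expires_at = self._data.get(name, (None, 0.0))
        if time.monotonic() >= expires_at:
            # Missing or expired: drop the entry and report a miss.
            self._data.pop(name, None)
            return None
        return value

    def setex(self, name, ttl_seconds, value):
        self._data[name] = (value, time.monotonic() + ttl_seconds)


def cached_html(client, key, ttl_seconds, render):
    """Return cached HTML for key, rendering and storing it on a miss."""
    cached = client.get(key)
    if cached is not None:
        # A real Redis client returns bytes unless decode_responses=True.
        return cached.decode() if isinstance(cached, bytes) else cached
    html = render()
    client.setex(key, ttl_seconds, html)
    return html
```

A call such as `cached_html(client, "dashboard:overview", 60, render_overview)`
would then render at most once per minute, with expiry handled by the store
rather than by application code.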

The reason we chose [Redis](https://redis.io/) over
[Memcached](https://memcached.org/) was the way Redis scales out of the box.
But all this can be achieved with Memcached as well.

## Conclusion

There are a lot more details that could have been written, and every single
topic in here deserves its own post, but you probably got the idea about the
problems we faced.