From 1100562e29f6476448b656dbddd4cf22505523f6 Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Sun, 10 Mar 2024 14:59:14 +0100 Subject: Move back to JBMAFP --- ...04-17-what-i-ve-learned-developing-ad-server.md | 199 +++++++++++++++++++++ 1 file changed, 199 insertions(+) create mode 100644 content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md (limited to 'content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md') diff --git a/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md new file mode 100644 index 0000000..966788a --- /dev/null +++ b/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md @@ -0,0 +1,199 @@ +--- +title: What I've learned developing ad server +url: /what-i-ve-learned-developing-ad-server.html +date: 2017-04-17T12:00:00+02:00 +type: post +draft: false +--- + +For the past year and half I have been developing native advertising server that +contextually matches ads and displays them in different template forms on +variety of websites. This project grew from serving thousands of ads per day to +millions. + +The system is made from couple of core components: + +- API for serving ads, +- Utils - cronjobs and queue management tools, +- Dashboard UI. + +Initial release was using [MongoDB](https://www.mongodb.com/) for full-text +search but was later replaced by [Elasticsearch](https://www.elastic.co/) for +better CPU utilization and better search performance. This provided us with many +amazing functionalities of [Elasticsearch](https://www.elastic.co/). You should +check it out if you do any search related operations. + +Because the premise of the server is to provide native ad experience, they are +rendered on the client side via simple templating engine. This ensures that ads +can be displayed number of different ways based on the visual style of the +page. And this makes JavaScript client library quite complex. + +So now that you know basic information about the product lets get into the +lessons we learned. + +## Aggregate everything + +After beta version was released everything (impressions, clicks, etc) was +written in nanosecond resolution in the database. At that time we were using +[PostgreSQL](https://www.postgresql.org/) and database quickly grew way above +200GB in disk space. And that was problematic. Statistics took disturbingly long +time to aggregate. Also using indexes on stats table in database was no help +after we reached 500 million datapoints. + +> There is a marketing product information and there is real life experience. +And the tend to be quite the opposite. + +This was the reason that now everything is aggregated on daily basis and this +data is then fed to Elastic in form of daily summary. With this we achieved we +can now track many more dimensions such as zone, channel and platform +information. And with this information we can now adapt occurrences of ads on +specific places more precisely. + +We have also adapted [Redis](https://redis.io/) as a full-time citizen in our +stack. Because Redis also stores information on a local disk we have some sort +of backup if server would accidentally suffer some failure. + +All the real-time statistics for ad serving and redirecting is presented as +counters in Redis instance and daily extracted and pushed to Elastic. + +## Measure everything + +The thing about software is that we really don't know how well it is performing +under load until such load is presented. When testing locally everything is fine +but when on production things tend to fall apart. + +As a solution for this we are measuring everything we can. Function execution +time (by encapsulating functions with timers), server performance (cpu, memory, +disk, etc), Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. +We sacrifice a bit of performance for the sake of this information. And we store +all this information for later analysis. + +**Example of function execution time** + +```json +{ + "get_final_filtered_ads": { + "counter": 1931250, + "avg": 0.0066143431, + "elapsed": 12773.9500310003 + }, + "store_keywords_statistics": { + "counter": 1931011, + "avg": 0.0004605267, + "elapsed": 889.2821669996 + }, + "match_by_context": { + "counter": 1931011, + "avg": 0.0055960716, + "elapsed": 10806.0758889999 + }, + "match_by_high_performance": { + "counter": 262, + "avg": 0.0152770229, + "elapsed": 4.00258 + }, + "store_impression_stats": { + "counter": 1931250, + "avg": 0.0006189991, + "elapsed": 1195.4419869999 + } +} +``` + +We have also started profiling with [cProfile](https://pymotw.com/2/profile/) +and then visualizing with [KCachegrind](http://kcachegrind.sourceforge.net/). +This provides much more detailed look into code execution. + +## Cache control is your friend + +Because we use Javascript library for rendering ads we rely on this script +extensively and when in need we need to be able to change behavior of the script +quickly. + +In our case we can not simply replace javascript url in html code. It usually +takes a day or two for the guys who maintain sites to change code or add +?ver=xxx attribute. And this makes rapid deployment and testing very difficult +and time consuming. There is a limitation of how much you can test locally. + +We are now in the process of integrating [Google Tag +Manager](https://www.google.com/analytics/tag-manager/) but couple of websites +are developed on ASP.net platform that have some problems with tag manager. With +a solution below we are certain that we are serving latest version of the +script. + +And it only takes one mistake and users have the script cached and in case of +caching it for 1 year you probably know where the problem is. + +```nginx +# nginx ➜ /etc/nginx/sites-available/default +location /static/ { + alias /path-to-static-content/; + autoindex off; + charset utf-8; + gzip on; + gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css; + location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ { + expires 1y; + add_header Pragma public; + add_header Cache-Control "public"; + } + location ~* \.(css|js|txt)$ { + expires 3600s; + add_header Pragma public; + add_header Cache-Control "public, must-revalidate"; + } +} +``` + +Also be careful when redirecting to url in your python code. We noticed that if +we didn't precisely setup cache control and expire headers in response we didn't +get the request on the server and therefore couldn't measure clicks. So when +redirecting do as follows and there will be no problems. + +```python +# python ➜ bottlepy web micro-framework +response = bottle.HTTPResponse(status=302) +response.set_header("Cache-Control", "no-store, no-cache, must-revalidate") +response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT") +response.set_header("Location", url) +return response +``` + +> Cache control in browsers is quite aggressive and you need to be precise to +avoid future problems. We learned that lesson the hard way. + +## Learn NGINX + +When deciding on a web server we went with Nginx as a reverse proxy for our +applications. We adapted micro-service oriented architecture early in the +project to ensure when we scale we can easily add additional servers to our +cluster. And Nginx was crucial to perform load balancing and static content +delivery. + +At first our config file was quite simple and later grew larger. After patching +and adding new settings I sat down and learned more about the guts of Nginx. +This proved to be very useful and we were able to squeeze much more out of our +setup. So I advise you to take your time and read through the +[documentation](https://nginx.org/en/docs/). This saved us a lot of headache. +Googling for solutions only goes so far. + +## Use Redis/Memcached + +As explained above we are using caching basically for everything. It is the +corner stone of our services. At first we were very careful about the quantity +of things we stored in [Redis](https://redis.io/). But we later found out that +the memory footprint is very low even when storing large amount of data in it. + +So we gradually increased our usage to caching whole HTML outputs of dashboard. +This improved our performance in order of magnitude. And by using native TTL +support this goes hand in hand with our needs. + +The reason why we choose [Redis](https://redis.io/) over +[Memcached](https://memcached.org/) was the nature of scalability of Redis out +of the box. But all this can be achieved with Memcached. + +## Conclusion + +There are a lot more details that could have been written and every single topic +in here deserves it's own post but you probably got the idea about the problems +we faced. -- cgit v1.2.3