From 8697555125c57ae64a0c9b78514b4aac4fd523de Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Tue, 27 Jun 2023 14:50:20 +0200 Subject: Massive formatting and added figcaption --- ...04-17-what-i-ve-learned-developing-ad-server.md | 154 ++++++++++----------- 1 file changed, 77 insertions(+), 77 deletions(-) (limited to 'content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md') diff --git a/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md index 1b9be06..bb98efd 100644 --- a/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md +++ b/content/posts/2017-04-17-what-i-ve-learned-developing-ad-server.md @@ -5,10 +5,10 @@ date: 2017-04-17T12:00:00+02:00 draft: false --- -For the past year and half I have been developing native advertising server -that contextually matches ads and displays them in different template forms -on variety of websites. This project grew from serving thousands of ads per -day to millions. +For the past year and half I have been developing native advertising server that +contextually matches ads and displays them in different template forms on +variety of websites. This project grew from serving thousands of ads per day to +millions. The system is made from couple of core components: @@ -16,56 +16,56 @@ The system is made from couple of core components: - Utils - cronjobs and queue management tools, - Dashboard UI. -Initial release was using [MongoDB](https://www.mongodb.com/) for full-text -search but was later replaced by [Elasticsearch](https://www.elastic.co/) -for better CPU utilization and better search performance. This provided us -with many amazing functionalities of [Elasticsearch](https://www.elastic.co/). -You should check it out if you do any search related operations. +Initial release was using [MongoDB](https://www.mongodb.com/) for full-text +search but was later replaced by [Elasticsearch](https://www.elastic.co/) for +better CPU utilization and better search performance. This provided us with many +amazing functionalities of [Elasticsearch](https://www.elastic.co/). You should +check it out if you do any search related operations. -Because the premise of the server is to provide native ad experience, they -are rendered on the client side via simple templating engine. This ensures -that ads can be displayed number of different ways based on the visual style -of the page. And this makes JavaScript client library quite complex. +Because the premise of the server is to provide native ad experience, they are +rendered on the client side via simple templating engine. This ensures that ads +can be displayed number of different ways based on the visual style of the +page. And this makes JavaScript client library quite complex. -So now that you know basic information about the product lets get into the +So now that you know basic information about the product lets get into the lessons we learned. ## Aggregate everything -After beta version was released everything (impressions, clicks, etc) was -written in nanosecond resolution in the database. At that time we were using -[PostgreSQL](https://www.postgresql.org/) and database quickly grew way above -200GB in disk space. And that was problematic. Statistics took disturbingly -long time to aggregate. Also using indexes on stats table in database was no -help after we reached 500 million datapoints. +After beta version was released everything (impressions, clicks, etc) was +written in nanosecond resolution in the database. At that time we were using +[PostgreSQL](https://www.postgresql.org/) and database quickly grew way above +200GB in disk space. And that was problematic. Statistics took disturbingly long +time to aggregate. Also using indexes on stats table in database was no help +after we reached 500 million datapoints. -> There is a marketing product information and there is real life experience. +> There is a marketing product information and there is real life experience. And the tend to be quite the opposite. -This was the reason that now everything is aggregated on daily basis and this -data is then fed to Elastic in form of daily summary. With this we achieved we -can now track many more dimensions such as zone, channel and platform information. -And with this information we can now adapt occurrences of ads on specific -places more precisely. +This was the reason that now everything is aggregated on daily basis and this +data is then fed to Elastic in form of daily summary. With this we achieved we +can now track many more dimensions such as zone, channel and platform +information. And with this information we can now adapt occurrences of ads on +specific places more precisely. -We have also adapted [Redis](https://redis.io/) as a full-time citizen in our -stack. Because Redis also stores information on a local disk we have some sort +We have also adapted [Redis](https://redis.io/) as a full-time citizen in our +stack. Because Redis also stores information on a local disk we have some sort of backup if server would accidentally suffer some failure. -All the real-time statistics for ad serving and redirecting is presented as +All the real-time statistics for ad serving and redirecting is presented as counters in Redis instance and daily extracted and pushed to Elastic. ## Measure everything -The thing about software is that we really don't know how well it is performing -under load until such load is presented. When testing locally everything is -fine but when on production things tend to fall apart. +The thing about software is that we really don't know how well it is performing +under load until such load is presented. When testing locally everything is fine +but when on production things tend to fall apart. -As a solution for this we are measuring everything we can. Function execution -time (by encapsulating functions with timers), server performance (cpu, memory, -disk, etc), Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. -We sacrifice a bit of performance for the sake of this information. And we -store all this information for later analysis. +As a solution for this we are measuring everything we can. Function execution +time (by encapsulating functions with timers), server performance (cpu, memory, +disk, etc), Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. +We sacrifice a bit of performance for the sake of this information. And we store +all this information for later analysis. **Example of function execution time** @@ -99,27 +99,28 @@ store all this information for later analysis. } ``` -We have also started profiling with [cProfile](https://pymotw.com/2/profile/) -and then visualizing with [KCachegrind](http://kcachegrind.sourceforge.net/). +We have also started profiling with [cProfile](https://pymotw.com/2/profile/) +and then visualizing with [KCachegrind](http://kcachegrind.sourceforge.net/). This provides much more detailed look into code execution. ## Cache control is your friend -Because we use Javascript library for rendering ads we rely on this script -extensively and when in need we need to be able to change behavior of the -script quickly. +Because we use Javascript library for rendering ads we rely on this script +extensively and when in need we need to be able to change behavior of the script +quickly. -In our case we can not simply replace javascript url in html code. It usually -takes a day or two for the guys who maintain sites to change code or add -?ver=xxx attribute. And this makes rapid deployment and testing very difficult +In our case we can not simply replace javascript url in html code. It usually +takes a day or two for the guys who maintain sites to change code or add +?ver=xxx attribute. And this makes rapid deployment and testing very difficult and time consuming. There is a limitation of how much you can test locally. -We are now in the process of integrating [Google Tag Manager](https://www.google.com/analytics/tag-manager/) -but couple of websites are developed on ASP.net platform that have some -problems with tag manager. With a solution below we are certain that we are -serving latest version of the script. +We are now in the process of integrating [Google Tag +Manager](https://www.google.com/analytics/tag-manager/) but couple of websites +are developed on ASP.net platform that have some problems with tag manager. With +a solution below we are certain that we are serving latest version of the +script. -And it only takes one mistake and users have the script cached and in case of +And it only takes one mistake and users have the script cached and in case of caching it for 1 year you probably know where the problem is. ```nginx @@ -143,10 +144,10 @@ location /static/ { } ``` -Also be careful when redirecting to url in your python code. We noticed that -if we didn't precisely setup cache control and expire headers in response we -didn't get the request on the server and therefore couldn't measure clicks. -So when redirecting do as follows and there will be no problems. +Also be careful when redirecting to url in your python code. We noticed that if +we didn't precisely setup cache control and expire headers in response we didn't +get the request on the server and therefore couldn't measure clicks. So when +redirecting do as follows and there will be no problems. ```python # python ➜ bottlepy web micro-framework @@ -157,42 +158,41 @@ response.set_header("Location", url) return response ``` -> Cache control in browsers is quite aggressive and you need to be precise -to avoid future problems. We learned that lesson the hard way. +> Cache control in browsers is quite aggressive and you need to be precise to +avoid future problems. We learned that lesson the hard way. ## Learn NGINX -When deciding on a web server we went with Nginx as a reverse proxy for our -applications. We adapted micro-service oriented architecture early in the -project to ensure when we scale we can easily add additional servers to our -cluster. And Nginx was crucial to perform load balancing and static content +When deciding on a web server we went with Nginx as a reverse proxy for our +applications. We adapted micro-service oriented architecture early in the +project to ensure when we scale we can easily add additional servers to our +cluster. And Nginx was crucial to perform load balancing and static content delivery. -At first our config file was quite simple and later grew larger. After patching -and adding new settings I sat down and learned more about the guts of Nginx. -This proved to be very useful and we were able to squeeze much more out of our -setup. So I advise you to take your time and read through the -[documentation](https://nginx.org/en/docs/). This saved us a lot of headache. +At first our config file was quite simple and later grew larger. After patching +and adding new settings I sat down and learned more about the guts of Nginx. +This proved to be very useful and we were able to squeeze much more out of our +setup. So I advise you to take your time and read through the +[documentation](https://nginx.org/en/docs/). This saved us a lot of headache. Googling for solutions only goes so far. ## Use Redis/Memcached -As explained above we are using caching basically for everything. It is the -corner stone of our services. At first we were very careful about the quantity -of things we stored in [Redis](https://redis.io/). But we later found out that +As explained above we are using caching basically for everything. It is the +corner stone of our services. At first we were very careful about the quantity +of things we stored in [Redis](https://redis.io/). But we later found out that the memory footprint is very low even when storing large amount of data in it. -So we gradually increased our usage to caching whole HTML outputs of dashboard. -This improved our performance in order of magnitude. And by using native TTL +So we gradually increased our usage to caching whole HTML outputs of dashboard. +This improved our performance in order of magnitude. And by using native TTL support this goes hand in hand with our needs. -The reason why we choose [Redis](https://redis.io/) over [Memcached](https://memcached.org/) -was the nature of scalability of Redis out of the box. But all this can be -achieved with Memcached. +The reason why we choose [Redis](https://redis.io/) over +[Memcached](https://memcached.org/) was the nature of scalability of Redis out +of the box. But all this can be achieved with Memcached. ## Conclusion -There are a lot more details that could have been written and every single -topic in here deserves it's own post but you probably got the idea about -the problems we faced. - +There are a lot more details that could have been written and every single topic +in here deserves it's own post but you probably got the idea about the problems +we faced. -- cgit v1.2.3