| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-11-01 22:54:27 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-11-01 22:54:27 +0100 |
| commit | 2417a6b7603524dc5cd30d29b153f91024b9443d (patch) | |
| tree | 9be5ea8e5baba96dd9159217da6badf6157fb595 /_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md | |
| parent | 89ba3497f07a8ea43d209b583f39fcc286acc923 (diff) | |
Move to Jekyll
Diffstat (limited to '_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md')
| -rw-r--r-- | _posts/2017-04-17-what-i-ve-learned-developing-ad-server.md | 200 |
1 files changed, 200 insertions, 0 deletions
diff --git a/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md b/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
new file mode 100644
index 0000000..10aca0d
--- /dev/null
+++ b/_posts/2017-04-17-what-i-ve-learned-developing-ad-server.md
@@ -0,0 +1,200 @@

---
title: What I've learned developing ad server
permalink: /what-i-ve-learned-developing-ad-server.html
date: 2017-04-17T12:00:00+02:00
layout: post
type: post
draft: false
---

For the past year and a half I have been developing a native advertising server
that contextually matches ads and displays them in different template forms on a
variety of websites. The project grew from serving thousands of ads per day to
millions.

The system is made of a couple of core components:

- API for serving ads,
- Utils - cronjobs and queue management tools,
- Dashboard UI.

The initial release used [MongoDB](https://www.mongodb.com/) for full-text
search, but it was later replaced by [Elasticsearch](https://www.elastic.co/)
for better CPU utilization and search performance. The switch also gave us many
other useful features of Elasticsearch, so you should check it out if you do any
search-related work.

Because the premise of the server is to provide a native ad experience, ads are
rendered on the client side via a simple templating engine. This lets the same
ad be displayed in a number of different ways, matching the visual style of the
page. It also makes the JavaScript client library quite complex.

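The ad library itself is JavaScript, but the core idea of a simple templating engine fits in a few lines. Below is a minimal sketch of that idea in Python, assuming a hypothetical `{{ field }}` placeholder syntax and hypothetical ad fields (`title`, `url`, `image`). Values are HTML-escaped so ad text cannot inject markup into the host page.

```python
import html
import re

# Hypothetical placeholder syntax: {{ field_name }}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, ad: dict) -> str:
    """Fill every {{ field }} in the template with the HTML-escaped ad value.

    Unknown fields render as an empty string instead of raising, so one
    missing attribute does not break the whole ad slot.
    """
    def substitute(match):
        value = ad.get(match.group(1), "")
        return html.escape(str(value))

    return PLACEHOLDER.sub(substitute, template)


# Two hypothetical layouts for the same ad, matching different page styles.
SIDEBAR = '<a href="{{ url }}"><img src="{{ image }}"><span>{{ title }}</span></a>'
INLINE = '<p class="native">{{ title }} <a href="{{ url }}">Read more</a></p>'

if __name__ == "__main__":
    ad = {"title": "Fast & light shoes", "url": "https://example.com/a?id=1", "image": "/i/1.png"}
    print(render(SIDEBAR, ad))
    print(render(INLINE, ad))
```

The same ad dict produces two very different snippets. Picking the template per page is what lets the ad blend into each site's visual style.
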
Now that you know the basics of the product, let's get into the lessons we
learned.

## Aggregate everything

After the beta version was released, everything (impressions, clicks, etc.) was
written to the database at nanosecond resolution. At that time we were using
[PostgreSQL](https://www.postgresql.org/), and the database quickly grew well
beyond 200 GB on disk. That was a problem: statistics took a disturbingly long
time to aggregate, and indexes on the stats table stopped helping once we
reached 500 million data points.

> There is marketing information about a product and there is real-life
> experience. And they tend to be quite the opposite.

This is why everything is now aggregated on a daily basis and fed to
Elasticsearch as a daily summary. As a result, we can track many more
dimensions, such as zone, channel and platform, and use that information to
adjust how often ads appear in specific placements more precisely.

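The daily roll-up boils down to collapsing raw events into one counter set per day and dimension combination. A minimal sketch, assuming a hypothetical event shape: a Unix timestamp `ts`, an `event` type, an `ad_id`, and the zone/channel/platform dimensions mentioned above. Each resulting dict is the kind of daily summary document that could then be indexed into Elasticsearch; the field names are illustrative, not our actual schema.

```python
from datetime import datetime, timezone

# Dimensions tracked per daily summary (illustrative names).
DIMENSIONS = ("ad_id", "zone", "channel", "platform")
# Maps raw event types to summary counter fields; other event types are ignored.
COUNTER_FIELD = {"impression": "impressions", "click": "clicks"}


def daily_summaries(events):
    """Aggregate raw events into one summary document per (UTC day, dimensions)."""
    summaries = {}
    for e in events:
        field = COUNTER_FIELD.get(e["event"])
        if field is None:
            continue
        day = datetime.fromtimestamp(e["ts"], tz=timezone.utc).date().isoformat()
        key = (day,) + tuple(e[d] for d in DIMENSIONS)
        doc = summaries.get(key)
        if doc is None:
            doc = {"day": day, **{d: e[d] for d in DIMENSIONS}, "impressions": 0, "clicks": 0}
            summaries[key] = doc
        doc[field] += 1
    # Sorted by day first, then by the dimension values.
    return [summaries[k] for k in sorted(summaries)]
```

Millions of raw rows per day shrink to one document per ad and placement combination, which is what keeps both storage and aggregation time manageable.
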
We have also adopted [Redis](https://redis.io/) as a first-class citizen in our
stack. Because Redis can also persist data to local disk, we have some form of
backup in case the server fails.

All real-time statistics for ad serving and redirects are kept as counters in a
Redis instance, then extracted daily and pushed to Elasticsearch.

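A minimal sketch of that counter pattern, assuming a hypothetical key scheme `stats:<day>:<ad_id>:<event>` (ad ids must not contain `:`). The functions only call `incr`, `get` and `scan_iter`, which exist on a redis-py client created with `decode_responses=True` so keys come back as strings. `InMemoryCounters` is a hypothetical stand-in so the example runs without a Redis server.

```python
import fnmatch
from datetime import date


def track(client, day: date, ad_id: str, event: str) -> None:
    """Atomically bump the real-time counter for one ad event (Redis INCR)."""
    client.incr(f"stats:{day.isoformat()}:{ad_id}:{event}")


def extract_day(client, day: date) -> dict:
    """Collect all counters of one day as {ad_id: {event: count}} for the daily push."""
    result = {}
    for key in client.scan_iter(match=f"stats:{day.isoformat()}:*"):
        _, _, ad_id, event = key.split(":", 3)
        result.setdefault(ad_id, {})[event] = int(client.get(key))
    return result


class InMemoryCounters:
    """Hypothetical stand-in exposing the same three methods as a redis-py client."""

    def __init__(self):
        self.data = {}

    def incr(self, name, amount=1):
        self.data[name] = self.data.get(name, 0) + amount
        return self.data[name]

    def get(self, name):
        value = self.data.get(name)
        return None if value is None else str(value)

    def scan_iter(self, match="*"):
        # Redis uses glob-style patterns; fnmatchcase is a close equivalent.
        return (k for k in list(self.data) if fnmatch.fnmatchcase(k, match))
```

`SCAN` iterates incrementally, so extracting a day does not block Redis the way `KEYS` would. In a real job you would probably also delete or expire the day's keys once the push to Elasticsearch has succeeded; that step is left out here.
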
## Measure everything

The thing about software is that we don't really know how well it performs
under load until that load actually arrives. Locally everything looks fine, but
in production things tend to fall apart.

Our answer is to measure everything we can: function execution time (by
wrapping functions with timers), server performance (CPU, memory, disk, etc.),
and Nginx and [uWSGI](https://uwsgi-docs.readthedocs.io/) performance. We
sacrifice a bit of performance for this information, and we store all of it for
later analysis.

**Example of function execution time**

```json
{
  "get_final_filtered_ads": {
    "counter": 1931250,
    "avg": 0.0066143431,
    "elapsed": 12773.9500310003
  },
  "store_keywords_statistics": {
    "counter": 1931011,
    "avg": 0.0004605267,
    "elapsed": 889.2821669996
  },
  "match_by_context": {
    "counter": 1931011,
    "avg": 0.0055960716,
    "elapsed": 10806.0758889999
  },
  "match_by_high_performance": {
    "counter": 262,
    "avg": 0.0152770229,
    "elapsed": 4.00258
  },
  "store_impression_stats": {
    "counter": 1931250,
    "avg": 0.0006189991,
    "elapsed": 1195.4419869999
  }
}
```

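Each entry above has a `counter`, an `avg` and an `elapsed` total (in seconds), and in every entry `avg` equals `elapsed / counter`. A decorator is one way to wrap functions with timers like that. This is a minimal sketch with a hypothetical `timed` decorator and an in-process `STATS` dict, not our production code; the body of `match_by_context` is a placeholder.

```python
import functools
import json
import time
from collections import defaultdict

# Per-function totals kept in process memory (hypothetical storage).
STATS = defaultdict(lambda: {"counter": 0, "elapsed": 0.0})


def timed(func):
    """Record call count and total wall-clock seconds for the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = STATS[func.__name__]
            entry["counter"] += 1
            entry["elapsed"] += time.perf_counter() - start
    return wrapper


def report() -> dict:
    """Return the stats in the same shape as the JSON example (avg = elapsed / counter)."""
    return {
        name: {"counter": s["counter"], "avg": s["elapsed"] / s["counter"], "elapsed": s["elapsed"]}
        for name, s in STATS.items()
        if s["counter"]
    }


@timed
def match_by_context(keywords):
    # Placeholder body standing in for the real matching work.
    return sorted(keywords)


if __name__ == "__main__":
    for _ in range(1000):
        match_by_context(["shoes", "running", "sale"])
    print(json.dumps(report(), indent=2))
```

With several uWSGI worker processes each worker has its own `STATS`, so in practice the numbers would need to be shipped to shared storage (Redis, for example) before they can be combined.
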
We have also started profiling with [cProfile](https://pymotw.com/2/profile/)
and visualizing the results with
[KCachegrind](http://kcachegrind.sourceforge.net/). This gives a much more
detailed look into code execution.

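A minimal sketch of collecting such a profile with the standard-library `cProfile` and `pstats` modules. `filter_ads` and `profile_call` are hypothetical stand-ins. The file written by `dump_stats` is in pstats format, while KCachegrind reads callgrind files, so the dump has to be converted first, for example with the third-party `pyprof2calltree` tool.

```python
import cProfile
import os
import pstats
import tempfile


def filter_ads(ads, keywords):
    """Hypothetical stand-in for the code under investigation."""
    return [ad for ad in ads if keywords & set(ad["keywords"])]


def profile_call(func, *args, out_path):
    """Run func(*args) under cProfile, save the raw stats to out_path, return the result."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    profiler.dump_stats(out_path)
    return result


if __name__ == "__main__":
    ads = [{"id": i, "keywords": ["shoes", f"tag{i % 50}"]} for i in range(10_000)]
    path = os.path.join(tempfile.gettempdir(), "filter_ads.prof")
    profile_call(filter_ads, ads, {"tag7"}, out_path=path)
    # Print the five most expensive entries by cumulative time.
    pstats.Stats(path).sort_stats("cumulative").print_stats(5)
```

The text summary from `pstats` is often enough for a first look; the call graph in KCachegrind helps when the cost is spread across many small functions.
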
## Cache control is your friend

Because we use a JavaScript library to render ads, we rely on this script
heavily, and when needed we must be able to change its behavior quickly.

In our case we cannot simply replace the JavaScript URL in the HTML code. It
usually takes a day or two for the people who maintain the sites to change the
code or add a `?ver=xxx` query parameter, which makes rapid deployment and
testing difficult and time-consuming. And there is a limit to how much you can
test locally.

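Where you do control the HTML, the `?ver=xxx` approach is easy to automate: derive the version from the file contents so the URL changes exactly when the script changes. A minimal sketch; the `versioned_url` helper and the 10-character SHA-256 prefix are assumptions for illustration, not part of our setup.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, content: bytes, length: int = 10) -> str:
    """Append (or replace) a ?ver= query parameter derived from the file's content hash."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    parts = urlsplit(url)
    # Keep any other query parameters, drop a stale ver.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "ver"]
    query.append(("ver", digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical URL and script body, for illustration only.
    print(versioned_url("https://cdn.example.com/static/ads.js", b"/* ads client v2 */"))
```

Because the version is a content hash, an unchanged script keeps its URL and stays cached, while any edit produces a new URL that browsers have never seen.
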
We are now in the process of integrating [Google Tag
Manager](https://www.google.com/analytics/tag-manager/), but a couple of the
websites are built on the ASP.NET platform and have some problems with Tag
Manager. With the configuration below we can be certain that we are serving the
latest version of the script.

It only takes one mistake for users to end up with a cached copy of the script,
and if it is cached for a year, you can probably guess where the problem lies.

```nginx
# nginx ➜ /etc/nginx/sites-available/default
location /static/ {
    alias /path-to-static-content/;
    autoindex off;
    charset utf-8;
    gzip on;
    gzip_types text/plain application/javascript application/x-javascript text/javascript text/xml text/css;

    # Images and fonts rarely change: let browsers cache them for a year.
    location ~* \.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)$ {
        expires 1y;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    # CSS, JS and text: cache for one hour only, so a fixed script reaches users quickly.
    location ~* \.(css|js|txt)$ {
        expires 3600s;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate";
    }
}
```

Also be careful when redirecting to a URL in your Python code. We noticed that
if we didn't set the cache control and expiry headers on the response precisely,
the browser reused the cached redirect and the request never reached our
server, so we couldn't count the click. This is how we redirect now, and it has
worked without problems for us:

```python
# python ➜ bottlepy web micro-framework
import bottle

def no_cache_redirect(url):
    # Hypothetical helper name. A 302 the browser must not cache, so every
    # click actually reaches the server and can be counted.
    response = bottle.HTTPResponse(status=302)
    response.set_header("Cache-Control", "no-store, no-cache, must-revalidate")
    response.set_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
    response.set_header("Location", url)
    return response
```

> Cache control in browsers is quite aggressive, and you need to be precise to
> avoid future problems. We learned that lesson the hard way.

## Learn NGINX

When choosing a web server we went with Nginx as a reverse proxy for our
applications. We adopted a microservice-oriented architecture early in the
project so that we could easily add servers to our cluster as we scaled, and
Nginx was crucial for load balancing and static content delivery.

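To make the load-balancing part concrete: with several servers in an `upstream` block, Nginx by default distributes requests with weighted round-robin, and its implementation uses a "smooth" variant that interleaves heavier servers instead of sending them bursts. The sketch below reimplements only that selection rule in Python to illustrate it. The server names and weights are hypothetical, and the real implementation also handles failed servers, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    weight: int
    current: int = 0  # running score used by the smooth selection


def pick(peers):
    """Smooth weighted round-robin: raise every peer's score by its weight,
    choose the highest score, then lower the winner's score by the total weight."""
    total = 0
    for p in peers:
        p.current += p.weight
        total += p.weight
    best = max(peers, key=lambda p: p.current)  # the first peer wins ties
    best.current -= total
    return best.name


if __name__ == "__main__":
    # Hypothetical backends: app1 should get 5 of every 7 requests.
    backends = [Peer("app1", 5), Peer("app2", 1), Peer("app3", 1)]
    print([pick(backends) for _ in range(7)])
```

Running it yields app1, app1, app2, app1, app3, app1, app1: five of seven requests go to app1, with the lighter servers interleaved rather than grouped at the end, and after a full cycle every score is back to zero.
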
At first our config file was quite simple, but it grew larger over time. After
a lot of patching and adding new settings, I sat down and learned more about the
guts of Nginx. This proved very useful, and we were able to squeeze much more
out of our setup. So I advise you to take your time and read through the
[documentation](https://nginx.org/en/docs/). It saved us a lot of headaches.
Googling for solutions only goes so far.

## Use Redis/Memcached

As explained above, we use caching for basically everything; it is the
cornerstone of our services. At first we were very careful about how much we
stored in [Redis](https://redis.io/), but we later found that its memory
footprint stayed very low in our case, even with a large amount of data in it.

So we gradually expanded our usage to caching the entire HTML output of the
dashboard. This improved performance by an order of magnitude, and Redis's
native TTL support fits our needs well.

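Caching rendered HTML with a TTL is the classic cache-aside pattern: look the page up first, and only on a miss render it and store it with an expiry. A minimal sketch, assuming a redis-py client created with `decode_responses=True` (its `get` and `setex(name, time, value)` methods exist), plus a hypothetical `cached_page` decorator and `html:` key prefix. `TTLStore` is a hypothetical in-memory stand-in so the example runs without Redis.

```python
import functools
import time


def cached_page(client, ttl_seconds: int, prefix: str = "html:"):
    """Cache-aside decorator: return cached HTML if present, otherwise render,
    store it with an expiry (Redis SETEX) and return it."""
    def decorator(render):
        @functools.wraps(render)
        def wrapper(page_id: str) -> str:
            key = prefix + page_id
            page = client.get(key)
            if page is not None:
                return page
            page = render(page_id)
            client.setex(key, ttl_seconds, page)
            return page
        return wrapper
    return decorator


class TTLStore:
    """Hypothetical stand-in for Redis with get/setex and lazy expiry."""

    def __init__(self):
        self.data = {}

    def get(self, name):
        value, expires_at = self.data.get(name, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            del self.data[name]  # expired: behave like a missing key
            return None
        return value

    def setex(self, name, seconds, value):
        self.data[name] = (value, time.monotonic() + seconds)
        return True


if __name__ == "__main__":
    store = TTLStore()

    @cached_page(store, ttl_seconds=60)
    def dashboard(page_id):
        print("rendering", page_id)  # only printed on a cache miss
        return f"<h1>Dashboard {page_id}</h1>"

    dashboard("overview")
    dashboard("overview")
```

The TTL bounds how stale a cached page can get without any explicit invalidation logic, which is why it pairs well with dashboard pages that tolerate slightly delayed numbers.
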
We chose [Redis](https://redis.io/) over [Memcached](https://memcached.org/)
because of the scalability Redis offers out of the box, but all of this can be
achieved with Memcached as well.

## Conclusion

There are many more details I could have written about, and every topic here
deserves its own post, but you probably get the idea of the problems we faced.
