| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2019-01-10 19:24:18 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2019-01-10 19:24:18 +0100 |
| commit | ad974810d43e1d5f70bca269665c25230e6a3221 (patch) | |
| tree | 2396d87e409379d6ad4066b7caf62729650541e4 /_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md | |
| parent | 591a568ab2223f8ed79c50b53f3533858fe2e68e (diff) | |
charts to plot.ly
Diffstat (limited to '_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md')
| -rw-r--r-- | _posts/2019-01-03-encoding-binary-data-into-dna-sequence.md | 70 |
1 files changed, 56 insertions, 14 deletions
diff --git a/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md b/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
index abd0164..56e96dd 100644
--- a/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
+++ b/_posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
| @@ -189,12 +189,12 @@ FASTA format was extended by [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) | |||
| 189 | 189 | ||
| 190 | ### PNG encoded DNA sequence | 190 | ### PNG encoded DNA sequence |
| 191 | 191 | ||
| 192 | | Nucleotides | RGB | Color name | | 192 | | Nucleotides | RGB | Color name | |
| 193 | | ------------ | ----------- | ----------- | | 193 | | ------------ | ----------- | ---------- | |
| 194 | | A (Adenine) | (0,0,255) | Blue | | 194 | | A (Adenine) | (0,0,255) | Blue | |
| 195 | | G (Guanine) | (0,100,0) | Green | | 195 | | G (Guanine) | (0,100,0) | Green | |
| 196 | | C (Cytosine) | (255,0,0) | Red | | 196 | | C (Cytosine) | (255,0,0) | Red | |
| 197 | | T (Thymine) | (255,255,0) | Yellow | | 197 | | T (Thymine) | (255,255,0) | Yellow | |
| 198 | 198 | ||
| 199 | With this in mind we can create a simple algorithm to create PNG representation of a DNA sequence. | 199 | With this in mind we can create a simple algorithm to create PNG representation of a DNA sequence. |
| 200 | 200 | ||
| @@ -335,12 +335,12 @@ Our freshly generated 1KB file looks something like this (its full of garbage da | |||
| 335 |  | 335 |  |
| 336 | 336 | ||
| 337 | We create following binary files: | 337 | We create following binary files: |
| 338 | - 1KB | 338 | - 1KB.bin |
| 339 | - 10KB | 339 | - 10KB.bin |
| 340 | - 100KB | 340 | - 100KB.bin |
| 341 | - 1MB | 341 | - 1MB.bin |
| 342 | - 10MB | 342 | - 10MB.bin |
| 343 | - 100MB | 343 | - 100MB.bin |
| 344 | 344 | ||
| 345 | After this we create FASTA files for all the binary files by encoding them into DNA sequence. | 345 | After this we create FASTA files for all the binary files by encoding them into DNA sequence. |
| 346 | 346 | ||
| @@ -354,13 +354,55 @@ Then we GZIP all the FASTA files to see how much the can be compressed. | |||
| 354 | gzip -9 < 10MB.fa > 10MB.fa.gz | 354 | gzip -9 < 10MB.fa > 10MB.fa.gz |
| 355 | ``` | 355 | ``` |
| 356 | 356 | ||
| 357 | <script src="/assets/plotly-latest.min.js"></script> | ||
| 358 | |||
| 357 | **Speed of encoding binary file into FASTA format.** | 359 | **Speed of encoding binary file into FASTA format.** |
| 358 | 360 | ||
| 359 |  | 361 | <div id="encoding-benchmarks"></div> |
| 362 | <script> | ||
| 363 | (function(){ | ||
| 364 | var trace1 = { | ||
| 365 | x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'], | ||
| 366 | y: [5.625224, 32.679975, 112.864416, 872.887675, 8472.693202, 85525.178217], | ||
| 367 | type: 'scatter', | ||
| 368 | }; | ||
| 369 | var data = [trace1]; | ||
| 370 | Plotly.newPlot("encoding-benchmarks", data, { | ||
| 371 | legend: {"orientation": "h"}, | ||
| 372 | height: 300, | ||
| 373 | margin: { l: 50, r: 0, b: 50, t: 30, pad: 0 }, | ||
| 374 | yaxis: { title: "execution time in milliseconds", titlefont: { size: 12 } }, | ||
| 375 | }); | ||
| 376 | })(); | ||
| 377 | </script> | ||
| 360 | 378 | ||
| 361 | **File sizes of encoded files and also GZIP-ed variations.** | 379 | **File sizes of encoded files and also GZIP-ed variations.** |
| 362 | 380 | ||
| 363 |  | 381 | <div id="size-benchmarks"></div> |
| 382 | <script> | ||
| 383 | (function(){ | ||
| 384 | var trace1 = { | ||
| 385 | x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'], | ||
| 386 | y: [4.1, 40.7, 406.7, 4100, 40700, 406700], | ||
| 387 | name: 'FASTA file size', | ||
| 388 | type: 'bar', | ||
| 389 | }; | ||
| 390 | var trace2 = { | ||
| 391 | x: ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'], | ||
| 392 | y: [1.4, 13, 121, 1200, 12000, 118000], | ||
| 393 | name: 'FASTA GZIPPED file size', | ||
| 394 | type: 'bar', | ||
| 395 | }; | ||
| 396 | var data = [trace1, trace2]; | ||
| 397 | Plotly.newPlot("size-benchmarks", data, { | ||
| 398 | legend: {"orientation": "h"}, | ||
| 399 | height: 300, | ||
| 400 | margin: { l: 50, r: 0, b: 50, t: 30, pad: 0 }, | ||
| 401 | yaxis: { title: "size in kilobytes", titlefont: { size: 12 } }, | ||
| 402 | barmode: 'stack' | ||
| 403 | }); | ||
| 404 | })(); | ||
| 405 | </script> | ||
| 364 | 406 | ||
| 365 | [Download ODS file with benchmarks.](/files/dna-sequence/benchmarks.ods). | 407 | [Download ODS file with benchmarks.](/files/dna-sequence/benchmarks.ods). |
| 366 | 408 | ||
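
A note on the numbers this commit hard-codes into the `size-benchmarks` chart: the two traces can be read as size ratios, which makes them easier to sanity-check. The sketch below is not part of the commit. It copies the `y` arrays from `trace1` and `trace2` and divides them by nominal input sizes. The `inputKB` array and the `sizeRatios` helper are assumptions introduced here for illustration, with 1MB taken as 1000KB.

```javascript
// Values copied from the two Plotly traces in this diff (all in kilobytes).
const labels = ['1KB.bin', '10KB.bin', '100KB.bin', '1MB.bin', '10MB.bin', '100MB.bin'];
const fastaKB = [4.1, 40.7, 406.7, 4100, 40700, 406700];
const gzipKB = [1.4, 13, 121, 1200, 12000, 118000];

// Assumption (not in the diff): nominal input sizes in KB, decimal multiples.
const inputKB = [1, 10, 100, 1000, 10000, 100000];

// Hypothetical helper: derive three ratios per benchmark file.
function sizeRatios(inputKB, fastaKB, gzipKB) {
  return inputKB.map((input, i) => ({
    expansion: fastaKB[i] / input,       // FASTA size relative to the binary input
    gzipOfFasta: gzipKB[i] / fastaKB[i], // gzipped size relative to the FASTA file
    gzipOfInput: gzipKB[i] / input,      // gzipped size relative to the binary input
  }));
}

const ratios = sizeRatios(inputKB, fastaKB, gzipKB);
ratios.forEach((r, i) => {
  console.log(
    labels[i],
    'expansion', r.expansion.toFixed(2),
    'gzip/FASTA', r.gzipOfFasta.toFixed(2),
    'gzip/input', r.gzipOfInput.toFixed(2)
  );
});
```

Under these assumptions every FASTA file comes out at roughly 4 times its input, and the gzipped FASTA file stays somewhat larger than the original binary in every row. Since the chart uses `barmode: 'stack'`, the height of each stacked bar is the sum of both sizes rather than either size alone, which is worth keeping in mind when reading it.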
