From 057be23acf19acae0683c59b0a346b411a04880a Mon Sep 17 00:00:00 2001
From: Mitja Felicijan
Date: Sat, 5 Aug 2023 13:41:36 +0200
Subject: Cleanup of posts

---
 .../2019-01-03-encoding-binary-data-into-dna-sequence.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

(limited to 'content/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md')

diff --git a/content/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md b/content/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
index f003fc3..0d44a40 100644
--- a/content/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
+++ b/content/posts/2019-01-03-encoding-binary-data-into-dna-sequence.md
@@ -110,7 +110,6 @@ Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine
 bases. The sugar and the base together are called a nucleoside.

 ![DNA](/posts/dna-sequence/dna-basics.jpg)
-
 *DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and
 cytosine pairs with guanine. (credit a: modification of work by Jerome Walker,
 Dennis Myts)*
@@ -135,7 +134,9 @@ As already mentioned, the Basic Encoding is based on a simple mapping. Since
 DNA is composed of 4 nucleotides (Adenine, Cytosine, Guanine, Thymine; usually
 referred using the first letter). Using this technique we can encode

-$$ log_2(4) = log_2(2^2) = 2 bits $$
+
+ +
 using a single nucleotide. In this way, we are able to use the 4 bases that
 compose the DNA strand to encode each byte of data.
@@ -301,7 +302,6 @@ Then we encode FASTA file from previous operation to encode this data into PNG.
 After encoding into PNG format this file looks like this.

 ![Encoded Quote in PNG format](/posts/dna-sequence/quote.png)
-
 The larger the input stream is the larger the PNG file would be.

 Compiled basic Hello World C program with
@@ -396,11 +396,13 @@ Then we GZIP all the FASTA files to see how much the can be compressed.
 gzip -9 < 10MB.fa > 10MB.fa.gz
 ```

-[Download ODS file with benchmarks](/dna-sequence/benchmarks.ods).
+![Encode to FASTA](/posts/dna-sequence/chart-speed.svg)
+The speed increase that occurs when encoding to FASTA format.

-![Sample binary file 1KB](/posts/dna-sequence/chart-1.png)
+![File sizes](/posts/dna-sequence/chart-size.svg)
+Size of the out file after encoding.

-![Sample binary file 1KB](/posts/dna-sequence/chart-2.png)
+[Download CSV file with benchmarks](/posts/dna-sequence/benchmarks.csv).

 ## References

--
cgit v1.2.3