path: root/public/encoding-binary-data-into-dna-sequence.html
Diffstat (limited to 'public/encoding-binary-data-into-dna-sequence.html')
-rwxr-xr-x public/encoding-binary-data-into-dna-sequence.html 16
1 files changed, 8 insertions, 8 deletions
diff --git a/public/encoding-binary-data-into-dna-sequence.html b/public/encoding-binary-data-into-dna-sequence.html
index bdd4543..48ce1b2 100755
--- a/public/encoding-binary-data-into-dna-sequence.html
+++ b/public/encoding-binary-data-into-dna-sequence.html
@@ -41,7 +41,7 @@ We are made of starstuff.
41<strong>-- Carl Sagan, Cosmos</strong></blockquote><p>The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases 41<strong>-- Carl Sagan, Cosmos</strong></blockquote><p>The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases
42(cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. 42(cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate.
43Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine 43Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine
44bases. The sugar and the base together are called a nucleoside.<figure><img src=/posts/dna-sequence/dna-basics.jpg alt=DNA><figcaption><p><em>DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and 44bases. The sugar and the base together are called a nucleoside.<figure><img loading="lazy" src=/posts/dna-sequence/dna-basics.jpg alt=DNA><figcaption><p><em>DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and
45cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, 45cytosine pairs with guanine. (credit a: modification of work by Jerome Walker,
46Dennis Myts)</em></figcaption></figure><h2 id=encode-binary-data-into-dna-sequence>Encode binary data into DNA sequence</h2><p>As an input file you can use any file you want:<ul><li>ASCII files,<li>Compiled programs,<li>Multimedia files (MP3, MP4, MVK, etc),<li>Images,<li>Database files,<li>etc.</ul><p>Note: If you would copy all the bytes from RAM to file or pipe data to file you 46Dennis Myts)</em></figcaption></figure><h2 id=encode-binary-data-into-dna-sequence>Encode binary data into DNA sequence</h2><p>As an input file you can use any file you want:<ul><li>ASCII files,<li>Compiled programs,<li>Multimedia files (MP3, MP4, MVK, etc),<li>Images,<li>Database files,<li>etc.</ul><p>Note: If you would copy all the bytes from RAM to file or pipe data to file you
47could encode also this data as long as you provide file pointer to the encoder.<h3 id=basic-encoding>Basic Encoding</h3><p>As already mentioned, the Basic Encoding is based on a simple mapping. Since DNA 47could encode also this data as long as you provide file pointer to the encoder.<h3 id=basic-encoding>Basic Encoding</h3><p>As already mentioned, the Basic Encoding is based on a simple mapping. Since DNA
@@ -143,7 +143,7 @@ making progress.
143</span></span><span style=display:flex><span>2019/01/10 00:40:09 Output image file length is 1.1 kB 143</span></span><span style=display:flex><span>2019/01/10 00:40:09 Output image file length is 1.1 kB
144</span></span><span style=display:flex><span>2019/01/10 00:40:09 Process took 19.036117ms 144</span></span><span style=display:flex><span>2019/01/10 00:40:09 Process took 19.036117ms
145</span></span><span style=display:flex><span>2019/01/10 00:40:09 Done ... 145</span></span><span style=display:flex><span>2019/01/10 00:40:09 Done ...
146</span></span></code></pre><p>After encoding into PNG format this file looks like this.<figure><img src=/posts/dna-sequence/quote.png alt="Encoded Quote in PNG format"><figcaption><p>The larger the input stream is the larger the PNG file would be.</figcaption></figure><p>Compiled basic Hello World C program with 146</span></span></code></pre><p>After encoding into PNG format this file looks like this.<figure><img loading="lazy" src=/posts/dna-sequence/quote.png alt="Encoded Quote in PNG format"><figcaption><p>The larger the input stream is the larger the PNG file would be.</figcaption></figure><p>Compiled basic Hello World C program with
147<a href=https://www.gnu.org/software/gcc/>GCC</a> would <a href=/posts/dna-sequence/sample.png>look 147<a href=https://www.gnu.org/software/gcc/>GCC</a> would <a href=/posts/dna-sequence/sample.png>look
148like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green>// gcc -O3 -o sample sample.c 148like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green>// gcc -O3 -o sample sample.c
149</span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>#include</span> <span style=color:#00f>&lt;stdio.h&gt;</span><span style=color:#00f> 149</span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>#include</span> <span style=color:#00f>&lt;stdio.h&gt;</span><span style=color:#00f>
@@ -178,14 +178,14 @@ like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:f
178</span></span><span style=display:flex><span> --version Show application version. 178</span></span><span style=display:flex><span> --version Show application version.
179</span></span></code></pre><h2 id=benchmarks>Benchmarks</h2><p>First we generate some binary sample data with dd.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>dd <span style=color:#00f>if</span>=&lt;(openssl enc -aes-256-ctr -pass pass:<span style=color:#a31515>&#34;</span><span style=color:#00f>$(</span>dd <span style=color:#00f>if</span>=/dev/urandom bs=128 count=1 2&gt;/dev/null | base64<span style=color:#00f>)</span><span style=color:#a31515>&#34;</span> -nosalt &lt; /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock 179</span></span></code></pre><h2 id=benchmarks>Benchmarks</h2><p>First we generate some binary sample data with dd.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>dd <span style=color:#00f>if</span>=&lt;(openssl enc -aes-256-ctr -pass pass:<span style=color:#a31515>&#34;</span><span style=color:#00f>$(</span>dd <span style=color:#00f>if</span>=/dev/urandom bs=128 count=1 2&gt;/dev/null | base64<span style=color:#00f>)</span><span style=color:#a31515>&#34;</span> -nosalt &lt; /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock
180</span></span></code></pre><p>Our freshly generated 1KB file looks something like this (its full of garbage 180</span></span></code></pre><p>Our freshly generated 1KB file looks something like this (its full of garbage
181data as intended).<figure><img src=/posts/dna-sequence/sample-binary-file.png alt="Sample binary file 1KB"></figure><p>We create following binary files:<ul><li>1KB.bin<li>10KB.bin<li>100KB.bin<li>1MB.bin<li>10MB.bin<li>100MB.bin</ul><p>After this we create FASTA files for all the binary files by encoding them 181data as intended).<figure><img loading="lazy" src=/posts/dna-sequence/sample-binary-file.png alt="Sample binary file 1KB"></figure><p>We create following binary files:<ul><li>1KB.bin<li>10KB.bin<li>100KB.bin<li>1MB.bin<li>10MB.bin<li>100MB.bin</ul><p>After this we create FASTA files for all the binary files by encoding them
182into DNA sequence.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>./dnae-encode -i 100MB.bin -o 100MB.fa 182into DNA sequence.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>./dnae-encode -i 100MB.bin -o 100MB.fa
183</span></span></code></pre><p>Then we GZIP all the FASTA files to see how much the can be compressed.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>gzip -9 &lt; 10MB.fa &gt; 10MB.fa.gz 183</span></span></code></pre><p>Then we GZIP all the FASTA files to see how much the can be compressed.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>gzip -9 &lt; 10MB.fa &gt; 10MB.fa.gz
184</span></span></code></pre><figure><img src=/posts/dna-sequence/chart-speed.svg alt="Encode to FASTA"><figcaption><p>The speed increase that occurs when encoding to FASTA format.</figcaption></figure><figure><img src=/posts/dna-sequence/chart-size.svg alt="File sizes"><figcaption><p>Size of the out file after encoding.</figcaption></figure><p><a href=/posts/dna-sequence/benchmarks.csv>Download CSV file with benchmarks</a>.<h2 id=references>References</h2><ul><li><a href=https://www.techopedia.com/definition/948/encoding>https://www.techopedia.com/definition/948/encoding</a><li><a href=https://www.dna-worldwide.com/resource/160/history-dna-timeline>https://www.dna-worldwide.com/resource/160/history-dna-timeline</a><li><a href=https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/>https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/</a><li><a href=https://arxiv.org/abs/1801.04774>https://arxiv.org/abs/1801.04774</a><li><a href=https://en.wikipedia.org/wiki/FASTA_format>https://en.wikipedia.org/wiki/FASTA_format</a></ul></div></article></main><section><hr><h2>Posts from blogs I follow around the net</h2><ul><li><a href=https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWhyNotDirectoryToFilesystem target=_blank rel=noopener>One reason that ZFS can't turn a directory into a filesystem</a> — <a href=https://utcc.utoronto.ca/~cks/space/blog/>Chris's Wiki :: blog</a><div>One of the wishes that I and other people frequently have for ZFS 184</span></span></code></pre><figure><img loading="lazy" src=/posts/dna-sequence/chart-speed.svg alt="Encode to FASTA"><figcaption><p>The speed increase that occurs when encoding to FASTA format.</figcaption></figure><figure><img loading="lazy" src=/posts/dna-sequence/chart-size.svg alt="File sizes"><figcaption><p>Size of the out file after encoding.</figcaption></figure><p><a href=/posts/dna-sequence/benchmarks.csv>Download CSV file with benchmarks</a>.<h2 id=references>References</h2><ul><li><a 
href=https://www.techopedia.com/definition/948/encoding>https://www.techopedia.com/definition/948/encoding</a><li><a href=https://www.dna-worldwide.com/resource/160/history-dna-timeline>https://www.dna-worldwide.com/resource/160/history-dna-timeline</a><li><a href=https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/>https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/</a><li><a href=https://arxiv.org/abs/1801.04774>https://arxiv.org/abs/1801.04774</a><li><a href=https://en.wikipedia.org/wiki/FASTA_format>https://en.wikipedia.org/wiki/FASTA_format</a></ul></div></article></main><section><hr><h2>Posts from blogs I follow around the net</h2><ul><li><a href=https://utcc.utoronto.ca/~cks/space/blog/linux/NFSv4ServerLockClients target=_blank rel=noopener>Finding which NFSv4 client owns a lock on a Linux NFS(v4) server</a> — <a href=https://utcc.utoronto.ca/~cks/space/blog/>Chris's Wiki :: blog</a><div>A while back I wrote an entry about finding which NFS client owns
185is the ability to take an existing directory (and everything 185a lock on a Linux NFS server, which turned
186underneath it) in a ZFS filesystem and turn it into a sub-filesystem 186out to be specific to NFS v3 (which I really should have seen coming,
187of its own. One reason for wanting this is that a number of things 187since it involved NLM and lockd). Finding the NFS v4 client that
188are set and controlled on a per-filesyst…<li><a href=http://www.landley.net/notes-2023.html#28-10-2023 target=_blank rel=noopener>October 28, 2023</a> — <a href=http://www.landley.net/notes-2023.html>Rob Landley's Blog Thing for 2023</a><div>Oh good grief, two of my least favorite licensing people, Larry Rosen 188owns a lock is, depending on your perspective, either simpl…<li><a href=http://www.landley.net/notes-2023.html#28-10-2023 target=_blank rel=noopener>October 28, 2023</a> — <a href=http://www.landley.net/notes-2023.html>Rob Landley's Blog Thing for 2023</a><div>Oh good grief, two of my least favorite licensing people, Larry Rosen
189and Bradley Kuhn, are interacting on the OSI's license-discuss 189and Bradley Kuhn, are interacting on the OSI's license-discuss
190list where the're doing 190list where the're doing
191bad computer history and insisting that a guy Larry Rosen 191bad computer history and insisting that a guy Larry Rosen