Diffstat (limited to 'public/encoding-binary-data-into-dna-sequence.html')
| -rwxr-xr-x | public/encoding-binary-data-into-dna-sequence.html | 16 |
1 files changed, 8 insertions, 8 deletions
diff --git a/public/encoding-binary-data-into-dna-sequence.html b/public/encoding-binary-data-into-dna-sequence.html index bdd4543..48ce1b2 100755 --- a/public/encoding-binary-data-into-dna-sequence.html +++ b/public/encoding-binary-data-into-dna-sequence.html | |||
| @@ -41,7 +41,7 @@ We are made of starstuff. | |||
| 41 | <strong>-- Carl Sagan, Cosmos</strong></blockquote><p>The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases | 41 | <strong>-- Carl Sagan, Cosmos</strong></blockquote><p>The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases |
| 42 | (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. | 42 | (cytosine (C), thymine (T), adenine (A), guanine (G)), and a phosphate. |
| 43 | Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine | 43 | Cytosine and thymine are pyrimidine bases, while adenine and guanine are purine |
| 44 | bases. The sugar and the base together are called a nucleoside.<figure><img src=/posts/dna-sequence/dna-basics.jpg alt=DNA><figcaption><p><em>DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and | 44 | bases. The sugar and the base together are called a nucleoside.<figure><img loading="lazy" src=/posts/dna-sequence/dna-basics.jpg alt=DNA><figcaption><p><em>DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and |
| 45 | cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, | 45 | cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, |
| 46 | Dennis Myts)</em></figcaption></figure><h2 id=encode-binary-data-into-dna-sequence>Encode binary data into DNA sequence</h2><p>As an input file you can use any file you want:<ul><li>ASCII files,<li>Compiled programs,<li>Multimedia files (MP3, MP4, MVK, etc),<li>Images,<li>Database files,<li>etc.</ul><p>Note: If you would copy all the bytes from RAM to file or pipe data to file you | 46 | Dennis Myts)</em></figcaption></figure><h2 id=encode-binary-data-into-dna-sequence>Encode binary data into DNA sequence</h2><p>As an input file you can use any file you want:<ul><li>ASCII files,<li>Compiled programs,<li>Multimedia files (MP3, MP4, MVK, etc),<li>Images,<li>Database files,<li>etc.</ul><p>Note: If you would copy all the bytes from RAM to file or pipe data to file you |
| 47 | could encode also this data as long as you provide file pointer to the encoder.<h3 id=basic-encoding>Basic Encoding</h3><p>As already mentioned, the Basic Encoding is based on a simple mapping. Since DNA | 47 | could encode also this data as long as you provide file pointer to the encoder.<h3 id=basic-encoding>Basic Encoding</h3><p>As already mentioned, the Basic Encoding is based on a simple mapping. Since DNA |
| @@ -143,7 +143,7 @@ making progress. | |||
| 143 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Output image file length is 1.1 kB | 143 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Output image file length is 1.1 kB |
| 144 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Process took 19.036117ms | 144 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Process took 19.036117ms |
| 145 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Done ... | 145 | </span></span><span style=display:flex><span>2019/01/10 00:40:09 Done ... |
| 146 | </span></span></code></pre><p>After encoding into PNG format this file looks like this.<figure><img src=/posts/dna-sequence/quote.png alt="Encoded Quote in PNG format"><figcaption><p>The larger the input stream is the larger the PNG file would be.</figcaption></figure><p>Compiled basic Hello World C program with | 146 | </span></span></code></pre><p>After encoding into PNG format this file looks like this.<figure><img loading="lazy" src=/posts/dna-sequence/quote.png alt="Encoded Quote in PNG format"><figcaption><p>The larger the input stream is the larger the PNG file would be.</figcaption></figure><p>Compiled basic Hello World C program with |
| 147 | <a href=https://www.gnu.org/software/gcc/>GCC</a> would <a href=/posts/dna-sequence/sample.png>look | 147 | <a href=https://www.gnu.org/software/gcc/>GCC</a> would <a href=/posts/dna-sequence/sample.png>look |
| 148 | like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green>// gcc -O3 -o sample sample.c | 148 | like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green>// gcc -O3 -o sample sample.c |
| 149 | </span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>#include</span> <span style=color:#00f><stdio.h></span><span style=color:#00f> | 149 | </span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>#include</span> <span style=color:#00f><stdio.h></span><span style=color:#00f> |
| @@ -178,14 +178,14 @@ like</a>.<pre tabindex=0 style=background-color:#fff><code><span style=display:f | |||
| 178 | </span></span><span style=display:flex><span> --version Show application version. | 178 | </span></span><span style=display:flex><span> --version Show application version. |
| 179 | </span></span></code></pre><h2 id=benchmarks>Benchmarks</h2><p>First we generate some binary sample data with dd.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>dd <span style=color:#00f>if</span>=<(openssl enc -aes-256-ctr -pass pass:<span style=color:#a31515>"</span><span style=color:#00f>$(</span>dd <span style=color:#00f>if</span>=/dev/urandom bs=128 count=1 2>/dev/null | base64<span style=color:#00f>)</span><span style=color:#a31515>"</span> -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock | 179 | </span></span></code></pre><h2 id=benchmarks>Benchmarks</h2><p>First we generate some binary sample data with dd.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>dd <span style=color:#00f>if</span>=<(openssl enc -aes-256-ctr -pass pass:<span style=color:#a31515>"</span><span style=color:#00f>$(</span>dd <span style=color:#00f>if</span>=/dev/urandom bs=128 count=1 2>/dev/null | base64<span style=color:#00f>)</span><span style=color:#a31515>"</span> -nosalt < /dev/zero) of=1KB.bin bs=1KB count=1 iflag=fullblock |
| 180 | </span></span></code></pre><p>Our freshly generated 1KB file looks something like this (its full of garbage | 180 | </span></span></code></pre><p>Our freshly generated 1KB file looks something like this (its full of garbage |
| 181 | data as intended).<figure><img src=/posts/dna-sequence/sample-binary-file.png alt="Sample binary file 1KB"></figure><p>We create following binary files:<ul><li>1KB.bin<li>10KB.bin<li>100KB.bin<li>1MB.bin<li>10MB.bin<li>100MB.bin</ul><p>After this we create FASTA files for all the binary files by encoding them | 181 | data as intended).<figure><img loading="lazy" src=/posts/dna-sequence/sample-binary-file.png alt="Sample binary file 1KB"></figure><p>We create following binary files:<ul><li>1KB.bin<li>10KB.bin<li>100KB.bin<li>1MB.bin<li>10MB.bin<li>100MB.bin</ul><p>After this we create FASTA files for all the binary files by encoding them |
| 182 | into DNA sequence.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>./dnae-encode -i 100MB.bin -o 100MB.fa | 182 | into DNA sequence.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>./dnae-encode -i 100MB.bin -o 100MB.fa |
| 183 | </span></span></code></pre><p>Then we GZIP all the FASTA files to see how much the can be compressed.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>gzip -9 < 10MB.fa > 10MB.fa.gz | 183 | </span></span></code></pre><p>Then we GZIP all the FASTA files to see how much the can be compressed.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>gzip -9 < 10MB.fa > 10MB.fa.gz |
| 184 | </span></span></code></pre><figure><img src=/posts/dna-sequence/chart-speed.svg alt="Encode to FASTA"><figcaption><p>The speed increase that occurs when encoding to FASTA format.</figcaption></figure><figure><img src=/posts/dna-sequence/chart-size.svg alt="File sizes"><figcaption><p>Size of the out file after encoding.</figcaption></figure><p><a href=/posts/dna-sequence/benchmarks.csv>Download CSV file with benchmarks</a>.<h2 id=references>References</h2><ul><li><a href=https://www.techopedia.com/definition/948/encoding>https://www.techopedia.com/definition/948/encoding</a><li><a href=https://www.dna-worldwide.com/resource/160/history-dna-timeline>https://www.dna-worldwide.com/resource/160/history-dna-timeline</a><li><a href=https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/>https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/</a><li><a href=https://arxiv.org/abs/1801.04774>https://arxiv.org/abs/1801.04774</a><li><a href=https://en.wikipedia.org/wiki/FASTA_format>https://en.wikipedia.org/wiki/FASTA_format</a></ul></div></article></main><section><hr><h2>Posts from blogs I follow around the net</h2><ul><li><a href=https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWhyNotDirectoryToFilesystem target=_blank rel=noopener>One reason that ZFS can't turn a directory into a filesystem</a> — <a href=https://utcc.utoronto.ca/~cks/space/blog/>Chris's Wiki :: blog</a><div>One of the wishes that I and other people frequently have for ZFS | 184 | </span></span></code></pre><figure><img loading="lazy" src=/posts/dna-sequence/chart-speed.svg alt="Encode to FASTA"><figcaption><p>The speed increase that occurs when encoding to FASTA format.</figcaption></figure><figure><img loading="lazy" src=/posts/dna-sequence/chart-size.svg alt="File sizes"><figcaption><p>Size of the out file after encoding.</figcaption></figure><p><a href=/posts/dna-sequence/benchmarks.csv>Download CSV file with benchmarks</a>.<h2 id=references>References</h2><ul><li><a 
href=https://www.techopedia.com/definition/948/encoding>https://www.techopedia.com/definition/948/encoding</a><li><a href=https://www.dna-worldwide.com/resource/160/history-dna-timeline>https://www.dna-worldwide.com/resource/160/history-dna-timeline</a><li><a href=https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/>https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/</a><li><a href=https://arxiv.org/abs/1801.04774>https://arxiv.org/abs/1801.04774</a><li><a href=https://en.wikipedia.org/wiki/FASTA_format>https://en.wikipedia.org/wiki/FASTA_format</a></ul></div></article></main><section><hr><h2>Posts from blogs I follow around the net</h2><ul><li><a href=https://utcc.utoronto.ca/~cks/space/blog/linux/NFSv4ServerLockClients target=_blank rel=noopener>Finding which NFSv4 client owns a lock on a Linux NFS(v4) server</a> — <a href=https://utcc.utoronto.ca/~cks/space/blog/>Chris's Wiki :: blog</a><div>A while back I wrote an entry about finding which NFS client owns |
| 185 | is the ability to take an existing directory (and everything | 185 | a lock on a Linux NFS server, which turned |
| 186 | underneath it) in a ZFS filesystem and turn it into a sub-filesystem | 186 | out to be specific to NFS v3 (which I really should have seen coming, |
| 187 | of its own. One reason for wanting this is that a number of things | 187 | since it involved NLM and lockd). Finding the NFS v4 client that |
| 188 | are set and controlled on a per-filesyst…<li><a href=http://www.landley.net/notes-2023.html#28-10-2023 target=_blank rel=noopener>October 28, 2023</a> — <a href=http://www.landley.net/notes-2023.html>Rob Landley's Blog Thing for 2023</a><div>Oh good grief, two of my least favorite licensing people, Larry Rosen | 188 | owns a lock is, depending on your perspective, either simpl…<li><a href=http://www.landley.net/notes-2023.html#28-10-2023 target=_blank rel=noopener>October 28, 2023</a> — <a href=http://www.landley.net/notes-2023.html>Rob Landley's Blog Thing for 2023</a><div>Oh good grief, two of my least favorite licensing people, Larry Rosen |
| 189 | and Bradley Kuhn, are interacting on the OSI's license-discuss | 189 | and Bradley Kuhn, are interacting on the OSI's license-discuss |
| 190 | list where the're doing | 190 | list where the're doing |
| 191 | bad computer history and insisting that a guy Larry Rosen | 191 | bad computer history and insisting that a guy Larry Rosen |
