| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2020-03-25 05:19:49 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2020-03-25 05:19:49 +0100 |
| commit | f7eefe654a8eb27b4ac2ac10c033cbdfa85af567 (patch) | |
| tree | c42431d9b175a0d66e05393b7843361fd726acba /content | |
| parent | ed161e7fb20a697ecba070ef7db4c231d700f245 (diff) | |
| download | mitjafelicijan.com-f7eefe654a8eb27b4ac2ac10c033cbdfa85af567.tar.gz | |
Cleaned up some content
Diffstat (limited to 'content')
| -rw-r--r-- | content/encoding-binary-data-into-dna-sequence.md | 22 |
1 files changed, 11 insertions, 11 deletions
diff --git a/content/encoding-binary-data-into-dna-sequence.md b/content/encoding-binary-data-into-dna-sequence.md
index a4f8b86..068aa32 100644
--- a/content/encoding-binary-data-into-dna-sequence.md
+++ b/content/encoding-binary-data-into-dna-sequence.md
| @@ -39,7 +39,7 @@ My interests in this field are purely in encoding processes and experimental tes | |||
| 39 | 39 | ||
| 40 | ## Data encoding | 40 | ## Data encoding |
| 41 | 41 | ||
| 42 | **TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process [^1]. | 42 | **TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process. |
| 43 | 43 | ||
| 44 | Encoding is the process of converting data into a format required for a number of information processing needs, including: | 44 | Encoding is the process of converting data into a format required for a number of information processing needs, including: |
| 45 | 45 | ||
| @@ -47,7 +47,7 @@ Encoding is the process of converting data into a format required for a number o | |||
| 47 | - Data transmission, storage and compression/decompression | 47 | - Data transmission, storage and compression/decompression |
| 48 | - Application data processing, such as file conversion | 48 | - Application data processing, such as file conversion |
| 49 | 49 | ||
| 50 | Encoding can have two meanings[^1]: | 50 | Encoding can have two meanings: |
| 51 | 51 | ||
| 52 | - In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher. | 52 | - In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher. |
| 53 | - In electronics, encoding refers to analog to digital conversion. | 53 | - In electronics, encoding refers to analog to digital conversion. |
| @@ -69,7 +69,7 @@ Encoding can have two meanings[^1]: | |||
| 69 | - **2000** – Genetic code of the fruit fly is decoded. | 69 | - **2000** – Genetic code of the fruit fly is decoded. |
| 70 | - **2002** – Mouse is the first mammal to have its genome decoded. | 70 | - **2002** – Mouse is the first mammal to have its genome decoded. |
| 71 | - **2003** – The Human Genome Project is completed. | 71 | - **2003** – The Human Genome Project is completed. |
| 72 | - **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup [^2]. | 72 | - **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup. |
| 73 | 73 | ||
| 74 | ## What is DNA? | 74 | ## What is DNA? |
| 75 | 75 | ||
| @@ -83,7 +83,7 @@ The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cyto | |||
| 83 | 83 | ||
| 84 |  | 84 |  |
| 85 | 85 | ||
| 86 | *DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts) [^3]* | 86 | *DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)* |
| 87 | 87 | ||
| 88 | ## Encode binary data into DNA sequence | 88 | ## Encode binary data into DNA sequence |
| 89 | 89 | ||
| @@ -135,13 +135,13 @@ begin | |||
| 135 | end | 135 | end |
| 136 | ``` | 136 | ``` |
| 137 | 137 | ||
| 138 | Another encoding would be **Goldman encoding**. Using this encoding helps with Nonsense mutation (amino acids replaced by a stop codon) that occurs and is the most problematic during translation because it leads to truncated amino acid sequences, which in turn results in truncated proteins. [^4] | 138 | Another encoding would be **Goldman encoding**. Using this encoding helps with Nonsense mutation (amino acids replaced by a stop codon) that occurs and is the most problematic during translation because it leads to truncated amino acid sequences, which in turn results in truncated proteins. |
| 139 | 139 | ||
| 140 | [Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU) | 140 | [Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU) |
| 141 | 141 | ||
| 142 | ### FASTA file format | 142 | ### FASTA file format |
| 143 | 143 | ||
| 144 | In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics. [^5] | 144 | In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics. |
| 145 | 145 | ||
| 146 | The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored). | 146 | The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored). |
| 147 | 147 | ||
| @@ -339,8 +339,8 @@ gzip -9 < 10MB.fa > 10MB.fa.gz | |||
| 339 | 339 | ||
| 340 | ## References | 340 | ## References |
| 341 | 341 | ||
| 342 | [^1]: https://www.techopedia.com/definition/948/encoding | 342 | - https://www.techopedia.com/definition/948/encoding |
| 343 | [^2]: https://www.dna-worldwide.com/resource/160/history-dna-timeline | 343 | - https://www.dna-worldwide.com/resource/160/history-dna-timeline |
| 344 | [^3]: https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/ | 344 | - https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/ |
| 345 | [^4]: https://arxiv.org/abs/1801.04774 | 345 | - https://arxiv.org/abs/1801.04774 |
| 346 | [^5]: https://en.wikipedia.org/wiki/FASTA_format | 346 | - https://en.wikipedia.org/wiki/FASTA_format |
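
The FASTA paragraph in the diffed article describes a simple layout: a header line starting with ">", optional ";" comment lines that software ignores, and then the sequence itself. A minimal sketch of a reader for that layout might look like the following. The function name `parse_fasta` and the sample records are hypothetical, and the sketch assumes the modern convention in which ";" lines are always skipped rather than treated as a header.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (text after '>') to its concatenated sequence.

    Assumption: lines starting with ';' are comments and are skipped,
    as are blank lines. Sequence lines before any header are ignored.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or legacy comment
        if line.startswith(">"):
            header = line[1:].strip()  # start a new record
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


# Hypothetical sample data, not taken from the article.
sample = """\
>SEQ1 hypothetical example record
ACGTACGT
; legacy comment line, skipped
TTGA
>SEQ2
GGCC
"""

records = parse_fasta(sample.splitlines())
for name, sequence in records.items():
    print(name, len(sequence))
```

Keeping the whole header line as the key is a simplification; real tools often split off the first token as the accession number and treat the rest as a description.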
