path: root/content/encoding-binary-data-into-dna-sequence.md
author Mitja Felicijan <mitja.felicijan@gmail.com> 2020-03-25 05:19:49 +0100
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2020-03-25 05:19:49 +0100
commit f7eefe654a8eb27b4ac2ac10c033cbdfa85af567
tree c42431d9b175a0d66e05393b7843361fd726acba /content/encoding-binary-data-into-dna-sequence.md
parent ed161e7fb20a697ecba070ef7db4c231d700f245
Cleaned up some content
Diffstat (limited to 'content/encoding-binary-data-into-dna-sequence.md')
-rw-r--r-- content/encoding-binary-data-into-dna-sequence.md | 22
1 files changed, 11 insertions, 11 deletions
diff --git a/content/encoding-binary-data-into-dna-sequence.md b/content/encoding-binary-data-into-dna-sequence.md
index a4f8b86..068aa32 100644
--- a/content/encoding-binary-data-into-dna-sequence.md
+++ b/content/encoding-binary-data-into-dna-sequence.md
@@ -39,7 +39,7 @@ My interests in this field are purely in encoding processes and experimental tes
 
 ## Data encoding
 
-**TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process [^1].
+**TL;DR:** Encoding involves the use of a code to change original data into a form that can be used by an external process.
 
 Encoding is the process of converting data into a format required for a number of information processing needs, including:
 
@@ -47,7 +47,7 @@ Encoding is the process of converting data into a format required for a number o
 - Data transmission, storage and compression/decompression
 - Application data processing, such as file conversion
 
-Encoding can have two meanings[^1]:
+Encoding can have two meanings:
 
 - In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent cipher.
 - In electronics, encoding refers to analog to digital conversion.
@@ -69,7 +69,7 @@ Encoding can have two meanings[^1]:
 - **2000** – Genetic code of the fruit fly is decoded.
 - **2002** – Mouse is the first mammal to have its genome decoded.
 - **2003** – The Human Genome Project is completed.
-- **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup [^2].
+- **2013** – DNA Worldwide and Eurofins Forensic discover identical twins have differences in their genetic makeup.
 
 ## What is DNA?
 
@@ -83,7 +83,7 @@ The nucleotide in DNA consists of a sugar (deoxyribose), one of four bases (cyto
 
 ![DNA](/assets/dna-sequence/dna-basics.jpg#center)
 
-*DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts) [^3]*
+*DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)*
 
 ## Encode binary data into DNA sequence
 
@@ -135,13 +135,13 @@ begin
 end
 ```
 
-Another encoding would be **Goldman encoding**. Using this encoding helps with Nonsense mutation (amino acids replaced by a stop codon) that occurs and is the most problematic during translation because it leads to truncated amino acid sequences, which in turn results in truncated proteins. [^4]
+Another encoding would be **Goldman encoding**. Using this encoding helps with Nonsense mutation (amino acids replaced by a stop codon) that occurs and is the most problematic during translation because it leads to truncated amino acid sequences, which in turn results in truncated proteins.
 
 [Where to store big data? In DNA: Nick Goldman at TEDxPrague](https://www.youtube.com/watch?v=a4PiGWNsIEU)
 
 ### FASTA file format
 
-In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics. [^5]
+In bioinformatics, FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a standard in the field of bioinformatics.
 
 The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";" (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored).
 
@@ -339,8 +339,8 @@ gzip -9 < 10MB.fa > 10MB.fa.gz
 
 ## References
 
-[^1]: https://www.techopedia.com/definition/948/encoding
-[^2]: https://www.dna-worldwide.com/resource/160/history-dna-timeline
-[^3]: https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/
-[^4]: https://arxiv.org/abs/1801.04774
-[^5]: https://en.wikipedia.org/wiki/FASTA_format
+- https://www.techopedia.com/definition/948/encoding
+- https://www.dna-worldwide.com/resource/160/history-dna-timeline
+- https://opentextbc.ca/biology/chapter/9-1-the-structure-of-dna/
+- https://arxiv.org/abs/1801.04774
+- https://en.wikipedia.org/wiki/FASTA_format