---
Title: What would DNA sound like if synthesized to an audio file
Description: What would DNA sound like if synthesized
Slug: what-would-dna-sound-if-synthesized
Listing: true
Created: 2022-07-05
Tags: []
---

1. [Introduction](#introduction)
2. [DNA encoding and primer example](#dna-encoding-and-primer-example)
3. [Parsing DNA data](#parsing-dna-data)
4. [Generating sine wave](#generating-sine-wave)
5. [Generating a WAV file from accumulated sine waves](#generating-a-wav-file-from-accumulated-sine-waves)
6. [Generating Spectograms](#generating-spectograms)
7. [Pre-generated sequences](#pre-generated-sequences)
    1. [Niels Bohr quote](#niels-bohr-quote)
    2. [Mouse](#mouse)
    3. [Bison](#bison)
    4. [Taurus](#taurus)
8. [Going even further](#going-even-further)

## Introduction

Lately, I have been thinking a lot about the nature of life and what its foundational building blocks are. It's remarkable how complex, and at the same time how simple, creation is when you look at it closely. The miracle of life keeps us grounded when our imagination goes wild. If DNA is the building block of life, you could consider it an API that nature provided so we can better understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided path toward creating artificial general intelligence. What would the building blocks of our creation look like? Is compression really the ultimate form of information storage? Will our creations also ponder these questions when creating new worlds for themselves, or will we just disappear into the vastness of possibilities? It is a little offensive that we are playing God whilst being completely ignorant of our own reality. Who knows! Like many other breakthroughs, this one will also come at a cost not known to us until it finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences into audio files for us to listen to. I am not the first to do this, nor will I be the last. But it is an interesting exercise in better understanding the relationship between art and science. Maybe listening to DNA instead of parsing it will lead to a better understanding, or at least enjoyment, of the creation and cryptic nature of life.

## DNA encoding and primer example

I explored DNA about 3 years ago in my post [Encoding binary data into DNA sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting data to DNA, I will be generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A    | 440 Hz    |
| **C** (Cytosine) | C    | 523.25 Hz |
| **G** (Guanine)  | G    | 783.99 Hz |
| **T** (Thymine)  | D    | 587.33 Hz |

Since there is no note named T in the equal-tempered scale, I chose D to represent T.

You can check these values against [Frequencies for equal-tempered scale, A4 = 440 Hz](https://pages.mtu.edu/~suits/notefreqs.html). That page also states `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`, which it uses to compute wavelengths; it does not affect the frequencies used here.

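The four frequencies in the table follow from the equal-tempered rule that every semitone multiplies the frequency by 2^(1/12), starting from A4 = 440 Hz. Here is a minimal sketch of that calculation. The helper `note_frequency` and the semitone offsets are my own illustration, and reading the notes as A4, C5, D5 and G5 is an assumption based on the frequency values.

```python
A4_FREQUENCY = 440.0  # reference pitch in Hz

def note_frequency(semitones_from_a4):
    # Equal temperament: each semitone multiplies the frequency by 2^(1/12).
    return A4_FREQUENCY * 2 ** (semitones_from_a4 / 12)

# Semitone offsets above A4 (assumed octaves: A4, C5, D5 for T, G5).
NOTE_OFFSETS = {'A': 0, 'C': 3, 'T': 5, 'G': 10}

frequencies = {
    nucleotide: round(note_frequency(offset), 2)
    for nucleotide, offset in NOTE_OFFSETS.items()
}
print(frequencies)
```

Rounded to two decimals, these match the values listed on the linked page.
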
Now that we have this out of the way, we can also brush up on DNA encoding a bit. This is a famous quote I also used for the encoding tests, and it goes like this.

> How wonderful that we have met with a paradox. Now we have some hope of making progress.
> ― Niels Bohr

Encoded as a DNA sequence in FASTA format, it looks like this.

```
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is what we are going to work with to get things rolling when creating the parser and waveform generator.

## Parsing DNA data

This step is rather simple. All we need to do is parse the input DNA sequence, given in the [FASTA format](https://en.wikipedia.org/wiki/FASTA_format) well known in [bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), and extract the individual nucleotides, which are then converted into separate tones based on the equal-tempered scale described above.

```python
# Map each nucleotide to the frequency (in Hz) of the note that represents it.
nucleotide_tone_map = {
    'A': 440,
    'C': 523.25,
    'G': 783.99,
    'T': 587.33,  # converted to D
}

def split(word):
    # Break a string into a list of single characters.
    return list(word)

def generate_from_dna_sequence(sequence):
    # Print each nucleotide next to the frequency it maps to.
    for nucleotide in split(sequence):
        print(nucleotide, nucleotide_tone_map[nucleotide])
```

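The snippet above expects a bare string of nucleotides, while a FASTA file also contains `>` header lines and line breaks. A minimal sketch of that missing parsing step could look like this. `parse_fasta` and `nucleotides` are hypothetical helpers, not part of the original script, and skipping everything except A, C, G and T (real genome files often contain `N`) is my own assumption about how to handle unknown bases.

```python
def parse_fasta(text):
    # Return a dict mapping each record header to its joined sequence.
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header line: everything after '>' names the record.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence line: normalise case and collect it.
            records[name].append(line.upper())
    return {key: ''.join(parts) for key, parts in records.items()}

def nucleotides(sequence):
    # Keep only A, C, G and T; skip N and other ambiguity codes.
    return [base for base in sequence if base in 'ACGT']
```

The result of `nucleotides(parse_fasta(text)['SEQ1'])` is a list of single characters that can be looked up in the tone map one by one.
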
## Generating sine wave

Because we are essentially creating a long stream of notes, we append each sine tone to a global array that is later used to create the WAV file.

```python
import math

sample_rate = 44100.0  # samples per second (CD quality)
audio = []             # accumulated samples, floats between -1.0 and 1.0

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    # Append a sine tone of the given frequency and length to `audio`.
    num_samples = duration_milliseconds * (sample_rate / 1000.0)

    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more aggressive, you could try a square or sawtooth waveform.

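As a rough sketch of what those alternative waveforms could look like: the functions below are my own illustration rather than part of the original script, and `render` returns a list instead of appending to the global array so the example stays self-contained.

```python
def square_sample(freq, t):
    # +1.0 during the first half of each cycle, -1.0 during the second.
    return 1.0 if (freq * t) % 1.0 < 0.5 else -1.0

def sawtooth_sample(freq, t):
    # Ramps linearly from -1.0 towards 1.0 once per cycle, then jumps back.
    return 2.0 * ((freq * t) % 1.0) - 1.0

def render(sample_fn, freq=440.0, duration_milliseconds=500,
           sample_rate=44100.0, volume=1.0):
    # Evaluate the waveform function at every sample position.
    num_samples = int(sample_rate * duration_milliseconds / 1000)
    return [volume * sample_fn(freq, x / sample_rate)
            for x in range(num_samples)]
```

These naive versions have hard edges, so they sound harsher than the sine and produce some aliasing at higher frequencies.
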
## Generating a WAV file from accumulated sine waves

```python
import wave
import struct

def save_wav(file_name):
    # Write the accumulated `audio` samples as a mono 16-bit PCM WAV file.
    wav_file = wave.open(file_name, 'w')
    nchannels = 1  # mono
    sampwidth = 2  # bytes per sample (16 bit)

    nframes = len(audio)
    comptype = 'NONE'
    compname = 'not compressed'
    wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))

    for sample in audio:
        # '<h' packs a little-endian signed 16-bit integer, as WAV expects.
        wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))

    wav_file.close()
```

44100 Hz is the industry-standard sample rate for CD-quality audio. If you need to save on file size, you can lower it. A common low-quality rate is 8000 Hz (8 kHz).

The WAV files here use signed 16-bit short integers for each sample, so we multiply the floating-point data by 32767, the maximum value of a signed 16-bit integer.

> It is theoretically possible to use the floating point -1.0 to 1.0 data directly in a WAV file, but not obvious how to do that using the wave module in Python.

## Generating Spectograms

I tried two methods of doing this, and both worked just fine. I opted for [SoX - Sound eXchange, the Swiss Army knife of audio manipulation](https://linux.die.net/man/1/sox) because it didn't require anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

An example spectrogram of the first movement of Ludwig van Beethoven's Symphony No. 6.

<audio controls>
  <source src="/assets/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/assets/dna-synthesized/symphony-no6-1st-movement.png)

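A spectrogram shows how much energy each frequency carries over time. As a rough illustration of that idea (not how SoX is implemented), the sketch below measures the energy at just our four note frequencies within one window of samples and picks the strongest. `bin_power` and `strongest_nucleotide` are hypothetical helpers, and the window is assumed to cover a single note.

```python
import math

TONES = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def bin_power(samples, freq, sample_rate=44100):
    # Correlate the window with a cosine and a sine at `freq`,
    # which is a discrete Fourier transform for one frequency.
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return re * re + im * im

def strongest_nucleotide(samples, sample_rate=44100):
    # Return the nucleotide whose tone carries the most energy.
    return max(TONES, key=lambda n: bin_power(samples, TONES[n], sample_rate))
```

Running this over consecutive note-length windows of the generated audio should recover the original sequence, which is a handy sanity check for the synthesis.
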
The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/). This requires an intermediary step, however, and it plots the waveform (amplitude over time) rather than a spectrogram.

```shell
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

The input file `audio.gpi` passed to gnuplot looks something like this.

```
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

### Niels Bohr quote

<audio controls>
  <source src="/assets/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of the mouse genome `Mus_musculus.GRCm39.dna.nonchromosomal`. You can get the [genome data here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/assets/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of the bison cDNA data `Bison_bison_bison.Bison_UMD1.0.cdna`. You can get the [sequence data here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/assets/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of the cattle (*Bos taurus*) cDNA data `Bos_taurus.ARS-UCD1.2.cdna`. You can get the [sequence data here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/assets/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/taurus/spectogram.png)

## Going even further

As you have probably noticed, the end results sound quite similar to each other. This is to be expected, because we are essentially operating with only 4 notes. What could make this more interesting is using something like [SuperCollider](https://supercollider.github.io/) to create richer sounds, for example by transposing notes or applying effects based on repeated data in a sequence. The possibilities are endless.

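Even without SuperCollider, the idea of transposing notes based on repeated data can be tried in plain Python. The sketch below is one hypothetical rule of my own, not something used for the recordings above: within a run of identical nucleotides, each repeat is played one octave higher than the previous one, capped at `max_octaves`.

```python
from itertools import groupby

TONES = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def transposed_tones(sequence, max_octaves=2):
    # Within a run of identical nucleotides, every repeat goes one octave
    # higher than the one before, up to `max_octaves` above the base note.
    frequencies = []
    for nucleotide, run in groupby(sequence):
        if nucleotide not in TONES:
            continue
        for repeat, _ in enumerate(run):
            octave = min(repeat, max_octaves)
            frequencies.append(TONES[nucleotide] * 2 ** octave)
    return frequencies
```

The resulting list of frequencies can be fed to the sine generator one tone at a time, so long runs such as `AAAA` climb upwards instead of repeating the same beep.
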
I actually find the results fascinating. I took some time and listened to this music of nature. Even though the pieces sound much the same, they are also quite different. The subtle differences on repeat kind of create music of their own. Makes you wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to make things as energy efficient as possible.