path: root/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
Diffstat (limited to 'content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md')
-rw-r--r--content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md364
1 files changed, 364 insertions, 0 deletions
diff --git a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
new file mode 100644
index 0000000..1b82cb6
--- /dev/null
+++ b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
@@ -0,0 +1,364 @@
---
title: What would DNA sound if synthesized to an audio file
url: /what-would-dna-sound-if-synthesized.html
date: 2022-07-05T12:00:00+02:00
type: post
draft: false
---

## Introduction

Lately, I have been thinking a lot about the nature of life, what the building
blocks of life are, and things like that. It's remarkable how complex and, on
the other hand, how simple creation is when you look at it. The miracle of life
keeps us grounded when our imagination goes wild. If DNA is the building block
of life, you could consider it an API that nature provided us to better
understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided
path to creating artificial general intelligence. What would the building blocks
of our creation look like? Is compression really the ultimate storage of
information? Will our creation also ponder these questions when creating new
worlds for themselves, or will we just disappear into the vastness of
possibilities? It is a little offensive that we are playing God whilst being
completely ignorant of our own reality. Who knows! Like many other
breakthroughs, this one will also come at a cost not known to us until it
finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences
into audio files for us to listen to. I am not the first one to do this, nor
will I be the last. But it is an interesting exercise in better understanding
the relationship between art and science. Maybe listening to DNA instead of
parsing it will lead to a better understanding, or at least to enjoying the
creation and cryptic nature of life.

## DNA encoding and primer example

I explored DNA about three years ago in my post [Encoding binary data into DNA
sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all
sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting to DNA, I will be
generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A4   | 440 Hz    |
| **C** (Cytosine) | C5   | 523.25 Hz |
| **G** (Guanine)  | G5   | 783.99 Hz |
| **T** (Thymine)  | D5   | 587.33 Hz |

Since there is no T note in the equal-tempered scale, I chose D to represent T.

You can look the values up in [Frequencies for equal-tempered scale, A4 = 440
Hz](https://pages.mtu.edu/~suits/notefreqs.html). That table assumes
`Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`, which only matters for
the wavelengths it lists, not for the frequencies used here.
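The frequencies in the table can be reproduced from the equal-temperament
formula, where every semitone multiplies the frequency by the twelfth root of
two. Below is a small sketch of that calculation. The helper name
`note_frequency` and the use of MIDI note numbers (A4 = 69) to count semitones
are assumptions made for illustration; they are not part of the original script.

```python
A4_FREQUENCY = 440.0  # reference pitch in Hz
A4_MIDI_NOTE = 69     # MIDI note number of A4, used here only to count semitones

def note_frequency(midi_note):
    """Return the equal-tempered frequency in Hz for a MIDI note number."""
    semitones_from_a4 = midi_note - A4_MIDI_NOTE
    return A4_FREQUENCY * 2 ** (semitones_from_a4 / 12)

# A4, C5, D5 and G5 are the four notes used for A, C, T and G
for name, midi_note in [("A4", 69), ("C5", 72), ("D5", 74), ("G5", 79)]:
    print(name, round(note_frequency(midi_note), 2))
```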

Now that we have this out of the way, we can also brush up on DNA sequences a
bit. This is a famous quote I also used for the encoding tests, and it goes
like this.

> How wonderful that we have met with a paradox. Now we have some hope of
> making progress.
> ― Niels Bohr

```txt
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is what we are going to work with to get things rolling while creating the
parser and waveform generator.

## Parsing DNA data

This step is rather simple. All we need to do is parse the input DNA sequence,
given in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), which is
well known in [bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics),
and extract single nucleotides that will be converted into separate tones based
on the equal-tempered scale explained above.

```python
nucleotide_tone_map = {
    'A': 440,
    'C': 523.25,
    'G': 783.99,
    'T': 587.33,  # converted to D
}

def split(word):
    # break the sequence string into a list of single characters
    return list(word)

def generate_from_dna_sequence(sequence):
    # for now, just print each nucleotide with its tone frequency in Hz
    for nucleotide in split(sequence):
        print(nucleotide, nucleotide_tone_map[nucleotide])
```
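The snippet above expects a bare sequence string. A FASTA file also contains
header lines starting with `>` and wraps the sequence over several lines, so a
small reader is needed in front of it. Here is a minimal sketch, assuming that
all records in the file should simply be concatenated and that anything other
than A, C, G and T (for example `N`) can be skipped. The function name
`read_fasta_sequence` is hypothetical.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of FASTA input, skipping '>' header lines.

    Accepts any iterable of text lines (an open file works), upper-cases the
    data and keeps only A, C, G and T so a later tone lookup never fails.
    """
    nucleotides = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('>'):
            continue  # skip blank lines and header lines such as ">SEQ1"
        nucleotides.extend(char for char in line.upper() if char in 'ACGT')
    return ''.join(nucleotides)

example = [">SEQ1\n", "GACAGCTTGT\n", "gtacnAAGTG\n"]
print(read_fasta_sequence(example))
```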

## Generating sine wave

Because we are essentially creating a long stream of notes, we will be appending
sine tones to a global list that we will later use to create a WAV file.

```python
import math

sample_rate = 44100.0  # samples per second
audio = []             # accumulated samples, each between -1.0 and 1.0

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    # number of samples needed for the requested duration
    num_samples = int(duration_milliseconds * (sample_rate / 1000.0))

    for x in range(num_samples):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more
aggressive, you could try a square or sawtooth waveform.
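As a rough sketch of what that could look like, the functions below compute a
single sample of a square and a sawtooth wave. They are not part of the original
script; the names `square_sample` and `sawtooth_sample` are made up for this
example, and these naive versions will alias at higher frequencies.

```python
import math

def square_sample(freq, x, sample_rate=44100.0):
    """Square wave: +1.0 while the sine is non-negative, otherwise -1.0."""
    return 1.0 if math.sin(2 * math.pi * freq * (x / sample_rate)) >= 0 else -1.0

def sawtooth_sample(freq, x, sample_rate=44100.0):
    """Sawtooth wave: ramps from -1.0 up towards 1.0 once per period."""
    phase = (freq * x / sample_rate) % 1.0  # position inside the current period
    return 2.0 * phase - 1.0

# first few samples of a 440 Hz tone
print([square_sample(440.0, x) for x in range(3)])
print([round(sawtooth_sample(440.0, x), 4) for x in range(3)])
```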

## Generating a WAV file from accumulated sine waves

```python
import wave
import struct

def save_wav(file_name):
    nchannels = 1  # mono
    sampwidth = 2  # bytes per sample (16 bit)
    nframes = len(audio)
    comptype = 'NONE'
    compname = 'not compressed'

    with wave.open(file_name, 'wb') as wav_file:
        wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))

        # WAV stores 16-bit samples as little-endian signed integers
        for sample in audio:
            wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))
```

44100 Hz is the industry standard sample rate, known as CD quality. If you need
to save on file size, you can adjust it downwards. The standard for low quality
is 8000 Hz, or 8 kHz.

WAV files here use short, 16-bit, signed integers for the sample size. So, we
multiply the floating-point data we have by 32767, the maximum value for a
short integer.
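One thing to watch for with this scaling is that a sample outside the -1.0 to
1.0 range (for example when `volume` is above 1.0) no longer fits into 16 bits,
and `struct.pack` raises an error. Below is a small sketch of a conversion
helper that clamps first. `to_int16` is a hypothetical name, not something from
the original script.

```python
import struct

def to_int16(sample):
    """Clamp a float sample to [-1.0, 1.0] and scale it to a signed 16-bit integer."""
    clamped = max(-1.0, min(1.0, sample))
    return int(clamped * 32767.0)

# '<h' packs a little-endian signed short, the layout used for 16-bit WAV samples
for value in (0.0, 0.5, 1.0, -1.0, 1.7):
    print(value, to_int16(value), struct.pack('<h', to_int16(value)).hex())
```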

> It is theoretically possible to use the floating point -1.0 to 1.0 data
> directly in a WAV file, but not obvious how to do that using the wave module
> in Python.

## Generating Spectrograms

I have tried two methods of doing this and both were just fine. I opted,
however, to use [SoX - Sound eXchange, the Swiss Army knife of audio
manipulation](https://linux.die.net/man/1/sox) because it didn't require
anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

An example spectrogram of the first movement of Ludwig van Beethoven's Symphony
No. 6.

<audio controls>
  <source src="/assets/posts/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/assets/posts/dna-synthesized/symphony-no6-1st-movement.png)

The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/).
This requires an intermediary step, however.

```shell
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

The input file `audio.gpi` that is passed to gnuplot looks something like this.

```txt
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

What I did was take interesting parts of animal genomes and feed them to a tone
generator script. This generated WAV files, which I converted to MP3 so they can
be played in a browser. The last step was creating a spectrogram from each WAV
file.
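Put together, the tone generator step could look roughly like the sketch below.
It is a self-contained approximation assembled from the snippets above, not the
exact script used for the recordings. The function name `dna_to_wav`, the 100 ms
note length and the volume are assumptions.

```python
import math
import struct
import wave

NUCLEOTIDE_TONE_MAP = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}
SAMPLE_RATE = 44100

def dna_to_wav(sequence, file_name, note_ms=100, volume=0.8):
    """Write one sine tone per nucleotide into a mono 16-bit WAV file.

    Characters that are not A, C, G or T are skipped. Returns the number of
    frames written.
    """
    samples_per_note = int(SAMPLE_RATE * note_ms / 1000)
    frames = bytearray()
    for nucleotide in sequence.upper():
        freq = NUCLEOTIDE_TONE_MAP.get(nucleotide)
        if freq is None:
            continue
        for x in range(samples_per_note):
            sample = volume * math.sin(2 * math.pi * freq * x / SAMPLE_RATE)
            frames += struct.pack('<h', int(sample * 32767.0))

    with wave.open(file_name, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2

if __name__ == '__main__':
    print(dna_to_wav('GACAGCTTGT', 'output.wav'))
```

The resulting `output.wav` can then be passed to the `sox` command shown
earlier to produce the spectrogram.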

### Niels Bohr quote

<audio controls>
  <source src="/assets/posts/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of a mouse genome, `Mus_musculus.GRCm39.dna.nonchromosomal`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of a bison genome, `Bison_bison_bison.Bison_UMD1.0.cdna`. You can
get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of a cattle (*Bos taurus*) genome, `Bos_taurus.ARS-UCD1.2.cdna`.
You can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to my
[Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a
really cool piece of equipment that supports MIDI in via USB and a 3.5 mm audio
jack.

The Elektron is connected to my MacBook via a USB cable, and its audio out is
patched to a Sony Bluetooth speaker I have that supports 3.5 mm audio in. The
Elektron doesn't have internal speakers.

![](/assets/posts/dna-synthesized/elektron/IMG_0619.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0620.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0622.jpg)

For communicating with the Elektron, I chose the `pygame` Python module, which
has MIDI support built in. With this, it was rather simple to send notes to the
device. All I did was map the nucleotides to MIDI notes.

Before all of this, I also opened the Audio MIDI Setup app on macOS and checked
MIDI Studio by pressing ⌘-2.

![](/assets/posts/dna-synthesized/elektron/midi-studio.jpg)

The whole script that parses the sequence and sends notes to the Elektron looks
like this.

```python
import pygame.midi
import time

pygame.midi.init()

# print device information to help find the right output id
print(pygame.midi.get_default_output_id())
print(pygame.midi.get_device_info(0))

# the Elektron is opened here as output device 1; adjust the id to your setup
player = pygame.midi.Output(1)
player.set_instrument(2)

def send_note(note, velocity):
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)


# MIDI note numbers are limited to 0-127, so the notes from the
# frequency table are used here
nucleotide_midi_map = {
    'A': 69,  # A4
    'C': 72,  # C5
    'G': 79,  # G5
    'T': 74,  # is D (D5)
}

with open("quote.fa") as f:
    # skip FASTA header lines and join the rest into one sequence
    sequence = ''.join(line.strip() for line in f if not line.startswith('>'))

for nucleotide in sequence:
    print("Playing nucleotide {} with MIDI note {}".format(
        nucleotide, nucleotide_midi_map[nucleotide]))
    send_note(nucleotide_midi_map[nucleotide], 127)

del player
pygame.midi.quit()
```

<video src="/assets/posts/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting if I chose different
instruments for different nucleotides, or did more funky stuff with the
Elektron. But for now, this should be enough. It is just a proof of concept.
Something to play around with.
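One way to try the "different instruments" idea is to give every nucleotide its
own MIDI channel, on the assumption that each track of the device listens on a
separate channel (check the MIDI settings of your device; the channel and note
numbers below are guesses). The sketch only builds the list of events, so it
runs without any hardware. The names `NUCLEOTIDE_TRACKS` and `plan_events` are
hypothetical.

```python
# (note, channel) per nucleotide; pygame.midi channels are 0-based,
# so channel 0 here is MIDI channel 1
NUCLEOTIDE_TRACKS = {
    'A': (60, 0),
    'C': (60, 1),
    'G': (60, 2),
    'T': (60, 3),
}

def plan_events(sequence, velocity=100):
    """Turn a DNA sequence into a list of (note, velocity, channel) tuples."""
    events = []
    for nucleotide in sequence.upper():
        if nucleotide not in NUCLEOTIDE_TRACKS:
            continue  # ignore anything that is not A, C, G or T
        note, channel = NUCLEOTIDE_TRACKS[nucleotide]
        events.append((note, velocity, channel))
    return events

print(plan_events("GATC"))
```

Each tuple can then be sent with `player.note_on(note, velocity, channel)` and
`player.note_off(note, velocity, channel)` from the script above.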

## Going even further

As you have probably noticed, the end results are quite similar to each other.
This is to be expected because we are essentially operating with only 4 notes.
What could make this more interesting is using something like
[SuperCollider](https://supercollider.github.io/) to create more interesting
sounds, for example by transposing notes or applying effects based on repeated
data in a sequence. The possibilities are endless.
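As one example of using repeated data, the sketch below raises a note by one
semitone for every consecutive repeat of the same nucleotide, so a run like
`AAA` climbs instead of droning. This is only an illustration in Python rather
than SuperCollider; the function name `transpose_repeats`, the MIDI note
mapping and the one-semitone step are assumptions.

```python
BASE_NOTES = {'A': 69, 'C': 72, 'G': 79, 'T': 74}  # MIDI notes A4, C5, G5, D5

def transpose_repeats(sequence, step=1, max_note=127):
    """Return one MIDI note per nucleotide, shifted up by `step` semitones
    for each immediate repetition of the previous nucleotide."""
    notes = []
    previous = None
    run_length = 0
    for nucleotide in sequence.upper():
        if nucleotide not in BASE_NOTES:
            continue  # skip characters without a note
        run_length = run_length + 1 if nucleotide == previous else 0
        notes.append(min(BASE_NOTES[nucleotide] + run_length * step, max_note))
        previous = nucleotide
    return notes

print(transpose_repeats("AAACGGT"))
```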

It is really astonishing what can be achieved with a little bit of code and an
idea. I could see this becoming an interesting background soundscape instrument
if done properly. It could replace a random note generator with something more
intriguing, biological, natural.

I actually find the results fascinating. I took some time and listened to this
music of nature. Even though it's quite the same, it's also quite different.
The subtle differences on repeat kind of create music on their own. Makes you
wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to
make things as energy efficient as possible.