---
title: What would DNA sound like if synthesized to an audio file
permalink: /what-would-dna-sound-if-synthesized.html
date: 2022-07-05T12:00:00+02:00
layout: post
type: post
draft: false
---

## Introduction

Lately, I have been thinking a lot about the nature of life, what the building
blocks of life are and things like that. It's remarkable how complex and, at the
same time, how simple creation is when you look at it. The miracle of life keeps
us grounded when our imagination goes wild. If DNA is the building block of
life, you could consider it an API nature provided us to better understand all
of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided
path to creating artificial general intelligence. What would the building blocks
of our creation look like? Is compression really the ultimate way to store
information? Will our creation also ponder these questions when creating new
worlds for itself, or will we just disappear into the vastness of possibilities?
It is a little offensive that we are playing God whilst being completely
ignorant of our own reality. Who knows! Like many other breakthroughs, this one
will also come at a cost that will still be unknown to us when it finally
happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences
into audio files for us to listen to. I am not the first one to do this, nor
will I be the last. But it is an interesting exercise in better understanding
the relationship between art and science. Maybe listening to DNA instead of
parsing it will lead to a better understanding, or at least to enjoying the
creation and cryptic nature of life.

## DNA encoding and primer example

I explored DNA about three years ago in my post [Encoding binary data into DNA
sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all
sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting data to DNA, I will
be generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A4   | 440 Hz    |
| **C** (Cytosine) | C5   | 523.25 Hz |
| **G** (Guanine)  | G5   | 783.99 Hz |
| **T** (Thymine)  | D5   | 587.33 Hz |

Since there is no T note in the equal-tempered scale, I chose D to represent T.

The frequencies come from [Frequencies for equal-tempered scale, A4 = 440
Hz](https://pages.mtu.edu/~suits/notefreqs.html). That page also lists
wavelengths, computed with `Speed of Sound = 345 m/s = 1130 ft/s = 770
miles/hr`; the speed of sound only affects those wavelengths, not the
frequencies we synthesize.
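
Every frequency in the table follows the equal-tempered relation f = 440 · 2^(n/12), where n is the number of semitones above A4. A small sketch that reproduces the table values; the helper name `note_frequency` is made up for illustration:

```python
def note_frequency(semitones_from_a4):
    """Equal-tempered frequency in Hz, n semitones away from A4 = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)

# Semitone offsets from A4: C5 is 3 above, D5 is 5 above, G5 is 10 above
offsets = {'A': 0, 'C': 3, 'T': 5, 'G': 10}  # T is played as D5

for nucleotide, n in offsets.items():
    print(nucleotide, round(note_frequency(n), 2))
```

Rounded to two decimals, this gives 440.0, 523.25, 587.33 and 783.99 Hz, matching the table.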

Now that we have this out of the way, let's look at the DNA sequence we will be
working with. It encodes a famous quote I also used for the encoding tests,
which goes like this.

> How wonderful that we have met with a paradox. Now we have some hope of
> making progress.
>
> ― Niels Bohr

```txt
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is the sequence we will work with while building the parser and the
waveform generator.

## Parsing DNA data

This step is a rather simple one. All we need to do is parse the input DNA
sequence in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), well
known in [Bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), and
extract single nucleotides, which are then converted into separate tones based
on the equal-tempered scale explained above.

```python
# Equal-tempered frequencies (A4 = 440 Hz) for each nucleotide
nucleotide_tone_map = {
    'A': 440.00,  # A4
    'C': 523.25,  # C5
    'G': 783.99,  # G5
    'T': 587.33,  # T has no note of its own, so it is played as D5
}

def split(word):
    # Break a sequence string into a list of single nucleotides
    return list(word)

def generate_from_dna_sequence(sequence):
    # Print every nucleotide together with the frequency it maps to
    for nucleotide in split(sequence):
        print(nucleotide, nucleotide_tone_map[nucleotide])
```
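
The function above expects a bare sequence, so the `>SEQ1` header line has to be removed first. Ensembl genome files can also contain `N` for unknown bases, which has no entry in the tone map. A minimal sketch of a FASTA reader that handles both, assuming a single-record file such as the quote above; the name `read_fasta_sequence` is hypothetical:

```python
def read_fasta_sequence(lines, alphabet="ACGT"):
    """Join the sequence lines of a FASTA record into one string.

    Header lines (starting with '>') and blank lines are skipped, bases are
    upper-cased, and any symbol outside `alphabet` (e.g. 'N') is dropped.
    """
    bases = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('>'):
            continue  # skip blank lines and FASTA headers
        bases.extend(base for base in line.upper() if base in alphabet)
    return ''.join(bases)

example = [">SEQ1", "GACAGCTTGT", "acgtNNacgt"]
print(read_fasta_sequence(example))  # GACAGCTTGTACGTACGT
```

Because it accepts any iterable of lines, an open file object such as `open("quote.fa")` works too. Dropping `N` keeps the tone map simple; rendering unknown bases as silence would be another option.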

## Generating a sine wave

Because we are essentially creating a long stream of notes, we will be
appending sine tones to a global list, which we later use to create a WAV file.

```python
import math

audio = []           # accumulated samples, floats in the range [-1.0, 1.0]
sample_rate = 44100  # samples per second

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    # Number of samples needed to cover the requested duration
    num_samples = duration_milliseconds * (sample_rate / 1000.0)

    for x in range(int(num_samples)):
        # Sample the sine wave at time x / sample_rate seconds
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more
aggressive, you could try a square or sawtooth waveform.
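
Both alternatives can be derived from the same time value the sine uses. A sketch of the two waveforms, one sample at a time, so either function could replace the `math.sin(...)` expression inside `append_sinewave`; the function names are made up for illustration:

```python
import math

def square_sample(freq, t):
    # +1 during the first half of each period, -1 during the second half
    return 1.0 if math.sin(2 * math.pi * freq * t) >= 0 else -1.0

def sawtooth_sample(freq, t):
    # Rises linearly from -1 to just below 1 over each period, then jumps back
    phase = (freq * t) % 1.0
    return 2.0 * phase - 1.0

sample_rate = 44100
# 10 ms of a 440 Hz sawtooth, sampled the same way as the sine loop
samples = [sawtooth_sample(440.0, x / sample_rate) for x in range(sample_rate // 100)]
```

These naive versions contain harmonics above the Nyquist frequency and can alias slightly; for a quick experiment like this one, that is usually acceptable.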

## Generating a WAV file from accumulated sine waves

```python
import struct
import wave

def save_wav(file_name):
    # Write the global `audio` buffer as a mono, 16-bit PCM WAV file
    with wave.open(file_name, 'wb') as wav_file:
        nchannels = 1  # mono
        sampwidth = 2  # 2 bytes per sample = 16-bit
        nframes = len(audio)
        comptype = 'NONE'
        compname = 'not compressed'
        wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))

        # Scale floats in [-1.0, 1.0] to signed 16-bit integers;
        # '<h' forces the little-endian byte order WAV expects
        frames = b''.join(struct.pack('<h', int(sample * 32767.0)) for sample in audio)
        wav_file.writeframes(frames)
```

44100 Hz is the industry-standard sample rate (CD quality). If you need to save
on file size, you can lower it; the usual rate for low-quality audio is 8000 Hz
(8 kHz).
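
The trade-off is easy to estimate, because uncompressed 16-bit mono audio takes `sample_rate × 2` bytes per second. A small sketch, assuming each nucleotide is rendered with the default 500 ms note length; the helper name `pcm_data_size` is hypothetical and the 44-byte WAV header is ignored:

```python
def pcm_data_size(num_notes, duration_milliseconds=500, sample_rate=44100, sampwidth=2):
    # Samples per note, truncated the same way append_sinewave does
    samples_per_note = int(duration_milliseconds * (sample_rate / 1000.0))
    return num_notes * samples_per_note * sampwidth

print(pcm_data_size(100))                    # 4410000 bytes at 44.1 kHz
print(pcm_data_size(100, sample_rate=8000))  # 800000 bytes at 8 kHz
```

Lowering the rate is safe here as long as it stays above twice the highest tone (2 × 783.99 Hz ≈ 1568 Hz, the Nyquist limit), so even 8 kHz reproduces all four notes.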

The WAV files here use 16-bit signed integers (a C `short`) for each sample, so
we multiply our floating-point data by 32767, the maximum value of a signed
16-bit integer.

> It is theoretically possible to use the floating point -1.0 to 1.0 data
> directly in a WAV file, but not obvious how to do that using the wave module
> in Python.
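
One catch with `int(sample * 32767.0)`: if `volume` is above 1.0, or several tones are summed, samples leave the [-1.0, 1.0] range and `struct.pack('<h', ...)` raises `struct.error`. A sketch of a clamping conversion that also packs the whole buffer in a single call; the helper `to_pcm16` is hypothetical:

```python
import struct

def to_pcm16(samples):
    # Clamp each float to [-1.0, 1.0] before scaling, so the result
    # always fits in a signed 16-bit integer (-32767..32767)
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767.0) for s in clamped]
    # One little-endian pack call for the whole buffer
    return struct.pack('<%dh' % len(ints), *ints)

frames = to_pcm16([0.0, 0.5, 1.5, -2.0])
```

The result can be passed directly to `writeframes`.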

## Generating spectrograms

I tried two methods of doing this, and both worked just fine. However, I opted
for the [SoX - Sound eXchange, the Swiss Army knife of audio
manipulation](https://linux.die.net/man/1/sox) one because it didn't require
anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

Here is an example spectrogram of the first movement of Ludwig van Beethoven's
Symphony No. 6.

<audio controls>
  <source src="/assets/posts/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/assets/posts/dna-synthesized/symphony-no6-1st-movement.png)

The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/).
This requires an intermediary step, however: SoX first dumps the samples into a
text `.dat` file, whose first two lines are comments that have to be stripped
before plotting. Note that this produces a waveform (amplitude over time) rather
than a spectrogram.

```shell
# Dump the samples as text, one line per sample (time, then amplitude)
sox output.wav audio.dat
# Skip the two header comment lines SoX writes at the top
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

The `audio.gpi` input file passed to gnuplot looks something like this.

```txt
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

What I did was take interesting parts of animal genomes and feed them to the
tone generator script. The script generated a WAV file, which I converted to MP3
so it can be played in a browser. The last step was creating a spectrogram from
the WAV file.
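
Put together, the steps from the previous sections form a short pipeline. The sketch below is a self-contained reconstruction, not the exact script used for these recordings: the function name `dna_to_wav`, the 200 ms note length, the 0.8 volume and the `max_notes` cut-off used to pick a part of a long genome are all assumptions. It uses the same tone map and 16-bit PCM conversion as above, but keeps the samples in a local list instead of a global one.

```python
import math
import struct
import wave

NUCLEOTIDE_TONE_MAP = {'A': 440.00, 'C': 523.25, 'G': 783.99, 'T': 587.33}  # T as D5

def dna_to_wav(fasta_lines, target, max_notes=None,
               note_milliseconds=200, sample_rate=44100, volume=0.8):
    # 1. Parse: skip headers, keep only A/C/G/T (drops N and other codes)
    sequence = ''.join(
        base
        for line in fasta_lines if not line.startswith('>')
        for base in line.strip().upper() if base in NUCLEOTIDE_TONE_MAP
    )
    if max_notes is not None:
        sequence = sequence[:max_notes]  # use only part of a long genome

    # 2. Synthesize: one sine tone per nucleotide
    samples_per_note = int(note_milliseconds * sample_rate / 1000)
    audio = []
    for base in sequence:
        freq = NUCLEOTIDE_TONE_MAP[base]
        for x in range(samples_per_note):
            audio.append(volume * math.sin(2 * math.pi * freq * x / sample_rate))

    # 3. Save: mono, 16-bit little-endian PCM; `target` is a path or a binary file object
    frames = struct.pack('<%dh' % len(audio), *(int(s * 32767) for s in audio))
    with wave.open(target, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return len(sequence)
```

For example, `dna_to_wav(open("quote.fa"), "out.wav")` would render the whole quote. The MP3 conversion and the `sox ... spectrogram` step then run on the resulting WAV file.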

### Niels Bohr quote

<audio controls>
  <source src="/assets/posts/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of a mouse genome, `Mus_musculus.GRCm39.dna.nonchromosomal`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of a bison genome, `Bison_bison_bison.Bison_UMD1.0.cdna`. You can
get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of a cattle (*Bos taurus*) genome, `Bos_taurus.ARS-UCD1.2.cdna`.
You can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to my
[Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a
really cool piece of equipment that supports MIDI in via USB and has a 3.5 mm
audio jack.

The Elektron is connected to my MacBook via a USB cable, and its audio out is
patched into a Sony Bluetooth speaker I have that supports 3.5 mm audio in. The
Elektron doesn't have internal speakers.

![](/assets/posts/dna-synthesized/elektron/IMG_0619.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0620.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0622.jpg)

To communicate with the Elektron, I chose the `pygame` Python module, which has
MIDI support built in (`pygame.midi`). With it, sending notes to the device was
rather simple. All I did was map MIDI notes to the nucleotides.
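
MIDI note numbers relate to equal-tempered pitch through the standard formula n = 69 + 12 · log2(f / 440), where note 69 is A4 = 440 Hz. A small sketch that turns the tone table from earlier into MIDI note numbers, so the MIDI version can play the same pitches as the WAV files; the helper `frequency_to_midi` is hypothetical:

```python
import math

def frequency_to_midi(freq):
    # MIDI note 69 is A4 = 440 Hz; each semitone is one note number
    return round(69 + 12 * math.log2(freq / 440.0))

nucleotide_tone_map = {'A': 440.00, 'C': 523.25, 'G': 783.99, 'T': 587.33}

midi_map = {base: frequency_to_midi(f) for base, f in nucleotide_tone_map.items()}
print(midi_map)  # {'A': 69, 'C': 72, 'G': 79, 'T': 74}
```

All resulting numbers stay well inside the valid MIDI note range of 0 to 127.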

Before all of this, I also opened the Audio MIDI Setup app on macOS and checked
the MIDI Studio window by pressing ⌘-2.

![](/assets/posts/dna-synthesized/elektron/midi-studio.jpg)

The whole script that parses the sequence and sends notes to the Elektron looks
like this.

```python
import time

import pygame.midi

pygame.midi.init()

# List every MIDI device to find the ID of the Elektron output
print(pygame.midi.get_default_output_id())
for device_id in range(pygame.midi.get_count()):
    print(device_id, pygame.midi.get_device_info(device_id))

# Output ID 1 is used here; change it to match the list printed above
player = pygame.midi.Output(1)
player.set_instrument(2)

def send_note(note, velocity):
    # Hold the note for 300 ms, then release it
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)

# MIDI note numbers must be between 0 and 127; these match the tone
# table: A4, C5, G5 and D5 (standing in for T)
nucleotide_midi_map = {
    'A': 69,
    'C': 72,
    'G': 79,
    'T': 74,  # is D
}

with open("quote.fa") as f:
    # Skip the FASTA header line(s) starting with '>' and join the rest
    sequence = ''.join(line.strip() for line in f if not line.startswith('>'))

for nucleotide in sequence:
    print("Playing nucleotide {} with MIDI note {}".format(
        nucleotide, nucleotide_midi_map[nucleotide]))
    send_note(nucleotide_midi_map[nucleotide], 127)

player.close()
pygame.midi.quit()
```

<video src="/assets/posts/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting by choosing different
instruments for different nucleotides, or by doing more funky stuff with the
Elektron. But for now, this should be enough. It is just a proof of concept,
something to play around with.
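
One way to give each nucleotide its own instrument is to send it on its own MIDI channel, assuming the receiving device maps channels to separate tracks or sounds (check the device's MIDI channel settings; the channel numbers below are placeholders). The sketch only builds the (channel, note) events, so it runs without any hardware; `nucleotide_events` and the channel assignment are hypothetical:

```python
# Hypothetical assignment: one MIDI channel (0-based, as pygame.midi uses) per nucleotide
NUCLEOTIDE_CHANNEL = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
NUCLEOTIDE_NOTE = {'A': 69, 'C': 72, 'G': 79, 'T': 74}  # same pitches as the tone table

def nucleotide_events(sequence):
    # Turn a sequence into (channel, note) pairs, ignoring unknown symbols like 'N'
    return [(NUCLEOTIDE_CHANNEL[base], NUCLEOTIDE_NOTE[base])
            for base in sequence.upper() if base in NUCLEOTIDE_CHANNEL]

events = nucleotide_events("GACAN")
print(events)  # [(2, 79), (0, 69), (1, 72), (0, 69)]
```

With a `pygame.midi.Output`, each pair could then be played with `player.note_on(note, 127, channel)` followed by `player.note_off(note, 127, channel)`.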

## Going even further

As you have probably noticed, the end results are quite similar to each other.
This is to be expected, because we are essentially operating with only four
notes. Something like [SuperCollider](https://supercollider.github.io/) could
make this more interesting by creating richer sounds, for example by transposing
notes or applying effects based on repeated data in a sequence. The
possibilities are endless.
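
As a small taste of the "repeated data" idea, here is a sketch in plain Python rather than SuperCollider. It groups runs of the same nucleotide with `itertools.groupby` and transposes each repeat one octave higher (doubling the frequency), so runs like `AAA` stand out. The octave-per-repeat rule and the two-octave cap are arbitrary assumptions for illustration, and the input is assumed to be an already-cleaned A/C/G/T sequence:

```python
from itertools import groupby

NUCLEOTIDE_TONE_MAP = {'A': 440.00, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def transposed_frequencies(sequence, max_octaves=2):
    # Return one frequency per nucleotide; within a run of identical
    # nucleotides, each repeat goes up an octave (capped at max_octaves)
    freqs = []
    for base, run in groupby(sequence):
        for position, _ in enumerate(run):
            octave = min(position, max_octaves)
            freqs.append(NUCLEOTIDE_TONE_MAP[base] * 2 ** octave)
    return freqs

print(transposed_frequencies("GAAAC"))  # [783.99, 440.0, 880.0, 1760.0, 523.25]
```

The resulting frequencies could be fed to `append_sinewave` one by one instead of the fixed table values.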

It is really astonishing what can be achieved with a little bit of code and an
idea. I could see this becoming an interesting background soundscape instrument
if done properly. It could replace a random note generator with something more
intriguing, biological, natural.

I actually find the results fascinating. I took some time and listened to this
music of nature. Even though it all sounds quite the same, it is also quite
different. The subtle differences, on repeat, kind of create music of their own.
It makes you wonder. It kind of puts Occam’s Razor in its place. Nature surely
loves to make things as energy efficient as possible.