author Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
committer Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
commit cd6644ea4ddc78597934ab0ef5ba50e3c3daa927
tree 03de331a8db6386dfd6fa75155bfbcea6b4feaf3
parent 84ed124529ffeee1590295b8de3a8faf51848680

Moved to a simpler SSG

Diffstat (limited to 'content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md')
-rw-r--r-- content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md 363
1 files changed, 0 insertions, 363 deletions
diff --git a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
deleted file mode 100644
index e26088b..0000000
--- a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
+++ /dev/null
@@ -1,363 +0,0 @@
---
title: What would DNA sound like if synthesized to an audio file
url: what-would-dna-sound-if-synthesized.html
date: 2022-07-05T12:00:00+02:00
draft: false
---

## Introduction

Lately, I have been thinking a lot about the nature of life, what the
foundational blocks of life are, and things like that. It's remarkable how
complex and, on the other hand, how simple creation is when you look at it. The
miracle of life keeps us grounded when our imagination goes wild. If DNA is the
building block of life, you could consider it to be an API nature provided us to
better understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided
path to creating artificial general intelligence. What would the building blocks
of our creation look like? Is compression really the ultimate storage of
information? Will our creation also ponder these questions when creating new
worlds for itself, or will we just disappear into the vastness of
possibilities? It is a little offensive that we are playing God whilst being
completely ignorant of our own reality. Who knows! Like many other
breakthroughs, this one will also come at a cost not known to us until it
finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences
into audio files for us to listen to. I am not the first one, nor will I be the
last one to do this. But it is an interesting exercise in better understanding
the relationship between art and science. Maybe listening to DNA instead of
parsing it will lead to a better understanding, or at least to enjoying the
creation and cryptic nature of life.

## DNA encoding and primer example

I explored DNA about 3 years ago in my post [Encoding binary data into DNA
sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all
sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting to DNA, I will be
generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A4   | 440.00 Hz |
| **C** (Cytosine) | C5   | 523.25 Hz |
| **G** (Guanine)  | G5   | 783.99 Hz |
| **T** (Thymine)  | D5   | 587.33 Hz |

Since there is no note T in the equal-tempered scale, I chose D to represent T.

The values come from [Frequencies for equal-tempered scale, A4 = 440
Hz](https://pages.mtu.edu/~suits/notefreqs.html). For this tuning, that table
also assumes `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`, which only
matters for the wavelengths listed there, not for the frequencies used here.

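As a cross-check of the table, here is a minimal sketch of the usual
equal-temperament formula, where each semitone multiplies the frequency by the
twelfth root of two. The function name `note_frequency` and the use of MIDI note
numbers (A4 = 69) to index the notes are my assumptions for illustration; they
are not part of the original script.

```python
def note_frequency(midi_note, a4=440.0):
    """Frequency in Hz of a MIDI note number in 12-tone equal temperament."""
    # 69 is the MIDI note number of A4; every 12 semitones doubles the frequency
    return a4 * 2 ** ((midi_note - 69) / 12)


# MIDI note numbers: A4 = 69, C5 = 72, D5 = 74, G5 = 79
for name, midi_note in [("A4", 69), ("C5", 72), ("D5", 74), ("G5", 79)]:
    print(name, round(note_frequency(midi_note), 2))
```
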
With that out of the way, we need a DNA sequence to test with. This is the
famous quote I also used for the encoding tests, and it goes like this.

> How wonderful that we have met with a paradox. Now we have some hope of
> making progress.
> ― Niels Bohr

Encoded into a DNA sequence in FASTA format, the quote looks like this.

```shell
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is what we are going to work with to get things rolling when creating the
parser and the waveform generator.

## Parsing DNA data

This step is rather simple. All we need to do is parse the input DNA sequence,
given in the [FASTA format](https://en.wikipedia.org/wiki/FASTA_format) well
known in [Bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), to
extract single nucleotides that will be converted into separate tones based on
the equal-tempered scale explained above.

```python
# Frequencies in Hz (equal-tempered scale, A4 = 440 Hz)
nucleotide_tone_map = {
    'A': 440,
    'C': 523.25,
    'G': 783.99,
    'T': 587.33,  # converted to D
}

def split(word):
    # Split a string into a list of single characters
    return list(word)

def generate_from_dna_sequence(sequence):
    # Print each nucleotide together with the frequency of its tone
    for nucleotide in split(sequence):
        print(nucleotide, nucleotide_tone_map[nucleotide])
```

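The snippet above expects a bare string of nucleotides, while a FASTA file also
contains header lines starting with `>` and line breaks inside the sequence. A
minimal sketch of that extraction step could look like the following; the helper
name `read_fasta_sequence` is hypothetical, and it assumes a single record per
file.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a FASTA record, skipping '>' header lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the record header
        parts.append(line.upper())
    return "".join(parts)


example = """>SEQ1
GACAGCTTGT
AACC
""".splitlines()

sequence = read_fasta_sequence(example)
print(sequence)  # GACAGCTTGTAACC
```

Real genome files can also contain symbols other than A, C, G and T (for
example `N` for an unknown base). Those are not in the tone map, so they would
need to be skipped or mapped to a tone of their own.
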
## Generating sine wave

Because we are essentially creating a long stream of notes, we append each sine
tone to a global array that we later use to create a WAV file.

```python
import math

audio = []             # accumulated samples, floats in the range -1.0 to 1.0
sample_rate = 44100.0  # samples per second

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    # Append a sine tone of the given frequency and length to the global array
    num_samples = duration_milliseconds * (sample_rate / 1000.0)

    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more
aggressive, you could try a square or sawtooth waveform.

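For illustration, a square and a sawtooth tone can be sketched as below. This is
not part of the original script: the function names and the `make_tone` helper
are assumptions, and the helper returns a list instead of appending to the
global array so that it can be tried on its own.

```python
def square_sample(freq, t):
    """Square wave: 1.0 in the first half of each cycle, -1.0 in the second."""
    return 1.0 if (freq * t) % 1.0 < 0.5 else -1.0


def sawtooth_sample(freq, t):
    """Sawtooth wave: rises linearly from -1.0 towards 1.0 once per cycle."""
    return 2.0 * ((freq * t) % 1.0) - 1.0


def make_tone(sample_fn, freq=440.0, duration_milliseconds=500,
              sample_rate=44100, volume=1.0):
    """Return a list of float samples for one tone of the given waveform."""
    num_samples = int(duration_milliseconds * sample_rate / 1000)
    return [volume * sample_fn(freq, x / sample_rate) for x in range(num_samples)]


tone = make_tone(square_sample, freq=440.0, duration_milliseconds=10)
print(len(tone), min(tone), max(tone))
```

Both waveforms contain strong harmonics above the base frequency, which is why
they sound harsher than a sine wave and show up as extra bands in a spectrogram.
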
## Generating a WAV file from accumulated sine waves

```python
import wave
import struct

def save_wav(file_name):
    wav_file = wave.open(file_name, 'w')
    nchannels = 1  # mono
    sampwidth = 2  # bytes per sample (16 bit)

    nframes = len(audio)
    comptype = 'NONE'
    compname = 'not compressed'
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))

    for sample in audio:
        # '<h' packs a little-endian signed 16 bit integer, as WAV expects
        wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))

    wav_file.close()
```

44100 Hz is the industry standard sample rate (CD quality). If you need to save
on file size, you can adjust it downwards. The usual rate for low quality is
8000 Hz, or 8 kHz.

The WAV files here use short, 16-bit, signed integers for the sample size.
So, we multiply the floating-point data we have by 32767, the maximum value for
a short integer.

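The scaling step can be sketched in isolation as follows. The helper name
`floats_to_pcm16` and the clipping are my additions, not part of the original
script: `struct.pack` raises an error for values outside the 16-bit range, so a
volume above 1.0 would otherwise fail. The returned bytes could be passed to a
single `writeframes` call instead of writing one sample at a time.

```python
import struct


def floats_to_pcm16(samples):
    """Pack float samples (nominally -1.0 to 1.0) as little-endian 16-bit PCM."""
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # keep the value in range
        ints.append(int(clipped * 32767.0))
    return struct.pack("<%dh" % len(ints), *ints)


frames = floats_to_pcm16([0.0, 0.5, 1.0, -1.0, 1.5])
print(struct.unpack("<5h", frames))  # (0, 16383, 32767, -32767, 32767)
```
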
> It is theoretically possible to use the floating point -1.0 to 1.0 data
> directly in a WAV file, but it is not obvious how to do that using the wave
> module in Python.

## Generating Spectrograms

I tried two methods of doing this and both worked just fine. However, I opted
to use [SoX - Sound eXchange, the Swiss Army knife of audio
manipulation](https://linux.die.net/man/1/sox) because it didn't require
anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

An example spectrogram of Ludwig van Beethoven's Symphony No. 6, first movement.

<audio controls>
  <source src="/assets/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/assets/dna-synthesized/symphony-no6-1st-movement.png)

The other option, which plots the raw waveform, is to use SoX in combination
with [gnuplot](http://www.gnuplot.info/). This requires an intermediary step,
however.

```shell
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

The input file `audio.gpi` that is passed to gnuplot looks something like
this.

```
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

What I did was take interesting parts of an animal's genome and feed them to
the tone generator script. This generated a WAV file, which I converted to MP3
so it can be played in a browser. The last step was creating a spectrogram from
the WAV file.

### Niels Bohr quote

<audio controls>
  <source src="/assets/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of the mouse genome `Mus_musculus.GRCm39.dna.nonchromosomal`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/assets/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of the bison genome `Bison_bison_bison.Bison_UMD1.0.cdna`. You can
get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/assets/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of the cattle (Bos taurus) genome `Bos_taurus.ARS-UCD1.2.cdna`.
You can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/assets/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to my
[Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a
really cool piece of equipment that supports MIDI in via USB and a 3.5 mm audio
jack.

The Elektron is connected to my MacBook via a USB cable, and its audio out is
patched to a Sony Bluetooth speaker I have that supports 3.5 mm audio in. The
Elektron doesn't have internal speakers.

![](/assets/dna-synthesized/elektron/IMG_0619.jpg)

![](/assets/dna-synthesized/elektron/IMG_0620.jpg)

![](/assets/dna-synthesized/elektron/IMG_0622.jpg)

For communicating with the Elektron, I chose the `pygame` Python module, which
has MIDI support built in. With this, it was rather simple to send notes to the
device. All I did was map the nucleotides to MIDI notes.

Before all of this, I also opened the Audio MIDI Setup app on macOS and checked
MIDI Studio by pressing ⌘-2.

![](/assets/dna-synthesized/elektron/midi-studio.jpg)

The whole script that parses the sequence and sends notes to the Elektron looks
like this.

```python
import pygame.midi
import time

pygame.midi.init()

# Show what is available; the output device id used below depends on your setup
print(pygame.midi.get_default_output_id())
print(pygame.midi.get_device_info(0))

player = pygame.midi.Output(1)
player.set_instrument(2)

def send_note(note, velocity):
    # Play a single note for 0.3 seconds
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)


# MIDI note numbers (valid range is 0-127), matching the notes in the tone table
nucleotide_midi_map = {
    'A': 69,  # A4
    'C': 72,  # C5
    'G': 79,  # G5
    'T': 74,  # is D (D5)
}

with open("quote.fa") as f:
    # Skip FASTA header lines (starting with '>') and join the sequence lines
    sequence = ''.join(
        line.strip() for line in f if not line.startswith('>'))

for nucleotide in sequence:
    print("Playing nucleotide {} with MIDI note {}".format(
        nucleotide, nucleotide_midi_map[nucleotide]))
    send_note(nucleotide_midi_map[nucleotide], 127)

del player
pygame.midi.quit()
```

<video src="/assets/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting if I chose different
instruments for different nucleotides, or did more funky stuff with the
Elektron. But for now, this should be enough. It is just a proof of concept.
Something to play around with.

## Going even further

As you have probably noticed, the end results are quite similar to each other.
This is to be expected because we are essentially operating with only 4 notes.
What could make this more interesting is using something like
[SuperCollider](https://supercollider.github.io/) to create more interesting
sounds, for example by transposing notes or applying effects based on repeated
data in a sequence. The possibilities are endless.

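One way the transposition idea could be sketched in plain Python, before
reaching for a synthesis environment, is to raise a note each time the same
nucleotide repeats. Everything here is an assumption for illustration: the
function name `transpose_repeats`, the base MIDI notes (A4, C5, G5, D5) and the
one-octave step per repeat.

```python
from itertools import groupby

# Hypothetical base notes for this sketch: A4, C5, G5 and D5 as MIDI numbers
BASE_NOTES = {"A": 69, "C": 72, "G": 79, "T": 74}


def transpose_repeats(sequence, step=12, max_note=127):
    """Yield one MIDI note per nucleotide. Within a run of the same
    nucleotide, each repeat is raised by `step` semitones, capped at max_note."""
    for nucleotide, run in groupby(sequence):
        for repeat, _ in enumerate(run):
            yield min(BASE_NOTES[nucleotide] + repeat * step, max_note)


print(list(transpose_repeats("AACGGGT")))  # [69, 81, 72, 79, 91, 103, 74]
```
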
It is really astonishing what can be achieved with a little bit of code and an
idea. I could see this becoming an interesting background soundscape instrument
if done properly. It could replace a random note generator with something more
intriguing, biological, natural.

I actually find the results fascinating. I took some time and listened to this
music of nature. Even though it all sounds much the same, it is also quite
different. The subtle differences on repeat kind of create music of their own.
Makes you wonder. It kind of puts Occam’s Razor in its place. Nature for sure
loves to make things as energy efficient as possible.