---
title: What would DNA sound like if synthesized to an audio file
url: what-would-dna-sound-if-synthesized.html
date: 2022-07-05T12:00:00+02:00
type: post
draft: false
---

## Introduction

Lately, I have been thinking a lot about the nature of life, what the building
blocks of life are, and things like that. It's remarkable how complex and, on
the other hand, how simple creation is when you look at it. The miracle of
life keeps us grounded when our imagination goes wild. If DNA is the building
block of life, you could consider it an API that nature provided us to better
understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided
path to creating artificial general intelligence. What would the building
blocks of our creation look like? Is compression really the ultimate storage of
information? Will our creation also ponder these questions when creating new
worlds for itself, or will we just disappear into the vastness of
possibilities? It is a little offensive that we are playing God whilst being
completely ignorant of our own reality. Who knows! Like many other
breakthroughs, this one will also come at a cost not known to us until it
finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences
into audio files for us to listen to. I am not the first to do this, nor will I
be the last. But it is an interesting exercise in better understanding the
relationship between art and science. Maybe listening to DNA instead of parsing
it will lead to a better understanding, or at least enjoyment, of the creation
and cryptic nature of life.

## DNA encoding and primer example

I explored DNA about three years ago in my post [Encoding binary data into DNA
sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all
sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting data to DNA, I will
be generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A4   | 440.00 Hz |
| **C** (Cytosine) | C5   | 523.25 Hz |
| **G** (Guanine)  | G5   | 783.99 Hz |
| **T** (Thymine)  | D5   | 587.33 Hz |

Since there is no note named T in the equal-tempered scale, I chose D to
represent T.

You can check the [Frequencies for equal-tempered scale, A4 = 440
Hz](https://pages.mtu.edu/~suits/notefreqs.html). That table also assumes
`Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`, which only matters for
the wavelengths it lists, not for the frequencies used here.

Now that we have this out of the way, we need a DNA sequence to work with. This
is a famous quote I also used for the encoding tests, followed by the same
quote encoded as a DNA sequence in FASTA format.

> How wonderful that we have met with a paradox. Now we have some hope of
> making progress.
> ― Niels Bohr

```shell
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is what we are going to work with while building the parser and the
waveform generator.

## Parsing DNA data

This step is rather simple. All we need to do is parse the input DNA sequence
in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), well known in
[bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), and extract
single nucleotides, which are then converted into separate tones based on the
equal-tempered scale explained above.

```python
# Frequencies in Hz from the equal-tempered scale (A4 = 440 Hz).
nucleotide_tone_map = {
  'A': 440,
  'C': 523.25,
  'G': 783.99,
  'T': 587.33,  # converted to D
}

def split(word):
  # Break a string into a list of single characters.
  return list(word)

def generate_from_dna_sequence(sequence):
  # Print each nucleotide next to the frequency it maps to.
  for nucleotide in split(sequence):
    print(nucleotide, nucleotide_tone_map[nucleotide])
```

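The function above expects a bare string of nucleotides, while a FASTA file also carries header lines such as `>SEQ1`. Here is a minimal sketch of that extraction step. The name `read_fasta_sequence` is hypothetical. Dropping every symbol other than A, C, G and T (for example `N`) and upper-casing the input are assumptions made for this example, not something the original script is known to do.

```python
def read_fasta_sequence(text):
  """Return the nucleotides of all records in `text` as one string."""
  nucleotides = []
  for line in text.splitlines():
    line = line.strip()
    # Skip blank lines and FASTA header lines such as ">SEQ1".
    if not line or line.startswith('>'):
      continue
    # Assumption: keep only A, C, G and T; any other symbol is dropped.
    nucleotides.extend(ch for ch in line.upper() if ch in 'ACGT')
  return ''.join(nucleotides)


example = ">SEQ1\nGACAGCTT\ngtgtNNAC\n"
print(read_fasta_sequence(example))
```
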
## Generating sine wave

Because we are essentially creating one long stream of notes, we append the
samples of each sine tone to a global array, which we will later write out as a
WAV file.

```python
import math

sample_rate = 44100.0  # samples per second
audio = []             # accumulated samples in the range -1.0 to 1.0

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
  # Append one sine tone to the global `audio` list.
  global audio

  num_samples = duration_milliseconds * (sample_rate / 1000.0)

  for x in range(int(num_samples)):
    audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))

  return
```

The sine wave generated here is the standard beep. If you want something more
aggressive, you could try a square or sawtooth waveform.

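As a rough sketch of those two alternatives, the functions below compute a single sample of a naive square or sawtooth wave at time `t`. The names `square_sample`, `sawtooth_sample` and `render` are hypothetical, and the defaults simply mirror the sine example. These naive shapes are not band-limited, so treat this as an illustration rather than a polished oscillator.

```python
def square_sample(freq, t, volume=1.0):
  # +volume during the first half of each cycle, -volume during the second.
  phase = (freq * t) % 1.0
  return volume if phase < 0.5 else -volume

def sawtooth_sample(freq, t, volume=1.0):
  # Ramp linearly from -volume up towards +volume once per cycle.
  phase = (freq * t) % 1.0
  return volume * (2.0 * phase - 1.0)

def render(sample_fn, freq=440.0, duration_milliseconds=500, sample_rate=44100):
  # Build a list of samples for one tone using the given waveform function.
  num_samples = int(duration_milliseconds * sample_rate / 1000)
  return [sample_fn(freq, x / sample_rate) for x in range(num_samples)]

square = render(square_sample)
saw = render(sawtooth_sample)
print(len(square), min(saw), max(saw))
```
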
## Generating a WAV file from accumulated sine waves

```python
import wave
import struct

def save_wav(file_name):
  # Write the global `audio` samples as a mono, 16-bit PCM WAV file.
  wav_file = wave.open(file_name, 'w')
  nchannels = 1  # mono
  sampwidth = 2  # bytes per sample (16 bit)

  nframes = len(audio)
  comptype = 'NONE'
  compname = 'not compressed'
  wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))

  for sample in audio:
    # WAV sample data is little-endian, so pack explicitly with '<h'.
    wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))

  wav_file.close()
```

44100 Hz is the standard sample rate for CD-quality audio. If you need to save
on file size, you can adjust it downwards. A common rate for low-quality audio
is 8000 Hz (8 kHz).

The WAV files here use short, 16-bit, signed integers for the samples. So, we
multiply the floating-point data we have by 32767, the maximum value for a
short integer.

> It is theoretically possible to use the floating point -1.0 to 1.0 data
> directly in a WAV file, but not obvious how to do that using the wave module
> in Python.

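To see how the pieces fit together, here is a self-contained sketch that turns a short sequence into WAV data in memory and reads it back to check the header. It is not the original script: the function name `dna_to_wav_bytes`, the 100 ms note length and the 0.8 volume are assumptions picked for the example.

```python
import io
import math
import struct
import wave

NUCLEOTIDE_TONE_MAP = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def dna_to_wav_bytes(sequence, note_milliseconds=100, sample_rate=44100, volume=0.8):
  """Render `sequence` as consecutive sine tones and return WAV file bytes."""
  samples_per_note = int(note_milliseconds * sample_rate / 1000)
  frames = bytearray()
  for nucleotide in sequence:
    freq = NUCLEOTIDE_TONE_MAP[nucleotide]
    for x in range(samples_per_note):
      value = volume * math.sin(2 * math.pi * freq * (x / sample_rate))
      # Scale to a signed 16-bit integer, little-endian as WAV expects.
      frames += struct.pack('<h', int(value * 32767.0))

  buffer = io.BytesIO()
  with wave.open(buffer, 'wb') as wav_file:
    wav_file.setnchannels(1)   # mono
    wav_file.setsampwidth(2)   # 2 bytes = 16 bit
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(bytes(frames))
  return buffer.getvalue()


wav_bytes = dna_to_wav_bytes("GACA")
with wave.open(io.BytesIO(wav_bytes), 'rb') as check:
  print(check.getnframes(), check.getframerate(), check.getsampwidth())
```
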
## Generating Spectrograms

I have tried two methods of doing this and both were just fine. However, I
opted for [SoX - Sound eXchange, the Swiss Army knife of audio
manipulation](https://linux.die.net/man/1/sox) because it didn't require
anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

An example spectrogram of Ludwig van Beethoven's Symphony No. 6, first movement.

<audio controls>
  <source src="/assets/posts/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/assets/posts/dna-synthesized/symphony-no6-1st-movement.png)

The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/),
which draws the waveform (amplitude over time) rather than a frequency
spectrogram. This requires an intermediary step, however.

```shell
# export the samples as text, drop the two header lines, then plot
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

The input file `audio.gpi` passed to gnuplot looks something like this.

```txt
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

What I did was take interesting parts of several animal genomes and feed them
to the tone generator script. This generated a WAV file for each one, which I
converted to MP3 so it can be played in a browser. The last step was creating a
spectrogram based on each WAV file.

### Niels Bohr quote

<audio controls>
  <source src="/assets/posts/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of the mouse genome `Mus_musculus.GRCm39.dna.nonchromosomal`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of the bison genome `Bison_bison_bison.Bison_UMD1.0.cdna`. You can
get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of the cattle (*Bos taurus*) genome `Bos_taurus.ARS-UCD1.2.cdna`.
You can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/assets/posts/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/assets/posts/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to my
[Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a
really cool piece of equipment that supports MIDI in via USB and a 3.5 mm audio
jack.

The Elektron is connected to my MacBook via a USB cable, and its audio out is
patched to a Sony Bluetooth speaker that supports 3.5 mm audio in. The Elektron
doesn't have internal speakers.

![](/assets/posts/dna-synthesized/elektron/IMG_0619.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0620.jpg)

![](/assets/posts/dna-synthesized/elektron/IMG_0622.jpg)

For communicating with the Elektron, I chose the `pygame` Python module, which
has MIDI support built in. With this, it was rather simple to send notes to the
device. All I did was map the nucleotides to MIDI notes.

Before all of this, I also opened the Audio MIDI Setup app on macOS and checked
MIDI Studio by pressing ⌘-2.

![](/assets/posts/dna-synthesized/elektron/midi-studio.jpg)

The whole script that parses the sequence and sends notes to the Elektron looks
like this.

```python
import pygame.midi
import time

pygame.midi.init()

# Print the default output id and the first device to help pick the right one.
print(pygame.midi.get_default_output_id())
print(pygame.midi.get_device_info(0))

# Output device id 1 is the Elektron in this setup; yours may differ.
player = pygame.midi.Output(1)
player.set_instrument(2)

def send_note(note, velocity):
  # Hold the note for 0.3 seconds, then release it.
  player.note_on(note, velocity)
  time.sleep(0.3)
  player.note_off(note, velocity)


# MIDI note numbers are 7-bit values (0-127), so 160 and 180 fall outside the
# valid range and how a device handles them is not defined.
nucleotide_midi_map = {
  'A': 60,
  'C': 90,
  'G': 160,
  'T': 180,  # is D
}

with open("quote.fa") as f:
  # Skip FASTA header lines such as ">SEQ1" and join the remaining lines.
  sequence = ''.join(line.strip() for line in f if not line.startswith('>'))

for nucleotide in sequence:
  print("Playing nucleotide {} with MIDI note {}".format(
      nucleotide, nucleotide_midi_map[nucleotide]))
  send_note(nucleotide_midi_map[nucleotide], 127)

del player
pygame.midi.quit()
```

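If you would rather have the MIDI notes follow the same pitches as the tone table, the note number can be derived from the frequency with the standard relation `69 + 12 * log2(f / 440)`, where 69 is A4. This is a sketch of an alternative mapping, not the one used in the recording above. The helper name `frequency_to_midi_note` is hypothetical, and how the Elektron assigns incoming notes to its samples is not covered here.

```python
import math

def frequency_to_midi_note(freq, a4=440.0):
  # 69 is the MIDI number of A4; each semitone is a factor of 2**(1/12).
  note = round(69 + 12 * math.log2(freq / a4))
  if not 0 <= note <= 127:
    raise ValueError("frequency is outside the MIDI note range")
  return note

nucleotide_tone_map = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}
derived_midi_map = {
  nucleotide: frequency_to_midi_note(freq)
  for nucleotide, freq in nucleotide_tone_map.items()
}
print(derived_midi_map)
```
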
<video src="/assets/posts/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting if I chose different
instruments for different nucleotides, or did more funky stuff with the
Elektron. But for now, this should be enough. It is just a proof of concept,
something to play around with.

## Going even further

As you have probably noticed, the end results are quite similar to each other.
This is to be expected because we are essentially operating with only 4 notes.
What could make this more interesting is using something like
[SuperCollider](https://supercollider.github.io/) to create more interesting
sounds, for example by transposing notes or applying effects based on repeated
data in a sequence. The possibilities are endless.

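One way to read "transposing notes based on repeated data" is to raise the pitch a little each time the same nucleotide repeats in a row. The sketch below does this in plain Python rather than SuperCollider. The name `transpose_repeats`, the one-semitone step and the one-octave cap are all assumptions chosen for illustration.

```python
NUCLEOTIDE_TONE_MAP = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def transpose_repeats(sequence, semitones_per_repeat=1, max_steps=12):
  """Return one frequency per nucleotide, raised for consecutive repeats."""
  frequencies = []
  previous = None
  repeats = 0
  for nucleotide in sequence:
    # Number of immediately preceding occurrences of the same nucleotide.
    repeats = repeats + 1 if nucleotide == previous else 0
    steps = min(repeats * semitones_per_repeat, max_steps)
    # One semitone up multiplies the frequency by 2**(1/12).
    frequencies.append(NUCLEOTIDE_TONE_MAP[nucleotide] * 2 ** (steps / 12))
    previous = nucleotide
  return frequencies

print([round(f, 2) for f in transpose_repeats("GAAAT")])
```
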
It is really astonishing what can be achieved with a little bit of code and an
idea. I could see this becoming an interesting background soundscape instrument
if done properly. It could replace a random note generator with something more
intriguing, biological, natural.

I actually find the results fascinating. I took some time and listened to this
music of nature. Even though it all sounds quite the same, it's also quite
different. The subtle differences on repeat kind of create music of their own.
Makes you wonder. It kind of puts Occam’s Razor in its place. Nature for sure
loves to make things as energy efficient as possible.