author Mitja Felicijan <mitja.felicijan@gmail.com> 2022-08-27 14:05:48 +0200
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2022-08-27 14:05:48 +0200
commit 9f5454bda6299db43a4e9de5b3716471388b81d9
tree 1ceedf64a4517a372d70efc2b6f4bbd9478ce792 /content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
parent e728c3a2cbd06d95cd1226d3b23473816bd0d67e
Move blog to Hugo

---
title: What would DNA sound like if synthesized to an audio file
url: what-would-dna-sound-if-synthesized.html
date: 2022-07-05
draft: false
---

**Table of contents**

1. [Introduction](#introduction)
2. [DNA encoding and primer example](#dna-encoding-and-primer-example)
3. [Parsing DNA data](#parsing-dna-data)
4. [Generating sine wave](#generating-sine-wave)
5. [Generating a WAV file from accumulated sine waves](#generating-a-wav-file-from-accumulated-sine-waves)
6. [Generating Spectograms](#generating-spectograms)
7. [Pre-generated sequences](#pre-generated-sequences)
   1. [Niels Bohr quote](#niels-bohr-quote)
   2. [Mouse](#mouse)
   3. [Bison](#bison)
   4. [Taurus](#taurus)
8. [Making a drummer out of a DNA sequence](#making-a-drummer-out-of-a-dna-sequence)
9. [Going even further](#going-even-further)

## Introduction

Lately, I have been thinking a lot about the nature of life: what the building blocks of life are, and things like that. It's remarkable how complex, and at the same time how simple, creation is when you look at it. The miracle of life keeps us grounded when our imagination goes wild. If DNA is the building block of life, you could consider it an API nature provided us to better understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided path to creating artificial general intelligence. What would the building blocks of our creation look like? Is compression really the ultimate way to store information? Will our creation also ponder these questions when creating new worlds for itself, or will we just disappear into the vastness of possibilities? It is a little offensive that we are playing God while being completely ignorant of our own reality. Who knows! Like many other breakthroughs, this one will also come at a cost unknown to us until it finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences into audio files for us to listen to. I am not the first one to do this, nor will I be the last. But it is an interesting exercise in better understanding the relationship between art and science. Maybe listening to DNA instead of parsing it will lead to a better understanding of the creation and cryptic nature of life, or at least to enjoying it.

## DNA encoding and primer example

I explored DNA about three years ago in my post [Encoding binary data into DNA sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all sorts of data into DNA sequences.

This is a similar exercise, but instead of converting data to DNA, I will be generating tones from nucleotides.

| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A4   | 440.00 Hz |
| **C** (Cytosine) | C5   | 523.25 Hz |
| **G** (Guanine)  | G5   | 783.99 Hz |
| **T** (Thymine)  | D5   | 587.33 Hz |

Since there is no T note in the equal-tempered scale, I chose D to represent T.

The frequencies come from [Frequencies for equal-tempered scale, A4 = 440 Hz](https://pages.mtu.edu/~suits/notefreqs.html). That page also lists wavelengths, computed with `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`; the speed of sound only affects the wavelengths, not the frequencies used here.
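
Equal-tempered frequencies follow f = 440 · 2^(n/12), where n is the number of semitones above A4, and the MIDI note number of a pitch is simply 69 + n. A minimal sketch that reproduces the four notes used here (the `frequency` helper and the `semitone_offsets` dictionary are hypothetical names for illustration):

```python
A4_FREQUENCY = 440.0  # reference pitch in Hz
A4_MIDI_NOTE = 69     # MIDI note number of A4

def frequency(semitones_from_a4):
    """Equal temperament: every semitone multiplies the frequency by 2 ** (1/12)."""
    return A4_FREQUENCY * 2 ** (semitones_from_a4 / 12)

# Semitone distance from A4 for each note used in this post.
semitone_offsets = {
    'A': 0,   # A4
    'C': 3,   # C5
    'G': 10,  # G5
    'T': 5,   # D5, standing in for T
}

for nucleotide, offset in semitone_offsets.items():
    print(nucleotide, round(frequency(offset), 2), 'Hz, MIDI note', A4_MIDI_NOTE + offset)
```

Rounded to two decimals, this gives the same values as the frequency chart linked above.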

Now that this is out of the way, we can brush up on DNA sequences a bit. Below is a famous quote I also used for the encoding tests, followed by the quote encoded as a DNA sequence.

> How wonderful that we have met with a paradox. Now we have some hope of making progress.
> ― Niels Bohr

```text
>SEQ1
GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
AACC
```

This is the sequence we will work with while building the parser and the waveform generator.

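## Parsing DNA data

This step is rather simple. Each record in a [FASTA format](https://en.wikipedia.org/wiki/FASTA_format) file, a format well known in [bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), starts with a header line beginning with `>`, followed by lines of sequence data. All we need to do is skip the header and extract single nucleotides, which are then converted into separate tones using the equal-tempered mapping explained above.

```python
nucleotide_tone_map = {
    'A': 440.00,  # A4
    'C': 523.25,  # C5
    'G': 783.99,  # G5
    'T': 587.33,  # D5, standing in for T
}

def read_fasta(path):
    """Return the sequence from a FASTA file, skipping '>' header lines."""
    with open(path) as f:
        return ''.join(line.strip() for line in f if not line.startswith('>'))

def generate_from_dna_sequence(sequence):
    # A string can be iterated character by character, so no split() helper is needed.
    for nucleotide in sequence.upper():
        if nucleotide not in nucleotide_tone_map:
            continue  # skip N and other ambiguity codes
        print(nucleotide, nucleotide_tone_map[nucleotide])
```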

## Generating sine wave

Because we are essentially creating a long stream of notes, we append each sine tone to a global list of samples, which we later write out as a WAV file.

```python
import math

sample_rate = 44100  # samples per second (CD quality)
audio = []           # accumulated samples, floats in [-1.0, 1.0]

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    num_samples = int(duration_milliseconds * (sample_rate / 1000.0))

    # audio.append() mutates the global list, so no `global` statement is needed.
    for x in range(num_samples):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more aggressive, you could try a square or sawtooth waveform.
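
A minimal sketch of those two alternatives, assuming the same 44100 Hz sample rate. The `square_sample`, `sawtooth_sample` and `tone` names are hypothetical; each waveform is computed from the phase within the current period:

```python
sample_rate = 44100  # samples per second, as in the sine wave example

def square_sample(freq, x):
    """+1.0 for the first half of each period, -1.0 for the second half."""
    phase = (freq * x / sample_rate) % 1.0
    return 1.0 if phase < 0.5 else -1.0

def sawtooth_sample(freq, x):
    """Ramps linearly from -1.0 up towards 1.0 once per period."""
    phase = (freq * x / sample_rate) % 1.0
    return 2.0 * phase - 1.0

def tone(freq=440.0, duration_milliseconds=500, volume=1.0, waveform=square_sample):
    """Return the samples of one note using the given waveform function."""
    num_samples = int(duration_milliseconds * (sample_rate / 1000.0))
    return [volume * waveform(freq, x) for x in range(num_samples)]
```

For example, `audio.extend(tone(587.33, waveform=sawtooth_sample, volume=0.5))` would append a D5 sawtooth note to the same `audio` list. These naive waveforms contain harmonics above half the sample rate, which alias back into the audible range; for a quick experiment that is usually acceptable.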
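
## Generating a WAV file from accumulated sine waves

Once all the notes are in the `audio` list, the samples are written to a mono, 16-bit WAV file with Python's built-in `wave` module.

```python
import struct
import wave

def save_wav(file_name):
    nchannels = 1  # mono
    sampwidth = 2  # 2 bytes = 16-bit samples
    nframes = len(audio)
    comptype = 'NONE'
    compname = 'not compressed'

    # Scale floats in [-1.0, 1.0] to signed 16-bit integers, clamping any overshoot.
    # WAV data is little-endian, hence '<h' rather than the native-order 'h'.
    samples = [int(max(-1.0, min(1.0, s)) * 32767.0) for s in audio]
    frames = struct.pack('<%dh' % len(samples), *samples)

    with wave.open(file_name, 'wb') as wav_file:
        wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
        wav_file.writeframes(frames)  # one write instead of one call per sample
```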
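
Putting the pieces together, a minimal self-contained sketch could look like this. It assumes one 250 ms sine tone per nucleotide at volume 0.8; the note length, the volume and the `synthesize`/`write_wav` names are illustrative assumptions, not the exact script used for the recordings below:

```python
import math
import struct
import wave

sample_rate = 44100
nucleotide_tone_map = {'A': 440.00, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def synthesize(sequence, note_milliseconds=250, volume=0.8):
    """Return float samples with one sine tone per recognized nucleotide."""
    samples = []
    num_samples = int(note_milliseconds * sample_rate / 1000)
    for nucleotide in sequence.upper():
        freq = nucleotide_tone_map.get(nucleotide)
        if freq is None:  # skip N, gaps and other symbols
            continue
        for x in range(num_samples):
            samples.append(volume * math.sin(2 * math.pi * freq * x / sample_rate))
    return samples

def write_wav(file_name, samples):
    """Write mono 16-bit little-endian PCM."""
    frames = struct.pack('<%dh' % len(samples),
                         *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    with wave.open(file_name, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)

# The first 20 nucleotides of the quote sequence.
write_wav('output.wav', synthesize('GACAGCTTGTGTACAAGTGT'))
```

Returning the samples instead of appending to a global list keeps each step testable on its own; the result is the same kind of file `save_wav` produces.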

44100 Hz is the industry-standard sample rate (CD quality). If you need to save on file size, you can lower it; 8000 Hz (8 kHz) is the usual standard for low quality.
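
The effect on file size is easy to estimate: a 16-bit mono WAV stores two bytes per sample, plus the 44-byte header that Python's `wave` module writes for PCM. A quick calculation, assuming 500 ms notes (the default in `append_sinewave`) and a hypothetical 1,000-nucleotide sequence:

```python
WAV_HEADER_BYTES = 44  # RIFF + fmt + data chunk headers of a plain PCM WAV

def wav_size_bytes(sample_rate, note_milliseconds, note_count, sample_width=2, channels=1):
    """Size of a PCM WAV file holding `note_count` notes of equal length."""
    samples_per_note = int(note_milliseconds * sample_rate / 1000)
    data_bytes = samples_per_note * note_count * sample_width * channels
    return WAV_HEADER_BYTES + data_bytes

for rate in (44100, 8000):
    print(rate, 'Hz:', wav_size_bytes(rate, 500, 1000), 'bytes')
```

At 44100 Hz that is about 44 MB, while 8000 Hz brings it down to about 8 MB, since the size scales linearly with the sample rate.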

The WAV files here use 16-bit signed integers (shorts) for each sample, so we multiply the floating-point data, which lies between -1.0 and 1.0, by 32767, the maximum value of a signed 16-bit integer.

> WAV files can store floating-point samples in the -1.0 to 1.0 range directly (IEEE float format), but Python's `wave` module only writes integer PCM data, so it is not obvious how to do that with it.
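
Without the `wave` module, the RIFF chunks can be written by hand with `struct`. A minimal sketch, assuming mono 32-bit float samples (the `save_float_wav` name is hypothetical). It uses format tag 3 (IEEE float) and includes the `fact` chunk with the frame count, which is commonly written for non-PCM formats:

```python
import struct

WAVE_FORMAT_IEEE_FLOAT = 3  # WAV format tag for floating-point samples

def save_float_wav(file_name, samples, sample_rate=44100):
    """Write mono 32-bit float samples in [-1.0, 1.0] without converting to integers."""
    channels = 1
    bits_per_sample = 32
    block_align = channels * bits_per_sample // 8  # bytes per frame
    data = struct.pack('<%df' % len(samples), *samples)

    # fmt chunk body is 18 bytes: the 16 PCM fields plus cbSize = 0.
    fmt_chunk = struct.pack('<4sIHHIIHHH', b'fmt ', 18, WAVE_FORMAT_IEEE_FLOAT,
                            channels, sample_rate, sample_rate * block_align,
                            block_align, bits_per_sample, 0)
    fact_chunk = struct.pack('<4sII', b'fact', 4, len(samples))  # number of frames
    data_header = struct.pack('<4sI', b'data', len(data))
    riff_size = 4 + len(fmt_chunk) + len(fact_chunk) + len(data_header) + len(data)

    with open(file_name, 'wb') as f:
        f.write(struct.pack('<4sI4s', b'RIFF', riff_size, b'WAVE'))
        f.write(fmt_chunk)
        f.write(fact_chunk)
        f.write(data_header)
        f.write(data)
```

Note that such a file cannot be read back with the `wave` module either, but audio editors and players that support float WAV should open it.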

## Generating Spectograms

I tried two methods, and both worked fine. I opted for [SoX - Sound eXchange, the Swiss Army knife of audio manipulation](https://linux.die.net/man/1/sox) because it didn't require anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

Here is an example spectrogram of the first movement of Ludwig van Beethoven's Symphony No. 6.

<audio controls>
  <source src="/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>

![Ludwig van Beethoven Symphony No. 6 First movement](/dna-synthesized/symphony-no6-1st-movement.png)
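
The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/). This requires an intermediate step, however: SoX dumps the samples to a text `.dat` file, `tail` strips its two header lines, and gnuplot draws the result. Note that this plots the waveform (amplitude over time) rather than a spectrogram.

```shell
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```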

The input file `audio.gpi` passed to gnuplot looks something like this.

```gnuplot
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lc rgb 'red'
```
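
If SoX is not at hand, the `sox` + `tail` step can be approximated with Python's standard library. A minimal sketch, assuming a mono 16-bit WAV such as the ones generated above; the `wav_to_dat` name and the exact number formatting are my own choices, it simply writes one `time amplitude` pair per line, without header lines, which is what the `plot` command above reads:

```python
import struct
import wave

def wav_to_dat(wav_name, dat_name):
    """Write 'time amplitude' lines for a mono 16-bit WAV file."""
    with wave.open(wav_name, 'rb') as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError('expected a mono 16-bit WAV file')
        rate = wav_file.getframerate()
        n = wav_file.getnframes()
        samples = struct.unpack('<%dh' % n, wav_file.readframes(n))

    with open(dat_name, 'w') as out:
        for i, sample in enumerate(samples):
            # Scale to [-1, 1) to match `set yr [-1:1]` in audio.gpi.
            out.write('%.8f %.8f\n' % (i / rate, sample / 32768.0))
```

Running `wav_to_dat('output.wav', 'audio_only.dat')` and then `gnuplot audio.gpi` should produce the same kind of waveform plot.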

## Pre-generated sequences

I took interesting parts of a few animal genomes and fed them to the tone generator script, which produced a WAV file for each. I then converted those to MP3 so they can be played in a browser. The last step was creating a spectrogram from each WAV file.
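
The Ensembl files linked below contain many records (the cDNA files, for example, hold one record per transcript), so taking an interesting part means picking a record and a window within it. A minimal sketch of that step; the `read_fasta_records`/`excerpt` names, the record index, offset and window length are hypothetical parameters, not the excerpts actually used here:

```python
def read_fasta_records(path):
    """Yield (header, sequence) pairs from a FASTA file that may hold many records."""
    header, chunks = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                if header is not None:
                    yield header, ''.join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line.upper())  # normalize soft-masked lowercase bases
    if header is not None:
        yield header, ''.join(chunks)

def excerpt(path, record_index=0, start=0, length=500):
    """Return the header and `length` nucleotides of one record, starting at `start`."""
    for i, (header, sequence) in enumerate(read_fasta_records(path)):
        if i == record_index:
            return header, sequence[start:start + length]
    raise IndexError('record %d not found in %s' % (record_index, path))
```

Reading the file lazily, record by record, avoids loading a whole genome file into memory just to take a few hundred nucleotides.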

### Niels Bohr quote

<audio controls>
  <source src="/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of the mouse genome, taken from `Mus_musculus.GRCm39.dna.nonchromosomal`. You can get the [genome data here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
  <source src="/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of the bison cDNA (transcript) set `Bison_bison_bison.Bison_UMD1.0.cdna`. You can get the [sequence data here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
  <source src="/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of the cattle (*Bos taurus*) cDNA (transcript) set `Bos_taurus.ARS-UCD1.2.cdna`. You can get the [sequence data here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
  <source src="/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![Spectrogram](/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to my [Elektron Model:Samples](https://www.elektron.se/en/model-samples). It is a really cool piece of equipment that supports MIDI input via USB and has a 3.5 mm audio jack.

The Elektron is connected to my MacBook with a USB cable, and its audio output is patched into a Sony Bluetooth speaker I have that supports 3.5 mm audio input, since the Elektron has no built-in speakers.

![Elektron Model:Samples setup](/dna-synthesized/elektron/IMG_0619.jpg)

![Elektron Model:Samples setup](/dna-synthesized/elektron/IMG_0620.jpg)

![Elektron Model:Samples setup](/dna-synthesized/elektron/IMG_0622.jpg)

To communicate with the Elektron, I chose the `pygame` Python module, which has MIDI support built in (`pygame.midi`). With it, sending notes to the device was rather simple: all I did was map MIDI notes to nucleotides.

Before all of this, I also opened the Audio MIDI Setup app on macOS and checked the MIDI Studio window by pressing ⌘-2.

![Audio MIDI Setup, MIDI Studio window](/dna-synthesized/elektron/midi-studio.jpg)

The whole script that parses the sequence and sends notes to the Elektron looks like this.

```python
import time

import pygame.midi

pygame.midi.init()

# List the available MIDI devices so you can pick the Elektron's output ID.
print('Default output:', pygame.midi.get_default_output_id())
for device_id in range(pygame.midi.get_count()):
    print(device_id, pygame.midi.get_device_info(device_id))

player = pygame.midi.Output(1)  # device ID of the Elektron on my machine
player.set_instrument(2)

def send_note(note, velocity):
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)

# MIDI note numbers must be in the range 0-127. These match the tones used
# for the WAV files: A4 = 69, C5 = 72, G5 = 79 and D5 = 74 (standing in for T).
nucleotide_midi_map = {
    'A': 69,
    'C': 72,
    'G': 79,
    'T': 74,  # is D
}

# Skip FASTA header lines (starting with '>') and join the sequence lines.
with open('quote.fa') as f:
    sequence = ''.join(line.strip() for line in f if not line.startswith('>'))

for nucleotide in sequence.upper():
    if nucleotide not in nucleotide_midi_map:
        continue  # ignore N and other ambiguity codes
    print('Playing nucleotide {} with MIDI note {}'.format(
        nucleotide, nucleotide_midi_map[nucleotide]))
    send_note(nucleotide_midi_map[nucleotide], 127)

player.close()
pygame.midi.quit()
```

<video src="/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting if I chose different instruments for different nucleotides, or did more funky stuff with the Elektron. But for now, this should be enough. It is just a proof of concept, something to play around with.
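
One way to sketch the different-instruments idea is to send each nucleotide on its own MIDI channel, so a multitimbral device can voice each one with a different track or sound. This assumes the device is configured so that each track listens on a separate channel (check its MIDI settings); the channel numbers, the fixed note 60 and the `midi_events`/`play` names are hypothetical:

```python
import time

# Hypothetical mapping: one 0-based MIDI channel per nucleotide.
nucleotide_channel_map = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
NOTE = 60  # every channel plays the same note; the channel selects the sound

def midi_events(sequence):
    """Turn a sequence into (channel, note) pairs, skipping unknown symbols."""
    return [(nucleotide_channel_map[n], NOTE) for n in sequence.upper()
            if n in nucleotide_channel_map]

def play(events, output_id, note_seconds=0.3, velocity=127):
    """Send the events to a MIDI output with pygame.midi."""
    import pygame.midi  # imported here so midi_events() works without pygame

    pygame.midi.init()
    player = pygame.midi.Output(output_id)
    try:
        for channel, note in events:
            player.note_on(note, velocity, channel)
            time.sleep(note_seconds)
            player.note_off(note, velocity, channel)
    finally:
        player.close()
        pygame.midi.quit()
```

Usage would be along the lines of `play(midi_events(sequence), output_id=1)`, with the output ID taken from the device list printed by the script above.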

## Going even further

As you have probably noticed, the results sound quite similar to each other. This is to be expected, because we are essentially working with only four notes. What could make this more interesting is using something like [SuperCollider](https://supercollider.github.io/) to create richer sounds, by transposing notes or applying effects based on repeated data in a sequence. The possibilities are endless.
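
As a small taste of the repeated-data idea in plain Python, runs of the same nucleotide could be collapsed into a single longer note that is also transposed up. This is a hypothetical scheme of my own (one octave up per extra repeat, capped, with the duration stretched by the run length), not something the recordings above use; the resulting `(frequency, duration)` pairs could be fed to `append_sinewave`:

```python
from itertools import groupby

nucleotide_tone_map = {'A': 440.00, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def runs_to_notes(sequence, base_milliseconds=250, max_octaves=2):
    """Collapse runs of the same nucleotide into one (frequency, duration) note.

    A run of length n lasts n times longer and is transposed up one octave
    per extra repeat, capped at `max_octaves`.
    """
    notes = []
    for nucleotide, run in groupby(sequence.upper()):
        if nucleotide not in nucleotide_tone_map:
            continue  # skip N and other symbols
        length = len(list(run))
        octaves = min(length - 1, max_octaves)
        notes.append((nucleotide_tone_map[nucleotide] * 2 ** octaves,
                      base_milliseconds * length))
    return notes
```

For example, `for freq, ms in runs_to_notes(sequence): append_sinewave(freq, ms)` would render the transformed sequence into the same `audio` list.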

It is really astonishing what can be achieved with a little bit of code and an idea. I could see this becoming an interesting background soundscape instrument if done properly. It could replace a random note generator with something more intriguing: biological, natural.

I actually find the results fascinating. I took some time and listened to this music of nature. Even though it all sounds quite the same, it is also quite different. The subtle differences, on repeat, kind of create music on their own. It makes you wonder. It kind of puts Occam's Razor in its place. Nature surely loves to make things as energy efficient as possible.