---
title: What would DNA sound like if synthesized to an audio file
url: what-would-dna-sound-if-synthesized.html
date: 2022-07-05T12:00:00+02:00
type: post
draft: false
---

## Introduction

Lately, I have been thinking a lot about the nature of life, what the
building blocks of life are, and things like that. It's remarkable how
complex, and at the same time how simple, creation is when you look at it.
The miracle of life keeps us grounded when our imagination goes wild. If DNA
is the building block of life, you could consider it an API nature provided
us to better understand all of this chaos masquerading as order.

I have been reading a lot about superintelligence and our somewhat misguided
path to creating general artificial intelligence. What would the building
blocks of our creation look like? Is compression really the ultimate storage
of information? Will our creations also ponder these questions when creating
new worlds for themselves, or will we just disappear into the vastness of
possibilities? It is a little offensive that we are playing God whilst being
completely ignorant of our own reality. Who knows! Like many other
breakthroughs, this one will also come at a cost not known to us when it
finally happens.

To keep things a bit lighter, I decided to convert some popular DNA sequences
into audio files for us to listen to. I am not the first, nor will I be the
last, to do this. But it is an interesting exercise in better understanding
the relationship between art and science. Maybe listening to DNA instead of
parsing it will open a way to better understanding, or at least enjoying, the
creation and cryptic nature of life.

## DNA encoding and primer example

I have explored DNA in the past, in my post from about three years ago,
[Encoding binary data into DNA
sequence](/encoding-binary-data-into-dna-sequence.html), where I converted
all sorts of data into DNA sequences.

This will be a similar exercise, but instead of converting to DNA, I will be
generating tones from nucleotides.
| Nucleotide       | Note | Frequency |
| ---------------- | ---- | --------- |
| **A** (Adenine)  | A    | 440 Hz    |
| **C** (Cytosine) | C    | 523.25 Hz |
| **G** (Guanine)  | G    | 783.99 Hz |
| **T** (Thymine)  | D    | 587.33 Hz |

Since there is no T note in the equal-tempered scale, I chose D to represent
T.

You can check [Frequencies for equal-tempered scale, A4 = 440
Hz](https://pages.mtu.edu/~suits/notefreqs.html). That table also assumes
`Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`.
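All of these frequencies come from the same equal-tempered formula, where
each semitone multiplies the frequency by 2^(1/12). A quick sketch to verify
the table values (the `note_freq` helper is my own, not from the linked
page):

```python
# Each semitone in the equal-tempered scale multiplies the frequency
# by 2 ** (1/12). Offsets below are semitones relative to A4 = 440 Hz.
A4 = 440.0

def note_freq(semitones_from_a4):
    return A4 * 2 ** (semitones_from_a4 / 12)

print(round(note_freq(3), 2))   # C5 -> 523.25
print(round(note_freq(10), 2))  # G5 -> 783.99
print(round(note_freq(5), 2))   # D5 -> 587.33
```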

Now that we have this out of the way, we can also brush up on DNA sequencing
a bit. This is a famous quote I also used for the encoding tests, and it goes
like this.

> How wonderful that we have met with a paradox. Now we have some hope of
> making progress.
> ― Niels Bohr
| 65 | |||
| 66 | ```shell | ||
| 67 | >SEQ1 | ||
| 68 | GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA | ||
| 69 | GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA | ||
| 70 | ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA | ||
| 71 | ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT | ||
| 72 | GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT | ||
| 73 | GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC | ||
| 74 | AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC | ||
| 75 | AACC | ||
| 76 | ``` | ||

This is what we are going to work with to get things rolling, when creating
the parser and waveform generator.

## Parsing DNA data

This step is a rather simple one. All we need to do is parse the input DNA
sequence in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), well
known in [Bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), to
extract single nucleotides that will be converted into separate tones based
on the equal-tempered scale explained above.

```python
nucleotide_tone_map = {
    'A': 440,
    'C': 523.25,
    'G': 783.99,
    'T': 587.33,  # mapped to the note D
}

def split(word):
    return [char for char in word]

def generate_from_dna_sequence(sequence):
    for nucleotide in split(sequence):
        print(nucleotide, nucleotide_tone_map[nucleotide])
```
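The snippet above assumes a bare sequence string, but FASTA files start with
a `>` header line. A small reader that drops those lines keeps the lookup
from failing on header characters; the `read_fasta` helper below is my own
sketch, not part of the original script:

```python
def read_fasta(path):
    """Return the concatenated sequence from a FASTA file,
    skipping '>' header lines and newlines."""
    with open(path) as f:
        return ''.join(
            line.strip() for line in f if not line.startswith('>')
        )
```

With that in place, `generate_from_dna_sequence(read_fasta("quote.fa"))`
would walk the whole sequence.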

## Generating sine wave

Because we are essentially creating one long stream of notes, we will be
appending sine notes to a global array that we will later use for creating a
WAV file.

```python
import math

sample_rate = 44100  # samples per second, CD quality
audio = []           # accumulated samples in the -1.0 .. 1.0 range

def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
    global audio

    num_samples = duration_milliseconds * (sample_rate / 1000.0)

    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
```

The sine wave generated here is the standard beep. If you want something more
aggressive, you could try a square or sawtooth waveform.

## Generating a WAV file from accumulated sine waves

```python
import wave
import struct

def save_wav(file_name):
    wav_file = wave.open(file_name, 'w')
    nchannels = 1
    sampwidth = 2  # 2 bytes per sample = 16-bit audio

    nframes = len(audio)
    comptype = 'NONE'
    compname = 'not compressed'
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))

    for sample in audio:
        wav_file.writeframes(struct.pack('h', int(sample * 32767.0)))

    wav_file.close()
```

44100 Hz is the industry-standard sample rate, i.e. CD quality. If you need
to save on file size, you can adjust it downwards. The standard for low
quality is 8000 Hz.

The WAV files here use short, 16-bit, signed integers for the sample size, so
we multiply the floating-point data we have by 32767, the maximum value of a
short integer.

> It is theoretically possible to use the floating-point -1.0 to 1.0 data
> directly in a WAV file, but it is not obvious how to do that using the
> wave module in Python.
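Putting the pieces together, a condensed, self-contained version of the whole
pipeline might look like this (the 300 ms note length and the short test
sequence are arbitrary choices of mine, not values from the original
scripts):

```python
import math
import struct
import wave

sample_rate = 44100
tone_map = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

# Render each nucleotide as a 300 ms sine tone.
audio = []
for nucleotide in "GACAGCTT":              # stand-in for a parsed FASTA sequence
    freq = tone_map[nucleotide]
    for x in range(int(sample_rate * 0.3)):
        audio.append(math.sin(2 * math.pi * freq * (x / sample_rate)))

# Write the samples out as a mono, 16-bit WAV file.
with wave.open("output.wav", "w") as wav_file:
    wav_file.setparams((1, 2, sample_rate, len(audio), 'NONE', 'not compressed'))
    for sample in audio:
        wav_file.writeframes(struct.pack('h', int(sample * 32767.0)))
```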

## Generating spectrograms

I tried two methods of doing this, and both were just fine. I opted for
[SoX - Sound eXchange, the Swiss Army knife of audio
manipulation](https://linux.die.net/man/1/sox) because it didn't require
anything else.

```shell
sox output.wav -n spectrogram -o spectrogram.png
```

An example spectrogram of Ludwig van Beethoven's Symphony No. 6, first
movement.

<audio controls>
<source src="/assets/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
</audio>



The other option is [gnuplot](http://www.gnuplot.info/), which requires an
intermediary step, however.

```shell
sox output.wav audio.dat
tail -n+3 audio.dat > audio_only.dat
gnuplot audio.gpi
```

And the input file `audio.gpi` that is passed to gnuplot looks something like
this.

```
# set output format and size
set term png size 1000,280

# set output file
set output "audio.png"

# set y range
set yr [-1:1]

# we want just the data
unset key
unset tics
unset border
set lmargin 0
set rmargin 0
set tmargin 0
set bmargin 0

# draw rectangle to change background color
set obj 1 rectangle behind from screen 0,0 to screen 1,1
set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"

# draw data with foreground color
plot "audio_only.dat" with lines lt rgb 'red'
```

## Pre-generated sequences

What I did was take interesting parts of an animal genome and feed them to
the tone-generator script. This generated a WAV file, which I converted to
MP3 so it can be played in a browser. The last step was creating a
spectrogram from the WAV file.

### Niels Bohr quote

<audio controls>
<source src="/assets/dna-synthesized/quote/out.mp3" type="audio/mpeg">
</audio>

![](/assets/dna-synthesized/quote/spectogram.png)

### Mouse

This is part of a mouse genome, `Mus_musculus.GRCm39.dna.nonchromosomal`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).

<audio controls>
<source src="/assets/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
</audio>

![](/assets/dna-synthesized/mouse/spectogram.png)

### Bison

This is part of a bison genome, `Bison_bison_bison.Bison_UMD1.0.cdna`. You
can get the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).

<audio controls>
<source src="/assets/dna-synthesized/bison/out.mp3" type="audio/mpeg">
</audio>

![](/assets/dna-synthesized/bison/spectogram.png)

### Taurus

This is part of a taurus genome, `Bos_taurus.ARS-UCD1.2.cdna`. You can get
the [genome data
here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).

<audio controls>
<source src="/assets/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
</audio>

![](/assets/dna-synthesized/taurus/spectogram.png)

## Making a drummer out of a DNA sequence

To make things even more interesting, I decided to send this data via MIDI to
my [Elektron Model:Samples](https://www.elektron.se/en/model-samples). This
is a really cool piece of equipment that supports MIDI in via USB and a
3.5 mm audio jack.

The Elektron is connected to my MacBook via a USB cable, and its audio out is
patched to a Sony Bluetooth speaker that supports 3.5 mm audio in, since the
Elektron doesn't have internal speakers.

![Elektron Model:Samples](/assets/dna-synthesized/elektron/IMG_1642.jpg)

![Elektron Model:Samples](/assets/dna-synthesized/elektron/IMG_1643.jpg)

![Elektron Model:Samples](/assets/dna-synthesized/elektron/IMG_1644.jpg)

For communicating with the Elektron, I chose the `pygame` Python module,
which has MIDI support built in. With it, it was rather simple to send notes
to the device. All I did was map MIDI notes to the actual nucleotides.

Before all of this, I also checked the Audio MIDI Setup app under macOS and
opened MIDI Studio by pressing ⌘-2.

![macOS MIDI setup](/assets/dna-synthesized/macos-midi.png)

The whole script that parses the sequence and sends notes to the Elektron
looks like this.

```python
import pygame.midi
import time

pygame.midi.init()

print(pygame.midi.get_default_output_id())
print(pygame.midi.get_device_info(0))

# pick the output device id that matches your setup
player = pygame.midi.Output(1)
player.set_instrument(2)

def send_note(note, velocity):
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)


# MIDI note numbers must fit in the 0-127 data-byte range
nucleotide_midi_map = {
    'A': 69,  # A4
    'C': 72,  # C5
    'G': 79,  # G5
    'T': 74,  # D5, T is mapped to D
}

with open("quote.fa") as f:
    # skip the FASTA '>' header line and join the sequence
    sequence = ''.join(
        line.strip() for line in f if not line.startswith('>')
    )

for nucleotide in sequence:
    print("Playing nucleotide {} with MIDI note {}".format(
        nucleotide, nucleotide_midi_map[nucleotide]))
    send_note(nucleotide_midi_map[nucleotide], 127)

del player
pygame.midi.quit()
```

<video src="/assets/dna-synthesized/elektron/elektron.mp4" controls></video>

All of this could be made much more interesting by choosing different
instruments for different nucleotides, or doing more funky stuff with the
Elektron. But for now, this should be enough. It is just a proof of concept,
something to play around with.
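As a sketch of that idea, each nucleotide could get its own program change
before the note is played. This variation is my own and assumes the
`pygame.midi` `player` object from the script above; the General MIDI program
numbers are arbitrary picks, and whether the Model:Samples maps them to
anything useful depends on how the device is configured:

```python
import time

# Hypothetical variation: one General MIDI program per nucleotide.
# Program numbers are 0-indexed and must stay in the 0-127 range.
nucleotide_instrument_map = {
    'A': 0,    # Acoustic Grand Piano
    'C': 11,   # Vibraphone
    'G': 24,   # Acoustic Guitar (nylon)
    'T': 73,   # Flute
}

def send_note_with_instrument(player, nucleotide, note, velocity=127):
    # `player` is a pygame.midi.Output, as in the script above.
    player.set_instrument(nucleotide_instrument_map[nucleotide])
    player.note_on(note, velocity)
    time.sleep(0.3)
    player.note_off(note, velocity)
```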

## Going even further

As you probably noticed, the end results are quite similar to each other.
This is to be expected, because we are essentially operating with only 4
notes. What could make this more interesting is using something like
[SuperCollider](https://supercollider.github.io/) to create more interesting
sounds, by transposing notes or applying effects based on repeated data in a
sequence. The possibilities are endless.

It is really astonishing what can be achieved with a little bit of code and
an idea. I could see this becoming an interesting background soundscape
instrument if done properly. It could replace a random note generator with
something more intriguing, biological, natural.

I actually find the results fascinating. I took some time and listened to
this music of nature. Even though it's quite the same, it's also quite
different. The subtle differences on repeat kind of create music of their
own. Makes you wonder. It kind of puts Occam's Razor in its place. Nature for
sure loves to make things as energy efficient as possible.
