Fun with sine waves: DEF CON 2025 quals echoid writeup
Overview
I had a great time playing quals this year with Shellphish. I usually try to write full-length stories for my writeups (see my blog archives for a few), but this one will be a bit less complete for time reasons.
Challenge description
Are you hearing what I’m hearing?
HOST: echoid-7xf5f2sbdofl6.shellweplayaga.me
PORT: 1337
Files:
- echoid
- fingerprints.db
$ nc echoid-7xf5f2sbdofl6.shellweplayaga.me 1337
Ticket please: ticket{redacted}
Send us your song to be identified
asdf
ERROR: Invalid input size: 1717859169
Source from the organizers: https://github.com/Nautilus-Institute/quals-2025/tree/main/echoid
The challenge was released Saturday night. My teammates worked out that echoid
was a compiled Crystal binary that functioned as an audio fingerprinter. The remote called echoid find
and passed your input.
usage: audio_fingerprinter <command> [options]
  add      Add a song to the database
  find     Find a matching song from a sample provided on stdin
  list     List all of the songs in the database
  update   Update the details of a song in the database
  clear    Clear all the songs from the database
  help     Show this help
  version  Show version

  -i FILE, --input=FILE    Input audio file
  -a NAME, --artist=NAME   Artist name
  -t TITLE, --title=TITLE  Song title
  -s ID, --song=ID         Song identifier
The SQLite database contains fingerprints for a single song whose title ostensibly holds the flag.
$ sqlite3 fingerprints.db
sqlite> .schema
CREATE TABLE songs (
  id TEXT PRIMARY KEY,
  artist TEXT NOT NULL,
  title TEXT NOT NULL,
  added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE fingerprints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  song TEXT NOT NULL,
  hash INTEGER NOT NULL,
  offset INTEGER NOT NULL,
  FOREIGN KEY (song) REFERENCES songs(id),
  UNIQUE (song, hash, offset)
);
CREATE TABLE sqlite_sequence(name,seq);
CREATE INDEX idx_fingerprints_hash ON fingerprints (hash);
sqlite> select * from songs;
33ebede1-26ee-4521-8300-ef8eb8b9ffd8|Nautilus Institute|flag{contact_support}|2025-04-12 14:54:21
sqlite> select * from fingerprints limit 5;
1|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|1859126912|145
2|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|1435502464|145
3|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|894437824|145
4|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|49285632|145
5|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|49285632|4812
Presumably we need to construct an input whose fingerprint matches the stored one closely enough that the server prints the song title, which seemingly contains the flag.
My teammates did a bunch of binary RE and concluded that echoid took a WAV file as input and resampled it to 11025 samples/second before doing some sort of math on it.
We learned that you can create WAV files via your standard Python CTF nonsense, and also via the sox command-line tool like so: sox -n -r 11025 -c 1 output.wav synth 1 sine 440.
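For completeness, the "Python CTF nonsense" route is basically the standard-library wave module. A minimal sketch mirroring the sox command above (16-bit mono PCM is my assumption here; we never fully mapped out which formats echoid tolerates):

import math
import struct
import wave

# Sketch: a 1-second 440Hz sine at 11025 samples/second, 16-bit mono PCM.
RATE = 11025
FREQ = 440
SECONDS = 1

frames = b"".join(
    struct.pack("<h", int(16000 * math.sin(2 * math.pi * FREQ * n / RATE)))
    for n in range(RATE * SECONDS)
)
with wave.open("output.wav", "wb") as w:
    w.setnchannels(1)   # mono
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(frames)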
The binary would call out “low-confidence” matches pretty easily, but it was unclear what was required to trigger a “high-confidence” match, which would print the song title.
Black-box reverse engineering
It’s funny – as a student I never paid any mind to black-box reversing. I figured you’d want as much info available to you as possible, so why not tear the binary to pieces and learn every last bit you can. However, this is now the second writeup on my blog where I said “screw it” to binary RE and decided to take another path. I opened the binary, saw a bunch of SIMD nonsense in the function handling the fingerprints, and decided “not today”.
I picked up around here in the timeline, after lunch on Sunday. One of my teammates pointed out to me this chal was worth a lot of points (ended up being ~324 points, which actually seems a tad overvalued to me) so I decided to look at it.
I remembered the existence of SongRec, an open-source Shazam implementation, so I googled for it to get some inspiration. Their overview of the algorithm mentioned looking for frequency peaks in various parts of the spectrogram, which lined up just enough with the notes we had from binary RE that I decided to use this as a working guess of sorts.
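For context, my working mental model of that peak-based approach was roughly the following sketch. This is my own illustration in numpy, not the challenge's actual algorithm; the window size, hop, and peaks-per-frame values are made up:

import numpy as np

def peak_fingerprint(samples, rate=11025, window=4096, hop=1024, peaks_per_frame=5):
    # Toy Shazam-style fingerprint: the strongest frequency bins per FFT frame.
    win = np.hanning(window)
    out = []
    for start in range(0, len(samples) - window, hop):
        spectrum = np.abs(np.fft.rfft(samples[start:start + window] * win))
        top_bins = np.argsort(spectrum)[-peaks_per_frame:]   # loudest bins
        freqs = sorted(top_bins * rate / window)             # bin index -> Hz
        out.append((int(start * 1000 / rate), freqs))        # (offset_ms, [freqs])
    return out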
Triage
Apparently I woke up in the mood for SQL queries, because I started trying to make heads or tails of the fingerprints provided in the handout using the sqlite3 shell.
me: looking at sample data, it looks like each song offset tends to have 15 “fingerprints”, but some outliers have different numbers of fingerprints
sqlite> select count(count), count from ( select count(*) as count, offset from fingerprints where song = "33ebede1-26ee-4521-8300-ef8eb8b9ffd8" group by offset) group by count order by count desc;
364|15
2|14
1|11
4|9
3|5
2|2
A teammate tested a plain sine wave and noticed that the hashes it produced weren't all identical. So there's a fudge factor somewhere, presumably due to numerical error.
$ sox -n comp-3.wav synth 10.0 sine 1477 channels 1
$ ./echoid add -i comp-3.wav -a asfd -s asdf -t comp3
Successfully added 'comp3' by asfd to the database
$ sqlite3 fingerprints.db
sqlite> select * from songs;
20e11893-971d-4547-9fad-79cc96c11f2a|asfd|7h1|2025-04-13 17:36:14
71bcb98e-6803-4e20-a4a3-b19fabeb940a|asfd|comp3|2025-04-13 17:39:15
sqlite> select max(hash) - min(hash) from fingerprints where song == "71bcb98e-6803-4e20-a4a3-b19fabeb940a";
28992
Next, I wanted to know if the hash function was linear. Is there actually some crypto / obfuscation we have to deal with, or can we do some algebra and avoid it? I generated two .wav files: one with a 440Hz sine wave and another with an 880Hz sine wave. I then paired up the "hashes" in their respective fingerprints and found that the 880Hz hashes were near-perfectly double the 440Hz ones.
sqlite> select a_row, a_hash, b_hash, cast(a_hash as real) / b_hash from (select (row_number() over (order by '')) as a_row, hash as a_hash from fingerprints where song = '8989bf16-c040-4e4a-abe4-c7bcef947df6' group by hash order by hash), (select (row_number() over (order by '')) as b_row, hash as b_hash from fingerprints where song = '3d46c8ce-066c-475b-a742-b6e54c65a75d' group by hash order by hash) where a_row == b_row;
1|461375808|923796928|0.499434230636454
2|461376384|923797824|0.499434369743655
3|461378176|923798400|0.499435998157174
4|461378752|923799296|0.499436137262438
5|461381120|923800768|0.499437904775589
6|461381696|923801664|0.499438043878648
7|461383488|923802240|0.499439672283107
8|461384064|923803136|0.49943981138423
9|461386432|923804608|0.499441578884179
10|461387008|923805504|0.499441717983096
11|461388800|923806080|0.499443346378495
12|461389376|923806976|0.499443485475477
<snip>
24|461405312|923817600|0.499454991981101
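If window functions aren't your thing, the same pairing is a few lines of Python. A rough equivalent (the song IDs are placeholders for whatever echoid add handed back):

import sqlite3

conn = sqlite3.connect("fingerprints.db")

def sorted_hashes(song_id):
    # distinct hashes for one song, ascending (same as the GROUP BY hash above)
    rows = conn.execute(
        "select distinct hash from fingerprints where song = ? order by hash",
        (song_id,),
    )
    return [r[0] for r in rows]

a440 = sorted_hashes("SONG_ID_OF_440HZ_TEST")   # placeholder IDs
a880 = sorted_hashes("SONG_ID_OF_880HZ_TEST")
for h440, h880 in zip(a440, a880):
    print(h440, h880, h440 / h880)   # hovered around 0.4994 for our test files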
We then tried a whole bunch of stuff, learning by trial and error that:
- Some combinations of frequency, bit depth, and sample rate would yield a match with themselves, and some wouldn't. We still aren't sure why.
- A batch of DTMF tones, which we hoped would yield results, produced a few partial matches, but nothing really panned out.
- Eventually I tried doubling the frequency again to 1760Hz and found that the hash doubled again, further confirming the "hash ~= frequency" theory.
Pretty pictures
Recently I found myself trying to write a Morse code decoder, and one of my takeaways from that experience was that you can't debug signal-processing code via printf. (At least I can't.) Visualizations are important. So I graphed the "flag" song via matplotlib:
Here “offset” (whatever that is) is on the x-axis and “hash” is on the y-axis. We see some tones, but more than one or two. I was expecting to see some obvious peaks along with noise from an imperfect FFT, but this is not that. It’s still plausibly the output of an FFT with some calculations after it, though.
from collections import defaultdict
import matplotlib.pyplot as plt
import sqlite3

conn = sqlite3.connect("./test.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from songs where artist = "sox";')
#c.execute('select * from songs where artist = "Nautilus Institute";')
songs = []
for song in c.fetchall():
    songs.append(dict(song))

for song in songs:
    songid = song["id"]
    song_title = song["title"]
    c.execute('select offset, hash from fingerprints where song = ? order by offset', (songid,))
    last = None
    offset_freqs = defaultdict(list)
    for r in c.fetchall():
        if last is None:
            last = r["offset"]
        offset_freqs[r["offset"]].append(r["hash"])
        if r["offset"] != last:
            last = r["offset"]
    xs = []
    ys = []
    for k, items in offset_freqs.items():
        for i in items:
            xs.append(k)
            ys.append(i)
    plt.title(song_title)
    plt.scatter(xs, ys)
    plt.show()
We still had an open question of how much processing was being done on the output of the FFT I assumed was there. So we made some more test files. Here's one with a 440Hz tone followed by an 880Hz tone, 30 seconds per segment.
$ sox -n -r 11025 -c 1 440_11025_30.wav synth 30 sine 440
$ sox -n -r 11025 -c 1 880_11025_30.wav synth 30 sine 880
$ sox 440_11025_30.wav 880_11025_30.wav 440_then_880_11025_60.wav
$ ./echoid add -i 440_then_880_11025_60.wav -t 440_then_880_11025_60 -a sox
This made it pretty clear the x-axis was milliseconds and the y-axis was correlated to frequency.
So in theory we just needed to work out the frequency-to-hash mapping and synthesize audio that reproduces the "flag" song's fingerprints. Hopefully closely enough to count as a match.
Building a lookup table
I decided to just build a lookup table, 10Hz at a time:
#!/bin/bash
set -euo pipefail

length=20
for i in $(seq 150 10 5400)
do
    filename=${i}_11025_${length}
    echo $filename
    sox -n -r 11025 -c 1 $filename.wav synth $length sine $i
    ./echoid add -i ${filename}.wav -t $filename -a sox
done
Then I wrote a script to parse and graph it. I arbitrarily took the median of the hashes for each frequency.
from collections import defaultdict
import statistics
import matplotlib.pyplot as plt
import sqlite3

conn = sqlite3.connect("./freqs.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from songs where artist = "sox";')
#c.execute('select * from songs where artist = "Nautilus Institute";')
songs = []
for song in c.fetchall():
    songs.append(dict(song))

freqs_table = {}
for song in songs:
    songid = song["id"]
    song_title = song["title"]
    hashes = []
    c.execute('select hash from fingerprints where song = ? order by hash', (songid,))
    for r in c.fetchall():
        hashes.append(int(r["hash"]))
    if hashes:
        freqs_table[int(song_title.split("_")[0])] = statistics.median(hashes)

print(freqs_table)
xs = []
ys = []
for k, v in freqs_table.items():
    xs.append(k)
    ys.append(v)
plt.title("Frequency to hash")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Hash")
plt.scatter(xs, ys)
plt.show()
This produced the following graph of the relationship. It has some discontinuities, but it's otherwise linear. I could have done the obvious linear regression to derive a formula at this point, but I figured I'd see if I could get away with just picking the closest lookup-table entry for each hash.
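For the record, the regression I skipped would have been a couple of lines of numpy on top of the freqs_table dict from the script above. A sketch (untested):

import numpy as np

# Fit hash ~= a * freq + b over the 10Hz lookup table, then invert it.
freqs = np.array(sorted(freqs_table))
hashes = np.array([freqs_table[f] for f in freqs])
a, b = np.polyfit(freqs, hashes, 1)

def hash_to_freq(h):
    return (h - b) / a   # only as trustworthy as the linearity assumption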
Synthesizing the “flag” audio
I just needed to synthesize it. The right way is almost certainly some numpy
math, but I went for the simple stupid method of generating a shell script that called sox
to do it for me :)
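For the curious, the numpy version would probably have looked something like this. It's a sketch I never actually ran; it assumes the audio list of (offset_ms, [freqs]) tuples built in the script below, and writes 16-bit mono PCM:

import numpy as np
import wave

RATE = 11025
segments = []
for i, (offset, freqs) in enumerate(audio):
    # segment length = gap to the next offset, same arbitrary 100ms for the last one
    length_ms = audio[i + 1][0] - offset if i + 1 < len(audio) else 100
    t = np.arange(int(RATE * length_ms / 1000)) / RATE
    mixed = sum(np.sin(2 * np.pi * f * t) for f in freqs) / max(len(freqs), 1)
    segments.append(mixed)

pcm = (np.concatenate(segments) * 16000).astype("<i2")
with wave.open("solve_numpy.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(pcm.tobytes())

Instead, here's the core of the shell-script generator (snipped here; the full version is at the end of the post):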
# ...

def find_closest_freq(hashval):
    dist = 999999999
    freq = None
    for h, f in inverse_table.items():
        d = abs(hashval - h)
        if d < dist:
            dist = d
            freq = f
    if freq is None:
        raise Exception("bad")
    return freq, dist

# tuples of (offset, [freqs])
audio = []
dists = []
for i in change_offsets:
    data = offset_freqs[i]
    freqs = []
    for d in data:
        freq, dist = find_closest_freq(d)
        freqs.append(freq)
        dists.append(dist)
    audio.append((i, freqs))
#print(audio)

filenames = []
for i, (offset, freqs) in enumerate(audio):
    try:
        next_offset = audio[i+1][0]
        length = next_offset - offset
    except IndexError:
        length = 100  # arbitrarily assign 100ms
    #print(f"{length=} {freqs=}")
    filename = f"solve_{i}.wav"
    filenames.append(filename)
    sines = "sine " + ' sine '.join([str(i) for i in freqs])
    print(f"sox -n -r 11025 -c 1 {filename} synth {float(length) / 1000} {sines}")
print(f"sox {' '.join(filenames)} solve.wav")
This generated a shell script of the form
sox -n -r 11025 -c 1 solve_0.wav synth 0.04 sine 2100 sine 4430 sine 4460 sine 2560 sine 580 sine 2750 sine 2910 sine 5000 sine 1170 sine 3340 sine 3420 sine 1510 sine 3670 sine 1780 sine 3930
# ...
sox -n -r 11025 -c 1 solve_375.wav synth 0.1 sine 2520 sine 3380
sox solve_0.wav <snip> solve_375.wav solve.wav
Which actually failed the first time due to too many open files! A quick google and a ulimit -n 2048
later, and I had a .wav file that in theory matched the flag song. I listened to it and it sounded … like a bunch of random tones. It didn’t sound like a flag, or a song, or anything.
It did sound like tones, so I picked up my teammate’s pwntools script to throw it at the echoid
binary and it passed the check and printed the test flag! I flipped the parameter to throw it at the remote and got the real flag!
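That script was roughly the following. This is a reconstruction rather than my teammate's exact code, and the 4-byte little-endian size prefix is an inference from the "ERROR: Invalid input size: 1717859169" message earlier (1717859169 is just b"asdf" read as a u32):

from pwn import *

wav = open("solve.wav", "rb").read()

io = remote("echoid-7xf5f2sbdofl6.shellweplayaga.me", 1337)
io.sendlineafter(b"Ticket please:", b"ticket{redacted}")
io.recvuntil(b"Send us your song to be identified")
io.send(p32(len(wav)) + wav)         # guessed framing: u32 length, then the WAV bytes
print(io.recvall(timeout=30).decode(errors="replace"))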
Thanks to all of my Shellphish teammates for playing along. I couldn't get everyone's contributions into this writeup, but it was definitely more than just me working on this :)
Full .wav generation script
I snipped this script above for readability; here it is in full. To run it you need the fingerprints.db from the challenge handout, plus the fingerprints.db produced by my 10Hz-granularity shell script above, renamed to ./freqs.db.
from collections import defaultdict
import statistics
import matplotlib.pyplot as plt
import sqlite3

def get_lookup_table(show_plot=False):
    conn = sqlite3.connect("./freqs.db")
    conn.row_factory = sqlite3.Row
    c = conn.cursor()
    c.execute('select * from songs where artist = "sox";')
    #c.execute('select * from songs where artist = "Nautilus Institute";')
    songs = []
    for song in c.fetchall():
        songs.append(dict(song))
    freqs_table = {}
    for song in songs:
        songid = song["id"]
        song_title = song["title"]
        hashes = []
        c.execute('select hash from fingerprints where song = ? order by hash', (songid,))
        for r in c.fetchall():
            hashes.append(int(r["hash"]))
        if hashes:
            freqs_table[int(song_title.split("_")[0])] = statistics.median(hashes)
    if show_plot:
        print(freqs_table)
        xs = []
        ys = []
        for k, v in freqs_table.items():
            xs.append(k)
            ys.append(v)
        plt.title("Frequency to hash")
        plt.xlabel("Frequency (Hz)")
        plt.ylabel("Hash")
        plt.scatter(xs, ys)
        plt.show()
    return freqs_table

freqs_table = get_lookup_table()
# hash to freq
inverse_table = {v: k for k, v in freqs_table.items()}

conn = sqlite3.connect("./fingerprints.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
#c.execute('select * from songs where artist = "sox";')
c.execute('select * from songs where artist = "Nautilus Institute";')
song = dict(c.fetchone())
songid = song["id"]
song_title = song["title"]

c.execute('select offset, hash from fingerprints where song = ? order by offset', (songid,))
last = None
offset_freqs = defaultdict(list)
change_offsets = []
for r in c.fetchall():
    offset_freqs[r["offset"]].append(r["hash"])
    if r["offset"] != last:
        change_offsets.append(r["offset"])
        last = r["offset"]
print(f"{change_offsets=}")

def find_closest_freq(hashval):
    dist = 999999999
    freq = None
    for h, f in inverse_table.items():
        d = abs(hashval - h)
        if d < dist:
            dist = d
            freq = f
    if freq is None:
        raise Exception("bad")
    return freq, dist

# tuples of (offset, [freqs])
audio = []
dists = []
for i in change_offsets:
    data = offset_freqs[i]
    freqs = []
    for d in data:
        freq, dist = find_closest_freq(d)
        freqs.append(freq)
        dists.append(dist)
    audio.append((i, freqs))
#print(audio)

filenames = []
for i, (offset, freqs) in enumerate(audio):
    try:
        next_offset = audio[i+1][0]
        length = next_offset - offset
    except IndexError:
        length = 100  # arbitrarily assign 100ms
    #print(f"{length=} {freqs=}")
    filename = f"solve_{i}.wav"
    filenames.append(filename)
    sines = "sine " + ' sine '.join([str(i) for i in freqs])
    print(f"sox -n -r 11025 -c 1 {filename} synth {float(length) / 1000} {sines}")
print(f"sox {' '.join(filenames)} solve.wav")