Fun with sine waves: DEF CON 2025 quals echoid writeup
Overview
I had a great time playing quals this year with Shellphish. I usually try to write full-length stories for my writeups (see my blog archives for a few), but this one will be a bit less complete for time reasons.
Challenge description
Are you hearing what I’m hearing?
HOST: echoid-7xf5f2sbdofl6.shellweplayaga.me
PORT: 1337
Files:
- echoid
- fingerprints.db
$ nc echoid-7xf5f2sbdofl6.shellweplayaga.me 1337
Ticket please: ticket{redacted}
Send us your song to be identified
asdf
ERROR: Invalid input size: 1717859169
Source from the organizers: https://github.com/Nautilus-Institute/quals-2025/tree/main/echoid
The challenge was released Saturday night. My teammates worked out that echoid
was a compiled Crystal binary that functioned as an audio fingerprinter. The remote called echoid find
and passed your input.
usage: audio_fingerprinter <command> [options]
  add      Add a song to the database
  find     Find a matching song from a sample provided on stdin
  list     List all of the songs in the database
  update   Update the details of a song in the database
  clear    Clear all the songs from the database
  help     Show this help
  version  Show version

  -i FILE, --input=FILE    Input audio file
  -a NAME, --artist=NAME   Artist name
  -t TITLE, --title=TITLE  Song title
  -s ID, --song=ID         Song identifier
The SQLite database contains fingerprints for a single song whose title ostensibly holds the flag.
$ sqlite3 fingerprints.db
sqlite> .schema
CREATE TABLE songs (
  id TEXT PRIMARY KEY,
  artist TEXT NOT NULL,
  title TEXT NOT NULL,
  added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE fingerprints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  song TEXT NOT NULL,
  hash INTEGER NOT NULL,
  offset INTEGER NOT NULL,
  FOREIGN KEY (song) REFERENCES songs(id),
  UNIQUE (song, hash, offset)
);
CREATE TABLE sqlite_sequence(name,seq);
CREATE INDEX idx_fingerprints_hash ON fingerprints (hash);
sqlite> select * from songs;
33ebede1-26ee-4521-8300-ef8eb8b9ffd8|Nautilus Institute|flag{contact_support}|2025-04-12 14:54:21
sqlite> select * from fingerprints limit 5;
1|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|1859126912|145
2|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|1435502464|145
3|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|894437824|145
4|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|49285632|145
5|33ebede1-26ee-4521-8300-ef8eb8b9ffd8|49285632|4812
Presumably we need to construct an input whose fingerprint matches the stored one closely enough that the server prints the song title, which seemingly contains the flag.
My teammates did a bunch of binary RE and concluded that echoid took a WAV file as input and resampled it to 11025 samples/second before doing some sort of math on it.
We learned that you can create WAV files via your standard Python CTF nonsense, and also via the sox command-line tool like so: sox -n -r 11025 -c 1 output.wav synth 1 sine 440.
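For completeness, the "Python CTF nonsense" route is basically the standard-library wave module. A minimal sketch mirroring the sox command above (16-bit mono PCM is my assumption here; we never fully mapped out which formats echoid tolerates):

import math
import struct
import wave

# Sketch: a 1-second 440Hz sine at 11025 samples/second, 16-bit mono PCM.
RATE = 11025
FREQ = 440
SECONDS = 1

frames = b"".join(
    struct.pack("<h", int(16000 * math.sin(2 * math.pi * FREQ * n / RATE)))
    for n in range(RATE * SECONDS)
)
with wave.open("output.wav", "wb") as w:
    w.setnchannels(1)   # mono
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(frames)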
The binary would call out “low-confidence” matches pretty easily, but it was unclear what was required to trigger a “high-confidence” match, which would print the song title.
Black-box reverse engineering
It’s funny – as a student I never paid any mind to black-box reversing. I figured you’d want as much info available to you as possible, so why not tear the binary to pieces and learn every last bit you can. However, this is now the second writeup on my blog where I said “screw it” to binary RE and decided to take another path. I opened the binary, saw a bunch of SIMD nonsense in the function handling the fingerprints, and decided “not today”.
I picked up around here in the timeline, after lunch on Sunday. One of my teammates pointed out to me this chal was worth a lot of points (ended up being ~324 points, which actually seems a tad overvalued to me) so I decided to look at it.
I remembered the existence of SongRec, an open-source Shazam implementation, so I googled for it to get some inspiration. Their overview of the algorithm mentioned looking for frequency peaks in various parts of the spectrogram, which lined up just enough with the notes we had from binary RE that I decided to use this as a working guess of sorts.
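For context, my working mental model of that peak-based approach was roughly the following sketch. This is my own illustration in numpy, not the challenge's actual algorithm; the window size, hop, and peaks-per-frame values are made up:

import numpy as np

def peak_fingerprint(samples, rate=11025, window=4096, hop=1024, peaks_per_frame=5):
    # Toy Shazam-style fingerprint: the strongest frequency bins per FFT frame.
    win = np.hanning(window)
    out = []
    for start in range(0, len(samples) - window, hop):
        spectrum = np.abs(np.fft.rfft(samples[start:start + window] * win))
        top_bins = np.argsort(spectrum)[-peaks_per_frame:]   # loudest bins
        freqs = sorted(top_bins * rate / window)             # bin index -> Hz
        out.append((int(start * 1000 / rate), freqs))        # (offset_ms, [freqs])
    return out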
Triage
Apparently I woke up in the mood for SQL queries, because I started trying to make heads or tails of the fingerprints provided in the handout using the sqlite3 shell.
me: looking at sample data, it looks like each song offset tends to have 15 “fingerprints”, but some outliers have different numbers of fingerprints
sqlite> select count(count), count from ( select count(*) as count, offset from fingerprints where song = "33ebede1-26ee-4521-8300-ef8eb8b9ffd8" group by offset) group by count order by count desc;
364|15
2|14
1|11
4|9
3|5
2|2
A teammate tested a plain sine wave and noticed that the hashes it produced weren't all identical. So there's a fudge factor somewhere, presumably due to numerical error.
$ sox -n comp-3.wav synth 10.0 sine 1477 channels 1
$ ./echoid add -i comp-3.wav -a asfd -s asdf -t comp3
Successfully added 'comp3' by asfd to the database
$ sqlite3 fingerprints.db
sqlite> select * from songs;
20e11893-971d-4547-9fad-79cc96c11f2a|asfd|7h1|2025-04-13 17:36:14
71bcb98e-6803-4e20-a4a3-b19fabeb940a|asfd|comp3|2025-04-13 17:39:15
sqlite> select max(hash) - min(hash) from fingerprints where song == "71bcb98e-6803-4e20-a4a3-b19fabeb940a";
28992
Next, I wanted to know if the hash function was linear. Is there actually some crypto / obfuscation we have to deal with, or can we do some algebra and avoid it? I generated two .wav files: one with a 440Hz sine wave and another with an 880Hz sine wave. I then paired up the "hashes" in their respective fingerprints and found that the 880Hz hashes were near-perfectly double the 440Hz ones.
sqlite> select a_row, a_hash, b_hash, cast(a_hash as real) / b_hash from (select (row_number() over (order by '')) as a_row, hash as a_hash from fingerprints where song = '8989bf16-c040-4e4a-abe4-c7bcef947df6' group by hash order by hash), (select (row_number() over (order by '')) as b_row, hash as b_hash from fingerprints where song = '3d46c8ce-066c-475b-a742-b6e54c65a75d' group by hash order by hash) where a_row == b_row;
1|461375808|923796928|0.499434230636454
2|461376384|923797824|0.499434369743655
3|461378176|923798400|0.499435998157174
4|461378752|923799296|0.499436137262438
5|461381120|923800768|0.499437904775589
6|461381696|923801664|0.499438043878648
7|461383488|923802240|0.499439672283107
8|461384064|923803136|0.49943981138423
9|461386432|923804608|0.499441578884179
10|461387008|923805504|0.499441717983096
11|461388800|923806080|0.499443346378495
12|461389376|923806976|0.499443485475477
<snip>
24|461405312|923817600|0.499454991981101
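If window functions aren't your thing, the same pairing is a few lines of Python. A rough equivalent (the song IDs are placeholders for whatever echoid add handed back):

import sqlite3

conn = sqlite3.connect("fingerprints.db")

def sorted_hashes(song_id):
    # distinct hashes for one song, ascending (same as the GROUP BY hash above)
    rows = conn.execute(
        "select distinct hash from fingerprints where song = ? order by hash",
        (song_id,),
    )
    return [r[0] for r in rows]

a440 = sorted_hashes("SONG_ID_OF_440HZ_TEST")   # placeholder IDs
a880 = sorted_hashes("SONG_ID_OF_880HZ_TEST")
for h440, h880 in zip(a440, a880):
    print(h440, h880, h440 / h880)   # hovered around 0.4994 for our test files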
We then tried a whole bunch of stuff, learning by trial and error that:
- Some combinations of frequency, bit depth, and sample rate would yield a match with themselves, and some wouldn't. We still aren't sure why.
- A batch of DTMF tones, which we hoped would yield results, produced a few partial matches, but nothing really panned out.
- Eventually I tried doubling the frequency again to 1760Hz and found that the hash doubled again, further confirming the "hash ~= frequency" theory.
Pretty pictures
Recently I found myself trying to write a Morse code decoder, and one of my takeaways from that experience was that you can't debug signal-processing code via printf. (At least I can't.) Visualizations are important. So I graphed the "flag" song via matplotlib:
Here “offset” (whatever that is) is on the x-axis and “hash” is on the y-axis. We see some tones, but more than one or two. I was expecting to see some obvious peaks along with noise from an imperfect FFT, but this is not that. It’s still plausibly the output of an FFT with some calculations after it, though.
from collections import defaultdict
import matplotlib.pyplot as plt
import sqlite3

conn = sqlite3.connect("./test.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from songs where artist = "sox";')
#c.execute('select * from songs where artist = "Nautilus Institute";')
songs = []
for song in c.fetchall():
    songs.append(dict(song))

for song in songs:
    songid = song["id"]
    song_title = song["title"]
    c.execute('select offset, hash from fingerprints where song = ? order by offset', (songid,))
    last = None
    offset_freqs = defaultdict(list)
    for r in c.fetchall():
        if last is None:
            last = r["offset"]
        offset_freqs[r["offset"]].append(r["hash"])
        if r["offset"] != last:
            last = r["offset"]
    xs = []
    ys = []
    for k, items in offset_freqs.items():
        for i in items:
            xs.append(k)
            ys.append(i)
    plt.title(song_title)
    plt.scatter(xs, ys)
    plt.show()
We still had an open question of how much processing was being done on the output of the FFT I assumed was there. So we made some more test files. Here's one with a 440Hz tone followed by an 880Hz tone, 30 seconds per segment.
$ sox -n -r 11025 -c 1 440_11025_30.wav synth 30 sine 440
$ sox -n -r 11025 -c 1 880_11025_30.wav synth 30 sine 880
$ sox 440_11025_30.wav 880_11025_30.wav 440_then_880_11025_60.wav
$ ./echoid add -i 440_then_880_11025_60.wav -t 440_then_880_11025_60 -a sox
This made it pretty clear the x-axis was milliseconds and the y-axis was correlated to frequency.
So in theory we just needed to work out the frequency-to-hash mapping and synthesize audio that reproduces the "flag" song's fingerprints. Hopefully closely enough to count as a match.
Building a lookup table
I decided to just build a lookup table, 10Hz at a time:
#!/bin/bash
set -euo pipefail

length=20
for i in $(seq 150 10 5400)
do
    filename=${i}_11025_${length}
    echo $filename
    sox -n -r 11025 -c 1 $filename.wav synth $length sine $i
    ./echoid add -i ${filename}.wav -t $filename -a sox
done
Then I wrote a script to parse and graph it. I arbitrarily took the median of the hashes for each frequency.
from collections import defaultdict
import statistics
import matplotlib.pyplot as plt
import sqlite3

conn = sqlite3.connect("./freqs.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from songs where artist = "sox";')
#c.execute('select * from songs where artist = "Nautilus Institute";')
songs = []
for song in c.fetchall():
    songs.append(dict(song))

freqs_table = {}
for song in songs:
    songid = song["id"]
    song_title = song["title"]
    hashes = []
    c.execute('select hash from fingerprints where song = ? order by hash', (songid,))
    for r in c.fetchall():
        hashes.append(int(r["hash"]))
    if hashes:
        freqs_table[int(song_title.split("_")[0])] = statistics.median(hashes)

print(freqs_table)
xs = []
ys = []
for k, v in freqs_table.items():
    xs.append(k)
    ys.append(v)
plt.title("Frequency to hash")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Hash")
plt.scatter(xs, ys)
plt.show()
This produced the following graph of the relationship. It has some discontinuities, but it's otherwise linear. I could have done the obvious linear regression to derive a formula at this point, but I figured I'd see if I could get away with just picking the closest lookup-table entry for each hash.
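For the record, the regression I skipped would have been a couple of lines of numpy on top of the freqs_table dict from the script above. A sketch (untested):

import numpy as np

# Fit hash ~= a * freq + b over the 10Hz lookup table, then invert it.
freqs = np.array(sorted(freqs_table))
hashes = np.array([freqs_table[f] for f in freqs])
a, b = np.polyfit(freqs, hashes, 1)

def hash_to_freq(h):
    return (h - b) / a   # only as trustworthy as the linearity assumption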
Synthesizing the “flag” audio
I just needed to synthesize it. The right way is almost certainly some numpy
math, but I went for the simple stupid method of generating a shell script that called sox
to do it for me :)
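For the curious, the numpy version would probably have looked something like this. It's a sketch I never actually ran; it assumes the audio list of (offset_ms, [freqs]) tuples built in the script below, and writes 16-bit mono PCM:

import numpy as np
import wave

RATE = 11025
segments = []
for i, (offset, freqs) in enumerate(audio):
    # segment length = gap to the next offset, same arbitrary 100ms for the last one
    length_ms = audio[i + 1][0] - offset if i + 1 < len(audio) else 100
    t = np.arange(int(RATE * length_ms / 1000)) / RATE
    mixed = sum(np.sin(2 * np.pi * f * t) for f in freqs) / max(len(freqs), 1)
    segments.append(mixed)

pcm = (np.concatenate(segments) * 16000).astype("<i2")
with wave.open("solve_numpy.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(pcm.tobytes())

Instead, here's the core of the shell-script generator (snipped here; the full version is at the end of the post):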
# ...

def find_closest_freq(hashval):
    dist = 999999999
    freq = None
    for h, f in inverse_table.items():
        d = abs(hashval - h)
        if d < dist:
            dist = d
            freq = f
    if freq is None:
        raise Exception("bad")
    return freq, dist

# tuples of (offset, [freqs])
audio = []
dists = []
for i in change_offsets:
    data = offset_freqs[i]
    freqs = []
    for d in data:
        freq, dist = find_closest_freq(d)
        freqs.append(freq)
        dists.append(dist)
    audio.append((i, freqs))
#print(audio)

filenames = []
for i, (offset, freqs) in enumerate(audio):
    try:
        next_offset = audio[i+1][0]
        length = next_offset - offset
    except IndexError:
        length = 100  # arbitrarily assign 100ms
    #print(f"{length=} {freqs=}")
    filename = f"solve_{i}.wav"
    filenames.append(filename)
    sines = "sine " + ' sine '.join([str(i) for i in freqs])
    print(f"sox -n -r 11025 -c 1 {filename} synth {float(length) / 1000} {sines}")
print(f"sox {' '.join(filenames)} solve.wav")
This generated a shell script of the form
sox -n -r 11025 -c 1 solve_0.wav synth 0.04 sine 2100 sine 4430 sine 4460 sine 2560 sine 580 sine 2750 sine 2910 sine 5000 sine 1170 sine 3340 sine 3420 sine 1510 sine 3670 sine 1780 sine 3930
# ...
sox -n -r 11025 -c 1 solve_375.wav synth 0.1 sine 2520 sine 3380
sox solve_0.wav <snip> solve_375.wav solve.wav
Which actually failed the first time due to too many open files! A quick google and a ulimit -n 2048
later, and I had a .wav file that in theory matched the flag song. I listened to it and it sounded … like a bunch of random tones. It didn’t sound like a flag, or a song, or anything.
It did sound like tones, so I picked up my teammate’s pwntools script to throw it at the echoid
binary and it passed the check and printed the test flag! I flipped the parameter to throw it at the remote and got the real flag!
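That script was roughly the following. This is a reconstruction rather than my teammate's exact code, and the 4-byte little-endian size prefix is an inference from the "ERROR: Invalid input size: 1717859169" message earlier (1717859169 is just b"asdf" read as a u32):

from pwn import *

wav = open("solve.wav", "rb").read()

io = remote("echoid-7xf5f2sbdofl6.shellweplayaga.me", 1337)
io.sendlineafter(b"Ticket please:", b"ticket{redacted}")
io.recvuntil(b"Send us your song to be identified")
io.send(p32(len(wav)) + wav)         # guessed framing: u32 length, then the WAV bytes
print(io.recvall(timeout=30).decode(errors="replace"))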
Thanks to all of my Shellphish teammates for playing along. I couldn't get everyone's contributions into this writeup, but it was definitely more than just me working on this :)
Full .wav generation script
I snipped this script above for readability; here it is in full. To run it you need the fingerprints.db from the challenge handout, plus the fingerprints.db produced by my 10Hz-granularity shell script above, renamed to ./freqs.db.
from collections import defaultdict
import statistics
import matplotlib.pyplot as plt
import sqlite3

def get_lookup_table(show_plot=False):
    conn = sqlite3.connect("./freqs.db")
    conn.row_factory = sqlite3.Row
    c = conn.cursor()
    c.execute('select * from songs where artist = "sox";')
    #c.execute('select * from songs where artist = "Nautilus Institute";')
    songs = []
    for song in c.fetchall():
        songs.append(dict(song))
    freqs_table = {}
    for song in songs:
        songid = song["id"]
        song_title = song["title"]
        hashes = []
        c.execute('select hash from fingerprints where song = ? order by hash', (songid,))
        for r in c.fetchall():
            hashes.append(int(r["hash"]))
        if hashes:
            freqs_table[int(song_title.split("_")[0])] = statistics.median(hashes)
    if show_plot:
        print(freqs_table)
        xs = []
        ys = []
        for k, v in freqs_table.items():
            xs.append(k)
            ys.append(v)
        plt.title("Frequency to hash")
        plt.xlabel("Frequency (Hz)")
        plt.ylabel("Hash")
        plt.scatter(xs, ys)
        plt.show()
    return freqs_table

freqs_table = get_lookup_table()
# hash to freq
inverse_table = {v: k for k, v in freqs_table.items()}

conn = sqlite3.connect("./fingerprints.db")
conn.row_factory = sqlite3.Row
c = conn.cursor()
#c.execute('select * from songs where artist = "sox";')
c.execute('select * from songs where artist = "Nautilus Institute";')
song = dict(c.fetchone())
songid = song["id"]
song_title = song["title"]

c.execute('select offset, hash from fingerprints where song = ? order by offset', (songid,))
last = None
offset_freqs = defaultdict(list)
change_offsets = []
for r in c.fetchall():
    offset_freqs[r["offset"]].append(r["hash"])
    if r["offset"] != last:
        change_offsets.append(r["offset"])
        last = r["offset"]
print(f"{change_offsets=}")

def find_closest_freq(hashval):
    dist = 999999999
    freq = None
    for h, f in inverse_table.items():
        d = abs(hashval - h)
        if d < dist:
            dist = d
            freq = f
    if freq is None:
        raise Exception("bad")
    return freq, dist

# tuples of (offset, [freqs])
audio = []
dists = []
for i in change_offsets:
    data = offset_freqs[i]
    freqs = []
    for d in data:
        freq, dist = find_closest_freq(d)
        freqs.append(freq)
        dists.append(dist)
    audio.append((i, freqs))
#print(audio)

filenames = []
for i, (offset, freqs) in enumerate(audio):
    try:
        next_offset = audio[i+1][0]
        length = next_offset - offset
    except IndexError:
        length = 100  # arbitrarily assign 100ms
    #print(f"{length=} {freqs=}")
    filename = f"solve_{i}.wav"
    filenames.append(filename)
    sines = "sine " + ' sine '.join([str(i) for i in freqs])
    print(f"sox -n -r 11025 -c 1 {filename} synth {float(length) / 1000} {sines}")
print(f"sox {' '.join(filenames)} solve.wav")