Home
@d0tslash on Twitter sent me some links to footage from news helicopters with a strange sound in the background. He informed me that there might be a signal embedded, so I set to work creating this repo to decode the signal.
https://www.youtube.com/watch?v=2MprHxarmOI
https://www.facebook.com/watch/live/?v=592507485006454&ref=watch_permalink
The first step is to record the audio from the video. I chose to use parec to record the audio as unsigned 8-bit samples at a rate of 44,100 Hz (samples per second). What I didn't realize the first time through was that parec was recording in stereo, which meant that when I went to process the data I treated the entire file as if it were one channel. Amazingly it still worked, but my bitrate and bandwidth came out at half of what they should have been. Anyway, here's the command to use:
parec -r --format u8 --channels 2 > /tmp/recording_44100Hz_Stereo.u8
For those of you that have worked with SDRs, you might think (as did I) that you could treat this as a single complex file. Well, don't do that, as it's stupid and won't work. This is audio data, not complex IQ samples. In this case only one channel of the audio contains the signal; the other is just noise. It's not clear whether you will always find the signal in a specific channel (left vs. right) or if it can change, so I recorded both channels.
For those of you that have never heard of Baudline, go look it up. It's amazing! It can be a bit of a pain to set up in anything other than Ubuntu, but with a little Googling you can figure out how to get it working on your platform.
Anyway, loading the file into Baudline is pretty easy. Turn off decompression, set the sample rate to 44100, channels to 2 (not quadrature or flip complex!) and decode format to 8 bit linear (unsigned).
After opening the file you should have two different color signals overlaid on top of one another. This is the part where you figure out which channel to use. One of the channels will have a signal that's about 1200 Hz wide. In the image below, it's the magenta (or blue if you're color "challenged" like me!) channel.
Open up the input mapping and you should see that (in this case) the blue channel is channel 2.
You could likely strip the first channel out in Baudline, but I used GNU Radio for that part.
The next step for me was to create a GNU Radio flow graph.
The first thing to do is add blocks to open the input file, throttle it (this is important if you don't want your system to thrash trying to play the file as fast as possible), deinterleave the samples, extract just channel 2, convert to complex floats, and then display the result with the Qt frequency sink.
So this proves that we have the correct channel, and that the conversion to complex floats worked!
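For anyone who wants to poke at the recording outside of GNU Radio, the first few blocks (file source, deinterleave, keep channel 2, convert to complex) can be sketched in plain Python. This is a minimal sketch, not part of this repo: the function names are hypothetical, and it assumes the interleaved unsigned 8-bit layout produced by the parec command above, where unsigned samples are centered on 128.

```python
def extract_channel(raw: bytes, channel: int = 2, channels: int = 2) -> list[float]:
    """Pull one channel out of interleaved unsigned 8-bit audio.

    `channel` is 1-based to match the channel numbering used above.
    Unsigned 8-bit audio is centered on 128, so subtract 128 and scale
    the result into roughly [-1.0, 1.0).
    """
    samples = raw[channel - 1::channels]  # every `channels`-th byte for this channel
    return [(s - 128) / 128.0 for s in samples]


def to_complex(samples: list[float]) -> list[complex]:
    """Real to complex with a zero imaginary part (what a float-to-complex block does)."""
    return [complex(s, 0.0) for s in samples]


# Hypothetical usage with the recording made earlier:
# with open("/tmp/recording_44100Hz_Stereo.u8", "rb") as f:
#     ch2 = to_complex(extract_channel(f.read()))
```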
It's pretty easy to see the edges of this signal. As noted in Baudline, it takes up about 1200 Hz of bandwidth. So the next step is frequency translation and filtering.
Frequency translation is just a fancy way to say moving the band of interest so that it's centered at 0 Hz. Based on eyeballing it in Baudline, the center of the signal is at about 1,660 Hz. It doesn't have to be bang on right now. So the next step is to add a complex multiply to the flow graph to push the signal over a bit. For this I like to use a Qt slider so that I can move things around in real time, which makes life easier if I need to make adjustments.
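Under the hood, that complex multiply is y[n] = x[n] * exp(-j*2*pi*f*n/fs), which slides everything down by f Hz. A minimal sketch of the same operation in plain Python, assuming the multiply is fed by a complex tone whose frequency is the slider value (the helper name is hypothetical, not something from the repo):

```python
import cmath
import math


def freq_translate(samples, shift_hz, samp_rate):
    """Move the component at `shift_hz` down to 0 Hz.

    y[n] = x[n] * exp(-j * 2*pi * shift_hz * n / samp_rate)
    """
    step = -2.0 * math.pi * shift_hz / samp_rate  # phase advance per sample
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(samples)]
```

With the numbers from above, that would be freq_translate(samples, 1660, 44100). Because the audio started out real, there is also a mirror image of the signal at -1,660 Hz, which lands at about -3,320 Hz after the shift and gets removed by the low pass filter.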
Now that the signal is roughly centered, we can filter out all the things around it. The easiest way to do that is with a low pass filter.
Notice that two more Qt range blocks were added. These control the transition width (how sharp the falloff of the filter is) and the bandwidth (frequency at which the filter begins rejecting energy).
The filter bandwidth was chosen just by eyeballing it and trying to fit the signal inside the passband of the filter with a little bit of room on either side. Tighten the filter too much and demod will fail (as you're filtering out data).
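GNU Radio's low pass filter block designs the taps from the cutoff and transition width for you. To show roughly what's going on underneath, here is a sketch of a windowed-sinc low pass filter with a Hamming window. It is not GNU Radio's exact design routine: the number of taps is fixed here instead of being derived from the transition width, the helper names are hypothetical, and the 700 Hz cutoff in the usage note is only an illustrative value a bit above half of the ~1200 Hz signal width.

```python
import math


def lowpass_taps(cutoff_hz, samp_rate, num_taps=101):
    """Windowed-sinc low pass FIR (Hamming window), normalized to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the filter is symmetric")
    fc = cutoff_hz / samp_rate  # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinitely long) low pass impulse response, sampled at k
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window tames the ripple caused by truncating the sinc
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(samples, taps):
    """Direct-form convolution; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0
        for j, t in enumerate(taps):
            if i - j < 0:
                break  # no samples before the start of the input
            acc += t * samples[i - j]
        out.append(acc)
    return out
```

Usage would look like taps = lowpass_taps(700, 44100) followed by fir_filter(shifted_samples, taps). This pure-Python loop is slow on a real capture; it's only meant to show the idea.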
We still don't know what this thing is. But, now we have it cornered, and we can poke at it with sharp sticks until it tells us exactly what it is.
Going back to Baudline, if you knock the FFT size down to the lowest value of 128 samples, you will see the following:
Look closely and you'll see a little tiny square wave running vertically on the left side. That, my friends, is an FSK.
A pretty easy way to validate that hypothesis is to use the quadrature demod block (basically arctan) and see if we get a square wave looking signal on the output.
The gain parameter isn't important as it just scales the Y axis, plus, this is just for a sanity check.
Anyway, that's a decent square wave. So it's safe to assume at this point that we have an FSK!
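The quadrature demod boils down to the phase change between consecutive samples, scaled by the gain, and that phase change is proportional to the instantaneous frequency. A sketch in plain Python (the function name is hypothetical; this illustrates the math rather than reproducing the GNU Radio implementation):

```python
import cmath


def quadrature_demod(samples, gain=1.0):
    """gain * angle(x[n] * conj(x[n-1])) for each pair of consecutive samples.

    For an FSK signal centered at 0 Hz, one tone gives a positive output and
    the other a negative one, hence the square-ish wave.
    """
    return [gain * cmath.phase(cur * prev.conjugate())
            for prev, cur in zip(samples, samples[1:])]
```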
The only parameter that really matters at this point is baud rate. Handily, the quadrature demod output will allow us to measure the baud rate.
Zooming in a bit on the time plot shows several peaks and troughs. Find a spot where there is a peak and then an immediate trough.
Subtract the two timestamps and take the absolute value:
abs(13.0702-12.2049) => 0.8652999999999995
As seen in the images, the times are in milliseconds, so the output value here is also in milliseconds. We want a rate in Hz (symbols per second), so we need to take the reciprocal. First, convert the value above to seconds by dividing by 1,000, then take the reciprocal:
1 / 0.0008652999999999995 => 1155.6685542586392
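The same arithmetic, plus a quick comparison against familiar rates, fits in a few lines of Python. The helper names are hypothetical, and the list of "common" baud rates is my own assumption (1000 is only in there because it comes up below).

```python
COMMON_BAUD_RATES = (300, 600, 1000, 1200, 2400, 4800, 9600)


def baud_from_timestamps(t1_ms, t2_ms):
    """Symbol rate from the time between two adjacent symbols, given in milliseconds."""
    symbol_period_s = abs(t2_ms - t1_ms) / 1000.0
    return 1.0 / symbol_period_s


def nearest_standard_baud(measured, candidates=COMMON_BAUD_RATES):
    """Snap a measured rate to the closest candidate rate."""
    return min(candidates, key=lambda b: abs(b - measured))


measured = baud_from_timestamps(12.2049, 13.0702)
print(f"measured ~{measured:.1f} baud, nearest common rate: {nearest_standard_baud(measured)}")
# measured ~1155.7 baud, nearest common rate: 1200
```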
So the baud rate is roughly 1156 Hz. Rarely are baud rates strange numbers like that, so it's more likely that the real baud rate is closer to 1000 or 1200 Hz. To get a better idea, we can try another trick.
Multiplying the current sample by the conjugate of the previous sample will give you the above plot. I think this will even work if you just multiply the current sample by its conjugate (no delay), too. Anyway, the idea is that you look for the first spike in the graph (from left to right). Here that first spike is dead on 1,200 Hz. So that's almost certainly the baud rate.
Now that we know the baud rate, we need to resample the data down to an integer multiple of the baud rate. This is necessary for demod. Now, figuring out the decimation and interpolation factors by hand is annoying. Thankfully Python has you covered!
import fractions; fractions.Fraction(1200, 44100) => Fraction(4, 147)
The above tells us that in order to have one sample per baud (one sample per symbol) we need to interpolate by 4 and decimate by 147. BUT that won't work, as the demodulation block needs multiple samples per symbol so that it can find the best spot to sample. We'll go with 5 samples per symbol (really anything between 3 and 6 or 7 is fine). That changes the above code to:
import fractions; fractions.Fraction(1200 * 5, 44100) => Fraction(20, 147)
Note that there are some new variables in the graph: dec and interp. These are the decimation and interpolation values for the resampler. There is also a new variable called real_rate, which is samp_rate * interp / dec (44100 * 20 / 147 => 6000 samples per second). Note that 6000 is just 1200 * 5, where 5 is the number of samples per symbol chosen above. This new variable (real_rate) is used in place of samp_rate for all of the blocks after the resampler, because the data went from 44.1 kHz to 6 kHz in the resampler and the downstream blocks need to know about that rate change in order to display and/or process data properly. In this case the only blocks that actually care about the true sampling rate are the display blocks (time and frequency sinks), as they need to present the correct values on a graph.
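The same arithmetic, wrapped up in a small helper so the interpolation, decimation, and resulting real_rate come out together. The helper name is hypothetical; it only uses Python's standard fractions module, like the one-liners above.

```python
from fractions import Fraction


def resampler_factors(baud, samples_per_symbol, samp_rate):
    """Return (interp, decim, real_rate) for a rational resampler.

    The target rate is baud * samples_per_symbol; Fraction reduces the
    ratio to lowest terms, giving the smallest integer factors.
    """
    ratio = Fraction(baud * samples_per_symbol, samp_rate)
    interp, decim = ratio.numerator, ratio.denominator
    real_rate = samp_rate * interp / decim
    return interp, decim, real_rate


print(resampler_factors(1200, 5, 44100))  # (20, 147, 6000.0)
```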
There is a fair bit of magic in the GFSK demod block in GNU Radio. The real pain with demod is time and frequency synchronization. If not for this lovely GNU Radio block, this overview would be much longer. But, thanks to the wonderful people at the GNU Radio Foundation, we get to take the easy way out!
We will use the GFSK demod block to time and frequency sync the samples for us. The output of this block is one byte per bit. That means that there are two possible byte values \x00 and \x01. You could pack the bits now, but that will just make life harder down the road. For now we will just save the raw \x00 and \x01 bytes to a file.
The only parameter you need to change is the samples per symbol, which is 5 (from earlier).
We are now done with GNU Radio and it's time to parse the bits. Step one is to visualize the bits somehow. The way I like to do things is with gnome-terminal. It has the ability to resize the word wrapping as you resize the terminal, so you can start to see patterns in the bits. A lot of other terminals don't do that :(
But the data in the output file is raw \x00 and \x01 bytes, which the terminal can't display. We can use sed to fix that problem:
sed 's/\x00/0/g;s/\x01/1/g' /path/to/output_file
Now, I love a wall of binary as much as the next guy, but we can do better. Most terminals can display some funky characters, and we need something that really stands out as a 1 or 0. I found that extended ASCII 219 (a solid block) does a great job: https://theasciicode.com.ar/extended-ascii-code/block-graphic-character-ascii-code-219.html. To make this work, I go to that page, select the character at the top, copy it, and then paste it in place of the '0' or '1' in the sed command (I can't paste it here as it wouldn't look right).
Looks like a bunch of garbage, eh? Well, this is where resizing the display comes in. Resize the terminal width-wise, and you should see a pattern show up every 10 characters of resizing.
Looks a lot better now! It'll look even better if you flip the 1 and 0 values in sed (use the block character for 0)
That's a lot cleaner :)
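If you'd rather not paste block characters into sed and drag the terminal edge around, the same view can be produced in Python with a fixed line width. This is a sketch with a hypothetical helper: it draws one bit value as a full block (U+2588, which is character 219 in the old IBM code page 437) and the other as a space, and you step through widths by hand (multiples of 10 are the interesting ones here).

```python
BLOCK = "\u2588"  # full block, the same glyph as extended ASCII 219


def render_bits(raw: bytes, width: int, block_for: int = 0) -> str:
    """Turn the demod's 0/1 bytes into rows of `width` characters.

    Bytes equal to `block_for` are drawn as a solid block and everything
    else as a space, mirroring the flipped sed substitution above.
    """
    chars = "".join(BLOCK if b == block_for else " " for b in raw)
    return "\n".join(chars[i:i + width] for i in range(0, len(chars), width))


# Hypothetical usage:
# with open("output_bits", "rb") as f:
#     print(render_bits(f.read()[:3000], 60))
```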
Now is a good time to talk about ASCII. All of the normal printable English letters are < decimal 127. That means that of the 8 bits in an ASCII byte, the top bit is always zero for printable characters. If you look at the data starting at one of the solid vertical columns, you'll see the following pattern
10XXXXXXX0
Where X is a changing bit. Keep in mind that we don't yet know whether our assumptions about what is a 1 and what is a 0 are correct! But there's a very good chance that bits 1-8 or 2-9 (zero-based, from the left) are ASCII (the ASCII bits would need to be reversed).
So, the easy thing here is to do a take-skip. The idea is that you take M bits, skip N bits. In this case it would be take 8 skip 2 (for our 10-bit frame). Then you would just need to adjust which bit you start at to account for alignment. This all assumes that the data is continuous as this method will garble the data if you ever lose sync with the data, or if something else is transmitted that throws off the alignment.
I don't know of any command line utilities for this, so I had to write a Python script to take care of it for me.
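#!/usr/bin/env python3
# take_skip.py: read '0'/'1' characters on stdin, discard the first `offset`,
# then repeatedly take `take` bits and skip `skip` bits, printing each taken
# group as one character. The taken bits are reversed (first bit = LSB).
import sys
take = int(sys.argv[1])
skip = int(sys.argv[2])
offset = int(sys.argv[3])
# Throw away the first `offset` bits to line up with the start of a frame
sys.stdin.read(offset)
chunk_size = take + skip
while True:
    data = sys.stdin.read(chunk_size)
    if len(data) != chunk_size:
        break  # end of input (a partial trailing frame is dropped)
    # Reverse the taken bits, parse them as binary and emit the character
    sys.stdout.write(chr(int(data[0:take][::-1], 2)))
    sys.stdout.flush()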
Save the script above as take_skip.py and make sure it's executable (chmod +x take_skip.py).
The take and skip fields will always be 8 and 2. The offset value will change depending on the start bit. For that reason we will need to step through all 10 possible offsets to find the correct one. BUT there's still the issue of which value really means '1'. Is it \x00 or \x01? That means we need to try (at worst) 20 different possibilities.
A "simple" bash one-liner will help out (added the stderr redirect to shut Python up):
for i in {0..9}; do clear; echo -e "OFFSET: $i\n\n"; sed 's/\x01/1/g;s/\x00/0/g' output_bits | ./take_skip.py 8 2 $i 2>/dev/null | head --bytes 100; read; done
The above bash one-liner will try all possible shifts (0 through 9), pausing after each to let you see the output. Press enter to advance to the next shift. The current offset is shown at the top of the screen each time you press enter.
Above is an example of the wrong offset.
Running through with the command above might result in nothing but garbled data. So, flip the '0' and '1' values in the sed command
for i in {0..9}; do clear; echo -e "OFFSET: $i\n\n"; sed 's/\x01/0/g;s/\x00/1/g' output_bits | ./take_skip.py 8 2 $i 2>/dev/null | head --bytes 100; read; done
In my case the readable text showed up at the final offset.
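Stepping through offsets by eye works, but the same search can be automated: decode every combination of polarity and offset and score how much of the output is printable ASCII. This is a minimal sketch with hypothetical helper names; take_skip() mirrors the take_skip.py logic above, and "fraction of printable characters" is just a heuristic I'm assuming is good enough. On a short or very repetitive capture more than one alignment can score well, so eyeball the winner.

```python
def take_skip(bits: str, take: int = 8, skip: int = 2, offset: int = 0) -> str:
    """Same logic as take_skip.py, on a string of '0'/'1' characters."""
    chunk = take + skip
    return "".join(chr(int(bits[i:i + take][::-1], 2))
                   for i in range(offset, len(bits) - chunk + 1, chunk))


def printable_score(text: str) -> float:
    """Fraction of characters that are printable ASCII or CR/LF."""
    if not text:
        return 0.0
    good = sum(1 for c in text if 32 <= ord(c) < 127 or c in "\r\n")
    return good / len(text)


def best_alignment(raw: bytes, take: int = 8, skip: int = 2):
    """Try both polarities and every offset; return (score, invert, offset)."""
    results = []
    for invert in (False, True):
        one = 0 if invert else 1  # which byte value is treated as a '1'
        bits = "".join("1" if b == one else "0" for b in raw)
        for offset in range(take + skip):
            score = printable_score(take_skip(bits, take, skip, offset))
            results.append((score, invert, offset))
    return max(results)
```

A result with invert=True corresponds to the second sed variant above ('s/\x01/0/g;s/\x00/1/g'), and the offset is the N to pass to take_skip.py.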
You can now use a command like this to dump out the entire transmission
sed 's/\x01/0/g;s/\x00/1/g' output_bits | ./take_skip.py 8 2 N
Where N is the offset that you got from the earlier commands
If you look closely, you should notice that each transmission is the same length. If you count the characters, it's 53 starting at AN and ending at the l0 line. Add in the newline characters and the number comes up to 61.
Open up the file in vim
In my case the beginning contains some garbage. Delete any lines up to the first AN line.
Then run :%!xxd -cols 61 in vim. You should end up with something like this:
Notice that if you let your eyes lose focus you should be able to see a pattern that runs diagonally from top to bottom, left to right. This means that we don't quite have the correct width.
Press the 'u' key to undo the xxd command, then re-run the xxd command, but increase the -cols value by some small amount (if you're new to this, step through one at a time)
As you sneak up on the correct width you should start to see a very obvious pattern. The image below is 1 column off
The correct value for my data was 73 columns.
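Sneaking up on the width is really a search for the line length at which the data lines up with itself. A rough sketch of automating it, with hypothetical helper names: for each candidate width, count how often a byte equals the byte one row further down. Multiples of the true width score just as well, so the scan runs from the low end and keeps the first (smallest) best width.

```python
def match_fraction(data: bytes, period: int) -> float:
    """Fraction of bytes that equal the byte `period` positions later."""
    pairs = len(data) - period
    if pairs <= 0:
        return 0.0
    same = sum(1 for i in range(pairs) if data[i] == data[i + period])
    return same / pairs


def best_period(data: bytes, lo: int, hi: int) -> int:
    """Width in [lo, hi] at which the data lines up best with itself.

    max() returns the first maximal candidate, i.e. the smallest width.
    """
    return max(range(lo, hi + 1), key=lambda p: match_fraction(data, p))
```

On the decoded text (with the leading garbage removed), something like best_period(data, 50, 100) should land on the same width you'd find with xxd, since the fixed parts of each message repeat at that spacing.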
You should notice that the hex 0d0a shows up a lot. That's just a carriage return and newline. Standard stuff. The interesting thing is that at the very end you see 3831 340d 0a03 020d 0a. The 38 31 34 is just ASCII text ("814"). Then there is a 0d0a, but then 0302. Looking up ASCII codes 0x03 and 0x02 yields (image from https://www.ascii-code.com/):
0x03 is End of Text and 0x02 is Start of Text. Neat! What's more, you can use that pattern to synchronize the raw bits. Just look for 00000011XX_00000010XX (added _ for readability) where X is a "don't care" bit. Now, the actual process for searching for that pattern is a little more involved as you would want to account for bit reversal, and the location of the X bits in my example might not be correct. Just something to think about :)
The coordinates don't appear to be completely valid. The video was of a protest in Denver, Colorado (likely near the capitol). The coordinates appear to be a good 46 km away from downtown, which, given the angle of the video, just cannot be correct. Not sure what the deal is there.
If you want to plot the received coordinates, there is a dumb Python script that generates a KML file with timestamps that are just a number incrementing by 1 (no real timestamp). You can definitely make out an orbit, but the location isn't quite right: it's off by ~0.2-0.3 of a degree in both latitude and longitude.
Example usage of the script: sed 's/\x01/0/g;s/\x00/1/g' output_bits | ./take_skip.py 8 2 N | ./to_kml > moocow.kml
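For reference, a KML file doesn't need much. This sketch is not the repo's to_kml script: the helper name is hypothetical, it skips the incrementing per-point timestamps and just draws the received positions as one LineString, and it assumes the coordinates have already been converted to signed decimal degrees as (lat, lon) pairs. Note that KML writes each coordinate as longitude,latitude.

```python
from xml.sax.saxutils import escape


def to_kml_linestring(points, name="track"):
    """Build a minimal KML document containing one LineString.

    points: iterable of (lat, lon) in decimal degrees.
    """
    coords = " ".join(f"{lon},{lat},0" for lat, lon in points)  # KML order: lon,lat,alt
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        f"  <Document><Placemark><name>{escape(name)}</name>\n"
        f"    <LineString><coordinates>{coords}</coordinates></LineString>\n"
        "  </Placemark></Document>\n"
        "</kml>\n"
    )
```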
Also interesting is that the helicopter hit a wormhole at some point and ended up 35 km from its previous position O.o
EDIT: It turns out that I was not treating the coordinates correctly. Huge thanks to Flamewires for pointing out my mistake. Here's what needed to happen:
Using the example coordinate of 39 4417, we break the 39 off and call that degrees, then take the 44 and treat that as minutes, and finally treat 17 as a percentage of 60 seconds (call it sec_percent). Think of 4417 as 44.17 minutes.
To convert that to decimal degrees, we do:
degrees + (minutes / 60.0) + (((sec_percent / 100.0) * 60) / 3600)
With the example, the numbers end up being:
39 + (44 / 60) + (((17 / 100) * 60) / 3600) => 39.73616666666667
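The same conversion as a small helper. This is a sketch, not the to_kml.py code: the function name is hypothetical, it assumes the second field is always four digits (two for minutes, two for hundredths of a minute), and it leaves the hemisphere sign to the caller (west longitudes are negative in decimal degrees).

```python
def parse_coord(text: str) -> float:
    """Convert a '39 4417' style field to decimal degrees.

    '39' is whole degrees, '44' is minutes and '17' is hundredths of a
    minute, so the value works out to 39 + 44.17 / 60.
    """
    deg_str, ddmm = text.split()
    degrees = int(deg_str)
    minutes = int(ddmm[:2])
    sec_percent = int(ddmm[2:4])
    # Same formula as above: fraction of 60 seconds -> seconds -> degrees
    return degrees + minutes / 60.0 + ((sec_percent / 100.0) * 60) / 3600
```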
The to_kml.py script has been updated to output correct coordinates
Here's the updated Google Earth view of the coordinates:
To make life easier, we can use the pattern (0x03 and 0x02) from earlier to sync up the bits so that all we need to do is run ./take_skip.py 8 2 0, without having to manually find the correct bit offset.
We know that the pattern 11000000XX01000000XX shows up before a new message starts. So if we read through the input '1's and '0's until we find that pattern (throwing away data until the pattern is found) then we can line up such that the next 8 bits are ASCII characters (bit reversed in this case) followed by 2 bits of flags. That pattern just keeps on repeating until the end.
Check out the sync.py script for details on how it works.
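The repo's sync.py isn't reproduced here, but the idea can be sketched with a regular expression over the '1'/'0' text. This is a hypothetical stand-in, not the actual script: it assumes the bits have already been through the sed mapping used in the command below, and it only syncs once, on the first marker, relying on the data staying continuous after that (the same caveat as take-skip).

```python
import re

# Bit-reversed 0x03 (ETX) + 2 flag bits, then bit-reversed 0x02 (STX) + 2 flag
# bits, per the 11000000XX01000000XX pattern above; X ("don't care") is [01].
SYNC = re.compile(r"11000000[01]{2}01000000[01]{2}")


def sync_bits(bits: str) -> str:
    """Drop everything up to and including the first ETX/STX marker.

    Returns an empty string if the marker never shows up.
    """
    m = SYNC.search(bits)
    return bits[m.end():] if m else ""
```

To drop it into the pipeline, a script would just do sys.stdout.write(sync_bits(sys.stdin.read())).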
With the new script, the full command to get text is as simple as:
sed 's/\x00/1/g;s/\x01/0/g' output_bits | python sync.py | python take_skip.py 8 2 0
That's it. Hopefully you learned something new!
I cannot promise that the parameters are perfect, the methods are valid, or the output is correct. Just a fun exercise that yielded what appear to be some really neat results :)
Are there better ways to do this: Yes. Oh my sweet Christ yes. I did this for kicks to see if I could. There are plenty of tools out there that you could feed the samples to and get immediate output. I wrote this up because figuring out what something is turns out to be the hardest part. A lot of the process is just something you hone over time. Reading the tea leaves. It's magic. I wanted to share my (possibly convoluted) process in the hopes that it encourages others to give stuff like this a try.
If you've gotten this far, thank you! And, go learn GNU Radio!! The application is amazing, and so are the people that maintain it!