@sahal
Created May 31, 2025 22:51
Google voice short code analysis

I'm trying to move away from Google products, but one I haven't dropped yet is Google Voice. I have found an alternative, yay: voip.ms. During the transition, though, I don't want to lose any shortcode-based two-factor authentication. I wish every service offered TOTP, but alas, that isn't the case.

Grab a dump of your Google Voice data

You can do so by following the instructions provided on the Export your data from Voice help page.

HTML to JSON

Unfortunately, the data is only available as HTML, so it's a little harder to parse.

isolate the shortcode text messages

Most shortcode messages come from 5-digit codes, but some are 6-digit. This regex matches file names that start with 5 or more digits followed by a space. It doesn't capture phone numbers because those are prefixed with a plus. It also doesn't capture shortcodes you've added to your contacts, because Google replaces the phone number/shortcode with the contact name.

find . -type f -regex "^\./[0-9][0-9][0-9][0-9][0-9][0-9]* .*"
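If you'd rather not fight `find`'s regex dialect, bash's `[[ string =~ regex ]]` test uses POSIX extended regexes, where `{5,}` spells out "five or more digits" directly. A minimal sketch, assuming the exported files start with the sender's number followed by a space (as the regex above expects); `is_shortcode_file`, `copy_shortcode_files`, and the destination directory are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: select Google Voice export files whose name starts
# with a 5+ digit shortcode followed by a space (e.g. "12345 - Text - ...").
# Phone numbers start with "+" and saved contacts start with a name, so
# neither matches.

is_shortcode_file() {
  local name re='^[0-9]{5,} '
  name=${1##*/}              # look at the file name only, not the directory
  [[ $name =~ $re ]]
}

# Copy matching .html files from directory $1 into directory $2.
copy_shortcode_files() {
  local src=$1 dest=$2 f
  mkdir -p -- "$dest"
  for f in "$src"/*.html; do
    [[ -e $f ]] || continue  # the glob matched nothing
    if is_shortcode_file "$f"; then
      cp -- "$f" "$dest"/
    fi
  done
}
```

For example, `copy_shortcode_files . shortcodes` copies the matching files into a `shortcodes` directory; running jsonify.sh from inside it would then convert only shortcode conversations.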

install jq and pup

These are required for my shitty script that follows.

  • pup - Parsing HTML at the command line
  • jq - jq is a lightweight and flexible command-line JSON processor.
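The script needs both tools on your `PATH`. If pup is missing, bash just prints `command not found`, and because the pipeline's exit status comes from jq (which happily exits 0 on empty input), `set -e` doesn't stop it and you get a pile of empty JSON files. A small preflight check avoids that; `require_tools` is a hypothetical helper, not part of pup or jq:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: report every required command that is
# missing from PATH and return non-zero if at least one is absent.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, a line like `require_tools pup jq || exit 1` near the top of jsonify.sh would stop it before it writes anything.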

Run the jsonify.sh script

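This converts each HTML file into a JSON file with the same name, so you can filter the messages automagically with jq (or whatever you like). Run it from the directory that holds the exported HTML files; it searches subdirectories too.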

./jsonify.sh
cat *.json > json-output.json
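`cat *.json` only picks up files in the current directory, while jsonify.sh searches recursively, and on a re-run the glob also matches the previous `json-output.json`. A minimal sketch of a merge that handles both; `merge_json` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical variant of `cat *.json > json-output.json` that
#  - also picks up JSON files in subdirectories, and
#  - never feeds an existing output file back into itself on a re-run.
merge_json() {
  local dir=$1 out=$2 tmp
  tmp=$(mktemp)   # write elsewhere first, then move into place
  # ! -empty skips files where the jq filter produced nothing
  find "$dir" -type f -name '*.json' ! -empty -print0 |
    while IFS= read -r -d '' f; do
      [[ $f -ef $out ]] && continue   # same file as the output: skip it
      cat -- "$f"
    done > "$tmp"
  mv -- "$tmp" "$out"
}
```

Call it as `merge_json . json-output.json` in place of the `cat` line.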

filter using jq

$ jq '.phone' json-output.json | sort | uniq -c | sort -rn | head -10
    220 "tel:12345"
    166 "tel:12345"
    148 "tel:12345"
    147 "tel:12345"
     83 "tel:12345"
     81 "tel:12345"
     75 "tel:12345"
     65 "tel:12345"
     59 "tel:12345"
     50 "tel:12345"
  • Note: I replaced the actual shortcodes here with 12345.

This shows the ten numbers that contacted you the most.
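Since the goal is to find the services that send 2FA codes, you can narrow the count to messages that look like one-time codes. A rough sketch; `count_2fa_senders` is hypothetical and the keyword regex is only a guess, so expect some false positives and misses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count messages per sender whose text looks like a
# one-time code. The keyword regex is an assumption; adjust it to the
# wording your services actually use.
count_2fa_senders() {
  # (.text // "") guards against null text, which test() would reject
  jq -r 'select((.text // "") | test("code|verif|one-time|passcode"; "i"))
         | .phone' "$1" | sort | uniq -c | sort -rn
}
```

For example, `count_2fa_senders json-output.json | head -10`.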

convert to csv

$ jq -r '[.phone, .text, .dt.m]|@csv' json-output.json > csv-output.csv

This is handy if you'd rather import the data into your favorite spreadsheet software and analyze it there.
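Spreadsheets are easier to work with when the first row names the columns. A minimal sketch of the same conversion with a header row: `-n` together with jq's `inputs` emits the header once and then reads every object from the file. `to_csv_with_header` is a hypothetical name, and the column names are arbitrary labels:

```shell
#!/usr/bin/env bash
# Hypothetical variant of the @csv command above that prepends a header row.
# With -n, jq starts with null input; `inputs` then streams every object
# from the file after the header array has been emitted.
to_csv_with_header() {
  jq -rn '["phone", "text", "time"], (inputs | [.phone, .text, .dt.m]) | @csv' "$1"
}
```

`to_csv_with_header json-output.json > csv-output.csv` produces the same columns as before, plus the header.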

Alternative solutions

You might also be able to parse the HTML with XPath or a similar tool. :shrugs:
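As a rough, untested-against-real-exports sketch of the XPath route: libxml2's `xmllint` can parse HTML and evaluate an XPath expression. This assumes xmllint is installed and that message bodies sit in `<q>` elements; that markup is an assumption about the export format, so check one of your files first. `message_texts` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical XPath alternative to pup: print the text of every <q>
# element in one exported HTML file. Assumes xmllint (libxml2) is installed
# and that message bodies are wrapped in <q> tags (an assumption).
message_texts() {
  # --html: use the lenient HTML parser; 2>/dev/null hides its warnings
  xmllint --html --xpath '//q/text()' "$1" 2>/dev/null
}
```

For example, `message_texts some-conversation.html`, where the file name is a placeholder.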

#!/usr/bin/env bash
# jsonify.sh: convert every Google Voice HTML export file under the current
# directory into a JSON file with the same base name.
set -e
set -u

while IFS= read -r -d '' file
do
  filename="${file%.*}"   # strip the .html extension
  # pup turns <body> into a JSON tree. jq takes the first entry of the
  # "hChatLog hfeed" div and pulls out its text, sender link and timestamps
  # (machine-readable "m" and human-readable "h"), plus the tags and
  # "deleted" fields from the rest of the body.
  pup 'body json{}' < "${filename}.html" |
  jq '.[0] |
    (.children[0] | select(.class == "hChatLog hfeed") | .children[0]) as $msg |
    { text: $msg.children[2].text,
      phone: $msg.children[1].children[0].href,
      dt: { m: $msg.children[0].title,
            h: $msg.children[0].text },
      tags: .children[1].children[].text,
      deleted: .children[2].text }' > "${filename}.json"
done < <(find . -type f -name "*.html" -print0)
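Because the jq filter keeps only files whose body starts with a `hChatLog hfeed` div, any other HTML file (call logs or voicemails, presumably) ends up with an empty `.json`. A small, hypothetical check lists those files so you can confirm nothing you care about was dropped:

```shell
#!/usr/bin/env bash
# Hypothetical check to run after jsonify.sh: list HTML files whose JSON
# output exists but is empty. When select(.class == "hChatLog hfeed")
# matches nothing, the jq filter yields no output, so the .json is empty.
list_empty_outputs() {
  local html json
  while IFS= read -r -d '' html; do
    json="${html%.*}.json"
    if [[ -e $json && ! -s $json ]]; then
      printf '%s\n' "$html"
    fi
  done < <(find "${1:-.}" -type f -name '*.html' -print0)
}
```

Run `list_empty_outputs` (or `list_empty_outputs some/dir`) after jsonify.sh.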