I'm trying to move away from Google products, but one I have yet to leave is Google Voice. I have found an alternative, yay: voip.ms. During the transition, though, I do not want to lose any shortcode-based two-factor authentication. I wish all services provided TOTP, but unfortunately that is not the case.
To find out which shortcodes have texted you, first export your Google Voice data. You can do so by following the instructions provided on the "Export your data from Voice" help page.
Unfortunately, the data is only available in HTML format, so it's a little harder to parse.
Most shortcode messages come from 5-digit codes, but some are 6-digit. This regex matches file names that start with a code of 5 or more digits. It doesn't capture regular phone numbers because those are prefixed with a plus. It also doesn't capture shortcodes that are in your contact list, because Google replaces the phone number/shortcode with the contact name.
find . -type f -regex "\./[0-9][0-9][0-9][0-9][0-9][0-9]* .*"
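A quick way to sanity-check a pattern like this is to run it against a few made-up file names in a scratch directory. This is only a sketch: the `<sender> - Text - <timestamp>.html` naming is my assumption about the export layout, so check it against your own files. It spells out five mandatory digits and then boils the matches down to a unique list of shortcodes.

```shell
#!/bin/sh
# Sketch: exercise the pattern against made-up file names.
# The "<sender> - Text - <timestamp>.html" naming is an assumption.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

touch '12345 - Text - 2020-01-01T00_00_00Z.html' \
      '12345 - Text - 2020-02-01T00_00_00Z.html' \
      '678901 - Text - 2020-01-02T00_00_00Z.html' \
      '1234 - Text - 2020-01-03T00_00_00Z.html' \
      '+15555550100 - Text - 2020-01-04T00_00_00Z.html' \
      'Some Contact - Text - 2020-01-05T00_00_00Z.html'

# Five mandatory digits, any number of further digits, then a space.
# find -regex matches against the whole path, so no anchors are needed.
matches=$(find . -type f -regex '\./[0-9][0-9][0-9][0-9][0-9][0-9]* .*' | sort)
printf '%s\n' "$matches"

# Reduce the matching paths to a unique list of shortcodes.
shortcodes=$(printf '%s\n' "$matches" | sed 's|^\./\([0-9]*\) .*|\1|' | sort -u)
printf '%s\n' "$shortcodes"
```

The 4-digit sender, the plus-prefixed phone number, and the contact name are all skipped, which is the behavior described above.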
These are required for my shitty script that follows.
- pup - Parsing HTML at the command line
- jq - jq is a lightweight and flexible command-line JSON processor.
This will convert the HTML files to JSON so you can filter the text automagically with a tool like jq.
./jsonify.sh
cat *.json > json-output.json
$ jq '.phone' json-output.json | sort | uniq -c | sort -rn | head -n 10
220 "tel:12345"
166 "tel:12345"
148 "tel:12345"
147 "tel:12345"
83 "tel:12345"
81 "tel:12345"
75 "tel:12345"
65 "tel:12345"
59 "tel:12345"
50 "tel:12345"
- note: I replaced the actual shortcodes here with 12345
This just shows the numbers that contacted you the most.
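If regular phone numbers crowd out the shortcodes in that list, the count can be restricted to shortcode senders. The sketch below assumes each record has a `phone` field shaped like `tel:12345`, as in the output above, and that your jq build supports `test` (jq 1.5 or later). The sample records are made up.

```shell
#!/bin/sh
# Sketch with made-up records; only the "phone" field matters here.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"phone": "tel:12345", "text": "Your code is 111111"}
{"phone": "tel:67890", "text": "Your code is 222222"}
{"phone": "tel:12345", "text": "Your code is 333333"}
{"phone": "tel:+15555550100", "text": "hello"}
EOF

# Keep only senders made of 5 or more digits with no leading plus,
# then count how often each one appears, most frequent first.
counts=$(jq -r 'select(.phone | test("^tel:[0-9]{5,}$")) | .phone' "$sample" \
  | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
```

Point it at json-output.json instead of the sample file to get the real list.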
$ jq -r '[.phone, .text, .dt.m]|@csv' json-output.json > csv-output.csv
This is useful if you'd like to bring the data into your favorite spreadsheet software and run your analysis there.
You might be able to use XPath or something to parse the HTML too -- :shrugs:
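Spreadsheets are happier with a header row, and it helps to know that `@csv` quotes every string and doubles any embedded quotes, so commas inside a message don't break the columns. Here is a small sketch of both. The record is made up, the `dt.m` value is a placeholder rather than the real export format, and the `dt_m` column name is just something I picked.

```shell
#!/bin/sh
# Sketch with one made-up record; the "dt.m" value is a placeholder.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"phone": "tel:12345", "text": "Use code 111111, valid for \"10\" minutes", "dt": {"m": "placeholder"}}
EOF

# Emit a header row first, then one CSV row per message.
csv=$(
  printf '%s\n' '"phone","text","dt_m"'
  jq -r '[.phone, .text, .dt.m] | @csv' "$sample"
)
printf '%s\n' "$csv"
```

Redirect the output to a file such as csv-output.csv and open it in the spreadsheet of your choice.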