#ChatGPT: 4. Unlocking Geolocation with Large Language Models: A Workflow by Henk van Ess
How large language models can help extract, find and enhance location data
Geolocation is critical for finding out where something happened. Knowing the location helps you figure out what's going on.
Large language models like ChatGPT seem like unlikely candidates to assist you in geolocating:
Text-based robots can’t process visual information, meaning they can't see the landmarks, signs, or other cues that are essential for navigating a location effectively. It's like trying to find your way in a pitch-black room with only a text description to guide you.
They have limitations in their knowledge base, meaning they may not be aware of all the nuances and details of a particular location. It's like trying to solve a crossword puzzle with only half the clues - you might get close, but you're unlikely to arrive at the right answer.
They can "hallucinate": they can produce sentences and information that seem plausible but aren't real or accurate. For example, if you ask ChatGPT about a location, it might give you an answer that sounds right but is missing important details or is even completely wrong.
Having said that, I still see ways LLM #ai can assist beyond describing what you see and asking for more details. In this newsletter, I show you a complete workflow:
1. Digging for Data: extracting locations from long texts, like PDFs
2. Chasing Coordinates: finding the geolocation of a bunch of addresses
3. Adding Tabasco: finding additional information
4. Filling the blanks: completing street names that are only partially shown
Digging for Data
Mathis Lichtenberger, inspired by this tweet, came up with Chatpdf. You can upload PDFs and ask questions about the document.
While investigating contracts containing location information, I needed to cross-check the places using Google Street View. Doing it manually was time-consuming, so I wondered if there was a faster way. I uploaded a file, and Chatpdf accurately identified the location information:
Greetings! This PDF file contains the campaign finance report for (redacted by me) during the Fall Pre-Election 2012 period. The report includes a summary of the committee's gross expenditures, contributions, and disbursements.
I asked the tool to extract all geolocations and it did that quickly.
Because Mathis is using OpenAI rather than ChatGPT, it was not possible to get the geographical coordinates right away, but it was already helpful to have a list of addresses. Mathis told me he is still in awe of the possibilities of #AI. “I wrote my master's thesis on natural language processing and am deeply impressed by the field. I think that AI progress will completely change the world in the next few years and I'm excited to be a part of the AI revolution.”
Chasing Coordinates
The next step was to write a script for ChatGPT that allowed me to quickly look up the addresses in Google Maps. It was just one sentence. This is what I gave ChatGPT (3.5) to work with:
show me geo coordinates of the following addresses, put them in a table and come up with a query to google maps for the locations
Presto, a big time saver.
Some geolocation coordinates were off, which is why I asked ChatGPT to search for the address itself in the Google Maps query. Some ZIP codes were outdated and not found in Google Maps, so in a variation of the script I left out the ZIP codes.
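If you would rather build those Google Maps links yourself instead of asking ChatGPT, here is a minimal Python sketch. It is not the script I gave ChatGPT. The helper names and the sample address are made up for illustration, and the ZIP pattern assumes US-style addresses that end in a five-digit code.

```python
import re
from urllib.parse import quote_plus

# Assumption: US-style addresses ending in a ZIP code (5 digits, optional +4).
ZIP_RE = re.compile(r",?\s*\b\d{5}(?:-\d{4})?\s*$")


def strip_zip(address: str) -> str:
    """Remove a trailing ZIP code, if there is one."""
    return ZIP_RE.sub("", address).strip()


def maps_search_url(address: str, drop_zip: bool = True) -> str:
    """Build a Google Maps search link that looks up the address text itself."""
    query = strip_zip(address) if drop_zip else address
    return "https://www.google.com/maps/search/?api=1&query=" + quote_plus(query)


# Hypothetical sample address, for illustration only.
print(maps_search_url("123 Main St, Springfield, IL 62701"))
```

Searching on the address text sidesteps coordinates that are slightly off, and dropping the ZIP code avoids misses on outdated ones.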
Adding Tabasco
Now that I have the addresses, it would be great to add some tabasco. Can I find images of those places quickly? ChatGPT, do this for me:
add to the table a link to images in google for each address
Excellent. I added three extras: a Google image search on all addresses before 2020, a search for just PDFs and a search for just social media.
How did I do that?
Timesearch Google Images:
make another column in table with link to address for google images, but end each search query with before:2020-01-01
PDF documents:
now add the address in table with clickable link and search for it in google, add filetype:pdf in search query and use as header of column "PDF search"
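The same two kinds of links can also be generated without ChatGPT. The sketch below is one way to do it in Python, not what ChatGPT produced. The function names are hypothetical, and it assumes that the `tbm=isch` URL parameter still selects Google image results.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"


def google_url(query: str, images: bool = False) -> str:
    """Build a Google search link; optionally ask for image results."""
    params = {"q": query}
    if images:
        params["tbm"] = "isch"  # assumption: this parameter selects image results
    return GOOGLE_SEARCH + "?" + urlencode(params)


def images_before(address: str, cutoff: str = "2020-01-01") -> str:
    """Image search that ends the query with before:<cutoff>."""
    return google_url(f"{address} before:{cutoff}", images=True)


def pdf_search(address: str) -> str:
    """Web search for the address, restricted with filetype:pdf."""
    return google_url(f"{address} filetype:pdf")


# Hypothetical sample address, for illustration only.
print(images_before("123 Main St, Springfield"))
print(pdf_search("123 Main St, Springfield"))
```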
Social Media Link
Pascal Thierry Revelin showed me DorkGPT, his tool built on ChatGPT 3.5. The idea is to lower the barrier to entry for creating Google queries with Google dorks, as I did manually in Google on Steroids.
“A human will always be necessary but the tool can give you a headstart”, Thierry Revelin told me yesterday. So Pascal, what are your thoughts about #osint and #ai? “AI is of good help for the analyst but it will not replace him”
Let’s put his tool to the test.
It gave me:
site:twitter.com OR site:facebook.com OR site:instagram.com OR site:linkedin.com OR site:pinterest.com OR site:tumblr.com OR site:reddit.com OR site:snapchat.com OR site:flickr.com OR site:myspace.com
That’s fine, although MySpace is dead :) I instructed ChatGPT to do this:
And I had a new table.
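My exact instruction is not reproduced here, but combining an address with a list of `site:` filters can be sketched in a few lines of Python. Treat this as an illustration under assumptions: the function names are invented, the address is put in quotes to force an exact match (my choice for this sketch), and MySpace is left out.

```python
from urllib.parse import quote_plus

# The sites from the DorkGPT query, minus MySpace.
SOCIAL_SITES = [
    "twitter.com", "facebook.com", "instagram.com", "linkedin.com",
    "pinterest.com", "tumblr.com", "reddit.com", "snapchat.com", "flickr.com",
]


def social_dork(address: str, sites=SOCIAL_SITES) -> str:
    """Combine a quoted address with an OR-chain of site: filters."""
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f'"{address}" ({site_filter})'


def social_search_url(address: str) -> str:
    """Turn the dork into a clickable Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(social_dork(address))


# Hypothetical sample address, for illustration only.
print(social_dork("123 Main St, Springfield"))
print(social_search_url("123 Main St, Springfield"))
```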
You could argue this takes time too. It does. But the beauty is that you can repeat it over and over again. Here is my endgame:
And there it is: from now on, I can type in the word TABLE and get my personal OSINT dashboard.
Filling the blanks
Sometimes, when you want to geolocate something, you can’t read the whole street name.
But ChatGPT gave me an idea :) (I had to train it first)
And I used the same script from 3. Adding Tabasco.
Isn’t that lovely?
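If you already have a list of candidate street names for the area, you can also narrow down a partially readable name with plain wildcard matching. This is a different technique from what I did with ChatGPT, shown only as a rough sketch. The street list and the function name are hypothetical; `?` stands for one unreadable character and `*` for any run of them.

```python
from fnmatch import fnmatchcase


def match_partial(partial: str, street_names: list[str]) -> list[str]:
    """Return the street names that fit a partially readable pattern.

    '?' marks exactly one unreadable character, '*' any number of them.
    Matching is case-insensitive.
    """
    pattern = partial.lower()
    return [name for name in street_names if fnmatchcase(name.lower(), pattern)]


# Hypothetical candidate list, e.g. all street names in one town.
streets = ["Kerkstraat", "Kerklaan", "Marktstraat", "Parkstraat"]

print(match_partial("*rkstraat", streets))  # ['Kerkstraat', 'Parkstraat']
print(match_partial("?erk*", streets))      # ['Kerkstraat', 'Kerklaan']
```

Each remaining candidate can then go through the same search links as any other address.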
Below: all the scripts I wrote (60 pages, subscribers only), with some more ideas you can learn from, and the PDF I used for data extraction.