Digital Digging with Henk van Ess

#ChatGPT: 4. Unlocking Geolocation with Large Language Models: A Workflow by Henk van Ess

How large language models can help extract, find and enhance location data

Henk van Ess
May 09, 2023
Original artwork digitally made by Henk van Ess with MidJourney

Geolocation is critical for finding out where something happened. Knowing the location helps you figure out what's going on.

Large language models like ChatGPT seem like unlikely candidates for helping you geolocate:

  • Text-based robots can’t process visual information, meaning they can't see the landmarks, signs, or other cues that are essential for identifying a location. It's like trying to find your way in a pitch-black room with only a text description to guide you.

  • They have limitations in their knowledge base, meaning they may not be aware of all the nuances and details of a particular location. It's like trying to solve a crossword puzzle with only half the clues - you might get close, but you're unlikely to arrive at the right answer.

  • They can "hallucinate": they can produce sentences and information that aren't real or accurate but still seem plausible. For example, if you ask ChatGPT about a location, it might give you an answer that sounds right but is missing important details or is completely wrong.

Having said that, I still see ways LLM #ai can assist, beyond describing what you see and asking for more details. In this newsletter, I show you a complete workflow:

  1. Digging for Data: extracting locations from long texts, like PDFs

  2. Chasing Coordinates: finding the coordinates of a batch of addresses

  3. Adding Tabasco: finding additional information

  4. Filling the blanks: completing street names that are only partially visible

  1. Digging for Data

Mathis Lichtenberger, inspired by this tweet, came up with ChatPDF. You can upload PDFs and ask questions about the document.

While investigating contracts containing location information, I needed to cross-check the places using Google Street View. Doing it manually was time-consuming, so I wondered if there was a faster way. I uploaded a file, and ChatPDF accurately identified the location information:

Greetings! This PDF file contains the campaign finance report for (redacted by me) during the Fall Pre-Election 2012 period. The report includes a summary of the committee's gross expenditures, contributions, and disbursements.

I asked the tool to extract all geolocations and it did that quickly.

Because Mathis's tool uses OpenAI rather than ChatGPT, I could not get the geographical coordinates right away, but having a list of addresses was already helpful. Mathis told me he is still in awe of the possibilities of #AI. “I wrote my master's thesis on natural language processing and am deeply impressed by the field. I think that AI progress will completely change the world in the next few years and I'm excited to be a part of the AI revolution.”

  2. Chasing Coordinates

The next step was to write a script for ChatGPT that allowed me to quickly look up the addresses in Google Maps. It was just one sentence. This is what I gave ChatGPT (3.5) to work with:

show me geo coordinates of the following addresses, put them in a table and come up with a query to google maps for the locations

Presto, a big time saver.

Some coordinates were off, which is why I also asked ChatGPT to put the address itself in the Google Maps query. Some ZIP codes were outdated and not found in Google Maps, so in a variation of the script I left the ZIP codes out.
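Because model-generated coordinates can be wrong, it can help to build the Google Maps links yourself from the address text. The sketch below is not one of the scripts from this post; it assumes US-style addresses that may end in a ZIP code, and the function names and the sample address are made up for illustration. It uses the Google Maps search URL form with the api=1 and query parameters.

```python
import re
from urllib.parse import urlencode


def strip_zip(address: str) -> str:
    """Drop a trailing US ZIP code (5 digits, optional +4) from an address."""
    return re.sub(r"[,\s]+\d{5}(?:-\d{4})?\s*$", "", address).strip()


def maps_search_url(address: str, drop_zip: bool = False) -> str:
    """Build a Google Maps search link for a free-text address."""
    query = strip_zip(address) if drop_zip else address
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": query})


# Hypothetical sample address, not taken from the PDF used in this post.
addresses = ["123 Example St, Springfield, IL 62701"]
for address in addresses:
    print(maps_search_url(address, drop_zip=True))
```

Searching on the address instead of on coordinates means a wrong latitude or longitude from the model never reaches the map, and the drop_zip switch mirrors the variation without ZIP codes.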


  3. Adding Tabasco

Now that I have the addresses, it would be great to add some tabasco. Can I find images of those places quickly? ChatGPT, do this for me:

add to the table a link to images in google for each address

Excellent. I added three extras: a Google Images search on all addresses before 2020, a search for PDFs only, and a search of social media only.

How did I do that?

Time search in Google Images:

make another column in table with link to address for google images, but end each search query with before:2020-01-01

PDF documents:

now add the address in table with clickable link and search for it in google, add filetype:pdf in search query and use as header of column "PDF search"
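The two prompts above ask ChatGPT to write the links. As a cross-check, the same links can be built directly. This is a minimal sketch, not the author's script: the function names and the sample address are hypothetical, and the tbm=isch parameter for Google Images is a commonly used URL parameter rather than a documented API, so treat it as an assumption.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search?"


def images_before_url(address: str, cutoff: str = "2020-01-01") -> str:
    """Google Images link for an address, limited with the before: operator."""
    query = f"{address} before:{cutoff}"
    # tbm=isch switches the results page to Images (assumed, not an official API).
    return GOOGLE_SEARCH + urlencode({"q": query, "tbm": "isch"})


def pdf_search_url(address: str) -> str:
    """Google web search link for an address, limited to PDF documents."""
    return GOOGLE_SEARCH + urlencode({"q": f"{address} filetype:pdf"})


# Hypothetical sample address for illustration only.
print(images_before_url("1 Main St"))
print(pdf_search_url("1 Main St"))
```

Building the links in code guarantees that every row gets exactly the same operators, which is useful when checking the table ChatGPT produced.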

Social media link:

Pascal Thierry Revelin showed me his ChatGPT 3.5 tool DorkGPT. The idea is to lower the barrier to entry for creating Google queries with Google dorks, as I did manually in Google on Steroids.

“A human will always be necessary, but the tool can give you a head start,” Thierry Revelin told me yesterday. So Pascal, what are your thoughts about #osint and #ai? “AI is of good help for the analyst but it will not replace him.”

Let’s put his tool to the test.

It gave me:

site:twitter.com OR site:facebook.com OR site:instagram.com OR site:linkedin.com OR site:pinterest.com OR site:tumblr.com OR site:reddit.com OR site:snapchat.com OR site:flickr.com OR site:myspace.com

That’s fine, although MySpace is dead :) I instructed ChatGPT to do this:

And I had a new table.
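The exact instruction given to ChatGPT is not reproduced here, so the following is only a sketch of one way to combine an address with the site: list from DorkGPT. The function name, the quoting of the address, and the sample address are assumptions; MySpace is left out of the default list.

```python
from urllib.parse import urlencode

# Sites taken from the DorkGPT output above, minus myspace.com.
SOCIAL_SITES = [
    "twitter.com", "facebook.com", "instagram.com", "linkedin.com",
    "pinterest.com", "tumblr.com", "reddit.com", "snapchat.com", "flickr.com",
]


def social_query(address: str, sites=SOCIAL_SITES) -> str:
    """Combine a quoted address with an OR-list of site: operators."""
    site_clause = " OR ".join(f"site:{site}" for site in sites)
    return f'"{address}" ({site_clause})'


def social_search_url(address: str, sites=SOCIAL_SITES) -> str:
    """Google search link for the combined social media query."""
    return "https://www.google.com/search?" + urlencode({"q": social_query(address, sites)})


# Hypothetical sample address for illustration only.
print(social_query("1 Main St", ["twitter.com", "reddit.com"]))
```

Quoting the address asks Google for the exact phrase; drop the quotes if that returns too few results.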

You could argue this takes time too. It does. But the beauty is that you can repeat it over and over again. Here is my endgame:

And there it is: from now on, I can type in the word TABLE and get my personal OSINT dashboard.

  4. Filling the blanks

Sometimes, when you want to geolocate something, you can’t read the whole street name.

But ChatGPT gave me an idea :) (I had to train it first)

And I used the same script from 3. Adding Tabasco.

Isn’t that lovely?
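ChatGPT's suggestions for a partial street name are guesses, so it is worth checking them against a list of real street names for the area. The sketch below is a different technique from the ChatGPT prompt used here: plain pattern matching, with no language model involved. It assumes you already have a candidate list (the one shown is made up), that "?" marks one unreadable character, and that "*" marks any run of unreadable characters.

```python
import re


def fragment_to_regex(fragment: str) -> re.Pattern:
    """Turn a partly readable street name into a case-insensitive regex.

    '?' stands for exactly one unreadable character,
    '*' stands for any run (possibly empty) of unreadable characters.
    """
    parts = []
    for ch in fragment:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_streets(fragment: str, candidates: list[str]) -> list[str]:
    """Return the candidate street names that fit the readable fragment."""
    pattern = fragment_to_regex(fragment)
    return [name for name in candidates if pattern.fullmatch(name)]


# Hypothetical candidate list, e.g. copied from a street index for the area.
streets = ["Manchester Road", "Winchester Road", "Chester Street"]
print(match_streets("*chester R?ad", streets))
```

Each surviving candidate can then go through the same Google Maps and Google Images links as the full addresses.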

Below: all the scripts I wrote (60 pages), with some more ideas you can learn from, and the PDF I used for data extraction. (Subscribers only)
