Module 2: Python for SEO Automation
In this module, you’ll learn how to use Python to automate SEO tasks and process large datasets efficiently. No prior programming experience required!
Learning Objectives
By the end of this module, you will be able to:
- Write Python scripts, assisted by AI, to process SEO data
- Create automated workflows for common SEO tasks
- Integrate LLMs with Python for intelligent analysis
- Build custom SEO tools and dashboards
Prerequisites
- Completion of Module 1: Technical Foundations for SEO
- Basic computer skills (no programming experience required)
Why it matters
You’re learning to understand how data automation works - not to become a programmer, but to architect solutions, troubleshoot data issues, and communicate effectively with technical teams.
When you understand Python concepts, you can design automation strategies, debug data processing problems, and evaluate technical proposals from developers. You’ll know what’s possible, what’s realistic, and how to specify requirements that actually work.
This is one of the most important skill sets you can develop in 2026.
Example in action
Instead of manually categorising 10,000 keywords in Excel, you’ll use an LLM to generate a Python script that removes duplicates, groups them by intent and exports clean data in seconds.
You don’t need to know how to develop this yourself, although don’t let me stop you! You just need to understand how you might do it and what is possible. Following this, you need to be able to have that conversation with an LLM and run the outputs successfully.
With this understanding, you can:
- Architect data processing workflows that handle enterprise-scale SEO data
- Troubleshoot integration issues between different SEO tools and platforms
- Design automation strategies that save hundreds of hours per month
- Communicate data requirements clearly to developers and analysts
- Evaluate and choose the right technical approaches for complex SEO challenges
Common mistakes
- Trying to learn everything at once (focus on practical, small tasks first)
- Not backing up data before processing it
- Forgetting to handle missing or messy data
- Writing overly complex code when simple, step-by-step solutions work better
- Not commenting your code (you’ll forget what it does next week)
- Not using LLMs effectively - unclear prompts lead to poor results
- Not validating LLM output - always test generated code with sample data
Key Principles for SEO Python Success
1. You’re Not Becoming a Programmer
You’re learning to communicate with AI to generate the code you need. This approach lets you:
- Focus on problem-solving rather than syntax memorisation
- Generate working code quickly for specific SEO tasks
- Understand the concepts without getting lost in implementation details
- Iterate and improve solutions through conversation with AI
2. Essential LLM Prompting for Python
Always Be Specific About Your Data
❌ BAD: "Write code to analyse keywords"
✅ GOOD: "Write Python code to analyse a CSV file with columns: keyword, search_volume, clicks, impressions. Calculate CTR and find keywords with high impressions but low CTR."
But wait, this isn’t as complex or as sophisticated as the prompt engineering posts I see on LinkedIn!
This is VERY intentional. Maybe some of those prompts shared on LinkedIn are genius, but maybe - just maybe - the free, overly complicated “prompt engineering masterclass” shared by tens or hundreds of thousands of people is, at best, barely-useful waffle.
Simplicity is king. If you understand what your prompts are doing and you can iterate and improve, you’re moving in the right direction. Start simple!
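To make the difference concrete, here is a sketch of the kind of script the GOOD prompt earlier might produce. The filename, sample data and thresholds are illustrative assumptions, not part of the prompt:

```python
import pandas as pd

# In practice you'd load your export with pd.read_csv("keywords.csv");
# an inline sample keeps this runnable as-is.
df = pd.DataFrame({
    "keyword":       ["seo audit", "python seo", "rank tracker", "log analysis"],
    "search_volume": [5400, 880, 2900, 320],
    "clicks":        [12, 40, 3, 1],
    "impressions":   [4500, 500, 3800, 90],
})

# CTR as a percentage
df["ctr"] = df["clicks"] / df["impressions"] * 100

# High impressions but low CTR -- thresholds are illustrative, tune for your site
opportunities = df[(df["impressions"] > 1000) & (df["ctr"] < 1.0)]
print(opportunities[["keyword", "impressions", "ctr"]])
```

A dozen lines of pandas is all the simple prompt needs to produce - no elaborate prompt scaffolding required.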
Specify the Output Format
❌ BAD: "Process this data"
✅ GOOD: "Process this data and export results to a new CSV file, with the format YYYYMMDD_file_name.csv and columns: keyword, ctr, opportunity_score"
Include Error Handling
❌ BAD: "Read my CSV and analyse the keywords"
✅ GOOD: "Read a GSC export CSV (columns: keyword, clicks, impressions, ctr, position). If the file is missing or empty, print a clear error message and exit. If required columns are missing, list which ones and exit instead of raising a traceback."
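For illustration, the error-handling prompt above might yield something like this sketch (the function name and messages are assumptions):

```python
import sys
import pandas as pd

REQUIRED = {"keyword", "clicks", "impressions", "ctr", "position"}

def load_gsc_export(path):
    """Load a GSC export CSV, exiting with a clear message instead of a traceback."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        sys.exit(f"Error: file not found: {path}")
    except pd.errors.EmptyDataError:
        sys.exit(f"Error: file is empty: {path}")

    missing = REQUIRED - set(df.columns)
    if missing:
        sys.exit("Error: missing required columns: " + ", ".join(sorted(missing)))
    return df
```

The payoff: when a colleague renames a column in the export, they see a one-line explanation rather than a wall of red traceback.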
Ask for Explanations
✅ GOOD: "Generate Python code to [task] and explain what each part does in simple terms"
Always Request Comments in Generated Code
Most AI-generated Python will include comments, but the main message here is: always be explicit if you want a specific outcome.
✅ GOOD: "Generate Python code with detailed comments explaining what each section does"
Why comments matter:
- You’ll understand the code when you return to it later
- Comments help you learn Python concepts
- Makes debugging easier when something goes wrong
- Essential for sharing code with team members
3. Where to Run Your Python Scripts
Local Execution (Your Computer) - Best for: Large datasets, sensitive data, custom environments
Google Colab (Cloud) - Best for: Learning, sharing, quick analysis
Streamlit (Web Applications) - Best for: Sharing results with non-technical stakeholders
There is no single best option here, but Streamlit (run from the web UI) is one of the quickest ways to get a robust, shareable app, and it can be set up quickly and easily. Streamlit will load GitHub repos directly, so your workflow can be super-simple:
- Write your Python app locally
- Convert the app into a Streamlit application
- Commit and push to GitHub
- Sign in to Streamlit using your GitHub account, create a new project and select your GitHub repo
- Deploy the app and test
If anything doesn’t work, you can copy+paste any error messages back into your AI coding software and it’ll be able to do a lot of the troubleshooting and correction for you.
Important: Streamlit Cloud Limitations
- Free tier can only deploy from public GitHub repositories
- For private repos, you need a paid plan or use alternatives like Heroku, Railway, or Render
- Keep sensitive data in environment variables, not in your code repository
Include requirements.txt for Streamlit Apps
✅ GOOD: "Create a Streamlit app, run from the web user interface, for [task] and include a requirements.txt file with all necessary dependencies"
❌ BAD: "Create a Streamlit app" (without specifying requirements.txt or the Streamlit UI)
Power Tip - you can run Streamlit locally, but if you want to host it on the free, public cloud version you may not want to spend time testing it locally. You can get a Python script running properly locally and then prompt to convert it.
✅ GOOD: "Convert script.py to run in the Streamlit web UI. Ensure you include a requirements.txt with all necessary dependencies. Ensure no API keys or secrets are exposed as part of this."
4. Common LLM Prompting Pitfalls
❌ Vague Prompts Lead to Poor Results
- Problem: “Make a chart”
- Solution: “Create a bar chart showing top 10 keywords by clicks using matplotlib, with proper labels and title.”
You can also specify fonts and colours too - even paste in a website or example chart if you wanted to work from an example.
❌ Not Providing Context
- Problem: “Analyse this data”
- Solution: “Analyse this SEO keyword data to find opportunities where impressions > 1000 and position > 10”
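That contextual prompt translates almost directly into a pandas filter; here is a minimal sketch with made-up sample data:

```python
import pandas as pd

# Made-up sample rows standing in for a GSC export
df = pd.DataFrame({
    "query":       ["seo tools", "python tutorial", "keyword research"],
    "impressions": [2500, 800, 4100],
    "position":    [14.2, 3.1, 22.8],
})

# "Striking distance" opportunities: high impressions, ranking beyond page one
opportunities = df[(df["impressions"] > 1000) & (df["position"] > 10)]
print(opportunities)
```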
❌ Not Testing Generated Code
- Problem: Using LLM output without validation
- Solution: Prompt the LLM to add testing, debugging messages and error handling to the code. Always test with sample data first, then scale up.
❌ Not Iterating on Results
- Problem: Accepting first LLM response
- Solution: Ask for improvements: “Make this code more efficient” or even “suggest ways to improve this”
Python Installation and Setup
Installing Python3
For macOS:
# Using Homebrew (recommended)
brew install python3
# Or download from python.org
# Visit https://www.python.org/downloads/macos/
For Windows:
# Download from python.org
# Visit https://www.python.org/downloads/windows/
# Make sure to check "Add Python to PATH" during installation
# Or using Chocolatey
choco install python3
# Or using winget
winget install Python.Python.3
For Linux (Ubuntu/Debian):
sudo apt update
sudo apt install python3 python3-pip python3-venv
Verify Python installation:
# macOS/Linux
python3 --version
python3 -c "print('Python is working!')"
# Windows
python --version
python -c "print('Python is working!')"
Installing Python Packages via pip
pip is the Python package manager; it allows you to install new Python packages. A package is a bundle of code that adds extra functionality to Python - you may have heard of some of them before.
Once Python is installed, install essential packages:
# macOS/Linux
pip3 install pandas numpy matplotlib seaborn jupyter requests beautifulsoup4 streamlit
# Windows
pip install pandas numpy matplotlib seaborn jupyter requests beautifulsoup4 streamlit
# Verify installation
python3 -c "import pandas; print('pandas installed successfully')" # macOS/Linux
python -c "import pandas; print('pandas installed successfully')" # Windows
Essential Python Libraries for SEO
| Library | Purpose | Why Use It | Alternatives | Install Command |
|---|---|---|---|---|
| pandas | Data manipulation and analysis | Excel on steroids - handles large datasets | polars, dask | pip install pandas |
| numpy | Mathematical operations | Fast calculations for large datasets | scipy, numba | pip install numpy |
| matplotlib | Basic charting | Simple, reliable plotting | seaborn, plotly | pip install matplotlib |
| seaborn | Statistical visualisation | Beautiful, publication-ready charts | plotly, bokeh | pip install seaborn |
| jupyter | Interactive notebooks | Mix code, results, and notes | Google Colab, Cursor | pip install jupyter |
| requests | API calls | Connect to SEO tools and websites | httpx, aiohttp | pip install requests |
| beautifulsoup4 | Web scraping | Parse HTML and extract data | scrapy, selenium | pip install beautifulsoup4 |
| streamlit | Web apps | Turn analysis into shareable dashboards | dash, flask | pip install streamlit |
Development Environment: Cursor (Recommended)
Why Cursor over VS Code? Cursor has built-in AI assistance that makes Python development much easier for non-developers.
- Download Cursor from cursor.sh
- Open your project folder in Cursor
- Use the integrated terminal (Ctrl/Cmd + `) for running Python commands
- Leverage AI assistance for code generation and debugging
Jupyter vs Google Colab
Understanding Your Python Deployment Options
Jupyter and Google Colab are both interactive environments for running Python code, but they serve different purposes and have different strengths. Think of them as alternative ways to deploy and run your Python scripts:
- Jupyter runs locally on your computer - like having a personal workspace
- Google Colab runs in the cloud - like renting a workspace when you need it
Both allow you to write code, see results immediately, and mix code with explanations, but they’re optimised for different use cases.
| Feature | Jupyter (Local) | Google Colab |
|---|---|---|
| Setup | Requires local installation | No setup - runs in browser |
| Performance | Uses your computer’s power | Limited by Google’s servers |
| Data Privacy | Data stays on your machine | Data goes to Google |
| Collaboration | Share files manually | Built-in sharing and collaboration |
| Cost | Free | Free tier + paid options |
| Best For | Sensitive data, large datasets | Learning, sharing, quick analysis |
When to use Jupyter:
- Working with sensitive data
- Large datasets that need local processing
- Custom environments with specific packages
When to use Colab:
- Learning and experimentation
- Sharing analysis with team members
- Quick prototyping and demos
Try it yourself
Complete these exercises to build your Python skills using LLM prompting:
Exercise 1: Links Report Anchor Text Extraction
Complete a Screaming Frog or Sitebulb crawl, export the internal links data to csv format.
Task: Extract all pages with anchor text that matches a specific regex pattern from a large links report and create a smaller, manageable CSV file.
LLM Prompt Type Needed: Large file processing and regex filtering prompt
A starter example:
"Write Python code to process a large links report CSV file (columns: source_url, target_url, anchor_text, link_type, domain_authority <- Edit/amend these as needed). Extract all rows where anchor_text matches the regex pattern '[YOUR_PATTERN]', handle memory efficiently for large files, and export filtered results to a new CSV. Include detailed comments explaining the regex matching and memory optimisation techniques."
This is intentionally not perfect and misses out some potential issues. After a few attempts - asking the AI assistant for help along the way - you’ll get this up and running. Python can complete this task, but you can also use command-line tools like xsv to do the same job. Keep it output-focused for now; don’t worry so much about the method.
Common Pitfalls to Watch Out For:
- Not handling large files efficiently (use chunking for files > 1GB)
- Forgetting to escape special regex characters in the pattern
- Not asking for case-insensitive matching options
- Not requesting validation of regex pattern before processing
- Not specifying how to handle malformed CSV rows
- Error handling
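One way the chunking pitfall can be handled - a hedged sketch, not the only approach - is pandas’ chunksize reader, which streams the file instead of loading it all at once. The function name, pattern and chunk size are illustrative:

```python
import pandas as pd

def filter_links(in_path, out_path, pattern, chunk_size=100_000):
    """Stream a large links export in chunks and keep rows whose
    anchor_text matches the regex pattern (case-insensitive)."""
    first = True
    for chunk in pd.read_csv(in_path, chunksize=chunk_size):
        matches = chunk[chunk["anchor_text"].str.contains(
            pattern, case=False, na=False, regex=True)]
        # Write the header only once, then append subsequent chunks
        matches.to_csv(out_path, mode="w" if first else "a",
                       header=first, index=False)
        first = False

# Example: filter_links("internal_links.csv", "filtered.csv", r"buy|cheap")
```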
Exercise 2: Google Search Console Click Volume Analysis
For this you’ll need to download the queries and clicks data from Google Search Console, again in a CSV format.
Task: Analyse a large GSC file to create click volume buckets and count query terms by total click volume.
LLM Prompt Type Needed: GSC data aggregation and bucketing prompt
A starter example:
"Create Python code to analyse Google Search Console data (columns: query, clicks, impressions, ctr, position, date <- Delete/amend as needed). Group queries by total click volume into buckets (10-100, 100-500, 500-1000, 1000-5000, 5000+ clicks), count terms within each bucket, and create a summary table. Include detailed comments explaining the bucketing logic and aggregation methods."
Extra things to be aware of
- This can also be used to group impressions, which can also be hugely useful.
- Grouping similar queries (plurals/typos etc) can make for more meaningful outputs.
- The click bucket volumes themselves may need changing, depending on the site in question.
- Date formatting is often a pain, YYYY-MM-DD is often the easiest to work with.
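A minimal sketch of the bucketing logic using pandas’ pd.cut - the bucket edges mirror the starter prompt, and the sample data is made up:

```python
import pandas as pd

# Made-up sample standing in for a GSC queries export
df = pd.DataFrame({
    "query":  ["a", "b", "c", "d", "e"],
    "clicks": [15, 250, 800, 3200, 9000],
})

# Queries with 10 clicks or fewer fall outside the bins and get NaN --
# decide explicitly how you want to treat them.
bins   = [10, 100, 500, 1000, 5000, float("inf")]
labels = ["10-100", "100-500", "500-1000", "1000-5000", "5000+"]

df["bucket"] = pd.cut(df["clicks"], bins=bins, labels=labels)
summary = df.groupby("bucket", observed=True)["query"].count()
print(summary)
```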
Exercise 3: Keyword Deduplication with Stemming
De-duplicating similar keywords can take a long time if done manually; this method will speed things up.
Task: Remove duplicate keywords by applying stemming/lemmatisation, removing punctuation, and identifying close variants.
LLM Prompt Type Needed: Text preprocessing and deduplication prompt
A starter example:
"Write Python code to deduplicate a keyword list using stemming/lemmatisation, punctuation removal, and close variant detection. Process columns: keyword, search_volume, competition, cpc. Remove exact duplicates, stemmed duplicates, and close variants (edit distance < 2). Export cleaned data with original and processed versions. Include detailed comments explaining the text preprocessing pipeline."
Common Pitfalls to Watch Out For:
- Not choosing an appropriate stemming library (NLTK vs spaCy vs TextBlob) - you can ask the LLM which is most appropriate for the task to get you started.
- Forgetting to handle different languages and stemming rules - for now, avoid mixing languages within the example, or just focus on the main language.
- Not asking for edit distance calculation for close variants - this may take some testing to get right, but the AI assistant will get you started at the right point.
- Not requesting preservation of the highest-value keyword when removing duplicates - or get it to merge the values for each duplicate; it depends on where the keyword data has come from.
- Not specifying how to handle special characters and punctuation
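For a feel of the pipeline, here is a deliberately crude sketch using only the standard library. The suffix-stripping “stemmer” and the 0.9 similarity threshold are stand-ins; for real work, ask the LLM for NLTK or spaCy lemmatisation and a proper edit-distance check:

```python
import re
from difflib import SequenceMatcher

def normalise(kw):
    """Lowercase, strip punctuation, and apply crude suffix stemming.
    (In real work, use NLTK or spaCy for proper lemmatisation.)"""
    kw = re.sub(r"[^\w\s]", "", kw.lower()).strip()
    words = [re.sub(r"(ies|es|s)$", "", w) if len(w) > 3 else w
             for w in kw.split()]
    return " ".join(words)

def dedupe(keywords):
    """Keep the first keyword per normalised form, dropping close variants."""
    kept, seen = [], []
    for kw in keywords:
        norm = normalise(kw)
        # Treat near-identical normalised forms as duplicates
        if any(SequenceMatcher(None, norm, s).ratio() > 0.9 for s in seen):
            continue
        seen.append(norm)
        kept.append(kw)
    return kept

print(dedupe(["seo tools", "SEO tool", "seo tools!", "keyword research"]))
```

Note this keeps the first variant it sees; as the pitfalls above suggest, you’ll usually want to keep the highest-volume variant or merge the metrics instead.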
Exercise 4: Web Access Log Bot Analysis
Sometimes the hardest task here is to actually obtain the log files, but assuming you can get them, this method will help you start to analyse them.
Task: Analyse web access logs to identify and summarise Googlebot hits with detailed statistics.
LLM Prompt Type Needed: Log file parsing and bot detection prompt
"Create Python code to parse web access logs (supply example for format) and analyse Googlebot hits. Extract: IP address, user agent, request path, status code, response time, referrer. Generate summary statistics including total hits, unique pages crawled and response code distribution. Include detailed comments explaining the log parsing and bot detection logic."
Common Pitfalls and other hints:
- Not handling different log formats
- Forgetting to validate Googlebot user agent strings (fake bots exist) - Google provides a list of valid IP addresses to check against
- Not asking for proper handling of log rotation and multiple files - I.e. identifying where gaps may exist
- Not requesting analysis of crawl patterns and frequency
- Not specifying how to handle malformed log entries
- It doesn’t have to be limited to Googlebot - you can also analyse AI crawlers.
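A sketch of the parsing approach, assuming the common Combined Log Format (adjust the regex to your server’s actual format, and remember user-agent matching alone is spoofable):

```python
import re
from collections import Counter

# Combined Log Format -- a common default, but verify against your own logs
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarise_googlebot(lines):
    """Count Googlebot hits, unique paths and status codes from log lines.
    User-agent checks can be faked -- verify IPs against Google's
    published ranges for anything important."""
    hits, paths, statuses = 0, set(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue  # skip malformed lines and non-Googlebot traffic
        hits += 1
        paths.add(m["path"])
        statuses[m["status"]] += 1
    return {"hits": hits, "unique_pages": len(paths),
            "status_codes": dict(statuses)}
```

Swapping "Googlebot" for "GPTBot" or another crawler token extends the same summary to AI crawlers.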
Resources
- Python Quick Reference Guide - Essential Python commands and patterns for SEO automation
- LLM Prompting Guide - Master the art of prompting LLMs for SEO tasks
- SEO Tools Integration Guide - Complete guide to integrating with major SEO tools
| Next: Module 3: APIs & JSON | Supporting materials |
Ready to automate your SEO workflow? This module provides hands-on experience with Python for SEO professionals.