Module 2: Python for SEO Automation
In this module, you’ll learn how to use Python to automate SEO tasks and process large datasets efficiently. No prior programming experience required!
Learning Objectives
By the end of this module, you will be able to:
- Write Python scripts, assisted by AI, to process SEO data
- Create automated workflows for common SEO tasks
- Integrate LLMs with Python for intelligent analysis
- Build custom SEO tools and dashboards
Prerequisites
- Completion of Module 1: Technical Foundations for SEO
- Basic computer skills (no programming experience required)
Why it matters
You’re learning to understand how data automation works - not to become a programmer, but to architect solutions, troubleshoot data issues, and communicate effectively with technical teams.
When you understand Python concepts, you can design automation strategies, debug data processing problems, and evaluate technical proposals from developers. You’ll know what’s possible, what’s realistic, and how to specify requirements that actually work.
This is one of the most important skill sets you can develop in 2026.
Example in action
Instead of manually categorising 10,000 keywords in Excel, you’ll use an LLM to generate a Python script that removes duplicates, groups them by intent and exports clean data in seconds.
You don’t need to know how to develop this yourself, although don’t let me stop you! You just need to understand how you might do it and what is possible. Following this, you need to be able to have that conversation with an LLM and run the outputs successfully.
With this understanding, you can:
- Architect data processing workflows that handle enterprise-scale SEO data
- Troubleshoot integration issues between different SEO tools and platforms
- Design automation strategies that save hundreds of hours per month
- Communicate data requirements clearly to developers and analysts
- Evaluate and choose the right technical approaches for complex SEO challenges
Common mistakes
- Trying to learn everything at once (focus on practical, small tasks first)
- Not backing up data before processing it
- Forgetting to handle missing or messy data
- Writing overly complex code when simple, step-by-step solutions work better
- Not commenting your code (you’ll forget what it does next week)
- Not using LLMs effectively - unclear prompts lead to poor results
- Not validating LLM output - always test generated code with sample data
Key Principles for SEO Python Success
1. You’re Not Becoming a Programmer
You’re learning to communicate with AI to generate the code you need. This approach lets you:
- Focus on problem-solving rather than syntax memorisation
- Generate working code quickly for specific SEO tasks
- Understand the concepts without getting lost in implementation details
- Iterate and improve solutions through conversation with AI
2. Essential LLM Prompting for Python
Always Be Specific About Your Data
❌ BAD: "Write code to analyse keywords"
✅ GOOD: "Write Python code to analyse a CSV file with columns: keyword, search_volume, clicks, impressions. Calculate CTR and find keywords with high impressions but low CTR."
But wait, this isn’t as complex or as sophisticated as the prompt engineering posts I see on LinkedIn!
This is VERY intentional. Maybe some of those prompts shared on LinkedIn are genius, but maybe - just maybe - the free, overly complicated “prompt engineering masterclass” shared by tens or hundreds of thousands of people is, at best, barely-useful waffle.
Simplicity is king. If you understand what your prompts are doing and you can iterate and improve, you’re moving in the right direction. Start simple!
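To make the difference concrete, here is a sketch of the kind of script the GOOD prompt earlier might produce. The filename, sample data and thresholds are illustrative assumptions, not part of the prompt:

```python
import pandas as pd

# In practice you'd load your export with pd.read_csv("keywords.csv");
# an inline sample keeps this runnable as-is.
df = pd.DataFrame({
    "keyword":       ["seo audit", "python seo", "rank tracker", "log analysis"],
    "search_volume": [5400, 880, 2900, 320],
    "clicks":        [12, 40, 3, 1],
    "impressions":   [4500, 500, 3800, 90],
})

# CTR as a percentage
df["ctr"] = df["clicks"] / df["impressions"] * 100

# High impressions but low CTR -- thresholds are illustrative, tune for your site
opportunities = df[(df["impressions"] > 1000) & (df["ctr"] < 1.0)]
print(opportunities[["keyword", "impressions", "ctr"]])
```

A dozen lines of pandas is all the simple prompt needs to produce - no elaborate prompt scaffolding required.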
Specify the Output Format
❌ BAD: "Process this data"
✅ GOOD: "Process this data and export results to a new CSV file, with the format YYYYMMDD_file_name.csv and columns: keyword, ctr, opportunity_score"
Include Error Handling
❌ BAD: "Read my CSV and analyse the keywords"
✅ GOOD: "Read a GSC export CSV (columns: keyword, clicks, impressions, ctr, position). If the file is missing or empty, print a clear error message and exit. If required columns are missing, list which ones and exit instead of raising a traceback."
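For illustration, the error-handling prompt above might yield something like this sketch (the function name and messages are assumptions):

```python
import sys
import pandas as pd

REQUIRED = {"keyword", "clicks", "impressions", "ctr", "position"}

def load_gsc_export(path):
    """Load a GSC export CSV, exiting with a clear message instead of a traceback."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        sys.exit(f"Error: file not found: {path}")
    except pd.errors.EmptyDataError:
        sys.exit(f"Error: file is empty: {path}")

    missing = REQUIRED - set(df.columns)
    if missing:
        sys.exit("Error: missing required columns: " + ", ".join(sorted(missing)))
    return df
```

The payoff: when a colleague renames a column in the export, they see a one-line explanation rather than a wall of red traceback.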
Ask for Explanations
✅ GOOD: "Generate Python code to [task] and explain what each part does in simple terms"
Always Request Comments in Generated Code
Most AI-generated Python will include comments, but the main message here is: always be explicit if you want a specific outcome.
✅ GOOD: "Generate Python code with detailed comments explaining what each section does"
Why comments matter:
- You’ll understand the code when you return to it later
- Comments help you learn Python concepts
- Makes debugging easier when something goes wrong
- Essential for sharing code with team members
3. Where to Run Your Python Scripts
Local Execution (Your Computer) - Best for: Large datasets, sensitive data, custom environments
Google Colab (Cloud) - Best for: Learning, sharing, quick analysis
Streamlit (Web Applications) - Best for: Sharing results with non-technical stakeholders
There is no single best option here, but Streamlit (run from the web UI) is one of the quickest ways to get a robust, shareable app, and it can be set up quickly and easily. Streamlit will load GitHub repos directly, so your workflow can be super-simple:
- Write your Python app locally
- Convert the app into a Streamlit application
- Commit and push to GitHub
- Sign in to Streamlit using your GitHub account, create a new project and select your GitHub repo
- Deploy the app and test
If anything doesn’t work, you can copy+paste any error messages back into your AI coding software and it’ll be able to do a lot of the troubleshooting and correction for you.
Important: Streamlit Cloud Limitations
- Free tier can only deploy from public GitHub repositories
- For private repos, you need a paid plan or use alternatives like Heroku, Railway, or Render
- Keep sensitive data in environment variables, not in your code repository
Include requirements.txt for Streamlit Apps
✅ GOOD: "Create a Streamlit app, run from the web user interface, for [task] and include a requirements.txt file with all necessary dependencies"
❌ BAD: "Create a Streamlit app" (without specifying requirements.txt or the Streamlit UI)
Power Tip - you can run Streamlit locally, but if you want to host it on the free, public cloud version you may not want to spend time testing it locally. You can get a Python script running properly locally and then prompt to convert it.
✅ GOOD: "Convert script.py to run in the Streamlit web UI. Ensure you include a requirements.txt with all necessary dependencies. Ensure no API keys or secrets are exposed as part of this."
4. Common LLM Prompting Pitfalls
❌ Vague Prompts Lead to Poor Results
- Problem: “Make a chart”
- Solution: “Create a bar chart showing top 10 keywords by clicks using matplotlib, with proper labels and title.”
You can also specify fonts and colours too - even paste in a website or example chart if you wanted to work from an example.
❌ Not Providing Context
- Problem: “Analyse this data”
- Solution: “Analyse this SEO keyword data to find opportunities where impressions > 1000 and position > 10”
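That contextual prompt translates almost directly into a pandas filter; here is a minimal sketch with made-up sample data:

```python
import pandas as pd

# Made-up sample rows standing in for a GSC export
df = pd.DataFrame({
    "query":       ["seo tools", "python tutorial", "keyword research"],
    "impressions": [2500, 800, 4100],
    "position":    [14.2, 3.1, 22.8],
})

# "Striking distance" opportunities: high impressions, ranking beyond page one
opportunities = df[(df["impressions"] > 1000) & (df["position"] > 10)]
print(opportunities)
```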
❌ Not Testing Generated Code
- Problem: Using LLM output without validation
- Solution: Prompt the LLM to add testing, debugging messages and error handling to the code. Always test with sample data first, then scale up.
❌ Not Iterating on Results
- Problem: Accepting first LLM response
- Solution: Ask for improvements: “Make this code more efficient” or even “suggest ways to improve this”
Python Installation and Setup
Installing Python3
For macOS:
# Using Homebrew (recommended)
brew install python3
# Or download from python.org
# Visit https://www.python.org/downloads/macos/
For Windows:
# Download from python.org
# Visit https://www.python.org/downloads/windows/
# Make sure to check "Add Python to PATH" during installation
# Or using Chocolatey
choco install python3
# Or using winget
winget install Python.Python.3
For Linux (Ubuntu/Debian):
sudo apt update
sudo apt install python3 python3-pip python3-venv
Verify Python installation:
# macOS/Linux
python3 --version
python3 -c "print('Python is working!')"
# Windows
python --version
python -c "print('Python is working!')"
Installing Python Packages via pip
pip is the Python package manager; it allows you to install new Python packages. A package is a bundle of code that adds extra functionality to Python - you may have heard of some of them before.
Once Python is installed, install essential packages:
# macOS/Linux
pip3 install pandas numpy matplotlib seaborn jupyter requests beautifulsoup4 streamlit
# Windows
pip install pandas numpy matplotlib seaborn jupyter requests beautifulsoup4 streamlit
# Verify installation
python3 -c "import pandas; print('pandas installed successfully')" # macOS/Linux
python -c "import pandas; print('pandas installed successfully')" # Windows
Essential Python Libraries for SEO
| Library | Purpose | Why Use It | Alternatives | Install Command |
|---|---|---|---|---|
| pandas | Data manipulation and analysis | Excel on steroids - handles large datasets | polars, dask | pip install pandas |
| numpy | Mathematical operations | Fast calculations for large datasets | scipy, numba | pip install numpy |
| matplotlib | Basic charting | Simple, reliable plotting | seaborn, plotly | pip install matplotlib |
| seaborn | Statistical visualisation | Beautiful, publication-ready charts | plotly, bokeh | pip install seaborn |
| jupyter | Interactive notebooks | Mix code, results, and notes | Google Colab, Cursor | pip install jupyter |
| requests | API calls | Connect to SEO tools and websites | httpx, aiohttp | pip install requests |
| beautifulsoup4 | Web scraping | Parse HTML and extract data | scrapy, selenium | pip install beautifulsoup4 |
| streamlit | Web apps | Turn analysis into shareable dashboards | dash, flask | pip install streamlit |
Development Environment: Cursor (Recommended)
Why Cursor over VS Code? Cursor has built-in AI assistance that makes Python development much easier for non-developers.
- Download Cursor from cursor.sh
- Open your project folder in Cursor
- Use the integrated terminal (Ctrl/Cmd + `) for running Python commands
- Leverage AI assistance for code generation and debugging
Jupyter vs Google Colab
Understanding Your Python Deployment Options
Jupyter and Google Colab are both interactive environments for running Python code, but they serve different purposes and have different strengths. Think of them as alternative ways to deploy and run your Python scripts:
- Jupyter runs locally on your computer - like having a personal workspace
- Google Colab runs in the cloud - like renting a workspace when you need it
Both allow you to write code, see results immediately, and mix code with explanations, but they’re optimised for different use cases.
| Feature | Jupyter (Local) | Google Colab |
|---|---|---|
| Setup | Requires local installation | No setup - runs in browser |
| Performance | Uses your computer’s power | Limited by Google’s servers |
| Data Privacy | Data stays on your machine | Data goes to Google |
| Collaboration | Share files manually | Built-in sharing and collaboration |
| Cost | Free | Free tier + paid options |
| Best For | Sensitive data, large datasets | Learning, sharing, quick analysis |
When to use Jupyter:
- Working with sensitive data
- Large datasets that need local processing
- Custom environments with specific packages
When to use Colab:
- Learning and experimentation
- Sharing analysis with team members
- Quick prototyping and demos
Try it yourself
Complete these exercises to build your Python skills using LLM prompting:
Exercise 1: Links Report Anchor Text Extraction
Complete a Screaming Frog or Sitebulb crawl, export the internal links data to csv format.
Task: Extract all pages with anchor text that matches a specific regex pattern from a large links report and create a smaller, manageable CSV file.
LLM Prompt Type Needed: Large file processing and regex filtering prompt
A starter example:
"Write Python code to process a large links report CSV file (columns: source_url, target_url, anchor_text, link_type, domain_authority <- Edit/amend these as needed). Extract all rows where anchor_text matches the regex pattern '[YOUR_PATTERN]', handle memory efficiently for large files, and export filtered results to a new CSV. Include detailed comments explaining the regex matching and memory optimisation techniques."
This is intentionally not perfect and misses out some potential issues. After a few attempts - asking the AI assistant for help along the way - you’ll get this up and running. Python can complete this task, but you can also use command-line tools like xsv to do the same job. Keep it output-focused for now; don’t worry so much about the method.
Common Pitfalls to Watch Out For:
- Not handling large files efficiently (use chunking for files > 1GB)
- Forgetting to escape special regex characters in the pattern
- Not asking for case-insensitive matching options
- Not requesting validation of regex pattern before processing
- Not specifying how to handle malformed CSV rows
- Error handling
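One way the chunking pitfall can be handled - a hedged sketch, not the only approach - is pandas’ chunksize reader, which streams the file instead of loading it all at once. The function name, pattern and chunk size are illustrative:

```python
import pandas as pd

def filter_links(in_path, out_path, pattern, chunk_size=100_000):
    """Stream a large links export in chunks and keep rows whose
    anchor_text matches the regex pattern (case-insensitive)."""
    first = True
    for chunk in pd.read_csv(in_path, chunksize=chunk_size):
        matches = chunk[chunk["anchor_text"].str.contains(
            pattern, case=False, na=False, regex=True)]
        # Write the header only once, then append subsequent chunks
        matches.to_csv(out_path, mode="w" if first else "a",
                       header=first, index=False)
        first = False

# Example: filter_links("internal_links.csv", "filtered.csv", r"buy|cheap")
```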
Exercise 2: Google Search Console Click Volume Analysis
For this you’ll need to download the queries and clicks data from Google Search Console, again in a CSV format.
Task: Analyse a large GSC file to create click volume buckets and count query terms by total click volume.
LLM Prompt Type Needed: GSC data aggregation and bucketing prompt
A starter example:
"Create Python code to analyse Google Search Console data (columns: query, clicks, impressions, ctr, position, date <- Delete/amend as needed). Group queries by total click volume into buckets (10-100, 100-500, 500-1000, 1000-5000, 5000+ clicks), count terms within each bucket, and create a summary table. Include detailed comments explaining the bucketing logic and aggregation methods."
Extra things to be aware of
- This can also be used to group impressions, which can also be hugely useful.
- Grouping similar queries (plurals/typos etc) can make for more meaningful outputs.
- The click bucket volumes themselves may need changing, depending on the site in question.
- Date formatting is often a pain, YYYY-MM-DD is often the easiest to work with.
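A minimal sketch of the bucketing logic using pandas’ pd.cut - the bucket edges mirror the starter prompt, and the sample data is made up:

```python
import pandas as pd

# Made-up sample standing in for a GSC queries export
df = pd.DataFrame({
    "query":  ["a", "b", "c", "d", "e"],
    "clicks": [15, 250, 800, 3200, 9000],
})

# Queries with 10 clicks or fewer fall outside the bins and get NaN --
# decide explicitly how you want to treat them.
bins   = [10, 100, 500, 1000, 5000, float("inf")]
labels = ["10-100", "100-500", "500-1000", "1000-5000", "5000+"]

df["bucket"] = pd.cut(df["clicks"], bins=bins, labels=labels)
summary = df.groupby("bucket", observed=True)["query"].count()
print(summary)
```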
Exercise 3: Keyword Deduplication with Stemming
De-duplicating similar keywords can take a long time if done manually; this method will speed things up.
Task: Remove duplicate keywords by applying stemming/lemmatisation, removing punctuation, and identifying close variants.
LLM Prompt Type Needed: Text preprocessing and deduplication prompt
A starter example:
"Write Python code to deduplicate a keyword list using stemming/lemmatisation, punctuation removal, and close variant detection. Process columns: keyword, search_volume, competition, cpc. Remove exact duplicates, stemmed duplicates, and close variants (edit distance < 2). Export cleaned data with original and processed versions. Include detailed comments explaining the text preprocessing pipeline."
Common Pitfalls to Watch Out For:
- Not choosing an appropriate stemming library (NLTK vs spaCy vs TextBlob) - you can ask the LLM which is most appropriate for the task to get you started.
- Forgetting to handle different languages and stemming rules - for now, avoid mixing languages within the example, or just focus on the main language.
- Not asking for edit distance calculation for close variants - this may take some testing to get right, but the AI assistant will get you started at the right point.
- Not requesting preservation of the highest-value keyword when removing duplicates - or get it to merge the values for each duplicate; it depends on where the keyword data has come from.
- Not specifying how to handle special characters and punctuation
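For a feel of the pipeline, here is a deliberately crude sketch using only the standard library. The suffix-stripping “stemmer” and the 0.9 similarity threshold are stand-ins; for real work, ask the LLM for NLTK or spaCy lemmatisation and a proper edit-distance check:

```python
import re
from difflib import SequenceMatcher

def normalise(kw):
    """Lowercase, strip punctuation, and apply crude suffix stemming.
    (In real work, use NLTK or spaCy for proper lemmatisation.)"""
    kw = re.sub(r"[^\w\s]", "", kw.lower()).strip()
    words = [re.sub(r"(ies|es|s)$", "", w) if len(w) > 3 else w
             for w in kw.split()]
    return " ".join(words)

def dedupe(keywords):
    """Keep the first keyword per normalised form, dropping close variants."""
    kept, seen = [], []
    for kw in keywords:
        norm = normalise(kw)
        # Treat near-identical normalised forms as duplicates
        if any(SequenceMatcher(None, norm, s).ratio() > 0.9 for s in seen):
            continue
        seen.append(norm)
        kept.append(kw)
    return kept

print(dedupe(["seo tools", "SEO tool", "seo tools!", "keyword research"]))
```

Note this keeps the first variant it sees; as the pitfalls above suggest, you’ll usually want to keep the highest-volume variant or merge the metrics instead.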
Exercise 4: Web Access Log Bot Analysis
Sometimes the hardest task here is to actually obtain the log files, but assuming you can get them, this method will help you start to analyse them.
Task: Analyse web access logs to identify and summarise Googlebot hits with detailed statistics.
LLM Prompt Type Needed: Log file parsing and bot detection prompt
"Create Python code to parse web access logs (supply example for format) and analyse Googlebot hits. Extract: IP address, user agent, request path, status code, response time, referrer. Generate summary statistics including total hits, unique pages crawled and response code distribution. Include detailed comments explaining the log parsing and bot detection logic."
Common Pitfalls and other hints:
- Not handling different log formats
- Forgetting to validate Googlebot user agent strings (fake bots exist) - Google provides a list of valid IP addresses to check against
- Not asking for proper handling of log rotation and multiple files - I.e. identifying where gaps may exist
- Not requesting analysis of crawl patterns and frequency
- Not specifying how to handle malformed log entries
- It doesn’t have to be limited to Googlebot - you can also analyse AI crawlers.
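A sketch of the parsing approach, assuming the common Combined Log Format (adjust the regex to your server’s actual format, and remember user-agent matching alone is spoofable):

```python
import re
from collections import Counter

# Combined Log Format -- a common default, but verify against your own logs
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarise_googlebot(lines):
    """Count Googlebot hits, unique paths and status codes from log lines.
    User-agent checks can be faked -- verify IPs against Google's
    published ranges for anything important."""
    hits, paths, statuses = 0, set(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue  # skip malformed lines and non-Googlebot traffic
        hits += 1
        paths.add(m["path"])
        statuses[m["status"]] += 1
    return {"hits": hits, "unique_pages": len(paths),
            "status_codes": dict(statuses)}
```

Swapping "Googlebot" for "GPTBot" or another crawler token extends the same summary to AI crawlers.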
Resources
- Python Quick Reference Guide - Essential Python commands and patterns for SEO automation
- LLM Prompting Guide - Master the art of prompting LLMs for SEO tasks
- SEO Tools Integration Guide - Complete guide to integrating with major SEO tools
| Next: Module 3: APIs & JSON | Supporting materials |
Ready to automate your SEO workflow? This module provides hands-on experience with Python for SEO professionals.