SEO Tools Integration Guide

This guide covers APIs, data formats, and integration patterns for major SEO tools. It is part of the Building Agentic SEO Consultants course and pairs with Module 3: APIs & JSON (calling APIs and combining data), Module 4: Building SEO Agents (tools agents can use), and Module 6: Deployment and Scaling (production API usage and secrets). For prompt patterns when asking an LLM to integrate these tools, see the LLM Prompting Guide.

Google Search Console Integration

Data Export Formats

  • CSV Exports: Query, page, country, device data
  • API Access: Search Analytics API for real-time data
  • Common Columns: query, page, clicks, impressions, ctr, position

Python Integration

# GSC CSV Processing
import pandas as pd

def process_gsc_data(file_path):
    # Assumes a performance export with date, clicks, and impressions columns
    df = pd.read_csv(file_path)
    df['date'] = pd.to_datetime(df['date'])
    # Recompute CTR as a percentage instead of trusting the exported column
    df['ctr'] = (df['clicks'] / df['impressions'] * 100).round(2)
    return df

# GSC API Integration (Search Console API v1)
from google.oauth2 import service_account
from googleapiclient.discovery import build

def get_gsc_data(property_url, start_date, end_date, key_file='service_account.json'):
    # key_file points to a service-account key with access to the property
    credentials = service_account.Credentials.from_service_account_file(
        key_file, scopes=['https://www.googleapis.com/auth/webmasters.readonly'])
    service = build('searchconsole', 'v1', credentials=credentials)
    body = {'startDate': start_date, 'endDate': end_date,
            'dimensions': ['query', 'page']}
    response = service.searchanalytics().query(siteUrl=property_url, body=body).execute()
    return response.get('rows', [])

Common GSC Data Issues

  • Date format variations
  • Missing data in exports
  • Device/country filtering
  • Large dataset handling (see the loader sketch below)
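
These issues can largely be absorbed at load time. Below is a minimal defensive loader; the chunk size, and the assumption that the export contains a date column, are illustrative choices rather than GSC guarantees.

import pandas as pd

def load_gsc_export(file_path, chunk_size=100_000):
    # Read in chunks so large exports do not exhaust memory
    chunks = []
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        # Coerce mixed date formats; unparseable values become NaT and are dropped
        chunk['date'] = pd.to_datetime(chunk['date'], errors='coerce')
        chunks.append(chunk.dropna(subset=['date']))
    return pd.concat(chunks, ignore_index=True)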

Ahrefs Integration

Data Export Formats

  • Keyword Explorer: Keyword difficulty, search volume, CPC
  • Site Explorer: Backlink data, referring domains
  • Content Explorer: Content performance metrics

Python Processing

# Ahrefs CSV Processing
def process_ahrefs_keywords(file_path):
    df = pd.read_csv(file_path)
    # Handle Ahrefs-specific column names
    df = df.rename(columns={
        'Keyword': 'keyword',
        'Volume': 'search_volume',
        'KD': 'keyword_difficulty'
    })
    return df

Ahrefs API Integration

# Ahrefs API Example (v3 REST API, which uses Bearer-token auth)
import requests

def get_ahrefs_data(api_key, endpoint, params):
    headers = {'Authorization': f'Bearer {api_key}'}
    response = requests.get(f'https://api.ahrefs.com/v3/{endpoint}',
                            headers=headers, params=params)
    response.raise_for_status()  # Fail loudly on auth or quota errors
    return response.json()
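
A call might look like the following; the endpoint and parameter names here are illustrative placeholders rather than values from the Ahrefs docs, so check the API reference for the report you need.

# Illustrative only: the endpoint and parameter names are assumptions
backlinks = get_ahrefs_data(
    api_key='YOUR_API_KEY',
    endpoint='site-explorer/all-backlinks',
    params={'target': 'example.com', 'limit': 100},
)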

SEMrush Integration

Data Export Formats

  • Keyword Analytics: Search volume, competition, trends
  • Domain Analytics: Organic traffic, backlinks, competitors
  • Position Tracking: Ranking data, SERP features

Python Processing

# SEMrush CSV Processing
def process_semrush_data(file_path):
    # SEMrush exports are sometimes UTF-8 with a BOM; utf-8-sig handles both
    df = pd.read_csv(file_path, encoding='utf-8-sig')
    # Drop rows with no keyword (blank trailing rows are common in exports)
    df = df.dropna(subset=['Keyword'])
    return df

SEMrush API Integration

# SEMrush API Example
def get_semrush_data(api_key, report_type, domain):
    url = 'https://api.semrush.com/'
    params = {
        'key': api_key,
        'type': report_type,  # e.g. 'domain_organic'
        'domain': domain
    }
    response = requests.get(url, params=params)
    # The analytics API returns delimited text, not JSON
    return response.text
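
Because the response is plain text, you will usually want to parse it into a DataFrame before combining it with other sources. A minimal sketch, assuming the default semicolon delimiter:

import io
import pandas as pd

def semrush_text_to_df(raw_text):
    # SEMrush analytics responses are delimited text; the separator is assumed here
    return pd.read_csv(io.StringIO(raw_text), sep=';')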

Other SEO Tools Integration

Screaming Frog

# Process Screaming Frog CSV exports (e.g. the Internal:All export)
def process_screaming_frog(file_path):
    df = pd.read_csv(file_path)
    # Flag every URL that did not return 200 (redirects, client and server errors)
    issues = df[df['Status Code'] != 200]
    return issues

Majestic

# Majestic API Integration
def get_majestic_data(api_key, domain):
    url = 'https://api.majestic.com/api/json'
    params = {
        'app_api_key': api_key,
        'cmd': 'GetBackLinkData',
        'item': domain
    }
    response = requests.get(url, params=params)
    return response.json()

Data Integration Patterns

1. Multi-Source Data Merging

def merge_seo_data(gsc_data, ahrefs_data, semrush_data):
    # Assumes each frame already has a normalized 'keyword' column (see pattern 3)
    merged = pd.merge(gsc_data, ahrefs_data, on='keyword', how='outer')
    merged = pd.merge(merged, semrush_data, on='keyword', how='outer')
    return merged
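
Note that pd.merge appends _x/_y suffixes to any non-key column name that appears in both frames, so either rename columns per source first (see pattern 3) or pass explicit suffixes such as suffixes=('_gsc', '_ahrefs').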

2. Data Validation

def validate_seo_data(df, required_columns):
    missing_cols = set(required_columns) - set(df.columns)
    if missing_cols:
        raise ValueError(f"Missing columns: {missing_cols}")
    return True

3. Data Normalization

def normalize_seo_data(df):
    # Standardize column names
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    # Convert data types (guard: not every source has a clicks column)
    if 'clicks' in df.columns:
        df['clicks'] = pd.to_numeric(df['clicks'], errors='coerce')
    return df

API Rate Limiting and Best Practices

Rate Limiting

import time
from functools import wraps

def rate_limit(calls_per_minute):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(60 / calls_per_minute)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(60)  # 60 calls per minute
def api_call():
    pass
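
This decorator simply sleeps before every call, which caps throughput at the stated rate but wastes time when you are below the limit. That is usually fine for batch scripts; high-volume pipelines may want a token-bucket limiter instead.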

Error Handling

def safe_api_call(api_function, max_retries=3):
    for attempt in range(max_retries):
        try:
            return api_function()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff between retries
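
In production, prefer catching specific transient errors (for example requests.exceptions.RequestException) over bare Exception, so that programming bugs fail fast instead of being silently retried.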

Data Storage and Management

Database Integration

import sqlite3

def store_seo_data(data, table_name):
    conn = sqlite3.connect('seo_data.db')
    data.to_sql(table_name, conn, if_exists='replace', index=False)
    conn.close()
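
Reading the data back out for analysis is symmetric; a minimal sketch:

def load_seo_data(table_name):
    # Only pass trusted table names: identifiers cannot be parameterized in SQL
    conn = sqlite3.connect('seo_data.db')
    df = pd.read_sql(f'SELECT * FROM {table_name}', conn)
    conn.close()
    return df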

Cloud Storage

import io
import boto3

def upload_to_s3(data, bucket, key):
    # Serialize the DataFrame to CSV in memory, then upload via the S3 client
    buffer = io.StringIO()
    data.to_csv(buffer, index=False)
    s3 = boto3.client('s3')
    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

Automation Workflows

Scheduled Data Collection

import schedule
import time

def collect_seo_data():
    # Collect data from all sources (the arguments here are placeholders)
    gsc_data = get_gsc_data('https://example.com/', '2024-01-01', '2024-01-31')
    ahrefs_data = get_ahrefs_data(API_KEY, 'site-explorer/all-backlinks',
                                  {'target': 'example.com'})
    # Process and store
    process_and_store(gsc_data, ahrefs_data)

# Schedule daily collection, then keep the scheduler loop running
schedule.every().day.at("09:00").do(collect_seo_data)

while True:
    schedule.run_pending()
    time.sleep(60)

Real-time Monitoring

def monitor_rankings(keywords):
    # get_current_position and send_alert are placeholders for your own helpers
    for keyword in keywords:
        current_position = get_current_position(keyword)
        if current_position > 10:  # Alert if the keyword falls off page one
            send_alert(f"Ranking drop for {keyword}")

Common Integration Challenges

1. Data Format Variations

  • Different CSV formats across tools
  • Encoding issues (UTF-8 vs Latin-1; see the sketch after this list)
  • Date format inconsistencies
  • Column name variations
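
The encoding issue in particular can be handled with a fallback loop. A minimal sketch that tries the encodings most often seen in SEO tool exports, in order:

import pandas as pd

def read_csv_any_encoding(file_path):
    # Latin-1 decodes any byte sequence, so it acts as the final fallback
    for encoding in ('utf-8-sig', 'utf-8', 'latin-1'):
        try:
            return pd.read_csv(file_path, encoding=encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {file_path} with common encodings")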

2. API Limitations

  • Rate limiting
  • Data freshness
  • Historical data access
  • Cost considerations

3. Data Quality Issues

  • Missing values
  • Inconsistent metrics
  • Duplicate entries (see the cleanup sketch below)
  • Outdated information
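
The first three issues can be addressed mechanically; staleness requires comparing against your own collection timestamps. A minimal cleanup sketch, assuming a normalized 'keyword' join key:

def clean_seo_data(df, key_column='keyword'):
    # Drop rows missing the join key, then collapse duplicate keys,
    # keeping the first occurrence of each
    df = df.dropna(subset=[key_column])
    df = df.drop_duplicates(subset=[key_column], keep='first')
    return df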

Best Practices

  1. Standardize data formats: Create common data schemas
  2. Implement error handling: Handle API failures gracefully
  3. Cache data: Store data locally to reduce API calls (see the sketch below)
  4. Monitor API usage: Track rate limits and costs
  5. Validate data: Check data quality before processing
  6. Document integrations: Keep track of API changes
  7. Test regularly: Verify integrations work correctly
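
For the caching practice, even a simple file-based cache keyed on the request makes a real difference. A minimal sketch; the 24-hour lifetime and cache directory are arbitrary choices:

import json
import os
import time

def cached_call(cache_key, fetch_fn, max_age_seconds=86_400, cache_dir='.seo_cache'):
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f'{cache_key}.json')
    # Serve from disk if the cached copy is fresh enough
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age_seconds:
        with open(path) as f:
            return json.load(f)
    # Otherwise call the API and cache the JSON-serializable result
    result = fetch_fn()
    with open(path, 'w') as f:
        json.dump(result, f)
    return result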

This guide supports Module 3: APIs & JSON, Module 4: Building SEO Agents, Module 5: Data Analysis, and Module 6: Deployment and Scaling.


