When you refactor menus, migrate content, or run a large SEO cleanup, you often need to prove that specific anchor texts on specific pages now point to the new target URLs. Doing that by hand is slow and error-prone. This article shows a clean, repeatable way to automate it using:
- Playwright (Python) — renders the real page (including JavaScript-built links)
- pytest — data-driven tests from a CSV
- pytest-html / Allure — human-friendly reports you can hand to stakeholders
What we’ll verify
For each row in a spreadsheet:
- Open the Page URL.
- Find one or more
<a>elements whose visible text matches the Anchor Text
(case/whitespace-insensitive; “exact or contains” by default). - Confirm at least one of those links resolves (after redirects) to the New Anchor URL.
- (Optional) Record the Old Anchor URL for reporting context.
Typical uses: sitewide SEO retargeting, product renames, marketing page rewrites, CMS migrations.
Project layout
qa-link-check-py/
├─ data/
│ └─ anchors.csv # your spreadsheet exported to CSV
├─ tests/
│ └─ test_links.py # the test
├─ utils/
│ └─ normalize.py # text/URL normalization helpers
├─ conftest.py # adds custom columns to pytest-html
├─ requirements.txt
├─ pytest.ini
└─ README.md
anchors.csv (sample)
Page URL,Anchor Text,Old Anchor URL,New Anchor URL
https://example.com/marketing,Small Business SEO,https://example.com/old-small-biz,https://example.com/local-seo/affordable-seo-services-for-small-business/
Install once
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m playwright install --with-deps
requirements.txt
pytest==8.2.2
playwright==1.46.0
pytest-playwright==0.5.0
pytest-html==4.1.1
allure-pytest==2.13.5 # optional
pytest.ini
[pytest]
addopts = -ra
Core logic
Normalization helpers (keep comparisons sane)
utils/normalize.py
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
_IGNORE = {"utm_source","utm_medium","utm_campaign","utm_term","utm_content",
"gclid","fbclid","msclkid","utm_id"}
def normalize_text(s: str) -> str:
if not s: return ""
s = s.replace("\u00A0", " ")
return re.sub(r"\s+", " ", s).strip().lower()
def normalize_url(u: str) -> str:
if not u: return ""
try:
p = urlsplit(u.strip())
scheme = p.scheme or "https"
netloc = re.sub(r":(80|443)$", "", p.netloc.lower())
path = re.sub(r"/{2,}", "/", p.path or "")
if path.endswith("/") and path != "/":
path = path[:-1]
q = [(k,v) for (k,v) in parse_qsl(p.query, keep_blank_values=True) if k not in _IGNORE]
return urlunsplit((scheme, netloc, path, urlencode(q, doseq=True), ""))
except Exception:
return u.strip()
The test (data-driven from CSV)
tests/test_links.py
import csv, os
from typing import Dict, List
import pytest, allure
from playwright.sync_api import Page, APIRequestContext
from utils.normalize import normalize_text, normalize_url
CSV_PATH = os.path.join(os.path.dirname(__file__), "..", "data", "anchors.csv")
def _load_rows(path: str) -> List[Dict[str, str]]:
if not os.path.exists(path):
raise FileNotFoundError(f"Missing {path}")
rows = []
with open(path, newline="", encoding="utf-8-sig") as f:
r = csv.DictReader(f)
for i, row in enumerate(r, start=2): # header is row 1
rows.append({
"sheet_row": i,
"page_url": (row.get("Page URL") or "").strip(),
"anchor_text": (row.get("Anchor Text") or "").strip(),
"old_anchor_url": (row.get("Old Anchor URL") or "").strip(),
"new_anchor_url": (row.get("New Anchor URL") or "").strip(),
})
return rows
TEST_ROWS = _load_rows(CSV_PATH)
@pytest.mark.parametrize("row", TEST_ROWS, ids=lambda r: f"Row_{r['sheet_row']}")
def test_anchor_targets(row: Dict[str, str], page: Page, record_property):
page_url = row["page_url"]
anchor_text = row["anchor_text"]
old_anchor = row["old_anchor_url"]
expected_url = row["new_anchor_url"]
# show these in the report details
for k, v in [("Page URL", page_url), ("Anchor Text", anchor_text),
("Old Anchor URL", old_anchor), ("New Anchor URL", expected_url)]:
record_property(k, v)
allure.dynamic.parameter(k, v)
assert page_url and anchor_text and expected_url, "CSV has empty required field(s)."
page.goto(page_url, wait_until="networkidle")
# find candidate anchors by visible text
target = normalize_text(anchor_text)
anchors = page.locator("a[href]")
candidates: List[Dict[str, str]] = []
for i in range(anchors.count()):
a = anchors.nth(i)
text = normalize_text(a.inner_text() or "")
if not text: continue
if text == target or target in text or text in target:
href = a.get_attribute("href")
if href:
abs_url = page.evaluate("(u)=>new URL(u, window.location.href).toString()", href)
candidates.append({"text": text, "href": abs_url})
allure.attach("\n".join(f"{c['text']} -> {c['href']}" for c in candidates[:20]) or "(none)",
name="candidates.txt", attachment_type=allure.attachment_type.TEXT)
assert candidates, f"No <a> with matching anchor text found on {page_url}"
# resolve redirects, compare normalized
expected_norm = normalize_url(expected_url)
matched, wrong = False, []
api: APIRequestContext = page.context.request
for c in candidates:
try:
resp = api.get(c["href"], max_redirects=10)
final_url = resp.url
except Exception:
final_url = c["href"]
final_norm = normalize_url(final_url)
if final_norm == expected_norm:
matched = True
break
wrong.append(final_norm)
allure.attach(f"expected: {expected_norm}\nwrong examples:\n" + "\n".join(dict.fromkeys(wrong))[:10000],
name="summary.txt", attachment_type=allure.attachment_type.TEXT)
if not matched:
try:
allure.attach(page.screenshot(full_page=True), "screenshot.png", allure.attachment_type.PNG)
except Exception:
pass
allure.attach(page.content(), "page.html", allure.attachment_type.HTML)
assert matched, "Anchor text found but URL mismatch.\nExamples:\n" + "\n".join(dict.fromkeys(wrong[:5]))
Make the HTML report show your four fields as columns
conftest.py
from py.xml import html
FIELDS = [("Page URL","page_url"), ("Anchor Text","anchor_text"),
("Old Anchor URL","old_anchor_url"), ("New Anchor URL","new_anchor_url")]
def pytest_itemcollected(item):
callspec = getattr(item, "callspec", None)
if not callspec: return
row = callspec.params.get("row")
if not isinstance(row, dict): return
props = list(getattr(item, "user_properties", []))
# replace any existing key; add only non-empty values
def set_prop(label, value):
nonlocal props
props = [p for p in props if p[0] != label]
if value: props.append((label, value))
for label, key in FIELDS:
set_prop(label, row.get(key, ""))
item.user_properties = props
def _get_prop(report, name: str, default=""):
val = default
for k, v in getattr(report, "user_properties", []):
if k == name and str(v).strip():
val = v
return val
def pytest_html_results_table_header(cells):
cells.insert(2, html.th("Page URL"))
cells.insert(3, html.th("Anchor Text"))
cells.insert(4, html.th("Old Anchor URL"))
cells.insert(5, html.th("New Anchor URL"))
def pytest_html_results_table_row(report, cells):
page = _get_prop(report, "Page URL")
anchor = _get_prop(report, "Anchor Text")
oldu = _get_prop(report, "Old Anchor URL")
newu = _get_prop(report, "New Anchor URL")
cells.insert(2, html.td(html.a(page, href=page)) if page else html.td(""))
cells.insert(3, html.td(anchor))
cells.insert(4, html.td(html.a(oldu, href=oldu)) if oldu else html.td(""))
cells.insert(5, html.td(html.a(newu, href=newu)) if newu else html.td(""))
def pytest_html_report_title(report):
report.title = "Anchor Link Verification Report"
Run it
All tests with HTML report:
python -m pytest --html=reports/report.html --self-contained-html -ra
Run one spreadsheet row (helpful for smoke tests):
- By node id (after
--collect-onlyto see ids):
python -m pytest 'tests/test_links.py::test_anchor_targets[chromium-Row_2]' --html=reports/report.html --self-contained-html -ra
- Or add a tiny env switch (optional):
# after TEST_ROWS = _load_rows(...)
import os
ONLY_ROW = os.getenv("ONLY_ROW", "")
if ONLY_ROW:
TEST_ROWS = [r for r in TEST_ROWS if str(r["sheet_row"]) == ONLY_ROW]
Then run: ONLY_ROW=2 python -m pytest --html=reports/report.html --self-contained-html -ra
CI example (GitHub Actions)
.github/workflows/anchor-check.yml
name: Anchor Link Check (Py)
on:
workflow_dispatch:
schedule: [{ cron: "0 3 * * 1" }]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: "3.11" }
- run: python -m venv .venv && . .venv/bin/activate && pip install -r requirements.txt
- run: python -m playwright install --with-deps
- run: . .venv/bin/activate && python -m pytest --html=reports/report.html --self-contained-html -ra
- uses: actions/upload-artifact@v4
if: always()
with:
name: pytest-html
path: reports/
Matching policy & useful tweaks
- Text matching: current rule is exact OR contains both ways after lowercasing and space-collapsing.
Make it stricter (exact only) by changing:
if text == target:
...
- Hidden links: skip non-visible anchors:
if not a.is_visible(): continue
- Redirect policy: we follow up to 10 redirects and compare the final URL.
- URL normalization: ignores trailing slash differences and tracking params like
utm_*,gclid, etc. Add or remove keys in_IGNOREas needed. - Flake control: retry once at the pytest level with
-n auto(viapytest-xdist) if you parallelize later.
Troubleshooting
- “pytest not found” → run inside your venv, or use
python -m pytest …. --headed=falseerror →--headedis a flag (no value). Use--headedor omit it.- No tests run with
-k "Row 2"→-kparser dislikes spaces; use node id or theONLY_ROWenv var. - Blank columns in HTML summary → ensure
conftest.pyis present; it injects user properties at collection time so pytest-html can render them. - JS-built links missing → Playwright already renders JS; ensure you use
wait_until="networkidle"and scana[href]after page settles.
Why this approach works well
- Auditable: each spreadsheet row becomes a test with evidence (trace, screenshot, DOM).
- Scalable: add more rows—no code change.
- Portable: CSV in, HTML report out; easy for SEO/content teams to review.
- CI-friendly: one workflow step gives you a weekly compliance check on links.
Optional: .gitignore
__pycache__/
*.py[cod]
.venv/
.pytest_cache/
reports/
playwright-report/
allure-results/
allure-report/
test-results/
.DS_Store
Thumbs.db
.vscode/
.idea/
data/*.csv
!data/anchors.sample.csv
That’s the whole flow. Drop your CSV, run the tests, and ship a clear report showing exactly which anchors point where the links don’t work as expected—no manual crawling, no guesswork.

Leave a comment