A Simple Nikola Link Checker
One of the most important things when you are building a static site generator like Nikola is that the sites it generates should not be broken. So, I really should have done this earlier ;-)
This is a very simple link checker that ensures the pages Nikola generates have no broken links. I will make it part of Nikola proper once it's more polished and doit supports getting a list of targets.
To try it, save the script below and run it from the folder that contains your conf.py, right after you run doit.
import os
from urllib.parse import unquote, urlparse

import lxml.html


def analyze(filename):
    try:
        # Use LXML to parse the HTML (read bytes so lxml detects the encoding)
        with open(filename, 'rb') as f:
            d = lxml.html.fromstring(f.read())
        # iterlinks() yields (element, attribute, link, pos) tuples
        for element, attribute, target, pos in d.iterlinks():
            if target == "#":  # These are always valid
                continue
            parsed = urlparse(target)
            # We only handle relative links.
            # TODO: check if the URL points to inside the generated
            # site and check it anyway
            if parsed.scheme:
                continue
            # Ignore the fragment, since the link will still work
            # TODO: check that the fragment is valid
            if parsed.fragment:
                target = target.split('#')[0]
            # Calculate what file or folder this points to
            target_filename = os.path.abspath(
                os.path.join(os.path.dirname(filename), unquote(target)))
            # Check if it exists, or report it
            if not os.path.exists(target_filename):
                print("In %s broken link: %s" % (filename, target))
    except Exception as exc:
        # Something bad happened, report
        print("Error with:", filename, exc)


# This is hackish: we use doit to get a list of all
# generated files. Minor modifications would let you check
# the non-generated files as well.
for task in os.popen('doit list --all', 'r').readlines():
    task = task.strip()
    if task.split(':')[0] in (
            'render_tags',
            'render_archive',
            'render_galleries',
            'render_indexes',
            'render_pages',
            'render_site') and '.html' in task:
        # It looks like a generated HTML file
        analyze(task.split(":")[-1])
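The "check that the fragment is valid" TODO could be tackled along these lines. This is only a sketch, not part of the script above: AnchorCollector and fragment_exists are hypothetical names, and it uses the standard library's html.parser rather than lxml so it runs on its own. It assumes a fragment is valid when some element has a matching id, or an a tag has a matching name, and that the fragment has already been URL-unquoted.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every value that a '#fragment' could point to (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Any element's id is a valid target; <a name="..."> is the
            # older way of declaring one.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_exists(html_text, fragment):
    """Return True if `fragment` names an anchor somewhere in `html_text`."""
    if not fragment:
        # An empty fragment ("page.html#") just points at the top of the page
        return True
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment in collector.anchors
```

Inside analyze, something like this could be called with the contents of target_filename and parsed.fragment once the existence check has passed, at least when the target is a file rather than a folder.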
You won't find analize in a dictionary but it means something quite different to the word you want: analyse.
Hahaha oops!