Ralsina.Me: Roberto Alsina's website

A Simple Link Checker for Nikola

This script is a very simple link checker that makes sure the pages Nikola generates have no broken links. It will become part of Nikola proper once it is more polished and doit supports listing targets.

To try it, download it and run it from the folder that contains your conf.py, right after running doit.

import os
from urllib.parse import unquote, urlparse

import lxml.html

def analyze(filename):
    try:
        # Use lxml to parse the HTML (read bytes so lxml detects the encoding)
        with open(filename, 'rb') as f:
            d = lxml.html.fromstring(f.read())
        # iterlinks() yields (element, attribute, link, pos) tuples;
        # the link itself is the target we want to check
        for _element, _attribute, target, _pos in d.iterlinks():
            if target == "#":  # These are always valid
                continue
            parsed = urlparse(target)
            # We only handle relative links, so skip anything with a
            # scheme or a host (including protocol-relative //host/path).
            # TODO: check if the URL points to inside the generated
            # site and check it anyway
            if parsed.scheme or parsed.netloc:
                continue
            # Ignore the fragment, since the link will still work
            # TODO: check that the fragment is valid
            if parsed.fragment:
                target = target.split('#')[0]
            # Calculate what file or folder this points to
            target_filename = os.path.abspath(
                os.path.join(os.path.dirname(filename), unquote(target)))
            # Check if it exists, or report it
            if not os.path.exists(target_filename):
                print("In %s broken link: %s" % (filename, target))
    except Exception as exc:
        # Something bad happened, report
        print("Error with:", filename, exc)

# This is hackish: we use doit to get a list of all
# generated files. Minor modifications would let you check
# the non-generated files as well.

for task in os.popen('doit list --all', 'r').readlines():
    task = task.strip()
    if task.split(':')[0] in (
        'render_tags',
        'render_archive',
        'render_galleries',
        'render_indexes',
        'render_pages',
        'render_site') and '.html' in task:
            # It looks like a generated HTML file
            analyze(task.split(":")[-1])
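
Two comments in the script hint at possible extensions: checking files that doit does not list, and validating fragments. What follows is only a rough sketch of both, not part of the original script. The names iter_html_files, AnchorCollector and fragment_exists are made up for illustration, and it uses the standard library's html.parser rather than lxml so that it runs without extra dependencies.

```python
import os
from html.parser import HTMLParser


def iter_html_files(root):
    """Yield the path of every .html file found under root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".html"):
                yield os.path.join(dirpath, name)


class AnchorCollector(HTMLParser):
    """Collect every id attribute, plus the name attribute of <a> tags."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for key, value in attrs:
            if value and (key == "id" or (key == "name" and tag == "a")):
                self.anchors.add(value)


def fragment_exists(filename, fragment):
    """Return True if filename contains an anchor matching fragment."""
    collector = AnchorCollector()
    with open(filename, encoding="utf-8", errors="replace") as f:
        collector.feed(f.read())
    return fragment in collector.anchors
```

Assuming the site is built into a folder called output (an assumption; adjust it to whatever your conf.py says), a loop such as `for path in iter_html_files("output"): analyze(path)` could stand in for the doit-based loop, at the cost of also checking HTML files that Nikola did not generate. fragment_exists could be called where the script currently discards the fragment, once the target file is known to exist and is an HTML file rather than a folder.
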
David Buxton / 2012-06-30 10:39:

You won't find analize in a dictionary but it means something quite different to the word you want: analyse.

Roberto Alsina / 2012-06-30 12:30:

Hahaha oops!


Contents © 2000-2023 Roberto Alsina