Website Link Checker

Introduction

This is a Python program that calls muffet on a whole website and then filters and displays the HTTP errors.

Note: It can take a couple of minutes to run if the website has a lot of URLs.

How the Program Exits

Exits with error code 1 if at least one error is found, as specified with --errors flag. Otherwise exits with code 0. Note that errors set as --warnings will always exit with code 0.

Program Arguments

  • url
    • The URL to scan. Please include https:// or http://. (e.g. https://google.com)
  • -h, --help
    • show this help message and exit
  • -e ERRORS [ERRORS ...], --errors ERRORS [ERRORS ...]
    • Specify one, many or all error codes to be filtered (e.g. -e 404, -e 403 404, -e all). Use -e all to show all errors.
  • -w WARNINGS [WARNINGS ...], --warnings WARNINGS [WARNINGS ...]
    • Specify one, many or all error codes to be filtered as warnings (e.g. -w 404, -w 403 404, -w all). Use -w all to show all warnings.

How to Use the Program

With Python

  • Clone the repository
    • git clone https://github.com/threefoldfoundation/website-link-checker
      
  • Change directory
    • cd website-link-checker
      
  • Run the program
    • python website-link-checker.py https://example.com -e 404 -w all
      

With Docker

You can use the following command to run the website link checker with Docker:

docker run ghcr.io/threefoldfoundation/website-link-checker https://example.com -e 404 -w all

With Github Action

The website link checker can be run as an action (e.g. action.yml) set in .github/workflows of a Github repository.

The following action example runs everytime there is a push on the development branch and also every Monday at 6:00AM as set by the cron job.

name: link-checker-example
on:
  push:
    branches: [ development ]
  schedule:
    - cron: '0 6 * * 1' # e.g. 6:00 AM each Monday

jobs:
  job_one:
    name: Check for Broken Links
    runs-on: ubuntu-latest
    steps:
      - name: Check for Broken Links
        id: link-report
        uses: docker://ghcr.io/threefoldfoundation/website-link-checker:latest
        with:
          args: 'https://example.com -e 404 -w all'
Last change: 2024-10-29