check_html.py

Deterministic HTTP & HTML Content Validation

check_html.py is a deterministic HTTP inspection and content‑validation tool designed for monitoring systems, automation pipelines, and operators who require reproducible, audit‑transparent response checks.
It performs lightweight HTTP requests, validates status codes, enforces content‑type expectations, and verifies HTML presence using deterministic hashing and structured output.

Overview

check_html.py provides a reproducible inspection pipeline for HTTP/HTTPS endpoints.
It validates response codes, content types, body size, and HTML presence using deterministic logic consistent across environments.
The tool is built around three principles:

  1. Determinism — same endpoint → same output
  2. Reproducibility — no environment‑dependent behavior
  3. Audit Transparency — explicit, structured, operator‑grade output

Key Features

  1. Deterministic HTTP/HTTPS request handling
  2. Status code validation
  3. Content‑Type enforcement
  4. HTML presence detection
  5. Body size and content‑hash reporting
  6. Backend fingerprinting (Apache, Nginx, etc.)
  7. JSON, verbose, and Nagios output modes

HTTP Inspection

Deterministic Behavior

Architecture Overview

1. Request Layer

Performs a deterministic HTTP/HTTPS request with strict timeout and error classification.
Connection failures return explicit UNKNOWN/CRITICAL states.

2. Response Parsing Layer

Extracts status code, headers, content type, server fingerprint, and body content.
All fields are normalized for reproducibility.

3. Validation Layer

4. Output Layer

Produces deterministic JSON, verbose, or Nagios output.
All fields are explicit and audit‑transparent.

Current Status

Roadmap

Near‑Term

Mid‑Term

Long‑Term

Why check_html.py Exists

Traditional HTTP checkers often produce inconsistent output, rely on environment‑dependent behavior, or provide limited visibility into response content.
check_html.py solves this by applying deterministic engineering principles:

  1. explicit validation
  2. reproducible output
  3. audit‑transparent reporting
  4. no hidden behavior or implicit assumptions
It’s not just an HTTP checker — it’s a deterministic content‑validation engine.

Links

Related Projects