check_html.py
Deterministic HTTP & HTML Content Validation
check_html.py is a deterministic HTTP inspection and content‑validation tool designed for
monitoring systems, automation pipelines, and operators who require reproducible, audit‑transparent
response checks.
It performs lightweight HTTP requests, validates status codes, enforces content‑type expectations,
and verifies HTML presence using deterministic hashing and structured output.
Overview
check_html.py provides a reproducible inspection pipeline for HTTP/HTTPS endpoints.
It validates response codes, content types, body size, and HTML presence using deterministic logic
consistent across environments.
The tool is built around three principles:
- Determinism — same endpoint → same output
- Reproducibility — no environment‑dependent behavior
- Audit Transparency — explicit, structured, operator‑grade output
Key Features
- Deterministic HTTP/HTTPS request handling
- Status code validation
- Content‑Type enforcement
- HTML presence detection
- Body size and content‑hash reporting
- Backend fingerprinting (Apache, Nginx, etc.)
- JSON, verbose, and Nagios output modes
HTTP Inspection
- GET request with deterministic timeout behavior
- Explicit status code evaluation
- Content‑Type matching (e.g.,
text/html) - Server header fingerprinting
- Body length and SHA‑256 hash reporting
Deterministic Behavior
- No implicit redirects or retries
- Consistent output across all environments
- Structured JSON for automation pipelines
- Operator‑grade verbose mode for diagnostics
- Nagios‑compatible single‑line output
Architecture Overview
1. Request Layer
Performs a deterministic HTTP/HTTPS request with strict timeout and error classification.
Connection failures return explicit UNKNOWN/CRITICAL states.
2. Response Parsing Layer
Extracts status code, headers, content type, server fingerprint, and body content.
All fields are normalized for reproducibility.
3. Validation Layer
- Status code evaluation
- Content‑Type enforcement
- HTML presence detection
- Body size and hash verification
4. Output Layer
Produces deterministic JSON, verbose, or Nagios output.
All fields are explicit and audit‑transparent.
Current Status
- Version: 1.0.0
- Edition: Community Edition (public)
- Platform: Linux / Python 3.x
- Dependencies: requests
Roadmap
Near‑Term
- Improved backend fingerprinting
- Enhanced HTML validation rules
- Additional content‑type enforcement modes
Mid‑Term
- Structured response‑time metrics
- Header‑level validation
- Extended JSON schema
Long‑Term
- Full content‑validation engine
- Cross‑suite correlation with check_cert.py
Why check_html.py Exists
Traditional HTTP checkers often produce inconsistent output, rely on environment‑dependent behavior,
or provide limited visibility into response content.
check_html.py solves this by applying deterministic engineering principles:
- explicit validation
- reproducible output
- audit‑transparent reporting
- no hidden behavior or implicit assumptions
Links
- GitHub Repository
- Documentation
- Usage Examples
- Release Notes