hstrip
Reads an HTML document from stdin and returns HTML containing only the explicitly allowed elements and attributes.
Help
See hstrip -h:
Usage: hstrip [options]
Strict filter html and allow only specified elements and attributes.
When allowing attributes on an element, the element is implicitly allowed too.
I.e. when allowing 'a.href', the 'a'-element is allowed too and need not be specified as
an allowed element.
Depending on the allowed html-elements, the output might be valid HTML.
example:
hstrip -e a,table,tr,th,td -a a.href < test.html
Options:
--element, -e allowed html elements (allows multiple)
--attribute, -a allowed element attributes ie: td.colspan (allows multiple)
--help, -h display help
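The allow-list rules in the help text can be illustrated with a small Python sketch. This is not hstrip's implementation, only a minimal model of the described behaviour built on the standard-library `html.parser`. The names `AllowListFilter` and `strip_html` are hypothetical, and details such as dropping the contents of `<style>` and `<script>` are assumptions; hstrip itself may treat some elements differently.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: the text inside these elements is dropped unless the element is allowed.
RAW_TEXT = {"script", "style"}


class AllowListFilter(HTMLParser):
    """Keep only allow-listed elements and attributes; other text passes through."""

    def __init__(self, elements=(), attributes=()):
        super().__init__(convert_charrefs=True)
        self.attrs = {}
        for spec in attributes:  # each spec looks like "td.colspan"
            tag, _, name = spec.partition(".")
            self.attrs.setdefault(tag, set()).add(name)
        # Allowing "a.href" implicitly allows the "a" element as well.
        self.elements = set(elements) | set(self.attrs)
        self.out = []
        self.skipping = None  # name of the raw-text element currently being dropped

    def _emit_open(self, tag, attrs):
        allowed = self.attrs.get(tag, ())
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k in allowed
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self._emit_open(tag, attrs)
        elif tag in RAW_TEXT:
            self.skipping = tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit the opening tag only.
        if tag in self.elements:
            self._emit_open(tag, attrs)

    def handle_endtag(self, tag):
        if tag == self.skipping:
            self.skipping = None
        elif tag in self.elements:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skipping is None:
            self.out.append(escape(data, quote=False))


def strip_html(source, elements=(), attributes=()):
    parser = AllowListFilter(elements, attributes)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


sample = '<p>See <a href="/x" target="_blank">the <b>docs</b></a></p>'
print(strip_html(sample, attributes=["a.href"]))
# prints: See <a href="/x">the docs</a>
```

Only `a.href` is passed here, yet the `a` element survives, which mirrors the implicit-allow rule from the help text. The `p` and `b` tags and the `target` attribute are removed while their text content is kept.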
Example
Contents of test.html:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title></title>
<meta name="generator" content="LibreOffice 7.3.6.2 (Linux)"/>
<meta name="created" content="2022-11-09T09:10:02.167075155"/>
<meta name="changed" content="2022-11-09T09:10:41.309097496"/>
<style type="text/css">
body,div,table,thead,tbody,tfoot,tr,th,td,p { font-family:"Liberation Sans"; font-size:x-small }
a.comment-indicator:hover + comment { background:#ffd; position:absolute; display:block; border:1px solid black; padding:0.5em; }
a.comment-indicator { background:red; display:inline-block; border:1px solid black; width:0.5em; height:0.5em; }
comment { display:none; }
</style>
</head>
<body>
<table cellspacing="0" border="0">
<colgroup span="6" width="85"></colgroup>
<tr>
<td height="17" align="left"><br></td>
<td align="left"><br></td>
<td align="left"><font face="LKLUG">asdf</font></td>
<td align="left"><br></td>
<td align="left"><br></td>
<td align="left"><br></td>
</tr>
...
hstrip -e a,table,tr,th,td < test.html
Output:
<table>
<tr>
<td><br></td>
<td><br></td>
<td>asdf</td>
<td><br></td>
<td><br></td>
<td><br></td>
</tr>
<tr>
...