Base Code

Code Block
languagepy
import os

from hte import Html5TreeBuilder
from hte.match import *

tb = Html5TreeBuilder()
doc = tb.html()
body = doc.add(tb.body(tb.h1("Env")))
table = body.add(tb.table())
table.add([tb.tr(tb.td(k), tb.td(v)) for k, v in sorted(os.environ.items())])

Match Functions

| Function | Example | Description |
|---|---|---|
| match_all | Matcher(None, match_all) | Match all items. Always returns True. |
| match_any | Matcher(x, match_any) | Match x, whether x is an element or text. |
| match_childregexp | Matcher(regexp, match_childregexp) | Match an element that has a child whose text matches the compiled regular expression regexp. |
| match_childtext | Matcher(Text(txt), match_childtext) | Match an element that has a child whose text matches txt. |
| match_elem | Matcher(elem, match_elem) | Match element (tag only). |
| match_elemall | Matcher(elem, match_elemall) | Match element (tag and attributes). |
| match_regexp | Matcher(regexp, match_regexp) | Match text against the compiled regular expression regexp. |
| match_text | Matcher(Text(txt), match_text) | Match text against txt. |
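
The pairing of a target value with a match function can be illustrated with a small standalone sketch. The names below (SketchMatcher, sketch_match_all, sketch_match_text, sketch_match_regexp) are hypothetical stand-ins written for illustration only; they are not the hte classes or functions, and the real signatures may differ.

```python
import re


class SketchMatcher:
    """Hypothetical stand-in: pairs a target value with a match function."""

    def __init__(self, target, func):
        self.target = target
        self.func = func

    def matches(self, item):
        # Delegate the decision to the match function.
        return self.func(self.target, item)


def sketch_match_all(target, item):
    # Accept every item; the target is ignored.
    return True


def sketch_match_text(target, item):
    # Accept only text items equal to the target string.
    return isinstance(item, str) and item == target


def sketch_match_regexp(target, item):
    # Accept text items in which the compiled pattern finds a match.
    return isinstance(item, str) and target.search(item) is not None


items = ["HOME", "PYTHONPATH", "PATH", 42]
m_text = SketchMatcher("PATH", sketch_match_text)
m_re = SketchMatcher(re.compile(r"PATH$"), sketch_match_regexp)
m_all = SketchMatcher(None, sketch_match_all)

text_hits = [i for i in items if m_text.matches(i)]
regexp_hits = [i for i in items if m_re.matches(i)]
all_hits = [i for i in items if m_all.matches(i)]
```

Keeping the comparison logic in a plain function means a new kind of match only needs a new function, while the search code that walks the tree stays unchanged.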

Examples

In all cases, the find methods return a generator to save memory. Instead of collecting all matches into a list (which could be large if the base document is large), each match is yielded individually as it is found.

If g is a generator, it can be easily converted into a list:

Code Block
languagepy
list(g)
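
When only a few matches are needed, a generator can be consumed partially instead of being converted to a list. The sketch below uses a hypothetical stand-in generator (find_even), not an hte find method, together with the standard library's itertools.islice.

```python
from itertools import islice


def find_even(numbers):
    # Hypothetical stand-in for a find method: yields matches one at a time.
    for n in numbers:
        if n % 2 == 0:
            yield n


g = find_even(range(1_000_000))

# Take only the first three matches; the rest of the range is never scanned.
first_three = list(islice(g, 3))

# The generator resumes where it stopped; the default avoids StopIteration.
next_match = next(g, None)
```

Note that a generator can be iterated only once. After list(g) has consumed it, a second list(g) returns an empty list.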

Elements

To find the first (one) <th> element:

Code Block
languagepy
doc.find(Matcher(tb.th(), match_elem))

To find all <tr> elements (returns generator):

Code Block
languagepy
doc.findall(Matcher(tb.tr(), match_elem))

or (convert to list):

Code Block
languagepy
list(doc.findall(Matcher(tb.tr(), match_elem)))

To find all <td> elements:

Code Block
languagepy
doc.findall(Matcher(tb.td(), match_elem))

To find all instances of PYTHONPATH text:

Code Block
languagepy
doc.findall(Matcher(Text("PYTHONPATH"), match_text))

Paths

Not only can we find the elements themselves, but we can get the paths to matching elements (i.e., the elements that provide a path to the match).

To find the path to the first instance of PYTHONPATH text:

Code Block
languagepy
doc.find(Matcher(Text("PYTHONPATH"), match_text), FIND_PATH)

To get the paths of all <tr> elements found:

Code Block
languagepy
doc.findall(Matcher(tb.tr(), match_elem), FIND_PATH)
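
The idea of a path (the chain of elements leading from the root to a match) can be sketched with a depth-first generator over a toy tree. This is only an illustration under stated assumptions: the tree is a nested (tag, children) tuple, find_paths is a hypothetical helper, and the structure that FIND_PATH actually returns is not shown here.

```python
def find_paths(node, predicate, path=()):
    """Yield the chain of tags from the root down to each matching node."""
    tag, children = node
    here = path + (tag,)
    if predicate(tag):
        yield here
    # Depth-first: descend into children, extending the current path.
    for child in children:
        yield from find_paths(child, predicate, here)


# Toy document: html > body > (h1, table > (tr > (td, td), tr > td))
tree = ("html", [
    ("body", [
        ("h1", []),
        ("table", [
            ("tr", [("td", []), ("td", [])]),
            ("tr", [("td", [])]),
        ]),
    ]),
])

tr_paths = list(find_paths(tree, lambda t: t == "tr"))
first_td_path = next(find_paths(tree, lambda t: t == "td"), None)
```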

Double Find

To find the first <td> element in each row:

Code Block
languagepy
tdm = Matcher(tb.td(), match_elem)
# Outer search: every <tr>. Inner search: the first <td> within that row.
for row in doc.findall(Matcher(tb.tr(), match_elem)):
    td = row.find(tdm)
    if td:
        print(td)

We can also render the tree at <td> and below:

Code Block
languagepy
tdm = Matcher(tb.td(), match_elem)
for row in doc.findall(Matcher(tb.tr(), match_elem)):
    td = row.find(tdm)
    if td:
        # Render the <td> element together with everything below it.
        print(td.render())
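
For comparison, the same outer/inner search pattern can be written with Python's standard xml.etree.ElementTree module. This is an analogy only, using a small hand-written table; it does not use hte, and the two APIs should not be assumed to behave identically.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<table>"
    "<tr><td>HOME</td><td>/home/user</td></tr>"
    "<tr><td>SHELL</td><td>/bin/sh</td></tr>"
    "</table>"
)

first_cells = []
for tr in root.iter("tr"):   # outer search: every row
    td = tr.find("td")       # inner search: first <td> child of this row
    # Compare against None: an Element without children is falsy.
    if td is not None:
        first_cells.append(ET.tostring(td, encoding="unicode"))
```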