# HTML [![API reference](https://img.shields.io/badge/godoc-reference-5272B4)](https://pkg.go.dev/github.com/tdewolff/parse/v2/html?tab=doc)

This package is an HTML5 lexer written in [Go][1]. It follows the specification at [The HTML syntax](http://www.w3.org/TR/html5/syntax.html). The lexer takes an io.Reader and converts it into tokens until EOF.

## Installation
Run the following command:

	go get -u github.com/tdewolff/parse/v2/html

or add the following import and run `go get` in your project:

	import "github.com/tdewolff/parse/v2/html"

## Lexer
### Usage
The following initializes a new Lexer with io.Reader `r`:
``` go
l := html.NewLexer(parse.NewInput(r))
```

To tokenize until EOF or an error, use:
``` go
for {
	tt, data := l.Next()
	switch tt {
	case html.ErrorToken:
		// error or EOF set in l.Err()
		return
	case html.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != html.AttributeToken {
				break
			}
			// ...
		}
	// ...
	}
}
```

All tokens:
``` go
ErrorToken TokenType = iota // extra token when errors occur
CommentToken
DoctypeToken
StartTagToken
StartTagCloseToken
StartTagVoidToken
EndTagToken
AttributeToken
TextToken
```

### Examples
``` go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/v2"
	"github.com/tdewolff/parse/v2/html"
)

// Tokenize HTML from stdin.
func main() {
	l := html.NewLexer(parse.NewInput(os.Stdin))
	for {
		tt, data := l.Next()
		switch tt {
		case html.ErrorToken:
			// io.EOF marks the normal end of input; anything else is a real error.
			if l.Err() != io.EOF {
				fmt.Println("Error on line", l.Line(), ":", l.Err())
			}
			return
		case html.StartTagToken:
			fmt.Println("Tag", string(data))
			// Consume the attributes that belong to this start tag.
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != html.AttributeToken {
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
		// ...
		}
	}
}
```

## License
Released under the [MIT license](https://github.com/tdewolff/parse/blob/master/LICENSE.md).

[1]: http://golang.org/ "Go Language"