# HTML [API reference](https://pkg.go.dev/github.com/tdewolff/parse/v2/html?tab=doc)

This package is an HTML5 lexer written in [Go][1]. It follows the specification at [The HTML syntax](http://www.w3.org/TR/html5/syntax.html). The lexer takes an `io.Reader`, wrapped in a `parse.Input`, and converts it into tokens until EOF.

## Installation
Run the following command:

    go get -u github.com/tdewolff/parse/v2/html

or add the following import and run the project with `go get`:

    import "github.com/tdewolff/parse/v2/html"

## Lexer
### Usage
The following initializes a new `Lexer` with the `io.Reader` `r`:
``` go
l := html.NewLexer(parse.NewInput(r))
```

To tokenize until EOF or an error, use:
``` go
for {
	tt, data := l.Next()
	switch tt {
	case html.ErrorToken:
		// error or EOF set in l.Err()
		return
	case html.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != html.AttributeToken {
				break
			}
			// ...
		}
		// ...
	}
}
```

All tokens:
``` go
ErrorToken TokenType = iota // extra token when errors occur
CommentToken
DoctypeToken
StartTagToken
StartTagCloseToken
StartTagVoidToken
EndTagToken
AttributeToken
TextToken
```

### Examples
``` go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/v2"
	"github.com/tdewolff/parse/v2/html"
)

// Tokenize HTML from stdin.
func main() {
	l := html.NewLexer(parse.NewInput(os.Stdin))
	for {
		tt, data := l.Next()
		switch tt {
		case html.ErrorToken:
			// l.Err() is io.EOF at the end of input; anything else is a real error.
			if l.Err() != io.EOF {
				fmt.Println("Error:", l.Err())
			}
			return
		case html.StartTagToken:
			fmt.Println("Tag", string(data))
			// Attributes follow their start tag; the first non-attribute token
			// (normally StartTagCloseToken or StartTagVoidToken) ends this loop.
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != html.AttributeToken {
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
			// ...
		}
	}
}
```

## License
Released under the [MIT license](https://github.com/tdewolff/parse/blob/master/LICENSE.md).

[1]: http://golang.org/ "Go Language"