annotate goodreads/simple_html_dom.php @ 20:7679346abfdb

get code and use it
author Charlie Root
date Thu, 25 Oct 2018 09:40:25 -0400
parents 077b0a0a3e6d
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1 <?php
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
2 /**
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
3 * Website: http://sourceforge.net/projects/simplehtmldom/
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
4 * Acknowledge: Jose Solorzano (https://sourceforge.net/projects/php-html/)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
5 * Contributions by:
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
6 * Yousuke Kumakura (Attribute filters)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
7 * Vadim Voituk (Negative indexes supports of "find" method)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
8 * Antcs (Constructor with automatically load contents either text or file/url)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
9 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
10 * all affected sections have comments starting with "PaperG"
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
11 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
12 * Paperg - Added case insensitive testing of the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
13 * Paperg - Added tag_start for the starting index of tags - NOTE: This works but not accurately.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
14 * This tag_start gets counted AFTER \r\n have been crushed out, and after the remove_noice calls so it will not reflect the REAL position of the tag in the source,
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
15 * it will almost always be smaller by some amount.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
16 * We use this to determine how far into the file the tag in question is. This "percentage will never be accurate as the $dom->size is the "real" number of bytes the dom was created from.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
17 * but for most purposes, it's a really good estimation.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
18 * Paperg - Added the forceTagsClosed to the dom constructor. Forcing tags closed is great for malformed html, but it CAN lead to parsing errors.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
19 * Allow the user to tell us how much they trust the html.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
20 * Paperg add the text and plaintext to the selectors for the find syntax. plaintext implies text in the innertext of a node. text implies that the tag is a text node.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
21 * This allows for us to find tags based on the text they contain.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
22 * Create find_ancestor_tag to see if a tag is - at any level - inside of another specific tag.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
23 * Paperg: added parse_charset so that we know about the character set of the source document.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
24 * NOTE: If the user's system has a routine called get_last_retrieve_url_contents_content_type availalbe, we will assume it's returning the content-type header from the
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
25 * last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
26 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
27 * Licensed under The MIT License
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
28 * Redistributions of files must retain the above copyright notice.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
29 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
30 * @author S.C. Chen <me578022@gmail.com>
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
31 * @author John Schlick
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
32 * @author Rus Carroll
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
33 * @version 1.11 ($Rev: 184 $)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
34 * @package PlaceLocalInclude
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
35 * @subpackage simple_html_dom
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
36 */
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
37
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
38 /**
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
39 * All of the Defines for the classes below.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
40 * @author S.C. Chen <me578022@gmail.com>
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
41 */
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
42 define('HDOM_TYPE_ELEMENT', 1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
43 define('HDOM_TYPE_COMMENT', 2);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
44 define('HDOM_TYPE_TEXT', 3);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
45 define('HDOM_TYPE_ENDTAG', 4);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
46 define('HDOM_TYPE_ROOT', 5);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
47 define('HDOM_TYPE_UNKNOWN', 6);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
48 define('HDOM_QUOTE_DOUBLE', 0);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
49 define('HDOM_QUOTE_SINGLE', 1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
50 define('HDOM_QUOTE_NO', 3);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
51 define('HDOM_INFO_BEGIN', 0);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
52 define('HDOM_INFO_END', 1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
53 define('HDOM_INFO_QUOTE', 2);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
54 define('HDOM_INFO_SPACE', 3);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
55 define('HDOM_INFO_TEXT', 4);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
56 define('HDOM_INFO_INNER', 5);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
57 define('HDOM_INFO_OUTER', 6);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
58 define('HDOM_INFO_ENDSPACE',7);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
59 define('DEFAULT_TARGET_CHARSET', 'UTF-8');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
60 define('DEFAULT_BR_TEXT', "\r\n");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
61 // helper functions
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
62 // -----------------------------------------------------------------------------
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
63 // get html dom from file
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
64 // $maxlen is defined in the code as PHP_STREAM_COPY_ALL which is defined as -1.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
65 function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
66 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
67 // We DO force the tags to be terminated.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
68 $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $defaultBRText);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
69 // For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
70 $contents = file_get_contents($url, $use_include_path, $context, $offset);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
71 // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
72 // $contents = retrieve_url_contents($url);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
73 if (empty($contents))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
74 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
75 return false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
76 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
77 // The second parameter can force the selectors to all be lowercase.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
78 $dom->load($contents, $lowercase, $stripRN);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
79 return $dom;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
80 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
81
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
82 // get html dom from string
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
83 function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
84 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
85 $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $defaultBRText);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
86 if (empty($str))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
87 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
88 $dom->clear();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
89 return false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
90 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
91 $dom->load($str, $lowercase, $stripRN);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
92 return $dom;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
93 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
94
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
95 // dump html dom tree
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
96 function dump_html_tree($node, $show_attr=true, $deep=0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
97 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
98 $node->dump($node);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
99 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
100
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
101 /**
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
102 * simple html dom node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
103 * PaperG - added ability for "find" routine to lowercase the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
104 * PaperG - added $tag_start to track the start position of the tag in the total byte index
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
105 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
106 * @package PlaceLocalInclude
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
107 */
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
108 class simple_html_dom_node {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
109 public $nodetype = HDOM_TYPE_TEXT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
110 public $tag = 'text';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
111 public $attr = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
112 public $children = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
113 public $nodes = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
114 public $parent = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
115 public $_ = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
116 public $tag_start = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
117 private $dom = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
118
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
119 function __construct($dom)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
120 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
121 $this->dom = $dom;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
122 $dom->nodes[] = $this;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
123 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
124
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
125 function __destruct()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
126 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
127 $this->clear();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
128 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
129
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
130 function __toString()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
131 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
132 return $this->outertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
133 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
134
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
135 // clean up memory due to php5 circular references memory leak...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
136 function clear()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
137 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
138 $this->dom = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
139 $this->nodes = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
140 $this->parent = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
141 $this->children = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
142 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
143
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
144 // dump node's tree
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
145 function dump($show_attr=true, $deep=0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
146 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
147 $lead = str_repeat(' ', $deep);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
148
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
149 echo $lead.$this->tag;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
150 if ($show_attr && count($this->attr)>0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
151 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
152 echo '(';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
153 foreach ($this->attr as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
154 echo "[$k]=>\"".$this->$k.'", ';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
155 echo ')';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
156 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
157 echo "\n";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
158
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
159 foreach ($this->nodes as $c)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
160 $c->dump($show_attr, $deep+1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
161 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
162
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
163
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
164 // Debugging function to dump a single dom node with a bunch of information about it.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
165 function dump_node()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
166 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
167 echo $this->tag;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
168 if (count($this->attr)>0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
169 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
170 echo '(';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
171 foreach ($this->attr as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
172 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
173 echo "[$k]=>\"".$this->$k.'", ';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
174 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
175 echo ')';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
176 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
177 if (count($this->attr)>0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
178 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
179 echo ' $_ (';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
180 foreach ($this->_ as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
181 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
182 if (is_array($v))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
183 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
184 echo "[$k]=>(";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
185 foreach ($v as $k2=>$v2)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
186 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
187 echo "[$k2]=>\"".$v2.'", ';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
188 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
189 echo ")";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
190 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
191 echo "[$k]=>\"".$v.'", ';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
192 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
193 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
194 echo ")";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
195 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
196
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
197 if (isset($this->text))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
198 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
199 echo " text: (" . $this->text . ")";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
200 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
201
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
202 echo " children: " . count($this->children);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
203 echo " nodes: " . count($this->nodes);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
204 echo " tag_start: " . $this->tag_start;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
205 echo "\n";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
206
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
207 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
208
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
209 // returns the parent of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
210 function parent()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
211 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
212 return $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
213 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
214
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
215 // returns children of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
216 function children($idx=-1)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
217 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
218 if ($idx===-1) return $this->children;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
219 if (isset($this->children[$idx])) return $this->children[$idx];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
220 return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
221 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
222
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
223 // returns the first child of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
224 function first_child()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
225 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
226 if (count($this->children)>0) return $this->children[0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
227 return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
228 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
229
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
230 // returns the last child of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
231 function last_child()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
232 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
233 if (($count=count($this->children))>0) return $this->children[$count-1];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
234 return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
235 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
236
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
237 // returns the next sibling of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
238 function next_sibling()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
239 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
240 if ($this->parent===null) return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
241 $idx = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
242 $count = count($this->parent->children);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
243 while ($idx<$count && $this!==$this->parent->children[$idx])
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
244 ++$idx;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
245 if (++$idx>=$count) return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
246 return $this->parent->children[$idx];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
247 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
248
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
249 // returns the previous sibling of node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
250 function prev_sibling()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
251 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
252 if ($this->parent===null) return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
253 $idx = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
254 $count = count($this->parent->children);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
255 while ($idx<$count && $this!==$this->parent->children[$idx])
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
256 ++$idx;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
257 if (--$idx<0) return null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
258 return $this->parent->children[$idx];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
259 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
260
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
261 // function to locate a specific ancestor tag in the path to the root.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
262 function find_ancestor_tag($tag)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
263 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
264 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
265 if (is_object($debugObject))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
266 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
267 $debugObject->debugLogEntry(1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
268 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
269
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
270 // Start by including ourselves in the comparison.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
271 $returnDom = $this;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
272
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
273 while (!is_null($returnDom))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
274 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
275 if (is_object($debugObject))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
276 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
277 $debugObject->debugLog(2, "Current tag is: " . $returnDom->tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
278 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
279
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
280 if ($returnDom->tag == $tag)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
281 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
282 break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
283 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
284 $returnDom = $returnDom->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
285 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
286 return $returnDom;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
287 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
288
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
289 // get dom node's inner html
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
290 function innertext()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
291 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
292 if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
293 if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
294
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
295 $ret = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
296 foreach ($this->nodes as $n)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
297 $ret .= $n->outertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
298 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
299 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
300
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
301 // get dom node's outer text (with tag)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
302 function outertext()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
303 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
304 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
305 if (is_object($debugObject))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
306 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
307 $text = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
308 if ($this->tag == 'text')
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
309 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
310 if (!empty($this->text))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
311 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
312 $text = " with text: " . $this->text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
313 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
314 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
315 $debugObject->debugLog(1, 'Innertext of tag: ' . $this->tag . $text);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
316 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
317
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
318 if ($this->tag==='root') return $this->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
319
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
320 // trigger callback
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
321 if ($this->dom && $this->dom->callback!==null)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
322 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
323 call_user_func_array($this->dom->callback, array($this));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
324 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
325
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
326 if (isset($this->_[HDOM_INFO_OUTER])) return $this->_[HDOM_INFO_OUTER];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
327 if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
328
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
329 // render begin tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
330 if ($this->dom && $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]])
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
331 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
332 $ret = $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]->makeup();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
333 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
334 $ret = "";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
335 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
336
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
337 // render inner text
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
338 if (isset($this->_[HDOM_INFO_INNER]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
339 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
340 // If it's a br tag... don't return the HDOM_INNER_INFO that we may or may not have added.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
341 if ($this->tag != "br")
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
342 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
343 $ret .= $this->_[HDOM_INFO_INNER];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
344 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
345 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
346 if ($this->nodes)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
347 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
348 foreach ($this->nodes as $n)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
349 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
350 $ret .= $this->convert_text($n->outertext());
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
351 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
352 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
353 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
354
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
355 // render end tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
356 if (isset($this->_[HDOM_INFO_END]) && $this->_[HDOM_INFO_END]!=0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
357 $ret .= '</'.$this->tag.'>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
358 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
359 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
360
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
361 // get dom node's plain text
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
362 function text()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
363 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
364 if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
365 switch ($this->nodetype)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
366 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
367 case HDOM_TYPE_TEXT: return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
368 case HDOM_TYPE_COMMENT: return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
369 case HDOM_TYPE_UNKNOWN: return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
370 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
371 if (strcasecmp($this->tag, 'script')===0) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
372 if (strcasecmp($this->tag, 'style')===0) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
373
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
374 $ret = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
375 // In rare cases, (always node type 1 or HDOM_TYPE_ELEMENT - observed for some span tags, and some p tags) $this->nodes is set to NULL.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
376 // NOTE: This indicates that there is a problem where it's set to NULL without a clear happening.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
377 // WHY is this happening?
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
378 if (!is_null($this->nodes))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
379 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
380 foreach ($this->nodes as $n)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
381 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
382 $ret .= $this->convert_text($n->text());
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
383 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
384 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
385 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
386 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
387
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
388 function xmltext()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
389 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
390 $ret = $this->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
391 $ret = str_ireplace('<![CDATA[', '', $ret);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
392 $ret = str_replace(']]>', '', $ret);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
393 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
394 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
395
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
396 // build node's text with tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
397 function makeup()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
398 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
399 // text, comment, unknown
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
400 if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
401
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
402 $ret = '<'.$this->tag;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
403 $i = -1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
404
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
405 foreach ($this->attr as $key=>$val)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
406 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
407 ++$i;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
408
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
409 // skip removed attribute
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
410 if ($val===null || $val===false)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
411 continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
412
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
413 $ret .= $this->_[HDOM_INFO_SPACE][$i][0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
414 //no value attr: nowrap, checked selected...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
415 if ($val===true)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
416 $ret .= $key;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
417 else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
418 switch ($this->_[HDOM_INFO_QUOTE][$i])
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
419 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
420 case HDOM_QUOTE_DOUBLE: $quote = '"'; break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
421 case HDOM_QUOTE_SINGLE: $quote = '\''; break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
422 default: $quote = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
423 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
424 $ret .= $key.$this->_[HDOM_INFO_SPACE][$i][1].'='.$this->_[HDOM_INFO_SPACE][$i][2].$quote.$val.$quote;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
425 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
426 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
427 $ret = $this->dom->restore_noise($ret);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
428 return $ret . $this->_[HDOM_INFO_ENDSPACE] . '>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
429 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
430
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
431 // find elements by css selector
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
432 //PaperG - added ability for find to lowercase the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
433 function find($selector, $idx=null, $lowercase=false)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
434 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
435 $selectors = $this->parse_selector($selector);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
436 if (($count=count($selectors))===0) return array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
437 $found_keys = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
438
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
439 // find each selector
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
440 for ($c=0; $c<$count; ++$c)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
441 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
442 // The change on the below line was documented on the sourceforge code tracker id 2788009
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
443 // used to be: if (($levle=count($selectors[0]))===0) return array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
444 if (($levle=count($selectors[$c]))===0) return array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
445 if (!isset($this->_[HDOM_INFO_BEGIN])) return array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
446
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
447 $head = array($this->_[HDOM_INFO_BEGIN]=>1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
448
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
449 // handle descendant selectors, no recursive!
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
450 for ($l=0; $l<$levle; ++$l)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
451 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
452 $ret = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
453 foreach ($head as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
454 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
455 $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
456 //PaperG - Pass this optional parameter on to the seek function.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
457 $n->seek($selectors[$c][$l], $ret, $lowercase);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
458 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
459 $head = $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
460 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
461
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
462 foreach ($head as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
463 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
464 if (!isset($found_keys[$k]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
465 $found_keys[$k] = 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
466 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
467 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
468
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
469 // sort keys
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
470 ksort($found_keys);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
471
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
472 $found = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
473 foreach ($found_keys as $k=>$v)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
474 $found[] = $this->dom->nodes[$k];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
475
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
476 // return nth-element or array
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
477 if (is_null($idx)) return $found;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
478 else if ($idx<0) $idx = count($found) + $idx;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
479 return (isset($found[$idx])) ? $found[$idx] : null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
480 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
481
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
482 // seek for given conditions
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
483 // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
484 protected function seek($selector, &$ret, $lowercase=false)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
485 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
486 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
487 if (is_object($debugObject))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
488 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
489 $debugObject->debugLogEntry(1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
490 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
491
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
492 list($tag, $key, $val, $exp, $no_key) = $selector;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
493
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
494 // xpath index
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
495 if ($tag && $key && is_numeric($key))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
496 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
497 $count = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
498 foreach ($this->children as $c)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
499 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
500 if ($tag==='*' || $tag===$c->tag) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
501 if (++$count==$key) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
502 $ret[$c->_[HDOM_INFO_BEGIN]] = 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
503 return;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
504 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
505 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
506 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
507 return;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
508 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
509
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
510 $end = (!empty($this->_[HDOM_INFO_END])) ? $this->_[HDOM_INFO_END] : 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
511 if ($end==0) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
512 $parent = $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
513 while (!isset($parent->_[HDOM_INFO_END]) && $parent!==null) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
514 $end -= 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
515 $parent = $parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
516 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
517 $end += $parent->_[HDOM_INFO_END];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
518 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
519
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
520 for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
521 $node = $this->dom->nodes[$i];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
522
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
523 $pass = true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
524
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
525 if ($tag==='*' && !$key) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
526 if (in_array($node, $this->children, true))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
527 $ret[$i] = 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
528 continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
529 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
530
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
531 // compare tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
532 if ($tag && $tag!=$node->tag && $tag!=='*') {$pass=false;}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
533 // compare key
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
534 if ($pass && $key) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
535 if ($no_key) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
536 if (isset($node->attr[$key])) $pass=false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
537 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
538 if (($key != "plaintext") && !isset($node->attr[$key])) $pass=false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
539 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
540 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
541 // compare value
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
542 if ($pass && $key && $val && $val!=='*') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
543 // If they have told us that this is a "plaintext" search then we want the plaintext of the node - right?
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
544 if ($key == "plaintext") {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
545 // $node->plaintext actually returns $node->text();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
546 $nodeKeyValue = $node->text();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
547 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
548 // this is a normal search, we want the value of that attribute of the tag.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
549 $nodeKeyValue = $node->attr[$key];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
550 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
551 if (is_object($debugObject)) {$debugObject->debugLog(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
552
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
553 //PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
554 if ($lowercase) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
555 $check = $this->match($exp, strtolower($val), strtolower($nodeKeyValue));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
556 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
557 $check = $this->match($exp, $val, $nodeKeyValue);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
558 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
559 if (is_object($debugObject)) {$debugObject->debugLog(2, "after match: " . ($check ? "true" : "false"));}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
560
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
561 // handle multiple class
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
562 if (!$check && strcasecmp($key, 'class')===0) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
563 foreach (explode(' ',$node->attr[$key]) as $k) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
564 // Without this, there were cases where leading, trailing, or double spaces lead to our comparing blanks - bad form.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
565 if (!empty($k)) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
566 if ($lowercase) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
567 $check = $this->match($exp, strtolower($val), strtolower($k));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
568 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
569 $check = $this->match($exp, $val, $k);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
570 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
571 if ($check) break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
572 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
573 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
574 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
575 if (!$check) $pass = false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
576 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
577 if ($pass) $ret[$i] = 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
578 unset($node);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
579 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
580 // It's passed by reference so this is actually what this function returns.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
581 if (is_object($debugObject)) {$debugObject->debugLog(1, "EXIT - ret: ", $ret);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
582 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
583
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
584 protected function match($exp, $pattern, $value) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
585 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
586 if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
587
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
588 switch ($exp) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
589 case '=':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
590 return ($value===$pattern);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
591 case '!=':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
592 return ($value!==$pattern);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
593 case '^=':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
594 return preg_match("/^".preg_quote($pattern,'/')."/", $value);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
595 case '$=':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
596 return preg_match("/".preg_quote($pattern,'/')."$/", $value);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
597 case '*=':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
598 if ($pattern[0]=='/') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
599 return preg_match($pattern, $value);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
600 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
601 return preg_match("/".$pattern."/i", $value);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
602 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
603 return false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
604 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
605
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
606 protected function parse_selector($selector_string) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
607 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
608 if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
609
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
610 // pattern of CSS selectors, modified from mootools
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
611 // Paperg: Add the colon to the attrbute, so that it properly finds <tag attr:ibute="something" > like google does.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
612 // Note: if you try to look at this attribute, yo MUST use getAttribute since $dom->x:y will fail the php syntax check.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
613 // Notice the \[ starting the attbute? and the @? following? This implies that an attribute can begin with an @ sign that is not captured.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
614 // This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
615 // farther study is required to determine of this should be documented or removed.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
616 // $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
617 $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
618 preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
619 if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
620
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
621 $selectors = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
622 $result = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
623 //print_r($matches);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
624
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
625 foreach ($matches as $m) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
626 $m[0] = trim($m[0]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
627 if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
628 // for browser generated xpath
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
629 if ($m[1]==='tbody') continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
630
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
631 list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
632 if (!empty($m[2])) {$key='id'; $val=$m[2];}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
633 if (!empty($m[3])) {$key='class'; $val=$m[3];}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
634 if (!empty($m[4])) {$key=$m[4];}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
635 if (!empty($m[5])) {$exp=$m[5];}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
636 if (!empty($m[6])) {$val=$m[6];}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
637
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
638 // convert to lowercase
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
639 if ($this->dom->lowercase) {$tag=strtolower($tag); $key=strtolower($key);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
640 //elements that do NOT have the specified attribute
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
641 if (isset($key[0]) && $key[0]==='!') {$key=substr($key, 1); $no_key=true;}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
642
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
643 $result[] = array($tag, $key, $val, $exp, $no_key);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
644 if (trim($m[7])===',') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
645 $selectors[] = $result;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
646 $result = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
647 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
648 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
649 if (count($result)>0)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
650 $selectors[] = $result;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
651 return $selectors;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
652 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
653
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
654 function __get($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
655 if (isset($this->attr[$name]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
656 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
657 return $this->convert_text($this->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
658 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
659 switch ($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
660 case 'outertext': return $this->outertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
661 case 'innertext': return $this->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
662 case 'plaintext': return $this->text();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
663 case 'xmltext': return $this->xmltext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
664 default: return array_key_exists($name, $this->attr);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
665 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
666 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
667
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
668 function __set($name, $value) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
669 switch ($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
670 case 'outertext': return $this->_[HDOM_INFO_OUTER] = $value;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
671 case 'innertext':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
672 if (isset($this->_[HDOM_INFO_TEXT])) return $this->_[HDOM_INFO_TEXT] = $value;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
673 return $this->_[HDOM_INFO_INNER] = $value;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
674 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
675 if (!isset($this->attr[$name])) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
676 $this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
677 $this->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
678 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
679 $this->attr[$name] = $value;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
680 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
681
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
682 function __isset($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
683 switch ($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
684 case 'outertext': return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
685 case 'innertext': return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
686 case 'plaintext': return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
687 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
688 //no value attr: nowrap, checked selected...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
689 return (array_key_exists($name, $this->attr)) ? true : isset($this->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
690 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
691
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
692 function __unset($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
693 if (isset($this->attr[$name]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
694 unset($this->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
695 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
696
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
697 // PaperG - Function to convert the text from one character set to another if the two sets are not the same.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
698 function convert_text($text) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
699 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
700 if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
701
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
702 $converted_text = $text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
703
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
704 $sourceCharset = "";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
705 $targetCharset = "";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
706 if ($this->dom) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
707 $sourceCharset = strtoupper($this->dom->_charset);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
708 $targetCharset = strtoupper($this->dom->_target_charset);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
709 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
710 if (is_object($debugObject)) {$debugObject->debugLog(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
711
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
712 if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
713 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
714 // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
715 if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
716 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
717 $converted_text = $text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
718 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
719 else
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
720 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
721 $converted_text = iconv($sourceCharset, $targetCharset, $text);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
722 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
723 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
724
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
725 return $converted_text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
726 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
727
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
728 function is_utf8($string)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
729 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
730 return (utf8_encode(utf8_decode($string)) == $string);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
731 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
732
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
733 // camel naming conventions
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
734 function getAllAttributes() {return $this->attr;}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
735 function getAttribute($name) {return $this->__get($name);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
736 function setAttribute($name, $value) {$this->__set($name, $value);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
737 function hasAttribute($name) {return $this->__isset($name);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
738 function removeAttribute($name) {$this->__set($name, null);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
739 function getElementById($id) {return $this->find("#$id", 0);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
740 function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
741 function getElementByTagName($name) {return $this->find($name, 0);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
742 function getElementsByTagName($name, $idx=null) {return $this->find($name, $idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
743 function parentNode() {return $this->parent();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
744 function childNodes($idx=-1) {return $this->children($idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
745 function firstChild() {return $this->first_child();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
746 function lastChild() {return $this->last_child();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
747 function nextSibling() {return $this->next_sibling();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
748 function previousSibling() {return $this->prev_sibling();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
749 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
750
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
751 /**
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
752 * simple html dom parser
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
753 * Paperg - in the find routine: allow us to specify that we want case insensitive testing of the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
754 * Paperg - change $size from protected to public so we can easily access it
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
755 * Paperg - added ForceTagsClosed in the constructor which tells us whether we trust the html or not. Default is to NOT trust it.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
756 *
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
757 * @package PlaceLocalInclude
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
758 */
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
759 class simple_html_dom {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
760 public $root = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
761 public $nodes = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
762 public $callback = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
763 public $lowercase = false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
764 public $size;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
765 protected $pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
766 protected $doc;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
767 protected $char;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
768 protected $cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
769 protected $parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
770 protected $noise = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
771 protected $token_blank = " \t\r\n";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
772 protected $token_equal = ' =/>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
773 protected $token_slash = " />\r\n\t";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
774 protected $token_attr = ' >';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
775 protected $_charset = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
776 protected $_target_charset = '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
777 protected $default_br_text = "";
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
778
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
779 // use isset instead of in_array, performance boost about 30%...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
780 protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'link'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
781 protected $block_tags = array('root'=>1, 'body'=>1, 'form'=>1, 'div'=>1, 'span'=>1, 'table'=>1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
782 // Known sourceforge issue #2977341
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
783 // B tags that are not closed cause us to return everything to the end of the document.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
784 protected $optional_closing_tags = array(
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
785 'tr'=>array('tr'=>1, 'td'=>1, 'th'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
786 'th'=>array('th'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
787 'td'=>array('td'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
788 'li'=>array('li'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
789 'dt'=>array('dt'=>1, 'dd'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
790 'dd'=>array('dd'=>1, 'dt'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
791 'dl'=>array('dd'=>1, 'dt'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
792 'p'=>array('p'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
793 'nobr'=>array('nobr'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
794 'b'=>array('b'=>1),
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
795 );
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
796
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
797 function __construct($str=null, $lowercase=true, $forceTagsClosed=true, $target_charset=DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
798 if ($str) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
799 if (preg_match("/^http:\/\//i",$str) || is_file($str))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
800 $this->load_file($str);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
801 else
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
802 $this->load($str, $lowercase, $stripRN, $defaultBRText);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
803 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
804 // Forcing tags to be closed implies that we don't trust the html, but it can lead to parsing errors if we SHOULD trust the html.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
805 if (!$forceTagsClosed) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
806 $this->optional_closing_array=array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
807 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
808 $this->_target_charset = $target_charset;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
809 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
810
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
811 function __destruct() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
812 $this->clear();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
813 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
814
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
815 // load html from string
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
816 function load($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
817 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
818
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
819 // prepare
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
820 $this->prepare($str, $lowercase, $stripRN, $defaultBRText);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
821 // strip out comments
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
822 $this->remove_noise("'<!--(.*?)-->'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
823 // strip out cdata
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
824 $this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
825 // Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
826 // Script tags removal now preceeds style tag removal.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
827 // strip out <script> tags
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
828 $this->remove_noise("'<\s*script[^>]*[^/]>(.*?)<\s*/\s*script\s*>'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
829 $this->remove_noise("'<\s*script\s*>(.*?)<\s*/\s*script\s*>'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
830 // strip out <style> tags
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
831 $this->remove_noise("'<\s*style[^>]*[^/]>(.*?)<\s*/\s*style\s*>'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
832 $this->remove_noise("'<\s*style\s*>(.*?)<\s*/\s*style\s*>'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
833 // strip out preformatted tags
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
834 $this->remove_noise("'<\s*(?:code)[^>]*>(.*?)<\s*/\s*(?:code)\s*>'is");
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
835 // strip out server side scripts
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
836 $this->remove_noise("'(<\?)(.*?)(\?>)'s", true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
837 // strip smarty scripts
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
838 $this->remove_noise("'(\{\w)(.*?)(\})'s", true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
839
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
840 // parsing
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
841 while ($this->parse());
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
842 // end
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
843 $this->root->_[HDOM_INFO_END] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
844 $this->parse_charset();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
845 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
846
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
847 // load html from file
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
848 function load_file() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
849 $args = func_get_args();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
850 $this->load(call_user_func_array('file_get_contents', $args), true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
851 // Per the simple_html_dom repositiry this is a planned upgrade to the codebase.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
852 // Throw an error if we can't properly load the dom.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
853 if (($error=error_get_last())!==null) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
854 $this->clear();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
855 return false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
856 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
857 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
858
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
859 // set callback function
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
860 function set_callback($function_name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
861 $this->callback = $function_name;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
862 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
863
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
864 // remove callback function
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
865 function remove_callback() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
866 $this->callback = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
867 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
868
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
869 // save dom as string
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
870 function save($filepath='') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
871 $ret = $this->root->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
872 if ($filepath!=='') file_put_contents($filepath, $ret, LOCK_EX);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
873 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
874 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
875
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
876 // find dom node by css selector
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
877 // Paperg - allow us to specify that we want case insensitive testing of the value of the selector.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
878 function find($selector, $idx=null, $lowercase=false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
879 return $this->root->find($selector, $idx, $lowercase);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
880 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
881
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
882 // clean up memory due to php5 circular references memory leak...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
883 function clear() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
884 foreach ($this->nodes as $n) {$n->clear(); $n = null;}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
885 // This add next line is documented in the sourceforge repository. 2977248 as a fix for ongoing memory leaks that occur even with the use of clear.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
886 if (isset($this->children)) foreach ($this->children as $n) {$n->clear(); $n = null;}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
887 if (isset($this->parent)) {$this->parent->clear(); unset($this->parent);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
888 if (isset($this->root)) {$this->root->clear(); unset($this->root);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
889 unset($this->doc);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
890 unset($this->noise);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
891 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
892
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
893 function dump($show_attr=true) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
894 $this->root->dump($show_attr);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
895 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
896
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
897 // prepare HTML data and init everything
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
898 protected function prepare($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
899 $this->clear();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
900
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
901 // set the length of content before we do anything to it.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
902 $this->size = strlen($str);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
903
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
904 //before we save the string as the doc... strip out the \r \n's if we are told to.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
905 if ($stripRN) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
906 $str = str_replace("\r", " ", $str);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
907 $str = str_replace("\n", " ", $str);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
908 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
909
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
910 $this->doc = $str;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
911 $this->pos = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
912 $this->cursor = 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
913 $this->noise = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
914 $this->nodes = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
915 $this->lowercase = $lowercase;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
916 $this->default_br_text = $defaultBRText;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
917 $this->root = new simple_html_dom_node($this);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
918 $this->root->tag = 'root';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
919 $this->root->_[HDOM_INFO_BEGIN] = -1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
920 $this->root->nodetype = HDOM_TYPE_ROOT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
921 $this->parent = $this->root;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
922 if ($this->size>0) $this->char = $this->doc[0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
923 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
924
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
925 // parse html content
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
926 protected function parse() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
927 if (($s = $this->copy_until_char('<'))==='')
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
928 return $this->read_tag();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
929
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
930 // text
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
931 $node = new simple_html_dom_node($this);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
932 ++$this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
933 $node->_[HDOM_INFO_TEXT] = $s;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
934 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
935 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
936 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
937
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
938 // PAPERG - dkchou - added this to try to identify the character set of the page we have just parsed so we know better how to spit it out later.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
939 // NOTE: IF you provide a routine called get_last_retrieve_url_contents_content_type which returns the CURLINFO_CONTENT_TYPE fromt he last curl_exec
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
940 // (or the content_type header fromt eh last transfer), we will parse THAT, and if a charset is specified, we will use it over any other mechanism.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
941 protected function parse_charset()
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
942 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
943 global $debugObject;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
944
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
945 $charset = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
946
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
947 if (function_exists('get_last_retrieve_url_contents_content_type'))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
948 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
949 $contentTypeHeader = get_last_retrieve_url_contents_content_type();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
950 $success = preg_match('/charset=(.+)/', $contentTypeHeader, $matches);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
951 if ($success)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
952 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
953 $charset = $matches[1];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
954 if (is_object($debugObject)) {$debugObject->debugLog(2, 'header content-type found charset of: ' . $charset);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
955 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
956
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
957 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
958
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
959 if (empty($charset))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
960 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
961 $el = $this->root->find('meta[http-equiv=Content-Type]',0);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
962 if (!empty($el))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
963 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
964 $fullvalue = $el->content;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
965 if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag found' . $fullValue);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
966
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
967 if (!empty($fullvalue))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
968 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
969 $success = preg_match('/charset=(.+)/', $fullvalue, $matches);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
970 if ($success)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
971 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
972 $charset = $matches[1];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
973 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
974 else
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
975 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
976 // If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
977 if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
978 $charset = 'ISO-8859-1';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
979 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
980 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
981 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
982 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
983
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
984 // If we couldn't find a charset above, then lets try to detect one based on the text we got...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
985 if (empty($charset))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
986 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
987 // Have php try to detect the encoding from the text given to us.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
988 $charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
989 if (is_object($debugObject)) {$debugObject->debugLog(2, 'mb_detect found: ' . $charset);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
990
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
991 // and if this doesn't work... then we need to just wrongheadedly assume it's UTF-8 so that we can move on - cause this will usually give us most of what we need...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
992 if ($charset === false)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
993 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
994 if (is_object($debugObject)) {$debugObject->debugLog(2, 'since mb_detect failed - using default of utf-8');}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
995 $charset = 'UTF-8';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
996 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
997 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
998
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
999 // Since CP1252 is a superset, if we get one of it's subsets, we want it instead.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1000 if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1')))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1001 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1002 if (is_object($debugObject)) {$debugObject->debugLog(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1003 $charset = 'CP1252';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1004 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1005
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1006 if (is_object($debugObject)) {$debugObject->debugLog(1, 'EXIT - ' . $charset);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1007
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1008 return $this->_charset = $charset;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1009 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1010
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1011 // read tag info
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1012 protected function read_tag() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1013 if ($this->char!=='<') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1014 $this->root->_[HDOM_INFO_END] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1015 return false;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1016 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1017 $begin_tag_pos = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1018 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1019
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1020 // end tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1021 if ($this->char==='/') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1022 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1023 // This represetns the change in the simple_html_dom trunk from revision 180 to 181.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1024 // $this->skip($this->token_blank_t);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1025 $this->skip($this->token_blank);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1026 $tag = $this->copy_until_char('>');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1027
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1028 // skip attributes in end tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1029 if (($pos = strpos($tag, ' '))!==false)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1030 $tag = substr($tag, 0, $pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1031
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1032 $parent_lower = strtolower($this->parent->tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1033 $tag_lower = strtolower($tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1034
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1035 if ($parent_lower!==$tag_lower) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1036 if (isset($this->optional_closing_tags[$parent_lower]) && isset($this->block_tags[$tag_lower])) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1037 $this->parent->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1038 $org_parent = $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1039
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1040 while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1041 $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1042
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1043 if (strtolower($this->parent->tag)!==$tag_lower) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1044 $this->parent = $org_parent; // restore origonal parent
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1045 if ($this->parent->parent) $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1046 $this->parent->_[HDOM_INFO_END] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1047 return $this->as_text_node($tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1048 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1049 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1050 else if (($this->parent->parent) && isset($this->block_tags[$tag_lower])) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1051 $this->parent->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1052 $org_parent = $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1053
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1054 while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1055 $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1056
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1057 if (strtolower($this->parent->tag)!==$tag_lower) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1058 $this->parent = $org_parent; // restore origonal parent
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1059 $this->parent->_[HDOM_INFO_END] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1060 return $this->as_text_node($tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1061 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1062 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1063 else if (($this->parent->parent) && strtolower($this->parent->parent->tag)===$tag_lower) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1064 $this->parent->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1065 $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1066 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1067 else
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1068 return $this->as_text_node($tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1069 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1070
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1071 $this->parent->_[HDOM_INFO_END] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1072 if ($this->parent->parent) $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1073
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1074 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1075 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1076 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1077
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1078 $node = new simple_html_dom_node($this);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1079 $node->_[HDOM_INFO_BEGIN] = $this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1080 ++$this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1081 $tag = $this->copy_until($this->token_slash);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1082 $node->tag_start = $begin_tag_pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1083
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1084 // doctype, cdata & comments...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1085 if (isset($tag[0]) && $tag[0]==='!') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1086 $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until_char('>');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1087
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1088 if (isset($tag[2]) && $tag[1]==='-' && $tag[2]==='-') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1089 $node->nodetype = HDOM_TYPE_COMMENT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1090 $node->tag = 'comment';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1091 } else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1092 $node->nodetype = HDOM_TYPE_UNKNOWN;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1093 $node->tag = 'unknown';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1094 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1095 if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1096 $this->link_nodes($node, true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1097 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1098 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1099 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1100
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1101 // text
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1102 if ($pos=strpos($tag, '<')!==false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1103 $tag = '<' . substr($tag, 0, -1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1104 $node->_[HDOM_INFO_TEXT] = $tag;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1105 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1106 $this->char = $this->doc[--$this->pos]; // prev
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1107 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1108 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1109
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1110 if (!preg_match("/^[\w-:]+$/", $tag)) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1111 $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until('<>');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1112 if ($this->char==='<') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1113 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1114 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1115 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1116
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1117 if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1118 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1119 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1120 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1121 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1122
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1123 // begin tag
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1124 $node->nodetype = HDOM_TYPE_ELEMENT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1125 $tag_lower = strtolower($tag);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1126 $node->tag = ($this->lowercase) ? $tag_lower : $tag;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1127
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1128 // handle optional closing tags
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1129 if (isset($this->optional_closing_tags[$tag_lower]) ) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1130 while (isset($this->optional_closing_tags[$tag_lower][strtolower($this->parent->tag)])) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1131 $this->parent->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1132 $this->parent = $this->parent->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1133 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1134 $node->parent = $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1135 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1136
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1137 $guard = 0; // prevent infinity loop
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1138 $space = array($this->copy_skip($this->token_blank), '', '');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1139
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1140 // attributes
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1141 do
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1142 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1143 if ($this->char!==null && $space[0]==='') break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1144 $name = $this->copy_until($this->token_equal);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1145 if ($guard===$this->pos) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1146 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1147 continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1148 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1149 $guard = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1150
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1151 // handle endless '<'
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1152 if ($this->pos>=$this->size-1 && $this->char!=='>') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1153 $node->nodetype = HDOM_TYPE_TEXT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1154 $node->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1155 $node->_[HDOM_INFO_TEXT] = '<'.$tag . $space[0] . $name;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1156 $node->tag = 'text';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1157 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1158 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1159 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1160
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1161 // handle mismatch '<'
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1162 if ($this->doc[$this->pos-1]=='<') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1163 $node->nodetype = HDOM_TYPE_TEXT;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1164 $node->tag = 'text';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1165 $node->attr = array();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1166 $node->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1167 $node->_[HDOM_INFO_TEXT] = substr($this->doc, $begin_tag_pos, $this->pos-$begin_tag_pos-1);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1168 $this->pos -= 2;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1169 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1170 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1171 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1172 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1173
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1174 if ($name!=='/' && $name!=='') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1175 $space[1] = $this->copy_skip($this->token_blank);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1176 $name = $this->restore_noise($name);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1177 if ($this->lowercase) $name = strtolower($name);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1178 if ($this->char==='=') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1179 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1180 $this->parse_attr($node, $name, $space);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1181 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1182 else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1183 //no value attr: nowrap, checked selected...
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1184 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1185 $node->attr[$name] = true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1186 if ($this->char!='>') $this->char = $this->doc[--$this->pos]; // prev
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1187 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1188 $node->_[HDOM_INFO_SPACE][] = $space;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1189 $space = array($this->copy_skip($this->token_blank), '', '');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1190 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1191 else
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1192 break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1193 } while ($this->char!=='>' && $this->char!=='/');
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1194
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1195 $this->link_nodes($node, true);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1196 $node->_[HDOM_INFO_ENDSPACE] = $space[0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1197
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1198 // check self closing
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1199 if ($this->copy_until_char_escape('>')==='/') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1200 $node->_[HDOM_INFO_ENDSPACE] .= '/';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1201 $node->_[HDOM_INFO_END] = 0;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1202 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1203 else {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1204 // reset parent
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1205 if (!isset($this->self_closing_tags[strtolower($node->tag)])) $this->parent = $node;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1206 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1207 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1208
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1209 // If it's a BR tag, we need to set it's text to the default text.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1210 // This way when we see it in plaintext, we can generate formatting that the user wants.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1211 if ($node->tag == "br") {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1212 $node->_[HDOM_INFO_INNER] = $this->default_br_text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1213 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1214
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1215 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1216 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1217
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1218 // parse attributes
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1219 protected function parse_attr($node, $name, &$space) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1220 // Per sourceforge: http://sourceforge.net/tracker/?func=detail&aid=3061408&group_id=218559&atid=1044037
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1221 // If the attribute is already defined inside a tag, only pay atetntion to the first one as opposed to the last one.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1222 if (isset($node->attr[$name]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1223 {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1224 return;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1225 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1226
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1227 $space[2] = $this->copy_skip($this->token_blank);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1228 switch ($this->char) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1229 case '"':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1230 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1231 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1232 $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('"'));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1233 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1234 break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1235 case '\'':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1236 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_SINGLE;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1237 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1238 $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('\''));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1239 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1240 break;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1241 default:
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1242 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1243 $node->attr[$name] = $this->restore_noise($this->copy_until($this->token_attr));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1244 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1245 // PaperG: Attributes should not have \r or \n in them, that counts as html whitespace.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1246 $node->attr[$name] = str_replace("\r", "", $node->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1247 $node->attr[$name] = str_replace("\n", "", $node->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1248 // PaperG: If this is a "class" selector, lets get rid of the preceeding and trailing space since some people leave it in the multi class case.
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1249 if ($name == "class") {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1250 $node->attr[$name] = trim($node->attr[$name]);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1251 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1252 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1253
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1254 // link node's parent
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1255 protected function link_nodes(&$node, $is_child) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1256 $node->parent = $this->parent;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1257 $this->parent->nodes[] = $node;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1258 if ($is_child)
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1259 $this->parent->children[] = $node;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1260 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1261
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1262 // as a text node
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1263 protected function as_text_node($tag) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1264 $node = new simple_html_dom_node($this);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1265 ++$this->cursor;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1266 $node->_[HDOM_INFO_TEXT] = '</' . $tag . '>';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1267 $this->link_nodes($node, false);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1268 $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1269 return true;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1270 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1271
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1272 protected function skip($chars) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1273 $this->pos += strspn($this->doc, $chars, $this->pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1274 $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1275 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1276
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1277 protected function copy_skip($chars) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1278 $pos = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1279 $len = strspn($this->doc, $chars, $pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1280 $this->pos += $len;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1281 $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1282 if ($len===0) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1283 return substr($this->doc, $pos, $len);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1284 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1285
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1286 protected function copy_until($chars) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1287 $pos = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1288 $len = strcspn($this->doc, $chars, $pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1289 $this->pos += $len;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1290 $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1291 return substr($this->doc, $pos, $len);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1292 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1293
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1294 protected function copy_until_char($char) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1295 if ($this->char===null) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1296
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1297 if (($pos = strpos($this->doc, $char, $this->pos))===false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1298 $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1299 $this->char = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1300 $this->pos = $this->size;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1301 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1302 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1303
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1304 if ($pos===$this->pos) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1305 $pos_old = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1306 $this->char = $this->doc[$pos];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1307 $this->pos = $pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1308 return substr($this->doc, $pos_old, $pos-$pos_old);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1309 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1310
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1311 protected function copy_until_char_escape($char) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1312 if ($this->char===null) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1313
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1314 $start = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1315 while (1) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1316 if (($pos = strpos($this->doc, $char, $start))===false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1317 $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1318 $this->char = null;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1319 $this->pos = $this->size;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1320 return $ret;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1321 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1322
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1323 if ($pos===$this->pos) return '';
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1324
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1325 if ($this->doc[$pos-1]==='\\') {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1326 $start = $pos+1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1327 continue;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1328 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1329
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1330 $pos_old = $this->pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1331 $this->char = $this->doc[$pos];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1332 $this->pos = $pos;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1333 return substr($this->doc, $pos_old, $pos-$pos_old);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1334 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1335 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1336
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1337 // remove noise from html content
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1338 protected function remove_noise($pattern, $remove_tag=false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1339 $count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1340
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1341 for ($i=$count-1; $i>-1; --$i) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1342 $key = '___noise___'.sprintf('% 3d', count($this->noise)+100);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1343 $idx = ($remove_tag) ? 0 : 1;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1344 $this->noise[$key] = $matches[$i][$idx][0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1345 $this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1346 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1347
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1348 // reset the length of content
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1349 $this->size = strlen($this->doc);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1350 if ($this->size>0) $this->char = $this->doc[0];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1351 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1352
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1353 // restore noise to html content
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1354 function restore_noise($text) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1355 while (($pos=strpos($text, '___noise___'))!==false) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1356 $key = '___noise___'.$text[$pos+11].$text[$pos+12].$text[$pos+13];
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1357 if (isset($this->noise[$key]))
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1358 $text = substr($text, 0, $pos).$this->noise[$key].substr($text, $pos+14);
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1359 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1360 return $text;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1361 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1362
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1363 function __toString() {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1364 return $this->root->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1365 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1366
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1367 function __get($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1368 switch ($name) {
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1369 case 'outertext':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1370 return $this->root->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1371 case 'innertext':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1372 return $this->root->innertext();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1373 case 'plaintext':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1374 return $this->root->text();
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1375 case 'charset':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1376 return $this->_charset;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1377 case 'target_charset':
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1378 return $this->_target_charset;
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1379 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1380 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1381
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1382 // camel naming conventions
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1383 function childNodes($idx=-1) {return $this->root->childNodes($idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1384 function firstChild() {return $this->root->first_child();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1385 function lastChild() {return $this->root->last_child();}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1386 function getElementById($id) {return $this->find("#$id", 0);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1387 function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1388 function getElementByTagName($name) {return $this->find($name, 0);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1389 function getElementsByTagName($name, $idx=-1) {return $this->find($name, $idx);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1390 function loadFile() {$args = func_get_args();$this->load_file($args);}
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1391 }
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1392
077b0a0a3e6d remaining originals according to dependency walk
Robert Boland <robert@markup.co.uk>
parents:
diff changeset
1393 ?>