Using XPath: locating elements, getting text and attribute values

myPage = '''
TITLE

*
****

****
*****
****

Hello,\nworld!
-- by Adam

Some other notes placed at the end

'''


from lxml import etree

html = etree.fromstring(myPage)

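# 1. Locating elements
divs1 = html.xpath('//div')                 # all div elements
divs2 = html.xpath('//div[@id]')            # divs that have an id attribute
divs3 = html.xpath('//div[@class="foot"]')  # divs whose class is exactly "foot"
divs4 = html.xpath('//div[@*]')             # divs that have at least one attribute
divs5 = html.xpath('//div[1]')              # each div that is the first div child of its parent
divs6 = html.xpath('//div[last()-1]')       # each div that is the second-to-last div child of its parent
divs7 = html.xpath('//div[position()<3]')   # the first two div children of each parent
divs8 = html.xpath('//div|//h2')            # union: all div and all h2 elements
divs9 = html.xpath('//div[not(@*)]')        # divs that have no attributes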
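
The positional predicates above are evaluated per parent, which is easy to misread. The sketch below uses a small hypothetical document (not the article's `myPage`; the `section` elements and `id` values are made up for illustration) to contrast `//div[1]` with `(//div)[1]`.

```python
from lxml import etree

# Hypothetical sample document, not the article's myPage.
doc = etree.fromstring(
    "<root>"
    "<section><div id='a'/><div id='b'/></section>"
    "<section><div id='c'/></section>"
    "</root>"
)

# //div[1]: every div that is the first div child of its own parent.
first_per_parent = [d.get("id") for d in doc.xpath("//div[1]")]

# (//div)[1]: the first div in the whole document.
first_overall = [d.get("id") for d in doc.xpath("(//div)[1]")]

print(first_per_parent)  # ['a', 'c']
print(first_overall)     # ['a']
```

Wrapping the path in parentheses applies the index to the combined node set instead of to each parent's children.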

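# 2. Getting text: text() returns a list of direct child text nodes,
#    unlike html.xpath('string()'), which returns one concatenated string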

text1 = html.xpath('//div/text()')
text2 = html.xpath('//div[@id]/text()')
text3 = html.xpath('//div[@class="foot"]/text()')
text4 = html.xpath('//div[@*]/text()')
text5 = html.xpath('//div[1]/text()')
text6 = html.xpath('//div[last()-1]/text()')
text7 = html.xpath('//div[position()<3]/text()')
text8 = html.xpath('//div/text()|//h2/text()')
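
To make the difference between `text()` and `string()` concrete, here is a minimal sketch on a hypothetical fragment (not the article's `myPage`). It assumes lxml's `xpath()` method, which returns a list for node-set results and a plain string for `string()`.

```python
from lxml import etree

# Hypothetical sample fragment, not the article's myPage.
div = etree.fromstring("<div>Hello, <b>big</b> world!</div>")

# text() returns only the text nodes that are direct children of div,
# one list item per node; the text inside <b> is skipped.
direct_text = div.xpath("text()")

# string() returns a single string: all descendant text concatenated.
full_text = div.xpath("string()")

print(direct_text)  # ['Hello, ', ' world!']
print(full_text)    # Hello, big world!
```

In practice `string()` is handy when an element mixes text with inline tags, while `text()` keeps the pieces separate.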

# 3. Getting attribute values with @
value1 = html.xpath('//a/@href')
value2 = html.xpath('//img/@src')
value3 = html.xpath('//div[2]/span/@id')

# 4. Locating elements (advanced)
# 1. The find and findall methods of a document (DOM) Element
divs = html.xpath('//div[position()<3]')
for div in divs:
    ass = div.findall('a')  # finds only div -> a, not div -> p -> a
    for a in ass:
        if a is not None:
            # print(dir(a))
            print(a.text, a.attrib.get('href'))  # Element properties: text, attrib

# 2. Equivalent to 1

a_href = html.xpath('//div[position()<3]/a/@href')
print(a_href)

# 3. Note the difference from 1 and 2: //a also matches a elements nested deeper inside the div
a_href = html.xpath('//div[position()<3]//a/@href')
print(a_href)
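
The three variants above differ only in whether nested links are reached. The following sketch shows this on a hypothetical document (not the article's `myPage`; the `href` values are made up for illustration).

```python
from lxml import etree

# Hypothetical sample document, not the article's myPage.
doc = etree.fromstring(
    "<body>"
    "<div><a href='direct.html'>direct</a>"
    "<p><a href='nested.html'>nested</a></p></div>"
    "</body>"
)
div = doc.xpath("//div")[0]

# findall('a') and /a only see <a> elements that are direct children of div.
via_findall = [a.get("href") for a in div.findall("a")]
via_child = doc.xpath("//div/a/@href")

# //a also reaches <a> elements nested deeper, such as div -> p -> a.
via_descendant = doc.xpath("//div//a/@href")

print(via_findall)     # ['direct.html']
print(via_child)       # ['direct.html']
print(via_descendant)  # ['direct.html', 'nested.html']
```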

Reference: https://www.cnblogs.com/hhh6460/p/5079465.html
