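Steps:
1. First, install the 'save as we' extension in your browser (it saves a web page as an HTML file). Firefox, QQ Browser, 360 Browser, Google Chrome and others all support this extension.
2. Open a Baidu Wenku document; Word, PDF and other formats all work (<富甲美國> is used as the example here).
3. Click 'save as we'; when the prompt appears, choose "continue save" to save the page as HTML.
4. Everything is now in place; the only thing still missing is the parser.
5. Build the HTML parsing program: on the form, add a button, a RichTextBox named RichTextBox1, and a TextBox control (the code refers to them as Button2, RichTextBox1 and TextBox1, and also uses an OpenFileDialog named OpenFileDialog1).
6. Here is the code:
Imports HtmlAgilityPack
Imports System.Text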
Public Class Form1
    ' Loads the page named in TextBox1 (a URL or a local file path) and appends
    ' the text of every <p> directly under <div class="ie-fix"> to RichTextBox1.
    Sub Get_YBQ()
        If TextBox1.Text <> "" Then
            RichTextBox1.Clear()
            Dim url As String = TextBox1.Text
            Dim wc As New HtmlWeb With {
                .OverrideEncoding = Encoding.Default,
                .AutoDetectEncoding = True
            }
            Try
                ' Loading can fail (bad path, network error), so it sits inside the Try block.
                Dim htmldoc As HtmlDocument = wc.Load(url)
                Dim rootNode As HtmlNode = htmldoc.DocumentNode
                Dim xl As HtmlNodeCollection = rootNode.SelectNodes("//div[@class=""ie-fix""]/p")
                ' SelectNodes returns Nothing when no node matches the XPath.
                If xl IsNot Nothing Then
                    For Each node As HtmlNode In xl
                        RichTextBox1.AppendText(node.InnerText)
                    Next
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message)
            End Try
        End If
    End Sub

    ' Lets the user pick a saved HTML file, then parses it.
    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        OpenFileDialog1.Title = "Select an HTML document"
        OpenFileDialog1.Filter = "HTML files|*.html|HTM files|*.htm"
        ' Only continue when the user actually confirmed a file.
        If OpenFileDialog1.ShowDialog() = System.Windows.Forms.DialogResult.OK Then
            TextBox1.Text = OpenFileDialog1.FileName
            If OpenFileDialog1.FileName <> "" Then
                Get_YBQ()
            End If
        End If
    End Sub
End Class
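7. The program can either take a URL typed into the text box and fetch the HTML, or open a local HTML file and parse it (the online route is not used here because Baidu Wenku pages are protected, so their page source cannot be fetched directly).
8. If you have questions, join the QQ group and ask there.
9. Notice: this HTML parser is shared for technical exchange only. Do not use it for any illegal purpose; otherwise you bear the consequences yourself. Thank you for your cooperation!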
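The two input routes from step 7 (a typed URL versus a saved local file) and the paragraph extraction can also be tried outside the form. The following is only a rough sketch under some assumptions: the module name WenkuTextSketch and the helpers IsWebAddress, LoadDocument and ExtractParagraphText are hypothetical names that are not part of the original program, and the saved file is assumed to be UTF-8 encoded. It uses the same XPath as the form code.

```vbnet
Imports System
Imports System.Text
Imports HtmlAgilityPack

Module WenkuTextSketch
    ' Hypothetical helper: True when the input is an absolute http(s) address,
    ' False for anything else (for example a local file path).
    Function IsWebAddress(source As String) As Boolean
        Dim parsed As Uri = Nothing
        Return Uri.TryCreate(source, UriKind.Absolute, parsed) AndAlso
               (parsed.Scheme = Uri.UriSchemeHttp OrElse parsed.Scheme = Uri.UriSchemeHttps)
    End Function

    ' Hypothetical helper: fetches a web page, or reads a saved HTML file.
    ' Assumption: the saved file is UTF-8; adjust the encoding if it is not.
    Function LoadDocument(source As String) As HtmlDocument
        If IsWebAddress(source) Then
            Return New HtmlWeb().Load(source)
        End If
        Dim doc As New HtmlDocument()
        doc.Load(source, Encoding.UTF8)
        Return doc
    End Function

    ' Collects the text of every <p> directly under <div class="ie-fix">,
    ' one paragraph per line, decoding HTML entities such as &amp;.
    Function ExtractParagraphText(doc As HtmlDocument) As String
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class=""ie-fix""]/p")
        Dim sb As New StringBuilder()
        If nodes IsNot Nothing Then ' SelectNodes returns Nothing when nothing matches
            For Each node As HtmlNode In nodes
                sb.AppendLine(HtmlEntity.DeEntitize(node.InnerText))
            Next
        End If
        Return sb.ToString()
    End Function

    Sub Main()
        ' In-memory sample so the sketch runs without a saved page or network access.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div class=""ie-fix""><p>First paragraph</p><p>Second &amp; last</p></div>")
        Console.Write(ExtractParagraphText(doc))
    End Sub
End Module
```

To run it against a page saved in step 3, one could replace the in-memory sample with a call such as LoadDocument applied to the saved file's path and pass the result to ExtractParagraphText. Whether the "ie-fix" class still matches depends on the markup Baidu Wenku serves at the time the page is saved.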