Data Crawler using Selenium

Data Crawler

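Data mining has a concept called "data driven": you start from the data itself and look for the problems or patterns hidden in it. This is why data is often called the treasure of the new industrial revolution. Good mining starts with promising data, and that is exactly where many people get stuck: how do you get the data, and where does it come from?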

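Here we implement a data crawler with Selenium. Selenium is a tool mainly used to simulate browser behavior. We use that capability to mimic a user browsing the site and collect the data, then parse the raw HTML with BeautifulSoup.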

Example

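from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

# `site` (a station value from the dropdown) and `year` are assumed to be defined beforehand
browser = webdriver.Chrome()  # start a browser session (requires Chrome to be installed)

# Open the page
browser.get("https://taqm.epa.gov.tw/taqm/tw/MonthlyAverage.aspx")

# Simulate the user's actions: choose the station and year, then click Query
selectSite = Select(browser.find_element(By.ID, "ctl15_ddlSite"))
selectSite.select_by_value(site)
selectYear = Select(browser.find_element(By.ID, "ctl15_ddlYear"))
selectYear.select_by_value(str(year))
query_button = browser.find_element(By.ID, "ctl15_btnQuery")
query_button.click()

# Wait until the old button goes stale, i.e. the query has reloaded the page
WebDriverWait(browser, 10).until(EC.staleness_of(query_button))

# Grab the rendered HTML
html_source = browser.page_source

# Close the browser
browser.quit()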
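The snippet above queries a single station and year, while a crawler usually repeats it for many combinations. Below is a minimal sketch of that outer loop, assuming the Selenium steps are wrapped in a function `fetch(site, year)` that returns the page HTML (a hypothetical helper, not part of the original). The loop takes `fetch` as a parameter so it can be tried without a browser, and it pauses between queries to avoid hammering the server; the default `delay` is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def crawl_all(
    fetch: Callable[[str, int], str],
    sites: Iterable[str],
    years: Iterable[int],
    delay: float = 1.0,
) -> Dict[Tuple[str, int], str]:
    """Call fetch(site, year) for every combination and collect the HTML.

    `fetch` is a hypothetical wrapper around the Selenium steps;
    `delay` is an assumed pause (in seconds) between queries.
    """
    years = list(years)  # materialize so the years can be reused for every site
    results: Dict[Tuple[str, int], str] = {}
    for site in sites:
        for year in years:
            try:
                results[(site, year)] = fetch(site, year)
            except Exception as exc:  # keep crawling if a single query fails
                print(f"skip {site} {year}: {exc}")
            time.sleep(delay)
    return results


# Usage with a stand-in fetch function and hypothetical station values "A", "B":
pages = crawl_all(lambda s, y: f"<html>{s}-{y}</html>", ["A", "B"], [2016, 2017], delay=0)
print(len(pages))  # 4
```

When `fetch` is backed by Selenium, reusing one `browser` for all queries instead of launching a new one per query is usually much faster.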

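from bs4 import BeautifulSoup

# Parse the retrieved HTML for further processing
soup = BeautifulSoup(html_source, 'html.parser')
# Text of the selected <option> in the station dropdown, i.e. the station name
city = soup.find(id="ctl15_ddlSite").find('option', selected=True).get_text(strip=True)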
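Besides reading the selected station, the usual next step is turning the result table into rows. A minimal sketch, assuming the results come back as a plain HTML `<table>` whose first row is the header; the `id` (`result_table`), the column names and the sample HTML are hypothetical, since the post does not show the actual page layout.

```python
from bs4 import BeautifulSoup

# Hypothetical sample of a result table (made-up values, not real data)
SAMPLE_HTML = """
<table id="result_table">
  <tr><th>Month</th><th>Value</th></tr>
  <tr><td>1</td><td>10</td></tr>
  <tr><td>2</td><td>20</td></tr>
</table>
"""


def table_to_rows(html: str, table_id: str) -> list:
    """Return the table as a list of dicts keyed by the header row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(values) == len(headers):  # skip empty or malformed rows
            records.append(dict(zip(headers, values)))
    return records


print(table_to_rows(SAMPLE_HTML, "result_table"))
# [{'Month': '1', 'Value': '10'}, {'Month': '2', 'Value': '20'}]
```

Values stay as strings here; converting them to numbers (and handling blanks) is left to the later analysis step.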

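Hi, I'm 維元 (Wei-Yaun). I've recently launched a new kind of program, the 【Python Data Science Hands-on Bootcamp】, which combines a variety of teaching formats with extensive course experience to help you learn more effectively. The new course "Python Programming Fundamentals" is now open for early-bird crowdfunding, and you're welcome to join us in the data field! We sincerely invite you to start from Python basics with us and step into the world of data science 🙌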

📍 Registration page: https://dscareer.kolable.app/

License

This work by Chang, Wei-Yaun (v123582)
is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.