A Step-by-Step Guide to Web Scraping for Beginners
In this article, I would like to discuss web scraping.
Before we start, we have to consider web scraping on two levels: legal and ethical.
Let’s tackle the topic of the legality of web scraping first.
Fundamentally, the applicable law depends on the country you live in. You should keep in mind:
Authors' rights (copyright)
Terms of Service
General Data Protection Regulation (GDPR, European Union)
Computer Fraud and Abuse Act (CFAA, USA)
Beyond the regulations, you have to make sure that you are not harming the target's infrastructure, for example by sending so many requests that you effectively DDoS the target's site.
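One way to keep the load on the target low is to space out requests and to skip paths the site asks crawlers to avoid. Below is a minimal sketch of that idea using only the Python standard library. Note the assumptions: robots.txt is not discussed above and is brought in here as a common convention, and the names `Throttle`, `is_allowed`, `EXAMPLE_ROBOTS`, the `my-scraper` user agent and the `example.com` URLs are all made up for illustration.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_request = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return the time slept."""
        slept = 0.0
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.min_interval_s - elapsed
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_request = time.monotonic()
        return slept


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as plain text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    throttle = Throttle(min_interval_s=1.0)
    for path in ("/blog", "/admin/login"):
        url = "https://example.com" + path
        if is_allowed(EXAMPLE_ROBOTS, "my-scraper", url):
            throttle.wait()  # pause before each request that would be sent
            print("would fetch", url)
        else:
            print("skipping", url)
```

In a real scraper, `throttle.wait()` would be called right before every page load. The right interval depends on the site, so treat the one second used here as a placeholder rather than a recommendation.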
Let’s put this into practice
We want to get all the titles from the blog section.
When we inspect the blog section, we can see that each title is an <a> tag inside a <div> with the class “blog-page-title”.
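The steps we need to take to accomplish our task:
Open the blog page in a browser driven by Selenium.
Wait for a <div> with the class “blog-page-title” and an <a> tag inside it.
Find every <a> tag inside the <div> with the class mentioned above.
Print the text of the <a> tags.
Close the browser.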
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://pawelmajewski.com/blog"

# Start a Chrome session and open the blog page.
driver = webdriver.Chrome()
driver.get(url)

div_class = "blog-page-title"

# Wait up to 3 seconds for the first title link to appear in the DOM.
wait = WebDriverWait(driver, 3)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, f"div.{div_class} a")))

# Collect every title link and print its text.
a_tags = driver.find_elements(By.CSS_SELECTOR, f"div.{div_class} a")
for a_tag in a_tags:
    print(a_tag.text)

# Close the browser.
driver.quit()
```
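The selector logic above can also be tried without launching a browser. The following is a minimal sketch, assuming a static HTML snippet shaped like the structure described earlier. The sample markup and the `TitleParser` class are made up for illustration and are not taken from the real page. It swaps Selenium for the standard library's `html.parser`, so it will not see content that a page renders with JavaScript.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of <a> tags nested inside <div class="blog-page-title">."""

    def __init__(self) -> None:
        super().__init__()
        self.titles = []
        self._div_depth = 0    # greater than 0 while inside a matching div
        self._in_link = False  # True while inside an <a> within that div
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Track nesting so inner divs do not end the match too early.
            if self._div_depth > 0 or "blog-page-title" in classes:
                self._div_depth += 1
        elif tag == "a" and self._div_depth > 0:
            self._in_link = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth > 0:
            self._div_depth -= 1
        elif tag == "a" and self._in_link:
            self.titles.append("".join(self._buffer).strip())
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)


# Hypothetical markup mirroring the div/a structure described in the text.
SAMPLE_HTML = """
<div class="blog-page-title"><a href="/post-1">First post</a></div>
<div class="other"><a href="/about">About</a></div>
<div class="blog-page-title big"><a href="/post-2">Second post</a></div>
"""

parser = TitleParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)  # ['First post', 'Second post']
```

The link inside the div with the class "other" is skipped, which mirrors what the CSS selector `div.blog-page-title a` does in the Selenium version.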
I recommend the Selenium library both for writing your own web scrapers and for writing, for example, integration tests.