John Ludhi/nbshare.io: How To Use Selenium Webdriver To Crawl Websites
by:
below post content copied from Planet Python
click here to view original post
January 15, 2023 at 03:09AM
Click here for more details...
=============================
The original post is available in Planet Python by
This post has been published as-is through automation. The automation script brings all the top bloggers' posts under a single umbrella.
The purpose of this blog is to follow the top Salesforce bloggers and collect all their blogs in a single place through automation.
============================
I have Selenium version 4.7.2 installed. You can check yours using the following command.
In [11]:
!pip show selenium
Name: selenium
Version: 4.7.2
Summary:
Home-page: https://www.selenium.dev
Author:
Author-email:
License: Apache 2.0
Location: /home/anaconda3/envs/condapy38/lib/python3.8/site-packages
Requires: certifi, trio, trio-websocket, urllib3
Required-by:
Let us first import the necessary packages and configure a headless Chrome driver.
In [8]:
from selenium import webdriver
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("--headless")
chromeOptions.add_argument("--remote-debugging-port=9222")
chromeOptions.add_argument('--no-sandbox')
chromeOptions.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0;Win64) AppleWebkit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36")
wd = webdriver.Chrome(options=chromeOptions)
Let us try to crawl the following URL...
In [9]:
url = 'https://www.linkedin.com/jobs/search?keywords=&location=San%20Francisco%2C%20California%2C%20United%20States&locationId=&geoId=102277331&f_TPR=&distance=100&position=1&pageNum=0'
wd.get(url)
no_of_jobs = int(wd.find_element_by_css_selector('h1>span').get_attribute('innerText'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1309880/129965872.py in <cell line: 2>()
      1 wd.get(url)
----> 2 no_of_jobs = int(wd.find_element_by_css_selector('h1>span').get_attribute('innerText'))

AttributeError: 'WebDriver' object has no attribute 'find_element_by_css_selector'
Note -
In Selenium version 4.0 and later, the find_element_by_* and find_elements_by_* methods have been deprecated (and subsequently removed, hence the AttributeError above) in favor of the find_element() and find_elements() methods, respectively.
To locate an element by its CSS selector, pass the By.CSS_SELECTOR locator to the find_element() method, like this:
In [2]:
from selenium.webdriver.common.by import By
In [5]:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(wd, 20).until(
EC.visibility_of_element_located((By.CSS_SELECTOR, "h1>span"))
)
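Under the hood, WebDriverWait.until() simply polls the supplied condition (every 0.5 seconds by default) until it returns a truthy value or the timeout elapses. The idea can be sketched in plain Python; wait_until below is an illustrative stand-in, not Selenium's actual implementation:

```python
import time

def wait_until(condition, timeout=20, poll=0.5):
    # Repeatedly evaluate the condition until it returns a truthy value,
    # raising TimeoutError once the deadline passes. This mirrors the
    # polling loop that WebDriverWait.until() performs (illustrative sketch).
    end = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > end:
            raise TimeoutError("condition was not met within %s seconds" % timeout)
        time.sleep(poll)
```

With Selenium, the condition would be something like EC.visibility_of_element_located((By.CSS_SELECTOR, "h1>span")) called with the driver; here any zero-argument callable works.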
In [7]:
element.text
Out[7]:
'231,000+'
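Note that the text comes back as '231,000+', so the int() conversion attempted in the earlier cell would raise a ValueError even with the corrected locator. A small helper can strip the thousands separators and the trailing plus sign first; parse_count is a hypothetical name used here for illustration:

```python
def parse_count(text):
    # Convert a count string such as '231,000+' into an integer by
    # removing commas and any trailing '+' before calling int().
    return int(text.replace(",", "").rstrip("+"))

print(parse_count("231,000+"))  # 231000
```

In the notebook you would call it as no_of_jobs = parse_count(element.text).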