dsci524_group29_webscraping.fetch_html

Functions

fetch_html(url[, timeout])

Fetches the HTML content of a given URL.

Module Contents

dsci524_group29_webscraping.fetch_html.fetch_html(url, timeout=10)

Fetches the HTML content of a given URL.

Parameters:

url (str) – The URL of the webpage to fetch.
timeout (int, optional) – The maximum time to wait for a response, in seconds. Defaults to 10 seconds.

Returns:

The raw HTML content of the webpage if the request is successful.

Return type:

str

Raises:

ValueError – If the URL provided is invalid or improperly formatted.
requests.exceptions.Timeout – If the request times out before receiving a response.
requests.exceptions.RequestException – For other issues during the HTTP request, such as connectivity problems or a non-success HTTP status code.
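The exceptions listed above can be handled at the call site. The following is a minimal sketch, not part of the package: fetch_or_none is a hypothetical helper name, and the fetching function is passed in as an argument so the pattern can be shown on its own. Because requests.exceptions.Timeout is a subclass of requests.exceptions.RequestException, the Timeout handler has to come first.

```python
import requests


def fetch_or_none(fetch, url, timeout=10):
    """Call fetch(url, timeout=timeout) and return (html, error_message).

    Hypothetical helper for illustration; `fetch` is any callable with the
    same signature and documented exceptions as fetch_html.
    """
    try:
        return fetch(url, timeout=timeout), None
    except ValueError as exc:
        # Raised when the URL is invalid or improperly formatted.
        return None, f"invalid URL: {exc}"
    except requests.exceptions.Timeout:
        # Timeout subclasses RequestException, so it must be caught first.
        return None, f"timed out after {timeout} s"
    except requests.exceptions.RequestException as exc:
        # Connectivity problems, non-success status codes, and similar.
        return None, f"request failed: {exc}"
```

With the package installed, this could be called as fetch_or_none(fetch_html, "https://example.com") after importing fetch_html from dsci524_group29_webscraping.fetch_html. Returning a message instead of re-raising is only one possible design; callers that need the original exception may prefer to let it propagate.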

Examples

Fetch the HTML content of a webpage:

>>> html_content = fetch_html("https://example.com")
>>> print(html_content[:100])  # Prints the first 100 characters of the HTML content

Notes

This function uses the requests library to perform an HTTP GET request.
Ensure the requests library is installed before using this function.
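For orientation, a function with the documented behavior could be built on requests roughly as follows. This is a sketch under stated assumptions, not the package's actual source: the name fetch_html_sketch is hypothetical, and the URL check (an http or https scheme plus a host) is only one plausible reading of "invalid or improperly formatted".

```python
from urllib.parse import urlparse

import requests


def fetch_html_sketch(url, timeout=10):
    """Hypothetical stand-in for fetch_html, written for illustration only."""
    # Assumption: a valid URL is a string with an http(s) scheme and a host.
    if not isinstance(url, str):
        raise ValueError(f"URL must be a string, got {type(url).__name__}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url!r}")

    # Perform the HTTP GET request; raises requests.exceptions.Timeout
    # if no response arrives within `timeout` seconds.
    response = requests.get(url, timeout=timeout)

    # Turn 4xx/5xx responses into requests.exceptions.HTTPError,
    # which is a subclass of RequestException.
    response.raise_for_status()

    return response.text
```

Validating the URL before calling requests.get means a malformed input fails fast with ValueError and never triggers a network request.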