dsci524_group29_webscraping.fetch_html

Functions

fetch_html(url[, timeout])

Fetches the HTML content of a given URL.

Module Contents

dsci524_group29_webscraping.fetch_html.fetch_html(url, timeout=10)[source]

Fetches the HTML content of a given URL.

Parameters:
  • url (str) – The URL of the webpage to fetch.

  • timeout (int, optional) – The maximum time to wait for a response, in seconds. Defaults to 10 seconds.

Returns:

The raw HTML content of the webpage if the request is successful.

Return type:

str

Raises:
  • ValueError – If the URL provided is invalid or improperly formatted.

  • requests.exceptions.Timeout – If the request times out before receiving a response.

  • requests.exceptions.RequestException – For other issues during the HTTP request, such as connectivity problems or a non-success HTTP status code.
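The exceptions listed above can be handled by the caller roughly as follows. This is a minimal sketch, not part of the package: `fetch_or_none` is a hypothetical helper name, and the fetching function is passed in as an argument so the snippet stays self-contained. Because `requests.exceptions.Timeout` is a subclass of `requests.exceptions.RequestException`, the more specific handler has to come first.

```python
import requests


def fetch_or_none(fetch, url, timeout=10):
    """Call fetch(url, timeout=...) and return its result, or None on a documented failure.

    `fetch` is assumed to behave like fetch_html: it returns a str on success
    and raises the exceptions listed in the Raises section otherwise.
    """
    try:
        return fetch(url, timeout=timeout)
    except ValueError as exc:
        # The URL was rejected before or during the request.
        print(f"Invalid URL: {exc}")
    except requests.exceptions.Timeout:
        # Must precede RequestException, which is its parent class.
        print(f"Timed out after {timeout} seconds: {url}")
    except requests.exceptions.RequestException as exc:
        # Connectivity problems, non-success status codes, and similar errors.
        print(f"Request failed: {exc}")
    return None
```

With the package installed, a call would presumably look like `fetch_or_none(fetch_html, "https://example.com")` after `from dsci524_group29_webscraping.fetch_html import fetch_html`.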

Examples

Fetch the HTML content of a webpage:

>>> html_content = fetch_html("https://example.com")
>>> print(html_content[:100])  # Prints the first 100 characters of the HTML content

Notes

  • This function uses the requests library to perform an HTTP GET request.

  • Ensure the requests library is installed before using this function.
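For readers curious how a GET request with `requests` can produce the behaviour documented here, the following is one possible sketch. It is not the package's actual source: the name `fetch_html_sketch` and the URL check based on `urllib.parse.urlparse` are assumptions made for illustration.

```python
from urllib.parse import urlparse

import requests


def fetch_html_sketch(url, timeout=10):
    """Illustrative stand-in for fetch_html; the real implementation may differ."""
    # Assumed validation rule: a string with an http(s) scheme and a host part.
    if not isinstance(url, str):
        raise ValueError(f"Invalid URL: {url!r}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url!r}")

    # requests.get raises requests.exceptions.Timeout if no response arrives in time.
    response = requests.get(url, timeout=timeout)

    # raise_for_status raises requests.exceptions.HTTPError, a subclass of
    # RequestException, for 4xx and 5xx status codes.
    response.raise_for_status()

    return response.text
```

Passing `timeout` explicitly matters because `requests` does not apply a timeout by default, so a request without one may wait indefinitely.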