dsci524_group29_webscraping.parse_content
=========================================

.. py:module:: dsci524_group29_webscraping.parse_content


Functions
---------

.. autoapisummary::

   dsci524_group29_webscraping.parse_content.parse_content


Module Contents
---------------

.. py:function:: parse_content(html_content, selector, selector_type='css')

   Parses HTML content to extract data based on the provided selector.

   :param html_content: The raw HTML content to be parsed.
   :type html_content: str
   :param selector: The query used to locate elements in the HTML content.

      - For CSS selectors: use ``.class``, ``#id``, or ``tagname``.
      - For XPath: use expressions such as ``//tag[@attribute='value']``.
   :type selector: str
   :param selector_type: The type of selector to use. Case-insensitive. Options:

      - ``'css'``: use a CSS selector (e.g., ``.item`` selects elements with class "item").
      - ``'xpath'``: use an XPath expression (e.g., ``//div[@class='item']`` selects
        ``div`` elements with class "item").

      Default is ``'css'``.
   :type selector_type: str, optional
   :returns: A list of dictionaries containing the extracted data, for example
      ``[{'value': 'alfa'}, {'value': 'bravo'}, {'value': 'charlie'}]``.
   :rtype: list
   :raises ValueError: If ``selector_type`` is unsupported or an error occurs during parsing.

   .. rubric:: Example

   .. code-block:: python

      # Sample HTML content
      html_content = """
      <div class="item">alfa</div>
      <div class="item">bravo</div>
      <div class="item">charlie</div>
      """

      # Using a CSS selector
      parse_content(html_content, ".item")
      # Returns: [{'value': 'alfa'}, {'value': 'bravo'}, {'value': 'charlie'}]

      # Using an XPath selector
      parse_content(html_content, "//div[@class='item']", selector_type='xpath')
      # Returns: [{'value': 'alfa'}, {'value': 'bravo'}, {'value': 'charlie'}]
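
This page does not show how ``parse_content`` is implemented. As a rough illustration of its two selector paths, the following sketch uses only the Python standard library and is not the package's code. The names ``parse_content_sketch`` and ``_SelectorCollector`` are hypothetical. The CSS path assumes a single ``.class``, ``#id``, or ``tagname`` selector (no combinators or pseudo-classes), and the XPath path relies on the limited XPath subset of ``xml.etree.ElementTree``, so it assumes well-formed markup.

.. code-block:: python

   from html.parser import HTMLParser
   import xml.etree.ElementTree as ET

   # Void elements never get an end tag, so they must not affect nesting depth.
   _VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}


   class _SelectorCollector(HTMLParser):
       """Hypothetical helper: collect the text of elements matching a simple
       CSS selector (``.class``, ``#id`` or ``tagname`` only)."""

       def __init__(self, selector):
           super().__init__()
           self.selector = selector
           self.depth = 0      # > 0 while inside a matching element
           self.buffer = []    # text fragments of the current match
           self.results = []

       def _matches(self, tag, attrs):
           attrs = dict(attrs)
           if self.selector.startswith("."):
               return self.selector[1:] in (attrs.get("class") or "").split()
           if self.selector.startswith("#"):
               return attrs.get("id") == self.selector[1:]
           return tag == self.selector.lower()  # HTMLParser lowercases tag names

       def handle_starttag(self, tag, attrs):
           if tag in _VOID:
               return
           if self.depth:
               self.depth += 1  # nested element inside the current match
           elif self._matches(tag, attrs):
               self.depth = 1
               self.buffer = []

       def handle_endtag(self, tag):
           if tag in _VOID or not self.depth:
               return
           self.depth -= 1
           if self.depth == 0:  # the matching element just closed
               self.results.append({"value": "".join(self.buffer).strip()})

       def handle_data(self, data):
           if self.depth:
               self.buffer.append(data)


   def parse_content_sketch(html_content, selector, selector_type="css"):
       """Hypothetical stdlib-only stand-in for ``parse_content``."""
       kind = selector_type.lower()  # selector_type is documented as case-insensitive
       if kind == "css":
           collector = _SelectorCollector(selector)
           collector.feed(html_content)
           collector.close()
           return collector.results
       if kind == "xpath":
           try:
               # ElementTree needs a single root element and a relative path.
               root = ET.fromstring(f"<root>{html_content}</root>")
               path = "." + selector if selector.startswith("/") else selector
               return [{"value": "".join(el.itertext()).strip()}
                       for el in root.findall(path)]
           except (SyntaxError, KeyError) as exc:  # ET.ParseError subclasses SyntaxError
               raise ValueError(f"could not evaluate XPath {selector!r}: {exc}") from exc
       raise ValueError(f"unsupported selector_type: {selector_type!r}")

With the three ``div`` elements from the docstring example, both ``parse_content_sketch(html_content, ".item")`` and ``parse_content_sketch(html_content, "//div[@class='item']", "xpath")`` return ``[{'value': 'alfa'}, {'value': 'bravo'}, {'value': 'charlie'}]``. A production version would more typically delegate to a dedicated library such as lxml (full XPath 1.0) or BeautifulSoup (CSS selectors via ``select``), both of which also tolerate malformed HTML.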