dsci524_group29_webscraping
===========================

.. py:module:: dsci524_group29_webscraping


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/dsci524_group29_webscraping/fetch_html/index
   /autoapi/dsci524_group29_webscraping/parse_content/index
   /autoapi/dsci524_group29_webscraping/save_data/index


Attributes
----------

.. autoapisummary::

   dsci524_group29_webscraping.__version__


Functions
---------

.. autoapisummary::

   dsci524_group29_webscraping.save_data
   dsci524_group29_webscraping.parse_content
   dsci524_group29_webscraping.fetch_html


Package Contents
----------------

.. py:data:: __version__

.. py:function:: save_data(data, format='csv', destination='output.csv')

   Saves the extracted data into a file.

   :param data: The data to be saved.

      - For 'csv', it must be a list of dictionaries where each dictionary represents a row.
      - For 'json', it can be either a list or a dictionary.
   :type data: list or dict
   :param format: The format in which to save the data. Options are:

      - 'csv': Saves the data as a CSV file. Each key in the dictionaries becomes a column header.
      - 'json': Saves the data as a JSON file. The data is serialized with indentation for readability.

      Default is 'csv'.
   :type format: str, optional
   :param destination: The file path to save the data. Can specify:

      - A file name (e.g., 'output.csv').
      - A full path (e.g., '/path/to/output.csv').

      Default is 'output.csv'.
   :type destination: str, optional

   :returns: The absolute path to the saved file.
   :rtype: str

   :raises ValueError: If the format is unsupported or if the data structure is incompatible with the format.
   :raises FileNotFoundError: If the directory specified in the destination path does not exist.
   :raises Exception: If an unexpected error occurs during the file-writing process.

   .. rubric:: Examples

   .. code-block:: python

      from dsci524_group29_webscraping import save_data

      # Save data as a CSV file
      save_data([{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}], format='csv', destination='data.csv')

      # Save data as a JSON file
      save_data({"name": "Alice", "age": 25}, format='json', destination='data.json')

   .. rubric:: Notes

   - The directory specified in the destination path must exist; otherwise, a FileNotFoundError is raised.
   - For 'csv', the first dictionary in the list determines the column headers.

.. py:function:: parse_content(html_content, selector, selector_type='css')

   Parses HTML content to extract data based on the provided selector.

   :param html_content: The raw HTML content to be parsed.
   :type html_content: str
   :param selector: The query to locate elements in the HTML content.

      - For CSS selectors: Use `.class`, `#id`, or `tagname`.
      - For XPath: Use expressions like `//tag[@attribute='value']`.
   :type selector: str
   :param selector_type: The type of selector to use. Options:

      - 'css': Uses a CSS selector (e.g., `.item` selects elements with class "item").
      - 'xpath': Uses an XPath expression (e.g., `//div[@class='item']` selects