/scrape

Purpose

The /scrape endpoint fetches the HTML content of a specified URL and extracts either the elements that match a CSS selector or the text that matches a regular expression. It is intended for web scraping tasks such as collecting specific data from web pages.

HTTP Method

POST

URL

https://node.nodetrigger.com/scrape

Request Body Parameters

  • url (string, required): The URL of the web page to scrape.
  • selector (string, optional): A CSS selector identifying the elements to extract from the HTML. Required if regex is omitted.
  • regex (string, optional): A regular expression to match against the HTML content. Required if selector is omitted.
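
The rule that at least one of selector or regex must accompany url can be checked on the client before anything is sent. The following is a minimal Python sketch under stated assumptions: the helper name build_scrape_payload is hypothetical, and how the endpoint treats a request that carries both fields is not documented here, so the sketch simply passes through whichever fields are supplied.

```python
import json
from typing import Optional


def build_scrape_payload(url: str,
                         selector: Optional[str] = None,
                         regex: Optional[str] = None) -> str:
    """Return a JSON request body for /scrape (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    # The parameter list requires at least one of selector or regex.
    if selector is None and regex is None:
        raise ValueError("either selector or regex must be provided")

    payload = {"url": url}
    if selector is not None:
        payload["selector"] = selector
    if regex is not None:
        payload["regex"] = regex
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_scrape_payload("https://example.com", selector=".article-title"))
```

Failing early on the client avoids a round trip for a request that the parameter rules above already rule out.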

Response

  • result (array): An array of strings, one for each element or text match extracted with the provided selector or regex.
  • error (string): An error message, present if the request fails.
  • message (string): Additional information about the error.
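
A caller can separate the success and failure shapes described above by checking for the error field. This is only a sketch: it assumes that a failed request is signalled by the presence of error in the JSON body, which this section implies but does not state outright, and the names ScrapeError and parse_scrape_response are hypothetical.

```python
import json
from typing import List


class ScrapeError(Exception):
    """Raised when the response body carries an error field (hypothetical type)."""


def parse_scrape_response(body: str) -> List[str]:
    """Return the result array, or raise ScrapeError if the body reports a failure."""
    data = json.loads(body)
    if "error" in data:
        # message is described as additional information about the error.
        detail = data.get("message", "")
        text = f"{data['error']}: {detail}" if detail else data["error"]
        raise ScrapeError(text)
    # Assumption: a successful body always carries result; default to empty.
    return data.get("result", [])


if __name__ == "__main__":
    ok = '{"result": ["Article Title 1", "Article Title 2", "Article Title 3"]}'
    print(parse_scrape_response(ok))
```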

Example Usage

Example 1: Extracting Elements with a CSS Selector

Request:

curl -X POST https://node.nodetrigger.com/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "selector": ".article-title"
}'

Response:

{
  "result": [
    "Article Title 1",
    "Article Title 2",
    "Article Title 3"
  ]
}

Example 2: Extracting Text with a Regular Expression

Request:

curl -X POST https://node.nodetrigger.com/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "regex": "\\b\\w{5}\\b"
}'

Response:

{
  "result": [
    "words",
    "found",
    "match"
  ]
}
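
The doubled backslashes in the request above come from JSON string escaping: the pattern the endpoint receives is \b\w{5}\b, that is, whole words of exactly five characters. The Python sketch below shows how a JSON encoder produces that escaping and tries the pattern on a made-up sample string. It uses Python's re module purely for local illustration; the regular expression engine used by the endpoint is not specified here and may differ in details.

```python
import json
import re

# The pattern as it should reach the endpoint, written as a raw string.
pattern = r"\b\w{5}\b"

# json.dumps escapes each backslash, producing the doubled form sent on the wire.
body = json.dumps({"url": "https://example.com", "regex": pattern})
print(body)  # {"url": "https://example.com", "regex": "\\b\\w{5}\\b"}

# Local check against illustrative text (not real endpoint output).
sample = "some words were found to match"
print(re.findall(pattern, sample))  # ['words', 'found', 'match']
```

Building the body with a JSON encoder rather than by hand avoids escaping mistakes in more complex patterns.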

Best Practices

Always call this endpoint with the POST method so that the parameters are sent in the request body rather than in the URL. If you are currently using GET, switch to POST.
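
As an illustration of this recommendation, the sketch below builds an explicit POST request with Python's standard urllib.request module, using the endpoint URL and Content-Type header from the examples above. The helper names make_scrape_request and scrape and the 10-second timeout are assumptions made for this sketch, not part of the API.

```python
import json
import urllib.request

SCRAPE_URL = "https://node.nodetrigger.com/scrape"


def make_scrape_request(payload: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the parameters."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",  # stated explicitly rather than relying on defaults
    )


def scrape(payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes; which
    status codes accompany the error field is not documented here.
    """
    req = make_scrape_request(payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(scrape({"url": "https://example.com", "selector": ".article-title"}))
```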