Smith Scraper API

Professional Web Scraping REST API

Version 3.0.0

📚 Getting Started

Welcome to Smith Scraper API v3.0

Professional web scraping API with curl-impersonate, SOCKS5 proxies, intelligent user-agent rotation, and anti-bot evasion.

🌐 Base URL

https://smith.urbistat.com/api/v1

✨ Key Features

curl-impersonate

Perfect TLS fingerprinting matching real browsers

SOCKS5 Proxies

Rotating IPs with Lightning proxy (IT region)

Mobile Emulation

Automatic mobile/desktop UA rotation (90-98% success)

Bulk Upload

Process up to 900,000 URLs per job via file

Smart Retry

Automatic retry for 403 Forbidden errors

XLSX Export

Download results with full HTML content

📊 Success Rates

Configuration                  Success Rate  Speed   Use Case
Mobile + 50 threads + Proxy    95%+          Fast    Maximum success ⭐ RECOMMENDED
Mobile + 150 threads + Proxy   70-75%        Medium  -
Mobile + No proxy              10-20%        Fast    ❌ DO NOT USE

🚦 HTTP Status Codes

200  OK                     Success
201  Created                Resource created
400  Bad Request            Invalid parameters
401  Unauthorized           Authentication failed
404  Not Found              Resource not found
422  Unprocessable Entity   Validation error
500  Internal Server Error  Internal error

🔐 Authentication

Bearer Token (Recommended)

Include your API key as a Bearer token in the Authorization header.

Headers Required

Authorization: Bearer sk_your_api_key_here
Content-Type: application/json

Alternative: X-API-Key Header

X-API-Key: sk_your_api_key_here
Content-Type: application/json

Security: Never share your API key. Keep it private and secure.
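
For example, the Quick Start test call below can be sent with the alternative header instead of a Bearer token (a minimal sketch; substitute your own key):

curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
  -H "X-API-Key: sk_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"target_url": "https://example.com"}'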

🚀 Quick Start Guide

1๏ธโƒฃ Test Single URL (Scraper Tests)

Quick validation before production runs

curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://example.com",
    "mobile": true,
    "debug": true
  }'

2๏ธโƒฃ Create Production Job (File Upload - RECOMMENDED)

For 10+ URLs, use file upload for best performance

curl -X POST "https://smith.urbistat.com/api/v1/scraper" \
  -H "Authorization: Bearer sk_your_key" \
  -F "urls_file=@urls.txt" \
  -F "name=My Scraping Job" \
  -F "mobile=true" \
  -F "threads=2" \
  -F "auto_start=true"

3๏ธโƒฃ Check Job Status

curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/status" \
  -H "Authorization: Bearer sk_your_key"

4๏ธโƒฃ Download Results

Generate the export file, then download it

# Step 1: Generate export
curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/export_xlsx" \
  -H "Authorization: Bearer sk_your_key"

# Step 2: Download file (no auth required)
curl "https://smith.urbistat.com/api/v1/scraper/download/{token}" \
  -o results.xlsx
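
Optionally, step 3 can be wrapped in a simple polling loop. A minimal sketch, assuming the data.status field shown in the status responses later in this guide, and that jq is installed:

# Poll the job every 30 seconds until it reports "completed"
JOB_ID="a3bb189e-8bf9-3888-9912-ace4e6543002"   # example id; use your own job_id
while true; do
  STATUS=$(curl -s "https://smith.urbistat.com/api/v1/scraper/$JOB_ID/status" \
    -H "Authorization: Bearer sk_your_key" | jq -r '.data.status')
  echo "status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 30
done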

๐ŸŒ List Proxies

GET

List All Proxies

GET /proxies

Get a list of all configured proxies with details.

Note: Password fields are always masked (********) in responses for security.
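
cURL Example (Bearer auth as described above):

curl -X GET "https://smith.urbistat.com/api/v1/proxies" \
  -H "Authorization: Bearer sk_your_key"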

Response Example

{
  "success": true,
  "message": "Proxies retrieved",
  "data": {
    "proxies": [
      {
        "id": 1,
        "host": "res-eu.lightningproxies.net",
        "port": 9999,
        "username": "user-zone-lightning-region-it",
        "password": "********",
        "protocol": "socks5",
        "active": true,
        "requires_auth": true,
        "primary": true,
        "debug": false,
        "created_at": "2024-01-15T10:30:00Z"
      }
    ],
    "total": 1
  }
}

⭐ Primary Proxy

GET

Get Primary Proxy

GET /proxies/primary

Get the currently configured primary proxy. This is the proxy used by default when proxy is enabled (proxy: true / proxy_enabled: true) without a specific proxy_id.

Use this endpoint to verify which proxy your jobs will use by default.

Response Example

{
  "success": true,
  "message": "Primary proxy retrieved",
  "data": {
    "id": 1,
    "host": "res-eu.lightningproxies.net",
    "port": 9999,
    "username": "user-zone-lightning-region-it",
    "password": "********",
    "protocol": "socks5",
    "requires_auth": true,
    "active": true,
    "primary": true
  }
}

POST

Set Proxy as Primary

POST /proxies/{id}/make_primary

Set a proxy as the primary. This removes the primary flag from all other proxies.
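
For example, promoting the proxy with id 2 (a sketch; any existing proxy id works):

curl -X POST "https://smith.urbistat.com/api/v1/proxies/2/make_primary" \
  -H "Authorization: Bearer sk_your_key"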

➕ Add Proxy

POST

Add New Proxy

POST /proxies

Add a new proxy configuration.

Request Parameters

Parameter       Type     Required  Description
proxy_host      string   Required  Proxy hostname or IP
proxy_port      integer  Required  Proxy port number
proxy_username  string   Optional  Authentication username
proxy_password  string   Optional  Authentication password
proxy_protocol  string   Optional  socks5 (default), http, https
active          boolean  Optional  Enable proxy (default: true)
requires_auth   boolean  Optional  Proxy requires authentication (default: true)

Request Example

{
  "proxy_host": "res-eu.lightningproxies.net",
  "proxy_port": 9999,
  "proxy_username": "user-zone-lightning-region-it",
  "proxy_password": "your_password",
  "proxy_protocol": "socks5",
  "active": true,
  "requires_auth": true
}
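
The same payload sent with curl (a sketch; replace host and credentials with your own):

curl -X POST "https://smith.urbistat.com/api/v1/proxies" \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "proxy_host": "res-eu.lightningproxies.net",
    "proxy_port": 9999,
    "proxy_username": "user-zone-lightning-region-it",
    "proxy_password": "your_password",
    "proxy_protocol": "socks5",
    "active": true,
    "requires_auth": true
  }'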

🧪 Test Proxy

POST

Test Proxy Connection

POST /proxies/{id}/test

Test proxy connectivity and get exit IP information.

Tests against multiple IP detection services (api.ipify.org, ifconfig.me, icanhazip.com).
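
A typical call (sketch, testing the proxy with id 1 from the response below):

curl -X POST "https://smith.urbistat.com/api/v1/proxies/1/test" \
  -H "Authorization: Bearer sk_your_key"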

Response Example - Success

{
  "success": true,
  "message": "Proxy test completed",
  "data": {
    "proxy_id": 1,
    "test_results": [
      {
        "success": true,
        "ip": "185.123.45.67",
        "country": "IT",
        "response_time": 0.45,
        "error": null
      }
    ],
    "all_passed": true
  }
}

🧪 Create Test Job

POST

Create Scraper Test

POST /scraper_tests/test

Create a quick test job for URL validation with detailed debug info.

Request Parameters

Parameter   Type     Default  Description
target_url  string   -        Single URL or comma-separated URLs (alternative to file)
urls_file   file     -        Text file with URLs (multipart/form-data)
mobile      boolean  true     Use mobile user agents (RECOMMENDED)
threads     integer  1        Parallel threads (1-10)
debug       boolean  false    Enable detailed debug output
proxy       boolean  true     Enable proxy (ALWAYS RECOMMENDED)

JSON Request Example

{
  "target_url": "https://www.immobiliare.it/annunci/122249824/",
  "mobile": true,
  "debug": true,
  "threads": 1
}
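
The same request sent with curl (sketch):

curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://www.immobiliare.it/annunci/122249824/",
    "mobile": true,
    "debug": true,
    "threads": 1
  }'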

File Upload Example (RECOMMENDED for multiple URLs)

curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
  -H "Authorization: Bearer sk_your_key" \
  -F "urls_file=@test_urls.txt" \
  -F "mobile=true" \
  -F "debug=true" \
  -F "threads=2"

✅ Check Test Results

GET

Get Test Job Results

GET /scraper_tests/{job_id}/check

Get test job execution status and results with detailed debug information.

Query Parameters

Parameter      Type     Default  Description
detailed       boolean  false    Include full URL results with body previews
include_debug  boolean  true     Include debug information
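
For example, fetching full results with debug info using the query parameters above (sketch; {job_id} comes from the create response):

curl -X GET "https://smith.urbistat.com/api/v1/scraper_tests/{job_id}/check?detailed=true&include_debug=true" \
  -H "Authorization: Bearer sk_your_key"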

Response Example

{
  "success": true,
  "data": {
    "id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
    "status": "completed",
    "total_urls": 1,
    "processed_urls": 1,
    "successful": 1,
    "errors_403": 0,
    "errors_404": 0,
    "total_time_spent": 2.45,
    "avg_time_per_url": 2.45
  }
}

📥 Export Test Results

GET

Export Test to XLSX

GET /scraper_tests/{job_id}/export_xlsx

Download test results as Excel file with full HTML body content.

Direct download (no token required for test jobs)

cURL Example

curl -X GET "https://smith.urbistat.com/api/v1/scraper_tests/{job_id}/export_xlsx" \
  -H "Authorization: Bearer sk_your_key" \
  -o test_results.xlsx

➕ Create Production Job

POST

Create Scraping Job

POST /scraper

Create a production scraping job with queue management, retry logic, and export features.

IMPORTANT: Use "auto_start": true for immediate processing (RECOMMENDED). Otherwise the job stays in the pending state until you start it manually with POST /scraper/{job_id}/start.

Request Parameters

Parameter      Type     Default           Description
urls           string   -                 Comma-separated URLs (for JSON requests)
urls_file      file     -                 Text file with URLs (RECOMMENDED for 10+ URLs)
name           string   null              Job name for identification
description    string   null              Job description
method         string   curl-impersonate  Scraping method (curl-impersonate RECOMMENDED)
threads        integer  1                 Parallel threads (1-10, recommended: 1-3)
mobile         boolean  true              Use mobile user agents (RECOMMENDED)
proxy_enabled  boolean  true              Enable proxy (ALWAYS RECOMMENDED)
proxy_id       integer  null              Specific proxy ID (uses primary if not specified)
auto_start     boolean  false             Start job immediately (RECOMMENDED)
debug          boolean  false             Enable debug mode

JSON Request Example (Small Lists)

{
  "urls": "https://site1.com,https://site2.com,https://site3.com",
  "name": "My Scraping Job",
  "mobile": true,
  "threads": 2,
  "auto_start": true
}
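
The same payload sent as JSON with curl (sketch):

curl -X POST "https://smith.urbistat.com/api/v1/scraper" \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": "https://site1.com,https://site2.com,https://site3.com",
    "name": "My Scraping Job",
    "mobile": true,
    "threads": 2,
    "auto_start": true
  }'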

File Upload Example (RECOMMENDED for 10+ URLs)

curl -X POST "https://smith.urbistat.com/api/v1/scraper" \
  -H "Authorization: Bearer sk_your_key" \
  -F "urls_file=@urls.txt" \
  -F "name=Bulk Scraping Job" \
  -F "description=Processing 50,000 URLs" \
  -F "mobile=true" \
  -F "threads=3" \
  -F "auto_start=true"

Response Example - Auto Started

{
  "success": true,
  "message": "Job created",
  "data": {
    "job_id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
    "queue_id": "job_a3bb189e_1704467400",
    "status": "pending",
    "total_urls": 150,
    "method": "curl-impersonate",
    "threads": 2,
    "mobile": true,
    "proxy_enabled": true,
    "auto_started": true,
    "message": "Job created and queued for processing"
  }
}

📋 List Jobs

GET

List All Jobs

GET /scraper/all

Get a paginated list of all jobs regardless of status.

GET

List Queued Jobs

GET /scraper/queue

Get jobs in pending/queued state.

GET

List Running Jobs

GET /scraper/running

Get currently running jobs.

GET

List Completed Jobs

GET /scraper/completed

Get completed jobs.

GET

List Failed Jobs

GET /scraper/failed

Get failed or errored jobs.

ℹ️ Job Information & Statistics

GET

Get Job Info

GET /scraper/{job_id}/info

Get complete job configuration and setup details.

GET

Get Job Status

GET /scraper/{job_id}/status

Get real-time job execution status with progress information.

Response Example

{
  "success": true,
  "data": {
    "id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
    "status": "running",
    "total_urls": 150,
    "processed": 45,
    "pending": 105,
    "successful": 35,
    "failed": 10,
    "last_processed_url": "https://example.com/page45",
    "can_start": false,
    "can_stop": true
  }
}

GET

Get Job Statistics

GET /scraper/{job_id}/stats

Get detailed statistics with error breakdown.

Error Types:
• 403 Forbidden: Retryable (use POST /retry)
• 404 Not Found: Final error (page doesn't exist)
• 5xx Server: Final error (server problem)
• Other: Final error (network/timeout)

Response Example

{
  "success": true,
  "data": {
    "id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
    "status": "running",
    "total_urls": 150,
    "processed": 45,
    "successful": 35,
    "errors_403": 8,
    "errors_404": 1,
    "errors_5xx": 1,
    "errors_other": 0,
    "in_retry_queue": 3,
    "total_failed": 10,
    "success_rate": 77.78,
    "avg_time_per_url": 2.68
  }
}

🎮 Job Control

POST

Start Job

POST /scraper/{job_id}/start

Queue and start a job. Works for both pending and stopped jobs.

RECOMMENDED: Use this endpoint for both starting pending jobs AND resuming stopped jobs.

POST

Stop Job

POST /scraper/{job_id}/stop

Stop a currently running job. Worker completes current URL first, then stops. Can be resumed later with POST /start.
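
For example, pausing a job and resuming it later (sketch; {job_id} is a placeholder):

# Pause: the worker finishes its current URL, then stops
curl -X POST "https://smith.urbistat.com/api/v1/scraper/{job_id}/stop" \
  -H "Authorization: Bearer sk_your_key"

# Resume later via the start endpoint
curl -X POST "https://smith.urbistat.com/api/v1/scraper/{job_id}/start" \
  -H "Authorization: Bearer sk_your_key"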

POST

Resume Job

DEPRECATED

POST /scraper/{job_id}/resume

DEPRECATED: This endpoint will be removed in a future version. Use POST /start instead; it works for both pending and stopped jobs.

POST

Retry Failed URLs

POST /scraper/{job_id}/retry

Retry all failed URLs with 403 Forbidden status.

IMPORTANT: Only 403 Forbidden errors are retried. 404 and other errors cannot be retried.
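
A sketch of the call:

curl -X POST "https://smith.urbistat.com/api/v1/scraper/{job_id}/retry" \
  -H "Authorization: Bearer sk_your_key"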

Response Example

{
  "success": true,
  "message": "Retry initiated",
  "data": {
    "job_id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
    "retried_urls": 8,
    "retry_details": {
      "error_403": 8,
      "note": "Only 403 Forbidden errors are retried. 404 and other errors are final."
    }
  }
}

POST

Cancel Job

POST /scraper/{job_id}/cancel

Cancel a job. Unlike stop, a cancelled job cannot be resumed.

DELETE

Delete Job

DELETE /scraper/{job_id}

Delete job and all associated data permanently.

WARNING: This action is IRREVERSIBLE! All job data will be permanently deleted.

🔗 URL Management

POST

Add Single URL

POST /scraper/{job_id}/add_url

Request Example

{
  "url": "https://newsite.com/page"
}

POST

Remove Single URL

POST /scraper/{job_id}/remove_url

Request Example

{
  "url": "https://example.com/remove"
}

POST

Bulk Add URLs

POST /scraper/{job_id}/bulk_add_urls

Request Example

{
  "urls": [
    "https://site1.com",
    "https://site2.com",
    "https://site3.com"
  ]
}
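
Sent with curl (sketch):

curl -X POST "https://smith.urbistat.com/api/v1/scraper/{job_id}/bulk_add_urls" \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://site1.com", "https://site2.com", "https://site3.com"]}'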

POST

Bulk Remove URLs

POST /scraper/{job_id}/bulk_remove_urls

Request Example

{
  "urls": [
    "https://remove1.com",
    "https://remove2.com"
  ]
}

📊 Get Results

GET

Get All Results

GET /scraper/{job_id}/results

Get a paginated list of all URLs with their status and results.

GET

Get Successful URLs

GET /scraper/{job_id}/successful

Get a paginated list of successfully scraped URLs.

GET

Get Failed URLs

GET /scraper/{job_id}/errors

Get a paginated list of failed URLs with error details.
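
For example, pulling the failed URLs for inspection (sketch; pagination query parameters are not documented here):

curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/errors" \
  -H "Authorization: Bearer sk_your_key"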

💾 Export Data (XLSX Only)

Two-Step Export Process

Step 1: Call export_xlsx to generate file and get download URL
Step 2: Download file using the URL (no auth required, expires in 60 minutes)

GET

Step 1: Generate Export

GET /scraper/{job_id}/export_xlsx

Generate an Excel export of all successful results. Returns a download URL.

Response Example

{
  "success": true,
  "message": "Export generated",
  "data": {
    "job_id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
    "export_type": "xlsx",
    "download_url": "https://smith.urbistat.com/api/v1/scraper/download/abc123def456",
    "download_token": "abc123def456",
    "file_size": "15.2 MB",
    "expires_at": "2024-01-15T11:30:00Z",
    "expires_in": "60 minutes"
  }
}

GET

Step 2: Download File

GET /scraper/download/{token}

Download a previously generated export file using its token.

No authentication required for download (token-based access)

cURL Example

# Step 1: Generate export
curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/export_xlsx" \
  -H "Authorization: Bearer sk_your_key"

# Step 2: Download file (no auth required)
curl "https://smith.urbistat.com/api/v1/scraper/download/abc123" \
  -o results.xlsx

Note: Download links expire after 60 minutes. Generate a new export if expired.

📋 Export File Contents

Excel files include the following columns:

  • URL - Target URL
  • HTML BODY - Full HTML content
  • STATUS_CODE - HTTP response code
  • SCRAPED_AT - Timestamp

Features:
  • Streaming export for large datasets (100k+ rows)
  • Full HTML body content preserved
  • UTF-8 encoding for international characters
  • Memory efficient processing