Smith Scraper API
Professional Web Scraping REST API
Getting Started
Welcome to Smith Scraper API v3.0
Professional web scraping API with curl-impersonate, SOCKS5 proxies, intelligent user-agent rotation, and anti-bot evasion.
Base URL
https://smith.urbistat.com/api/v1
Key Features
curl-impersonate
Perfect TLS fingerprinting matching real browsers
SOCKS5 Proxies
Rotating IPs with Lightning proxy (IT region)
Mobile Emulation
Automatic mobile/desktop UA rotation (90-98% success)
Bulk Upload
Process up to 900,000 URLs per job via file
Smart Retry
Automatic retry for 403 Forbidden errors
XLSX Export
Download results with full HTML content
Success Rates
| Configuration | Success Rate | Speed | Use Case |
|---|---|---|---|
| Mobile + 50 threads + Proxy | 95%+ | Fast | Maximum success (RECOMMENDED) |
| Mobile + 150 threads + Proxy | 70-75% | Medium | |
| Mobile + No proxy | 10-20% | Fast | DO NOT USE |
HTTP Status Codes
Authentication
Bearer Token (Recommended)
Include your API key as Bearer token in the Authorization header.
Headers Required
Authorization: Bearer sk_your_api_key_here
Content-Type: application/json
Alternative: X-API-Key Header
X-API-Key: sk_your_api_key_here
Content-Type: application/json
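The two header schemes above can be built with a small Python helper (the function name is illustrative, not part of the API):

```python
def auth_headers(api_key, use_bearer=True):
    """Build the authentication headers for Smith Scraper API requests.

    Bearer token is the recommended scheme; the X-API-Key header is the
    documented alternative.
    """
    if use_bearer:
        return {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
    return {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }
```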
Quick Start Guide
1. Test Single URL (Scraper Tests)
Quick validation before production runs
curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
-H "Authorization: Bearer sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"target_url": "https://example.com",
"mobile": true,
"debug": true
}'
2. Create Production Job (File Upload - RECOMMENDED)
For 10+ URLs, use file upload for best performance
curl -X POST "https://smith.urbistat.com/api/v1/scraper" \
-H "Authorization: Bearer sk_your_key" \
-F "urls_file=@urls.txt" \
-F "name=My Scraping Job" \
-F "mobile=true" \
-F "threads=2" \
-F "auto_start=true"
3. Check Job Status
curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/status" \
-H "Authorization: Bearer sk_your_key"
4. Download Results
Generate export file and download
# Step 1: Generate export
curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/export_xlsx" \
-H "Authorization: Bearer sk_your_key"
# Step 2: Download file (no auth required)
curl "https://smith.urbistat.com/api/v1/scraper/download/{token}" \
-o results.xlsx
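Between steps 2 and 4 a client typically polls the status endpoint until the job settles. A minimal Python sketch of that loop, with the HTTP call injected as a callable so the polling logic stays independent of the transport; the terminal states "failed" and "cancelled" are assumptions based on the job lists and cancel endpoint described later:

```python
import time

def wait_for_job(get_status, poll_interval=5, timeout=600):
    """Poll a job until it reaches a terminal state.

    get_status is any zero-argument callable returning the parsed JSON
    of GET /scraper/{job_id}/status. Raises TimeoutError if the job is
    still running when the timeout elapses.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        data = get_status()["data"]
        if data["status"] in ("completed", "failed", "cancelled"):
            return data
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")
```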
List Proxies
List All Proxies
Get list of all configured proxies with details.
Response Example
{
"success": true,
"message": "Proxies retrieved",
"data": {
"proxies": [
{
"id": 1,
"host": "res-eu.lightningproxies.net",
"port": 9999,
"username": "user-zone-lightning-region-it",
"password": "********",
"protocol": "socks5",
"active": true,
"requires_auth": true,
"primary": true,
"debug": false,
"created_at": "2024-01-15T10:30:00Z"
}
],
"total": 1
}
}
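Given the response shape above, picking out the primary proxy client-side is a one-liner loop (helper name illustrative):

```python
def primary_proxy(list_response):
    """Return the proxy marked primary from a list-proxies response,
    or None if no proxy carries the primary flag."""
    for proxy in list_response["data"]["proxies"]:
        if proxy.get("primary"):
            return proxy
    return None
```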
Primary Proxy
Get Primary Proxy
Get the currently configured primary proxy. This is the proxy used by default when proxying is enabled without specifying proxy_id.
Response Example
{
"success": true,
"message": "Primary proxy retrieved",
"data": {
"id": 1,
"host": "res-eu.lightningproxies.net",
"port": 9999,
"username": "user-zone-lightning-region-it",
"password": "********",
"protocol": "socks5",
"requires_auth": true,
"active": true,
"primary": true
}
}
Set Proxy as Primary
Set a proxy as the primary proxy. This removes the primary flag from all other proxies.
Add Proxy
Add New Proxy
Add a new proxy configuration.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| proxy_host | string | Required | Proxy hostname or IP |
| proxy_port | integer | Required | Proxy port number |
| proxy_username | string | Optional | Authentication username |
| proxy_password | string | Optional | Authentication password |
| proxy_protocol | string | Optional | socks5 (default), http, https |
| active | boolean | Optional | Enable proxy (default: true) |
| requires_auth | boolean | Optional | Proxy requires authentication (default: true) |
Request Example
{
"proxy_host": "res-eu.lightningproxies.net",
"proxy_port": 9999,
"proxy_username": "user-zone-lightning-region-it",
"proxy_password": "your_password",
"proxy_protocol": "socks5",
"active": true,
"requires_auth": true
}
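A request body like the one above can be assembled and validated client-side before sending; this sketch applies the documented defaults and checks the required parameters (the helper name is illustrative):

```python
REQUIRED = ("proxy_host", "proxy_port")
DEFAULTS = {"proxy_protocol": "socks5", "active": True, "requires_auth": True}

def build_proxy_payload(**fields):
    """Assemble an add-proxy request body, applying the documented
    defaults and checking that required parameters are present."""
    missing = [k for k in REQUIRED if k not in fields]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return {**DEFAULTS, **fields}
```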
Test Proxy
Test Proxy Connection
Test proxy connectivity and get exit IP information.
Response Example - Success
{
"success": true,
"message": "Proxy test completed",
"data": {
"proxy_id": 1,
"test_results": [
{
"success": true,
"ip": "185.123.45.67",
"country": "IT",
"response_time": 0.45,
"error": null
}
],
"all_passed": true
}
}
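Given the test response shape above, the exit IPs of the successful attempts can be extracted like this (helper name illustrative):

```python
def exit_ips(test_response):
    """Extract (ip, country, response_time) tuples from a proxy test
    response, skipping attempts that failed."""
    results = test_response["data"]["test_results"]
    return [
        (r["ip"], r["country"], r["response_time"])
        for r in results
        if r["success"]
    ]
```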
Create Test Job
Create Scraper Test
Create a quick test job for URL validation with detailed debug info.
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| target_url | string | - | Single URL or comma-separated URLs (alternative to file) |
| urls_file | file | - | Text file with URLs (multipart/form-data) |
| mobile | boolean | true | Use mobile user agents (RECOMMENDED) |
| threads | integer | 1 | Parallel threads (1-10) |
| debug | boolean | false | Enable detailed debug output |
| proxy | boolean | true | Enable proxy (ALWAYS RECOMMENDED) |
JSON Request Example
{
"target_url": "https://www.immobiliare.it/annunci/122249824/",
"mobile": true,
"debug": true,
"threads": 1
}
File Upload Example (RECOMMENDED for multiple URLs)
curl -X POST "https://smith.urbistat.com/api/v1/scraper_tests/test" \
-H "Authorization: Bearer sk_your_key" \
-F "urls_file=@test_urls.txt" \
-F "mobile=true" \
-F "debug=true" \
-F "threads=2"
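To prepare a urls_file for upload, a one-URL-per-line text layout is assumed here (the docs do not spell out the file format); this sketch writes such a file, dropping blank lines:

```python
def write_urls_file(urls, path="urls.txt"):
    """Write one URL per line to a text file suitable for the
    urls_file multipart upload. Returns the number of URLs written."""
    cleaned = [u.strip() for u in urls if u.strip()]
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(cleaned) + "\n")
    return len(cleaned)
```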
Check Test Results
Get Test Job Results
Get test job execution status and results with detailed debug information.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| detailed | boolean | false | Include full URL results with body previews |
| include_debug | boolean | true | Include debug information |
Response Example
{
"success": true,
"data": {
"id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
"status": "completed",
"total_urls": 1,
"processed_urls": 1,
"successful": 1,
"errors_403": 0,
"errors_404": 0,
"total_time_spent": 2.45,
"avg_time_per_url": 2.45
}
}
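The response fields above are enough to compute a headline summary client-side; a small sketch (helper name illustrative, success rate derived as successful/total_urls):

```python
def summarize_test(response):
    """Condense a test-job result payload into its headline numbers."""
    d = response["data"]
    total = d["total_urls"]
    rate = (d["successful"] / total * 100) if total else 0.0
    return {
        "status": d["status"],
        "success_rate": round(rate, 2),
        "avg_time_per_url": d["avg_time_per_url"],
    }
```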
Export Test Results
Export Test to XLSX
Download test results as Excel file with full HTML body content.
cURL Example
curl -X GET "https://smith.urbistat.com/api/v1/scraper_tests/{job_id}/export_xlsx" \
-H "Authorization: Bearer sk_your_key" \
-o test_results.xlsx
Create Production Job
Create Scraping Job
Create a production scraping job with queue management, retry logic, and export features.
Set "auto_start": true for immediate processing (RECOMMENDED); otherwise the job stays in the pending state.
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string | - | Comma-separated URLs (for JSON requests) |
| urls_file | file | - | Text file with URLs (RECOMMENDED for 10+ URLs) |
| name | string | null | Job name for identification |
| description | string | null | Job description |
| method | string | curl-impersonate | Scraping method (curl-impersonate RECOMMENDED) |
| threads | integer | 1 | Parallel threads (1-10, recommended: 1-3) |
| mobile | boolean | true | Use mobile user agents (RECOMMENDED) |
| proxy_enabled | boolean | true | Enable proxy (ALWAYS RECOMMENDED) |
| proxy_id | integer | null | Specific proxy ID (uses primary if not specified) |
| auto_start | boolean | false | Start job immediately (RECOMMENDED) |
| debug | boolean | false | Enable debug mode |
JSON Request Example (Small Lists)
{
"urls": "https://site1.com,https://site2.com,https://site3.com",
"name": "My Scraping Job",
"mobile": true,
"threads": 2,
"auto_start": true
}
File Upload Example (RECOMMENDED for 10+ URLs)
curl -X POST "https://smith.urbistat.com/api/v1/scraper" \
-H "Authorization: Bearer sk_your_key" \
-F "urls_file=@urls.txt" \
-F "name=Bulk Scraping Job" \
-F "description=Processing 50,000 URLs" \
-F "mobile=true" \
-F "threads=3" \
-F "auto_start=true"
Response Example - Auto Started
{
"success": true,
"message": "Job created",
"data": {
"job_id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
"queue_id": "job_a3bb189e_1704467400",
"status": "pending",
"total_urls": 150,
"method": "curl-impersonate",
"threads": 2,
"mobile": true,
"proxy_enabled": true,
"auto_started": true,
"message": "Job created and queued for processing"
}
}
List Jobs
List All Jobs
Get paginated list of all jobs regardless of status.
List Queued Jobs
Get jobs in pending/queued state.
List Running Jobs
Get currently running jobs.
List Completed Jobs
Get completed jobs.
List Failed Jobs
Get failed or errored jobs.
Job Information & Statistics
Get Job Info
Get complete job configuration and setup details.
Get Job Status
Get real-time job execution status with progress information.
Response Example
{
"success": true,
"data": {
"id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
"status": "running",
"total_urls": 150,
"processed": 45,
"pending": 105,
"successful": 35,
"failed": 10,
"last_processed_url": "https://example.com/page45",
"can_start": false,
"can_stop": true
}
}
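A progress percentage can be derived from the processed and total_urls fields shown above (helper name illustrative):

```python
def job_progress(status_response):
    """Return the percentage of URLs processed so far, based on the
    status payload's processed and total_urls fields."""
    d = status_response["data"]
    total = d["total_urls"]
    return round(d["processed"] / total * 100, 1) if total else 0.0
```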
Get Job Statistics
Get detailed statistics with error breakdown.
- 403 Forbidden: Retryable (use POST /retry)
- 404 Not Found: Final error (page doesn't exist)
- 5xx Server: Final error (server problem)
- Other: Final error (network/timeout)
Response Example
{
"success": true,
"data": {
"id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
"status": "running",
"total_urls": 150,
"processed": 45,
"successful": 35,
"errors_403": 8,
"errors_404": 1,
"errors_5xx": 1,
"errors_other": 0,
"in_retry_queue": 3,
"total_failed": 10,
"success_rate": 77.78,
"avg_time_per_url": 2.68
}
}
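Following the retry rules above (only 403 Forbidden is retryable; 404, 5xx, and other errors are final), the statistics payload splits cleanly into two buckets (helper name illustrative):

```python
def split_errors(stats_response):
    """Split a job's error breakdown into retryable vs final counts,
    per the documented rules: only 403 Forbidden can be retried."""
    d = stats_response["data"]
    return {
        "retryable": d["errors_403"],
        "final": d["errors_404"] + d["errors_5xx"] + d["errors_other"],
    }
```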
Job Control
Start Job
Queue and start a job. Works for both pending and stopped jobs.
Stop Job
Stop a currently running job. Worker completes current URL first, then stops. Can be resumed later with POST /start.
Resume Job
DEPRECATED. Use POST /start instead; it works for both pending and stopped jobs.
Retry Failed URLs
Retry all failed URLs with 403 Forbidden status.
Response Example
{
"success": true,
"message": "Retry initiated",
"data": {
"job_id": "b4cc289f-9cga-4999-0023-bdf5f7654113",
"retried_urls": 8,
"retry_details": {
"error_403": 8,
"note": "Only 403 Forbidden errors are retried. 404 and other errors are final."
}
}
}
Cancel Job
Cancel a job. Unlike stop, a cancelled job cannot be resumed.
Delete Job
Delete job and all associated data permanently.
URL Management
Add Single URL
Request Example
{
"url": "https://newsite.com/page"
}
Remove Single URL
Request Example
{
"url": "https://example.com/remove"
}
Bulk Add URLs
Request Example
{
"urls": [
"https://site1.com",
"https://site2.com",
"https://site3.com"
]
}
Bulk Remove URLs
Request Example
{
"urls": [
"https://remove1.com",
"https://remove2.com"
]
}
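Bulk request bodies like the ones above are easy to build from any iterable of URLs; this sketch also de-duplicates while preserving order, which is a client-side convenience rather than documented server behaviour:

```python
def bulk_payload(urls):
    """Build the JSON body for bulk add/remove, stripping whitespace
    and dropping blank or duplicate URLs (order preserved)."""
    seen, out = set(), []
    for u in urls:
        u = u.strip()
        if u and u not in seen:
            seen.add(u)
            out.append(u)
    return {"urls": out}
```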
Get Results
Get All Results
Get paginated list of all URLs with their status and results.
Get Successful URLs
Get paginated list of successfully scraped URLs.
Get Failed URLs
Get paginated list of failed URLs with error details.
Export Data (XLSX Only)
Two-Step Export Process
Step 1: Call export_xlsx to generate file and get download URL
Step 2: Download file using the URL (no auth required, expires in 60 minutes)
Step 1: Generate Export
Generate Excel export of all successful results. Returns download URL.
Response Example
{
"success": true,
"message": "Export generated",
"data": {
"job_id": "a3bb189e-8bf9-3888-9912-ace4e6543002",
"export_type": "xlsx",
"download_url": "https://smith.urbistat.com/api/v1/scraper/download/abc123def456",
"download_token": "abc123def456",
"file_size": "15.2 MB",
"expires_at": "2024-01-15T11:30:00Z",
"expires_in": "60 minutes"
}
}
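Step 2 only needs the download URL and token from the step 1 response; a tiny extractor over the payload shown above (helper name illustrative):

```python
def export_download_info(export_response):
    """Pull the download URL, token, and expiry out of an export
    response, ready for the unauthenticated download step."""
    d = export_response["data"]
    return d["download_url"], d["download_token"], d["expires_at"]
```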
Step 2: Download File
Download previously generated export file using token.
cURL Example
# Step 1: Generate export
curl -X GET "https://smith.urbistat.com/api/v1/scraper/{job_id}/export_xlsx" \
-H "Authorization: Bearer sk_your_key"
# Step 2: Download file (no auth required)
curl "https://smith.urbistat.com/api/v1/scraper/download/abc123" \
-o results.xlsx
Export File Contents
Excel files include the following columns:
- URL - Target URL
- HTML BODY - Full HTML content
- STATUS_CODE - HTTP response code
- SCRAPED_AT - Timestamp
- Streaming export for large datasets (100k+ rows)
- Full HTML body content preserved
- UTF-8 encoding for international characters
- Memory efficient processing