# Playwright Browser Automation
Dynamic page scraping, SPA crawling, form interaction, screenshots, and stealth mode.
Tags: playwright, scraping, automation, javascript, python
## Install
```bash
# Node.js
npm install playwright
npx playwright install chromium
# Python
pip install playwright
playwright install chromium
```
## Basic scraping (Node.js)
```ts
import { chromium } from 'playwright'

const browser = await chromium.launch({ headless: true })
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  viewport: { width: 1280, height: 720 },
  locale: 'zh-CN',
})
const page = await context.newPage()
await page.goto('https://example.com', { waitUntil: 'networkidle' })

// Wait for the product list to render
await page.waitForSelector('.product-list')

// Extract data from each product card
const items = await page.$$eval('.product-card', cards =>
  cards.map(card => ({
    title: card.querySelector('h3')?.textContent?.trim(),
    price: card.querySelector('.price')?.textContent?.trim(),
    link: card.querySelector('a')?.href,
  }))
)

await browser.close()
console.log(items)
```
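The extraction above returns raw strings, and any field can be `undefined` when a card lacks the matching element. A minimal post-processing sketch follows. `RawItem`, `CleanItem`, `parsePrice`, and `cleanItems` are hypothetical names, and the price format (comma as thousands separator, dot as decimal point) is an assumption that may not hold for every site.

```ts
// Shape produced by the $$eval callback above; every field may be missing.
interface RawItem {
  title?: string
  price?: string
  link?: string
}

interface CleanItem {
  title: string
  price: number
  link: string
}

// Hypothetical helper: turn "$1,299.00" or "¥ 45" into a number.
// Assumes "," is a thousands separator and "." is the decimal point.
function parsePrice(text: string): number | null {
  const digits = text.replace(/[^0-9.]/g, '')
  if (digits === '') return null
  const value = Number(digits)
  return Number.isFinite(value) ? value : null
}

// Drop cards with a missing field or an unparseable price.
function cleanItems(raw: RawItem[]): CleanItem[] {
  const out: CleanItem[] = []
  for (const item of raw) {
    if (!item.title || !item.price || !item.link) continue
    const price = parsePrice(item.price)
    if (price === null) continue
    out.push({ title: item.title, price, link: item.link })
  }
  return out
}
```

Calling `cleanItems(items)` after the scrape would leave only complete records with numeric prices.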
## SPA: wait for data load
```ts
// Start waiting for the API response before navigating so it cannot be missed
const [response] = await Promise.all([
  page.waitForResponse(r => r.url().includes('/api/products')),
  page.goto('https://example.com/products'),
])
const data = await response.json()

// Or wait until more than 10 items have rendered
await page.waitForFunction(() =>
  document.querySelectorAll('.item').length > 10
)
```
## Form interaction & login
```ts
await page.fill('input[name="email"]', 'user@example.com')
await page.fill('input[name="password"]', 'password123')
await page.click('button[type="submit"]')
await page.waitForURL('**/dashboard')
// Save auth state (cookies and localStorage) for reuse
await context.storageState({ path: 'auth.json' })
// Reuse in a later run: browser.newContext({ storageState: 'auth.json' })
```
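The login snippet above hard-codes its credentials for brevity. One possible way to keep them out of the source is to read them from environment variables. This is a sketch only: `loadCredentials`, `SCRAPER_EMAIL`, and `SCRAPER_PASSWORD` are hypothetical names, not part of Playwright.

```ts
interface Credentials {
  email: string
  password: string
}

// Hypothetical helper: read login details from an environment-like object.
// SCRAPER_EMAIL and SCRAPER_PASSWORD are assumed variable names.
function loadCredentials(env: Record<string, string | undefined>): Credentials {
  const email = env['SCRAPER_EMAIL']
  const password = env['SCRAPER_PASSWORD']
  if (!email || !password) {
    throw new Error('Set SCRAPER_EMAIL and SCRAPER_PASSWORD before running the login flow')
  }
  return { email, password }
}
```

In a Node.js script this could be called as `loadCredentials(process.env)`, with the returned `email` and `password` passed to `page.fill`.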
## Pagination
```ts
const allItems: (string | null)[] = []
while (true) {
  const items = await page.$$eval('.item', els => els.map(e => e.textContent))
  allItems.push(...items)
  // Stop when there is no visible "next" link
  const next = page.locator('a.next-page')
  if (!(await next.isVisible())) break
  await next.click()
  await page.waitForLoadState('networkidle')
}
```
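The loop above relies on the "next" link disappearing on the last page. On sites where the link stays visible (for example, rendered as disabled), the loop could run forever. A minimal sketch of a guard is shown below. `PageCollector` is a hypothetical helper, and the default cap of 50 pages is an arbitrary assumption.

```ts
// Hypothetical helper: accumulates items across pages, removes duplicates,
// and tells the caller when to stop paginating.
class PageCollector<T> {
  readonly items: T[] = []
  private readonly seen = new Set<string>()
  private pages = 0
  private readonly keyOf: (item: T) => string
  private readonly maxPages: number

  // keyOf maps an item to a string used for de-duplication.
  // maxPages is a safety cap; 50 is an arbitrary default.
  constructor(keyOf: (item: T) => string, maxPages = 50) {
    this.keyOf = keyOf
    this.maxPages = maxPages
  }

  // Record one page of results.
  // Returns true if the page added something new and the cap is not reached.
  add(pageItems: T[]): boolean {
    this.pages += 1
    let added = 0
    for (const item of pageItems) {
      const key = this.keyOf(item)
      if (this.seen.has(key)) continue
      this.seen.add(key)
      this.items.push(item)
      added += 1
    }
    return added > 0 && this.pages < this.maxPages
  }
}
```

Inside the pagination loop, `if (!collector.add(items)) break` placed before the click on the next link would stop the crawl once a page yields nothing new or the cap is hit.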
## Intercept network (skip images/ads)
```ts
await page.route('**/*.{png,jpg,gif,svg,woff2}', route => route.abort())
await page.route('**/ads/**', route => route.abort())

// Modify request: add an Authorization header to API calls
await page.route('**/api/**', async route => {
  await route.continue({
    headers: { ...route.request().headers(), Authorization: 'Bearer token' },
  })
})
```
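Extension globs miss assets served without a file extension. An alternative is to decide per request using the resource type that Playwright reports via `route.request().resourceType()`. The sketch below keeps that decision in a pure function so it can be checked without a browser. `shouldBlock` is a hypothetical helper, and the blocked types and host names are illustrative assumptions, not a vetted blocklist.

```ts
// Illustrative choices only; adjust to the target site.
const BLOCKED_TYPES = new Set(['image', 'font', 'media'])
const BLOCKED_HOSTS = ['doubleclick.net', 'googlesyndication.com']

// Hypothetical helper: return true if a request should be aborted.
function shouldBlock(url: string, resourceType: string): boolean {
  if (BLOCKED_TYPES.has(resourceType)) return true
  let host: string
  try {
    host = new URL(url).hostname
  } catch (e) {
    return false // not a parseable URL: let the request through
  }
  // Match the host itself or any subdomain of it.
  return BLOCKED_HOSTS.some(h => host === h || host.endsWith('.' + h))
}
```

It could be wired in with a single catch-all handler: `page.route('**/*', route => shouldBlock(route.request().url(), route.request().resourceType()) ? route.abort() : route.continue())`.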
## Python example
```python
import asyncio

from playwright.async_api import async_playwright


async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://example.com')
        # Collect the text of every <h2> on the page
        titles = await page.locator('h2').all_text_contents()
        await browser.close()
        return titles


results = asyncio.run(scrape())
```
## Stealth (reduce detection)
```bash
npm install playwright-extra puppeteer-extra-plugin-stealth
```
```ts
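import { chromium } from 'playwright-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

// Register the plugin once, then launch as usual
chromium.use(StealthPlugin())
const browser = await chromium.launch({ headless: true })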
```
## Screenshots & PDF
```ts
await page.screenshot({ path: 'screenshot.png', fullPage: true })
// page.pdf() is supported only in Chromium
await page.pdf({ path: 'page.pdf', format: 'A4', printBackground: true })
```