Dynamic page scraping, SPA crawling, form interaction, screenshots, stealth mode.

Tags: playwright, scraping, automation, javascript, python
# Playwright Browser Automation

## Install
```bash
# Node.js
npm install playwright
npx playwright install chromium

# Python
pip install playwright
playwright install chromium
```

## Basic scraping (Node.js)
```ts
import { chromium } from 'playwright'

const browser = await chromium.launch({ headless: true })
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  viewport: { width: 1280, height: 720 },
  locale: 'zh-CN',
})
const page = await context.newPage()

await page.goto('https://example.com', { waitUntil: 'networkidle' })

// Wait for element
await page.waitForSelector('.product-list')

// Extract data
const items = await page.$$eval('.product-card', cards =>
  cards.map(card => ({
    title: card.querySelector('h3')?.textContent?.trim(),
    price: card.querySelector('.price')?.textContent?.trim(),
    link: card.querySelector('a')?.href,
  }))
)

await browser.close()
console.log(items)
```
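The fields extracted above are raw strings, and any of them can be `undefined` when a card lacks the matching element. A minimal post-processing sketch follows, assuming prices look like `$1,299.00` or `¥88` with `.` as the decimal separator. `RawItem`, `CleanItem`, `parsePrice`, and `cleanItems` are hypothetical names for illustration, not part of Playwright.

```typescript
// Shape produced by the $$eval callback above; every field may be missing.
interface RawItem {
  title?: string
  price?: string
  link?: string
}

interface CleanItem {
  title: string
  price: number | null
  link: string
}

// Hypothetical helper: strip currency symbols and thousands separators.
// Assumes "." is the decimal separator, which does not hold for every locale.
function parsePrice(text: string | undefined): number | null {
  if (!text) return null
  const digits = text.replace(/[^0-9.]/g, '')
  if (digits === '') return null
  const value = Number.parseFloat(digits)
  return Number.isNaN(value) ? null : value
}

// Drop cards without a title or link; keep cards whose price failed to parse.
function cleanItems(items: RawItem[]): CleanItem[] {
  const out: CleanItem[] = []
  for (const item of items) {
    if (!item.title || !item.link) continue
    out.push({ title: item.title, price: parsePrice(item.price), link: item.link })
  }
  return out
}
```

Calling `cleanItems(items)` after `browser.close()` keeps the in-page callback small and lets the parsing logic be unit-tested without a browser.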

## SPA: wait for data load
```ts
// Wait for network request to complete
const [response] = await Promise.all([
  page.waitForResponse(r => r.url().includes('/api/products')),
  page.goto('https://example.com/products'),
])
const data = await response.json()

// Or wait until more than 10 items have rendered
await page.waitForFunction(() =>
  document.querySelectorAll('.item').length > 10
)
```
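A fixed threshold such as `length > 10` never resolves on pages that legitimately hold fewer items. One alternative is to poll until the count stops changing. The sketch below is a hypothetical helper, not a Playwright API; the option names and default values are assumptions chosen for illustration.

```typescript
// Hypothetical helper: resolves once `getCount` returns the same value
// `stableChecks` times in a row, or throws after roughly `timeoutMs`.
async function waitForStableCount(
  getCount: () => Promise<number>,
  { intervalMs = 250, stableChecks = 3, timeoutMs = 10000 } = {}
): Promise<number> {
  const deadline = Date.now() + timeoutMs
  let last = await getCount()
  let streak = 1 // consecutive reads that returned `last`
  while (streak < stableChecks) {
    if (Date.now() > deadline) {
      throw new Error(`count did not stabilize within ${timeoutMs} ms`)
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
    const current = await getCount()
    streak = current === last ? streak + 1 : 1
    last = current
  }
  return last
}

// Usage with Playwright (locator.count() returns Promise<number>):
//   const total = await waitForStableCount(() => page.locator('.item').count())
```

Note that an empty list is also "stable" at zero, so pair this with `page.waitForSelector('.item')` when at least one item is expected.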

## Form interaction & login
```ts
await page.fill('input[name="email"]', 'user@example.com')
await page.fill('input[name="password"]', 'password123')
await page.click('button[type="submit"]')
await page.waitForURL('**/dashboard')

// Save cookies and localStorage so later runs can skip the login form
await context.storageState({ path: 'auth.json' })
// Reuse later: browser.newContext({ storageState: 'auth.json' })
```
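On the first run there is no saved state file yet, so passing `storageState: 'auth.json'` unconditionally would fail. A small sketch of one way to handle that, assuming the state lives at a local path; `authContextOptions` is a hypothetical helper name.

```typescript
import { existsSync } from 'node:fs'

// Hypothetical helper: build the options object for browser.newContext().
// Returns { storageState: path } only when the saved file exists, so the
// first run falls through to a fresh, logged-out context.
function authContextOptions(path: string): { storageState?: string } {
  return existsSync(path) ? { storageState: path } : {}
}

// Usage with Playwright:
//   const context = await browser.newContext(authContextOptions('auth.json'))
```

Saved sessions expire, so it is still worth checking after navigation that the page did not redirect back to the login form, and repeating the login steps if it did.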

## Pagination
```ts
const allItems = []
while (true) {
  const items = await page.$$eval('.item', els => els.map(e => e.textContent))
  allItems.push(...items)
  const next = page.locator('a.next-page')
  if (!(await next.isVisible())) break
  await next.click()
  await page.waitForLoadState('networkidle')
}
```
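The `while (true)` loop above never ends if the next-page link stays visible on the last page. A sketch of the same loop with a page cap follows, written against a small hypothetical `Pager` interface so it can be exercised without a browser. The interface, `collectAll`, and the default of 50 pages are assumptions for illustration.

```typescript
// Hypothetical abstraction over the three page operations the loop needs.
interface Pager<T> {
  readItems(): Promise<T[]>   // e.g. page.$$eval('.item', els => ...)
  hasNext(): Promise<boolean> // e.g. page.locator('a.next-page').isVisible()
  goNext(): Promise<void>     // e.g. click the link, then wait for new content
}

// Collect items page by page, stopping at the last page or after `maxPages`.
async function collectAll<T>(pager: Pager<T>, maxPages = 50): Promise<T[]> {
  const all: T[] = []
  for (let pageNo = 1; pageNo <= maxPages; pageNo++) {
    all.push(...(await pager.readItems()))
    // Check the cap first so no extra navigation happens after the final read.
    if (pageNo === maxPages || !(await pager.hasNext())) break
    await pager.goNext()
  }
  return all
}
```

With Playwright, the three methods can wrap the calls shown in the loop above; the cap turns a potential infinite crawl into a bounded one.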

## Intercept network (skip images/ads)
```ts
await page.route('**/*.{png,jpg,gif,svg,woff2}', route => route.abort())
await page.route('**/ads/**', route => route.abort())
// Modify request
await page.route('**/api/**', async route => {
  await route.continue({ headers: { ...route.request().headers(), Authorization: 'Bearer token' } })
})
```
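Extension globs miss images served without a file extension or with a query string such as `photo.png?v=2`. An alternative is to decide per request using the resource type Playwright reports. The sketch below assumes a small blocklist; `shouldBlock` and the listed URL fragments are hypothetical and should be adjusted per site.

```typescript
// Resource types as reported by request.resourceType() in Playwright.
const BLOCKED_TYPES = new Set(['image', 'media', 'font'])

// Hypothetical URL fragments for ad or tracking endpoints.
const BLOCKED_URL_PARTS = ['/ads/', 'doubleclick.net', 'googletagmanager.com']

// Pure predicate, so the blocking policy can be tested without a browser.
function shouldBlock(resourceType: string, url: string): boolean {
  if (BLOCKED_TYPES.has(resourceType)) return true
  return BLOCKED_URL_PARTS.some(part => url.includes(part))
}

// Usage with Playwright:
//   await page.route('**/*', route => {
//     const req = route.request()
//     return shouldBlock(req.resourceType(), req.url())
//       ? route.abort()
//       : route.continue()
//   })
```

Routing every request through one handler adds a little overhead per request, which is usually outweighed by the bandwidth saved on image-heavy pages.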

## Python example
```python
from playwright.async_api import async_playwright
import asyncio

async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://example.com')
        titles = await page.locator('h2').all_text_contents()
        await browser.close()
        return titles

results = asyncio.run(scrape())
```

## Stealth (avoid detection)
```bash
npm install playwright-extra puppeteer-extra-plugin-stealth
```
```ts
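import { chromium } from 'playwright-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

// Register the plugin before launching; launch() and the rest of the API are used as usual
chromium.use(StealthPlugin())
const browser = await chromium.launch({ headless: true })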
```

## Screenshots & PDF
```ts
await page.screenshot({ path: 'screenshot.png', fullPage: true })
// page.pdf() is supported only in Chromium
await page.pdf({ path: 'page.pdf', format: 'A4', printBackground: true })
```
