Back to Blog

How to extract thanked people from a blog

Extract thanked people from a blog using Web Transpose Crawl and AI Web Scraper

4 Dec 2023 • 1 min read • < 1 min coding time

Mike Gee





Context

Why would you want to extract the thanked people from a blog?

You can derive interesting relationships between people by extracting the thanked people from blogs.

Here's a visualization of the data for Paul Graham: PG Number.

Extracting Thanked People Using Web Transpose

Defining a Schema

First, we define a schema. We can add additional categories to just the people thanked.

schema = { 'people thanked': { 'type': 'array', 'items': { 'person name': 'string', 'reason mentioned': ['thanked for reading drafts of this blog / essay', 'other kind of praise', 'other reason'] } } }

Extraction using Web Transpose AI Web Scraper

Normally, the AI Web Scraper will generate a web scraper that can be re-used on the same website giving you minimal latency. However, in cases where there isn't a standard format, it will just extract the requested data.



Python

import webtranspose as webt os.environ['WEB_TRANSPOSE_API_KEY'] = "YOUR WEBT API KEY" scraper = webt.Scraper(schema, render_js=False) print(scraper.scrape(url))


Tailwind / JS

A Tailwind & JS SDK is currently in development. You can use the API in the meantime.

// Build AI Web Scraper const options = { method: 'POST', headers: {'X-API-Key': 'YOUR_API_KEY', 'Content-Type': 'application/json'}, body: '{"render_js":false,"schema":schema,"name":"my-scraper"}' }; const scraper_id = fetch('https://api.webtranspose.com/v1/scraper/create', options) .then(response => response.json()) .then(data => data.scraper_id); // Run AI Web Scraper const url = 'https://my-blog-url.com' const options = { method: 'POST', headers: {'X-API-Key': 'YOUR_API_KEY', 'Content-Type': 'application/json'}, body: '{"scraper_id":scraper_id,"url":url}' }; fetch('https://api.webtranspose.com/v1/scraper/scrape', options) .then(response => response.json()) .then(response => console.log(response)) .catch(err => console.error(err));

Complete

You too can contribute to this dataset on Github here.