Tutorial: Extract News Headlines from BBC News

Learn how to scrape news headlines from the BBC News website using the Easy Scrape API. No coding experience required!

What You'll Learn

In this tutorial, you'll discover how to:

  • Extract news headlines from a live website
  • Use AI to generate scraping code automatically
  • Test your scraping with Postman
  • Handle dynamic content with Easy Scrape API

Prerequisites

  • Postman installed on your computer
  • Easy Scrape API subscription on RapidAPI
  • Your RapidAPI key ready

Step 1: Understand What We're Scraping

We'll scrape news headlines from BBC News (https://www.bbc.com/news) because:

  • ✅ It's a public website
  • ✅ It has clear, structured content
  • ✅ Headlines are easy to identify
  • ✅ It's a real-world use case

Step 2: Get the Page HTML for AI Analysis

First, we need to get the HTML structure so AI can help us create the scraping code.

2.1 Set Up Basic Postman Request

  1. Open Postman and create a new request
  2. Set the method to POST
  3. Set the URL to: https://easy-scrape-api.p.rapidapi.com/api/scrape

2.2 Add Headers

Go to the Headers tab and add:

Key               Value
X-RapidAPI-Key    YOUR_RAPIDAPI_KEY
X-RapidAPI-Host   easy-scrape-api.p.rapidapi.com
Content-Type      application/json

2.3 Get Page HTML

In the Body tab (select raw, then JSON), add:

{
  "url": "https://www.bbc.com/news",
  "outputFormat": "html"
}

Click Send to get the page HTML. Save the HTML from the response as a file (e.g., bbc_news.html) and then upload this file to your AI assistant for analysis.
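
If you later want to send the same request without Postman, it could be sketched in JavaScript roughly as follows. This assumes Node.js 18 or newer, where fetch is built in. The endpoint, headers, and body come from Steps 2.1 to 2.3; the helper names buildScrapeRequest and fetchPageHtml are made up for this example and are not part of the Easy Scrape API. The response format is not assumed, so the raw text is returned as-is.

```javascript
// Endpoint and header names are taken from Steps 2.1 and 2.2 of this tutorial.
const ENDPOINT = 'https://easy-scrape-api.p.rapidapi.com/api/scrape';

// Hypothetical helper: builds the fetch() arguments for the request in Step 2.3.
function buildScrapeRequest(apiKey, targetUrl) {
  return {
    endpoint: ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'X-RapidAPI-Key': apiKey,
        'X-RapidAPI-Host': 'easy-scrape-api.p.rapidapi.com',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url: targetUrl, outputFormat: 'html' }),
    },
  };
}

// Hypothetical helper: sends the request and returns the raw response text.
// Needs a runtime with a global fetch (Node.js 18+ or a browser).
async function fetchPageHtml(apiKey, targetUrl) {
  const { endpoint, options } = buildScrapeRequest(apiKey, targetUrl);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

Calling fetchPageHtml('YOUR_RAPIDAPI_KEY', 'https://www.bbc.com/news') and saving the result as bbc_news.html would mirror the manual save step above.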

Step 3: Generate Scraping Code with AI

Now we'll use AI to create the JavaScript code for extracting headlines.

3.1 AI Prompt Template

Upload the HTML file you saved in Step 2 to your favorite AI assistant (ChatGPT, Claude, Gemini, etc.) and use this prompt:

I want to scrape news headlines from the BBC News website. I've uploaded an HTML file containing the page structure.

Please analyze the HTML file and write JavaScript code for the Easy Scrape API that:
1. Waits for the page to fully load
2. Extracts all news headlines from the page
3. Returns them as an array of objects with title and link
4. Uses the 'page' object (Puppeteer v24) and 'cheerio' for parsing
5. Includes console.log statements for debugging
6. Contains no inline comments or block comments

AI Code Cleanup

The AI might generate unnecessary code, such as imports (const puppeteer = require('puppeteer')) or navigation statements (await page.goto('https://...')).

Remove any code that includes:

  • Import statements (puppeteer, cheerio imports)
  • page.goto() calls
  • Browser launch/close code

Only copy the code that starts AFTER any page.goto() statements. The Easy Scrape API handles navigation automatically.

Remember: Your script must always include a return statement with your data!
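
The cleanup rules above can also be applied mechanically. The following is only a rough sketch: it filters line by line, so it will miss statements that span several lines, and the helper names cleanAiScript and hasReturnStatement are invented for this example. Always read the result before using it.

```javascript
// Hypothetical helper: drops lines the Easy Scrape API does not need.
// Line-based, so it only handles single-line statements.
function cleanAiScript(source) {
  const unwanted = [
    /\brequire\s*\(/,           // CommonJS imports such as require('puppeteer')
    /^\s*import\s/,             // ES module imports
    /\bpage\.goto\s*\(/,        // navigation, handled by the API
    /\bpuppeteer\.launch\s*\(/, // browser launch
    /\bbrowser\.newPage\s*\(/,  // page creation
    /\bbrowser\.close\s*\(/,    // browser shutdown
  ];
  return source
    .split('\n')
    .filter((line) => !unwanted.some((pattern) => pattern.test(line)))
    .join('\n')
    .trim();
}

// Hypothetical check for the reminder above: the script needs a return statement.
function hasReturnStatement(source) {
  return /\breturn\b/.test(source);
}
```

If hasReturnStatement reports false for the cleaned script, ask the AI to add a return statement before moving on.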

3.2 Example AI-Generated Code

The AI might generate something like this:

await page.waitForSelector('h3');
console.log('Page loaded, starting headline extraction...');

const html = await page.content();
const $ = cheerio.load(html);

const headlines = [];

$('h3').each((index, element) => {
  const $element = $(element);
  const title = $element.text().trim();
  const link = $element.find('a').attr('href') || $element.closest('a').attr('href');

  if (title && title.length > 10) {
    headlines.push({
      title: title,
      link: link ? (link.startsWith('http') ? link : `https://www.bbc.com${link}`) : null
    });
  }
});

console.log(`Found ${headlines.length} headlines`);

return headlines.slice(0, 10);
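
The link handling in this example prepends https://www.bbc.com whenever the href does not start with 'http'. As an alternative sketch, the standard URL constructor can resolve a relative link against a base address, which also covers protocol-relative links such as //example.com/page. This assumes the URL global is available where the script runs; that is true in Node.js and browsers but has not been confirmed for Easy Scrape API scripts. The name toAbsoluteUrl is invented for this example.

```javascript
// Hypothetical helper: resolves a possibly relative href against a base URL.
// Returns null when there is no link or when it cannot be parsed.
function toAbsoluteUrl(link, base = 'https://www.bbc.com') {
  if (!link) return null;
  try {
    return new URL(link, base).href;
  } catch (error) {
    return null;
  }
}

console.log(toAbsoluteUrl('/news/article-12345'));
console.log(toAbsoluteUrl('https://www.bbc.com/news/article-67890'));
```

Absolute links pass through unchanged, so the helper can be applied to every href without checking its form first.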

Step 4: Convert Code for Postman

Use our Online Code Parser Tool to convert the AI-generated code:

  1. Go to the Online Code Parser Tool
  2. Select "Postman" from the dropdown
  3. Paste the AI-generated JavaScript code
  4. Click "Convert"
  5. Copy the converted string
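
The conversion is needed because the script has to travel inside a JSON string, so quotes, backslashes, and line breaks must be escaped. How the Online Code Parser Tool does this internally is not documented here. As an assumption about what such a conversion involves, JavaScript's built-in JSON.stringify performs the same kind of escaping; toJsonStringLiteral is a name invented for this sketch.

```javascript
// Hypothetical helper: escapes a multi-line script for use as a JSON string value.
function toJsonStringLiteral(script) {
  return JSON.stringify(script);
}

// A short stand-in for the AI-generated script.
const script = [
  "await page.waitForSelector('h3');",
  'console.log("Page loaded");',
  'return [];',
].join('\n');

// The literal includes the surrounding double quotes and escaped line breaks.
const literal = toJsonStringLiteral(script);
console.log(literal);

// Building the whole request body with JSON.stringify avoids hand-escaping.
const body = JSON.stringify({
  url: 'https://www.bbc.com/news',
  script,
  outputFormat: 'json',
});
console.log(body);
```

Parsing the literal with JSON.parse returns the original script, which is a quick way to check that nothing was lost in conversion.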

Step 5: Test with Easy Scrape API

5.1 Update Postman Request

Back in Postman, update your request body:

{
  "url": "https://www.bbc.com/news",
  "script": "PASTE_YOUR_CONVERTED_CODE_HERE",
  "outputFormat": "json"
}

5.2 Send the Request

Click Send and you should see a response like:

{
  "message": "Script executed successfully",
  "data": [
    {
      "title": "Breaking: Major news story headline here",
      "link": "https://www.bbc.com/news/article-12345"
    },
    {
      "title": "Another important news headline",
      "link": "https://www.bbc.com/news/article-67890"
    }
  ],
  "executionTime": 2500,
  "logs": [
    "Page loaded, starting headline extraction...",
    "Found 15 headlines"
  ]
}
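
Assuming the response keeps the shape shown above (message, data, executionTime, logs), a small sketch for pulling out the useful parts might look like this. The field names come only from the sample response, and summarizeResponse is a name invented for this example.

```javascript
// Hypothetical helper: summarizes a parsed response object.
// Missing or malformed fields fall back to empty arrays.
function summarizeResponse(response) {
  const headlines = Array.isArray(response.data) ? response.data : [];
  const logs = Array.isArray(response.logs) ? response.logs : [];
  return {
    count: headlines.length,
    titles: headlines.map((item) => item.title),
    missingLinks: headlines.filter((item) => !item.link).length,
    logs,
  };
}

// Sample data copied from the example response in this step.
const sample = {
  message: 'Script executed successfully',
  data: [
    { title: 'Breaking: Major news story headline here', link: 'https://www.bbc.com/news/article-12345' },
    { title: 'Another important news headline', link: 'https://www.bbc.com/news/article-67890' },
  ],
  executionTime: 2500,
  logs: ['Page loaded, starting headline extraction...', 'Found 15 headlines'],
};

console.log(summarizeResponse(sample));
```

A non-zero missingLinks count is a hint that the selector is matching headings that are not wrapped in, or do not contain, a link.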

Step 6: Improve Your Script

6.1 Refine with AI

If the results aren't perfect, go back to your AI assistant with:

The code worked but I'm getting some unwanted results. Here's what I got:

[PASTE YOUR ACTUAL RESULTS]

Can you improve the code to:
1. Filter out navigation links and ads
2. Only get main story headlines
3. Make sure all links are complete URLs
4. Limit to the top 5 most important stories

Please update the JavaScript code.

6.2 Test Different Selectors

You can also ask AI to try different approaches:

The previous selectors might not be catching all headlines. Can you create 2-3 different versions of the code that try different CSS selectors for BBC News headlines? I want to test which works best.

Step 7: Real-World Applications

Now that you can scrape BBC News headlines, you can:

7.1 Monitor Breaking News

Set up automated requests to check for new headlines every hour.
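
As a minimal sketch of hourly checks, the standard setInterval timer can run a task on a schedule while a Node.js process stays open. The helper name startPolling is invented for this example, and the task you pass in (for instance, a function that calls the API) is up to you. A scheduler in your automation platform may be a better fit for long-running use.

```javascript
// Hypothetical helper: runs `task` every `intervalMs` milliseconds.
// Returns a function that stops the schedule.
function startPolling(task, intervalMs = 60 * 60 * 1000) {
  const timer = setInterval(() => {
    // Promise.resolve().then(...) catches both thrown errors and rejected promises.
    Promise.resolve()
      .then(task)
      .catch((error) => console.error('Polling run failed:', error.message));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

For example, const stop = startPolling(checkHeadlines) would run a checkHeadlines function of your own once an hour until stop() is called.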

7.2 Create News Alerts

Compare current headlines with previous results to detect new stories.
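
The comparison can be sketched as a set difference on the link field, falling back to the title when a link is missing. The helper name findNewHeadlines is invented for this example, and the objects use the title and link fields from the earlier steps.

```javascript
// Hypothetical helper: returns items in `current` that were not in `previous`.
function findNewHeadlines(previous, current) {
  const keyOf = (item) => item.link || item.title;
  const seen = new Set(previous.map(keyOf));
  return current.filter((item) => !seen.has(keyOf(item)));
}

const earlier = [
  { title: 'Another important news headline', link: 'https://www.bbc.com/news/article-67890' },
];
const latest = [
  { title: 'Breaking: Major news story headline here', link: 'https://www.bbc.com/news/article-12345' },
  { title: 'Another important news headline', link: 'https://www.bbc.com/news/article-67890' },
];

console.log(findNewHeadlines(earlier, latest));
```

Comparing by link rather than title keeps a story from being reported as new when only its headline wording changes.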

7.3 Analyze Trends

Collect headlines over time to analyze trending topics.
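
One simple way to look for trending topics is to count how many headlines mention each word. The sketch below is only a starting point: the stop-word list is a small, arbitrary sample, and countKeywords is a name invented for this example.

```javascript
// A short, illustrative stop-word list; extend it for real use.
const STOP_WORDS = new Set([
  'the', 'and', 'for', 'with', 'from', 'after', 'over', 'that', 'this', 'are', 'was',
]);

// Hypothetical helper: returns the topN [word, count] pairs,
// where count is the number of headlines containing the word.
function countKeywords(titles, topN = 5) {
  const counts = new Map();
  for (const title of titles) {
    const words = title.toLowerCase().match(/[a-z']+/g) || [];
    // A Set ensures each word is counted once per headline.
    for (const word of new Set(words)) {
      if (word.length < 3 || STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
}

const collected = [
  'Storm hits coast',
  'Coast towns flood after storm',
  'Election results due',
];
console.log(countKeywords(collected, 2));
```

Ties are broken alphabetically so that repeated runs over the same data give the same order.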

7.4 Content Curation

Use the headlines as sources for your own news aggregation.

Troubleshooting

Common Issues and Solutions

"No headlines found"

  • Problem: Selectors might be wrong
  • Solution: Ask AI to analyze the HTML again with different approaches

"Wrong elements in the results"

  • Problem: Selector is too broad
  • Solution: Ask AI to make selectors more specific to article headlines

"Incomplete links"

  • Problem: BBC uses relative URLs
  • Solution: The example code already handles this with URL completion

"Too many results"

  • Problem: Script is catching everything
  • Solution: Ask AI to add better filtering conditions
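
Filtering can also happen after the data comes back. The sketch below removes short titles, items without a link, and duplicates. The '/news/' path check is an assumption based on the sample links in Step 5.2 and may not hold for every BBC story, so treat it as adjustable. The name filterHeadlines is invented for this example.

```javascript
// Hypothetical helper: keeps only items that look like article headlines.
function filterHeadlines(headlines, { minTitleLength = 10, pathPart = '/news/' } = {}) {
  const seen = new Set();
  return headlines.filter((item) => {
    if (!item.title || item.title.length <= minTitleLength) return false; // too short
    if (!item.link || !item.link.includes(pathPart)) return false;        // not an article link
    if (seen.has(item.link)) return false;                                // duplicate
    seen.add(item.link);
    return true;
  });
}

const scraped = [
  { title: 'Home', link: 'https://www.bbc.com/' },
  { title: 'A long enough headline here', link: 'https://www.bbc.com/news/article-1' },
  { title: 'A long enough headline here', link: 'https://www.bbc.com/news/article-1' },
  { title: 'Sport story with a long title', link: 'https://www.bbc.com/sport/1' },
  { title: 'Headline without a link at all', link: null },
];

console.log(filterHeadlines(scraped));
```

The same conditions could instead be moved into the scraping script itself, which keeps the API response smaller.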

Next Steps

  1. Try Different News Sites: Apply the same process to CNN, Reuters, or your local news
  2. Add More Data: Extract article summaries, publish dates, or categories
  3. Automate Collection: Set up scheduled runs in your automation platform
  4. Create Alerts: Build a system to notify you of breaking news

Pro Tips

  • 🧠 Be Specific with AI: The more detailed your prompt, the better the code
  • 🔍 Test Selectors: Use browser developer tools to verify CSS selectors
  • 🚀 Start Simple: Get basic headlines working before adding complexity
  • 📊 Check Logs: Use console.log output to debug issues
  • 🔄 Iterate: Don't expect perfect results on the first try

Congratulations! You've successfully created your first web scraper without writing any code yourself. The combination of AI assistance and Easy Scrape API makes web scraping accessible to everyone.