Tutorial: Extract Courses from Coursera
Learn how to scrape course information from Coursera using Easy Scrape API - no coding experience required!
What You'll Learn
In this tutorial, you'll discover how to:
- Navigate through multiple pages of course listings
- Extract detailed course information automatically
- Use AI to generate complex interaction code
- Handle pagination and dynamic content loading
- Build automated course research systems
Prerequisites
Before you start, you'll need:
- A RapidAPI account with access to Easy Scrape API (this provides your X-RapidAPI-Key)
- Postman installed
- Access to an AI assistant (ChatGPT, Claude, Gemini, etc.)
Step 1: Why This Tutorial Shows Real Interaction Power
Unlike static scraping tools that only capture screenshots, Easy Scrape API lets you interact with websites like a real person! In this tutorial, you'll:
- ✅ Navigate to Coursera's course listings
- ✅ Click through pagination (pages 1, 2, 3)
- ✅ Extract course details from each page
- ✅ Collect comprehensive course data across multiple pages
- ✅ Aggregate results into structured format
This demonstrates true browser automation - you're actively using the website, not just reading static content!
We'll scrape courses from Coursera (https://www.coursera.org/courses) because:
- ✅ Rich educational content and course data
- ✅ Clear pagination system to demonstrate navigation
- ✅ Structured course information (titles, ratings, providers)
- ✅ Real-world use case for education research
Step 2: Get the Page HTML for AI Analysis
First, we need to get the HTML structure so AI can help us create the scraping code.
2.1 Set Up Basic Postman Request
- Open Postman and create a new request
Set the method to POST
- Set the URL to: https://easy-scrape-api.p.rapidapi.com/api/scrape
2.2 Add Headers
Go to the Headers tab and add:
| Key | Value |
|---|---|
| X-RapidAPI-Key | YOUR_RAPIDAPI_KEY |
| X-RapidAPI-Host | easy-scrape-api.p.rapidapi.com |
| Content-Type | application/json |
2.3 Get Page HTML
In the Body tab (select raw → JSON), add:
{
"url": "https://www.coursera.org/courses",
"outputFormat": "html"
}
Click Send to get the page HTML. Save the HTML from the response as a file (e.g., coursera_courses.html) and then upload this file to your AI assistant for analysis.
Step 3: Generate Scraping Code with AI
Now we'll use AI to create the JavaScript code for extracting course information.
3.1 AI Prompt Template
Upload the HTML file you saved in Step 2 to your favorite AI assistant (ChatGPT, Claude, Gemini, etc.) and use this prompt:
I want to scrape course information from Coursera. I've uploaded an HTML file containing the page structure.
Please analyze the HTML file and write JavaScript code for the Easy Scrape API that:
NAVIGATION INTERACTIONS:
1. Navigate to Coursera courses page (https://www.coursera.org/courses)
2. Wait for the page to fully load
3. Extract courses from page 1
4. Click on page 2 navigation
5. Wait for page 2 content to load and extract courses
6. Click on page 3 navigation
7. Wait for page 3 content to load and extract courses
COURSE EXTRACTION:
8. Extract course information from each page including:
- Course title
- Course link
- Provider/University name
- Course rating
- Number of students enrolled (if available)
9. Combine results from all pages
10. Return them as an array of objects
TECHNICAL REQUIREMENTS:
- Use 'page' object (Puppeteer v24) for navigation and clicking
- Use 'cheerio' for HTML parsing after each page load
- Include detailed console.log statements for debugging each step
- Handle cases where pagination might not be available
- Don't include inline comments or block comments
The AI might generate unnecessary code, such as import statements (const puppeteer = require('puppeteer')) or navigation calls (await page.goto('https://...')).
Remove any code that includes:
- Import statements (puppeteer, cheerio imports)
- page.goto() calls
- Browser launch/close code
Only copy the code that starts AFTER any page.goto() statements. The Easy Scrape API handles navigation automatically.
Remember: Your script must always include a return statement with your data!
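If you do this clean-up often, the most common boilerplate lines can be stripped mechanically. The helper below is purely illustrative and only catches the patterns listed above; AI output varies, so always review the result by hand:

```javascript
// Rough, illustrative clean-up: drop lines the Easy Scrape API already handles
// (requires, page.goto calls, browser launch/close). Not exhaustive.
function stripBoilerplate(code) {
  return code
    .split('\n')
    .filter((line) =>
      !/require\(['"](puppeteer|cheerio)['"]\)/.test(line) &&
      !/\bpage\.goto\(/.test(line) &&
      !/puppeteer\.launch\(|browser\.close\(/.test(line))
    .join('\n');
}
```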
3.2 Example Interactive AI-Generated Code
The AI should generate something like this:
const allCourses = [];
const COURSE_ITEM_SELECTOR = '.cds-ProductCard-body';

const extractCourses = (html, pageNumber) => {
  const $ = cheerio.load(html);
  const courses = [];
  $(COURSE_ITEM_SELECTOR).each((index, element) => {
    try {
      const $el = $(element);
      const linkEl = $el.find('a[data-track-component="product_card"]');
      const courseLink = linkEl.attr('href') ? `https://www.coursera.org${linkEl.attr('href')}` : 'N/A';
      const courseTitle = $el.find('.cds-ProductCard-header-product-name').text().trim() ||
        $el.find('h3').text().trim() ||
        'Title Not Found';
      const providerName = $el.find('.cds-ProductCard-partnerNames').text().trim() ||
        $el.find('.partnerNames').text().trim() ||
        'Provider Not Found';
      const ratingText = $el.find('.ratings-text').text().trim() ||
        $el.find('.rc-ProductCard-ratings-count').text().trim() ||
        'Rating N/A';
      let courseRating = 'N/A';
      let enrolledStudents = 'N/A';
      if (ratingText) {
        const match = ratingText.match(/([\d.]+)\s+\((.+)\)/);
        if (match) {
          courseRating = match[1];
        }
      }
      const enrollmentMatch = $el.text().match(/([\d.,]+[KkMm]?)\s+students/i);
      if (enrollmentMatch) {
        enrolledStudents = enrollmentMatch[1];
      }
      courses.push({
        page: pageNumber,
        courseTitle,
        courseLink,
        providerName,
        courseRating,
        enrolledStudents,
      });
    } catch (error) {
      console.error(`Error processing course item on page ${pageNumber}: ${error.message}`);
    }
  });
  console.log(`-> Successfully extracted ${courses.length} courses from Page ${pageNumber}`);
  return courses;
};

console.log('2. Waiting for page 1 content to load...');
await page.waitForSelector(COURSE_ITEM_SELECTOR, { visible: true });

console.log('3. Extracting courses from Page 1');
let html = await page.content();
allCourses.push(...extractCourses(html, 1));

const page2Selector = 'button[aria-label="Go to page 2"]';
console.log(`4. Clicking on page 2 navigation using selector: ${page2Selector}`);
const page2Button = await page.$(page2Selector);
if (page2Button) {
  await page2Button.click();
  console.log('5. Waiting for page 2 content to load (waiting for network idle)...');
  await page.waitForNetworkIdle({ idleTime: 1000, timeout: 30000 });
  console.log('5. Extracting courses from Page 2');
  html = await page.content();
  allCourses.push(...extractCourses(html, 2));
} else {
  console.log('4/5. Page 2 navigation element not found. Stopping after page 1.');
  return allCourses;
}

const page3Selector = 'button[aria-label="Go to page 3"]';
console.log(`6. Clicking on page 3 navigation using selector: ${page3Selector}`);
const page3Button = await page.$(page3Selector);
if (page3Button) {
  await page3Button.click();
  console.log('7. Waiting for page 3 content to load (waiting for network idle)...');
  await page.waitForNetworkIdle({ idleTime: 1000, timeout: 30000 });
  console.log('7. Extracting courses from Page 3');
  html = await page.content();
  allCourses.push(...extractCourses(html, 3));
} else {
  console.log('6/7. Page 3 navigation element not found. Stopping after page 2.');
}

console.log(`9. Combining results. Total courses extracted: ${allCourses.length}`);
return allCourses;
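The rating-parsing regex in the script above can be checked in isolation. Here it is as a standalone helper, assuming the combined rating text looks like "4.8 (3.2k reviews)" (an assumption about how Coursera's cards render, which may change):

```javascript
// Split a combined ratings string such as "4.8 (3.2k reviews)" into its parts.
// The parenthesized group is treated as the review count; anything that does
// not match the assumed "<rating> (<count>)" shape falls back to 'N/A'.
function parseRatingText(ratingText) {
  const match = ratingText.match(/([\d.]+)\s+\((.+)\)/);
  if (!match) {
    return { courseRating: 'N/A', reviewCount: 'N/A' };
  }
  return { courseRating: match[1], reviewCount: match[2] };
}
```

Testing small pieces like this locally in Node makes it much easier to tell a selector problem from a parsing problem.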
Step 4: Convert Code for Postman
Use our Online Code Parser Tool to convert the AI-generated code:
- Go to Online Code Parser Tool
- Select "Postman" from the dropdown
- Paste the AI-generated JavaScript code
- Click "Convert"
- Copy the converted string
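Under the hood, this conversion is mostly about turning a multi-line script into a JSON-safe single-line string. If you ever need to do it by hand, JSON.stringify does the same job (this mirrors the idea behind the tool; the tool's exact output may differ):

```javascript
// Turn a multi-line script into a JSON-safe string for the "script" field.
const script = `console.log('step 1');
return allCourses;`;

// JSON.stringify escapes newlines and quotes; slice(1, -1) drops the outer
// quotes so the result can be pasted between the quotes in the Postman body.
const escaped = JSON.stringify(script).slice(1, -1);
```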
Step 5: Test with Easy Scrape API
5.1 Update Postman Request
Back in Postman, update your request body:
{
"url": "https://www.coursera.org/courses",
"script": "PASTE_YOUR_CONVERTED_CODE_HERE",
"outputFormat": "json"
}
5.2 Send the Request
Click Send and you should see a response showing interactive navigation across multiple pages:
{
  "message": "Script executed successfully",
  "data": [
    {
      "page": 1,
      "courseTitle": "Machine Learning",
      "courseLink": "https://www.coursera.org/learn/machine-learning",
      "providerName": "Stanford University",
      "courseRating": "4.9",
      "enrolledStudents": "N/A"
    },
    {
      "page": 1,
      "courseTitle": "Deep Learning Specialization",
      "courseLink": "https://www.coursera.org/specializations/deep-learning",
      "providerName": "DeepLearning.AI",
      "courseRating": "4.8",
      "enrolledStudents": "N/A"
    },
    {
      "page": 2,
      "courseTitle": "Python for Everybody Specialization",
      "courseLink": "https://www.coursera.org/specializations/python",
      "providerName": "University of Michigan",
      "courseRating": "4.8",
      "enrolledStudents": "N/A"
    },
    {
      "page": 2,
      "courseTitle": "Google Data Analytics Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/google-data-analytics",
      "providerName": "Google",
      "courseRating": "4.6",
      "enrolledStudents": "N/A"
    },
    {
      "page": 3,
      "courseTitle": "IBM Data Science Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/ibm-data-science",
      "providerName": "IBM",
      "courseRating": "4.5",
      "enrolledStudents": "N/A"
    },
    {
      "page": 3,
      "courseTitle": "Meta Social Media Marketing Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/facebook-social-media-marketing",
      "providerName": "Meta",
      "courseRating": "4.7",
      "enrolledStudents": "N/A"
    }
  ],
  "executionTime": 12500,
  "logs": [
    "2. Waiting for page 1 content to load...",
    "3. Extracting courses from Page 1",
    "-> Successfully extracted 6 courses from Page 1",
    "4. Clicking on page 2 navigation using selector: button[aria-label=\"Go to page 2\"]",
    "5. Waiting for page 2 content to load (waiting for network idle)...",
    "5. Extracting courses from Page 2",
    "-> Successfully extracted 6 courses from Page 2",
    "6. Clicking on page 3 navigation using selector: button[aria-label=\"Go to page 3\"]",
    "7. Waiting for page 3 content to load (waiting for network idle)...",
    "7. Extracting courses from Page 3",
    "-> Successfully extracted 6 courses from Page 3",
    "9. Combining results. Total courses extracted: 18"
  ]
}
Only two courses per page are shown above for brevity; the full response contains all 18.
🎉 This demonstrates advanced interactive navigation! You just automated:
- ✅ Multi-Page Navigation: Clicking through Coursera's pagination system
- ✅ Course Extraction: Collecting 6 courses from each of 3 pages
- ✅ Data Aggregation: Combining results from all pages into a single dataset
- ✅ Page Tracking: Knowing which page each course came from
- ✅ Comprehensive Coverage: Getting 18 total courses across multiple pages
Step 6: Improve Your Script
6.1 Refine with AI
If the results aren't perfect, go back to your AI assistant with:
The code worked but I'm getting some unwanted results. Here's what I got:
[PASTE YOUR ACTUAL RESULTS]
Can you improve the code to:
1. Filter out non-course items
2. Only get actual course listings
3. Make sure all links are complete URLs
4. Limit to the top 5 most relevant courses
Please update the JavaScript code.
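If you'd rather apply the filtering yourself instead of re-prompting, a small post-processing step on the returned array works too. The field names match the example script above, and the "top 5" cutoff is arbitrary:

```javascript
// Keep only entries with a real title and a complete URL, then cap the list.
// Relies on the 'Title Not Found' fallback used by the example script.
function filterCourses(courses, limit = 5) {
  return courses
    .filter((c) =>
      c.courseTitle !== 'Title Not Found' &&
      c.courseLink.startsWith('https://'))
    .slice(0, limit);
}
```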
6.2 Test Different Selectors
You can also ask AI to try different approaches:
The previous selectors might not be catching all course listings. Can you create 2-3 different versions of the code that try different CSS selectors for Coursera course listings? I want to test which works best.
Troubleshooting
Common Issues and Solutions
"No courses found"
- Problem: Selectors might be wrong
- Solution: Ask AI to analyze the HTML again with different approaches
"Getting non-course items"
- Problem: Selector is too broad
- Solution: Ask AI to make selectors more specific to course listings
"Links are incomplete"
- Problem: Coursera uses relative URLs
- Solution: The example code already handles this with URL completion
"Too many results"
- Problem: Script is catching everything
- Solution: Ask AI to add better filtering conditions
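On the "Links are incomplete" point above: the example script completes URLs by string concatenation, which works for simple relative paths. JavaScript's built-in URL class is a more robust alternative, since it also handles absolute hrefs and paths without a leading slash (a sketch, not what the example script currently uses):

```javascript
// Resolve a possibly-relative href against the page's base URL.
// Absolute hrefs pass through unchanged; missing hrefs become 'N/A'.
function toAbsoluteUrl(href, base = 'https://www.coursera.org') {
  if (!href) return 'N/A';
  return new URL(href, base).href;
}
```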
Next Steps
- Try Different Educational Sites: Apply the same process to edX, Udacity, or Khan Academy
- Add More Data: Extract course descriptions, durations, or categories
- Automate Collection: Set up scheduled runs in your automation platform
- Create Alerts: Build a system to notify you of new or updated courses
Pro Tips
- 🧠 Be Specific with AI: The more detailed your prompt, the better the code
- 🔍 Test Selectors: Use browser developer tools to verify CSS selectors
- 🚀 Start Simple: Get basic course extraction working before adding complexity
- 📊 Check Logs: Use console.log output to debug issues
- 🔄 Iterate: Don't expect perfect results on the first try
Congratulations! You've successfully created your first web scraper for course information. The combination of AI assistance and Easy Scrape API makes web scraping accessible and powerful for everyone.