Tutorial: Extract Courses from Coursera
Learn how to scrape course information from Coursera using Easy Scrape API - no coding experience required!
What You'll Learn
In this tutorial, you'll discover how to:
- Navigate through multiple pages of course listings
- Extract detailed course information automatically
- Use AI to generate complex interaction code
- Handle pagination and dynamic content loading
- Build automated course research systems
Prerequisites
Before you start, you'll need:
- A RapidAPI account with access to Easy Scrape API (this provides your X-RapidAPI-Key)
- Postman installed
- Access to an AI assistant (ChatGPT, Claude, Gemini, etc.)
Step 1: Why This Tutorial Shows Real Interaction Power
Unlike static scraping tools that only capture screenshots, Easy Scrape API lets you interact with websites like a real person! In this tutorial, you'll:
- ✅ Navigate to Coursera's course listings
- ✅ Click through pagination (pages 1, 2, 3)
- ✅ Extract course details from each page
- ✅ Collect comprehensive course data across multiple pages
- ✅ Aggregate results into structured format
This demonstrates true browser automation - you're actively using the website, not just reading static content!
We'll scrape courses from Coursera (https://www.coursera.org/courses) because:
- ✅ Rich educational content and course data
- ✅ Clear pagination system to demonstrate navigation
- ✅ Structured course information (titles, ratings, providers)
- ✅ Real-world use case for education research
Step 2: Get the Page HTML for AI Analysis
First, we need to get the HTML structure so AI can help us create the scraping code.
2.1 Set Up Basic Postman Request
- Open Postman and create a new request
Set the method to POST
- Set the URL to: https://easy-scrape-api.p.rapidapi.com/api/scrape
2.2 Add Headers
Go to the Headers tab and add:
| Key | Value |
|---|---|
| X-RapidAPI-Key | YOUR_RAPIDAPI_KEY |
| X-RapidAPI-Host | easy-scrape-api.p.rapidapi.com |
| Content-Type | application/json |
2.3 Get Page HTML
In the Body tab (select raw → JSON), add:
{
"url": "https://www.coursera.org/courses",
"outputFormat": "html"
}
Click Send to get the page HTML. Save the HTML from the response as a file (e.g., coursera_courses.html) and then upload this file to your AI assistant for analysis.
Step 3: Generate Scraping Code with AI
Now we'll use AI to create the JavaScript code for extracting course information.
3.1 AI Prompt Template
Upload the HTML file you saved in Step 2 to your favorite AI assistant (ChatGPT, Claude, Gemini, etc.) and use this prompt:
I want to scrape course information from Coursera. I've uploaded an HTML file containing the page structure.
Please analyze the HTML file and write JavaScript code for the Easy Scrape API that:
NAVIGATION INTERACTIONS:
1. Navigate to Coursera courses page (https://www.coursera.org/courses)
2. Wait for the page to fully load
3. Extract courses from page 1
4. Click on page 2 navigation
5. Wait for page 2 content to load and extract courses
6. Click on page 3 navigation
7. Wait for page 3 content to load and extract courses
COURSE EXTRACTION:
8. Extract course information from each page including:
- Course title
- Course link
- Provider/University name
- Course rating
- Number of students enrolled (if available)
9. Combine results from all pages
10. Return them as an array of objects
TECHNICAL REQUIREMENTS:
- Use 'page' object (Puppeteer v24) for navigation and clicking
- Use 'cheerio' for HTML parsing after each page load
- Include detailed console.log statements for debugging each step
- Handle cases where pagination might not be available
- Don't include inline comments or block comments
The AI might generate unnecessary code, such as import statements (const puppeteer = require('puppeteer')) or navigation calls (await page.goto('https://...')).
Remove any code that includes:
- Import statements (puppeteer, cheerio imports)
- page.goto() calls
- Browser launch/close code
Only copy the code that starts AFTER any page.goto() statements. The Easy Scrape API handles navigation automatically.
Remember: Your script must always include a return statement with your data!
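If you do this clean-up often, the most common boilerplate lines can be stripped mechanically. The helper below is purely illustrative and only catches the patterns listed above; AI output varies, so always review the result by hand:

```javascript
// Rough, illustrative clean-up: drop lines the Easy Scrape API already handles
// (requires, page.goto calls, browser launch/close). Not exhaustive.
function stripBoilerplate(code) {
  return code
    .split('\n')
    .filter((line) =>
      !/require\(['"](puppeteer|cheerio)['"]\)/.test(line) &&
      !/\bpage\.goto\(/.test(line) &&
      !/puppeteer\.launch\(|browser\.close\(/.test(line))
    .join('\n');
}
```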
3.2 Example Interactive AI-Generated Code
The AI should generate something like this:
const allCourses = [];
const COURSE_ITEM_SELECTOR = '.cds-ProductCard-body';

const extractCourses = (html, pageNumber) => {
  const $ = cheerio.load(html);
  const courses = [];
  $(COURSE_ITEM_SELECTOR).each((index, element) => {
    try {
      const $el = $(element);
      const linkEl = $el.find('a[data-track-component="product_card"]');
      const courseLink = linkEl.attr('href') ? `https://www.coursera.org${linkEl.attr('href')}` : 'N/A';
      const courseTitle = $el.find('.cds-ProductCard-header-product-name').text().trim() ||
        $el.find('h3').text().trim() ||
        'Title Not Found';
      const providerName = $el.find('.cds-ProductCard-partnerNames').text().trim() ||
        $el.find('.partnerNames').text().trim() ||
        'Provider Not Found';
      const ratingText = $el.find('.ratings-text').text().trim() ||
        $el.find('.rc-ProductCard-ratings-count').text().trim() ||
        'Rating N/A';
      let courseRating = 'N/A';
      let enrolledStudents = 'N/A';
      if (ratingText) {
        const match = ratingText.match(/([\d.]+)\s+\((.+)\)/);
        if (match) {
          courseRating = match[1];
        }
      }
      const enrollmentMatch = $el.text().match(/([\d.,]+[KkMm]?)\s+students/i);
      if (enrollmentMatch) {
        enrolledStudents = enrollmentMatch[1];
      }
      courses.push({
        page: pageNumber,
        courseTitle,
        courseLink,
        providerName,
        courseRating,
        enrolledStudents,
      });
    } catch (error) {
      console.error(`Error processing course item on page ${pageNumber}: ${error.message}`);
    }
  });
  console.log(`-> Successfully extracted ${courses.length} courses from Page ${pageNumber}`);
  return courses;
};

console.log('2. Waiting for page 1 content to load...');
await page.waitForSelector(COURSE_ITEM_SELECTOR, { visible: true });

console.log('3. Extracting courses from Page 1');
let html = await page.content();
allCourses.push(...extractCourses(html, 1));

const page2Selector = 'button[aria-label="Go to page 2"]';
console.log(`4. Clicking on page 2 navigation using selector: ${page2Selector}`);
const page2Button = await page.$(page2Selector);
if (page2Button) {
  await page2Button.click();
  console.log('5. Waiting for page 2 content to load (waiting for network idle)...');
  await page.waitForNetworkIdle({ idleTime: 1000, timeout: 30000 });
  console.log('5. Extracting courses from Page 2');
  html = await page.content();
  allCourses.push(...extractCourses(html, 2));
} else {
  console.log('4/5. Page 2 navigation element not found. Stopping after page 1.');
  return allCourses;
}

const page3Selector = 'button[aria-label="Go to page 3"]';
console.log(`6. Clicking on page 3 navigation using selector: ${page3Selector}`);
const page3Button = await page.$(page3Selector);
if (page3Button) {
  await page3Button.click();
  console.log('7. Waiting for page 3 content to load (waiting for network idle)...');
  await page.waitForNetworkIdle({ idleTime: 1000, timeout: 30000 });
  console.log('7. Extracting courses from Page 3');
  html = await page.content();
  allCourses.push(...extractCourses(html, 3));
} else {
  console.log('6/7. Page 3 navigation element not found. Stopping after page 2.');
}

console.log(`9. Combining results. Total courses extracted: ${allCourses.length}`);
return allCourses;
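The rating-parsing regex in the script above can be checked in isolation. Here it is as a standalone helper, assuming the combined rating text looks like "4.8 (3.2k reviews)" (an assumption about how Coursera's cards render, which may change):

```javascript
// Split a combined ratings string such as "4.8 (3.2k reviews)" into its parts.
// The parenthesized group is treated as the review count; anything that does
// not match the assumed "<rating> (<count>)" shape falls back to 'N/A'.
function parseRatingText(ratingText) {
  const match = ratingText.match(/([\d.]+)\s+\((.+)\)/);
  if (!match) {
    return { courseRating: 'N/A', reviewCount: 'N/A' };
  }
  return { courseRating: match[1], reviewCount: match[2] };
}
```

Testing small pieces like this locally in Node makes it much easier to tell a selector problem from a parsing problem.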
Step 4: Convert Code for Postman
Use our Online Code Parser Tool to convert the AI-generated code:
- Go to Online Code Parser Tool
- Select "Postman" from the dropdown
- Paste the AI-generated JavaScript code
- Click "Convert"
- Copy the converted string
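Under the hood, this conversion is mostly about turning a multi-line script into a JSON-safe single-line string. If you ever need to do it by hand, JSON.stringify does the same job (this mirrors the idea behind the tool; the tool's exact output may differ):

```javascript
// Turn a multi-line script into a JSON-safe string for the "script" field.
const script = `console.log('step 1');
return allCourses;`;

// JSON.stringify escapes newlines and quotes; slice(1, -1) drops the outer
// quotes so the result can be pasted between the quotes in the Postman body.
const escaped = JSON.stringify(script).slice(1, -1);
```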
Step 5: Test with Easy Scrape API
5.1 Update Postman Request
Back in Postman, update your request body:
{
"url": "https://www.coursera.org/courses",
"script": "PASTE_YOUR_CONVERTED_CODE_HERE",
"outputFormat": "json"
}
5.2 Send the Request
Click Send and you should see a response showing interactive navigation across multiple pages:
{
  "message": "Script executed successfully",
  "data": [
    {
      "page": 1,
      "courseTitle": "Machine Learning",
      "courseLink": "https://www.coursera.org/learn/machine-learning",
      "providerName": "Stanford University",
      "courseRating": "4.9",
      "enrolledStudents": "N/A"
    },
    {
      "page": 1,
      "courseTitle": "Deep Learning Specialization",
      "courseLink": "https://www.coursera.org/specializations/deep-learning",
      "providerName": "DeepLearning.AI",
      "courseRating": "4.8",
      "enrolledStudents": "N/A"
    },
    {
      "page": 2,
      "courseTitle": "Python for Everybody Specialization",
      "courseLink": "https://www.coursera.org/specializations/python",
      "providerName": "University of Michigan",
      "courseRating": "4.8",
      "enrolledStudents": "N/A"
    },
    {
      "page": 2,
      "courseTitle": "Google Data Analytics Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/google-data-analytics",
      "providerName": "Google",
      "courseRating": "4.6",
      "enrolledStudents": "N/A"
    },
    {
      "page": 3,
      "courseTitle": "IBM Data Science Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/ibm-data-science",
      "providerName": "IBM",
      "courseRating": "4.5",
      "enrolledStudents": "N/A"
    },
    {
      "page": 3,
      "courseTitle": "Meta Social Media Marketing Professional Certificate",
      "courseLink": "https://www.coursera.org/professional-certificates/facebook-social-media-marketing",
      "providerName": "Meta",
      "courseRating": "4.7",
      "enrolledStudents": "N/A"
    }
  ],
  "executionTime": 12500,
  "logs": [
    "2. Waiting for page 1 content to load...",
    "3. Extracting courses from Page 1",
    "-> Successfully extracted 6 courses from Page 1",
    "4. Clicking on page 2 navigation using selector: button[aria-label=\"Go to page 2\"]",
    "5. Waiting for page 2 content to load (waiting for network idle)...",
    "5. Extracting courses from Page 2",
    "-> Successfully extracted 6 courses from Page 2",
    "6. Clicking on page 3 navigation using selector: button[aria-label=\"Go to page 3\"]",
    "7. Waiting for page 3 content to load (waiting for network idle)...",
    "7. Extracting courses from Page 3",
    "-> Successfully extracted 6 courses from Page 3",
    "9. Combining results. Total courses extracted: 18"
  ]
}
Only two courses per page are shown above for brevity; the full response contains all 18.
🎉 This demonstrates advanced interactive navigation! You just automated:
- ✅ Multi-Page Navigation: Clicking through Coursera's pagination system
- ✅ Course Extraction: Collecting 6 courses from each of 3 pages
- ✅ Data Aggregation: Combining results from all pages into a single dataset
- ✅ Page Tracking: Knowing which page each course came from
- ✅ Comprehensive Coverage: Getting 18 total courses across multiple pages
Step 6: Improve Your Script
6.1 Refine with AI
If the results aren't perfect, go back to your AI assistant with:
The code worked but I'm getting some unwanted results. Here's what I got:
[PASTE YOUR ACTUAL RESULTS]
Can you improve the code to:
1. Filter out non-course items
2. Only get actual course listings
3. Make sure all links are complete URLs
4. Limit to the top 5 most relevant courses
Please update the JavaScript code.
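If you'd rather apply the filtering yourself instead of re-prompting, a small post-processing step on the returned array works too. The field names match the example script above, and the "top 5" cutoff is arbitrary:

```javascript
// Keep only entries with a real title and a complete URL, then cap the list.
// Relies on the 'Title Not Found' fallback used by the example script.
function filterCourses(courses, limit = 5) {
  return courses
    .filter((c) =>
      c.courseTitle !== 'Title Not Found' &&
      c.courseLink.startsWith('https://'))
    .slice(0, limit);
}
```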
6.2 Test Different Selectors
You can also ask AI to try different approaches:
The previous selectors might not be catching all course listings. Can you create 2-3 different versions of the code that try different CSS selectors for Coursera course listings? I want to test which works best.
Troubleshooting
Common Issues and Solutions
"No courses found"
- Problem: Selectors might be wrong
- Solution: Ask AI to analyze the HTML again with different approaches
"Getting non-course items"
- Problem: Selector is too broad
- Solution: Ask AI to make selectors more specific to course listings
"Links are incomplete"
- Problem: Coursera uses relative URLs
- Solution: The example code already handles this with URL completion
"Too many results"
- Problem: Script is catching everything
- Solution: Ask AI to add better filtering conditions
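On the "Links are incomplete" point above: the example script completes URLs by string concatenation, which works for simple relative paths. JavaScript's built-in URL class is a more robust alternative, since it also handles absolute hrefs and paths without a leading slash (a sketch, not what the example script currently uses):

```javascript
// Resolve a possibly-relative href against the page's base URL.
// Absolute hrefs pass through unchanged; missing hrefs become 'N/A'.
function toAbsoluteUrl(href, base = 'https://www.coursera.org') {
  if (!href) return 'N/A';
  return new URL(href, base).href;
}
```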
Next Steps
- Try Different Educational Sites: Apply the same process to edX, Udacity, or Khan Academy
- Add More Data: Extract course descriptions, durations, or categories
- Automate Collection: Set up scheduled runs in your automation platform
- Create Alerts: Build a system to notify you of new or updated courses
Pro Tips
- 🧠 Be Specific with AI: The more detailed your prompt, the better the code
- 🔍 Test Selectors: Use browser developer tools to verify CSS selectors
- 🚀 Start Simple: Get basic course extraction working before adding complexity
- 📊 Check Logs: Use console.log output to debug issues
- 🔄 Iterate: Don't expect perfect results on the first try
Congratulations! You've successfully created your first web scraper for course information. The combination of AI assistance and Easy Scrape API makes web scraping accessible and powerful for everyone.