Applying to 600+ positions manually was soul-crushing. So I did what any engineer would do — I automated it. Here's how I built a job scraping pipeline that aggregates 1,300+ daily postings into a single Google Sheet.

The Challenge

As an international student seeking Summer 2026 internships, I needed to apply at high volume. Job postings were scattered across LinkedIn, Indeed, Glassdoor, Handshake, and company career pages. Manually checking each platform daily, filtering for relevant roles, and tracking applications in a spreadsheet was consuming 2-3 hours every day, time I needed for coursework and interview prep.

The Goal

Build an automated system that scrapes internship postings from multiple sources, filters them by relevance (SDE, backend, full-stack roles that sponsor international students), deduplicates across platforms, and populates a Google Sheet with clean, actionable data — title, company, location, posting date, and application link.
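To make the goal concrete, here is a minimal sketch of what one posting record and a relevance check could look like. The keyword lists, the Posting class, and the is_relevant and to_row helpers are illustrative assumptions, not the exact rules my pipeline uses.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword lists, chosen only for illustration.
ROLE_KEYWORDS = ("software", "sde", "backend", "full-stack", "full stack")
EXCLUDE_PHRASES = ("us citizen", "security clearance", "no sponsorship")


@dataclass(frozen=True)
class Posting:
    title: str
    company: str
    location: str
    posted: date
    link: str
    description: str = ""


def is_relevant(p: Posting) -> bool:
    """Keep internship postings for target roles that do not rule out sponsorship."""
    title = p.title.lower()
    text = f"{p.title} {p.description}".lower()
    # Require "intern" or "internship" as a whole word in the title.
    if not re.search(r"\bintern(ship)?\b", title):
        return False
    # Require at least one target-role keyword in the title.
    if not any(keyword in title for keyword in ROLE_KEYWORDS):
        return False
    # Drop postings whose text contains an exclusion phrase.
    return not any(phrase in text for phrase in EXCLUDE_PHRASES)


def to_row(p: Posting) -> list[str]:
    """Flatten a posting into the five sheet columns named in the goal."""
    return [p.title, p.company, p.location, p.posted.isoformat(), p.link]
```

Simple substring rules like these are easy to tune by hand, at the cost of missing postings that phrase a restriction in unusual wording.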

The Approach

I built the pipeline in Python, using Selenium for JavaScript-heavy sites and BeautifulSoup for static pages. For LinkedIn and Indeed, I used their unofficial APIs where possible to avoid scraping limitations. The deduplication engine uses fuzzy string matching on job titles and company names to catch cross-platform duplicates. A Google Sheets API integration pushes new postings directly into a structured spreadsheet, where conditional formatting highlights roles matching my target criteria. The entire pipeline runs on a schedule as a cron job.
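The fuzzy deduplication step can be sketched with nothing but the standard library. This version uses difflib.SequenceMatcher as a stand-in for whichever fuzzy-matching library you prefer; the 0.85 threshold, the dict layout, and the function names are assumptions for illustration rather than the pipeline's actual values.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Treat two postings as duplicates when both company and title are close."""
    return (
        similarity(a["company"], b["company"]) >= threshold
        and similarity(a["title"], b["title"]) >= threshold
    )


def deduplicate(postings: list[dict], threshold: float = 0.85) -> list[dict]:
    """Keep the first occurrence of each posting, preserving input order."""
    unique: list[dict] = []
    for posting in postings:
        if not any(is_duplicate(posting, kept, threshold) for kept in unique):
            unique.append(posting)
    return unique


postings = [
    {"title": "Software Engineer Intern - Summer 2026", "company": "Acme Corp"},
    {"title": "Software Engineer Intern, Summer 2026", "company": "Acme Corp."},
    {"title": "Data Analyst Intern", "company": "Acme Corp"},
]
unique_postings = deduplicate(postings)
```

Requiring both fields to match keeps two different roles at the same company from collapsing into one row. The pairwise comparison is quadratic in the number of postings, which is fine at a scale of a thousand or so per day but would need blocking (for example, grouping by normalized company first) for much larger volumes.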

The Impact

The system processes 1,300+ job postings daily with 98%+ accuracy in relevance filtering. My daily job search time dropped from 2-3 hours to about 20 minutes of reviewing pre-filtered, deduplicated results. The structured Google Sheet format made it easy to track application status, follow-up dates, and response rates across hundreds of applications.

Key Takeaways

Automating repetitive tasks is one of the most valuable skills an engineer can have, even outside of work. Fuzzy matching is essential for real-world deduplication, where the same job appears with slightly different titles across platforms. And building tools for yourself teaches you more about real-world software challenges than most coursework projects.

Questions? kanade.pra@northeastern.edu