Building an AI LinkedIn Sourcing Agent

The Challenge: Round Two with LinkedIn

Read the full version at https://blog.pranshu-raj.me/posts/linkedin-scraping-full.

Recruiting is broken. Finding the right candidates is like searching for needles in a haystack, and when you do find them, your generic LinkedIn message gets lost among 50 others.

Two years ago, I tried building a LinkedIn scraper. LinkedIn’s anti-scraping measures crushed that dream within days. This hackathon was my chance for round two.

What I Built

Instead of another keyword-matching tool, I built something that thinks like a recruiter:

Job Description → Smart Search → Profile Scraping → AI Scoring → Personalized Messages

Core components:

Multi-source discovery: LinkedIn + GitHub profiles
6-factor scoring algorithm that goes beyond keywords
AI-powered outreach via Llama/Groq
Async processing for multiple jobs

The Technical Stack

FastAPI backend with async processing throughout. Used RapidAPI for LinkedIn data (because scraping LinkedIn directly is still a nightmare), SerpAPI for search, and Groq for AI messaging.

[Figure: Data flow diagram]

[Figure: App architecture]

Smart Scoring Algorithm

Factor	Weight	What It Measures
Education	20%	Elite schools get higher scores
Career Trajectory	20%	Clear progression vs. lateral moves
Company Relevance	15%	Relevant industry experience
Skill Match	25%	How well skills align with requirements
Location	10%	Geographic fit
Tenure	10%	Stability vs. job hopping

The LLM understands context: an engineer who went startup → Google → senior role gets a higher trajectory score than someone stuck at the same level.
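
A minimal sketch of how the six weights in the table could combine into one fit score. The 0-10 scale per factor, the `fit_score` name, and the dictionary keys are assumptions for illustration; in the real pipeline the per-factor scores would come from the LLM.

```python
from typing import Dict

# Weights copied from the table above; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "education": 0.20,
    "career_trajectory": 0.20,
    "company_relevance": 0.15,
    "skill_match": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}


def fit_score(factor_scores: Dict[str, float]) -> float:
    """Combine per-factor scores (assumed to be on a 0-10 scale) into a weighted total."""
    missing = WEIGHTS.keys() - factor_scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)
    return round(total, 2)


# Hypothetical candidate: strong trajectory and location fit, weaker tenure.
example = {
    "education": 8,
    "career_trajectory": 9,
    "company_relevance": 6,
    "skill_match": 8,
    "location": 10,
    "tenure": 5,
}
print(fit_score(example))
```

Keeping the weights in one dictionary makes it easy to retune them without touching the scoring logic.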

Personalized Outreach

Generic LinkedIn messages get ignored. My solution creates messages that reference specific achievements and feel personal, not templated.

Example: “Hi John, I noticed your transformer optimization work at Google Research, particularly your ICML paper on efficient attention mechanisms…”
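
One way to push the model toward messages like this is to put the candidate's concrete achievements directly into the prompt. The sketch below only builds the prompt string; `build_outreach_prompt`, its parameters, and the wording of the instructions are hypothetical, and the actual call to Groq is left out.

```python
from typing import List


def build_outreach_prompt(name: str, highlights: List[str], role: str) -> str:
    """Assemble an LLM prompt that asks the message to cite concrete profile details."""
    if not highlights:
        # Without specifics the model can only produce a generic message.
        raise ValueError("need at least one highlight to personalize the message")
    bullet_list = "\n".join(f"- {item}" for item in highlights)
    return (
        f"Write a short LinkedIn outreach message to {name} about a {role} role.\n"
        f"Reference at least one of these achievements specifically:\n{bullet_list}\n"
        "Keep it under 100 words and avoid generic praise."
    )


prompt = build_outreach_prompt(
    name="John",
    highlights=["ICML paper on efficient attention mechanisms"],
    role="ML Research Engineer",  # hypothetical role title
)
print(prompt)
```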

What Actually Worked

Smart Caching: Saves API costs by checking if we’ve seen profiles before
Async Processing: Processes 10 profiles in 5-6 seconds instead of 30 seconds sequentially
The Scoring Algorithm: LLMs recognize career patterns that regex never could

The Real Challenges

LinkedIn’s Anti-Scraping: Learned my lesson from 2022. Went straight to paid APIs instead of fighting their defenses.

LLM Consistency: Groq returned proper JSON maybe 70% of the time. Had to build fallback parsing with regex.
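
A fallback parser along these lines might look like the following sketch. The function name `parse_llm_json` and the specific regex are assumptions; the idea is simply to try strict JSON first and, if that fails, extract the outermost `{...}` span from whatever extra text the model wrapped around it.

```python
import json
import re
from typing import Optional


def parse_llm_json(text: str) -> Optional[dict]:
    """Try strict JSON first, then fall back to extracting a {...} span with regex."""
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Fallback: grab everything from the first "{" to the last "}" (spans newlines).
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


chatty = 'Sure! Here is the result:\n{"score": 8, "reason": "strong fit"}\nHope that helps.'
print(parse_llm_json(chatty))
```

Returning `None` instead of raising lets the caller decide whether to retry the request or skip the profile.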

Data Validation: Biggest time sink. Started doing TDD-style development midway through.

Why This Matters

This isn’t just a hackathon project. It could save recruiters hours of manual work, increase response rates through personalization, and find qualified candidates that keyword searches miss.

The expensive API solution turned out cheaper when you factor in development time. Sometimes paying for reliability beats fighting the system.

Try it yourself: GitHub Repository