Case Study

Building Job Hunter SG

An AI-powered job aggregator and resume coaching platform for the Singapore job market.

March 2026 · 10 min read · job.kooexperience.com

FastAPI · React · SEA-LION AI · MCF + Careers@Gov · 5 Validation Gates
Origin

The Problem

Before starting AIAP, I was job searching in Singapore and hit a wall.

The project began during that search. I was building a scraper for Careers@Gov to create a job matching and reminder system, back when Careers@Gov 2.0 was still in the pipeline and the existing portal had limited search functionality. The original version was simple: scrape, match keywords, send notifications.

But the AI landscape wasn't ready yet. LLMs hadn't reached the point where you could reliably rewrite resume bullets without hallucinating metrics or fabricating skills. The project sat dormant.

Then my friend Yanwen brought up resume building during a conversation, and it reignited everything. With SEA-LION (AI Singapore's open models) now available and capable enough for structured text generation, the missing piece was finally in place. What started as a Careers@Gov scraper evolved into a full resume coaching platform.

The tools that existed were either generic (not built for Singapore) or too simple (keyword stuffing without understanding the actual JD). I wanted something that could:

Search everywhere at once
Aggregate MyCareersFuture and Careers@Gov into one interface with deduplication. Extensible to more sources.
Understand the JD
Pre-parse every job description at scrape time into structured fields: skills, experience years, education level.
Coach, not just match
AI that rewrites resume bullets with 5 validation gates to prevent hallucinated metrics and fabricated skills.
Built for Singapore
SkillsFuture Skills Framework integration, Singapore Professional templates, PR/citizenship-aware fields.
Scale

By the Numbers

67K+ Jobs Indexed
2 Active Sources
7 Pipeline Stages
5 Validation Gates
8 Resume Templates
413 Known Skills
60+ API Endpoints
50ms JD Pre-Parse
Architecture

How It Works

A FastAPI backend powers job aggregation and AI coaching; a React frontend provides the search and resume editing experience.

Backend
FastAPI SQLAlchemy PostgreSQL
60+ endpoints, auto-migrating ORM. SQLite locally, PostgreSQL on Railway.
Frontend
React 18 Vite Tailwind
Single-page app with drag-and-drop resume editing and Framer Motion animations.
AI Models
SEA-LION 70B SEA-LION 32B
70B for strategy and summaries (~10s). 32B for batched bullet rewrites (~15s pipeline, <2s interactive). 5 API keys cycled to stay under per-key rate limits.
Deployment
Railway Docker
Auto-deploys on git push. Persistent PostgreSQL.
Core Feature

The 7-Stage Resume Pipeline

The heart of Job Hunter is the resume tailoring pipeline. It takes your resume and a job description, then produces a tailored version through 7 validated stages.

1. Analyze (Local, 200ms)
2. Strategize (70B, ~10s)
3. Cleanup (Local, 50ms)
4. Rewrite (32B, ~15s)
5. Polish (Local, 50ms)
6. Summarize (70B, ~12s)
7. Validate (Local, 50ms)
Key design decision: Every AI step is followed by a local validation step. The AI generates, the gates verify. This prevents hallucinated metrics, fabricated skills, and inflated claims from reaching the final resume.

The pipeline supports three intensity modes: nudge (local only, 5s), keywords (with bullet rewrites, 30s), and full (all stages including summary, 45-60s).
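The generate-then-verify shape of the pipeline can be sketched as follows. Stage names and timings come from the diagram above; the stage bodies, the `run_pipeline` function, and the dict-of-callables wiring are stand-ins, not the actual implementation.

```python
# Minimal sketch of the 7-stage pipeline: every AI stage is followed by a
# local stage, so nothing an LLM generates reaches the result unchecked.

LOCAL, AI_70B, AI_32B = "local", "70B", "32B"

STAGES = [
    ("analyze",    LOCAL),   # ~200ms: score bullets, find skill gaps
    ("strategize", AI_70B),  # ~10s:  strategic rewrite plan
    ("cleanup",    LOCAL),   # ~50ms: normalize before rewriting
    ("rewrite",    AI_32B),  # ~15s:  batched bullet rewrites
    ("polish",     LOCAL),   # ~50ms: verb dedup, formatting
    ("summarize",  AI_70B),  # ~12s:  tailored summary
    ("validate",   LOCAL),   # ~50ms: the 5 validation gates
]

def run_pipeline(bullets, stage_impls):
    """Run every stage in order, threading the bullet list through each one.

    stage_impls maps stage name -> callable(list) -> list. Returning the
    trace makes the local/AI alternation visible to callers."""
    state = list(bullets)
    trace = []
    for name, tier in STAGES:
        state = stage_impls[name](state)
        trace.append((name, tier))
    return state, trace
```

The intensity modes map naturally onto this shape: nudge runs only the local stages, keywords adds the 32B rewrite, and full runs everything including the 70B summary.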

Quality Control

5 Validation Gates

Every AI-generated rewrite passes through five gates before being accepted. Depending on the gate, a failure reverts to the original text, auto-fixes the phrase, or raises a warning.

Fact Preservation: all numbers and metrics from the original must appear in the rewrite. On failure: revert.
AI Phrase Detection: auto-replaces 84 weak phrases ("leveraged", "synergize") unless the phrase appears in the JD. On failure: auto-fix.
Keyword Verbatim: required keywords must appear exactly as written in the JD. On failure: warn.
Length Sanity: max 40 words per bullet and no more than 1.8x the original length. On failure: revert.
Hallucination Detection: rejects terms that appear in neither the resume nor the JD. On failure: revert.
Why this matters: LLMs hallucinate. A model might turn "managed 5-person team" into "led 200-person team." The fact preservation gate catches this and reverts. Your resume stays honest.
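The fact preservation gate can be sketched in a few lines. This is an illustration under a simplifying assumption (a single regex for numbers and percentages); the real gate may normalize numbers more carefully.

```python
import re

# Numeric facts: integers, decimals, and percentages ("5", "1.8", "40%").
NUM = re.compile(r"\d+(?:\.\d+)?%?")

def fact_preservation_gate(original: str, rewrite: str) -> str:
    """Accept the rewrite only if every number/metric in the original
    also appears in the rewrite; otherwise revert to the original."""
    facts = set(NUM.findall(original))
    if facts.issubset(set(NUM.findall(rewrite))):
        return rewrite
    return original
```

On the example below, "5", "40%", and "3" must all survive the rewrite; if the model inflates "5-person" to "20-person", the set check fails and the bullet reverts.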

Before / After

Here is what happens when the pipeline rewrites a bullet and the gates intervene:

Original
"Managed 5-person analytics team and improved report turnaround by 40% across 3 departments"
After pipeline + gates
"Led a 5-person analytics team that cut report turnaround by 40%, streamlining workflows across 3 departments"

The gate checks: "5-person", "40%", and "3 departments" all appear in the original and survive in the rewrite. If the AI had inflated "5-person" to "20-person", the fact preservation gate would revert the entire bullet to the original.

Data

Job Sources

Two primary sources power the nightly crawl via a Railway cron job at 22:00 UTC. An extensible scraper architecture supports more.

Active
MyCareersFuture
Public API. Full pagination nightly (~12K jobs). Salary ranges and skills metadata included.
Active
Careers@Gov 2.0
Workday backend. Custom scraper fetches detail pages and parses skill tags from the API response (~3K jobs).

The scraper architecture supports additional sources (NodeFlair, Indeed, JobStreet, Adzuna, Jooble) via a pluggable SOURCE_MAP, but the crawl currently runs only the two primary APIs. MCF alone covers most of the Singapore job market with structured salary and skills data. Enabling an existing scraper is a config change; adding a new source requires writing a scraper class.
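The pluggable registry described above might look like this. Class names other than `CareersGovScraper` (which the article names) are illustrative, and the fetch bodies are stubs.

```python
# Sketch of a SOURCE_MAP-style scraper registry: enabling a source is a
# config change; adding one means writing a new scraper class.

class BaseScraper:
    def fetch_jobs(self):
        raise NotImplementedError

class MCFScraper(BaseScraper):
    def fetch_jobs(self):
        return []  # paginate the public MyCareersFuture API (~12K jobs)

class CareersGovScraper(BaseScraper):
    def fetch_jobs(self):
        return []  # fetch Workday detail pages, parse skillTags (~3K jobs)

SOURCE_MAP = {
    "mcf": MCFScraper,
    "careers_gov": CareersGovScraper,
    # NodeFlair, Indeed, JobStreet, Adzuna, Jooble: built, not enabled
}

ACTIVE_SOURCES = ["mcf", "careers_gov"]  # the config change lives here

def nightly_crawl():
    jobs = []
    for key in ACTIVE_SOURCES:
        jobs.extend(SOURCE_MAP[key]().fetch_jobs())
    return jobs
```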

Careers@Gov challenge: Government jobs on Workday return minimal metadata. The CareersGovScraper fetches individual detail pages, extracts skillTags from the API response, and falls back to parsing skill cues from the JD text using jd_preparser.py (regex pattern matching, ~50ms/job).
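The regex fallback in jd_preparser.py can be sketched like this. The two patterns here are placeholders; the real pre-parser matches against the full 413-skill vocabulary and extracts education level as well.

```python
import re

# Illustrative skill cues; the production vocabulary has 413 known skills.
SKILL_CUES = re.compile(r"\b(python|sql|react|fastapi|docker)\b", re.I)
YEARS = re.compile(r"(\d+)\+?\s*years?", re.I)

def preparse_jd(text: str) -> dict:
    """Extract structured fields from a JD once, at scrape time (~50ms),
    so skill gap analysis is instant at query time."""
    years = YEARS.search(text)
    return {
        "skills": sorted({m.lower() for m in SKILL_CUES.findall(text)}),
        "min_years": int(years.group(1)) if years else None,
    }
```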
Tradeoffs

What I Tried and What Broke

The interesting engineering is in the failures, not the features.

Injectable vs non-injectable keywords

The first version of the keyword integration just stuffed every missing JD keyword into the resume. The result read like a search engine, not a human. The fix: Stage 0 classifies every missing skill as injectable (user has adjacent experience) or non-injectable (user has no basis for this claim). The AI is only allowed to weave in injectable keywords. Non-injectable ones get flagged as skill gaps, not fabricated onto the resume.
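The injectable / non-injectable split could be sketched as below. The adjacency map and the overlap heuristic are stand-ins for whatever the real classifier does; only the contract matters: adjacent experience makes a skill injectable, otherwise it is a flagged gap.

```python
# Sketch of Stage 0's missing-skill classification (adjacency data is
# illustrative, not the production mapping).

ADJACENCY = {
    "pyspark": {"spark", "python"},
    "postgresql": {"sql", "mysql"},
}

def classify_missing_skills(missing, resume_skills):
    """Split missing JD skills into injectable (user has adjacent
    experience) and gaps (no basis for the claim; never fabricated)."""
    resume_skills = {s.lower() for s in resume_skills}
    injectable, gaps = [], []
    for skill in missing:
        adjacent = ADJACENCY.get(skill.lower(), set())
        if adjacent & resume_skills:
            injectable.append(skill)   # the AI may weave this in
        else:
            gaps.append(skill)         # surfaced as a skill gap instead
    return injectable, gaps
```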

Sibling bullet context

Early rewrites had a duplication problem: the AI would rewrite three bullets in the same entry to all start with "Spearheaded" and repeat the same achievement. Stage 3 now passes sibling context to every rewrite call, so the model knows what the other bullets already say. Stage 4 then runs verb synonym dedup (15 verb groups) as a safety net.
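The verb synonym safety net might look like the sketch below, with two of the fifteen verb groups shown. The replacement strategy (first unused synonym, alphabetically) is an assumption for illustration.

```python
# Sketch of Stage 4's leading-verb dedup: if sibling bullets open with
# verbs from the same group, swap the repeat for an unused synonym.

VERB_GROUPS = [
    {"spearheaded", "led", "headed", "directed"},
    {"built", "developed", "created", "engineered"},
]

def dedup_leading_verbs(bullets):
    used = set()
    out = []
    for bullet in bullets:
        first, _, rest = bullet.partition(" ")
        for group in VERB_GROUPS:
            if first.lower() in group:
                if first.lower() in used:
                    # Pick any synonym from the group not yet used.
                    alt = next(v for v in sorted(group) if v not in used)
                    first = alt.capitalize()
                used.add(first.lower())
                break
        out.append(f"{first} {rest}" if rest else first)
    return out
```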

Degraded mode

SEA-LION's 70B model occasionally times out or returns malformed JSON. When Stage 1 (strategic analysis) fails, the pipeline doesn't crash. It falls back to a local heuristic: prioritize bullets with the most issues (from the scorer) and mark the result as _degraded. The user still gets a tailored resume, just without the full strategic reasoning.
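The fallback can be expressed as a small wrapper. Function and field names other than `_degraded` (which the article names) are hypothetical.

```python
# Sketch of degraded mode: try the 70B strategic analysis, fall back to a
# local heuristic on failure, and mark the result so the UI can tell.

def strategize_with_fallback(bullets, call_70b, scorer):
    """call_70b may time out or return malformed JSON; scorer counts
    issues per bullet (higher = more in need of rewriting)."""
    try:
        return {"plan": call_70b(bullets), "_degraded": False}
    except Exception:  # timeout, malformed JSON, rate limit
        # Local heuristic: prioritize the bullets with the most issues.
        ranked = sorted(bullets, key=scorer, reverse=True)
        return {"plan": {"priority_bullets": ranked}, "_degraded": True}
```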

Semantic search with embeddings

Keyword matching misses jobs described differently from how you'd write your resume. The backend includes a semantic search layer using sentence-transformers/all-MiniLM-L6-v2 (384-dim embeddings). Both job descriptions and resumes are encoded, and cosine similarity surfaces matches that keyword search would miss. This is how a "data pipeline engineer" resume can match a "data infrastructure" JD.
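The ranking step reduces to cosine similarity over embeddings. In production the vectors come from sentence-transformers/all-MiniLM-L6-v2 (384-dim); the sketch below takes precomputed vectors as input so the matching logic stands alone.

```python
import numpy as np

def cosine_rank(resume_vec, jd_vecs):
    """Rank JD embeddings by cosine similarity to a resume embedding.

    resume_vec: shape (d,); jd_vecs: shape (n, d).
    Returns (indices sorted best-first, similarity scores)."""
    resume_vec = resume_vec / np.linalg.norm(resume_vec)
    jd_vecs = jd_vecs / np.linalg.norm(jd_vecs, axis=1, keepdims=True)
    scores = jd_vecs @ resume_vec          # cosine similarity per JD
    return np.argsort(-scores), scores
```

This is why a "data pipeline engineer" resume can surface a "data infrastructure" JD: the two phrases land close together in embedding space even with zero keyword overlap.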

Engineering

Key Technical Decisions

Pre-parse at scrape time
JDs are parsed into skills, experience years, and education level at scrape time (50ms/job), not on-demand. Skill gap analysis is instant.
70B vs 32B model split
70B for strategic analysis (progress bar, ~10s). 32B for batched bullet rewrites (~15s in pipeline, <2s for single interactive edits).
Validate, don't trust
Every AI output passes through 5 gates. If any gate fails, the original text is preserved. 222 backend tests verify the gates work.
Structured resume model
Resumes are parsed into hierarchical JSON (sections, entries, bullets) enabling surgical edits, per-bullet scoring, and change tracking.
Rate limit distribution
SEA-LION's free tier allows 10 req/min per key. 5 keys are cycled with a simple index counter, and a single in-memory token bucket (45 tokens, no Redis) gates all outbound calls.
Multi-user auth + versioning
JWT + bcrypt auth with tier-based rate limiting. Users can save, label, and manage multiple resume versions linked to specific jobs.
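The rate limit distribution above can be sketched as a key cycle plus a token bucket. The 5-keys-at-10-req/min and 45-token figures come from the article; the refill rate and class shape are assumptions.

```python
import time
from itertools import cycle

class SeaLionLimiter:
    """Cycle API keys round-robin behind one in-memory token bucket
    (no Redis), gating all outbound SEA-LION calls."""

    def __init__(self, keys, capacity=45, refill_per_sec=45 / 60):
        self._keys = cycle(keys)       # simple index-style rotation
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self._last = time.monotonic()

    def acquire(self):
        """Block until a token is available, then return the next key."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self._last) * self.refill)
            self._last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return next(self._keys)
            time.sleep((1 - self.tokens) / self.refill)
```

Because the bucket is shared across all keys, bursts drain it before any single key can exceed its per-key limit.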
Roadmap

What's Next

Job Hunter is live at job.kooexperience.com. The core experience works, but there's more to build:

Enable remaining 5 scrapers
NodeFlair, Indeed, JobStreet, Adzuna, and Jooble scrapers are built but not yet in the production crawl; the nightly cron currently seeds only MCF + Careers@Gov.
ATS gap visualization
The backend scores skill gaps; the frontend needs to visualize them clearly.
SkillsFuture course linking
The SSG-WSG Skills Framework API is already integrated. Next: link skill gaps to specific SkillsFuture courses.