Case Study

Building Job Hunter SG

An AI-powered job aggregator and resume coaching platform for the Singapore job market.

March 2026 · 10 min read · job.kooexperience.com

FastAPI · React · SEA-LION AI · MCF + Careers@Gov · 5 Validation Gates
Origin

The Problem

Before starting AIAP, I was job searching in Singapore and hit a wall.

The project began during that search. I was building a scraper for Careers@Gov to create a job matching and reminder system, back when Careers@Gov 2.0 was still in the pipeline and the existing portal had limited search functionality. The original version was simple: scrape, match keywords, send notifications.

But the AI landscape wasn't ready yet. LLMs hadn't reached the point where you could reliably rewrite resume bullets without hallucinating metrics or fabricating skills. The project sat dormant.

Then my friend Yanwen brought up resume building during a conversation, and it reignited everything. With SEA-LION (AI Singapore's open models) now available and capable enough for structured text generation, the missing piece was finally in place. What started as a Careers@Gov scraper evolved into a full resume coaching platform.

The tools that existed were either generic (not built for Singapore) or too simple (keyword stuffing without understanding the actual JD). I wanted something that could:

Search everywhere at once
Aggregate MyCareersFuture and Careers@Gov into one interface with deduplication. Extensible to more sources.
Understand the JD
Pre-parse every job description at scrape time into structured fields: skills, experience years, education level.
Coach, not just match
AI that rewrites resume bullets with 5 validation gates to prevent hallucinated metrics and fabricated skills.
Built for Singapore
SkillsFuture Skills Framework integration, Singapore Professional templates, PR/citizenship-aware fields.
Scale

By the Numbers

67K+ Jobs Indexed
2 Active Sources
7 Pipeline Stages
5 Validation Gates
8 Resume Templates
413 Known Skills
60+ API Endpoints
50ms JD Pre-Parse
Architecture

How It Works

A FastAPI backend powers job aggregation and AI coaching; a React frontend provides the search and resume editing experience.

Backend
FastAPI SQLAlchemy PostgreSQL
60+ endpoints, auto-migrating ORM. SQLite locally, PostgreSQL on Railway.
Frontend
React 18 Vite Tailwind
Single-page app with drag-and-drop resume editing and Framer Motion animations.
AI Models
SEA-LION 70B SEA-LION 32B
70B for strategy and summaries (~10s). 32B for batched bullet rewrites (~15s pipeline, <2s interactive). 5 API keys cycled to stay under per-key rate limits.
Deployment
Railway Docker
Auto-deploys on git push. Persistent PostgreSQL.
Core Feature

The 7-Stage Resume Pipeline

The heart of Job Hunter is the resume tailoring pipeline. It takes your resume and a job description, then produces a tailored version through 7 validated stages.

1. Analyze (Local, 200ms)
2. Strategize (70B, ~10s)
3. Cleanup (Local, 50ms)
4. Rewrite (32B, ~15s)
5. Polish (Local, 50ms)
6. Summarize (70B, ~12s)
7. Validate (Local, 50ms)
Key design decision: Every AI step is followed by a local validation step. The AI generates, the gates verify. This prevents hallucinated metrics, fabricated skills, and inflated claims from reaching the final resume.

The pipeline supports three intensity modes: nudge (local only, 5s), keywords (with bullet rewrites, 30s), and full (all stages including summary, 45-60s).
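The generate-then-verify shape of the pipeline can be sketched as follows. Stage names and timings come from the diagram above; the stage bodies, the `run_pipeline` function, and the dict-of-callables wiring are stand-ins, not the actual implementation.

```python
# Minimal sketch of the 7-stage pipeline: every AI stage is followed by a
# local stage, so nothing an LLM generates reaches the result unchecked.

LOCAL, AI_70B, AI_32B = "local", "70B", "32B"

STAGES = [
    ("analyze",    LOCAL),   # ~200ms: score bullets, find skill gaps
    ("strategize", AI_70B),  # ~10s:  strategic rewrite plan
    ("cleanup",    LOCAL),   # ~50ms: normalize before rewriting
    ("rewrite",    AI_32B),  # ~15s:  batched bullet rewrites
    ("polish",     LOCAL),   # ~50ms: verb dedup, formatting
    ("summarize",  AI_70B),  # ~12s:  tailored summary
    ("validate",   LOCAL),   # ~50ms: the 5 validation gates
]

def run_pipeline(bullets, stage_impls):
    """Run every stage in order, threading the bullet list through each one.

    stage_impls maps stage name -> callable(list) -> list. Returning the
    trace makes the local/AI alternation visible to callers."""
    state = list(bullets)
    trace = []
    for name, tier in STAGES:
        state = stage_impls[name](state)
        trace.append((name, tier))
    return state, trace
```

The intensity modes map naturally onto this shape: nudge runs only the local stages, keywords adds the 32B rewrite, and full runs everything including the 70B summary.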

Quality Control

5 Validation Gates

Every AI-generated rewrite passes through five gates before being accepted. Depending on the gate, a failure reverts to the original text, auto-fixes the phrase, or raises a warning.

Fact Preservation: all numbers and metrics from the original must appear in the rewrite. On failure: revert.
AI Phrase Detection: auto-replaces 84 weak phrases ("leveraged", "synergize") unless the phrase appears in the JD. On failure: auto-fix.
Keyword Verbatim: required keywords must appear exactly as written in the JD. On failure: warn.
Length Sanity: max 40 words per bullet and no more than 1.8x the original length. On failure: revert.
Hallucination Detection: rejects terms that appear in neither the resume nor the JD. On failure: revert.
Why this matters: LLMs hallucinate. A model might turn "managed 5-person team" into "led 200-person team." The fact preservation gate catches this and reverts. Your resume stays honest.
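The fact preservation gate can be sketched in a few lines. This is an illustration under a simplifying assumption (a single regex for numbers and percentages); the real gate may normalize numbers more carefully.

```python
import re

# Numeric facts: integers, decimals, and percentages ("5", "1.8", "40%").
NUM = re.compile(r"\d+(?:\.\d+)?%?")

def fact_preservation_gate(original: str, rewrite: str) -> str:
    """Accept the rewrite only if every number/metric in the original
    also appears in the rewrite; otherwise revert to the original."""
    facts = set(NUM.findall(original))
    if facts.issubset(set(NUM.findall(rewrite))):
        return rewrite
    return original
```

On the example below, "5", "40%", and "3" must all survive the rewrite; if the model inflates "5-person" to "20-person", the set check fails and the bullet reverts.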

Before / After

Here is what happens when the pipeline rewrites a bullet and the gates intervene:

Original
"Managed 5-person analytics team and improved report turnaround by 40% across 3 departments"
After pipeline + gates
"Led a 5-person analytics team that cut report turnaround by 40%, streamlining workflows across 3 departments"

The gate checks: "5-person", "40%", and "3 departments" all appear in the original and survive in the rewrite. If the AI had inflated "5-person" to "20-person", the fact preservation gate would revert the entire bullet to the original.

Data

Job Sources

Two primary sources power the nightly crawl via a Railway cron job at 22:00 UTC. An extensible scraper architecture supports more.

Active
MyCareersFuture
Public API. Full pagination nightly (~12K jobs). Salary ranges and skills metadata included.
Active
Careers@Gov 2.0
Workday backend. Custom scraper fetches detail pages and parses skill tags from the API response (~3K jobs).

The scraper architecture supports additional sources (NodeFlair, Indeed, JobStreet, Adzuna, Jooble) via a pluggable SOURCE_MAP, but the crawl currently runs only the two primary APIs. MCF alone covers most of the Singapore job market with structured salary and skills data. Enabling an existing scraper is a config change; adding a new source requires writing a scraper class.
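The pluggable registry described above might look like this. Class names other than `CareersGovScraper` (which the article names) are illustrative, and the fetch bodies are stubs.

```python
# Sketch of a SOURCE_MAP-style scraper registry: enabling a source is a
# config change; adding one means writing a new scraper class.

class BaseScraper:
    def fetch_jobs(self):
        raise NotImplementedError

class MCFScraper(BaseScraper):
    def fetch_jobs(self):
        return []  # paginate the public MyCareersFuture API (~12K jobs)

class CareersGovScraper(BaseScraper):
    def fetch_jobs(self):
        return []  # fetch Workday detail pages, parse skillTags (~3K jobs)

SOURCE_MAP = {
    "mcf": MCFScraper,
    "careers_gov": CareersGovScraper,
    # NodeFlair, Indeed, JobStreet, Adzuna, Jooble: built, not enabled
}

ACTIVE_SOURCES = ["mcf", "careers_gov"]  # the config change lives here

def nightly_crawl():
    jobs = []
    for key in ACTIVE_SOURCES:
        jobs.extend(SOURCE_MAP[key]().fetch_jobs())
    return jobs
```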

Careers@Gov challenge: Government jobs on Workday return minimal metadata. The CareersGovScraper fetches individual detail pages, extracts skillTags from the API response, and falls back to parsing skill cues from the JD text using jd_preparser.py (regex pattern matching, ~50ms/job).
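The regex fallback in jd_preparser.py can be sketched like this. The two patterns here are placeholders; the real pre-parser matches against the full 413-skill vocabulary and extracts education level as well.

```python
import re

# Illustrative skill cues; the production vocabulary has 413 known skills.
SKILL_CUES = re.compile(r"\b(python|sql|react|fastapi|docker)\b", re.I)
YEARS = re.compile(r"(\d+)\+?\s*years?", re.I)

def preparse_jd(text: str) -> dict:
    """Extract structured fields from a JD once, at scrape time (~50ms),
    so skill gap analysis is instant at query time."""
    years = YEARS.search(text)
    return {
        "skills": sorted({m.lower() for m in SKILL_CUES.findall(text)}),
        "min_years": int(years.group(1)) if years else None,
    }
```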
Tradeoffs

What I Tried and What Broke

The interesting engineering is in the failures, not the features.

Injectable vs non-injectable keywords

The first version of the keyword integration just stuffed every missing JD keyword into the resume. The result read like a search engine, not a human. The fix: Stage 0 classifies every missing skill as injectable (user has adjacent experience) or non-injectable (user has no basis for this claim). The AI is only allowed to weave in injectable keywords. Non-injectable ones get flagged as skill gaps, not fabricated onto the resume.
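The injectable / non-injectable split could be sketched as below. The adjacency map and the overlap heuristic are stand-ins for whatever the real classifier does; only the contract matters: adjacent experience makes a skill injectable, otherwise it is a flagged gap.

```python
# Sketch of Stage 0's missing-skill classification (adjacency data is
# illustrative, not the production mapping).

ADJACENCY = {
    "pyspark": {"spark", "python"},
    "postgresql": {"sql", "mysql"},
}

def classify_missing_skills(missing, resume_skills):
    """Split missing JD skills into injectable (user has adjacent
    experience) and gaps (no basis for the claim; never fabricated)."""
    resume_skills = {s.lower() for s in resume_skills}
    injectable, gaps = [], []
    for skill in missing:
        adjacent = ADJACENCY.get(skill.lower(), set())
        if adjacent & resume_skills:
            injectable.append(skill)   # the AI may weave this in
        else:
            gaps.append(skill)         # surfaced as a skill gap instead
    return injectable, gaps
```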

Sibling bullet context

Early rewrites had a duplication problem: the AI would rewrite three bullets in the same entry to all start with "Spearheaded" and repeat the same achievement. Stage 3 now passes sibling context to every rewrite call, so the model knows what the other bullets already say. Stage 4 then runs verb synonym dedup (15 verb groups) as a safety net.
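The verb synonym safety net might look like the sketch below, with two of the fifteen verb groups shown. The replacement strategy (first unused synonym, alphabetically) is an assumption for illustration.

```python
# Sketch of Stage 4's leading-verb dedup: if sibling bullets open with
# verbs from the same group, swap the repeat for an unused synonym.

VERB_GROUPS = [
    {"spearheaded", "led", "headed", "directed"},
    {"built", "developed", "created", "engineered"},
]

def dedup_leading_verbs(bullets):
    used = set()
    out = []
    for bullet in bullets:
        first, _, rest = bullet.partition(" ")
        for group in VERB_GROUPS:
            if first.lower() in group:
                if first.lower() in used:
                    # Pick any synonym from the group not yet used.
                    alt = next(v for v in sorted(group) if v not in used)
                    first = alt.capitalize()
                used.add(first.lower())
                break
        out.append(f"{first} {rest}" if rest else first)
    return out
```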

Degraded mode

SEA-LION's 70B model occasionally times out or returns malformed JSON. When Stage 1 (strategic analysis) fails, the pipeline doesn't crash. It falls back to a local heuristic: prioritize bullets with the most issues (from the scorer) and mark the result as _degraded. The user still gets a tailored resume, just without the full strategic reasoning.
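The fallback can be expressed as a small wrapper. Function and field names other than `_degraded` (which the article names) are hypothetical.

```python
# Sketch of degraded mode: try the 70B strategic analysis, fall back to a
# local heuristic on failure, and mark the result so the UI can tell.

def strategize_with_fallback(bullets, call_70b, scorer):
    """call_70b may time out or return malformed JSON; scorer counts
    issues per bullet (higher = more in need of rewriting)."""
    try:
        return {"plan": call_70b(bullets), "_degraded": False}
    except Exception:  # timeout, malformed JSON, rate limit
        # Local heuristic: prioritize the bullets with the most issues.
        ranked = sorted(bullets, key=scorer, reverse=True)
        return {"plan": {"priority_bullets": ranked}, "_degraded": True}
```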

Semantic search with embeddings

Keyword matching misses jobs described differently from how you'd write your resume. The backend includes a semantic search layer using sentence-transformers/all-MiniLM-L6-v2 (384-dim embeddings). Both job descriptions and resumes are encoded, and cosine similarity surfaces matches that keyword search would miss. This is how a "data pipeline engineer" resume can match a "data infrastructure" JD.
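The ranking step reduces to cosine similarity over embeddings. In production the vectors come from sentence-transformers/all-MiniLM-L6-v2 (384-dim); the sketch below takes precomputed vectors as input so the matching logic stands alone.

```python
import numpy as np

def cosine_rank(resume_vec, jd_vecs):
    """Rank JD embeddings by cosine similarity to a resume embedding.

    resume_vec: shape (d,); jd_vecs: shape (n, d).
    Returns (indices sorted best-first, similarity scores)."""
    resume_vec = resume_vec / np.linalg.norm(resume_vec)
    jd_vecs = jd_vecs / np.linalg.norm(jd_vecs, axis=1, keepdims=True)
    scores = jd_vecs @ resume_vec          # cosine similarity per JD
    return np.argsort(-scores), scores
```

This is why a "data pipeline engineer" resume can surface a "data infrastructure" JD: the two phrases land close together in embedding space even with zero keyword overlap.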

Engineering

Key Technical Decisions

Pre-parse at scrape time
JDs are parsed into skills, experience years, and education level at scrape time (50ms/job), not on-demand. Skill gap analysis is instant.
70B vs 32B model split
70B for strategic analysis (progress bar, ~10s). 32B for batched bullet rewrites (~15s in pipeline, <2s for single interactive edits).
Validate, don't trust
Every AI output passes through 5 gates. If any gate fails, the original text is preserved. 222 backend tests verify the gates work.
Structured resume model
Resumes are parsed into hierarchical JSON (sections, entries, bullets) enabling surgical edits, per-bullet scoring, and change tracking.
Rate limit distribution
SEA-LION's free tier allows 10 req/min per key. 5 keys are cycled with a simple index counter, and a single in-memory token bucket (45 tokens, no Redis) gates all outbound calls.
Multi-user auth + versioning
JWT + bcrypt auth with tier-based rate limiting. Users can save, label, and manage multiple resume versions linked to specific jobs.
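The rate limit distribution above can be sketched as a key cycle plus a token bucket. The 5-keys-at-10-req/min and 45-token figures come from the article; the refill rate and class shape are assumptions.

```python
import time
from itertools import cycle

class SeaLionLimiter:
    """Cycle API keys round-robin behind one in-memory token bucket
    (no Redis), gating all outbound SEA-LION calls."""

    def __init__(self, keys, capacity=45, refill_per_sec=45 / 60):
        self._keys = cycle(keys)       # simple index-style rotation
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self._last = time.monotonic()

    def acquire(self):
        """Block until a token is available, then return the next key."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self._last) * self.refill)
            self._last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return next(self._keys)
            time.sleep((1 - self.tokens) / self.refill)
```

Because the bucket is shared across all keys, bursts drain it before any single key can exceed its per-key limit.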
Roadmap

What's Next

Job Hunter is live at job.kooexperience.com. The core experience works, but there's more to build:

Enable remaining 5 scrapers
NodeFlair, Indeed, JobStreet, Adzuna, and Jooble scrapers are built but not yet in the production crawl; the nightly cron currently seeds only MCF + Careers@Gov.
ATS gap visualization
The backend scores skill gaps; the frontend needs to visualize them clearly.
SkillsFuture course linking
The SSG-WSG Skills Framework API is already integrated. Next: link skill gaps to specific SkillsFuture courses.