After analyzing all 33 NC document URLs, here's the breakdown of how each council stores its meeting documents and what's needed to scrape them.
Source Type Categories
🟢 WordPress Direct PDF (Easiest - 7 NCs)
These councils host PDFs directly on their own WordPress sites, typically under predictable URL patterns such as:
/wp-content/uploads/docs/Minutes_YYYY-MM-DD.pdf
NCs:
- Chatsworth - chatsworthcouncil.org/agendas-minutes/ → Direct PDFs with pattern Minutes_YYYY-MM-DD.pdf
- Canoga Park - canogaparknc.org/agendas-and-minutes/
- Lake Balboa - lakebalboanc.org/agendas-and-minutes/
- Granada Hills North - ghnnc.org/agendas-and-minutes/
- Granada Hills South - ghsnc.org/resources/agendas-and-minutes/
- Northridge West - northridgewest.org/agendas-minutes/
- Mission Hills - mhnconline.org/agendas-minutes/
Scrape approach: Parse HTML for PDF links, download directly.
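A minimal sketch of the Phase 1 approach using only the Python standard library. It assumes the agendas/minutes page is server-rendered HTML with PDFs linked as ordinary <a href> tags, and it leaves out the HTTP fetch. The names PdfLinkParser, extract_pdf_links and meeting_date are hypothetical, and the date regex only covers the Minutes_YYYY-MM-DD.pdf naming noted for Chatsworth.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches the Minutes_YYYY-MM-DD.pdf file naming seen on some of these sites.
MINUTES_RE = re.compile(r"Minutes_(\d{4}-\d{2}-\d{2})\.pdf$", re.IGNORECASE)


def _strip_query(url: str) -> str:
    """Drops any ?query or #fragment so the file extension can be checked."""
    return url.split("#", 1)[0].split("?", 1)[0]


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs for every <a href> that points at a .pdf file."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and _strip_query(href).lower().endswith(".pdf"):
            # Resolve relative links such as /wp-content/uploads/... against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def extract_pdf_links(html: str, base_url: str) -> list[str]:
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


def meeting_date(url: str) -> str | None:
    """Returns YYYY-MM-DD from a Minutes_YYYY-MM-DD.pdf URL, or None if it doesn't match."""
    match = MINUTES_RE.search(_strip_query(url))
    return match.group(1) if match else None
```

For example, a link to /wp-content/uploads/docs/Minutes_2024-01-15.pdf on https://chatsworthcouncil.org/agendas-minutes/ resolves to https://chatsworthcouncil.org/wp-content/uploads/docs/Minutes_2024-01-15.pdf, and meeting_date returns "2024-01-15" for it. PDFs that don't follow the naming pattern are still collected; their dates would have to come from the link text or surrounding page content.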
🔵 Google Drive (Medium - 4 NCs)
Documents stored in public Google Drive folders.
NCs:
- Winnetka - wncla.org links to Drive folders by year
- Chatsworth (backup) - Also has Drive folders
- North Hollywood West - nohowest.org/meetings/meeting-minutes/
- Arleta - arletanc.org/agendas-and-minutes/
Scrape approach: Use Google Drive API or parse folder listings.
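A sketch of the Drive API route, assuming the folders are shared publicly and that a plain API key (the placeholder api_key) is enough to list them. It only builds Drive API v3 files.list request URLs and splits a decoded JSON response; the HTTP calls and JSON decoding are omitted, and folder_id, list_children_url and split_listing are hypothetical helper names.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

# Folder share links look like https://drive.google.com/drive/folders/<ID>
# (optionally with a /u/<n>/ account segment); capture the <ID> part.
FOLDER_RE = re.compile(r"drive\.google\.com/drive/(?:u/\d+/)?folders/([A-Za-z0-9_-]+)")

FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"
FOLDER_MIME = "application/vnd.google-apps.folder"


def folder_id(share_url: str) -> str | None:
    match = FOLDER_RE.search(share_url)
    return match.group(1) if match else None


def list_children_url(folder: str, api_key: str, page_token: str | None = None) -> str:
    """Builds a Drive API v3 files.list request for the direct children of one folder."""
    params = {
        "q": f"'{folder}' in parents and trashed = false",
        "fields": "nextPageToken, files(id, name, mimeType)",
        "pageSize": 1000,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous response's nextPageToken
    return f"{FILES_ENDPOINT}?{urlencode(params)}"


def split_listing(response: dict, api_key: str) -> tuple[dict[str, str], list[str]]:
    """Splits one files.list JSON response into PDF download URLs and subfolder IDs."""
    pdfs: dict[str, str] = {}
    subfolders: list[str] = []
    for item in response.get("files", []):
        if item.get("mimeType") == FOLDER_MIME:
            subfolders.append(item["id"])  # e.g. per-year folders; list these next
        elif item.get("mimeType") == "application/pdf":
            # files.get with alt=media returns the file's content.
            pdfs[item["name"]] = f"{FILES_ENDPOINT}/{item['id']}?alt=media&key={api_key}"
    return pdfs, subfolders
```

Returning subfolder IDs separately matters for councils like Winnetka that organize Drive folders by year: the caller can walk them breadth-first, calling list_children_url for each one and following nextPageToken until a response no longer includes it.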
🟣 CivicClerk Platform (Medium - 9 NCs)
These sites share one platform, with board pages at <site>/committees/viewCommittee/XXX (a numeric ID or a slug) and documents organized under year tabs.
NCs:
- West Hills - westhillsnc.org/committees/viewCommittee/board
- Sherman Oaks - shermanoaksnc.org/committees/viewCommittee/board
- Tarzana - tarzananc.org/committees/viewCommittee/board
- Sylmar - sylmarneighborhoodcouncil.org/committees/viewCommittee/general-board
- Sunland-Tujunga - stnc.org/committees/viewCommittee/11597
- Studio City - studiocitync.org/committees/viewCommittee/366
- Panorama City - panoramacitync.org/committees/viewCommittee/293
- Pacoima - pacoimanc.com/committees/viewCommittee/13033
- Encino - encinonc.org/committees/viewCommittee/board-agenda-minutes
Scrape approach: Parse the HTML tables; PDFs are served from /assets/documents/ paths.
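Because all of these sites share one platform, a single table parser should cover them. A standard-library sketch, assuming documents appear as <a> links inside <table> markup and that every year tab's rows are present in the served HTML (if the tabs load rows through JavaScript, the underlying request would have to be found instead). AssetDocumentParser is a hypothetical name.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetDocumentParser(HTMLParser):
    """Collects (link text, absolute URL) for /assets/documents/ links inside tables."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0                  # > 0 while inside a <table>
        self.current_href: str | None = None  # set while inside a matching <a>
        self.current_text: list[str] = []
        self.documents: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href") or ""
            if "/assets/documents/" in href:
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            # Collapse whitespace so labels split across nested tags read cleanly.
            label = " ".join("".join(self.current_text).split())
            self.documents.append((label, self.current_href))
            self.current_href = None
```

Usage: create the parser with the committee page URL (e.g. https://studiocitync.org/committees/viewCommittee/366), feed it the page HTML, then download each URL in parser.documents. Links to /assets/documents/ outside a table are deliberately ignored.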
🟠 Wix + Google Drive (Medium - 1 NC)
- Winnetka - wncla.org is a Wix site linking to Google Drive folders (also counted under Google Drive above)
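For the Wix case, the only Wix-specific step is finding the Drive folder links on the page; after that the Google Drive approach applies. A sketch, assuming the folder links are present in the HTML Wix serves (if they are only rendered client-side, a headless browser would be needed to get them). drive_folder_ids is a hypothetical helper.

```python
import re

# Google Drive folder share links; the capture group is the folder ID.
DRIVE_FOLDER_LINK = re.compile(
    r"https://drive\.google\.com/drive/(?:u/\d+/)?folders/([A-Za-z0-9_-]+)"
)


def drive_folder_ids(page_html: str) -> list:
    """Returns unique Drive folder IDs in order of first appearance."""
    seen = {}  # dicts preserve insertion order, so this dedupes while keeping order
    for match in DRIVE_FOLDER_LINK.finditer(page_html):
        seen.setdefault(match.group(1), None)
    return list(seen)
```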
⚪ Custom Website (Varies - 12 NCs)
Non-standard layouts; each site needs individual review.
NCs:
- Valley Village - myvalleyvillage.com/agendas-minutes/
- Valley Glen - greatervalleyglencouncil.org/agendas-and-minutes/
- Sun Valley - svanc.com/agendas-and-minutes/
- Van Nuys - vnnc.org/agenda-and-minutes/
- Toluca Lake - gtlnc.org/agendas-and-docs/
- Reseda - resedacouncil.org/board/
- Porter Ranch - prnc.org/meetings
- Northridge South - northridgesouth.org/agendas-minutes
- Northridge East - nenc-la.org/agendas-and-minutes/
- North Hills West - nhwnc.org/agendas-and-minutes/
- North Hills East - nhenc.org/agendas/
- North Hollywood NE - nhnenc.org/agendas/
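Before reviewing these sites by hand, a cheap first pass could check each fetched page for the URL markers that define the other categories, flagging any "custom" site that actually fits an existing scraper. A heuristic sketch, assuming the page HTML is already fetched; the ScrapeType values and the marker order are assumptions, not a tested classifier.

```python
from enum import Enum


class ScrapeType(Enum):
    WORDPRESS_PDF = "wordpress_pdf"
    CIVICCLERK = "civicclerk"
    GOOGLE_DRIVE = "google_drive"
    CUSTOM = "custom"


def guess_scrape_type(page_html: str) -> ScrapeType:
    """Heuristic: look for the URL markers each category is known for."""
    html = page_html.lower()
    # Most specific markers first: the shared platform's committee/document paths.
    if "/committees/viewcommittee/" in html or "/assets/documents/" in html:
        return ScrapeType.CIVICCLERK
    # Direct WordPress PDFs win over Drive links, since a site with both
    # (like Chatsworth) can be handled by the easier Phase 1 scraper.
    if "/wp-content/uploads/" in html and ".pdf" in html:
        return ScrapeType.WORDPRESS_PDF
    if "drive.google.com" in html:
        return ScrapeType.GOOGLE_DRIVE
    return ScrapeType.CUSTOM
```

Anything that still comes back as CUSTOM goes to manual review; a non-CUSTOM result is a hint to verify, not a final classification.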
Priority Scraping Plan
Phase 1: WordPress Direct PDF (7 NCs)
Easiest wins. Parse the agendas/minutes page, extract PDF links, download.
Phase 2: CivicClerk Platform (9 NCs)
All use the same platform, so one scraper should cover every site.
Phase 3: Google Drive (4 NCs)
Needs the Drive API or parsing of the public folder listings.
Phase 4: Custom Sites (12 NCs)
Manual review of each site structure.
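The plan hinges on writing one scraper per platform and reusing it for every council of that type. A sketch of that dispatch, assuming hypothetical Council records that carry a scrape_type string; the scrape type names, the PHASE mapping and the register/run_phase helpers are illustrative, and the actual scrapers would be registered as each phase is built.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Council(NamedTuple):
    name: str
    url: str
    scrape_type: str  # e.g. "wordpress_pdf", "civicclerk", "google_drive", "custom"


# Phase in which each scrape type is handled, following the priority plan.
PHASE = {"wordpress_pdf": 1, "civicclerk": 2, "google_drive": 3, "custom": 4}

# One scraper function per scrape type; every council of that type reuses it.
SCRAPERS: dict[str, Callable[[Council], list[str]]] = {}


def register(scrape_type: str):
    """Decorator that installs the scraper for one scrape type."""
    def wrap(fn: Callable[[Council], list[str]]):
        SCRAPERS[scrape_type] = fn
        return fn
    return wrap


def run_phase(councils: list[Council], phase: int) -> dict[str, list[str]]:
    """Runs the registered scraper for every council in the given phase, by name."""
    results: dict[str, list[str]] = {}
    for council in sorted(councils, key=lambda c: c.name):
        if PHASE[council.scrape_type] != phase:
            continue
        scraper = SCRAPERS.get(council.scrape_type)
        if scraper is None:
            raise NotImplementedError(f"no scraper registered for {council.scrape_type}")
        results[council.name] = scraper(council)
    return results
```

Raising for an unregistered type makes it obvious when a phase is run before its scraper exists, rather than silently skipping those councils.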
Unscrapable / Manual Only
None identified yet - all have some form of public document access.
Next Steps
- Record each NC's scrape type in the database
- Build WordPress Direct PDF scraper (Phase 1)
- Build CivicClerk scraper (Phase 2)
- Add Google Drive support (Phase 3)
- Handle custom sites case-by-case (Phase 4)
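For the first next step, a sketch of recording scrape types, assuming the database is SQLite with a hypothetical neighborhood_councils table keyed by a name column (the real schema isn't described here). A CHECK constraint keeps values limited to the four categories used above; the type strings themselves are illustrative.

```python
import sqlite3

SCRAPE_TYPES = ("wordpress_pdf", "civicclerk", "google_drive", "custom")


def add_scrape_type_column(conn: sqlite3.Connection) -> None:
    """Adds a nullable scrape_type column restricted to the known categories."""
    allowed = ", ".join(f"'{t}'" for t in SCRAPE_TYPES)
    conn.execute(
        "ALTER TABLE neighborhood_councils ADD COLUMN scrape_type TEXT "
        f"CHECK (scrape_type IS NULL OR scrape_type IN ({allowed}))"
    )


def set_scrape_types(conn: sqlite3.Connection, assignments: dict) -> int:
    """Sets scrape_type per council name in one transaction; returns rows updated."""
    with conn:  # commits on success, rolls back if any value fails the CHECK
        cur = conn.executemany(
            "UPDATE neighborhood_councils SET scrape_type = ? WHERE name = ?",
            [(scrape_type, name) for name, scrape_type in assignments.items()],
        )
    return cur.rowcount
```

Leaving the column nullable lets councils stay unclassified until their site has been reviewed; an invalid type raises sqlite3.IntegrityError instead of being stored.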