# Catalog Track — Media Archive Organizer

**Date:** 04/30/26
**Status:** Scoping, not built
**Sister track:** Storage Consolidation (`CONSOLIDATION.md`). Both run concurrently. Shared first artifact: the indexer (§Architecture).
**Primary user:** Shalaco — creative studio + personal mixed library
**Scale:** Synology NAS (DX517 expansion confirmed). ~82 TB used, ~21.6 TB free in the active pool. Bay 5 has a blank 18.2 TB Seagate (healthy, unallocated) ready to add — call it ~40 TB of headroom once it joins the pool. **Plus two additional servers** holding more material from prior build/outgrow/rebuild cycles. The catalog track treats all reachable storage as one logical archive; the storage track decides which becomes canonical. 5+ years of project work + decade+ of personal media + still-uningested hard drives.

---

## Problem

The library is bigger than "Mac Photos." The honest scope:

- **iCloud Photos** — decade+ of mixed business/personal, organized in Photos but locked there. icloudpd export is one slice of the archive. Photos stays as the iPhone capture + Edits-app layer.
- **NAS** — ~82 TB already on it. Mix of organized projects, partially-migrated hard drives, and folders that need triage. SF in Bloom (4 years priority) is in here. So is design work, interviews, raw camera footage, project files (Premiere, design tools), and an unknown amount of duplication from 5 years of half-finished migrations.
- **Hard drives** — still uningested. Money and space were the bottleneck; both are mostly solved now (one more 16 TB drive incoming).
- **Multiple projects, not just SF in Bloom (SFIB).** SFIB is the priority, but the system needs to handle other project archives without forcing them through an SFIB-shaped pipeline.

The need is one external system that catalogs all of this, makes it searchable, lets you organize without modifying source files, and feeds Premiere and other downstream tools. Photos is one input. icloudpd is one ingestion path. The NAS is the canonical archive.

## Primary driver

> "I just want to be able to organize as much as possible everything I have somewhere external from Mac Photos."

Read more broadly now: organize everything you have, period. The NAS holds most of it. The remaining hard drives need to land there too. The tool then makes the whole thing legible.

## Scope decisions

**1. NAS-first, Photos stays.** The NAS is the canonical archive. Photos remains the iPhone capture and Edits-app layer. icloudpd is one ingestion source feeding the NAS. Other ingestion sources: hard drive imports, camera card dumps, design exports, recorded interviews. The tool indexes whatever lands on the NAS, regardless of provenance.

**2. Sidecar metadata, never modify source files.** Tags, favorites, collections, captions, project assignments live in a SQLite database on the NAS, keyed by file hash. Re-ingestion doesn't break organizational work. Source files stay untouched. If the NAS dies and you restore from backup, the DB restores with it.

**3. Hash-keyed identity. Dedup is a first-class feature.** 5 years of partial migrations means duplicate files almost certainly exist across folders. The indexer hashes everything; the UI surfaces dupes; you decide which to keep or whether to keep both.

**4. File-type-agnostic.** Photos, video, audio (interviews), design files (PSD/AI/Sketch/Figma exports), Premiere project files, RAW camera files, PDFs. All indexable. Thumbnail-ability varies — see §Thumbnails.

**5. Multi-project from day one.** Projects are first-class. SFIB is one project. Others coexist. Assets can belong to multiple projects (a photo can be in SFIB and personal).

**6. Local-first, no cloud sync ambitions.** No two-way sync to Photos, no cloud DB, no multi-device. The NAS is reachable; that's the surface.

**7. Premiere bridge is read-only.** Symlink folders or path manifests. The tool feeds Premiere, doesn't manage it.
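
Decisions 2 and 3 can be illustrated with a small sketch: curation data is keyed by content hash, so a tag survives a file move, and duplicates fall out of a `GROUP BY` on the same hash. Everything here is an assumption for illustration. The `locations` table is a hypothetical name, and `hashlib.blake2b` stands in for xxhash only so the example runs on the standard library.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in 1 MiB chunks so large media never loads whole."""
    h = hashlib.blake2b(digest_size=16)  # assumption: stdlib stand-in for xxhash
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {hash: [paths]} for every hash recorded at more than one path."""
    rows = conn.execute(
        "SELECT hash, path FROM locations WHERE hash IN ("
        " SELECT hash FROM locations GROUP BY hash HAVING COUNT(*) > 1"
        ") ORDER BY hash, path"
    )
    groups: dict[str, list[str]] = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return groups


def demo() -> tuple[dict[str, list[str]], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE locations (path TEXT PRIMARY KEY, hash TEXT NOT NULL);
        CREATE TABLE media_tags (
            hash TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (hash, tag)
        );
        """
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.jpg").write_bytes(b"same bytes")
        (root / "copy_of_a.jpg").write_bytes(b"same bytes")
        (root / "b.jpg").write_bytes(b"different bytes")
        for p in sorted(root.iterdir()):
            conn.execute("INSERT INTO locations VALUES (?, ?)", (p.name, file_hash(p)))
        tagged_hash = file_hash(root / "a.jpg")
    # Tag the asset by hash, then simulate the file moving to a new path.
    conn.execute("INSERT INTO media_tags VALUES (?, ?)", (tagged_hash, "sfib"))
    conn.execute("UPDATE locations SET path = 'moved/a.jpg' WHERE path = 'a.jpg'")
    tags = [
        row[0]
        for row in conn.execute(
            "SELECT tag FROM media_tags JOIN locations USING (hash) "
            "WHERE path = 'moved/a.jpg'"
        )
    ]
    return find_duplicates(conn), tags
```

Calling `demo()` returns one duplicate group (the two identical files) and the `sfib` tag, which still resolves after the path change. Deciding which copy to keep stays a human call, as decision 3 says; the query only surfaces candidates.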

## Priority stack

**P-1 — Consolidation and ingestion.** Get the remaining hard drives onto the NAS. Establish a top-level convention (e.g. `/projects/<name>/`, `/personal/`, `/inbox/`, `/archive/`). This is foundation work; doesn't strictly require the tool, but the tool's value is bounded by how complete the archive is. Run dedup as part of this so we don't carry duplicates forward.

**P0 — Make the NAS browsable.** Index the NAS. Hash, EXIF, timestamps, GPS, embedded captions/keywords. SQLite sidecar. HTML viewer with grid, search, date filter, file-type filter. This is the moment the archive stops being a black hole.

**P1 — Curation layer.** Tags, favorites, collections, project assignments. Multi-select batch ops. Faceted search. This is where 82 TB becomes navigable instead of just searchable.

**P2 — Project-scoped views.** SFIB view, other-project views, smart collections, saved searches. Mark business vs personal. The 4 years of SFIB media surface as a first-class workspace.

**P3 — Premiere bridge.** Collection → symlink folder under e.g. `/Volumes/<NAS>/premiere-bins/<collection>/`. Optional FCPXML.

**P4 — Pre-2022 archive surfacing.** Older media gets indexed in P0 anyway; this priority is about *curating* it (tagging, organizing into projects). Lower priority because value density drops.

**P5 — Auto-tagging, AI search, two-way Photos sync, cloud DB.** Defer indefinitely. Each is a project of its own.
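
As a rough sketch of the P3 bridge, exporting a collection could amount to creating one symlink per asset in a bin folder, plus a plain-text path manifest as the fallback named in scope decision 7. The function name, the `manifest.txt` filename, and the collision-suffix scheme are all hypothetical. Whether Premiere follows these links through a network mount is untested here and depends on how the share is mounted.

```python
import tempfile
from pathlib import Path


def export_collection(sources: list[Path], bin_dir: Path) -> list[Path]:
    """Create one symlink per asset in bin_dir; source files are never touched."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    links: list[Path] = []
    used: set[str] = {"manifest.txt"}  # reserve the manifest's own filename
    for src in sources:
        name, n = src.name, 1
        while name in used:
            # Two assets from different folders can share a filename.
            name = f"{src.stem}_{n}{src.suffix}"
            n += 1
        used.add(name)
        link = bin_dir / name
        if link.is_symlink():
            link.unlink()  # refresh a link left by a previous export
        # Raises FileExistsError if a real file is in the way (never overwrite).
        link.symlink_to(src.resolve())
        links.append(link)
    # Path manifest: one absolute source path per line.
    manifest = bin_dir / "manifest.txt"
    manifest.write_text("".join(f"{s.resolve()}\n" for s in sources))
    return links


def demo() -> list[tuple[str, bool]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for folder in ("shoot1", "shoot2"):
            (root / folder).mkdir()
            (root / folder / "clip.mov").write_bytes(folder.encode())
        sources = [root / "shoot1" / "clip.mov", root / "shoot2" / "clip.mov"]
        links = export_collection(sources, root / "premiere-bins" / "demo")
        return [
            (link.name, link.resolve() == src.resolve())
            for link, src in zip(links, sources)
        ]
```

In the demo, two clips with the same filename land in the bin as `clip.mov` and `clip_1.mov`, each resolving to its own source. Links from an earlier export that are no longer in the collection are not cleaned up in this sketch.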

## Architecture sketch

- **Indexer (Python script, runs on NAS or against NAS):** walks the tree, computes a hash (xxhash for speed, not crypto), reads EXIF + sidecar JSON + embedded metadata, writes to SQLite. Idempotent. Re-runnable. Tracks a `last_seen_at` so vanished files surface as missing rather than vanishing from the DB.
- **Sidecar DB (SQLite on the NAS):** tables for `media`, `tags`, `collections`, `projects`, `media_tags`, `media_projects`, `duplicates`. Hash-keyed primary identity. File path is metadata, not identity, because files move.
- **Thumbnailer (worker, runs on NAS):** ffmpeg for video, ImageMagick or libvips for stills, rawpy (LibRaw) or the embedded JPEG preview for RAW (Pillow alone does not decode most camera RAW formats), dedicated handlers for PSD/AI. Generates a small + large thumb per asset, stored in a thumbs cache directory next to the DB. Async, can run for days against a fresh archive.
- **Web UI (single-file HTML pattern, like Product Tree v3):** grid view, viewer pane, tag editor, collection builder, project switcher, dupe surfacer. Talks to a small Python server (FastAPI) running on the NAS or your laptop, querying the SQLite.
- **Export commands:** "send collection to Premiere" → symlink folder. "export tag list" → CSV. "find dupes" → list with one-click resolve.
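
A minimal sketch of the indexer pass described above, under several assumptions: the `locations` table and its columns are hypothetical (one way to make path metadata rather than identity), `hashlib.blake2b` stands in for xxhash to stay on the standard library, and the skip-if-size-and-mtime-unchanged shortcut is an addition not stated in this brief. EXIF and sidecar reading are left out.

```python
import hashlib
import sqlite3
import tempfile
import time
from pathlib import Path
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    hash TEXT PRIMARY KEY               -- identity: content hash
);
CREATE TABLE IF NOT EXISTS locations (
    path         TEXT PRIMARY KEY,      -- metadata: where a copy lives now
    hash         TEXT NOT NULL REFERENCES media(hash),
    size         INTEGER NOT NULL,
    mtime        REAL NOT NULL,
    last_seen_at REAL NOT NULL
);
"""


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.blake2b(digest_size=16)  # assumption: stand-in for xxhash
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def index_tree(conn: sqlite3.Connection, root: Path, now: Optional[float] = None) -> float:
    """One idempotent pass: upsert every file under root and stamp last_seen_at."""
    now = time.time() if now is None else now
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        row = conn.execute(
            "SELECT size, mtime FROM locations WHERE path = ?", (str(path),)
        ).fetchone()
        if row == (st.st_size, st.st_mtime):
            # Unchanged since the last pass: skip the expensive re-hash.
            conn.execute(
                "UPDATE locations SET last_seen_at = ? WHERE path = ?", (now, str(path))
            )
            continue
        digest = hash_file(path)
        conn.execute("INSERT OR IGNORE INTO media (hash) VALUES (?)", (digest,))
        conn.execute(
            "INSERT INTO locations (path, hash, size, mtime, last_seen_at) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT(path) DO UPDATE SET hash = excluded.hash, "
            "size = excluded.size, mtime = excluded.mtime, "
            "last_seen_at = excluded.last_seen_at",
            (str(path), digest, st.st_size, st.st_mtime, now),
        )
    conn.commit()
    return now


def missing_since(conn: sqlite3.Connection, pass_started: float) -> list[str]:
    """Paths the latest pass did not see: surfaced as missing, not deleted."""
    rows = conn.execute(
        "SELECT path FROM locations WHERE last_seen_at < ? ORDER BY path",
        (pass_started,),
    )
    return [r[0] for r in rows]


def demo() -> tuple[int, list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "keep.jpg").write_bytes(b"keep")
        (root / "gone.jpg").write_bytes(b"gone")
        index_tree(conn, root, now=1.0)
        index_tree(conn, root, now=2.0)  # re-run adds no rows
        (root / "gone.jpg").unlink()
        started = index_tree(conn, root, now=3.0)
    total = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
    return total, [Path(p).name for p in missing_since(conn, started)]
```

The demo indexes two files twice, deletes one, and re-indexes: the row count stays at two and the deleted file is reported as missing. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.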

## Thumbnails — the realistic part

At 82 TB, full thumbnailing is a multi-day background job. Plan for it. Strategy:

1. Index everything first (fast — metadata only). Search works without thumbs.
2. Generate thumbs lazily on first view, plus a background worker that pre-warms recently-modified and recently-viewed.
3. Video thumbs: single keyframe at 10% in. No spritesheet for v1.
4. Design files: best-effort, fall back to a generic icon if extraction fails.
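
The strategy above could look roughly like the sketch below: a hash-keyed cache path, a lazy get-or-create that returns `None` so the caller can show the generic icon, and an ffmpeg command that grabs one frame at 10% of the duration. Function names, the cache layout, and the 320 px width are assumptions. The ffmpeg and ffprobe invocations use standard flags but have not been run against this archive.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional


def probe_duration(src: Path) -> float:
    """Ask ffprobe for the container duration in seconds (requires ffprobe)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(src)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def video_thumb_command(src: Path, dest: Path, duration_s: float, width: int = 320) -> list[str]:
    """Build an ffmpeg command that writes one frame from 10% into the video."""
    seek = max(duration_s * 0.10, 0.0)
    return [
        "ffmpeg", "-y",
        "-ss", f"{seek:.3f}",      # -ss before -i: fast input seeking
        "-i", str(src),
        "-frames:v", "1",          # a single frame, no spritesheet
        "-vf", f"scale={width}:-2",
        str(dest),
    ]


def thumb_path(cache_dir: Path, digest: str, size: str) -> Path:
    """Cache location keyed by content hash, fanned out by a two-char prefix."""
    return cache_dir / digest[:2] / f"{digest}_{size}.jpg"


def get_or_create_thumb(
    cache_dir: Path, digest: str, size: str, render: Callable[[Path], None]
) -> Optional[Path]:
    """Lazy generation: render on first request, reuse afterwards."""
    dest = thumb_path(cache_dir, digest, size)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        render(dest)
    except Exception:
        dest.unlink(missing_ok=True)  # drop any partial output
        return None                   # caller falls back to a generic icon
    return dest if dest.exists() else None


def demo() -> tuple[str, int, bool]:
    calls: list[Path] = []

    def fake_render(dest: Path) -> None:
        calls.append(dest)
        dest.write_bytes(b"jpeg")

    def broken_render(dest: Path) -> None:
        raise RuntimeError("unsupported design file")

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        get_or_create_thumb(cache, "abcd1234", "small", fake_render)
        get_or_create_thumb(cache, "abcd1234", "small", fake_render)  # cache hit
        fallback = get_or_create_thumb(cache, "ffff0000", "small", broken_render)
    seek = video_thumb_command(Path("in.mov"), Path("out.jpg"), 100.0)[3]
    return seek, len(calls), fallback is None
```

A real `render` for video would run `subprocess.run(video_thumb_command(src, dest, probe_duration(src)), check=True)`. Because the cache is keyed by hash, duplicate files share one thumbnail, and the background pre-warm worker can call the same function as the on-demand path.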

## Open questions before we build

1. **Build vs evaluate first?** At 82 TB this is a real digital asset management (DAM) problem. Worth ~30 min looking at Tropy, Daminion, Mylio, digiKam, ResourceSpace, or even Synology Photos before committing to build. Building means full control and no per-seat fees; evaluating means we might find something 80% there. My instinct: build, because every existing DAM has opinions that fight your workflow, but it's worth the look.
2. **Synology substrate — use it or work alongside?** Confirmed Synology (DX517 expansion). Synology Photos already indexes stills and has a thumbnail/metadata layer we can read or ignore. Synology Drive can mount the share to Mac/Premiere natively. Worth deciding early: do we lean on Synology's indexing (faster to ship, locked-in) or build our own indexer that treats Synology as just disks (more portable, more work)? My instinct: build our own, treat Synology as dumb storage, but read Synology Photos' DB for free metadata if it's already there.
3. **Find vs curate — what's the bigger pain right now?** If finding, P0 ships first as a read-only viewer and probably solves most daily pain. If curating, the tag editor lands in v1 alongside the viewer.
4. **Top-level folder convention on the NAS.** What does the structure look like today? Is there a convention or is it messy? P-1 ingestion plans depend on this.
5. **Business iCloud Library split — yes or no?** Still relevant for clean icloudpd targeting. Not blocking.
6. **Pre-2022 archive — index in P0, curate in P4.** Confirm.

## Out of scope

- Pushing tags or organization back to Photos
- Replacing Photos for capture or iPhone editing
- Cloud hosting or multi-device sync of the sidecar DB
- AI auto-tagging
- Two-way sync of any kind
- Managing Premiere projects (only feeding them)
- Becoming a backup tool — the NAS handles that

## What success looks like

The NAS holds everything. One command keeps the index fresh. An HTML page opens in your browser, shows the full archive as a searchable grid filtered by project, tag, date, or file type, lets you tag and collect and dedupe, and exports a Premiere-ready folder of symlinks for any collection. The 4 years of SFIB media are findable in seconds. The 5+ years of mixed project archive is legible instead of dread-inducing. Photos and Edits keep doing their thing on the iPhone side, untouched.

---

## Scope evolution

- **v0 (initial framing):** Mac Photos export organizer. icloudpd-centric.
- **v1 (this brief):** NAS-wide media archive organizer. icloudpd is one ingestion source. Photos is one upstream system. The NAS is the canonical archive. 82 TB scale acknowledged. Multi-project, multi-file-type, dedup as first-class. Consolidation/ingestion added as P-1.

`[Source: 04/30/26 scoping conversation | Project: Media Archive Organizer v1 | Priority: P-1 = consolidate, P0 = browsable index]`
