Managing Multiple Archives
Advanced Topic - This guide covers strategies for working with multiple related archives while maintaining data consistency.
Overview
As your collection grows, you may want to organize items into separate archives for different purposes:
- Thematic groupings (e.g., family events, vacations, historical periods)
- Collaborative projects (sharing subsets with others)
- Performance optimization (smaller archives load faster)
- Archival preservation (separating master copies from working copies)
The challenge is maintaining person identity consistency across archives. When the same people appear in multiple archives, you want their person records to use the same identifiers (personIDs) so that:
- Archives can be merged later without creating duplicate person records
- Collaborative workflows preserve relationships
- Cross-archive queries can identify the same individuals
Key Concepts
Person Identity and personID
Every person in Shoebox has a unique identifier called a personID (a UUID). This identifier:
- Is generated once when a person is created
- Remains stable across imports, exports, and merges
- Links person biographical data to items that feature them
- Enables person consistency across multiple archives
Why personID matters for multiple archives:
- If two archives have the same person with the same personID, they're recognized as the same individual
- If two archives have different personIDs, Shoebox treats them as different people (even if names match)
- Importing persons before adding items ensures personID consistency from the start
For technical details on person data structure, see Data Structure Guide - Person Library.
Design Philosophy
Our multi-archive approach follows these principles:
Never Overwrite on Conflict - If metadata differs, preserve the target archive and log the conflict for manual resolution. This protects existing data and relationships.
Staged Import Process - Import persons first (without face detection data), then items (with face descriptors attached). This prevents broken references.
File Verification - Before importing items, verify files are truly identical (size + hash). Same filename doesn't mean same file.
Explicit Conflict Reporting - Don't silently merge or guess. Show all conflicts in a collection and log file for deliberate resolution.
Preserve Face Descriptors - Never replace face detection data in target items. Face assignments are valuable manual work.
These principles emerged from real-world needs: collaborative genealogy work, thematic archive organization, and data consolidation projects.
Common Workflows
Workflow 1: Starting a New Related Archive
Scenario: You're creating a new archive for vacation photos. Many people in these photos already exist in your main family archive.
Steps:
- Create the new archive - see Creating Your Archive for directory setup
- Before adding photos, import persons from your main archive:
- Archive > Import Persons from Archive...
- Select your main archive's accessions.json
- Choose "Import (Strip Face Descriptors)" - clean person library without item references
- Review import results
- Add vacation photos (Archive > Add Media Metadata)
- Assign people to photos using the imported persons
- Result: Vacation photos reference the same person records as your main archive
Why this works: Importing persons first establishes personID consistency. Later, you can merge these archives or import items between them without conflicts.
Workflow 2: Collaborative Editing
Scenario: You export a collection to a colleague, they add metadata and face assignments, then you import their updates back.
Steps:
- Export collection from your archive (Collections > Export Collection)
- This creates a standalone archive with only selected items
- Persons referenced by these items are included
- Share the exported archive with collaborator
- Collaborator adds descriptions, dates, face assignments, etc.
- Collaborator returns the modified archive
- Import the archive back (Archive > Import Archive...)
- Run Dry Run first to preview conflicts
- Review conflict log to see what changed
- Do full import - conflicts preserved in _ImportConflicts collection
- Manually resolve conflicts by comparing target items with conflict collection
Considerations:
- If you modified the same items while collaborator worked on them, conflicts will occur
- The conflict collection shows which items differ
- Face descriptors in target are always preserved (never overwritten)
Workflow 3: Thematic Archive Consolidation
Scenario: You've created separate archives for different decades. Now you want to consolidate into one master archive.
Steps:
- Choose which archive will be the target (master)
- For each source archive:
- Archive > Import Archive...
- Run Dry Run first to see conflicts
- Review log file for person and item conflicts
- Run full import
- Resolve any conflicts in _ImportConflicts collection
- After all imports, run Archive > Validate to check for:
- TMGID conflicts (different personIDs with same genealogy ID)
- Unreferenced persons (can be cleaned up if not needed)
- Orphaned face descriptors (can be cleaned up)
Considerations:
- Person conflicts usually indicate data quality issues (same person entered multiple times)
- Item conflicts mean the same file appears with different metadata in multiple archives
- TMGID conflicts should be resolved - they may indicate duplicate person entries
Import Features
Import Persons Only
Menu: Archive > Import Persons from Archive...
Import person library from another archive while maintaining personID consistency.
When to use:
- Starting a new archive - import persons before adding items
- Synchronizing person data across archives
- Preparing for future archive merge
How it works:
- Select source accessions.json file
- Preview: Shows source archive title and person count
- Choose options:
- Import (Strip Face Descriptors) - Recommended. Clean person library without item references
- Import (Include Face Descriptors) - Keeps face detection data (will be orphaned until items imported)
- Create backup - Enabled by default (checkbox)
- Review results:
- Imported persons (new to target archive)
- Skipped persons (same personID already exists with identical data)
- personID collisions (rare - same personID but different biographical data)
- TMGID conflicts (different personIDs with same genealogy ID)
Person Matching Logic:
- personID match with same data: Person skipped (already exists)
- personID collision (same personID, different data): Alert shown - manual resolution required (very rare)
- TMGID conflict (different personID, same TMGID): Import proceeds, validation will flag for review
- New person (personID doesn't exist in target): Person imported
Face Descriptor Handling:
Face descriptors (faceBioData) contain references to specific items by their link field. When importing persons only:
- Strip Face Descriptors (recommended): Clean person library without item references
- Include Face Descriptors: Keeps face detection data, but descriptors will be orphaned until items imported
- Orphaned descriptors can be cleaned up later: Archive > Validate > Cleanup Orphaned Descriptors
After Import:
- Newly imported persons won't be assigned to any items initially (by design)
- Assign them through Add Media Metadata or Edit Media windows
- Unreferenced persons can be cleaned up: Archive > Validate > Cleanup Unreferenced Persons
- TMGID conflicts should be reviewed: Archive > Validate
Best Practice
Always import persons before adding items to new archives. This establishes personID consistency from the start and prevents duplicate person records when merging archives later.
Import Full Archive
Menu: Archive > Import Archive...
Import both persons and items from another archive with comprehensive conflict detection.
When to use:
- Merging thematic archives into a master archive
- Importing collaborative edits back into main archive
- Consolidating split archives
How it works:
- Select source accessions.json file
- Preview: Shows source archive title, person count, item count
- Warning if _ImportConflicts collection already exists (option to cancel)
- Choose mode:
- Cancel - Abort operation
- Import (Full) - Perform actual import with mandatory backup
- Dry Run (Preview Only) - Analyze conflicts without making changes (no backup needed)
- If full import: Mandatory backup created automatically
- Import executes in two stages:
- Stage 1: Import persons (without face descriptors)
- Stage 2: Import items (with face descriptors for successfully imported items)
- File verification for matching links:
- Compare file size (fast check)
- Compare file hash (SHA-256 - always performed)
- Different files with same link → Conflict logged, item not imported
- Review results:
- Import statistics
- Conflicts detected (persons and items)
- Log file created with timestamp
- _ImportConflicts collection created (if conflicts exist)
Conflict Handling:
Person Conflicts:
- Same as "Import Persons Only" feature
- personID collisions and TMGID conflicts flagged
- All persons imported unless collision detected
Item Conflicts:
- File Mismatch: Same link but different files (size or hash differs) → Not imported, logged
- Metadata Mismatch: Same file but different metadata → Not imported, target preserved, added to _ImportConflicts collection
- All fields compared: date, description, type, link, city, state, gps, person[], source[], playlist, faceTag
- Any difference = conflict
Conflict Collection:
- Created only if item metadata conflicts detected
- Key:
_ImportConflicts(sorts to top with underscore) - Title:
Import Conflicts from: [source filename] - Text: "Import Conflicts"
- Contains target archive items that couldn't be replaced (not source items)
- Review collection to see which items differ
- Compare with source archive to decide resolution
Log File:
- Saved to archive directory:
import-log-[timestamp].txt - Contains:
- Import summary statistics
- Person conflicts with details
- Item conflicts with field differences
- File verification results
- Symlink detection warnings (if found)
Face Descriptor Handling (Two-Stage Process):
- Stage 1: Persons imported without face descriptors
- Stage 2: Items imported with face descriptors only for successfully imported items
- Rationale: Prevents orphaned face descriptors - only import face data for items that exist in target
- Target preservation: If target item has face descriptors and conflicts exist, target face data completely preserved
- New items: Face descriptors from source transferred when item import succeeds
Symlink Detection:
- Resource directories (
photo/,audio/,video/) may contain symlinks or actual files - Import treats all references uniformly as links (doesn't distinguish)
- File verification compares actual file content/size regardless of symlink status
- Log file notes if symlinks detected (informational only)
Using Dry Run
Always run a Dry Run first to preview conflicts before committing to a full import. This lets you review the conflict log and _ImportConflicts collection (preview only) to understand what will happen.
Important
Full archive import never overwrites target items with conflicts. When metadata differs, the target is preserved and the conflict is logged. This protects your existing data and face assignments. You must manually resolve conflicts by reviewing the _ImportConflicts collection and log file.
Conflict Resolution
Understanding Conflicts
Person Conflicts:
personID Collision (very rare): Same personID but different biographical data
- Indicates: Data corruption or manual personID manipulation
- Resolution: One person must have their personID changed - see Manual personID Reassignment below
- Check both archives to confirm which data is correct
TMGID Conflict: Different personIDs with same genealogy ID (TMGID)
- Indicates: Same person was entered multiple times with different names/data
- Resolution: Decide which person record is correct, delete duplicate, reassign items - see Manual personID Reassignment below
- Use Archive > Validate to detect these
Item Conflicts:
File Mismatch: Same filename (link) but different files
- Indicates: Files were replaced without changing filename
- Resolution: Rename one file to differentiate, or determine which is correct
Metadata Mismatch: Same file but different metadata
- Indicates: Archives independently edited the same items
- Resolution: Compare metadata field by field, manually update target to merge information
Resolving Item Metadata Conflicts
- Open _ImportConflicts collection (shows all conflicting target items)
- Compare each item with source archive:
- Open both archives side-by-side
- Compare metadata field by field using Edit Media window
- Decide resolution:
- Keep target (no action needed)
- Use source metadata (manually copy into target)
- Merge both (combine information from both)
- After resolution, delete _ImportConflicts collection or Clear its items
Manual personID Reassignment
While a dedicated personID reassignment tool doesn't yet exist, you can manually handle personID conflicts in two scenarios:
Scenario 1: Separating - One personID Used for Multiple People
If the same personID was mistakenly used for different people (collision), you need to create a new person and reassign some references:
- Open Person Manager and note the personID and all details of the conflicting person
- Create a new person for the second individual:
- Open Add Media Metadata or Edit Media windows
- Enter person details (creates new personID automatically)
- Note the new personID from Person Manager
- Manually visit each item that references the old personID:
- Open Edit Media window for each item
- In the People section, evaluate whether this item shows the first or second person
- If second person: Remove old person reference, add new person reference
- If first person: Leave unchanged
- After reassigning all appropriate items:
- Check Source fields (who provided items) and update if needed
- Run Archive > Validate to check for unreferenced persons
- If old person now unreferenced and no longer needed, can be cleaned up
Scenario 2: Joining - Combining Two personIDs into One
If the same person was entered twice with different personIDs (TMGID conflict or name variations):
- Decide which person record to keep as the primary (better data, more complete)
- Note both personIDs from Person Manager
- Manually visit each item that references the personID to be removed:
- Open Edit Media window for each item
- Remove the old person reference
- Add the primary person reference
- Update Source fields if needed (items provided by the merged person)
- After reassigning all references:
- The old person will be unreferenced
- Run Archive > Validate > Cleanup Unreferenced Persons to remove it
- Update biographical data in primary person if needed (merge notes, names, etc.)
Finding All References:
- Use Collection Manager to create temporary collections
- Filter items by person to see all items featuring that individual
- Export collection to see item list
- Work through systematically to ensure no references missed
Time Intensive
Manual personID reassignment is tedious for persons referenced in many items. Consider carefully whether reassignment is necessary, or if the situation can be resolved by editing person biographical data instead.
Cleaning Up After Import
After importing, run Archive > Validate to check for:
Unreferenced Persons:
- Persons not linked to any items
- May occur after importing extra persons "just in case"
- Click "Cleanup Unreferenced Persons" to remove
Orphaned Face Descriptors:
- Face detection data that doesn't match any items
- May occur if face descriptors included but items not imported
- Click "Cleanup Orphaned Descriptors" to remove
TMGID Conflicts:
- Different persons with same genealogy ID
- May indicate duplicate person entries across archives
- Review and resolve manually
Best Practices
Before Starting Multi-Archive Workflows
- Understand personID: Each person has a unique identifier. Same personID = same person across archives.
- Establish a main archive: Designate one archive as your "master" with authoritative person data.
- Import persons first: When creating related archives, import persons from main before adding items.
- Use collections for collaboration: Export collections (Archive > Export Collection) rather than full archives when sharing with others.
During Import Operations
- Always run Dry Run first: Preview conflicts before committing changes.
- Read the log file: Understand what conflicts exist and why.
- Review _ImportConflicts collection: See exactly which items differ.
- One archive at a time: Don't import from multiple sources simultaneously - resolve conflicts between each import.
After Import Operations
- Run Archive > Validate: Check for data quality issues.
- Resolve conflicts promptly: Don't let _ImportConflicts collection accumulate.
- Clean up unreferenced data: Remove orphaned descriptors and unreferenced persons you don't need.
- Document your process: Note which archives have been merged and when.
Technical Considerations
Why personID Instead of Names?
Names are unreliable identifiers:
- Same person may have different names (maiden vs. married, nicknames, spelling variations)
- Different people may have the same name
- Names change over time but identity doesn't
Using personID (UUID):
- Guarantees uniqueness
- Remains stable across all operations
- Enables reliable matching across archives
- Supports complex relationships (multiple last names, name changes)
Why Two-Stage Import?
Importing persons and items separately:
- Prevents orphaned references: Face descriptors only transferred for items that successfully import
- Preserves data integrity: Persons established before items reference them
- Enables cleanup: Can import persons "just in case" and remove unreferenced ones later
- Reduces complexity: Simpler than all-at-once import with complex dependency tracking
Why Never Overwrite on Conflict?
Automatic merging is dangerous:
- Loss of manually-entered data
- Overwrites face assignments (valuable manual work)
- Hides disagreements that need human judgment
- No undo mechanism
Explicit conflict reporting:
- Protects existing data
- Makes user aware of discrepancies
- Allows informed decision-making
- Provides audit trail (log files)
Advanced Scenarios
Splitting One Archive into Multiple
- Create new archive - see Creating Your Archive for directory setup
- Import persons from original (Archive > Import Persons from Archive...)
- In original archive, create collection with items to split off
- Export collection (Collections > Export Collection)
- Import exported items into new archive (Archive > Import Archive...)
- Remove exported items from original (delete individually or use Item Manager)
- Clean up unreferenced persons in both archives (Archive > Validate)
Synchronizing Person Data Across Archives
If you update person biographical data in one archive and want to sync to others:
- Export person library only (currently no direct feature - use Import Persons to transfer)
- In target archive: Archive > Import Persons from Archive...
- personID matches with different data will be flagged as collisions
- Currently requires manual synchronization (person-by-person comparison)
- Future enhancement: Person sync feature with selective field updates
Handling Face Descriptors Across Archives
Face descriptors reference items by link. When importing across archives:
- Same items in both archives: Face descriptors transfer correctly
- Different items: Face descriptors become orphaned (reference non-existent items)
- Best practice: Strip face descriptors on initial person import, re-run face detection in target archive
- Alternative: Include face descriptors if items will also be imported
Limitations and Future Enhancements
Current Limitations
- No batch person synchronization (update person data across multiple archives)
- personID collisions require manual personID reassignment (tool not yet implemented)
- No collection import from source archives (collections ignored)
- No merge preview UI (must use dry-run + log file)
Related Documentation
- Data Structure - Person Library - Technical details on person data structure
- Archives vs Collections - Understanding the difference
- Creating Your Archive - Basic archive setup
- Metadata Features - Person Manager usage
Need Help?
Multi-archive management is an advanced workflow. Start with simple scenarios (import persons to new archive) before attempting complex merges. Always backup before importing. Use dry-run mode liberally.
