Naming and Arrangement Standards for Digital Media Collections

Data Naming and Arrangement Recommendations

To prevent data loss and confusion when storing and locating data, staff and freelancers must name and arrange files consistently. The basic principles of information science encourage the use of persistent naming and arrangement standards to create shared context among the user groups who will access data. This shared context, known as “provenance,” makes sets of data understandable to anyone who is familiar with the standards in use. When files are contextualized through naming standards, users may be able to deduce, for example, the time period in which the data was created, the project for which it was created, and which files from the project were published versus those that were only accessible internally to staff. In short, naming and arrangement standards give users a place to put files during the production process, and allow future users to find the data they’re looking for without prior institutional knowledge about it.

Basic benefits to file naming and arrangement and data management
The benefits of naming and arrangement standards go far beyond contextualization and tracking provenance.

1. Discoverability

When properly designed, custom standards allow users to locate files quickly and efficiently without navigating through inconsistent or unnecessarily complicated sets of folders or tiers of a user interface. Fewer tiers of folders mean fewer clicks for users and less friction in the user interface. When navigating through sets of standard directory structures, users know exactly what to expect. A predefined arrangement structure also gives users a reliable place to store data. If a file is not where the standard says it should be, users can conclude that it never existed in the first place, rather than assume it is lost somewhere in a complex folder structure.

2. Data Packaging and Modularity

Standards allow archivists to contain sets of data in packages (sets of folders) so they are identifiable within larger collections. Creating concise packages when organizing projects or sets of data makes careful, intentional data management possible. Files stored in concise packages can be moved when data storage issues arise, for example when hard drives crash or run out of space. If a certain project is known to have sensitive content, neatly packaged data can be encrypted at a moment’s notice. If files related to a certain project are lost, having named and arranged the data into packages reduces the number of places where the misplaced files may be found.

3. Ensuring Data Integrity

When preserving critically important data, files stored in a consistent structure can be checked for integrity many years after they were created. These eventualities may not apply to all collections, and naming and arrangement may simply be a convenient means of organizing files during production. Even so, it’s important to consider that these standards sustain collections in ways that simply can’t be anticipated when data is created.
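
As one illustration of such a check, the sketch below records a checksum for every file in a project package and later reports any file that is missing or has changed. It is a minimal example, not a prescribed workflow: the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file path (relative to the package folder) to its checksum."""
    return {
        path.relative_to(package_dir).as_posix(): sha256_of(path)
        for path in sorted(package_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(package_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built when a project is archived can be stored alongside the package. Running changed_files against it years later returns an empty list if every recorded file is still intact.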

4. Project Management and Collaboration

Project IDs are useful for storing assets, but also for identifying categories of work and tracking progress, for example in project management tools such as Monday, Airtable, or Trello. Project IDs can be instrumental for collaboration during the creation and production phase, and indispensable for archiving and preserving completed projects.

5. Permissions, Access and Security

All files associated with a distinct project will be stored in a single, reliable place that is accessible to the users who need to edit or view them. Storing data in this way also helps allocate permissions and limit access to the staff who need it.

6. Machine-Readable Collections

Data that is named and arranged according to a standard is both human- and machine-readable. This allows developers and administrators to design batch processes and write scripts to analyze and manage data effectively. Examples might include running a size report on all video projects within a certain date range, analyzing how many projects were produced by a certain team last year, migrating a set of projects from one cloud server to another, or confirming that an automated file backup workflow for a given set of projects is working properly.
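
The size report mentioned above can be sketched in a few lines once folder names follow a standard. The example below assumes that every project folder sits directly under one root directory and that its name begins with an eight-digit YYYYMMDD date. Both the layout and the function names are assumptions made for illustration.

```python
from datetime import date, datetime
from pathlib import Path


def project_start_date(folder_name: str):
    """Return the date encoded in the first eight characters, or None."""
    prefix = folder_name[:8]
    if len(prefix) != 8 or not (prefix.isascii() and prefix.isdigit()):
        return None
    try:
        return datetime.strptime(prefix, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def folder_size_bytes(folder: Path) -> int:
    """Total size of all files under a folder, in bytes."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def size_report(root: Path, start: date, end: date) -> dict[str, int]:
    """Map each project folder whose date falls in [start, end] to its size."""
    report = {}
    for folder in sorted(root.iterdir()):
        if not folder.is_dir():
            continue
        started = project_start_date(folder.name)
        if started is not None and start <= started <= end:
            report[folder.name] = folder_size_bytes(folder)
    return report
```

Folders that do not follow the naming standard are skipped rather than guessed at, so the report only covers data the standard can vouch for.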

Recommendations for Data Packaging and Use of Project Identifiers

Teams at [ORG NAME HERE] may consider using a set of standardized "Project Identifiers" to track data created for current and past projects using a simple naming system. These identifiers help staff track individual files, as well as information associated with projects, including the project status, project owners, assets that are currently being edited or produced, and published or archived files. This reduces duplication of work, disorganization, and extraneous communication.
[ORG NAME HERE] currently uses identifiers to track and produce content in some cases, but may consider formalizing this process across the organization, or within specific departments and divisions. To help formalize the implementation of Project IDs, guides can be created to help staff understand how IDs work within each [ORG NAME HERE] department.

Project ID Structure Proposal for [ORG NAME HERE] Departments

Once a Project ID is created, it never changes and is used throughout the lifecycle of a project from production and research to archiving and preservation. Project IDs are structured using a standard set of abbreviations in a string. Teams within [ORG NAME HERE] may use slightly different variations.

Here is an example of a general-purpose project ID for events:
YYYYMMDD_Event_Location_Subject
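
A pattern like this can be checked mechanically. The sketch below splits an ID into its four parts and rejects strings that do not fit. It assumes that parts are separated by underscores and that each part after the date contains only letters, digits, and hyphens. That character set and the function name are assumptions for illustration, not part of the proposal.

```python
import re
from datetime import datetime

# Assumed shape: 8-digit date, then three underscore-separated parts.
PROJECT_ID_PATTERN = re.compile(
    r"(?P<date>[0-9]{8})"
    r"_(?P<event>[A-Za-z0-9-]+)"
    r"_(?P<location>[A-Za-z0-9-]+)"
    r"_(?P<subject>[A-Za-z0-9-]+)"
)


def parse_project_id(project_id: str):
    """Return the parts of a Project ID as a dict, or None if it is invalid."""
    match = PROJECT_ID_PATTERN.fullmatch(project_id)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        # Eight digits that are not a real calendar date.
        return None
    return fields
```

For a hypothetical ID such as 20240315_Gala_NYC_Fundraising, the function returns the date 15 March 2024 together with the event, location, and subject parts. A string with spaces or a missing part returns None.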

1. Date Recommendations
The date applied to a Project ID should reflect the year, month, and possibly day that work began on the project (i.e., the start date). Although it can be tempting to use the intended publication, release, or event date, deadlines often change, and staff would then be required to rename all associated files, folders, and other instances in which the Project ID was used. Any files or instances missed during this renaming effort may give users the impression that multiple projects with similar IDs were created, instead of one project being updated with a date change. In general, using the start date is highly recommended.
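
To make the start-date rule concrete, the sketch below builds an ID once, from the date work began, using the general-purpose event pattern shown earlier. The function name is hypothetical, and stripping spaces and punctuation from each part is an assumption made so that underscores only ever act as separators.

```python
from datetime import date


def build_project_id(start_date: date, event: str, location: str, subject: str) -> str:
    """Build a YYYYMMDD_Event_Location_Subject ID from the project start date."""
    parts = []
    for label in (event, location, subject):
        # Keep letters and digits only, so "New York" becomes "NewYork".
        cleaned = "".join(ch for ch in label if ch.isalnum())
        if not cleaned:
            raise ValueError(f"empty Project ID part: {label!r}")
        parts.append(cleaned)
    return "_".join([f"{start_date:%Y%m%d}", *parts])
```

Because the ID is derived from the start date rather than a deadline, a rescheduled event or release does not require any renaming.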

2. Additional Naming Categories
The example above incorporates “event”, “location”, and “subject” categories. Each department within [ORG NAME HERE] may create a Project ID that notes its involvement, such as an abbreviated department code or category of work. These categories should reflect, for example, the team that created the project and any related subject matter. Conversations to determine which categories are most meaningful to include in a Project ID would ideally take a high-level perspective, describing the work done by the team in question and potentially its role within the larger institution. Recommendations for naming standards for different teams and the specific work they produce can be made upon request.

3. Where to Use Project IDs
Projects with IDs should be stored alongside each other so that they are in a predictable, consistent place and can easily be sorted, scrolled, browsed, and searched. In the context of file storage and/or asset management platforms, Project IDs should be used in a root-level directory or location that is immediately accessible to users. 

Files associated with any given project should be named after the Project ID if they are produced by [ORG NAME HERE] staff. This is optional but recommended, as it allows staff to understand and contextualize final output files that may be distributed in ways that are difficult to predict (for example, uploaded to online platforms, emailed, or saved in locations outside of [ORG NAME HERE]’s control). Note that files used during the production process but not directly created by [ORG NAME HERE] staff (photos downloaded from social media, document files from partners, camera files from a videographer, etc.) should retain their original file names and should not be renamed to match the Project ID.