Processing Capstone Email Using Predictive Coding
Introduction
The Illinois State Archives, in partnership with the University of Illinois and with three-year funding from the National Historical Publications and Records Commission (NHPRC), is launching a project called Processing Capstone Email Using Predictive Coding (a.k.a. the Capstone Email Project). The project seeks to develop and demonstrate a reliable and sustainable method of identifying, and providing appropriate access to, state agency email messages that have enduring value.
Following the lead of the National Archives and Records Administration, we will start by using a Capstone approach to identify email messages having enduring value. This means the project will identify and secure email messages of senior administrative officers from state agencies according to the priorities of the Director of the State Archives. Once the email is secured, the project will work with experts in text analytics and electronic discovery to explore tools that use technology-assisted review techniques (predictive coding in particular) to parse and classify the email.
We envision that the tools will assist in identifying and prioritizing review of sensitive content, generating descriptive metadata, aggregating email threads, identifying near-duplicates, and providing some level of automatic appraisal and redaction. Once the tools have been selected and configured, we will batch process the email so it may be ingested into a digital repository. From there, the email will be made available to the public in person at an offline computer terminal.
Plan of Work
✔ Phase 1 – Kick-Off and Initial Explorations
✔ Phase 2 – De-duplication and Assessment
✔ Phase 3 – Auto-categorization Tools Assessment
✔ Phase 4 – Restrictions and Redaction Tool Assessment
✔ Phase 5 – Enhancement Tools Assessment
✔ Phase 6 – Batch Email Processing
✔ Phase 7 – Search and Access Tools Evaluation
Phase 8 – Rollout Process
Performance Objectives
1. Establish proven workflows for the processing of Capstone email.
2. Process at least 20 GB of email, including email from at least one senior state official.
3. Demonstrate processing efficiency exceeding manual human review.
4. Provide public access to Capstone email.
CoSA NHPRC Email Symposium 2017 - Brent West presenting
2017 SAA Team Presentation
iPRES Conference – September 2018
iPRES Conference – September 2019
Brent West (standing) and Josh Hackel (seated) of the University of Illinois explain to Illinois State Archives staff how to access state agency emails using specially developed software. The State Archives and the University of Illinois have been collaborating on a three-year project to develop a reliable and sustainable method of providing access to email records that have enduring value. Funding for the project, "Processing Capstone Email Using Predictive Coding," was provided by the National Historical Publications and Records Commission.
Research Assistant Tara Trentalange tests out the public access computer
Team Members
-
1. Project Director – David Joens
E-Records Archivist and Director | Illinois State Archives
(217) 782-3492,
djoens@ilsos.gov
-
2. Co-Principal Investigator – Joanne Kaczmarek
Associate Professor and Archivist for Electronic Records | University of Illinois
(217) 333-6834,
jkaczmar@illinois.edu
-
3. Co-Principal Investigator – Brent West
Asst. Director for Records and Information Management Services | University of Illinois
(217) 265-9190,
bmwest@uillinois.edu
-
4. Project Manager – Amanda Hartman
Records Archivist | Illinois State Archives
(217) 524-7528,
ahartman@ilsos.gov
-
5. Text Analytics Expert (October 2016 - May 2017) – Dan Roth
Professor of Computer Science | University of Illinois
(217) 244-7068,
danr@illinois.edu
-
6. IT Infrastructure Expert (October 2016 - February 2019) – Tom Habing
Software Development Manager | University of Illinois
(217) 244-4425,
thabing@illinois.edu
-
7. Archival Email Expert (January 2017 - October 2019) – Chris Prom
Assistant Archivist | University of Illinois
prom@illinois.edu
-
8. Archival Advisor (June 2017 - October 2019) – William Maher
Director of Archives | University of Illinois
w-maher@illinois.edu
-
9. Research Assistant (October 2016 - May 2017) – Jiayue Niu
Tools Assessment, Workflow Development, and Email Processing | University of Illinois
jniu6@illinois.edu
-
10. Research Assistant (June 2017 - November 2018) – Mei Mei
Tools Assessment and Workflow Development | University of Illinois
meim2@illinois.edu
-
11. Research Assistant (June 2017 - August 2018) – Aarthi Shankar
Tools Assessment and Workflow Development | University of Illinois
shankar9@illinois.edu
-
12. Research Assistant (January 2019 - June 2019) – Tara Trentalange
Tools Assessment and Workflow Development | University of Illinois
taralt2@illinois.edu
-
13. Research Assistant (January 2019 - Present) – Joshua Hackel
Tools Assessment and Workflow Development | University of Illinois
jhackel2@illinois.edu