Welcome to CNI’s Spring 2026 Membership Meeting in Salt Lake City, Utah, April 13–14; attendance is limited to member representatives, speakers, and invited guests.
  • A Sched account is not required to view the event schedule, but it will enable you to personalize the schedule or sync it to your calendar. Sched invitations were sent to attendees in March; if you haven’t received yours, please contact [email protected] for access.
  • Wifi: CNI_Connect
    Password: CNIs26confSLC
  • Review CNI’s Code of Conduct
Tuesday April 14, 2026 1:00pm - 2:00pm MDT
Expanding Access to Historic Scanned Documents Using R's Tesseract Package
Adelynn Shirts and David Advent (Utah State University)

Utah State University's institutional repository, DigitalCommons@USU, hosts over 100,000 PDF documents, many of which were originally printed before 1975 and later scanned. As a result, they lack embedded text layers, which makes them inaccessible to screen readers without additional processing. A scalable pipeline was built to identify documents lacking embedded text and perform optical character recognition (OCR), making the content accessible to screen readers. Two preprocessing functions, one light and one heavy, deskew, denoise, and enhance document clarity before OCR is performed. Dictionary coverage from the light and heavy preprocessing functions was compared: light preprocessing was computationally faster but resulted in lower dictionary coverage, while heavy preprocessing added a modest amount of time and increased dictionary coverage slightly. After evaluating outputs, it was determined that the dictionary coverage of documents lacking embedded text layers was similar to that of documents containing embedded text layers. While this doesn't make documents fully compliant with Americans with Disabilities Act standards, it is an important first step toward accessibility for older publications, especially considering the open source nature of the code and process.
https://github.com/ashirts/Expanding-Access-to-Historic-Scanned-Documents
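
The abstract above evaluates OCR output by dictionary coverage, the share of recognized words that appear in a dictionary. The following is a minimal sketch of that idea in Python, not the presenters' code (their pipeline uses R's tesseract package). The function names, the tiny word list, and the `min_chars` threshold are all hypothetical and chosen only for illustration; a real run would load a full dictionary and extract text from actual PDFs.

```python
import re

# Hypothetical mini word list; a real evaluation would load a full dictionary.
WORDS = {"the", "annual", "report", "of", "utah", "state", "university"}


def dictionary_coverage(text, dictionary):
    """Return the fraction of alphanumeric tokens that appear in the dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


def lacks_text_layer(extracted_text, min_chars=20):
    """Assumed heuristic: flag a document for OCR when text extraction
    returns almost nothing. The threshold is illustrative only."""
    return len(extracted_text.strip()) < min_chars


# Two made-up OCR outputs for the same line: one clean, one with typical errors.
clean = "The annual report of Utah State University"
noisy = "Tbe annua1 rep0rt of Utah Stale University"

print(dictionary_coverage(clean, WORDS))
print(dictionary_coverage(noisy, WORDS))
print(lacks_text_layer("   \n"))
```

Comparing coverage scores for the same pages under different preprocessing settings gives a rough, label-free signal of OCR quality, which is why it suits a large repository where manual transcription is impractical.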

Improving Accessibility and Discoverability Utilizing Open Source Models in a Novel Modular Design
Brian McBride, Harish Maringanti, and Bohan Zhu (University of Utah)

University libraries are under growing pressure to expand access and improve discovery while meeting new accessibility expectations for digital collections. At the University of Utah's J. Willard Marriott Library, the digital infrastructure development team is building a flexible, modular workflow to help bring large audio-visual (AV) collections into alignment with Department of Justice accessibility requirements at scale. The platform orchestrates open source speech-to-text and language models to generate time-aligned transcripts and captions, structured segmentation, word clouds, entity recognition, and descriptive metadata that improves both compliance and discoverability. The session will highlight the implementation approach, early results, and lessons learned, including human review checkpoints, staff support and buy-in, provenance and auditability, and how adaptable workflows are being designed as models and standards evolve. The session will also focus on practical strategies other institutions can reuse to accelerate accessible AV delivery without locking into a single vendor or toolchain, and on the team’s future development plans for supporting other formats, including images and PDFs.

Strategies for Responsible AI in Manuscript Transcription (Lightning Talk)
Sara Brumfield (FromThePage)

FromThePage is a crowdsourcing platform for archives and libraries where volunteers transcribe, index, and describe historic documents. This talk will give an overview of how the platform and its community are making decisions that keep the use of artificial intelligence (AI) in historical document transcription transparent, optional, and tentative, including topics such as:
- Optional usage of AI by transcribers and institutions
- Surfacing and logging use of AI drafts
- Provenance in exports showing both AI and human contributions
- Detecting unauthorized use of AI
- Measuring accuracy

http://www.fromthepage.com




Speakers
Adelynn Shirts

Open Science and Publishing Graduate Assistant, Utah State University
David Advent

Scholarly Communication Librarian, Utah State University
Brian McBride

Associate Director of Digital Infrastructure Development, University of Utah
Harish Maringanti

Associate Dean for Research, University of Utah
Bohan Zhu

Web Software Developer, University of Utah
Sara Brumfield

Partner, FromThePage
Regency D
