Scope
This comprehensive course is designed to equip students with the essential skills and knowledge required to undertake text and data mining tasks. Throughout this course, students will be introduced to key concepts and tools of text and data mining, including data types, data structures, data pre-processing, text processing, data mining techniques, text mining techniques, and advanced topics in both data and text mining. Each session will include a Python component, discussing the importance of Python and its libraries in handling various aspects of text and data mining. Students are not expected to know Python; rather, they will be introduced to how Python can solve key issues so that they are aware of its capabilities. By the end of the course, participants will have a solid understanding of text and data mining concepts, be proficient in using Python for text and data mining tasks, and be able to apply these skills to real-world library applications and case studies.
Learning Objectives
1. Understanding of Data, Data Structures, and Complex Data Types
2. Understanding of the main types of Machine Learning and their Applications
3. Understanding of the key Python libraries for text and data mining
4. Understanding of the primary methods for performing text and data mining
Training Facilitator
Training Facilitator: William Mattingly, Postdoctoral Fellow, Smithsonian Institution's Data Science Lab

William Mattingly is a Postdoctoral Fellow at the Smithsonian Institution Data Science Lab in collaboration with the United States Holocaust Memorial Museum (USHMM). He has a B.A. and M.A. in History from Florida Gulf Coast University and a Ph.D. in History from the University of Kentucky. His dissertation research explored using historical social network analysis, cluster analysis, and computational methods for identifying ninth-century intellectual and pedagogical networks. Most recently, his research has focused on developing text classification neural network models to identify sources in medieval texts and developing natural language processing (NLP) methods for medieval Latin. At the Smithsonian and USHMM, he is developing machine learning methods to aid in, among other things, the cataloging of Holocaust documents. He is co-investigator and developer for the Structured Data Extraction and Enhancement in South Africa’s Truth and Reconciliation Archive project and lead investigator and developer for the Digital Alcuin Project.
Course Duration and Dates
The series consists of eight (8) weekly segments, each lasting 90 minutes. Specific dates are:
- October 12, 19, 26
- November 2, 9, 16, 30
- December 7
Each session will be recorded and links to that archived recording will be disseminated to course registrants within 2 business days of the close of the specific session. We strongly encourage attendees to download these files to ensure continued access.
Additional Information
Each registration allows for up to three (3) individuals to participate using three (3) different user logins. Eventbrite will only ask for information for the first individual. Up to two additional names and email addresses may be added by contacting Sara Groveman directly via email at sgroveman@niso.org.
Registrants receive unique sign-on instructions via email three business days prior to each session. If you have not received your instructions by the day before an event, please contact NISO headquarters for assistance via email (nisohq@niso.org).
Registrants for an event may cancel participation and receive a refund (less $35.00) if the notice of cancellation is received at NISO HQ (nisohq@niso.org) one full week prior to the event date. If the notice is received fewer than 7 days before the event, no refund will be provided.
Links to the archived recording of the broadcast are distributed to registrants 24-48 hours following the close of the live event. Access to that recording is intended for internal use of fellow staff at the registrant’s organization or institution. Speaker presentations are posted to the NISO event page.
Broadcast Platform
NISO uses the Zoom platform to broadcast its live events. Zoom provides apps for a variety of computing devices (tablets, laptops, etc.). To view the broadcast, you will need a device that supports the Zoom app. Attendees may also choose to listen to the audio only on their phones; sign-on credentials include the necessary dial-in numbers, if that is your preference. Once you are notified of their availability, recordings may be downloaded from the Zoom platform to your machine for local viewing.