
Analysis of Marked Up Documents

Please attend one of the “Text Analysis with R” sessions before coming to this workshop. If you cannot attend the prerequisite, contact Sarah Stanley for the slides and some test exercises to try before this session.

When we work with texts, they often contain large amounts of extraneous material that we don’t want to analyze. We may want to suppress speaker labels in drama, advertisements in newspapers, or boilerplate language on a webpage. In this workshop, we will discuss how to extract specific textual features from XML and HTML documents using R. In our exercises, we will compare the results we get from analyzing text that has been marked up with the results from text that has not.
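To give a sense of what extracting marked-up features can look like, here is a minimal sketch rather than the workshop's actual material. It assumes rvest 1.0.0 or later, and the HTML snippet, class names ("nav", "speech", "speaker", "line"), and variable names are all made up for illustration.

```r
library(rvest)

# Hypothetical page: two speeches with speaker labels, plus navigation boilerplate.
page <- read_html('
<html><body>
  <div class="nav">Home | About | Contact</div>
  <div class="speech"><span class="speaker">HAMLET</span> <span class="line">To be, or not to be.</span></div>
  <div class="speech"><span class="speaker">OPHELIA</span> <span class="line">Good my lord.</span></div>
</body></html>')

# Ignoring the markup: all text in the body, including navigation and speaker labels.
everything <- html_text(html_element(page, "body"))

# Using the markup: keep only the spoken lines by selecting their elements.
spoken <- html_text(html_elements(page, "div.speech span.line"))

print(spoken)
```

Here `everything` still contains the navigation text and the speaker names, while `spoken` holds only the two lines of dialogue, so word counts computed from each would differ.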

Please bring a laptop to this session with R and RStudio installed, following the instructions in this LibGuide. Before attending, please also install the “rvest” and “XML” packages by going to Tools > “Install Packages” in RStudio. If you have problems with installation, or if you do not have access to a laptop, please contact Sarah Stanley prior to the session.

Monday, February 11 at 2:00pm to 3:00pm

Strozier Library, R&D Commons
116 Honors Way, Tallahassee, FL 32306

Event Type

Training and Development, Workshops & Seminars




digital humanities, Digital Research and Scholarship, text analysis


All Audiences


Contact Name

Sarah Stanley

Contact Phone

(850) 645-2122

Contact Email
