Text segmentation
First, the text in all the scans was recognised. The scans were then segmented into separate resolutions, grouped by session day.
To achieve this, the session days were first identified from the date and the list of delegates present. This was done in two steps. First, we searched for lines of text containing a date in a specific format, to determine whether a line marked the start of a session day. Second, we determined which date that line mentioned. In this way, each piece of text was assigned to a date.
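The two steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the date-line format ("Monday the 3rd of January 1725"), the function names parse_session_date and assign_dates, and the sample lines are all assumptions made for the example, since the real format used in the resolution books is not given here. The sketch also ignores the list of delegates present.

```python
import re
from datetime import date

# Hypothetical date-line format, e.g. "Monday the 3rd of January 1725".
# The real format in the resolution books is not specified in this text.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

DATE_LINE = re.compile(
    r"^\s*(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r"\s+the\s+(\d{1,2})(?:st|nd|rd|th)?\s+of\s+("
    + "|".join(MONTHS)
    + r")\s+(\d{4})\s*\.?\s*$"
)


def parse_session_date(line):
    """Return the date if the line looks like a session-day start, else None."""
    match = DATE_LINE.match(line)          # step 1: is this a date line?
    if match is None:
        return None
    day = int(match.group(1))              # step 2: which date is mentioned?
    month = MONTHS[match.group(2)]
    year = int(match.group(3))
    try:
        return date(year, month, day)
    except ValueError:                     # e.g. "31st of February"
        return None


def assign_dates(lines):
    """Pair every line with the date of the most recent session-day start."""
    current = None
    dated = []
    for line in lines:
        found = parse_session_date(line)
        if found is not None:
            current = found
        dated.append((current, line))
    return dated
```

Because each line simply inherits the most recent recognised date, any text that follows an unrecognised date line stays attached to the previous session day.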
Then, for each session day, the text was segmented into resolutions. This was done using a list of fixed (formulaic) expressions with which resolutions are introduced, for example “The report of … has been heard …” and “Received a Missive from …”. These formulas indicate not only where in the text a resolution begins, but also what kind of action or document underpinned the resolution (the proposition type). Segmentation was more challenging for the late sixteenth- and early seventeenth-century resolution books, which are less formulaic. Volunteers therefore helped train the computer to distinguish the resolutions in these books as well as possible.
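A formula-based segmenter of this kind might look like the sketch below. It is an illustration under assumptions, not the project's implementation: only the two formulas quoted above are included, the type labels "report" and "missive" are hypothetical, and the trained approach used for the older, less formulaic books is not covered.

```python
import re

# The two opening formulas quoted in the text. The labels on the right are
# hypothetical names for the proposition type, chosen for this example.
FORMULAS = [
    (re.compile(r"^The report of .+ has been heard"), "report"),
    (re.compile(r"^Received a Missive from "), "missive"),
]


def proposition_type(line):
    """Return the proposition type if the line opens a resolution, else None."""
    for pattern, label in FORMULAS:
        if pattern.search(line):
            return label
    return None


def segment_resolutions(lines):
    """Split the lines of one session day into resolutions.

    A new resolution starts at every line that opens with a known formula.
    Lines before the first formula (for instance an attendance list) are
    not part of any resolution and are skipped.
    """
    resolutions = []
    current = None
    for line in lines:
        label = proposition_type(line.strip())
        if label is not None:
            current = {"type": label, "lines": []}
            resolutions.append(current)
        if current is not None:
            current["lines"].append(line)
    return resolutions
```

With this design, a resolution whose opening formula is not in the list is silently appended to the preceding resolution, which is one way two resolutions can end up merged into one.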
Please note: because the segmentation was largely automatic, the start of a session day may not have been recognised; the corresponding resolutions were then assigned the date of the previous session day. The date may also have been recognised incorrectly, in which case the corresponding resolutions were assigned to the wrong session day. Finally, resolutions are not always segmented correctly: multiple resolutions may have been merged into one, or the end of a resolution may have ended up in the next resolution.