
Data model

Within the REPUBLIC project, we structured the data according to a specific vision. Central to this vision is that the data structure has both a physical and a logical dimension. We have worked out this vision in a data model, which we describe below.

The first phase of structuring focused on the physical dimension. To obtain the data, the resolution books of the States General were first scanned. We then generated transcriptions using automatic region recognition and automatic text recognition.

Pages and transcriptions can be regarded as physical structural elements, as can columns, paragraphs and lines. We identified these structural elements automatically and then checked the results manually. The line serves as the basic physical unit: each line in the transcriptions carries coordinates that refer to its physical location on the associated scan. As a result, the link between scan and transcription is always made at line level.
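To make the line-level link between scan and transcription concrete, here is a minimal Python sketch of a line record. The class name `Line`, its fields and the polygon-based coordinate format are assumptions for illustration; they are not taken from the REPUBLIC data model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Basic physical unit: one transcribed line (hypothetical structure)."""

    line_id: str
    text: str
    scan_id: str  # identifier of the scan on which the line was recognised
    # Assumed format: polygon corner points (x, y) in scan pixel coordinates.
    coords: tuple[tuple[int, int], ...]

    def bounding_box(self) -> tuple[int, int, int, int]:
        """Return (x_min, y_min, width, height) enclosing the line polygon."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# Hypothetical example values; the text is a placeholder, not a real transcription.
line = Line(
    line_id="line-0001",
    text="(transcribed text of the line)",
    scan_id="scan-0042",
    coords=((120, 300), (980, 298), (982, 340), (118, 342)),
)
print(line.scan_id, line.bounding_box())  # scan-0042 (118, 298, 864, 44)
```

Because every line keeps its own scan reference and coordinates in this sketch, a viewer could highlight any transcribed line on its scan without going through pages or paragraphs.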

Lines form paragraphs, and paragraphs can be arranged in columns. Columns appear on pages. In their original arrangement, the pages form resolution books, each of which has its own inventory number in the archive of the States General at the National Archives. We distinguish four types of pages: empty pages, title pages, resolution pages and index pages. The web application Goetgevonden shows only the resolution pages, but all four page types are important for structuring the data.
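The physical hierarchy and the four page types can be sketched as nested Python dataclasses. This is a minimal illustration under stated assumptions: the class names, the simplification of lines to plain strings and the placeholder inventory number are hypothetical and do not reflect the project's actual storage format.

```python
from dataclasses import dataclass, field
from enum import Enum


class PageType(Enum):
    EMPTY = "empty"
    TITLE = "title"
    RESOLUTION = "resolution"
    INDEX = "index"


@dataclass
class Paragraph:
    # Simplified: plain strings stand in for full line records with coordinates.
    lines: list[str] = field(default_factory=list)


@dataclass
class Column:
    paragraphs: list[Paragraph] = field(default_factory=list)


@dataclass
class Page:
    page_type: PageType
    columns: list[Column] = field(default_factory=list)


@dataclass
class ResolutionBook:
    inventory_number: str  # inventory number in the States General archive
    pages: list[Page] = field(default_factory=list)

    def displayed_pages(self) -> list[Page]:
        """Pages a viewer would show: resolution pages only."""
        return [p for p in self.pages if p.page_type is PageType.RESOLUTION]


# Hypothetical book; the inventory number is a placeholder.
book = ResolutionBook(
    inventory_number="inv-placeholder",
    pages=[
        Page(PageType.TITLE),
        Page(PageType.RESOLUTION, [Column([Paragraph(["first line", "second line"])])]),
        Page(PageType.INDEX),
        Page(PageType.EMPTY),
    ],
)
print(len(book.pages), len(book.displayed_pages()))  # 4 1
```

In this sketch all four page types stay in the model and filtering happens only at display time, mirroring the difference between what a viewer shows and what the structuring needs.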

The second phase of structuring focused on the logical dimension. The resolution books are divided into sessions, each of which opens with a date. A session consists of an attendance list and multiple resolutions. These are logical structural elements.
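The logical structure of a session can be sketched in the same way. The names below are hypothetical, and the attendance list is modelled as optional on the assumption, consistent with the note that attendance lists are linked "if available", that not every session has one.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Resolution:
    """One resolution taken during a session (logical element)."""
    text: str


@dataclass
class Session:
    """A session: its date, an attendance list and the resolutions taken."""
    session_date: date
    # None models a session for which no attendance list is available.
    attendance: Optional[list[str]] = None
    resolutions: list[Resolution] = field(default_factory=list)


# Hypothetical session; the date, names and texts are placeholders.
session = Session(
    session_date=date(1725, 3, 14),
    attendance=["Member A", "Member B"],
    resolutions=[Resolution("First resolution ..."), Resolution("Second resolution ...")],
)
print(session.session_date.isoformat(), len(session.resolutions))  # 1725-03-14 2
```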

Anyone searching the resolutions of the States General will want to know what the individual resolutions contain, on which day they were taken and possibly also who was present. In the project, we therefore treat the resolution as the basic logical unit: a search in Goetgevonden returns resolutions as results. Each resolution is linked to the date of the session day on which it was taken and, where available, to the attendance list for that session day. To make this possible, the session days and the resolutions were distinguished in the transcriptions by means of automatic text segmentation.
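The linking of resolutions to their session context can be sketched as a search function that returns each matching resolution together with its session date and attendance list. The class and function names are hypothetical, and the case-insensitive substring match merely stands in for whatever search engine Goetgevonden actually uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Session:
    session_date: date
    attendance: Optional[list[str]]  # None if no attendance list is available


@dataclass
class Resolution:
    text: str
    session: Session  # every resolution is linked to the session it was taken in


@dataclass
class SearchHit:
    text: str
    session_date: date
    attendance: Optional[list[str]]


def search(resolutions: list[Resolution], term: str) -> list[SearchHit]:
    """Return matching resolutions, each enriched with its session context."""
    needle = term.casefold()
    return [
        SearchHit(r.text, r.session.session_date, r.session.attendance)
        for r in resolutions
        if needle in r.text.casefold()
    ]


# Hypothetical sessions and resolutions; all values are placeholders.
s1 = Session(date(1725, 3, 14), ["Member A", "Member B"])
s2 = Session(date(1725, 3, 15), None)
corpus = [
    Resolution("Resolution about a letter from the admiralty", s1),
    Resolution("Resolution about a petition", s2),
    Resolution("Another petition was read", s2),
]
for hit in search(corpus, "petition"):
    print(hit.session_date, hit.attendance)  # prints "1725-03-15 None" twice
```

Keeping the session as a reference on each resolution, rather than copying its data, means the date and attendance list only need to be stored once per session day.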

To facilitate searching, we have also recognised entities in the resolutions. Entities are logical structural elements that occur repeatedly in the text, such as personal names, locations and organisations. The automatic recognition and curation of entities is explained in more detail on a separate page. A limited number of entities have been identified in greater detail.
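One way to make recognised entities searchable is to store each one as an annotation on a character span of a resolution and build an index from entity to resolutions. The sketch below assumes offset-based annotations and a simple in-memory index; the names `EntityAnnotation`, `build_entity_index` and `annotate` are hypothetical and do not describe the project's actual recognition or curation pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    PERSON = "person"
    LOCATION = "location"
    ORGANISATION = "organisation"


@dataclass(frozen=True)
class EntityAnnotation:
    """A recognised entity anchored to a character span of one resolution."""
    resolution_id: str
    entity_type: EntityType
    start: int  # inclusive character offset into the resolution text
    end: int    # exclusive character offset


def build_entity_index(texts: dict[str, str], annotations: list[EntityAnnotation]):
    """Map (entity type, surface form) to the IDs of resolutions that mention it."""
    index: dict[tuple[EntityType, str], set[str]] = defaultdict(set)
    for a in annotations:
        surface = texts[a.resolution_id][a.start:a.end]
        index[(a.entity_type, surface)].add(a.resolution_id)
    return index


def annotate(resolution_id: str, text: str, entity_type: EntityType, surface: str):
    """Example helper: annotate the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return EntityAnnotation(resolution_id, entity_type, start, start + len(surface))


# Hypothetical resolution texts and IDs; placeholders only.
texts = {
    "res-1": "A letter from the city of Utrecht was read.",
    "res-2": "A request from Utrecht was discussed.",
}
annotations = [
    annotate("res-1", texts["res-1"], EntityType.LOCATION, "Utrecht"),
    annotate("res-2", texts["res-2"], EntityType.LOCATION, "Utrecht"),
]
index = build_entity_index(texts, annotations)
print(sorted(index[(EntityType.LOCATION, "Utrecht")]))  # ['res-1', 'res-2']
```

Because the annotations only store offsets, the underlying resolution text stays unchanged, and curated information about an entity could be attached to the annotation later without touching the transcription.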

All physical and logical structural elements are stored in repositories.
