Born Digital

I have a very practical interest in digital archives and digitization projects. Besides my People’s History of Fallujah digital archive, which is a pretty straightforward collection of materials that were already on the web (I’m just collecting them in a single location), I also want to start a digitization project for the Carlo Danio Library in Grumento Nova, where I’m doing my other project, Lingua e Memoria Grumentina.

If you followed the link, then you’ve seen that this library is a hidden treasure. The translation on the webpage is a bit difficult to follow, but you get the picture. It’s a well-preserved library filled with books from the 17th century on, some in Latin, some in Vulgar Latin, some even in Italo-Romance languages other than Italian (like Grumentino). Scholars of the classics and Italian history should be traveling from all over the world to visit this collection, but no one knows it exists. And it’s not easy to get to Grumento Nova (you need to figure out the chaotic southern Italian bus system to get there).

This is the paradox of southern Italy: it’s rich in natural resources and cultural heritage, and yet poor. Grumento Nova is in some ways better off, and in some ways worse off, than the rest of the south. The single largest problem that Grumento Nova is facing today is pollution from the Eni COVA oil extraction plant within its city limits. It’s the largest oil extraction site in all of Italy, and it’s ruining the entire Agri Valley area.

This library is one potential source of income for Grumento Nova. This, combined with a tourism industry, could help make Grumento less economically dependent on its petroleum resources. And I think the best way to utilize this library and start building an alternative economy in Grumento will be to digitize this collection and charge subscription fees to libraries around the world. An alternative approach would be to better advertise the contents of the library and hope that scholars come to use it. But my hunch is that more money could be made through selling online subscriptions. I’m not sure exactly how to predict how much more money could come in that way. This is really just a hunch. But it seems like common sense to me.

At this point I’m unsure of the benefits of using some sort of mark-up language to digitize the books. I think it will be hard enough to get a grant to bring a scanner to Grumento, let alone find someone who will put in the labor of doing the mark-up (I’m definitely not doing it). I suspect that using a scanner with OCR technology will be sufficient for the vast majority of the books. However, I believe there might be a few handwritten manuscripts in the collection, too. For those, my intuition is that simple image scans will be sufficient.

Second, I’d like to discuss the Atlante Linguistico della Sicilia (the Linguistic Atlas of Sicily). This is a very interesting kind of digitization project because language, in many ways, is intangible. It’s a system of signs based on social conventions. You can begin to document a language and create a digital record by recording analogue sound waves as audio files. These files can then be visualized by transcribing the sounds using a phonetic alphabet. But even this has its difficulties. For one thing, the perception of linguistic sounds is not straightforward; the languages we speak can shape the way we perceive sounds from another language. So another way that audio files can be visualized is with a spectrogram, which plots the amplitude of each frequency over time. Spectrograms can help us see what we can’t perceive by ear.
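To make the spectrogram idea a little more concrete, here is a rough sketch of how one can be computed. It is only an illustration under my own assumptions, not how the Atlante Linguistico or any phonetics software actually does it: the function names, the frame size, and the synthetic two-tone "recording" are all made up for the example, and it uses a plain (slow) discrete Fourier transform so that it runs with nothing but the Python standard library.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Split the signal into overlapping frames and return, for each frame,
    the magnitude at each frequency bin (a list of lists)."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Plain DFT, keeping only the non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def sine(freq_hz, sample_rate, n_samples):
    """Generate a pure tone (a stand-in for a real recording)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


# Hypothetical test signal: a 1000 Hz tone followed by a 2000 Hz tone.
sample_rate = 8000
signal = sine(1000, sample_rate, 512) + sine(2000, sample_rate, 512)

spec = spectrogram(signal)
bin_hz = sample_rate / 64  # width of one frequency bin for frame_size=64


def peak_hz(frame):
    """Frequency of the strongest bin in one frame."""
    return max(range(len(frame)), key=lambda k: frame[k]) * bin_hz


first_peak = peak_hz(spec[0])   # strongest frequency at the start
last_peak = peak_hz(spec[-1])   # strongest frequency at the end
```

Each row of `spec` is one slice of time and each column is one frequency band, so the change from the first tone to the second shows up as the peak moving from one column to another. A real spectrogram of speech is the same grid, just drawn as an image.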

And yet this kind of analysis still only goes so far. Transcription and spectrograms can only really tell you about an individual speaker, not the language itself, or the sociolinguistic context in which this language, other languages, and varieties of them exist together. That’s what I love about the Atlante Linguistico and its “geolinguistic” approach. It uses the traditional tools of language documentation and adds a geospatial dimension to them. The “carta sonora” (sound map) tab is an interesting feature of this site, because it uses a mapping program to show how the same word is pronounced differently in various Sicilian locales, using both transcription and audio files. There’s lots of analysis to go with this on the site’s other pages, which paints a complex picture of the linguistic situation on the island, in which hundreds of distinct (though mutually intelligible) languages exist together in a sociolinguistic environment. I think it’s a brilliant way of taking something so complicated, ephemeral, and intangible as spoken language and preserving it and making it available with digital tools.

Lastly, I’d like to discuss a personal website, managed by Dr. Phil Taylor, on the University of Leeds website. It’s called Phil Taylor’s Papers, and it’s a collection of articles, essays, doctrinal writings, and reports on the topic of information warfare and strategic communications. I wish more people were aware of how governments use information and the news media to advance their policy goals. So I’m glad to see someone collecting this information under one roof. However, it’s a very casual effort at archiving. It almost seems like Taylor just wanted all these resources in one place for the sake of organizing his own research materials. So I thought it might be worth discussing what went wrong here, when there was so much potential for this to be such a useful public resource.

First, Taylor just copied and pasted the materials he liked onto webpages and organized the many, many links under menu tabs. It doesn’t seem like he recorded any metadata at all, so titles and even keywords aren’t searchable. Also, there are no permalinks provided to the original source, and none of the hyperlinks in the original text were preserved. And the themes according to which the materials are organized are really broad. One needs to understand the difference between PSYOP and Strategic Communications to understand what one is looking at.
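For a sense of what even minimal metadata would buy, here is a small sketch of a catalogue record and a keyword search over it. Everything in it is my own assumption for illustration: the `ArchiveItem` class, its field names (loosely modeled on common Dublin Core elements like title, creator, date, source, and subject), the `search` function, and the two placeholder entries with example.org links are hypothetical and are not taken from Taylor’s site.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveItem:
    """One catalogued document. Field names are illustrative assumptions."""
    title: str
    author: str
    date: str
    source_url: str  # permalink back to the original publication
    keywords: list[str] = field(default_factory=list)


def search(items, term):
    """Return items whose title or keywords contain the term (case-insensitive)."""
    term = term.lower()
    return [
        item for item in items
        if term in item.title.lower()
        or any(term in keyword.lower() for keyword in item.keywords)
    ]


# Placeholder records; the titles, authors, and URLs are made up.
items = [
    ArchiveItem(
        title="Placeholder essay on strategic communications",
        author="Author A",
        date="2005",
        source_url="https://example.org/original/essay-a",
        keywords=["strategic communications"],
    ),
    ArchiveItem(
        title="Placeholder report B",
        author="Author B",
        date="2003",
        source_url="https://example.org/original/report-b",
        keywords=["PSYOP", "doctrine"],
    ),
]

psyop_matches = search(items, "psyop")
all_matches = search(items, "placeholder")
```

With records like these, a reader who doesn’t already know the difference between the menu categories could still find an item by keyword and follow `source_url` back to where it was first published.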

There are over 1,000 items in this collection, and it would have been an enormously time-consuming effort for this one man to catalogue each item, provide metadata and hyperlinks, and create an intuitive schema for organizing all the materials. As it is, it’s a great resource for researchers familiar with the topic, but not much more.
