Data: pictures, artifacts, movies, music, texts, humans, spoken language, archaeological sites, ... -> digital surrogates
History of corpus linguistics:
while language theory became increasingly interested in language as a universal phenomenon, other linguists had become more and more dissatisfied with the descriptions they found for the various languages they dealt with. - Halliday, M.A.K., et al. Lexicology and Corpus Linguistics, Bloomsbury Publishing, 2004. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/helsinki-ebooks/detail.action?docID=436001.
“[technological] skills training is not research training,” since “the knowledge gained is [as] transient” as the tools themselves, whereas “[critical] thinking skills are the most important because they are the most deeply embedded and the most transferable.”
Scholars who are thinking of "computer projects" need, of course, some notion of what computers can and cannot do. But they need not be formally trained or even deeply knowledgeable about computer theory or technique. The reason for this is that we seem to be coming upon a second stage in using computers in the humanities; it is imagination, if only modestly informed, rather than technical expertness that is most needed now.
Your dataset must suit the methods you choose. Complex methods often make presuppositions about the data they are applied to - if you don’t understand these deeply, you’ll end up with invalid results.
In typical DH research, 90% of your time will go to gathering and understanding the data and transforming it into a form you can use. If you use complex methods, another 90% of your time may go to altering them to fit your data, and you’ll run out of time.
Complex methods are often unnecessary for DH work. Often, simpler methods are actually better.
https://link.springer.com/article/10.1007/BF00120965
Such attitudes can be seen in a person who regards a word processor as something to boost the efficiency of a secretary or in someone who sets out to write an expert system by hiring someone else to do the programming. This tendency to consider the technology as separable from the research or instruction may be fostered by computing and the humanities courses which teach software use or programming in a kind of content-free environment that stresses technique and hopes that desirable and specific applications of the technique will somehow later emerge in students' minds.
On the other hand, when computer scientists entertain the notion of computing and the humanities they tend to regard the activity as somewhat uninteresting, even frivolous. Most computer scientists regard their discipline as primarily one of perfecting approaches to machine computation and they regard problem solving as an abstract task rather than one of developing specific application programs, especially applications in the humanities. In the world of computer science there is not yet an applied computer science specialization, though some engineering schools emphasize applications. If a computer scientist does become involved in humanities research or teaching it is often at a largely technical level, perhaps as a favor to a colleague.
If, for example, a course is taught by two persons, each specializing in either the idea or the technical aspect of the process, this team approach itself quickly communicates the idea that the subject matter being studied cannot easily be handled from a unified perspective by a single person. In other words, the idea is underscored that computing in the humanities is not an identifiable field but a loose, ad hoc amalgamation of at least two fields.
The key to integrating computer science and the humanities is to develop and promote projects and courses for which the involvement of the computer is integral rather than secretarial. In achieving this goal I would insist that some humanists need to develop their understanding of computer science as well as their computer skills and some computer scientists must become more involved in non-technical, humanistic research. The result would be a type of scholar in whom computational and humanistic interests combine to foster new directions and methods on the basis of traditional concerns.
As soon as I had said that it would be a course on literature and computing, the immediate response of one person, delivered in a very knowing and almost amused manner was, "Oh, you mean the kind of course where you count words and decide if Shakespeare did it."