TextMining

For my text mining project, I decided to keep the table of contents for each of my sources so that I would have extra guidance while navigating through them. In addition, I added stopwords that were unrelated to my topic, such as “address,” “Harris,” “use,” and “http.” These were only some of the words I added to the stopwords list, because I wanted to make sure that the words presented were “words [that] matter” to my topic of discussion and that “hang together in interesting ways,” which Ted Underwood likens to “individual dabs of paint that together start to form a picture” (Underwood, T., 2012). Adding these stopwords allowed me to draw closer connections between my sources, and it will ultimately help me tie my sources and topic together in my final project. As Ted Underwood states in his blog, text mining involves “quantitative analysis,” which “makes things easier” when “working on a scale where it’s impossible for a human reader to hold everything in memory.” By using text mining tools such as Voyant, I was able to get a clearer picture of what my topic entails and how all of my sources relate to one another (Underwood, T., 2012).
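Voyant handles stopwords through its menus, but the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Voyant works internally: the four custom stopwords come from my list above, while the short list of common English words, the sample sentence, and the function names are made up for the example.

```python
import re

# Custom stopwords taken from the post; the common-word list is an
# assumption added only so the example has something else to filter.
CUSTOM_STOPWORDS = {"address", "harris", "use", "http"}
COMMON_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical sample sentence, not taken from any of my sources.
sample = "Use the address at http to find Harris and the archive of letters."
tokens = remove_stopwords(tokenize(sample), CUSTOM_STOPWORDS | COMMON_STOPWORDS)
print(tokens)
```

Only the words that are not on either list survive, which is the same effect I was after when I added terms to the stopwords list in Voyant.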

The most helpful tools on Voyant are the Contexts section, the Reader, and the Terms list. I found these to be the most helpful because each one allows users to navigate through an article and quickly find what they are looking for. The Terms list allows Voyant users to view a list of terms and how often each is used throughout a given text. This, according to Underwood, “can provide clues that lead to real insights about a single author or text” by using multiple sources “for comparison” (Underwood, T., 2012). Furthermore, the Reader allows users to view the full article and to jump quickly to particular vocabulary words by selecting the word of interest from the drop-down menu. Once a word is selected, the Reader pinpoints the exact location of each occurrence in the article, allowing users to see the context that surrounds the selected word. This will be very convenient when finding evidence in the sources that I selected for my final project. Finally, the Contexts section again lets users view key terms along with the text that surrounds them in each of the uploaded sources. This will also help with finding important details and evidence to add context to my final project. The Cirrus, Trends, and Summary features were also helpful for identifying certain characteristics of a given source.
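To make the Terms list and the Contexts section more concrete, here is a minimal Python sketch of the two ideas: counting how often each term appears, and showing each occurrence of a keyword with a few words on either side. The sample text, the function names, and the window size are my own assumptions for illustration, and Voyant's actual implementation may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(tokens, top_n=5):
    """Return the top_n most frequent terms with their counts."""
    return Counter(tokens).most_common(top_n)


def contexts(tokens, keyword, window=3):
    """Return (left, keyword, right) rows for every occurrence of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


# Hypothetical sample text, not taken from any of my sources.
sample = ("The diary records weather and births. "
          "The diary also records debts, and the diary ends abruptly.")
tokens = tokenize(sample)

print(term_counts(tokens, top_n=3))
for left, word, right in contexts(tokens, "diary", window=2):
    print(f"{left:>12} | {word} | {right}")
```

Each printed row plays the same role as a row in the Contexts section: the keyword in the middle, with the words that come before and after it on either side.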

The dropdown menu is located under the Reader tool in Voyant tools.

What struck me the most was that Voyant doesn’t allow users to change the words used to generate the trend. Being able to do so would have made navigating the text easier: if I clicked on any point on the graph, I could have pulled up context for a word I had chosen instead of “use” or “used.” Other than that, Voyant is a great tool for navigating sources, because it makes retrieving information easier when managing a large number of complex sources in historical research.
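For comparison, a trend line for any word of my choosing could be approximated outside of Voyant. The sketch below assumes that a trend is simply the relative frequency of a term in each of several equal slices of a text; that is my own simplification rather than a description of Voyant's internals, and the function name and sample tokens are hypothetical.

```python
def trend(tokens, term, segments=4):
    """Relative frequency of term in each of `segments` equal slices."""
    values = []
    n = len(tokens)
    for i in range(segments):
        # Integer arithmetic keeps the slices as even as possible.
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        # An empty slice (very short text) contributes a frequency of 0.
        values.append(chunk.count(term) / len(chunk) if chunk else 0.0)
    return values


# Hypothetical token list standing in for a source document.
tokens = "war war peace trade trade trade war peace".split()
print(trend(tokens, "war"))
print(trend(tokens, "trade"))
```

Plotting these values per slice would give a rough trend for whichever term is passed in, rather than for a word picked by the tool.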

My experience using Voyant had quite an impact on my reading and understanding of texts. By utilizing “relatively simple statistical techniques,” I was able to characterize a large body of sources “a great deal better than [my] intuitions would[’ve] predict[ed]” (Underwood, T., 2012). In the future, I will be able to apply the techniques I picked up while working with Voyant by gathering my sources and categorizing them, comparing and contrasting particular features such as word frequency and context.

Voyant has some drawbacks. Some of my sources didn’t upload fully, so I was unable to view them under the Reader feature. In addition, the trend doesn’t allow you to change the words used to generate it. Voyant also analyzed words that were irrelevant to my topic; however, I fixed this by adding those words to the Stopwords List menu. Similarly, words that you do want to analyze can be added to the White List menu. Finally, when I accessed Voyant through Safari, it did not allow me to export my tools to WordPress. However, when I accessed Voyant through Google Chrome, I was able to export the material just fine.

My experience working with Voyant differs from the ones presented in the readings: the models discussed in “Seven Ways Humanists are Using Computers to Understand Text” are all supervised, whereas my model was unsupervised. The difference between the two is that supervised models tend to “have an explicit goal,” whereas with unsupervised models you “start with an unlabeled collection of texts; [and] you ask a learning algorithm to organize the collection by finding clusters or patterns of some loosely specified kind” (Underwood, T., 2015). I further filtered out irrelevant vocabulary generated for analysis by adding stopwords, and I added additional words to the White List. In addition, the readings portrayed other ways text mining could be used to present vital relationships between sources, and these were very different from my experience with Voyant. For example, Cameron Blevins presented “topic modeling,” which “emerges when we examine thematic trends across [an] entire diary” (Blevins, C., 2010).

Topic modeling text mining example presented above.

Distant reading, as mentioned by Underwood, is also called “text analysis,” which is “really an interdisciplinary conversation about methods” and which allows us to “retrace the syntactic patterns that organize readers’ understanding of specific passages” (Underwood, T., 2015). On the other hand, Underwood describes other approaches as “surface readings, because they visualize textual patterns that are open to direct inspection,” and adds that this kind of reading “tells us nothing we couldn’t learn by reading on our own” (Underwood, T., 2015). By combining the techniques of distant reading with “traditional close reading[s],” I was able not only to choose different features to represent my sources based on distinct vocabulary, such as Bubblelines, Trends, and Knots, but also to find “rigorous interpretations behind the overall trends” (Cohen, D., 2010).

References:

Aiden, E. & Michel, J. (2011, September 20). What we learned from 5 million books. YouTube. Retrieved on December 3, 2021, from https://www.youtube.com/watch?v=5l4cA8zSreQ.

Blevins, C. (2010, April 1). Topic Modeling Martha Ballard’s Diary. Retrieved on December 3, 2021, from http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.

Cohen, D. (2010, October 4). Searching for the Victorians. Retrieved on December 3, 2021, from https://dancohen.org/2010/10/04/searching-for-the-victorians/.

Underwood, T. (2012, August 14). Where to Start with Text Mining. tedunderwood. Retrieved on December 3, 2021, from https://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/.

Underwood, T. (2015, June 4). Seven Ways Humanists are Using Computers to Understand Text. tedunderwood. Retrieved on December 3, 2021, from https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/.
