Humanities and High Performance Computers Connect at NERSC
December 28, 2008
“Supercomputers have been a vital tool for science, contributing to numerous breakthroughs and discoveries. The Endowment is pleased to partner with DOE to now make these resources and opportunities available to humanities scholars as well, and we look forward to seeing how the same technology can further their work,” says NEH Chairman Bruce Cole.
Three projects have been selected to participate in the program’s inaugural run.
The Perseus Digital Library Project, led by Gregory Crane of Tufts University in Medford, Mass., will use NERSC systems to measure how the meanings of words in Latin and Greek have changed over their lifetimes, and compare classic Greek and Latin texts with literary works written in the past 2,000 years. Team members say the work will be similar to methods currently used to detect plagiarism. The technology will analyze the linguistic structure of classical texts and reveal modern pieces of literature, written or translated into English, which may have been influenced by the classics.
“High performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin,” says David Bamman, a senior researcher in computational linguistics with the Perseus Project.
According to Bamman, the basic methods for creating such a literary analysis tool have existed for some time, but the capability for analyzing such a huge collection of texts couldn’t be fully developed due to a lack of compute power. He notes that the collaboration with DOE and NERSC eliminates that roadblock.
In addition to tracking changes in ancient literature, NERSC computers will also be reconstructing ancient artifacts and architecture with the High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage project, led by David Koller, Assistant Director of the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) in Charlottesville, Va.
Over the past decade, Koller has traveled to numerous museums and cultural heritage sites around the world, taking 3D scans of historical buildings and objects — recording details down to a quarter of a millimeter.
According to Koller, a 3D scan of the Renaissance statue David, carved by Michelangelo, contains billions of raw data points. To convert this raw data into a finished 3D model is extremely time consuming, and nearly impossible on a desktop computer. Limited compute power has also limited Koller’s ability to efficiently recreate large historical sites, like Roman ruins in Italy or Colonial Williamsburg in Virginia. He hopes to use the NERSC resources to digitally restore these sites in three-dimensional images for analysis.
Over the years, Koller has also digitally scanned thousands of fragments that chipped off ancient works of art, some dating back to the ancient Greek and Roman empires. Koller hopes to use NERSC computers to put these broken works back together again like a digital 3D jigsaw puzzle.
“The collaboration with NERSC opens a wealth of resources that is unprecedented in the humanities,” says Koller. “For years, science reaped the benefits of using supercomputers to visualize complex concepts like combustion. Humanists, on the other hand, didn’t realize that supercomputers could potentially meet their needs too, until NEH and DOE proposed this collaboration last year.… I am really excited to see what comes out of this partnership.”
In contrast to the other Humanities High Performance Computing projects that will be done at NERSC, the Visualizing Patterns in Databases of Cultural Images and Video project, led by Lev Manovich, Director of the Software Studies Initiative at the University of California, San Diego, is not focused on working with a single data set. Instead, this project hopes to investigate the full potential of cultural analytics using different types of data including: millions of images, paintings, professional photography, graphic design, user-generated photos; as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos.
“Digitization of media collections, the development of Web 2.0 and the rapid growth of social media have created unique opportunities to studying social and cultural processes in new ways. For the first time in human history, we have access to unprecedented amounts of data about people’s cultural behavior and preferences as well as cultural assets in digital form,” says Manovich.
For approximately three years, Manovich has been developing a broad framework for this research that he calls Cultural Analytics. The framework uses interactive visualization, data mining, and statistical data analysis for research, teaching and presentation of cultural artifacts, processes and flows. Manovich’s lab is focusing on analysis and visualization of large sets of visual and spatial media: art, photography, video, cinema, computer games, space design, architecture, graphic and web design, product design. Another focus is on using the wealth of cultural information available on the web to construct detailed interactive spatio-temporal maps of contemporary global cultural patterns.
“I am very excited about his award to use NERSC resources, this opportunity allows us to undertake quantitative analysis of massive amounts of visual data,” says Manovich. “We plan to process all images and video selected for our study using a number of algorithms to extract image features and structure; then we will use variety of statistical techniques — including multivariate statistics methods such asfactor analysis, cluster analysis, and multidimensional scaling — to analyze this new metadata; finally, we will use the results of our statistical analysis and the original data sets to produce a number of highly detailed visualizations to reveal the new patterns in our data.”
About the National Endowment for the Humanities Created in 1965 as an independent federal agency, the National Endowment for the Humanities supports learning in history, literature, philosophy, and other areas of the humanities. NEH grants enrich classroom learning, create and preserve knowledge, and bring ideas to life through public television, radio, new technologies, museum exhibitions, and programs in libraries and other community places. Additional information about the National Endowment for the Humanities and its grant programs is available on the Internet at www.neh.gov.
For information about the Perseus Digital Library Project, visit: http://www.perseus.tufts.edu/hopper/
For information about IATH at the University of Virginia, visit: http://www.iath.virginia.edu/
For information about Cultural Analytics, visit: http://lab.softwarestudies.com/2008/09/cultural-analytics.html
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. »Learn more about computing sciences at Berkeley Lab.