Annette Greiner is a data nerd with particular interest in interaction design and visualization. She came to NERSC upon completion of a Master's in Information Management and Systems from UC Berkeley's School of Information, with a focus on human-computer interaction. Before returning to school, she developed web sites for the Advanced Light Source and the DOE Joint Genome Institute. At NERSC, her duties include wrangling data and developing science gateways of all kinds. At a broader level, she represents the Laboratory to the World Wide Web Consortium (W3C) and has served on international working groups related to publication and sharing of data. Her interests touch the UC Berkeley campus as well, where she teaches data visualization for the Master's in Information and Data Science (MIDS) program at the School of Information. Annette also holds a B.S. in biomedical science and theater from the University of Michigan.
Suzanne M. Kosina, Annette M. Greiner, Rebecca K. Lau, Stefan Jenkins, Richard Baran, Benjamin P. Bowen and Trent R. Northen, "Web of microbes (WoM): a curated microbial exometabolomics database for linking chemistry and microbes", BMC Microbiology, September 12, 2018, 18, doi: https://doi.org/10.1186/s12866-018-1256-y
As microbiome research becomes increasingly prevalent in the fields of human health, agriculture and biotechnology, there exists a need for a resource to better link organisms and environmental chemistries. Exometabolomics experiments now provide assertions of the metabolites present within specific environments and how the production and depletion of metabolites is linked to specific microbes. This information could be broadly useful, from comparing metabolites across environments, to predicting competition and exchange of metabolites between microbes, and to designing stable microbial consortia. Here, we introduce Web of Microbes (WoM; freely available at: http://webofmicrobes.org), the first exometabolomics data repository and visualization tool.
Adam P Arkin, Robert W Cottingham, Christopher S Henry, Nomi L Harris, Rick L Stevens, Sergei Maslov, Paramvir Dehal, Doreen Ware, Fernando Perez, Shane Canon, Michael W Sneddon, Matthew L Henderson, William J Riehl, Dan Murphy-Olson, Stephen Y Chan, Roy T Kamimura, Sunita Kumari, Meghan M Drake, Thomas S Brettin, Elizabeth M Glass, Dylan Chivian, Dan Gunter, David J Weston, Benjamin H Allen, Jason Baumohl, Aaron A Best, Ben Bowen, Steven E Brenner, Christopher C Bun, John-Marc Chandonia, Jer-Ming Chia, Ric Colasanti, Neal Conrad, James J Davis, Brian H Davison, Matthew DeJongh, Scott Devoid, Emily Dietrich, Inna Dubchak, Janaka N Edirisinghe, Gang Fang, José P Faria, Paul M Frybarger, Wolfgang Gerlach, Mark Gerstein, Annette Greiner, James Gurtowski, Holly L Haun, Fei He, Rashmi Jain, Marcin P Joachimiak, Kevin P Keegan, Shinnosuke Kondo, Vivek Kumar, Miriam L Land, Folker Meyer, Marissa Mills, Pavel S Novichkov, Taeyun Oh, Gary J Olsen, Robert Olson, Bruce Parrello, Shiran Pasternak, Erik Pearson, Sarah S Poon, Gavin A Price, Srividya Ramakrishnan, Priya Ranjan, Pamela C Ronald, Michael C Schatz, Samuel M D Seaver, Maulik Shukla, Roman A Sutormin, Mustafa H Syed, James Thomason, Nathan L Tintle, Daifeng Wang, Fangfang Xia, Hyunseung Yoo, Shinjae Yoo, Dantong Yu, "KBase: the United States department of energy systems biology knowledgebase", Nature Biotechnology, July 6, 2018, 36.7, doi: 10.1038/nbt.4163.
Here we present the DOE Systems Biology Knowledgebase (KBase, http://kbase.us), an open-source software and data platform that enables data sharing, integration, and analysis of microbes, plants, and their communities. KBase maintains an internal reference database that consolidates information from widely used external data repositories. This includes over 90,000 microbial genomes from RefSeq, over 50 plant genomes from Phytozome, over 300 Biolog media formulations, and >30,000 reactions and compounds from KEGG, BIGG, and MetaCyc. These public data are available for integration with user data where appropriate (e.g., genome comparison or building species trees). KBase links these diverse data types with a range of analytical functions within a web-based user interface. This extensive community resource facilitates large-scale analyses on scalable computing infrastructure and has the potential to accelerate scientific discovery, improve reproducibility, and foster open collaboration.
Oliver Rübel, Annette Greiner, Shreyas Cholia, Katherine Louie, E. Wes Bethel, Trent R. Northen, Benjamin P. Bowen, "OpenMSI: A High-Performance Web-Based Platform for Mass Spectrometry Imaging", Analytical Chemistry, 2013, 85 (21), pp 10354–10361, October 2, 2013, doi: 10.1021/ac402540a
Stephen Leak, Annette Greiner, Ann Gentile, James Brandt, "Supporting failure analysis with discoverable, annotated log datasets", CUG 2018 Proceedings, Stockholm, Cray User Group, May 22, 2018
Detection, characterization, and mitigation of faults on supercomputers is complicated by the large variety of interacting subsystems. Failures often manifest as vague observations like "my job failed" and may result from system hardware/firmware/software, filesystems, networks, resource manager state, and more. Data such as system logs, environmental metrics, job history, cluster state snapshots, published outage notices and user reports are routinely collected. These data are typically stored in different locations and formats for specific use by targeted consumers. Combining data sources for analysis generally requires a consumer-dependent custom approach. We present a vocabulary for describing data, including format and access details, an annotation schema for attaching observations to a dataset, and tools to aid in discovery and publishing system-related insights. We present case studies in which our analysis tools utilize information from disparate data sources to investigate failures and performance issues from user and administrator perspectives.
Ville Ahlgren, Stefan Andersson, Jim Brandt, Nicholas Cardo, Sudheer Chunduri, Jeremy Enos, Parks Fields, Ann Gentile, Richard Gerber, Joe Greenseid, Annette Greiner, Bilel Hadri, Helen He, Dennis Hoppe, Urpo Kaila, Kaki Kelly, Mark Klein, Alex Kristiansen, Steve Leak, Michael Mason, Kevin Pedretti, Jean-Guillaume Piccinali, Jason Repik, Jim Rogers, Susanna Salminen, Michael Showerman, Cary Whitney, Jim Williams, "Cray System Monitoring: Successes, Priorities, Visions", CUG 2018 Proceedings, Stockholm, Cray User Group, May 22, 2018
Effective HPC system operations and utilization require unprecedented insight into system state, applications’ demands for resources, contention for shared resources, and system demands on center power and cooling. Monitoring can provide such insights when the necessary fundamental capabilities for data availability and usability are provided. In this paper, multiple Cray sites seek to motivate monitoring as a core capability in HPC design, through the presentation of success stories illustrating enhanced understanding and improved performance and/or operations as a result of monitoring and analysis. We present the utility, limitations, and gaps of the data necessary to enable the required insights. The capabilities developed to enable the case successes drive our identification and prioritization of monitoring system requirements. Ultimately, we seek to engage all HPC stakeholders to drive community and vendor progress on these priorities.
Kirill Lozinskiy, Lisa Gerhardt, Annette Greiner, Ravi Cheema, Damian Hazen, Kristy Kallback-Rose, Rei Lee, "User-Friendly Data Management for Scientific Computing Users", Cray User Group (CUG) 2019, May 9, 2019
Wrangling data at a scientific computing center can be a major challenge for users, particularly when quotas may impact their ability to utilize resources. In such an environment, a task as simple as listing space usage for one's files can take hours. The National Energy Research Scientific Computing Center (NERSC) has roughly 50 PBs of shared storage utilizing more than 4.6B inodes, and a 146 PB high-performance tape archive, all accessible from two supercomputers. As data volumes increase exponentially, managing data is becoming a larger burden on scientists. To ease the pain, we have designed and built a “Data Dashboard”. Here, in a web-enabled visual application, our 7,000 users can easily review their usage against quotas, discover patterns, and identify candidate files for archiving or deletion. We describe this system, the framework supporting it, and the challenges for such a framework moving into the exascale age.
Prabhat, Kris Bouchard, Shreyas Cholia, Annette Greiner, "BigNeuron", NERSC Science Highlight, March 31, 2015
Annette Greiner, Evan Racah, Shane Canon, Jialin Liu, Yunjie Liu, Debbie Bard, Lisa Gerhardt, Rollin Thomas, Shreyas Cholia, Jeff Porter, Wahid Bhimji, Quincey Koziol, Prabhat, "Data-Intensive Supercomputing for Science", Berkeley Institute for Data Science (BIDS) Data Science Faire, May 3, 2016
Review of current DAS activities for a non-NERSC audience.