Digital Archiving in the Physics Literature

Author to Archive and Beyond -The American Physical Society-

Robert A. Kelly APS Director - Journal Information Systems

The American Physical Society (APS) has long recognized as its goal the diffusion of the knowledge of physics. For the past 100 years, the Society has used a paper-based, print-oriented publishing process in support of this goal. This paper is a summary of a presentation delivered at the North American Serials Interest Group (NASIG) Conference on June 21, 1996. It describes the strategies and projects, now being developed and implemented, that will enable the Society to exploit emerging Internet and electronic publishing technologies in support of that goal. The transparencies used in this presentation are available as a pdf file (34K).

Introduction

The primary journal family of the American Physical Society is the Physical Review. Since its inception in 1893, the journal has branched into six primary journals, Physical Review A through E and Physical Review Letters. The Society also publishes a review journal, Reviews of Modern Physics. I estimate that by the end of 1996, we will have published approximately one and a half million pages of physics.

At the start of our journey into electronic publishing, in the summer of 1993, the APS adopted three approaches that were unique at the time. First, we decided to take a systems view and consider publishing as a process that starts with the researcher (author), passes through peer review, production, distribution, and an archive, and then moves on to another scholar (the reader). 1 The process most likely repeats itself, with research building upon research, in a continuing cycle from 1893 into the future. Our strategy was, and is, to apply electronic publishing technologies and the use of the Internet to this physics publishing process.

Our second approach was to change the paradigm from the printing of articles to one of archiving and then distributing physics information. We were early adopters of SGML and the use of ISO 12083 as a DTD. In 1994, we migrated Physical Review Letters (PRL) from a TROFF-based composition system to a production process based on SGML. This enabled APS not only to reduce costs in the composition of the PRL print journal but also to offer the journal on-line. In July of 1995, we launched Physical Review Letters - online (PRL-o), via the OCLC Guidon interface. In August of 1995, we launched an HTML and subsequently a pdf version of PRL-o, also through OCLC. Both of the on-line versions were made possible by the use of SGML in the production process.

We also recognized that much of the innovation in this new era would come from the researchers, the authors and the readers. Paul Ginsparg’s e-print server is a prime example of user innovation being brought into the process. I’ll talk more on e-prints later.

Over the past three years, our goals have been formed in response to technology and business challenges. The business challenges include rising costs, declining funds for subscriptions and the internationalization of the process.

The business challenges will continue. However, it is time to explore the new media as a new way of scholarly communication and research. In his book, "Film Style and Technology: History and Analysis," 2 Barry Salt posits that the basic format of a stage play, possibly enhanced, has been carried over into the cinema. It was the introduction of technology that enabled the director to change the theatrical viewing perspective by moving the camera. It was also the introduction of technology that enabled trick effects, leading to the current film releases that exploit special effects.

Our efforts to date have been to replicate the print journal, with its supporting infrastructure, on-line. There are differences between a paper being presented at a conference, an article in a journal, and a paper on an interactive bulletin board on the Internet. The same "information" could be communicated through any of these forms, yet the treatment of the information is quite different in the way that the reader interacts with it. The role of the "reader", on the Internet, has been replaced with that of "user". Assuming the continued explosive growth of Internet technology, what can happen to the journal structure and process? We have recently set up a journals research group to answer this question.

Projects

In anticipation of answering the above question, the American Physical Society has created four major project categories as an umbrella for experimentation and implementation of our strategic goals: E-Print, Re-engineering the Editorial Process, Archiving, and Distribution. The future direction of these projects will be guided by the work of our research group. The remainder of this paper describes these projects.

APS E-Print Server

For years physicists had been communicating current research with their colleagues through a process known as pre-print: the distribution of an article in advance of its being submitted for peer review. The World Wide Web was created at CERN as a tool to facilitate this communication. In 1991 the pre-print process evolved into an e-print archive at Los Alamos National Laboratory. For the first time, physicists had a place to store their papers so that others might retrieve them. The process changed from one where the article was pushed to the reader to one where the reader pulled the article. The paradigm shift of push to pull is an important distinction in the evolution from print to e-publishing.

The American Physical Society had followed the e-print archive phenomenon with great interest. The APS sensed that it might have an important role to play at this critical juncture. In the summer and fall of 1994, the APS conducted an on-line forum and, on Oct 30, 1994, a workshop 3 at Los Alamos National Laboratory, to assist in the formulation of APS policy concerning e-print archives.

Observations from the workshop are the following:
  1. The landscape of scientific publishing is changing rapidly.
  2. We can use technology to not only accelerate the process but to improve it and add value to it.
  3. A "one size fits all" paper-based policy is obsolete. We can craft a process that fits the different needs of many users.
  4. Author tools and education are necessary.
  5. There is a place for e-prints in the current and probably future APS publishing policy.
  6. The definition of future processes should be explored through small experiments.
  7. We should establish a collaborative effort with universities, libraries, and other societies; representatives from all of these were present at the workshop.

On July 1, 1996 APS launched a prototype e-print server. 4 The APS server will complement the XXX server at Los Alamos. Testing and development of this first version are scheduled to be complete by October 1, 1996. The APS e-print archive will be freely available to all physicists, without regard to discipline or journal of ultimate publication. Authors, if they so choose, will be able to submit papers from the e-print archive directly to APS for review. In addition to allowing authors to make their preprints readily available, the service is intended to help with the exploration of new Internet-based technology for submitting, refereeing, editing, and publishing papers in the Physical Review journals.

Articles submitted for information-sharing purposes should be in a readable format. We prefer PostScript, pdf or HTML. Articles intended for eventual submission to an APS journal should include the source material for the article. REVTeX or LaTeX is preferred. We hope to accept SGML in the future.

For the future, we are considering the development of a web-based SGML authoring tool, similar in concept to Netscape Navigator Gold. Using Java applets, the tool would guide an author through the creation of a manuscript, following an SGML DTD. The resulting article would be available in pdf and PostScript, with an SGML source file that parses against an arbitrary DTD. In addition to preparing the article for future archiving in a database, the tool could provide services such as verifying references and creating web links.
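The kind of check such an authoring tool might run can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than as a Java applet, it uses XML as a stand-in for SGML, and the element names (`title`, `author`, `abstract`, `body`, `cite`, `reference`) are hypothetical and are not taken from ISO 12083.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements; NOT the actual ISO 12083 content model.
REQUIRED_ELEMENTS = ["title", "author", "abstract", "body"]


def missing_elements(manuscript_xml, required=REQUIRED_ELEMENTS):
    """Return the required element names that do not appear in the manuscript."""
    root = ET.fromstring(manuscript_xml)
    present = {element.tag for element in root.iter()}
    return [name for name in required if name not in present]


def unresolved_citations(manuscript_xml):
    """Return ids used in <cite ref="..."> that have no <reference id="...">."""
    root = ET.fromstring(manuscript_xml)
    known = {ref.get("id") for ref in root.iter("reference")}
    cited = [cite.get("ref") for cite in root.iter("cite")]
    return [ref_id for ref_id in cited if ref_id not in known]


# A made-up draft: it lacks an abstract and cites a reference it never lists.
DRAFT = """<article>
  <title>A sample manuscript</title>
  <author>A. Physicist</author>
  <body>See <cite ref="r1"/> and <cite ref="r2"/>.</body>
  <reference id="r1">First reference</reference>
</article>"""

print(missing_elements(DRAFT))
print(unresolved_citations(DRAFT))
```

A real tool would validate against the full DTD rather than a flat list of names; the sketch only shows where structural guidance and reference checking could hook in.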

Re-engineering the Editorial Process

In an article titled "Re-Engineering Peer Review" in the magazine The Economist, 5 the author states "The Internet was developed by scientists for scientists, but many now fear that its anarchic style could endanger the quality of research. The threat is an opportunity in disguise." We believe that it is a major opportunity to reduce costs and improve the cycle time of the peer review process. We also believe that once our entire process is using web-based technology, it will be easier to change the paradigm of peer review.

It is our intent to build a web-based system, spanning our intranet and the Internet, to manage our editorial and peer review processes electronically and with electronic manuscripts. This system will serve users who are authors, referees, editors and readers, scattered around the planet. An example of the services we will provide is our recently implemented Author Status Inquiry System (ASIS). The system, accessed through the World Wide Web, enables an author to pinpoint the progress of a paper in peer review.

Our target is to convert accepted articles into SGML and archive them for subsequent use. Currently we archive Physical Review D, Physical Review Letters and the Rapid Communications sections of all of our journals in SGML. We use ISO 12083 as the standard for our DTD. I anticipate that all of our journals will be in SGML format by the end of 1997. All of our journals are composed electronically, producing PostScript for the printed issue. This gives us the opportunity to distill pdf files from the PostScript files. We are conducting our e-journal experiments using these pdf files.

Archive

APS, like most publishers, started composing journals electronically in the early 1980s. The first steps were a combination of electronic keystrokes of text and math that created a single-column camera-ready copy, which was then cut and pasted with figures into mechanicals. We saved all of these keystrokes, which have become the nucleus of our archive activities. Currently the archive is a hodgepodge of formats, from TROFF to TeX to Xyvision to SGML. By the end of 1997, I hope to have all future content in SGML, our theory being that an SGML database will provide us with the greatest reusability and database options.

In 1994 we started an experiment at Los Alamos: Physical Review On-Line Archives (PROLA). 6 The goal of the project is to see if we can use the saved keystrokes and integrate the various composition formats used over the past ten years. Image and pdf will be used for display, and the keystrokes for searching, linking and other navigation techniques. We have been successful and the project is being tested by physicists at Los Alamos. Our immediate goal is to have an archive from 1985 through 1995 that can be linked to from our e-journal offerings. We are seriously considering bringing the on-line archive back to 1893, the beginning of Physical Review. This archive will be a component of future offerings.
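The split between display files and searchable keystrokes can be illustrated with a small sketch. It is not a description of how PROLA is built; it assumes each article record pairs a hypothetical display file name with plain text recovered from the saved keystrokes, and it uses a simple in-memory word index written in Python.

```python
import re
from collections import defaultdict


def build_index(articles):
    """Map each lowercase word in the saved text to the ids of articles containing it."""
    index = defaultdict(set)
    for article_id, record in articles.items():
        for word in re.findall(r"[a-z0-9]+", record["text"].lower()):
            index[word].add(article_id)
    return index


def search(index, articles, query):
    """Return display files for articles whose text contains every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(articles[article_id]["display"] for article_id in matches)


# Made-up records: "text" stands for recovered keystrokes, "display" for the
# image or pdf file shown to the reader.
ARTICLES = {
    "a1": {"text": "Neutron scattering in liquid helium", "display": "a1.pdf"},
    "a2": {"text": "Electron scattering from nuclei", "display": "a2.pdf"},
}

INDEX = build_index(ARTICLES)
print(search(INDEX, ARTICLES, "neutron scattering"))
```

The reader searches the text but is handed the display file, so the original composition format never has to be rendered directly.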

E-Journal Offerings

The paradigm shift of push to pull is an important distinction in the evolution from print to e-publishing. The current STM publishing process, from a reader's point of view, is a push process. The peer-reviewed articles are packaged by a publisher into a journal and pushed out to the readers. With the introduction of web technology it is possible for a reader to interact with the publisher's system and pull the information. The publisher need only make the URL of the content available and the reader can obtain the article with a web browser.
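The difference between the two models can be made concrete with a toy sketch. It involves no network and does not describe any APS system; the `Publisher` class, its method names, and the example.org URLs are all hypothetical, chosen only to contrast delivery of a whole issue with retrieval of one article by URL.

```python
class Publisher:
    """Toy publisher that supports both delivery models."""

    def __init__(self):
        self.articles = {}      # URL -> article content
        self.subscribers = []   # reader mailboxes used by the push model

    def publish(self, url, content):
        """Make an article available under a URL."""
        self.articles[url] = content

    def push_issue(self):
        """Push model: the publisher sends every article to every subscriber."""
        for mailbox in self.subscribers:
            mailbox.extend(self.articles.values())

    def fetch(self, url):
        """Pull model: a reader asks for one article by its URL."""
        return self.articles.get(url)


publisher = Publisher()
publisher.publish("http://example.org/prl/1", "Article one")
publisher.publish("http://example.org/prl/2", "Article two")

# Push: the reader receives the whole packaged issue.
mailbox = []
publisher.subscribers.append(mailbox)
publisher.push_issue()

# Pull: the reader chooses a single article.
wanted = publisher.fetch("http://example.org/prl/2")
```

In the push case the publisher decides what arrives and when; in the pull case the only thing the publisher distributes is the URL.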

The American Physical Society has several ongoing experiments where the reader comes to the content. On July 1, 1995 we launched Physical Review Letters - online through OCLC. As of July 1, 1996 we will have three volumes of this weekly letters journal on-line, covering January 1, 1995 to July 1, 1996. The journal is available on the web with the articles in pdf format. The journal is also available through the OCLC Guidon interface.

On July 1, 1996 we launched Physical Review C on-line (PRC-o) and Physical Review B Rapid Communications on-line (PRBR-o). Both of these offerings are web-based and deliver pdf files of the articles. PRC-o is available to institutions and members of the APS. PRBR-o is only available to members of the APS.

We have several experiments where we deliver content to institutions for subsequent use. The TORPEDO project at the Naval Research Laboratory 7 is an example of a subsequent-use archive customized for an institution. APS supplies unbound copies of Physical Review Letters and Physical Review E to NRL for availability to the NRL researchers. The journals are scanned into an image format and are available to the researchers. In 1997, we plan to assist NRL in migrating this system to an SGML database-oriented system with pdf copies of the articles.

The University of Illinois at Urbana-Champaign was one of the winners of the National Science Foundation grants for the Digital Library Initiative (DLI). 8 One of the goals of this project is to develop the technologies and processes to integrate the SGML (ISO 12083 compliant) files from several publishers into a database that will provide on-line access to the journals at the University. The APS is supporting this effort by contributing SGML created for the distribution of the paper and on-line editions of Physical Review Letters. We are using the same SGML process and files to feed the paper edition, the on-line edition, and the DLI.
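The idea of feeding several outputs from one structured source can be sketched in a few lines. This is an illustration only, not the APS production process: it uses Python, XML as a stand-in for SGML, hypothetical element names, and a plain-text rendering as a stand-in for the print composition feed.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up source record; the element names are not taken from ISO 12083.
SOURCE = """<article>
  <title>Sample letter</title>
  <author>A. Physicist</author>
  <abstract>We measure x &lt; y.</abstract>
</article>"""


def fields(source_xml):
    """Extract the fields shared by every output format."""
    root = ET.fromstring(source_xml)
    names = ("title", "author", "abstract")
    return {name: root.findtext(name, default="") for name in names}


def to_html(source_xml):
    """Render the on-line view, escaping text for HTML."""
    f = fields(source_xml)
    return (f"<h1>{escape(f['title'])}</h1>"
            f"<p><i>{escape(f['author'])}</i></p>"
            f"<p>{escape(f['abstract'])}</p>")


def to_text(source_xml):
    """Render a plain-text view, standing in for the print composition feed."""
    f = fields(source_xml)
    return f"{f['title'].upper()}\n{f['author']}\n\n{f['abstract']}"


print(to_html(SOURCE))
print(to_text(SOURCE))
```

Because both renderings read the same source, a correction made once in the structured file appears in every edition.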

Summary

In summary, the American Physical Society is committed to exploiting emerging technologies to develop a full electronic infrastructure, setting the stage for innovation in scholarly communication and research. Our immediate goal is to improve the cycle time of the process and reduce costs.

Acknowledgment

The visions, strategies and projects described in this paper are not the product of any one individual but the sum of numerous conversations, debates, and meetings with many physicists, librarians, the APS Electronic Publishing Committee, the APS Editorial staff, and the Journal Information Systems staff of the American Physical Society.

Notes

  1. One of our first public discussions of our strategy was presented at the Association of Research Libraries conference on November 6, 1994. A copy of the paper is on http://publish.aps.org/RAKELLY/arl.html

    An even earlier presentation is included in my testimony before the Joint Committee on Printing, US Congress, "New Technology and the Government Printing Office" - July 24, 1991.

  2. Film Style and Technology: History and Analysis, Barry Salt, Starword (ISBN 0 9509066 2 x)
  3. The workshop proceedings are available at http://publish.aps.org/EPRINT/eprthome.html The observations are in the REPORT TO COUNCIL ON E-PRINT ARCHIVE WORKSHOP, LOS ALAMOS, OCT. 14-15, 1994 B. Bederson at http://publish.aps.org/EPRINT/losa.html
  4. The APS E-Print Archive can be found at http://publish.aps.org/artintro.html The description of the service, in this paper, is extracted in part from the above URL.
  5. The Economist, June 22, 1996, page 79.
  6. A paper on our archival thinking is available at http://www.c3.lanl.gov:8077/papers/apsPlan.html
  7. NRL Library Home Page - InfoWeb, http://infonext.nrl.navy.mil/
  8. Illinois Digital Library Initiative Project, http://surya.grainger.uiuc.edu/dli/