PGN

For some time I have been toying with Harold van der Heijden’s magnificent study database (see http://www.hhdbvi.nl/ for the latest release). It comes as a PGN database, and using it requires a chess database program or closely related software, such as CQL. If I want to use it with software more closely adapted to chess compositions, there seems to be no alternative but to write that software myself.

The first problem in that task is getting to grips with PGN, and for that a copy of the PGN Standard is needed. As with most Internet-distributed material, it should preferably be as close to its latest publication as possible, in order to avoid any later modifications, intentional or not.

Where can I find one? 

The Short Story

The web page http://tim-mann.org/Standard is the best I have found so far.

The Long Story

Starting with the Wikipedia article on PGN, I find a link to https://www.chessclub.com/help/PGN-spec. This document is unfortunately incomplete, and breaks off in section 8.2.3.5, after about 800 lines. (Later findings suggest that the full specification has about 2920 lines.)

(Added on 2020-09-29: When I revisit the Wikipedia article, it looks as if things have changed since I first wrote this.)

Another Wikipedia link goes to http://www.saremba.de/chessgml/standards/pgn/pgn-complete.htm. This is a web translation of the original document, so it can’t be regarded as a primary source. Judging the content will require a copy of the standard to compare it against, … but that’s what I’m looking for.

The Usenet newsgroup rec.games.chess, mentioned in both these documents, doesn’t seem to be archived by Google Groups. There are some commercial providers that may have it. This may be a later track to follow, and probably requires signing up for their services.

The Internet Archive provides several archives that cover Usenet news. The most promising is the https://archive.org/details/usenet-rec set, which contains rec.games.chess postings from March 1989, including one by Steven J Edwards with a call for discussion about an early version of PGN, and referring to it as part of the SAN kit. (So I need to start looking for the SAN kit as well.)

The first mention of a formal specification of PGN is in a posting dated 29 Sep 1993, which says that it should be available ‘early next week’. The specification appears to have been emailed to those who showed an interest. In Dec 1993, there is an announcement that the standard (version 1993.12.19) would be part of the PGN games archives at chess.uoknor.edu. This seems to be the first authoritative source for the standard document itself, apart from the copies that the author sent on request by email.

In a posting from 19 Apr 2003, I find a reference to http://tim-mann.org/Standard, which, if I may trust the Internet Archive Wayback Machine, goes back to 2001. (The site belongs to Tim Mann, who wrote GNU XBoard and WinBoard.) However, this file should probably be regarded as unauthoritative. Unfortunately, chess.uoknor.edu is long gone, and it does not appear to be among the FTP sites archived at the Internet Archive.

The newsgroup also provided a FAQ posting, part 2 of which (http://www.faqs.org/faqs/games/chess/part2/) identifies the Internet Chess Server as a general repository where, among other things, the PGN Standard can be found. But just as chess.uoknor.edu seems to have disappeared, so has ics.onenet.net. The FAQ was last updated in 2002.

A similar FAQ for the newsgroup rec.games.chess.computer points to caissa.onenet.net as repository, but that site has also disappeared in the history of the Internet. (The FAQ does not say when it was last updated, but the current copy appears to have been last posted in November, 1995, the same year as the newsgroup was created.)

The author of the PGN specification, Steven J. Edwards, died in 2016.

Result: one copy found, but not from what could be regarded as an authoritative site.

Next try.

The Chess Programming Wiki provides an alternative PGN wiki page (see https://www.chessprogramming.org/Portable_Game_Notation). This page links to https://www.thechessdrum.net/PGN_Reference.txt. This is a pure text file so this is at least a candidate standard document. 

Steven J. Edwards also developed the SAN kit, a toolkit for chess programming. The original seems to have been distributed as a file ‘SAN.tar.Z’, but other archiving and compression methods have been observed. Using that file name as a search pattern in Google and FTP search engines eventually produced the hit [ftp://ftp.freechess.org/pub/chess/Unix/SAN.tar.gz].

While freechess.org still exists, its FTP service appears to have gone. The search engine entry was last updated in 2012. There are no obvious traces from the freechess.org web site to this file either.

Additionally, the Internet Archive Wayback Machine (https://web.archive.org/) saved a copy of the www.chessclub.com page (the page with only 813 lines mentioned above) on Dec 2, 2000. This copy has 2918 lines, and so appears to be a better copy than the one they presently provide.

After searches for the SAN kit itself, a project SANKit was located on Sourceforge. While the project description makes it clear that the code base is based on the original SAN kit but has been modified, the downloadable zip archive appears to contain the original SAN kit file (…/original_archive/sankit.tar.gz). That archive includes a file ‘Standard’, dated 1994-02-22, containing a PGN specification with a revision date of 1994-02-21.

Compared with the original Wikipedia links, which both have 1994-03-12 as revision date, this is not the latest release, but it is very likely something against which other candidates can be compared and evaluated.

Additional searches in personal backups and old-but-not-yet-discarded hard drives came up with a PGN_Standard.txt file from 2005, as well as one file from the distribution of chest 3.19 (a chess-problem solving program by Heiner Marxen, which also appears to have been lost to sight).

The search result is now, leaving out the SAN Kit file:

  1. thechessdrum file from ??? but probably later than original
  2. chest file from 1998
  3. personal file from 2005
  4. chessclub file from 2000
  5. tim mann file from 2001

The personal file was quickly eliminated, as it appeared to have CR/LF inserted in the middle of some lines, sometimes in the middle of a word. These breaks were very regularly placed throughout the file: they appear to have been inserted by some automatic process, possibly the result of bad compression or, more probably, bad decompression software.
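The regularity of such breaks can be checked with a few lines of Python. This is only a sketch, not the method used here: the function names are hypothetical, and it assumes that both texts already use LF as the line separator and that the damaged copy is otherwise identical to the reference copy.

```python
def extra_breaks(reference: str, damaged: str) -> list[int]:
    """Offsets in `damaged` of line breaks that `reference` does not have.

    Assumption: apart from the inserted breaks the two texts are identical.
    """
    offsets = []
    i = j = 0
    while i < len(reference) and j < len(damaged):
        if reference[i] == damaged[j]:
            i += 1
            j += 1
        elif damaged[j] == "\n":
            # A break present only in the damaged copy: record and skip it.
            offsets.append(j)
            j += 1
        else:
            raise ValueError(f"texts diverge at reference offset {i}")
    return offsets


def gaps(offsets: list[int]) -> list[int]:
    """Distances between consecutive extra breaks."""
    return [b - a for a, b in zip(offsets, offsets[1:])]
```

If the gaps come out constant, or nearly so, that would support the idea of an automatic process working on fixed-size blocks rather than a human editor.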

The chest file did not have any such insertions. It matched the personal file reasonably well, and is also the oldest of the located files.

The thechessdrum file differed from the chest file in having 187 lines of table of contents inserted at the beginning of the file. It also appears to have an additional character inserted in section 0. Preface, line 199:

>From the Tower of Babel story:

The chest file does not have the initial ‘>’ character.

The archived chessclub file was almost identical to the chest file. The only differences found were that the chest file had two additional empty lines at the beginning (it is not clear what significance this has), and that the chessclub file lacked the line ‘Standard: EOF’ at the very end.

The tim_mann file was the same as the chest file, except that it lacked the two empty lines that the chest file had at the beginning of the file. It also differed regarding line separators: where the chest file has CR/LF, the tim_mann file has LF only.

The slightly earlier SANKit file was also examined, mainly to get an idea of what the original may have looked like. As to the technical content, it shows some differences, but most of the changes were editorial. The number of differences suggests that the PGN standard was being actively developed just a few weeks before release.

The SANKit file uses LF for line separators. It lacks the two empty initial lines of the chest file. It ends with ‘Standard: EOF’.
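The properties compared above (line separators, leading empty lines, the closing ‘Standard: EOF’ line, and the line count) can also be collected by a small script. The following is a minimal sketch; the names `FileProfile` and `profile` are made up for this illustration, and it works on raw bytes so that no character encoding has to be assumed.

```python
from typing import NamedTuple


class FileProfile(NamedTuple):
    crlf: int             # lines terminated by CR/LF
    lf_only: int          # lines terminated by a bare LF
    leading_blank: int    # empty lines before the first non-empty line
    has_eof_marker: bool  # last non-empty line is 'Standard: EOF'
    line_count: int


def profile(data: bytes) -> FileProfile:
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    # bytes.splitlines() breaks only at CR, LF and CR/LF.
    lines = data.splitlines()
    leading_blank = 0
    for line in lines:
        if line.strip():
            break
        leading_blank += 1
    non_empty = [line.strip() for line in lines if line.strip()]
    has_eof_marker = bool(non_empty) and non_empty[-1] == b"Standard: EOF"
    return FileProfile(crlf, lf_only, leading_blank, has_eof_marker, len(lines))
```

Calling `profile(open(path, "rb").read())` on each candidate would give one row per file, which makes the differences described above easy to tabulate.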

Thus, the tim_mann file seems to be the best to use, even if the technical contents of all files are the same. The difference in line separators is technically unimportant, but may matter when deciding whether a file is closer to the original or not.

So the tim_mann file is probably the best copy that can be found with a day’s work. It may not be the original, as it is dated 2001, about seven years after the original was published, so there might still be an even better copy somewhere out there to be found.

And just to ensure that the standard remains available if the tim_mann site should give up the ghost, a copy has been uploaded to: https://archive.org/details/pgn-standard-1994-03-12.

(File comparisons were made with Notepad++ 7.8.5 and the Compare plugin 2.0.0 with default settings.)
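For anyone who wants to repeat the comparison without Notepad++, something similar can be scripted with Python’s standard difflib module. This is a sketch only, not what was used for the results above: `normalize` and `content_diff` are hypothetical helper names, and the choice to ignore line separators and leading empty lines is an assumption based on the differences that turned out to be unimportant here.

```python
import difflib


def normalize(data: bytes) -> list[str]:
    """Split into lines (CR/LF or LF) and drop leading empty lines."""
    # latin-1 maps every byte to a character, so decoding cannot fail.
    lines = [line.decode("latin-1") for line in data.splitlines()]
    start = 0
    while start < len(lines) and not lines[start].strip():
        start += 1
    return lines[start:]


def content_diff(a: bytes, b: bytes, a_name: str = "a", b_name: str = "b") -> list[str]:
    """Unified diff of two files after normalization; empty if they match."""
    return list(
        difflib.unified_diff(
            normalize(a), normalize(b), a_name, b_name, lineterm="", n=0
        )
    )
```

With this, two copies that differ only in line separators and leading empty lines give an empty diff, while a changed line such as the ‘>From’ line shows up as one removed and one added line.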